There are several ways to create an instruction pattern: define_insn, define_expand, define_split, etc. Which one you choose can depend on which compiler pass you expect it to be used in.
RTL pattern matching
An important realisation I made early on is that in all but the final stages of compilation, gcc uses the RTL to drive things like register allocation and optimisation. The compiler knows nothing about the opcodes; it doesn't even see them, because they are emitted directly to the output file. So while it feels like the opcode output is the most important thing to focus on, I think getting the instruction and RTL patterns right is actually more likely to get the compiler to output efficient code. It took me a while to understand when to use each method of defining a pattern.
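To illustrate the point, here is a deliberately tiny C model of matching by shape. The `struct rtx`, the enum codes and `match_movhi_mem_mem` are all hypothetical and far simpler than GCC's real rtx type and recog machinery. The sketch only shows that a pattern is selected by inspecting the RTL tree, and that an output template (here an illustrative `mov %1, %0`) is consulted only after a match has been made.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified RTL: just enough structure to match on. */
enum code { SET, MEM, REG };
enum mode { QI, HI };

struct rtx {
    enum code code;
    enum mode mode;
    const struct rtx *op[2];   /* operands; unused slots are NULL */
};

/* Returns 1 if x has the shape (set (mem:HI ...) (mem:HI ...)).
   Only the tree is inspected; no opcode is involved in the decision. */
int match_movhi_mem_mem(const struct rtx *x)
{
    return x->code == SET
        && x->op[0] != NULL && x->op[0]->code == MEM && x->op[0]->mode == HI
        && x->op[1] != NULL && x->op[1]->code == MEM && x->op[1]->mode == HI;
}

/* The (illustrative) output template is looked up only after a match. */
const char *output_template(const struct rtx *x)
{
    return match_movhi_mem_mem(x) ? "mov %1, %0" : NULL;
}
```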
In the TMS9900 back-end I have used these patterns:
- define_insn
- define_expand
- define_insn_and_split
- define_peephole2
define_insn
The standard instruction pattern is defined using define_insn. This is generally straightforward but does have limitations. When there are many alternatives and options, the output code can become convoluted. It can also become very repetitive when slightly different insns have to share similar methods (signed and unsigned mul/div, for example).
The named insns are used initially to build the RTL block list. Later passes match insns against patterns and will use both named and unnamed patterns.
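One way to picture the named/unnamed distinction is the hypothetical C sketch below. It relies on one real GCC convention: a pattern whose name is empty or starts with `*` gets no `gen_` function, so the expander cannot ask for it by name, although later passes can still match it. The table, `find_by_name` and `find_by_shape` are made up for illustration, and the integer `shape` stands in for a real RTL template.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pattern table; names mirror ones used in this back-end. */
struct pattern {
    const char *name;   /* "" or a leading '*' means: not callable by name */
    int shape;          /* stand-in for the RTL template the pattern matches */
};

static const struct pattern patterns[] = {
    { "movhi",      1 },   /* standard name, requested by the expander */
    { "*not_andhi", 2 },   /* only found by matching in later passes */
    { "",           3 },   /* anonymous pattern */
};
static const size_t npatterns = sizeof patterns / sizeof patterns[0];

/* Expansion: look up a standard name such as "movhi". */
const struct pattern *find_by_name(const char *name)
{
    for (size_t i = 0; i < npatterns; i++) {
        const char *n = patterns[i].name;
        if (n[0] != '\0' && n[0] != '*' && strcmp(n, name) == 0)
            return &patterns[i];
    }
    return NULL;
}

/* Later passes: match by shape, whatever the name. */
const struct pattern *find_by_shape(int shape)
{
    for (size_t i = 0; i < npatterns; i++)
        if (patterns[i].shape == shape)
            return &patterns[i];
    return NULL;
}
```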
define_expand
define_insn_and_split
define_peephole2
For example, this define_peephole2 replaces a QImode (byte) load into a register, followed by a store of that register to memory, with a single memory-to-memory move, provided the register is dead after the second insn:

(define_peephole2
  [(set (match_operand:QI 0 "register_operand" "")
        (match_operand:QI 1 "memory_operand" ""))
   (set (match_operand:QI 2 "memory_operand" "")
        (match_dup 0))]
  "peep2_reg_dead_p(2, operands[0])"
  [(set (match_operand:QI 2 "memory_operand" "")
        (match_operand:QI 1 "memory_operand" ""))]
  {
    tms9900_debug_operands ("peep-movqi-mem-mem", NULL_RTX, operands, 3);
  }
)
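The condition and the rewrite can be modelled in a few lines of C. This is a hypothetical sketch, not GCC code: `struct move`, `peep_movqi_mem_mem` and the `reg_dead_after_i2` flag are invented names, and the flag stands in for what `peep2_reg_dead_p` tells the real pattern. The rewrite is only safe because the TMS9900 can move directly between two memory operands and nothing reads the register after the second insn.

```c
#include <stdbool.h>

/* Hypothetical model of one move insn: each side is a register or memory. */
struct move {
    int dst_is_mem, src_is_mem;
    int dst, src;               /* register number or memory slot id */
};

/* insn 1: reg = mem_src, insn 2: mem_dst = reg.
   If the shapes match and the register is dead after insn 2,
   fill *out with the single move mem_dst = mem_src and return true. */
bool peep_movqi_mem_mem(const struct move *i1, const struct move *i2,
                        bool reg_dead_after_i2, struct move *out)
{
    bool shape_ok = !i1->dst_is_mem && i1->src_is_mem   /* reg <- mem */
                 && i2->dst_is_mem && !i2->src_is_mem   /* mem <- reg */
                 && i2->src == i1->dst;                 /* (match_dup 0) */
    if (!shape_ok || !reg_dead_after_i2)
        return false;
    out->dst_is_mem = 1;
    out->dst = i2->dst;
    out->src_is_mem = 1;
    out->src = i1->src;
    return true;
}
```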
Problems during compiler passes
- expand - The expand pass converts the function into RTL insns, using the named patterns (including the define_expand patterns) as it goes.
- combine - The combine pass looks for insns that can be merged together.
- ira - The ira (integrated register allocator) pass does the actual register allocation.
- reload - The reload pass fixes up insns whose operands don't satisfy their constraints, spilling values to the stack when it can't allocate a register.
- peephole2 - The peephole2 pass looks for sequences of insns that match a define_peephole2 pattern.
- final - The final pass emits the actual assembly.
Expand and insn lists
As the compiler does its thing, it generates lists of insns for each function. Because it doesn't actually emit any code until a function has been completely parsed (as far as I can tell), we know a lot about the function before any code is emitted. For example, we know whether it is a leaf function (one that doesn't call any other functions), in which case we don't need to save R11 (which holds the return address) in the function prologue.
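Once the whole insn list is available, the leaf test itself is simple. Below is a hypothetical C sketch over a flat array of insn kinds; the enum and both functions are invented for illustration, and inside GCC a back-end would normally just call `leaf_function_p ()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified insn kinds. */
enum insn_kind { INSN_SET, INSN_CALL, INSN_JUMP };

/* A function is a leaf if none of its insns is a call. */
bool is_leaf(const enum insn_kind *insns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (insns[i] == INSN_CALL)
            return false;
    return true;
}

/* BL overwrites R11 with the new return address, so only a function
   that makes calls has to save its own R11 in the prologue. */
bool prologue_saves_r11(const enum insn_kind *insns, size_t n)
{
    return !is_leaf(insns, n);
}
```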
Insns are numbered using a UID. In the dumps for each pass, each insn is listed with its UID and the UID of the previous and next insn. This is useful for finding where insns were combined or eliminated.
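A hypothetical C sketch of why the prev/next numbers help: when a pass unlinks an insn, its neighbours end up pointing at each other, so in the following dump their prev/next UIDs skip over the removed UID. `struct insn` and `unlink_insn` are invented here (real GCC insns carry far more); a missing neighbour is reported as 0, as in the dumps.

```c
#include <stddef.h>

/* Hypothetical insn node: its UID plus links to its neighbours. */
struct insn {
    int uid;
    struct insn *prev, *next;
};

/* Unlink an insn, as a pass does when it deletes or merges one. */
void unlink_insn(struct insn *x)
{
    if (x->prev) x->prev->next = x->next;
    if (x->next) x->next->prev = x->prev;
    x->prev = x->next = NULL;
}

/* The prev/next numbers printed in a dump (0 when there is no neighbour). */
int prev_uid(const struct insn *x) { return x->prev ? x->prev->uid : 0; }
int next_uid(const struct insn *x) { return x->next ? x->next->uid : 0; }
```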
If we compiled this code fragment:
void f (void) { int x = 3; }
gcc would create a list of blocks (with only one block in this case). In the file gccdump.128r.expand we would see:
(insn 6 5 11 3 <stdin>:3 (set (mem/c/i:HI (reg/f:HI 17 virtual-stack-vars) [0 x+0 S2 A16]) (mem/u/c/i:HI (symbol_ref/u:HI ("*LC0") [flags 0x2]) [0 S2 A16])) -1 (nil))
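This does look confusing initially, but essentially it says: this is an insn with UID 6, the previous insn is 5, the next is 11, it is in basic block 3, it comes from line 3 of stdin, and it does a "set" of a HI (16-bit) memory location, the stack slot for x, from the constant stored at *LC0. The stack slot is addressed through reg 17, virtual-stack-vars, which is not a physical register. Any reg numbered above the highest physical reg (r15 in our case) is a virtual or pseudo reg: virtual regs such as virtual-stack-vars are replaced with real registers in the vregs pass, and ordinary pseudo-regs are allocated physical regs later in the ira and reload passes.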
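The leading numbers of a dump line are regular enough to pull apart mechanically. The C sketch below is hypothetical (`struct insn_header`, `parse_insn_header` and `is_hard_reg` are made-up names) and assumes the GCC 4.x layout shown above, with the basic block index as the fourth number. It also assumes r0 to r15 are the only hard registers, i.e. FIRST_PSEUDO_REGISTER is 16, which makes reg 17 GCC's virtual-stack-vars (FIRST_PSEUDO_REGISTER + 1).

```c
#include <stdio.h>

#define FIRST_PSEUDO_REGISTER 16      /* assumption: r0..r15 only */
#define VIRTUAL_STACK_VARS_REGNUM (FIRST_PSEUDO_REGISTER + 1)

/* Hypothetical: the leading fields of an "(insn ..." dump line. */
struct insn_header {
    int uid, prev, next, bb, line;
};

/* Returns 1 on success. %*[^:] skips the file name ("<stdin>"),
   then ":%d" reads the source line number. */
int parse_insn_header(const char *s, struct insn_header *h)
{
    return sscanf(s, "(insn %d %d %d %d %*[^:]:%d",
                  &h->uid, &h->prev, &h->next, &h->bb, &h->line) == 5;
}

/* Anything at or above FIRST_PSEUDO_REGISTER is virtual or pseudo. */
int is_hard_reg(int regno)
{
    return regno >= 0 && regno < FIRST_PSEUDO_REGISTER;
}
```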
The compiler will look for a named insn called "movhi" (move a HImode value, i.e. a 16-bit word on the TMS9900) to do this operation. Our implementation of movhi is a define_expand which emits a CLR or a SETO if it can, or otherwise emits a tms9900_movhi insn. The expand also replaces an immediate with a memory constant (such as *LC0 above) if required.
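The constant case of that decision can be sketched in C. The enum and `movhi_const_choice` are hypothetical, not the back-end's real expander (which works on RTL operands rather than C integers); the sketch only captures the rule described above: CLR for zero, SETO for all ones (0xFFFF), and an ordinary tms9900_movhi move otherwise.

```c
#include <stdint.h>

/* Hypothetical: which instruction the movhi expand picks for a constant. */
enum movhi_choice { USE_CLR, USE_SETO, USE_MOV };

enum movhi_choice movhi_const_choice(int16_t value)
{
    if (value == 0)
        return USE_CLR;    /* CLR sets the word to 0x0000 */
    if (value == -1)
        return USE_SETO;   /* SETO sets the word to 0xFFFF */
    return USE_MOV;        /* general case: a tms9900_movhi insn */
}
```

For the example above, `int x = 3` falls through to the general move, which is why the dump shows a tms9900_movhi from *LC0 rather than a CLR or SETO.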
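In a later pass (gccdump.133r.vregs) the insn has been recognised as tms9900_movhi (pattern 12; the -1 in the expand dump meant it had not been matched yet), and x is now allocated on the stack frame, addressed through R9 (the frame pointer):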
(insn 6 5 11 3 <stdin>:6 (set (mem/c/i:HI (reg/f:HI 9 r9) [0 x+0 S2 A16]) (mem/u/c/i:HI (symbol_ref/u:HI ("*LC0") [flags 0x2]) [0 S2 A16])) 12 {tms9900_movhi} (nil))
(insn 6 3 14 2 <stdin>:6 (set (mem/c/i:HI (reg/f:HI 10 r10) [0 x+0 S2 A16]) (mem/u/c/i:HI (symbol_ref/u:HI ("*LC0") [flags 0x2]) [0 S2 A16])) 12 {tms9900_movhi} (nil))
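A hypothetical C sketch of that vregs rewrite: an address based on virtual-stack-vars (reg 17, assuming FIRST_PSEUDO_REGISTER is 16) is rebased onto the hard frame pointer plus a frame offset. `struct addr` and `instantiate_stack_var` are invented names, and the frame pointer number is a parameter because the two dump lines above use R9 and R10 respectively.

```c
#define VIRTUAL_STACK_VARS_REGNUM 17   /* assumes FIRST_PSEUDO_REGISTER == 16 */

/* Hypothetical base-plus-offset address. */
struct addr {
    int base_reg;
    int offset;
};

/* Rebase stack-variable addresses onto the frame pointer;
   any other address is returned unchanged. */
struct addr instantiate_stack_var(struct addr a, int frame_pointer_regno,
                                  int frame_offset)
{
    if (a.base_reg == VIRTUAL_STACK_VARS_REGNUM) {
        a.base_reg = frame_pointer_regno;
        a.offset += frame_offset;
    }
    return a;
}
```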
Combiner
The combine pass only keeps a merged insn if the result matches one of our patterns, so it helps to provide patterns for useful combinations such as this one:

; This handles reverse-order not-and combinations
(define_insn "*not_andhi"
  [(set (match_operand:HI 0 "nonimmediate_operand" "=rR>,rR>,Q,Q")
        (and:HI (not:HI (match_operand:HI 1 "nonimmediate_operand" "rR>,Q,rR>,Q"))
                (match_operand:HI 2 "nonimmediate_operand" "0,0,0,0")))]
  ""
  {
    tms9900_debug_operands ("*not_andhi", insn, operands, 3);
    return "szc %1, %0";
  }
  [(set_attr "length" "2,4,4,6")])
For example, this C fragment:

int x;
int y;
void f() { x = x & ~y; }
The generated assembly uses a single SZC instruction for the function body:

       def f
f      szc @y, @x
       b *r11
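To check the instruction choice, here is SZC's effect written out in C: it clears in the destination every bit that is set in the source, so `szc @y, @x` computes `x = x & ~y`. The `szc` function is a hypothetical helper for illustration; its argument order (source first, destination second) follows the `szc %1, %0` template above.

```c
#include <stdint.h>

/* SZC ("set zeros corresponding"): dst = dst & ~src on a 16-bit word.
   This is the (and:HI (not:HI op1) op2) RTL, with op2 tied to op0. */
uint16_t szc(uint16_t src, uint16_t dst)
{
    return (uint16_t)(dst & (uint16_t)~src);
}
```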