Insns
The machine description file gcc/config/tms9900.md is the primary definition of the TMS9900 backend. Machine descriptions for gcc in general are documented here. They use the .md extension (an unfortunate clash with markdown) and essentially list a series of instructions with patterns (called "insns") that emit assembly opcodes to achieve a result. The machine description is parsed by the compiler build system and a number of header and source code files are emitted which are then built in to the compiler. In many places in the md file it is possible to include C code. Names of named insns are used to generate a function called gen_<insn-name> so names must follow the same rules as C function naming (can't include symbols such as "-" or "$" for example). The insn patterns are used by the compiler to match the RTL generated by the compiler front-end.
Insn names
Insns can be named or unnamed. Named insns are either standard names (like "movqi") or else should have a unique name that won't clash with another backend, typically by prepending the back end name (e.g. "tms9900_movqi"). Unnamed insns can simply have an empty name ("") or a name starting with an asterisk ("*my_insn"), in which case the name is emitted in debug output but otherwise treated as empty. In early compilation, gcc will look for named insns to match patterns in the RTL. For example, it will look for "movqi" if it wants to move a byte or "addhi3" if it wants to add 16-bit values. In later stages, it will try to match patterns which is where unnamed insns come in.
NOTE: only named insns can be referred to by define_expand, define_split, etc.
Library calls
When the compiler can't find a named insn to perform an operation it will often instead emit a library call so floating point methods are defined in libgcc. This can't be done for fundamental insns like "movqi" but can be done for example to add two 32-bit values. If gcc can't find an insn called "addsi3" it will emit "bl __addsi3" instead.
Sometimes, the compiler will try to generate its own code for missing insns. For example this code:
long x; |
int y; |
long z; |
|
void f(void) |
{ |
z = x>>y; |
} |
compiled without optimisation will generate a sequence of about 30 opcodes inline to do the 32-bit shift, but if compiled with -Os, the output will simply be "bl @__ashrsi3". This is a very significant space saver but does add a few cycles for the call and return.
Operands
Each insn takes several operands, typically 2 or 3. Operand values have a mode which indicates their size. The size of standard int (SI) in gcc is 32-bits. A half-word (HI) is 16 bits and a quarter-word (QI) is 8 bits. Other types like double-word (DI) are not used in the TMS9900 backend. There are also float and double (SF and DF) modes but there are no insns defined for these since the TMS9900 does not have any floating point hardware capability. One unusual mode is the condition code (CC) which is set by compare operations. This is implicitly "clobbered" by insns in gcc-4.4.0. The TMS9900 does an implicit compare to zero on most instructions.
The operand modes and count are usually appended to the end of the insn so "extendqihi2" means perform an extend operation where the src is QI (8 bits), the dest is HI (16 bits) and there are two operands.
Constraints and lengths
The type of the operand is defined using a single letter called a constraint. Constraints can be grouped together and comma-separated into alternatives. The values use in the TMS9900 backend are:
- r = a register
- R = a register indirect reference to a memory location (e.g. MOV *R4 ....)
- > = like R, but with a post increment (e.g. MOV *R4+....)
- Q = a label address of a memory location (e.g. MOV @LABEL ...)
- i = an immediate value (e.g. LI R1,>1234)
- n = a constant number, excludes labels and other constants
- 0 = not really a constraint. It just means this operand must be the same as operand[0] for this alternative. It could also be 1, 2... to copy constraints for other operands.
For r, R, > and Q, the opcode is usually the same, as the addressing mode doesn't change the opcode length, but the length of the instruction varies if using Q, so it makes sense to break these out in separate constraint "alternatives". Each alternative can have a slightly different implementation and/or can have a different length.
In addition, some values can use shorter methods for set, add, etc. Loading a zero into a register can be achieved with "CLR Rx" instead of "LI Rx,0"; loading -1 can be done with "SETO rx"; adding 1 to a reg can be done with "INC Rx" instead of "AI Rx,1" and so on. Each of these opcodes is 2 bytes shorter than an immediate. To make these easier to detect as alternatives, these constraints are also defined:
- L = +2 or -2; used to emit INCT or DECT
- M = -1; use to emit SETO or DEC
- N = +1; used to emit INC
- O = 0; used to emit CLR
NOTE: It is important with these constraints that they appear in the alternatives list before a broader constraint like 'i'. For example, the addhi3 operand2 constraints are:
(match_operand:HI 2 "general_operand" "rR>,Q,LMN,LMN,i,rR>,Q, rR>,Q")
|
Because 'i' will match any immediate value so must be after the 'LMN' constraints which are more specific.
Constraints can also be preceded by modifiers. The equals ("=") modifier means this value is set. This is always operand[0] in the TMS9900 backend. The percent ("%") modifier means operands are commutative, i.e. can be swapped because A+B is the same as B+A, etc. Sometimes this makes the compiler's life easier.
Having commutative operands is actually particularly important for the TMS9900 since arithmetic operators only take two operands. So there is no opcode to do for example x=a+b. Instead, we need to do x=a;x+=b. For this reason, generally, operand[1] uses the "0" constraint to say it must match operand[0].
Predicates
When declaring an insn, each operand is given a predicate that gcc uses to match operand types. The predicates used in the TMS9900 backend are:
- general_operand - anything will match
- register_operand - only a register will match
- nonimmediate_operand - anything except an immediate (constant) value will match
- const_int_operand - only a constant integer will match
- memory_operand - only a memory reference (R or Q, see below) will match
The pattern for an insn may look like this:
(set (match_operand:HI 0 "register_operand" "=r") ... )
|
The predicate is used first to find a matching pattern and then the constraint is used to identify which alternative to use.
NOTE: a constraint list MUST contain all alternatives that match. A pattern like:
(set (match_operand:HI 0 "general_operand" "=rR,Q") ... )
|
can cause a compiler error since the predicate will match an immediate value but gcc won't be able to find an alternative to use since there is no "i" constraint.
Extending and truncating
Changing an operand to a larger width is called an extend. Reducing it a shorter width is called a truncate. There is only way to truncate an operand but extending can be zero extend or sign extend. We can extend from QI to HI, HI to SI or QI to SI and truncate from SI to HI, SI to QI or HI to QI. This is gives us 9 unique insn patterns we need to implement!
Truncating or extend from SI to HI and vice versa is quite straightforward. It's just a matter of discarding a word or adding a zero word or an inverted word (>FFFF) if sign extending a negative value.
Generally, we can't truncate or extend a memory reference, only a register. There is usually no use case for this anyway as a memory location never changes size. But a value in a register could be loaded as a QI and then accessed as a HI given that gcc thinks low-order byte values are accessible in a wider word.
Going from SI to QI and vice versa needs to be done in two steps, SI to HI and then SI to QI, so these are candidates for define_expand. More on define_expand later. To be safe, I use a scratch reg for the intermediate value. For example:
(define_expand "zero_extendqisi2" |
[(set (match_operand:SI 0 "register_operand" "") |
(zero_extend:SI (match_operand:QI 1 "nonimmediate_operand" "")))] |
"" |
{ |
tms9900_debug_operands ("zero_extendqisi2", NULL_RTX, operands, 2); |
rtx tmp = gen_reg_rtx (HImode); |
emit_insn (gen_zero_extendqihi2 (tmp, operands[1])); |
emit_insn (gen_zero_extendhisi2 (operands[0], tmp)); |
DONE; |
} |
) |
NOTE: define_expand C code branches must end in either a DONE or a FAIL, or the insn pattern will remain in the list. These are macros that perform a goto in the generated code. Omitting these won't cause any error to be emitted but will cause really weird behaviour in the output that will take forever to track down. Been there, done that.
The trickier conversions are QI to HI and vice versa. For HI to QI truncation we don't care about the low-order byte and we can emit "SWPB" to just change the order of the bytes. To extend QI to HI we do care and must fill the high-order byte with zeros or a sign extension. Sign extension is done using an arithmetic shift "SRA Rx,8"and zero extension with a logical shift "SRL Rx,8".
Lengths, jumps and branches
Each "define_insn" and (not "define_expand" or "define_split") has an array of length values at the end, like this:
[(set_attr "length" "4,2,4,4,6")]
|
These lengths are the number of bytes emitted by each of the alternatives identified by the constraints. The number of length attributes must exactly match the number of alternatives (e.g. 5 alternatives must have 5 lengths). These lengths are used to determine the distance to a label for a jump instruction. In TMS9900 we can jump backwards by 256 bytes or forward by 254 bytes. If a longer jump is needed we need to emit a branch ("B") instruction.
In some cases we can't determine the exact length of an insn because it may vary depending on the circumstances at the time. For example, we may have to emit an additional SWPB to correct a dropped truncate. In these cases I've made the length to be the worst case length. This does mean we will occasionally emit a branch where a jump would have been sufficient but this should be rare.
The function output_branch() in gcc/config/tms9900/tms9900.c determines what opcodes to output for all conditional jumps and will replace a jump with an inverted jump over a branch if distance requires it.
Tailcalls
If the optimising compiler sees that the end sequence of two functions are identical, it will emit insns for only one of the functions and issue a branch instruction in the other and use a common epilogue and return. This is called a tailcall. It is identified in the md by checking the macro SIBLING_CALL_P(insn).
Defining insns
The main way to define an insn is by using "define_insn". To define the insn "movhi", which moves one word to another, we include this in the md file:
(define_insn "movhi" |
[(set (match_operand:HI 0 "nonimmediate_operand" "=rR>,rR>,Q, Q,r") |
(match_operand:HI 1 "general_operand" "rR>, Q, rR>,Q,i"))] |
"" |
{ |
if (which_alternative == 4) |
{ |
return("li %0, %1"); |
} |
else |
{ |
return("mov %1, %0"); |
} |
} |
[(set_attr "length" "2,4,4,6,4")]) |
|
Here we have defined the name (movhi), the operation (set) and described the two operands. The first (destination) operand is a "non_immediate" operand, i.e. any type except an immediate. The second (source) is a "general_operand" which can be anything. There are 5 alternatives defined. The Q constraint adds 2 bytes to the length of the opcode but r, R and > are the same length, so each combination of rR> and Q are broken out as separate alternatives. "MOV r,r" is 2 bytes but "MOV @label1,@label2" would be 6 bytes. The final constraint is for immediates. Since TMS9900 does not allow immediate values to be used with MOV we instead emit load immediate (LI) for alternative number 4. This operation is 4 bytes long.
The empty string after the pattern is the condition. This insn will only be used if the condition is met. This is usually empty.
NOTE: The entire insn is surrounded by a pair of brackets (). The pattern and the set_attr are each surrounded by a pair of square brackets []. Each part of the pattern is surrounded by brackets as well (). The nested brackets gets confusing. My most common cause of build failures is a missing or misplaced closing bracket.
Emitting code
There are many ways to emit code in the md file. The example above uses C code as indicated by the opening brace ("{"). If instead of a brace, there is a double quote, the text is output directly. For an insn like abs which has only one operand, could have done this:
(define_insn "abshi2"
[(set (match_operand:HI 0 "nonimmediate_operand" "=rR>,Q")
(abs:HI (match_operand:HI 1 "nonimmediate_operand" "0,0")))]
""
"abs %0"
[(set_attr "length" "2,4")])
|
Or we could also list instructions for each alternative like this (although there is no point as the same instruction is used for both alternatives):
(define_insn "abshi2"
[(set (match_operand:HI 0 "nonimmediate_operand" "=rR>,Q")
(abs:HI (match_operand:HI 1 "nonimmediate_operand" "0,0")))]
""
"@"
"abs %0"
"abs %0"
[(set_attr "length" "2,4")])
|
Or within the C code, we could just do { return "abs %0"; }
Or within C, we could call you can use { output_asm_insn ("abs %0", operands); return ""; }
Or if you only have a short piece of C code you could add a double quote followed by an asterisk and then a C statement with any double quotes escaped like this: "* return \"abs %0\""
I think there are just too many alternatives here tbh. I think mixing and matching methods of emitting output is going to be error prone. I personally like consistency so have tried to always use C code to emit opcodes. I also try and avoid mixing output_asm_insn("abs %0") and return "abs %0" in the same insn.
NOTE: since the MD file is compiled to C code, omitting a return will cause random output. Been there done that too.