Thursday, 1 February 2024

Missing extend and truncate insns

Previous: Forcing Memory Constants            Next: Dev,Release,Install                Up: Intro  

Throughout the past few months, there have been frequently reports of cases where the compiler missed an extend or truncate insn.  A simple example is where a short and char are added together.  Insomia's blog shows that I'm not the only one who wrestled with this problem:

Various fixes have been tried including adding a separate subreg pass to the compiler to detect when subregs have been inserted.  I've disabled this pass as I don't think modifying the compiler is the way to go and in any case it didn't seem to be working as intended anyway.

There didn't seem to me to be a pattern to the dropped extends until I noticed it often happened when doing binary "and" operation.  So while this C code:

char c;

int i;
int z;

void f()
{
  z = (c + i) & 0x1f;
}

correctly emitted this assembly unoptimised:

def f
f
movb @c, r1
movb r1, r2
sra  r2, 8
mov  @i, r1
a    r2, r1
andi r1, >1F
mov  r1, @z
b    *r11


BUT when using -Os or -O2, it emitted this:

def f
f
movb @c, r1
a    @i, r1
andi r1, >1F
mov  r1, @z
b    *r11


Shorter, but in fact too short.  Why did the optimiser drop the extend "SRA r2,8"?  It turns out it is because of the mask in the binary and.  The compiler determined that the bits in >1F don't affect the high order byte of the result so the extend was redundant.  But this is based on the (faulty) assumption that the low byte of the register is written to MOVB whereas we know in the TMS9900 it is the MSB that is changed.

Finding exactly where in GCC this reduction happens would be like searching for a needle in a haystack (it happens in the combine pass, but gcc/combine.c alone is 13KLOC) and modifying gcc is not something I want to do.

Looking through the GCC dumps I could see that insn 6 and 7 in gccdump.158r.dce :

(insn 6 3 7 2 <stdin>:7 (set (reg:HI 22 [ c+-1 ])
        (sign_extend:HI (mem/c/i:QI (symbol_ref:HI ("c") <var_decl 0x7f7e81e82000 c>) [0 c+0 S1 A8]))) 17 {extendqihi2} (nil))

(insn 7 6 8 2 <stdin>:7 (set (reg:HI 23)
        (plus:HI (reg:HI 22 [ c+-1 ])
            (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16]))) 63 {addhi3} (expr_list:REG_DEAD (reg:HI 22 [ c+-1 ])
        (nil)))

were replaced in gccdump.159r.combine with:

(insn 7 6 8 2 <stdin>:7 (set (reg:HI 23)
        (plus:HI (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16])
            (subreg:HI (mem/c/i:QI (symbol_ref:HI ("c") <var_decl 0x7f7e81e82000 c>) [0 c+0 S1 A8]) 0))) 63 {addhi3} (nil))   
 
So I could see the extend had been replaced with a subreg.  There was a message at the start of the combine dump that said:

Failed to match this instruction:
(set (reg:HI 23)
    (plus:HI (sign_extend:HI (mem/c/i:QI (symbol_ref:HI ("c") <var_decl 0x7f7e81e82000 c>) [0 c+0 S1 A8]))
        (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16])))

So it is possible I could have provided a pattern that combines plus and sign_extend and this would have satisfied the combiner.  But I didn't want to duplicate every pattern with variants that include extends.

Instead, looking further on, I could see that in gccdump.174r.ira the subreg ref disappeared and was replaced with an offset of -1:

(insn 7 17 8 2 <stdin>:7 (set (reg:HI 1 r1 [23])
        (plus:HI (reg:HI 1 r1 [+-1 ])
            (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16]))) 63 {addhi3} (nil))

This makes sense.  The register allocator (IRA) has to allocate real regs to replace subregs.  As far as it is concerned, R1 with an offset of -1 contains the byte it wants.  The macro REG_OFFSET returns this values, so at least we have a way to detect when an extend has been dropped.  So I added an emit of SRA whenever I saw an offset of -1.  But then I got another bug report for code like this:

int f (char c, unsigned int len)
{
    return c+len;
}

which was emitting an SRA where one already existed.  Looking at gccdump.s I could see:

def f
f

; extendqihi2-8 : (insn 8 5 9 <stdin>:2 (set (reg:HI 1 r1 [orig:26 c+-1 ] [26])
;         (sign_extend:HI (reg:QI 1 r1 [ c ]))) 17 {extendqihi2} (nil))

sra  r1, 8

; addhi3-14 : (insn 14 9 20 <stdin>:4 (set (reg/i:HI 1 r1)
;         (plus:HI (reg:HI 1 r1 [orig:26 c+-1 ] [26])
;             (reg:HI 2 r2 [ len ]))) 63 {addhi3} (expr_list:REG_DEAD (reg:HI 2 r2 [ len ])
;         (nil)))

So here the compiler actually did extend and the offset refers to the original reg (pseudo reg 26), not the current reg.  So I had to add another condition:

  (ORIGINAL_REGNO (operand) == REGNO (operand)) 

And eventually I had a function called tms9900_operand_subreg_offset() that detects dropped extends or truncates.  If it returns true, the caller must emit a corrective opcode.

A call to this function has been added to addhi3, subhi3 and andhi3 and they take corrective action if it returns true.  I subsequently noticed that these insns could have an offset present in either operands[1] or operands[2] so I had to make two calls in each.  

But so far, touch wood, I haven't had any more reports of missed truncates or extends.



No comments:

Post a Comment

I published the URL to this blog on atari age.  The posts are in reverse chronological order but the best place to start is the beginning .