Thursday, 29 February 2024

I published the URL to this blog on atari age.  The posts are in reverse chronological order but the best place to start is the beginning.

Wednesday, 28 February 2024

Dev, Release, Install, Test, Debug

Previous: Missing Extends                           Up: Intro  

There are a few scripts in the repo for installing gcc and binutils as well as scripts to create new patches and debian packages.

I work exclusively on Linux (debian bookworm currently) and have a script to create a .deb package which can be used for debian, ubuntu and others.  The .deb package installs bins, libs, etc, in /opt/tms9900-gcc by default.  Many users though are on Windows or MacOS and I am relying on others to build packaged releases for those OSs since I can't.

Development

The src files are contained in the "dev" directory.  Only modifications to gcc and binutils are stored in the repo.  This means only storing about 50 files as opposed to over 60,000.  The dev directory contains two copies of any files that have been modified.  The original files for gcc are in gcc-4.4.0-orig and the modified files are in gcc-4.4.0.  Similarly for binutils.  New files (including "gcc/config/tms9900/tms9900.md") have a corresponding empty file in the orig directory.  This allows diff to create a patch that includes all changes and all additions to the stock gcc-4.4.0 distribution.

Files to be edited are mostly in "dev/gcc-4.4.0/gcc/config/tms9900".  This dir contains the md file as well as a few C and H files.  It also contains some library asm functions which get included in libgcc.a

To rebuild the compiler in the dev hierarchy, from the "dev/gcc-4.4.0/build" directory, you can use the following commands to build and install gcc and libgcc:

    make all-gcc
    make install-gcc
    make all-target-libgcc
    make install-target-libgcc

This assumes of course this directory has previously been configured for builds, from the build directory, using:

    ../configure --prefix <target-dir> --target=tms9900 --enable-languages=c,c++

Release

The script mkpatches.sh will generate new patch files.  It takes a version number as a parameter.  It will  update the gcc DATESTAMP and REVISION files with the current date and the supplied version number.  The patch revision can be queried in code by accessing the __TMS9900_PATCH_MAJOR__ and  __TMS9900_PATCH_MINOR__ macros which return integer values.

Install

The primary install tool is a script called install.sh.  It downloads gcc-4.4.0 and binutils-2.19.1, applies the patches to them, builds the compiler and toolchain, and installs to the specified directory.  This script will create a directory called build in the pwd and do all its work in there.  It marks progress by creating progress files such as ".gcc_patched" and ".gcc_built".  To do a clean build from updated patches, the easiest thing to do is "rm -rf build" or at the very least remove the progress files.

Test

There are a number of unit test files in the "test" directory.  These are broken up so each fits into a single cartridge ROM (8KiB).  A single EA5 test binary is on the TODO list.  The tests depend on libti99 to provide printf and other functions.  The file "tap.c" contains some basic functions to run tests and produce TAP output.  For example, the test_subreg.c file produces this output:



As issues are reported in the forums on Atari age, I add new tests to reproduce these and run them before each release.

Debug

If all goes well, you don't need this step.  But of course you probably will.  Finding out where things go wrong is of course very important.  Usually, this involves compiling a single module and examining the output.  I have found it easier to invoke the cc1 part of the compiler instead of running gcc.  cc1 takes input from stdin and creates (by default) an assembly output file called "gccdump.s".  My debug compile script looks like this:

    #!/bin/bash
    INCLUDE=-I../libti99i
    CFLAGS="$2 $3 $4 $5 $6 -Wall -da -std=c99 -fomit-frame-pointer     $INCLUDE"
    ../dev/gcc-4.4.0/build/gcc/cc1 $CFLAGS < $1

I can then add or not add options such as "-Os" as needed.  The "-da" flag to cc1 creates dump files of the various compiler passes, such as "gccdump.128r.expand", "gccdump.159r.combine" and so on.  These are not the easiest files to read but comparing them to each other can be useful to find where various changes to the output occurred.

I also added a command line switch called "-minline_rtl" which outputs the RTL patterns to the asm_out_file (the file being created as cc1 runs) as comments.  I found this very helpful to understand why certain patterns matched and some didn't.  For example, a standard function prologue might have output like this:

    ; addhi3-20 : (insn 20 4 21 <stdin>:6 (set (reg/f:HI 10 r10)
    ;         (plus:HI (reg/f:HI 10 r10)
    ;             (const_int -2 [0xfffffffffffffffe]))) 63 {addhi3}     (nil))
     dect r10
    
    ; movhi-21 : (insn 21 20 22 <stdin>:6 (set (mem:HI (reg/f:HI 10     r10) [0 S2 A16])
    ;         (reg:HI 11 r11)) 12 {tms9900_movhi} (expr_list:REG_DEAD     (reg:HI 11 r11)
    ;         (nil)))
       mov  r11, *r10

Here I can see that addhi3 insn number 20 emitted the DECT and movhi3 insn number 21 emitted the MOV.

ASIDE: Note the REG_DEAD note on R11 after the MOV.  This means the compiler does not need the value in R11 any more after this insn.  This is sometimes useful to know as we avoid use of scratch regs and so on when we know it is safe to destroy the contents of a reg.

NOTE: writing tests that work with -Os or -O2 can be tricky.  The compiler is very crafty at eliminating code that it sees produces invariant results.  A function that adds two constants together and assigns the result to a local variable will just get reduced down to assigning the result for example.  Declaring vars as non static and external to the test function ensures the compiler can't assume they hold particular values.  Assigning test values in a separate function also ensures it can't make assumptions about their contents.  Declaring functions as non static ensures they don't get inlined or eliminated.

Dummy md

I found it useful also to create a dummy md file and this is included in the repo.  This file creates no output, just comments.  The output is only to help me understand why some patterns get matched and others don't and so on.  It also lets me play around with constraints and predicates without having to worry about alternatives or lengths.



Thursday, 1 February 2024

Missing extend and truncate insns

Previous: Forcing Memory Constants            Next: Dev,Release,Install                Up: Intro  

Throughout the past few months, there have been frequently reports of cases where the compiler missed an extend or truncate insn.  A simple example is where a short and char are added together.  Insomia's blog shows that I'm not the only one who wrestled with this problem:

Various fixes have been tried including adding a separate subreg pass to the compiler to detect when subregs have been inserted.  I've disabled this pass as I don't think modifying the compiler is the way to go and in any case it didn't seem to be working as intended anyway.

There didn't seem to me to be a pattern to the dropped extends until I noticed it often happened when doing binary "and" operation.  So while this C code:

char c;

int i;
int z;

void f()
{
  z = (c + i) & 0x1f;
}

correctly emitted this assembly unoptimised:

def f
f
movb @c, r1
movb r1, r2
sra  r2, 8
mov  @i, r1
a    r2, r1
andi r1, >1F
mov  r1, @z
b    *r11


BUT when using -Os or -O2, it emitted this:

def f
f
movb @c, r1
a    @i, r1
andi r1, >1F
mov  r1, @z
b    *r11


Shorter, but in fact too short.  Why did the optimiser drop the extend "SRA r2,8"?  It turns out it is because of the mask in the binary and.  The compiler determined that the bits in >1F don't affect the high order byte of the result so the extend was redundant.  But this is based on the (faulty) assumption that the low byte of the register is written to MOVB whereas we know in the TMS9900 it is the MSB that is changed.

Finding exactly where in GCC this reduction happens would be like searching for a needle in a haystack (it happens in the combine pass, but gcc/combine.c alone is 13KLOC) and modifying gcc is not something I want to do.

Looking through the GCC dumps I could see that insn 6 and 7 in gccdump.158r.dce :

(insn 6 3 7 2 <stdin>:7 (set (reg:HI 22 [ c+-1 ])
        (sign_extend:HI (mem/c/i:QI (symbol_ref:HI ("c") <var_decl 0x7f7e81e82000 c>) [0 c+0 S1 A8]))) 17 {extendqihi2} (nil))

(insn 7 6 8 2 <stdin>:7 (set (reg:HI 23)
        (plus:HI (reg:HI 22 [ c+-1 ])
            (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16]))) 63 {addhi3} (expr_list:REG_DEAD (reg:HI 22 [ c+-1 ])
        (nil)))

were replaced in gccdump.159r.combine with:

(insn 7 6 8 2 <stdin>:7 (set (reg:HI 23)
        (plus:HI (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16])
            (subreg:HI (mem/c/i:QI (symbol_ref:HI ("c") <var_decl 0x7f7e81e82000 c>) [0 c+0 S1 A8]) 0))) 63 {addhi3} (nil))   
 
So I could see the extend had been replaced with a subreg.  There was a message at the start of the combine dump that said:

Failed to match this instruction:
(set (reg:HI 23)
    (plus:HI (sign_extend:HI (mem/c/i:QI (symbol_ref:HI ("c") <var_decl 0x7f7e81e82000 c>) [0 c+0 S1 A8]))
        (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16])))

So it is possible I could have provided a pattern that combines plus and sign_extend and this would have satisfied the combiner.  But I didn't want to duplicate every pattern with variants that include extends.

Instead, looking further on, I could see that in gccdump.174r.ira the subreg ref disappeared and was replaced with an offset of -1:

(insn 7 17 8 2 <stdin>:7 (set (reg:HI 1 r1 [23])
        (plus:HI (reg:HI 1 r1 [+-1 ])
            (mem/c/i:HI (symbol_ref:HI ("i") <var_decl 0x7f7e81e820a0 i>) [2 i+0 S2 A16]))) 63 {addhi3} (nil))

This makes sense.  The register allocator (IRA) has to allocate real regs to replace subregs.  As far as it is concerned, R1 with an offset of -1 contains the byte it wants.  The macro REG_OFFSET returns this values, so at least we have a way to detect when an extend has been dropped.  So I added an emit of SRA whenever I saw an offset of -1.  But then I got another bug report for code like this:

int f (char c, unsigned int len)
{
    return c+len;
}

which was emitting an SRA where one already existed.  Looking at gccdump.s I could see:

def f
f

; extendqihi2-8 : (insn 8 5 9 <stdin>:2 (set (reg:HI 1 r1 [orig:26 c+-1 ] [26])
;         (sign_extend:HI (reg:QI 1 r1 [ c ]))) 17 {extendqihi2} (nil))

sra  r1, 8

; addhi3-14 : (insn 14 9 20 <stdin>:4 (set (reg/i:HI 1 r1)
;         (plus:HI (reg:HI 1 r1 [orig:26 c+-1 ] [26])
;             (reg:HI 2 r2 [ len ]))) 63 {addhi3} (expr_list:REG_DEAD (reg:HI 2 r2 [ len ])
;         (nil)))

So here the compiler actually did extend and the offset refers to the original reg (pseudo reg 26), not the current reg.  So I had to add another condition:

  (ORIGINAL_REGNO (operand) == REGNO (operand)) 

And eventually I had a function called tms9900_operand_subreg_offset() that detects dropped extends or truncates.  If it returns true, the caller must emit a corrective opcode.

A call to this function has been added to addhi3, subhi3 and andhi3 and they take corrective action if it returns true.  I subsequently noticed that these insns could have an offset present in either operands[1] or operands[2] so I had to make two calls in each.  

But so far, touch wood, I haven't had any more reports of missed truncates or extends.



I published the URL to this blog on atari age.  The posts are in reverse chronological order but the best place to start is the beginning .