Friday 8 December 2023

The TMS9900

Background

The TMS9900 has some unique features among processors of its era.  Firstly, it is a true 16-bit CPU - a rarity in the 1970s.  All operations are natively 16-bit - the address bus doesn't even have a least significant bit!  In fact, some ops like MPY and DIV have a src or dst that is a 32-bit value held in two consecutive registers.

One oddity is that the CPU has no general-purpose registers on-chip. Instead, it has a workspace pointer register that defines a location in memory where 16 16-bit general-purpose registers reside. These are numbered R0 through R15.

The CPU is already documented comprehensively elsewhere, so these are just the highlights that are relevant to building a GCC backend:

  • Registers in memory.  The workspace pointer (WP) points to a memory location where general-purpose registers reside.  Changing the WP provides a mechanism for an instant context switch.  Very handy for interrupt vectors.
  • No stack.  There is no dedicated stack pointer in the TMS9900.  By convention, R10 is often used.  If we also decide the stack grows downward, then a post-increment instruction (e.g. MOV *R10+,R1) can be used to implement a POP. Unfortunately, we have no pre-decrement addressing mode so a PUSH needs two instructions (DECT R10 to drop the pointer by one word, then MOV R1,*R10).
  • Big-endian.   In itself, being big-endian is not terribly unusual, but coupled with the registers-in-memory aspect it does present some unique challenges.
  • Restricted byte instructions.  Many instructions have both a word and a byte variant (e.g. MOV/MOVB) but many also do not (NEG, ABS, INV).
  • Unsigned mul/div.  The multiply (MPY) and divide (DIV) opcodes only take unsigned operands.  To do signed mul/div/mod we need to first note the signs of the operands, then take absolute values, and finally correct the sign of the results.
  • Incomplete set of conditional jumps.  TMS9900 is missing some conditional jump instructions.  For example, it has JL and JLE for logical (unsigned) less-than and less-than-or-equal, and JLT for arithmetic (signed) less-than, but no JLTE for signed less-than-or-equal.  This means sometimes having to emit an additional jump (e.g. JLT followed by JEQ).
  • No AND instruction.  There is no general binary &value operation between two registers or memory operands in TMS9900 (ANDI only takes an immediate).  The closest is SZC, or SZCB for bytes, which is really &~value, so INV Rx followed by SZC/SZCB with Rx as the source is needed.  This can cause problems when the compiler actually does want &~value, as it will try to emit INV Rx; INV Rx before the SZC/SZCB.  Also, INV is word-only.
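
The signed mul/div point above is perhaps easiest to see in C. This is only a sketch of the idea, not the code the backend emits: udiv16 is a made-up stand-in for the unsigned hardware divide (the real DIV takes a 32-bit dividend in a register pair, which is ignored here), and sdiv16 is a hypothetical name for the wrapper that notes the signs, divides the absolute values and then corrects the signs of the results.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned-only hardware divide.
   Simplified: the real DIV uses a 32-bit dividend in two registers. */
static void udiv16(uint16_t n, uint16_t d, uint16_t *q, uint16_t *r)
{
    assert(d != 0);              /* division by zero is not modelled */
    *q = (uint16_t)(n / d);
    *r = (uint16_t)(n % d);
}

/* Signed divide and modulo built on the unsigned primitive.
   C semantics: quotient truncates toward zero, remainder takes
   the sign of the dividend. */
static void sdiv16(int16_t n, int16_t d, int16_t *q, int16_t *r)
{
    /* 1. note the signs of the operands */
    int neg_n = n < 0;
    int neg_d = d < 0;

    /* 2. take absolute values (negate in unsigned arithmetic so that
          -32768 does not overflow) */
    uint16_t un = neg_n ? (uint16_t)(0u - (uint16_t)n) : (uint16_t)n;
    uint16_t ud = neg_d ? (uint16_t)(0u - (uint16_t)d) : (uint16_t)d;

    uint16_t uq, ur;
    udiv16(un, ud, &uq, &ur);

    /* 3. correct the signs of the results */
    if (neg_n != neg_d)
        uq = (uint16_t)(0u - uq);
    if (neg_n)
        ur = (uint16_t)(0u - ur);

    /* Converting back to int16_t assumes the usual two's complement
       wrap-around, as on every mainstream compiler. */
    *q = (int16_t)uq;
    *r = (int16_t)ur;
}
```

Signed multiply follows the same shape, except that only the product needs its sign corrected.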

Register usage

While the 16 general purpose registers are generally fully orthogonal, some have additional special functions:

  • R0 - used as shift count register in shift instructions, also cannot be used as an index reg
  • R11 - return address from a branch and link (BL) instruction
  • R12 - base address for CRU operations
  • R13,R14,R15 - saved workspace pointer, return address and status from a BLWP instruction (e.g. an interrupt)

Of these, we don't really care about R12 thru R15 as we don't emit any CRU operations or BLWP but we do take special care when using R0 and R11.

In addition, for gcc we need a stack pointer and a base pointer. By convention, R10 is used as the stack pointer and R9 as the base pointer.

R12 has another function as a static chain register but this hasn't been implemented yet.

8-bit bytes and 16-bit words

So far, these peculiarities pose no great challenge.  But there is one oddity that has taken much if not most of the effort in stabilising gcc and this is the nature of registers that are big-endian and stored in RAM.  Insomnia had found byte order issues early on and these still persist today.

Specifically, as with any big-endian system, the most significant byte is stored at the lower address in a word.  The sequence:

    LI R1, >ABCD
    MOV R1,@>1234

will place the word >ABCD into memory address >1234.  Being big-endian, >AB is placed in >1234 and >CD is placed in >1235.   All good so far, but if we then do:

    CLR  R2
    MOVB @>1234,R2

We might logically expect R2 to contain the 16-bit value >00AB but unfortunately, instead, it contains >AB00.  It's logical - it has moved a byte into the lower byte address of the register - but not intuitive.  When gcc comes along this becomes a major issue as gcc assumes that a register accessed with a byte operation can subsequently be used as a word reg, but we cannot mix and match 8 and 16-bit datatypes on the TMS9900 without explicit conversions.  While gcc does emit conversion instructions (insns) to extend and truncate values, it can in certain cases decide to omit these, as we will see later, which causes major headaches.
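
To make the byte/word mismatch concrete, here is a small C model of registers living in big-endian memory. It is a sketch only: the mem array, the workspace address >8300 and all the function names are assumptions made up for illustration, not anything from the backend. movb_to_reg mimics MOVB with a register destination, and byte_to_word is the explicit conversion (a logical shift right by 8) that turns >AB00 into >00AB.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[65536];        /* hypothetical byte-addressed memory */
#define WP 0x8300u                /* assumed workspace pointer value */

/* Address of register Rn inside the workspace. */
static uint16_t reg_addr(unsigned r)
{
    assert(r < 16);
    return (uint16_t)(WP + 2u * r);
}

/* Big-endian word access: most significant byte at the lower address. */
static void write_word(uint16_t addr, uint16_t v)
{
    addr &= 0xFFFEu;              /* words are always even-aligned */
    mem[addr]     = (uint8_t)(v >> 8);
    mem[addr + 1] = (uint8_t)(v & 0xFFu);
}

static uint16_t read_word(uint16_t addr)
{
    addr &= 0xFFFEu;
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}

static void reg_set(unsigned r, uint16_t v) { write_word(reg_addr(r), v); }
static uint16_t reg_get(unsigned r)         { return read_word(reg_addr(r)); }

/* Model of MOVB @addr,Rn: the byte lands in the register's first
   (lowest-addressed) byte, which is the MOST significant one. */
static void movb_to_reg(uint16_t addr, unsigned r)
{
    mem[reg_addr(r)] = mem[addr];
}

/* The explicit conversion needed before using the register as a
   word: a logical shift right by 8 (zero extension). */
static uint16_t byte_to_word(uint16_t regval)
{
    return (uint16_t)(regval >> 8);
}
```

Running the sequence from the text through this model (store >ABCD at >1234, clear R2, then MOVB from >1234 into R2) leaves R2 holding >AB00, and only after byte_to_word does it become >00AB.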

Long (32-bit) values

32-bit integer values are handled by allocating two consecutive registers or memory locations.  For basic operations like MOV this is straightforward.  For A or S (add and subtract), it also requires checking the carry from the low word and applying it to the high word.  For MPY and DIV it is much more complicated so these have been relegated to libgcc, as have 32-bit shift operations.
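
As a rough illustration of the carry handling, here is 32-bit add and subtract written with nothing but 16-bit halves in C. The long32 type and the function names are invented for this sketch, and it assumes the high word sits in the first of the two consecutive registers (which is what big-endian ordering would suggest). It is not the code the backend generates.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value as two 16-bit words. Assumption: the high word
   comes first, as it would in big-endian memory. */
typedef struct { uint16_t hi, lo; } long32;

static long32 from_u32(uint32_t x)
{
    long32 v = { (uint16_t)(x >> 16), (uint16_t)(x & 0xFFFFu) };
    return v;
}

static uint32_t to_u32(long32 v)
{
    return ((uint32_t)v.hi << 16) | v.lo;
}

/* Add the low words, detect the carry, then add it to the high words. */
static long32 add32(long32 a, long32 b)
{
    long32 r;
    r.lo = (uint16_t)(a.lo + b.lo);
    uint16_t carry = (uint16_t)(r.lo < a.lo);   /* wrapped => carry out */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}

/* Subtract the low words, detect the borrow, then take it from the high words. */
static long32 sub32(long32 a, long32 b)
{
    long32 r;
    r.lo = (uint16_t)(a.lo - b.lo);
    uint16_t borrow = (uint16_t)(a.lo < b.lo);
    r.hi = (uint16_t)(a.hi - b.hi - borrow);
    return r;
}

/* Cross-check the word-wise versions against native 32-bit arithmetic. */
static void check32(uint32_t x, uint32_t y)
{
    assert(to_u32(add32(from_u32(x), from_u32(y))) == (uint32_t)(x + y));
    assert(to_u32(sub32(from_u32(x), from_u32(y))) == (uint32_t)(x - y));
}
```

For example, adding >0001FFFF and >00000001 wraps the low word to >0000 and carries into the high word, giving >00020000.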

Register allocation

Scratch registers

In some insns, it is necessary to allocate a temporary, or scratch, register to hold an intermediate value. Ideally, any insns that need a scratch reg can declare that they “clobber” a scratch reg.  In the TMS9900 backend, I have prevented the compiler from using R0 as a general reg.  There are a couple of reasons for this: R0 cannot be used as an index register (in MOV R1,@label(R0) the R0 is not applied as an index; the encoding just means plain @label) and R0 is needed as the shift count for shift operations.  Since we know R0 is generally available for us, we can and do use it as a scratch reg for several insns.  This simplifies the md patterns slightly and lowers the chances of a failure to allocate a scratch reg.

One use of R0, for example, is to do a comparison to zero. Since MOV does an implicit comparison to zero, emitting MOV Rx,R0 is a low-cost way to compare a register to zero.  But why not move the register to itself?  Well we could (and I did for a short while), but when comparing, say, a memory location to zero a couple of issues arise.  Doing MOV @label,@label is 6 bytes long and may also have unintended side effects.  It is common for some memory locations to have different read and write behaviours.  An example is writing to the cartridge ROM addresses >6000 to >601E, which causes a bank switch on some hardware.  Comparing one of these to zero using the label move was causing an inadvertent bank switch!

Parameters

The tms9900.h file defines R1 thru R8 as registers available to pass parameters.  Since most functions have 8 or fewer parameters, this allows almost all parameter passing to happen in registers. Parameters 9 onwards, or any parameters in a variable parameter list, are passed on the stack instead.  I am not sure if it is safe to use any of R1 through R8 for other purposes when it is known that the parameter count is fewer than 8.  I’m also not sure if it is safe to modify registers passed as parameters.  In any assembly library calls, I have saved any regs that are modified just to be safe.

General registers

R12 thru R15 are allocated as general regs but marked as non-volatile (nvol).  The compiler will save the existing value of these regs to the stack if it uses them in a function.  Optimised code will often use R9 as well when it can optimise away the stack frame.  I did try marking R12-R15 as ordinary volatile general regs, but I saw the compiler tended to spill to the stack instead for some reason, so I reverted them back to nvols.

