Jump table: Difference between revisions

From NESdev Wiki
m (→‎Indirect jumping: linked to Wikipedia:reentrant)
(→‎Split Tables: in ca65 .lobyte is a prefix operator that doesn't specify that its parameter is raw data to be included (because maybe it's an instruction operand). So include both .byt and .lobyte to imply that)
 
(9 intermediate revisions by 3 users not shown)
Earlier revision:

A jump table is a table of code addresses, meant to be indexed by a selector value. For example, a game script might specify an action to be performed via an index, which is then used to select a routine from a jump table of available scripting actions. The alternative to a jump table is a long string of comparisons with each possible selector value. This approach is tedious to set up and maintain, and slow:

; Jumps to routine selected by A
do_action:
        cmp #0
        bne not0
        jmp action0
not0:  cmp #1
        bne not1
        jmp action1
not1:  cmp #2
        bne not2
        jmp action2
not2:  ...

== Indirect jumping ==
The NES doesn't have a JMP (addr,X) instruction, as other members of the 65xx family do. If it had one, a jump table would be trivial to implement, as in the following 65C02/Hu6280/65C816 code:

; Jumps to routine selected by A, from 0 to 127. High bit of A is ignored.
do_action:
        asl a          ; A = A * 2
        tax
        jmp (table,x)
table:
        .word action0, action1, action2 ; ...

The NES does support a JMP (addr) instruction, so a jump table can be implemented by copying the address to a temporary variable, then jumping through it:

; Jumps to routine selected by A, from 0 to 127. High bit of A is ignored.
do_action:
        asl a
        tax
        lda table,x
        sta temp
        lda table+1,x
        sta temp+1
        jmp (temp)

To call a routine via a selector, load the selector into A, then JSR do_action. This will then JMP to the appropriate routine, which will eventually RTS back to the routine that did JSR do_action. Essentially, you have JSR do_action, which then does JMP routine, which then does RTS; the JMP in the middle has no effect on the call stack. Note that the above code cannot be used without a JSR to it, since without that it's just a glorified JMP. That is, do_action must never be inlined in the code that uses it; it must always be called with JSR like a normal routine.

This routine has a significant limitation: if it's used by the game code and from an interrupt, perhaps the music driver, it can fail. The use of temp, a global variable, prevents the routine from being [[Wikipedia:Reentrant (subroutine) | reentrant]]. If the game code were in the middle of a call to do_action, and had already written temp, but then an interrupt occurs and its code then calls do_action, it will overwrite the value in temp. Then, after the interrupt handler returns and resumes the interrupted code, temp won't have the value expected by the original call to do_action. To overcome this, the stack must be used.

== Stack-based dispatch ==
{{main|RTS Trick}}
RTI and RTS allow use of the stack for holding the temporary address. These are normally used to return to some calling/interrupted code, but at their core they pull an address from the stack then jump to it. This is the behavior we need. We push the address on the stack, then execute RTI or RTS to jump to it. It's roundabout, but it solves the interrupt problem.

Even though RTI is meant for returning from an interrupt, it happens to be simpler to use for this technique, since it doesn't adjust the address it pulls from the stack:

do_action:
        asl a
        tax
        lda table+1,x ; high byte first
        pha
        lda table,x
        pha
        php
        rti

RTS is more tricky, because it adds one to the address it pulls from the stack. This requires that every entry in the jump table have one subtracted from it. This could be done by the code, but it's tedious because the low byte must be decremented first, while the high byte needs to be pushed first. Thus, it's preferable to simply subtract one from each entry in the assembly source text:

do_action:
        asl a
        tax
        lda table+1,x
        pha
        lda table,x
        pha
        rts
table:
        .word action0-1, action1-1, action2-1 ; ...

The only benefit of the RTS version is that it's three clock cycles faster than the RTI version, due to not having to push the flags. Unless speed is critical, the RTI version is preferable because it doesn't require adjusting every entry in the table. Forgetting a -1 could result in hard-to-find bugs in the RTS version.

Latest revision as of 22:50, 3 November 2023

A jump table is a table of code addresses, meant to be indexed by a selector value. The program uses the selector to look up an address in the table, then jumps to that address.

The alternative to a jump table is a long string of comparisons with each possible selector value. This approach is tedious to set up and slow in comparison to jump tables.

Jump tables are similar to "switch" statements found in other programming languages.

Indirect jumping

The NES supports JMP (addr), an indirect jump instruction, so a jump table can be implemented by copying the address to a temporary variable and jumping to it:

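; Jumps to the subroutine indexed by 'A' (0 to 127).
do_action:
       asl a            ; A = A * 2, since each table entry is two bytes
       tax
       lda table,x      ; low byte of the address
       sta ptr
       lda table+1,x    ; high byte of the address
       sta ptr+1
       jmp (ptr)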
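
The listing above assumes that ptr and table are defined elsewhere. A minimal ca65 sketch of those definitions and of a call site might look like the following; the segment names, the action0 through action2 routines, and the selector value are illustrative assumptions rather than part of the article.

.segment "ZEROPAGE"         ; assumed segment name from a typical ca65 NES config
ptr:    .res 2              ; two-byte pointer written by do_action

.segment "CODE"             ; assumed segment name
table:
       .word action0, action1, action2 ; hypothetical routines

; Hypothetical call site: run action 1, then continue.
       lda #1
       jsr do_action        ; the selected routine's RTS returns here

Because do_action ends in a JMP rather than a JSR, it adds nothing to the call stack, so the selected routine's RTS returns directly to whichever code called do_action with JSR.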

While there is no indirect version of JSR, the behavior can be imitated by combining regular JSR with JMP (addr):

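do_action:
       asl a            ; A = A * 2, since each table entry is two bytes
       tax
       lda table,x
       sta ptr
       lda table+1,x
       sta ptr+1
       jsr callSubroutineInPtr ; pushes the return address, then jumps via ptr
       ; Do other stuff here once the called subroutine returns.
       rts

callSubroutineInPtr:
       jmp (ptr)        ; the target's RTS returns to the line after the JSR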

Two things to ensure:

  • ptr must not start on the last byte of a page ($xxFF). A bug in the original 6502 makes JMP (addr) fetch the high byte of the target from the start of the same page ($xx00) instead of from the next page. This is easy to avoid, especially if ptr is in zero page, and most assemblers should at least warn about the accidental case.
  • ptr must only be used by a single thread. If a jump table is needed both in main-thread code and in an interrupt or NMI handler, the handler must use its own pointer variable so that it cannot overwrite a value the main thread has stored but not yet jumped through.
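
Both conditions can be sketched in ca65 roughly as follows. The names main_ptr and nmi_ptr, the segment name, and the assertion messages are assumptions made for illustration. The .assert directive is part of ca65, but whether the check is resolved by the assembler or deferred to the linker depends on how the segment is placed.

.segment "ZEROPAGE"         ; assumed segment name
main_ptr: .res 2            ; hypothetical: used only by main-thread dispatch
nmi_ptr:  .res 2            ; hypothetical: used only by dispatch inside the NMI handler

; Fail the build if either pointer starts on the last byte of a page.
.assert .lobyte(main_ptr) <> $FF, error, "main_ptr starts at $xxFF"
.assert .lobyte(nmi_ptr)  <> $FF, error, "nmi_ptr starts at $xxFF"

With this layout, the main-thread routine would end in jmp (main_ptr) and the interrupt-side routine in jmp (nmi_ptr), so neither can disturb the other's pending jump.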

A stack-based alternative avoids the need for a ptr variable, at the cost of 1 extra cycle: RTS takes 6 cycles versus 5 for JMP (addr), assuming ptr is in zero page. See below.

Stack-based dispatch

Main article: RTS Trick

Like JMP (addr), the RTS and RTI instructions also perform indirect jumps. Rather than jumping through a pointer variable in memory, RTS and RTI jump to the address pulled from the top of the stack.

To use RTI for an indirect jump, push the address (high byte first) and then push the processor flags. Executing RTI pulls the flags and the address, then jumps.

do_action:
       asl a
       tax
       lda table+1,x ; high byte first
       pha
       lda table,x
       pha
       php ; RTI expects processor flags on top.
       rti

RTS is slightly trickier, because it adds one to the address it pulls from the stack. Every entry in the jump table must therefore have one subtracted from it. The code could do this at run time, but that is tedious because the low byte must be decremented first, while the high byte needs to be pushed first. It's preferable to simply subtract one from each entry in the assembly source text:

do_action:
       asl a
       tax
       lda table+1,x
       pha
       lda table,x
       pha
       rts

table:
       .word action0-1, action1-1, action2-1 ; ...

The benefit of the RTS version is that it's three clock cycles faster than the RTI version, due to not having to push the flags. The disadvantage is that you must adjust every table entry by -1.
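
The cycle costs behind these comparisons can be tallied per instruction. The listing below is a timing comparison, not a routine to assemble as is. It uses the standard 6502 instruction timings and assumes that ptr is in zero page and that the indexed table reads do not cross a page boundary (a crossing adds one cycle to each indexed lda). The shared asl a and tax are left out.

; Indirect jump through ptr: 19 cycles
       lda table,x      ; 4
       sta ptr          ; 3
       lda table+1,x    ; 4
       sta ptr+1        ; 3
       jmp (ptr)        ; 5

; RTS dispatch: 20 cycles
       lda table+1,x    ; 4
       pha              ; 3
       lda table,x      ; 4
       pha              ; 3
       rts              ; 6

; RTI dispatch: 23 cycles
       lda table+1,x    ; 4
       pha              ; 3
       lda table,x      ; 4
       pha              ; 3
       php              ; 3
       rti              ; 6

The three-cycle gap between RTS and RTI is exactly the cost of the php, and the one-cycle gap between JMP (addr) and RTS comes from the final instruction alone.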

Split Tables

The previous examples have used a single table storing two-byte addresses, but on the 6502 it is slightly more efficient to split the table into a table of low bytes and a table of high bytes:

table_lo:
    .byt .lobyte(addr1)
    .byt .lobyte(addr2)
    .byt .lobyte(addr3)
table_hi:
    .byt .hibyte(addr1)
    .byt .hibyte(addr2)
    .byt .hibyte(addr3)

; Jumps to the subroutine indexed by 'A'.
do_action:
       tax
       lda table_lo,x
       sta ptr
       lda table_hi,x
       sta ptr+1
       jmp (ptr)

Because the index is no longer doubled, the two tables together can hold 256 addresses, as opposed to 128 with a single table of two-byte entries.
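
Split tables can also be combined with the stack-based dispatch. The following is a sketch under stated assumptions, not code from the article: it reuses the action0 through action2 names from the earlier examples and subtracts one from each address before splitting it, so that RTS lands on the routine itself. The .lobytes and .hibytes directives are ca65 shorthands that emit one byte per listed expression.

table_lo:
       .lobytes action0-1, action1-1, action2-1 ; hypothetical routines
table_hi:
       .hibytes action0-1, action1-1, action2-1

; Jumps to the subroutine indexed by 'A' (0 to 255).
do_action:
       tax
       lda table_hi,x   ; high byte first
       pha
       lda table_lo,x
       pha
       rts              ; pulls the address, adds one, and jumps

Subtracting one before taking the low and high bytes matters: if a routine sits at $xx00, the borrow must carry into the high byte, which adjusting only the low-byte table would miss.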