Scanning large tables
Scanning tables larger than 256 bytes is more involved than scanning small tables, because the 6502 has no 16-bit equivalents of the X and Y registers. Consider clearing a large block of RAM quickly. A straightforward approach is to keep a 16-bit index in X and Y, and increment it until it equals the number of bytes to clear:
```
begin = ...     ; address of first byte to clear
count = ...     ; number of bytes

.zeropage
addr:   .res 2

.code
        lda #<begin
        sta addr
        lda #>begin
        sta addr+1
        lda #0
        ; X and Y form 16-bit index, with Y holding low byte
        ldx #0
        ldy #0
loop:
        ; Clear one byte
        sta (addr),y
        ; Go to next byte
        iny
        bne nohigh
        inx
        inc addr+1
nohigh:
        ; See if low and high bytes of index match count
        cpy #<count
        bne loop
        cpx #>count
        bne loop
```
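To check the loop's behavior away from real hardware, here is a minimal Python model of it. It is a sketch, not 6502 code. X, Y, and the high byte of addr are emulated as 8-bit values that wrap, and `(addr),y` is modeled as a 16-bit add. The function name `clear_straightforward`, the test addresses, and the 64K `bytearray` standing in for RAM are illustrative assumptions.

```python
def clear_straightforward(mem: bytearray, begin: int, count: int) -> int:
    """Model of the X/Y-index clearing loop; returns the number of STAs run.

    Hypothetical helper: mem is a 64K bytearray standing in for RAM.
    """
    addr = begin          # the 16-bit pointer stored at addr/addr+1
    x = y = 0             # X = high byte, Y = low byte of the index
    stores = 0
    while True:
        mem[(addr + y) & 0xFFFF] = 0          # sta (addr),y
        stores += 1
        y = (y + 1) & 0xFF                    # iny
        if y == 0:                            # bne nohigh falls through
            x = (x + 1) & 0xFF                # inx
            addr = (addr + 0x100) & 0xFFFF    # inc addr+1
        # cpy #<count / cpx #>count: stop once both bytes match
        if y == (count & 0xFF) and x == ((count >> 8) & 0xFF):
            return stores


ram = bytearray([0xAA]) * 0x10000
n = clear_straightforward(ram, 0x1234, 0x0305)
```

With count=0 the comparison only matches after the index wraps all the way around. In that case the model, like the assembly loop, clears all 64K.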
The loop above compares the index against count after every byte. Using the same approach as in Efficient Forwards Scanning, we can eliminate those comparisons and make it more efficient.
First, consider the case of clearing all 64K of RAM. X and Y start out at 0 and count up through $FFFF. The loop ends when the index wraps back around to $0000. If we wanted to clear just the last $103 bytes of this 64K block, we'd start the index at the value it would have when only $103 bytes remain, which is $10000-$103 = $FEFD. So we'd start with X=$FE and Y=$FD, and add $FE to the high byte of addr, just as the full clear would have done by that point. This would first clear the byte at $FEFD, then $FEFE, $FEFF, and the 256 bytes beginning at $FF00, as desired:
```
        lda #<begin
        sta addr
        lda #>(begin + $FE00)
        sta addr+1
        lda #0
        ldx #$FE
        ldy #$FD
loop:
        sta (addr),y
        iny
        bne nohigh
        inx
        inc addr+1
nohigh:
        cpy #0
        bne loop
        cpx #0
        bne loop
```
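The arithmetic in the previous paragraph can be checked with a small Python trace. It takes begin as 0, so addr starts at $FE00. Starting the index at $10000-$103 = $FEFD makes the loop touch $FEFD, $FEFE, $FEFF, and then $FF00 through $FFFF. The function name `traced_addresses` is hypothetical. The model wraps X, Y, and addr the way the 6502 registers and the zero-page pointer do.

```python
def traced_addresses(x: int, y: int, addr: int) -> list[int]:
    """Return, in order, the addresses the CPY #0/CPX #0 loop stores to."""
    out = []
    while True:
        out.append((addr + y) & 0xFFFF)       # sta (addr),y
        y = (y + 1) & 0xFF                    # iny
        if y == 0:
            x = (x + 1) & 0xFF                # inx
            addr = (addr + 0x100) & 0xFFFF    # inc addr+1
        if y == 0 and x == 0:                 # cpy #0 / cpx #0
            return out


start = 0x10000 - 0x103                       # index when $103 bytes remain
addrs = traced_addresses(start >> 8, start & 0xFF, 0xFE00)
print(hex(start), len(addrs) == 0x103, [hex(a) for a in addrs[:4]])
# prints: 0xfefd True ['0xfefd', '0xfefe', '0xfeff', '0xff00']
```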
Now, if this loop can clear the last $103 bytes of the 64K block "beginning" at address 0, it can equally clear the last $103 bytes of a 64K block that "begins" at any address in memory (the block simply wraps around past $FFFF). Thus we can use it to clear a block of $103 bytes anywhere in memory. The loop only accesses those $103 bytes, so it doesn't matter where the rest of the 64K block would lie. The one catch is that Y starts out at $FD, so the first byte accessed is at addr+$FD. To make that byte be begin, addr must start at begin-$FD. And since the loop now stops when the index wraps to zero, the CPY #0 and CPX #0 can be dropped, because INY and INX already set the zero flag for the branches that follow. The following efficiently clears $103 bytes anywhere in memory:
```
        lda #<(begin - $FD)
        sta addr
        lda #>(begin - $FD)
        sta addr+1
        lda #0
        ldx #$FE
        ldy #$FD
loop:
        sta (addr),y
        iny
        bne loop
        inc addr+1
        inx
        bne loop
```
The pointer addr starts out as begin-$FD. The first time through the loop, Y=$FD, so it accesses the byte at begin-$FD+$FD, that is, at begin, as desired. Then Y=$FE, so it accesses begin+1, and then Y=$FF, so it accesses begin+2. Y then wraps around to 0, and the inner loop falls through to the outer loop. The outer loop increments the high byte of addr, increments X to $FF, and loops back. Now Y=0 and addr=begin-$FD+$100, or more simply begin+3, which is exactly the next address to clear. Y then counts up through $FF (accessing begin+3 through begin+$102) and wraps to 0, and INX wraps X to 0 as well. Both loops end having cleared exactly $103 bytes of memory.
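The walkthrough can be checked with a small Python model of the final loop. This is again a sketch; `clear_103` and the test addresses are hypothetical. It confirms that starting the pointer at begin-$FD clears exactly begin through begin+$102. This also holds when begin is near $FFFF and the accesses wrap around to address 0.

```python
def clear_103(mem: bytearray, begin: int) -> list[int]:
    """Model of the relocated $103-byte loop; returns the addresses cleared."""
    addr = (begin - 0xFD) & 0xFFFF            # addr = begin - $FD
    x, y = 0xFE, 0xFD                         # ldx #$FE / ldy #$FD
    cleared = []
    while True:
        target = (addr + y) & 0xFFFF
        mem[target] = 0                       # sta (addr),y
        cleared.append(target)
        y = (y + 1) & 0xFF                    # iny
        if y != 0:
            continue                          # bne loop (inner)
        addr = (addr + 0x100) & 0xFFFF        # inc addr+1
        x = (x + 1) & 0xFF                    # inx
        if x == 0:
            return cleared                    # outer bne not taken: done


ram = bytearray([0xAA]) * 0x10000
low = clear_103(ram, 0x0300)
wrapped = clear_103(ram, 0xFFF0)
```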
The general pattern is to load X with the high byte of $10000-count and Y with its low byte, and to subtract Y's initial value from begin. Because only the low 16 bits matter, $10000-count can be written more simply as -count, which gives the same bytes:
```
.code
        lda #<(begin - <-count)
        sta addr
        lda #>(begin - <-count)
        sta addr+1
        lda #0
        ldx #>-count
        ldy #<-count
loop:
        sta (addr),y
        iny
        bne loop
        inc addr+1
        inx
        bne loop
```
If the address or size isn't known until the program runs, the setup calculations above must be done at run time with instructions rather than by the assembler:
```
.zeropage
addr:   .res 2
size:   .res 2

.code
; Clears memory from addr through addr+size-1.
; Clears all memory if size=0.

        ; Adjust addr by subtracting the low byte of the negated size
        ; (Y's initial value). When size's low byte is nonzero, that is
        ; the same as adding the low byte of size and then a high byte
        ; of $FF, which greatly simplifies this calculation. When it is
        ; zero, Y starts at 0 and addr needs no adjustment.
        lda size
        beq adjusted
        clc
        adc addr
        sta addr
        lda addr+1
        adc #$FF
        sta addr+1
adjusted:
        ; Subtract size from 0 to negate it, and put that into X and Y
        lda #0
        sec
        sbc size
        tay
        lda #0
        sbc size+1
        tax
        lda #0
loop:
        sta (addr),y
        iny
        bne loop
        inc addr+1
        inx
        bne loop
```
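The addr adjustment is the subtle part of the run-time version. This Python check compares the "add the low byte of size, then $FF to the high byte" shortcut against directly subtracting Y's initial value, the low byte of -size. The helper names are hypothetical. The two agree whenever size's low byte is nonzero. When that byte is zero, Y starts at 0 and no adjustment is needed, while the shortcut would instead move addr down by $100.

```python
def adjust_shortcut(addr: int, size: int) -> int:
    """Model of clc / adc size on the low byte, then adc #$FF on the high byte."""
    lo = (addr & 0xFF) + (size & 0xFF)
    carry = lo >> 8
    hi = ((addr >> 8) + 0xFF + carry) & 0xFF
    return (hi << 8) | (lo & 0xFF)


def adjust_exact(addr: int, size: int) -> int:
    """addr minus Y's initial value, the low byte of -size."""
    return (addr - (-size & 0xFF)) & 0xFFFF


for addr in (0x0000, 0x0300, 0x12F0, 0xFFFF):
    for size in range(1, 0x10000, 0x3B):
        if size & 0xFF:
            assert adjust_shortcut(addr, size) == adjust_exact(addr, size)

print(hex(adjust_shortcut(0x0300, 0x0100)), hex(adjust_exact(0x0300, 0x0100)))
# prints: 0x200 0x300
```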
An alternate approach of similar efficiency is a dual loop. The first loop handles full 256-byte pages and the second handles the final partial page:
```
; Clears memory from addr through addr+size-1.
; Clears nothing if size=0.
        lda #0
        ldy #0
        ; Load page count and handle case where it's zero
        ldx size+1
        beq final
        ; Do full pages first
pages:
        sta (addr),y
        iny
        bne pages
        inc addr+1
        dex
        bne pages
final:
        ldx size
        beq done
        ldy #0
loop:
        sta (addr),y
        iny
        dex
        bne loop
done:
```
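A Python model of the dual loop makes the split visible (the name `clear_dual` and the test addresses are hypothetical). The first loop runs once per full page, with X counting pages down. The second runs size & $FF times, with X counting bytes down. Because size is a 16-bit value in this version, the largest block it can clear is $FFFF bytes.

```python
def clear_dual(mem: bytearray, addr: int, size: int) -> None:
    """Model of the page loop followed by the partial-page loop."""
    y = 0
    x = (size >> 8) & 0xFF                    # ldx size+1 / beq final
    while x:
        mem[(addr + y) & 0xFFFF] = 0          # sta (addr),y
        y = (y + 1) & 0xFF                    # iny
        if y:
            continue                          # bne pages
        addr = (addr + 0x100) & 0xFFFF        # inc addr+1
        x -= 1                                # dex / bne pages
    x = size & 0xFF                           # ldx size / beq done
    y = 0                                     # ldy #0
    while x:
        mem[(addr + y) & 0xFFFF] = 0          # sta (addr),y
        y += 1                                # iny
        x -= 1                                # dex / bne loop


ram = bytearray([0xAA]) * 0x10000
clear_dual(ram, 0x0280, 0x0234)
```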
This dual-loop approach works well when the per-byte operation is short (here just the STA (addr),Y). If much work is done on each table entry, the previous approach is preferable. Its code is smaller, and it doesn't require duplicating the code that operates on the table.