Scanning large tables: Difference between revisions

From NESdev Wiki
Jump to navigationJump to search
m (moved Scanning Large Tables to Scanning large tables: only initial word should be capitalized)
(Removal of syntax highlighter)
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
Scanning tables larger than 256 bytes is more involved than [[Scanning Tables | scanning small tables]], because there are no 16-bit equivalents to X and Y. Consider clearing a large block of RAM quickly. A straight-forward approach is to keep a 16-bit index in X and Y, and increment it until it reaches the desired index:
Scanning tables larger than 256 bytes is more involved than [[Scanning Tables | scanning small tables]], because there are no 16-bit equivalents to X and Y. Consider clearing a large block of RAM quickly. A straight-forward approach is to keep a 16-bit index in X and Y, and increment it until it reaches the desired index:


begin = ... ; address of first byte to clear
<pre>
count = ... ; number of bytes
begin = ... ; address of first byte to clear
count = ... ; number of bytes
.zeropage
 
addr:    .res 2
.zeropage
addr:    .res 2
.code
 
        lda #<begin
.code
        sta addr
        lda #<begin
        lda #>begin
        sta addr
        sta addr+1
        lda #>begin
       
        sta addr+1
        lda #0
       
       
        lda #0
        ; X and Y form 16-bit index, with Y holding low byte
       
        ldx #0
        ; X and Y form 16-bit index, with Y holding low byte
        ldy #0
        ldx #0
       
        ldy #0
loop:  ; Clear one byte
       
        sta (addr),y
loop:  ; Clear one byte
       
        sta (addr),y
        ; Go to next byte
       
        iny
        ; Go to next byte
        bne nohigh
        iny
        inx
        bne nohigh
        inc addr+1
        inx
nohigh:
        inc addr+1
        ; See if low and high bytes of index match count
nohigh:
        cpy #<count
        ; See if low and high bytes of index match count
        bne loop
        cpy #<count
        cpx #>count
        bne loop
        bne loop
        cpx #>count
        bne loop
</pre>


Using the same approach used in [[Scanning Tables#Efficient Forwards Scanning | Efficient Forwards Scanning]], we can make this more efficient.
Using the same approach used in [[Scanning Tables#Efficient Forwards Scanning | Efficient Forwards Scanning]], we can make this more efficient.


First, consider the case of clearing all 64K of RAM. X and Y start out at 0, and they increment up to $FF $FF, then the loop ends. If we wanted to clear just the last $103 bytes of this 64K block, we'd start with the values the clear would have when it only has $103 bytes remaining. So we'd start with X=$FE and Y=$FD, and add $FE to addr+1. This would first clear a byte at $FEFD, then $FEFE, $FEFF, and the 256 bytes beginning at $FF00, as desired:
First, consider the case of clearing all 64K of RAM. X and Y start out at 0, and they increment up to $FF $FF, then the loop ends.


        lda #<begin
<pre>
        sta addr
        lda #0
        lda #>(begin + $FE00)
        sta addr
        sta addr+1
        lda #0
       
        sta addr+1
        lda #0
       
       
        lda #0
        ldx #$FE
       
        ldy #$FD
        ldx #0
       
        ldy #0
loop:  sta (addr),y
       
       
loop:  sta (addr),y
        iny
       
        bne nohigh
        iny
        inx
        bne nohigh
        inc addr+1
        inx
nohigh:
        inc addr+1
        cpy #0
nohigh:
        bne loop
        cpy #0
        cpx #0
        bne loop
        bne loop
        cpx #0
        bne loop
</pre>
 
Since we end once X and Y wrap to 0, we can optimize the inner loop to not check X every time:
 
<pre>
        lda #0
        sta addr
        lda #0
        sta addr+1
       
        lda #0
       
        ldx #0
        ldy #0
       
loop:  sta (addr),y
        iny
        bne loop
 
        inc addr+1
        inx
        bne loop
</pre>
 
If we wanted to clear just the last $103 bytes of memory, we'd start with the values the above clear loop has when there are only $103 bytes remaining. So we'd start with X=$FE and Y=$FD, and add $FE00 to addr. This would first clear a byte at $FEFD, then $FEFE, $FEFF, and the 256 bytes beginning at $FF00, as desired:
 
<pre>
        lda #0
        sta addr
        lda #$FE
        sta addr+1
       
        lda #0
       
        ldx #$FE
        ldy #$FD
       
loop:  sta (addr),y
        iny
        bne loop
 
        inc addr+1
        inx
        bne loop
</pre>
 
Now, if we can have the above loop clear the last $103 bytes of the 64K block "beginning" at address 0, we can have it clear the last $103 bytes of a 64K block that "begins" at any address in memory (it would wrap around to the beginning). For example, a 64K block at $1234 extends through $FFFF, then back to $0000 and through $1233, covering 64K total, and the last $103 bytes of this table would go from $1131-$1233.


Now, if we can have the above loop clear the last $103 bytes of the 64K block "beginning" at address 0, we can have it clear the last $103 bytes of a 64K block that "begins" at any address in memory (it would wrap around to the beginning). Thus, we can use it to clear a block of $103 bytes ''anywhere'' in memory; it doesn't matter whether they are the last $103 bytes of a 64K-byte block, since the loop only accesses those $103 bytes. But it does add $FD to the address, since Y starts out with that value, so we must subtract that. The following efficiently clears $103 bytes anywhere in memory:
Thus, we can use this to clear a block of $103 bytes ''anywhere'' in memory; it doesn't matter whether they are the last $103 bytes of a 64K-byte block, since the loop only accesses those $103 bytes. But it does add $FD to the address, since Y starts out with that value, so we must subtract that. The following efficiently clears $103 bytes ''anywhere'' in memory:


        lda #<(begin - $FD)
<pre>
        sta addr
        lda #<(begin - $FD)
        lda #>(begin - $FD)
        sta addr
        sta addr+1
        lda #>(begin - $FD)
       
        sta addr+1
        lda #0
       
       
        lda #0
        ldx #$FE
       
        ldy #$FD
        ldx #$FE
       
        ldy #$FD
loop:  sta (addr),y
       
        iny
loop:  sta (addr),y
        bne loop
        iny
       
        bne loop
        inc addr+1
       
        inx
        inc addr+1
        bne loop
        inx
        bne loop
</pre>


Addr starts out as begin-$FD. The first time through the loop, Y=$FD, so it accesses a byte at begin-$FD+$FD, that is, at begin, as desired. Then Y=$FE, so it accesses begin+1. Then Y=$FF and it accesses begin+2. Y wraps around to 0 and the inner loop falls through to the outer loop, which increments the high byte of addr and increments X to $FF and loops back. Now, Y=0 and addr=begin-$FD+$100, or more simply, begin+3, which is the desired address to be accessing this time through. Y keeps incrementing to $FF, then the inner and outer loops end, and it's cleared exactly $103 bytes of memory.
Addr starts out as begin-$FD. The first time through the loop, Y=$FD, so it accesses a byte at begin-$FD+$FD, that is, at begin, as desired. Then Y=$FE, so it accesses begin+1. Then Y=$FF and it accesses begin+2. Y wraps around to 0 and the inner loop falls through to the outer loop, which increments the high byte of addr and increments X to $FF and loops back. Now, Y=0 and addr=begin-$FD+$100, or more simply, begin+3, which is the desired address to be accessing this time through. Y keeps incrementing to $FF, then the inner and outer loops end, and it's cleared exactly $103 bytes of memory.
Line 84: Line 136:
The general pattern is to load X with the high byte of $10000-count, Y with the low byte, and also subtract Y's initial value from begin. $10000-count can more simply be calculated as -count, since you get the same result:
The general pattern is to load X with the high byte of $10000-count, Y with the low byte, and also subtract Y's initial value from begin. $10000-count can more simply be calculated as -count, since you get the same result:


.code
<pre>
        lda #<(begin - <-count)
        lda #<(begin - <-count)
        sta addr
        sta addr
        lda #>(begin - <-count)
        lda #>(begin - <-count)
        sta addr+1
        sta addr+1
       
       
        lda #0
        lda #0
       
       
        ldx #>-count
        ldx #>-count
        ldy #<-count
        ldy #<-count
       
       
loop:  sta (addr),y
loop:  sta (addr),y
        iny
        iny
        bne loop
        bne loop
        inc addr+1
 
        inx
        inc addr+1
        bne loop
        inx
        bne loop
</pre>


If the address and/or size isn't known until the program is running, the above must be done at run-time with instructions, rather than relying on the assembler:
If the address and/or size isn't known until the program is running, the above must be done at run-time with instructions, rather than relying on the assembler:


.zeropage
<pre>
addr:  .res 2
.zeropage
size:  .res 2
addr:  .res 2
size:  .res 2
.code
 
        ; Clears memory from addr through addr+size-1.
.code
        ; Clears all memory if size=0.
        ; Clears memory from addr through addr+size-1.
       
        ; Clears all memory if size=0.
        ; Adjust addr. Subtracting low byte of negated size is the same
       
        ; as adding the low byte of size and then a high byte of $FF,
        ; Adjust addr. Subtracting low byte of negated size is the same
        ; which greatly simplifies this calculation.
        ; as adding the low byte of size and then a high byte of $FF,
        lda addr
        ; which greatly simplifies this calculation.
        clc
        lda addr
        adc size
        clc
        sta addr
        adc size
        lda addr+1
        sta addr
        adc #$FF
        lda addr+1
        sta addr+1
        adc #$FF
       
        sta addr+1
        ; Subtract size from 0 to negate it, and put that into X and Y
       
        lda #0
        ; Subtract size from 0 to negate it, and put that into X and Y
        sec
        lda #0
        sbc size
        sec
        tay
        sbc size
        lda #0
        tay
        sbc size+1
        lda #0
        tax
        sbc size+1
       
        tax
        lda #0
       
       
        lda #0
loop:  sta (addr),y
       
        iny
loop:  sta (addr),y
        bne loop
        iny
        inc addr+1
        bne loop
        inx
 
        bne loop
        inc addr+1
        inx
        bne loop
</pre>


An alternate approach of similar efficienty is a dual loop, with the first handling full 256-byte pages efficiently and the second handling the final partial page:
An alternate approach of similar efficienty is a dual loop, with the first handling full 256-byte pages efficiently and the second handling the final partial page:
          
          
        ; Clears memory from addr through addr+size-1.
<pre>
        ; Clears nothing if size=0.
        ; Clears memory from addr through addr+size-1.
       
        ; Clears nothing if size=0.
        lda #0
       
       
        lda #0
        ldy #0
       
       
        ldy #0
        ; Load page count and handle case where it's zero
       
        ldx size+1
        ; Load page count and handle case where it's zero
        beq final
        ldx size+1
       
        beq final
        ; Do full pages first
       
pages:  sta (addr),y
        ; Do full pages first
        iny
pages:  sta (addr),y
        bne pages
        iny
        inc addr+1
        bne pages
        dex
        inc addr+1
        bne pages
        dex
       
        bne pages
final:  ldx size
       
        beq done
final:  ldx size
       
        beq done
        ldy #0
       
loop:  sta (addr),y
        ldy #0
        iny
loop:  sta (addr),y
        dex
        iny
        bne loop
        dex
       
        bne loop
done:
       
done:
</pre>


This dual-loop approach works if the loop operation is short (here just the STA (addr),Y). If lots is done to each table entry, the previous approach is preferable due to reduced code size and not having to duplicate the code that operates on the table.
This dual-loop approach works if the loop operation is short (here just the STA (addr),Y). If lots is done to each table entry, the previous approach is preferable due to reduced code size and not having to duplicate the code that operates on the table.

Latest revision as of 04:45, 11 September 2014

Scanning tables larger than 256 bytes is more involved than scanning small tables, because there are no 16-bit equivalents to X and Y. Consider clearing a large block of RAM quickly. A straight-forward approach is to keep a 16-bit index in X and Y, and increment it until it reaches the desired index:

begin = ... ; address of first byte to clear
count = ... ; number of bytes

.zeropage
addr:    .res 2

.code
        lda #<begin
        sta addr
        lda #>begin
        sta addr+1
        
        lda #0
        
        ; X and Y form 16-bit index, with Y holding low byte
        ldx #0
        ldy #0
        
loop:   ; Clear one byte
        sta (addr),y
        
        ; Go to next byte
        iny
        bne nohigh
        inx
        inc addr+1
nohigh:
        ; See if low and high bytes of index match count
        cpy #<count
        bne loop
        cpx #>count
        bne loop

Using the same approach used in Efficient Forwards Scanning, we can make this more efficient.

First, consider the case of clearing all 64K of RAM. X and Y start out at 0, and they increment up to $FF $FF, then the loop ends.

        lda #0
        sta addr
        lda #0
        sta addr+1
        
        lda #0
        
        ldx #0
        ldy #0
        
loop:   sta (addr),y
        
        iny
        bne nohigh
        inx
        inc addr+1
nohigh:
        cpy #0
        bne loop
        cpx #0
        bne loop

Since we end once X and Y wrap to 0, we can optimize the inner loop to not check X every time:

        lda #0
        sta addr
        lda #0
        sta addr+1
        
        lda #0
        
        ldx #0
        ldy #0
        
loop:   sta (addr),y
        iny
        bne loop

        inc addr+1
        inx
        bne loop

If we wanted to clear just the last $103 bytes of memory, we'd start with the values the above clear loop has when there are only $103 bytes remaining. So we'd start with X=$FE and Y=$FD, and add $FE00 to addr. This would first clear a byte at $FEFD, then $FEFE, $FEFF, and the 256 bytes beginning at $FF00, as desired:

        lda #0
        sta addr
        lda #$FE
        sta addr+1
        
        lda #0
        
        ldx #$FE
        ldy #$FD
        
loop:   sta (addr),y
        iny
        bne loop

        inc addr+1
        inx
        bne loop

Now, if we can have the above loop clear the last $103 bytes of the 64K block "beginning" at address 0, we can have it clear the last $103 bytes of a 64K block that "begins" at any address in memory (it would wrap around to the beginning). For example, a 64K block at $1234 extends through $FFFF, then back to $0000 and through $1233, covering 64K total, and the last $103 bytes of this table would go from $1131-$1233.

Thus, we can use this to clear a block of $103 bytes anywhere in memory; it doesn't matter whether they are the last $103 bytes of a 64K-byte block, since the loop only accesses those $103 bytes. But it does add $FD to the address, since Y starts out with that value, so we must subtract that. The following efficiently clears $103 bytes anywhere in memory:

        lda #<(begin - $FD)
        sta addr
        lda #>(begin - $FD)
        sta addr+1
        
        lda #0
        
        ldx #$FE
        ldy #$FD
        
loop:   sta (addr),y
        iny
        bne loop
        
        inc addr+1
        inx
        bne loop

Addr starts out as begin-$FD. The first time through the loop, Y=$FD, so it accesses a byte at begin-$FD+$FD, that is, at begin, as desired. Then Y=$FE, so it accesses begin+1. Then Y=$FF and it accesses begin+2. Y wraps around to 0 and the inner loop falls through to the outer loop, which increments the high byte of addr and increments X to $FF and loops back. Now, Y=0 and addr=begin-$FD+$100, or more simply, begin+3, which is the desired address to be accessing this time through. Y keeps incrementing to $FF, then the inner and outer loops end, and it's cleared exactly $103 bytes of memory.

The general pattern is to load X with the high byte of $10000-count, Y with the low byte, and also subtract Y's initial value from begin. $10000-count can more simply be calculated as -count, since you get the same result:

        lda #<(begin - <-count)
        sta addr
        lda #>(begin - <-count)
        sta addr+1
        
        lda #0
        
        ldx #>-count
        ldy #<-count
        
loop:   sta (addr),y
        iny
        bne loop

        inc addr+1
        inx
        bne loop

If the address and/or size isn't known until the program is running, the above must be done at run-time with instructions, rather than relying on the assembler:

.zeropage
addr:   .res 2
size:   .res 2

.code
        ; Clears memory from addr through addr+size-1.
        ; Clears all memory if size=0.
        
        ; Adjust addr. Subtracting low byte of negated size is the same
        ; as adding the low byte of size and then a high byte of $FF,
        ; which greatly simplifies this calculation.
        lda addr
        clc
        adc size
        sta addr
        lda addr+1
        adc #$FF
        sta addr+1
        
        ; Subtract size from 0 to negate it, and put that into X and Y
        lda #0
        sec
        sbc size
        tay
        lda #0
        sbc size+1
        tax
        
        lda #0
        
loop:   sta (addr),y
        iny
        bne loop

        inc addr+1
        inx
        bne loop

An alternate approach of similar efficienty is a dual loop, with the first handling full 256-byte pages efficiently and the second handling the final partial page:

        ; Clears memory from addr through addr+size-1.
        ; Clears nothing if size=0.
        
        lda #0
        
        ldy #0
        
        ; Load page count and handle case where it's zero
        ldx size+1
        beq final
        
        ; Do full pages first
pages:  sta (addr),y
        iny
        bne pages
        inc addr+1
        dex
        bne pages
        
final:  ldx size
        beq done
        
        ldy #0
loop:   sta (addr),y
        iny
        dex
        bne loop
        
done:

This dual-loop approach works if the loop operation is short (here just the STA (addr),Y). If lots is done to each table entry, the previous approach is preferable due to reduced code size and not having to duplicate the code that operates on the table.