From NESdev Wiki
Revision as of 23:14, 20 April 2016
Delay code
Subroutines that delay for a parametrised number of CPU cycles.
Note that all cycle counts assume that no branch crosses a page boundary. A taken branch whose target is in a different 256-byte page takes one extra cycle. If you want to ensure this condition at compile time, use the bccnw/beqnw/etc. macros that are listed at Fixed cycle delay.
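The page-crossing rule behind this note can be written as a small Python helper. It is a sketch for checking cycle counts by hand and is not part of the wiki code. The function name `branch_cycles` is hypothetical.

```python
def branch_cycles(taken: bool, next_pc: int, target: int) -> int:
    """Cycles used by a 6502 relative branch such as BCC, BEQ or BPL.

    next_pc is the address of the instruction after the 2-byte branch.
    A taken branch costs one extra cycle when target lies in a different
    256-byte page than next_pc.
    """
    if not taken:
        return 2
    if (next_pc >> 8) != (target >> 8):
        return 4  # taken, page crossed
    return 3      # taken, same page
```

Take a `bne` at $80FE with target $80F0. Its next_pc is $8100, so when taken it costs 4 cycles instead of 3, which would throw off every count on this page.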
A + 25 cycles of delay, clobbers A, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A+25 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
:       sbc #7          ; carry set by CMP
delay_a_25_clocks:
        cmp #7
        bcs :-          ; do multiples of 7
        lsr a           ; bit 0
        bcs :+          ; odd: 1 extra cycle (target is the next line)
:                       ; A=clocks/2, either 0,1,2,3
        beq @zero       ; 0: 5
        lsr a
        beq :+          ; 1: 7
        bcc :+          ; 2: 9
@zero:  bne :+          ; 3: 11
:       rts             ; (thanks to dclxvi for the algorithm)</pre>
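To check the A+25 figure for every input, the control flow of delay_a_25_clocks can be replayed in Python while adding up standard 6502 cycle costs. This is a hypothetical model: `delay_a_25_cycles` is not part of the wiki code. It assumes that no branch crosses a page. It also assumes that the BCS after the first LSR A branches to the very next instruction, so that this branch only adds one cycle when bit 0 is set.

```python
def delay_a_25_cycles(a: int) -> int:
    """Cycles from JSR delay_a_25_clocks through its RTS, for A = a (0-255)."""
    cycles = 6                          # JSR
    while True:
        cycles += 2                     # CMP #7
        if a >= 7:                      # carry set
            cycles += 3 + 2             # BCS :- taken, SBC #7
            a -= 7
        else:
            cycles += 2                 # BCS not taken
            break
    bit0, a = a & 1, a >> 1             # LSR A: bit 0 goes to carry
    cycles += 2
    cycles += 3 if bit0 else 2          # BCS to the next line
    if a == 0:
        cycles += 3 + 2                 # BEQ @zero taken, BNE not taken
    else:
        cycles += 2                     # BEQ not taken
        bit1, a = a & 1, a >> 1         # second LSR A
        cycles += 2
        if a == 0:
            cycles += 3                 # BEQ taken (remainder 2 or 3)
        elif not bit1:
            cycles += 2 + 3             # BCC taken (remainder 4 or 5)
        else:
            cycles += 2 + 2 + 3         # BNE taken (remainder 6)
    return cycles + 6                   # RTS
```

Each pass of the multiple-of-7 loop costs exactly 7 cycles (CMP, taken BCS, SBC). The tail then spends 0 to 6 extra cycles according to the remainder.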
A + 27 cycles of delay with no zero check, clobbers A, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A+27 clocks (including JSR)
; A = 0 is valid and takes 27 clocks (no special zero case).
;;;;;;;;;;;;;;;;;;;;;;;;
delay_a_27_clocks:
        ;               ; Cycles           Accumulator      Carry flag
        ;               ;  0  1  2  3  4   (hex)            0 1 2 3 4
        sec             ;  0  0  0  0  0   00 01 02 03 04   1 1 1 1 1
:       sbc #5          ;  2  2  2  2  2   FB FC FD FE FF   0 0 0 0 0
        bcs :-          ;  4  4  4  4  4   FB FC FD FE FF   0 0 0 0 0
        lsr a           ;  6  6  6  6  6   7D 7E 7E 7F 7F   1 0 1 0 1
        bcc :+          ;  8  8  8  8  8   7D 7E 7E 7F 7F   1 0 1 0 1
:       sbc #$7E        ; 10 11 10 11 10   FF FF 00 00 01   0 0 1 1 1
        bcc :+          ; 12 13 12 13 12   FF FF 00 00 01   0 0 1 1 1
        beq :+          ;       14 15 14         00 00 01       1 1 1
        bne :+          ;             16               01           1
:       rts             ; 15 16 17 18 19   (thanks to dclxvi for the algorithm)</pre>
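The cycle table in the comments can be reproduced by simulating the routine in Python with binary-mode SBC (the NES CPU has no decimal mode) and standard 6502 timings. This is a sketch: `sbc` and `delay_a_27_cycles` are hypothetical helpers, and no branch is assumed to cross a page.

```python
def sbc(a: int, operand: int, carry: int):
    """Binary 6502 SBC: returns ((A - operand - (1 - C)) & $FF, new carry).

    Carry is set when no borrow occurred.
    """
    result = a - operand - (1 - carry)
    return result & 0xFF, int(result >= 0)


def delay_a_27_cycles(a: int) -> int:
    """Cycles from JSR delay_a_27_clocks through its RTS, for A = a (0-255)."""
    cycles = 6 + 2                      # JSR, SEC
    carry = 1
    while True:
        a, carry = sbc(a, 5, carry)     # SBC #5
        cycles += 2
        if carry:
            cycles += 3                 # BCS :- taken
        else:
            cycles += 2                 # BCS not taken
            break
    carry, a = a & 1, a >> 1            # LSR A
    cycles += 2
    cycles += 2 if carry else 3         # BCC to the next line
    a, carry = sbc(a, 0x7E, carry)      # SBC #$7E
    cycles += 2
    if not carry:
        cycles += 3                     # BCC :+ taken
    elif a == 0:
        cycles += 2 + 3                 # BCC not taken, BEQ taken
    else:
        cycles += 2 + 2 + 3             # BCC, BEQ not taken; BNE taken
    return cycles + 6                   # RTS
```

Running it for A = 0 gives 27, which matches the "15" in the first column of the RTS row (6 for JSR, 15, and 6 for RTS).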
256×A + X + 33 cycles of delay, clobbers A, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y. Has relocations.
;;;;;;;;;;;;;;;;;;;;;;;;
:       ; do 256-5 cycles.
        sbc #1          ; 2 cycles - Carry was set from cmp
        pha             ; 3 cycles
        lda #(256-5-2 - 27-7-2) ; minus SBC, delay overhead, PHA+PLA, LDA
        jsr delay_a_27_clocks
        pla             ; 4 cycles
delay_256a_x_33_clocks:
        cmp #1          ; +2
        bcs :-          ; +3 (-1)
        ; 0-255 cycles remain, overhead = 4
        txa             ; +2; 6; +27 = 33
        ;passthru
<<Place the function delay_a_27_clocks immediately following here>></pre>
The roles of X and Y can be trivially swapped.
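The cycle budget of this JSR-based loop can be written out in Python to show where the 33 comes from and which LDA operand gives a pass of exactly 256 cycles. The names are hypothetical, and the per-instruction costs are standard 6502 timings. Following its header, delay_a_27_clocks is taken to cost A+27 cycles including its JSR.

```python
# 6502 cycle costs used by the loop body of delay_256a_x_33_clocks.
JSR = 6
CMP_IMM, BCS_TAKEN, BCS_NOT_TAKEN = 2, 3, 2
SBC_IMM, PHA, LDA_IMM, PLA, TXA = 2, 3, 2, 4, 2
DELAY_A_27 = 27          # delay_a_27_clocks: A + 27 cycles, JSR included


def pass_cycles(operand: int) -> int:
    """One high-byte pass: CMP/BCS taken, SBC, PHA, LDA #operand, JSR delay, PLA."""
    return (CMP_IMM + BCS_TAKEN + SBC_IMM + PHA + LDA_IMM
            + operand + DELAY_A_27 + PLA)


# The LDA operand that makes one pass exactly 256 cycles.
OPERAND = 256 - pass_cycles(0)


def total_cycles(a: int, x: int) -> int:
    """JSR delay_256a_x_33_clocks with A = a, X = x, falling into delay_a_27_clocks."""
    exit_test = CMP_IMM + BCS_NOT_TAKEN
    # Falling through means delay_a_27_clocks runs without its own JSR.
    tail = x + DELAY_A_27 - JSR
    return JSR + a * pass_cycles(OPERAND) + exit_test + TXA + tail
```

Solving gives an operand of 213, which is 256-5-2-27-7-2. Every cost in the pass has to be subtracted: CMP/BCS, the SBC #1, the delay routine's overhead, PHA+PLA, and the LDA itself.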
256×A + X + 33 cycles of delay, relocatable, clobbers A, Y, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A,Y. Preserves X. Relocatable.
;;;;;;;;;;;;;;;;;;;;;;;;
:       ; do 256-5 cycles.
        sbc #1          ; 2 cycles - Carry was set from cmp
        ldy #48         ;\
        dey             ; |- Clobbers Y; 246 cycles, 253 total
        bpl *-1         ;/
        ldy $A4         ; 3 cycles, 256 total
delay_256a_x_33_clocks_b:
        cmp #1          ; +2
        bcs :-          ; +3 (-1)
        ; 0-255 cycles remain, overhead = 4
        txa             ; +2; 6; +27 = 33
        ;passthru
<<Place the function delay_a_27_clocks immediately following here>></pre>
The roles of X and Y can be trivially swapped.
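The commented totals (246, 253, 256) follow from the DEY/BPL countdown. Here is a Python sketch of that arithmetic, using hypothetical names and standard 6502 timings with no page crossing. The zero-page LDY serves only as a 3-cycle filler, since Y is already clobbered.

```python
def countdown_bpl_cycles(start: int) -> int:
    """Cycles for `DEY / BPL *-1` entered with Y = start (0-127).

    DEY runs start+1 times, down to $FF, whose set bit 7 ends the loop;
    every BPL is taken (3 cycles) except the final one (2 cycles).
    """
    decrements = start + 1
    return decrements * 2 + (decrements - 1) * 3 + 2


CMP_BCS_TAKEN = 2 + 3
SBC_IMM, LDY_IMM, LDY_ZP = 2, 2, 3

inner = LDY_IMM + countdown_bpl_cycles(48)          # the bracketed 246 cycles
pass_total = CMP_BCS_TAKEN + SBC_IMM + inner + LDY_ZP
```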
256×A + X + 33 cycles of delay, relocatable, clobbers A, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y. Relocatable.
; Does not depend on delay_a_25_clocks.
;;;;;;;;;;;;;;;;;;;;;;;;
:       ; do 256 cycles.
        ; 5 cycles done so far. Loop is 2+1+ 1+2+1+2+1 + 1+1 = 12 bytes.
        sbc #1          ; 2 cycles - Carry was set from cmp
        pha             ;\
        txa             ; |
        ldx #46         ; |
        dex             ; |- 247 cycles, 254 total
        bpl *-1         ; |
        tax             ; |
        pla             ;/
        nop             ; 2 cycles; 256 cycles total
delay_256a_x_33_clocks_c:
        cmp #1          ; +2; 2 cycles overhead
        bcs :-          ; +2; 4 cycles overhead
        ; 0-255 cycles remain, overhead = 4
        txa             ; +2; 6; +27 = 33
        ;passthru
<<Place the function delay_a_27_clocks immediately following here>></pre>
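Both figures in the comments, the 12-byte body and the 256-cycle pass, can be tallied in a short Python sketch. The table of instruction sizes and timings is standard 6502 data. The names are hypothetical.

```python
# (instruction, bytes, cycles) for the straight-line part of one pass.
STRAIGHT = [
    ("SBC #1", 2, 2),
    ("PHA", 1, 3),
    ("TXA", 1, 2),
    ("LDX #46", 2, 2),
    ("TAX", 1, 2),
    ("PLA", 1, 4),
    ("NOP", 1, 2),
]
DEX_BYTES, BPL_BYTES = 1, 2
CMP_BCS_TAKEN = 2 + 3


def countdown_bpl_cycles(start: int) -> int:
    """`DEX / BPL *-1` entered with X = start (0-127): start+1 DEX, last BPL not taken."""
    return (start + 1) * 2 + start * 3 + 2


body_bytes = sum(size for _, size, _ in STRAIGHT) + DEX_BYTES + BPL_BYTES
pass_total = CMP_BCS_TAKEN + sum(c for _, _, c in STRAIGHT) + countdown_bpl_cycles(46)
```

Saving X in A (TXA/TAX) around the countdown, with A itself saved on the stack, is what lets this variant preserve both X and Y.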
256×A + 16 cycles of delay, clobbers A, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A*256 clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A*256+16 clocks (including JSR)
; Depends on delay_a_25_clocks
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256a_16_clocks:
        cmp #0
        bne :+
        rts
delay_256a_11_clocks_:
:       pha
        lda #(256-25-7-2-2-2-3) ; delay, PHA+PLA, LDA, CLC, ADC, BNE
        jsr delay_a_25_clocks
        pla
        clc
        adc #-1&$FF
        bne :-
        rts</pre>
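The following Python sketch adds up delay_256a_16_clocks pass by pass to confirm the A*256+16 total and to solve for the LDA operand. It uses hypothetical names and standard 6502 timings, and takes delay_a_25_clocks as A+25 cycles including its JSR.

```python
JSR, RTS = 6, 6
CMP_IMM, PHA, LDA_IMM, PLA, CLC, ADC_IMM = 2, 3, 2, 4, 2, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2
DELAY_A_25 = 25          # delay_a_25_clocks: A + 25 cycles, JSR included


def pass_cycles(operand: int, last: bool) -> int:
    """PHA, LDA #operand, JSR delay_a_25_clocks, PLA, CLC, ADC #$FF, BNE."""
    branch = BNE_NOT_TAKEN if last else BNE_TAKEN
    return PHA + LDA_IMM + operand + DELAY_A_25 + PLA + CLC + ADC_IMM + branch


# LDA operand that makes every non-final pass exactly 256 cycles.
OPERAND = 256 - pass_cycles(0, last=False)


def total_cycles(a: int) -> int:
    """JSR delay_256a_16_clocks with A = a (0-255), through the final RTS."""
    if a == 0:
        return JSR + CMP_IMM + BNE_NOT_TAKEN + RTS
    loop = sum(pass_cycles(OPERAND, last=(i == a - 1)) for i in range(a))
    return JSR + CMP_IMM + BNE_TAKEN + loop + RTS
```

The operand works out to 256-25-7-2-2-2-3 = 215. This subtracts the delay overhead, PHA+PLA, one 2-cycle term each for LDA, CLC and ADC, and the taken BNE.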
An alternative that depends on delay_a_27_clocks instead of delay_a_25_clocks:
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A*256 clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A*256+16 clocks (including JSR)
; Depends on delay_a_27_clocks
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256a_16_clocks_b:
        cmp #0
        bne :+
        rts
delay_256a_11_clocks_b_:
:       pha
        lda #(256-27-7-2-2-2-3) ; delay, PHA+PLA, LDA, CLC, ADC, BNE
        jsr delay_a_27_clocks
        pla
        clc
        adc #-1&$FF
        bne :-
        rts</pre>
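As its name suggests, calling the inner entry point delay_256a_11_clocks_b_ directly skips the CMP/BNE zero check and costs 256×A + 11 cycles. The Python sketch below models both entry points, using hypothetical helper names and standard 6502 timings. It also shows what happens if the inner entry is called with A = 0: the counter is only tested after a pass, so the loop runs 256 times.

```python
PASS_FIXED = 3 + 2 + 4 + 2 + 2 + 3   # PHA, LDA #, PLA, CLC, ADC #, BNE taken
DELAY_A_27 = 27                      # delay_a_27_clocks: A + 27 cycles, JSR included
OPERAND = 256 - PASS_FIXED - DELAY_A_27


def loop_and_rts(a: int) -> int:
    """Cycles from the `: pha` label through RTS.

    The counter is decremented after each pass, so A = 0 means 256 passes.
    """
    passes = a if a else 256
    pass_cycles = PASS_FIXED + OPERAND + DELAY_A_27
    return passes * pass_cycles - 1 + 6   # final BNE not taken, then RTS


def via_inner_entry(a: int) -> int:
    """JSR delay_256a_11_clocks_b_ (skips the zero check)."""
    return 6 + loop_and_rts(a)


def via_outer_entry(a: int) -> int:
    """JSR delay_256a_16_clocks_b (CMP #0 / BNE handles A = 0)."""
    if a == 0:
        return 6 + 2 + 2 + 6                # JSR, CMP, BNE not taken, RTS
    return 6 + 2 + 3 + loop_and_rts(a)      # JSR, CMP, BNE taken
```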
256×X + 16 cycles of delay, relocatable, clobbers X, Y, Z&N
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X*256 clocks + overhead
; Clobbers X,Y. Preserves A. Relocatable.
; Time: X*256+16 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_16_clocks:
        cpx #0
        bne :+
        rts
delay_256x_11_clocks_:
        ;5 cycles done. Loop is 256 cycles
:       ldy #50
        dey
        bne *-1
        dex
        bne :-
        ;Loop end is -1 cycles. Total: 4+JSR+RTS = 16
        rts</pre>
The roles of X and Y can be trivially swapped.
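The inner constant 50 can be derived rather than guessed. A DEY/BNE countdown from n costs 5n - 1 cycles, so one outer pass costs LDY + (5n - 1) + DEX + taken BNE. The Python sketch below, with hypothetical names and standard 6502 timings, solves that for 256 and checks the X*256+16 total.

```python
def dey_bne_cycles(n: int) -> int:
    """`DEY / BNE *-1` entered with Y = n (1-255): n DEYs, last BNE not taken."""
    return n * 2 + (n - 1) * 3 + 2      # = 5*n - 1


LDY_IMM, DEX, BNE_TAKEN = 2, 2, 3

# Solve LDY_IMM + (5*n - 1) + DEX + BNE_TAKEN = 256 for n.
n = (256 - LDY_IMM - DEX - BNE_TAKEN + 1) // 5
pass_total = LDY_IMM + dey_bne_cycles(n) + DEX + BNE_TAKEN


def total_cycles(x: int) -> int:
    """JSR delay_256x_16_clocks with X = x (0-255), through RTS."""
    if x == 0:
        return 6 + 2 + 2 + 6                       # JSR, CPX #0, BNE not taken, RTS
    return 6 + 2 + 3 + x * pass_total - 1 + 6      # last outer BNE not taken
```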
256×X + A + 30 cycles of delay, clobbers A, X, Z&N, C, V
<pre>;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X*256+A clocks + overhead
; Clobbers A,X. Preserves Y.
; Depends on delay_a_25_clocks within short branch distance
; Time: X*256+A+30 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_a_30_clocks:
        cpx #0
        beq delay_a_25_clocks
        jmp :+          ; 3-cycle pad so that X>=1 also totals 30
        ;7 cycles done. Loop is 256 cycles
:       pha
        lda #(256-7-2-2-2-3-25) ; PHA+PLA, LDA, DEX, BEQ, BNE, delay
        jsr delay_a_25_clocks
        pla
        dex
        beq delay_a_25_clocks ; count as 2
        bne :-
        ;Loop end is -2 cycles (BEQ taken instead of BEQ+BNE).
        ;Total: JSR+7-2 + (A+25-JSR) = X*256+A+30</pre>
The roles of X and Y can be trivially swapped.
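With X = 0 the routine branches straight into delay_a_25_clocks, while with X ≥ 1 it goes through the loop first. The two paths have to be balanced for one formula to hold. The Python sketch below counts both paths using hypothetical names and standard 6502 timings; `padding` is an illustrative parameter, not something in the wiki code. With each pass padded to 256 cycles, the X = 0 path gives A + 30 and the X ≥ 1 path gives 256×X + A + 27 + padding. Reaching 256×X + A + 30 therefore takes 3 extra cycles before the loop.

```python
JSR = 6
DELAY_A_25 = 25                          # delay_a_25_clocks: A + 25 cycles, JSR included
PASS_FIXED = 3 + 2 + 4 + 2 + 2 + 3       # PHA, LDA #, PLA, DEX, BEQ not taken, BNE taken
OPERAND = 256 - PASS_FIXED - DELAY_A_25  # makes one non-final pass 256 cycles


def total_cycles(x: int, a: int, padding: int) -> int:
    """JSR delay_256x_a_30_clocks with X = x, A = a.

    padding is a hypothetical number of extra cycles spent on the X >= 1
    path between the BEQ and the loop.
    """
    # Branching (not JSRing) into delay_a_25_clocks skips its 6-cycle JSR.
    tail = a + DELAY_A_25 - JSR
    if x == 0:
        return JSR + 2 + 3 + tail        # CPX #0, BEQ taken
    pass_cycles = PASS_FIXED + OPERAND + DELAY_A_25
    # Last pass: BEQ taken (3) replaces BEQ not taken + BNE taken (5).
    loop = x * pass_cycles - 2
    return JSR + 2 + 2 + padding + loop + tail   # CPX #0, BEQ not taken
```

For example, total_cycles(1, 0, 0) is 283, three short of 256 + 30.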