Delay code: Difference between revisions
m (→A + 25 cycles of delay, clobbers A, Z&N, C, V: addendum) |
(→Delay code: Experimentally try a more mathematical notation that emphasizes the possible range of cycles to delay) |
Revision as of 22:08, 7 May 2016
Delay code
Code that causes a parametrised number of cycles of delay.
Note that all branch instructions are written assuming that no page wrap occurs. If you want to ensure this condition at compile time, use the bccnw/beqnw/etc. macros that are listed at Fixed cycle delay.
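The page-wrap condition can be checked at assembly time. The following is a minimal sketch, assuming ca65 syntax; the macro name bne_nw is hypothetical, and the actual macros on the Fixed cycle delay page may be written differently. It pairs a branch with an assertion that the target lies on the same page as the instruction following the branch:

```
; Hypothetical sketch, not the macro from the Fixed cycle delay page.
.macro bne_nw target
        bne target
        ; After the branch, * is the address of the next instruction.
        ; A taken branch costs one extra cycle when the high bytes differ.
        .assert >(target) = >*, error, "bne_nw: branch crosses a page"
.endmacro
```

With such a macro in place of a plain bne, a build fails instead of silently producing a delay that is one cycle too long.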
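Inline code
In the headings below, A⊢Z means that the Z flag must already reflect the value of A on entry, and ΔA = 0 means that A is left unchanged (likewise for X).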
2–3 cycles of delay: delay=A+2; 0 ≤ A ≤ 1, A⊢Z, ΔA = 0
        bne @1
@1:
4–5 cycles of delay: delay=A+4; 0 ≤ A ≤ 1, ΔA = 0
        ora #0
        bne @1
@1:
4–5 cycles of delay: delay=X+4; 0 ≤ X ≤ 1
        dex             ; note: X is decremented, so it is not preserved
        bpl @1
@1:
5–7 cycles of delay: delay=A+5; 0 ≤ A ≤ 2, A⊢Z
        beq @2
        lsr
@2:     bne @3
@3:
5–7 cycles of delay: delay=A+5; 0 ≤ A ≤ 2, ΔA = 0
        cmp #1
        bcc @3
        bne @3
@3:
5–7 cycles of delay: delay=X+5; 0 ≤ X ≤ 2
        dex
        bmi @3
        bne @3
@3:
5–7 cycles of delay: delay=X+5; 0 ≤ X ≤ 2, ΔX = 0
        cpx #1
        bcc @3
        bne @3
@3:
6–9 cycles of delay: delay=A+6; 0 ≤ A ≤ 3, A⊢Z
        beq @2
        lsr
@2:     beq @4
        bcs @4
@4:
7–10 cycles of delay: delay=A+7; 0 ≤ A ≤ 3
        lsr
        beq @3
        bpl @3
@3:     bcs @4
@4:
8–11 cycles of delay: delay=X+8; 0 ≤ X ≤ 3
        dex
        bmi @4
        dex
        bmi @5
@4:     bne @5
@5:
9–13 cycles of delay: delay=A−242; 251 ≤ A ≤ 255; C = 0
                ; Cycles and resulting A (hex) for entry A = FB FC FD FE FF
        adc #3  ; 2 2 2 2 2  FE FF 00 01 02
        bcc @4  ; 3 3 2 2 2  FE FF 00 01 02
        lsr     ; - - 2 2 2  -- -- 00 00 01
        beq @5  ; - - 3 3 2  -- -- 00 00 01
@4:     lsr     ; 2 2 - - 2  7F 7F -- -- 00
@5:     bcs @6  ; 2 3 2 3 3  7F 7F 00 00 00
@6:             ; Total: 9 10 11 12 13
10–14 cycles of delay: delay=X+10; 0 ≤ X ≤ 4
        cpx #3
        bcc @3
        bne @3
@3:     dex
        bmi @6
        bne @6
@6:
9–14 cycles of delay: delay=A+9; 0 ≤ A ≤ 5
        lsr a
        bcs @2
@2:     beq @5
        lsr
        bcs @6
@5:     bne @6
@6:
9–16 cycles of delay: delay=A+9; 0 ≤ A ≤ 7
        lsr a
        bcs @2
@2:     beq @6
        lsr
        beq @7
        bcc @7
@6:     bne @7
@7:
15–270 cycles of delay: delay=A+15; 0 ≤ A ≤ 255
This code peels off slices of 5 cycles with an SBC-BCS loop, then runs the delay code for 251 ≤ A ≤ 255 with carry clear (9–13 cycles) on the remainder. The same code appears later as a callable function, which adds 12 cycles of overhead for JSR+RTS.
                ; Cycles and resulting A (hex) for a remainder of 0 1 2 3 4
        sec
@L:     sbc #5
        bcs @L  ; 6 6 6 6 6  FB FC FD FE FF
        adc #3  ; 2 2 2 2 2  FE FF 00 01 02
        bcc @4  ; 3 3 2 2 2  FE FF 00 01 02
        lsr     ; - - 2 2 2  -- -- 00 00 01
        beq @5  ; - - 3 3 2  -- -- 00 00 01
@4:     lsr     ; 2 2 - - 2  7F 7F -- -- 00
@5:     bcs @6  ; 2 3 2 3 3  7F 7F 00 00 00
@6:
851968×Y + 3328×A + 13×X + 18 cycles of delay
        iny
@l1:    nop
        nop
@l2:    cpx #1
        dex
        sbc #0
        bcs @l1
        dey
        bne @l2
Callable functions
A + 25 cycles of delay, clobbers A, Z&N, C, V
This code peels off slices of 7 cycles with a CMP-BCS-SBC loop, then runs delay code for the remaining 0–6 cycles. Its overhead is smaller than that of the version that peels 5 cycles because the case A<7 executes only two loop instructions instead of three. The cost is that the entry point is not the first instruction, so the code can only exist as a callable function, not as inline code.
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A+25 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
                  ;       Cycles              Accumulator         Carry flag
                  ; 0  1  2  3  4  5  6          (hex)           0 1 2 3 4 5 6
                  ;
                  ; 6  6  6  6  6  6  6   00 01 02 03 04 05 06
:      sbc #7     ; carry set by CMP
delay_a_25_clocks:
       cmp #7     ; 2  2  2  2  2  2  2   00 01 02 03 04 05 06   0 0 0 0 0 0 0
       bcs :-     ; 2  2  2  2  2  2  2   00 01 02 03 04 05 06   0 0 0 0 0 0 0
       lsr        ; 2  2  2  2  2  2  2   00 00 01 01 02 02 03   0 1 0 1 0 1 0
       bcs *+2    ; 2  3  2  3  2  3  2   00 00 01 01 02 02 03   0 1 0 1 0 1 0
       beq :+     ; 3  3  2  2  2  2  2   00 00 01 01 02 02 03   0 1 0 1 0 1 0
       lsr        ;       2  2  2  2  2         00 00 01 01 01       1 1 0 0 1
       beq @rts   ;       3  3  2  2  2         00 00 01 01 01       1 1 0 0 1
       bcc @rts   ;             3  3  2               01 01 01           0 0 1
:      bne @rts   ; 2  2              3   00 00             01   0 1         0
@rts:  rts        ; 6  6  6  6  6  6  6   00 00 00 00 01 01 01   0 1 1 1 0 0 1
; Total cycles:    25 26 27 28 29 30 31
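As a usage sketch: to spend a fixed number of cycles including the load of A, subtract both the 2 cycles of lda #imm and the 25 cycles of overhead from the target. The constant WAIT_CYCLES is a made-up example value, not something defined on this page, and the result must fit in 0–255.

```
WAIT_CYCLES = 113                       ; hypothetical target, for illustration only

        lda #(WAIT_CYCLES - 2 - 25)     ; lda #imm = 2 cycles, routine = A + 25
        jsr delay_a_25_clocks           ; 2 + (86 + 25) = 113 cycles in total
```

Counting the load this way, the call sequence covers 27 to 282 cycles.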
A + 27 cycles of delay, clobbers A, Z&N, C, V
This code has more overhead than delay_a_25_clocks, but it can be appended to other functions, because execution begins at the first instruction.
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A+27 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_a_27_clocks:
        sec
@L:     sbc #5
        bcs @L  ; 6 6 6 6 6  FB FC FD FE FF
        adc #3  ; 2 2 2 2 2  FE FF 00 01 02
        bcc @4  ; 3 3 2 2 2  FE FF 00 01 02
        lsr     ; - - 2 2 2  -- -- 00 00 01
        beq @5  ; - - 3 3 2  -- -- 00 00 01
@4:     lsr     ; 2 2 - - 2  7F 7F -- -- 00
@5:     bcs @6  ; 2 3 2 3 3  7F 7F 00 00 00
@6:     rts     ;
256×A + X + 33 cycles of delay, clobbers A, Z&N, C, V
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y. Has relocations.
;;;;;;;;;;;;;;;;;;;;;;;;
:       ; do 256-5 cycles.
        sbc #1                  ; 2 cycles - Carry was set from cmp
        pha                     ; 3 cycles
         lda #(256-5 - 27-7-2-2) ; 2 cycles; terms: delay overhead, pha+pla, sbc, lda
         jsr delay_a_27_clocks
        pla                     ; 4 cycles
delay_256a_x_33_clocks:
        cmp #1                  ; +2
        bcs :-                  ; +3 (-1)
        ; 0-255 cycles remain, overhead = 4
        txa                     ; +2; 6; +27 = 33
        ;passthru
<<Place the function delay_a_27_clocks immediately following here>>
Can be trivially changed to swap X, Y.
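As a usage sketch for the 16-bit A:X argument: the high byte goes in A and the low byte in X. The names WAIT_CYCLES and OVERHEAD are hypothetical and chosen only for this example; the two immediate loads add 4 cycles on top of the 33 cycles of overhead.

```
WAIT_CYCLES = 2000                      ; hypothetical target, for illustration only
OVERHEAD    = 33 + 2 + 2                ; routine overhead + lda #imm + ldx #imm

        lda #>(WAIT_CYCLES - OVERHEAD)  ; high byte: number of 256-cycle blocks
        ldx #<(WAIT_CYCLES - OVERHEAD)  ; low byte: remaining 0-255 cycles
        jsr delay_256a_x_33_clocks
```

The same call pattern should apply to the relocatable variants, since they share the A:X convention and the 33-cycle overhead.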
256×A + X + 33 cycles of delay, relocatable, clobbers A, Y, Z&N, C, V
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A,Y. Preserves X. Relocatable.
;;;;;;;;;;;;;;;;;;;;;;;;
:       ; do 256-5 cycles.
        sbc #1          ; 2 cycles - Carry was set from cmp
        ldy #48         ;\
        dey             ; |- Clobbers Y; 246 cycles, 253 total
        bpl *-1         ;/
        ldy $A4         ; 3 cycles, 256 total
delay_256a_x_33_clocks_b:
        cmp #1          ; +2
        bcs :-          ; +3 (-1)
        ; 0-255 cycles remain, overhead = 4
        txa             ; +2; 6; +27 = 33
        ;passthru
<<Place the function delay_a_27_clocks immediately following here>>
Can be trivially changed to swap X, Y.
256×A + X + 33 cycles of delay, relocatable, clobbers A, Z&N, C, V
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y. Relocatable.
; Does not depend on delay_a_25_clocks.
;;;;;;;;;;;;;;;;;;;;;;;;
:       ; do 256 cycles.
        ; 5 cycles done so far. Loop is 2+1+ 1+2+1+2+1 + 1+1 = 12 bytes.
        sbc #1          ; 2 cycles - Carry was set from cmp
        pha             ;\
        txa             ; |
        ldx #46         ; |
        dex             ; |- 247 cycles, 254 total
        bpl *-1         ; |
        tax             ; |
        pla             ;/
        nop             ; 2 cycles; 256 cycles total
delay_256a_x_33_clocks_c:
        cmp #1          ; +2; 2 cycles overhead
        bcs :-          ; +2; 4 cycles overhead
        ; 0-255 cycles remain, overhead = 4
        txa             ; +2; 6; +27 = 33
        ;passthru
<<Place the function delay_a_27_clocks immediately following here>>
256×A + 16 cycles of delay, clobbers A, Z&N, C, V
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A*256 clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A*256+16 clocks (including JSR)
; Depends on delay_a_25_clocks
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256a_16_clocks:
        cmp #0
        bne :+
        rts
delay_256a_11_clocks_:
:       pha                             ; 3 cycles
         lda #(256-25-7-2-2-2-3)        ; 2 cycles; terms: delay overhead, pha+pla, lda, sec, sbc, bne
         jsr delay_a_25_clocks
        pla                             ; 4 cycles
        sec                             ; 2 cycles
        sbc #1                          ; 2 cycles
        bne :-                          ; 3 cycles when taken
        rts
Alternative that depends on a different function:
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A*256 clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A*256+16 clocks (including JSR)
; Depends on delay_a_27_clocks
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256a_16_clocks_b:
        cmp #0
        bne :+
        rts
delay_256a_11_clocks_b_:
:       pha                             ; 3 cycles
         lda #(256-27-7-2-2-2-3)        ; 2 cycles; terms: delay overhead, pha+pla, lda, sec, sbc, bne
         jsr delay_a_27_clocks
        pla                             ; 4 cycles
        sec                             ; 2 cycles
        sbc #1                          ; 2 cycles
        bne :-                          ; 3 cycles when taken
        rts
256×X + 16 cycles of delay, relocatable, clobbers X, Y, Z&N
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X*256 clocks + overhead
; Clobbers X,Y. Preserves A. Relocatable.
; Time: X*256+16 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_16_clocks:
        cpx #0
        bne :+
        rts
delay_256x_11_clocks_:
        ;5 cycles done. Loop is 256 cycles
:       ldy #50
        dey
        bne *-1
        dex
        bne :-
        ;Loop end is -1 cycles. Total: 4+JSR+RTS = 16
        rts
Can be trivially changed to swap X, Y.
256×X + A + 30 cycles of delay, clobbers A, X, Z&N, C, V
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X*256+A clocks + overhead
; Clobbers A,X. Preserves Y.
; Depends on delay_a_25_clocks within short branch distance
; Time: X*256+A+30 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_a_30_clocks:
        cpx #0                          ;2
        beq delay_a_25_clocks           ;3
        ;4 cycles done. Must consume 256 cycles; 252 cycles remain.
        pha                             ;3
         lda #(256-4-(3+2+4+2+3))-25    ;2
         jsr delay_a_25_clocks          ;238
        pla                             ;4
        dex                             ;2
        jmp delay_256x_a_30_clocks      ;3
Can be trivially changed to swap X, Y.
851968×Y + 3328×A + 13×X + 30 cycles of delay, clobbers A, X, Y, Z&N, C, V
;;;;;;;;;;;;;;;;;;;;;;;;
; Delays 30+13*(65536*Y+256*A+X) cycles including JSR.
; Clobbers A,X,Y.
delay_851968y_3328a_13x_30_clocks:
        iny
@l1:    nop
        nop
@l2:    cpx #1
        dex
        sbc #0
        bcs @l1
        dey
        bne @l2
        rts