Synthetic instructions
There are several additional instructions that would be nice to have on the NES. Even though not present, they can be synthesized using existing instructions. If turned into assembler macros, they can be used almost as if they were natively supported. Being able to think of them as native instructions lightens the mental load when programming, because instructions are an important tool for abstraction. Even without making the following into macros, after reading them you will be more likely to think of one of these while coding, saying "I need a subtract-from instruction here".
Negate A
Many processors have a native negate instruction, which subtracts the value from zero. Here we must manually calculate the two's complement of A, which involves a one's complement and increment:
    ; A = -A
    eor #$FF
    sec
    adc #0
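Since the text suggests turning these sequences into assembler macros, here is a minimal sketch of the negate sequence wrapped as one. It assumes ca65 macro syntax, and the macro name neg is a hypothetical choice, not something the assembler provides.

```asm
; Hypothetical macro name; assumes ca65 syntax
.macro neg
    eor #$FF    ; one's complement of A
    sec         ; carry in supplies the +1
    adc #0      ; A = (A ^ $FF) + 1 = -A
.endmacro

    lda #5
    neg         ; A is now $FB (-5)
```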
Subtract A from Value
Using SBC, a value can be subtracted from A, but there's no direct way to subtract A from some value. To do so, A must be negated, then added to the value:
    ; A = Value - A
    eor #$FF
    sec
    adc Value
As a special case, if we want to subtract A from 255, we can just do
    ; A = 255 - A
    eor #$FF
This also shows another way of understanding the general subtract-from operation: we first subtract A from 255, then add one, so it's as if we subtracted from 256, which is the same as subtracting from zero, since A is only 8 bits.
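A short trace may make the arithmetic concrete. The operand values below are arbitrary illustrations, and the value is written as an immediate for brevity.

```asm
; Compute $10 - A with A = $03 (values chosen only for illustration)
    lda #$03
    eor #$FF    ; A = $FC, which is 255 - 3
    sec         ; carry in supplies the +1
    adc #$10    ; $FC + $10 + 1 = $10D, so A = $0D and carry is set
```

As with SBC, carry ends up set when no borrow occurred, that is, when the value is at least as large as the original A.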
Arithmetic Shift Right
The 65xx series lacks an arithmetic right shift, that is, a right shift that preserves the sign (top) bit. This shift is used to divide a signed value by two. LSR doesn't work because it always shifts a zero into the top bit, clearing the sign.
To implement this, we need carry set to the sign bit, then we can use ROR. CMP #$80 performs this task; if the value is less than $80, carry is cleared, otherwise it's set:
    ; Arithmetic shift right A
    cmp #$80
    ror a
If the operand is in memory, we just use ASL to move the sign bit into carry:
    ; Arithmetic shift right Value
    lda Value
    asl a
    ror Value
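To illustrate, here is a trace on a negative value, followed by a sketch of the same idea extended to a 16-bit value. The label Word is hypothetical, standing for a little-endian 16-bit variable, and the 16-bit version is an extension not covered in the text above.

```asm
; A = $F6 (-10)
    lda #$F6
    cmp #$80    ; $F6 >= $80, so carry is set
    ror a       ; A = $FB (-5); carry now holds the bit shifted out (0)

; 16-bit arithmetic shift right of Word (hypothetical label)
    lda Word+1
    asl a       ; carry = sign bit of the high byte (A is clobbered)
    ror Word+1  ; high byte keeps its sign, its bit 0 moves to carry
    ror Word    ; low byte receives that bit at the top
```

Note that this shift rounds toward negative infinity: shifting $FB (-5) gives $FD (-3).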
8-Bit Rotate
The 65xx series rotate instructions are all 9-bit, not 8-bit as often imagined. If they really were 8-bit, then eight ROR or ROL instructions in a row would leave A with its original value. In actuality, nine are required to do so, since the carry acts as a ninth bit of A.
Similar to arithmetic right shift, we must set carry to the top or bottom bit in advance of the rotate. For 8-bit rotate left, it's simple:
    ; 8-bit rotate left A
    cmp #$80
    rol a

    ; alternate method
    asl a
    adc #0
For 8-bit rotate right, we must save and restore A:
    ; 8-bit rotate right A
    pha
    lsr a
    pla
    ror a
A could be saved and restored using other methods, like TAX and TXA, etc.
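For instance, the TAX/TXA variant mentioned above might look like this. It avoids the stack but overwrites X.

```asm
; 8-bit rotate right A, saving A in X instead of on the stack
    tax         ; save A
    lsr a       ; carry = bit 0 of A
    txa         ; restore A (does not affect carry)
    ror a       ; carry enters bit 7
```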
If the operand is in memory:
    ; 8-bit rotate left Value
    lda Value
    asl a
    rol Value

    ; 8-bit rotate right Value
    lda Value
    lsr a
    ror Value
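One use of a true 8-bit rotate is swapping the two nibbles of A by rotating left four times. This is a sketch assuming ca65's .repeat directive; it uses the ASL/ADC form of rotate left shown earlier.

```asm
; Swap the high and low nibbles of A, e.g. $A5 becomes $5A
.repeat 4
    asl a       ; top bit goes to carry, bit 0 becomes 0
    adc #0      ; add carry back in as bit 0
.endrepeat
```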
16-bit Increment and Decrement
Incrementing/decrementing a 16-bit value involves first adjusting the low byte, then adjusting the high byte if necessary. Increment is simpler, since the high byte is adjusted when the low byte wraps around to zero; for decrement, the high byte is adjusted when the low byte wraps around to $FF.
    ; 16-bit increment Word
    inc Word
    bne :+
    inc Word+1
    :

    ; 16-bit decrement Word
    lda Word
    bne :+
    dec Word+1
    :
    dec Word
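16-bit increment shows even more advantage when used to control a loop, because the increment sequence conveniently leaves the zero flag set at the end only if the entire 16-bit value is zero. The decrement sequence ends with DEC on the low byte, so its zero flag reflects only the low byte.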
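As a sketch of that loop-control use, a counter can be preloaded with the negated iteration count and incremented until it reaches zero. Count and ITERATIONS are hypothetical names, and ITERATIONS is assumed to be between 1 and 65535.

```asm
ITERATIONS = 300            ; hypothetical iteration count

    lda #<($10000 - ITERATIONS)
    sta Count               ; Count is a hypothetical 16-bit variable
    lda #>($10000 - ITERATIONS)
    sta Count+1
loop:
    ; ... loop body goes here ...
    inc Count
    bne loop                ; low byte did not wrap, keep going
    inc Count+1
    bne loop                ; falls through only when both bytes are zero
```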
X/Y as Operand
Normally X and Y cannot be used as operands to an instruction operating on A. For example, CMP X isn't possible. Where X or Y needs to be used in such a way, it is usually saved to a temporary variable:
    ; Compare A with X
    stx temp
    cmp temp
By putting a 256-byte table in memory with each entry simply having the value of its index, X and Y can be used as operands:
    table:
    .byte $00,$01,$02,$03,$04,$05,$06,$07,$08,$09,$0A,$0B,$0C,$0D,$0E,$0F
    .byte $10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$1A,$1B,$1C,$1D,$1E,$1F
    ...
    .byte $F0,$F1,$F2,$F3,$F4,$F5,$F6,$F7,$F8,$F9,$FA,$FB,$FC,$FD,$FE,$FF

    cmp table,x     ; CMP X
    eor table,x     ; EOR X
    clc
    adc table,y     ; ADC Y
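Rather than typing out all 256 entries, the table could be generated by the assembler. This sketch assumes ca65's .repeat directive with a counter argument; the lines after it show one more application, adding X and Y.

```asm
; Identity table: entry i holds the value i (assumes ca65 .repeat)
table:
.repeat 256, i
    .byte i
.endrepeat

; A = X + Y using the table
    txa
    clc
    adc table,y
```

If the table does not start on a page boundary, indexed reads that cross a page take an extra cycle, so aligning it to 256 bytes may be worthwhile in timing-sensitive code.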
JMP (addr,X)
The JMP (addr,X) instruction is present in later 65xx processors. It behaves like JMP (addr), except it fetches the 16-bit value from addr+X. The least-problematic way to implement this is using RTI:
    ; Jump to address stored at addr+X
    lda addr+1,x
    pha
    lda addr,x
    pha
    php
    rti
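A usage sketch: dispatching on an index held in A. The handler labels are hypothetical, and the table is assumed to hold at most 128 entries so that the doubled index fits in X.

```asm
; Table of 16-bit handler addresses (handler names are hypothetical)
addr:
    .word handler0
    .word handler1
    .word handler2

; Jump to the handler selected by A (0-2)
    asl a           ; each entry is two bytes
    tax
    lda addr+1,x    ; high byte is pushed first
    pha
    lda addr,x      ; then the low byte
    pha
    php             ; RTI pulls the status register first
    rti             ; then pulls the address and jumps to it
```

Unlike an RTS-based dispatch, the table entries here do not need to be stored minus one, since RTI does not increment the address it pulls.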
See Jump Table for further explanation and alternate approaches.