8-bit Multiply: Difference between revisions

From NESdev Wiki
Jump to navigationJump to search
(replacing lead which was orphaned text)
m (→‎Bob Rost / frantik: code size relevant because it's a macro)
Line 83: Line 83:
</pre>
</pre>


Assuming no page crossings and zero page, this routine takes 48-398 cycles (average 223), though it is an in-place macro generating new code each time it is used.
Assuming no page crossings and zero page, this routine takes 48-398 cycles (average 223), though it is an in-place macro generating 42 bytes of new code each time it is used.


This also has a problem with the value 0 in value2ptr, which will cause it to enter an infinite loop, rather than correctly returning 0.
This also has a problem with the value 0 in value2ptr, which will cause it to enter an infinite loop, rather than correctly returning 0.

Revision as of 06:39, 12 October 2017

Since the 6502 CPU has no multiplication instruction, this feature has to be written in software.

Bregalad

This routine by Bregalad does binary long multiplication (a.k.a. "the Russian Peasant method").

;8-bit multiply
;by Bregalad
;Enter with A,Y, numbers to multiply
;Output with YA = 16-bit result (X is unchanged)
Multiply:
	sty Factor  ;Store input factor
	ldy #$00
	sty Res
	sty Res2    ;Clear result
	ldy #$08    ;Number of shifts needed

-	lsr A       ;Shift right input number
	bcc +       ;Check if bit is set
	pha
	lda Res2
	clc
	adc Factor
	sta Res2    ;If so add number to result
	pla
+	lsr Res2    ;Shift result right
	ror Res
	dey
	bne -
	lda Res
	ldy Res2
	rts

An optimization for efficiency is made here; binary long multiplication requires adding one multiplicand to the result at various bit-shifts (i.e. multiply by each power of 2). The naive approach might maintain the value to add as a 16-bit value, left shifting it once each iteration to reach the next power of 2. This one, however, takes advantage of the input being only 8-bits wide, and instead pre-multiplies the result by 256 (8 bits), and each iteration instead right-shifts the result. After 8 iterations the pre-multiply is undone, and the advantage gained is that only the shift is 16-bit; adding the multiplicand remains an efficient 8-bit add.

Assuming no page crossings and zero page, this routine takes 184-320 cycles (average 252), not counting the JSR to call it. (Each non-zero bit in A adds 17 cycles.)

Bob Rost / frantik

This routine by Bob Rost is another binary long multiplication, taken from his NES Development Class PDFs.

; Multiply two bytes in memory using Russian peasant algorithm
; by frantik

; Accepts: value1ptr and value2ptr, pointers to bytes in memory
;   value2ptr should point to the lesser of the two values 
;   for increased efficiency
; Uses: $00, $01, $02 for temporary variables
; Returns: 16 bit value in $00 and $01

.macro multiply value1ptr, value2ptr

ret  = $00              ; return value
temp = $02              ; temp storage

	lda #$00        ; clear temporary variables
	sta ret
	sta ret+1
	sta temp
	jmp start:

-loop:
	asl value1ptr   ; double first value
	rol temp        ; using 16bit precision
	lsr value2ptr   ; halve second vale
start:
	lda value2ptr   ;
	and #01         ; is new 2nd value an odd number?
	beq -loop:      ; 
	clc             ; if so, add new 1st value to running total
	lda ret         ;
	adc value1ptr   ;
	sta ret         ;
	lda ret+1       ;
	adc temp        ;
	sta ret+1       ;
	lda value2ptr   ;
	cmp #01         ; is 2nd value 1?  if so, we're done
	bne -loop:      ; otherwise, loop
.endm

Assuming no page crossings and zero page, this routine takes 48-398 cycles (average 223), though it is an in-place macro generating 42 bytes of new code each time it is used.

This also has a problem with the value 0 in value2ptr, which will cause it to enter an infinite loop, rather than correctly returning 0.

External Links

  • Forum post: Fast multi, ... - Faster multiplication routine using lookup tables, by keldon.