8-bit Multiply: Difference between revisions
Rainwarrior (talk | contribs) (replacing lead which was orphaned text) |
Rainwarrior (talk | contribs) m (→Bob Rost / frantik: code size relevant because it's a macro) |
||
Line 83: | Line 83: | ||
</pre> | </pre> | ||
Assuming no page crossings and zero page, this routine takes 48-398 cycles (average 223), though it is an in-place macro generating new code each time it is used. | Assuming no page crossings and zero page, this routine takes 48-398 cycles (average 223), though it is an in-place macro generating 42 bytes of new code each time it is used. | ||
This also has a problem with the value 0 in value2ptr, which will cause it to enter an infinite loop, rather than correctly returning 0. | This also has a problem with the value 0 in value2ptr, which will cause it to enter an infinite loop, rather than correctly returning 0. |
Revision as of 06:39, 12 October 2017
Since the 6502 CPU has no multiplication instruction, this feature has to be written in software.
Bregalad
This routine by Bregalad does binary long multiplication (a.k.a. "the Russian Peasant method").
;8-bit multiply ;by Bregalad ;Enter with A,Y, numbers to multiply ;Output with YA = 16-bit result (X is unchanged) Multiply: sty Factor ;Store input factor ldy #$00 sty Res sty Res2 ;Clear result ldy #$08 ;Number of shifts needed - lsr A ;Shift right input number bcc + ;Check if bit is set pha lda Res2 clc adc Factor sta Res2 ;If so add number to result pla + lsr Res2 ;Shift result right ror Res dey bne - lda Res ldy Res2 rts
An optimization for efficiency is made here; binary long multiplication requires adding one multiplicand to the result at various bit-shifts (i.e. multiply by each power of 2). The naive approach might maintain the value to add as a 16-bit value, left shifting it once each iteration to reach the next power of 2. This one, however, takes advantage of the input being only 8-bits wide, and instead pre-multiplies the result by 256 (8 bits), and each iteration instead right-shifts the result. After 8 iterations the pre-multiply is undone, and the advantage gained is that only the shift is 16-bit; adding the multiplicand remains an efficient 8-bit add.
Assuming no page crossings and zero page, this routine takes 184-320 cycles (average 252), not counting the JSR to call it. (Each non-zero bit in A adds 17 cycles.)
Bob Rost / frantik
This routine by Bob Rost is another binary long multiplication, taken from his NES Development Class PDFs.
; Multiply two bytes in memory using Russian peasant algorithm ; by frantik ; Accepts: value1ptr and value2ptr, pointers to bytes in memory ; value2ptr should point to the lesser of the two values ; for increased efficiency ; Uses: $00, $01, $02 for temporary variables ; Returns: 16 bit value in $00 and $01 .macro multiply value1ptr, value2ptr ret = $00 ; return value temp = $02 ; temp storage lda #$00 ; clear temporary variables sta ret sta ret+1 sta temp jmp start: -loop: asl value1ptr ; double first value rol temp ; using 16bit precision lsr value2ptr ; halve second vale start: lda value2ptr ; and #01 ; is new 2nd value an odd number? beq -loop: ; clc ; if so, add new 1st value to running total lda ret ; adc value1ptr ; sta ret ; lda ret+1 ; adc temp ; sta ret+1 ; lda value2ptr ; cmp #01 ; is 2nd value 1? if so, we're done bne -loop: ; otherwise, loop .endm
Assuming no page crossings and zero page, this routine takes 48-398 cycles (average 223), though it is an in-place macro generating 42 bytes of new code each time it is used.
This also has a problem with the value 0 in value2ptr, which will cause it to enter an infinite loop, rather than correctly returning 0.
External Links
- Forum post: Fast multi, ... - Faster multiplication routine using lookup tables, by keldon.