Fixed Bit Length Encoding: Difference between revisions
MetalSlime (talk | contribs) m (added backlinks to the RTS Trick article) |
(nibble) |
||
Line 15: | Line 15: | ||
== 4-bit Encoding == | == 4-bit Encoding == | ||
The most common type of [[Fixed Bit Length Encoding]] is 4-bit. 4-bit encoding is really easy to work with. Each byte will have exactly two data items: the left nibble is one data item, the right nibble is another. Here is one way to pull the two data items out of a byte that is 4-bit encoded: | The most common type of [[Fixed Bit Length Encoding]] is 4-bit or "nibble" encoding. 4-bit encoding is really easy to work with. Each byte will have exactly two data items: the left nibble is one data item, the right nibble is another. Here is one way to pull the two data items out of a byte that is 4-bit encoded: | ||
;left nibble | ;left nibble |
Revision as of 12:58, 21 February 2010
A byte is 8 bits and can have a value from 0-255. But let's pretend your data only uses the values 0-7. In this case, you don't need a whole 8 bits to represent each data item. You only need 3 bits. Fixed Bit Length Encoding will compress your data by packing it into the minimum number of bits needed to represent all possible values for your data. Using our 3-bit example, let's say we have this sequence of data:
7 1 2 4 7 7 7 1 1 1 2 3 4.
In binary, that's:
00000111, 00000001, 00000010, 00000100, 00000111, 00000111, 00000111, 00000001, 00000001, 00000001, 00000010, 00000011, 00000100
Those left 5 bits are always zero, so we can chop them off. Let's compress this data using Fixed Bit Length Encoding (3-bit). 00000111 will become 111, 00000001 will become 001, etc. to give us this:
11100101, 01001111, 11111001, 00100101, 0011100x
That's some nice savings!
4-bit Encoding
The most common type of Fixed Bit Length Encoding is 4-bit or "nibble" encoding. 4-bit encoding is really easy to work with. Each byte will have exactly two data items: the left nibble is one data item, the right nibble is another. Here is one way to pull the two data items out of a byte that is 4-bit encoded:
;left nibble lda data, y ;read a byte from our data stream lsr lsr lsr lsr ; right shift four times to get the left nibble jsr do_stuff
;right nibble lda data, y ;read a byte from our data stream and #$0F ; ANDing with #$0F zeros out the left side, jsr do_stuff
5-bit Encoding Example
4-bit encoding works out nice because two data items fit cleanly into one byte. But other Fixed Bit Length Encodings will have data items bleed into more than one byte. Let's look at an example.
Dragon Warrior 2 uses Fixed Bit Length Encoding (5-bit) to compress its text (modified and coupled with a dictionary, but nevermind). Using 5 bits allows the data to have values in the range 0-31 ($00-$1F). Let's say we have this sequence of data:
$02, $17, $16, $1F.
In binary, that's:
00000010, 00010111, 00010110, 00011111
Let's compress this using 5-bit encoding (ie, chop off the left 3 bits and smoosh them together):
00010101 11101101 1111xxxx
Notice that some of our data items bleed across two bytes:
[00010][101 11][10110][1 1111]xxxx
This makes decompressing the data more complicated. To decompress we will need to:
- read 2 bytes from the data stream at a time instead of one
- keep track of the bit position within the first byte
- have a different bit-extraction routine for each possible bit position
Code Example
Let's look at a code example that will decompress data encoded with Fixed Bit Length Encoding (5-bit):
; if we have the following values we want to compress: ; 14, 08, 07, 00, 1C, 06, 1E, 1F ; 01, 0F, 0C, 1F, 1F, 1F, 00, 13 ; ; this is how they look in binary: ; 00010100, 00001000, 00000111, 00000000, 00011100, 00000110, 00011110, 00011111 ; 00000001, 00001111, 00001100, 00011111, 00011111, 00011111, 00000000, 00010011 ; ;below are these values compressed in Fixed Bit Length Encoding (5bit): compressed_data: .byte %10100010, %00001110, %00001110, %00011011, %11011111 .byte %00001011, %11011001, %11111111, %11111100, %00010011
;;;;;;;;;;;;;;;;;;;;;;; ; ; Reads two bytes from the data stream. ; Called everytime bit_position crosses from byte1 to byte2 ; When that happens, the old byte2 becomes the new byte1 ; and a new byte2 is read from the data stream. ; get_next_2_bytes: lda compressed_data, y sta byte1 iny lda compressed_data, y sta byte2 rts
;;;;;;;;;;;;;;;;;;;;;;;;; ; ; This routine is called after every 5-bit value is extracted from the data. ; It finds the starting point for the next 5-bit value ; update_bit_position: lda bit_position clc adc #$05 ;next bit position is 5 bits later and #$07 ;keep the value between 0 and 7 sta bit_position cmp #$05 ;in the case of updating 0->5, 1->6, or 2->7, ;we are still in the first byte. ;We don't need to change anything bcs @skip jsr get_next_2_bytes ;bit position updates of 3->0, 4->1, 5->2, 6->3, or 7->4 ;indicate that we have jumped from byte1 to byte2, ;so we need to make byte2 the new byte1 ;and grab the next byte from the data stream (the new byte2) @skip: rts
;;;;;;;;;;;;;;;;;;;;;;; ; ; depending on our bit position, we will enter this subroutine from a different spot. ; we will use the RTS Trick to enter this subroutine (avoids a ridiculous 8-way branch) ; ; temp1 is a copy of byte1, ; temp2 is a copy of byte2 ; register A holds the value of byte1 ; bitposition0: ;[xxxxx]xxx xxxxxxxx shift right 3 times to get our 5 bits lsr bitposition1: ;x[xxxxx]xx xxxxxxxx shift right 2 times, AND with #$1F lsr bitposition2: ;xx[xxxxx]x xxxxxxxx shift right once, AND with #$1F lsr bitposition3: ;xxx[xxxxx] xxxxxxxx AND with #$1F jmp done bitposition7: ;xxxxxxx[x xxxx]xxxx roll left 4 times, AND with #$1F asl temp2 rol temp1 bitposition6: ;xxxxxx[xx xxx]xxxxx roll left 3 times, AND with #$1F asl temp2 rol temp1 bitposition5: ;xxxxx[xxx xx]xxxxxx roll left 2 times, AND with #$1F asl temp2 rol temp1 bitposition4: ;xxxx[xxxx x]xxxxxxx roll left once, AND with #$1F asl temp2 rol temp1 lda temp1 done: and #$1F rts
bit_pos_table: .word bitposition0-1 ;subtract 1 from the address because we are doing the RTS Trick .word bitposition1-1 ;(ie push the address-1 onto the stack and rts to it) .word bitposition2-1 .word bitposition3-1 .word bitposition4-1 .word bitposition5-1 .word bitposition6-1 .word bitposition7-1
;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; ; this subroutine take an address from bit_pos_table, pushes it on the stack, and rts to it ; this trick saves us from having a long 8-way conditional branch ; (see RTS Trick for more details) ; extract_5bits: lda bit_position asl tax lda bit_pos_table+1, x pha lda bit_pos_table, x pha lda byte2 ;make temporary copies of byte1 and byte2 sta temp2 lda byte1 sta temp1 ;byte1 is in A when we start extracting rts ;return to the address we just pushed
main: ;use this as your reset code to try it out lda #$00 sta bit_position sta ram_index ;init variables tay jsr get_next_2_bytes ;grab the first 2 bytes from the data stream @loop: jsr extract_5bits ;pull out the next 5-bit value ldx ram_index sta $200, x ;output the result to RAM inx stx ram_index jsr update_bit_position cpx #$10 ; loop 16 times (because we are extracting 16 bytes.) bne @loop forever: jmp forever