Tile compression: Difference between revisions

From NESdev Wiki
Jump to navigationJump to search
(→‎Run-length encoding: Added an example bit-based RLE (profiled and optimized exhaustively using a brute force algorithm))
(→‎Bit-based RLE: correct data (there was a bug in the benchmark))
Line 141: Line 141:


Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream, that contains integers of different bit widths.
Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream, that contains integers of different bit widths.
RLEINC, described above, achieves 54% compression ratio (46% reduction) for all the data it processes in Simon's Quest.
RLEINC, described above, achieves 37% compression ratio (63% reduction) for all the data it processes in Simon's Quest.
If, however, the data was compressed using the following scheme, 66% reduction would be achieved:
If, however, the data was compressed using the following scheme, 66% reduction was measured:


{| class="wikitable"
{| class="wikitable"
Line 158: Line 158:
|}
|}


The sequence and double-byte options are not as useful with bitmap data as they are with nametables. When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, RLEINC achieves a 11% reduction in data size (PackBits gets 12%), while a bit-based version can achieve a 21% reduction at best, and that is by doing away with the SEQ/DBL options entirely. For comparison, PB53 achieves 25%.
<!-- For the record, PB53 achieves 54% compression ratio (46% reduction) for Simon's Quest nametables.-->
<!-- In this method: 0000 = END, 0nnn = LIT, 1nnn = RUN. PB53 achieves 46% reduction for Simon's Quest nametables, just like RLEINC.-->
 
While useful for packing nametables, the sequence and double-byte options are not as useful with bitmap data.
When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, RLEINC achieves a 11% reduction in data size (PackBits gets 12%), while a simple bit-based RLE can achieve a 21% reduction. For comparison, PB53 achieves 25% for that data set.
 
<!-- The exact bit-based method that achieves 21% reduction for that data set: 0000 = END, 0nnn = Copy next (n+1)*8  bits, 1nnn bbbbbbbb = Put b n+2 times. -->


== Common byte ==
== Common byte ==

Revision as of 20:48, 30 April 2016

Tile compression refers to techniques that allow fitting more graphics data into a smaller space. Programs using CHR ROM cannot use compressed tiles, as their tile data must be stored in the PPU's native format. But programs using CHR RAM can process tile data while copying it from PRG ROM to CHR RAM, and this processing allows storing more tiles in the same space.

Run-length encoding

Run-length encoding transforms runs of identical bytes into a shorter sequence of bytes that specifies the length of the run.

In NES tile data, byte-level run-length encoding works well when a row of 8 pixels in a tile is identical to the row above it. It also works well for nametable data because a horizontal run of blank tiles becomes a single tile.

Pixel-level run-length encoding is much slower but can achieve impressive results within a tile.

There are several different RLE data formats.

PCX

The PCX image format became popular on PC.

Value Meaning
00-BF Write this byte to the output.
C0-FF Read another byte, and write it to the output n - 192 times.

PackBits

The PackBits format was invented by Apple for MacPaint. It is also used in TIFF files and a few homebrew releases by Damian Yerrick.

Value Meaning
00-7F Copy n + 1 bytes from input to output.
80 No operation
81-FF Read another byte, and write it to the output 257 - n times.

Konami RLE

This format is used in Blades of Steel, the U.S. version of Contra, and the Japanese version of Simon's Quest. It can be decoded and encoded with the Python program GraveyardDuck. Compression ratio is more or less identical to PackBits.

Value Meaning
00-80 Read another byte, and write it to the output n times.
81-FE Copy n - 128 bytes from input to output.
FF End of compressed data

GBA RLUnComp

The BIOS of the Game Boy Advance and Nintendo DS contains a decompressor for an RLE format very similar to PackBits and Konami. As described in GBATEK, it has a 4-byte size header followed by this:

Value Meaning
00-7F Copy n + 1 bytes from input to output.
80-FF Read one byte from input and write it to output n - 125 times.

PB53

This codec was conceived by Damian Yerrick as an alternative to PackBits for the Action 53 multicart, and a Python packer and 6502 unpacker are included in the Action 53 menu distribution. Unlike freeform RLE formats such as Konami and PackBits, PB53 operates on 16-byte units, making it easy to divide the decompressed data into fixed-size packets to be sent to the PPU during vblank while rendering is turned on. Like LZSS, PB53 uses unary coding on the lengths of literal runs to save on overhead from switching between literal and run modes. This means that like LZSS, it has a worst case expansion of 12.5%, but it works fairly well on real tile data and OK on nametable data, which have shorter runs than the high-resolution files for which PackBits was designed. It also has a special mode to accommodate the layout of Shiru's NROM games LAN Master, Lawn Mower, and Chase, which have many identical tiles between the two pattern tables to allow tile animation.

Each tile consists of several 8-byte planes, two planes in the NES implementation. For the first plane in a tile:

Value Meaning
00-7F Copy one byte from input to output. Then, for each bit from 6 to 0, if the bit is 1, repeat the previous byte; otherwise, copy another byte from the input. This is somewhat similar to how control bytes are formatted in LZSS.
80 Write eight $00 bytes.
81 Write eight $FF bytes.
82 Copy one tile (16 bytes) starting one tile back. (Used for a repeated tile, such as the unused tiles in many games.)
83 Copy one tile starting one segment back, usually 4096 bytes. (Used for pattern tables that share tiles, as seen in several Shiru games. The decoder switches between two instances one segment apart, each with its own input stream and output buffer.)
84 Write sixteen $00 bytes. (Solid color 0)
85 Write eight $FF bytes then eight $00 bytes. (Solid color 1)
86 Write eight $00 bytes then eight $FF bytes. (Solid color 2)
87 Write sixteen $FF bytes. (Solid color 3)

For other planes:

Value Meaning
00-81 Same as first plane
82 Copy previous 8 bytes. (Used for 1-bit tiles with colors 0 and 3.)
83 Copy previous 8 bytes, bit-inverted. (Used for 1-bit tiles with colors 1 and 2.)

RLEINC

This RLE variant was used by Joel Yliluoma in the Simon's Quest retranslation project. It is designed for streaming data quickly into the PPU's VRAM or nametables, with relative small code for the decompressor. It emphasises compression without using backreferences, and replaced the game's own RLE compression scheme, making it possible to leave more room for text in the ROM. Ideas are borrowed from the compression scheme used in the first generation Pokémon games.

Value Meaning
00–3F LIT: Copy (n+1) bytes from input to output backwards
40 END: End of stream
41–7F SEQ: Read next byte b. Put b, (n−0x3F) times; add 1 to b after each iteration
80–9F DBL: Read next byte b1, and next byte b2. Put b1, (n−0x7D) times; swap b2 and b1 after each iteration
A0–FF RUN: Read byte b. Put b, (0x101−n) times.

A compressor for this scheme can be found at http://bisqwit.iki.fi/src/nes-ppu_rleinc_compress.php.txt (PHP), and a decompressor at http://bisqwit.iki.fi/src/nes-ppu_rleinc_v2b.inc (CA65).

Codemasters

This is a hybrid of RLE and a Markov chain (predictive) algorithm. It works on packets measured in whole tiles, and it largely operates on pixels horizontally within a tile, making it slow. Explained on forum, and with an improvement by tokumaru to reduce the overhead of a frequency table change.

Bit-based RLE

Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream, that contains integers of different bit widths. RLEINC, described above, achieves 37% compression ratio (63% reduction) for all the data it processes in Simon's Quest. If, however, the data was compressed using the following scheme, 66% reduction was measured:

Bit sequence Meaning
00 END
01 LIT: Read next 8 bits (byte b). Put b.
100 nnnn SEQ: Read next 8 bits (byte b). Put b, n+2 times, adding 1 after each iteration.
101 nnnnnnnn DBL: Read next 8 bits (byte b1), and next byte b2. Put b1, n+3 times; swap b2 and b1 after each iteration.
11 nnnnn RUN: Read next 8 bits (byte b). Put b, n+2 times.


While useful for packing nametables, the sequence and double-byte options are not as useful with bitmap data. When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, RLEINC achieves a 11% reduction in data size (PackBits gets 12%), while a simple bit-based RLE can achieve a 21% reduction. For comparison, PB53 achieves 25% for that data set.


Common byte

Oracle common byte

This codec, used in The Legend of Zelda: Oracle of Seasons and The Legend of Zelda: Oracle of Ages according to Dwedit, is roughly comparable to RLE in complexity. For each 16-byte block, the compressor determines the most common byte in that block. The compressed data for each block starts with a 2-byte mask, then the most common byte, then other bytes in order that aren't the most common.

To decode a block: First read the two-byte mask. If the entire mask is zero, set the bit corresponding to the first byte to true. Then read the most common byte. For each bit in the mask, if the bit is 1, write the most common byte to output; otherwise, copy one byte.

Maximum expansion is 12.5% for any block that has 16 different bytes in it: two bytes of mask and 16 bytes of data.

LZSS

A lot of games on platforms after the NES use LZ77 family compression methods such as LZSS, which generalizes run-length encoding to allow copying data from several bytes ago, not just the previous byte. Few NES games use LZ77 because the NES's limited work RAM and limited access to video memory make it difficult to resolve back references. (Fewer still use LZW or any other LZ78-based method because of patents that subsisted through the NES, Super NES, and Nintendo 64 eras.)

In LZSS, the mask contains 8 commands, each either to copy a literal byte or to copy a back reference. determines whether the next 8 things are literal bytes or back references. Once all commands have been processed, read another mask.

Read a mask byte from input.
For each bit in the mask byte:
If the bit is 0, this is a literal:
Copy one byte from input to output.
Otherwise, this is a back-reference:
Read and decode a length and distance from input. These will be positive integers.
Copy length bytes from the previous output, distance bytes before now, to output.

LZSS flavors vary mainly in how they encode lengths and distances.

LZ77UnComp

The BIOS of the Game Boy Advance and Nintendo DS has a built-in decompressor for a straightforward LZSS flavor with 3- to 18-byte references into the previous 4096 bytes of output. The data format is described in Martin Korth's GBATEK.

Caution: In data intended for decompression directly to the GBA or DS video memory, the second byte of a 16-bit word cannot refer to the first byte of the same word. So the encoder must not write a run with distance 1 that starts at an odd offset.

Oracle LZ

This flavor of LZSS is used in The Legend of Zelda: Oracle games for Game Boy Color according to Dwedit.

An entire compressed block can be compressed with one of two subtypes of Oracle LZ: short word mode and long word mode. Short words use references of 2 to 8 bytes into the previous 32 bytes of output, and long words use references of 3 to 33 bytes into the previous 2048 bytes. (A length value of 0 means read an additional byte and use that as the length.) Only short words would work very well on NES.

Pokémon LZ

This compression scheme is used in the first-generation Pokémon games on the Game Boy. It is used for bitmap compression.

A byte n is read and split into two parts: code = n >> 5, and c = n & 0x1F. Byte 0xFF marks the end of the stream. Otherwise the code is interpreted as follows:

code Meaning
0 Copy the next c + 1 bytes to the output.
1 Read another byte, and write it to the output c + 1 times.
2 Read another byte b1 and byte b2, and write byte b1 to the output c + 1 times, swapping b1 and b2 after each write.
3 Write a zero byte (0x00) to the output c + 1 times.
4 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output.
5 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, reversing the bits in each byte.
6 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, decrementing the read position after each write (i.e. reverse the data).
7 Read another byte b. Change code = (c >> 2), and change c = (c & 3) × 256 + b. Re-interpret code according to this table.

Chrono Trigger LZ

This compression scheme is used in Square‘s Chrono Trigger for the SNES for compression of graphics and various data.

The data consists of packets. Each packet begins with a 16-bit offset, which gives the packet ending offset relative to the beginning of compressed data. At that offset, there is always a control byte. The first control byte sets the size of offsets: If the byte is < 0xC0, then offsetsbits=12. Else offsetbits=11. Interpreting the offsetbits is done only once. The higher-order two bits in the control bytes of all other packets are ignored. The counter is assigned a default value of 8.

Loop: If the packet end has been reached, then a control byte is read, and counter is assigned the low 6 bits of that byte (i.e. counter = byte & 0x3F). If the counter is zero, the decompression is complete. Otherwise, a new 16-bit packet end offset is read.

An instruction byte is read. If the byte is zero, then next eight bytes are copied verbatim to the output. The counter is not affected. Goto loop.

Otherwise, the instruction byte is treated as a multibit integer, where counter determines the number of bits. E.g. if the instruction byte is 01100011, and counter is 14, it is treated as 00000001100011. If counter is 4, it is treated as 0011.

Each bit from the instruction byte is read, beginning from the lowest-order bit. If the bit is zero, a single byte is copied verbatim to the output. Otherwise a 16-bit word is read from the input. offset is the lowest-order offsetbits from that word, and length is the rest of the bits plus 3. The decompressor copies length bytes from offset bytes behind to the output. After all bits have been read, the counter is reset to the default value of 8. Goto loop.

External links