Text compression
Text compression refers to techniques that store text data in less space.
Dictionary compression
Dictionary compression is a technique where part of the character set is reserved to denote references to a "dictionary". If a byte of the string falls into the reserved range, a string is copied from the dictionary instead of the byte being copied verbatim. Because decoding does not require knowledge of previously decoded data, it is very easy to implement on machines with little memory, such as the NES.
The compression is sometimes applied recursively: a dictionary string may itself contain references to other dictionary strings.
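The decoding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expand`, the reserved byte values 0x80 and 0x81, and the `max_depth` guard against self-referencing entries are all hypothetical and not taken from any particular game.

```python
def expand(data: bytes, dictionary: dict, max_depth: int = 8) -> bytes:
    """Expand dictionary references in `data`.

    Byte values that are keys of `dictionary` are replaced by the
    corresponding entry; every other byte is copied verbatim. Entries
    may contain further references, which are expanded recursively.
    """
    if max_depth < 0:
        # Guard (an assumption of this sketch) against entries that
        # refer to themselves, directly or indirectly.
        raise ValueError("dictionary references nested too deeply")
    out = bytearray()
    for byte in data:
        if byte in dictionary:
            out += expand(dictionary[byte], dictionary, max_depth - 1)
        else:
            out.append(byte)
    return bytes(out)


# Hypothetical dictionary: byte values 0x80 and 0x81 are reserved.
DICTIONARY = {
    0x80: b"the ",
    0x81: b"in \x80",  # this entry refers to entry 0x80 (recursive use)
}

print(expand(b"cat \x81hat", DICTIONARY))  # b'cat in the hat'
```

Note that the decoder only ever looks at the current byte and the dictionary, never at text it has already produced, which is why so little working memory is needed.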
Example implementations:
In Simon's Quest (NES) (all versions), the character set has 256 values, although only a portion of those are actual text symbols. The following bytes have a special meaning:
- FF = End of string
- FE = Newline
- FD = Save current string number. The next byte determines the string number; all subsequent characters will be read from that string rather than the current one.
- FC = End current string rendering and return to the string whose number was saved by opcode FD.
- 00-FB = Print this character.
There is room for only one saved string number, so a substring cannot refer to another substring unless that reference ends the entire string.
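As a rough sketch, the opcodes listed above could be interpreted as follows. Several details are assumptions made for illustration: the function name `render`, the idea that the read position is saved together with the string number, and the use of `chr()` in place of the game's real tile-to-character table. The sample strings are invented.

```python
END, NEWLINE, CALL, RETURN = 0xFF, 0xFE, 0xFD, 0xFC


def render(strings: list, number: int) -> str:
    """Render string `number` from the table `strings`."""
    out = []
    pos = 0
    saved = None  # the single save slot: (string number, resume position)
    while True:
        byte = strings[number][pos]
        pos += 1
        if byte == END:
            break
        elif byte == NEWLINE:
            out.append("\n")
        elif byte == CALL:
            target = strings[number][pos]
            # Assumption: reading resumes right after the operand byte.
            saved = (number, pos + 1)
            number, pos = target, 0
        elif byte == RETURN:
            if saved is None:
                raise ValueError("FC encountered with nothing saved")
            number, pos = saved
            saved = None
        else:
            # Assumption: bytes 00-FB are mapped to characters with chr();
            # the game uses its own character table.
            out.append(chr(byte))
    return "".join(out)


# Invented sample data: string 0 refers to substrings 1 and 2.
STRINGS = [
    b"You need \xfd\x01.\xfeGo \xfd\x02!\xff",
    b"garlic\xfc",
    b"north\xfc",
]

print(render(STRINGS, 0))
# You need garlic.
# Go north!
```

Because `saved` holds a single entry, an FD inside a substring overwrites the return point of the outer string, which mirrors the limitation described above.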
In Chrono Trigger (SNES) (all versions), the character set has 768 elements, but the strings are stored as 8-bit bytes. The following special bytes are defined:
- 00 = End of string
- 01 = Read next byte; print character (byte+0x100)
- 02 = Read next byte; print character (byte+0x200)
- 03..20 = Various text effects, references to item tables, and references to party member names
- 21..xx = Reference to a dictionary string. xx is a compile-time constant that determines the length of the dictionary. This number is 0xA0 in the USA version.
- xx+1..FF = Print this character.
Dictionary strings are not applied recursively. The dictionary strings are stored in length-prefixed format (a length followed by the string), without an end delimiter.
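The byte ranges above translate into a decoder along the following lines. This is a simplified sketch, not the game's actual routine. It assumes that the length prefix is one byte, that dictionary entry N is selected by byte 0x21+N, that dictionary entries contain only plain single-byte characters, and that the effect opcodes 03..20 take no operand bytes (in the real game some of them may). The function names and sample data are hypothetical.

```python
DICT_FIRST = 0x21
DICT_LAST = 0xA0  # "xx" in the list above; 0xA0 in the USA version


def parse_dictionary(blob: bytes) -> list:
    """Split a length-prefixed blob into dictionary entries.

    Assumption: each entry is one length byte followed by that many bytes.
    """
    entries = []
    pos = 0
    while pos < len(blob):
        length = blob[pos]
        entries.append(blob[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return entries


def decode(data: bytes, dictionary: list) -> list:
    """Return a list of character numbers (0..0x2FF) and effect markers."""
    out = []
    pos = 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        if byte == 0x00:
            break  # end of string
        elif byte in (0x01, 0x02):
            # Two-byte character: the prefix selects page 0x100 or 0x200.
            out.append(data[pos] + byte * 0x100)
            pos += 1
        elif byte < DICT_FIRST:
            # 03..20: effects and name references, kept as opaque markers.
            out.append(f"<effect {byte:02X}>")
        elif byte <= DICT_LAST:
            # Dictionary reference; entries are copied without further
            # expansion because the scheme is not recursive.
            out.extend(dictionary[byte - DICT_FIRST])
        else:
            out.append(byte)  # plain one-byte character
    return out


# Invented sample data.
dictionary = parse_dictionary(b"\x03\xa1\xa2\xa3\x02\xb0\xb1")
print(decode(bytes([0xC0, 0x21, 0x01, 0x05, 0x22, 0x00, 0xFF]), dictionary))
# [192, 161, 162, 163, 261, 176, 177]
```

With xx = 0xA0, the range 21..A0 leaves room for 0x80 = 128 dictionary entries.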
Dual-tile encoding
Dual-tile encoding, or DTE for short, is a special case of dictionary compression. It is also known as byte-pair encoding, or digram coding. In this case, the dictionary strings are all two bytes long.
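A minimal DTE round trip might look like the sketch below. The function names, the choice of table (simply the most frequent adjacent pairs) and the greedy left-to-right encoder are illustrative assumptions; real tools may pick pairs more carefully. The sketch also assumes that the byte values used as pair codes never occur in the plain text.

```python
from collections import Counter


def build_table(text: bytes, codes: list) -> dict:
    """Assign each free byte value in `codes` to one frequent byte pair."""
    pairs = Counter(text[i : i + 2] for i in range(len(text) - 1))
    common = pairs.most_common(len(codes))
    return {code: pair for code, (pair, _count) in zip(codes, common)}


def dte_encode(text: bytes, table: dict) -> bytes:
    """Greedily replace pairs found in the table, scanning left to right."""
    lookup = {pair: code for code, pair in table.items()}
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i : i + 2]
        if pair in lookup:
            out.append(lookup[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


def dte_decode(data: bytes, table: dict) -> bytes:
    """Expand each pair code to its two bytes; copy other bytes as-is."""
    out = bytearray()
    for byte in data:
        out += table.get(byte, bytes([byte]))
    return bytes(out)


text = b"the theme then"
table = build_table(text, codes=[0x80, 0x81])  # hypothetical free values
packed = dte_encode(text, table)
assert dte_decode(packed, table) == text
print(len(text), "->", len(packed), "bytes")
```

Since every entry has the same length, the table can be stored as a flat array of byte pairs with no length or terminator bytes.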
Bitrate reduction methods
Fixed-bit encoding
When the character set is small, say at most 64 characters, strings can be encoded as a bitstream that packs 6 bits per character rather than 8 bits per character. This reduces the data size by 25 %.
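The packing can be sketched as below, so that four 6-bit characters occupy three bytes. The function names, the most-significant-bit-first order and the explicit `count` argument are assumptions of this sketch; an actual format might instead reserve one of the 64 codes as a terminator.

```python
def pack6(codes: list) -> bytes:
    """Pack character codes in the range 0..63 into a 6-bit bitstream."""
    acc = 0     # bit accumulator
    nbits = 0   # number of valid bits currently in acc
    out = bytearray()
    for code in codes:
        if not 0 <= code < 64:
            raise ValueError("code does not fit in 6 bits")
        acc = (acc << 6) | code
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # drop the bits just written
    if nbits:
        # Pad the final partial byte with zero bits.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack6(data: bytes, count: int) -> list:
    """Read `count` 6-bit codes back out of the bitstream."""
    acc = 0
    nbits = 0
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6 and len(out) < count:
            nbits -= 6
            out.append((acc >> nbits) & 0x3F)
            acc &= (1 << nbits) - 1
    return out


packed = pack6([1, 2, 3, 4])
print(packed.hex())        # 0420c4
print(unpack6(packed, 4))  # [1, 2, 3, 4]
```

The `count` argument is needed because the padding bits in the last byte could otherwise be mistaken for an extra character.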