Watermarking: Difference between revisions
m (→Instruction encoding: clarify: MMC5 ports are completely decoded) |
m (dangling pronoun) |
Revision as of 15:30, 28 January 2011
Watermarking is defined by Wikipedia as "the process of embedding information into a digital signal in a way that is difficult to remove." In some cases, the developer of an unreleased NES program wants to distribute copies to beta testers but still trace any leaked copies of the program to the tester who broke the non-disclosure agreement. There are several ways to produce binaries that can be traced back to a particular recipient.
Shuffling
One way to make each copy unique is to shuffle, or randomly rearrange, pieces of a program at compile time.
A code preprocessor can randomize the order of statically allocated variables in a program. This causes the addresses embedded in the code to change every time the program is compiled. It has benefits beyond watermarking: as the program is shuffled, a randomly chosen variable acts as a canary for the variable before it, and the effects of a buffer overflow may become more apparent.
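As a rough illustration of the variable shuffling described above, the following Python sketch assigns addresses to statically allocated variables in an order derived from a tester's ID. The function names, the `build-secret` string, the variable list, and the base address $0300 are all assumptions made for this example, not part of any existing tool.

```python
import hashlib
import random

def rng_for(tester_id: str, secret: str = "build-secret") -> random.Random:
    """Derive a repeatable random generator from a tester ID (hypothetical scheme)."""
    digest = hashlib.sha256((secret + ":" + tester_id).encode()).digest()
    return random.Random(int.from_bytes(digest, "big"))

def allocate_variables(variables, base, tester_id):
    """Assign addresses to (name, size) pairs in a tester-specific order."""
    order = list(variables)
    rng_for(tester_id).shuffle(order)
    addresses = {}
    addr = base
    for name, size in order:
        addresses[name] = addr
        addr += size
    return addresses

# Hypothetical variables; a real preprocessor would read these from the source.
variables = [("player_x", 1), ("player_y", 1), ("score", 3), ("oam_index", 1)]
layout = allocate_variables(variables, 0x0300, "alice")
for name, addr in sorted(layout.items(), key=lambda kv: kv[1]):
    print(f"{name} = ${addr:04X}")
```

Seeding from the tester ID means the same build can be regenerated later for comparison, which may be preferable to storing every binary.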
A code preprocessor can shuffle the order of subroutines or lookup tables in a program. Watch out: A common technique on the NES is the "inline tail call", in which a subroutine doesn't return but instead falls off the end into the following subroutine. You'll need to take this into account when adding markup to control the preprocessor.
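One possible way to respect inline tail calls is to treat each fall-through chain as a single unit when shuffling. The sketch below assumes the markup has already been reduced to a set of subroutine names that fall through into their successor; the names and the `falls_through` representation are hypothetical.

```python
import random

def shuffle_subroutines(subroutines, falls_through, rng):
    """Shuffle subroutine order while keeping fall-through chains adjacent.

    subroutines: names in original source order.
    falls_through: set of names whose code runs off the end into the next one.
    """
    groups = []
    current = []
    for name in subroutines:
        current.append(name)
        if name not in falls_through:
            groups.append(current)
            current = []
    if current:  # trailing fall-through with nothing after it
        groups.append(current)
    rng.shuffle(groups)
    return [name for group in groups for name in group]

# Hypothetical subroutine names; draw_sprite falls into draw_sprite_common.
subs = ["init", "draw_sprite", "draw_sprite_common", "read_pad", "play_sound"]
print(shuffle_subroutines(subs, {"draw_sprite"}, random.Random(42)))
```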
A code preprocessor can shuffle the order of instructions in a subroutine.
In a lot of cases, the order of instructions doesn't matter, such as LDA vs. CLC, or LDX vs. LDY where neither is indexed, or STA (d),Y vs. DEX, or STA of the same value to several different variables.
Writes to some PPU registers act similarly: when setting up $2000 and $2001 at the end of vertical blanking, it doesn't matter which is written first.
The markup for such "bundles of instructions" resembles the markup for stop bits in explicitly parallel instruction computing.
Adding the required markup also has a benefit beyond watermarking: thinking about which instructions affect others forces a code review, which allows programmers to refamiliarize themselves with a code base and possibly discover defects.
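A minimal sketch of a bundle shuffler follows. It assumes a made-up markup in which the comment lines `; bundle` and `; endbundle` enclose instructions whose relative order does not matter; the marker names and the sample assembly are illustrative only.

```python
import random

def shuffle_bundles(lines, rng):
    """Shuffle lines between '; bundle' and '; endbundle' markers (hypothetical markup)."""
    out = []
    bundle = None
    for line in lines:
        stripped = line.strip()
        if stripped == "; bundle":
            bundle = []
        elif stripped == "; endbundle":
            if bundle is None:
                raise ValueError("'; endbundle' without matching '; bundle'")
            rng.shuffle(bundle)
            out.extend(bundle)
            bundle = None
        elif bundle is not None:
            bundle.append(line)
        else:
            out.append(line)
    if bundle is not None:
        raise ValueError("unterminated '; bundle'")
    return out

source = [
    "  ; bundle",
    "  lda #$00",
    "  clc",
    "  ldx #$08",
    "  ; endbundle",
    "  adc score",   # depends on both A and carry, so it stays after the bundle
    "  sta score",
]
print("\n".join(shuffle_bundles(source, random.Random(7))))
```

The markers are dropped from the output; instructions outside a bundle keep their original order.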
A common method to cope with bus conflicts on discrete mappers brings in another trick.
For example, a game using UNROM might load from a table and write back to that table to make sure that the written bits match the bits in ROM.
(See the banktable example at Programming UNROM.)
If you shuffle the data in banks 0 through 6 and shuffle the bank numbers in the table to match, you can make 7! = 5040 different binaries from this alone.
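The bank shuffle can be sketched as a permutation of the seven switchable banks plus a table that maps each logical bank number to the physical bank now holding its data. This is an illustration under the assumption that game code always selects banks through the table; `shuffle_banks` and the stand-in bank contents are hypothetical.

```python
import math
import random

def shuffle_banks(banks, rng):
    """Permute the switchable banks and build the matching bank table.

    banks: list of bytes objects, indexed by logical bank number.
    Returns (physical_banks, bank_table), where bank_table[logical] is the
    physical bank number that now holds that logical bank's data.
    """
    n = len(banks)
    bank_table = list(range(n))
    rng.shuffle(bank_table)
    physical = [None] * n
    for logical, phys in enumerate(bank_table):
        physical[phys] = banks[logical]
    return physical, bank_table

banks = [bytes([i]) * 16384 for i in range(7)]  # stand-in data for banks 0 through 6
physical, bank_table = shuffle_banks(banks, random.Random(1234))
print(bank_table)
print(math.factorial(7))  # number of possible orderings: 5040
```

Because the table entry for a logical bank holds its physical number, the usual load-then-store through the table still writes a value that matches the byte in ROM.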
Even in a game no bigger than NROM-128, shuffling alone allows for more distinct binaries than the square of the number of atoms in the known universe. Given the small size of NES games and modern solid archiving tools such as 7-Zip, you can save every binary that you send to a tester and still not fill a 4 GB USB flash drive. As long as the binary doesn't get leaked to someone with the knowledge to disassemble and reassemble a binary (as in SMBDis), computing the Hamming distance between the leaked copy and each of your saved copies is likely to result in a high-confidence match to the leaker.
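A comparison along these lines might look like the following sketch, which counts differing bits between a leaked binary and each saved build and reports the nearest one. The tester names and four-byte "binaries" are toy data for illustration.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length binaries."""
    if len(a) != len(b):
        raise ValueError("binaries must be the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def closest_tester(leaked: bytes, saved: dict) -> tuple:
    """Return (tester, distance) for the saved build nearest to the leaked one."""
    return min(
        ((tester, hamming_distance(leaked, build)) for tester, build in saved.items()),
        key=lambda pair: pair[1],
    )

saved = {
    "alice": bytes([0x10, 0x20, 0x30, 0x40]),
    "bob": bytes([0x40, 0x30, 0x20, 0x10]),
}
leaked = bytes([0x10, 0x20, 0x30, 0x41])  # alice's build with one bit altered
print(closest_tester(leaked, saved))  # ('alice', 1)
```

In practice it may be worth looking at the gap between the best and second-best distances before treating a match as conclusive.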
Instruction encoding
A few instructions have multiple encodings. A code preprocessor can introduce any of several NOP instructions at random points in a non-time-critical subroutine.
The addresses of the PPU's ports (nominally $2000-$2007) are incompletely decoded. So are the ports of almost every ASIC mapper, such as MMC1 and MMC3 (but not MMC5). Because the NES's internal memory decoder ignores A12 through A3, each PPU port appears 1024 times in the range $2000-$3FFF. Likewise, the MMC3 ignores A12 through A1, and each port appears 4096 times. A code preprocessor could randomize these address bits in any instruction that reads or writes these ports. This would also serve to hinder a cracker's use of an in-emulator debugger that doesn't take mirroring into account.
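For the PPU ports, randomizing the ignored address bits could be sketched as follows. The helper names are made up for this example; the only hardware facts relied on are those stated above, namely that A12 through A3 are ignored within $2000-$3FFF.

```python
import random

def random_ppu_mirror(port: int, rng: random.Random) -> int:
    """Return a random mirror of a PPU port in $2000-$3FFF.

    Only A2-A0 select the port; A12-A3 are ignored, so they can take any value.
    """
    if not 0x2000 <= port <= 0x2007:
        raise ValueError("expected a nominal PPU port address")
    ignored_bits = rng.randrange(1 << 10) << 3  # random values for A12..A3
    return 0x2000 | ignored_bits | (port & 0x0007)

def canonical_ppu_port(address: int) -> int:
    """Map any address in $2000-$3FFF back to its nominal port."""
    return 0x2000 | (address & 0x0007)

rng = random.Random(99)
mirror = random_ppu_mirror(0x2006, rng)
print(f"sta ${mirror:04X}  ; same port as ${canonical_ppu_port(mirror):04X}")
```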
Graphics changes
Graphics can depend on the build:
- Choose one of several alternatives for grass and other noisy tiles.
- Show the tester's name, or something derived from it, on the title screen. This is easy to remove, but it acts as a deterrent.
- Show the tester's name, or something derived from it, on a sign in a building in the game.
Compression
If your program includes compressed data, you can change the interpretation of bits in the data format. For example, in RLE compression of graphics, the sense of bits denoting a run of repeated pixels vs. bits denoting a run of several literal pixels can be inverted.
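To make the idea concrete, here is a sketch of a byte-oriented RLE codec in which the meaning of the top bit of each control byte is selectable per build. The format itself (a control byte holding a 7-bit count minus one, followed by either one repeated byte or that many literal bytes) is an assumption invented for this example, not a format used by any particular game.

```python
def rle_encode(data: bytes, run_flag: int = 0x80) -> bytes:
    """Encode with a control byte whose top bit equals run_flag for runs.

    Pass run_flag=0x00 to invert the sense of the bit (hypothetical format).
    """
    literal_flag = run_flag ^ 0x80
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 128:
            run += 1
        if run >= 2:
            out += bytes([run_flag | (run - 1), data[i]])
            i += run
        else:
            # Gather literals until the next run starts or 128 bytes are taken.
            start = i
            i += 1
            while i < len(data) and i - start < 128 and not (
                i + 1 < len(data) and data[i + 1] == data[i]
            ):
                i += 1
            out.append(literal_flag | (i - start - 1))
            out += data[start:i]
    return bytes(out)

def rle_decode(packed: bytes, run_flag: int = 0x80) -> bytes:
    """Decode data produced by rle_encode with the same run_flag."""
    out = bytearray()
    i = 0
    while i < len(packed):
        control = packed[i]
        count = (control & 0x7F) + 1
        i += 1
        if (control & 0x80) == run_flag:
            out += bytes([packed[i]]) * count
            i += 1
        else:
            out += packed[i:i + count]
            i += count
    return bytes(out)

tiles = bytes([0, 0, 0, 0, 1, 2, 3, 3, 3, 0])
normal = rle_encode(tiles, run_flag=0x80)
inverted = rle_encode(tiles, run_flag=0x00)
print(normal.hex(), inverted.hex())
```

Both encodings decode to the same data when the matching flag is used, so the decompressor in each build only needs the corresponding branch flipped.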