Catch-up
The NES CPU, PPU, APU, and mapper all run at the same time. But most emulators are programmed for a Von Neumann architecture that does only one thing at a time, so an emulator must switch among these components, emulating one at a time. The author of Nintendulator favors clarity and accuracy over speed and emulates each component for one CPU cycle before switching to the next.
More efficient emulators do some level of catch-up: they run the emulated CPU for several dozen cycles and then run the PPU and APU until they are synchronized with it. Keeping one component on the host CPU for a longer time speeds things up because the relevant data stays in the host CPU's fast registers and cache rather than in slower main memory. The basic technique looks like this:
- Find the next time that one component could affect another, such as the CPU writing to a PPU register or the PPU asserting an interrupt to the CPU.
- Run the CPU up to that time.
- Run the other component up to that time.
At the end of each frame (e.g. the start of scanline 0 or scanline 240), the emulator catches up every component and hands off the completed video surface and audio stream to the operating system.
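The steps above can be sketched roughly as follows. This is a minimal illustration, not any particular emulator's code: the `Component` class, its `run_until` method, and `run_frame` are hypothetical names, and the list of interaction times is assumed to be known in advance, whereas a real emulator discovers the next one as it goes.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical stand-in for a PPU or APU; it only tracks its own clock."""
    name: str
    now: int = 0    # last CPU cycle this component has been emulated up to
    runs: int = 0   # how many times run_until() actually had work to do

    def run_until(self, target: int) -> None:
        # A real component would render dots or generate samples here.
        if target > self.now:
            self.now = target
            self.runs += 1


def run_frame(interaction_times, frame_end, components):
    """Run the CPU between interaction points, catching up the rest at each one."""
    cpu_now = 0
    # The end of the frame is treated as one final synchronization point.
    for t in sorted(interaction_times) + [frame_end]:
        t = min(t, frame_end)
        cpu_now = max(cpu_now, t)   # "run the CPU up to that time" (stubbed)
        for c in components:        # "run the other component up to that time"
            c.run_until(t)
    return cpu_now
```

With two interaction points in a frame, each component is entered only three times instead of once per CPU cycle, which is where the speedup comes from.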
Prediction
One basic technique involves predicting when each component will do something "important", like firing an IRQ or changing a status register.
Some things can be predicted:
- Vertical blanking NMI
- Sprite 0 hit
- Lines containing at least 8 sprites that would trigger the overflow flag
- APU frame counter IRQ
- APU length counter status
An emulator might make a rough prediction and fall back to I/O catch-up or cycle-by-cycle emulation until the "important" event has happened.
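As one example of a prediction, the time of the next vertical blanking NMI can be computed from the PPU's current position. The sketch below assumes NTSC timing (341 dots per scanline, 262 scanlines per frame, 3 PPU dots per CPU cycle, vblank flag set at scanline 241, dot 1) and deliberately ignores the dot skipped on odd frames when rendering is enabled, so it is a rough prediction of the kind described above. The function names are illustrative.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262   # NTSC; the pre-render line is counted as line 261
DOTS_PER_CPU_CYCLE = 3      # NTSC
VBLANK_SCANLINE, VBLANK_DOT = 241, 1


def dots_until_vblank(scanline: int, dot: int) -> int:
    """PPU dots from (scanline, dot) until the next vblank flag set."""
    frame_dots = DOTS_PER_SCANLINE * SCANLINES_PER_FRAME
    here = scanline * DOTS_PER_SCANLINE + dot
    there = VBLANK_SCANLINE * DOTS_PER_SCANLINE + VBLANK_DOT
    # If we are exactly at the event, predict the one in the next frame.
    return (there - here) % frame_dots or frame_dots


def cpu_cycles_until_vblank(scanline: int, dot: int) -> int:
    """CPU cycles to run before the PPU reaches vblank (rounded up)."""
    return -(-dots_until_vblank(scanline, dot) // DOTS_PER_CPU_CYCLE)
```

Because the odd-frame skip is ignored, an emulator using something like this would stop slightly short of the predicted cycle and finish with a more careful method.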
Timestamping
Another technique involves remembering at what time (that is, what cycle) the CPU has written to each register, and then having the other component process the write at that cycle.
But if a write would change a prediction, you'll want to catch up the other components immediately instead of timestamping the write:
- Writes to PPU registers (especially OAMDATA) might change the sprite 0 prediction.
- Writes to mapper or PPU registers might change the mapper IRQ prediction.
- Writes to APU registers might change the Frame IRQ prediction and the length counter predictions.
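A minimal sketch of timestamping with a catch-up fallback might look like the following. The class `TimestampedPPU`, its fields, and the choice of which registers count as prediction-sensitive are assumptions made for illustration; only OAMDATA ($2004) is listed here, and a real emulator would decide this per register and per prediction.

```python
from collections import deque


class TimestampedPPU:
    """Hypothetical PPU stub that applies queued writes at their recorded cycle."""

    # Registers assumed, for this sketch only, to invalidate a prediction.
    PREDICTION_SENSITIVE = {0x2004}  # OAMDATA

    def __init__(self):
        self.now = 0
        self.pending = deque()  # (cycle, address, value) in CPU write order
        self.applied = []       # log of writes the PPU has processed
        self.catch_ups = 0      # number of forced catch-ups

    def cpu_write(self, cycle, address, value):
        self.pending.append((cycle, address, value))
        if address in self.PREDICTION_SENSITIVE:
            # The write may change a prediction: catch up now, not later.
            self.run_until(cycle)
            self.catch_ups += 1

    def run_until(self, target):
        while self.pending and self.pending[0][0] <= target:
            cycle, address, value = self.pending.popleft()
            self.now = max(self.now, cycle)               # emulate up to the write
            self.applied.append((cycle, address, value))  # then apply it
        self.now = max(self.now, target)
```

After a forced catch-up, the emulator would recompute whichever prediction the write could have affected.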
Scanline-based emulation
A scanline-based emulator uses a crude form of prediction and timestamping: it assumes that something "important" might happen on each scanline, and it rounds timestamps to a scanline boundary. Such an emulator runs the CPU for one scanline's worth of cycles, then runs the PPU and mapper for one scanline (341 dots), and runs the APU only once per frame. This isn't perfect, but it works OK in emulators designed for old PCs or handheld devices because most mappers that generate interrupts do so at some predictable point in the scanline, and few games use the APU interrupt or write to the same APU register multiple times in a frame.
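The main loop of such an emulator could be sketched as below, assuming NTSC timing (262 scanlines, 3 PPU dots per CPU cycle). The three callbacks are hypothetical: `step_cpu` is assumed to execute one instruction and return the cycles it took. Since 341 dots is not a whole number of CPU cycles, the sketch carries the leftover dots into the next scanline.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262   # NTSC
DOTS_PER_CPU_CYCLE = 3      # NTSC


def run_scanline_frame(step_cpu, run_ppu_scanline, run_apu_frame):
    """Emulate one frame a scanline at a time; returns CPU cycles executed."""
    dot_budget = 0   # PPU dots the CPU still owes; the remainder carries over
    cpu_cycles = 0
    for scanline in range(SCANLINES_PER_FRAME):
        dot_budget += DOTS_PER_SCANLINE
        # Run the CPU for about one scanline's worth of cycles.
        while dot_budget >= DOTS_PER_CPU_CYCLE:
            used = step_cpu()
            dot_budget -= used * DOTS_PER_CPU_CYCLE
            cpu_cycles += used
        # Then let the PPU (and mapper) process the whole scanline at once.
        run_ppu_scanline(scanline)
    # The APU is caught up only once, at the end of the frame.
    run_apu_frame()
    return cpu_cycles
```

Any register write made during a scanline takes effect for the whole line here, which is exactly the rounding to a scanline boundary described above.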