The skinny on NES scrolling

Revision as of 10:54, 8 January 2013

Preface

"The skinny on NES scrolling" was posted by loopy on 1999-04-13 to what eventually became the NESdev Yahoo! Group. It was the first to publicly tell exactly how the PPU uses addresses written to its ports. After over a decade, it is still believed accurate. Some people get turned off by the fact that it's provided as monospaced text inside a zipfile, that addresses have nothing to distinguish them from years, and that the diagrams of what bits get copied where are allegedly difficult to read. What follows is this document, reformatted to web standards, with a few minor things made slightly clearer.

PPU registers

If you aren't trying to split the screen, scrolling the background is as easy as writing the X and Y coordinates to $2005 and writing the high bit of both coordinates to $2000. Programming or emulating a game that uses complex raster effects, on the other hand, requires a complete understanding of how the various address registers inside the PPU work. Here are the related registers:

v
Current VRAM address (15 bits)
t
Temporary VRAM address (15 bits)
x
Fine X scroll (3 bits)
w
First or second write toggle (1 bit)

Registers v and t are 15 bits, but because emulators commonly store them in 16-bit machine words, they are shown with an extra bit that's never used.

The PPU uses the current VRAM address for both reading and writing PPU memory through $2007, and for fetching nametable data to draw the background. As it's drawing the background, it updates the address to point to the nametable data currently being drawn. Bits 10-11 hold the base address of the nametable minus $2000. Bits 12-14 are the Y offset of a scanline within a tile.
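As a rough illustration of that layout, the sketch below splits a 15-bit v value into its fields. The function and field names are made up for this example. The coarse X and coarse Y positions (bits 0-4 and 5-9) are the ones described under "Wrapping around".

```python
def unpack_vram_address(v):
    """Split a 15-bit VRAM address into scroll fields (names are illustrative)."""
    v &= 0x7FFF                          # only 15 bits are meaningful
    return {
        "coarse_x":  v & 0x1F,           # bits 0-4: tile column
        "coarse_y":  (v >> 5) & 0x1F,    # bits 5-9: tile row
        "nametable": (v >> 10) & 0x03,   # bits 10-11: nametable select
        "fine_y":    (v >> 12) & 0x07,   # bits 12-14: scanline within the tile
    }


def nametable_base(v):
    """Base address of the selected nametable: $2000, $2400, $2800 or $2C00."""
    return 0x2000 + (v & 0x0C00)
```

Bits 10-11 are kept in place by the mask in nametable_base, so adding $2000 gives the base address directly.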

Stuff that affects register contents

In the following, d refers to the data written to the port, and A through H to individual bits of a value.

$2005 and $2006 share a common write toggle, so that the first write has one behaviour, and the second write has another. After the second write, the toggle is reset to the first write behaviour. This toggle may be manually reset by reading $2002.

  • $2000 write:
t: ...BA.. ........ = d: ......BA
  • $2002 read:
w:                  = 0
  • $2005 first write (w == 0):
t: ....... ...HGFED = d: HGFED...
x:              CBA = d: .....CBA
w:                  = 1
  • $2005 second write (w == 1):
t: .....HG FED..... = d: HGFED...
t: CBA.... ........ = d: .....CBA
w:                  = 0
  • $2006 first write (w == 0):
t: .FEDCBA ........ = d: ..FEDCBA
t: G...... ........ = 0
w:                  = 1
  • $2006 second write (w == 1):
t: ....... HGFEDCBA = d: HGFEDCBA
v                   = t
w:                  = 0
  • At dot 250 of each scanline (?), if rendering is enabled, the PPU increments the vertical position in v:
The effective Y scroll coordinate is incremented, which is a complex operation that will correctly skip the attribute table memory regions, and wrap to the next nametable appropriately. See Wrapping around below.
  • At dot 256 of each scanline, if rendering is enabled, the PPU copies all bits related to horizontal position from t to v:
v: ....H.. ...EDCBA = t: ....H.. ...EDCBA
  • At dot 304 of the pre-render scanline (end of vblank), if rendering is enabled, the PPU copies all bits from t to v:
v                   = t
  • $2007 writes:
Writes to $2007 will add either 1 or 32 to v depending on the VRAM increment bit set via $2000. This is not normally useful for scrolling, or done during rendering.
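For emulator authors, the list above can be sketched as a small Python class. This is a toy model under stated assumptions, not a cycle-accurate PPU: the class and method names are invented for this example, the caller decides when the per-scanline copies happen, and the $2007 increment size is passed in as a parameter rather than read from $2000.

```python
class ScrollRegisters:
    """Toy model of the v, t, x and w registers (method names are illustrative)."""

    def __init__(self):
        self.v = 0  # current VRAM address (15 bits)
        self.t = 0  # temporary VRAM address (15 bits)
        self.x = 0  # fine X scroll (3 bits)
        self.w = 0  # write toggle: 0 = first write, 1 = second write

    def write_2000(self, d):
        # t: ...BA.. ........ = d: ......BA
        self.t = (self.t & 0x73FF) | ((d & 0x03) << 10)

    def read_2002(self):
        self.w = 0

    def write_2005(self, d):
        if self.w == 0:
            # t: ....... ...HGFED = d: HGFED...   x: CBA = d: .....CBA
            self.t = (self.t & 0x7FE0) | ((d & 0xF8) >> 3)
            self.x = d & 0x07
            self.w = 1
        else:
            # t: CBA..HG FED..... = d: HGFEDCBA
            self.t = (self.t & 0x0C1F) | ((d & 0x07) << 12) | ((d & 0xF8) << 2)
            self.w = 0

    def write_2006(self, d):
        if self.w == 0:
            # t: .FEDCBA ........ = d: ..FEDCBA, and bit 14 of t is cleared
            self.t = (self.t & 0x00FF) | ((d & 0x3F) << 8)
            self.w = 1
        else:
            # t: ....... HGFEDCBA = d: HGFEDCBA, then v = t
            self.t = (self.t & 0x7F00) | (d & 0xFF)
            self.v = self.t
            self.w = 0

    def write_2007(self, increment_32=False):
        # v advances by 1 or 32, as chosen by the VRAM increment bit of $2000
        self.v = (self.v + (32 if increment_32 else 1)) & 0x7FFF

    def copy_horizontal(self):
        # v: ....H.. ...EDCBA = t: ....H.. ...EDCBA
        self.v = (self.v & 0x7BE0) | (self.t & 0x041F)

    def copy_all(self):
        self.v = self.t
```

A $2002 read followed by writes to $2006, $2005, $2005 and $2006 exercises every branch of the shared toggle.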


All of this info agrees with the tests Loopy has run on an NES console and Quietust's analysis of a micrograph of the PPU die. If there's something you don't agree with, please let the BBS know so that a member can verify it.

Wrapping around

You can think of bits 4-0 of the VRAM address as the "coarse x scroll"(*8) that the PPU increments as it draws. As it wraps from 31 to 0, bit 10 is switched. You should see how this causes horizontal wrapping between nametables (0,1) and (2,3).
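A minimal sketch of that horizontal step, assuming v is held in an ordinary integer (the function name is made up for this example):

```python
def increment_coarse_x(v):
    """Advance the coarse X field of v by one tile, wrapping across nametables."""
    if (v & 0x001F) == 31:
        v &= 0x7FE0      # coarse X wraps from 31 back to 0
        v ^= 0x0400      # bit 10 is switched: move to the other horizontal nametable
    else:
        v += 1           # still inside the same nametable
    return v & 0x7FFF
```

Starting at coarse X 31 in nametable 0, one call lands on coarse X 0 in nametable 1; a second wrap from there returns to nametable 0.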

You can think of bits 9-5 as the "coarse y scroll"(*8). This functions slightly differently from the X: it wraps to 0 and bit 11 is switched when it's incremented from 29 instead of 31. There are some odd side effects from this. If you manually set the value above 29 (from either $2005 or $2006), the wrapping from 29 obviously won't happen, and attribute data will be used as nametable data. The "y scroll" still wraps to 0 from 31, but without switching bit 11. This explains why writing 240+ to 'Y' in $2005 appeared as a negative scroll value.
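The vertical increment can be sketched the same way. This example assumes that the fine Y field (bits 12-14) counts from 0 to 7 and carries into coarse Y when it overflows; the function name is hypothetical.

```python
def increment_y(v):
    """Advance v by one scanline, following the wrapping rules described above."""
    if (v & 0x7000) != 0x7000:
        return (v + 0x1000) & 0x7FFF     # fine Y < 7: just step to the next row of the tile
    v &= 0x0FFF                          # fine Y wraps to 0
    coarse_y = (v & 0x03E0) >> 5
    if coarse_y == 29:
        coarse_y = 0
        v ^= 0x0800                      # bit 11 is switched: other vertical nametable
    elif coarse_y == 31:
        coarse_y = 0                     # out-of-range row wraps without switching bit 11
    else:
        coarse_y += 1
    return (v & 0x7C1F) | (coarse_y << 5)
```

With coarse Y set to 30 or 31, the rows being fetched fall in the attribute table area, which is the "attribute data used as nametable data" effect mentioned above.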

Examples

Below is an example of 6502 code that completely sets the scroll register before the next scanline, indicating what happens to all relevant variables described above, both before and after the 6502 instructions are executed.

Individual bits written to a PPU register are colour-coded to reflect where they end up in t.

Assume all 6502 code is run sequentially in the order shown, one instruction after the next.

single scroll

If only one scroll setting is needed for the entire screen, this can be done by writing $2000 once, and $2005 twice before the end of vblank.

  1. The low two bits of $2000 select which of the four nametables to use.
  2. The first write to $2005 specifies the X scroll, in pixels.
  3. The second write to $2005 specifies the Y scroll, in pixels.

After this, do not make any writes to $2006 before the end of vblank, as they will overwrite the t register. The v register will be completely copied from t at the end of vblank, setting the scroll.

Note that this pair of writes to $2005 presumes that the write toggle is in its first-write state (w = 0). If the state of the toggle is unknown, reset it by reading from $2002 before the first write to $2005.
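As a small worked example, the helper below turns a scroll position into the three values for these writes. The coordinate convention is an assumption of this sketch, not something the hardware defines: scroll_x runs from 0 to 511 and scroll_y from 0 to 479, covering the 2x2 nametable area. The function name is invented, and the caller is expected to merge the nametable bits into whatever else it writes to $2000.

```python
def single_scroll_writes(scroll_x, scroll_y):
    """Return (nametable bits for $2000, first $2005 write, second $2005 write)."""
    if not (0 <= scroll_x < 512 and 0 <= scroll_y < 480):
        raise ValueError("scroll position outside the 2x2 nametable area")
    # Bit 0 selects the horizontal nametable, bit 1 the vertical one.
    nametable = ((scroll_y // 240) << 1) | (scroll_x // 256)
    return nametable, scroll_x % 256, scroll_y % 240
```

Reducing Y modulo 240 rather than 256 keeps the second $2005 write below 240, avoiding the out-of-range coarse Y values discussed under "Wrapping around".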

split X scroll

The X scroll can be changed at the end of any scanline when the horizontal components of v get reloaded from t. Simply make two writes to $2005 before the end of the line.

  1. The first write to $2005 alters the horizontal scroll position. The fine x register (sub-tile offset) gets updated immediately, but the coarse horizontal component of t (tile offset) does not get updated until the end of the line.
  2. The second write to $2005 is inconsequential; the changes made to t will be ignored at the end of the line.

As in the single scroll example, reset the toggle by reading $2002 if it is in an unknown state. Also, you may omit the second write to $2005 if you do not need to flip the toggle back to its first-write state (w = 0). If making multiple X splits

To avoid visual glitching when x is updated, the first $2005 write should be made during hblank. When perfect timing cannot be guaranteed, it is often considered acceptable to perform it somewhere near the end of the scanline, leaving a short glitch on the right side of the screen.

split X/Y scroll

To split both the X and Y scroll on a scanline, we must perform four writes, two each to $2006 and $2005, in order to completely reload v. Without the second write to $2006, only the horizontal portion of v will be loaded from t at the end of the scanline. The second write to $2006 causes an immediate full reload of v from t, allowing you to update the vertical scroll in the middle of the screen.

This is based on Drag's example on the nesdev forum, where writes to PPU registers are done in the order $2006, $2005, $2005, $2006. The order matters because the write toggle for $2005 is shared with $2006. As always, if the state of the toggle is unknown before beginning, read $2002 to reset it.

In this example we will perform two writes to each of $2005 and $2006. We will set the X scroll (X), Y scroll (Y), and nametable select (N) by writes to $2005 and $2006. This diagram shows where each value fits into the four register writes.

N: %01
X: %01111101 = $7D
Y: %00111110 = $3E
$2005.1 = X                                                          = %01111101 = $7D
$2005.2 = Y                                                          = %00111110 = $3E
$2006.1 = ((Y & %11000000) >> 6) | ((Y & %00000011) << 4) | (N << 2) = %00100100 = $24
$2006.2 = ((X & %11111000) >> 3) | ((Y & %00111000) << 2)            = %11101111 = $EF

However, since there is a great deal of overlap between the data sent to $2005 and $2006, only the last write to any particular bit of t matters. This makes the first write to $2006 mostly redundant, so we can simplify its setup significantly:

$2006.1 = N << 2                                                     = %00000100 = $04

There are other redundancies in the writes to $2005, but since the original X and Y values are likely already calculated, these can be left as an exercise for the reader.
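The formulas above translate directly into code. The sketch below is one way to compute the four bytes, in write order, for an arbitrary X scroll, Y scroll and nametable select. The function names are invented for this example; the second function uses the simplified first write.

```python
def split_scroll_writes(x, y, n):
    """Bytes for the $2006, $2005, $2005, $2006 sequence, in write order."""
    first_2006 = ((y & 0xC0) >> 6) | ((y & 0x03) << 4) | ((n & 0x03) << 2)
    second_2006 = ((x & 0xF8) >> 3) | ((y & 0x38) << 2)
    return first_2006, y & 0xFF, x & 0xFF, second_2006


def split_scroll_writes_short(x, y, n):
    """Same sequence, with the first $2006 write reduced to the nametable bits."""
    _, y_write, x_write, second_2006 = split_scroll_writes(x, y, n)
    return (n & 0x03) << 2, y_write, x_write, second_2006
```

For X = $7D, Y = $3E and N = 1, the shortened form gives $04, $3E, $7D, $EF, which is the sequence used in the instruction table for this example.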

Before                                    Instructions           After                                     Notes
t                v                x                              t                v                x
....... ........ ....... ........ ...     LDA #$04 (%00000100)   0000100 ........ ....... ........ ...     Bit 14 of t set to zero
                                          STA $2006
0000100 ........ ....... ........ ...     LDA #$3E (%00111110)   1100100 111..... ....... ........ ...     Behaviour of 2nd $2005 write
                                          STA $2005
1100100 111..... ....... ........ ...     LDA #$7D (%01111101)   1100100 11101111 ....... ........ 101     Behaviour of 1st $2005 write
                                          STA $2005
1100100 11101111 ....... ........ 101     LDA #$EF (%11101111)   1100100 11101111 1100100 11101111 101     After t is updated, contents of t copied into v
                                          STA $2006

The perfect time to perform the last two writes is during hblank. This is because x and v are immediately changed by these writes and will cause a visual glitch if they are performed in the middle of the scanline. As before, where perfect timing is not available it is generally considered acceptable to perform it a bit before the end of the scanline, leaving a short glitch on the right hand side of the screen.