Visual circuit tutorial

From NESdev Wiki
Revision as of 11:42, 6 June 2013 by Ulfalizer (Add note on hashes and tildes on Visual 6502 node names)

This is a crash course on making sense of the circuit displays in Visual 6502/2C02/2A03, written for people without much low-level electronics experience (like the author). It aims to present the information needed to read the diagrams at a basic level in simple language, omitting details that are unimportant when starting out.

You might want to read the Visual 6502 user's guide and the Visual 2C02 page first.

What the different colored areas are

Let's start by defining what the different colors mean:

Vis areas.png
  • Green areas are diffusion (explained below) connected to ground.
  • Red areas are diffusion connected to VCC (power).
  • Yellow areas are diffusion that is neither connected directly to ground nor directly to VCC.
  • Gray areas are metal.
  • Purple areas are polysilicon (often shortened to just "poly").

At the level presented here, diffusion, metal, and polysilicon can be thought of as roughly equivalent when viewed in isolation; they all conduct current. The important difference is in how they interact with each other, which is explained below.

Basic building blocks

Transistors

When a piece of polysilicon is sandwiched between two areas of diffusion, it acts as a gate, only letting current through when the polysilicon is powered (or, equivalently, on, conducting, high, or 1). The diffusion area from which current will flow when the gate is high is called the source. The diffusion area into which current will flow is called the drain. The gate together with the source and drain is what makes a transistor.

Vis transistor.png

The transistor here is an enhancement-mode transistor. All the "ordinary" selectable (see the nodes section) transistors have this type.

Power sources

Around an area of powered diffusion we will often see something like the following (note the distinctive "hook" in the polysilicon):

Vis power.png

Here the polysilicon acts roughly like a resistor (or more specifically a pull-up resistor), preventing a short from VCC to ground when the power source would otherwise have a direct connection to ground along some path of high gates.

The transistor here is a depletion-mode transistor, a different type of transistor compared to above (though it appears the same visually). In the simulators, this configuration is simply modelled as a power source.

Nodes

Electrically common areas are called nodes in Visual 6502/2C02/2A03. Clicking on a node will highlight it, making it easier to see how things are connected (clicking on powered or grounded diffusion won't work; these only modify properties of other nodes and are not themselves nodes). When a node is highlighted, a numeric ID unique to the node will be displayed in the upper right, along with a name for the node if it has one. Node names are defined in nodenames.js.

Transistors can be selected separately by clicking on the gate (the part of the polysilicon between the diffusion areas). They have names that start with "t", followed by a numeric ID.

The Find: edit field can be used to locate nodes, either by numeric ID or by name. Numeric IDs can also be used to trace the values of nodes without an assigned name.

Basic logic elements

Inverters

An inverter is constructed like in the image below:

Vis inverter.png

When the input gate is low, current flows into the output wire. When the input gate is high, current flows into ground, driving the output wire low. The output wire is hence the inverse of the input wire.

When one node is the inverse of another, it is said that it inverts into the other node.

NOR gates

Below is an example of a NOR gate taken from Visual 2A03, related to controlling when the first square channel is silenced:

Vis nor.png

If any of the gates in red circles are high, the voltage of the highlighted node will be pulled to ground instead of pulled high (current will flow to ground through the gates in red circles that are high). The value that reaches the gate in the blue circle is hence the NOR of the values on the gates in the red circles.

The gate in the blue circle is part of a pass transistor, so called because it passes current between two nodes rather than driving or grounding a node. The gate in this case is apu_clk1, and we say that value is "buffered on apu_clk1".
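
The inverter and the NOR gate above follow the same rule: a node fed by a power source stays high unless at least one pull-down gate connected to it is high. Below is a minimal Python sketch of that rule. The function names are invented for illustration and are not taken from the simulators.

```python
def pulled_up_node(pull_down_gates):
    """Return the value of a node that is fed by a power source.

    The node is pulled to ground (0) if any pull-down gate is high,
    and stays high (1) otherwise.
    """
    return 0 if any(pull_down_gates) else 1


def inverter(a):
    # One pull-down transistor: the output is the inverse of the input.
    return pulled_up_node([a])


def nor(*inputs):
    # Several pull-down transistors in parallel on the same node.
    return pulled_up_node(inputs)


print(inverter(0), inverter(1))
print(nor(0, 0, 0), nor(0, 1, 0))
```

Pass transistors and floating nodes are deliberately left out of this sketch.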

Storage elements

Wire capacitance as storage

This is the simplest form of storage, and so is covered first.

If a wire is "closed off" so that it is no longer connected to either power or ground, it will retain its value for a while through capacitance. This is used to store some short-lived data "on the wire". As an example, here's the read buffer for the 2C02's VBlank flag, which lets its value be read even though reading $2002 immediately clears the VBlank flag:

Vis vblbuf.png

The circled gate is controlled by the /read_2002_output_vblank_flag signal. While it is high, the value of vbl_flag (or rather /vbl_flag in this case) is connected to the highlighted wire. When it goes low, the value on the wire is held.

While a node or wire is isolated from both VCC and ground in this fashion, it is said to be floating. For bus lines, a floating line is said to be tri-stated, as the floating state can be viewed as a third state in addition to 0 and 1. This third state allows other devices to use the bus without interference.

Using capacitance as storage in the above fashion is an instance of dynamic logic, so called since it has time-dependent behavior beyond just the input clock. Chips that make use of dynamic logic techniques tend to have a minimum clock speed below which they will not function correctly, as values stored via capacitance will degrade to zero over time.

Cross-coupled inverters

Two cross-coupled inverters will make a latch (an element that stores a single bit). This arrangement is often used for latches that are set or cleared by specific logic rather than by having a value copied into them (from e.g. a data bus line).

Below is the VBlank flag from Visual 2C02. To the left the vbl_flag node is highlighted, and to the right its inverse is highlighted. (We would label the inverse /vbl_flag, where "/" denotes "inverse" or "active low"). As can be seen by the two gates in gray circles, each inverts into the other, forming two cross-coupled inverters.

Vis crossreg.png

The gates marked "set" and "clear" set and clear the latch, respectively. To clear the latch, vbl_flag is driven low. To set the latch, /vbl_flag is driven low.

This circuit is an example of an SR Latch, where S stands for set and R for reset, corresponding to the set and clear gates above. It is more specifically an SR NOR Latch, as it can be viewed as being built of NOR gates, where e.g. the values on the set gate and the upper gate in the gray circle constitute the inputs to a NOR gate. The corresponding schematic using NOR gates is shown on the right.
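
The NOR-gate view of the latch can be tried out with a short Python sketch. It assumes, following the description above, that the set gate pulls /vbl_flag low and the clear gate pulls vbl_flag low. The function names and the fixed iteration limit are illustrative choices, not something taken from the simulators.

```python
def nor(a, b):
    return 0 if (a or b) else 1


def sr_latch(q, nq, set_, clear):
    """Settle two cross-coupled NOR gates and return (q, nq).

    q  corresponds to vbl_flag, nq to /vbl_flag.
    """
    for _ in range(4):                 # a few passes are enough to settle
        new_q = nor(clear, nq)         # clear drives q low
        new_nq = nor(set_, new_q)      # set drives nq low
        if (new_q, new_nq) == (q, nq):
            break
        q, nq = new_q, new_nq
    return q, nq


state = (0, 1)                               # latch starts cleared
state = sr_latch(*state, set_=1, clear=0)    # pulse set
state = sr_latch(*state, set_=0, clear=0)    # release: value is held
print(state)
state = sr_latch(*state, set_=0, clear=1)    # pulse clear
state = sr_latch(*state, set_=0, clear=0)
print(state)
```

Asserting set and clear at the same time is not handled meaningfully by this sketch.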

Clocked latches

When a latch can be set directly from the value of some line, e.g. a data bus line, an arrangement involving a clock is often used. The motivation is to avoid having to form both data_line and /data_line and route them to the respective terminals of the latch, which would use more logic. (The clock is already routed all around the chip, so mixing it in usually isn't as much of a problem.)

As an example, here's the noi_lfsrmode node (the "Loop noise" flag from $400E):

Vis clockedreg.png

While apu_clk1 is high, noi_lfsrmode will flow into the floating node (so called because it will float when both apu_clk1 and w400e are low), which then inverts into noi_/lfsrmode, forming a cross-coupled inverter latch. While apu_clk1 is low, the loop will be broken momentarily, and during this phase a new value can be copied into the latch through the gate controlled by the w400e signal (which goes high on writes to $400E). The value let through by the pass transistor is the db7 node, corresponding to bit 7 of the data bus. (There's a via between the diffusion and the metal db7 line - easier to see if the node is highlighted.) If the loop were not broken during the write operation, the old value in the latch would interfere with setting a new value.

For another, less cluttered view of the same type of circuit, see this image (substitute "apu_clk1" for "/φ₁" and "w400e" for "φ₁").

(The circuitry in the lower-right corner is a multiplexer, which selects one of two inputs depending on whether noi_lfsrmode or noi_/lfsrmode is high; i.e., depending on whether noi_lfsrmode is 0 or 1. The output of the multiplexer is on the left side.)
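
The behavior of the clocked latch can be summarized in a small Python sketch. It is a simplification under stated assumptions: the floating node is modelled as a stored bit, and the case where apu_clk1 and w400e are both high is treated as an error, since it is assumed never to occur. The class and method names are made up for this example.

```python
class ClockedLatch:
    """Sketch of the noi_lfsrmode latch; signal names follow the text."""

    def __init__(self, value=0):
        self.floating = value          # value held on the floating node

    @property
    def noi_not_lfsrmode(self):        # stands in for noi_/lfsrmode
        return 1 - self.floating

    @property
    def noi_lfsrmode(self):
        return 1 - self.noi_not_lfsrmode

    def step(self, apu_clk1, w400e, db7):
        if apu_clk1 and w400e:
            # Assumption: the loop is never closed during a write.
            raise ValueError("not modelled")
        if w400e:
            self.floating = db7        # loop broken: new value copied in
        # apu_clk1 high: the feedback loop rewrites the same value.
        # Both low: the floating node holds its value by capacitance.
        return self.noi_lfsrmode


latch = ClockedLatch()
print(latch.step(apu_clk1=0, w400e=1, db7=1))   # write a 1
print(latch.step(apu_clk1=1, w400e=0, db7=0))   # loop closed, value kept
```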

DRAM (Dynamic RAM)

Below is an example of a DRAM cell, taken from the internal PPU OAM memory:

Vis dram cell.png

In the left and right pictures the two sides of the cell are highlighted (with a different highlight color on the right due to the node being high). The two nodes are always inverses of each other, with the node highlighted in the left picture corresponding to the value held in the cell (low for 0 and high for 1).

Note that this is not an instance of cross-coupled inverters, as neither node is directly connected to a power source. Rather, DRAM depends on capacitance to hold the value, which will fade unless the capacitor is regularly refreshed (the high side recharged). (This is the "dynamic" part of DRAM.)

Below is a picture of the upper edge of the PPU OAM DRAM array:

Vis oam.png

(The "column" and "row" labels are conventional memory terminology; they confusingly happen to get the opposite orientation in Visual 2C02. "Row" and "column" below will refer to this terminology.)

The spr_rowx lines (sometimes called word lines) are used to connect a row of memory cells to the horizontal bit lines (by opening up each cell to a pair of vias); this is called opening that row. For example, spr_row16 opens the highlighted row, while spr_row0 opens the row on its right side. As can be guessed from the node names, the memory layout is not as straightforward as consecutive memory locations being stored in consecutive rows.

On the left side of OAM we see pass transistors on the spr_col1 and spr_col3 lines, which select the bit lines from the first and second columns of the memory array, respectively (there are other, similar lines next to them). Each such spr_colx line is connected to eight different columns (16 bit lines), corresponding to the eight bits of the byte to be read or written (increasing bit positions are not stored in consecutive columns either). One notable exception to this pattern is that two columns only connect to five sets of bit lines; these columns correspond to the "flags" bytes in OAM, where the middle 3 bits don't actually exist.

DRAM refresh

At the right side in the picture above we see pclk0 running down the edge of OAM, connected to pull-up transistors for each bit line. During pclk0, these are used to precharge the bit lines, after which the pull-up transistors are disabled but the lines remain charged through capacitance. When the selected row is opened after pclk0, it will be exposed to the precharged bit lines, which has the effect of charging up the high side of the cell. On the low side of the cell, the precharge current will simply drain to ground, as the gate on that side will be driven high.

In a typical DRAM circuit, the rows are automatically and periodically refreshed to prevent values from fading. In the PPU, no such logic exists, and rows are only refreshed when accessed. The reason the PPU (usually) gets away with this is that sprite evaluation will access the entire OAM (provided rendering is enabled), refreshing the rows as a side effect.

In Visual 2C02, the precharge logic has been disconnected, as it is not necessary in a purely digital simulator and causes timing glitches. (Clicking on the gates of the pull-up transistors will show that there are no transistors there, even though the display looks as if there were.)

SRAM (Static RAM)

SRAM uses cross-coupled inverters for storage and is accessed using a row/column scheme similar to DRAM. Compared to DRAM, SRAM does not need to be refreshed, tends to be faster, uses more die area per memory cell, and draws more power for the NMOS version.

Below is a picture of SRAM memory cells used to store the PPU's palette (in this case the rows do go horizontally):

Vid sram.png

Miscellaneous circuitry

Decoders and mask ROMs

  • A decoder is a circuit that maps input values to output values. A decoder that maps m input lines to n output lines is called an m-to-n decoder.
  • A mask ROM is a type of read-only memory constructed by masking off parts of a circuit grid.

The two elements are covered together since their implementation turns out to be identical in this case.

Pictured below is the decoder and mask ROM that acts as the lookup table for initialization of the length counters in the APU:

Vis len rom.png

The length is set by writing bits 7-3 of e.g. $4003 (in the case of the first pulse channel), so the inputs to the decoder are bits 7-3 of the data bus. The output from the decoder feeds into the mask ROM, and the output from the mask ROM is the length from the lookup table. The length is used to initialize a counter that counts down to zero before silencing the channel.

The picture below shows a zoomed-in view of the lower part of the decoder and mask ROM:

Vis len pla zoom.png

The spots of yellow diffusion in the decoder and mask ROM are connections to the metal wires, which run horizontally in the decoder and vertically in the mask ROM. By setting the gates connected to the diffusion high, the wires can be driven low.

In the decoder (right part) the input lines and their inverses run vertically (/db7 has been highlighted to show its connection). By looking carefully at the bottom-most horizontal row in the decoder, we see that it is powered on the right side, and that the condition for it to remain high as it passes into the mask ROM is /db7 AND /db6 AND /db5 AND /db4 AND /db3. Another way to put this condition is db7-db3 = $00.

Similarly, the condition for the second row from the bottom to be high is /db7 AND /db6 AND /db5 AND db4 AND /db3, which translates to db7-db3 = $02. The conditions for the third and fourth rows from the bottom are db7-db3 = $04 and db7-db3 = $06, respectively.

The decoder is set up so that dbx and /dbx will never both drive the same horizontal line low (which would make it impossible for that line to ever be high), and in this case each row has a unique bit pattern that activates it. (It would also be possible to insert a "don't care" condition in the decoder by having neither dbx nor /dbx drive the line low.)

The decoder here is a 5-to-32 decoder, with 32 rows corresponding to the 32 possible bit patterns made with five bits. This type of decoder is said to fully decode its inputs, and is an instance of an n-to-2^n decoder.

In the mask ROM, we see that each horizontal line from the decoder when high will cause a particular pattern to appear on the lenx outputs. Reading off the bottom row, this pattern is len7-0 = 00001001b = 9. Reading off the remaining rows from bottom to top, we get the values 00010011b = 19, 00100111b = 39, and 01001111b = 79.

Putting together the above, we have the following incomplete map from inputs to outputs:

Index Length
$00 9
$02 19
$04 39
$06 79

By checking against the APU length counter table, we see that these indeed are the length values corresponding to those indices (minus one, due to details of how the length counter works).
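
The decoder and mask ROM walk-through above can be mirrored in a short Python sketch. Only the four rows read off in the text are included, the ROM is reduced to a plain lookup (the polarity of the ROM wires is not modelled), and all function names are invented for this example.

```python
# ROM contents for the four rows read off above:
# condition on db7-db3 -> value on len7-0. The other 28 rows are omitted.
ROM_ROWS = {
    0x00: 0b00001001,
    0x02: 0b00010011,
    0x04: 0b00100111,
    0x06: 0b01001111,
}


def row_is_high(row_pattern, index):
    """Return True if the decoder row for row_pattern stays high.

    For each of the five inputs, the row has a pull-down gate on dbx
    if the pattern bit is 0, and on /dbx if the pattern bit is 1.
    Any gate that is high drives the row low.
    """
    for bit in range(5):
        dbx = (index >> bit) & 1
        pattern_bit = (row_pattern >> bit) & 1
        gate = (1 - dbx) if pattern_bit else dbx
        if gate:
            return False
    return True


def length_lookup(data_byte):
    """Return the ROM output for a byte written to e.g. $4003."""
    index = (data_byte >> 3) & 0x1F      # bits 7-3 of the data bus
    high_rows = [row for row in ROM_ROWS if row_is_high(row, index)]
    if not high_rows:
        return None                      # row not included in this sketch
    return ROM_ROWS[high_rows[0]]


for byte in (0x00, 0x10, 0x20, 0x30):
    print(hex(byte >> 3), length_lookup(byte))
```

Because each row pattern is unique, at most one row can be high for any input, which is what "fully decoded" means in practice.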

To give an example of a decoder that does not feed into a mask ROM, the picture below shows the internal 2A03 address decoder for the address range $4000-$4017, where signals such as r4017 (read 4017) and w4004 (write 4004) are generated.

Vis addr pla.png

The theory behind the decoder and mask ROM seen here is closely related to that of PLAs (Programmable Logic Arrays), where we could view the decoder as the AND plane and the mask ROM as the OR plane (both implemented with NOR gates). This introduction to PLAs is helpful.

Adders

Pictured below is part of the adder used by the sweep units in the 2A03 to calculate the target period for sweep period updates to the second square channel (the first square channel is identical except for a small quirk related to subtraction; see below). The pictured part calculates the second bit (bit 1) of the sum, along with the carry for that bit position.

Vis adder.png

The adder is split into two parts. The left-most part (having four columns) calculates bit 1 of the sum. The right-most part (with three columns) calculates the carry. Both /sum1 out and /carry out are powered, and can be forced low by certain combinations of the input signals being high (for e.g. the left-most column, this combination is addend1 AND carry in AND sq1_p1). The essential information is captured in the following truth table:

sq1_p1  addend1  carry in  /sum1 out  /carry out
0       0        0         1          1
0       0        1         0          1
0       1        0         0          1
0       1        1         1          0
1       0        0         0          1
1       0        1         1          0
1       1        0         1          0
1       1        1         0          0

As expected, this corresponds to an addition operation (with the sum and carry inverted).
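
The truth table can be regenerated with a few lines of Python, which makes the "addition with inverted outputs" reading easy to check. The function name is made up for this example.

```python
from itertools import product


def adder_bit(sq1_p1, addend1, carry_in):
    """One-bit full adder returning the inverted outputs (/sum, /carry)."""
    total = sq1_p1 + addend1 + carry_in
    sum_bit = total & 1
    carry = total >> 1
    return 1 - sum_bit, 1 - carry


for inputs in product((0, 1), repeat=3):
    print(*inputs, *adder_bit(*inputs))
```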

The same logic is used to perform subtraction, by inverting each bit of the addend (using separate logic) and setting the carry in for the zeroth bit to 1. This corresponds to the usual invert-bits-and-add-one operation for negating a number in two's complement.

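For unknown reasons, the carry in for the zeroth bit is not connected on the first square channel, making it always zero. The subtraction on that channel therefore only adds the inverted bits of the value, so the result is one lower than a true two's complement subtraction would give (in effect, the value plus one is subtracted).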
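
The effect of the missing carry in can be checked with a short Python sketch. The 11-bit width and the function name are assumptions made for illustration, and the example values are arbitrary.

```python
def sweep_subtract(period, amount, carry_in, bits=11):
    """Subtract by adding the inverted bits of amount plus carry_in.

    bits=11 is an assumed adder width for this sketch.
    """
    mask = (1 << bits) - 1
    inverted = ~amount & mask
    return (period + inverted + carry_in) & mask


print(sweep_subtract(100, 20, carry_in=1))   # carry in connected
print(sweep_subtract(100, 20, carry_in=0))   # carry in stuck at zero
```

With the carry in connected, the result is the ordinary difference; with it stuck at zero, the result is one lower.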

Barrel shifters

The below circuitry forms part of a barrel shifter, used to shift the inputs to the adders for sweep unit period updates in this case.

Vis barrel shifter.png

As a side note, the bit inversion for subtraction by the sweep units happens before the bits enter the barrel shifter.

Shift registers

(This section might be considered "advanced" on a first reading. I just wanted an example that made more complex use of clocks.)

The picture below shows the 16-bit shift register that holds the high bits for background tiles (see the PPU rendering page). The upper eight bits can be reloaded from PPU VRAM data bus lines, and the output is taken from the lower eight bits (in this case, the particular bit to use is selected by the fine x scroll). Bits flow clockwise through the shift register.

Vis shift reg.png

Below is a zoomed-in view of three bits (tile_h15-13) from the upper-left part of the shift register:

Vis shift reg zoom.png

The value of each bit corresponds to the value on the (2) side.

Control signals

The following signals control the shifting and reloading of the register (the names used were invented for the article and are not standard terminology):

  • The Invert signal corresponds to pclk0, which is high during the initial half-cycle of a PPU cycle (see the Clocks section).
  • The Shift signal corresponds to pclk1, which is high during the second half-cycle.
  • The Parallel load signal controls pass transistors connected to _db0-7, used to load the upper eight bits of the shift register.

Shift does not always exactly mirror pclk1, as explained below, which is the reason for the ~ notation.

Shifting

Shifting the register is a two-step process:

  1. During pclk0, Invert is driven high, making the value of (1) flow through the pass transistor into the node in the blue circle, which causes (1) to invert into (2).
  2. During pclk1, Shift is driven high, which causes the node marked (2) to invert into the node marked (3) (the next bit of the shift register). Invert is low during this phase, and the value on the node in the blue circle is held via wire capacitance, which makes this a dynamic shift register.

Due to the bit of powered diffusion circled in red, the default value shifted into (1) is 1. However, as the value is held on the inverted side (2), this means that zeroes are being shifted in.

Parallel load

To perform a parallel load of the register, step (2) from above is modified so that Shift remains low during pclk1 and Parallel load goes high instead, causing the new value for each cell to come from the data bus lines instead of from the previous cell.

The diagram below might clarify how the control signals are related. Each row is a PPU half-cycle.

pclk0  Invert  Shift  Parallel load
1      1       0      0
0      0       1      0
1      1       0      0
0      0       1      0
1      1       0      0
0      0       0      1 <-- Reloaded here
1      1       0      0
0      0       1      0
1      1       0      0
0      0       1      0
1      1       0      0
0      0       1      0
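
The two-phase shifting and the parallel load can be sketched in Python as follows. This is a simplified model of the upper eight cells only. It assumes that a parallel load places a value on node (1) such that the loaded bit appears on the (2) side after the next Invert phase; the actual polarity of the data bus lines is not modelled. The class and method names are invented for this example.

```python
class DynamicShiftRegister:
    """Sketch of eight shift register cells with nodes (1) and (2)."""

    def __init__(self, cells=8):
        self.node1 = [1] * cells   # (1): input side of each cell
        self.node2 = [0] * cells   # (2): the bit value of each cell

    def half_cycle(self, invert, shift, parallel_load, load_bits=None):
        if invert:
            # (1) inverts into (2).
            self.node2 = [1 - v for v in self.node1]
        elif shift:
            # (2) inverts into (1) of the next cell; the first cell
            # gets the default 1, so a zero is shifted in.
            self.node1 = [1] + [1 - v for v in self.node2[:-1]]
        elif parallel_load:
            # Assumed polarity: loaded bits show up on (2) after Invert.
            self.node1 = [1 - b for b in load_bits]


reg = DynamicShiftRegister()
bits = [1, 0, 1, 1, 0, 0, 1, 0]
reg.half_cycle(0, 0, 1, bits)      # row "0 0 0 1": reload
reg.half_cycle(1, 0, 0)            # row "1 1 0 0": invert
print(reg.node2)
reg.half_cycle(0, 1, 0)            # row "0 0 1 0": shift
reg.half_cycle(1, 0, 0)            # invert again
print(reg.node2)
```

Each call corresponds to one row of the control signal diagram above, with the pclk0 column left implicit since Invert follows it.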

Digital-to-analog conversion (DAC)

The below Visual 2A03 circuitry controls the volume on the output pin for the two square channels (the triangle, noise, and DMC channels use a separate pin). Note that each successive bit has twice the weight of the preceding one in terms of the amount of powered diffusion connected to it.

Vis da conversion.png

This is an example of a binary-weighted DAC. A different type of DAC is used for the video output from the PPU (found in the upper-left of Visual 2C02, rotated 90 degrees here):

Vis vid dac.png

The upper-left end is actually connected to VCC, and the lower-right to ground. This is a voltage ladder, and works by tapping the wire at different points along the run to get different voltages. As the simulator is purely digital, this circuit is not directly used in the simulation, and some parts that would otherwise interfere with it have been disconnected.

Output drivers

These are found on pins capable of doing output, which need to be able to source (generate) and sink large currents to drive the line high or low. Large clusters of pull-up and pull-down transistors like these are sometimes called superbuffers. The polysilicon wire that would cause the pin to source current is highlighted below.

Vis output driver.png

Cut-off connections

Some parts of the chips, especially outside the 6502 core, were designed using a copy-and-paste process called "standard cell", leading to some seemingly nonsensical and cut-off connections. These carry no special significance. The image below contains an example.

The 6502 core inside the 2A03 is a substantially tighter block of NMOS (having been designed by hand), but it still has a few cut-off connections remaining from removal of the original output drivers.

Vis cutoff.png

Layers

(This information is not essential to reading the diagrams.)

The layers that make up the chip are as follows, in order from bottom to top: substrate, diffusion, oxide (with holes for buried contacts and vias), polysilicon, more oxide (with holes for vias), metal, and passivation (or "overglass", containing holes where bond wires connect).

The way diffusion is powered or grounded is through vias to large areas of metal that are either grounded or powered.

Clocks

This section lists node names for various clocks that sequence operations within the chips. Some of the 6502 pin signals might have gained a "c_" prefix in Visual 2A03 compared to Visual 6502.

6502 core pins

clk0
The φ0 clock input pin. Goes low at the beginning of a CPU cycle.
clk1out, clk2out
The φ1 and φ2 output pins. φ2 is used to form M2 in the 2A03, which has a modified duty cycle.

6502 internal clock signals

cp1
High during the first phase (half-cycle) of a CPU cycle. The inverse of clk0.
cclk
High during the second phase of a CPU cycle. Roughly equivalent to clk0, but modified slightly to never overlap with cp1 (though that won't be visible in the simulators).

APU clock signals

apu_clk1
This clock signal has a 25% duty cycle. It ticks at half the rate of the CPU clock, and is high only when φ2 is low.
apu_/clk2
Like apu_clk1, but ticks on the opposite phase, and is also inverted so that it has a 75% duty cycle.

This clock arrangement helps to ensure that timed events (various counters being decremented or reloaded) do not conflict with writes from the CPU (which only happen when φ2 is high).

φ1         1 0 1 0 1 0 1 0
φ2         0 1 0 1 0 1 0 1
apu_clk1   1 0 0 0 1 0 0 0
apu_/clk2  1 1 0 1 1 1 0 1
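
The table can be reproduced with a short Python sketch, which also checks that neither APU clock is active while φ2 is high. The alignment of the APU clocks relative to the first half-cycle is taken from the table, and the function name is made up for this example.

```python
def clock_rows(half_cycles=8):
    """Generate the clock values for a number of CPU half-cycles."""
    rows = {"φ1": [], "φ2": [], "apu_clk1": [], "apu_/clk2": []}
    for i in range(half_cycles):
        phi1 = 1 if i % 2 == 0 else 0
        rows["φ1"].append(phi1)
        rows["φ2"].append(1 - phi1)
        # apu_clk1: high for one half-cycle out of four (25% duty cycle).
        rows["apu_clk1"].append(1 if i % 4 == 0 else 0)
        # apu_/clk2: low for one half-cycle out of four, on the
        # opposite phase (75% duty cycle).
        rows["apu_/clk2"].append(0 if i % 4 == 2 else 1)
    return rows


rows = clock_rows()
for name, values in rows.items():
    print(f"{name:10}", *values)

# Neither APU clock is active (apu_clk1 high, apu_/clk2 low) while φ2 is high.
for phi2, clk1, nclk2 in zip(rows["φ2"], rows["apu_clk1"], rows["apu_/clk2"]):
    assert not (phi2 and (clk1 or not nclk2))
```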

PPU clock signals

clk0
The input clock, fed from the master clock. Used directly in video waveform generation.
_clk0
The inverse of clk0.
pclk0
The pixel clock. Derived from clk0 by dividing by four (NTSC) or five (PAL). One cycle corresponds to a rendered dot, with pclk0 being high during the first phase (half-cycle).
pclk1
The inverse of pclk0. High during the second phase of a pixel clock.

Terms

Below are various terms you might run into:

Bond wire
A wire that connects an internal pad to an external pin on the chip package; see e.g. [1].
Buried contact
A connection between diffusion and polysilicon.
NMOS
The technology used for the transistors in the 2A03 and 2C02. In NMOS, transistors are made by creating regions of n-doped semiconductor that become the source and drain ("n-doped" because this doping adds free electrons, which carry negative charge). This type of transistor is good at sinking current to ground (this is what causes a 0 bit to usually "win" in bus conflicts), and worse at pulling up. PMOS is the opposite. The transistors used in NMOS and PMOS are more precisely called n(-type )MOSFETs and p(-type )MOSFETs, respectively.
Open drain
A type of output that works by sinking current from an external pull-up resistor instead of generating current on its own. An example is the PPU's INT pin. The pull-up resistor is denoted "RM1" in this wiring diagram.
Pull-up resistor
A resistor connected to power. "Pull-up" comes from pulling the wire to a high state.
Pull-up transistor
A transistor whose gate when high causes current to flow from a power source.
Pull-down transistor
The analogue of a pull-up transistor for sinking to ground.
Via
A connection between polysilicon/diffusion and metal.

Tips for working with the simulators

Node names in Visual 6502

A hash (#) or tilde (~) on a node name signifies active low or negation in Visual 6502. Due to problems passing hashes in URLs, aliases were automatically introduced that used tildes instead (hence the "automatic alias replacing hash with tilde" comments).

Clearing highlighting

When the simulator is loaded and after it has been run with "animate during simulation" enabled, nodes that are high will be highlighted. To get rid of this highlighting, click the "clear highlighting" button.

Local copies of the simulator

Being able to add node names to nodenames.js can be very helpful when figuring out a circuit. To do this, a local version of the simulator can be downloaded with e.g. $ wget --convert-links on a *nix system. Please watch the recursion level and avoid downloading data needlessly, as at least Visual 2C02 and Visual 2A03 are hosted on a limited uplink.