APU DMC
Revision as of 21:47, 24 April 2012
The NES APU's delta modulation channel (DMC) can output 1-bit delta-encoded samples or can have its 7-bit counter directly loaded, allowing flexible manual sample playback.
The DMC channel contains the following: memory reader, interrupt flag, sample buffer, timer, output unit, 7-bit counter.
                                Timer
                                  |
                                  v
    Reader ---> Buffer ---> Output ---> Counter ---> (to the mixer)
Register / bits | Layout | Description |
---|---|---|
$4010 | IL--.RRRR | Flags and rate (write) |
bit 7 | I--- ---- | IRQ enabled flag. If clear, the interrupt flag is cleared. |
bit 6 | -L-- ---- | Loop flag |
bits 3-0 | ---- RRRR | Rate index (timer period in CPU cycles, listed below) |
$4011 | -DDD.DDDD | Direct load (write) |
bits 6-0 | -DDD DDDD | The counter is loaded with D. If a sample is currently playing, the counter is occasionally not changed properly. |
$4012 | AAAA.AAAA | Sample address (write) |
bits 7-0 | AAAA AAAA | Sample address = %11AAAAAA.AA000000 = $C000 + (A * 64) |
$4013 | LLLL.LLLL | Sample length (write) |
bits 7-0 | LLLL LLLL | Sample length = %LLLL.LLLL0001 = (L * 16) + 1 bytes |

Rate index to timer period (CPU cycles):

Rate | $0 | $1 | $2 | $3 | $4 | $5 | $6 | $7 | $8 | $9 | $A | $B | $C | $D | $E | $F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NTSC | 428 | 380 | 340 | 320 | 286 | 254 | 226 | 214 | 190 | 160 | 142 | 128 | 106 | 84 | 72 | 54 |
PAL | 398 | 354 | 316 | 298 | 276 | 236 | 210 | 198 | 176 | 148 | 132 | 118 | 98 | 78 | 66 | 50 |
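The following Python sketch illustrates the bit layouts above by decoding the four register values. The helper names (`decode_4010`, `sample_address`, `sample_length`) are hypothetical and not part of any emulator API. The rate lists are copied from the rate table.

```python
# Hypothetical helpers illustrating the $4010-$4013 bit layouts.

NTSC_RATES = [428, 380, 340, 320, 286, 254, 226, 214,
              190, 160, 142, 128, 106, 84, 72, 54]
PAL_RATES = [398, 354, 316, 298, 276, 236, 210, 198,
             176, 148, 132, 118, 98, 78, 66, 50]


def decode_4010(value, pal=False):
    """Split a $4010 write into (irq_enabled, loop, timer period in CPU cycles)."""
    irq_enabled = bool(value & 0x80)   # bit 7
    loop = bool(value & 0x40)          # bit 6
    rate_index = value & 0x0F          # bits 3-0
    table = PAL_RATES if pal else NTSC_RATES
    return irq_enabled, loop, table[rate_index]


def sample_address(a):
    """$4012 value A -> %11AAAAAA.AA000000, i.e. $C000 + A * 64."""
    return 0xC000 | (a << 6)


def sample_length(l):
    """$4013 value L -> %LLLL.LLLL0001, i.e. L * 16 + 1 bytes."""
    return (l << 4) | 1


print(hex(sample_address(0x00)), hex(sample_address(0xFF)))  # 0xc000 0xffc0
print(sample_length(0x00), sample_length(0xFF))              # 1 4081
print(decode_4010(0x8F))                                     # (True, False, 54)
```

Since A is at most $FF, `A << 6` never reaches bit 14. The OR and the addition `$C000 + A * 64` therefore give the same result.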
The counter's value is sent to the mixer. It is loaded with 0 on power-up.
Automatic 1-bit delta-encoded sample playback is carried out by a combination of three units. The memory reader fills the 8-bit sample buffer whenever the output unit empties it. The status register ($4015) is used to start and stop automatic sample playback.
The sample buffer either holds a single 8-bit sample byte or is empty. It is filled by the reader and can only be emptied by the output unit; once loaded with a sample byte it will be played back.
Pitch table
For NTSC:
Rate ($4010) | Period (CPU cycles) | Frequency | Note |
---|---|---|---|
$0 | $1AC | 4181.71 Hz | C8 -1.78 cents |
$1 | $17C | 4709.93 Hz | D8 +4.16 cents |
$2 | $154 | 5264.04 Hz | E8 -3.29 cents |
$3 | $140 | 5593.04 Hz | F8 +1.67 cents |
$4 | $11E | 6257.95 Hz | G8 -3.86 cents |
$5 | $0FE | 7046.35 Hz | A8 +1.56 cents |
$6 | $0E2 | 7919.35 Hz | B8 +3.77 cents |
$7 | $0D6 | 8363.42 Hz | C9 -1.78 cents |
$8 | $0BE | 9419.86 Hz | D9 +4.16 cents |
$9 | $0A0 | 11,186.08 Hz | F9 +1.67 cents |
$A | $08E | 12,604.03 Hz | G9 +8.29 cents |
$B | $080 | 13,982.60 Hz | A9 -12.02 cents |
$C | $06A | 16,884.65 Hz | C10 +14.48 cents |
$D | $054 | 21,306.82 Hz | E10 +17.20 cents |
$E | $048 | 24,857.95 Hz | G10 -15.93 cents |
$F | $036 | 33,143.94 Hz | C11 -17.88 cents |
(A "cent" is 1/100 of a semitone or 1/1200 of an octave.)
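The pitch table can be reproduced from the period column, since the frequency is the CPU clock divided by the period. The sketch below assumes an NTSC CPU clock of 1,789,773 Hz and equal temperament with A4 = 440 Hz. `pitch` is a hypothetical helper name.

```python
import math

NTSC_CPU_HZ = 1789773  # assumed NTSC CPU clock, about 21.477272 MHz / 12
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch(period):
    """Return (frequency in Hz, nearest note name, offset in cents) for a DMC period."""
    freq = NTSC_CPU_HZ / period
    # Semitones above C0, using A4 = 440 Hz (A4 is 57 semitones above C0).
    semitones = 12 * math.log2(freq / 440.0) + 57
    nearest = round(semitones)
    cents = (semitones - nearest) * 100
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12)
    return freq, name, cents


for rate, period in enumerate([428, 380, 340, 320, 286, 254, 226, 214,
                               190, 160, 142, 128, 106, 84, 72, 54]):
    freq, name, cents = pitch(period)
    print(f"${rate:X} | ${period:03X} | {freq:.2f} Hz | {name} {cents:+.2f} cents")
```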
Memory reader
When the sample buffer is emptied, the memory reader fills the sample buffer with the next byte from the currently playing sample. It has an address counter and a bytes remaining counter.
When a sample is (re)started, the current address is set to the sample address, and bytes remaining is set to the sample length.
Any time the sample buffer is in an empty state and bytes remaining is not zero, the following occur:
- The CPU is suspended for up to four clock cycles.
- The sample buffer is filled with the next sample byte read from the current address, subject to whatever mapping hardware is present.
- The address is incremented; if it exceeds $FFFF, it is wrapped around to $8000.
- The bytes remaining counter is decremented; if it becomes zero and the loop flag is set, the sample is restarted (see above); otherwise, if the bytes remaining counter becomes zero and the IRQ enabled flag is set, the interrupt flag is set.
At any time, if the interrupt flag is set, the CPU's IRQ line is continuously asserted until the interrupt flag is cleared.
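The steps above can be modelled roughly as follows. This Python sketch simplifies under a few assumptions:
- Class and attribute names are hypothetical.
- `read_byte` stands in for the CPU bus with any mapper applied.
- The stall is always reported as the worst case of four cycles, although the text says "up to" four.

```python
class MemoryReader:
    """Simplified model of the DMC memory reader (hypothetical names)."""

    def __init__(self, read_byte):
        self.read_byte = read_byte      # callable: CPU address -> byte, mapper applied
        self.sample_address = 0xC000    # decoded from $4012
        self.sample_length = 1          # decoded from $4013
        self.loop = False               # $4010 bit 6
        self.irq_enabled = False        # $4010 bit 7
        self.address = 0
        self.bytes_remaining = 0
        self.buffer = None              # None means the sample buffer is empty
        self.interrupt_flag = False

    def restart(self):
        """(Re)start the sample: reload the address and bytes remaining."""
        self.address = self.sample_address
        self.bytes_remaining = self.sample_length

    def step(self):
        """Refill the buffer if needed; return the CPU cycles stolen (0 or 4)."""
        if self.buffer is not None or self.bytes_remaining == 0:
            return 0
        self.buffer = self.read_byte(self.address)
        # Increment the address, wrapping from $FFFF to $8000.
        self.address = 0x8000 if self.address == 0xFFFF else self.address + 1
        self.bytes_remaining -= 1
        if self.bytes_remaining == 0:
            if self.loop:
                self.restart()
            elif self.irq_enabled:
                self.interrupt_flag = True   # asserts the CPU IRQ line until cleared
        return 4                             # worst case of "up to four" cycles


# Usage: a one-byte sample with IRQs enabled sets the interrupt flag immediately.
reader = MemoryReader(lambda addr: addr & 0xFF)
reader.irq_enabled = True
reader.restart()
print(reader.step(), reader.buffer, reader.interrupt_flag)   # 4 0 True
```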
Output unit
The output unit continuously outputs a 7-bit value to the mixer. It contains an 8-bit right shift register, a bits-remaining counter, a 7-bit delta-counter, and a silence flag.
When an output cycle ends, a new cycle is started as follows:
- The bits-remaining counter is loaded with 8.
- If the sample buffer is empty, then the silence flag is set; otherwise, the silence flag is cleared and the sample buffer is emptied into the shift register.
When the timer outputs a clock, the following actions occur in order:
- If the silence flag is clear, bit 0 of the shift register is applied to the counter as follows: if bit 0 is clear and the delta-counter is greater than 1, the counter is decremented by 2; otherwise, if bit 0 is set and the delta-counter is less than 126, the counter is incremented by 2.
- The right shift register is clocked.
- The bits-remaining counter is decremented. If it becomes zero, a new cycle is started.
Nothing can interrupt a cycle; every cycle runs to completion before a new cycle is started.
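Here is a minimal Python sketch of the output unit, following the cycle-start rule and the three timer-clock steps above. Names are hypothetical, and `sample_buffer` is any object with a `buffer` attribute (None when empty). The initial `bits_remaining` of 8 is an assumption, since the text does not give a power-up value.

```python
from types import SimpleNamespace


class OutputUnit:
    """Simplified model of the DMC output unit (hypothetical names)."""

    def __init__(self):
        self.shift_register = 0
        self.bits_remaining = 8   # assumed initial value; not specified above
        self.counter = 0          # 7-bit delta-counter, 0 on power-up
        self.silence = True

    def start_cycle(self, sample_buffer):
        """Begin a new output cycle, emptying the sample buffer if it is full."""
        self.bits_remaining = 8
        if sample_buffer.buffer is None:
            self.silence = True
        else:
            self.silence = False
            self.shift_register = sample_buffer.buffer
            sample_buffer.buffer = None

    def clock(self, sample_buffer):
        """Handle one clock from the timer."""
        if not self.silence:
            if self.shift_register & 1:
                if self.counter < 126:
                    self.counter += 2
            elif self.counter > 1:
                self.counter -= 2
        self.shift_register >>= 1
        self.bits_remaining -= 1
        if self.bits_remaining == 0:
            self.start_cycle(sample_buffer)


# Usage: play one $FF byte (eight 1 bits) starting from counter 0.
buf = SimpleNamespace(buffer=0xFF)   # stands in for the sample buffer
out = OutputUnit()
out.start_cycle(buf)
for _ in range(8):
    out.clock(buf)
print(out.counter, out.silence)   # 16 True
```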
Likely internal implementation of the read
The following is speculation, and thus not necessarily 100% accurate. It does accurately predict observed behavior.
The 6502 cannot normally be pulled off the bus. The 2A03 DMC gets around this by pulling RDY low internally, which makes the CPU pause at its next read cycle until RDY goes high again. The DMC unit holds RDY low for 4 cycles. For the first three cycles it idles, because the CPU could have just started an interrupt sequence and thus be writing for 3 consecutive cycles (and thus ignoring RDY). On the fourth cycle, the DMC unit drives the next sample address onto the address lines and reads that byte from memory. It then drives RDY high again, and the CPU picks up where it left off.
This matters because it can interfere with the expected operation of any register where reads have a side effect: the controller registers ($4016 and $4017), the PPU status register ($2002), and VRAM/VROM data ($2007), if such a read happens in the same cycle that the DMC unit pulls RDY low.
For the controller registers, this can cause an extra rising clock edge, which shifts out an extra bit. For the others, the PPU sees multiple reads, which causes extra increments of the address latches or clears the vblank flag.
Usage of DMC for syncing to video
Concept
The NES hardware has only limited tools for syncing code with video rendering. The VBlank NMI and sprite zero hit are the only two truly reliable flags, so only two synchronizations per frame are easy to achieve. In addition, only the VBlank NMI can trigger an interrupt. The sprite zero flag has to be polled, potentially wasting a lot of CPU time.
However, it is possible to use the DMC channel for syncing with video instead of for sound. It is a bit complicated, but it can prove to be a lifesaver when one wants to do complex graphical effects without an advanced mapper.
The DMC timing itself is completely unsynced with the video, and starting a sample has an effect that jitters over more than one scanline. In order to use it for stable timing, one should:
- Start a dummy single-byte sample at rate $f for timing measurement. Because of a hardware bug, this sample should be started twice in a row, as follows:
    lda #$10      ; bit 4 of $4015 = DMC enable
    sta $4015     ; start the sample
    nop
    sta $4015     ; start it a second time (hardware bug workaround)
- Count how long it takes for the DMC interrupt flag (bit 7 of $4015) to be raised by polling $4015. This should take between 4 and 8 scanlines.
- Enable IRQs and start the actual sample that will be used for the timing.
- When this IRQ happens, wait for an amount of time complementary to the delay measured in the second step.
That way, the delays in the measurement step and the final waiting step add up to an (almost) constant time. Since the length of the sample itself is constant, you get precise control over the timing. This allows you to sync with the video while the main program does something other than polling a register or sitting in a carefully timed idle loop.
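The "complementary" wait is plain arithmetic. If the measurement step returned a count `measured` out of a known maximum `longest`, the IRQ handler burns `longest - measured` units so that the two always sum to the same total. The Python sketch below shows only that bookkeeping, with hypothetical names. On real hardware both parts would be 6502 loops with a fixed cost per iteration.

```python
def complementary_wait(measured, longest):
    """Delay to burn in the IRQ handler so that measured + wait is constant.

    `measured` is the delay counted while polling $4015 in the measurement step,
    in any fixed unit (for example, polling-loop iterations). `longest` is the
    largest value that measurement is expected to return. Both names are
    hypothetical; the text only says the delay is about 4 to 8 scanlines.
    """
    if not 0 <= measured <= longest:
        raise ValueError("measurement outside the expected range")
    return longest - measured


# Whatever the measurement returned, measurement + wait adds up to the same
# total, which is what makes the split point land at a fixed time.
LONGEST = 10
for m in range(LONGEST + 1):
    assert m + complementary_wait(m, LONGEST) == LONGEST
```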
This can be done more than once per frame, and the "measurement" part needs to be done only once. The first IRQ handler plays a new DMC sample that triggers a second IRQ, and so on. However, the variable waiting part has to be done at every split point.
This is only viable for split points that are far enough apart for triggering another IRQ to be worth it. If the split points are close together, timed code is the best (or even the only) solution.
To keep anything from being audible, it is recommended to use samples made only of $00 bytes.
Timing table
These tables convert sample length into scanline length (all values are rounded up to the next integer).
NTSC:

Length \ Rate | $0 | $1 | $2 | $3 | $4 | $5 | $6 | $7 | $8 | $9 | $a | $b | $c | $d | $e | $f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 byte (8 bits) | 31 | 27 | 24 | 23 | 21 | 18 | 16 | 16 | 14 | 12 | 10 | 10 | 8 | 6 | 6 | 4 |
17 bytes (136 bits) | ** | ** | ** | ** | ** | ** | ** | ** | 228 | 192 | 170 | 154 | 127 | 101 | 87 | 65 |
33 bytes (264 bits) | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | 196 | 168 | 126 |
49 bytes (392 bits) | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | 187 |

PAL:

Length \ Rate | $0 | $1 | $2 | $3 | $4 | $5 | $6 | $7 | $8 | $9 | $a | $b | $c | $d | $e | $f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 byte (8 bits) | 30 | 27 | 24 | 23 | 21 | 18 | 16 | 15 | 14 | 12 | 10 | 9 | 8 | 6 | 5 | 4 |
17 bytes (136 bits) | ** | ** | ** | ** | ** | ** | ** | ** | 225 | 189 | 169 | 151 | 126 | 100 | 85 | 64 |
33 bytes (264 bits) | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | 194 | 164 | 124 |
49 bytes (392 bits) | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | ** | 184 |

(** = longer than 240 scanlines)
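The timing tables can be recomputed from the rate table. A sample with length value L plays (L × 16 + 1) × 8 bits, and each bit takes one timer period. The Python sketch below rests on a few assumptions:
- It uses 341 PPU dots per scanline, with 3 dots per CPU cycle on NTSC and 3.2 on PAL.
- It ignores the shorter pre-render line on odd NTSC frames.
- `scanlines` is a hypothetical helper.

Integer arithmetic avoids floating-point rounding surprises when rounding up.

```python
NTSC_RATES = [428, 380, 340, 320, 286, 254, 226, 214,
              190, 160, 142, 128, 106, 84, 72, 54]
PAL_RATES = [398, 354, 316, 298, 276, 236, 210, 198,
             176, 148, 132, 118, 98, 78, 66, 50]


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def scanlines(length_value, rate_index, pal=False):
    """Duration of a sample in scanlines, rounded up like the tables above."""
    bits = ((length_value << 4) | 1) * 8      # $4013 value -> bytes -> bits
    if pal:
        cycles = bits * PAL_RATES[rate_index]
        # 341 dots per line / 3.2 dots per CPU cycle = 1705 / 16 CPU cycles
        return ceil_div(cycles * 16, 1705)
    cycles = bits * NTSC_RATES[rate_index]
    # 341 dots per line / 3 dots per CPU cycle = 341 / 3 CPU cycles
    return ceil_div(cycles * 3, 341)


for pal in (False, True):
    print("PAL" if pal else "NTSC")
    for length_value in range(4):
        print(length_value, [scanlines(length_value, r, pal) for r in range(16)])
```

Running it reproduces every numeric entry. The combinations marked ** all come out above 240.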
Number of scanlines to wait table
This table gives the best sample length and rate combination for each possible number of scanlines to wait. They are "best" because they minimize the time the CPU has to kill. It is still possible to use a setting that waits for fewer lines and kill more time in the interrupt handler before the video effect.
Because a PAL interrupt always happens at about the same time as, or slightly sooner than, an NTSC interrupt, the NTSC timing table is used to pick the "best" setting here:
Scanlines | Best option for IRQ |
---|---|
1-3 | Timed code |
4-5 | Length $0, rate $f |
6-7 | Length $0, rate $d |
8-9 | Length $0, rate $c |
10-11 | Length $0, rate $a |
12-13 | Length $0, rate $9 |
14-15 | Length $0, rate $8 |
16-17 | Length $0, rate $6 |
18-20 | Length $0, rate $5 |
21-22 | Length $0, rate $4 |
23 | Length $0, rate $3 |
24-26 | Length $0, rate $2 |
27-30 | Length $0, rate $1 |
31-64 | Length $0, rate $0 |
65-86 | Length $1, rate $f |
87-100 | Length $1, rate $e |
101-125 | Length $1, rate $d |
126 | Length $2, rate $f |
127-153 | Length $1, rate $c |
154-167 | Length $1, rate $b |
168-169 | Length $2, rate $e |
170-186 | Length $1, rate $a |
187-191 | Length $3, rate $f |
192-195 | Length $1, rate $9 |
196-227 | Length $2, rate $d |
228-239 | Length $1, rate $8 |
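The table is consistent with a simple rule. For n scanlines, take the combination (lengths $0-$3) whose exact NTSC duration is the longest that does not exceed n. The Python sketch below applies that rule. It is a reconstruction inferred from the table, not the author's stated method. Like the timing tables, it counts the whole sample (bits × period) as the IRQ delay. `best_option` is a hypothetical name.

```python
NTSC_RATES = [428, 380, 340, 320, 286, 254, 226, 214,
              190, 160, 142, 128, 106, 84, 72, 54]


def best_option(n):
    """Return (length value, rate index) for waiting n scanlines, or None.

    Picks the setting with the longest NTSC duration that does not exceed
    n scanlines, so the IRQ handler has the least time left to kill.
    None means no DMC setting is short enough and timed code is needed.
    """
    best, best_cycles = None, -1
    for length_value in range(4):                 # lengths $0-$3, as in the table
        bits = ((length_value << 4) | 1) * 8
        for rate_index, period in enumerate(NTSC_RATES):
            cycles = bits * period
            # cycles / (341 / 3) <= n, compared in integers to avoid rounding
            if cycles * 3 <= n * 341 and cycles > best_cycles:
                best, best_cycles = (length_value, rate_index), cycles
    return best


print(best_option(3))     # None: timed code
print(best_option(100))   # (1, 14), i.e. length $1, rate $e
print(best_option(126))   # (2, 15), i.e. length $2, rate $f
```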