User:Myask/Myapper thoughts
8x8 attributes, beginning of 8x1
8x1 attributes?
It's all in the fine Y, and in keeping track of it. The intensive solution is to watch the CPU's writes to the PPU registers and keep our own copy of PPUSCROLL, which requires knowing the write latch and thus watching several of the registers. Pattern-fetch snooping can yield fine Y, but since the pattern fetch comes after the attribute fetch, it does not work for the first fetch of a scanline. The triple fetch comes after said 2 pre-fetched tiles. It is really a double fetch that normally duplicates the next one, but VBLANK falls between the last line's double fetch and the first fetch of pre-render, and is interrupted by whatever the program wants to access. One could count tiles, then:
if (this_fetch == last_fetch) reset(counter);
if (is_AT(this_fetch) && is_NT(last_fetch)) counter += 1;
if (counter == (32? 33? 34?)) // override the usual pattern, we're doing first prefetch
// ...after which we can just let the pattern fetch give us the information
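As a rough model of the tile-counting pseudocode above, here is a minimal C sketch. The struct, the helper names, and the PREFETCH_TILE_COUNT value are assumptions made for illustration; the notes leave the real threshold open (32, 33, or 34), and the sketch does not settle that or account for fetches during sprite evaluation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the notes leave 32/33/34 open; 32 is only a placeholder. */
#define PREFETCH_TILE_COUNT 32u

/* Nametable space as seen during rendering: $2000-$2FFF. */
static bool in_nametable_space(uint16_t a) { return (a & 0x3000) == 0x2000; }

/* The last 64 bytes of each 1 KiB nametable hold the attribute table. */
static bool is_AT(uint16_t a) { return in_nametable_space(a) && (a & 0x03FF) >= 0x03C0; }
static bool is_NT(uint16_t a) { return in_nametable_space(a) && (a & 0x03FF) <  0x03C0; }

/* Hypothetical snooper state: previous PPU fetch address and tile count. */
typedef struct {
    uint16_t last_fetch;
    unsigned counter;
} TileCounter;

/* Feed one PPU fetch address; returns true while the counter sits at the
 * threshold, i.e. when the next tile would be treated as the first prefetch. */
static bool tile_counter_step(TileCounter *tc, uint16_t this_fetch)
{
    this_fetch &= 0x3FFF;                      /* PPU addresses are 14 bits */
    if (this_fetch == tc->last_fetch)
        tc->counter = 0;                       /* repeated fetch: resync */
    if (is_AT(this_fetch) && is_NT(tc->last_fetch))
        tc->counter += 1;                      /* NT followed by AT: one tile */
    tc->last_fetch = this_fetch;
    return tc->counter == PREFETCH_TILE_COUNT;
}
```

Initializing last_fetch to a value that can never appear on the bus (such as 0xFFFF) avoids a spurious reset on the very first fetch.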
But is this any simpler than register-snooping? This segues nicely into a scanline counter interrupt, though for some reason I have the thought of a nametable-relative Y-based interrupt instead (which is just a different frame of reference on the same thing.)
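For comparison, the register-snooping route might look something like the following C sketch. It tracks only the write toggle and the temporary address as affected by $2000, $2002, $2005, and $2006; it does not model the PPU's own address increments during rendering, which a full solution would still need. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shadow of the PPU's scroll state, rebuilt from CPU accesses. */
typedef struct {
    uint16_t t;       /* 15-bit temporary VRAM address: yyy NN YYYYY XXXXX */
    uint16_t v;       /* current VRAM address as last loaded from t */
    uint8_t  fine_x;  /* 3-bit fine X scroll */
    bool     w;       /* write toggle shared by $2005 and $2006 */
} ScrollSnoop;

/* Reading PPUSTATUS ($2002 and its mirrors) clears the write toggle. */
static void snoop_cpu_read(ScrollSnoop *s, uint16_t addr)
{
    if ((addr & 0xE007) == 0x2002)
        s->w = false;
}

static void snoop_cpu_write(ScrollSnoop *s, uint16_t addr, uint8_t d)
{
    if ((addr & 0xE000) != 0x2000)
        return;                                    /* not a PPU register */
    switch (addr & 7) {
    case 0:                                        /* PPUCTRL: nametable select */
        s->t = (uint16_t)((s->t & 0x73FF) | ((d & 0x03) << 10));
        break;
    case 5:                                        /* PPUSCROLL */
        if (!s->w) {                               /* first write: X scroll */
            s->t = (uint16_t)((s->t & 0x7FE0) | (d >> 3));
            s->fine_x = d & 0x07;
        } else {                                   /* second write: Y scroll */
            s->t = (uint16_t)((s->t & 0x0C1F) | ((d & 0x07) << 12) | ((d >> 3) << 5));
        }
        s->w = !s->w;
        break;
    case 6:                                        /* PPUADDR */
        if (!s->w) {                               /* high byte, bit 14 cleared */
            s->t = (uint16_t)((s->t & 0x00FF) | ((d & 0x3F) << 8));
        } else {                                   /* low byte, then v = t */
            s->t = (uint16_t)((s->t & 0x7F00) | d);
            s->v = s->t;
        }
        s->w = !s->w;
        break;
    default:
        break;
    }
}

/* Fine Y as last written by the program (bits 12-14 of t). */
static unsigned snoop_fine_y(const ScrollSnoop *s) { return (s->t >> 12) & 0x07u; }
```

Even in this reduced form it needs four registers' worth of decoding, which gives some feel for why the tile-counting approach is tempting.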
Flipping
BG tileflipping is pretty easy.
chr_a[2:0] = ppu_a[2:0] ^ {3{is_chr_access & vert_flip}};
ppu_d[7:0] = is_chr_access ? (horiz_flip ? chr_d[0:7] : chr_d[7:0]) : something_else;
If one stored the attributes in the same byte, one could piggyback on that to set the flip bits. But what to use the other four bits for? Just do two tiles per byte? Allow an MMC5-like extended tile index? Allow swapping of two colors in the attribute?
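The flip logic above can be modeled in software as a sanity check. This is a minimal C sketch, assuming the mapper already knows when an access is a pattern (CHR) fetch and which flip bits apply; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vertical flip: invert the row-select bits (A2..A0) of a pattern fetch.
 * Bit 3, which selects the bit plane, is left alone. */
static uint16_t flip_chr_address(uint16_t ppu_a, bool is_chr_access, bool vert_flip)
{
    return (is_chr_access && vert_flip) ? (uint16_t)(ppu_a ^ 0x0007) : ppu_a;
}

/* Reverse the bit order of one byte (what chr_d[0:7] stands for above). */
static uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap neighbors */
    return b;
}

/* Horizontal flip: mirror the pattern byte on its way back to the PPU. */
static uint8_t flip_chr_data(uint8_t chr_d, bool is_chr_access, bool horiz_flip)
{
    return (is_chr_access && horiz_flip) ? reverse_bits8(chr_d) : chr_d;
}
```

In hardware the bit reversal is just wiring plus a multiplexer, so the shifts and masks here exist only because C has no reversed part-select.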
DMA theft
If you watch for a $4014 write, you can then watch the DMA happen...and thus have another destination to copy it to. This could alleviate the VBLANK crunch a bit (for e.g. the 8x1 mapper), as DMA copies much faster than the program can.
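A minimal C sketch of the snooping idea, under some loud assumptions: the mapper sees every CPU write, the transfer starts at offset 0 of the shadow buffer (the real OAM destination depends on OAMADDR), and nothing else writes $2004 mid-transfer. The DmaSnoop type and its fields are hypothetical.

```c
#include <stdint.h>

/* Hypothetical mapper-side copy of whatever the OAM DMA transfers. */
typedef struct {
    uint8_t  shadow[256];  /* second destination for the DMA'd page */
    unsigned index;        /* next byte to capture */
    unsigned remaining;    /* bytes left in the current transfer */
} DmaSnoop;

/* Call for every CPU write the mapper observes. */
static void dma_snoop_write(DmaSnoop *s, uint16_t addr, uint8_t d)
{
    if (addr == 0x4014) {                 /* OAMDMA write: transfer begins */
        s->index = 0;
        s->remaining = 256;
        return;
    }
    /* The DMA unit delivers each byte as a write to OAMDATA ($2004). */
    if (s->remaining > 0 && (addr & 0xE007) == 0x2004) {
        s->shadow[s->index++] = d;
        s->remaining--;
    }
}
```

What the mapper does with the captured 256 bytes (for instance, treating part of the page as attribute data rather than sprites) is left open here, as it is in the notes.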