8: The *actual* $a instructions, pt. 2

Time to think about the data store a bit. The d0/d2/d4/d6 opcodes are very likely just ld/st instructions targeting it. However, the channel switch sequence is supposed to read/write the context DMA object, and the VP architecture diagram shows a line connecting the data store to the memory interface. This means we should likely be looking for a way to launch a bulk transfer between the data store and the VM. Presumably, this is done by the unknown instructions at the start and end of the channel switch sequences.

First off, a few notes about the simple load/store instructions:

  • Low 13 bits of the address are the actual address to load/store
  • Bits 16-28 or 16-29 of the address are used for the counter trick and thus presumably ignored by the actual load/store
  • Bits 13-15 may or may not be ignored
  • Bits 29-31 or 30-31 likely do something - they're set to 000 in the clear sequence, 110 in the channel switch sequence
  • We don't yet know what happens with misaligned load/stores
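
To keep these fields straight while reading the dumps, here's a minimal Python sketch that splits a 32-bit $a value along the boundaries listed above. The function name and field names are made up for illustration, and the boundaries themselves are still tentative (bit 29 may belong to either neighbouring field).

```python
def split_address(a):
    """Split a 32-bit $a value into the tentative fields from the notes.

    The field names are hypothetical; only the bit ranges come from the notes.
    """
    return {
        "offset": a & 0x1fff,            # bits 0-12: actual data store address
        "unknown": (a >> 13) & 0x7,      # bits 13-15: may or may not be ignored
        "counter": (a >> 16) & 0x1fff,   # bits 16-28: used by the counter trick
        "high": (a >> 29) & 0x7,         # bits 29-31: 000 in clear, 110 in channel switch
    }


if __name__ == "__main__":
    # 0xc0000000 is the $a0 value used by the channel switch sequence.
    print(split_address(0xc0000000))
```

With this split, 0xc0000000 comes out with the high field equal to 0b110, which matches the value seen in the channel switch sequence.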

Bits 30-31 and unaligned accesses seem to be worth checking. Let's write a code sample that sets data store address 4*i to 0xdead0000 | 4*i for i < 0x800, then dump it out with loads from addresses 0..0xffff, 0x20000000..0x2000ffff, 0x40000000..0x4000ffff and so on. We'll use all three opcodes currently suspected to be loads: d0, d1, and d2. d0 and d1 seem to load into $v, while d2 loads into $r.
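
For reference, the fill pattern and the probe windows described above can be sketched on the host side like this. This only models the experiment, it is not the code that ran on the hardware; the function names are invented, and little-endian byte order within each word is an assumption.

```python
def build_pattern(n_words=0x800):
    """Return a data store image where address 4*i holds 0xdead0000 | 4*i."""
    store = bytearray(4 * n_words)
    for i in range(n_words):
        addr = 4 * i
        # Assumption: words are stored little-endian.
        store[addr:addr + 4] = (0xdead0000 | addr).to_bytes(4, "little")
    return store


def probe_addresses():
    """Yield every address to load from: 0x10000 addresses per value of the high 3 bits."""
    for high in range(8):
        base = high << 29  # 0, 0x20000000, 0x40000000, ...
        yield from range(base, base + 0x10000)


if __name__ == "__main__":
    store = build_pattern()
    print(len(store), store[4:8].hex())
```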

Order of fields: high 3 bits of $a / low 16 bits of $a, d2 result, d0 result, d1 result

0/0000: dead0000 dead0000.dead0004.dead0008.dead000c 30201000.70605040.b0a09080.f0e0d0c0
0/0001: dead0000 dead0000.dead0004.dead0008.dead000c 00000000.00000000.00000000.00000000
0/0002: dead0000 dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
0/0003: dead0000 dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
0/0004: dead0004 dead0000.dead0004.dead0008.dead000c 34241404.74645444.b4a49484.f4e4d4c4
0/0005: dead0004 dead0000.dead0004.dead0008.dead000c 00000000.00000000.00000000.00000000
0/0006: dead0004 dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
0/0007: dead0004 dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
0/0008: dead0008 dead0000.dead0004.dead0008.dead000c 38281808.78685848.b8a89888.f8e8d8c8
0/0009: dead0008 dead0000.dead0004.dead0008.dead000c 00000000.00000000.00000000.00000000
0/000a: dead0008 dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
0/000b: dead0008 dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
0/000c: dead000c dead0000.dead0004.dead0008.dead000c 3c2c1c0c.7c6c5c4c.bcac9c8c.fcecdccc
0/000d: dead000c dead0000.dead0004.dead0008.dead000c 00000000.00000000.00000000.00000000
0/000e: dead000c dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
0/000f: dead000c dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
0/0010: dead0010 dead0010.dead0014.dead0018.dead001c 30201000.70605040.b0a09080.f0e0d0c0
0/0011: dead0010 dead0010.dead0014.dead0018.dead001c 00000000.00000000.00000000.00000000
0/0012: dead0010 dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
0/0013: dead0010 dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
0/0014: dead0014 dead0010.dead0014.dead0018.dead001c 34241404.74645444.b4a49484.f4e4d4c4
0/0015: dead0014 dead0010.dead0014.dead0018.dead001c 00000000.00000000.00000000.00000000
0/0016: dead0014 dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
0/0017: dead0014 dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
0/0018: dead0018 dead0010.dead0014.dead0018.dead001c 38281808.78685848.b8a89888.f8e8d8c8
0/0019: dead0018 dead0010.dead0014.dead0018.dead001c 00000000.00000000.00000000.00000000
0/001a: dead0018 dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
0/001b: dead0018 dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
0/001c: dead001c dead0010.dead0014.dead0018.dead001c 3c2c1c0c.7c6c5c4c.bcac9c8c.fcecdccc
0/001d: dead001c dead0010.dead0014.dead0018.dead001c 00000000.00000000.00000000.00000000
0/001e: dead001c dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
0/001f: dead001c dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
[...]
0/00f0: dead00f0 dead00f0.dead00f4.dead00f8.dead00fc 30201000.70605040.b0a09080.f0e0d0c0
0/00f1: dead00f0 dead00f0.dead00f4.dead00f8.dead00fc 00000000.00000000.00000000.00000000
0/00f2: dead00f0 dead00f0.dead00f4.dead00f8.dead00fc adadadad.adadadad.adadadad.adadadad
0/00f3: dead00f0 dead00f0.dead00f4.dead00f8.dead00fc dededede.dededede.dededede.dededede
0/00f4: dead00f4 dead00f0.dead00f4.dead00f8.dead00fc 34241404.74645444.b4a49484.f4e4d4c4
0/00f5: dead00f4 dead00f0.dead00f4.dead00f8.dead00fc 00000000.00000000.00000000.00000000
0/00f6: dead00f4 dead00f0.dead00f4.dead00f8.dead00fc adadadad.adadadad.adadadad.adadadad
0/00f7: dead00f4 dead00f0.dead00f4.dead00f8.dead00fc dededede.dededede.dededede.dededede
0/00f8: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc 38281808.78685848.b8a89888.f8e8d8c8
0/00f9: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc 00000000.00000000.00000000.00000000
0/00fa: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc adadadad.adadadad.adadadad.adadadad
0/00fb: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc dededede.dededede.dededede.dededede
0/00fc: dead00fc dead00f0.dead00f4.dead00f8.dead00fc 3c2c1c0c.7c6c5c4c.bcac9c8c.fcecdccc
0/00fd: dead00fc dead00f0.dead00f4.dead00f8.dead00fc 00000000.00000000.00000000.00000000
0/00fe: dead00fc dead00f0.dead00f4.dead00f8.dead00fc adadadad.adadadad.adadadad.adadadad
0/00ff: dead00fc dead00f0.dead00f4.dead00f8.dead00fc dededede.dededede.dededede.dededede
0/0100: dead0100 dead0100.dead0104.dead0108.dead010c 30201000.70605040.b0a09080.f0e0d0c0
0/0101: dead0100 dead0100.dead0104.dead0108.dead010c 01010101.01010101.01010101.01010101
0/0102: dead0100 dead0100.dead0104.dead0108.dead010c adadadad.adadadad.adadadad.adadadad
0/0103: dead0100 dead0100.dead0104.dead0108.dead010c dededede.dededede.dededede.dededede
0/0104: dead0104 dead0100.dead0104.dead0108.dead010c 34241404.74645444.b4a49484.f4e4d4c4
0/0105: dead0104 dead0100.dead0104.dead0108.dead010c 01010101.01010101.01010101.01010101
0/0106: dead0104 dead0100.dead0104.dead0108.dead010c adadadad.adadadad.adadadad.adadadad
0/0107: dead0104 dead0100.dead0104.dead0108.dead010c dededede.dededede.dededede.dededede
0/0108: dead0108 dead0100.dead0104.dead0108.dead010c 38281808.78685848.b8a89888.f8e8d8c8
0/0109: dead0108 dead0100.dead0104.dead0108.dead010c 01010101.01010101.01010101.01010101
0/010a: dead0108 dead0100.dead0104.dead0108.dead010c adadadad.adadadad.adadadad.adadadad
0/010b: dead0108 dead0100.dead0104.dead0108.dead010c dededede.dededede.dededede.dededede
0/010c: dead010c dead0100.dead0104.dead0108.dead010c 3c2c1c0c.7c6c5c4c.bcac9c8c.fcecdccc
0/010d: dead010c dead0100.dead0104.dead0108.dead010c 01010101.01010101.01010101.01010101
0/010e: dead010c dead0100.dead0104.dead0108.dead010c adadadad.adadadad.adadadad.adadadad
0/010f: dead010c dead0100.dead0104.dead0108.dead010c dededede.dededede.dededede.dededede
[...]

d0 and d2 look quite standard. Unaligned accesses aren't supported. d1 is more interesting: it seems to load 16 bytes from addresses spread out by 16 bytes, with the initial address always having its bits 4-7 forced to 0.

It looks like the data store may be arranged into 2D blocks of 16x16 bytes, with d0 (and d2) being horizontal loads and d1 being the vertical load. Bits 0-3 of the address would be the X coordinate inside the block, bits 4-7 the Y coordinate, and bits 8-12 would select the block. d0 always reads a full horizontal line of a block, d1 reads a full vertical line, and d2 reads a quarter of a horizontal line. That makes a lot of sense, given the 16x16 macroblock size used by most video codecs.
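
To check that this reading really explains the dump, here's a small Python model of the three loads for the case where the high bits of the address are 0. It's a sketch under the assumptions stated above (little-endian words, 16x16 blocks, only the low 13 address bits used); the function names are mine, not anything from the hardware.

```python
def build_pattern():
    """Data store image where address a (multiple of 4) holds 0xdead0000 | a."""
    store = bytearray(0x2000)
    for addr in range(0, 0x2000, 4):
        store[addr:addr + 4] = (0xdead0000 | addr).to_bytes(4, "little")
    return store


def words(data):
    """Format bytes as dot-separated little-endian 32-bit words, like the dump."""
    return ".".join("%08x" % int.from_bytes(data[i:i + 4], "little")
                    for i in range(0, len(data), 4))


def ld_d2(store, a):
    a &= 0x1ffc                      # aligned 32-bit word, a quarter of a line
    return bytes(store[a:a + 4])


def ld_d0(store, a):
    a &= 0x1ff0                      # full horizontal line of the block
    return bytes(store[a:a + 16])


def ld_d1(store, a):
    base = a & 0x1f0f                # Y coordinate (bits 4-7) forced to 0
    return bytes(store[base + 0x10 * y] for y in range(16))


def dump_line(store, a):
    return "0/%04x: %s %s %s" % (a, words(ld_d2(store, a)),
                                 words(ld_d0(store, a)), words(ld_d1(store, a)))


if __name__ == "__main__":
    store = build_pattern()
    for a in (0x0004, 0x001d, 0x0101):
        print(dump_line(store, a))
```

The model's output for 0/0004, 0/001d and 0/0101 agrees with the corresponding rows of the dump above.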

Let's look at the remainder of the dump. The pattern continues for addresses up to 0x1fff. Bits 13-15 and 29 of the address appear to be ignored. Interesting things start to happen with non-0 bits 30-31, though.

2/0000: dead0000 dead0000.dead0004.dead0008.dead000c 60402000.e0c0a080.68482808.e8c8a888
2/0001: dead0000 dead0000.dead0004.dead0008.dead000c 00000000.00000000.01010101.01010101
2/0002: dead0000 dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
2/0003: dead0000 dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
2/0004: dead0004 dead0000.dead0004.dead0008.dead000c 64442404.e4c4a484.6c4c2c0c.ecccac8c
2/0005: dead0004 dead0000.dead0004.dead0008.dead000c 00000000.00000000.01010101.01010101
2/0006: dead0004 dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
2/0007: dead0004 dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
2/0008: dead0008 dead0000.dead0004.dead0008.dead000c 68482808.e8c8a888.60402000.e0c0a080
2/0009: dead0008 dead0000.dead0004.dead0008.dead000c 00000000.00000000.01010101.01010101
2/000a: dead0008 dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
2/000b: dead0008 dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
2/000c: dead000c dead0000.dead0004.dead0008.dead000c 6c4c2c0c.ecccac8c.64442404.e4c4a484
2/000d: dead000c dead0000.dead0004.dead0008.dead000c 00000000.00000000.01010101.01010101
2/000e: dead000c dead0000.dead0004.dead0008.dead000c adadadad.adadadad.adadadad.adadadad
2/000f: dead000c dead0000.dead0004.dead0008.dead000c dededede.dededede.dededede.dededede
2/0010: dead0010 dead0010.dead0014.dead0018.dead001c 70503010.f0d0b090.78583818.f8d8b898
2/0011: dead0010 dead0010.dead0014.dead0018.dead001c 00000000.00000000.01010101.01010101
2/0012: dead0010 dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
2/0013: dead0010 dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
2/0014: dead0014 dead0010.dead0014.dead0018.dead001c 74543414.f4d4b494.7c5c3c1c.fcdcbc9c
2/0015: dead0014 dead0010.dead0014.dead0018.dead001c 00000000.00000000.01010101.01010101
2/0016: dead0014 dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
2/0017: dead0014 dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
2/0018: dead0018 dead0010.dead0014.dead0018.dead001c 78583818.f8d8b898.70503010.f0d0b090
2/0019: dead0018 dead0010.dead0014.dead0018.dead001c 00000000.00000000.01010101.01010101
2/001a: dead0018 dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
2/001b: dead0018 dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
2/001c: dead001c dead0010.dead0014.dead0018.dead001c 7c5c3c1c.fcdcbc9c.74543414.f4d4b494
2/001d: dead001c dead0010.dead0014.dead0018.dead001c 00000000.00000000.01010101.01010101
2/001e: dead001c dead0010.dead0014.dead0018.dead001c adadadad.adadadad.adadadad.adadadad
2/001f: dead001c dead0010.dead0014.dead0018.dead001c dededede.dededede.dededede.dededede
2/0020: dead0020 dead0020.dead0024.dead0028.dead002c 60402000.e0c0a080.68482808.e8c8a888
2/0021: dead0020 dead0020.dead0024.dead0028.dead002c 00000000.00000000.01010101.01010101
2/0022: dead0020 dead0020.dead0024.dead0028.dead002c adadadad.adadadad.adadadad.adadadad
2/0023: dead0020 dead0020.dead0024.dead0028.dead002c dededede.dededede.dededede.dededede
2/0024: dead0024 dead0020.dead0024.dead0028.dead002c 64442404.e4c4a484.6c4c2c0c.ecccac8c
2/0025: dead0024 dead0020.dead0024.dead0028.dead002c 00000000.00000000.01010101.01010101
2/0026: dead0024 dead0020.dead0024.dead0028.dead002c adadadad.adadadad.adadadad.adadadad
2/0027: dead0024 dead0020.dead0024.dead0028.dead002c dededede.dededede.dededede.dededede
[...]
2/00f8: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc 78583818.f8d8b898.70503010.f0d0b090
2/00f9: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc 00000000.00000000.01010101.01010101
2/00fa: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc adadadad.adadadad.adadadad.adadadad
2/00fb: dead00f8 dead00f0.dead00f4.dead00f8.dead00fc dededede.dededede.dededede.dededede
2/00fc: dead00fc dead00f0.dead00f4.dead00f8.dead00fc 7c5c3c1c.fcdcbc9c.74543414.f4d4b494
2/00fd: dead00fc dead00f0.dead00f4.dead00f8.dead00fc 00000000.00000000.01010101.01010101
2/00fe: dead00fc dead00f0.dead00f4.dead00f8.dead00fc adadadad.adadadad.adadadad.adadadad
2/00ff: dead00fc dead00f0.dead00f4.dead00f8.dead00fc dededede.dededede.dededede.dededede
2/0100: dead0108 dead0108.dead010c.dead0100.dead0104 60402000.e0c0a080.68482808.e8c8a888
2/0101: dead0108 dead0108.dead010c.dead0100.dead0104 00000000.00000000.01010101.01010101
2/0102: dead0108 dead0108.dead010c.dead0100.dead0104 adadadad.adadadad.adadadad.adadadad
2/0103: dead0108 dead0108.dead010c.dead0100.dead0104 dededede.dededede.dededede.dededede
2/0104: dead010c dead0108.dead010c.dead0100.dead0104 64442404.e4c4a484.6c4c2c0c.ecccac8c
2/0105: dead010c dead0108.dead010c.dead0100.dead0104 00000000.00000000.01010101.01010101
2/0106: dead010c dead0108.dead010c.dead0100.dead0104 adadadad.adadadad.adadadad.adadadad
2/0107: dead010c dead0108.dead010c.dead0100.dead0104 dededede.dededede.dededede.dededede
2/0108: dead0100 dead0108.dead010c.dead0100.dead0104 68482808.e8c8a888.60402000.e0c0a080
2/0109: dead0100 dead0108.dead010c.dead0100.dead0104 00000000.00000000.01010101.01010101
2/010a: dead0100 dead0108.dead010c.dead0100.dead0104 adadadad.adadadad.adadadad.adadadad
2/010b: dead0100 dead0108.dead010c.dead0100.dead0104 dededede.dededede.dededede.dededede
2/010c: dead0104 dead0108.dead010c.dead0100.dead0104 6c4c2c0c.ecccac8c.64442404.e4c4a484
2/010d: dead0104 dead0108.dead010c.dead0100.dead0104 00000000.00000000.01010101.01010101
2/010e: dead0104 dead0108.dead010c.dead0100.dead0104 adadadad.adadadad.adadadad.adadadad
2/010f: dead0104 dead0108.dead010c.dead0100.dead0104 dededede.dededede.dededede.dededede
2/0110: dead0118 dead0118.dead011c.dead0110.dead0114 70503010.f0d0b090.78583818.f8d8b898
2/0111: dead0118 dead0118.dead011c.dead0110.dead0114 00000000.00000000.01010101.01010101
2/0112: dead0118 dead0118.dead011c.dead0110.dead0114 adadadad.adadadad.adadadad.adadadad
2/0113: dead0118 dead0118.dead011c.dead0110.dead0114 dededede.dededede.dededede.dededede
2/0114: dead011c dead0118.dead011c.dead0110.dead0114 74543414.f4d4b494.7c5c3c1c.fcdcbc9c
2/0115: dead011c dead0118.dead011c.dead0110.dead0114 00000000.00000000.01010101.01010101
2/0116: dead011c dead0118.dead011c.dead0110.dead0114 adadadad.adadadad.adadadad.adadadad
2/0117: dead011c dead0118.dead011c.dead0110.dead0114 dededede.dededede.dededede.dededede
[...]

Looking at the d0/d2 results, bit 3 of the address appears to be flipped whenever bit 8 is 1. More interesting stuff now happens in the d1 opcode: the stride is now bumped to 0x20 bytes. So, bits 0-4 would be the X coord, and bits 5+ the Y coord. The d1 op does the same bit 3 flipping as the d0/d2 ops, so if we only used addresses with bits 30-31 set to 01, we'd never notice the flip.
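
Here's the same kind of Python model for bits 30-31 = 01, to make sure the bit 3 flip and the 0x20 stride are enough to explain the rows above. It's a sketch fitted only to the part of the dump shown here: I'm assuming the flip depends on bit 8 alone, that d1 forces bits 5-8 of the start address to 0, and that words are little-endian. The function names are made up.

```python
def build_pattern():
    """Data store image where address a (multiple of 4) holds 0xdead0000 | a."""
    store = bytearray(0x2000)
    for addr in range(0, 0x2000, 4):
        store[addr:addr + 4] = (0xdead0000 | addr).to_bytes(4, "little")
    return store


def phys(a):
    """Assumed mangling for stride 0x20: flip bit 3 whenever bit 8 is set."""
    a &= 0x1fff
    return a ^ 8 if a & 0x100 else a


def words(data):
    return ".".join("%08x" % int.from_bytes(data[i:i + 4], "little")
                    for i in range(0, len(data), 4))


def ld_d2(store, a):
    p = phys(a & 0x1ffc)             # the flip never touches bits 0-1
    return bytes(store[p:p + 4])


def ld_d0(store, a):
    line = a & 0x1ff0
    return bytes(store[phys(line + x)] for x in range(16))


def ld_d1(store, a):
    base = a & 0x1e1f                # bits 5-8 (low 4 bits of Y) forced to 0
    return bytes(store[phys(base + 0x20 * y)] for y in range(16))


def dump_line(store, a):
    return "2/%04x: %s %s %s" % (a, words(ld_d2(store, a)),
                                 words(ld_d0(store, a)), words(ld_d1(store, a)))


if __name__ == "__main__":
    store = build_pattern()
    for a in (0x0008, 0x0100, 0x0114):
        print(dump_line(store, a))
```

The lines it prints for 2/0008, 2/0100 and 2/0114 agree with the dump, including the swapped halves in the d0 result once bit 8 is set.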

A similar thing, though with much uglier address mangling, happens when these bits are set to 10 (stride 0x40) and 11 (stride 0x80):

4/0000: dead0000 dead0000.dead0004.dead0008.dead000c 00adde00.01adde04.02adde08.03adde0c
4/0001: dead0000 dead0000.dead0004.dead0008.dead000c adde4000.adde4401.adde4802.adde4c03
4/0002: dead0000 dead0000.dead0004.dead0008.dead000c de8000ad.de8401ad.de8802ad.de8c03ad
4/0003: dead0000 dead0000.dead0004.dead0008.dead000c c000adde.c401adde.c802adde.cc03adde
4/0004: dead0004 dead0000.dead0004.dead0008.dead000c 00adde04.01adde08.02adde0c.03adde00
4/0005: dead0004 dead0000.dead0004.dead0008.dead000c adde4400.adde4801.adde4c02.adde4003
4/0006: dead0004 dead0000.dead0004.dead0008.dead000c de8400ad.de8801ad.de8c02ad.de8003ad
4/0007: dead0004 dead0000.dead0004.dead0008.dead000c c400adde.c801adde.cc02adde.c003adde
4/0008: dead0008 dead0000.dead0004.dead0008.dead000c 00adde08.01adde0c.02adde00.03adde04
4/0009: dead0008 dead0000.dead0004.dead0008.dead000c adde4800.adde4c01.adde4002.adde4403
4/000a: dead0008 dead0000.dead0004.dead0008.dead000c de8800ad.de8c01ad.de8002ad.de8403ad
4/000b: dead0008 dead0000.dead0004.dead0008.dead000c c800adde.cc01adde.c002adde.c403adde
4/000c: dead000c dead0000.dead0004.dead0008.dead000c 00adde0c.01adde00.02adde04.03adde08
4/000d: dead000c dead0000.dead0004.dead0008.dead000c adde4c00.adde4001.adde4402.adde4803
4/000e: dead000c dead0000.dead0004.dead0008.dead000c de8c00ad.de8001ad.de8402ad.de8803ad
4/000f: dead000c dead0000.dead0004.dead0008.dead000c cc00adde.c001adde.c402adde.c803adde
4/0010: dead0010 dead0010.dead0014.dead0018.dead001c 00adde10.01adde14.02adde18.03adde1c
4/0011: dead0010 dead0010.dead0014.dead0018.dead001c adde5000.adde5401.adde5802.adde5c03
4/0012: dead0010 dead0010.dead0014.dead0018.dead001c de9000ad.de9401ad.de9802ad.de9c03ad
4/0013: dead0010 dead0010.dead0014.dead0018.dead001c d000adde.d401adde.d802adde.dc03adde
4/0014: dead0014 dead0010.dead0014.dead0018.dead001c 00adde14.01adde18.02adde1c.03adde10
4/0015: dead0014 dead0010.dead0014.dead0018.dead001c adde5400.adde5801.adde5c02.adde5003
4/0016: dead0014 dead0010.dead0014.dead0018.dead001c de9400ad.de9801ad.de9c02ad.de9003ad
4/0017: dead0014 dead0010.dead0014.dead0018.dead001c d400adde.d801adde.dc02adde.d003adde
4/0018: dead0018 dead0010.dead0014.dead0018.dead001c 00adde18.01adde1c.02adde10.03adde14
4/0019: dead0018 dead0010.dead0014.dead0018.dead001c adde5800.adde5c01.adde5002.adde5403
4/001a: dead0018 dead0010.dead0014.dead0018.dead001c de9800ad.de9c01ad.de9002ad.de9403ad
4/001b: dead0018 dead0010.dead0014.dead0018.dead001c d800adde.dc01adde.d002adde.d403adde
4/001c: dead001c dead0010.dead0014.dead0018.dead001c 00adde1c.01adde10.02adde14.03adde18
4/001d: dead001c dead0010.dead0014.dead0018.dead001c adde5c00.adde5001.adde5402.adde5803
4/001e: dead001c dead0010.dead0014.dead0018.dead001c de9c00ad.de9001ad.de9402ad.de9803ad
4/001f: dead001c dead0010.dead0014.dead0018.dead001c dc00adde.d001adde.d402adde.d803adde
4/0020: ad0020de ad0020de.ad0024de.ad0028de.ad002cde ec00adde.e001adde.e402adde.e803adde
4/0021: ad0020de ad0020de.ad0024de.ad0028de.ad002cde 00adde20.01adde24.02adde28.03adde2c
4/0022: ad0020de ad0020de.ad0024de.ad0028de.ad002cde adde6000.adde6401.adde6802.adde6c03
4/0023: ad0020de ad0020de.ad0024de.ad0028de.ad002cde dea000ad.dea401ad.dea802ad.deac03ad
4/0024: ad0024de ad0020de.ad0024de.ad0028de.ad002cde e000adde.e401adde.e802adde.ec03adde
4/0025: ad0024de ad0020de.ad0024de.ad0028de.ad002cde 00adde24.01adde28.02adde2c.03adde20
4/0026: ad0024de ad0020de.ad0024de.ad0028de.ad002cde adde6400.adde6801.adde6c02.adde6003
4/0027: ad0024de ad0020de.ad0024de.ad0028de.ad002cde dea400ad.dea801ad.deac02ad.dea003ad
4/0028: ad0028de ad0020de.ad0024de.ad0028de.ad002cde e400adde.e801adde.ec02adde.e003adde
4/0029: ad0028de ad0020de.ad0024de.ad0028de.ad002cde 00adde28.01adde2c.02adde20.03adde24
4/002a: ad0028de ad0020de.ad0024de.ad0028de.ad002cde adde6800.adde6c01.adde6002.adde6403
4/002b: ad0028de ad0020de.ad0024de.ad0028de.ad002cde dea800ad.deac01ad.dea002ad.dea403ad
4/002c: ad002cde ad0020de.ad0024de.ad0028de.ad002cde e800adde.ec01adde.e002adde.e403adde
4/002d: ad002cde ad0020de.ad0024de.ad0028de.ad002cde 00adde2c.01adde20.02adde24.03adde28
4/002e: ad002cde ad0020de.ad0024de.ad0028de.ad002cde adde6c00.adde6001.adde6402.adde6803
4/002f: ad002cde ad0020de.ad0024de.ad0028de.ad002cde deac00ad.dea001ad.dea402ad.dea803ad
4/0030: ad0030de ad0030de.ad0034de.ad0038de.ad003cde fc00adde.f001adde.f402adde.f803adde
4/0031: ad0030de ad0030de.ad0034de.ad0038de.ad003cde 00adde30.01adde34.02adde38.03adde3c
4/0032: ad0030de ad0030de.ad0034de.ad0038de.ad003cde adde7000.adde7401.adde7802.adde7c03
4/0033: ad0030de ad0030de.ad0034de.ad0038de.ad003cde deb000ad.deb401ad.deb802ad.debc03ad
4/0034: ad0034de ad0030de.ad0034de.ad0038de.ad003cde f000adde.f401adde.f802adde.fc03adde
4/0035: ad0034de ad0030de.ad0034de.ad0038de.ad003cde 00adde34.01adde38.02adde3c.03adde30
4/0036: ad0034de ad0030de.ad0034de.ad0038de.ad003cde adde7400.adde7801.adde7c02.adde7003
4/0037: ad0034de ad0030de.ad0034de.ad0038de.ad003cde deb400ad.deb801ad.debc02ad.deb003ad
4/0038: ad0038de ad0030de.ad0034de.ad0038de.ad003cde f400adde.f801adde.fc02adde.f003adde
4/0039: ad0038de ad0030de.ad0034de.ad0038de.ad003cde 00adde38.01adde3c.02adde30.03adde34
4/003a: ad0038de ad0030de.ad0034de.ad0038de.ad003cde adde7800.adde7c01.adde7002.adde7403
4/003b: ad0038de ad0030de.ad0034de.ad0038de.ad003cde deb800ad.debc01ad.deb002ad.deb403ad
4/003c: ad003cde ad0030de.ad0034de.ad0038de.ad003cde f800adde.fc01adde.f002adde.f403adde
4/003d: ad003cde ad0030de.ad0034de.ad0038de.ad003cde 00adde3c.01adde30.02adde34.03adde38
4/003e: ad003cde ad0030de.ad0034de.ad0038de.ad003cde adde7c00.adde7001.adde7402.adde7803
4/003f: ad003cde ad0030de.ad0034de.ad0038de.ad003cde debc00ad.deb001ad.deb402ad.deb803ad
4/0040: ad0040de ad0040de.ad0044de.ad0048de.ad004cde 00adde00.01adde04.02adde08.03adde0c
4/0041: ad0040de ad0040de.ad0044de.ad0048de.ad004cde adde4000.adde4401.adde4802.adde4c03
4/0042: ad0040de ad0040de.ad0044de.ad0048de.ad004cde de8000ad.de8401ad.de8802ad.de8c03ad
4/0043: ad0040de ad0040de.ad0044de.ad0048de.ad004cde c000adde.c401adde.c802adde.cc03adde
4/0044: ad0044de ad0040de.ad0044de.ad0048de.ad004cde 00adde04.01adde08.02adde0c.03adde00
4/0045: ad0044de ad0040de.ad0044de.ad0048de.ad004cde adde4400.adde4801.adde4c02.adde4003
4/0046: ad0044de ad0040de.ad0044de.ad0048de.ad004cde de8400ad.de8801ad.de8c02ad.de8003ad
4/0047: ad0044de ad0040de.ad0044de.ad0048de.ad004cde c400adde.c801adde.cc02adde.c003adde
4/0048: ad0048de ad0040de.ad0044de.ad0048de.ad004cde 00adde08.01adde0c.02adde00.03adde04
4/0049: ad0048de ad0040de.ad0044de.ad0048de.ad004cde adde4800.adde4c01.adde4002.adde4403
4/004a: ad0048de ad0040de.ad0044de.ad0048de.ad004cde de8800ad.de8c01ad.de8002ad.de8403ad
4/004b: ad0048de ad0040de.ad0044de.ad0048de.ad004cde c800adde.cc01adde.c002adde.c403adde
4/004c: ad004cde ad0040de.ad0044de.ad0048de.ad004cde 00adde0c.01adde00.02adde04.03adde08
4/004d: ad004cde ad0040de.ad0044de.ad0048de.ad004cde adde4c00.adde4001.adde4402.adde4803
4/004e: ad004cde ad0040de.ad0044de.ad0048de.ad004cde de8c00ad.de8001ad.de8402ad.de8803ad
4/004f: ad004cde ad0040de.ad0044de.ad0048de.ad004cde cc00adde.c001adde.c402adde.c803adde
4/0050: ad0050de ad0050de.ad0054de.ad0058de.ad005cde 00adde10.01adde14.02adde18.03adde1c
4/0051: ad0050de ad0050de.ad0054de.ad0058de.ad005cde adde5000.adde5401.adde5802.adde5c03
4/0052: ad0050de ad0050de.ad0054de.ad0058de.ad005cde de9000ad.de9401ad.de9802ad.de9c03ad
4/0053: ad0050de ad0050de.ad0054de.ad0058de.ad005cde d000adde.d401adde.d802adde.dc03adde
4/0054: ad0054de ad0050de.ad0054de.ad0058de.ad005cde 00adde14.01adde18.02adde1c.03adde10
4/0055: ad0054de ad0050de.ad0054de.ad0058de.ad005cde adde5400.adde5801.adde5c02.adde5003
4/0056: ad0054de ad0050de.ad0054de.ad0058de.ad005cde de9400ad.de9801ad.de9c02ad.de9003ad
4/0057: ad0054de ad0050de.ad0054de.ad0058de.ad005cde d400adde.d801adde.dc02adde.d003adde
4/0058: ad0058de ad0050de.ad0054de.ad0058de.ad005cde 00adde18.01adde1c.02adde10.03adde14
4/0059: ad0058de ad0050de.ad0054de.ad0058de.ad005cde adde5800.adde5c01.adde5002.adde5403
4/005a: ad0058de ad0050de.ad0054de.ad0058de.ad005cde de9800ad.de9c01ad.de9002ad.de9403ad
4/005b: ad0058de ad0050de.ad0054de.ad0058de.ad005cde d800adde.dc01adde.d002adde.d403adde
4/005c: ad005cde ad0050de.ad0054de.ad0058de.ad005cde 00adde1c.01adde10.02adde14.03adde18
4/005d: ad005cde ad0050de.ad0054de.ad0058de.ad005cde adde5c00.adde5001.adde5402.adde5803
4/005e: ad005cde ad0050de.ad0054de.ad0058de.ad005cde de9c00ad.de9001ad.de9402ad.de9803ad
4/005f: ad005cde ad0050de.ad0054de.ad0058de.ad005cde dc00adde.d001adde.d402adde.d803adde
4/0060: 0060dead 0060dead.0064dead.0068dead.006cdead ec00adde.e001adde.e402adde.e803adde
4/0061: 0060dead 0060dead.0064dead.0068dead.006cdead 00adde20.01adde24.02adde28.03adde2c
4/0062: 0060dead 0060dead.0064dead.0068dead.006cdead adde6000.adde6401.adde6802.adde6c03
4/0063: 0060dead 0060dead.0064dead.0068dead.006cdead dea000ad.dea401ad.dea802ad.deac03ad
4/0064: 0064dead 0060dead.0064dead.0068dead.006cdead e000adde.e401adde.e802adde.ec03adde
4/0065: 0064dead 0060dead.0064dead.0068dead.006cdead 00adde24.01adde28.02adde2c.03adde20
4/0066: 0064dead 0060dead.0064dead.0068dead.006cdead adde6400.adde6801.adde6c02.adde6003
4/0067: 0064dead 0060dead.0064dead.0068dead.006cdead dea400ad.dea801ad.deac02ad.dea003ad
4/0068: 0068dead 0060dead.0064dead.0068dead.006cdead e400adde.e801adde.ec02adde.e003adde
4/0069: 0068dead 0060dead.0064dead.0068dead.006cdead 00adde28.01adde2c.02adde20.03adde24
4/006a: 0068dead 0060dead.0064dead.0068dead.006cdead adde6800.adde6c01.adde6002.adde6403
4/006b: 0068dead 0060dead.0064dead.0068dead.006cdead dea800ad.deac01ad.dea002ad.dea403ad
4/006c: 006cdead 0060dead.0064dead.0068dead.006cdead e800adde.ec01adde.e002adde.e403adde
[...]

I won't dig too much into this mangling now. My guess is that the data store is organized into 16 banks, with the addresses carefully mangled into bank index + physical bank address in such a way that both horizontal and vertical 16-byte loads always hit each bank with exactly one byte access. This mangling needs to be different for every supported stride - so the assumption is that memory is never accessed with different stride parameters (and if it is, you don't care about its previous contents). The mangling also never affects any address bits other than 0-3, so it's safe to have the data store split into several areas using different strides.
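
To illustrate why some kind of mangling is needed at all under the 16-bank guess, here's a toy Python sketch. The bank function in it is purely hypothetical (a plain XOR skew of X by Y) and is not the mangling the hardware uses; it only shows that such a mapping can make both a horizontal and a vertical 16-byte access touch every bank exactly once, while the naive mapping can't.

```python
def naive_bank(x, y):
    """Bank picked straight from the X coordinate: fine for rows, bad for columns."""
    return x & 0xf


def skewed_bank(x, y):
    """Hypothetical skew: XOR the X coordinate with the Y coordinate."""
    return (x ^ y) & 0xf


def conflict_free(banks):
    """True if 16 byte accesses land in 16 distinct banks."""
    return len(set(banks)) == 16


def check(bank):
    """Check every row and every column of a 16x16 block."""
    rows_ok = all(conflict_free([bank(x, y) for x in range(16)]) for y in range(16))
    cols_ok = all(conflict_free([bank(x, y) for y in range(16)]) for x in range(16))
    return rows_ok, cols_ok


if __name__ == "__main__":
    print("naive :", check(naive_bank))
    print("skewed:", check(skewed_bank))
```

With the naive mapping, a vertical load hits the same bank 16 times. With the skewed one, both directions are conflict-free.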

Save start:

00000061: cd000000     sethi $a0 0
00000061: cc000000     setlo $a0 0
00000061: cd081f7f     sethi $a1 0x1f7f0000
00000061: cc080080     setlo $a1 0x80
00000062: cd10c000     sethi $a2 0xc0000000
00000062: cc100000     setlo $a2 0
00000062: c700a000     ??? [unknown: c700a000]
00000062: cd000000     sethi $a0 0
00000063: cc001000     setlo $a0 0x1000
00000063: cc101000     setlo $a2 0x1000
00000063: c700a000     ??? [unknown: c700a000]
00000063: ce090000     ??? [unknown: ce090000]
00000064: cf090000     ??? [unknown: cf090000]
00000064: cd00c000     sethi $a0 0xc0000000
00000064: cc000000     setlo $a0 0
00000064: ef0001ff     bnop

Save end:

00000104: cd000000     sethi $a0 0
00000104: cc002000     setlo $a0 0x2000
00000104: cd081f7f     sethi $a1 0x1f7f0000
00000104: cc080080     setlo $a1 0x80
00000105: cd10c000     sethi $a2 0xc0000000
00000105: cc100000     setlo $a2 0
00000105: c700a000     ??? [unknown: c700a000]
00000105: ce090000     ??? [unknown: ce090000]
00000106: cf090000     ??? [unknown: cf090000]

Load start:

0000010b: cd000000   B sethi $a0 0
0000010b: cc000000     setlo $a0 0
0000010b: cc180000     setlo $a3 0
0000010b: cd180000     sethi $a3 0
0000010c: 65000000     mov $r0 0
0000010c: 75000000     sethi $r0 0
0000010c: 6a000007     mov $v0 0 $r0
0000010c: 6a00000f     mov $v0 0x1 $r0
0000010d: 6a000017     mov $v0 0x2 $r0
0000010d: 6a00001f     mov $v0 0x3 $r0
0000010d: d4000007     st [$a0++0] $v0
0000010d: c80001e7     ??? [unknown: c80001e7]
0000010e: cd000000     sethi $a0 0
0000010e: cc002000     setlo $a0 0x2000
0000010e: cd081f7f     sethi $a1 0x1f7f0000
0000010e: cc080080     setlo $a1 0x80
0000010f: cd10c000     sethi $a2 0xc0000000
0000010f: cc100000     setlo $a2 0
0000010f: c3102000     ??? [unknown: c3102000]
0000010f: ad07c000     ??? [unknown: ad07c000]
00000110: ad07c001     ??? [unknown: ad07c001]
00000110: ad07c002     ??? [unknown: ad07c002]
00000110: ad07c003     ??? [unknown: ad07c003]
00000110: 80000000     ??? [unknown: 80000000]
00000111: d3000030     xor $a0 $c0 $a0 $a0
00000111: ca07c1c0     hadd $a0 $c0 $a0
00000111: d3000031     xor $a0 $c1 $a0 $a0
00000111: ca07c1c1     hadd $a0 $c1 $a0
00000112: d3000032     xor $a0 $c2 $a0 $a0
00000112: ca07c1c2     hadd $a0 $c2 $a0
00000112: d3000033     xor $a0 $c3 $a0 $a0
00000112: ca07c1c3     hadd $a0 $c3 $a0
00000113: f000b6db     mov $l0 $c0 0xb6db
00000113: f008246d     mov $l1 $c1 0x246d
00000113: f0102f76     mov $l2 $c2 0x2f76
00000113: f0185cdf     mov $l3 $c3 0x5cdf
00000114: 4cffffc0     add 0 $c0 0 0
00000114: 4cffffc1     add 0 $c1 0 0
00000114: 4cffffc2     add 0 $c2 0 0
00000114: 4cffffc3     add 0 $c3 0 0
00000115: ce090001     ??? [unknown: ce090001]
00000115: cf090001     ??? [unknown: cf090001]
00000115: df000007     anop
00000115: df000007     anop
00000116: df000007     anop
00000116: cd00c000     sethi $a0 0xc0000000
00000116: cc000000     setlo $a0 0
00000116: df000007     anop

Load end:

000001b7: cd000000     sethi $a0 0
000001b7: cc000000     setlo $a0 0
000001b7: cd081f7f     sethi $a1 0x1f7f0000
000001b7: cc080080     setlo $a1 0x80
000001b8: cd10c000     sethi $a2 0xc0000000
000001b8: cc100000     setlo $a2 0
000001b8: c3102000     ??? [unknown: c3102000]
000001b8: cd000000     sethi $a0 0
000001b9: cc001000     setlo $a0 0x1000
000001b9: cc101000     setlo $a2 0x1000
000001b9: c3102000     ??? [unknown: c3102000]
000001b9: ce090001     ??? [unknown: ce090001]
000001ba: cf090001     ??? [unknown: cf090001]

There's also the weird semi-save sequence:

00000008: cd00c000   B sethi $a0 0xc0000000
00000008: cc000000     setlo $a0 0
00000008: f000001f     mov $l0 $c0 0x1f
00000008: ef0001ff     bnop
00000009: 6b0840bf   B mov $r1 $z1
00000009: d6004027     st [$a0++0x4] $r1
00000009: 6b0000bf     mov $r0 $z0
00000009: e30001a0     bra loop $l0 $c0 $l0 not $c0 lzf 0x9
0000000a: d6000027     st [$a0++0x4] $r0
0000000a: ef0001ff     bnop
0000000a: bf000007     vnop
0000000a: ef0001ff     bnop
0000000b: 6b0000a7     mov $r0 $x0
0000000b: d6000027     st [$a0++0x4] $r0
0000000b: bf000007     vnop
0000000b: ef0001ff     bnop
0000000c: 6b0040a7     mov $r0 $x1
0000000c: d6000027     st [$a0++0x4] $r0
0000000c: bf000007     vnop
0000000c: ef0001ff     bnop
[...]
00000049: 6b0780af     mov $r0 $x62
00000049: d6000027     st [$a0++0x4] $r0
00000049: bf000007     vnop
00000049: ef0001ff     bnop
0000004a: 6b07c0af     mov $r0 $x63
0000004a: d6000027     st [$a0++0x4] $r0
0000004a: bf000007     vnop
0000004a: ef0001ff     bnop
0000004b: 6b0000b7     mov $r0 $d0
0000004b: d6000027     st [$a0++0x4] $r0
0000004b: bf000007     vnop
0000004b: ef0001ff     bnop
0000004c: 6b0040b7     mov $r0 $d1
0000004c: d6000027     st [$a0++0x4] $r0
0000004c: bf000007     vnop
0000004c: ef0001ff     bnop
0000004d: 6b0080b7     mov $r0 $d2
0000004d: d6000027     st [$a0++0x4] $r0
0000004d: bf000007     vnop
0000004d: ef0001ff     bnop
[...]
00000052: 6b01c0b7     mov $r0 $d7
00000052: d6000027     st [$a0++0x4] $r0
00000052: bf000007     vnop
00000052: ef0001ff     bnop
00000053: 6b00c047     mov $r0 $sr3
00000053: d6000027     st [$a0++0x4] $r0
00000053: bf000007     vnop
00000053: ef0001ff     bnop
00000054: cd000000     sethi $a0 0
00000054: cc002380     setlo $a0 0x2380
00000054: cd08037f     sethi $a1 0x37f0000
00000054: cc080080     setlo $a1 0x80
00000055: cd10c000     sethi $a2 0xc0000000
00000055: cc100000     setlo $a2 0
00000055: c700a000     ??? [unknown: c700a000]
00000055: ce090000     ??? [unknown: ce090000]
00000056: cf090000     ??? [unknown: cf090000]
00000056: df000007     anop
00000056: df000007     anop
00000056: df000007     anop
00000057: df000007     anop
00000057: cd000000     sethi $a0 0
00000057: cc002580     setlo $a0 0x2580
00000057: cd080023     sethi $a1 0x230000
00000058: cc080080     setlo $a1 0x80
00000058: cd10c000     sethi $a2 0xc0000000
00000058: cc100200     setlo $a2 0x200
00000058: c700a000     ??? [unknown: c700a000]
00000059: ce090000     ??? [unknown: ce090000]
00000059: cf090000     ??? [unknown: cf090000]
00000059: df000007     anop
00000059: df000007     anop
0000005a: df000007     anop
0000005a: df000007     anop
0000005a: ef0001ff     bnop
0000005a: ef0001ff     bnop
0000005b: ef0001ff     bnop
0000005b: ef0001ff     bnop
0000005b: ef0001ff     bnop
0000005b: ef0001ff     bnop
0000005c: ef0001ff     bnop
0000005c: ef0001ff     bnop
0000005c: fff80000     exit 0
[...]

Seems like c7/ce/cf is the "store to external memory" sequence, c3/ce/cf is the "load from external memory" sequence. They take a lot of params, though.
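
To see the params side by side, here's a small Python sketch that replays the sethi/setlo instructions from a listing and records the $a register values at each transfer opcode. It assumes that sethi and setlo each write only their own 16-bit half, which fits the listings (e.g. `setlo $a2 0x1000` without a new sethi) but isn't confirmed. Which registers c7/c3 actually read is unknown, so it simply snapshots everything. The tuple encoding of the instructions is made up for this example.

```python
def track(ops):
    """Replay sethi/setlo and snapshot all registers at every other opcode."""
    regs = {}
    snapshots = []
    for op in ops:
        if op[0] == "sethi":
            _, reg, val = op
            # Assumption: sethi replaces bits 16-31 and keeps bits 0-15.
            regs[reg] = (regs.get(reg, 0) & 0x0000ffff) | (val & 0xffff0000)
        elif op[0] == "setlo":
            _, reg, val = op
            # Assumption: setlo replaces bits 0-15 and keeps bits 16-31.
            regs[reg] = (regs.get(reg, 0) & 0xffff0000) | (val & 0x0000ffff)
        else:
            snapshots.append((op[0], dict(regs)))
    return snapshots


# The "Save start" sequence, transcribed from the listing above.
SAVE_START = [
    ("sethi", "$a0", 0), ("setlo", "$a0", 0),
    ("sethi", "$a1", 0x1f7f0000), ("setlo", "$a1", 0x80),
    ("sethi", "$a2", 0xc0000000), ("setlo", "$a2", 0),
    ("c700a000",),
    ("sethi", "$a0", 0), ("setlo", "$a0", 0x1000),
    ("setlo", "$a2", 0x1000),
    ("c700a000",),
]

if __name__ == "__main__":
    for opcode, regs in track(SAVE_START):
        print(opcode, {r: "%08x" % v for r, v in sorted(regs.items())})
```

Under that assumption, the two c7 ops in the save start sequence see $a0/$a1/$a2 = 0/0x1f7f0080/0xc0000000 and 0x1000/0x1f7f0080/0xc0001000 respectively.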

Looking for a few more data samples, the only simple enough one I found for nv50 is part01.0 (which is identical on nv44, by the way):

00000000: cc000000     setlo $a0 0
00000000: cd000000     sethi $a0 0
00000000: 6ba7c0af     mov $r20 $x63
00000000: 65b80003     mov $r23 0x3
00000001: 75b80000     sethi $r23 0
00000001: 6ea500f7     sar $r20 $r20 0x1e
00000001: 42a52e47     and $r20 $r20 $r23
00000001: ef0001ff     bnop
00000002: 6ca5000f   B add $r20 $r20 0x1
00000002: 6a1d005f     mov $l3 $r20
00000002: 65b00003     mov $r22 0x3
00000002: e10001db     bra loop $l3 $c3 $l3 $c3 false 0x2
00000003: ef0001ff     bnop
00000003: e10011bb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000003: 6ba400af     mov $r20 $x48
00000003: ef0001ff     bnop
00000004: 6ba480af     mov $r20 $x50
00000004: e1000fbb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000004: ef0001ff     bnop
00000004: e1000fbb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000005: 6ba500af     mov $r20 $x52
00000005: ef0001ff     bnop
00000005: 6ba580af     mov $r20 $x54
00000005: e1000dbb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000006: ef0001ff     bnop
00000006: e1000bbb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000006: 6ba600af     mov $r20 $x56
00000006: ef0001ff     bnop
00000007: 6ba680af     mov $r20 $x58
00000007: e10009bb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000007: ef0001ff     bnop
00000007: e10009bb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000008: 6ba700af     mov $r20 $x60
00000008: ef0001ff     bnop
00000008: 6ba780af     mov $r20 $x62
00000008: e10007bb     bra loop $l3 $c3 $l3 $c3 lzf 0xb
00000009: ef0001ff     bnop
00000009: fff90000     exit intr 0
00000009: ef0001ff     bnop
00000009: ef0001ff     bnop
0000000a: ef0001ff     bnop
0000000a: ef0001ff     bnop
0000000a: bf000007     vnop
0000000a: ef0001ff     bnop
0000000b: 42ad2c47   B and $r21 $r20 $r22
0000000b: 42b5be4f     nxor $r22 $r22 0
0000000b: 42a52c47     and $r20 $r20 $r22
0000000b: 6bb800a7     mov $r23 $x0
0000000c: 65c8ffff     mov $r25 0xffff
0000000c: 75c8ffff     sethi $r25 0xffff0000
0000000c: 6ebdc007     sar $r23 $r23 0
0000000c: 42bdf247     and $r23 $r23 $r25
0000000d: 6bc040a7     mov $r24 $x1
0000000d: 65c8ffff     mov $r25 0xffff
0000000d: 75c8ffff     sethi $r25 0xffff0000
0000000d: 6ec60007     sar $r24 $r24 0
0000000e: 42c63247     and $r24 $r24 $r25
0000000e: 65c80000     mov $r25 0
0000000e: de060007     ??? [unknown: de060000]
0000000e: 4ca5e9c7     add $r20 $r23 $r20
0000000f: 6a150067     mov $a2 $r20
0000000f: 6ead7f17     sar $r21 $r21 -0x1e
0000000f: 65b00040     mov $r22 0x40
0000000f: 75b00003     sethi $r22 0x30000
00000010: 42b5aa77     or $r22 $r22 $r21
00000010: 6a1d8067     mov $a3 $r22
00000010: 65c80000     mov $r25 0
00000010: 6a264067     mov $a4 $r25
00000011: c7112000     ??? [unknown: c7112000]
00000011: ce190000     ??? [unknown: ce190000]
00000011: bf00100f     vnop
00000011: cf190000     ??? [unknown: cf190000]
00000012: bf00101f     vnop
00000012: ef0001ff     bnop
00000012: ef0001ff     bnop
00000012: ef0001ff     bnop
00000013: ef0001ff     bnop
00000013: fff80000     exit 0
[...]

A quick analysis:

  • A value X is read from $x register number 48+($x63 bits 30-31)*2. Curiously, the switch() over bits 30-31 has cases for values 0-7, although a 2-bit field can only yield 0-3.
  • $a0 is set to 0
  • $r24 is set to $x1
  • 0xde060007 is executed
  • $a2 is set to $x0 + (X & 0xfffffffc)
  • $a3 is set to 0x30040 | (X & 3) << 30
  • $a4 is set to 0
  • 0xc7112000, 0xce190000, 0xcf190000 are executed
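
The register arithmetic in the list above can be restated as a short Python sketch. The function name and its arguments are made up for illustration: `x` stands for the value X read from the selected $x register and `x0` for the contents of $x0. It only mirrors the math seen in the disassembly and says nothing about what the hardware does with the result.

```python
MASK32 = 0xffffffff

def fence_xfer_params(x, x0):
    """Return ($a2, $a3, $a4) as set up before the c7/ce/cf sequence (hypothetical helper)."""
    a2 = (x0 + (x & 0xfffffffc)) & MASK32       # $x0 + (X & 0xfffffffc)
    a3 = (0x30040 | ((x & 3) << 30)) & MASK32   # 0x30040 | (X & 3) << 30
    a4 = 0                                      # $a4 is simply cleared
    return a2, a3, a4

# Example with made-up register contents.
a2, a3, a4 = fence_xfer_params(0x12345002, 0x10)
print(hex(a2), hex(a3), hex(a4))
```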

The de opcode is likely to be another form of store instruction, storing $r24 to [$a0] (because they're the only regs with contents unaccounted for at that point).

After a bit of testing, it turns out that d8,d9,da,dc,dd,de behave like d0,d1,d2,d4,d5,d6, except the addressing is [$a+imm], with simple addition instead of postincrement. Good.

Looking at nv44 (which hopefully has similar enough memory access - part01.0 matched after all...), there's also a nice memory access in the mthd microcode:

00000000: 6b528047     mov $r10 $sr10
00000000: 6b5ac047     mov $r11 $sr11
00000000: 65080180     mov $r1 0x180
00000000: 4dfa83c0     sub 0 $c0 $r10 $r1
00000001: 65100194     mov $r2 0x194
00000001: e0003407     bra $c0 sf 0x1b
00000001: 4df895c1     sub 0 $c1 $r2 $r10
00000001: ef0001ff     bnop
00000002: e000320f     bra $c1 sf 0x1b
00000002: 7e5affe7     shr $r11 $r11 -0x4
00000002: cc080040     setlo $a1 0x40
00000002: cd084000     sethi $a1 0x40000000
00000003: 6a02c067     mov $a0 $r11
00000003: cc100000     setlo $a2 0
00000003: cd100000     sethi $a2 0
00000003: c310000b     ??? [unknown: c310000b]
00000004: ce180000     ??? [unknown: ce180000]
00000004: cf180000     ??? [unknown: cf180000]
00000004: ef0001ff     bnop
00000004: ef0001ff     bnop
00000005: ef0001ff     bnop
00000005: ef0001ff     bnop
00000005: bf000007     vnop
00000005: ef0001ff     bnop
00000006: da608007     ld $r12 [$a2]
00000006: da688027     ld $r13 [$a2+0x4]
00000006: da708047     ld $r14 [$a2+0x8]
00000006: 65083000     mov $r1 0x3000
00000007: 42105847     and $r2 $r1 $r12
00000007: 4df845c0     sub 0 $c0 $r1 $r2
00000007: ef0001ff     bnop
00000007: e2002227     bra not $c0 zf 0x18
00000008: ef0001ff     bnop
00000008: 7e7b0087     shr $r15 $r12 0x10
00000008: 627bc01f     and $r15 $r15 0x3
00000008: 7e8300a7     shr $r16 $r12 0x14
00000009: 650ff000     mov $r1 -0x1000
00000009: 420b8247     and $r1 $r14 $r1
00000009: 42840277     or $r16 $r16 $r1
00000009: 428c1e77     or $r17 $r16 $r15
0000000a: 65080180     mov $r1 0x180
0000000a: 4dfa83c0     sub 0 $c0 $r10 $r1
0000000a: 65080184     mov $r1 0x184
0000000a: e2000427     bra not $c0 zf 0xc
0000000b: ef0001ff     bnop
0000000b: 6a8440af     mov $x48 $r17
0000000b: e00015e7     bra 0x15
0000000b: ef0001ff     bnop
0000000c: 4dfa83c0   B sub 0 $c0 $r10 $r1
0000000c: 65080188     mov $r1 0x188
0000000c: e2000427     bra not $c0 zf 0xe
0000000c: ef0001ff     bnop
0000000d: 6a9440af     mov $x50 $r17
0000000d: e00011e7     bra 0x15
0000000d: bf000007     vnop
0000000d: ef0001ff     bnop
0000000e: 4dfa83c0   B sub 0 $c0 $r10 $r1
0000000e: 6508018c     mov $r1 0x18c
0000000e: e2000427     bra not $c0 zf 0x10
0000000e: ef0001ff     bnop
0000000f: 6aa440af     mov $x52 $r17
0000000f: e0000de7     bra 0x15
0000000f: bf000007     vnop
0000000f: ef0001ff     bnop
00000010: 4dfa83c0   B sub 0 $c0 $r10 $r1
00000010: 65080190     mov $r1 0x190
00000010: e2000427     bra not $c0 zf 0x12
00000010: ef0001ff     bnop
00000011: 6ab440af     mov $x54 $r17
00000011: e00009e7     bra 0x15
00000011: bf000007     vnop
00000011: ef0001ff     bnop
00000012: 4dfa83c0   B sub 0 $c0 $r10 $r1
00000012: ef0001ff     bnop
00000012: e2000427     bra not $c0 zf 0x14
00000012: ef0001ff     bnop
00000013: 6ac440af     mov $x56 $r17
00000013: e00005e7     bra 0x15
00000013: ef0001ff     bnop
00000013: ef0001ff     bnop
00000014: 6ad440af   B mov $x58 $r17
00000014: 4fffffff     snop
00000014: bf000007     vnop
00000014: ef0001ff     bnop
00000015: fff80000   B exit 0
00000015: ef0001ff     bnop
00000015: ef0001ff     bnop
00000015: ef0001ff     bnop
00000016: ef0001ff     bnop
00000016: ef0001ff     bnop
00000016: ef0001ff     bnop
00000016: ef0001ff     bnop
00000017: ef0001ff     bnop
00000017: 4fffffff     snop
00000017: bf000007     vnop
00000017: ef0001ff     bnop
00000018: fff90002   B exit intr 0x2
00000018: ef0001ff     bnop
00000018: ef0001ff     bnop
00000018: ef0001ff     bnop
00000019: ef0001ff     bnop
00000019: ef0001ff     bnop
00000019: ef0001ff     bnop
00000019: ef0001ff     bnop
0000001a: ef0001ff     bnop
0000001a: 4fffffff     snop
0000001a: bf000007     vnop
0000001a: ef0001ff     bnop
0000001b: fff90001   B exit intr 0x1
0000001b: ef0001ff     bnop
0000001b: ef0001ff     bnop
0000001b: ef0001ff     bnop
0000001c: ef0001ff     bnop
0000001c: ef0001ff     bnop
[...]

Quick analysis again:

  • If $sr10 < 0x180 or $sr10 > 0x194, interrupt is raised with exit code 1. Seems that $sr10 is the method register, and this code is for some reason only supposed to be called for the DMA object methods - perhaps other methods are handled in hardware?
  • $a0 is set to $sr11 << 4; $sr11 is very likely the method parameter, which in this case would mean setting $a0 to the unshifted RAMIN address of the DMA object
  • $a1 is set to 0x40000040
  • $a2 is set to 0
  • 0xc310000b, 0xce180000, 0xcf180000 are executed
  • Three values (X, Y, Z) are read from data store addresses 0, 4, 8 - these are very likely the first 3 words of the DMA object
  • If bits 12-13 of X are not 11, interrupt is raised with exit code 2. This would correspond to making sure that the DMA object is present and non-paged.
  • $x register number 48 + DMA slot index * 2 is set to Z&0xfffff000 | X>>20 | X bits 16-17 - thus bits 0-1 of $x48+ would correspond to the NV4x memory type (VRAM/unused/PCI/AGP or VRAM/unused/PCIE/PCIE_NOSNOOP), bits 2-31 to the base address of the DMA object. Guess we now know what these regs are for...
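
As a sketch of the checks and the packing described above, here is the same logic in Python. The function names are invented for illustration, `x` and `z` are the first and third words read from the data store, and the method-to-register table just follows the compare chain in the listing (anything in range that matches none of the explicit compares falls through to $x58).

```python
# Method -> $x register number, following the compare chain in the microcode.
SLOT_REG = {0x180: 48, 0x184: 50, 0x188: 52, 0x18c: 54, 0x190: 56}

def dma_slot_register(mthd):
    """Pick the $x register for a DMA object method (hypothetical helper)."""
    if mthd < 0x180 or mthd > 0x194:
        raise ValueError("exit intr 1: method outside 0x180..0x194")
    return SLOT_REG.get(mthd, 58)   # fall-through case writes $x58

def pack_dma_slot(x, z):
    """Build the value written to the $x register from DMA object words X and Z."""
    if (x & 0x3000) != 0x3000:
        raise ValueError("exit intr 2: bits 12-13 of X are not 11")
    return ((z & 0xfffff000) | (x >> 20) | ((x >> 16) & 3)) & 0xffffffff

print(dma_slot_register(0x184), hex(pack_dma_slot(0x00023000, 0x12345678)))
```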

Seems like part01.0 is supposed to be a simple FENCE write then... $x1 is written to DMA slot ($x63 >> 30) address $x0.

What can be concluded about the c3/c7/ce/cf opcodes now:

  • c3 is load, c7 is store, ce/cf are used for both
  • There are three params to the load/store sequences, let's call them X, Y, Z
  • X is the offset in whatever memory space is being targeted
  • Y is strange and usually constant, bits 30-31 are the selected memory space on nv44 (0 VRAM, 1 apparently RAMIN from the DMAobj load sequence, 2 PCI/PCIE, 3 AGP/PCIE nosnoop)
  • Z is probably the address in data store - in the same format as used by normal load/store instructions (bits 0-12 address, bits 30-31 select the stride/mangling mode)
  • c7 bits 14-18 select $a register for Z
  • c7 bits 19-23 select $a register for X
  • c3 bits 14-18 select $a register for X
  • c3 bits 19-23 select $a register for Z
  • Y appears to implicitly be $a(X^1)
  • c7/c3 bits 0-13 contain something unknown
  • ce/cf low bits don't appear to match any registers used elsewhere, but always match between ce and cf. May select temporary regs.
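
The operand layout guessed above is easy to check against the three sequences seen so far. The following Python sketch is only a restatement of that guess (the function name and the dict keys are mine, not from any tool):

```python
def decode_xfer(opcode):
    """Split a c3/c7 opcode into the register fields guessed above."""
    top = opcode >> 24
    if top not in (0xc3, 0xc7):
        raise ValueError("not a c3/c7 opcode")
    lo = (opcode >> 14) & 0x1f      # bits 14-18
    hi = (opcode >> 19) & 0x1f      # bits 19-23
    if top == 0xc7:                 # store: bits 14-18 = Z, bits 19-23 = X
        z, x = lo, hi
        op = "store"
    else:                           # load: bits 14-18 = X, bits 19-23 = Z
        x, z = lo, hi
        op = "load"
    return {"op": op, "x": x, "y": x ^ 1, "z": z, "low": opcode & 0x3fff}

for opc in (0xc700a000, 0xc7112000, 0xc310000b):
    d = decode_xfer(opc)
    print(f"{opc:08x}: {d['op']} X=$a{d['x']} Y=$a{d['y']} Z=$a{d['z']} low={d['low']:#x}")
```

All three decode to the registers that were actually set up before the respective sequence: $a0/$a1/$a2 for the semi-save and the DMA object load, $a2/$a3/$a4 for part01.0.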

One important thing that certainly must be specified somehow is the length. Candidates are the Y word and remaining bits in c3/c7/ce/cf opcodes. Let's look at the sequences with known length.

The DMA object load (12 bytes):

00000002: cc080040     setlo $a1 0x40
00000002: cd084000     sethi $a1 0x40000000
[...]
00000003: c310000b     xdld $a2 $a0 $a1 0xb
00000004: ce180000     ??? [unknown: ce180000]
00000004: cf180000     ??? [unknown: cf180000]

The FENCE (4 bytes):

0000000f: 65b00040     mov $r22 0x40
0000000f: 75b00003     sethi $r22 0x30000
00000010: 42b5aa77     or $r22 $r22 $r21
00000010: 6a1d8067     mov $a3 $r22
[...]
00000011: c7112000     xdst $a2 $a3 $a4 0x2000
00000011: ce190000     ??? [unknown: ce190000]
00000011: bf00100f     vnop
00000011: cf190000     ??? [unknown: cf190000]

The semi-save (0x224 bytes):

00000054: cd000000     sethi $a0 0
00000054: cc002380     setlo $a0 0x2380
00000054: cd08037f     sethi $a1 0x37f0000
00000054: cc080080     setlo $a1 0x80
00000055: cd10c000     sethi $a2 0xc0000000
00000055: cc100000     setlo $a2 0
00000055: c700a000     xdst $a0 $a1 $a2 0x2000
00000055: ce090000     ??? [unknown: ce090000]
00000056: cf090000     ??? [unknown: cf090000]
[...]
00000057: cd000000     sethi $a0 0
00000057: cc002580     setlo $a0 0x2580
00000057: cd080023     sethi $a1 0x230000
00000058: cc080080     setlo $a1 0x80
00000058: cd10c000     sethi $a2 0xc0000000
00000058: cc100200     setlo $a2 0x200
00000058: c700a000     xdst $a0 $a1 $a2 0x2000
00000059: ce090000     ??? [unknown: ce090000]
00000059: cf090000     ??? [unknown: cf090000]

The semi-save seems to be actually split into two stores, of length 0x200 and 0x24... hm.

The full store, data store (0x2000 bytes):

00000061: cd000000     sethi $a0 0
00000061: cc000000     setlo $a0 0
00000061: cd081f7f     sethi $a1 0x1f7f0000
00000061: cc080080     setlo $a1 0x80
00000062: cd10c000     sethi $a2 0xc0000000
00000062: cc100000     setlo $a2 0
00000062: c700a000     xdst $a0 $a1 $a2 0x2000
00000062: cd000000     sethi $a0 0
00000063: cc001000     setlo $a0 0x1000
00000063: cc101000     setlo $a2 0x1000
00000063: c700a000     xdst $a0 $a1 $a2 0x2000
00000063: ce090000     ??? [unknown: ce090000]
00000064: cf090000     ??? [unknown: cf090000]

The full load, data store (0x2000 bytes):

000001b7: cd000000     sethi $a0 0
000001b7: cc000000     setlo $a0 0
000001b7: cd081f7f     sethi $a1 0x1f7f0000
000001b7: cc080080     setlo $a1 0x80
000001b8: cd10c000     sethi $a2 0xc0000000
000001b8: cc100000     setlo $a2 0
000001b8: c3102000     xdld $a2 $a0 $a1 0x2000
000001b8: cd000000     sethi $a0 0
000001b9: cc001000     setlo $a0 0x1000
000001b9: cc101000     setlo $a2 0x1000
000001b9: c3102000     xdld $a2 $a0 $a1 0x2000
000001b9: ce090001     ??? [unknown: ce090001]
000001ba: cf090001     ??? [unknown: cf090001]

Apparently a 0x2000-byte access is too large to fit in a single transfer and has to be split into two 0x1000-byte ones...

Seems the length is in bits 0-13 of xdst/xdld ops, or in Y bits 16+ if the xdst/xdld bitfield is 0x2000. Small lengths are simply encoded as (length-1), but larger lengths are... stranger.

Actually, multiplying (bits 0-6) + 1 by (bits 8-12) + 1 seems to match with the size - it's quite plausible that, like the data store itself, the transfers are 2d-oriented, with bits 0-6 being X size and 8-12 being Y size (both minus 1). And Y parameter bits 0-15 could be external memory stride...
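
To double-check the 2d-size idea, here is the arithmetic for every sequence with a known length, as a small Python sketch. The size field is taken from the opcode for the DMA object load and from Y bits 16+ for the rest; the helper name and the table are mine.

```python
def xfer_size(field):
    """(bits 0-6) + 1 times (bits 8-12) + 1, as hypothesised above."""
    width = (field & 0x7f) + 1
    height = ((field >> 8) & 0x1f) + 1
    return width * height

# (description, size field, length known from context)
KNOWN = [
    ("DMA object load",     0x000b, 12),
    ("FENCE",               0x0003, 4),
    ("semi-save, part 1",   0x037f, 0x200),
    ("semi-save, part 2",   0x0023, 0x24),
    ("full store/load half", 0x1f7f, 0x1000),
]

for name, field, expected in KNOWN:
    size = xfer_size(field)
    print(f"{name}: field {field:#06x} -> {size:#x} bytes (expected {expected:#x})")
```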

One important unknown is target selection on nv50. The mthd microcode is pretty much a noop on nv50 (it only handles mthd 0), so the DMA methods must be processed by hardware somehow. However, it seems using target 0 in channel switch code will just let us access the context DMA object.

Let's try it. We'll attempt to store 0x1000 bytes from address 0 to context DMA object at address 0.

  • $a0
  • $a1
  • $a2
  • c700a000 - xdst $a0 $a1 $a2 0x2000
  • ce090001
  • cf090001

Boom. Hard hang. Apparently VP1 isn't particularly forgiving for invalid transfers.

Another attempt: maybe the low bits of ce/cf are wrong, I took them from a load sequence, but I'm attempting a store... let's set them to 0.

Boom again. What am I still doing differently... maybe the 0x10-byte stride is wrong for this transfer and I should be using the 0x80-byte mode?

Yeah, that worked.

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: ad0020de ad0024de ad0028de ad002cde
00700030: ad0030de ad0034de ad0038de ad003cde
00700040: 0040dead 0044dead 0048dead 004cdead
00700050: 0050dead 0054dead 0058dead 005cdead
00700060: 60dead00 64dead00 68dead00 6cdead00
00700070: 70dead00 74dead00 78dead00 7cdead00
00700080: 80dead00 84dead00 88dead00 8cdead00
00700090: 90dead00 94dead00 98dead00 9cdead00
007000a0: dead00ac dead00a0 dead00a4 dead00a8
007000b0: dead00bc dead00b0 dead00b4 dead00b8
007000c0: ad00ccde ad00c0de ad00c4de ad00c8de
007000d0: ad00dcde ad00d0de ad00d4de ad00d8de
[...]

Let's just change the data store init sequence to write using stride 0x80...

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: dead0080 dead0084 dead0088 dead008c
00700090: dead0090 dead0094 dead0098 dead009c
007000a0: dead00a0 dead00a4 dead00a8 dead00ac
007000b0: dead00b0 dead00b4 dead00b8 dead00bc
007000c0: dead00c0 dead00c4 dead00c8 dead00cc
007000d0: dead00d0 dead00d4 dead00d8 dead00dc
007000e0: dead00e0 dead00e4 dead00e8 dead00ec
007000f0: dead00f0 dead00f4 dead00f8 dead00fc
[...]

Much better now. So let's verify a few things. Changing Y to 0x1e7f0080:

[...]
00700f60: dead0f60 dead0f64 dead0f68 dead0f6c
00700f70: dead0f70 dead0f74 dead0f78 dead0f7c
00700f80: cd100f80 cd100f84 cd100f88 cd100f8c
00700f90: cd100f90 cd100f94 cd100f98 cd100f9c
00700fa0: cd100fa0 cd100fa4 cd100fa8 cd100fac
00700fb0: cd100fb0 cd100fb4 cd100fb8 cd100fbc
00700fc0: cd100fc0 cd100fc4 cd100fc8 cd100fcc
00700fd0: cd100fd0 cd100fd4 cd100fd8 cd100fdc
00700fe0: cd100fe0 cd100fe4 cd100fe8 cd100fec
00700ff0: cd100ff0 cd100ff4 cd100ff8 cd100ffc

0x1f7e0080:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 cdad007c
00700080: dead0080 dead0084 dead0088 dead008c
00700090: dead0090 dead0094 dead0098 dead009c
007000a0: dead00a0 dead00a4 dead00a8 dead00ac
007000b0: dead00b0 dead00b4 dead00b8 dead00bc
007000c0: dead00c0 dead00c4 dead00c8 dead00cc
007000d0: dead00d0 dead00d4 dead00d8 dead00dc
007000e0: dead00e0 dead00e4 dead00e8 dead00ec
007000f0: dead00f0 dead00f4 dead00f8 cdad00fc
[...]

0x1f7f0100:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: cd100080 cd100084 cd100088 cd10008c
00700090: cd100090 cd100094 cd100098 cd10009c
007000a0: cd1000a0 cd1000a4 cd1000a8 cd1000ac
007000b0: cd1000b0 cd1000b4 cd1000b8 cd1000bc
007000c0: cd1000c0 cd1000c4 cd1000c8 cd1000cc
007000d0: cd1000d0 cd1000d4 cd1000d8 cd1000dc
007000e0: cd1000e0 cd1000e4 cd1000e8 cd1000ec
007000f0: cd1000f0 cd1000f4 cd1000f8 cd1000fc
00700100: dead0080 dead0084 dead0088 dead008c
00700110: dead0090 dead0094 dead0098 dead009c
00700120: dead00a0 dead00a4 dead00a8 dead00ac
00700130: dead00b0 dead00b4 dead00b8 dead00bc
00700140: dead00c0 dead00c4 dead00c8 dead00cc
00700150: dead00d0 dead00d4 dead00d8 dead00dc
00700160: dead00e0 dead00e4 dead00e8 dead00ec
00700170: dead00f0 dead00f4 dead00f8 dead00fc
00700180: cd100180 cd100184 cd100188 cd10018c
00700190: cd100190 cd100194 cd100198 cd10019c
007001a0: cd1001a0 cd1001a4 cd1001a8 cd1001ac
007001b0: cd1001b0 cd1001b4 cd1001b8 cd1001bc
007001c0: cd1001c0 cd1001c4 cd1001c8 cd1001cc
007001d0: cd1001d0 cd1001d4 cd1001d8 cd1001dc
007001e0: cd1001e0 cd1001e4 cd1001e8 cd1001ec
007001f0: cd1001f0 cd1001f4 cd1001f8 cd1001fc

0x1f7f00ff:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: cd100080 cd100084 cd100088 cd10008c
00700090: cd100090 cd100094 cd100098 cd10009c
007000a0: cd1000a0 cd1000a4 cd1000a8 cd1000ac
007000b0: cd1000b0 cd1000b4 cd1000b8 cd1000bc
007000c0: dead0080 dead0084 dead0088 dead008c
007000d0: dead0090 dead0094 dead0098 dead009c
007000e0: dead00a0 dead00a4 dead00a8 dead00ac
007000f0: dead00b0 dead00b4 dead00b8 dead00bc
00700100: dead00c0 dead00c4 dead00c8 dead00cc
00700110: dead00d0 dead00d4 dead00d8 dead00dc
00700120: dead00e0 dead00e4 dead00e8 dead00ec
00700130: dead00f0 dead00f4 dead00f8 dead00fc
00700140: cd100140 cd100144 cd100148 cd10014c
00700150: cd100150 cd100154 cd100158 cd10015c
00700160: cd100160 cd100164 cd100168 cd10016c
00700170: cd100170 cd100174 cd100178 cd10017c
00700180: dead0100 dead0104 dead0108 dead010c
[...]

I guess the stride bits 0-5 are ignored - would match the usual stride alignment requirements. 0x1f7fffff:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: cd100080 cd100084 cd100088 cd10008c
00700090: cd100090 cd100094 cd100098 cd10009c
007000a0: cd1000a0 cd1000a4 cd1000a8 cd1000ac
007000b0: cd1000b0 cd1000b4 cd1000b8 cd1000bc
007000c0: cd1000c0 cd1000c4 cd1000c8 cd1000cc
007000d0: cd1000d0 cd1000d4 cd1000d8 cd1000dc
007000e0: cd1000e0 cd1000e4 cd1000e8 cd1000ec
007000f0: cd1000f0 cd1000f4 cd1000f8 cd1000fc
00700100: cd100100 cd100104 cd100108 cd10010c
00700110: cd100110 cd100114 cd100118 cd10011c
00700120: cd100120 cd100124 cd100128 cd10012c
00700130: cd100130 cd100134 cd100138 cd10013c
[...]
00703fa0: cd103fa0 cd103fa4 cd103fa8 cd103fac
00703fb0: cd103fb0 cd103fb4 cd103fb8 cd103fbc
00703fc0: dead0080 dead0084 dead0088 dead008c
00703fd0: dead0090 dead0094 dead0098 dead009c
00703fe0: dead00a0 dead00a4 dead00a8 dead00ac
00703ff0: dead00b0 dead00b4 dead00b8 dead00bc
00704000: dead00c0 dead00c4 dead00c8 dead00cc
00704010: dead00d0 dead00d4 dead00d8 dead00dc
00704020: dead00e0 dead00e4 dead00e8 dead00ec
00704030: dead00f0 dead00f4 dead00f8 dead00fc
00704040: cd104040 cd104044 cd104048 cd10404c
00704050: cd104050 cd104054 cd104058 cd10405c
00704060: cd104060 cd104064 cd104068 cd10406c
[...]

Seems bits 14-15 of stride are also ignored - this would match with the max value of stride being 0x3fc0 elsewhere, too. 0x1fff0080:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: dead0080 dead0084 dead0088 dead008c
[...]
00700fe0: dead0fe0 dead0fe4 dead0fe8 dead0fec
00700ff0: dead0ff0 dead0ff4 dead0ff8 dead0ffc
00701000: dead1000 dead1004 dead1008 dead100c
00701010: dead1010 dead1014 dead1018 dead101c
00701020: dead1020 dead1024 dead1028 dead102c
00701030: dead1030 dead1034 dead1038 dead103c
00701040: dead1040 dead1044 dead1048 dead104c
00701050: dead1050 dead1054 dead1058 dead105c
00701060: dead1060 dead1064 dead1068 dead106c
00701070: dead1070 dead1074 dead1078 dead107c
00701080: cd101080 cd101084 cd101088 cd10108c
00701090: cd101090 cd101094 cd101098 cd10109c
007010a0: cd1010a0 cd1010a4 cd1010a8 cd1010ac
[...]

Hmm. 0x1fff0100:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: dead0080 dead0084 dead0088 dead008c
00700090: dead0090 dead0094 dead0098 dead009c
007000a0: dead00a0 dead00a4 dead00a8 dead00ac
007000b0: dead00b0 dead00b4 dead00b8 dead00bc
007000c0: dead00c0 dead00c4 dead00c8 dead00cc
007000d0: dead00d0 dead00d4 dead00d8 dead00dc
007000e0: dead00e0 dead00e4 dead00e8 dead00ec
007000f0: dead00f0 dead00f4 dead00f8 dead00fc
00700100: dead0080 dead0084 dead0088 dead008c
00700110: dead0090 dead0094 dead0098 dead009c
00700120: dead00a0 dead00a4 dead00a8 dead00ac
00700130: dead00b0 dead00b4 dead00b8 dead00bc
00700140: dead00c0 dead00c4 dead00c8 dead00cc
00700150: dead00d0 dead00d4 dead00d8 dead00dc
00700160: dead00e0 dead00e4 dead00e8 dead00ec
00700170: dead00f0 dead00f4 dead00f8 dead00fc
00700180: dead0100 dead0104 dead0108 dead010c
00700190: dead0110 dead0114 dead0118 dead011c
007001a0: dead0120 dead0124 dead0128 dead012c
007001b0: dead0130 dead0134 dead0138 dead013c
007001c0: dead0140 dead0144 dead0148 dead014c
007001d0: dead0150 dead0154 dead0158 dead015c
007001e0: dead0160 dead0164 dead0168 dead016c
007001f0: dead0170 dead0174 dead0178 dead017c
00700200: dead0100 dead0104 dead0108 dead010c
00700210: dead0110 dead0114 dead0118 dead011c
00700220: dead0120 dead0124 dead0128 dead012c
00700230: dead0130 dead0134 dead0138 dead013c
[...]
00701fc0: dead1040 dead1044 dead1048 dead104c
00701fd0: dead1050 dead1054 dead1058 dead105c
00701fe0: dead1060 dead1064 dead1068 dead106c
00701ff0: dead1070 dead1074 dead1078 dead107c
00702000: cd102000 cd102004 cd102008 cd10200c
00702010: cd102010 cd102014 cd102018 cd10201c
00702020: cd102020 cd102024 cd102028 cd10202c
00702030: cd102030 cd102034 cd102038 cd10203c

Apparently >0x80 X sizes are accepted, but will just run over into the next line. Oh well.
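
A toy model that reproduces the dumps above, as a Python sketch. Everything in it is inferred from these experiments only: the stride is masked with 0x3fc0, each line starts 0x80 bytes further into the data store, and the X size mask is widened to bits 0-7 purely to cover the 0x1fff cases (that widening is my assumption from a single observation). Function names are made up.

```python
DS_PITCH = 0x80   # data store bytes between consecutive lines (assumed, 0x80-byte mode)

def simulate_store(y):
    """Map external byte offset -> data store byte offset for one store with param Y."""
    stride = y & 0x3fc0                # bits 0-5 and 14-15 appear to be ignored
    width = ((y >> 16) & 0xff) + 1     # assumption: bit 7 also counts towards X size
    height = ((y >> 24) & 0x1f) + 1
    ext = {}
    for line in range(height):
        for col in range(width):
            # later lines overwrite earlier ones where they overlap
            ext[line * stride + col] = line * DS_PITCH + col
    return ext

def word_at(ext, offset):
    """Expected dump word at a 4-aligned offset, or None if the store never touched it."""
    if offset not in ext:
        return None
    return 0xdead0000 | ext[offset]    # data store word at address a holds 0xdead0000 | a

ext = simulate_store(0x1fff0100)
print(hex(word_at(ext, 0x100)), hex(word_at(ext, 0x1fc0)), word_at(ext, 0x2000))
```

With Y=0x1fff0100 this gives dead0080 at 0x100 and dead1040 at 0x1fc0, and leaves 0x2000 untouched, same as the dump.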

0x3f7f0080, 0x5f7f0080, 0x9f7f0080 appear to be no different than 0x1f7f0080.

Let's mess with other params. X=0x100:

00700000: cd100000 cd100004 cd100008 cd10000c
00700010: cd100010 cd100014 cd100018 cd10001c
00700020: cd100020 cd100024 cd100028 cd10002c
00700030: cd100030 cd100034 cd100038 cd10003c
00700040: cd100040 cd100044 cd100048 cd10004c
00700050: cd100050 cd100054 cd100058 cd10005c
00700060: cd100060 cd100064 cd100068 cd10006c
00700070: cd100070 cd100074 cd100078 cd10007c
00700080: cd100080 cd100084 cd100088 cd10008c
00700090: cd100090 cd100094 cd100098 cd10009c
007000a0: cd1000a0 cd1000a4 cd1000a8 cd1000ac
007000b0: cd1000b0 cd1000b4 cd1000b8 cd1000bc
007000c0: cd1000c0 cd1000c4 cd1000c8 cd1000cc
007000d0: cd1000d0 cd1000d4 cd1000d8 cd1000dc
007000e0: cd1000e0 cd1000e4 cd1000e8 cd1000ec
007000f0: cd1000f0 cd1000f4 cd1000f8 cd1000fc
00700100: dead0000 dead0004 dead0008 dead000c
00700110: dead0010 dead0014 dead0018 dead001c
00700120: dead0020 dead0024 dead0028 dead002c
00700130: dead0030 dead0034 dead0038 dead003c
00700140: dead0040 dead0044 dead0048 dead004c
00700150: dead0050 dead0054 dead0058 dead005c
00700160: dead0060 dead0064 dead0068 dead006c
00700170: dead0070 dead0074 dead0078 dead007c
00700180: dead0080 dead0084 dead0088 dead008c
00700190: dead0090 dead0094 dead0098 dead009c
007001a0: dead00a0 dead00a4 dead00a8 dead00ac
007001b0: dead00b0 dead00b4 dead00b8 dead00bc
[...]

X=0xff:

00700000: cd100000 cd100004 cd100008 cd10000c
00700010: cd100010 cd100014 cd100018 cd10001c
00700020: cd100020 cd100024 cd100028 cd10002c
00700030: cd100030 cd100034 cd100038 cd10003c
00700040: cd100040 cd100044 cd100048 cd10004c
00700050: cd100050 cd100054 cd100058 cd10005c
00700060: cd100060 cd100064 cd100068 cd10006c
00700070: cd100070 cd100074 cd100078 cd10007c
00700080: cd100080 cd100084 cd100088 cd10008c
00700090: cd100090 cd100094 cd100098 cd10009c
007000a0: cd1000a0 cd1000a4 cd1000a8 cd1000ac
007000b0: cd1000b0 cd1000b4 cd1000b8 cd1000bc
007000c0: cd1000c0 cd1000c4 cd1000c8 cd1000cc
007000d0: cd1000d0 cd1000d4 cd1000d8 cd1000dc
007000e0: cd1000e0 cd1000e4 cd1000e8 cd1000ec
007000f0: cd1000f0 cd1000f4 cd1000f8 001000fc
00700100: 04dead00 08dead00 0cdead00 10dead00
00700110: 14dead00 18dead00 1cdead00 20dead00
00700120: 24dead00 28dead00 2cdead00 30dead00
00700130: 34dead00 38dead00 3cdead00 40dead00
00700140: 44dead00 48dead00 4cdead00 50dead00
00700150: 54dead00 58dead00 5cdead00 60dead00
00700160: 64dead00 68dead00 6cdead00 70dead00
00700170: 74dead00 78dead00 7cdead00 80dead00
[...]

Z=0xc0000100:

00700000: dead0100 dead0104 dead0108 dead010c
00700010: dead0110 dead0114 dead0118 dead011c
00700020: dead0120 dead0124 dead0128 dead012c
00700030: dead0130 dead0134 dead0138 dead013c
00700040: dead0140 dead0144 dead0148 dead014c
00700050: dead0150 dead0154 dead0158 dead015c
00700060: dead0160 dead0164 dead0168 dead016c
00700070: dead0170 dead0174 dead0178 dead017c
00700080: dead0180 dead0184 dead0188 dead018c
00700090: dead0190 dead0194 dead0198 dead019c
007000a0: dead01a0 dead01a4 dead01a8 dead01ac
007000b0: dead01b0 dead01b4 dead01b8 dead01bc
007000c0: dead01c0 dead01c4 dead01c8 dead01cc
007000d0: dead01d0 dead01d4 dead01d8 dead01dc
[...]

Z=0xc00000ff:

00700000: 0100dede 0104dead 0108dead 010cdead
00700010: 0110dead 0114dead 0118dead 011cdead
00700020: ad0120de ad0124de ad0128de ad012cde
00700030: ad0130de ad0134de ad0138de ad013cde
00700040: ad0140de ad0144de ad0148de ad014cde
00700050: ad0150de ad0154de ad0158de ad015cde
00700060: ad0160de ad0164de ad0168de ad016cde
00700070: ad0170de ad0174de ad0178de ad017cde
00700080: 0180dede 0184dead 0188dead 018cdead
00700090: 0190dead 0194dead 0198dead 019cdead
007000a0: ad01a0de ad01a4de ad01a8de ad01acde
007000b0: ad01b0de ad01b4de ad01b8de ad01bcde
007000c0: ad01c0de ad01c4de ad01c8de ad01ccde
007000d0: ad01d0de ad01d4de ad01d8de ad01dcde
007000e0: ad01e0de ad01e4de ad01e8de ad01ecde
007000f0: ad01f0de ad01f4de ad01f8de ad01fcde
00700100: 0200dede 0204dead 0208dead 020cdead
00700110: 0210dead 0214dead 0218dead 021cdead
00700120: ad0220de ad0224de ad0228de ad022cde
00700130: ad0230de ad0234de ad0238de ad023cde
00700140: ad0240de ad0244de ad0248de ad024cde
[...]

I'm... not entirely sure if I want to know. Z=0xc0000004:

00700000: dead0004 dead0008 dead000c dead0010
00700010: dead0014 dead0018 dead001c dead0020
00700020: dead0024 dead0028 dead002c dead0030
00700030: dead0034 dead0038 dead003c dead0040
00700040: dead0044 dead0048 dead004c dead0050
00700050: dead0054 dead0058 dead005c dead0060
00700060: dead0064 dead0068 dead006c dead0070
00700070: dead0074 dead0078 dead007c ad0080de
00700080: dead0084 dead0088 dead008c dead0090
00700090: dead0094 dead0098 dead009c dead00a0
007000a0: dead00a4 dead00a8 dead00ac dead00b0
007000b0: dead00b4 dead00b8 dead00bc dead00c0
007000c0: dead00c4 dead00c8 dead00cc dead00d0
007000d0: dead00d4 dead00d8 dead00dc dead00e0
007000e0: dead00e4 dead00e8 dead00ec dead00f0
007000f0: dead00f4 dead00f8 dead00fc ad0100de
00700100: dead0104 dead0108 dead010c dead0110
00700110: dead0114 dead0118 dead011c dead0120
[...]

Let's change the c7 opcode to 0xc7009f3f:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: cd100040 cd100044 cd100048 cd10004c
00700050: cd100050 cd100054 cd100058 cd10005c
00700060: cd100060 cd100064 cd100068 cd10006c
00700070: cd100070 cd100074 cd100078 cd10007c
00700080: dead0080 dead0084 dead0088 dead008c
00700090: dead0090 dead0094 dead0098 dead009c
007000a0: dead00a0 dead00a4 dead00a8 dead00ac
007000b0: dead00b0 dead00b4 dead00b8 dead00bc
007000c0: cd1000c0 cd1000c4 cd1000c8 cd1000cc
007000d0: cd1000d0 cd1000d4 cd1000d8 cd1000dc
007000e0: cd1000e0 cd1000e4 cd1000e8 cd1000ec
007000f0: cd1000f0 cd1000f4 cd1000f8 cd1000fc
[...]

And to 0xc700bf3f:

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
00700080: dead0080 dead0084 dead0088 dead008c
00700090: dead0090 dead0094 dead0098 dead009c
007000a0: dead00a0 dead00a4 dead00a8 dead00ac
007000b0: dead00b0 dead00b4 dead00b8 dead00bc
007000c0: dead00c0 dead00c4 dead00c8 dead00cc
007000d0: dead00d0 dead00d4 dead00d8 dead00dc
007000e0: dead00e0 dead00e4 dead00e8 dead00ec
007000f0: dead00f0 dead00f4 dead00f8 dead00fc
[...]

So it's apparently opcode bit 13 that selects whether length comes from the opcode or Y param.
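
Put together as a Python sketch: bit 13 of the opcode picks where the size field comes from, and the field itself keeps the bits 0-6 / bits 8-12 split proposed earlier. The function name is made up, and this is only a summary of the experiments above, not a verified description of the hardware.

```python
def xfer_dims(opcode, y):
    """Return (X size, Y size) in bytes/lines for a c3/c7 opcode and its Y param."""
    if opcode & 0x2000:            # bit 13 set: size comes from Y bits 16+
        field = y >> 16
    else:                          # bit 13 clear: size comes from opcode bits 0-12
        field = opcode & 0x1fff
    width = (field & 0x7f) + 1
    height = ((field >> 8) & 0x1f) + 1
    return width, height

for opc in (0xc7009f3f, 0xc700bf3f):
    w, h = xfer_dims(opc, 0x1f7f0080)
    print(f"{opc:08x}: {w:#x} bytes x {h:#x} lines")
```

This matches the two dumps: 0xc7009f3f stores 0x40 bytes per line, 0xc700bf3f goes back to the full 0x80 bytes taken from Y.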

Now let's move from $a0-$a2 to $a4-$a6, and change c7 opcode accordingly.

Crash. Seems the ce/cf opcodes do encode some register operands, after all... specifically, it seems bits 19-23 have to target the Y reg. Strange...

So what do these instructions do, anyway? Let's remove them and see what happens.

00700000: dead0000 dead0004 dead0008 dead000c
00700010: dead0010 dead0014 dead0018 dead001c
00700020: dead0020 dead0024 dead0028 dead002c
00700030: dead0030 dead0034 dead0038 dead003c
00700040: dead0040 dead0044 dead0048 dead004c
00700050: dead0050 dead0054 dead0058 dead005c
00700060: dead0060 dead0064 dead0068 dead006c
00700070: dead0070 dead0074 dead0078 dead007c
[...]

Huh, seems to still work. Maybe, like on fµc, xdst/xdld are merely the "request transfer" ops, while ce/cf are "wait for transfer"? Inserting timing measurements between these instructions should answer that.

01545de6 01545ded 01545df4 01545dfb. All instructions execute almost immediately. Crap.

Maybe xdld would be a better thing to measure - xdst could be doing some trickery after all.

0154e62c 0154e633 0154e63a 0154e641. Crap.

There's no way a 0x1000-byte load can happen in about 20 clock cycles; something must be doing trickery here. Let's try loading from the data store to $r and measuring the time after *that*.

Before xdld: 01548308, after xdld/ce/cf: 01548310, after load: 01548482, loaded value: cd100ffc.
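The timestamps are easier to compare as deltas. A quick Python sketch (not part of the original experiment; the helper and variable names are made up here, the values are the ones logged above):

```python
def deltas(stamps):
    """Differences between consecutive timestamps, in clock cycles."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

# Timestamps copied from the three measurements above.
xdst_run = [0x01545de6, 0x01545ded, 0x01545df4, 0x01545dfb]
xdld_run = [0x0154e62c, 0x0154e633, 0x0154e63a, 0x0154e641]
# before xdld, after xdld/ce/cf, after the load from the data store
load_run = [0x01548308, 0x01548310, 0x01548482]

for name, run in (("xdst", xdst_run), ("xdld", xdld_run), ("load", load_run)):
    steps = [hex(d) for d in deltas(run)]
    print(name, steps, "total", hex(run[-1] - run[0]))
```

The first two runs come out as three steps of 7 cycles each (0x15 in total), while the last one shows 8 cycles for xdld/ce/cf and 0x172 cycles until the load completes.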

~0x180 cycles... now that's a saner timeframe. So what if I remove opcodes ce/cf now?

0154708b 01547091 01547097 dead0ffc. No waiting this time, and a stale value got loaded. Seems ce/cf may behave like some sort of barrier...

Only ce: 01548394 0154839a 0154839f dead0ffc

Only cf: boom.

ce inserted before xdld: 015513bb 015513c3 01551437 dead0ffc.

My guess is that ce sets up a barrier, while cf waits on it. Effectively, cf waits for all transfers issued before the corresponding ce to finish, while allowing those issued after ce to still be in progress. Given the unknown bits in ce/cf, it's possible there are several barrier "slots".
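To make that guess concrete, here is a toy Python model of it. Everything in it is an assumption for illustration: the class and method names, a single barrier slot, a flat 7-cycle cost per instruction and a 0x172-cycle transfer (both roughly what the timestamps suggest), and an exception standing in for "boom". It says nothing about how the hardware actually implements this.

```python
class BarrierModel:
    ISSUE_COST = 7         # assumed cycles per instruction
    TRANSFER_TIME = 0x172  # assumed latency of one xdld transfer

    def __init__(self):
        self.now = 0
        self.pending = []   # finish times of every transfer issued so far
        self.marked = None  # finish times captured by the last ce, if any

    def _tick(self):
        self.now += self.ISSUE_COST

    def xdld(self):
        # Request a transfer; it completes in the background.
        self.pending.append(self.now + self.TRANSFER_TIME)
        self._tick()

    def ce(self):
        # Set up a barrier covering only the transfers issued before it.
        self.marked = list(self.pending)
        self._tick()

    def cf(self):
        # Wait for the transfers captured by the matching ce.
        if self.marked is None:
            raise RuntimeError("cf without a preceding ce")
        self.now = max([self.now] + self.marked)
        self.marked = None
        self._tick()

    def load_is_fresh(self):
        # A load sees new data only if every transfer has finished.
        return all(finish <= self.now for finish in self.pending)


def run(*ops):
    model = BarrierModel()
    for op in ops:
        getattr(model, op)()
    return model.load_is_fresh()


print(run("xdld", "ce", "cf"))  # fresh data
print(run("xdld"))              # stale
print(run("xdld", "ce"))        # stale
print(run("ce", "xdld", "cf"))  # stale: the transfer was issued after ce
```

Under these assumptions the model reproduces the four outcomes seen so far: only the xdld, ce, cf order yields the fresh value, and a lone cf fails.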

Further experiments with ce/cf:

  • Set bit 0 of both opcodes to 0: barrier no longer works - bit 0 probably switches between an xdld and an xdst barrier
  • Bit 0 mismatched between ce and cf: boom (both ways)
  • Bits 19-23 changed to point to a different $a register, but with same value: still works
  • Bits 19-23 pointed to a different register, with bits 0-29 set to random junk: still works, even if value is mismatched between ce and cf
  • Bits 30-31 of Y set to random values: still works, even if mismatched with bits 30-31 of ce/cf register args
  • Bits 30-31 of ce register arg set to random values: still works
  • Bits 30-31 of cf register arg set to random values: boom, even if matched with Y and ce reg arg
  • Bit 16 unset in ce/cf: boom
  • Bits 16-23 unset in ce/cf: no longer works as a barrier
  • Bits 16-23 set to 0x08 in ce/cf: works, and apparently $a contents have nothing to do with it - unsetting bit 16 causes bits 19-23 to no longer signify a $a register
  • Bits 22-23 set to random junk with bit 16 set in ce/cf: still works, even if mismatched
  • Bits 20-21 set to random junk with bit 16 set in ce/cf: boom, even if matched
  • Bit 19 mismatched between ce/cf with bit 16 set: boom
  • Bit 0 set to random junk with bit 16 set in ce/cf: still works, even if mismatched
  • Bits 1-2, 5-15, 17-18 set to random junk: works, even if mismatched, whether bit 16 is set or not
  • Bits 3-4 set to random junk: boom, even if matched, whether bit 16 is set or not

In summary, there seem to be two types of these instructions: immediate and register. The immediate type is indicated by bit 16 being 0, has the store/load bit at bit 19, and has some unknown argument (let's call it W) in bits 20-21. The register type is indicated by bit 16 being 1, has the store/load bit at bit 0, and takes W from bits 30-31 of the $a register selected by bits 19-23. In addition, both types have an unknown parameter U in bits 3-4. It also seems very likely that W is the transfer target, and we're simply thrown off because only target 0 is valid in the nv50 channel switch microcode.
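The field layout from this summary can be written down as a small Python decoder. This is only a sketch of the current guess: the function name, the dict keys and the sample words are made up for illustration (the sample words fill in just the fields discussed here), and the store/load bit is returned raw since its polarity hasn't been pinned down.

```python
def decode_barrier(opcode, a_regs=None):
    """Split a ce/cf opcode word into the fields guessed so far.

    a_regs is an optional mapping from $a register number to its value,
    needed only to resolve W in the register form.
    """
    u = (opcode >> 3) & 0x3               # unknown parameter U, bits 3-4
    if (opcode >> 16) & 1:                # bit 16 = 1: register form
        reg = (opcode >> 19) & 0x1f       # $a register number, bits 19-23
        w = None
        if a_regs is not None:
            w = (a_regs[reg] >> 30) & 0x3  # W from bits 30-31 of that $a
        return {"form": "register", "reg": reg,
                "dir_bit": opcode & 1,     # store/load bit at bit 0
                "w": w, "u": u}
    return {"form": "immediate",           # bit 16 = 0: immediate form
            "dir_bit": (opcode >> 19) & 1,  # store/load bit at bit 19
            "w": (opcode >> 20) & 0x3,      # W in bits 20-21
            "u": u}


# Hypothetical words: bits 16-23 = 0x08, and a register form using $a2.
print(decode_barrier(0x00080000))
print(decode_barrier((2 << 19) | (1 << 16) | 1, a_regs={2: 0xc0000000}))
```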

Still a lot of unknowns in this area. To be continued...

Elapsed time: 9h

Comments

Kilja 4 years, 10 months ago

Keep up the good work !
It's always a pleasure to read your experiments :)
It reminds me of some of mine (even though they were less advanced than yours !).
