1: The beginning

My next target will be the VP1 video processor. This processor is present on most NV40 generation cards and the original NV50, corresponding to the first generation of PureVideo HD. Together with two other engines (PMPEG and PMSRCH) and an arbiter, it makes up the video decoding/encoding subsystem.

The tasks of this engine include at least VC-1 and H.264 motion compensation. However, this capability isn't supported on Linux via VDPAU, so we can't get any example of the engine being used to decode anything.

Luckily, it is possible to manually create the VP object (0x4176 or 0x5076) using the official nvidia driver and intercept the microcode upload. The bulk of the early RE process will be based on microcode images obtained this way. By chance, I stumbled on another very useful piece of information on the web: the PureVideo technical brief, which reveals some fairly useful information about the internal architecture of the VP1:

[figure: VP1 architecture]

What can be seen in this picture:

  • [at least?] three register files, of 32 registers each: 32-bit address registers, 32-bit scalar registers, 128-bit vector registers. Let's call them $aX, $rX, $vX.
  • $a and $r have 2 read ports and 1 write port. $v has 3 read ports and 1 write port.
  • the opcodes are 128-bit, the call stack is built into the processor and likely very small
  • instructions are executed from VRAM, with 4kB instruction cache - there's no dedicated instruction SRAM like on falcon/vµc.
  • the data space, on the other hand, is dedicated, 8kB, byte addressed (note the arrow that says 13 bits of address registers are used). There appears to be some way to launch a bulk transfer of data to/from VRAM.
  • $r and $v may be directly loaded/stored from/to the data memory. No word on $a.
  • the transaction size between instruction cache / data store and the VRAM is 256 bits. Not sure what this means exactly - cache line size? Seems a bit too small with instructions that large...
  • there's a connection from $r file to $v file - mov from $r to $v?
  • there's some sort of connection between "host" (PFIFO?) and the processor.

All of this will be *very* useful during early RE. I have absolutely no idea why nvidia published this figure in the first place. There's no possible use for it other than helping me RE this engine. Well, thanks.

There's nothing else that can be deduced by looking at this figure. Time to look at the microcode samples.

I've extracted two sets of microcode samples, from NV44 and NV50. First let's look at the upload sequence on NV44.

[1] 417.501156 MMIO32 W 0x00f41c 0xffffffff PVP+0x41c <= 0xffffffff
[1] 417.501181 MMIO32 R 0x00f420 0x00000000 PVP+0x420 => 0
[1] 417.501205 MMIO32 W 0x00f420 0x00000001 PVP+0x420 <= 0x1

Hm. No ideas at this time.

[1] 417.501222 RAMIN32 64c70 <= 0
[1] 417.501235 RAMIN32 64c74 <= 0
[1] 417.501248 RAMIN32 64c78 <= 0
[...]
[1] 417.502087 RAMIN32 64d64 <= 0
[1] 417.502100 RAMIN32 64d68 <= 0
[1] 417.502112 RAMIN32 64d6c <= 0
[1] 417.502138 MMIO32 R 0x002500 0x00000001 PFIFO.CACHES => { REASSIGN }
[1] 417.502163 MMIO32 W 0x002500 0x00000000 PFIFO.CACHES <= { 0 }
[1] 417.502188 MMIO32 R 0x003204 0x0001001e PFIFO.CACHE1.PUSH1 => { CHID = 0x1e | MODE = DMA }
[1] 417.502202 RAMIN32 2005c <= 64c7
[1] 417.502226 MMIO32 W 0x002500 0x00000001 PFIFO.CACHES <= { REASSIGN }

Ok, that's clearly creating the per-channel context save area for my test channel. The size is 0x100 bytes.
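
The 0x100 figure is easy to double-check from the trace. A minimal sketch in Python; the helper name is mine, the offsets are copied from the log above, and reading the value written to RAMIN 0x2005c as the context address in 16-byte units is only an assumption based on the numbers lining up:

```python
# First and last RAMIN offsets zeroed by the driver, taken from the trace.
FIRST_WORD = 0x64c70
LAST_WORD = 0x64d6c

def region_size(first, last, word_bytes=4):
    """Size in bytes of a run of 32-bit writes from `first` to `last` inclusive."""
    return last - first + word_bytes

size = region_size(FIRST_WORD, LAST_WORD)
print(hex(size))

# Assumption: the pointer stored at RAMIN 0x2005c is the address shifted right by 4.
print(hex(FIRST_WORD >> 4))
```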

[1] 417.502267 RAMIN32 65180 <= 65000000
[1] 417.502280 RAMIN32 65184 <= 65080000
[1] 417.502293 RAMIN32 65188 <= 65100000
[1] 417.502306 RAMIN32 6518c <= 65180000
[1] 417.502319 RAMIN32 65190 <= 65200000
[...]
[1] 419.579720 RAMIN32 f3968 <= efffffff
[1] 419.579733 RAMIN32 f396c <= efffffff
[1] 419.579745 RAMIN32 f3970 <= efffffff
[1] 419.579758 RAMIN32 f3974 <= efffffff
[1] 419.579839 MMIO32 R 0x00f498 0x00000001 PVP.UCODE_UNK0_ADDR => { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.579864 MMIO32 W 0x00f498 0x00000001 PVP.UCODE_UNK0_ADDR <= { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.579888 MMIO32 R 0x00f498 0x00000001 PVP.UCODE_UNK0_ADDR => { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.579913 MMIO32 W 0x00f498 0x00064d81 PVP.UCODE_UNK0_ADDR <= { UNK0 = 0x1 | INSTANCE = 0x64d80 }

Ah, here comes the first upload. As expected, the code is thrown into VRAM and its address is poked into some register. And a strange part: the code isn't uploaded in a single linear chunk, it's split into several pieces. I've processed it a bit and came up with the following list of uploaded pieces (with the "base" address subtracted from everything):

start..end [length]
00000400..000006a0 [000002a0] p00.0
00000000..00000020 [00000020] p00.1
00000800..00000898 [00000098] p00.2
00000800..00000a00 [00000200] p01.0
00000000..00000020 [00000020] p01.1
00000c00..00000cc8 [000000c8] p01.2
00000c00..000014a0 [000008a0] p02.0
00000000..00000020 [00000020] p02.1
00001800..000018f8 [000000f8] p02.2
00001800..00001aa0 [000002a0] p03.0
00000000..00000020 [00000020] p03.1
00001c00..00001d28 [00000128] p03.2
00001c00..00010480 [0000e880] p04.0
00000000..00000020 [00000020] p04.1
00010800..00010958 [00000158] p04.2
00010800..00014a20 [00004220] p05.0
00000000..00000020 [00000020] p05.1
00014c00..00014d88 [00000188] p05.2
00014c00..00016d20 [00002120] p06.0
00000000..00000020 [00000020] p06.1
00017000..000171b8 [000001b8] p06.2
00017000..00017280 [00000280] p07.0
00000000..00000020 [00000020] p07.1
00017400..000175e8 [000001e8] p07.2
00017400..0006f800 [00058400] p08.0
00000000..00000020 [00000020] p08.1
0006f800..0006fa18 [00000218] p08.2
0006f800..0007d5e0 [0000dde0] p09.0
00000000..00000020 [00000020] p09.1
0007d800..0007da48 [00000248] p09.2
0007d800..000814c0 [00003cc0] p10.0
00000000..00000020 [00000020] p10.1
00081800..00081a78 [00000278] p10.2
00081800..00081aa0 [000002a0] p11.0
00000000..00000020 [00000020] p11.1
00081c00..00081ea8 [000002a8] p11.2
00081c00..00081ea0 [000002a0] p12.0
00000000..00000020 [00000020] p12.1
00082000..000822d8 [000002d8] p12.2
00082000..000822a0 [000002a0] p13.0
00000000..00000020 [00000020] p13.1
00082400..00082708 [00000308] p13.2
00082400..00084880 [00002480] p14.0
00000000..00000020 [00000020] p14.1
00084c00..00084f38 [00000338] p14.2
00084c00..0008c3a0 [000077a0] p15.0
00000000..00000020 [00000020] p15.1
0008c400..0008c768 [00000368] p15.2
0008c400..0008cb40 [00000740] p16.0
00000000..00000020 [00000020] p16.1
0008cc00..0008cf98 [00000398] p16.2
0008cc00..0008dbe0 [00000fe0] p17.0
00000000..00000020 [00000020] p17.1
0008dc00..0008dfc8 [000003c8] p17.2
0008dc00..0008e540 [00000940] p18.0
00000000..00000020 [00000020] p18.1
0008e800..0008ebf8 [000003f8] p18.2

Some of the pieces are overwriting previous ones! Strange... I'll look at it later. Let's continue.

[1] 419.579927 RAMIN32 f3980 <= 6b528047
[1] 419.579940 RAMIN32 f3984 <= 6b5ac047
[1] 419.579952 RAMIN32 f3988 <= 65080180
[1] 419.579965 RAMIN32 f398c <= 4dfa83c0
[...]
[1] 419.581952 RAMIN32 f3c0c <= ef0001ff
[1] 419.581964 RAMIN32 f3c10 <= df000007
[1] 419.581976 RAMIN32 f3c14 <= 4fffffff
[1] 419.581989 RAMIN32 f3c18 <= bf000007
[1] 419.582001 RAMIN32 f3c1c <= ef0001ff
[1] 419.582014 RAMIN32 f3c20 <= df000007
[1] 419.582026 RAMIN32 f3c24 <= 4fffffff
[1] 419.582038 RAMIN32 f3c28 <= bf000007
[1] 419.582051 RAMIN32 f3c2c <= ef0001ff
[1] 419.582063 RAMIN32 f3c30 <= df000007
[1] 419.582076 RAMIN32 f3c34 <= 4fffffff
[1] 419.582088 RAMIN32 f3c38 <= bf000007
[1] 419.582101 RAMIN32 f3c3c <= ef0001ff
[1] 419.582125 MMIO32 R 0x00f46c 0x00000001 PVP.UCODE_UNK1_ADDR => { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.582149 MMIO32 W 0x00f46c 0x00000001 PVP.UCODE_UNK1_ADDR <= { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.582174 MMIO32 R 0x00f46c 0x00000001 PVP.UCODE_UNK1_ADDR => { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.582198 MMIO32 W 0x00f46c 0x000f3981 PVP.UCODE_UNK1_ADDR <= { UNK0 = 0x1 | INSTANCE = 0xf3980 }
[1] 419.582223 MMIO32 R 0x00f474 0x00000000 PVP+0x474 => 0
SLEEP 0.334000ms
[1] 419.582557 MMIO32 W 0x00f474 0x00000010 PVP+0x474 <= 0x10

Another piece of microcode, with base address stuffed into another register. Strange. The write at the end could cause it to start execution, or something.

[1] 419.582573 RAMIN32 f3c40 <= 6b534047
[1] 419.582585 RAMIN32 f3c44 <= 6b594047
[1] 419.582598 RAMIN32 f3c48 <= 6b618047
[1] 419.582610 RAMIN32 f3c4c <= 6b688047
[1] 419.582623 RAMIN32 f3c50 <= 6570ffff
[1] 419.582635 RAMIN32 f3c54 <= 7570000f
[1] 419.582647 RAMIN32 f3c58 <= 426b5c47
[1] 419.582660 RAMIN32 f3c5c <= 6cfac000
[...]
[1] 419.591725 RAMIN32 f46e0 <= df000007
[1] 419.591737 RAMIN32 f46e4 <= 4fffffff
[1] 419.591750 RAMIN32 f46e8 <= bf000007
[1] 419.591762 RAMIN32 f46ec <= ef0001ff
[1] 419.591775 RAMIN32 f46f0 <= df000007
[1] 419.591787 RAMIN32 f46f4 <= 4fffffff
[1] 419.591800 RAMIN32 f46f8 <= bf000007
[1] 419.591812 RAMIN32 f46fc <= ef0001ff
[1] 419.591840 MMIO32 R 0x00f464 0x00000001 PVP.UCODE_UNK2_ADDR => { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.591865 MMIO32 W 0x00f464 0x00000001 PVP.UCODE_UNK2_ADDR <= { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.591889 MMIO32 R 0x00f464 0x00000001 PVP.UCODE_UNK2_ADDR => { UNK0 = 0x1 | INSTANCE = 0 }
[1] 419.591914 MMIO32 W 0x00f464 0x000f3c41 PVP.UCODE_UNK2_ADDR <= { UNK0 = 0x1 | INSTANCE = 0xf3c40 }
[1] 419.591938 MMIO32 R 0x00f474 0x00000010 PVP+0x474 => 0x10
[1] 419.591962 MMIO32 W 0x00f474 0x00000011 PVP+0x474 <= 0x11

And yet another one. So 474 is some sort of "these pieces are ready to use" register? Hm.

[1] 419.591987 MMIO32 W 0x00f400 0x01111111 PVP.INTR <= { UNK0 | UNK4 | UNK8 | FIFO | UNK16 | UNK20 | UNK24 }
[1] 419.592011 MMIO32 W 0x00f404 0x01111111 PVP.INTR_EN <= { UNK0 | UNK4 | UNK8 | FIFO | UNK16 | UNK20 | UNK24 }
[1] 419.592063 RAMIN32 171e8 <= beef5076
[1] 419.592076 RAMIN32 171ec <= 400004

The first two regs have been determined beforehand to be interrupt regs from some fuzzing&scanning on the VP1 range. Sure enough, they're all cleared and enabled here. Then, the VP object is connected to the RAMHT - so PVP is considered ready to use at this point... whew, that was quick.

Looking at NV50 reveals a similar structure, with some differences:

  • the low 4 bits of UCODE_*_ADDR are 0 instead of 1
  • the three major parts are uploaded in reverse order
  • after uploading the UNK1 part, bit 8 is set in 474 in addition to bit 4
  • address of UNK0 is poked into 468 instead of 498

Now let's look at the actual microcode. A quick look reveals two important things:

  • the code has some periodicity at 16 bytes, as expected. There's a very common sequence, especially at the endings: 0xdf000007, 0x4fffffff, 0xbf000007, 0xef0001ff.
  • on the other hand, there's also periodicity at 4 bytes, for instance in the very first chunk uploaded on NV44.

The obvious conclusion is that VP1 is a VLIW machine with 128-bit instruction word made of 4 32-bit opcode slots. It's likely there are some restrictions on execution unit usage in these slots. 0xdf000007, 0x4fffffff, 0xbf000007, 0xef0001ff is likely a sequence of nops for different execution units.
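
To make the bundle idea concrete, here is a rough Python sketch that groups a flat word list into 4-slot bundles and flags the ones made entirely of the suspected nops. The names are mine, and the sample is stitched together from the first four words and the repeating tail of the UNK1 upload in the trace, so it is not a contiguous piece of the real image:

```python
# The four words that recur throughout the images, one per slot.
NOP_BUNDLE = (0xdf000007, 0x4fffffff, 0xbf000007, 0xef0001ff)

def split_bundles(words):
    """Group a flat list of 32-bit words into 4-slot bundles (tuples)."""
    if len(words) % 4:
        raise ValueError("word count is not a multiple of 4")
    return [tuple(words[i:i + 4]) for i in range(0, len(words), 4)]

def is_nop_bundle(bundle):
    """True if every slot holds its suspected nop encoding."""
    return bundle == NOP_BUNDLE

# First bundle of the UNK1 upload, then two of the filler bundles from its tail.
sample = [0x6b528047, 0x6b5ac047, 0x65080180, 0x4dfa83c0] + list(NOP_BUNDLE) * 2

for index, bundle in enumerate(split_bundles(sample)):
    text = " ".join(f"{word:08x}" for word in bundle)
    print(f"{index:08x}: {text}{'  [all nops]' if is_nop_bundle(bundle) else ''}")
```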

Now let's get back to the chunked upload that overwrites itself. We have a single sequence repeated several times:

  • upload something of random length at next 0x400-aligned position following p(XX-1).0, or at 0x400 if XX==0 [pXX.0]
  • upload something of 0x20 bytes at position 0 [pXX.1]
  • upload something of length 0x98 + 0x30*XX at next 0x400-aligned position following pXX.0 [pXX.2]
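
The three rules above are enough to regenerate the piece list from nothing but the pXX.0 lengths. A small Python sketch, with helper names of my own choosing and the lengths copied from the table:

```python
ALIGN = 0x400

def align_up(addr, align=ALIGN):
    """Round addr up to the next multiple of align (no change if already aligned)."""
    return (addr + align - 1) & ~(align - 1)

def layout(p0_lengths):
    """Yield (name, start, end) for every piece, given only the pXX.0 lengths."""
    start = ALIGN  # p00.0 goes at 0x400
    for xx, length in enumerate(p0_lengths):
        end = start + length
        yield f"p{xx:02d}.0", start, end
        yield f"p{xx:02d}.1", 0, 0x20
        dispatch = align_up(end)
        yield f"p{xx:02d}.2", dispatch, dispatch + 0x98 + 0x30 * xx
        start = dispatch  # the next pXX.0 lands on top of this pXX.2

# pXX.0 lengths for the first four pieces, copied from the table above.
for name, start, end in layout([0x2a0, 0x200, 0x8a0, 0x2a0]):
    print(f"{start:08x}..{end:08x} [{end - start:08x}] {name}")
```

Written this way, the overwriting is plain to see: each pXX.2 starts at exactly the address where the following pXX.0 is uploaded.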

Let's look at the smallest one first - pXX.1, always at position 0:

mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-00.1
00000000: df000007 ??? [unknown: df000007]
00000000: 4fffffff ??? [unknown: 4fffffff]
00000000: bf000007 ??? [unknown: bf000007]
00000000: eaf80080 ??? [unknown: eaf80080]
00000001: df000007 ??? [unknown: df000007]
00000001: 4fffffff ??? [unknown: 4fffffff]
00000001: bf000007 ??? [unknown: bf000007]
00000001: efffffff ??? [unknown: efffffff]
mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-01.1
00000000: df000007 ??? [unknown: df000007]
00000000: 4fffffff ??? [unknown: 4fffffff]
00000000: bf000007 ??? [unknown: bf000007]
00000000: eaf800c0 ??? [unknown: eaf800c0]
00000001: df000007 ??? [unknown: df000007]
00000001: 4fffffff ??? [unknown: 4fffffff]
00000001: bf000007 ??? [unknown: bf000007]
00000001: efffffff ??? [unknown: efffffff]
mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-02.1
00000000: df000007 ??? [unknown: df000007]
00000000: 4fffffff ??? [unknown: 4fffffff]
00000000: bf000007 ??? [unknown: bf000007]
00000000: eaf80180 ??? [unknown: eaf80180]
00000001: df000007 ??? [unknown: df000007]
00000001: 4fffffff ??? [unknown: 4fffffff]
00000001: bf000007 ??? [unknown: bf000007]
00000001: efffffff ??? [unknown: efffffff]
mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-03.1
00000000: df000007 ??? [unknown: df000007]
00000000: 4fffffff ??? [unknown: 4fffffff]
00000000: bf000007 ??? [unknown: bf000007]
00000000: eaf801c0 ??? [unknown: eaf801c0]
00000001: df000007 ??? [unknown: df000007]
00000001: 4fffffff ??? [unknown: 4fffffff]
00000001: bf000007 ??? [unknown: bf000007]
00000001: efffffff ??? [unknown: efffffff]
mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-04.1
00000000: df000007 ??? [unknown: df000007]
00000000: 4fffffff ??? [unknown: 4fffffff]
00000000: bf000007 ??? [unknown: bf000007]
00000000: eaf81080 ??? [unknown: eaf81080]
00000001: df000007 ??? [unknown: df000007]
00000001: 4fffffff ??? [unknown: 4fffffff]
00000001: bf000007 ??? [unknown: bf000007]
00000001: efffffff ??? [unknown: efffffff]

The second bundle is made of mostly suspected nop opcodes, with slot 3 replaced by 0xefffffff. The first bundle is mostly suspected nops, but with slot 3 replaced by 0xeaf80000 | something, where something is the address of pXX.2, shifted right by 4! Now, we know something:

  • PC is likely counted in 0x10-byte bundles [ie. byte address shifted by 4]
  • 0xeaf8XXXX is almost certainly an unconditional absolute branch to bundle XXXX
  • PC is counted from the base address poked into the PVP register
  • PC is at least 16 and at most 19 bits - we'll conservatively assume 16 for now
  • if VP1 has delay slots, 0xefffffff is some further work to do on entry, or maybe another weird form of a nop
  • if VP1 doesn't have delay slots, 0xeaf8XXXX is likely a call opcode instead, and 0xefffffff is some sort of exit or return opcode.
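
The branch guess is simple to check against the table of pieces. A quick Python sketch, where the function name is mine and the 16-bit target width is the conservative assumption from the list above:

```python
BRA_OPCODE = 0xeaf80000
BRA_MASK = 0xffff0000  # assumption: only the low 16 bits hold the target

def decode_bra(word):
    """Return the byte offset targeted by a suspected 0xeaf8XXXX branch, or None."""
    if word & BRA_MASK != BRA_OPCODE:
        return None
    return (word & 0xffff) << 4  # the target is counted in 0x10-byte bundles

# Slot 3 of the first bundle of p00.1..p04.1, paired with the start of the
# matching pXX.2 piece from the upload table.
samples = {
    0xeaf80080: 0x800,
    0xeaf800c0: 0xc00,
    0xeaf80180: 0x1800,
    0xeaf801c0: 0x1c00,
    0xeaf81080: 0x10800,
}
for word, expected in samples.items():
    print(f"{word:08x} -> {decode_bra(word):#x} (pXX.2 at {expected:#x})")
```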

That's all that can be extracted from pXX.1, let's move on to pXX.2:

mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-00.2
00000000: 6b0fc0af ??? [unknown: 6b0fc0af]
00000000: 7e087f80 ??? [unknown: 7e087f80]
00000000: 7e084080 ??? [unknown: 7e084080]
00000000: 4fffffff nop4
00000001: bf000007 nopb
00000001: efffffff unkend
00000001: 65100000 ??? [unknown: 65100000]
00000001: 75100000 ??? [unknown: 75100000]
00000002: 4df845c3 ??? [unknown: 4df845c3]
00000002: 4fffffff nop4
00000002: e200043f ??? [unknown: e200043f]
00000002: 4fffffff nop4
00000003: 4fffffff nop4
00000003: eaf80040 bra 0x40
00000003: 4fffffff nop4
00000003: 4fffffff nop4
00000004: bf000007 nopb
00000004: efffffff unkend
00000004: efffffff unkend
00000004: efffffff unkend
00000005: efffffff unkend
00000005: efffffff unkend
00000005: efffffff unkend
00000005: efffffff unkend
00000006: efffffff unkend
00000006: fff8dead ??? [unknown: fff8dead]
00000006: efffffff unkend
00000006: efffffff unkend
00000007: efffffff unkend
00000007: efffffff unkend
00000007: efffffff unkend
00000007: efffffff unkend
00000008: efffffff unkend
00000008: efffffff unkend
00000008: efffffff unkend
00000008: efffffff unkend
00000009: efffffff unkend
00000009: efffffff unkend
mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-01.2
00000000: 6b0fc0af ??? [unknown: 6b0fc0af]
00000000: 7e087f80 ??? [unknown: 7e087f80]
00000000: 7e084080 ??? [unknown: 7e084080]
00000000: 4fffffff nop4
00000001: bf000007 nopb
00000001: efffffff unkend
00000001: 65100000 ??? [unknown: 65100000]
00000001: 75100000 ??? [unknown: 75100000]
00000002: 4df845c3 ??? [unknown: 4df845c3]
00000002: 4fffffff nop4
00000002: e200043f ??? [unknown: e200043f]
00000002: 4fffffff nop4
00000003: 4fffffff nop4
00000003: eaf80040 bra 0x40
00000003: 4fffffff nop4
00000003: 4fffffff nop4
00000004: bf000007 nopb
00000004: efffffff unkend
00000004: 65100001 ??? [unknown: 65100001]
00000004: 75100000 ??? [unknown: 75100000]
00000005: 4df845c3 ??? [unknown: 4df845c3]
00000005: 4fffffff nop4
00000005: e200043f ??? [unknown: e200043f]
00000005: 4fffffff nop4
00000006: 4fffffff nop4
00000006: eaf80080 bra 0x80
00000006: 4fffffff nop4
00000006: 4fffffff nop4
00000007: bf000007 nopb
00000007: efffffff unkend
00000007: efffffff unkend
00000007: efffffff unkend
00000008: efffffff unkend
00000008: efffffff unkend
00000008: efffffff unkend
00000008: efffffff unkend
00000009: efffffff unkend
00000009: fff8dead ??? [unknown: fff8dead]
00000009: efffffff unkend
00000009: efffffff unkend
0000000a: efffffff unkend
0000000a: efffffff unkend
0000000a: efffffff unkend
0000000a: efffffff unkend
0000000b: efffffff unkend
0000000b: efffffff unkend
0000000b: efffffff unkend
0000000b: efffffff unkend
0000000c: efffffff unkend
0000000c: efffffff unkend
mwk@nightmare ~/microcode/vp1/p1 $ envydis -m vp1 -w part-02.2
00000000: 6b0fc0af ??? [unknown: 6b0fc0af]
00000000: 7e087f80 ??? [unknown: 7e087f80]
00000000: 7e084080 ??? [unknown: 7e084080]
00000000: 4fffffff nop4
00000001: bf000007 nopb
00000001: efffffff unkend
00000001: 65100000 ??? [unknown: 65100000]
00000001: 75100000 ??? [unknown: 75100000]
00000002: 4df845c3 ??? [unknown: 4df845c3]
00000002: 4fffffff nop4
00000002: e200043f ??? [unknown: e200043f]
00000002: 4fffffff nop4
00000003: 4fffffff nop4
00000003: eaf80040 bra 0x40
00000003: 4fffffff nop4
00000003: 4fffffff nop4
00000004: bf000007 nopb
00000004: efffffff unkend
00000004: 65100001 ??? [unknown: 65100001]
00000004: 75100000 ??? [unknown: 75100000]
00000005: 4df845c3 ??? [unknown: 4df845c3]
00000005: 4fffffff nop4
00000005: e200043f ??? [unknown: e200043f]
00000005: 4fffffff nop4
00000006: 4fffffff nop4
00000006: eaf80080 bra 0x80
00000006: 4fffffff nop4
00000006: 4fffffff nop4
00000007: bf000007 nopb
00000007: efffffff unkend
00000007: 65100002 ??? [unknown: 65100002]
00000007: 75100000 ??? [unknown: 75100000]
00000008: 4df845c3 ??? [unknown: 4df845c3]
00000008: 4fffffff nop4
00000008: e200043f ??? [unknown: e200043f]
00000008: 4fffffff nop4
00000009: 4fffffff nop4
00000009: eaf800c0 bra 0xc0
00000009: 4fffffff nop4
00000009: 4fffffff nop4
0000000a: bf000007 nopb
0000000a: efffffff unkend
0000000a: efffffff unkend
0000000a: efffffff unkend
0000000b: efffffff unkend
0000000b: efffffff unkend
0000000b: efffffff unkend
0000000b: efffffff unkend
0000000c: efffffff unkend
0000000c: fff8dead ??? [unknown: fff8dead]
0000000c: efffffff unkend
0000000c: efffffff unkend
0000000d: efffffff unkend
0000000d: efffffff unkend
0000000d: efffffff unkend
0000000d: efffffff unkend
0000000e: efffffff unkend
0000000e: efffffff unkend
0000000e: efffffff unkend
0000000e: efffffff unkend
0000000f: efffffff unkend
0000000f: efffffff unkend

So, pXX.2 appears to be the "dispatch" chunk that determines which pXX.0 chunk is to be called somehow, then runs it. Notable facts:

  • the code loads an integer from somewhere, then decides which chunk to launch based on its value.
  • 0x651000XX is either a compare instruction, or a mov from immediate instruction that loads an immediate to a register for use by a compare instruction.
  • the branches to pXX.0 chunks are unconditional - thus some of the unknown instructions have to be conditional branches skipping over them. I'm betting on 0xe200043f - it's in the right bundle [after the bundle with the compare/load], and has the place to encode a reasonably large relative displacement. The branch instruction has to be relative - the opcodes don't change with pXX.2 parts taken from different addresses.
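
Under those guesses, the dispatch table can be pulled out of a pXX.2 chunk mechanically. The Python below is only a sketch: it assumes 0x651000XX carries the compared value in its low 8 bits and that 0xeaf8XXXX is an absolute branch counted in bundles, and the function name is invented:

```python
def dispatch_table(words):
    """Pair each suspected compare immediate with the next absolute branch target.

    Assumptions: 0x651000XX holds the value in its low 8 bits, and 0xeaf8XXXX
    is an absolute branch whose target is counted in 0x10-byte bundles.
    """
    table = {}
    pending = None
    for word in words:
        if word & 0xffffff00 == 0x65100000:
            pending = word & 0xff
        elif word & 0xffff0000 == 0xeaf80000 and pending is not None:
            table[pending] = (word & 0xffff) << 4
            pending = None
    return table

# The three dispatch entries of part-02.2, trailing filler left out.
PART_02_2 = [
    0x6b0fc0af, 0x7e087f80, 0x7e084080, 0x4fffffff,
    0xbf000007, 0xefffffff, 0x65100000, 0x75100000,
    0x4df845c3, 0x4fffffff, 0xe200043f, 0x4fffffff,
    0x4fffffff, 0xeaf80040, 0x4fffffff, 0x4fffffff,
    0xbf000007, 0xefffffff, 0x65100001, 0x75100000,
    0x4df845c3, 0x4fffffff, 0xe200043f, 0x4fffffff,
    0x4fffffff, 0xeaf80080, 0x4fffffff, 0x4fffffff,
    0xbf000007, 0xefffffff, 0x65100002, 0x75100000,
    0x4df845c3, 0x4fffffff, 0xe200043f, 0x4fffffff,
    0x4fffffff, 0xeaf800c0, 0x4fffffff, 0x4fffffff,
]

for value, target in dispatch_table(PART_02_2).items():
    print(f"value {value} -> byte offset {target:#x}")
```

The resulting byte offsets are the start addresses of p00.0, p01.0 and p02.0 in the upload table, which fits the "dispatch" reading.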

Looking at p08.0 reveals a strange thing though - the absolute addresses it uses in the 0xeaf80000 opcode don't make sense and jump all over the place if counted from the microcode base. They however refer to seemingly valid targets if counted from the start of p08.0... It seems as if the "absolute branch base" can be changed somehow.

Other notes from analysis of pXX.0 chunks:

  • all of them end with a 0xfff80000 opcode, followed by ~6 bundles of nops. The pXX.2 chunk contains a 0xfff8dead opcode near the end. Perhaps it's some sort of "exit" instruction, with the exit code as a parameter.
  • only p08.0 contains bra opcodes - it may be a fairly special type of a branch... perhaps the remainder are relative jumps?
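
If the "exit with a code" guess holds, pulling the code out is trivial. A tiny Python sketch; the function name is mine and the 16-bit width of the code field is purely an assumption:

```python
def decode_exit(word):
    """Return the suspected exit code of a 0xfff8XXXX word, or None.

    Assumption: the exit code occupies the low 16 bits.
    """
    if word & 0xffff0000 != 0xfff80000:
        return None
    return word & 0xffff

for word in (0xfff80000, 0xfff8dead, 0xef0001ff):
    code = decode_exit(word)
    print(f"{word:08x}:", "not an exit" if code is None else f"exit {code:#x}")
```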

Now, since we have two different sets of samples, it's time to compare the two.

Some differences from p01.0:

@@ -1,9 +1,9 @@
0xcc000000
0xcd000000
0x6ba7c0af
-0x65b8000f
+0x65b80003
0x75b80000
-0x6ea500e7
+0x6ea500f7
0x42a52e47
0xef0001ff
0x6ca5000f
@@ -11,28 +11,36 @@
0x65b00003
0xe10001db
0xef0001ff
-0xe1000dbb
+0xe10011bb
0x6ba400af
0xef0001ff
0x6ba480af
-0xe1000bbb
+0xe1000fbb
0xef0001ff
-0xe1000bbb
+0xe1000fbb
0x6ba500af
0xef0001ff
0x6ba580af
-0xe10009bb
+0xe1000dbb
0xef0001ff
-0xe10007bb
+0xe1000bbb
0x6ba600af
0xef0001ff
0x6ba680af
-0xe10005bb
+0xe10009bb
+0xef0001ff
+0xe10009bb
+0x6ba700af
+0xef0001ff
+0x6ba780af
+0xe10007bb
+0xef0001ff
+0xfff90000
0xef0001ff
-0xfff93040
0xef0001ff
0xef0001ff
0xef0001ff
+0xbf000007
0xef0001ff
0x42ad2c47
0x42b5be4f

Addition of a few instructions causes several instructions to have a common bitfield modified by a constant offset... this can only mean that 0xe1XXXXXX are [relative] branch instructions. A similar analysis of p05.0 reveals 0xe0*, 0xe2*, 0xe4* as other likely candidates.
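
The "constant offset" is easy to verify by subtracting the old encoding from the new one for every changed 0xe1 word in the diff. A short Python sketch, with names of my own:

```python
# (old word, new word) for each changed 0xe1 opcode, in the order the
# -/+ pairs appear in the p01.0 diff above.
PAIRS = [
    (0xe1000dbb, 0xe10011bb),
    (0xe1000bbb, 0xe1000fbb),
    (0xe1000bbb, 0xe1000fbb),
    (0xe10009bb, 0xe1000dbb),
    (0xe10007bb, 0xe1000bbb),
    (0xe10005bb, 0xe10009bb),
]

def deltas(pairs):
    """Numeric difference between the new and old encoding of each pair."""
    return [new - old for old, new in pairs]

print([hex(delta) for delta in deltas(PAIRS)])
```

Every pair differs by the same amount, while the second hunk of the diff grows from 28 to 36 words, which is what you would expect from branches that jump over the inserted code.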

We may conclude that 0xe* opcodes are opcodes for branch execution unit, and that first 4 (or 3?) bits of an opcode determine its class.

Another interesting diff, on p05.0:

@@ -3816,7 +3836,7 @@
0x75504400
0x6bd180a7
0xad3fc003
-0x4cd6bdc7
+0x4cd699c7
0xbc29c107
0x6a068067
0xbc41c407
@@ -3892,7 +3912,7 @@
0x7e808021
0x6c84000f
0x62843ff7
-0x6c640007
+0x6cec0007
0x7e88c027
0x6bb880af
0x4cbdfdc7
@@ -3923,9 +3943,9 @@
0xc7134f1f
0xe2fff62f
0xcef90000
-0x7ecb3fe7
+0x7ecf7fe7
0x6c8c7ff8
-0x6c830007
+0x6c874007
0x6bb80067
0xe0000627
0x4dcdf3c7
@@ -3940,7 +3960,7 @@
0xef0001ff
0x6bc9c0a7
0x6bb8c0af
-0x4cce7dc7
+0x4cce59c7
0x6a064067
0x4cbdfdc7
0x65500000
@@ -3950,7 +3970,7 @@
0x63529807
0xce090001
0x6aa28067
-0x6c830000
+0x6c874000
0x6cfa3e82
0x4dbdf3c7
0x7e88c027
@@ -3999,9 +4019,9 @@
0xc713471f
0xe2fff62f
0xcef90000
-0x7ecb3fe7
+0x7ecf7fe7
0x6c8c7ff8
-0x6c830007
+0x6c874007
0x6bb80067
0xe0000627
0x4dcdf3c7

Here, NV44 code has 01100 while NV50 has 11101 in several places, at bit positions 19 (0x6c...), 14 (0x7e... and 0x6c...). The same thing also happens with 1111/0110 at position 10 (0x4c...). These locations are likely register selection fields, and the differences are caused by register allocation being shaken up by other changes between these two samples.
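
The 5-bit fields can be pulled out of the changed words with a generic bitfield helper. A minimal Python sketch; the helper is mine, the words are copied from the diff above, and only the 0x6c and 0x7e cases are covered:

```python
def field(word, lo, width):
    """Extract `width` bits of `word` starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)

# (old word, new word, low bit of the field that changed), from the p05.0 diff.
CHANGES = [
    (0x6c640007, 0x6cec0007, 19),
    (0x7ecb3fe7, 0x7ecf7fe7, 14),
    (0x6c830007, 0x6c874007, 14),
    (0x6c830000, 0x6c874000, 14),
]

for old, new, lo in CHANGES:
    print(f"{old:08x} -> {new:08x}: bits {lo}-{lo + 4}: "
          f"{field(old, lo, 5):05b} -> {field(new, lo, 5):05b}")
```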

Now let's take a look at p00.0:

00000000: 65000000 ??? [unknown: 65000000]
00000000: 65080000 ??? [unknown: 65080000]
00000000: 65100000 ??? [unknown: 65100000]
00000000: 65180000 ??? [unknown: 65180000]
00000001: 65200000 ??? [unknown: 65200000]
00000001: 65280000 ??? [unknown: 65280000]
00000001: 65300000 ??? [unknown: 65300000]
00000001: 65380000 ??? [unknown: 65380000]
00000002: 65400000 ??? [unknown: 65400000]
00000002: 65480000 ??? [unknown: 65480000]
00000002: 65500000 ??? [unknown: 65500000]
00000002: 65580000 ??? [unknown: 65580000]
00000003: 65600000 ??? [unknown: 65600000]
00000003: 65680000 ??? [unknown: 65680000]
00000003: 65700000 ??? [unknown: 65700000]
00000003: 65780000 ??? [unknown: 65780000]
00000004: 65800000 ??? [unknown: 65800000]
00000004: 65880000 ??? [unknown: 65880000]
00000004: 65900000 ??? [unknown: 65900000]
00000004: 65980000 ??? [unknown: 65980000]
00000005: 65a00000 ??? [unknown: 65a00000]
00000005: 65a80000 ??? [unknown: 65a80000]
00000005: 65b00000 ??? [unknown: 65b00000]
00000005: 65b80000 ??? [unknown: 65b80000]
00000006: 65c00000 ??? [unknown: 65c00000]
00000006: 65c80000 ??? [unknown: 65c80000]
00000006: 65d00000 ??? [unknown: 65d00000]

This is very likely a sequence meant to set [most of] one of the register files to 0. Similar zeroing out is performed as part of the vµc init sequence, so clearly nvidia likes this pattern. This confirms our guess about the register bitfield, and further pinpoints bits 19-23 as the destination register. Sadly, we don't know *which* register file is accessed here. Combined with our notes from the pXX.2 sequence, this also means that 0x65 is likely a load-immediate instruction, or an add immediate.

00000006: 6cdfc000 ??? [unknown: 6cdfc000]
00000007: 6ce7c001 ??? [unknown: 6ce7c001]
00000007: 6cefc002 ??? [unknown: 6cefc002]
00000007: 6cf7c003 ??? [unknown: 6cf7c003]

A different opcode... With bits 0-1 being different. May be immediate loads, but why would immediate load of 0 differ from opcodes used in previous zeroing sequence? Another possibility is a load, with bits 14-18 used for base address register and 0-1 [probably more] as the offset. It's likely that $a31 would be hardwired to 0 for this purpose. Plausible. If that's true, 0x65/0x6c would likely operate on $a register file, since the previous diff means bits 14-18 and 19-23 of 0x6c access the same register file. Lack of initialisation for register 31 gives further support for this theory.
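
To see which registers the two sequences touch under this theory, the fields can be decoded in Python. This is a sketch only: the destination at bits 19-23, the base register at bits 14-18 and the 2-bit offset are the guesses from the text, the 0x65 words are rebuilt from the pattern in the listing rather than typed out, and all names are mine:

```python
def field(word, lo, width=5):
    """Extract `width` bits of `word` starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)

# The 27 words 0x65000000 .. 0x65d00000 from the listing, rebuilt from their pattern.
ZERO_SEQ = [0x65000000 | (reg << 19) for reg in range(27)]
# The four 0x6c words that follow them.
LOAD_SEQ = [0x6cdfc000, 0x6ce7c001, 0x6cefc002, 0x6cf7c003]

def decode_load(word):
    """Guessed fields of a 0x6c word: (destination, base register, offset)."""
    return field(word, 19), field(word, 14), word & 0x3

written = {field(word, 19) for word in ZERO_SEQ}
written |= {decode_load(word)[0] for word in LOAD_SEQ}
missing = sorted(set(range(32)) - written)

for word in LOAD_SEQ:
    dst, base, offset = decode_load(word)
    print(f"{word:08x}: dst={dst} base={base} offset={offset}")
print("never written:", missing)
```

Only register 31 is left untouched, and it is also the one used as the base in all four 0x6c words.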

00000007: ad07c000 ??? [unknown: ad07c000]
00000008: ad0fc001 ??? [unknown: ad0fc001]
00000008: ad17c002 ??? [unknown: ad17c002]
00000008: ad1fc003 ??? [unknown: ad1fc003]
00000008: ad27c007 ??? [unknown: ad27c007]
00000009: ad2fc007 ??? [unknown: ad2fc007]
00000009: ad37c007 ??? [unknown: ad37c007]
00000009: ad3fc007 ??? [unknown: ad3fc007]
00000009: ad47c007 ??? [unknown: ad47c007]
0000000a: ad4fc007 ??? [unknown: ad4fc007]
0000000a: ad57c007 ??? [unknown: ad57c007]
0000000a: ad5fc007 ??? [unknown: ad5fc007]
0000000a: ad67c007 ??? [unknown: ad67c007]
0000000b: ad6fc007 ??? [unknown: ad6fc007]
0000000b: ad77c007 ??? [unknown: ad77c007]
0000000b: ad7fc007 ??? [unknown: ad7fc007]
0000000b: ad87c007 ??? [unknown: ad87c007]
0000000c: ad8fc007 ??? [unknown: ad8fc007]
0000000c: ad97c007 ??? [unknown: ad97c007]
0000000c: ad9fc007 ??? [unknown: ad9fc007]
0000000c: ada7c007 ??? [unknown: ada7c007]
0000000d: adafc007 ??? [unknown: adafc007]
0000000d: adb7c007 ??? [unknown: adb7c007]
0000000d: adbfc007 ??? [unknown: adbfc007]
0000000d: adc7c007 ??? [unknown: adc7c007]
0000000e: adcfc007 ??? [unknown: adcfc007]
0000000e: add7c007 ??? [unknown: add7c007]
0000000e: addfc007 ??? [unknown: addfc007]
0000000e: ade7c007 ??? [unknown: ade7c007]
0000000f: adefc007 ??? [unknown: adefc007]
0000000f: adf7c007 ??? [unknown: adf7c007]
0000000f: adffc007 ??? [unknown: adffc007]

Another register file is initialised somehow... the last few bits are worrying, however. These could be loads again, with 31 at bits 14-18, but why would it read from the same addresses as previously used by the suspected $a init?

0000000f: cd001fff ??? [unknown: cd001fff]
00000010: cc000010 ??? [unknown: cc000010]
00000010: 4fffffff nop4
00000010: bf000007 nopb
00000010: ef0001ff nope

No ideas at this time.

00000011: d4000081 B ??? [unknown: d4000081]
00000011: df000007 B nopd
00000011: e200014f B bra2 0x11 [unknown: 0000014f]
00000011: ef0001ff B nope

An infinite loop waiting for something? Seems far too simple to do anything else. Unless the condition used in the branch is some sort of "decrement counter and branch if non-zero". The unknown condition field looks long enough for that.

00000012: d3000007 ??? [unknown: d3000007]

Again no idea.

00000012: cb0801c0 ??? [unknown: cb0801c0]
00000012: cb1001c1 ??? [unknown: cb1001c1]
00000012: cb1801c2 ??? [unknown: cb1801c2]
00000013: cb2001c3 ??? [unknown: cb2001c3]
00000013: cb2801c7 ??? [unknown: cb2801c7]
00000013: cb3001c7 ??? [unknown: cb3001c7]
00000013: cb3801c7 ??? [unknown: cb3801c7]
00000014: cb4001c7 ??? [unknown: cb4001c7]
00000014: cb4801c7 ??? [unknown: cb4801c7]
00000014: cb5001c7 ??? [unknown: cb5001c7]
00000014: cb5801c7 ??? [unknown: cb5801c7]
00000015: cb6001c7 ??? [unknown: cb6001c7]
00000015: cb6801c7 ??? [unknown: cb6801c7]
00000015: cb7001c7 ??? [unknown: cb7001c7]
00000015: cb7801c7 ??? [unknown: cb7801c7]
00000016: cb8001c7 ??? [unknown: cb8001c7]
00000016: cb8801c7 ??? [unknown: cb8801c7]
00000016: cb9001c7 ??? [unknown: cb9001c7]
00000016: cb9801c7 ??? [unknown: cb9801c7]
00000017: cba001c7 ??? [unknown: cba001c7]
00000017: cba801c7 ??? [unknown: cba801c7]
00000017: cbb001c7 ??? [unknown: cbb001c7]
00000017: cbb801c7 ??? [unknown: cbb801c7]
00000018: cbc001c7 ??? [unknown: cbc001c7]
00000018: cbc801c7 ??? [unknown: cbc801c7]
00000018: cbd001c7 ??? [unknown: cbd001c7]
00000018: cbd801c7 ??? [unknown: cbd801c7]
00000019: cbe001c7 ??? [unknown: cbe001c7]
00000019: cbe801c7 ??? [unknown: cbe801c7]
00000019: cbf001c7 ??? [unknown: cbf001c7]
00000019: cbf801c7 ??? [unknown: cbf801c7]

And here comes the third register file. Same problems as with the second one. Also, reg 0 isn't initialised, unless the 0xd3 instruction deals with it.

0000001a: ca07c1c0 ??? [unknown: ca07c1c0]
0000001a: ca07c1c1 ??? [unknown: ca07c1c1]
0000001a: ca07c1c2 ??? [unknown: ca07c1c2]
0000001a: ca07c1c3 ??? [unknown: ca07c1c3]

Can't be an instruction with a destination reg - they'd overwrite each other. Maybe a store instruction, with 0xcb being the load instruction?

0000001b: 80000000 ??? [unknown: 80000000]
0000001b: f0000000 ??? [unknown: f0000000]
0000001b: f0080000 ??? [unknown: f0080000]
0000001b: f0100000 ??? [unknown: f0100000]
0000001c: f0180000 ??? [unknown: f0180000]

Who knows... could there be a 4th register file? Sure there could. It's a fairly small one, though.

0000001c: 4fffffff nop4
0000001c: bf000007 nopb
0000001c: ef0001ff nope
0000001d: df000007 nopd
0000001d: 4fffffff nop4
0000001d: bf000007 nopb
0000001d: ef0001ff nope
0000001e: df000007 nopd
0000001e: 4fffffff nop4
0000001e: bf000007 nopb
0000001e: ef0001ff nope
0000001f: df000007 nopd
0000001f: 4fffffff nop4
0000001f: bf000007 nopb
0000001f: ef0001ff nope
00000020: df000007 nopd
00000020: 4fffffff nop4
00000020: bf000007 nopb
00000020: ef0001ff nope
00000021: df000007 nopd
00000021: 4fffffff nop4
00000021: bf000007 nopb
00000021: ef0001ff nope
00000022: df000007 nopd
00000022: 4fffffff nop4
00000022: bf000007 nopb
00000022: ef0001ff nope
00000023: df000007 nopd
00000023: 4fffffff nop4
00000023: bf000007 nopb
00000023: fff80000 exit 0

Ok, nothing to see here.

I don't see any other easy targets right now. I think I could consider trying to run some code at this point - hopefully the exit status will be visible somewhere. Enough for one session, though.

Elapsed time: 8h.
