Continuing with the vector instruction set, we run into a problem: the results from the Python script are very irregular and suggest we're pulling data from an unknown source. The 0x82 opcode is particularly worrying: regardless of inputs, for the first few dozen runs after running hwtest it spits out some interesting values, then reverts to outputting all-0x80808080. This implies one of three things is in play:
But scanning the vector instruction space reveals four more easy instructions:
All the others look strange. Later.
Let's pick off some low-hanging fruit first.
First, I'll check how different NV44 is from NV50. It appears that $c bits 6 and 7 are set only on NV50, not on NV44. Other than that, everything passes. Oh well.
Second, let's test the address instructions that don't actually access memory: setlo, sethi, add, hadd, and logop. These are easily added to hwtest and verified. It turns out I made a mistake about hadd: bit 10 of $c is set depending on bits 0-15 and 16-29 of the result, not bits 0-13 and 16-29. Also, it seems the whole opcode space is populated: all unknown opcodes are either memory loads or do something even scarier (they hang the VP in tests).
Another thing: I started getting random failures on scalar instructions when they were used together with $c-modifying address instructions. Changing the scalar simulator to use the value of $c from before the address instruction executes fixes things. True parallelism at work.
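The fix amounts to treating the scalar and address units as reading $c in the same cycle and committing writes afterwards. A minimal sketch of that ordering in a simulator, assuming (hypothetically) that each unit's operation is a callable taking the current $c and that only the address unit writes $c; `State` and `run_bundle` are illustrative names, not hwtest code:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class State:
    c: int  # the $c flag register (width not modeled here)


def run_bundle(state: State,
               addr_op: Callable[[int], int],
               scalar_op: Callable[[int], int]) -> int:
    """Execute one address op and one scalar op issued together.

    Both ops see the $c value from *before* the bundle; the address
    op's new $c is committed only after the scalar op has read $c.
    """
    c_before = state.c                    # snapshot before either unit runs
    new_c = addr_op(c_before)             # address unit computes its new $c
    scalar_result = scalar_op(c_before)   # scalar unit reads the old $c
    state.c = new_c                       # commit the address write last
    return scalar_result
```

Reading `state.c` after `addr_op` has already committed would reproduce the original bug: the scalar op would observe a flag value the hardware never shows it.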
Third, let's try testing the branch instructions. These are going to be hard to test with single-instruction execution, but may be worth adding anyway. Perhaps the PC register is modified properly...
No such luck. The only thing that can be verified is the $l behavior, and it comes with a surprise or two.
For one, mov to $l apparently sets the "loop zero" flag if the low 8 bits of the value are zero. Also, when the low 8 bits of the $l register are 0 and one of the looping branches is executed, the low 8 bits are set equal to the high 8 bits. So apparently this can be used for repeatable loops: the high byte acts as a reload value for the counter in the low byte.
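The two observed rules can be captured in a small model. This is a sketch under stated assumptions: $l is taken to be 16 bits wide (counter in bits 0-7, reload value in bits 8-15), and the nonzero-counter case is assumed to simply decrement, which is NOT something the tests above verified. Effects on the loop-zero flag during a branch are not modeled.

```python
LOW_MASK = 0x00ff   # counter byte
HIGH_MASK = 0xff00  # reload byte (assumes a 16-bit $l)


def mov_to_l(value: int) -> tuple[int, bool]:
    """Model `mov` into $l: return (new $l value, loop-zero flag)."""
    value &= 0xffff                        # assumed 16-bit register
    loop_zero = (value & LOW_MASK) == 0    # observed: flag set iff low byte is 0
    return value, loop_zero


def loop_branch_update(l: int) -> int:
    """Model the $l update performed by a looping branch."""
    low = l & LOW_MASK
    high = (l & HIGH_MASK) >> 8
    if low == 0:
        # Observed: a zero counter is reloaded from the high byte.
        return (l & HIGH_MASK) | high
    # ASSUMPTION (not verified): a nonzero counter just decrements.
    return (l & HIGH_MASK) | (low - 1)
```

With $l = 0x0300, the first looping branch reloads the counter to 3, and subsequent branches (under the decrement assumption) count it back down to 0, after which the next branch reloads it again.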
Fourth, the 0x6a/0x6b opcodes should be tested too. Interesting facts:
And fifth, the rounding behavior of multiplication is interesting. The H.263 and VC-1 standards use alternating round-to-nearest: when a value lies exactly at the midpoint between two integers (i.e. a fractional part of 0.5), a per-frame flag specifies whether it is rounded up or down. The flag is flipped on every frame so that rounding errors don't accumulate into a bias. This strongly suggests that VP1 should have such a flag somewhere.
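To make the expected semantics concrete, here is a sketch of round-to-nearest with a selectable tie direction. The product is treated as a fixed-point value with `shift` fractional bits; `shift` and `round_up_on_tie` are hypothetical parameters, since the post doesn't describe VP1's exact multiplier layout, and which polarity of the hardware flag means "round up" is not established here.

```python
def round_shift(value: int, shift: int, round_up_on_tie: bool) -> int:
    """Round value / 2**shift to the nearest integer.

    Exact midpoints (fraction == 0.5) go up or down depending on the flag;
    everything else rounds to nearest as usual.
    """
    if shift == 0:
        return value
    half = 1 << (shift - 1)
    frac = value & ((1 << shift) - 1)   # fractional bits (two's complement safe)
    base = value >> shift               # floor(value / 2**shift), also for negatives
    if frac > half:
        return base + 1
    if frac < half:
        return base
    return base + 1 if round_up_on_tie else base


def mul_rounded(a: int, b: int, shift: int, round_up_on_tie: bool) -> int:
    """Multiply, then round the fixed-point product with the tie rule above."""
    return round_shift(a * b, shift, round_up_on_tie)
```

Only exact ties are affected by the flag, which is why flipping it every frame cancels the bias: over two frames, a tie is rounded up once and down once.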
Sure enough, writing 0xffffffff to all special registers (0xf4xx, 0xf5xx) results in differences in rounding exactly matching the expected semantics. With a bit of binary searching, this is soon pinpointed to 0xf540 bit 0.
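The binary search can be sketched as a bisection over candidate (register, bit) pairs. This assumes exactly one bit is responsible, and uses a hypothetical oracle `changes_rounding(subset)` that would set only the bits in `subset` to 1, run the multiply test on hardware, and report whether the rounding differs from the all-zero baseline. For N candidate bits this needs about log2(N) hardware runs.

```python
from typing import Callable, Sequence, Tuple

Candidate = Tuple[int, int]  # (special register address, bit index)


def find_responsible_bit(
        candidates: Sequence[Candidate],
        changes_rounding: Callable[[Sequence[Candidate]], bool]) -> Candidate:
    """Bisect `candidates` down to the single bit that alters rounding.

    Assumes exactly one candidate is responsible; `changes_rounding` is a
    hypothetical oracle standing in for a hardware test run.
    """
    remaining = list(candidates)
    while len(remaining) > 1:
        mid = len(remaining) // 2
        first_half = remaining[:mid]
        # Keep whichever half still reproduces the rounding difference.
        remaining = first_half if changes_rounding(first_half) else remaining[mid:]
    return remaining[0]
```

If more than one bit influenced rounding, this would silently return just one of them; a stricter version would also check that the discarded half produces no difference.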
Elapsed time: 4h.