14: hwtest miscellany


Continuing with the vector instruction set, we run into a problem: the results of the Python script are very irregular and suggest we're pulling data from an unknown source. In particular, the 0x82 opcode is worrying: regardless of inputs, on the first few dozen runs after running hwtest it spits out some interesting values, then reverts to outputting all-0x80808080. This implies one of three things is in play:

  1. There is some data connection between the scalar and vector units, and to use the unknown vector instructions, we need to use them together with corresponding scalar instructions. This would explain the multitude of scalar instructions that appear to do nothing but flag setting. The scalar -> vector arrow seen on the diagram supports this theory.
  2. The vector unit keeps some hidden state.
  3. The instructions access the data store (rather unlikely).

Still, scanning the vector instruction space reveals four more easy instructions:

  1. 0xaa: AND with immediate
  2. 0xab: XOR with immediate
  3. 0xaf: OR with immediate
  4. 0xba: MOV

All the others look strange. Later.

Let's pick off some low-hanging fruit first.

First, I'll check how different NV44 is from NV50. It appears that $c bits 6 and 7 are only set on NV50, and not on NV44. Other than that, everything passes. Oh well.

Second, let's test the address instructions that don't actually access memory. These are: setlo, sethi, add, hadd, logop. These are easily added to hwtest and verified. It appears I've made a mistake about hadd: bit 10 of $c is set depending on bits 0-15 and 16-29 of the result, not 0-13 and 16-29. Also, it seems all of the opcode space is populated, and all unknown opcodes are memory loads or do something even more scary (they hang the VP in tests).

Another thing: I started getting random failures on scalar instructions when they are used together with $c-modifying address instructions. Changing the scalar simulator to use the pre-address value of $c fixes things. True parallelism at work.

Third, let's try testing the branch instructions. These are going to be hard to test by single-instruction execution, but may be worth adding anyway. Perhaps the PC reg is modified properly...

But, no such luck. The only thing that can be verified is $l behavior. And it comes with a surprise or two.

For one, mov to $l apparently sets the "loop zero" flag if the low 8 bits are zero. Also, when the low 8 bits of an $l register are 0 and one of the looping branches is run, the low 8 bits are set equal to the high 8 bits. So apparently this can be used for repeatable loops.
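The observed $l behavior could be modeled roughly as follows. This is only a sketch: the 16-bit layout (counter in the low byte, reload value in the high byte), the decrement of a nonzero counter, and the flag update after a branch are assumptions, and the class name is made up for illustration.

```python
class LoopReg:
    """Toy model of one $l register (16-bit layout is an assumption)."""

    def __init__(self):
        self.value = 0
        self.loop_zero = False

    def mov(self, value):
        # Observed: writing $l sets "loop zero" when the low 8 bits are zero.
        self.value = value & 0xFFFF
        self.loop_zero = (self.value & 0xFF) == 0

    def looping_branch(self):
        low = self.value & 0xFF
        high = (self.value >> 8) & 0xFF
        if low == 0:
            # Observed: the low 8 bits are reloaded from the high 8 bits.
            low = high
        else:
            # Assumption: a nonzero counter is decremented by one.
            low -= 1
        self.value = (high << 8) | low
        # Assumption: the flag tracks the counter after a branch as well.
        self.loop_zero = low == 0
```

With this model, a register written with 0x0300 is reloaded to 0x0303 by the next looping branch, which is what makes the loop repeatable without another mov.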

Fourth, the 0x6a/0x6b opcodes should be tested too. Interesting facts:

  1. If branch op 0xff (exit) is used in the same instruction, reading $l registers doesn't work.
  2. If you write to a vector register from both scalar and vector units, vector write wins. If you write to an address register, scalar wins.

And fifth, the rounding thing for multiplication is interesting. The H.263 and VC-1 standards use alternating round to nearest: when a value is exactly at the midpoint between integer values (i.e. 0.5), a per-frame flag specifies whether it'll be rounded up or down. The flag is flipped on every frame to avoid rounding errors accumulating a bias. This strongly suggests that VP1 should have such a flag somewhere.
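To make the rounding rule concrete, here is a small Python sketch of a fixed-point right shift with a selectable tie direction. It illustrates the general technique only; it is not VP1's actual multiply, and which flag value means "up" is an assumption.

```python
def round_shift(value, shift, round_up):
    """Divide value by 2**shift, rounding to nearest; ties follow round_up."""
    half = 1 << (shift - 1)          # requires shift >= 1
    # Adding half rounds ties up; adding half - 1 rounds ties down.
    bias = half if round_up else half - 1
    return (value + bias) >> shift


# 3 / 2 = 1.5 is a tie, so the flag decides; 7 / 4 = 1.75 is not a tie.
tie_up = round_shift(3, 1, True)
tie_down = round_shift(3, 1, False)
no_tie = round_shift(7, 2, False)
```

Only exact midpoints are affected by the flag, so alternating it per frame cancels the bias without changing any other result.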

Sure enough, writing 0xffffffff to all special registers (0xf4xx, 0xf5xx) results in differences in rounding exactly matching the expected semantics. With a bit of binary searching, this is soon pinpointed to 0xf540 bit 0.
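The binary search can be sketched like this. The probe function is a stand-in for actually writing registers and rerunning the rounding test on hardware, the register list (one register every 4 bytes) is illustrative rather than the real special register map, and the sketch assumes exactly one bit is responsible.

```python
def find_bit(candidates, changes_rounding):
    """Bisect candidates down to the single one that affects rounding.

    changes_rounding(subset) must return True when setting exactly the
    bits in subset changes the rounding result.
    """
    candidates = list(candidates)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        # Keep whichever half still triggers the rounding difference.
        candidates = first if changes_rounding(first) else second
    return candidates[0]


# Stand-in for the hardware: pretend (0xf540, bit 0) is the culprit.
all_bits = [(reg, bit) for reg in range(0xf400, 0xf600, 4) for bit in range(32)]
fake_probe = lambda subset: (0xf540, 0) in subset
found = find_bit(all_bits, fake_probe)
```

Each probe halves the candidate set, so even a few thousand candidate bits need only about a dozen test runs.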

Elapsed time: 4h.



Mat2 3 years, 7 months ago

0x80808080 - some hints:
- the opcode may implement some function f that for some x:
f(x) = x
and the x in this case could be 0x80808080 or for example 0x80


Mat2 3 years, 7 months ago

(0x80 - suggesting a SIMD instruction)


Mat2 3 years, 7 months ago

You could also look at H264 to see what could be useful here - what the implementation needs.
