16: hwtest and vector instructions, part 3


Today we'll try to make some sense out of the hidden state instructions.

We'll start by assuming that 0xbb, which always reads the same value into the destination register no matter which non-hidden inputs or opcode bits we change, is just a plain read of the hidden state. We'll model it in hwtest like the $c registers: by merely observing it instead of randomizing it. To do that, we'll execute opcode 0xbb000000 before and after each test execution and read the result from $v0.
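As a rough illustration of the observe-instead-of-randomize approach, here is a Python sketch of one test iteration. The callables `read_hidden`, `execute`, and `predict` are hypothetical stand-ins (on the real hardware, `read_hidden` would correspond to executing 0xbb000000 and reading $v0); this is not hwtest's actual C code.

```python
def observe_and_check(read_hidden, execute, predict, insn):
    """One test iteration that observes hidden state instead of randomizing it.

    read_hidden: hypothetical callable returning the current hidden state
                 (on hardware: run 0xbb000000, then read $v0).
    execute:     hypothetical callable that runs the instruction under test.
    predict:     software model mapping (state_before, insn) -> state_after.
    Returns (matches, before, after).
    """
    before = read_hidden()          # snapshot the state before the test
    execute(insn)
    after = read_hidden()           # snapshot the state after the test
    return predict(before, insn) == after, before, after
```

Because the state is never written by the harness, the model only has to predict how each instruction transforms whatever state it happened to find.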

After doing that, opcode 0xbb passes perfectly: there are no weird flags on it or anything, and it always returns the same thing. However, a lot of vector instructions turn out to modify the unknown state. In fact, most vector instructions modify the state when bit 2 of their opcode is 0, and bits 0-1 then select which of the four 32-bit words of the unknown state they modify.

This quite clearly suggests that the state we found is the vector condition codes. Testing it on the signed add opcode confirms it: bits 0-15 of each word are the sign flags (i.e. bit 7 of the result) for each byte of the destination vector, while bits 16-31 are the zero flags. So let's call the individual words $vc0-$vc3 and the full 4-word vector $vc.
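A minimal Python sketch of the signed-add flag update. It assumes a 16-byte vector, that "opcode bits 0-2" are the low bits of the instruction word, and that the zero flag tests the stored result byte; `vc_word_from_result` and `update_vc` are hypothetical names. Only "most" vector instructions follow the bit-2 rule, so a full model would still need per-opcode exceptions.

```python
def vc_word_from_result(result_bytes):
    """Pack per-byte sign (bit 7) and zero flags of a 16-byte result into one $vc word."""
    assert len(result_bytes) == 16
    sign = 0
    zero = 0
    for i, b in enumerate(result_bytes):
        b &= 0xFF
        if b & 0x80:            # sign flag: bit 7 of the result byte
            sign |= 1 << i
        if b == 0:              # zero flag (assumed to test the stored byte)
            zero |= 1 << i
    return (zero << 16) | sign  # bits 0-15 sign, bits 16-31 zero


def update_vc(vc, insn, result_bytes):
    """Write the flags to $vc[insn & 3] if bit 2 of the (assumed) instruction word is 0."""
    if insn & 0x4:
        return vc               # bit 2 set: $vc is left untouched
    vc = list(vc)
    vc[insn & 3] = vc_word_from_result(result_bytes)
    return vc
```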

The minabs, shr/sar, and mov from immediate opcodes, as well as the signed versions of min, max, abs, neg, and sub, have the same $vc behavior as signed add. For the unsigned versions and 8/9-bit add, it's different: the low bits (i.e. sign flags) are set if bit 8 of the non-clamped result is 1 (or, equivalently, if the result overflowed). The bitwise operations and reg-reg movs, like their scalar versions, always set the sign flags to 0.
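The unsigned flag rule can be sketched for a single byte lane as follows. This assumes the unsigned add saturates at 0xFF (implied by "non-clamped result" but not spelled out) and that the zero flag reflects the stored byte; both function names are hypothetical.

```python
def unsigned_add_u8(a, b):
    """Unsigned byte add, assumed saturating; returns (result, sign_flag, zero_flag)."""
    raw = (a & 0xFF) + (b & 0xFF)       # 9-bit, non-clamped result
    clamped = min(raw, 0xFF)            # assumed clamp to the byte range
    sign_flag = (raw >> 8) & 1          # bit 8 of the non-clamped result, i.e. overflow
    zero_flag = int(clamped == 0)       # assumption: zero flag tests the stored byte
    return clamped, sign_flag, zero_flag


def bitwise_and_u8(a, b):
    """Bitwise ops never set the sign flag."""
    r = a & b & 0xFF
    return r, 0, int(r == 0)
```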

The median opcode has stranger $vc behavior. After a bit of testing, we learn that it sets the sign flag unless SRC2 < SRC1 < SRC3. I suppose this means that the median opcode is actually meant as a "clip to range" opcode: SRC2 and SRC3 are the range, while SRC1 is the value to clip. The sign flag is set when SRC1 is out of the clipping range (or exactly on its boundary, since the comparisons are strict), or the range is invalid (SRC3 < SRC2).
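A sketch of the median opcode's result and sign flag for one lane, assuming the opcode returns the median of its three sources as its name suggests, with the flag computed exactly as observed. `median_with_flag` is a hypothetical name, and signedness handling is left out.

```python
def median_with_flag(src1, src2, src3):
    """Median of three plus the 'out of range' sign flag."""
    result = sorted((src1, src2, src3))[1]      # plain median of the three sources
    sign_flag = int(not (src2 < src1 < src3))   # clear only if src1 is strictly inside
    return result, sign_flag
```

For a valid range (src2 <= src3) the median is exactly "clamp src1 to [src2, src3]", which is what makes the clip-to-range reading plausible.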

Finally, watching $vc reveals one $vc-modifying instruction previously thought to be a nop: 0x8f. After a bit of testing, we learn that its inputs include SRC1, SRC1|1, and SRC2, with the same $c-based mangling previously seen on many scalar/address opcodes.

Again, I used a script that displays the $vc value for every combination of the three sources within a sensible range. It turns out that all three sources are treated as unsigned, and the zero flag is set when abs(src1 - src3) == src2. And the sign flag is never set. Huh.
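The zero-flag hypothesis is easy to state as code, along with a small enumeration in the spirit of the exploratory script. This is a software stand-in only: the real script read $vc back from the hardware, and `op8f_zero_flag` / `zero_flag_table` are hypothetical names.

```python
def op8f_zero_flag(src1, src2, src3):
    """Zero flag of 0x8f for one unsigned byte lane."""
    return int(abs(src1 - src3) == src2)


def zero_flag_table(limit=8):
    """List every (src1, src2, src3) in [0, limit) for which the zero flag is set."""
    return [(s1, s2, s3)
            for s1 in range(limit)
            for s2 in range(limit)
            for s3 in range(limit)
            if op8f_zero_flag(s1, s2, s3)]
```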

After plugging it into hwtest, the zero flag behavior is confirmed, but the sign flags do get set sometimes. A bit of opcode bit manipulation reveals that bits 19-22 of the opcode control sign flag setting. A value of 0 means the sign flag is never set; for any given input, exactly one of the values 1, 2, 4, and 8 sets the sign flag; and any other value sets the sign flag if any of its power-of-two components would. In other words, there appear to be 4 mutually exclusive conditions, exactly one of which is true for any given byte, and the value in bits 19-22 is a mask of the conditions that cause the sign flag to be set.

Let's re-run the script with all values of the mask. It turns out that condition 0 or 1 is true when abs(src1 - src3) >= src2, and 2 or 3 is true otherwise. But whether 0/2 or 1/3 is selected seems to be decided in a strange manner. In fact, it flips on every executed instruction!

Maybe the decision depends on the previous state of $vc. Clearing $vc to 0 always results in condition 0 or 2 being true. And setting $vc to all-1 always results in condition 1 or 3. Good.
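Putting the pieces together, one byte lane of 0x8f can be modelled with a condition index: bit 1 says whether abs(src1 - src3) < src2, bit 0 is the lane's input flag from $vc, and the 4-bit mask from opcode bits 19-22 picks which conditions set the sign flag. The sketch assumes lane i reads bit i of the sign half of the source $vc word; `op8f_lane` and `op8f_word` are hypothetical names.

```python
def op8f_lane(src1, src2, src3, in_flag, mask):
    """One byte lane of 0x8f; returns (sign_flag, zero_flag)."""
    diff = abs(src1 - src3)
    # conditions 0/1: diff >= src2; conditions 2/3: diff < src2.
    # the low bit of the condition index is the lane's input flag.
    cond = (0 if diff >= src2 else 2) | (in_flag & 1)
    sign_flag = (mask >> cond) & 1
    zero_flag = int(diff == src2)
    return sign_flag, zero_flag


def op8f_word(src1, src2, src3, in_flags, mask):
    """All 16 lanes; in_flags is the 16-bit input condition word (assumed bit i -> lane i)."""
    sign = zero = 0
    for i in range(16):
        s, z = op8f_lane(src1[i], src2[i], src3[i], (in_flags >> i) & 1, mask)
        sign |= s << i
        zero |= z << i
    return (zero << 16) | sign
```

Under this model, mask 0x1 with all-zero sources makes each sign flag the inverse of its input flag, so feeding the output back in flips it on every execution, which is one way the flipping seen earlier can arise.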

But hwtest fails again, though extremely rarely. The failure seems to indicate that the wrong $vc word is used in some rare situations (I assumed the source $vc word is the same as the destination $vc word). A bit of further experimentation reveals that this can only happen when the scalar opcode executed together with 0x8f is 0x04, 0x05, 0x0f, 0x24, or 0x45.

And these are exactly the opcodes suspected of sending stuff on the scalar -> vector path used for paired instructions.

So let's figure this out, sticking with the 0x0f opcode first. The output seems to depend only on scalar opcode bits 0 and 19-23. When all these bits are 0, $vc0 seems to be selected. In fact, when bits 0 and 21-23 are 0, bits 19-20 select the $vc word. But setting the other bits gives really weird results.

The exact 16-bit condition word used as input can easily be read by setting the 0x8f condition mask to 0xa: this selects conditions 1 and 3, which are exactly the cases where the input flag is 1, so the input flags are bypassed directly to the output sign flags. After a bit of messing around, it's easy to see that setting bit 21 to 1 selects the zero flags of $vc instead of the sign flags as input. And using non-0 values for bits 22, 23, and 0 post-processes the zero/sign flags of the given $vc word in a funny way.
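The input-selection part can be sketched as below for the simple case where bit 0 and bits 22-23 of the paired scalar opcode are zero. It only applies when the paired scalar opcode is one of 0x04, 0x05, 0x0f, 0x24, or 0x45 (otherwise the source word is the destination word). `select_input_flags` is a hypothetical name, and the post-processing modes are deliberately not modelled here.

```python
def select_input_flags(vc, scalar_insn):
    """Pick 0x8f's 16-bit input condition word from $vc using the paired scalar opcode."""
    if scalar_insn & 1 or (scalar_insn >> 22) & 3:
        raise NotImplementedError("post-processing (bit 0, bits 22-23) not modelled here")
    word = vc[(scalar_insn >> 19) & 3]      # bits 19-20: which $vc word
    if (scalar_insn >> 21) & 1:             # bit 21: zero flags instead of sign flags
        return (word >> 16) & 0xFFFF
    return word & 0xFFFF
```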

Setting these bits to a value of 1 always gives results where bits 0-3, 4-7, 8-11, and 12-15 of the result are set or cleared together. Setting them to 2 likewise ties together bits (0, 2, 4, 6), (1, 3, 5, 7), (8, 10, 12, 14), and (9, 11, 13, 15). And so on, apparently.

Let's assume each of these groups is set according to a single bit in the input flags. It's easy enough to determine the exact bits responsible for that: for each bit in $vc, set only that bit (manipulating $vc values is easy enough given the mov instruction that sets $vc), then run the 0x8f opcode and see which bits of the output are set. We get the following results (for each value, the input bit feeding each of output bits 0-15 is given, in order):

  • 0: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (i.e. bypass)
  • 1: 2, 2, 2, 2, 6, 6, 6, 6, 10, 10, 10, 10, 14, 14, 14, 14
  • 2: 4, 5, 4, 5, 4, 5, 4, 5, 12, 13, 12, 13, 12, 13, 12, 13
  • 3: 0, 0, 2, 0, 4, 4, 6, 4, 8, 8, 10, 8, 12, 12, 14, 12
  • 4: 1, 1, 1, 3, 5, 5, 5, 7, 9, 9, 9, 11, 13, 13, 13, 15
  • 5: 0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10, 12, 12, 14, 14
  • 6: 1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9, 13, 13, 13, 13
  • 7: 0, 2, 4, 6, 8, 10, 12, 14, ??, ??, ??, ??, ??, ??, ??, ??
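The table can be written down directly as data, together with the single-bit probing procedure used to derive it. In this Python sketch, `MODES`, `remap`, and `probe` are hypothetical names; `None` stands for the "??" entries of value 7, and `probe` assumes each output bit copies exactly one input bit without inversion.

```python
# MODES[m][j] = input flag bit that feeds output bit j for post-processing value m.
# None marks the "??" entries of value 7.
MODES = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [2, 2, 2, 2, 6, 6, 6, 6, 10, 10, 10, 10, 14, 14, 14, 14],
    [4, 5, 4, 5, 4, 5, 4, 5, 12, 13, 12, 13, 12, 13, 12, 13],
    [0, 0, 2, 0, 4, 4, 6, 4, 8, 8, 10, 8, 12, 12, 14, 12],
    [1, 1, 1, 3, 5, 5, 5, 7, 9, 9, 9, 11, 13, 13, 13, 15],
    [0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10, 12, 12, 14, 14],
    [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9, 13, 13, 13, 13],
    [0, 2, 4, 6, 8, 10, 12, 14] + [None] * 8,
]


def remap(flags, mode):
    """Apply a post-processing value to a 16-bit flag word (None bits stay 0)."""
    out = 0
    for j, src in enumerate(MODES[mode]):
        if src is not None and (flags >> src) & 1:
            out |= 1 << j
    return out


def probe(black_box):
    """Set one input bit at a time and record which output bits light up."""
    mapping = [None] * 16
    for i in range(16):
        out = black_box(1 << i)
        for j in range(16):
            if (out >> j) & 1:
                mapping[j] = i
    return mapping
```

Running `probe` against `remap` recovers every row of the table, with the unknown entries of value 7 staying `None`, since single-bit inputs from one word never set them.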

These patterns remind me of the complicated conditions needed to figure out neighbouring macroblocks/blocks/partitions/subpartitions in H.264. Maybe that's exactly their purpose, but let's not bother looking up the spec right now.

Value 7 is worrying: whether its output bits 8-15 are set or not does not depend on the selected $vc word. One possibility is that it gathers inputs from two $vc words. This is easy enough to check by extending the test to cover all $vc inputs. And indeed, value 7 actually uses two $vc inputs, with the upper 8 bits coming from the second input. The index of the second input is the index of the first input | 1.
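Value 7 can then be sketched as reading two words, $vc[i] and $vc[i | 1]. The mapping of output bits 8-15 onto bits of the second word is not given above, so the sketch takes it as a caller-supplied, hypothetical `hi_map`; it also assumes the zero/sign selection (bit 21) applies to both words. `mode7` is a hypothetical name.

```python
LOW_MAP = [0, 2, 4, 6, 8, 10, 12, 14]  # output bits 0-7 of value 7, from the table


def mode7(vc, first_idx, hi_map, use_zero=False):
    """Value-7 post-processing over $vc[first_idx] and $vc[first_idx | 1].

    hi_map: hypothetical list of 8 second-word bit indices for output bits 8-15.
    """
    def half(word):
        return (word >> 16) & 0xFFFF if use_zero else word & 0xFFFF

    first = half(vc[first_idx])
    second = half(vc[first_idx | 1])   # for odd first_idx this is the same word
    out = 0
    for j, src in enumerate(LOW_MAP):
        out |= ((first >> src) & 1) << j
    for j, src in enumerate(hi_map):
        out |= ((second >> src) & 1) << (8 + j)
    return out
```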

After implementing all that in hwtest, we get a pass. In fact, the exact same scalar bitfields apply to the other scalar -> vector passing opcodes, not just 0x0f. This gives us a perfect pass. This also suggests that the scalar -> vector path is quite complicated: the $vc selector seems to be an independent datum from the factor consumed by 0x85/0x95.

Elapsed time: 8h.
