I guess it's time to deal with 0xc8 and 0xc9 opcodes.
Merely enabling these instructions in hwtest gives worrying results: not only do we see the obvious errors from their unimplemented behavior, but we also get mismatches on the 0xb6 and 0xb7 instructions, even when those are executed on their own. So... yet another piece of unknown context.
But, since these instructions also have the load behavior, let's figure that out first.
First, the instructions modify $a registers. It's quite likely post-increment behavior. And indeed, it matches register-register post-increment (with mangled second source).
Second, it appears that the instructions sometimes write some $v register, and sometimes they don't. Let's figure out when. Setting all opcode bits to 0 doesn't eliminate this variance, so it's probably based on $c input.
And sure enough, fixing $c0 bit 0 together with the opcode bits to 0 disables the write. Looking deeper, it seems that the decision is based on the same $c bit as source 2 mangling: 1 enables the $v write, 0 disables it.
Third, there's another piece of weirdness: the destination $v register is not exactly determined by the destination opcode bitfield - although it's close (at most 3 registers away). So the register index is probably mangled using $c. And sure enough, it turns out that destination is $v[DST & 0x1c | ((DST + $c[4:5]) & 3)]. Same mangling as in the 0xb6 and 0xb7 instructions that it affects...
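To make the destination mangling concrete, here's a rough Python sketch of that formula. The function name is made up, and I'm assuming that $c[4:5] means bits 4 and 5 of the selected $c register's value, taken as a 2-bit number.

```python
def mangle_dst(dst: int, c_val: int) -> int:
    """Index of the $v register actually written by 0xc8/0xc9.

    Implements DST & 0x1c | ((DST + $c[4:5]) & 3).
    """
    # Assumption: $c[4:5] is the 2-bit field made of bits 4 and 5.
    c45 = (c_val >> 4) & 3
    # The upper bits of DST pick a group of 4 registers; only the low
    # 2 bits rotate within that group.
    return (dst & 0x1c) | ((dst + c45) & 3)


if __name__ == "__main__":
    for c45 in range(4):
        print(c45, mangle_dst(7, c45 << 4))
```

Since only the low two bits move, the result always stays in the same aligned group of 4 registers, which is why the observed destination was never more than 3 registers away from DST.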
And fourth, the loaded value appears to be simple enough: it's a normal strideful load from the selected address, with 0xc8 doing a horizontal load and 0xc9 doing a vertical load.
This gives us a perfect hwtest pass if we disable 0xb6 and 0xb7 instructions. So let's figure them out now.
We'll assume the 0xc8/0xc9 instructions load the same thing as usual, but into some unknown register (let's call it $vx) and 0xb6/0xb7 consume it without modification (if they were able to modify it, we'd have seen the effects before). I suppose VP1 reset clears $vx to 0, since otherwise we should've seen random effects. I'll partially model it that way in hwtest and randomize it. Since I have no idea how to read it, I'll just skip it on comparison.
Certainly, stuffing 0 into it makes 0xb6 and 0xb7 behave as before. Before digging deeper, let's test one obvious hypothesis: since 0xb3/0xb4/0xb6/0xb7 appear to add terms like factor * (something - SRC1.0), and 0xb6/0xb7 have one term we currently model as factor * (- SRC1.0), perhaps it should really be factor * ($vx - SRC1.0).
Yes. That was it. The test passes. Now let's complete the model by adding a "read $vx" function. It can be done quickly by clearing $va and executing 0xb6 with all other operands 0 and a shift of 1.
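The change to the model boils down to one term. Here's a toy sketch with plain integers; the function names are made up, and the real instructions obviously do more than this (the other terms, the shift, any rounding), none of which is modeled here.

```python
def old_term(factor: int, src1_0: int) -> int:
    # The term as previously modeled: factor * (-SRC1.0)
    return factor * (-src1_0)


def new_term(factor: int, vx: int, src1_0: int) -> int:
    # The corrected term: factor * ($vx - SRC1.0)
    return factor * (vx - src1_0)


if __name__ == "__main__":
    # With $vx == 0 the two models agree.
    print(old_term(3, 5), new_term(3, 0, 5))
    # A nonzero $vx shifts the result by factor * $vx.
    print(new_term(3, 7, 5))
```

With $vx at 0 the new term is identical to the old one, which would explain why the old model held up until 0xc8/0xc9 were enabled.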
This readout correctly reports the values that were written. And after putting it all together, we note that 0xc8/0xc9 always load into $vx (whether they also load into $v or not). And we get a complete test pass.
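Putting the load side together, a minimal sketch of the write-back could look like this. Everything here is illustrative rather than the actual hwtest code: the class and function names are invented, register values are treated as opaque Python objects instead of real vectors, I'm assuming 32 $v registers (from the 5-bit DST field), and the write-enable decision is passed in as a plain boolean instead of being decoded from $c.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class VecState:
    # Assumption: 32 $v registers, values kept opaque for the sketch.
    v: List[Any] = field(default_factory=lambda: [0] * 32)
    # Hidden register; assumed to be cleared to 0 on reset.
    vx: Any = 0


def exec_load(state: VecState, dst: int, c_val: int,
              write_v: bool, loaded: Any) -> None:
    """Write-back of 0xc8/0xc9 as described above (address side omitted)."""
    # $vx is always written, regardless of the $v write enable.
    state.vx = loaded
    if write_v:
        # Same mangling as before: DST & 0x1c | ((DST + $c[4:5]) & 3),
        # assuming $c[4:5] is bits 4 and 5 of the $c value.
        idx = (dst & 0x1c) | ((dst + ((c_val >> 4) & 3)) & 3)
        state.v[idx] = loaded


if __name__ == "__main__":
    s = VecState()
    exec_load(s, dst=7, c_val=0x10, write_v=False, loaded="A")
    print(s.vx, s.v[4])
    exec_load(s, dst=7, c_val=0x10, write_v=True, loaded="B")
    print(s.vx, s.v[4])
```

The $a post-increment and the horizontal/vertical address selection are left out on purpose; they don't interact with $vx.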
Elapsed time: 3h.