There are 12 unknown vector opcodes left: 0x84, 0x86, 0x87, 0x96, 0x97, 0xa6, 0xa7, and 0xb3-0xb7. Let's finish them off.
If 0x80-0x83 are any indication, 0x84 should be a version of 0x85 with no $v write. And so it is. Also, 0x86, 0x87, 0x97 should be versions of 0x84, 0x85, 0x95 that add to the accumulator instead of overwriting it. However, this causes test failures, which disappear when the third source (the one not multiplied by anything) is 0 - seems it just can't do so many additions. After removing the source from the computation, hwtest again passes perfectly.
Next up is 0x96. For some ridiculous reason, it happens to be an unsigned-output version of 0x86 (yeah, I know, it doesn't really have output, but it affects the rounding correction added to the accumulator), but with the third source coming from the SRC3 field, where it conflicts with opcode flags, instead of SRC1|1. Oh well. Another instruction that looks like a design bug. Or maybe it's an invalid encoding they didn't care about in the decoder, so the hardware just outputs junk for it. Also, 0xa6 appears to be identical to 0x96 and 0xa7 is likewise a broken version of 0x87.
Time for 0xb3. No luck finding anything in the first few tries: the instruction behaves rather chaotically. So, we'll just start forcing every register to 0 until it becomes sane. After a while, we come up with the following list of registers and bits affecting the output:
After clearing all of those out, we reliably get all 0s as output. In fact, it seems to suffice to set $v and the weirdo opcode bits to 0 to get zero at the output, but the other listed inputs are known to affect the result when $v is non-0. So we'll RE this instruction by setting everything other than $v to 0 and then slowly enabling more things. Except we'll work with bit 11 set - the $va register after all has much better precision than $v output.
But before that, let's look at the rounding bit: since rounding correction depends only on the bits of opcode (and rounding mode), this gives us a chance to easily figure out the bits that affect final $v read. Here's a list:
So, now let's set some interesting $v input. We quickly learn that the output is just the 0th register of the quarter, treated as unsigned, and shifted left the right amount to align with the read window. So let's mess with the other inputs now:
I guess it's a good time to figure out what bits select the $c input. They're quickly determined to be bits 3-4 of the opcode. But there's no way to aim at flags other than 4 and 5.
So, let's slowly figure out how the factors influence the output, by setting one factor and one $v input at a time. Since factors 0 vs 1 and 2 vs 3 are likely selected by the weirdo $vc-based input, we'll set them to identical values for now. Results:
So, it seems the instruction is a double linear interpolation: src0 + (src2 - src0) * alpha + (src3 - src0) * beta. And after coding it up, we get test failures...
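For reference, here is the formula as guessed at this point, written out as a minimal Python sketch. It is only the bare arithmetic: the function name is made up for illustration, the factors are treated as plain numbers, and it ignores fixed-point scaling, rounding correction and accumulator width, none of which are spelled out here.

```python
def double_lerp(src0, src2, src3, alpha, beta):
    """Bare arithmetic of the guessed formula (hypothetical helper).

    No fixed-point scaling, rounding or clamping is modelled; alpha and
    beta are assumed to already be plain numeric factors.
    """
    return src0 + (src2 - src0) * alpha + (src3 - src0) * beta


# With both factors at 0 we get the baseline source back; with one
# factor at 1 and the other at 0 we land exactly on src2 or src3.
print(double_lerp(10, 20, 30, 0, 0))
print(double_lerp(10, 20, 30, 1, 0))
print(double_lerp(10, 20, 30, 0, 1))
```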
One type of failure seems to involve factor selection: apparently the usual s2v $vc input selection is invalid here. After a bit of experimentation, it turns out that the mask is instead taken from $vc register selected by opcode bits 0-1, with bit 2 selecting sign flags or zero flags. Curious - they wasted 3 perfectly good bits for what could be taken from s2v bus.
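The mask selection described above boils down to pulling two fields out of the low opcode bits. A rough sketch, assuming the opcode word is available as a plain integer; the function and field names are invented for illustration, and which value of bit 2 means "sign flags" and which "zero flags" is not stated here, so the raw bit is returned as-is.

```python
def decode_mask_select(opcode):
    """Split the low 3 opcode bits into the factor-mask selector fields.

    Hypothetical helper: returns (vc_index, flag_kind_bit), where
    vc_index is opcode bits 0-1 (which $vc register supplies the mask)
    and flag_kind_bit is opcode bit 2 (sign flags vs zero flags; the
    polarity is not known from the text, so it is left undecoded).
    """
    vc_index = opcode & 3
    flag_kind_bit = (opcode >> 2) & 1
    return vc_index, flag_kind_bit


# Example: low bits 0b110 select $vc register 2 with bit 2 set.
print(decode_mask_select(0b110))
```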
The other type of failure seems to happen when bit 10 (the strange XOR flag) is set. Looking a bit closer, it seems it only affects the "baseline" source 0 read, but not the source reads used for multiplication with factors. What. But after fixing that, we get a full hwtest pass.
If the current trend keeps up, and 0xb4-0xb7 are even more scary... I'd rather not think of that.
So, 0xb4. Let's look at it. Since I have the big "clear all relevant registers" functions still on hand, let's go the same way as with 0xb3. Results:
And the test passes. Now, 0xb5:
Again, the test passes. Maybe we can finish that after all. But then we look at 0xb6 and everything goes to hell again:
It seems that source selection is, again, based on source 1 bitfield, but affected by the flags. However, this time it seems it's not just flags 4 and 5 - opcode bits 5-8 again seem to be the flag selector we've seen before. To make matters easy, we'll just set it to select the always-0 flag for now.
And the result is that we're always using the single register selected by the field, and the formula is simply $va -= factor23 * src. What.
Ok, so let's use a different flag: now the always-1 one. And the result is the same, except the source register is always src1^1.
And the test now passes, except when the flag is 4. Looking at it, it seems that in that case there are *two* inputs: first is selected by adding bits 4-5 of $c to bits 0-1 of source 1 bitfield, and the second is selected by adding 1 to that (again only to the low 2 bits). The formula is $va += factor01 * (src2 - src1) - factor23 * src1. Which matches the previous formula if both sources are identical. And the test again fully passes.
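The flag-4 case is easier to follow written down as code. This is a sketch under loud assumptions: the helper names are made up, the source 1 bitfield and $c are passed in as plain integers, I'm assuming the "first" input is src1 and the "second" is src2 in the formula, and all fixed-point scaling and rounding is left out.

```python
def b6_sources_flag4(src1_field, c):
    """Pick the two source register indices for the flag-4 case.

    Hypothetical helper. Bits 4-5 of $c are added to bits 0-1 of the
    source 1 bitfield, wrapping within the low 2 bits; the second
    source is one further along, again wrapping within the low 2 bits.
    """
    base = src1_field & ~3                    # upper bits stay untouched
    low = (src1_field + ((c >> 4) & 3)) & 3   # add $c bits 4-5, low 2 bits only
    first = base | low
    second = base | ((low + 1) & 3)
    return first, second


def b6_update(va, factor01, factor23, src1, src2):
    """Bare arithmetic of $va += factor01 * (src2 - src1) - factor23 * src1."""
    return va + factor01 * (src2 - src1) - factor23 * src1


# Register 22 with $c bits 4-5 equal to 3: the low bits wrap around.
print(b6_sources_flag4(22, 0x30))
# With identical sources the factor01 term drops out, leaving
# $va -= factor23 * src as in the single-source case.
print(b6_update(100, 3, 2, 5, 5))
```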
I'm not even trying to justify or understand these instructions at this point. They just are. Maybe we'll learn some excuse later, after disassembling the microcode.
And the last (also batshit insane) vector instruction remaining is 0xb7. Thankfully it quickly turns out to be just 0xb6 with signed output.
And this concludes the vector & scalar unit RE. Technically we don't yet know what non-s2v instructions put out on s2v bus, but since it's likely to be utter junk, we'll pass on that for now.
So, what's left to RE now in VP1?
All in all, enough for at least 20 more episodes, I think. And I also need to write hwdocs documentation for that...
To be continued - see you in a while.
Elapsed time: 9h, most of that being spent banging head against desk.