20: hwtest and vector instructions, part 5

There are 12 unknown vector opcodes left: 0x84, 0x86, 0x87, 0x96, 0x97, 0xa6, 0xa7, and 0xb3-0xb7. Let's finish them off.

If 0x80-0x83 are any indication, 0x84 should be a version of 0x85 with no $v write. And so it is. Also, 0x86, 0x87, 0x97 should be versions of 0x84, 0x85, 0x95 that add to the accumulator instead of overwriting it. However, this causes test failures, which disappear when the third source (the one not multiplied by anything) is 0 - seems it just can't do so many additions. After removing the source from the computation, hwtest again passes perfectly.

Next up is 0x96. For some ridiculous reason, it happens to be an unsigned-output version of 0x86 (yeah, I know, it doesn't really have output, but it affects the rounding correction added to the accumulator), but with the third source coming from the SRC3 field, where it conflicts with opcode flags, instead of SRC1|1. Oh well. Another instruction that looks like a design bug. Or maybe it's an invalid encoding they didn't care about in the decoder, so it just outputs junk. Also, 0xa6 appears to be identical to 0x96 and 0xa7 is likewise a broken version of 0x87.

Time for 0xb3. No luck finding anything in the first few tries, the instruction behaves rather chaotically. So, we'll just start forcing every register to 0 until it becomes sane. After a while, we come up with the following list of registers and bits affecting the output:

  • a $v register quartet selected by src1 bitfield
  • the s2v factors
  • one of $c registers (but not selected by any known field)
  • one of $vc registers (but not selected by any known field)
  • opcode bit 8, which enables rounding as usual
  • opcode bit 11, which enables $va write (in addition to $v write) - strange, I thought multiplication always has to use the accumulator...
  • opcode bit 10, which for some bizarre reason adds 0x8000 to the output

After clearing all of those out, we reliably get all 0s as output. In fact, it seems to suffice to set $v and the weirdo opcode bits to 0 to get zero at the output, but the other listed inputs are known to affect the result when $v is non-0. So we'll RE this instruction by setting everything other than $v to 0 and then slowly enabling more things. Except we'll work with bit 11 set - the $va register after all has much better precision than $v output.

But before that, let's look at the rounding bit: since rounding correction depends only on the bits of opcode (and rounding mode), this gives us a chance to easily figure out the bits that affect final $v read. Here's a list:

  • the shift bitfield is present at its usual place, 5-7
  • there is no hi/lo or normalized flag
  • instead of being determined by the major opcode, unsigned/signed read mode is selected by opcode bit 12 (1 is signed)

So, now let's set some interesting $v input. We quickly learn that the output is just the 0th register of the quartet, treated as unsigned, and shifted left by the right amount to align with the read window. So let's mess with the other inputs now:

  • opcode bit 9 seems to determine whether $v input is signed
  • opcode bit 10 causes the input to be XORed with 0x80 (before sign extension, if input is signed) - strange, but not the strangest seen so far in VP1
  • setting non-0 s2v factors gives scary results - and it seems to involve multiple $v registers from the input quartet!
  • setting non-0 bits 4 and 5 of $c0 affects $v source selection: after a bit of a closer look, these bits are added to the low 2 bits of source register 1 index
  • messing with $vc does nothing

I guess it's a good time to figure out what bits select the $c input. They're quickly determined to be bits 3-4 of the opcode. But there's no way to aim at other flags than 4 and 5.

So, let's slowly figure out how the factors influence the output, by setting one factor and one $v input at a time. Since factors 0 vs 1 and 2 vs 3 are likely selected by the weirdo $vc-based input, we'll set them to identical values for now. Results:

  • factor 0/1 and $v input 0: the output is as above, minus source * factor
  • factor 0/1 and $v inputs 1, 3: 0
  • factor 0/1 and $v input 2: the output is source * factor
  • factor 2/3 and $v input 0: output as above, minus source * factor
  • factor 2/3 and $v inputs 1, 2: 0
  • factor 2/3 and $v input 3: the output is source * factor

So, it seems the instruction is a double linear interpolation: src0 + (src2 - src0) * alpha + (src3 - src0) * beta. And after coding it up, we get test failures...
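
To double-check that the formula agrees with the table of observations, here is a tiny model. It treats sources and factors as plain integers and leaves out the fixed-point scaling, shift and rounding, so it only checks the shape of the formula. `double_lerp` and `one_hot` are names I made up.

```python
def double_lerp(src, f01, f23):
    """src: the four quartet inputs; f01/f23: the two selected factors."""
    return src[0] + (src[2] - src[0]) * f01 + (src[3] - src[0]) * f23


def one_hot(index, value):
    """A quartet with a single non-zero input, as in the experiments."""
    quartet = [0, 0, 0, 0]
    quartet[index] = value
    return quartet


s, f = 10, 3
# factor 0/1 set, one input at a time
assert double_lerp(one_hot(0, s), f, 0) == s - s * f   # baseline minus source * factor
assert double_lerp(one_hot(1, s), f, 0) == 0
assert double_lerp(one_hot(2, s), f, 0) == s * f
assert double_lerp(one_hot(3, s), f, 0) == 0
# factor 2/3 set, one input at a time
assert double_lerp(one_hot(0, s), 0, f) == s - s * f
assert double_lerp(one_hot(1, s), 0, f) == 0
assert double_lerp(one_hot(2, s), 0, f) == 0
assert double_lerp(one_hot(3, s), 0, f) == s * f
```

All six rows of the table fall out of the one expression, which is what made the interpolation guess attractive in the first place.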

One type of failure seems to involve factor selection: apparently the usual s2v $vc input selection is invalid here. After a bit of experimentation, it turns out that the mask is instead taken from $vc register selected by opcode bits 0-1, with bit 2 selecting sign flags or zero flags. Curious - they wasted 3 perfectly good bits for what could be taken from s2v bus.

The other type of failure seems to happen when bit 10 (the strange XOR flag) is set. Looking a bit closer, it seems it only affects the "baseline" source 0 read, but not the source reads used for multiplication with factors. What. But after fixing that, we get a full hwtest pass.
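
In code form, the fix looks roughly like this. It's a sketch assuming 8-bit source elements (suggested by the 0x80 XOR), with the alignment shift, rounding and factor scaling left out; `read_byte` and `b3_core` are hypothetical names.

```python
def read_byte(raw, signed, xor_flag=False):
    """Read one source byte: optional XOR with 0x80, then sign extension."""
    if xor_flag:
        raw ^= 0x80
    if signed and raw & 0x80:
        raw -= 0x100
    return raw


def b3_core(raw, f01, f23, signed, xor_flag):
    """raw: the four raw bytes of one lane, taken from the input quartet."""
    # Only the baseline read of input 0 sees the XOR flag...
    baseline = read_byte(raw[0], signed, xor_flag)
    # ...while the reads feeding the multiplications never do.
    s0 = read_byte(raw[0], signed)
    s2 = read_byte(raw[2], signed)
    s3 = read_byte(raw[3], signed)
    return baseline + (s2 - s0) * f01 + (s3 - s0) * f23
```

For instance, with raw bytes `[0x10, 0, 0x30, 0]`, unsigned input and `f01 = 2`, the result is 80 with the flag clear and 208 with it set: the difference is exactly 0x80, untouched by the factor.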

If the current trend keeps up, and 0xb4-0xb7 are even more scary... I'd rather not think of that.

So, 0xb4. Let's look at it. Since I have the big "clear all relevant registers" functions still on hand, let's go the same way as with 0xb3. Results:

  • the instruction always writes $va, never $v
  • the rounding and shift fields are in their usual places, but the non-present hi/lo flag is always treated as set to lo for some reason (and signedness as unsigned)
  • the inputs are selected in the same way as for 0xb3, but are always treated as unsigned and there's no XOR thing
  • the main calculation is as in 0xb3, and so is $vc/factor selection

And the test passes. Now, 0xb5:

  • rounding/shift behavior as in 0xb4
  • likewise only $va write
  • source 0 is not used, and is replaced by a single register selected by the usual source 2 bitfield... and it's read as signed, just because
  • the calculation is different: the baseline comes from the src2 single reg, with (src1.2 - src1.3) * factor01 + src1.3 * factor23 added
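
A sketch of the 0xb5 calculation, under the same simplifications as before (plain integers, no shift, rounding or factor scaling). I'm assuming the src1 quartet inputs are read as unsigned like in 0xb4, which the list above doesn't spell out; `b5_core` and `to_signed8` are made-up names.

```python
def to_signed8(raw):
    """Sign-extend an 8-bit value."""
    return raw - 0x100 if raw & 0x80 else raw


def b5_core(src1_quartet, src2_raw, f01, f23):
    """src1_quartet: four raw bytes from the source 1 quartet (one lane).
    src2_raw: raw byte from the single register picked by the source 2 field.
    """
    baseline = to_signed8(src2_raw)                  # read as signed, just because
    in2, in3 = src1_quartet[2], src1_quartet[3]      # assumed unsigned
    return baseline + (in2 - in3) * f01 + in3 * f23
```

So with inputs 2 and 3 equal to 5 and 3, a baseline byte of 0xff and factors 2 and 4, the result is -1 + 2 * 2 + 3 * 4 = 15.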

Again, the test passes. Maybe we can finish that after all. But then we look at 0xb6 and everything goes to hell again:

  • both $v and $va outputs are used, and the instruction adds to $va instead of overwriting it
  • rounding/shift behavior is different. For some reason, bit 9 is now the rounding flag, and 11-13 are the shift. And the output is always unsigned.
  • most importantly, neither the source selection nor computation formula match

It seems that source selection is, again, based on source 1 bitfield, but affected by the flags. However, this time it seems it's not just flags 4 and 5 - opcode bits 5-8 again seem to be the flag selector we've seen before. To make matters easy, we'll just set it to select the always-0 flag for now.

And the result is that we're always using the single register selected by the field, and the formula is simply $va -= factor23 * src. What.

Ok, so let's use a different flag: now the always-1 one. And the result is the same, except the source register is always src1^1.

And the test now passes, except when the flag is 4. Looking at it, it seems that in that case there are *two* inputs: first is selected by adding bits 4-5 of $c to bits 0-1 of source 1 bitfield, and the second is selected by adding 1 to that (again only to the low 2 bits). The formula is $va += factor01 * (src2 - src1) - factor23 * src1. Which matches the previous formula if both sources are identical. And the test again fully passes.
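
Putting the three cases together gives something like the sketch below. The function names are hypothetical. I'm also assuming that any selected flag other than 4 simply acts through its current value (0 behaves like the always-0 flag, 1 like the always-1 flag), which is consistent with the test passing but not something I checked separately. Values are plain integers again, with no shift or rounding.

```python
def b6_source_regs(src1, flag_index, flag_value, c_value):
    """Return the (first, second) $v register indices used by 0xb6."""
    if flag_index == 4:
        base = src1 & ~3
        first = base | ((src1 + ((c_value >> 4) & 3)) & 3)
        second = base | ((first + 1) & 3)     # +1, low 2 bits only
        return first, second
    reg = src1 ^ (1 if flag_value else 0)     # src1 or src1^1
    return reg, reg


def b6_accumulate(va, first_val, second_val, f01, f23):
    """$va += factor01 * (second - first) - factor23 * first."""
    return va + f01 * (second_val - first_val) - f23 * first_val
```

When both sources are the same register, the first term vanishes and this collapses to `$va -= factor23 * src`, matching the single-input cases.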

I'm not even trying to justify or understand these instructions at this point. They just are. Maybe we'll learn some excuse later, after disassembling the microcode.

And the last (also batshit insane) vector instruction remaining is 0xb7. Thankfully it quickly turns out to be just 0xb6 with signed output.

And this concludes the vector & scalar unit RE. Technically we don't yet know what non-s2v instructions put out on s2v bus, but since it's likely to be utter junk, we'll pass on that for now.

So, what's left to RE now in VP1?

  • load/store instructions
  • data store organisation (remember the weirdness when accessed with different strides?)
  • DMA engine and instructions
  • how the instruction stream is grouped into instruction bundles
  • the exact behavior of branch instructions (most importantly the delay slots)
  • code memory fetch behavior
  • FIFO behavior and execution start/stop control
  • context switch sequence
  • interrupts
  • special registers

All in all, enough for at least 20 more episodes, I think. And I also need to write hwdocs documentation for that...

To be continued - see you in a while.

Elapsed time: 9h, most of that being spent banging head against desk.
