24: Bundles, pt. 1


Now that we're going to be executing proper microcode (and not just submitting single instructions), it's time to look into two important things: bundles and branches.

We have identified branch instructions before, but they haven't been covered by hwtest - the single instruction submission makes it impossible to observe the actual branching behavior. We'll need to properly cover their branching conditions and their delay slots.

And the delay slots are closely related to the other thing: bundles. All signs point to VP1 being a relatively simple VLIW architecture with 4 independent execution units. This means that one unit of execution (commonly called a bundle) is up to 4 instructions, one for each execution unit. Now the question is how the microcode stream is converted into bundles (for the single instruction submission interface, the mapping was dead simple).

The simplest way would be for a bundle to always take 4 words, one for each execution unit. However, we know this is not the case - issuing only opcodes intended for, say, the scalar unit works perfectly fine. So, we'll need to figure out where the bundle boundaries are.

We're going to make a few assumptions (some of them may seem obvious, but stranger things have been spotted on nVidia GPUs...):

  1. Bundles are contiguous.
  2. Order of instructions in a bundle doesn't matter (i.e. SV has the same behavior as VS, assuming they both get interpreted as a single bundle).
  3. The current bundle is fully executed before the next bundle starts (maybe with the exception of delay slots).
  4. When determining bundle boundaries, the code fetch unit doesn't care about the exact instructions, only about their execution units (i.e. vnop is always treated the same as vmad).

If we're wrong about those, we'll find out soon enough.

Now we need to determine how to observe whether two given instructions are in the same bundle. Let's summarise the known behavior of instructions in the same bundle:

  1. All instructions execute simultaneously, and read their operands as they were before the whole bundle started.
  2. The s2v path only works between instructions in the same bundle.
  3. A single $r register is written by both scalar and address (load insn): scalar wins.
  4. A single $v register is written by both vector and address (load insn): vector wins.
  5. The mov to/from other register file scalar instructions have really funny behavior and their interactions are currently excluded from hwtest.
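
To make rules 1, 3 and 4 concrete, here's a toy Python model of a single bundle. It's only a sketch: the register names, the (unit, dest, fn) tuple layout and the run_bundle helper are made up for illustration and have nothing to do with the real VP1 encoding or with hwtest's code.

```python
# Toy model of rules 1, 3 and 4. Everything here (names, tuple layout)
# is an illustrative assumption, not the real VP1 encoding.

def run_bundle(regs, writes):
    """Apply one bundle to a register dict and return the new state.

    writes: list of (unit, dest, fn), where fn computes the new value
    from a snapshot of the registers taken before the bundle started.
    """
    snapshot = dict(regs)  # rule 1: every instruction sees pre-bundle state
    # Rules 3 and 4: on a collision the address unit's load loses to the
    # scalar ($r) or vector ($v) unit, so address writes are applied first
    # and get overwritten by the winner.
    order = {"address": 0, "scalar": 1, "vector": 1}
    result = dict(regs)
    for unit, dest, fn in sorted(writes, key=lambda w: order[w[0]]):
        result[dest] = fn(snapshot)
    return result


regs = {"$r0": 1, "$r1": 2}

# Scalar op and a load both target $r0: the scalar result survives.
collide = run_bundle(regs, [
    ("scalar", "$r0", lambda s: s["$r1"] + 1),
    ("address", "$r0", lambda s: 99),
])

# The scalar op reads $r1 while a load overwrites it: it sees the old value.
stale = run_bundle(regs, [
    ("address", "$r1", lambda s: 50),
    ("scalar", "$r0", lambda s: s["$r1"]),
])
```

Applying the losing write first and letting the winner overwrite it is just a convenient way to express priority in software; it says nothing about how the hardware actually arbitrates.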

This is a good moment to figure out the interactions of the instructions mentioned in number 5.

The obvious thing now is to check what happens if a mov to sr instruction collides with an ordinary $v write from the vector unit, a load to $v, an ordinary $a write from the address unit, or an ordinary $l write from the branch unit. Results are as follows (listed from highest to lowest priority):

  1. $v register written by multiple units: vector, address (load), scalar ($r to $v)
  2. $a register written by multiple units: scalar ($r to $a), address
  3. $l register written by multiple units: branch, scalar ($r to $l)
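
These priorities fit nicely in a lookup table. The following is a minimal sketch, assuming made-up labels for the writers ("vector", "address_load", "scalar_mov" and so on are names invented here, not hwtest identifiers):

```python
# Observed write priorities per register file, highest first.
# The string labels are illustrative names chosen for this sketch.
WRITE_PRIORITY = {
    "$v": ["vector", "address_load", "scalar_mov"],
    "$a": ["scalar_mov", "address"],
    "$l": ["branch", "scalar_mov"],
}


def winner(regfile, writers):
    """Return which of the colliding writers ends up in the register."""
    for unit in WRITE_PRIORITY[regfile]:
        if unit in writers:
            return unit
    raise ValueError(f"no known writer for {regfile}: {writers}")
```

Note that the scalar mov is on the losing end for $v and $l, but wins for $a, so a single "scalar always wins" or "scalar always loses" rule wouldn't cover it.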

However, running unrestricted hwtest quickly results in failures. Funnily enough, the mov from sr happens to have a different priority than ordinary scalar instructions writing $r registers. Even better, the priority seems to depend on the source register file! So, let's improve rule 3:

  1. $r register written by multiple units: scalar (ordinary, $x/$m to $r), address, scalar ($c/$v/$a/$l to $r)
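
Spelled out as code, the improved $r rule could look like the sketch below. The function name and its arguments are hypothetical; only the grouping of source register files comes from the observations above.

```python
# Source register files of a mov-to-$r, grouped by observed priority.
BEATS_ADDRESS = {"$x", "$m"}               # behave like ordinary scalar writes
LOSES_TO_ADDRESS = {"$c", "$v", "$a", "$l"}


def r_write_winner(scalar_src, address_load_collides):
    """Decide who writes a contested $r register.

    scalar_src: None for an ordinary scalar instruction, otherwise the
    source register file of a mov into $r (e.g. "$v").
    address_load_collides: True if a load targets the same $r register.
    """
    if not address_load_collides:
        return "scalar"
    if scalar_src is None or scalar_src in BEATS_ADDRESS:
        return "scalar"
    if scalar_src in LOSES_TO_ADDRESS:
        return "address"
    raise ValueError(f"untested source register file: {scalar_src}")
```

Register files that weren't tested (such as $f, $d and $sr) deliberately raise an error here rather than guess.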

Oh, and there's another thing to be noticed along the way...

  1. When $l to $r mov is executed along with exit (opcode 0xff) in the same bundle, the mov is not executed.

Um, what? Maybe the instruction takes long enough in that case for the (quite long, as we determined before) processor shutdown to finish. Strange, I'd have expected the single instruction path to be unaffected by exit.

Even these rules are not enough for hwtest to pass properly. Apparently there's also room for read conflicts: the $r/$v read circuitry appears to be shared between the inter-register-file mov instructions and the address unit's store instructions. So if you execute both a store of $rX and a mov of $rY to $whatever in the same bundle, both instructions will actually read the $rX register. The rules seem to be:

  1. If both the scalar mov from sr and the address unit read a $v register: the $v register that both read is determined by the scalar instruction.
  2. If both the scalar mov to sr and the address unit read a $r register: the $r register that both read is determined by the address instruction.
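
The two read-conflict rules boil down to picking one register index for the shared read port. A minimal sketch, assuming a hypothetical helper that is handed the index each instruction asked for:

```python
def shared_read_register(regfile, scalar_reg, address_reg):
    """Return the register index that BOTH instructions actually read.

    regfile: "$v" (mov from $v collides with a $v store) or
             "$r" (mov from $r collides with a $r store).
    scalar_reg / address_reg: the index each instruction asked for.
    """
    if regfile == "$v":
        return scalar_reg   # rule 1: the scalar instruction picks
    if regfile == "$r":
        return address_reg  # rule 2: the address instruction picks
    raise ValueError(f"no shared read port known for {regfile}")


# Store of $r7 together with a mov of $r3 elsewhere: both read $r7.
both_read = shared_read_register("$r", scalar_reg=3, address_reg=7)
```

This mirrors the $rX/$rY example above: the store names $rX, the mov names $rY, and both end up reading $rX.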

And with all these rules, hwtest finally passes.

This is still not a complete list - we ignored the $f, $d, $sr register files, since they interact with outside stuff. But it's not worth the effort at this point.

Well, figuring out the interactions was enough for now. We'll leave actually using them to find the bundle boundaries to the next episode.

Elapsed time: 7h
