Now that we're going to be executing proper microcode (and not just submitting single instructions), it's time to look into two important things: bundles and branches.
We have identified branch instructions before, but they haven't been covered by hwtest - the single instruction submission makes it impossible to observe the actual branching behavior. We'll need to properly cover their branching conditions and their delay slots.
And the delay slots are strictly related to the other thing: bundles. All signs point to VP1 being a relatively simple VLIW architecture with 4 independent execution units. This means that one unit of execution (commonly called a bundle) is 4 instructions, one for each execution unit. Now the question is how the microcode stream is converted into bundles (for the single instruction submission interface, the mapping was dead simple).
The simplest way would be for a bundle to always take 4 words, one for each execution unit. However, we know this is not the case - issuing only opcodes intended for, say, the scalar unit works perfectly fine. So, we'll need to figure out where the bundle boundaries are.
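To make the rejected hypothesis concrete, here's a minimal sketch of what "a bundle is always 4 words" would imply. The unit names come from this post; the slot order, the `Instr` representation and the `fits_fixed_layout` helper are all assumptions made up for illustration, not anything VP1 is known to do.

```python
from typing import List, Tuple

# The four execution units named in this post. The slot order is a
# hypothetical choice made only for this sketch.
SLOT_ORDER = ("address", "vector", "scalar", "branch")

# (target unit, raw opcode word). The real opcode encoding is not modeled.
Instr = Tuple[str, int]


def fits_fixed_layout(stream: List[Instr]) -> bool:
    """Return True if the stream is valid under the fixed 4-word hypothesis.

    Under that hypothesis, word i of the stream must target the unit
    that owns slot i % 4.
    """
    if len(stream) % len(SLOT_ORDER) != 0:
        return False
    for index, (unit, _word) in enumerate(stream):
        if unit != SLOT_ORDER[index % len(SLOT_ORDER)]:
            return False
    return True


# One instruction per unit, in slot order: fine under the hypothesis.
full_bundle: List[Instr] = [(unit, 0) for unit in SLOT_ORDER]

# Four scalar opcodes in a row (placeholder words): rejected by the
# hypothesis, yet the hardware runs such streams without complaint.
scalar_only: List[Instr] = [("scalar", 0)] * 4
```

A scalar-only stream fails this check, but the hardware executes it anyway, so the fixed layout can't be what's going on.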
We're going to make a few assumptions (some of them may seem obvious, but stranger things have been spotted on nVidia GPUs...):
If we're wrong about those, we'll find out soon enough.
Now we need to determine how to observe whether two given instructions are in the same bundle. Let's summarise the known behavior of instructions in the same bundle:
That'd be a good moment to figure out the interactions of the instructions mentioned in number 5.
The obvious thing now is to check what happens if a mov to sr instruction collides with an ordinary $v write from the vector unit, a load to $v, an ordinary $a write from the address unit, or an ordinary $l write from the branch unit. Results are as follows (listed from highest to lowest priority):
However, running unrestricted hwtest quickly results in failures. Funnily enough, the mov from sr happens to have a different priority than ordinary scalar instructions writing $r registers. Even better, the priority seems to depend on the source register file! So, let's improve rule 4:
Oh, and there's another thing to be noticed along the way...
Um, what? Maybe the instruction takes long enough in that case for the (quite long, as we determined before) processor shutdown to finish. Strange, I'd have expected the single instruction path to be unaffected by exit.
Even these rules are not enough for hwtest to pass properly. Apparently there's also a place for read conflicts: the $r/$v read circuitry appears to be shared between the inter-register-file mov instruction and the address unit's store instructions. So if you execute both a store $rX and a mov $rY to $whatever instruction in the same bundle, both instructions will actually read the $rX register. The rules seem to be:
And with all these rules, hwtest finally passes.
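The shared read circuitry is easy to model for the one case spelled out above (store $rX plus mov from $rY in the same bundle). This is only a sketch of that single case: the `bundle_reads` function and the dictionary register file are hypothetical, and $v sources and any other conflict rules are not covered.

```python
from typing import Dict, Optional, Tuple


def bundle_reads(
    regs: Dict[int, int],
    store_src: Optional[int],
    mov_src: Optional[int],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (value seen by store, value seen by mov) for one bundle.

    regs maps $r indices to values. A source of None means the bundle
    has no such instruction. When both instructions are present, the
    mov reads the store's register, as observed in the text.
    """
    if store_src is not None and mov_src is not None:
        # Shared read port: the store's register index wins.
        effective_mov_src: Optional[int] = store_src
    else:
        effective_mov_src = mov_src

    stored = regs[store_src] if store_src is not None else None
    moved = regs[effective_mov_src] if effective_mov_src is not None else None
    return stored, moved


regs = {1: 0x11, 2: 0x22}

# store $r1 together with mov from $r2: both end up reading $r1.
conflict = bundle_reads(regs, store_src=1, mov_src=2)

# mov from $r2 alone in the bundle: reads $r2 as expected.
alone = bundle_reads(regs, store_src=None, mov_src=2)
```

A simulator that wants to pass hwtest would need a check like this before it reads the mov source.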
This is still not a complete list - we ignored the $f, $d, $sr register files, since they interact with outside stuff. But it's not worth the effort at this point.
Well, figuring out the interactions was enough for now. We'll leave actually using them to find the bundle boundaries to the next episode.
Elapsed time: 7h