\1/24 On the 21264, you have 80 physical registers, of which 31 are visible.
However, on P4s you have 128 physical registers, of which
only 8 are visible at one time. The rest are used for
speculative storage and renaming (to prevent WAW and WAR hazards).
So, why 128 physical registers when only 8 are exposed (I presume
they're only ax-dx, and eax-edx)? Isn't that a lot?
\_ It's been a while since I've taken 152 but I believe this has
something to do with register renaming. Many CPUs have extremely
deep and wide execution pipelines allowing for a massive amount
of in-flight instructions. You cannot have 40 instructions in
flight, for example, with only 8 registers. When an instruction
is issued, it is placed in a reservation station. If one of its
source registers will be written by an instruction that is still
in flight, that register specifier is renamed to point to the
producer's reservation station. When the in-flight instruction
completes, the reservation station of the pending instruction
picks up the value it wrote. Since there can be more reservation
stations than user-accessible registers, this is how we get
the 80 physical vs. 31 visible registers.
\_ Which semester did you take 152? I didn't learn these in Sp93.
\_ Perhaps buy a more recent Comp Arch book (the H&P version,
not the P&H one). It's covered in the section on Tomasulo's
algorithm. This is more graduate level stuff so it's not
surprising that some profs don't cover it in 152.
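For anyone who wants to see the renaming idea from the reply above in concrete form, here is a minimal sketch in Python. It is not how the P4 or the 21264 actually implement it, and it swaps the reservation-station tagging described above for the simpler map-table-plus-free-list scheme. The function name `rename`, the `(dest, src1, src2)` instruction format, and the register counts are all made up for illustration. Physical registers are never freed here; real hardware would return them to the free list as instructions retire.

```python
from collections import deque

def rename(instrs, num_arch=8, num_phys=16):
    """Rename (dest, src1, src2) architectural registers to physical ones.

    Hypothetical toy model: registers are never freed, whereas real
    hardware reclaims them when instructions retire.
    """
    # Initially, architectural register i lives in physical register i.
    rat = list(range(num_arch))
    # The remaining physical registers are available for new results.
    free = deque(range(num_arch, num_phys))
    out = []
    for dest, src1, src2 in instrs:
        # Sources read the current mapping, so true (RAW) dependences survive.
        p1, p2 = rat[src1], rat[src2]
        if not free:
            raise RuntimeError("out of physical registers: front end would stall")
        # Every write gets a fresh physical register, which removes
        # WAW and WAR hazards on the architectural name.
        pd = free.popleft()
        rat[dest] = pd
        out.append((pd, p1, p2))
    return out

program = [
    (0, 1, 2),  # r0 = r1 op r2
    (3, 0, 1),  # r3 = r0 op r1  (RAW on r0)
    (0, 4, 5),  # r0 = r4 op r5  (WAW with instr 0, WAR with instr 1)
    (6, 0, 3),  # r6 = r0 op r3  (must see the second r0)
]
renamed = rename(program)
for arch, phys in zip(program, renamed):
    print(arch, "->", phys)
```

After renaming, the two writes to r0 land in different physical registers, so the third instruction no longer has to wait for the first two. With the default 8 visible and 16 physical registers, only 8 results can be in flight before the free list runs dry, which is the thread's point about needing many more physical than visible registers.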
\_ I think by 8 exposed registers they mean eax-edx, esi, edi, esp,
ebp.
\_ The P4 is a CISC machine and does a lot for one instruction. Hence
the work done per instruction is higher, and hence more registers
are necessary. When I learned RISC programming in CS60B it was
noted that 32 registers were available, but by convention most were
reserved--so you only had 8-12 registers you could really
work with. Also, Intel has made the L0 cache very fast so that
there isn't much of a hit moving data from cache to register.
\_ I respectfully disagree with the post above. CISC requires
FEWER registers than RISC because not as many intermediate regs
need to be kept around, and if L0 were so fast that there's not
"much of a hit", then FEWER registers would be required. Different
codes require very different numbers of registers; what all the
physical registers in x86 are used for is supporting massive
out-of-order execution, multiple simultaneous threads, loop
unrolling, etc. The existence proof is that Intel would not
have put them there unless they helped performance.
\_ Intel machines may be technically CISC, but they're more RISCy
than stated above. Most compilers write code that's very RISC-
like. However, since backward compatibility at every step is
key, the Intel line has been straitjacketed into the eight-
register world. Fortunately (or unfortunately if you hate the
Pentium architecture), Intel (and AMD) easily finds ways around
that. After 30 years shackled to the same base instructions
I'm not quite sure why Intel doesn't build a second assembly
language to fully leverage the architecture - having a special
mode for this - but I guess it would be more trouble than it's
worth to do so.
\_ They did try, and it went over like the Hindenburg (cf.
IA-64)
\_ IA-64 sank for other reasons. You don't need an
expensive 64-bit processor without a market in order to
allow some programs to execute the microcode (or some
one-to-one translation thereof) directly.