7/14 So as I understand it, Intel introduced Hyper-Threading because the
penalty for a cache miss or branch misprediction on NetBurst was huge,
so to make up for it the core could work on a different thread while
waiting out the memory latency. Given that the Core 2 architecture has
such a wide execution path, couldn't they use HT to try to keep all
those execution units full?
\_ I attended one of Intel's talks on campus, and the "huge penalty"
is a myth. As the number of pipeline stages increases, a miss costs
more cycles because the pipe flush is deeper, yes. However, keep in
mind that since each stage in the pipe is shorter, the clock cycle is
also shorter. Thus, in terms of absolute time (time = number of
stages that need to be flushed * cycle time), the penalty on the P4
is only ~20% longer than on the old architecture, which is
insignificant in computer science speak.
As for keeping execution units full, it's very much application
dependent. Even with the huge instruction reorder mechanisms, some
applications still can't keep them all busy.
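To make the "time = stages flushed * cycle time" arithmetic above
concrete, here is a rough sketch. The function name and the stage
counts and clock speeds are made up for illustration only; they are
not measured figures for the P4 or any earlier chip, just numbers
picked so the result lands near the ~20% quoted in the talk.

```python
def flush_penalty_ns(stages_flushed, clock_ghz):
    """Absolute flush penalty in ns: stages flushed * cycle time."""
    cycle_time_ns = 1.0 / clock_ghz  # 1 GHz -> 1 ns per cycle
    return stages_flushed * cycle_time_ns


# Hypothetical numbers, NOT real chip specs: a shallow pipe at a low
# clock vs. a much deeper pipe at twice the clock.
old = flush_penalty_ns(stages_flushed=10, clock_ghz=1.0)
new = flush_penalty_ns(stages_flushed=24, clock_ghz=2.0)

increase_pct = (new - old) / old * 100
print(f"{old:.1f} ns -> {new:.1f} ns (+{increase_pct:.0f}%)")
# prints "10.0 ns -> 12.0 ns (+20%)"
```

With these assumed inputs the deeper pipe flushes 2.4x as many stages
but pays only about 20% more wall-clock time, because each stage takes
half as long.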