11/28 Let's talk about virtual memory:
\_ if you're dealing with virtual memory which
far exceeds physical memory, you've already lost.
(\_ This has pretty much been my experience --PeterM)
\_ First of all, the guy is talking about number
crunching, not image processing. It is likely that
he's going to be addressing all of the memory he's
crunching with. And second, no one said it was
impossible--it just is painfully slow. Disk is like 6
orders of magnitude slower than RAM.
\_ uhh. "number crunching" applications usually
exhibit greater locality than almost any other app,
if optimized properly. -nick
\_ which will help not at all if it's using more
than physical RAM.
\_ What part of "locality" don't you understand,
twink? Nick knows what he's talking about.
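A minimal sketch of the locality point nick is making, with made-up
sizes (8192x8192 floats, 256MB) and assuming ordinary small 4KB x86
pages: the row-major pass walks each page once, sequentially, while
the column-major pass strides 32KB between accesses and, once the
matrix no longer fits in RAM, degenerates toward a page fault per
element.

#include <stdio.h>
#include <stdlib.h>

#define N 8192   /* illustrative: 8192*8192 floats = 256MB, > RAM here */

/* Row-major pass: consecutive addresses, each 4KB page touched once. */
static double sum_row_major(const float *a)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i * N + j];
    return s;
}

/* Column-major pass over the same row-major array: 32KB stride, so
 * every access lands on a different page; if the pages touched by one
 * column (32MB here) no longer fit in RAM, this approaches one page
 * fault per element. */
static double sum_col_major(const float *a)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i * N + j];
    return s;
}

int main(void)
{
    float *a = malloc((size_t)N * N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < (size_t)N * N; i++) a[i] = 1.0f;
    printf("%f %f\n", sum_row_major(a), sum_col_major(a));
    free(a);
    return 0;
}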
\_ Whether you can get away with crunching on data >>
RAM depends strongly on the precise
task. A blocked matrix multiply might
perform well, especially if you somehow
anticipate the data needed next and stream
it into RAM beforehand so you don't have
to eat the large latency while it is paged
in..... Having a RAID array (expensive)
also helps. A non-sparse matrix-vector
multiply, however, requires 1 memory
reference for every 2 flops (not counting
the memory reference for the vector
element or result). Assuming .5 flop per
tick on a 400MHz P-II, we'd need floats
from the matrix at 100MHz, or 400MB/sec.
SDRAM might sustain that, but if the
matrix were much larger than memory....
Performance would drop 10x at least.
--PeterM
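A minimal sketch of the blocked multiply PeterM mentions (the matrix
size N and tile size T are made-up illustrative values, and the caller
is assumed to have zeroed C): each T x T tile is reused ~T times once
it is resident, so the out-of-core traffic drops from ~N^3 words for
the naive loop to ~N^3/T.

#include <stddef.h>

#define N 4096   /* 4096x4096 doubles = 128MB per matrix, standing in
                    for a problem bigger than RAM */
#define T 256    /* three 256x256 tiles = 1.5MB, sits comfortably in RAM */

/* C += A * B for square row-major matrices, one T x T tile at a time. */
void matmul_blocked(const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                /* the (ii,jj) tile of C accumulates the product of the
                 * (ii,kk) tile of A and the (kk,jj) tile of B */
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++) {
                        double a = A[(size_t)i * N + k];
                        for (int j = jj; j < jj + T; j++)
                            C[(size_t)i * N + j] += a * B[(size_t)k * N + j];
                    }
}

The matrix-vector case PeterM works out has no such reuse--each matrix
element is touched exactly once--which is where the 400MB/sec figure
comes from: 0.5 flop/tick at 400MHz is 200 Mflop/s, one 4-byte matrix
float per 2 flops is 100M floats/sec, i.e. 400MB/sec, which a late-90s
disk is nowhere near supplying; hence the 10x-plus drop.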
\_ I don't think ILP is heavily influenced by
\_ ILP ==> "Instruction Level Parallelism"
I don't understand why you mention it
here. --PeterM
\_ You mentioned vector ops, which are a
form of instruction-level parallelism.
how much a process' virtual memory
compares to the physical memory. Virtual
memory pages are usually on the order of
64kb. Compare that to, say, a Cray vector
register file which is a 32x32 64bit
matrix. That's 8kb and it takes several
clock cycles for a number crunching
program to process the data in the 64kb
page anyway. But this guy is talking
about a Pentium and a K6 to do number
crunching. I don't think he's going to
benefit from that kind of ILP, and even if
he did, he would still benefit from the
spatial and temporal locality of the
program.
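Back-of-the-envelope check on the sizes and rates thrown around above,
using the thread's own figures where it gives them (32x32 64-bit
register file, 64KB page, 0.5 flop/tick at 400MHz, ~2 flops per matrix
element for a matvec); the ~10ms random page-in cost is an assumed
ballpark for a late-90s disk, not something stated in the thread.

#include <stdio.h>

int main(void)
{
    double regfile_bytes    = 32.0 * 32 * 8;       /* 8KB, as stated above */
    double page_bytes       = 64.0 * 1024;         /* the 64KB page cited  */
    double doubles_per_page = page_bytes / 8;      /* 8192 doubles         */

    double flops_per_sec    = 0.5 * 400e6;         /* 200 Mflop/s          */
    double compute_per_page = doubles_per_page * 2 / flops_per_sec;
                                                   /* ~82us of matvec work */
    double pagein_cost      = 10e-3;               /* assumed ~10ms seek   */

    printf("register file: %.0f bytes  page: %.0f doubles\n",
           regfile_bytes, doubles_per_page);
    printf("compute per page: %.0f us  page-in: %.0f us  ratio: %.0fx\n",
           compute_per_page * 1e6, pagein_cost * 1e6,
           pagein_cost / compute_per_page);
    return 0;
}

With those numbers a page-in costs on the order of 100x the work done
on the page, which lines up with what PeterM says: locality only saves
you if the data stays resident or is streamed in ahead of time.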