6/30 Now that the human genome appears to be all but decoded, is
there any method to measure the number of bits that are encoded
in the genome? I.e., how does it compare to a modern operating system?
\_ Well, the encoding alphabet is a power of two (four bases, so
2 bits per base), which makes for a very easy conversion. The
problem is that it's not always easy to see where code ends and
garbage begins in DNA.
\_ ONE HUMAN ~ 4 TERABYTES
\_ Uh, I don't have my biochem text with me (on vacation),
but I seem to recall the human genome being 2,000,000 kbp
(kilobase pairs), or 4 Gbits of data (2 bits/bp). -nweaver
\_ That's just the program text. The Interesting Question(tm)
is: how much does it take at runtime?
\_ The number is actually much lower than that, since the
bitstring is EXTREMELY structured, which means fewer bits. If
I were to guess, you're off by a factor of 100-1000. Maybe
worse. That doesn't mean anything, however, since we know next
to nothing about the structure, and won't for quite a while
\_ ONE HUMAN ~ 4 TERABYTES NO COMPRESSION, PUNY HUMAN
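The 2-bits-per-base-pair arithmetic above is easy to check; a quick sketch in Python, assuming the ~2,000,000 kbp figure quoted in the thread (modern references put the human genome closer to 3.2 Gbp, which only shifts the result by a small constant factor):

```python
# Raw information content of the genome at 2 bits per base pair.
# The 2 Gbp genome size is taken from the thread, not a current figure.
base_pairs = 2_000_000 * 1_000          # 2,000,000 kbp = 2e9 bp
bits_per_bp = 2                         # 4 possible bases -> log2(4) = 2 bits
total_bits = base_pairs * bits_per_bp   # 4e9 bits = 4 Gbits
total_bytes = total_bits // 8           # 5e8 bytes, ~500 MB

print(total_bits, total_bytes)  # 4000000000 500000000
```

Note that the raw sequence is only ~500 MB, nowhere near the "4 terabytes" joked about above; that gap is exactly what the structure/compression argument is about.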
\_ once the genome is there, the interesting stuff begins. For the
next 30-50 years, I think scientists will be working on the grand
\_ try 300-500; popular press is just listening to what the
funding proposals are babbling; anyone actually writing
them makes sure the timespan predicted is long enough
"so that i won't be around to be held responsible" but
short enough as to not to discourage investment. sad,
but true.
unification theory of DNA: a physical/biological/mathematical model
of the interaction of the different genes. Imagine running a
simulation of a new lifeform created by artificially piecing
together different genes! The complexity of such a simulation is
beyond anything we've done. Today's supercomputers used to simulate
nuclear explosions will look like toys next to computers
simulating artificial lifeforms. Who wants to guess at the
computational power needed to run a simulation of a single cell?
\_ This is the typical clueless CompSci answer to biochemical
problems. I remember once one of my advisors said that the
problem with working with computer scientists on biological
problems was that they just didn't get it. I guess he had a
point. Why waste your time trying to simulate a complete cell at
such a granular level on a computer? We can simply run the
simulation in actual living cells.
\_ Why bother running simulations of rockets, and atomic
bombs? Oh yeah, that's right, if you find something
*really* interesting, **THOUSANDS/MILLIONS** OF PEOPLE
**DIE**.
Apparently, it's true that those who can't do, teach.
\_ What are you trying to say? This makes no sense.
Just pick your favorite
organism and transform it. DNA is cheap and plentiful to
reproduce with a little lambda phage, plasmid, and PCR. Also,
simulation of a single cell, albeit interesting, isn't exactly
\_ >80 column idiocy fixed. Get a clue. -tom
completely useful. Since we are mainly interested in
multicellular organisms, a simulation of intercellular
interactions would be much more valuable, i.e. what exactly is
involved in the complex interaction of cell signalling during
embryonic growth, and how that interrelates to differentiated
cells. A more realistic, and probably much more profitable, goal
is to use pattern-recognition techniques to predict
tertiary/quaternary structure of proteins and enzymes from DNA,
rather than trying to simulate organisms when the actual
organisms can be produced cheaply. Go buy yourself a copy
of Maniatis. -williamc
\_ If you take a purely scientific view, there is a lot of value
in understanding how cellular processes work, and being able
to model them means a huge step toward fully understanding
the schemes (algorithms, if you will) nature has come up with.
From a practical viewpoint, you want to be able to model
a cell so you can design your own cellular signalling pathways.
What you're saying, William, is that there is no value in
understanding the inner workings of cells, that nuclear
transport, mRNA regulation, and vesicle trafficking are not
important. That's a very narrow-minded view.
\_ What he's saying is that full simulation is infeasible,
and suggesting a viable alternative. Get a clue.
\_ see below
\_ More than Moore's law can produce for you even if it lasts
through 2500 A.D. Without a new computational paradigm, or a
better abstraction than sheer chemistry, this will not be
practical (in all likelihood) until well past the predicted
lifespan of the Homo sapiens species, or even genus Homo.
\_ Dude. Do you realize how LARGE the number
current_computational_speeds * 2 ^ (500 / 1.5) is?
\_ Yes I do. Do you realize that modeling a physical
system on quantum level is considered non-polytime on
a classical computer? And do you realize how many atoms
a cell contains?
\_ In something like 8 iterations of Moore's Law
(12 years) you'll be able to read 4 terabytes
(the DNA sequence) into RAM. The rest of the
cell structure is simple relative to DNA and
doesn't need to be fully modeled. By the time
you can read DNA into RAM, processors will be
running at 256 GHz, with who knows how many
instructions per cycle. That's far more
processing power than a cell has. The only
computational barrier at that point will be
writing the code to model it correctly; that's
hard for a cell and much harder for a full
organism. -tom
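tom's numbers are plain doubling arithmetic; a sketch assuming a circa-2000 baseline of 16 GB addressable RAM and 1 GHz clocks (both baselines are my assumptions, chosen to match his endpoints), with one doubling every 18 months:

```python
# Moore's-law doubling behind the 12-year projection above.
# Baselines (16 GB RAM, 1 GHz clock) are assumed, not stated in the thread.
iterations = 8                  # 8 doublings * 1.5 years = 12 years
ram_gb = 16 * 2**iterations     # 16 * 256 = 4096 GB = 4 TB
clock_ghz = 1 * 2**iterations   # 256 GHz

# The "even if it lasts through 2500 A.D." figure from earlier:
doublings = 500 / 1.5           # ~333 doublings over five centuries
speedup = 2.0 ** doublings      # ~2e100, a googol-fold speedup

print(ram_gb, clock_ghz)  # 4096 256
```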
\_ "The rest of the cell structure is simple
relative to DNA"? Get a clue, cs boy. You
can read the damn bytes into RAM, but you won't
know what the fuck to do with them. Predicting
"everything" from DNA, or even a small subset
of it such as the general protein problem
(folding, interaction, binding sites, etc), may
easily, to the best of mankind's current
knowledge, turn out to be, oh, say,
EXPSPACE-hard. All your Moore's law ramblings
aren't worth crap until we know SOME fully
encapsulated localization structure in the
problem (be it DNA, protein, life, etc). Which
doesn't seem too plausible.
\_ It would be stupid and _unnecessary_ to model the
individual atoms to model a cell or dna. For example,
weather modeling gets better every day and they're
certainly not modeling every atom in a storm.
\_ See above.
\_ And do you honestly think we'll still be computing on
silicon then?
\_ The above was predicated on "no change of paradigm"
\_ But can distributed computing help, like what SETI@home does?
-- yuen
\_ Probably not; seti@home relies on the fact that an
arbitrarily large amount of computation can be done by
any node without needing input from any other ongoing
calculations; a cellular model would be much more
interactive. Still, I think the assertion that we'll
never have enough computing power to model a cell is
silly and unfounded. -tom
\_ 3 words for you -- "think Avogadro's number"
\_ "No one would ever need more than 640k".
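"Think Avogadro's number" is a scale argument; here is a rough estimate of the atom count in a single cell, assuming a ~1 ng mammalian cell approximated as pure water (both are deliberate simplifications):

```python
# Order-of-magnitude atom count for a single cell -- the scale problem
# behind the Avogadro's-number quip above.
# Assumptions: ~1 ng cell mass, composition approximated as pure water.
AVOGADRO = 6.022e23          # molecules per mole
cell_mass_g = 1e-9           # ~1 nanogram, typical mammalian cell
water_molar_mass = 18.0      # g/mol for H2O
atoms_per_molecule = 3       # H2O has 3 atoms

molecules = cell_mass_g / water_molar_mass * AVOGADRO
atoms = molecules * atoms_per_molecule
print(f"{atoms:.1e}")        # roughly 1e14 atoms per cell
```

Any atomistic (let alone quantum-level) treatment of ~10^14 particles is what the poster above calls non-polytime on a classical machine.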
\_ I have to agree with william. You don't start a
computationally intensive calculation at the lowest
possible level of understanding. For instance, if you
ever want to see a result, you would not start a model
of even a modest polypeptide by doing ab initio
calculations on the interactions between individual
electrons and nuclei. Modeling an entire cell based on
molecular interactions is similarly too complex and
really unnecessary.
\_ I don't understand this fixation with atoms. You don't
need to model atoms, just the kinetics and thermodynamics
of the interactions. Duh, anyone that knows anything knows
they're not going to figure out interactions in the cell
from scratch. We have 100+ years of abstraction to work
with.
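The kinetics-not-atoms point can be made concrete; a minimal sketch integrating Michaelis-Menten enzyme kinetics with forward Euler (Vmax, Km, and the starting concentration are arbitrary illustrative values, not measured constants):

```python
# Modeling a cellular process at the kinetic level instead of the
# atomic level: Michaelis-Menten substrate depletion,
#   d[S]/dt = -Vmax * [S] / (Km + [S])
# integrated with a simple forward-Euler step. All parameter values
# are arbitrary and for illustration only.
vmax, km = 1.0, 0.5      # max rate and Michaelis constant (arb. units)
s = 10.0                 # initial substrate concentration
dt, steps = 0.01, 2000   # simulate 20 time units

for _ in range(steps):
    rate = vmax * s / (km + s)
    s = max(s - rate * dt, 0.0)  # clamp so concentration stays non-negative

print(round(s, 6))       # substrate is essentially exhausted by t = 20
```

Three state variables and a rate law, instead of 10^14 atoms: that is the abstraction being argued for.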
\_ Nobody is talking about simulating cells at the atomic
level, dumbass. As for "why not try it on a real cell?"
It's a stupid question. It's always more economical to
simulate something first rather than try it first. You
can change your simulation parameters faster than you can
change your real-world experiment.
\_ this is utterly false. -tom
\_ this is the first intelligent thing you've
said in this thread, tom
How do you think we
build cars and airplanes and computers? We break them down
into components, build models in computers, simulate them,
and then build small-scale models. Drugs can be synthesized
in a computer faster than in real life. I'd love to see
how a particular drug will affect a cell even before
the drug exists in real life. Science fiction? Maybe. But
then again, who would have thought of the internet 100
years ago?
\_ what is human gnome, and is it better than kde?
\_ a lot of you missed a point made above: DNA isn't enough! The cell
itself carries much info that isn't in the DNA (via already-synthesized
proteins, sugars, biochemical microenvironments, mitochondria and
their DNA, imprinting (which the genome project is ignoring), as
well as other molecules that we probably don't realize are
necessary in a model).
\_ the total amount of cell information not contained in DNA
is almost certainly less than the amount of information
contained in the DNA. So call it 8 terabytes and 13
iterations of Moore's Law. -tom
Yes, much will be able to be done, but the
system will have holes and leave a lot to interpretation. That's
not to say that phages, bacterial sims, YACS, . . . are the answer,
they also have many, many flaws, but we are getting closer. And
it is probably the marriage of the techniques that will produce
the answers we are striving for, with the great aid of human
intuition and analytical skills.
Anyhow, the 4TB, GB, whatever, of DNA isn't enough. Just
imprinting alone would add 2 bits to every base pair (methylated or
glycosylated), now add on everything else you forgot to consider.
Oh, and don't forget you need the environments of all surrounding
systems, i.e. in birth you need the mother, her DNA, and so forth
to get it all right. Bottom line, an approximation is better than
nothing, but don't get your hopes up too high!
\_ The first challenge is simulating an amoeba. -tom