1/30 What's a good way to archive hundreds of GB, or even TB, of data?
Tape seems obvious, but it is not random access. Hard drives are
cheap, but I fear reliability issues, even with RAID. We're
talking about archiving data for decades. Is the best strategy
to write to tape *and* to hard drives, only going back to tape
in the event of a disaster?
\_ I just send an email to Chuck Norris, and he'll remember it forever.
\_ Not that this is your situation, but this brings us back to a
similar chat a few weeks ago about data retention: most of our data
is crap and no one would miss it if it vanished. If long term mass
data storage was a real problem for more people there'd be a lot
more effort going into a real solution usable by a larger number of
people/corporations/governments.
\_ This is an ongoing problem, and not one with any "standard" solutions
that I've seen. The closest I've seen to common wisdom on this topic
is: keep the data online, backed up, in multiple places, and keep
moving it to new media as the old dies. Not a lot of fun.
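\_ The "keep moving it to new media" step only helps if you can tell
that the new copy matches the old one. A minimal sketch of one way to
check, assuming a SHA-256 manifest is good enough for your purposes.
The function names (sha256_of, build_manifest, verify_copy) are made
up for illustration and are not from any particular backup tool:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest, copy_root):
    """Return the relative paths that are missing or differ in the copy."""
    copy_root = Path(copy_root)
    bad = []
    for rel, digest in manifest.items():
        p = copy_root / rel
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(rel)
    return bad
```

Build the manifest once from the master copy, store it alongside every
replica, and run verify_copy after each migration. An empty list means
the copy matched the manifest; it says nothing about whether the master
was already damaged when the manifest was made.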
\_ whoever solves this problem so that it is both convenient and
cheap will become very rich.
\_ Contact the Internet Archive. They've solved the problem already.
They did the work to figure out the lowest cost petabyte array, and
I'm sure they'd be happy to work with you. Here's the overview of
their system: http://www.archive.org/web/petabox.php -emarkp
\_ What do these guys do when their server room explodes?
There must still be tapes or similar, right?
\_ What if the earth explodes? I make sure to stow away backups
on each nasa mission.
\_ drink 3 beers, eat some peanuts, bring a towel, and don't
forget to feed the dog a cheese sandwich
\_ Is this a lame Hitchhiker's reference?
\_ I assume it's redundant backup. Call or email them! -emarkp
\_ Bwahahahahahahahahahahah. No, they haven't. The good news is
that they have a redundant backup on the other side of the world.
The bad news is that the internet archive has incredibly high
hardware failure rates since they usually get hand-me-down
hardware from Alexa (aka the for-profit half of the internet
archive, a wholly owned subsidiary of http://Amazon.com), and both
Alexa and the Internet Archive beat on their disks much harder
than the typical usage disks are designed for. Furthermore, the
Internet Archive is woefully underfunded and, as a consequence,
understaffed so they don't have the engineering man-hours needed
to effectively work around the problems caused by their disks
regularly taking a piss. -dans
\_ What do people here do for their home backup needs? Hard drives?
I don't understand the tape storage industry at all and optical
media is kind of a joke.
\_ I'm going with the faith based backup system.
\_ Oh, nice. I guess, you know, if I lose some data, Yahweh
decided that wasn't good to have around.
\_ God helps those who help themselves. Get a data backup
system if you don't want to lose data. God supports
data backups.
\_ This is a very important problem that very few people seem to be
paying attention to. For instance, there are already gobs of NASA
telemetry data from missions in the 20th century that are now
effectively unreadable. This is probably one of the few real
advantages that truly analog storage mediums have over digital -
a degraded analog signal is still readable long after the
equivalent digital signal would be hopelessly lost. One wonders
what will happen to future historians trying to understand
political decisions made by past governments when crucial
information only passed through digital media. The Long Now
Foundation has projects exploring this issue, though I don't know
how much practical success they have had: http://www.longnow.org
\_ NASA didn't really save a lot of the telemetry back then,
only the products. However, there is data (on 9 track
tape) going back to Voyager, Pioneer, and such. I am
trying to address this issue for (my part of) NASA. In
the past, tape (4mm, 8mm, 9 track) or optical disks were used,
but today's missions generate quantities of data that would've
been unthinkable in the 1970s. Additional problem: no one
wants to pay (much) to do this stuff.
\_ Digital data is still ultimately stored in an analog medium
though, right? Besides, one can spend extra bits for redundancy
and repair.
\_ Sure, but once that analog medium degrades beyond the ability
of your error correction mechanism, the data is lost beyond
repair. Pure analog mediums do not have this issue - although
a degraded signal will be very distorted, it will still
retain useful information.
\_ If it is distorted enough it won't be particularly useful.
With digital, you can recover the bit-perfect original,
even after severe degradation, depending on how much
redundancy you budget. I don't see why "digital" is the
culprit. You could engrave "digital" bits into a chunk
of granite. Agreed, though, that it's kind of an
all-or-nothing affair: you don't have the gradual
degradation... you have some "buffer" and then it's
just gone. So you would have to have a huge parity-to-data
ratio to achieve similar longevity. On the other hand, you
can keep copying the data to new media and never lose any
data, which is impossible with analog (other than abstract
content).
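\_ To make the "buffer, then it's just gone" point concrete, here is a
toy sketch of the simplest redundancy scheme I know of: one XOR parity
block over equal-sized data blocks. It is only an illustration, not
how any particular tape or disk format does it, and the function names
are invented for the example:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def add_parity(data_blocks):
    """Return the data blocks followed by a single XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored):
    """Rebuild at most one lost block (marked as None) from the survivors."""
    missing = [i for i, b in enumerate(stored) if b is None]
    if not missing:
        return list(stored)
    if len(missing) > 1:
        # One parity block cannot repair two losses: the cliff edge.
        raise ValueError("more blocks lost than one parity block can repair")
    survivors = [b for b in stored if b is not None]
    repaired = list(stored)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

Lose any one block and recover() gives back the exact original bytes;
lose two and you get nothing at all. Tolerating more losses takes more
parity (Reed-Solomon and the like), which is the parity-to-data
tradeoff mentioned above.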
\_ All of what you say is true, but neither solution seems
practical from an everyday standpoint. Most data
storage solutions maximize size and have little parity,
and there is usually little economic incentive to
keep preserving data in that manner. Another huge
unresolved issue with digital data is format turnover.
I have a large collection of live recordings made with
a Sony DAT recorder in the 1990s. Sony used a DAT
implementation that is notoriously difficult to read on
non-Sony machines. With the market for DAT disappearing
and most of the major manufacturers discontinuing their
DAT machines, it will only be a matter of years before
my DAT recordings are unplayable on any easily
obtainable device - and before you mention the used
market, did I mention that DAT machines are prone to
failure and replacement components are hard to come by
due to the aforementioned death of the market for
DAT? Since my Sony machine died, my only choice at
this point is to try to track down another one
that is still functional, and that includes a cable that
can adapt Sony's non-standard digital output to
S/PDIF - and then transfer hours of recordings by hand
to another format. This is only an example, but it
illustrates the issue on a very small scale - multiply
this by a million times and you have some idea of what
future governments and corporations will be faced with.
\_ Speaking of data, much of our music, books and
movies will disappear not only because of the format
problem but because of the combination of silly
copyright periods and DRM that will make it very
difficult for future generations to recover any
of it.
\_ Our books aren't going anywhere. Most of our
movies and music *should* be destroyed.
\_ HEIL GERMAN JOHN! HEIL!!!
\_ Erm, bad troll, I wasn't even in the
room! -John
\_ Thank you, Der Fuehrer.
\_ Sounds like you're mostly getting screwed by using
nonstandard, proprietary stuff, not really digital
storage per se. If there was some specialized "AAT"
market and you did everything the same you'd have
similar problems. (Or if not proprietary, it's
relatively uncommon.)