8/28 Followup on a previous post about network transfer of large files
and checksums: I have compared the 2nd download, which passed the
md5sum, with the first one, which didn't. They have identical sizes
but differ in content in about 200 bytes out of about 640MB. Is
there a way to estimate the likelihood that this is the result of bad
transmission or of a malicious substitution? I am asking out of both
theoretical curiosity and practical interest. So besides some
highbrow mathematical argument, is there some obvious indication to
check, like whether the differences are concentrated, contiguous, etc.?
\_ Mount the iso file (assuming it's on a linux box) and poke around.
mount -o loop -t iso9660 filename.iso /mnt/tmp
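Extending that, a rough sketch (the mount points and filenames
here are placeholders, not from the original post) is to mount
both images read-only and let diff report which files differ:
    mkdir -p /mnt/iso1 /mnt/iso2
    mount -o loop,ro -t iso9660 first.iso  /mnt/iso1
    mount -o loop,ro -t iso9660 second.iso /mnt/iso2
    # -r recurses, -q only names the files that differ
    diff -rq /mnt/iso1 /mnt/iso2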
\- yes, there is a way to guess whether it is random or malicious
depending on what the contents are [probably], but it is a lot
of work, so i wouldn't bother. 200 bytes is a hell of a lot.
that is a little strange. my guess is linux -> ass. --psb
\_ Have you determined what the differences are?
\_ All I did was compare the two images byte by byte with
a simple c program. Of course one could recursively look into
each volume, and to be comprehensive one would have to look at
the partition map, catalog file, and auxiliary partitions.
But as the posters above wrote, it is way TOO MUCH work for a
mild curiosity. I was asking whether some statistical/probabilistic
analysis is possible (in theory) and whether some rule of thumb
exists in practice. The transport was through ftp, btw.
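For the concentrated-vs-scattered question, one quick check (the
filenames here are placeholders) is to let cmp -l list every
differing byte offset and then look at the gaps between successive
offsets: a tight run of gap=1 looks like a single damaged block,
while large gaps all over the image look more like scattered noise.
    # cmp -l prints "offset old-byte new-byte" for each differing
    # byte (offsets are 1-based decimal)
    cmp -l first.iso second.iso > diffs.txt
    wc -l diffs.txt
    # distribution of gaps between consecutive differing offsets
    awk '{ if (NR > 1) print $1 - prev; prev = $1 }' diffs.txt \
        | sort -n | uniq -c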
\- tcp checksum is not going to miss 200 bytes in a <1gig xfer.
what you should do is do the xfer 100 times [or whatever]
and see how many times a strong checksum fails. if you do
that, i'd appreciate it if you would send me the info.
linux has a history of flailing on large data. --psb
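A rough sketch of that experiment (the URL, loop count, and
reference image are placeholders, not from the original post):
    # checksum of the copy believed to be good
    REF=$(md5sum good.iso | awk '{print $1}')
    fail=0
    for i in $(seq 1 100); do
        wget -q -O try.iso ftp://example.com/path/file.iso
        SUM=$(md5sum try.iso | awk '{print $1}')
        [ "$SUM" = "$REF" ] || fail=$((fail + 1))
    done
    echo "$fail of 100 transfers failed the md5sum"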
\_ I transfer 8GB disk images and 600MB ISOs between
my linux boxes. I've never had any problems. What do
you mean, linux "flailing on large data"?
\_ I throw around 2 terabytes of data with linux every other day
and I haven't noticed any data loss yet but I have not
conducted an exhaustive statistical study. - danh
\- do you guys actually check the data or do you cross your
fingers? obviously if you don't look, you won't find.
also it may not manifest itself within a certain range
of behavior/configurations.
anyway, first hand, i have had a linux system writing
corrupted packets onto the net [went away when the ethernet
driver was changed]. when we changed various things in
bpf and the syskonnect ethernet driver, freebsd was fine
with our hacks, linux occasionally had issues (we didn't do
too much research on what the problem was ... we just abandoned
it ... and the problems seemed to in part go away when we had
faster processors and a faster disk bus). i don't remember which
file system it was, but one of them lost us some data and it
didn't appear to be a hardware problem [was a while ago also ...
lately i haven't been looking but haven't casually noticed
data loss at the fs level]. i don't need to say anything about
the linux nfs server. admittedly these are rare, but they are
in areas where you expect perfection. a bigger problem is just
general "weird behavior" under load [or sometimes even
not under load]. linux does too many shortcut things for
"typical case" speed hacks. this can leave you out
to sea when something goes wrong [e.g. when you look at a
solaris crash dump, you have much better info than trying
to figure out what happened in the linux case. this might
partly be my better knowledge of solaris, but in some cases
the relevant info about the thread state, locks, and watchdogs
simply was not there], and also the system behavior often is
sort of unusual under load [e.g. low free memory + high io,
compared to FreeBSD and solaris (although when various large
changes were made to solaris kernel algorithms, for short
periods i did see some performance issues)]. finally, i don't like
the way the memory-file system subsystem has been evolving.
recently seen some problems in work environments with lots of
(tcp) connections ... you get weird hangs on clients when the
server drops packets ... admittedly this might have been fixable
by throwing hardware at the problem or tweaking various
parameters (and this was on some HPC environments where we could
not compare against solaris/bsd).
YMWTGF: andrew hume HotOS linux suspect --psb
\_ Our answer was much simpler than yours. After too many
lost files, NFS problems, dropped packets, etc, etc, we
simply stopped using Linux because it sucks. We didn't
have the time to get into this driver vs that driver or
what kernel patch might have helped or which NIC, etc.
Linux = not ready for enterprise = out the fucking window.
Staff time is more expensive than the value of possibly
finding a solution to kludge Linux into working. The
moment we switched to real OSes, our problems just magically
went away, without hiring a team of Linux kernel developers.
Linux is cute, but its development philosophy precludes
its use in enterprise environments. Just FYI, I'm tossing
around 20-30TB/month between various hosts.
\_ Which OS did you switch to? FreeBSD? Solaris?
\_ Yeah, especially now that Sun sells the X1 for under $1k.
\_ I should have added: The system from which I ran the ftp was
OS X, which is a (free)bsd derivative. I also noticed that
the bad download had the wrong modification time: it was set to
the day of the download, even though I have "preserve" on.