9/13 Thanks to Jordan, our disk server is no longer virtualized. Our long
nightmare of poor IO performance should hopefully be over. Prepare for
another long nightmare of poor hardware reliability!
...
Just kidding! (I hope)
In any case, this means that cooler was taken out back and shot, and
replaced with Keg, a real machine with real disks. Right now it's not
running at 100%, but already you should notice that soda's not only
fast, it's a fucking miracle compared to the past few years. I
personally blame unforeseen edge cases in a poisonous combination of
ZFS+NFS+OpenSolaris+1000s of users with a system too big to fail.
Indeed, syncing the data away from cooler took two continuous weeks.
It's no wonder it's taken until now for a very capable VP to be up to
the task of partially unbreaking the setup. Note - we no longer have
any VMs running off of virtualized disks stored on an NFS-mounted disk
which, itself, was virtualized. Hmmmmmmmm. Though those were mostly
useless VMs you never saw. :P
So anyways, as mentioned earlier, Keg isn't at 100%, but it's up. It
looks good enough to keep for a bit, but it originally had a bunch of
Raptors or some such. The disks are still there, but the RAID cards are
most likely broken. We'll leave it to jordan to evaluate the server
needs and fix accordingly. As it is, RAIDing fifteenish 10000RPM disks
so you can edit motd SUPER-EXTRA FAST!!! is probably not a great use of
time. We'll see where our less-shaky infrastructure takes us in the
future. --toulouse
\_ cooler is dead. Long live KEG!
\_ Good work guys, thanks! #1 lesson here: don't virtualize disk i/o
intensive applications. -ausman
\_ That is a good lesson but definitely not the #1 lesson.
* Exporting thousands of filesystems: bad idea, no matter how
easy it makes backups and ZFS snapshotting.
* Using an OS with superior filesystem support is a bad long-term
solution if almost nobody but the original installer knows
anything about it.
* Choosing ZFS...the jury's still out.
* Maintaining FreeBSD 7 and 8 and OpenSolaris and Debian...kinda
hard.
* All of this, on top of virtualized disk i/o - bad news.
\_ Even after I collapsed NFS down to one filesystem, when our
FreeBSD boxes came back online and started automounting
thousands of filesystems apiece, the NFS server again ground
to a screeching halt (taking soda and friends with it).
Switching to one /home mount per server restored NFS's
snappiness; I suspect that even a virtualized NFS server could
perform well without the filesystem woes. --jordan