Berkeley CSUA MOTD:Entry 53474
2009/10/27-11/3 [Computer/HW/Drives] UID:53474 Activity:nil
10/27   I just read an article that Facebook had moved their database
        to all SSD to speed throughput, but now I can't find it. Has
        anyone else seen this? Any experience with doing this? -ausman
        \_ I hope you're not running mission critical data:
           http://ask.slashdot.org/story/09/10/27/1559248/Reliability-of-PC-Flash-SSDs?from=rss
        \_ Do you have any idea how much storage space is used by Facebook,
           and what the cost implication would be to move *ALL* the data
           to SSD? I believe they may have experimented with using SSD
           as a 3rd-tier cache layer between RAM and disk, but the cost
           of putting *ALL* the data on SSD is simply prohibitive. -kchang
            \_ SSD is $3k/TB now, and I doubt that Facebook has more than
               1PB of total data, so that would only be $3M. They probably
               spend more than that on electricity every year.
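The back-of-the-envelope math sketches out as follows (both the $3k/TB price and the 1 PB bound are the poster's guesses, not verified figures):

```python
# Poster's 2009-era guesses, not verified figures.
ssd_cost_per_tb = 3_000   # dollars per TB of SSD
total_data_tb = 1_000     # 1 PB upper bound on total data

total_cost = ssd_cost_per_tb * total_data_tb
print(f"${total_cost:,}")  # $3,000,000
```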
        \_ Are you thinking of MySpace, perhaps?
           http://www.theregister.co.uk/2009/10/14/myspace_fusionio
           It's not clear from that article whether they're using flash for
           all their storage, or just for caches.
           \_ Yes, that is it, thanks. -ausman
            \_ SSD doesn't make any sense in terms of $ for stuff like video
               and music and pictures. It does make sense for frequently used
               metadata. The question one should ask is: what is MySpace's
               infrastructure like? Is it using sharded MySQL or something else?
           \- our 12x or 16x RAID is faster than our SSD [high $$$ SSD, medium
              quality RAID. These are generally large seq reads, not small
              random ones.].
              \_ How fast is each?
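No numbers were given, but the claim is plausible for sequential workloads, since striped throughput scales with spindle count. A back-of-the-envelope sketch with illustrative 2009-era figures (assumed, not measured):

```python
# Illustrative 2009-era figures (assumptions, not measurements).
hdd_seq_mb_s = 100    # one 15k-rpm drive, large sequential reads
ssd_seq_mb_s = 250    # one early SATA SSD, sequential reads
num_spindles = 12     # drives striped in the RAID set

# Ideal striping: aggregate sequential throughput scales with spindles,
# so a 12-wide stripe can outrun a single SSD on big sequential reads.
raid_seq_mb_s = num_spindles * hdd_seq_mb_s
print(raid_seq_mb_s, ssd_seq_mb_s)  # 1200 250
```

The picture reverses for small random reads, where seek time dominates the spinning disks.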
        \_ Just because SSD r/w is 4X faster doesn't mean your system will
           run in 1/4 the time. You have to take into account request
           overhead and processing time (a complex MySQL join is
           particularly expensive). My friends in the SSD industry said
           the speed-up basically wasn't as mind-blowing as they
           originally anticipated, and that if the application isn't
           SSD-tuned, you may not get the amazing speed-up you thought
           you would get.
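This is just Amdahl's law: only the I/O fraction of the runtime gets faster, so everything else caps the overall win. A minimal sketch (the 50% I/O fraction is an assumed example, not a measured number):

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's law: only the I/O portion of the runtime speeds up."""
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)

# If half the runtime is disk I/O and SSD makes I/O 4x faster,
# the whole job is only ~1.6x faster, not 4x.
print(round(overall_speedup(0.5, 4.0), 2))  # 1.6
```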
            \_ I know that some filesystems are not well designed to take
               advantage of SSD; what other tuning do I need to be aware of?
               \- our biggest speedup was buying a lot of memory [+100gb]
                  and judicious use of compression ... with lots of cores
                  but limited bus IO, this is generally a big win even using
                  gzip-style compression, although there are some faster
                  compression systems which don't compress as much in space
                  but are like 4x faster than gzip. BTW, does anybody other
                  than GOOG use "zippy"? Are there any tools/filesystems which
                  use zippy, or is that GOOG internal? (we didn't spend much
                  time researching this ... we want to throw money and a
                  little time at the problems to mitigate them ... at some
                  point we'll start indexing, which is what will get us to
                  the orders-of-magnitude improvements).
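The ratio-vs-speed tradeoff described here shows up even within zlib by varying the compression level; LZO/Zippy-class compressors push the same tradeoff much further. A minimal stdlib sketch on synthetic redundant data (the sample and timings are illustrative only):

```python
import time
import zlib

# Synthetic, highly redundant sample data (stands in for DB pages).
data = b"user_id=12345&action=view&page=profile " * 50_000

for level in (1, 6, 9):  # 1 = fastest, 9 = best ratio (default is 6)
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    dt = (time.perf_counter() - t0) * 1000
    print(f"level {level}: {len(out) / len(data):.2%} of original, {dt:.1f} ms")
```

Level 1 typically compresses noticeably faster than level 9 at a somewhat worse ratio, which is the same direction LZO and Zippy take, just further.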
                  \_ I think Zippy is fine-tuned for Bigtable data, which is
                     basically key-value pairs versioned by timestamp. Since
                     they know their input well, they don't need an
                     all-purpose gzip, hence the 10X encode rate and 3X
                     decode rate over gzip. By the way, if you find a Zippy
                     implementation, let us know!
                    http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff
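Zippy was Google-internal at the time (it was later open-sourced as Snappy). The cross-version redundancy trick that the companion BMDiff pass exploits can be sketched with zlib's preset-dictionary support in the Python stdlib; this illustrates the idea of compressing a new version against an old one, not the actual Zippy/BMDiff code:

```python
import hashlib
import zlib

# 4 KB of pseudo-random "page" data, then a near-identical new version.
v1 = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
v2 = v1[:2000] + b"EDIT" + v1[2004:]

plain = zlib.compress(v2)                # compress the new version alone
c = zlib.compressobj(zdict=v1)           # old version as preset dictionary
with_dict = c.compress(v2) + c.flush()

# Decompression needs the same dictionary on hand.
d = zlib.decompressobj(zdict=v1)
assert d.decompress(with_dict) == v2

print(len(plain), len(with_dict))  # dictionary version is far smaller
```

The incompressible data compresses to almost nothing once the compressor can reference the previous version's bytes, which is the intuition behind diffing a crawled page against last month's copy.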
Cache (8192 bytes)
ask.slashdot.org/story/09/10/27/1559248/Reliability-of-PC-Flash-SSDs?from=rss
storage: An anonymous reader writes "SATA and IDE flash solid-state disks are all the rage these days -- faster and, allegedly, more reliable than traditional spinning-rust disks. My organization dipped its toe in the flash-disk waters, buying a handful for some PC and Linux boxes. Out of 8 drives from various manufacturers, 3 have failed in the space of four months! Some are reporting bad blocks, others just crapped out and stopped responding entirely."
commodore64_love (1445365): We've not been seeing widespread failure of iPods or other keydrives, even though they use the same flash technology. I'm kinda surprised to hear any reports of failure in the new solid-state PC drives, unless it's an issue of making the cells too small to be reliable. While nothing is ever a certainty, a tool for your OS that inspects SMART data from your drives' electronics would answer that question, at least from a trend perspective.
The drives were: (1) FHM16GF25H = Super Talent MasterDrive 16GB under Linux; (2) Transcend TS32GSSD25-M under Windows/XP; (3) Patriot Warp v2 32GB under Ubuntu 8.04 with ext3. The machines were not super heavily loaded (i.e. no compiles 24/7), and we did the "obvious" things like turning off atime updates on the filesystems, etc. The brands/models were the critical piece of information. You're probably aware that SSDs have been in the server space, at a very different price point, for a few years now, without any extraordinary reliability debacles. To some extent, this is a case of getting what you pay for. I did a moderate amount of research on SSD drives, relying especially on the independent review sites, and quickly eliminated all of the brands you described. The prevailing wisdom seemed to me (and to people like Torvalds) to be that Intel was far and away the top of the heap in terms of performance and reliability, and that some drives based on a newer Samsung controller (e.g. the OCZ Summit) were a perhaps credible alternative.
Other brands were clearly struggling even to be in the game, with frequent firmware updates and outright debacles (e.g. Indilinx, Micron), and we're in the process of shaking out who will make it and who will not. I have only fielded a few consumer-grade SSDs over about the same amount of time as you, but going with Intel's G1 and G2 MLC products has so far yielded zero failures. If you are already in the market for an SSD, and you are ready to spend premium money...
Cache (1136 bytes)
www.theregister.co.uk/2009/10/14/myspace_fusionio -> www.theregister.co.uk/2009/10/14/myspace_fusionio/
Social networking site MySpace has replaced traditional server/direct-attach disk combos with flash-memory-cached servers to save space, energy, cooling and cost.

MySpace originally used multiple racks of 2U rackmount servers, with ten to twelve directly-attached 15,000rpm hard drives. They've been ripped-and-replaced with fewer racks holding 1U servers fitted with Fusion-io's ioDrive flash memory accelerator cards. The ioDrives use less than 1 per cent of the electricity needed by the replaced hard drives. MySpace now uses less floor space for its servers, less electricity to power and cool them, and expects them to be more reliable as they'll no longer suffer HDD failures.

Richard Buckingham, technical operations VP for MySpace, said: "In the last 20 years, disk storage hasn't kept pace with other innovations in IT, and right now we're on the cusp of a dramatic change with flash technologies."
Cache (3324 bytes)
feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff -> feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/
A few months ago, when I was heads down finalizing the distributed database in Spinn3r, I was exceedingly curious about what other DBs use for compression. GZip seems to be the obvious choice, but its compression speed isn't very good when compared to LZO. I remembered some notes about compression in the original Bigtable paper and decided to dig a bit deeper. Apparently, there isn't much information about what Google uses for compression in Bigtable, GFS, etc.

Andrew Hitchcock also took some notes on the talk: There is a lot of redundant data in their system (especially through time), so they make heavy use of compression. He went kind of fast and I only followed part of it, so I'm just going to give an overview. Their compression looks for similar values along the rows, columns, and times. BMDiff gives them high write speeds (100MB/s) and even faster read speeds (1000MB/s). It doesn't compress as highly as LZW or gzip, but it is much faster. He gave an example of a web crawl they compressed with the system.

Bentley & McIlroy, DCC '99, "Data Compression Using Long Common Strings": they use BMDiff to compute a dictionary diff between all columns in a column family. This way common strings between columns can be stored in a compressed dictionary to avoid duplicate storage. This also helps to diff between previous versions of a page across compactions. A page stored in your index will probably have a LOT in common with the same page stored a month ago. They then run the bmdiff output through Zippy (another compression algorithm they wrote).

I'd like to see MySQL/Drizzle support more higher-level DB primitives directly rather than having to build support for these above the DB level. The zlib compress/uncompress support in MySQL is horrible (binary data is not compatible with other zlib implementations).
Supporting bmdiff, lzo, bloom filters, etc. in DBs is going to be necessary to have Drizzle support larger distributed databases. There are a few UDFs I want to write, so maybe I'll take these on at the same time. Come to think of it, crypto support isn't that hot in MySQL either.

October 13, 2008 at 12:52 pm: There is an open source implementation of a bmdiff/zippy-like library in Hypertable, called bmz. It's written in pure ANSI C, and can be easily embedded in any project (that was a goal I had in mind, so it's written in C instead of C++ like the rest of Hypertable). The performance is similar to Google's published numbers for small blocks (the size of an sstable block, 64KB-128KB). The Rabin-Karp-like hash functions can be easily plugged in for experiments.

October 15, 2008 at 3:00 pm: The quick and dirty implementation, depending on your app, is to use gzip for a column, but only compress/decompress when the data is used -- usually on a client webserver. Thus the compression cost is moved into a more scalable layer. Ideally, this would get built into mysql/drizzle such that decompression happens on access of the column's data, and if that doesn't happen on the server, then it passes that flag on to the client. So the client/server protocol would need to be updated, as well as both the client and server.
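The MySQL COMPRESS() incompatibility complained about above is, to my understanding, just framing: MySQL prepends the uncompressed length as a 4-byte little-endian integer before an otherwise standard zlib stream (check the MySQL manual for your version). A sketch of round-tripping that format from plain Python:

```python
import struct
import zlib

def mysql_compress(data: bytes) -> bytes:
    """Mimic MySQL COMPRESS(): 4-byte little-endian length + zlib stream."""
    return struct.pack("<I", len(data)) + zlib.compress(data)

def mysql_uncompress(blob: bytes) -> bytes:
    """Decode a MySQL COMPRESS() blob using ordinary zlib."""
    (length,) = struct.unpack("<I", blob[:4])
    out = zlib.decompress(blob[4:])
    assert len(out) == length, "length header disagrees with payload"
    return out

blob = mysql_compress(b"hello " * 100)
print(mysql_uncompress(blob) == b"hello " * 100)  # True
```

So interoperating with another zlib implementation only requires stripping or adding that 4-byte header, not a different compressor.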