Berkeley CSUA MOTD:Entry 53559
2022/08/07 [General] UID:1000 Activity:popular

2009/12/2-26 [Science/Disaster] UID:53559 Activity:low
12/2    So I am trying to convince my company to take disaster planning
        more seriously. Does anyone have any hard numbers on how often
        data centers fail? I mean blow up, burn down, flood, etc, with
        total loss of all services for an extended period of time.
        \_ hard numbers tend to be SEKRET.  But check out Yahoo's recent
           outage and UltraDNS' outage.  Those were both pretty bad.
        \_ I don't know, but I am in the same predicament. Instead of
           focusing on _TOTAL FAILURE_ (which is rare and evokes a
           response of "In a 10.2 earthquake I am not coming in to work
           anyway, because I am going to go check on my kids and
           get my gun, buddy.") focus on real problems which are much more
           likely to occur: power outages, localized floods, localized fires,
           etc. The entire data center does not have to blow up for there
           to be a major, major problem.
           \_ What is the likelihood of those events then? This is pretty
              important engineering data, someone must study this.
              \_ Any company that has a lot of data would know.
                 Insurance companies, government data on disaster, etc.
              \_ 1. This is going to vary based on a lot of factors like
                    your geography, industry, utilities, building construction
                    etc. I would argue it is so specialized to your site
                    that averages do not matter. Plus, we all have
                    different needs as far as availability and uptime.
                    \_ I don't care what your needs are, I can determine that
                       for my own company (okay I actually make a menu list
                       and senior executives make the decision based on my
                       risk analysis vs. cost). What I need is hard data to
                       make an informed risk analysis. What is the MTBF for
                       an entire data center? It is in Dallas if you really
                       think that matters.
                       \_ Listen, guy. I am trying to help you. I don't
                           need attitude from you. The MTBF of a data
                          center made of concrete and buried a mile under
                          the Rockies is quite different from the MTBF for
                          a data center built out of paper next to the banks
                          of the Mississippi. So MTBF for "the average data
                          center" has very little bearing on _your_ MTBF.
                          Hell yes it matters that you are in Dallas versus,
                          say, Somalia, dipshit.
                          \_ I'm not the op, but my reading comprehension
                             is a lot better than yours so let me help you
                             out. The disaster planning guy simply wants
                             a real life story of a disaster that was
                             averted due to planning. A story, albeit
                             irrelevant to actual circumstances, is
                             sometimes more powerful than listing boring
                             numbers that a lot of the upper management
                             MBA business dudes don't understand and
                             don't want to hear.
                             \_ Good point, but it's not what the guy asked
                                for. He asked for "hard data" for a
                                "risk analysis". Your reading comprehension
                                \_ why don't we let the op decide  -pp
                           \_ I suspect that in the public sector, the best
                              way to empire build is to create a horrific
                              scenario that only you can solve, by the
                              application of a multi-million dollar budget
                              and a bevy of new hires. This explains crazy
                              shit like having thousands of Federal officers
                              force everyone to take off their shoes before
                              flying and data centers a mile underground.
                              In the private sector, you have to do a Cost-
                              Benefit analysis to prove that what you want
                              to accomplish makes financial sense. So anecdotes
                              won't do, though they might help. Your reminder
                              that historical data is the best way to go is
                               useful, though. Our DC provider is AT&T, who
                              surely must have already done this analysis. I
                              will ask them. Thank you for your advice.
                              \_ You're an idiot.
                 2. This is not easy data to find. I am guessing companies
                    do not like to make this info public. Sungard keeps
                    some statistics like:
                    a. Hardware failure remains the leading cause of business
                       disruption (almost 50%)
                    b. Problems resulting from disruptions to power
                       supplies account for more than one-quarter (26%) of
                       customer disaster invocations
                    c. Flooding and infrastructure-related problems such
                        as air conditioning faults and failure of
                        uninterruptible power supply systems were the third
                        biggest cause of business disruption.
                    d. The average customer affected by Katrina used the
                       backup facility for 22 days.
                     \_ Can you point me to the Sungard info? Thanks.
                  3. What we did was analyze our own site and infrastructure
                     over the last 30 years. Sure, we might be missing the
                     100 year flood and 100 million year meteor impact, but it
                     gives us a good idea of events likely to occur and
                     protecting against those does a pretty good job
                     against the rare events, too, in most cases.
                     \_ This company does not have that kind of data available
                         internally. For one thing, we are only 15 years old.
                         Plus, we are in only a few datacenters, so we just
                        don't have enough data points.
                        \_ Certainly you can examine the last 15 years
                           and for certain catastrophes like hurricanes
                           you can go back even before the company existed.
                           For example, we have a good idea of how often
                           earthquakes strike California. You should have
                           a good idea of how often disastrous tornadoes
                            (or whatever) strike your area even if the
                            last one happened in 1950. In our case we
                           expect a wildfire every 50 years, an earthquake
                           every 20 years, a windstorm every decade, etc.
                           Figure this out for your own site and then add
                           in other variables like construction of your
                           buildings, physical security (terrorism),
                           how good your utilities have been over the
                           15 years you have data (blackouts), etc.
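                            \_ A rough sketch of turning recurrence intervals
                               like those into numbers. The intervals come from
                               the post above. The Poisson/independence model
                               and the dollar losses are assumptions, purely
                               illustrative, so plug in your own site's figures.

```python
import math

# Mean recurrence intervals in years, taken from the post above.
RECURRENCE_YEARS = {"wildfire": 50, "earthquake": 20, "windstorm": 10}

# HYPOTHETICAL loss per event in dollars. These are made-up placeholders,
# not figures from the thread.
HYPOTHETICAL_LOSS = {
    "wildfire": 2_000_000,
    "earthquake": 1_000_000,
    "windstorm": 100_000,
}


def prob_at_least_one(recurrence_years, horizon_years=1.0):
    """Chance that at least one event of any kind occurs within the horizon.

    Assumes each hazard is an independent Poisson process with rate
    1 / recurrence interval. That is a modeling assumption, not a fact
    about any particular site.
    """
    total_rate = sum(1.0 / years for years in recurrence_years.values())
    return 1.0 - math.exp(-total_rate * horizon_years)


def expected_annual_loss(recurrence_years, loss_per_event):
    """Sum of (events per year) * (loss per event) over all hazards."""
    return sum(
        loss_per_event[name] / years
        for name, years in recurrence_years.items()
    )


if __name__ == "__main__":
    for horizon in (1, 5):
        p = prob_at_least_one(RECURRENCE_YEARS, horizon)
        print(f"P(at least one event in {horizon} yr) = {p:.1%}")
    eal = expected_annual_loss(RECURRENCE_YEARS, HYPOTHETICAL_LOSS)
    print(f"Expected annual loss (hypothetical losses) = ${eal:,.0f}")
```

                               With those intervals the chance of at least one
                               event is about 16% in a year and about 57% over
                               five years. Compare the expected annual loss to
                               the yearly cost of mitigation for a first-pass
                               cost-benefit number.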