Entry 49852 (Berkeley CSUA MOTD)

2008/4/29-5/5 [Computer/SW/Languages/Perl, Computer/SW/Languages/Python] UID:49852 Activity:moderate

4/29    Scaling your web app in the real world:
        http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html
        \_ This article is crap.  While yes, 99.9% of all websites don't
           need any serious scalability plans, if any of them become worth
           anything they will need to scale.  If you write a web application
           without caring about scalability you are writing a webapp that can
           never be more than niche.  Any developer should know where the
           next few scaling bottlenecks live in his application and have some
           basic plan for how to solve them when they become an issue.
        \_ I feel the same way about language Nazis. "Java is the best!"
           "No C is the best!" "Perl sucks it's not readable!" "Python rules!"
           Dumb asses blame the language and not the stupid programmers.
           \_ Different tools for different jobs.  That said, I particularly
              like python.  Its syntax is very clean.
              \_ I don't know python.  I like Pascal the best, although I
                 haven't used it in 18 years.
                 \_ If you liked Pascal, you will love python, unless you
                    get hung up on the fact that blocks in python are
                    defined by indentation rather than by "begin/end"
                    \_ Yeah, I like python, but that blocks by indentation
                       thing drives me up the wall.  Couldn't they at least
                       make it optional? -!pp
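For anyone coming from Pascal, a minimal sketch of what "blocks defined by indentation" means in practice. The function name `classify` is made up for illustration and is not from the thread.

```python
def classify(n):
    # The indented lines under each branch are that branch's block;
    # there is no begin/end or closing brace.
    if n < 0:
        sign = "negative"
    elif n == 0:
        sign = "zero"
    else:
        sign = "positive"
    # Dedenting back to this level ends the if/elif/else blocks.
    return sign


if __name__ == "__main__":
    print(classify(-3), classify(0), classify(7))
```

Changing the indentation of the `return` line would change which block it belongs to, which is the behavior the posters above are objecting to.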
              \_ My experience is that Ruby is a lot cleaner than Python, and
                 doesn't have stupid syntactical whitespace. However, I have
                 only used Perl for stuff at jobs etc due to familiarity.
                 Python's object orientedness was less complete than Ruby's
                 and I definitely don't like the indentation thing.
                 \_ I struggled, trying to like Python.  Then I found Ruby and
                    it's the most fun I've had programming in a *long* time.
                    The fact that regexes are as easy as Perl in Ruby was a big
                    deal.
                 \_ Ruby is whitespace sensitive. -- ilyas
                    \_ Far less so than Python.
                       \_ I am not a huge python fan, and I don't like python's
                          whitespace indentation, but I found ruby's specific
                          whitespace sensitivity far more confusing. -- ilyas
                          \_ Interesting, I've never noticed a problem.
                             \_ Perhaps this is because you are not used to
                                programming with closures (blocks are
                                the closest thing in ruby to closures).
                                Ruby blocks have very odd whitespace
                                requirements:
                                (1..3).each {|x| puts x} works
                                (1..3).each {|x|
                                                puts x} works
                                (1..3).each
                                            {|x| puts x} does not work.
                                        -- ilyas
                                \_ No, I use closures, just never have had to
                                   break across the line.
                                   \_ I write fairly hairy closures sometimes,
                                      and often my closure code is nested.
                                      I find this behavior completely bizarre
                                      and unintuitive.  I can't even imagine
                                      why Ruby would insist on this. -- ilyas
                                      \_ I got yer hairy closure right here,
                                         pal.
        \_ This is funny, but not really applicable to real world scaling.
           I have been doing this stuff for 15 years and scaling is more of
           a system architecture and capacity planning issue than a developer
           issue. Of course, if your code is bad enough, no one can make
           it scale.
           \_ I disagree completely. I've taken courses on optimizing
              applications for performance and the best bang for the buck
              is almost always received by altering the code to run
              faster. Sure, things like high-speed interconnects to reduce
              latency can solve problems not easily solved by modifying
              code, but the majority of problems are developer issues that
              they (I would say unknowingly, but maybe because they don't
              care) foist upon the systems people.
              \_ Most cases of performance scaling problems I have encountered
                 have been due to the volume of data being written to disk.
                 These problems can be fixed by using the right RAID type,
                 better use of filesystem caching, a better filesystem, or
                 most often, simply by throwing more disks at the problem.
                 These are not the kinds of issues I would expect a programmer
                 to know or even care too much about. I haven't "taken courses"
                 on it, but I have worked on numerous overloaded web and
                 application sites in the Real World.
                 \_ Sounds like you haven't encountered a large
                    variety of problems then. Often when a developer
                    profiles his code he can find all sorts of
                    bottlenecks. Often it seems easier to throw h/w at
                    problems, but the biggest gains come from writing
                    better code. For instance, don't write so much data to
                    disk or be smarter about how you do it. You are
                    correct that programmers don't know and care about
                    these issues, but they should. They usually only care
                    when they are forced to because their code doesn't
                    meet requirements, because it compares unfavorably to
                    competing code, or because the hardware solution has
                    failed or is too costly to implement.
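One possible reading of "be smarter about how you do it" is to stop touching the disk once per record. The sketch below is an assumption about what such a fix might look like, not anything the poster described: `write_unbatched`, `write_batched`, and the `batch_size` default are hypothetical names and values.

```python
import os
import tempfile


def write_unbatched(path, records):
    # Reopens the file for every record, forcing a flush and close each time.
    for rec in records:
        with open(path, "a") as f:
            f.write(rec + "\n")


def write_batched(path, records, batch_size=1000):
    # Opens the file once and hands records to it in large chunks.
    with open(path, "w") as f:
        batch = []
        for rec in records:
            batch.append(rec)
            if len(batch) >= batch_size:
                f.write("\n".join(batch) + "\n")
                batch = []
        if batch:
            # Write whatever is left over after the last full batch.
            f.write("\n".join(batch) + "\n")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    records = ["record %d" % i for i in range(2500)]
    write_unbatched(os.path.join(workdir, "a.log"), records)
    write_batched(os.path.join(workdir, "b.log"), records)
```

Both functions produce the same file contents; the difference is how many times the application goes to the operating system to get there. How much that matters on a given system is something to measure, not assume.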
                    \_ "In nearly every case the most serious bottleneck
                        is an overloaded or slow disk." -Adrian Cockcroft
                        Sun_Performance_And_Tuning (Ch 1, Paragraph 1)
                        \_ You ever wondered why Google search is so fast?
                           They have the world's largest RAM disk. They
                           index and keep most of their search data
                           ***IN MEMORY***. Last time I attended a talk
                           I learned that they have a shitload more
                           RAM than many corporations have on disk. It
                           is ridiculous.
                           \_ Thanks for making my point.
                        \_ Well no shit, but this is tangential. The
                           question isn't "Is disk slower than RAM?". It
                           is "Is there a way to do this such that it
                           doesn't write to disk as much?" One example is
                           when developers decide to write 6 million
                           small files in one directory and the filesystem
                           bogs down. Sure, you can buy a faster
                           filesystem but that's correcting the symptom
                           and not the problem. You don't need to buy $$$
                           hardware that probably still can't handle that
                           particular issue if the code didn't do something so
                           stupid.
                           \_ I heard reiserfs is really good at storing lots
                              of little files.
                              \_ unfortunately, it stores them in a dumpster
                                 in San Leandro.
                           \- lexis/nexis was pretty fast at search 20+ yrs
                              ago. the old bell labs people [who after all were
                              working for a phone company] have lots of
                              interesting stories about optimizations for
                              various phone company applications. one of
                              the main altavista people wrote some code to
                              use a cache that was physically closer to a
                              processing unit to avoid die-crossing latency
                              [and had numbers to show the difference it
                              made]. google is mostly read data and it's not
                              authoritative but a cache/copy for much.
                              contrast this with say ebay. for a somewhat
                              interesting discussion of scaling look at
                              randy shoup's presentation/talk on ebay scaling.
                              [trivia: randy was a high school acquaintance of
                              mine. i thought he was going to become a lawyer
                              and i was mighty surprised he went into cs/
                              databases].
                              \_ Getting all your caches right is not really
                                 a developer responsibility, but I admit that
                                 it starts to cross disciplines. Most people
                                 are just sort of confused how it works, so
                                 in this case, the one eyed man is king.
                                 \_ Whose responsibility do you think it is
                                    if not the developer? If he doesn't
                                    have the knowledge then he needs to
                                    consult with someone who does, but
                                    he's the implementer. Too often the
                                    developer has no idea, doesn't ask
                                    anyone, and implements something stupid.
                                    \_ I guess I would have to say that it
                                       is a shared responsibility between the
                                       system architect and the developers.
                                       A lot of times developers don't know
                                       what is possible, especially what is
                                       possible at a reasonable price point.
                                       How big a RAM disk cache can you expect
                                       to have available for your application
                                       in a shared disk array? How would a
                                       developer hope to possibly know that?
                                       But far too often system administrator
                                       types don't share this kind of info,
                                       even if they do know it themselves.
                                       \_ I would argue that developers
                                          should know what they don't know -
                                          or at least consider these issues
                                          early (before they become a problem).
                                          Part of the problem is that people
                                          with systems knowledge often come
                                          into the project late in the
                                          development of it - too late to
                                          make major changes. We see this
                                          problem in spacecraft operations.
                                          The hardware guys build a shiny new
                                          spacecraft without consulting with
                                          the people who are going to fly it.
                                          They make "sound technical decisions"
                                          and h/w design decisions that are
                                          intended to save lots of money, but
                                          they have no knowledge (or, worse,
                                          just enough to hang themselves) about
                                          how to operate the h/w they build.
                                          This often ends up being a case
                                          of saving $$ on the h/w and spending
                                          $$$$ on the operations (or not being
                                          able to operate at all - or with
                                          greatly increased risk). The *good*
                                          h/w guys know who to involve early in
                                          the process and why, but they are a
                                          small minority even in large,
                                          experienced companies like Lockheed.
                                          With scaled systems it's rather
                                          the opposite. The s/w guys design and
                                          build a system without considering
                                          h/w (or the systems environment).
                            \_ I had that exact problem at one place (millions
                               of files in one directory). We talked about
                               various ways to fix this and decided that
                               switching from WAFL to VxFS was the best
                               solution. In some ways this was just because
                               the developers were too lazy to figure out how
                               to use a database, but it worked.
                               \_ Why not spread those millions of files
                                  over many directories? In itself that
                                  helps a lot and it's a simple fix. A
                                  database is another idea. Switching
                                  filesystems sounds pretty drastic to me.
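A minimal sketch of the "spread the files over many directories" idea, assuming files are looked up by name. The function `sharded_path` and its `levels`/`width` parameters are hypothetical, and MD5 is used here only to spread names evenly, not for security.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    # Hash the name so files spread evenly regardless of naming patterns.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Successive slices of the hex digest become nested directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


if __name__ == "__main__":
    print(sharded_path("/data", "photo-000123.jpg"))
```

With the defaults this yields 256 * 256 = 65,536 leaf directories, so a million files average about 15 per directory. It only helps while the per-directory count stays manageable; at billions of files the same arithmetic stops working in your favor.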
                                  \_ It was already hashed, so what we really
                                     had was billions of files, millions in
                                     each directory. There is no magic bullet
                                     for dealing with that quantity of data.
                                     Millions of directories is not really
                                     a good solution either, for reasons that
                                     should be obvious. By the time I left
                                     the company, they had started work on
                                     what was essentially their own filesystem
                                     but I don't know what happened to that
                                     project.
                                     \_ What a disaster. This sounds like poor
                                        s/w design.
                                     \_ All that because the devs don't want to
                                        figure out how to use a db?
                                        \_ Yeah, well it was 1999 and good
                                           developers (or sysadmins) were hard
                                           to come by. The new filesystem I was
                                           referring to had a DB included.
                                           \_ You think they are easier to come
                                              by now? If anything, it
                                              seems to be getting worse as
                                              a lot of Microsoft-trained,
                                              Java-loving weenies have
                                              entered the field and very
                                              few hardcore assembler-loving
                                              PDP-11 weenies still exist.
                                              Over time it seems the
                                              average developer/sysadmin
                                              knows less and less about
                                              the details of the systems
                                              in favor of high-level constructs
                                              like WWW and GUI design. There's
                                              a place for both, of course,
                                              but I am horrified by what
                                              recent CS grads do not know.
           \- I disagree as well. Some simple problems are solved by
              throwing money at them ... say $20k - $100k problems.
              But at some point programmer time does become cheaper than
              cycles, space etc. And there are other cases where the best
              hardware can't do what brainpower can. A trivial example is new
              crypto attacks. Another case is reading 10gb traffic streams...
              you can't just naively throw hardware at the problem. It's a
              combination of hardware [ASIC, FPGA, other specialized network
              devices], OS/kernel/device driver hackery, and application
              design.
              \_ Any network with 10gb of traffic on it that cannot be easily
                 broken up is not scalable.
                 \- what you control may affect your options. we want to
                     do IDS on 10G. We can't tell, say, ESNet to tailor bandwidth
                    provisioning around IDS. What we can ask for is $ for
                    hardware as long as we're not being stupid about it.
                    The "web application scaling" is a different problem
                    than some other scaling issues ... something like the
                    LHC has different scaling issues, for example.

Cache (2629 bytes)

teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html

It makes us feel like the bad ass, dick-swingin' motherfuckers that we wish we could be. Once that post hits Reddit, son, everyone will know how hardcore you really are.

People Who Talk Big About Scalability Don't Need To Worry About It

Fact: every chest-thumping blog post I have seen written about scalability is either about architecture, Memcached, or both. Some asshole who writes shitty code starts pontificating about "scalable architecture" with data storage, web frontends, whatever-the-fuck. Dude, your app isn't having scalability problems because of the architecture. It's having scalability problems because you coded a ton of N^2 loops into it and you're too self-important to get peer reviews on your commits. And let's not forget the tools who discover Memcached for the first time, install it on a web server, and notice how fast their app runs now.

If You Haven't Discussed Capacity Planning, You Can't Discuss Scalability

You don't need to worry about scalability on your Rails-over-Mysql application because nobody is going to use it. You're going to get, at most, 1,000 people on your app, and maybe 1% of them will be 7-day active. Scalability is not your problem, getting people to give a shit is. Unless you know what you need to scale to, you can't even begin to talk about scalability. Here's a hint: the system you design to handle a quarter million users is going to be different from the system you design to handle ten million users. Of course you'll point to the engineer's wet dream: linear scalability. Lulz but when we get more users we just add more machines you are so stupid ted. Oh no, go ahead and try out Amazon SimpleDB and think to yourself that it will scale linearly. Then, when you get enough users that the latency becomes a problem, blame it on "those shitty Amazon datacenters".

Choosing Technology Don't Mean Shit If You Don't Know How To Use It

The most common butthurt about scalability is this: choose a technology. If you like the technology, claim "technology X scales better!" If you don't like it, claim "technology X doesn't scale!" Saying "Rails doesn't scale" is like saying "my car doesn't go infinitely fast". Alternatively, saying "We'll have no problems scaling because we're using Django" is like saying "I will win every race because my car is the most powerful". Maybe so, but you suck at driving, and you're up against professionals. If you're having scalability problems and blaming it on a single technology, chances are, you're doing it wrong.

This page contains a single entry by Ted Dziuba published on April 24, 2008 7:03 AM.
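To make the cached article's "N^2 loops" complaint concrete, here is a small sketch of the same check written two ways. Duplicate detection is an example chosen for illustration and the function names are hypothetical; the article does not name any specific loop.

```python
def has_duplicates_quadratic(items):
    # Compares every pair: roughly n * (n - 1) / 2 comparisons in the worst case.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    # One pass with a set: expected linear time for hashable items.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    sample = list(range(2000)) + [7]
    print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

Both return the same answer. On 10,000 distinct items the first does about 50 million comparisons and the second does 10,000 set lookups, which is the kind of gap no extra web server fixes.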