4/29 Scaling your web app in the real world:
http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html
\_ This article is crap. While yes, 99.9% of all websites don't
need any serious scalability plans, if any of them become worth
anything they will need to scale. If you write a web application
without caring about scalability you are writing a webapp that can
never be more than niche. Any developer should know where the
next few scaling bottlenecks live in his application and have some
basic plan for how to solve them when they become an issue.
\_ I feel the same way about language Nazis. "Java is the best!"
"No C is the best!" "Perl sucks it's not readable!" "Python rules!"
Dumb asses blame the language and not the stupid programmers.
\_ Different tools for different jobs. That said, I particularly
like python. Its syntax is very clean.
\_ I don't know python. I like Pascal the best, although I
haven't used it in 18 years.
\_ If you liked Pascal, you will love python, unless you
get hung up on the fact that blocks in python are
defined by indentation rather than by "begin/end"
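A minimal sketch of what that looks like (the function
and values here are made up purely for illustration):
    def describe(n):
        # everything indented under the def is the function body
        if n > 1:
            print("big")      # one more level in: inside the if
        else:
            print("small")    # dedenting back out ends the block
    describe(2)               # prints "big"
    # where Pascal would wrap multi-statement branches in begin ... end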
\_ Yeah, I like python, but that blocks by indentation
thing drives me up the wall. Couldn't they at least
make it optional? -!pp
\_ My experience is that Ruby is a lot cleaner than Python, and
doesn't have stupid syntactical whitespace. However, I have
only used Perl for stuff at jobs etc due to familiarity.
Python's object orientation was less complete than Ruby's
and I definitely don't like the indentation thing.
\_ I struggled, trying to like Python. Then I found Ruby and
it's the most fun I've had programming in a *long* time.
The fact that regexes are as easy as Perl in Ruby was a big
deal.
\_ Ruby is whitespace sensitive. -- ilyas
\_ Far less so than Python.
\_ I am not a huge python fan, and I don't like python's
whitespace indentation, but I found ruby's specific
whitespace sensitivity far more confusing. -- ilyas
\_ Interesting, I've never noticed a problem.
\_ Perhaps this is because you are not used to
programming with closures (blocks are
the closest thing in ruby to closures).
Ruby blocks have very odd whitespace
requirements:
(1..3).each {|x| puts x}    # works
(1..3).each {|x|
  puts x}                   # works
(1..3).each
  {|x| puts x}              # does not work
-- ilyas
\_ No, I use closures, just never have had to
break across the line.
\_ I write fairly hairy closures sometimes,
and often my closure code is nested.
I find this behavior completely bizarre
and unintuitive. I can't even imagine
why Ruby would insist on this. -- ilyas
\_ I got yer hairy closure right here,
pal.
\_ This is funny, but not really applicable to real world scaling.
I have been doing this stuff for 15 years and scaling is more of
a system architecture and capacity planning issue than a developer
issue. Of course, if your code is bad enough, no one can make
it scale.
\_ I disagree completely. I've taken courses on optimizing
applications for performance and the best bang for the buck
almost always comes from altering the code to run
faster. Sure, things like high-speed interconnects to reduce
latency can solve problems not easily solved by modifying
code, but the majority of problems are developer issues
foisted (I would say unknowingly, but maybe because they
don't care) upon the systems people.
\_ Most cases of performance scaling problems I have encountered
have been due to the volume of data being written to disk.
These problems can be fixed by using the right RAID type,
better use of filesystem caching, a better filesystem, or
most often, simply by throwing more disks at the problem.
These are not the kinds of issues I would expect a programmer
to know or even care too much about. I haven't "taken courses"
on it, but I have worked on numerous overloaded web and
application sites in the Real World.
\_ Sounds like you have haven't encountered a large
variety of problems then. Often when a developer
profiles his code he can find all sorts of
bottlenecks. Often it seems easier to throw h/w at
problems, but the biggest gains come from writing
better code. For instance, don't write so much data to
disk or be smarter about how you do it. You are
correct that programmers don't know and care about
these issues, but they should. They usually only care
when they are forced to because their code doesn't
meet requirements, because it compares unfavorably to
competing code, or because the hardware solution has
failed or is too costly to implement.
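As a rough sketch of the "profile first" point, here is what
that might look like with Python's stdlib profiler;
write_report() is a hypothetical stand-in for whatever the
app's hot path really is:
    import cProfile
    import pstats

    def write_report(records):
        # hypothetical hot path: reopens the file for every record,
        # i.e. thousands of tiny writes instead of one buffered pass
        for r in records:
            with open("report.txt", "a") as f:
                f.write(str(r) + "\n")

    profiler = cProfile.Profile()
    profiler.enable()
    write_report(range(10000))
    profiler.disable()

    # the top entries show where the time actually goes (here,
    # open/write) before anyone reaches for more hardware
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)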
\_ "In nearly every case the most serious bottleneck
is an overloaded or slow disk." -Adrian Cockcroft
Sun_Performance_And_Tuning (Ch 1, Paragraph 1)
\_ You ever wondered why Google search is so fast?
They have the world's largest RAM disk. They
index and keep most of their search data
***IN MEMORY***. Last time I attended a talk
I learned that they have a shitload more
RAM than many corporations have on disk. It
is ridiculous.
\_ Thanks for making my point.
\_ Well no shit, but this is tangential. The
question isn't "Is disk slower than RAM?". It
question isn't "Is disk faster than RAM?". It
is "Is there a way to do this such that it
doesn't write to disk as much?" One example is
when developers decide to write 6 million
small files in one directory and the filesystem
bogs down. Sure, you can buy a faster
filesystem but that's correcting the symptom
and not the problem. You wouldn't need to buy $$$
hardware (which probably still can't handle that
particular issue) if the code didn't do something so
stupid.
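One common fix is to hash each name into a couple of
levels of subdirectories so no single directory gets
huge; a minimal sketch (paths and names are made up for
illustration):
    import hashlib
    import os

    def hashed_path(root, name):
        # spread files over 256*256 subdirectories keyed on
        # a hash of the filename
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        return os.path.join(root, digest[:2], digest[2:4], name)

    path = hashed_path("/data/files", "invoice-123456.xml")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("...")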
\_ I heard reiserfs is really good at storing lots
of little files.
\_ I heard reiserfs is really good at storing lots of
little files.
\_ unfortunately, it stores them in a dumpster
in San Leandro.
\- lexis/nexis was pretty fast at search 20+ yrs
ago. the old bell labs people [who after all were
working for a phone company] have lots of
interesting stories about optimizations for
various phone company applications. one of
the main altavista people wrote some code to
use a cache that was physically closer to a
processing unit to avoid die-crossing latency
[and had numbers to show the difference it
made]. google is mostly read data, and for much of it
google is not the authoritative source but a cache/copy.
contrast this with say ebay. for a somewhat
interesting discussion of scaling look at
randy shoup's presentation/talk on ebay scaling.
[trivia: randy was a high school acquaintance of
mine. i thought he was going to become a lawyer
and i was mighty surprised he went into cs/
databases].
\_ Getting all your caches right is not really
a developer responsibility, but I admit that
it starts to cross disciplines. Most people
are just sort of confused about how it works, so
in this case, the one-eyed man is king.
\_ Whose responsibility do you think it is
if not the developer? If he doesn't
have the knowledge then he needs to
consult with someone who does, but
he's the implementer. Too often the
developer has no idea, doesn't ask
anyone, and implements something stupid.
\_ I guess I would have to say that it
is a shared responsibility between the
system architect and the developers.
A lot of times developers don't know
what is possible, especially what is
possible at a reasonable price point.
How big a RAM disk cache can you expect
to have available for your application
in a shared disk array? How would a
developer hope to possibly know that?
But far too often system administrator
types don't share this kind of info,
even if they do know it themselves.
\_ I would argue that developers
should know what they don't know -
or at least consider these issues
early (before they become a problem).
Part of the problem is that people
with systems knowledge often come
into the project late in the
development of it - too late to
make major changes. We see this
problem in spacecraft operations.
The hardware guys build a shiny new
spacecraft without consulting with
the people who are going to fly it.
They make "sound technical decisions"
and h/w design decisions that are
intended to save lots of money, but
they have no knowledge (or, worse,
just enough to hang themselves) about
how to operate the h/w they build.
This often ends up being a case
of saving $$ on the h/w and spending
$$$$ on the operations (or not being
able to operate at all - or with
greatly increased risk). The *good*
h/w guys know who to involve early in
the process and why, but they are a
small minority even in large,
experienced companies like Lockheed.
With scaled systems it's rather
the opposite. The s/w guys design and
build a system without considering
h/w (or the systems environment).
\_ I had that exact problem at one place (millions
of files in one directory). We talked about
various ways to fix this and decided that
switching from WAFL to VxFS was the best
solution. In some ways this was just because
the developers were too lazy to figure out how
to use a database, but it worked.
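For what it's worth, "use a database" doesn't have to mean
anything heavyweight; a minimal sketch of the idea with
SQLite (table and names made up for illustration):
    import sqlite3

    conn = sqlite3.connect("blobs.db")
    conn.execute("CREATE TABLE IF NOT EXISTS blobs"
                 " (name TEXT PRIMARY KEY, data BLOB)")

    # store what would otherwise be one tiny file per record
    with conn:
        conn.execute("INSERT OR REPLACE INTO blobs (name, data)"
                     " VALUES (?, ?)",
                     ("invoice-123456.xml", b"<invoice>...</invoice>"))

    # look one up by name instead of scanning a huge directory
    row = conn.execute("SELECT data FROM blobs WHERE name = ?",
                       ("invoice-123456.xml",)).fetchone()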
\_ Why not spread those millions of files
over many directories? In itself that
helps a lot and it's a simple fix. A
database is another idea. Switching
filesystems sounds pretty drastic to me.
\_ It was already hashed, so what we really
had was billions of files, millions in
each directory. There is no magic bullet
for dealing with that quantity of data.
Millions of directories is not really
a good solution either, for reasons that
should be obvious. By the time I left
the company, they had started work on
what was essentially their own filesystem
but I don't know what happened to that
project.
\_ What a disaster. This sounds like poor
s/w design.
\_ All that because the devs don't want to
figure out how to use a db?
\_ Yeah, well it was 1999 and good
developers (or sysadmins) were hard
to come by. The new filesystem I was
referring to had a DB included.
\_ You think they are easier to come
by now? If anything, it
seems to be getting worse as
a lot of Microsoft-trained,
Java-loving weenies have
entered the field and very
few hardcore assembler-loving
PDP-11 weenies still exist.
Over time it seems the
average developer/sysadmin
knows less and less about
the details of the systems
in favor of high-level constructs
like WWW and GUI design. There's
a place for both, of course,
but I am horrified by what
recent CS grads do not know.
\- I disagree as well. Some simple problems are solved by
throwing money at them ... say $20k - $100k problems.
But at some point programmer time does become cheaper than
cycles, space etc. And there are other cases where the best
hardware can't do what brainpower can. A trivial example is new
crypto attacks. Another case is reading 10gb traffic streams...
you can't just naively throw hardware at the problem. It's a
combination of hardware [ASIC, FPGA, other specialized network
devices], OS/kernel/device driver hackery, and application
design.
\_ Any network with 10gb of traffic on it that cannot be easily
broken up is not scalable.
\- what you control may affect your options. we want to
do IDS on 10G. We can't tell, say, ESNet to tailor bandwidth
provisioning around IDS. What we can ask for is $ for
hardware as long as we're not being stupid about it.
The "web application scaling" is a different problem
than some other scaling issues ... something like the
LHC has different scaling issues, for example. |