I love calamari, paella loaded with squid, and even ika (squid) sushi. But the Squid cache and HTTP accelerator is off my menu from now on. In addition to [...]com, and helping out the websites of various local interests, I've also been doing some work for a major online magazine for home theater. I've never been involved with high-volume sites before, and so it's been a hugely educational experience.
The Squid proxy and HTTP accelerator is generally considered a "best practice" when it comes to high-volume web sites. Your CMS may spend a lot of time compositing a page, but once it's all assembled it doesn't change all that much. Sure, the little "portlets" on the sides may be updated, but the meat of the page is still the same, and so caching the assembled page for a hungry web audience to consume direct-from-cache-to-browser is a good idea. And if not Squid, then there's always Apache's mod_proxy, or even a combination of Squid and Apache. Or so I believed, until the site tried to go live, several times, with the Squid setup--only to find from its confounding configuration and impenetrable log files that it was doing almost no good.
It also refused to do round-robin upstream requests to multiple Zope servers after the site decided to make some major hardware investments. Now, I'm generally a patient person, and many of my colleagues know just how much I'll put into banging my head against a configuration file to get something working. With Squid, though, I felt as though I were wrestling a giant leviathan from the deep, a massive monster of a squid risen up from oceanic depths to do battle with me as I flailed futilely from a tiny little dinghy rocked upon angry waves. My bruised and battered body, covered in welts from the beast's suction cups, washed up on some lost shore.
Enter Varnish. Never heard of it? That isn't saying much for the names of open source software packages, which have given up on providing even the tiniest hint of what their function might be. Consider: Siege (a regression test system), Spring (web application framework), Scarab (issue tracker), Cantus (media file tagger), Elektra Initiative (key/value pair framework), Cactus (test framework), Azureus (P2P client), Fink (Mac package framework), and so forth. Varnish is an HTTP accelerator, and not just by default: it's lightyears ahead of Squid.

Now, Squid certainly tries to be a lot more than an HTTP accelerator, which is all I wanted for this client, so I should forgive it that. Except that in trying to do so many things--outgoing web proxy, FTP proxy, transparent proxy, load balancer, and reverse proxy--it made it really hard to do just the one task that I needed it to do. Squid also comes with a default configuration file that weighs in at over 4000 lines. Granted, there's a lot of documentation in that file, but it's essentially an embedded man page; strip out all the comments and it's still over 250 lines of configuration. Worse, only the past stable version, 2.4, and the current development version, 3.0, have good online documentation. Are you supposed to surf the bleeding edge or be mired in the past with Squid?

Lastly, Squid, despite my own lame attempts at configuration and an expert's corrections, is just slow. When acting as a reverse-proxy HTTP accelerator, Squid takes an incoming request for a page and sees if it already has it in its memory cache or its disk cache. If it's in neither location, it asks its upstream content management system to compose the page, at which point it caches the result in memory in the optimistic hope that someone else will request it again soon. If, after a while, there are no other requests for it, it writes it off to disk for the less optimistic case that some request will come in later.

Sadly, this strategy effectively defeats the whole point of demand-paged virtual memory provided by modern operating systems. It's the operating system's job to abstract out memory so that an application doesn't have to worry about it. Squid, in actively worrying about it, ruins the operating system's ability to give the application any edge. Take the case where Squid has an object cached in memory, and the operating system sees that the memory page hasn't been referenced in a while before Squid notices. The operating system transparently, and without Squid's knowledge, pages it out to disk. Then Squid's timers go off and it decides to write that same object out to disk, forcing the operating system to page it back into memory only so that Squid itself can write it back out to disk!

Modern Unix operating systems have the mmap system call, which makes such shenanigans totally unnecessary. Varnish instead just maps a huge disk file into memory and treats it as its cache. It's the OS's job to manage that, and it does just fine, whether you're on Linux, FreeBSD, or otherwise. Squid's disk cache consists of hundreds of files in dozens of subdirectories, forcing the disk subsystem to do dozens of cartwheels to track inodes, disk pages, and other metadata. Varnish's cache is just a big chunk of a file, which you can preallocate with dd, minimizing fragmentation and maximizing speed.

Look at what else Squid does that's slow and back-asswards: Squid takes the configuration file and uses it to set all sorts of conditions in memory whose codeways must be traversed in order to figure out what to do.
Varnish takes its configuration and compiles it at launch time into executable machine code!

Squid logs to files, causing disk I/O virtually all the time. Varnish logs to a shared memory segment, and a separate process can pick the log entries out of memory and write them to disk or analyze them without ever slowing the cache down. I've seen this technique once before, when I worked for a company that made commercial-grade digital video servers. We had two of the core team from FreeBSD working there, and they used the same technique: logging was absolutely vital in debugging that system, and yet it was lighter than the breath of a fairy. I guess it's no wonder that the principal architect of Varnish is yet another FreeBSD core team member: Poul-Henning Kamp.
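To give a sense of how little configuration the equivalent job takes on the Varnish side, here is a rough sketch of the round-robin fan-out to multiple Zope servers that Squid refused to do for me. The backend names and addresses are invented, and the syntax is the current VCL 4 dialect rather than the 1.x dialect Varnish shipped with at the time, so treat it as an illustration rather than a drop-in config:

    vcl 4.0;

    import directors;

    # Two upstream Zope instances (hypothetical addresses and ports)
    backend zope1 { .host = "10.0.0.11"; .port = "8080"; }
    backend zope2 { .host = "10.0.0.12"; .port = "8080"; }

    sub vcl_init {
        # Round-robin director spanning both Zope backends
        new zope = directors.round_robin();
        zope.add_backend(zope1);
        zope.add_backend(zope2);
    }

    sub vcl_recv {
        # Every request may be served by either backend
        set req.backend_hint = zope.backend();
    }

Compare that with a default squid.conf that starts life at over 4000 lines.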
From the comments:

Fuck, I learned more here in 5 minutes than in those 2 years at my present job.

So, is it working? I would love to see this turn into a contribution to the CacheFu configs.

In fact, despite Runyaga's encouragement to discredit Squid I held off and gave it a number of additional honest chances. In the end, Squid came by my opinion of it completely fairly and honestly, no monetary exchange at all.

If so, got some suggestions for Mac-specific installation and configuration?

Yes, it has most of the plumbing necessary (memory mapping, dynamic generation, various timers), but there are a few components that are either different or missing to make it an easy reality.

I'm all for writing a blog saying that you found Varnish to be better than Squid for what you are trying to accomplish, but FUCK SQUID? PS - Your join link seems to go to your login page, so I can't join.

I've amended my posting now that I'm wearing a more level head. I guess I should call myself lucky to have so much work that I can't work on my own site!

I worked with a site that had each reverse proxy running with 8GB of RAM and little swap. The reason was that if we went into swap the game was over anyway, with an alternative tuning of 30 minutes for hot-release days: if a file wasn't in memory for that long, it wasn't going to be seen that day. Now the site actually has its reverse proxies using something like 32GB of RAM on x86_64, as the main targets are ISOs and large files. I would love to try and take down a kraken. -- Smooge

Virtual hosting? Anyway, we need to do virtual hosting with the VHMonster product; using Squid this was accomplished by a redirector application (typically Squirm or iredir) where you could set rules for (sub-)domain matching very elegantly by regular expressions. In Varnish this goes in the conf via backend definitions and a sub vcl_recv routine.

The mappings tab had been invisible to me because for some reason the 'Add Site Root' permission was checked off. Now I'll figure out how to get proper statistics, because logging is also handled differently by Varnish. It is heavily customised, and there are certain components that are triggered on each page view that are sub-optimal from a Plone performance point of view but are necessary from a business requirements point of view...
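For what it's worth, the host-based routing described in the virtual hosting comment above maps onto a sub vcl_recv routine roughly like the sketch below. The hostnames and backends are invented, and again this is current VCL 4 syntax rather than the dialect in use when these comments were written:

    vcl 4.0;

    backend site_a { .host = "127.0.0.1"; .port = "8081"; }
    backend site_b { .host = "127.0.0.1"; .port = "8082"; }

    sub vcl_recv {
        # Route on the Host header, the job a Squirm/iredir redirector did under Squid
        if (req.http.host ~ "(?i)^(www\.)?example\.com$") {
            set req.backend_hint = site_a;
        } elsif (req.http.host ~ "(?i)^(www\.)?example\.org$") {
            set req.backend_hint = site_b;
        } else {
            return (synth(404, "Unknown virtual host"));
        }
    }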