BufferBloat: What's Wrong with the Internet?
A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys

Internet delays are now as common as they are maddening. That means they end up affecting system engineers just like all the rest of us. And when system engineers get irritated, they often go looking for what's at the root of the problem. That's exactly what Jim Gettys did. His slow home network had repeatedly proved to be the source of considerable frustration, so he set out to determine what was wrong, and he even coined a term for what he found: bufferbloat.

Bufferbloat refers to excess buffering inside a network, resulting in high latency and reduced throughput. Some buffering is necessary; it provides space to queue packets waiting for transmission, thus minimizing data loss. In the past, the high cost of memory kept buffers fairly small, so they filled quickly and packets began to drop shortly after the link became saturated, signaling to the communications protocol the presence of congestion and thus the need for compensating adjustments. Because memory now is significantly cheaper than it used to be, buffering has been overdone in all manner of network devices, without consideration for the consequences. Manufacturers have reflexively acted to prevent any and all packet loss and, by doing so, have inadvertently defeated a critical TCP congestion-detection mechanism, with the result being worsened congestion and increased latency. Now that the problem has been diagnosed, people are working feverishly to fix it. This case study considers the extent of the bufferbloat problem and its potential implications.

Working to steer the discussion is Vint Cerf, popularly known as one of the "fathers of the Internet." As the co-designer of the TCP/IP protocols, Cerf did indeed play a key role in developing the Internet and related packet data and security technologies while at Stanford University from 1972 to 1976 and with DARPA (the US Department of Defense's Advanced Research Projects Agency) from 1976 to 1982. He currently serves as Google's chief Internet evangelist.

Van Jacobson, presently a research fellow at PARC, where he leads the networking research program, is also central to this discussion. Considered one of the world's leading authorities on TCP, he helped develop the RED (random early detection) queue-management algorithm that has been widely credited with allowing the Internet to grow and meet ever-increasing throughput demands over the years. Prior to joining PARC, Jacobson was a chief scientist at Cisco Systems and later at Packet Design Networks.

Also participating is Nick Weaver, a researcher at ICSI (the International Computer Science Institute) in Berkeley, where he was part of the team that developed Netalyzr, a tool that analyzes network connections and has been instrumental in detecting bufferbloat and measuring its impact across the Internet.

Rounding out the group is Jim Gettys himself, now a member of the technical staff at Alcatel-Lucent Bell Labs, where he focuses on systems design and engineering, protocol design, and free software development.

VINT CERF What caused you to do the analysis that led you to conclude you had problems with your home network related to buffers in intermediate devices?

JIM GETTYS I was running some bandwidth tests on an old IPsec (Internet Protocol Security)-like device that belongs to Bell Labs and observed latencies of as much as 12 seconds whenever the device was running as fast as it could.
That didn't entirely surprise me, but then I happened to run the same test without the IPsec box in the way, and I ended up with the same result. With 12-second latency accompanied by horrible jitter, my home network obviously needed some help. The rule of thumb for good telephony is 150-millisecond latency at most, and my network had nearly 10 times that much. My first thought was that the problem might relate to a feature called PowerBoost that comes as part of my home service from Comcast. That led me to drop a note to Rich Woundy at Comcast, since his name appears on the Internet draft for that feature. He lives in the next town over from me, so we arranged to get together for lunch.

During that lunch, Rich provided me with several pieces of the puzzle. To begin with, he suggested my problem might have to do with excessive buffering in a device in my path rather than with the PowerBoost feature. He also pointed out that ICSI has a great tool called Netalyzr that helps you figure out how much buffering is in your path. Also, much to my surprise, he said a number of ISPs had told him they were running without any queue management whatsoever--that is, they weren't running RED on any of their routers or edge devices.

I had been having trouble reproducing the problem I'd experienced earlier, but with the more recent cable modem I was using this time around, a trivial one-line command was enough to reproduce it. The resulting SmokePing plot clearly showed the severity of the problem, and that motivated me to take a packet capture so I could see just what in the world was going on.
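The latencies Gettys describes follow directly from the arithmetic sketched in the introduction: once a link saturates, a packet arriving behind a full buffer waits roughly buffer size divided by link rate before it is even transmitted. The short Python sketch below works through that back-of-the-envelope calculation; the buffer sizes and uplink rates are hypothetical examples chosen for illustration, not figures taken from the discussion.

```python
# Worst-case queueing delay added by a full FIFO buffer draining over a link:
# roughly (buffer_bytes * 8) / link_bits_per_second seconds.
# All sizes and rates below are illustrative assumptions, not measured values.

def queueing_delay_s(buffer_bytes: int, link_bps: float) -> float:
    """Seconds a packet waits behind a completely full buffer."""
    return buffer_bytes * 8 / link_bps

if __name__ == "__main__":
    uplinks = {"1 Mbps uplink": 1e6, "2 Mbps uplink": 2e6}
    buffers = {"64 KB": 64 * 1024, "256 KB": 256 * 1024, "1 MB": 1 << 20}

    for link_name, bps in uplinks.items():
        for buf_name, size in buffers.items():
            print(f"{buf_name} buffer on a {link_name}: "
                  f"~{queueing_delay_s(size, bps):.1f} s of added latency")
```

A megabyte of buffer ahead of a 1-Mbps uplink already amounts to more than eight seconds of queueing delay, which is the scale of latency being reported here and far beyond the 150-millisecond telephony budget.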
Anyway, it became clear that what I'd been experiencing on my home network wasn't unique to cable modems.

VC I assume you weren't the only one making noises about these sorts of problems?
JG About a year earlier, Dave had reported problems in 3G networks that also appeared to be caused by excessive buffering. He was ultimately ignored when he publicized his concerns, but I've since been able to confirm that Dave was right. In his case, he would see high latency without much packet loss during the day, and then the latency would fall back down again at night as traffic on the overall network dropped.
Someone else had noticed that the DSLAM (Digital Subscriber Line Access Multiplexer) his micro-ISP runs had way too much buffering--leading to as much as six seconds of latency. And this is something he'd observed six years earlier, which is what had led him to warn Rich Woundy of the possible problem.

VC Perhaps there's an important life lesson here suggesting you may not want to simply throw away outliers on the grounds they're probably just flukes. When outliers show up, it might be a good idea to find out why.

NICK WEAVER But when testing for this particular problem, the outliers actually prove to be the good networks.

JG Without Netalyzr, I never would have known for sure whether what I'd been observing was anything more than just a couple of flukes. After seeing the Netalyzr data, however, I could see how widespread the problem really was. I can still remember the day when I first saw the data for the Internet as a whole plotted out.

NW It's actually a pretty straightforward test that allowed us to capture all that data. I remembered that Nick McKeown and others had ranted about how amazingly over-buffered home networks often proved to be, so buffering seemed like a natural thing to test for. It turned out that would also give us a bandwidth test as a side effect. Over just a 10-second period, the test sends a packet and then waits for a packet to return. Then, each time it receives a packet back, it sends two more. It either sends large packets and receives small ones in return, or it sends small packets and receives large ones. During the last five seconds of that 10-second period, it just measures the latency under load in comparison to the latency without load. It's essentially just a simple way to stress out the network. We didn't get around to analyzing all that data until a f...
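Weaver's description amounts to a small load-generating probe: answer every reply with two new packets so the offered traffic keeps doubling, then compare round-trip times under that load with the idle round-trip time. The Python sketch below illustrates the idea only; it is not Netalyzr's actual code, and the local echo server, port, packet sizes, and timings are assumptions made for the example. Pointed at a remote echo endpoint on the far side of a home router, the gap between the idle and loaded RTTs is the bufferbloat under discussion; run locally it merely demonstrates the mechanics.

```python
# A toy latency-under-load probe in the spirit of the test described above.
# NOT Netalyzr's implementation: the echo server, port, packet sizes, and
# timings are assumptions made for illustration only.

import socket
import threading
import time

ECHO_ADDR = ("127.0.0.1", 9876)  # stand-in for a remote measurement server
PROBE_SIZE = 1400                # "large packets out, small replies back"
TEST_SECONDS = 10
MEASURE_AFTER = 5                # sample RTTs only in the last 5 seconds

def echo_server(stop: threading.Event) -> None:
    """Tiny local echo server standing in for the remote endpoint; it
    answers each large probe with just its 8-byte timestamp header."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(ECHO_ADDR)
    srv.settimeout(0.2)
    while not stop.is_set():
        try:
            data, addr = srv.recvfrom(2048)
            srv.sendto(data[:8], addr)
        except socket.timeout:
            continue
    srv.close()

def probe() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)

    def send_probe() -> None:
        stamp_us = int(time.monotonic() * 1e6)
        sock.sendto(stamp_us.to_bytes(8, "big") + b"\x00" * (PROBE_SIZE - 8),
                    ECHO_ADDR)

    def rtt_of(reply: bytes) -> float:
        return time.monotonic() - int.from_bytes(reply[:8], "big") / 1e6

    # Baseline: one probe with no competing load ("latency without load").
    send_probe()
    idle_rtt = rtt_of(sock.recvfrom(2048)[0])

    # Load phase: every reply triggers two new probes, so the offered load
    # keeps doubling until the path (or, here, the OS socket buffers) limits it.
    start = time.monotonic()
    loaded = []
    send_probe()
    while time.monotonic() - start < TEST_SECONDS:
        try:
            reply, _ = sock.recvfrom(2048)
        except socket.timeout:
            break
        if time.monotonic() - start >= MEASURE_AFTER:
            loaded.append(rtt_of(reply))
        send_probe()
        send_probe()

    if loaded:
        median = sorted(loaded)[len(loaded) // 2]
        print(f"idle RTT: {idle_rtt * 1000:.1f} ms, "
              f"median RTT under load: {median * 1000:.1f} ms")

if __name__ == "__main__":
    stop = threading.Event()
    threading.Thread(target=echo_server, args=(stop,), daemon=True).start()
    time.sleep(0.2)
    probe()
    stop.set()
```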