www.kegel.com/c10k.html
It's time for web servers to handle ten thousand clients simultaneously, don't you think? You can buy a 1000MHz machine with 2 gigabytes of RAM and a 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. In 1999, one of the busiest ftp sites, cdrom.com, actually handled 10000 clients simultaneously through a Gigabit Ethernet pipe.
That kind of bandwidth is now being offered by several ISPs, who expect it to become increasingly popular with large business customers. And the thin client model of computing appears to be coming back in style -- this time with the server out on the Internet, serving thousands of clients. With that in mind, here are a few notes on how to configure operating systems and write code to support thousands of clients. The discussion centers around Unix-like operating systems, as that's my personal area of interest, but Windows is also covered a bit.
presentation about network scalability, complete with benchmarks comparing various networking system calls and operating systems. One of his observations is that the 2.6 Linux kernel really does beat the 2.4 kernel, but there are many, many good graphs that will give the OS developers food for thought for some time.
Unix Network Programming: Networking APIs: Sockets and XTI (Volume 1) by the late W. Richard Stevens. It describes many of the I/O strategies and pitfalls related to writing high-performance servers.
ACE, a heavyweight C++ I/O framework, contains object-oriented implementations of some of these I/O strategies and many other useful things. In particular, its Reactor is an OO way of doing nonblocking I/O, and its Proactor is an OO way of doing asynchronous I/O.
libevent is a lightweight C I/O framework by Niels Provos. It supports kqueue and select, and soon will support poll and epoll. It's level-triggered only, I think, which has both good and bad sides.
Poller is a lightweight C++ I/O framework that implements a level-triggered readiness API using whatever underlying readiness API you want (poll, select, /dev/poll, kqueue, or sigio).
benchmarks that compare the performance of the various APIs. This document links to Poller subclasses below to illustrate how each of the readiness APIs can be used.
rn is a lightweight C I/O framework that was my second try after Poller. It's LGPL (so it's easier to use in commercial apps) and C (so it's easier to use in non-C++ apps).
Matt Welsh wrote a paper in April 2000 about how to balance the use of worker thread and event-driven techniques when building scalable servers. The paper describes part of his Sandstorm I/O framework.
library - an async socket, file, and pipe I/O library for Windows

I/O Strategies

Designers of networking software have many options. Here are a few:

* Whether and how to issue multiple I/O calls from a single thread
  + Don't; use blocking/synchronous calls throughout, and possibly use multiple threads or processes to achieve concurrency
  + Use nonblocking calls (e.g. write() on a socket set to O_NONBLOCK) to start I/O, and readiness notification (e.g. poll() or /dev/poll) to know when it's OK to start the next I/O on that channel.
  + Build the server code into the kernel

1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification

... set nonblocking mode on all network handles, and use select() or poll() to tell which network handle has data waiting. With this scheme, the kernel tells you whether a file descriptor is ready, whether or not you've done anything with that file descriptor since the last time the kernel told you about it.
That's why it's important to use nonblocking mode when using readiness notification. An important bottleneck in this method is that read() or sendfile() from disk blocks if the page is not in core at the moment; setting nonblocking mode on a disk file handle has no effect. The first time a server needs disk I/O, its process blocks, all clients must wait, and that raw nonthreaded performance goes to waste. This is what asynchronous I/O is for, but on systems that lack AIO, worker threads or processes that do the disk I/O can also get around this bottleneck. One approach is to use memory-mapped files, and if mincore() indicates I/O is needed, ask a worker to do the I/O, and continue handling network traffic.
In November 2003 on the freebsd-hackers list, Vivek Pai et al. reported very good results using system-wide profiling of their Flash web server to attack bottlenecks. One bottleneck they found was mincore (guess that wasn't such a good idea after all). Another was the fact that sendfile blocks on disk access; they improved performance by introducing a modified sendfile() that returns something like EWOULDBLOCK when the disk page it's fetching is not yet in core.

There are several ways for a single thread to tell which of a set of nonblocking sockets are ready for I/O:

* The traditional select()

Unfortunately, select() is limited to FD_SETSIZE handles. This limit is compiled into the standard library and user programs.
benchmarks) for an example of how to use poll() interchangeably with other readiness notification schemes. The idea behind /dev/poll is to take advantage of the fact that often poll() is called many times with the same arguments. With /dev/poll, you get an open handle to /dev/poll, and tell the OS just once what files you're interested in by writing to that handle; from then on, you just read the set of currently ready file descriptors from that handle.
according to Sun, at 750 clients, this has 10% of the overhead of poll(). Various implementations of /dev/poll were tried on Linux, but none of them performed as well as epoll, and they were never really completed.
kqueue() can specify either edge triggering or level triggering. With edge-triggered (readiness change) notification, the kernel tells you when a file descriptor transitions to ready, then assumes you know the file descriptor is ready, and will not send any more readiness notifications of that type for that file descriptor until you do something that causes the file descriptor to no longer be ready (e.g. until you receive the EWOULDBLOCK error on a send, recv, or accept call, or a send or recv transfers less than the requested number of bytes). When you use readiness change notification, you must be prepared for spurious events, since one common implementation is to signal readiness whenever any packets are received, regardless of whether the file descriptor was already ready.
It's a bit less forgiving of programming mistakes, since if you miss just one event, the connection that event was for gets stuck forever. Nevertheless, I have found that edge-triggered readiness notification made programming nonblocking clients with OpenSSL easier, so it's worth trying.
There are several APIs which let the application retrieve 'file descriptor became ready' notifications: * kqueue() This is the recommended edge-triggered poll replacement for FreeBSD (and, soon, NetBSD).
To change the events you are listening for, or to get the list of current events, you call kevent() on the descriptor returned by kqueue(). It can listen not just for socket readiness, but also for plain file readiness, signals, and even for I/O completion. Note: as of October 2000, the threading library on FreeBSD does not interact well with kqueue(); evidently, when kqueue() blocks, the entire process blocks, not just the calling thread.
This is just like the realtime signal readiness notification, but it coalesces redundant events, and has a more efficient scheme for bulk event retrieval. A patch for the older version of epoll is available for the 2.4 kernel.
unifying epoll, aio, and other event sources on the linux-kernel mailing list around Halloween 2002. It may yet happen, but Davide is concentrating on firming up epoll in general first.
some recent discussion

* Drepper's New Network Interface (proposal for Linux 2.6+)

At OLS 2006, Ulrich Drepper proposed a new high-speed asynchronous networking API.
LWN article from July 22

* Realtime Signals

This is the recommended edge-triggered poll replacement ...