Linux: Tuning Swappiness

Posted by jeremy on Thursday, April 29, 2004 - 04:22

A number of Linux kernel developers recently debated "swappiness" at length on the lkml, exploring when an application should or should not be swapped out, versus reclaiming memory from the cache. Fortunately a run-time tunable is available through the proc interface for anyone needing to adapt kernel behavior to their own requirements. To tune, simply echo a value from 0 to 100 into /proc/sys/vm/swappiness. The higher the number set here, the more the system will swap.
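For anyone who wants to experiment, the knob works along these lines (the value 25 below is only an illustration; 2.6 kernels of this era default to 60):

    # Read the current setting:
    cat /proc/sys/vm/swappiness

    # As root, make the VM less inclined to swap applications out:
    echo 25 > /proc/sys/vm/swappiness

    # The same knob via sysctl(8); a "vm.swappiness = 25" line in
    # /etc/sysctl.conf makes the setting stick across reboots:
    sysctl -w vm.swappiness=25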
Excerpts from the thread follow, beginning with the top-of-thread report:

I reproduce this behavior by simply untarring a 260 meg file on a production server; the machine becomes sluggish as it swaps to disk. Is there a way to limit the cache so this machine, which has 1 gigabyte of memory, doesn't dip into swap?

> I reproduce this behavior by simply untarring a 260 meg file on a production server; the machine becomes sluggish as it swaps to disk.

Running that process overnight on a quiet machine practically guarantees a huge burst of disk activity, with unwanted results:

1) Inode and page caches are blown away
2) A lot of your desktop apps are swapped out

Additionally, a (IMO valid) maxim of sysadmins has been "a properly configured server doesn't swap". There should be no reason why this maxim becomes invalid over time. When Linux starts to swap out apps the sysadmin knows will be useful in an hour, or six hours, or a day, just because it needs a bit more file cache, I get worried.

> There should be no reason why this maxim becomes invalid over time. When Linux starts to swap out apps the sysadmin knows will be useful in an hour, or six hours, or a day, just because it needs a bit more file cache, I get worried.

What if you have some huge application that only runs once per day for 10 minutes? Do you want it to be consuming 100MB of your memory for the other 23 hours and 50 minutes for no good reason? Anyway, I have a small set of VM patches which attempt to improve this sort of behaviour, if anyone is brave enough to try them. Against -mm kernels only I'm afraid (the objrmap work causes some porting difficulty).

> What if you have some huge application that only runs once per day for 10 minutes? Do you want it to be consuming 100MB of your memory for the other 23 hours and 50 minutes for no good reason?

This is one app that, even though I use it infrequently, I would prefer never be swapped out. Mainly, when I want to use it, I *WANT* it now (i.e. not waiting for it to come back from swap). This is just my opinion. I personally feel that cache should use available memory, not already used memory (swapping apps out for more cache).

> I personally feel that cache should use available memory, not already used memory (swapping apps out for more cache).

Strongly agreed, though there are pathological cases that prevent this from being something that's easy to implement on a global basis.

My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful.

> You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful.

These are at the heart of the thread (or my point, at least) -- BloatyApp may be Oracle with a huge cache of its own, for which swapping out may be a huge mistake. The 'SIZE' in top was only 160M and there were no other major apps running.

In response:

My fairly modest desktop here stabilises at about 300 megs swapped out, with negligible swapin. Getting that memory out on disk, relatively freely, is an important optimisation. OK, so it takes four seconds to swap mozilla back in, and you noticed it. Did you notice that those three kernel builds you just did ran in twenty seconds less time because they had more cache available? Otherwise, the subjective "oh gee, that took a long time" seat-of-the-pants stuff does not impress.

Let me point out that the kernel right now, with default swappiness, very much tends to reclaim cache rather than swapping stuff out. The top-of-thread report was incorrect, due to a misreading of kernel instrumentation.

Another poster:

The point here is LATENCY: when a user comes back from lunch and continues typing in OpenOffice, his system should behave just like he left it. IMHO, the VM on a desktop system really should be optimised to have the best interactive behaviour, meaning decent latency when switching applications.

And the reply:

I'm gonna stick my fingers in my ears and sing "la la la" until people tell me "I set swappiness to zero and it didn't do what I wanted it to do".

Another poster:

I'm thinking that the problem is that the page cache is greedier than most people expect. For example, if I could hold the page cache to be under a specific size, then I could do some performance measurements. On a machine with loads of RAM, where's the optimal page cache size?

> For example, if I could hold the page cache to be under a specific size, then I could do some performance measurements. On a machine with loads of RAM, where's the optimal page cache size?

Nope, there's no point in leaving free memory floating about when the kernel can and will reclaim clean pagecache on demand. What you discuss above is just an implementation detail. Thus far I've seen:

a) updatedb causes cache reclaim
b) updatedb causes swapout
c) prefer that openoffice/mozilla not get paged out when there's heavy pagecache demand

For a) we don't really have a solution. Some have been proposed but they could have serious downsides. For b) and c) we can tune the pageout-vs-cache-reclaim tendency with /proc/sys/vm/swappiness, only nobody seems to know that.

> > For example, if I could hold the page cache to be under a specific size, then I could do some performance measurements. On a machine with loads of RAM, where's the optimal page cache size?

For example, if we had 500M total, we map 200M, then we do 400M of IO. Perhaps we'd like to be able to say that a 400M page cache is too big. The problem isn't about reclaiming pagecache, it's about the cost of swapping pages back in. The page cache can tend to favor swapping mapped pages over reclaiming its own pages that are less likely to be used. If I thought I had a method for doing this, I'd write code to try it out.

> Thus far I've seen ...

The requirement is that we'd like to see pages aged more gracefully. A mapped page that is used continuously for ten minutes and then left to idle for ten minutes is more valuable than an IO page that was read once and then not used for ten minutes.

> Some have been proposed but they could have serious downsides.

I've read the source for where swappiness comes into play.
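The experiment above is easy to run for yourself. A rough way to watch the behaviour being debated, assuming a 2.6 machine and any large archive (the filename below is made up):

    # In one terminal, report memory and swap activity once a second;
    # watch the "cache" column grow during the I/O, and the "si"/"so"
    # columns for pages swapped in and out:
    vmstat 1

    # In another terminal, generate a burst of file I/O like the
    # top-of-thread report:
    tar xzf big-260meg-archive.tar.gz

    # Compare cache and swap figures before and after:
    grep -E '^(Cached|SwapTotal|SwapFree)' /proc/meminfo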
> For example, if we had 500M total, we map 200M, then we do 400M of IO. Perhaps we'd like to be able to say that a 400M page cache is too big.

Try it - you'll find that the system will leave all of your 200M of mapped memory in place. You'll be left with 300M of pagecache from that I/O activity. There may be a small amount of unmapping activity if the I/O is a write, or if the system has a small highmem zone.

> The page cache can tend to favor swapping mapped pages over reclaiming its own pages that are less likely to be used.

No, the system will only start to unmap pages if reclaim of unmapped pagecache is getting into difficulty. The threshold of "getting into difficulty" is controlled by /proc/sys/vm/swappiness.

We only have six levels of aging: referenced+active, unreferenced+active, referenced+inactive, unreferenced+inactive, plus position on each of the two LRU lists.

Swappiness controls the level of page reclaim distress at which we decide to start reclaiming mapped pages. We prefer to reclaim pagecache, but we have to start swapping at *some* level of reclaim failure. It might make sense to recast swappiness in terms of pages_reclaimed/pages_scanned, which is the real metric of page reclaim distress. But that would only affect the meaning of the actual number - it wo...
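That reclaimed/scanned ratio is visible from userspace. A rough way to eyeball it on a 2.6 kernel (the exact counter names vary a little between releases):

    # Page reclaim counters from /proc/vmstat:
    #   pgscan_*       - pages scanned by reclaim
    #   pgsteal_*      - pages actually reclaimed
    #   pswpin/pswpout - pages swapped in and out
    grep -E '^(pgscan|pgsteal|pswpin|pswpout)' /proc/vmstat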