Paging when there is free memory?
An interesting issue crossed my desk today.
It seems that we had a customer machine displaying all the symptoms of heavy paging out.
From iostat we were seeing pages being written to swap as follows:
r/s    w/s  Mr/s Mw/s  wait   actv wsvc_t asvc_t  %w  %b device
0.8 1476.6   0.0 11.5 147.5  248.9   99.8  168.5  95 100 c1t0d0s1
0.8  619.2   0.0  4.8 176.3  253.9  284.4  409.5 100 100 c1t1d0s1
1.6  619.2   0.0  4.8   1.0 3853.2    1.6 6207.0  97 100 md/d1
0.8 1476.6   0.0 11.5   0.0 1535.9    0.0 1039.6   0 100 md/d11
0.8  619.2   0.0  4.8   0.0 3852.6    0.0 6213.9   0 100 md/d21
d1 is a mirror consisting of d11 and d21, which in turn sit on c1t0d0s1 and c1t1d0s1. Looking at the vfstab, we see that this device is used for swap, and the output above shows that we are certainly doing quite a bit of writing to it. However, if we look at the vmstat output, we see:
 kthr      memory             page              disk        faults        cpu
 r b w   swap     free    re  mf    pi po fr de sr m0  m1 m4 m5   in    sy   cs us sy id
 0 0 0 34971672 14839800 341 777 15776  0  0  0  0  0 2246 2  0 3711 19473 9920  9  6 85
 0 0 0 35362808 15012488  16 209   117  0  0  0  0  0  671 1  0 2371  5007 5855  0  3 97
The free column shows the amount of free memory in kilobytes; in this sample it is around 14 GB (out of a total of 32 GB physically in the box).
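As a quick sanity check on the units (a minimal sketch; the figures are taken from the vmstat sample above):

```python
# vmstat reports swap and free in kilobytes.
free_kb = 14_839_800            # "free" column from the first vmstat sample
total_gb = 32                   # physical memory in the box

free_gb = free_kb / (1024 * 1024)   # KB -> GB
print(f"free: {free_gb:.1f} GB of {total_gb} GB")  # free: 14.2 GB of 32 GB
```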
Poking around a little more, I noticed that the system is actually divided into three local zones to run the application. Looking at the zone configuration, I noticed the following (edited for clarity; the XML files don't look like this):
Zone    physcap
zone1   17179869184 (16 GB)
zone2    8589934592 (8 GB)
zone3    8589934592 (8 GB)
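The physcap values are plain byte counts; converting them confirms the GB figures shown (a small sketch, with the values copied from the zone configuration above):

```python
# physcap is stored as a byte count in the zone configuration.
physcaps = {
    "zone1": 17_179_869_184,
    "zone2": 8_589_934_592,
    "zone3": 8_589_934_592,
}

for zone, cap_bytes in physcaps.items():
    # 2**30 bytes per GB; all three caps divide evenly.
    print(f"{zone}: {cap_bytes // 2**30} GB")
```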
Ahhh, now we are getting somewhere: we have physical memory caps on these zones. Looking at the output of rcapstat -z, we see that one of the zones is hitting its limit and, as a result, is paging, even though there is plenty of free memory actually available on the machine.
id zone  nproc  vm   rss   cap    at avgat    pg avgpg
 6 zone3   154 18G 8317M 8192M 3707G    0K 2296G    0K
 6 zone3     - 18G 8317M 8192M    0K    0K    0K    0K
 6 zone3     - 18G 8317M 8192M    0K    0K    0K    0K
 6 zone3   154 18G 8334M 8192M 7768K    0K 8192K    0K
 6 zone3     - 18G 8334M 8192M    0K    0K    0K    0K
 6 zone3   154 18G 8334M 8192M  250M    0K  126M    0K
 6 zone3     - 18G 8334M 8192M    0K    0K    0K    0K
 6 zone3   154 18G 8328M 8192M   54M    0K   25M    0K
 6 zone3   154 18G 8328M 8192M  148M    0K  109M    0K
...
So, what we have here is a zone configured with a real (physical) memory cap of 8 GB, and it is hitting that cap. As a result, rcapd is paging old pages out of this zone as fast as it possibly can, so the customer application in this zone sees the same effect as if the whole machine were short of memory.
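In essence, rcapd compares each zone's resident set size against its cap and pages out the excess. A rough sketch of that comparison, using the figures from the rcapstat sample (the function here is illustrative, not rcapd's actual code):

```python
def excess_mb(rss_mb: int, cap_mb: int) -> int:
    """How far a zone's RSS is over its physical memory cap (0 if under)."""
    return max(0, rss_mb - cap_mb)

# Figures from the rcapstat output: zone3 has an RSS of 8334M against an 8192M cap.
over = excess_mb(8334, 8192)
print(f"zone3 is {over} MB over its cap, so rcapd keeps paging")  # 142 MB over
```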
The recommendation was to revisit this resource constraint on the zone, given its actual usage.
Yeah, I’ve long recommended that most of the time, when people think they want physical memory capping, what they really want is virtual memory capping.
Physical caps, using rcapd, are a bit like punishing the whole platoon at boot camp because one member did the wrong thing.
Boyd Adamson
February 23, 2010 at 1:56 pm