Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist

How Solaris Calculates %user, %system and %idle

A year or so back I wrote an infodoc that described how we
calculated the %iowait (or %wio) number. I had always intended to
create a companion document outlining the broader question of how we
do the %user, %system and %idle numbers. A few misconceptions that
I’ve seen have prompted me to do this as a ‘blog first.

The first thing that must be noted is that with Solaris 10 and
microstate accounting we completely changed how the numbers are
arrived at. I’ll go over the pre-Solaris 10 method first, then
discuss the current method along with links into the OpenSolaris source tree.

Before Solaris 10

There is an array in cpu_t called
cpu_stats.sys.cpu[]. This array contains counters for:

CPU_USER
CPU_SYSTEM
CPU_WAIT
CPU_IDLE

The various array entries contain a count based on a sample (taken at
fixed intervals) of what each cpu is doing at the time of the
sampling. In order to determine usage, we must take two snapshots of
these counters and look at the differences.

If we sum these differences, we get a count of how many samples were
taken for this particular cpu. We then simply calculate a percentage for
each of the figures.
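As a rough illustration of that arithmetic (not the code any Solaris tool actually uses), here is a minimal Python sketch. The function name cpu_percentages and the snapshot values are made up for the example; the keys simply mirror the counter names.

```python
# Hypothetical helper: turn two snapshots of one cpu's tick counters
# into percentages for the interval between them.
def cpu_percentages(before, after):
    # Difference of each counter between the two snapshots
    deltas = {name: after[name] - before[name] for name in after}
    # The sum of the differences is the number of samples taken
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}

# Made-up snapshots: 1000 samples between them
before = {"user": 5196, "kernel": 11557, "wait": 0, "idle": 373121}
after = {"user": 5246, "kernel": 11607, "wait": 0, "idle": 374021}

pct = cpu_percentages(before, after)
for name, value in sorted(pct.items()):
    print(f"%{name}: {value:.1f}")
```

A single snapshot only tells you the totals since boot, which is why two are needed to say anything about a given interval.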

Okay, so how do we do the sampling?

Usually, the function clock() is called every 10ms [1]. We do
the sampling in here. For each cpu, we look at what it is currently
executing and increment the appropriate counter. Note that in
Solaris 9 and earlier we still have a counter for IO Wait. This
number is only calculated if the cpu is idle. See infodoc 75659
for more explanation of this.
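To make the idea concrete, here is a toy Python model of one sampling pass. It is only a sketch: the real clock() inspects kernel structures, whereas the dictionary fields "running" and "io_pending" below are invented for the illustration.

```python
from collections import Counter

def clock_tick(cpus, counters):
    """One simulated sampling pass: classify what each cpu is doing right now."""
    for cpu_id, cpu in cpus.items():
        if cpu["running"] == "user":
            state = "user"
        elif cpu["running"] == "kernel":
            state = "kernel"
        elif cpu["io_pending"]:
            # IO wait is only considered when the cpu is otherwise idle
            state = "wait"
        else:
            state = "idle"
        counters[cpu_id][state] += 1

# Hypothetical four-cpu machine at the instant of one tick
cpus = {
    0: {"running": "user", "io_pending": True},
    1: {"running": None, "io_pending": True},
    2: {"running": None, "io_pending": False},
    3: {"running": "kernel", "io_pending": False},
}
counters = {cpu_id: Counter() for cpu_id in cpus}
clock_tick(cpus, counters)
```

Note that cpu 0 is counted as user even though IO is pending, because the wait counter is only touched for an idle cpu.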

The values are accessible through the cpu_stat kstat
module as idle, user, kernel and
wait. For example:

$ kstat -m cpu_stat -s '/^(wait|user|kernel|idle)$/'
module: cpu_stat                        instance: 0
name:   cpu_stat0                       class:    misc
idle                            373121
kernel                          11557
user                            5196
wait                            0


[1] If we define hires_tick as non-zero in
/etc/system, then clock() will be called every millisecond.

Solaris 10 and Beyond

In general, the sampling method gives us a pretty good number.
It would be an unusual thread that takes a significant amount of cpu
time and yet is off cpu every time that clock() runs.
However, implementing microstate accounting gave us the opportunity
to make it even better.
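For the curious, here is what that unusual thread would look like in a toy Python simulation. The thread's behaviour (on cpu for 8ms out of every 10ms, but never at the instant of the tick) is entirely hypothetical and chosen as the worst case for sampling.

```python
TICK_MS = 10  # default clock() interval

def on_cpu(t_ms):
    # Hypothetical thread: runs during milliseconds 1..8 of every 10ms
    # period, so it is never running when the tick fires at t % 10 == 0.
    return 1 <= t_ms % TICK_MS <= 8

def sampled_and_actual(duration_ms):
    # What tick-based sampling would see
    samples = [on_cpu(t) for t in range(0, duration_ms, TICK_MS)]
    # What the thread really did, at 1ms resolution
    actual = [on_cpu(t) for t in range(duration_ms)]
    return (100.0 * sum(samples) / len(samples),
            100.0 * sum(actual) / len(actual))

sampled, actual = sampled_and_actual(1000)
print(f"sampled: {sampled:.1f}%  actual: {actual:.1f}%")
```

Over one simulated second the sampler attributes no time at all to the thread, although it was on cpu for 80% of it. Measuring the time between state changes avoids this blind spot.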

The raw numbers are now kept in an array in cpu_t called
cpu_acct[]. This contains entries for:

CMS_USER
CMS_SYSTEM
CMS_IDLE

There is a state called CMS_DISABLED, but it’s used for something
else and there is not an array element for it.

So what are the numbers? We don’t sample anymore. The numbers
are built from delta values of the high resolution timer (in nanoseconds),
measured from when the cpu entered a state to the point where we are
about to change it. The values are calculated in
new_cpu_mstate().

The current state is saved in cpu->cpu_mstate. On a state
change, the high resolution time is stored in
cpu->cpu_mstate_start.

new_cpu_mstate()
reads the high resolution timer once at the beginning of the
routine. This time is used for the end of the period being measured
and the start of the new period so we do not lose small numbers of
cycles.
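The bookkeeping can be modelled in a few lines of Python. This is a simplified sketch and not the kernel code: the class CpuAcct is invented, the field names only echo cpu_acct[], cpu_mstate and cpu_mstate_start, and the timestamp is passed in rather than read from a timer.

```python
class CpuAcct:
    """Toy model of per-cpu microstate accounting."""
    STATES = ("user", "system", "idle")

    def __init__(self, state, now_ns):
        self.acct = dict.fromkeys(self.STATES, 0)  # nanoseconds per state
        self.mstate = state                        # current state
        self.mstate_start = now_ns                 # when we entered it

    def new_cpu_mstate(self, new_state, now_ns):
        # now_ns is a single timer reading: it closes the old period and
        # opens the new one, so no time falls between the two.
        self.acct[self.mstate] += now_ns - self.mstate_start
        self.mstate = new_state
        self.mstate_start = now_ns

cpu = CpuAcct("idle", now_ns=0)
cpu.new_cpu_mstate("user", now_ns=1_000)
cpu.new_cpu_mstate("system", now_ns=4_500)
cpu.new_cpu_mstate("idle", now_ns=5_000)
print(cpu.acct)
```

Because the same reading ends one period and starts the next, the three totals always add up to the elapsed time (5000ns here).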

This function is called whenever we change state. It’s called
directly from
idle_enter() and
idle_exit(). The other state changes are
handled from
new_mstate(), which also updates per-lwp
statistics. The following functions and macros call new_mstate():

SEMA_BLOCK()
cv_block()
fp_precise()
fpu_trap()
lwp_block()
lwp_cond_wait()
lwp_mutex_timedlock()
lwp_mutex_trylock()
lwp_park()
lwp_rwlock_lock()
sched()
shuttle_resume()
shuttle_sleep()
shuttle_swtch()
stop()
term_mstate()
trap()
turnstile_block()

The upshot of this is that you can probably place more reliance
on the figures now, whereas the previous figures were a little more
coarsely grained.

The kstats for the previous figures still exist as distinct
structure elements in cpu->cpu_stats.sys, the difference
being that they are now calculated from the microstate accounting
generated figures.

The new figures can be accessed through the new cpu kstat
module which has a grouping called sys, containing the
statistics cpu_nsec_idle, cpu_nsec_kernel and
cpu_nsec_user.

$ kstat -n sys -s 'cpu_nsec*'
module: cpu                             instance: 0
name:   sys                             class:    misc
cpu_nsec_idle                   3626708012091
cpu_nsec_kernel                 113348790642
cpu_nsec_user                   50875403788
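
If you want percentages out of these, the arithmetic is the same as before, only in nanoseconds rather than samples. The Python sketch below parses output in the layout shown above; the helper names parse_nsec and percentages are made up for the example.

```python
SAMPLE = """\
module: cpu                             instance: 0
name:   sys                             class:    misc
cpu_nsec_idle                   3626708012091
cpu_nsec_kernel                 113348790642
cpu_nsec_user                   50875403788
"""

def parse_nsec(text):
    """Pick the cpu_nsec_* name/value pairs out of kstat output."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Header lines have four fields and are skipped
        if len(fields) == 2 and fields[0].startswith("cpu_nsec_"):
            values[fields[0][len("cpu_nsec_"):]] = int(fields[1])
    return values

def percentages(values):
    total = sum(values.values())
    return {name: 100.0 * nsec / total for name, nsec in values.items()}

pct = percentages(parse_nsec(SAMPLE))
for name, value in sorted(pct.items()):
    print(f"%{name}: {value:.2f}")
```

A single snapshot like this gives the split over the whole life of the counters. For the usage over an interval, take two snapshots and run the same calculation on the differences.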


Written by Alan

July 30, 2005 at 4:14 pm

Posted in Solaris

3 Responses


  1. Thank you for the note to my own blogged misperceptions and this note, Alan. Very helpful.

    Michael Ernest

    August 4, 2005 at 6:30 am

  2. You’re very welcome Michael, and I learned something while researching it too, which is also good 🙂
    Alan.

    Alan Hargreaves

    August 4, 2005 at 3:02 pm

  3. I have a Java process which seems to consume a lot of system cpu resource. Running truss on it, I see output like the below:
    90659: lwp_mutex_timedlock(0x080732E8, 0x00000000) = 0
    /90685: lwp_mutex_timedlock(0x080732E8, 0x00000000) = 0
    /90659: lwp_mutex_wakeup(0x080732E8) = 0
    /90685: lwp_mutex_wakeup(0x080732E8) = 0
    /90697: lwp_mutex_timedlock(0x080732E8, 0x00000000) = 0
    /90697: lwp_mutex_wakeup(0x080732E8) = 0
    /90657: lwp_mutex_timedlock(0x080732E8, 0x00000000) = 0
    /90700: lwp_mutex_timedlock(0x080732E8, 0x00000000) = 0
    /90657: lwp_mutex_wakeup(0x080732E8) = 0
    /90700: lwp_mutex_wakeup(0x080732E8) = 0
    /90702: lwp_mutex_timedlock(0x080732E8, 0x00000000) = 0
    /90702: lwp_mutex_wakeup(0x080732E8) = 0
    /90659: lwp_mutex_timedlock(0x080732E8, 0x00000000) =
    Is there a DTrace way to find out what is causing this lwp_mutex_timedlock?

    Sandeep Thakur

    August 10, 2005 at 2:32 pm

