Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist

How do Solaris Filesystems Update Statistics Without Intimate Knowledge of the cpu Structure?

Well, OpenSolaris is now
available. One of the nice things about this is that it means there
are a lot more things that we can freely talk about.

There are a number of kstats that people who use
filesystems simply assume should be updated in the
Unfortunately, it appears that we never really advertised a method of
doing this. Consequently, some of the third-party filesystems update
them directly (for which we cannot really fault them).

Some time ago you may remember that we had a problem with various
third party filesystems (and a few other things) breaking as a result
of installing patch 108528-29 on Solaris 8.
The root cause of that problem was that a new element was added into the
middle of the

That is, the kstats changed their
offset within the structure, so the packages were doing their updates
to the wrong structure elements, as they had been compiled with the old
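
To make the failure mode concrete, here is a minimal userland sketch. The structures and function names are invented for illustration and are not the real cpu structure layout; the only point is that code compiled against an old layout keeps using the old offset after a field is inserted in the middle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "old" layout that a third-party module was compiled against. */
struct stats_old {
    unsigned int bread;
    unsigned int bwrite;
    unsigned int lread;
};

/* Hypothetical "new" layout: one element inserted in the middle. */
struct stats_new {
    unsigned int bread;
    unsigned int new_field;    /* the inserted element */
    unsigned int bwrite;
    unsigned int lread;
};

/*
 * Models a module built with the old header: it increments whatever
 * now lives at the OLD offset of bwrite.
 */
static void
stale_increment_bwrite(void *stats)
{
    unsigned int *p;

    assert(stats != NULL);
    p = (unsigned int *)((char *)stats + offsetof(struct stats_old, bwrite));
    (*p)++;
}

/* Models a module rebuilt against the current header. */
static void
fresh_increment_bwrite(struct stats_new *stats)
{
    assert(stats != NULL);
    stats->bwrite++;
}
```

Calling stale_increment_bwrite() on a struct stats_new bumps new_field rather than bwrite, which is the kind of silent mis-accounting described above.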

As a result of that problem, and for my own interest, I wrote up how
ufs does it. I hope folks find it useful; if nothing else, it also
gives an introduction to navigating the
OpenSolaris Source Browser
and the bug tracking interface.

Now that the
code has been released under
the CDDL licence,
I can not only talk about this issue, but we can link into the
source tree as well.

Kstats that filesystems use

The kstats that filesystems are expected to update form a part of the
cpu_t structure. They are

cpu_stat.cpu_sysinfo.bread	/* physical block reads			*/
cpu_stat.cpu_sysinfo.bwrite	/* physical block writes (sync + async)	*/
cpu_stat.cpu_sysinfo.lread	/* logical block reads			*/
cpu_stat.cpu_sysinfo.lwrite	/* logical block writes			*/
cpu_stat.cpu_sysinfo.bawrite	/* physical block writes (async)	*/
cpu_stat.cpu_vminfo.pgin	/* pageins				*/
cpu_stat.cpu_vminfo.pgpgin	/* pages paged in			*/
cpu_stat.cpu_vminfo.anonpgin	/* anon pages paged in			*/
cpu_stat.cpu_vminfo.execpgin	/* executable pages paged in		*/
cpu_stat.cpu_vminfo.fspgin	/* fs pages paged in			*/
cpu_stat.cpu_vminfo.maj_fault	/* major page faults			*/

and can be found in

forms a part of the ON (O/S and Network) consolidation, it does not
directly update the stats in the cpu structure. The updates are
performed within a number of the routines that are used to do the
I/O. The reason for this is that
is considered a Contract/Private interface. This basically means that if a project wants to use the interface, a contract must exist with the interface owner. That way, if the interface changes, we know which other modules are affected.
For more information on interface stability, see attributes(5).


All of the cpu_vminfo statistics are updated from

/*
 * Allocate and initialize a buf struct for use with pageio.
 */
struct buf *
pageio_setup(struct page *pp, size_t len, struct vnode *vp, int flags)

In the case of (flags & B_READ), this routine will update all
of the above values in cpu_vminfo as appropriate.

pgin will be incremented with each call.

pgpgin will be incremented by the number of pages required
to page in len bytes.

anonpgin, execpgin and fspgin will be
incremented similarly to pgpgin, based upon information
found in pp->p_vnode.

maj_fault will be incremented in the case of a synchronous
read (i.e. (flags & B_ASYNC) == 0).
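
The accounting rules above can be sketched as a small userland model. This is not the kernel code: the flag values, the page size, and every name starting with model_ or MODEL_ are assumptions made for illustration, and the inspection of pp->p_vnode is reduced to a caller-supplied enum.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGESIZE  8192u   /* assumed page size, illustration only */
#define MODEL_B_READ    0x1     /* hypothetical flag values */
#define MODEL_B_ASYNC   0x2

/* Stand-in for what the real routine learns from pp->p_vnode. */
enum page_kind { PAGE_ANON, PAGE_EXEC, PAGE_FS };

struct vminfo_model {
    unsigned long pgin, pgpgin, anonpgin, execpgin, fspgin, maj_fault;
};

/* Number of pages needed to hold len bytes, rounded up. */
static unsigned long
pages_for(size_t len)
{
    return (len + MODEL_PAGESIZE - 1) / MODEL_PAGESIZE;
}

/* Apply the read-side accounting described in the text. */
static void
model_pageio_account(struct vminfo_model *vm, size_t len,
    enum page_kind kind, int flags)
{
    unsigned long npages;

    assert(vm != NULL);
    if ((flags & MODEL_B_READ) == 0)
        return;                 /* only reads touch these counters */

    npages = pages_for(len);
    vm->pgin++;                 /* once per call */
    vm->pgpgin += npages;       /* once per page */

    switch (kind) {
    case PAGE_ANON: vm->anonpgin += npages; break;
    case PAGE_EXEC: vm->execpgin += npages; break;
    case PAGE_FS:   vm->fspgin += npages;   break;
    }

    if ((flags & MODEL_B_ASYNC) == 0)
        vm->maj_fault++;        /* synchronous read */
}
```

For example, a synchronous read of three pages plus one byte of filesystem data adds 1 to pgin and maj_fault, and 4 to both pgpgin and fspgin.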

cpu_sysinfo.bread and

lread is updated on every call to
If we actually go to disk then bread is also updated.

/*
 * Common code for reading a buffer with various options
 * Read in (if necessary) the block and return a buffer pointer.
 */
struct buf *
bread_common(void *arg, dev_t dev, daddr_t blkno, long bsize)
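
The difference between the logical and physical read counters can be modelled in userland as follows. This is only a sketch: the one-slot cache and the model_ names are hypothetical stand-ins for the real buffer cache, chosen to show that lread counts every request while bread counts only the requests that have to go to disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sysinfo_rmodel {
    unsigned long bread;    /* physical block reads */
    unsigned long lread;    /* logical block reads */
};

/* Hypothetical one-slot buffer cache, keyed by block number. */
struct cache_model {
    bool valid;
    long blkno;
};

/*
 * Model of a buffered read: returns true when the request missed the
 * cache and so "went to disk".
 */
static bool
model_bread(struct sysinfo_rmodel *sys, struct cache_model *cache, long blkno)
{
    assert(sys != NULL && cache != NULL);

    sys->lread++;                       /* every call is a logical read */
    if (cache->valid && cache->blkno == blkno)
        return false;                   /* cache hit: no physical read */

    sys->bread++;                       /* cache miss: physical read */
    cache->valid = true;
    cache->blkno = blkno;
    return true;
}
```

Reading block 7 twice and then block 9 gives three logical reads but only two physical ones.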

is similar to
except that it also triggers a read ahead on the next block.

/*
 * Read in the block, like bread, but also start I/O on the
 * read-ahead block (which is not allocated to the caller).
 */
struct buf *
breada(dev_t dev, daddr_t blkno, daddr_t rablkno, long bsize)

cpu_sysinfo.lwrite and

/*
 * Common code for writing a buffer with various options.
 * force_wait  - wait for write completion regardless of B_ASYNC flag
 * do_relse    - release the buffer when we are done
 * clear_flags - flags to clear from the buffer
 */
bwrite_common(void *arg, struct buf *bp, int force_wait,
int do_relse, int clear_flags)

Each call to
increments both bwrite and lwrite. If we are forced
to asynchronous, either by force_wait or (flag &
, then bawrite is also incremented.

/*
 * Release the buffer, marking it so that if it is grabbed
 * for another purpose it will be written out before being
 * given up (e.g. when writing a partial block where it is
 * assumed that another write for the same block will soon follow).
 * Also save the time that the block is first marked as delayed
 * so that it will be written in a reasonable time.
 */
bdwrite(struct buf *bp)

also increments lwrite each time it is called.
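
The write-side counters can be modelled the same way. This is a userland sketch with hypothetical model_ names, not the kernel routines. It rests on one assumption that the text does not spell out: a write is counted in bawrite when the buffer is marked asynchronous and the caller has not asked to wait for completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sysinfo_wmodel {
    unsigned long bwrite;   /* physical block writes (sync + async) */
    unsigned long lwrite;   /* logical block writes */
    unsigned long bawrite;  /* physical block writes (async) */
};

/* Model of an immediate write request. */
static void
model_bwrite_common(struct sysinfo_wmodel *sys, bool b_async, bool force_wait)
{
    assert(sys != NULL);

    sys->lwrite++;
    sys->bwrite++;
    /* Assumption: async means B_ASYNC set and no forced wait. */
    if (b_async && !force_wait)
        sys->bawrite++;
}

/* Model of a delayed write: only the logical counter moves. */
static void
model_bdwrite(struct sysinfo_wmodel *sys)
{
    assert(sys != NULL);

    sys->lwrite++;
}
```

One synchronous write, one asynchronous write and one delayed write would leave lwrite at 3, bwrite at 2 and bawrite at 1 in this model.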

The future?

In November I logged RFE 6199092, which requests that these kstats be
removed from the cpu structure and made a part of the DDI. This would fit
quite nicely with some suggestions that we are hearing about creating
filesystem statistics on a per-zone basis as well as per-CPU. We’ll see
how this RFE progresses.

Technorati Tags: Solaris,OpenSolaris


Written by Alan

June 9, 2005 at 8:00 am

Posted in OpenSolaris
