Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist

DTrace – Using (placing) SDT probes

function startAnim(anim_to_play)
{
var myAnim = document.embeds[anim_to_play];
if (! myAnim.IsPlaying() )
{
myAnim.Rewind();
myAnim.Play();
}
}

This is a slight re-write of the talk that I gave at the
SOSUG meeting in Sydney on May 31.

I’ve also included in this entry some demonstrations that I’ve done up
using flash and vnc2swf. These are all set up to neither
automatically play nor loop. In order to start them you will need to
bring up the flash menu (right button insider the demo) and select
play. Hopefully I can find a better way to do this in the future.

Updates

June 6 – Thanks to Peter, I now have Start buttons on the demos.

June 7 – A question in the comments propted me to talk a little more about what the assembly looks like and how much space a probe adds to a binary. See the Comments section on this entry.

Overview

What is an SDT Probe?

It stands for Statically Defined Tracing Probe.

In reality, with the exception of fbt and pid probes, all probes are
static. They just don’t appear under the sdt provider as they have been
written to be under other providers. A provider is simply a grouping of
probes.

The fact that they are static probes does not detract from the dynamic
nature of DTrace, which refers to the fact that they can be dynamically
enabled.

These probe points can be placed anywhere in the code in order to make
that code more observable.

There are a number of probes that do exist under the sdt provider. On
the notebook that I write this on there are around 200.

ID   PROVIDER    MODULE                          FUNCTION NAME
460        sdt      unix         page_get_replacement_page page-get
461        sdt      unix                page_get_cachelist page-get
462        sdt      unix                 page_get_freelist page-get
475        sdt      unix                       lgrp_choose lgrp_choose_start
490        sdt      unix              av_dispatch_autovect interrupt-complete
491        sdt      unix                       intr_thread interrupt-complete
492        sdt      unix                            cmnint interrupt-complete
494        sdt      unix              av_dispatch_autovect interrupt-start
495        sdt      unix                       intr_thread interrupt-start
496        sdt      unix                            cmnint interrupt-start
513        sdt   genunix                   callout_execute callout-end
514        sdt   genunix                   callout_execute callout-start
550        sdt   genunix                 kcpc_pcbe_tryload kcpc-pcbe-spec
588        sdt   genunix                  priv_policy_only priv-ok
589        sdt   genunix                priv_policy_choice priv-ok
590        sdt   genunix                       priv_policy priv-ok
591        sdt   genunix                  priv_policy_only priv-err
592        sdt   genunix                priv_policy_choice priv-err
...

A little background on the output of dtrace -l as seen above.

The first column is the probe number. This number will vary from system
to system depending on their order of creation. You generally won’t
need to know this.

The second column is the provider. As mentioned earlier, the provider
is simply a grouping of probes. For example, mib, io, sdt and
sched.

Next we have the module. In kernel space, this will be the kernel
module in which the probe resides. In user space it will be the object.
For example a.out, libc, …

Then we have the probe function. This is simply the name of the
function into which the probe was written.

Finally there is the probe name. The name given to the probe.

So a probe is made up of a 4-tuple of these.

probeprov:probemod:probefunc:probename

Incidentally, the names used in the above layout are the variable names
that can be used from within the D language to access the particular
information about the probe.

To add a probe to the code we use the DTRACE_PROBE macros by
first including <sys/sdt.h>

As a part of loading the module (in kernel space) or building the
object (in user space) the function call generated by this macro is
replaced by a number of nop instructions.

Kernel SDT probes looks like

DTRACE_PROBE(name);
DTRACE_PROBE1(name, type1, arg1);
DTRACE_PROBE2(name, type1, arg1, type2, arg2);
DTRACE_PROBE3(name, type1, arg1, type2, arg2, type3, arg3);
DTRACE_PROBE4(name, type1, arg1, type2, arg2, type3, arg3, type4, arg4);

Each probe has a name and zero or more type/argument pairs. Each of
these pairs describe a variable that is passed to the probe.

User SDT probes look a little different.

DTRACE_PROBE(name);
DTRACE_PROBE1(name, arg1);
DTRACE_PROBE2(name, arg1, arg2);
DTRACE_PROBE3(name, arg1, arg2, arg3);
DTRACE_PROBE4(name, arg1, arg2, arg3, arg4);

You’ll notice that we don’t provide the type information in the code.
I’ll get to how we define them later on.

OK, so let’s look at some examples, starting in kernel space.

The io provider

This shows all kinds of nifty stuff when dealing with I/O.

$ dtrace -l -P io
ID   PROVIDER            MODULE                 FUNCTION NAME
521         io           genunix                  biodone done
522         io           genunix                  biowait wait-done
523         io           genunix                  biowait wait-start
532         io           genunix           default_physio start
533         io           genunix            bdev_strategy start
534         io           genunix                  aphysio start
1263         io               nfs                 nfs4_bio done
1264         io               nfs                 nfs3_bio done
1265         io               nfs                  nfs_bio done
1266         io               nfs                 nfs4_bio start
1267         io               nfs                 nfs3_bio start
1268         io               nfs                  nfs_bio start

The io provider is implemented in sys/sdt.h as wrappers around
the DTRACE_PROBE macros.

#define DTRACE_IO(name)                                         \
DTRACE_PROBE(__io_##name);
#define DTRACE_IO1(name, type1, arg1)                           \
DTRACE_PROBE1(__io_##name, type1, arg1);
#define DTRACE_IO2(name, type1, arg1, type2, arg2)              \
DTRACE_PROBE2(__io_##name, type1, arg1, type2, arg2);
#define DTRACE_IO3(name, type1, arg1, type2, arg2, type3, arg3) \
DTRACE_PROBE3(__io_##name, type1, arg1, type2, arg2,    \
type3, arg3);
#define DTRACE_IO4(name, type1, arg1, type2, arg2,              \
type3, arg3, type4, arg4)                                   \
DTRACE_PROBE4(__io_##name, type1, arg1, type2, arg2,    \
type3, arg3, type4, arg4);

The only thing that these wrappers add is the __io_ prefix to
the name. The kernel DTrace module picks this up as the provider name.

These probe points have been placed where io stats get updated.

This is probably better demonstrated with an example.

On starting this demonstration you will see the command issued. It will
then appear to pause for 10 seconds (because of the tick-10s probe) and
then dump a histogram showing the times taken to do each IO involved in
saving a Star Office presentation.

The tick-10s probe fires after 10 seconds and we use it to
exit the command.

We don’t list the probe module or probe func in the
wait-start/wait-done probe as we don’t care what these values are, we
simply want to match these probe names.

The string between the forward slashes is called a predicate. This
defines the conditions under which the probe will fire. In the
wait-start probe, the condition is that the process name (execname) is
“soffice.bin”. The wait-done probe will only fire if
self->start is non-null.

self->start probably also needs a little explanation. This is
a thread local variable. It has scope of the lifetime of the thread and
DTrace program. That is, if we had two seperate threads, the value of
self->start could be different for each. start is simply the variable
name that I chose to use to record the timestamp at which this probe
fired.

Assigning 0 to self->start frees the memory assoiated with it.

The quantize function is what produces the histogram.

The mib provider

The mib provider, while not defined in terms of the
DTRACE_PROBE macros, is also a static probe.

What makes this one interesting is that this minor change to two macros
has an enormous effect on the observability of the network stack.

In order to show this properly, the full definition of
BUMP_MIB and UPDATE_MIB from inet/mib2.h are
shown.

#define BUMP_MIB(s, x)          {                               \
extern void __dtrace_probe___mib_##x(int, void *);      \
void *stataddr = &((s)->x);                             \
__dtrace_probe___mib_##x(1, stataddr);                  \
(s)->x++;                                               \
}
#define UPDATE_MIB(s, x, y)     {                               \
extern void __dtrace_probe___mib_##x(int, void *);      \
void *stataddr = &((s)->x);                             \
__dtrace_probe___mib_##x(y, stataddr);                  \
(s)->x += (y);                                          \
}

The lines could have been written

#define BUMP_MIB(s, x)          {                               \
DTRACE_PROBE2(__mib_##x, int, 1, void *, &((s)->x))     \
(s)->x++;                                               \
}
#define UPDATE_MIB(s, x, y)     {                               \
DTRACE_PROBE2(__mib_##x, int, y, void *, &((s)->x))     \
(s)->x += (y);                                          \
}

These two macros would previously have been defined simply without
the reference to DTRACE_PROBE2. They are used exclusively to
increment the network kstats. As a result, we end up with a probe point
for every kstat update within the network code. On the machine that I
am writing this on, that means 436 probes!

The variables passed in to these macros are

  • s – The address of the kstat structure
  • x – The name of the kstat variable to be increased
  • y – The amount by which the kstat will be increased

This produces a probe with

  • arg0 – The amount by which the kstat will be increased
  • arg1 – The value of the kstat before it is increased

Let’s look at an example of tracing a mib probe.

OK, this is a pretty contrived example. What it does is to dump the
kernel stack when we first increase the value of the rawipOutDatagrams
kstat and inform us of it’s current value and what it will be
incremented by. I fired this probe by pinging localhost.

User Space

We use a slightly different macro in user space

#define DTRACE_PROBE1(provider, name, arg1) {                          \
extern void __dtrace_##provider##___##name(unsigned long);     \
__dtrace_##provider##___##name((unsigned long)arg1);           \
}

The arguments are

  • provider – name of the provider (duh!)
  • name – name of the probe
  • arg1, … – the actual value the probe will report

Note that we don’t define the type. This is done differently in user
space and we’ll get to that shortly.

Let’s take a simple little program (helloworld.c)

#include <stdio.h>
#include <unistd.h>
int
main(int ac, char **av) {
int i;
for (i = 0 ; i < 5; i++) {
printf("Hello World\n");
sleep(2);
}
}

and compile and run it.

Which is pretty much as you would expect.

Let’s try adding a probe to this. Say we wanted to be able to monitor
the loop variable.

#include <stdio.h>
#include <unistd.h>
#include <sys/sdt.h>
int
main(int ac, char **av) {
int i;
for (i = 0 ; i < 10; i++) {
DTRACE_PROBE1(world, loop, i);
printf("Hello World\n");
sleep(2);
}
}

The additions to the code are those in green.
Basically We need to include <sys/sdt.h> and add the probe.

But wait, in user space there’s more.

We still need to define the types of the arguments and the stability
levels. This gets linked into the code later (as you will see in the
demonstration).

Let’s create msderv.d.

provider world {
probe loop(int);
};
#pragma D attributes Evolving/Evolving/Common provider world provider
#pragma D attributes Private/Private/Common provider world module
#pragma D attributes Private/Private/Common provider world function
#pragma D attributes Evolving/Evolving/Common provider world name
#pragma D attributes Evolving/Evolving/Common provider world args

The stuff in provider is relatively self explanatory and I’m not going to
go into the stability stuff here. Chapter 39 of the reference manual does
a good job of that.

In order to build the probes incorporating the provider description we
need another step in the build.

Let’s build and run it without enabling any probes first.

Exactly as before, which we should expect.

The options to the dtrace command deserve some explanation.

  • -32 – I don’t have an opteron notebook, I need a 32 bit object.

  • -G – Generate ELF files containing the embedded DTrace
    program. The probes specified in the files listed in the -s
    options are saved into these objects to be linked in to the binary.
    This also goes looking through for the probe points in the other
    objects and replaces them with nops so that the default is that the
    probes are disabled.
  • -s – As previously mentioned, the file associated with this
    argument is treated as a D program containing the declarations of the
    probe points.

Let’s look at both the counter and the first argument to printf.

i is now observable, but with no overhead unless we are
tracing it.

Conclusion

SDT probes are an easy way to make stuff visible.

The helloworld example was trivial, but, imagine being able to place
probes like this into large applications or drivers, sendmail might be
a good example in user space, Veritas Volume Manager might be a good
candidate for kernel.

We get observability without the need for

  • Seperate instrumented binaries
  • Restart
  • Reboot (in the case of drivers/kernel)

With next to no overhead if they are not being observed.

Having placed probes into kernel code for debugging purposes, I can
definitely recommend putting SDTs into your code to improve
observability and diagnosability.

Technorati Tags: ,

Advertisements

Written by Alan

June 5, 2005 at 5:22 pm

Posted in Solaris

3 Responses

Subscribe to comments with RSS.

  1. Alan,

    thanks, great article and the demos.
    How much (if any) overhead theese probes adds to the binary size?
    For example, in Java2D there’s hundreds of primitives which would benefit from adding probes for tracing arguments and other stuff. Most of those primitives are generated by our macros during build time, so dtrace probes could easily be added with just a few lines in a couple of header files, but I’m worried that it’d increase the binary size (which also affects startup time, etc).
    Thanks,
    Dmitri
    Java2D Team

    Dmitri Trembovetski

    June 6, 2005 at 10:28 pm

  2. In the demonstration code shown, it adds 5 nop instructions in the place of the probe on x86.
    The actual assembly is:

    $ mdb helloworld2 -
    > main::dis ! ~/target.pl
    main            pushl  %ebp
    main+1          movl   %esp,%ebp
    main+3          subl   $0x8,%esp
    main+6          movl   $0x0,-0x8(%ebp)
    main+0xd        movl   -0x8(%ebp),%eax
    main+0x10       cmpl   $0x5,%eax
    main+0x15       jge    +0x3e    <main+0x53>(2f)
    main+0x1b 1:    movl   -0x8(%ebp),%eax
    main+0x1e       pushl  %eax
    <font color=green>main+0x1f       nop
    main+0x20       nop
    main+0x21       nop
    main+0x22       nop
    main+0x23       nop</font>
    main+0x24       addl   $0x4,%esp
    main+0x27       pushl  $0x8050fdc
    main+0x2c       call   -0x50c   <PLT:printf>
    main+0x31       addl   $0x4,%esp
    main+0x34       pushl  $0x2
    main+0x36       call   -0x506   <PLT:sleep>
    main+0x3b       addl   $0x4,%esp
    main+0x3e       movl   -0x8(%ebp),%eax
    main+0x41       incl   %eax
    main+0x42       movl   %eax,-0x8(%ebp)
    main+0x45       movl   -0x8(%ebp),%eax
    main+0x48       cmpl   $0x5,%eax
    main+0x4d       jl     -0x32    <main+0x1b>(1b)
    main+0x53 2:    leave
    main+0x54       ret
    

    It depends on how many arguments you are sending to the probe.
    I should also mention that in this case the myserv.o code (from the myserv.d) is a grand total of 376 bytes long.
    Thanks for bringing that up, I’d forgotten that I wanted to mention it.
    Also thank you for commenting on this entry. I put a lot of work into it and I was starting to worry that no-one had looked at it. 🙂
    Oh, by the way <tt><font color=green>target.pl</font></tt> is a small perl script that I wrote some time ago to list branch targets given ::dis output. Very useful when trying to work out conditionals and loops.
    Alan.

    Alan Hargreaves

    June 6, 2005 at 10:49 pm

  3. Alan, thanks for the clarification on the size increase.
    I have another question: our rendering loops are very tight, and highly optimized (and typically compiled with maximum optimization level).
    Could adding a probe to such a loop have negative effect on performance even when the probe is not enabled?
    For example, suppose I want to trace some variable, which the compiler would’ve optimized away freeing up a register if it wasn’t for being traced by dtrace.

    Thanks,
    Dmitri
    Java2 Team

    Dmitri Trembovetski

    June 7, 2005 at 11:22 am


Comments are closed.

%d bloggers like this: