Archive for the ‘DTrace’ Category
ksh93 ENOEXEC errors solved
If you have been following my blogs on transitioning to an OpenSolaris SPARC based Sun Ray server (part 1 and part 2) you would have seen my frustration at the errors I was getting like:
make: sh: cannot execute [Exec format error] make: sh: cannot execute [Exec format error] make: sh: cannot execute [Exec format error] sh: make: function not defined *** Error code 126
I did some poking around today. I noticed that the stack in the writing of that error message looked like:
libc.so.1`__write+0x8
libast.so.1`_sfflsbuf+0x24c
libast.so.1`sfwrite+0x3e0
libast.so.1`errorv+0xc34
libast.so.1`errormsg+0x1c
libshell.so.1`sh_lex+0x408
libshell.so.1`term+0x34
libshell.so.1`list+0x8
libshell.so.1`sh_cmd+0x18
libshell.so.1`sh_parse+0x2c0
libshell.so.1`sh_eval+0x1c8
libshell.so.1`funload+0x188
libshell.so.1`path_absolute+0x584
libshell.so.1`path_search+0x2b4
libshell.so.1`sh_exec+0xddc
libshell.so.1`sh_subshell+0x5c8
libshell.so.1`comsubst+0x998
libshell.so.1`varsub+0x4f8
libshell.so.1`copyto+0x8dc
libshell.so.1`sh_mactrim+0x10c
It’s trying to treat make as if it were a shell function. Huh? of course that’s not going to work.
Inside path_absolute in libshell I found the following:
767 isfun = (pp->flags&PATH_FPATH);
Further down this is checked and if it is, the file is opened to be read as a shell script.
ding!
Heading to the man page we see:
FPATH The search path for function definitions. The
directories in this path are searched for a
file with the same name as the function or
command when a function with the -u attribute
is referenced and when a command is not found.
If an executable file with the name of that
command is found, then it is read and executed
in the current environment. Unlike PATH, the
current directory must be represented expli-
citly by dot (.) rather than by adjacent colon
(:) characters or a beginning or ending colon
(:).
Now if I add in the fact that I construct my $PATH using $FPATH and $LPATH (among others) a light begins to dawn.
My $FPATH actually contained almost the entirety of my normal path. So, ksh93 (which is also /bin/sh on OpenSolaris) was assuming that anything in any of those directories was a script to be read and parsed, without having to check.
I changed the names of those variables and everything there is now working as it should.
DTrace helping to get to the bottom of load spikes
For a while now I have had a customer call that has been causing me a lot of pain to diagnose.
The symptom is that occasionally they will see the load average spike to 500ish and then decay back to normal. There is no other performance effect on the system.
The latest approach that I had been taking was to attempt to get DTrace to watch when a thread was put onto a dispatch queue and if that dispatch queue had more than a threshold number of threads on it, then to run mdb to get a list of the dispatch queues (echo ::cpulist -l | mdb -l).
Turns out that even this was too slow and the threads had been dealt with by the time I got the output.
OK this actually tells us something. These threads are being placed onto the dispatch queue incredibly quickly and also going to cpu and being dealt with incredibly quickly.
At this stage it occurred to me that it was too fast for the threads to be created. It had to be some kind of wakeup of a lot of threads, like a cv_broadcast(9F).
So I did up this script and asked the customer to run it.
#!/usr/sbin/dtrace -s
#pragma D option quiet
/*
* This D script will notify when any one process wakes up more
* than threshold threads currently sleeping on a synchronisation
* object.
*/
int threshold;
BEGIN {
threshold = 20;
printf("Monitoring, ...\n");
}
fbt::sleepq_wakeall_chan:entry {
self->wakeups = 0;
self->interest = 1;
}
wakeup /self->interest/ {
self->wakeups++;
}
fbt::sleepq_wakeall_chan:return /self->interest && self->wakeup > threshold/ {
printf("\n%Y %s [%d]: woke up %d threads",
walltimestamp, execname, pid, self->wakeups);
stack(20);
}
fbt::sleepq_wakeall_chan:return /self->interest/ {
self->interest = self->wakeups = 0;
}
We got immediate results. An excerpt of the output was:
2010 Mar 29 14:10:20 oracle [3102]: woke up 472 threads
genunix`cv_broadcast+0x44
zfs`zil_commit+0x78
zfs`zfs_write+0x5d0
genunix`fop_write+0x20
genunix`pwrite+0x22c
unix`syscall_trap+0xac
2010 Mar 29 14:10:20 oracle [3100]: woke up 94 threads
genunix`cv_broadcast+0x44
zfs`zil_commit+0x78
zfs`zfs_write+0x5d0
genunix`fop_write+0x20
genunix`pwrite+0x22c
unix`syscall_trap+0xac
2010 Mar 29 14:10:20 oracle [3102]: woke up 112 threads
genunix`cv_broadcast+0x44
zfs`zil_commit+0x78
zfs`zfs_write+0x5d0
genunix`fop_write+0x20
genunix`pwrite+0x22c
unix`syscall_trap+0xac
2010 Mar 29 14:10:20 oracle [3102]: woke up 59 threads
genunix`cv_broadcast+0x44
zfs`zil_commit+0x78
zfs`zfs_write+0x5d0
genunix`fop_write+0x20
genunix`pwrite+0x22c
unix`syscall_trap+0xac
2010 Mar 29 14:10:20 oracle [3100]: woke up 571 threads
genunix`cv_broadcast+0x44
zfs`zil_commit+0x78
zfs`zfs_write+0x5d0
genunix`fop_write+0x20
genunix`pwrite+0x22c
unix`syscall_trap+0xac
What we have here is Oracle doing a LOT of parallel writes at one time. The system is actually behaving normally and processing things quickly, the only issue is that we occasionally see load average spike to large numbers then decay back to normal. Perhaps a better way to monitor the system activity might be to look at the ‘r’ column in vmstat which shows snapshots of how many threads are in dispatch queues.
So now we know what is going on, we can actually move on.
