Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist

DTracing a problem from Andrew Tridgell

Andrew Tridgell wandered over to our stand at LinuxWorld on Tuesday to ask about DTrace and look at some problems that he was having.

We uploaded the source to one of his performance tests, built it and started to have a look. The test in question is lock_scaling. To get a benchmark, he ran this on his notebook with an argument of 1, and I did likewise.

We saw (I think) about 200,000 locks per second on his. He looked over at mine and expressed surprise (definately an understatement) that I was getting about half a million per second on my ferrari.

Further testing showed, though that as we added zeros to the number of locks, his notebook (running a 2.6.? linux) degraded better than mine (running opensolaris b36 with non-debug bits).

We did a little poking around with DTrace to get a feel for it but then he had to run off to give a presentation.

I said I’d have a look at it if time permitted.

OK, Time is now permitting.

He believes that the issue is related to lock creation as the number of locks increase. Let’s have a look at that.

dtrace -q -n '
syscall::fcntl:entry /execname == "lock_scaling"/ { self->start = vtimestamp; }
syscall::fcntl:return /self->start/ {
@[probefunc] = avg((vtimestamp - self->start)/1000);
self->start = 0; }
tick-5s { printa(@); clear(@); }'

What this is doing is to look at the average time spent in fcntl() every five seconds. We’ll kick this script off and then run lock_scaling 20000

fcntl                                                           600
fcntl                                                          2261
fcntl                                                          2845
fcntl                                                          3323
fcntl                                                          3737
fcntl                                                          4111
fcntl                                                          4453
fcntl                                                          4770
fcntl                                                          5066
fcntl                                                          5372
fcntl                                                          5610
fcntl                                                          5863

After this it settles back to around 4.3ms as the test runs. It certainly does look like during the lock creation, the more we create, the longer the creation time. For me, 5.8ms inside fcntl() certainly seems excessive.

Andrew’s program sets a write lock on each byte in a file up to the argument. In the case we were looking at, we were locking 20000 bytes (with 20000 locks) and then toggling byte 20001. Let’s try something slightly different here while the program is running.

#!/usr/sbin/dtrace -Fs
syscall::fcntl:entry /execname == "lock_scaling"/ { self->interest = 1; }
syscall::fcntl:return /execname == "lock_scaling"/ { self->interest = 0; exit(0);}
fbt:::entry /self->interest/ {
@[probefunc] = count();

For the sake of reference, we’ll call thois script “t2.d”.

What this code does is to set a thread local variable to say that we are interested in a thread inside an executable called “lock_scaling”, in particular when it makes the fcntl system call. We then enable all fbt probes for this thread. We then count the number of times that each function is called. The output is sorted numerically. The last few lines read:

apic_intr_enter                                                   6
bsrw_insn                                                        10
dosoftint_prolog                                                 10
apic_setspl                                                      20
get_apic_pri                                                     20
psm_get_cpu_id                                                   26
flk_relation                                                  20001

Holy crap. In this call to fcntl(), we see 20001 calls to flk_relation. Given that we are dealing with the 20001st lock, this is ominous. Let’s have a look at the code that calls flk_relation.

1030          do {
1031                  lock1 = lock->l_next;
1032                  if (SAME_OWNER(request, lock)) {
1033                          done_searching = flk_relation(lock, request);
1034                  }
1035                  lock = lock1;
1036          } while (lock->l_vnode == vp && !done_searching);

Oh my. It looks like we are doing a linear list traversal through the entire list to find the last lock, each time we call fcntl()

I am thinking that this would perform an awful lot better as an avl tree. I’ll be logging a bug on this.

Technorati Tags:


Written by Alan

March 28, 2006 at 4:33 pm

Posted in OpenSolaris

2 Responses

Subscribe to comments with RSS.

  1. Sounds like this could help with the range locking of the tdb stuff in samba.


    March 29, 2006 at 2:17 am

  2. Thanks for the cool post Alan. I always enjoy reading about your DTrace adventures!


    March 29, 2006 at 7:36 am

Comments are closed.

%d bloggers like this: