Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist has moved (and changed address)

Over the last couple of hours the physical location of the server changed. The benefit is that the machine is now in the same building as the machines that we use to analyse your uploads, so getting the data onto those machines is now substantially faster.

What do I have to do to take advantage of this?

If you are using the DNS to look it up, then nothing, the DNS has changed over to using the new address. However, if you are using the IP address, you need to start using the new one. We are still uploading from the old server for the moment, but it is a substantially slower link. The new address is

Written by Alan

September 27, 2011 at 8:29 pm

Posted in Uncategorized

What are these door things?

I recently had cause to pass on an article that I wrote for the now defunct Australian Sun Customer magazine (On#Sun) on the subject of doors. It occurred to me that I really should put this on the blog. Hopefully this will give some insight as to why I think doors are really cool.

Where does this door go?

If you have had a glance through /etc you may have come across some files with door in their name. You may also have noticed calls to door functions if you have run truss over commands that interact with the name resolver routines or password entry lookup.

The Basic Idea (an example)

Imagine that you have an application that does two things. First, it provides a lookup function into a potentially slow database (e.g. the DNS). Second, it caches the results to minimise having to make the slower calls.

There are already a number of ways that we could call the cached lookup function from a client (e.g. RPCs & sockets), but these require that we give up the CPU and wait for a response from another process. Even for a potentially fast operation, it could be some time before the client is next scheduled. Wouldn’t it be nice if we could complete the operation within our time slice? Well, this is what the door interface accomplishes.

The Server

When you initialise a door server, a number of threads are made available to run a particular function within the server. I’ll call this function the door function. These threads are created as if they had made a call to door_return() from within the door function. The server will associate a file and an open file descriptor with this function.

The Client

When the client initialises, it opens the door file and specifies the file descriptor when it calls door_call(), along with some buffers for arguments and return values. The kernel uses this file descriptor to work out how to call the door function in the server.

At this point the kernel gets a little clever. Execution is transferred directly to an idle door thread in the server process, which runs as if the door function had been called with the arguments that the client specified. As it runs in the server context, it has access to all of the global variables and other functions available to that process. When the door function is complete, instead of using return(), it calls door_return(). Execution is transferred back to the client with the result returned in a buffer we passed door_call(). The server thread is left sleeping in door_return().

If we did not have to give up the CPU in the door function, then we have just gained a major speed increase. If we did have to give it up, then we didn’t really lose anything, as the overhead is only small.

This is how services such as the name service cache daemon (nscd) work. Library functions such as gethostbyname(), getpwent() and indeed any call whose behaviour is defined in /etc/nsswitch.conf are implemented with door calls to nscd. Syslog also uses this interface so that processes are not slowed down substantially because of syslog calls. The door function simply places the request in a queue (a fast operation) for another syslog thread to look after and then calls door_return() (that’s actually not how syslog uses it).

For further information see the section 3C man pages on door_create, door_info, door_return and door_call.

Written by Alan

August 1, 2011 at 5:21 pm

I have a performance problem

So start 95% of the performance calls that I receive. They usually continue something like:

I have gathered some *stat data for you (eg the guds tool from Document 1285485.1), can you please root cause our problem?

So, do you think you could?

Neither can I. Based on this, my answer inevitably has to be “No”.

Given this kind of problem statement, I have no idea about the expectations, the boundary conditions, or even the application. The answer may as well be “Performance problems? Consult your local Doctor for Viagra”. It’s really not a lot to go on.

So, what kind of problem description is going to allow me to start work on the issue that is being seen? I don’t doubt that there really is an issue; it just needs to be pinned down somewhat.

What behavior exactly are you expecting to see?

Be specific and use business metrics. For example “run-time”, “response-time” and “throughput”.

This helps us define exit criteria.

Now, let’s look at the system that is having problems.

How is what you are seeing different? Use the same type of metrics.

The answers to these two questions take us a long way towards being able to work a call.

Even more helpful are answers to questions like

Has this system ever worked to expectation?

If so, when did it start exhibiting this behavior?

Is the problem always present, or does it sometimes work to expectation?

If it sometimes works to expectation, when are you seeing the problem? Is there any discernible pattern?

Is the impact of the problem getting better, worse, or remaining constant?

What kind of differences are there between when the system was performing to expectation and when it is not?

Are there other machines where we could expect to see the same issue (eg similar usage and load), but are not? Again, differences?

Once we start to gather information like this we start to build up a much clearer picture of exactly what we need to investigate, and what we need to achieve so that both you and I agree that the problem has been solved.

Please help get that figure of poorly defined problem statements down from its current 95% value.

Written by Alan

June 27, 2011 at 6:59 pm

Posted in Solaris, Work

Thunderbird imapd and OpenSSL 1.0

I upgraded my internal Solaris 11 build last night and this morning noticed that I was getting error popups from thunderbird like:

SSL received a record that exceeded the maximum permissible length.

Searching the web didn’t help me a lot except for this one which suggested that I try telnetting to port 993 on the server to see what it looked like.

When I did this and saw a complaint about imapd not being able to open that I twigged that this must have been the build that we went to openssl 1.0 on.

This meant that I needed to rebuild imapd. Well, I have already done most of the work here.

The sad thing was it looks like something else changed and some structure elements have names different to what imapd was expecting in a (DIR *).

Adding -D__USE_LEGACY_PROTOTYPES__ to the EXTRACFLAGS macro in the top level Makefile allowed the build to complete. After putting the new binary into place, thunderbird is happy talking to this server again.

Update #1

I also needed to rebuild proxytunnel. I think that’s all that I had that linked against libssl.0.9.8.
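
One way to check rather than guess is to run ldd over the likely directories and look for the old library. The following is only a sketch and was not part of the rebuild described here: the function name linked_against, the LDD override and the default pattern are all hypothetical, and the pattern assumes the old library shows up in ldd output with 0.9.8 in its name.

```shell
# linked_against DIR [PATTERN]
# Print every regular file in DIR whose ldd output matches PATTERN.
# PATTERN is a basic regular expression; the default is an assumption
# about how the OpenSSL 0.9.8 library appears in ldd output.
linked_against() {
    dir=$1
    pattern=${2:-'libssl.*0\.9\.8'}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # ldd complains on stderr about files it cannot examine; ignore that.
        if ${LDD:-ldd} "$f" 2>/dev/null | grep "$pattern" > /dev/null; then
            echo "$f"
        fi
    done
}
```

Something like linked_against /usr/local/bin (the directory is just an example) would then list anything that still needs a rebuild.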

Written by Alan

June 15, 2011 at 10:49 am

Sun Ray on Solaris 11 SPARC

After an experience I had yesterday, I need to say a little more than I did at Nevada to OpenSolaris Sun Ray on SPARC (part 5 – Sun Ray Server 4.2).

It seems that I missed something.

Part of the configuration that is done at install time sets up a small LDAP server, but instead of pointing at localhost, it points at the machine name. In general this is not a problem. Unfortunately as I moved the disk image from one machine to another, changing the host information, I didn’t realise that it was still talking to the server on my lab machine that I had used to build the image.

This was not a problem until the other night when someone else booked that machine and installed something else on it. All of a sudden I could no longer get access to my Sun Ray sessions.

I spent a while trying to address the problem, but didn’t get very far (probably because I don’t have a lot of skills in the Sun Ray area).

I had noticed some blog postings about a new release of the Sun Ray software (5.2), which includes the 4.3 Sun Ray Server software. I had been hearing some good things about it with regard to Solaris 11.

I figured it was time to bite the bullet.

The first thing to do was to clone myself another boot environment so that if it did go really badly wrong I could go back and attempt to recover from the current broken point.

# beadm create Solaris11-sr5.2
# beadm activate Solaris11-sr5.2

Have to love ZFS root for instant clones.

I then rebooted into that new boot environment and removed the 4.2 software (the instructions for this are in the installation guide for 4.2).

# cd /opt/SUNWut/sbin
# ./utconfig -u
# cd /
# /opt/SUNWut/sbin/utinstall -u

Well that was pretty painless.

I had previously downloaded and unzipped the software so all I needed to do now was to run

# ./utsetup

and pretty much accept the defaults. This was an incredibly painless install in comparison to installing the previous version (well done folks), although in hindsight I should have stuck to the defaults a little more closely than I did, as I found that I couldn’t get the DTU to connect; indeed it would either hang or actually reboot the DTU.

Looking in /var/opt/SUNWut/log/messages, I saw the following

May 26 22:29:23 vesvi utauthd: [ID 355619] WatchIO UNEXPECTED: Connection from is not allowed
May 26 22:29:23 vesvi utauthd: [ID 572381] WatchIO UNEXPECTED: protocolError: networkNotAllowed
May 26 22:29:23 vesvi utauthd: [ID 303596] WatchIO UNEXPECTED: WatchIO.doRemove(null)

and it suddenly twigged that I’d answered the allow LAN connections question wrong.

Unfortunately I found that I can’t use utadm to fix this as I don’t have the DHCP packages installed on this machine (I have to see if there is a bug logged on that), but if you look at my previous writeup I had to address exactly this before. You have to make allowLANConnections true in /etc/opt/SUNWut/auth.props

# Allow LAN Connections
#       This parameter enforces the policy that only terminals on the
#       private Sunray interconnect can attach to the server. Connection
#       attempts from other network interfaces, including the local loopback
#       interface, will be rejected.
allowLANConnections = true

Doing a cold restart of the software allowed me to start using my Sun Ray at home again

# /opt/SUNWut/sbin/utrestart -c
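
The edit to auth.props can also be scripted. This is a minimal sketch, assuming the property is already present in the file as an uncommented "key = value" line and that neither the key nor the value contains characters that are special to sed. The helper name set_prop is hypothetical and is not part of the Sun Ray software. It writes to a temporary file rather than relying on sed -i, which not every sed has.

```shell
# set_prop FILE KEY VALUE
# Rewrite every uncommented "KEY = ..." line in FILE to "KEY = VALUE",
# keeping a copy of the original in FILE.orig.
set_prop() {
    file=$1
    key=$2
    value=$3
    tmp=$file.tmp.$$
    cp -p "$file" "$file.orig" || return 1
    if sed "s/^$key *=.*/$key = $value/" "$file" > "$tmp"; then
        # Copy back over the original so its owner and mode are unchanged.
        cp "$tmp" "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

With that defined, set_prop /etc/opt/SUNWut/auth.props allowLANConnections true followed by the cold restart above should have the same effect as editing the file by hand.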

Written by Alan

May 27, 2011 at 1:16 pm

Making audio default to a second sound device in Solaris 11

It finally got to me. I’ve got a nice USB audio adapter that I use at home on my Tecra M11, but I was only ever able to get firefox to use the builtin audio on Solaris 11. I could make it work under Virtual Box by importing it, but I have a nice sound setup in my office and I really wanted to use the Roland/Cakewalk UA-1G natively.

Searching the web found me lots of people asking the question and nothing in the way of answers.

I’d already tried

# cd /dev
# rm audio audioctl
# ln -s sound/1 audio
# ln -s sound/1ctl audioctl

but flash was still playing through the internal speakers.

The answer came when I ran pfiles on the firefox-bin process and noticed that it had the dsp device for the internal audio controller open.

What I had forgotten was

# rm dsp
# ln -s dsp1 dsp

I went and started a youtube video and had to immediately halt it as the volume through the other device had been set WAY too high, but yeah, that’s all it took.

The creation of a script called audio that takes an argument of the device is then trivial, and left as an exercise for the reader (yes I’ve already written one).
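
For anyone who would rather skip the exercise, here is one way it could go. This is a sketch and not the script mentioned above: the function name switch_audio and its optional second argument (a directory to use instead of /dev, handy for a dry run) are made up for illustration, and it assumes the sound/N, sound/Nctl and dspN naming seen in the commands above.

```shell
# switch_audio N [DEVDIR]
# Repoint the audio, audioctl and dsp links in DEVDIR (default /dev)
# at sound device number N.
switch_audio() {
    n=$1
    devdir=${2:-/dev}

    # Only accept a plain device number.
    case $n in
        ''|*[!0-9]*)
            echo "usage: switch_audio device-number [devdir]" >&2
            return 2
            ;;
    esac

    # Leave the existing links alone if the target device is missing.
    if [ ! -e "$devdir/sound/$n" ] || [ ! -e "$devdir/dsp$n" ]; then
        echo "switch_audio: device $n not found under $devdir" >&2
        return 1
    fi

    # Work in a subshell so the caller's working directory is untouched.
    (
        cd "$devdir" || exit 1
        rm -f audio audioctl dsp
        ln -s "sound/$n" audio &&
        ln -s "sound/${n}ctl" audioctl &&
        ln -s "dsp$n" dsp
    )
}
```

Saved in a script called audio with a final line of switch_audio "$1", running audio 1 as root would repeat the manual steps above, and audio 0 would switch back to the built-in device.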

Written by Alan

April 17, 2011 at 12:01 pm

A plea to security auditors

When you give your customers the list of “vulnerabilities” to take up with their vendor, can you please make sure of a couple of things?

  1. Actually identify the security vulnerability with a reference so we don’t have to try to interpret your vague description of it (a pointer to one of the sites that reports security vulnerabilities isn’t that hard is it?)
  2. Verify that the system really is vulnerable. As I pointed out in an earlier blog, looking at the version label is not always enough to say that a version is vulnerable. Let alone the fact that sometimes even the best of tools get false positives.

One call I have been dealing with over the last few days identified that a customer was vulnerable to five different items. After working out what was really meant by three of them I was able to determine that they were vulnerabilities that we put patches out for back in 2003 and the customer had patches on the system that included these fixes. If the scanner software had probed the vulnerability it would have seen that the product in question was safe. Of the other two, “rexec” was commented out of /etc/inetd.conf and netstat -a showed nothing listening on port 512, and they actually did still have rshd running, which they needed to turn off.
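
The two manual checks described here, an entry commented out of inetd.conf and nothing listening on the port, are easy to script. A minimal sketch with hypothetical helper names (inetd_enabled and listening_on): it assumes the usual inetd.conf layout with the service name in the first field, where rexec appears under the service name exec, and it expects netstat -an style output on standard input.

```shell
# inetd_enabled FILE SERVICE
# Succeed if FILE has an uncommented entry whose first field is SERVICE.
inetd_enabled() {
    awk '$1 == svc { found = 1 } END { exit !found }' svc="$2" "$1"
}

# listening_on PORT
# Read "netstat -an" output on stdin; succeed if a LISTEN line has a
# local address ending in .PORT or :PORT.
listening_on() {
    grep "[.:]$1  *.*LISTEN" > /dev/null
}

# Example use (not run here):
#   inetd_enabled /etc/inetd.conf exec || echo "rexec is disabled"
#   netstat -an | listening_on 512 || echo "nothing listening on 512"
```

A scanner that did even this much before reporting would have dropped the rexec item from the list.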

Because of the vagueness of the descriptions I was given I had to spend quite some time researching three of those vulnerabilities to find exactly what they meant (not helped by how old they were).

You can probably imagine how pleased I was at having to spend time doing this research when I have other calls in my queue that really also needed attention, only to find out that it could all have been avoided.

Written by Alan

March 18, 2011 at 3:01 pm

Posted in Security, Solaris