Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist

An Interoperability Problem (WebLogic, Java and Solaris)

Last Friday I got pulled in to a very hot customer call.

The issue was best summarised as

Since migrating our WebLogic and database services from AIX to
Solaris, at random times we are seeing the WebLogic server
pause for a few minutes at a time. This takes down the back
office and client services part of the business for these
periods and is causing increasing frustration on the part of
staff and customers. This only occurs when a particular module
is enabled in WebLogic.

Before going on, I should note that there were other parts to this call requiring WebLogic option changes, which addressed other major performance issues that were being seen (such as the field iPads timing out when talking to the WebLogic service), but it was these pauses that were of greatest concern to the customer.

I’d been given data from a GUDS run, which initially made me concerned that pstack(1M) was being run against the WebLogic Java process. pstack stops the process while it walks all of the thread stacks, which could certainly have a nasty effect on the accessibility of the service.

Unfortunately it was not to be that simple. The pstack collection was actually part of the data gathering that the WebLogic folks were running: a great example of the Heisenberg effect while investigating a problem, as the impact of the data gathering itself masked out the possibility of seeing anything else.

I should also mention that, in order to keep the business running, the customer had disabled the particular module, so we were very limited in when we could get it enabled. Data gathering also required them to send someone out to site with an iPad (the field interface that seemed to be an enabler of the problem). We were pretty much getting one shot at data gathering in any given 24 hour period.

The next day we gathered data with the pstack command commented out.

This was a little more interesting; however, the issue was present for only a small amount of time and we were only gathering short lockstat profiles, so it was difficult to pin anything down: it was hit and miss whether a profile was being taken while the issue was apparent. I did notice that we seemed to be spending more time in page faulting than I would have expected (about 25% of available CPU at one point!), and about half of that time was being spent spinning on a cross call mutex used to flush the newly mapped addresses from all of the other CPU caches.
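
As an aside, a quick way to watch for this sort of pressure on a live system is mpstat(1M): the xcal and smtx columns show cross calls and mutex spins per CPU, and sys shows time spent in the kernel. The five second interval below is just an example:

mpstat 5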

With the data from the next day's run I also noticed that the kflt_user_evict() thread was fighting for the same mutex. My thought at this point was to disable that thread, and for good measure also disable page coalescing, by adding the following lines to /etc/system and rebooting.

set kflt_disable=1
set mpss_coalesce_disable=1
set pg_contig_disable=1

It still felt like we were looking at addressing symptoms but not the cause.

We had our breakthrough on Tuesday when we got the following update from the customer:

The iPad transaction which triggers this issue has a label
printing as part of the transaction. The application uses the
Solaris print spooling mechanism to spool prints to a label
printer. The application code spawns an lp process to do this
spooling. The code used is something like the following:

Runtime.getRuntime().exec("lpr -P <destination> <file-name>");

We have observed that the CPU sys% spike behaviour seems not to
occur once we have disabled this print spooling functionality.
Are there any known issues in Oracle Java 1.7 with spawning
multiple processes from within the JVM? Note that this
functionality has always worked fine on the IBM JVM on AIX.

This suddenly made the page-faulting make sense.

On chasing up the Java folks, it looks like the default mechanism for this kind of operation is fork()/exec().

Now, fork will completely clone the address space of the parent process. This is what was causing all of the page-faults. The WebLogic Java process had a huge memory footprint and more than 600 threads.

Further discussion with the Java folks revealed that later Java versions have an option that forces Java to use posix_spawn() rather than fork()/exec(), which avoids the address space duplication. The customer needed to start Java with the option:
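
A sketch of the sort of flag involved, assuming the jdk.lang.Process.launchMechanism property that later JDK releases expose for selecting the child process launch mechanism (verify the exact property name and accepted values against the JDK release in use):

-Djdk.lang.Process.launchMechanism=POSIX_SPAWN

added to the java command line that starts the WebLogic server.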


They implemented this along with the other application changes and it looks to have been running acceptably now for a few days.

The hardest part of this call was that it needed all three support groups (WebLogic, Java and Solaris) looking at it together; without any one of them, it is a virtual certainty that we would not have gotten to root cause and found a solution.

Well done everyone.

Written by Alan

August 28, 2015 at 10:08 am

Posted in Uncategorized

Quick and Dirty iSCSI between Solaris 11.1 targets and a Solaris 10 Initiator

I recently found myself with a support request to do some research into what happens when vdevs are removed from a pool, in a recoverable way, while operations are being performed on the pool.

My initial thought was to make the disk devices available to a guest ldom from a control ldom, but I found that Solaris and LDOMS coupled things too tightly for me to do something which had the potential to cause damage.

After a bit of thought, I realised that I also had two Solaris machines already configured in our dynamic lab setup in the UK that I could use to create some iSCSI targets for the guest domain I’d already built. I needed two hosts to provide the targets because, for reasons I really don't need to go into, I wanted an easy way to make them progressively unavailable and then available again; using two hosts meant I could do this with a simple shutdown and boot.

The tricky part was that the ldom I wanted to test on was running Solaris 10, while the two target machines were running Solaris 11.1.

I needed to reference the following documents

The boxes

Name       Address   Location    Solaris Release
target1              UK          Solaris 11.1
target2              UK          Solaris 11.1
initiator            Australia   Solaris 10

Setting up target1

Install the iSCSI packages

target1# pkg install group/feature/storage-server
target1# svcadm enable stmf


Create a small pool, using a file as backing store since we don't have any extra disk attached to the machine and we really don't need much space, and then make a small volume.

target1# mkfile 4g /var/tmp/iscsi
target1# zpool create iscsi /var/tmp/iscsi
target1# zfs create -V 1g iscsi/vol0


Make it available as an iSCSI target. Take note of the target name; we'll need that later.

target1# stmfadm create-lu /dev/zvol/rdsk/iscsi/vol0 
Logical unit created: 600144F000144FF8C1F0556D55660001
target1# stmfadm list-lu
LU Name: 600144F000144FF8C1F0556D55660001
target1# stmfadm add-view 600144F000144FF8C1F0556D55660001
target1# stmfadm list-view -l 600144F000144FF8C1F0556D55660001
target1# svcadm enable -r svc:/network/iscsi/target:default
target1# svcs -l iscsi/target
fmri         svc:/network/iscsi/target:default
name         iscsi target
enabled      true
state        online
next_state   none
state_time   Tue Jun 02 08:06:29 2015
logfile      /var/svc/log/network-iscsi-target:default.log
restarter    svc:/system/svc/restarter:default
manifest     /lib/svc/manifest/network/iscsi/iscsi-target.xml
dependency   require_any/error svc:/milestone/network (online)
dependency   require_all/none svc:/system/stmf:default (online)
target1# itadm create-target
Target successfully created
target1# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
                                                             online   0
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default

Setting up target2

Pretty much the same as what we just did on target1.

Install the iSCSI packages

target2# pkg install group/feature/storage-server
target2# svcadm enable stmf


Create a small pool, using a file as backing store since we don't have any extra disk attached to the machine and we really don't need much space, and then make a small volume.

target2# mkfile 4g /var/tmp/iscsi
target2# zpool create iscsi /var/tmp/iscsi
target2# zfs create -V 1g iscsi/vol0


Make it available as an iSCSI target. Take note of the target name; we'll need that later.

target2# stmfadm create-lu /dev/zvol/rdsk/iscsi/vol0
Logical unit created: 600144F000144FFB7899556D5B750001
target2# stmfadm add-view 600144F000144FFB7899556D5B750001
target2# stmfadm list-view -l 600144F000144FFB7899556D5B750001
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
target2# svcadm enable -r svc:/network/iscsi/target:default
target2# svcs -l iscsi/target
fmri         svc:/network/iscsi/target:default
name         iscsi target
enabled      true
state        online
next_state   none
state_time   Tue Jun 02 08:31:01 2015
logfile      /var/svc/log/network-iscsi-target:default.log
restarter    svc:/system/svc/restarter:default
manifest     /lib/svc/manifest/network/iscsi/iscsi-target.xml
dependency   require_any/error svc:/milestone/network (online)
dependency   require_all/none svc:/system/stmf:default (online)
target2# itadm create-target
Target successfully created
target2# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
                                                             online   0
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default

Setting up initiator

Now make them statically available on the initiator. Note that we use the target names we got at the last step of each of the earlier setups, and we also need to provide the IP address of the machine hosting each target, as we are attaching them statically for simplicity.

initiator# iscsiadm add static-config <target1-target-name>,<target1-ip-address>
initiator# iscsiadm add static-config <target2-target-name>,<target2-ip-address>
initiator# iscsiadm modify discovery --static enable
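
At this point it is worth a quick sanity check of what the initiator now knows about; the Solaris 10 iscsiadm can list both the static configuration entries and the targets it can see (output omitted here as it will vary):

initiator# iscsiadm list static-config
initiator# iscsiadm list target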


Now we need to get the device nodes created.

initiator# devfsadm -i iscsi
initiator# format < /dev/null
Searching for disks...done

c1t600144F000144FF8C1F0556D55660001d0: configured with capacity of 1023.75MB
c1t600144F000144FFB7899556D5B750001d0: configured with capacity of 1023.75MB

0. c0d0
1. c0d1
2. c0d2
3. c1t600144F000144FF8C1F0556D55660001d0
4. c1t600144F000144FFB7899556D5B750001d0
Specify disk (enter its number):


Great, we’ve found them. Let’s make a mirrored pool.

initiator# zpool create tpool mirror c1t600144F000144FF8C1F0556D55660001d0 c1t600144F000144FFB7899556D5B750001d0
initiator# zpool status -v tpool
  pool: tpool
 state: ONLINE
  scan: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        tpool                                      ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            c1t600144F000144FF8C1F0556D55660001d0  ONLINE       0     0     0
            c1t600144F000144FFB7899556D5B750001d0  ONLINE       0     0     0

errors: No known data errors

I was then in a position to go and do the testing that I needed to do.

Written by Alan

June 3, 2015 at 12:12 pm

Posted in Uncategorized

Why you should Patch NTP

This story about massive DDoS attacks using monlist as a threat vector gives an excellent reason why you should apply the patches listed on the Sun Security Blog for NTP.
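
If you cannot patch immediately, a commonly recommended interim mitigation (my sketch, not something taken from the patch READMEs) is to stop ntpd answering monlist queries at all in /etc/inet/ntp.conf, then restart the ntp service with svcadm restart ntp:

# refuse the mode 7 monitoring facility that monlist relies on
disable monitor
# and/or refuse queries from non-local addresses
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1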

Written by Alan

June 13, 2014 at 10:46 am

Posted in Uncategorized

Who is renicing these processes?

I was helping out a colleague on just such a call this morning. While the DTrace script I produced was not helpful in this particular case, I think it bears sharing anyway.

What we wanted was a way to find out why various processes were running with nice set to -20.  There are two ways in which a process can have its nice changed.

  • nice(2) – where it changes itself
  • priocntl(2) – where something else changes it

I ended up with the following script after a bit of poking around.

# dtrace -n '
/* a process changing its own nice value via nice(2) */
syscall::nice:entry {
        printf("[%d] %s calling nice(%d)", pid, execname, arg0);
}
/* priocntlsys(2) with cmd 6 (PC_DONICE): something renicing another process */
syscall::priocntlsys:entry /arg2 == 6/ {
        this->n = (pcnice_t *)copyin(arg3, sizeof (pcnice_t));
        this->id = (procset_t *)copyin(arg1, sizeof (procset_t));
        printf("[%d] %s renicing %d by %d",
            pid, execname, this->id->p_lid, this->n->pc_val);
}'


There is an assumption in there about p_lid being the PID that I want, but in this particular case it turns out to be ok. Matching arg2 against 6 is so that we only get priocntl() calls with the command PC_DONICE. I could have also had it check the pcnice_t->pc_op, but I can put up with the extra output.

So what happens when we have this running and then try something like

# renice -20 4147
dtrace: description 'syscall::nice:entry ' matched 2 probes
 0 508 priocntlsys:entry [4179] renice renicing 4147 by 0
 0 508 priocntlsys:entry [4179] renice renicing 4147 by -20

Which is exactly what we wanted. We see the renice command (pid 4179) modifying the nice value of pid 4147.

Oh, why didn’t this help I hear you ask?

It turns out that in this instance the process in question was being started by init from /etc/inittab, and as such it started with its nice value set to whatever init itself was running at, which in this case was -20.
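
If you suspect the same thing, it is quick to confirm by looking at the nice value of init itself; the nice output keyword to ps is standard:

# ps -o pid,nice,comm -p 1

Add the pid of the suspect process to the -p list to compare the two.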

Written by Alan

December 23, 2013 at 10:39 am

Posted in Uncategorized

A Solaris tmpfs uses real memory

That title may sound a little self-explanatory and obvious, but over the last two weeks I have had two customers tell me flat out that /tmp uses swap and that I should therefore continue to investigate where their memory is being used.

This is likely because when you define /tmp in /etc/vfstab, you list the device being used as swap.

In the context of a tmpfs, swap means physical memory + physical swap. A tmpfs uses pageable kernel memory. This means that it will use kernel memory, but if required these pages can be paged to the swap device. Indeed if you put more data onto a tmpfs than you have physical memory, this is pretty much guaranteed.

If you are still not convinced try the following.

  1. In one window start up the command
    $ vmstat 2
  2. In another window make a 1gb file in /tmp.
    $ mkfile 1g /tmp/testfile
  3. Watch what happens in the free memory column in the vmstat.

There seems to be a misconception amongst some that a tmpfs is a way of stealing some of the disk we have allocated as swap to use as a filesystem without impacting memory. I’m sorry, this is not the case.
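
You can see the same thing from the swap accounting side. Space used in a tmpfs comes straight out of the pool that swap -s reports on, and df sizes /tmp against that pool rather than against any disk device (the numbers will of course vary from system to system):

$ swap -s
$ df -h /tmp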


Written by Alan

February 28, 2013 at 9:38 am

Posted in Uncategorized

Using /etc/system on Solaris

I had cause to be reminded of this article I wrote for on#sun almost ten years ago and just noticed that I had not transferred it to my blog.

/etc/system is a file that is read just before the root filesystem is mounted. It contains directives to the kernel about configuring the system. Going into depth on this topic could span multiple books so I’m just going to give some pointers and suggestions here.

Warning, Danger Will Robinson

Settings here can affect such things as initial array and structure allocation, the module load path, and even where the root filesystem actually resides.

It is possible to render your system unbootable if you are not careful. If this happens, you can try booting with the ‘-a’ option, which gives you the chance to tell the system not to load /etc/system.

Just because you find a set of values works well on one system does  not necessarily mean that they will work properly on another. This is especially true if we are looking at different releases of the operating system, or different hardware.

You will need to reboot your system before these new values will take effect.

The basic actions that can be taken are outlined in the comments of the file itself so I won’t go into them here.

The most common action is to set a value. Any number of products make suggestions for settings in here (eg Oracle, Veritas Volume Manager and Filesystem to name a few). Setting a value overrides the system default.

A practice I follow when working on this file is to place a comment explaining why and when I made a particular setting (remember that a comment in this file is prefixed by a ‘*’, not a ‘#’). This is useful later down the track when I may have to upgrade the system; the setting may no longer have the desired effect, and it is good to know why we originally made it.
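
As a purely made-up illustration of the sort of comment I mean (the tunable and value here are placeholders, not a recommendation):

* 2013-01-22 alan: raised for the application server per the vendor install guide
* revisit or remove when this box is next upgraded
set rlim_fd_max=65536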

I harp on this point but it is important.

Just because settings work on one machine does not make them directly transferable to another.

For example

set lotsfree=1024

This tells the kernel not to start running the page scanner (which pages memory out to disk) until free memory drops below 8MB (1024 x 8KB pages). While this setting may be fine on a machine with around 512MB of memory, it does not make sense for a machine with 10GB. Indeed, if the machine is under memory pressure, by the time we get down to 8MB of free memory we have very little breathing space in which to recover before that memory is needed. The end result is a system that grinds to a halt until it can free up some resources.
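
Before changing something like this it is also worth checking what the current value actually is on the running system; the system_pages kstat exposes lotsfree (in pages):

# kstat -p unix:0:system_pages:lotsfree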

Oracle makes available the Solaris Tunable Parameters guide as a part of the documentation for each release of Solaris. It gives information about the default values and the uses of a lot of system parameters.

Written by Alan

January 22, 2013 at 10:22 am

Posted in Solaris, Work

The Importance of Fully Specifying a Problem

I had a customer call this week where we were provided a forced crashdump and asked to determine why the system was hung.

Normally when you are looking at a hung system, you will find a lot of threads blocked on various locks, and most likely very little actually running on the system (unless it’s threads spinning on busy wait type locks).

This vmcore showed none of that. In fact we were seeing hundreds of threads actively on cpu in the second before the dump was forced.

This prompted the question back to the customer:

What exactly were you seeing that made you believe that the system was hung?

It took a few days to get a response, but the response I got back was that they were not able to ssh into the system, and when they tried to log in on the console they got the login prompt, but after typing “root” and hitting return the console was no longer responsive.

This description puts a whole new light on the “hang”. You immediately start thinking “name services”.

Looking at the crashdump, yes the sshds are all in door calls to nscd, and nscd is idle waiting on responses from the network.

Looking at the connections I see a lot of connections to the secure ldap port in CLOSE_WAIT, but more interestingly I am seeing a few connections over the non-secure ldap port to a different LDAP server just sitting open.
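
On a live system the equivalent check is straightforward; something along these lines (389 and 636 being the standard LDAP and LDAPS ports) will show the state of any LDAP connections:

# netstat -an | egrep '\.389 |\.636 '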

My feeling at this point is that we have an either non-responding LDAP server, or one that is responding slowly, the resolution being to investigate that server.


When you log a service ticket for a “system hang”, it’s great to get the forced crashdump first up, but it’s even better to get a description of what you observed that made you believe the system was hung.

Written by Alan

June 3, 2012 at 9:19 am

Posted in Solaris, Uncategorized, Work

