Alan Hargreaves' Blog

The ramblings of an Australian SaND TSC* Principal Field Technologist

A plea to help support help you!

This week, I am the duty engineer for Kernel calls within the PTS Organisation during Asia-Pacific coverage hours.

Yesterday I received a callout on an escalation that had gone around the world a number of times and been in existence since early July. The guts of the issue (as we had been told) was that the customer was setting the quotas of new users and the amount of disc used was being initialised to a rather large non-zero value.

There had been an awful lot of work done to try to determine where the values were initialised both in edquota and in the kernel ioctl() – which is why the Kernel group was involved.

Anyway, the news that came to light yesterday was that the customer was not using edquota. They were using some code that they had written themselves. The actual code itself looked fine. What was wrong was an assumption that they had made. That is that variables allocated on the stack are zero-filled.

They are not.

They contain the data that happened to be on the stack at the time the function was called. So the bad value that was being written into the quota ioctl() was the result of uninitialised data.
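
To make that concrete, here is a minimal sketch in C of the safe pattern. The structure and function names (quota_rec, make_quota) are hypothetical stand-ins made up for illustration; this is not the customer's code and not the real Solaris quota structure. The idea is simply to zero the whole record before filling in the fields you care about, so that nothing left over on the stack can reach the ioctl().

```c
#include <string.h>

/* Hypothetical quota record, for illustration only. */
struct quota_rec {
    unsigned long block_soft_limit;
    unsigned long block_hard_limit;
    unsigned long blocks_in_use;   /* the kind of field that was left unset */
};

/*
 * Build a quota record for a new user.
 * The local variable q lives on the stack and is NOT zero-filled by the
 * language, so we clear it explicitly before setting the limits.
 */
struct quota_rec make_quota(unsigned long soft, unsigned long hard)
{
    struct quota_rec q;

    memset(&q, 0, sizeof q);       /* without this, blocks_in_use is stack residue */
    q.block_soft_limit = soft;
    q.block_hard_limit = hard;
    return q;
}
```

With this in place, make_quota(1000, 2000) returns a record whose blocks_in_use field is 0 rather than whatever happened to be on the stack. An initialiser of the form struct quota_rec q = {0}; also zeroes every member; memset() is used in the sketch because it clears any padding bytes as well, which is worth having when the record is handed to the kernel.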

The point of this plea is simple.

Once we had all of the relevant data, this call was solved within hours. If this information had been passed to us when the call had been opened, it would have been closed just as quickly – at the beginning of July, rather than the end of August.

It is critical when you log support calls with us to give us the full picture. In this call we had been working under the assumption that the problem was being exhibited using edquota. As a result, a lot of man-weeks were effectively wasted, and the customer did not see a solution until almost two months after the call had been placed.

We are currently pushing a process called Sun Global Resolution towards our customer-facing folk. This process is based around the Kepner-Tregoe Analytical Troubleshooting process. Part of the beginning of this process is to really define the expected behaviour and the deviation from it, along with all of the concerns and background information. If you start being asked a whole lot of questions whose relevance to your problem you don’t immediately see, hang in there. This information really does help us get to the bottom of problems faster.

Written by Alan

August 30, 2004 at 10:56 pm

Posted in General

4 Responses

  1. Adam:
    I agree that the full picture needs to be given, but IMHO it depends on the person who is at the other end. The first level of support at Sun is bad: instead of trying to solve a problem, they ask me if my system has the latest Solaris 8 Recommended and Security patches, what my PROM level is, or for some basic troubleshooting steps which I would have already tried. I am willing to patch 250+ servers if they can demonstrate to me that a patch fixes my problem. The further up the chain I go, the more info I disclose. By the time I get to the back line engineers, I may have already solved the problem using Google or by testing different setups/configs, etc.

    PJ

    September 1, 2004 at 6:20 am

  2. I have a question regarding this:
    “Anyway, the news that came to light yesterday was that the customer was not using edquota. They were using some code that they had written themselves. The actual code itself looked fine. What was wrong was an assumption that they had made. That is that variables allocated on the stack are zero-filled.”

    Ryan Matteson

    September 1, 2004 at 7:08 pm

  3. I have a question regarding this:
    “Anyway, the news that came to light yesterday was that the customer was not using edquota. They were using some code that they had written themselves. The actual code itself looked fine. What was wrong was an assumption that they had made. That is that variables allocated on the stack are zero-filled.”
    When would a variable actually get allocated on the stack? I
    thought the stack was used to save state during register window overflows, and to pass arguments between functions when more than 6 args were present? Can you help unconfuse me?
    Thanks for the awesome posts! Your BLOG rocks!!!!!!!
    – Ryan

    Ryan Matteson

    September 1, 2004 at 7:13 pm

  4. PJ, This is something that we are trying to address with the push down of the Sun Global Resolution model and knowledge databases. Unfortunately, much of the time a large number of issues are addressed by simply patching, which is why they ask those questions. We are also actively working to provide knowledge into the database that these folks use to deal with incoming calls.
    Ryan, variables that are local to a function (and not declared static) get allocated on the stack. For example:

    void func() {
        int intvar;    /* local, non-static: allocated on the stack, not zero-filled */
        ...
    }

    In the general case, intvar would be put on the stack. It may be that the compiler will optimise it into a register to save some stack space, but this depends on the architecture, the compiler and the level of optimisation. Thanks for your sentiments.

    Alan Hargreaves

    September 2, 2004 at 12:45 am


