Alan Hargreaves' Blog

The ramblings of a former Australian SaND TSC* Principal Field Technologist

Why I hate macros that make pointer dereferences look like structure elements

I have a colleague who generated an IDR patch for tcp in Solaris 10 for me to give relief to a customer for a bug while the formal fix was in progress.

As a part of the fix we had this code fragment

18984          /*
18985           * If the SACK option is set, delete the entire list of
18986           * notsack'ed blocks.
18987           */
18988          if (tcp->tcp_sack_info != NULL) {
18989                  if (tcp->tcp_notsack_list != NULL)
18990                          TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);
18991          }

replaced with this code fragment (the fix actually has a lot more to it than this, but this is what was relevent.)

18936          /*
18937           * If the SACK option is set, delete the entire list of
18938           * notsack'ed blocks.
18939           */
18941          if (tcp->tcp_notsack_list != NULL)
18942                  TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);

Now, the assembly around here reads

ip:tcp_process_shrunk_swnd+0x38:       ldx      [%i0 + 0xf8], %g1
ip:tcp_process_shrunk_swnd+0x3c:       add      %g3, %i1, %g2
ip:tcp_process_shrunk_swnd+0x40:       stw      %g2, [%i0 + 0x80]
ip:tcp_process_shrunk_swnd+0x44:       ldx      [%g1 + 0x48], %i5

and the register dump

pc:  0x7bed2918 ip:tcp_process_shrunk_swnd+0x44:   ldx  [%g1 + 0x48], %i5
npc: 0x7bed291c ip:tcp_process_shrunk_swnd+0x48:   subcc  %i5, 0x0, %g0   ( cmp   %i5, 0x0 )
global:                       %g1                  0
%g2             0x761c  %g3             0x68ec
%g4      0x600216f6e6c  %g5                  0
%g6               0x1c  %g7      0x2a101f89ca0
out:  %o0      0x600210d8640  %o1              0x1e0
%o2              0x5a8  %o3              0x3c8
%o4      0x600216f6e6c  %o5      0x600216f68c4
%sp      0x2a101f88c61  %o7         0x7bed2900
loc:  %l0         0xc7341c85  %l1             0x2000
%l2      0x60010972000  %l3      0x600210d8640
%l4             0x1000  %l5             0x1000
%l6             0x1000  %l7                0x5
in:   %i0      0x600210d8640  %i1              0xd30
%i2                  0  %i3         0xc73439b5
%i4                  0  %i5                0x4
%fp      0x2a101f88d11  %i7         0x7becbed4

The last instruction is where we paniced (yes the customer paniced [twice] as a result of this). As we can see from the
register dump, %g1 is NULL, so we definitely have a NULL pointer
dereference going on.

So where did this come from? It looks like a dereference of 0xf8
from %i0. %i0 is a (tcp_t *) making %g1 a (tcp_sack_info *), namely
arg0->tcp_sack_info if we look at the structure; but hang on, the code
says tcp->tcp_notsack_list, not tcp->tcp_sack_info. Indeed that element
name does not exist within a tcp_t.

A light dawns when we see that:

299  #define tcp_notsack_list	        tcp_sack_info->tcp_notsack_list

So in reality line 18941 is doing:

18941          if (tcp->tcp_sack_info->tcp_notsack_list != NULL)

Without checking whether or not tcp->tcp_sack_info is non-NULL. The correct line should perhaps read

18941          if (tcp->tcp_sack_info != NULL && tcp->tcp_notsack_list != NULL)

Now this would probably not have made it as far as in IDR patch delivered to a customer, if we didn’t have that macro definition because alarm bells would have rung that we were doing another dereference!


Written by Alan

August 13, 2009 at 5:40 pm

Posted in Solaris

%d bloggers like this: