SQL Server doubly linked lists revisited

It’s SQLbits this week, and I’m as excited as a query going through the compile semaphore for the first time, so I’m taking a break from the heavy stuff and gossiping about bugs instead.

The thing about linked lists

I’ve previously done a blog post on doubly linked lists, but here is all you really need to know:

  • A list entry consists of two pointers, flink (forward link) and blink (backward link).
  • This structure is embedded at a known offset within the object that wants to be within the list. Because the offset is known, no extra pointer is required to point to the containing object.
  • The list head is also a flink/blink pair, living within the object that owns the list. An empty list is denoted by both flink and blink pointing to the list head iself.
  • Because list manipulation needs multiple updates, it can’t be done atomically, and inevitably the list is protected by a spinlock. Even reading it safely requires acquiring the spinlock first, and this is one of the reasons why singly linked lists feature in Hekaton.
  • Traversing a list involves getting the address of the next entry – flink or blink, depending on direction- and then doing something with the containing record. This may be as simple as checking an attribute and returning that object if it is the desired item, or continuing the traversal if it is not.

On the implementation detail side, you’ll normally find that a function to retrieve a list entry will return a null pointer, i.e. the value zero, when it finds an empty list, because it can’t validly return the address of the list head itself. Consuming code will then treat the null pointer as a signal value that nothing was found, and it will most certainly not try and dereference that value by treating it as an address.

Examples from the wild

These beasties are sometimes easy to recognise from a data dump alone. Here is what the list head for an empty list looks like, in this case from the very CBatch instance I was looking at while researching last week’s post:

00000000`38773408  00000000`38773408
00000000`38773410  00000000`38773408

Two pointer-sized members, the first one pointing to itself, and the second also pointing to the first, or strictly speaking to the pair starting with the first.

Then we have a list with a single entry, which is also easy to recognise by sight: both flink and blink of the list head point to the same list entry, and both flink and blink of that list entry point back to the list head. Interestingly, from those two bits of information alone, it’s impossible to tell which is the list head and which is the list item.

Here is the listhead within a CSession instance pointing at the list item for the single CBatch within that session:

00000000`38772f40  00000000`38773378
00000000`38772f48  00000000`38773378

And on the list entry side of the equation, here is the very start of that CBatch instance, with the list entry starting at the second member:

00000000`38773370  00007ffd`61d17b80 sqllang!CBatch::`vftable
00000000`38773378  00000000`38772f40
00000000`38773380  00000000`38772f40

This is a lovely textbook example. Firstly the vftable symbol confirms that we’re dealing with a CBatch, but the list entry starts at offset 8 and not 0. What this means is that one gets at the CBatch instance linked into this particular list by deducting 8 from the list entry’s address.

One way to cause a stack dump

In my previous post, I referenced one of Kendra Little’s Dear SQL DBA episodes, where she demonstrates a stack dump. The exception that led to the stack dump in question was a particular variation on a null reference:

Access violation occurred writing address 0000000000000008

Now I must admit that I didn’t get around to running Kendra’s repro and checking for myself, but this looks like a slam dunk for a dereference problem on a linked list while simultaneously demonstrating a clever safety convention.

Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.

Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.

Finding a dereference of address zero doesn’t suggest much in itself – zeroes can come from lots of sources. But the address 8 is far more evocative. If you are doing linked list traversal, the normal direction is through blink, which is accessed by adding eight to the address of the list item. And let’s say that you got the value zero as the address of your list item, but didn’t check the value before doing that dereference, you’d be dereferencing address 8. Hey, there can be other ways to get to address 8, but this is a strong candidate.

I should of course emphasise that diagnostic code such as DBCC PAGE jumps through hoops to avoid standard safety features like taking out locks while remaining “safe enough”, so finding a null pointer bug in such an area is just a reinforcement of the idea that this is unsupported, undocumented, and doesn’t necessarily live by the platinum standards expected in the rest of the product.

Lurking bugs that don’t bite

Ah, the thrill of the chase. There is a perverse delight in reading through a bit of disassembly and finding that logical hole where a safety check was only half-done. Here is an example from the beginning of a function (which will remain unnamed) to the point where we have a null-reference bug – I added simple labels and ASCII-art arrows for control flow.

         mov  rax,qword ptr [rcx+58h]
         add  rcx,50h
         cmp  rax,rcx
 +------ je   *1*
 |
 |       text rax,rax ; is rax == 0?
 +------ je   *1*
 |
 |       add  rax, 0FFFFFFFFFFFFFFF8h ; i.e. subtract 8
 |    +- jmp  *2*
 v    |
*1*   |  xor  eax,eax ; i.e. mov rax,0
      v
     *2* cmp  dword ptr [rax+1ACh], r8d ; <-- may dereference address 1AC

It's all in the last two instructions. A path exists where rax may contain the value 0, and while you'd normally expect the zero-check before the dereference, that zero-check is missing.

Now unlike the DBCC PAGE example, which has been known to go wrong, this is in a "production" function which doesn't appear to blow up all the time, i.e. the preconditions for the bug coming to light simply aren't there. In practical terms, at the point this function is called, clearly the list in question is never empty. Does that kind of code pattern even count as a bug? I guess the only thing that makes it stand out is the acknowledgement that we may find rax to be zero, or even force it to zero, right before the dereference. But this is probably inherited from an inlined function, and it is only in the disassembly that we see the juxtaposition so clearly.

For my part, I shall continue to delight in finding oddities like this. But at the same time it reminds me that I have a lot to be humble about: if I can spot these little things in the code produced by seriously good programmers, just think of the horrors they would be able to find in my code.

Unsung SQLOS: SOS_WaitableAddress

One of the more amusing words in the SQL Server synchronisation lexicon is “lightweight”. Locks bad. Nolocks good. Latches lightweight. The more spinlocks you eat, the more wait you lose!

If only things were that simple… But hey, I love the poetry of compromise. Check out the SOS_WaitableAddress for one of the many competing definitions of “lightweight”.
Continue reading “Unsung SQLOS: SOS_WaitableAddress”

Unsung SQLOS: linked lists

I’m slowly working towards some more juicy subjects again, but for the moment it is sensible to get fundamentals out of the way. The use of linked lists by SQL Server and SQLOS is a great place to start, because so much is built upon them. Grok this and you’re set for countless hours of fun.

From general exposure to SQLOS scheduling, many of us are familiar with concepts like “the worker is put on the runnable list”. At a high level this is simple to grasp, and the good news is that not much is hidden from us in the implementation detail. Nonetheless, it seems that it’s only systems programmers who deal with these details nowadays, but it’s still useful for the rest of us to get a chance to get comfortable with such internals.

Linked list implementation in SQLOS

A SQLOS doubly linked list follows a common Windows pattern based on a ListEntry structure. This is remarkably simple, containing only two pointer-sized members, i.e. taking up 16 bytes on x64:

  • flink (forward link) – a pointer to the next entry in the list
  • blink (backward link) – a pointer to the previous entry in the list

Continue reading “Unsung SQLOS: linked lists”