SQL Agent and the hairiest Dateadd in town

Like many good things, it started with a #sqlhelp question, this time from Todd Histed (b | t):

Including a bit of back and forth, the issue boiled down to this:

  • A job is scheduled every minute and takes pretty much exactly 55 seconds to run, due to an included WAITFOR delay of 55 seconds
  • On SQL Server 2014, this did reliably run every minute
  • On SQL Server 2016, it skips minutes, effectively running only every two minutes
  • Reducing the WAITFOR to 30 seconds fixed the issue

Now I get unreasonably excited both by date manipulation and by scheduling issues, so this firmly grabbed my attention. After a bit of unwarranted thinking out loud, I figured it is time to go and do some spelunking.

TL;DR

I understand: you’re a busy elf. Here is the quick summary of what I found.

Disclaimer: the different schedule types, e.g. daily vs monthly, have different special case logic; I’m just concerning myself with (intra-) daily schedules here.

Upon completion of a job, the next run time is calculated based on the last scheduled time plus the schedule interval. However, allowance is made for the edge cases where the completed invocation overruns into the next start time. In such a situation, there isn’t a “catch-up” run; instead, the schedule is advanced iteratively until it reaches a future point in time.

However, 2016 introduces a new twist. When applying the “is the proposed next schedule time after Now()?” check, it adds five seconds to Now(). In other words, the question becomes “Is the proposed next schedule time more than five seconds in the future?”

This looks more like a bug fix than a bug. Clearly cases were found where erroneous schedule invocations were happening, perhaps caused by some combination of job duration, high schedule frequency, and clock adjustments. Whatever the story, this is the behaviour we now live with going forward.

Scheduling is a notoriously tricky subject, because we tend to have an intuitive understanding of how things “ought to work,” while merrily ignoring a million little edge cases that keep developers on their toes. Just in case you needed proof.

Spelunking time

In the course of researching this, a few more fun details emerged.

Please send two mongooses
Please send two mongeese
Please send a mongoose. While you’re at it, send another.

When date and time manipulation is needed, where do you turn to? If you’re SQL Agent, this involves custom structures and functions living in SQLSVC.dll. Internally it works with date and time structures with separate 32-bit integer components for the component parts like seconds or months, but translates between this and the funky “packed decimal” pair we see persisted in tables like sysjobhistory, e.g. the integer values 20180321 and 200000 for 20h00 on 21 March 2018.

What I find really cute is the simple approach to implementing Dateadd. We get two functions here, QScheduleDateModify() and QScheduleTimeModify() to operate on such a date/time structure pair. These will only increment or decrement the structure by one tick of the specified unit, so if you want to add six hours to a time, you call QScheduleTimeModify six times with parameter values indicating Increment and Hours.

This may seem primitive, but I suppose it has the advantage of straightforward implementation: as long as overflow of your chosen unit is dealt with correctly, you never need to deal with weird questions like “how many year boundaries are crossed when I add 23,655,101 seconds to the current date?” It isn’t a generic date library, but then it doesn’t need to be. And by definition, normal calculations for schedule times don’t involve a large number of increments, so there isn’t a case for hyper-optimisation.

At this point you probably expect me to whip out Windbg. Here you go then.

Having created a job that was due to fire, I attached to the SQL Agent process, put a breakpoint on SQLSVC!QScheduleCalcNextOccurrence and waited. At the appointed hour, the job ran and the breakpoint was hit as the schedule’s next occurrence needed to get calculated.

Breakpoint 0 hit
SQLSVC!QScheduleCalcNextOccurrence:
00007fff`0c0d3e20 48894c2408      mov     qword ptr [rsp+8],rcx ss:0000008c`e460c240=0000008ce460c274

Just for orientation, let’s look at the call stack and the state of the CPU registers upon entry:

0:022> k
 # Child-SP          RetAddr           Call Site
00 0000008c`e460c238 00007ff7`51595990 SQLSVC!QScheduleCalcNextOccurrence
01 0000008c`e460c240 00007ff7`51595459 SQLAGENT!SetScheduleNextRunDate+0x1f0
02 0000008c`e460c310 00007ff7`5158f992 SQLAGENT!SetJobNextRunDate+0x1a9
03 0000008c`e460c7f0 00007fff`18514f87 SQLAGENT!JobManager+0xfe2
04 0000008c`e460fc70 00007fff`1851512e MSVCR120!_callthreadstartex+0x17 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
05 0000008c`e460fca0 00007fff`261913d2 MSVCR120!_threadstartex+0x102 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
06 0000008c`e460fcd0 00007fff`266754e4 KERNEL32!BaseThreadInitThunk+0x22
07 0000008c`e460fd00 00000000`00000000 ntdll!RtlUserThreadStart+0x34
0:022> r
rax=000000000002ed9c rbx=0000000000000000 rcx=0000008ce460c288
rdx=000000000133c6b1 rsi=0000008cc8b23380 rdi=0000008ce428c278
rip=00007fff0c0d3e20 rsp=0000008ce460c238 rbp=0000008ce460c2a9
 r8=000000000002eda1  r9=0000000000000000 r10=00000000000118a0
r11=0000000000000011 r12=0000000000000001 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe cy
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000283
SQLSVC!QScheduleCalcNextOccurrence:
00007fff`0c0d3e20 48894c2408      mov     qword ptr [rsp+8],rcx ss:0000008c`e460c240=0000008ce460c274

Now whether this was C or C++ source code, the Windows x64 ABI tells us that the first parameter passed into the function lives in rcx (highlighted above). I’ll dump a chunk of memory starting at the address it points to, displaying it as 32-bit doublewords:

0:022> dd 8ce460c288 l0x20
0000008c`e460c288  00000004 00000001 00000004 00000005
0000008c`e460c298  00000000 00000000 0133c6b1 05f5bebf
0000008c`e460c2a8  00000190 000399b7 0133c6b1 0002ed9c
0000008c`e460c2b8  0133c6b1 0002ed9c 56b04598 00004634
0000008c`e460c2c8  c8b23390 0000008c 00000000 00000000
0000008c`e460c2d8  00000000 00000000 e4280400 0000008c
0000008c`e460c2e8  00000000 00000000 00000000 00000000
0000008c`e460c2f8  e428c278 0000008c e460c410 0000008c

I’ll push this along – the numbers highlighted in red are of interest. Windbg will oblige us by translating the hexadecimal representation into decimal if we ask nicely:

0:022> ? 133c6b1
Evaluate expression: 20170417 = 00000000`0133c6b1
0:022> ? 0x2ed9c
Evaluate expression: 191900 = 00000000`0002ed9c
0:022> ? 05f5bebf
Evaluate expression: 99991231 = 00000000`05f5bebf

Well, those kinds of numbers look more familiar, especially if you consider that the schedule fired at 19h19 on 17 April.

The rest of the experiment consisted of deliberately keeping the process frozen in the debugger for ten minutes before letting it run again; because this job was scheduled to run every five minutes, this guaranteed missing at least one run. As expected, and confirmed through a breakpoint on QScheduleTimeModify(), we went through three sets of five 1-minute increments to reach a next invocation time that was in the future.

Of course, some intent peering into assembly language was also involved, but I’ll not bore you further.

Where next?

I have a hunch that I’ll be revisiting SQL Agent at some point in the future again. However, the “Context in perspective” series is still hanging in the air, and will be picked up again. Let’s wait and see.

SQL Server doubly linked lists revisited

It’s SQLbits this week, and I’m as excited as a query going through the compile semaphore for the first time, so I’m taking a break from the heavy stuff and gossiping about bugs instead.

The thing about linked lists

I’ve previously done a blog post on doubly linked lists, but here is all you really need to know:

  • A list entry consists of two pointers, flink (forward link) and blink (backward link).
  • This structure is embedded at a known offset within the object that wants to be within the list. Because the offset is known, no extra pointer is required to point to the containing object.
  • The list head is also a flink/blink pair, living within the object that owns the list. An empty list is denoted by both flink and blink pointing to the list head iself.
  • Because list manipulation needs multiple updates, it can’t be done atomically, and inevitably the list is protected by a spinlock. Even reading it safely requires acquiring the spinlock first, and this is one of the reasons why singly linked lists feature in Hekaton.
  • Traversing a list involves getting the address of the next entry – flink or blink, depending on direction- and then doing something with the containing record. This may be as simple as checking an attribute and returning that object if it is the desired item, or continuing the traversal if it is not.

On the implementation detail side, you’ll normally find that a function to retrieve a list entry will return a null pointer, i.e. the value zero, when it finds an empty list, because it can’t validly return the address of the list head itself. Consuming code will then treat the null pointer as a signal value that nothing was found, and it will most certainly not try and dereference that value by treating it as an address.

Examples from the wild

These beasties are sometimes easy to recognise from a data dump alone. Here is what the list head for an empty list looks like, in this case from the very CBatch instance I was looking at while researching last week’s post:

00000000`38773408  00000000`38773408
00000000`38773410  00000000`38773408

Two pointer-sized members, the first one pointing to itself, and the second also pointing to the first, or strictly speaking to the pair starting with the first.

Then we have a list with a single entry, which is also easy to recognise by sight: both flink and blink of the list head point to the same list entry, and both flink and blink of that list entry point back to the list head. Interestingly, from those two bits of information alone, it’s impossible to tell which is the list head and which is the list item.

Here is the listhead within a CSession instance pointing at the list item for the single CBatch within that session:

00000000`38772f40  00000000`38773378
00000000`38772f48  00000000`38773378

And on the list entry side of the equation, here is the very start of that CBatch instance, with the list entry starting at the second member:

00000000`38773370  00007ffd`61d17b80 sqllang!CBatch::`vftable
00000000`38773378  00000000`38772f40
00000000`38773380  00000000`38772f40

This is a lovely textbook example. Firstly the vftable symbol confirms that we’re dealing with a CBatch, but the list entry starts at offset 8 and not 0. What this means is that one gets at the CBatch instance linked into this particular list by deducting 8 from the list entry’s address.

One way to cause a stack dump

In my previous post, I referenced one of Kendra Little’s Dear SQL DBA episodes, where she demonstrates a stack dump. The exception that led to the stack dump in question was a particular variation on a null reference:

Access violation occurred writing address 0000000000000008

Now I must admit that I didn’t get around to running Kendra’s repro and checking for myself, but this looks like a slam dunk for a dereference problem on a linked list while simultaneously demonstrating a clever safety convention.

Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.

Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.

Finding a dereference of address zero doesn’t suggest much in itself – zeroes can come from lots of sources. But the address 8 is far more evocative. If you are doing linked list traversal, the normal direction is through blink, which is accessed by adding eight to the address of the list item. And let’s say that you got the value zero as the address of your list item, but didn’t check the value before doing that dereference, you’d be dereferencing address 8. Hey, there can be other ways to get to address 8, but this is a strong candidate.

I should of course emphasise that diagnostic code such as DBCC PAGE jumps through hoops to avoid standard safety features like taking out locks while remaining “safe enough”, so finding a null pointer bug in such an area is just a reinforcement of the idea that this is unsupported, undocumented, and doesn’t necessarily live by the platinum standards expected in the rest of the product.

Lurking bugs that don’t bite

Ah, the thrill of the chase. There is a perverse delight in reading through a bit of disassembly and finding that logical hole where a safety check was only half-done. Here is an example from the beginning of a function (which will remain unnamed) to the point where we have a null-reference bug – I added simple labels and ASCII-art arrows for control flow.

         mov  rax,qword ptr [rcx+58h]
         add  rcx,50h
         cmp  rax,rcx
 +------ je   *1*
 |
 |       text rax,rax ; is rax == 0?
 +------ je   *1*
 |
 |       add  rax, 0FFFFFFFFFFFFFFF8h ; i.e. subtract 8
 |    +- jmp  *2*
 v    |
*1*   |  xor  eax,eax ; i.e. mov rax,0
      v
     *2* cmp  dword ptr [rax+1ACh], r8d ; <-- may dereference address 1AC

It's all in the last two instructions. A path exists where rax may contain the value 0, and while you'd normally expect the zero-check before the dereference, that zero-check is missing.

Now unlike the DBCC PAGE example, which has been known to go wrong, this is in a "production" function which doesn't appear to blow up all the time, i.e. the preconditions for the bug coming to light simply aren't there. In practical terms, at the point this function is called, clearly the list in question is never empty. Does that kind of code pattern even count as a bug? I guess the only thing that makes it stand out is the acknowledgement that we may find rax to be zero, or even force it to zero, right before the dereference. But this is probably inherited from an inlined function, and it is only in the disassembly that we see the juxtaposition so clearly.

For my part, I shall continue to delight in finding oddities like this. But at the same time it reminds me that I have a lot to be humble about: if I can spot these little things in the code produced by seriously good programmers, just think of the horrors they would be able to find in my code.