SQL Agent and the hairiest Dateadd in town

Like many good things, it started with a #sqlhelp question, this time from Todd Histed (b | t):

Including a bit of back and forth, the issue boiled down to this:

  • A job is scheduled every minute and takes pretty much exactly 55 seconds to run, due to an included WAITFOR delay of 55 seconds
  • On SQL Server 2014, this did reliably run every minute
  • On SQL Server 2016, it skips minutes, effectively running only every two minutes
  • Reducing the WAITFOR to 30 seconds fixed the issue

Now I get unreasonably excited both by date manipulation and by scheduling issues, so this firmly grabbed my attention. After a bit of unwarranted thinking out loud, I figured it is time to go and do some spelunking.

TL;DR

I understand: you’re a busy elf. Here is the quick summary of what I found.

Disclaimer: the different schedule types, e.g. daily vs monthly, have different special case logic; I’m just concerning myself with (intra-) daily schedules here.

Upon completion of a job, the next run time is calculated based on the last scheduled time plus the schedule interval. However, allowance is made for the edge cases where the completed invocation overruns into the next start time. In such a situation, there isn’t a “catch-up” run; instead, the schedule is advanced iteratively until it reaches a future point in time.

However, 2016 introduces a new twist. When applying the “is the proposed next schedule time after Now()?” check, it adds five seconds to Now(). In other words, the question becomes “Is the proposed next schedule time more than five seconds in the future?”

This looks more like a bug fix than a bug. Clearly cases were found where erroneous schedule invocations were happening, perhaps caused by some combination of job duration, high schedule frequency, and clock adjustments. Whatever the story, this is the behaviour we now live with going forward.

Scheduling is a notoriously tricky subject, because we tend to have an intuitive understanding of how things “ought to work,” while merrily ignoring a million little edge cases that keep developers on their toes. Just in case you needed proof.

Spelunking time

In the course of researching this, a few more fun details emerged.

Please send two mongooses
Please send two mongeese
Please send a mongoose. While you’re at it, send another.

When date and time manipulation is needed, where do you turn to? If you’re SQL Agent, this involves custom structures and functions living in SQLSVC.dll. Internally it works with date and time structures with separate 32-bit integer components for the component parts like seconds or months, but translates between this and the funky “packed decimal” pair we see persisted in tables like sysjobhistory, e.g. the integer values 20180321 and 200000 for 20h00 on 21 March 2018.

What I find really cute is the simple approach to implementing Dateadd. We get two functions here, QScheduleDateModify() and QScheduleTimeModify() to operate on such a date/time structure pair. These will only increment or decrement the structure by one tick of the specified unit, so if you want to add six hours to a time, you call QScheduleTimeModify six times with parameter values indicating Increment and Hours.

This may seem primitive, but I suppose it has the advantage of straightforward implementation: as long as overflow of your chosen unit is dealt with correctly, you never need to deal with weird questions like “how many year boundaries are crossed when I add 23,655,101 seconds to the current date?” It isn’t a generic date library, but then it doesn’t need to be. And by definition, normal calculations for schedule times don’t involve a large number of increments, so there isn’t a case for hyper-optimisation.

At this point you probably expect me to whip out Windbg. Here you go then.

Having created a job that was due to fire, I attached to the SQL Agent process, put a breakpoint on SQLSVC!QScheduleCalcNextOccurrence and waited. At the appointed hour, the job ran and the breakpoint was hit as the schedule’s next occurrence needed to get calculated.

Breakpoint 0 hit
SQLSVC!QScheduleCalcNextOccurrence:
00007fff`0c0d3e20 48894c2408      mov     qword ptr [rsp+8],rcx ss:0000008c`e460c240=0000008ce460c274

Just for orientation, let’s look at the call stack and the state of the CPU registers upon entry:

0:022> k
 # Child-SP          RetAddr           Call Site
00 0000008c`e460c238 00007ff7`51595990 SQLSVC!QScheduleCalcNextOccurrence
01 0000008c`e460c240 00007ff7`51595459 SQLAGENT!SetScheduleNextRunDate+0x1f0
02 0000008c`e460c310 00007ff7`5158f992 SQLAGENT!SetJobNextRunDate+0x1a9
03 0000008c`e460c7f0 00007fff`18514f87 SQLAGENT!JobManager+0xfe2
04 0000008c`e460fc70 00007fff`1851512e MSVCR120!_callthreadstartex+0x17 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
05 0000008c`e460fca0 00007fff`261913d2 MSVCR120!_threadstartex+0x102 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
06 0000008c`e460fcd0 00007fff`266754e4 KERNEL32!BaseThreadInitThunk+0x22
07 0000008c`e460fd00 00000000`00000000 ntdll!RtlUserThreadStart+0x34
0:022> r
rax=000000000002ed9c rbx=0000000000000000 rcx=0000008ce460c288
rdx=000000000133c6b1 rsi=0000008cc8b23380 rdi=0000008ce428c278
rip=00007fff0c0d3e20 rsp=0000008ce460c238 rbp=0000008ce460c2a9
 r8=000000000002eda1  r9=0000000000000000 r10=00000000000118a0
r11=0000000000000011 r12=0000000000000001 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe cy
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000283
SQLSVC!QScheduleCalcNextOccurrence:
00007fff`0c0d3e20 48894c2408      mov     qword ptr [rsp+8],rcx ss:0000008c`e460c240=0000008ce460c274

Now whether this was C or C++ source code, the Windows x64 ABI tells us that the first parameter passed into the function lives in rcx (highlighted above). I’ll dump a chunk of memory starting at the address it points to, displaying it as 32-bit doublewords:

0:022> dd 8ce460c288 l0x20
0000008c`e460c288  00000004 00000001 00000004 00000005
0000008c`e460c298  00000000 00000000 0133c6b1 05f5bebf
0000008c`e460c2a8  00000190 000399b7 0133c6b1 0002ed9c
0000008c`e460c2b8  0133c6b1 0002ed9c 56b04598 00004634
0000008c`e460c2c8  c8b23390 0000008c 00000000 00000000
0000008c`e460c2d8  00000000 00000000 e4280400 0000008c
0000008c`e460c2e8  00000000 00000000 00000000 00000000
0000008c`e460c2f8  e428c278 0000008c e460c410 0000008c

I’ll push this along – the numbers highlighted in red are of interest. Windbg will oblige us by translating the hexadecimal representation into decimal if we ask nicely:

0:022> ? 133c6b1
Evaluate expression: 20170417 = 00000000`0133c6b1
0:022> ? 0x2ed9c
Evaluate expression: 191900 = 00000000`0002ed9c
0:022> ? 05f5bebf
Evaluate expression: 99991231 = 00000000`05f5bebf

Well, those kinds of numbers look more familiar, especially if you consider that the schedule fired at 19h19 on 17 April.

The rest of the experiment consisted of deliberately keeping the process frozen in the debugger for ten minutes before letting it run again; because this job was scheduled to run every five minutes, this guaranteed missing at least one run. As expected, and confirmed through a breakpoint on QScheduleTimeModify(), we went through three sets of five 1-minute increments to reach a next invocation time that was in the future.

Of course, some intent peering into assembly language was also involved, but I’ll not bore you further.

Where next?

I have a hunch that I’ll be revisiting SQL Agent at some point in the future again. However, the “Context in perspective” series is still hanging in the air, and will be picked up again. Let’s wait and see.

Scheduler stories: The myth of the waiter list

‘Tis the season to be controversial, so let’s take a stroll down memory lane to Ken Henderson’s classic Inside the SQL Server 2000 User Mode Scheduler:

The waiter list maintains a list of workers waiting on a resource. When a UMS worker requests a resource owned by another worker, it puts itself on the waiter list for the resource and enters an infinite wait state for its associated event object. When the worker that owns the resource is ready to release it, it is responsible for scanning the list of workers waiting on the resource and moving them to the runnable list, as appropriate. And when it hits a yield point, it is responsible for setting the event of the first worker on the runnable list so that the worker can run. This means that when a worker frees up a resource, it may well undertake the entirety of the task of moving those workers that were waiting on the resource from the waiter list to the runnable list and signaling one of them to run.

John Tenniel's White Rabbit from "Alice in Wonderland"

The lists behind the legend

I have not gone as far as opening up my rusty copy of SQL Server 2000 to see how Ken’s description fits in there, but I am now pretty certain that the above quote has transmuted over the years into a common misunderstanding about SQLOS scheduling mechanics.

Now nothing Ken said is untrue or particularly out of date. It is just that we often hear “the waiter list” (by implication handling resource waits) described as an attribute of a scheduler, which is not the case.

Let’s revisit when the scheduler code runs, and what it does:

  • A worker will yield, either because it needs to wait for a resource, or because it is eaten up with guilt over reaching the end of its allotted quantum.
  • The act of yielding means that scheduler code (methods on the SOS_Scheduler class) gets invoked.
  • After a bit of housekeeping for the common good of all workers sharing the scheduler, control is transferred back to a worker to do its thing – this may even be the same worker who originally yielded.
  • The housekeeping consists of checking for aborted tasks, processing pending I/Os, and checking for I/O completions and timer list timeouts.

The single most important list that a scheduler owns is the collection of runnable workers, that is, the subset of workers belonging to this scheduler who are not waiting for anything other than CPU. This has variously been described as a list and a queue; I shall be using the term “runnable queue” by convention, but be aware that it is a data structure that has changed over the years and isn’t a simple queue.

A scheduler has one piece of “creative” interaction with this runnable queue, and it comes with only two variables:

  • When a context switch is requested by an outgoing worker owning the scheduler, the scheduler code has to choose which one of potentially multiple workers is going to be its next owner.
  • The incoming worker gets given a quantum expiry date, by which time it is expected to yield.

Core scheduler code running during context switching only dequeues runnable workers, and at such moments a given scheduler only looks at its own runnable queue. In contrast, code running all over the place, including in the context of workers belonging to other schedulers, may enqueue workers on to the runnable queue.

Time for a simple diagram:

Someone to watch over me

What I’m trying to get across here is that each instance of a waitable resource has its own wait list, and the scheduler has no interest in this, because a scheduler only acts upon its runnable queue. Seen from a different angle, once a worker is waiting on a resource, its scheduler doesn’t care, because it can’t and won’t manage the waiting logic of something like a latch. This splits the responsibilities neatly in two:

  • The synchronisation class guarding a resource (which inevitably will be built upon an EventInternal) stands watch over all the workers queueing up to have a ride on that resource. The act of granting access to a worker involves moving the worker from the wait list and getting it on to the runnable queue of that scheduler’s worker, and this is achieved by the synchronisation class.
  • The scheduler, in turn, doesn’t decide who is runnable, but it does get to pick which of the runnable workers (however they reached that state) runs next.

The I/O and timer lists

There are however two cases where the scheduler decides to make a worker runnable in the normal course of events. One is when a worker was waiting on I/O to complete, where periodic scheduler housekeeping is the mechanism by which SQLOS takes note of the I/O completion. At this point some workers who were on the I/O list may find themselves moved to the runnable queue just before the next worker is picked to be granted ownership of the scheduler – the lucky winner might be one of these workers, or it may be someone else who has been runnable for a while.

The second, and actually more interesting case, is the timer list. In its simplest use case, this is where you will find workers executing T-SQL WAITFOR statements. The list is neatly ordered by timer expiry date, and at each invocation of the scheduler context-switch housekeeping, workers whose timer expiry dates have now passed will be moved to the runnable queue.

What makes a timer list particularly interesting though, is when it implements a resource wait timeout, for instance a lock timeout. In this scenario we actually have a worker waiting on two things simultaneously: a resource and a timer. If the resource is acquired before the timer expires, all is good: the worker goes on to the runnable queue, and upon being woken up it finds a thumbs-up as the return value of its resource acquisition call.

However, should the timer expire before the resource has been acquired, the scheduler will actually venture forth and take the worker off that waiter list before making it runnable and setting an error return value as wake-up call. Think of it as every teenager’s worst nightmare: you’re not home by curfew, so Mom comes to your dodgy party to drag your sorry ass home. And then you wake up with a hangover and note stuck to your forehead reading “No cake for you”.

Whither next?

I tried to keep this comparatively high-level, but might take a nice little detour into the WorkerTimerRequest some day if time permits.

There you have it. Be home on time and have a thread-safe festive season.

Scheduler stories: Going Preemptive

SQLOS is built upon the idea of cooperative, AKA non-preemptive, scheduling: out of any given collection of threads belonging to a scheduler, only one will own the scheduler at a given moment. To the degree that these cooperative threads represent the only work done by the underlying CPU, this means that the thread owning the scheduler really owns the CPU. Of course, a CPU will occasionally get side-tracked into doing other work, so the SQLOS scheduler as “virtual CPU” only represents a chunk of the real CPU, but we live in the expectation that this is a suitably large chunk.

John Tenniel's White Rabbit from "Alice in Wonderland"

It is understandable when such competition for the CPU comes from outside of SQL Server, but it can also be instigated from within SQL Server itself. This is the world of preemptive – or if you prefer, antisocial – scheduling.
Continue reading “Scheduler stories: Going Preemptive”

Scheduler stories: Interacting with the Windows scheduler

In the previous post, The joy of fiber mode, we saw how a fiber mode scheduler firmly controls which worker runs on a thread at a given moment. While it can’t positively ensure that the thread in question remains running all the time, the soul of the scheduler lives in that one thread, and as long as the thread runs, the scheduler gets invoked by its team of fiber workers, dispatching them in an appropriate order.
Continue reading “Scheduler stories: Interacting with the Windows scheduler”

Scheduler stories: The joy of fiber mode

Probably the funniest thing I had ever seen on stage was a two-hander called “Frank ‘n Stein”. It’s a telling of the classic Frankenstein story, with the physical comedy of two actors having to rotate continuously between a large number of roles, including a whole crowd chasing the monster. This was all made possible by them never leaving the stage, but instead changing characters in front of the audience, using only rudimentary props to help differentiate the characters.

If this is the only thing you remember about fiber mode scheduling, it should see you through.
Continue reading “Scheduler stories: The joy of fiber mode”

Scheduler Stories: When does your scheduler run?

I am planning to burn a fair number of cycles on SQLOS scheduling internals for the foreseeable future, and with some luck, this turns into an interesting series. OS scheduling is already a subject that belongs “on the other side of the looking glass”, and this only gets more interesting when we look at user-mode SOS_Scheduler scheduling built on top of it.

If I don’t specifically mention a version, my frame of reference is SQL Server 2014. Yes, things changed since then, but the 2012-2014 scheduler is a good starting point, and the fundamental mechanisms I’ll initially cover have changed very little since the User Mode Scheduler (UMS) of SQL Server 7.0.
Continue reading “Scheduler Stories: When does your scheduler run?”

The Thread in the Head: Dr Sqlos explains context switching

The thread in the head

Oh, the things you can schedule,
the games you can play!
If you keep a clear head, you’ll
be well on your way.

You start with a blank sheet,
three nuts and a bolt,
a strong sense of fairness,
a large can of Jolt.
Continue reading “The Thread in the Head: Dr Sqlos explains context switching”

King Arthur, Energizer bunnies, and the search for the SQLOS scheduler

ARTHUR: Who lives in that castle?
WOMAN: No one lives there.
ARTHUR: Then who is your lord?
WOMAN: We don’t have a lord.
ARTHUR: What?
DENNIS: I told you. We’re an anarcho-syndicalist commune. We take it in turns to act as a sort of executive officer for the week.

(from Monty Python and the Holy Grail)

Yeah, whatever. I want to hear about bunnies

Picture a world consisting of forty Energizer bunnies, grouped into four teams of ten. Each team has one battery between them, and the main rule of the game is that each bunny may only use the battery for a little while before transferring it to a teammate. There is no way for a sleeping player to be woken up except by being passed the battery, and each battery strictly stays within one team.
Continue reading “King Arthur, Energizer bunnies, and the search for the SQLOS scheduler”

Unsung SQLOS: the SystemThread

SystemThread, a class within sqldk.dll, can be considered to be at the root of SQLOS’s scheduling capabilities. While it doesn’t expose much obviously exciting functionality, it encapsulates a lot of the state that is necessary to give a thread a sense of self in SQLOS, serving as the beacon for any code to find its way to an associated SQLOS scheduler etc. I won’t go into much of the SQLOS object hierarchy here, but suffice it to say that everything becomes derivable by knowing one’s SystemThread. As such, this class jumps the gap between a Windows thread and the object-oriented SQLOS.
Continue reading “Unsung SQLOS: the SystemThread”