Where do SQL Server tasks come from?

In my previous post I discussed the unsung early years of a SQLOS thread. Now it’s all very well knowing that threads extend themselves with SystemThreads, don Worker outfits, and execute SOS_Tasks, but I keep glossing over where tasks come from.

Gloss no more.

SOS_Task::Param – the seed of a work request

This nested struct only shows up by name in one spot on stack traces, when we see the below call as the crossover point between thread setup boilerplate and the meat of the task:

sqldk!SOS_Task::Param::Execute

This is a simple piece of abstraction, calling a previously specified function pointer with a previously specified parameter value. Those two values are the core data members of the SOS_Task::Param. In itself, such a pair is similar to the pair one would pass to a thread creation call like CreateRemoteThreadEx(). However, among the other data members we find SQLOS-specific things like a pointer to a resource group, the XEvent version of task identity and – if applicable – a parent task. These do pad out the picture a bit.

Now the SOS_Task::Param is a comparatively small structure, but it contains the core of what a task is about. Without it, an SOS_Task is a bag of runtime state without a defined mission, and isn’t “executable”. It’s just a typed block of memory: 984 bytes in the case of SQL Server 2016 SP1, of which the Param takes up 68 bytes.

Ultimately, getting a task run translates into this: Specify what you want done by phrasing it as an SOS_Task::Param, and get “someone” to create and enqueue a task based on it to a WorkDispatcher.

Which came first, the task or the task?

I have lately been spending some delightful hours unpicking the SQL Server boot process. For now, let’s stick with the simple observation that some tasks are started as part of booting. One example would be the (or nowadays “a”) logwriter task, whose early call stack looks like this:

sqlmin!SQLServerLogMgr::LogWriter
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

Everything up to RunTask() is the thread starting up, getting married to a worker, and dequeuing a task from the WorkDispatcher. Apart from it being enqueued by the boot process, and the task never really completing, there is nothing special about this core piece of system infrastructure. It is just another task, one defined through an SOS_Task::Param with sqlmin!SQLServerLogMgr::LogWriter as its function pointer.

Here is one that is actually a bit more special:

sqldk!SOS_Node::ListenOnIOCompletionPort
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

This is of course the network listener that makes it possible for us to talk to SQL Server whatsoever. It is a bit unusual in using preemptive OS scheduling rather than the cooperative SQLOS flavour, but this only affects its waiting behaviour. What makes it very special is that this is the queen bee: it lays the eggs from which other tasks hatch.

The SQL nursery

The above system tasks wear their purpose on their sleeves, because the function pointer in the SOS_Task::Param is, well, to the point. The tasks that run user queries are more abstract, because the I/O completion port listener can’t be bothered with understanding much beyond the rudiments of reading network packets – it certainly can’t be mucking about with fluent TDS skills, SQL parsing and compilation, or permission checks.

So what it does is to enqueue a task that speaks TDS well, pointing it to a bunch of bytes, and sending it on its way. Here is an example of such a dispatch, which shows the “input” side of the WorkDispatcher:

sqldk!WorkDispatcher::EnqueueTask
sqldk!SOS_Node::EnqueueTaskDirectInternal
sqllang!CNetConnection::EnqueuePacket
sqllang!TDSSNIClient::ReadHandler
sqllang!SNIReadDone
sqldk!SOS_Node::ListenOnIOCompletionPort
...

At this point, the Resource Group classifier function will have been run, and the target SOS_Node will have been chosen. Its EnqueueTaskDirectInternal() method picks a suitable SOS_Scheduler in the node, instantiates an SOS_Task based on the work specification in the SOS_Task::Param, and the WorkDispatcher within that scheduler is then asked to accept the task for immediate or eventual dispatching. Side note: it looks as if SOS_Scheduler::EnqueueTask() is inlined within EnqueueTaskDirectInternal, so we can imagine it being in the call stack before WorkDispatcher::EnqueueTask().

In simple terms, the network listener derives a resource group and chooses a node, the node chooses a scheduler, and the scheduler delegates the enqueue to its WorkDispatcher component.

Ideally an idle worker will have been found and assigned to the task. If so, that worker is now put on its scheduler’s runnable queue by the network listener thread, taking it out of the WorkDispatcher. If no worker is available, the task is still enqueued, but in the WorkDispatcher’s wallflower list of workerless tasks, giving us our beloved THREADPOOL wait.

Let’s however assume that we have a worker. Over on the other side, its underlying thread is still sleeping, but tied to a worker which is now on its scheduler’s runnable queue. It will eventually be woken up to start its assigned task in the normal course of its scheduler’s context switching operations. And what screenplay was thrown over the wall for it to read?

sqllang!CSQLSource::Execute
sqllang!process_request
sqllang!process_commands_internal
sqllang!process_messages
sqldk!SOS_Task::Param::Execute
...

Here we get an extra level of abstraction. The instantiated task runs the TDS parser, which creates and invokes specialised classes based on the contents of the TDS packet. In this case we got a SQL batch request, so a CSQLSource was instantiated to encapsulate the text of the request and set to work on it, taking us down the road of SQL parsing, compilation, optimisation and execution.

Once parallelism comes into play, such a task could instantiate yet more tasks, but that is a story for another day.

Conclusion

From a high enough view, this is quite elegant and simple. A handful of tasks are created to get SQL Server in motion; some of these have the ability to create other tasks, and the machine just chugs on executing all of them.

To tie this up with the previous post, here is the complete lifecycle of a SQLOS thread:

The SQLOS thread lifecycle

If you haven’t lately read Remus Rusanu’s excellent Understanding how SQL Server executes a query, do pay it a visit. This post is intended to complement its opening part, although I’m placing more emphasis on the underlying classes and methods recognisable from stack traces.

Since we’ve now covered the thread lifecycle from birth into its working life, next up I’ll go back and look at what a wait feels like from the worker angle.

The early life of a SQLOS thread

So I have checked off that bucket list item of speaking at a SQLSaturday. In the process of getting my act together, I learned a thing or two about the undocumented youth of SQLOS threads, between birth and entering the workplace. And you didn’t, which seems unfair.

We normally see stack traces while looking at the top of the stack, typically during a wait, at which point the thread is wearing full worker garb, and has been executing a task for a while. Let’s today reflect on those happy times when our thread was in diapers.

Conception and birth

Threads are born because the system as a whole decides they are cute and it wants more of them. The decision is made in the SystemThreadDispatcher, which is a component of a SchedulerManager, itself a component of an SOS_Node, aka CPU node.

We can simplify this: Threads are born into nodes.

Now a thread isn’t created at the moment that it is needed, and it isn’t legally able to perform work right from birth. The idea is to have a reasonable number of grown-up threads in the population, ready to be put to work at short notice. We are just at the first step.

Thread creation is done through a CreateRemoteThreadEx() call, within the function SystemThreadDispatcher::CreateNewSysThreadIfRequired(), which is invoked as a side task by another thread when it leaves the pool of unemployed threads.

The function pointer passed in as thread entry point is SchedulerManager::ThreadEntryPoint(), and the parameter that will be passed to that entry point is a pointer to the target node’s SchedulerManager. In other words, when the function runs, it will be a completely normal instance method call on that SchedulerManager, parameterless except for the This pointer. And since the SchedulerManager knows what node it belongs to, our newborn thread will instinctively be able to crawl into the arms of the maternal SOS_Node.

But I am getting ahead of myself here. Before even running that entry point function, the thread creation callback registered during SQLOS boot (SystemThread::DllMainCallback()) is invoked by the OS runtime in the context of the new thread. And that gives us a SystemThread associated with the thread, meaning it has – among other things – the Windows event that will let it participate in SQLOS context switching.

So the very first thing our newborn thread, cosily wrapped up in a SystemThread, does is to enlist itself in the parent SOS_Node – and by “enlist” I literally mean adding itself to a linked list. Strictly speaking, it enlists the SystemThread, which is now SQLOS’s proxy to the thread: whenever we want to refer to a thread, we do so through a pointer to its SystemThread. Looking at it from one direction, the SystemThread contains a handle to the thread. From the other direction, any running code can find the ambient SystemThread through a thread-local storage lookup.

As it stands, the thread can’t do much useful in polite company yet, other than suspend itself. SystemThread::Suspend() is the most rudimentary of scheduling functions, just calling WaitForSingleObject() on the thread’s personal Event.

When a thread loves a Worker

ThreadEntryPoint now calls SystemThreadDispatcher::ProcessWorker() on the SOS_Node’s SystemThreadDispatcher, i.e. the one within the current SchedulerManager.

The SystemThreadDispatcher shows itself to be a dating agency, keeping separate linked lists of unattached SystemThreads and idle Workers, and pairing them off according to supply and demand.

From the viewpoint of the thread running it, ProcessWorker() means “find me an unattached Worker so we can change the world together”. If there isn’t a spare Worker at this moment though, the thread goes to sleep through the aforementioned SystemThread::Suspend() call, only to be woken up when a fresh young Worker arrives on the dating scene. This moment is celebrated by ProcessWorker() moving on to call SystemThread::RunWorker()

Pairing the two up includes the SystemThread swearing a vow of loyalty to the Worker’s associated SOS_Scheduler. Up to this point, the thread was “in the SystemThreadDispatcher” and associated with an SOS_Node, but not a specific scheduler. From here onwards, the SystemThread and Worker are fully part of the family of workers for that scheduler.

We now move on to SchedulerManager::WorkerEntryPoint() which initialises the Worker, e.g. setting timestamps and the first quantum target, before invoking the first SOS_Scheduler method, ProcessTasks().

Interesting aside regarding waits: The suspension of a thread within the SystemThreadDispatcher isn’t a measured wait, because waiting is measured at the level of workers and schedulers, neither of which have yet entered the picture.

Your task, should you choose to accept it…

Moving into the family home of the Worker, the first stop within ProcessTasks() is a courtesy call on that scheduler’s WorkDispatcher. If the SystemThreadDispatcher was a dating agency for Workers and SystemThreads, the WorkDispatcher is an employment agency for those couples, pairing them up with jobs in the form of SOS_Tasks.

Entering the WorkDispatcher initially, the pair generally wouldn’t find a pending tasks. At this point they (though the pair is now just viewed as a Worker by the scheduler) are put to sleep through a full-fledged scheduler method, SOS_Scheduler::SuspendNonPreemptive(). This means that the Worker ends up on a suspend queue, specifically the WorkDispatcher’s list of idle workers.

When a task is lobbed over the wall into the scheduler from elsewhere, the WorkDispatcher will assign it to an idle Worker, and the worker made runnable. In due course it will be chosen as the next worker to run, continuing with the ProcessTasks() call to run the specific function specified through the task: this is SOS_Scheduler:RunTask() into SOS_Task::Param::Execute().

The task gets executed, through all the joys and heartaches of taskhood, and if parallelism is involved, child tasks may even be spawned. Ultimately though, the task will be done, and the pair return to the WorkDispatcher’s idle list, blocked in SOS_Scheduler::ProcessTasks() but ready for the next challenge.

You want pictures? Sure.

The relationship between SOS_Node, SOS_Scheduler and their dispatching components

(For the sake of honesty, I should note that a node actually has separate SchedulerManagers for normal and hidden schedulers.)

Up next

This takes care of how tasks, workers, and threads interact – at least in thread mode, which is the only mode we probably care about. In the next blog post I will look into how tasks actually get instantiated.

Speaking at SQL Saturday Manchester

I was surprised to find it within me to submit a session abstract to SQL Saturday 645 in Manchester. Not to mention delighted when my session on SQLOS scheduling got chosen.

This will be my first time speaking at a SQL Saturday, so I’m pretty excited about the experience. The plan is to cover some fundamentals, revisit a few things I have blogged about, and add bits that have probably never been covered anywhere before.

What can possibly go wrong if I get up on stage? Here endeth the SQL section of this blog post.

This can possibly go wrong when I get up on stage

In a previous life, I spent 7.5 years as a cruise ship musician, first playing in a lounge band, then moving to the show band. I ended up as music director, leading the show band from the piano, and vaguely responsible for supervising and scheduling all music around the ship. Fortunately the latter mostly involved letting people do what they’re good at, and I wasn’t in a position of any responsibility during the below incident.

Back in the mists of early 2001, I worked on a ship that was doing a series of musical theme cruises. One particular cruise was really juicy: the star attraction was a Frank Sinatra impersonator, and our normal 7-piece band got expanded to a 20-piece big band for the occasion. The act was great, the stage was satisfyingly full, and those classic arrangements were a joy to play.

The full show early in the week was over, and regular entertainment went on as usual, except that the star singer was doing a short segment in a variety show on the last night: a magician to open, the singer in the prime spot, and then a short production number from the dance cast.

In line with dinner, which was split between two seatings, shows were done twice each night, at 20:30 and then again at 22:15. Except for the last night, when the second seating had an early 18:45 show before their meal.

Somehow it never occurred to anyone to explicitly mention the flipped showtimes to the headliner. No big deal when we didn’t see his face before curtain up, because he still had twenty minutes to get backstage. But around this time we started asking whether he knows about the early show. Frantic phone calls were made, and towards the end of the magician’s act someone managed to get him on the phone after he just stepped out of the shower.

With one minute to go, and the hope that he can get dressed and ready in three more, the cruise director shouted at the band to play something – anything – while we figure out what to do. The cast was roused from their backstage lazing and told they might have to go on in two.

It was around this point where things completely fell apart. Like any production deployment, stage management works on reasonable certainty and a pre-arranged set of cues. Improvisation rarely has a happy ending.

Keeping a token brave face, the cruise director did a hand-over to the band, who figured we may as well play “A-train” and see where it goes. And the curtain opened up as a half-naked dancer frantically ran across the stage, trailing bits of costume.

The singer turned up a minute after the production cast finished their bit, culminating with the stage being littered with the disgorged content of streamer cannons and unfit for further use. The cruise director, having completely lost the erstwhile brave face, had by then walked on to tell the audience that this was indeed it, and could they please swallow their disappointment as they fill in their feedback forms rating the week’s experience.

The curtain resolutely stayed down. Muffled swearing could be heard backstage. Fingers pointed impotently.

The good news

None of this will happen at SQL Saturday Manchester. I have learned about the dangers of confetti cannons, I will turn up on time, and I probably won’t be in drag.

But if you are interested in the gubbins of context switching, what a SystemThreadDispatcher really does for a living, and you don’t live too far away, do drop by.