The early life of a SQLOS thread

So I have checked off that bucket list item of speaking at a SQLSaturday. In the process of getting my act together, I learned a thing or two about the undocumented youth of SQLOS threads, between birth and entering the workplace. And you didn’t, which seems unfair.

We normally see stack traces while looking at the top of the stack, typically during a wait, at which point the thread is wearing full worker garb, and has been executing a task for a while. Let’s today reflect on those happy times when our thread was in diapers.

Conception and birth

Threads are born because the system as a whole decides they are cute and it wants more of them. The decision is made in the SystemThreadDispatcher, which is a component of a SchedulerManager, itself a component of an SOS_Node, aka CPU node.

We can simplify this: Threads are born into nodes.

Now a thread isn’t created at the moment that it is needed, and it isn’t legally able to perform work right from birth. The idea is to have a reasonable number of grown-up threads in the population, ready to be put to work at short notice. We are just at the first step.

Thread creation is done through a CreateRemoteThreadEx() call, within the function SystemThreadDispatcher::CreateNewSysThreadIfRequired(), which is invoked as a side task by another thread when it leaves the pool of unemployed threads.

The function pointer passed in as thread entry point is SchedulerManager::ThreadEntryPoint(), and the parameter that will be passed to that entry point is a pointer to the target node’s SchedulerManager. In other words, when the function runs, it will be a completely normal instance method call on that SchedulerManager, parameterless except for the This pointer. And since the SchedulerManager knows what node it belongs to, our newborn thread will instinctively be able to crawl into the arms of the maternal SOS_Node.

But I am getting ahead of myself here. Before even running that entry point function, the thread creation callback registered during SQLOS boot (SystemThread::DllMainCallback()) is invoked by the OS runtime in the context of the new thread. And that gives us a SystemThread associated with the thread, meaning it has – among other things – the Windows event that will let it participate in SQLOS context switching.

So the very first thing our newborn thread, cosily wrapped up in a SystemThread, does is to enlist itself in the parent SOS_Node – and by “enlist” I literally mean adding itself to a linked list. Strictly speaking, it enlists the SystemThread, which is now SQLOS’s proxy to the thread: whenever we want to refer to a thread, we do so through a pointer to its SystemThread. Looking at it from one direction, the SystemThread contains a handle to the thread. From the other direction, any running code can find the ambient SystemThread through a thread-local storage lookup.

As it stands, the thread can’t do much useful in polite company yet, other than suspend itself. SystemThread::Suspend() is the most rudimentary of scheduling functions, just calling WaitForSingleObject() on the thread’s personal Event.

When a thread loves a Worker

ThreadEntryPoint now calls SystemThreadDispatcher::ProcessWorker() on the SOS_Node’s SystemThreadDispatcher, i.e. the one within the current SchedulerManager.

The SystemThreadDispatcher shows itself to be a dating agency, keeping separate linked lists of unattached SystemThreads and idle Workers, and pairing them off according to supply and demand.

From the viewpoint of the thread running it, ProcessWorker() means “find me an unattached Worker so we can change the world together”. If there isn’t a spare Worker at this moment though, the thread goes to sleep through the aforementioned SystemThread::Suspend() call, only to be woken up when a fresh young Worker arrives on the dating scene. This moment is celebrated by ProcessWorker() moving on to call SystemThread::RunWorker()

Pairing the two up includes the SystemThread swearing a vow of loyalty to the Worker’s associated SOS_Scheduler. Up to this point, the thread was “in the SystemThreadDispatcher” and associated with an SOS_Node, but not a specific scheduler. From here onwards, the SystemThread and Worker are fully part of the family of workers for that scheduler.

We now move on to SchedulerManager::WorkerEntryPoint() which initialises the Worker, e.g. setting timestamps and the first quantum target, before invoking the first SOS_Scheduler method, ProcessTasks().

Interesting aside regarding waits: The suspension of a thread within the SystemThreadDispatcher isn’t a measured wait, because waiting is measured at the level of workers and schedulers, neither of which have yet entered the picture.

Your task, should you choose to accept it…

Moving into the family home of the Worker, the first stop within ProcessTasks() is a courtesy call on that scheduler’s WorkDispatcher. If the SystemThreadDispatcher was a dating agency for Workers and SystemThreads, the WorkDispatcher is an employment agency for those couples, pairing them up with jobs in the form of SOS_Tasks.

Entering the WorkDispatcher initially, the pair generally wouldn’t find a pending tasks. At this point they (though the pair is now just viewed as a Worker by the scheduler) are put to sleep through a full-fledged scheduler method, SOS_Scheduler::SuspendNonPreemptive(). This means that the Worker ends up on a suspend queue, specifically the WorkDispatcher’s list of idle workers.

When a task is lobbed over the wall into the scheduler from elsewhere, the WorkDispatcher will assign it to an idle Worker, and the worker made runnable. In due course it will be chosen as the next worker to run, continuing with the ProcessTasks() call to run the specific function specified through the task: this is SOS_Scheduler:RunTask() into SOS_Task::Param::Execute().

The task gets executed, through all the joys and heartaches of taskhood, and if parallelism is involved, child tasks may even be spawned. Ultimately though, the task will be done, and the pair return to the WorkDispatcher’s idle list, blocked in SOS_Scheduler::ProcessTasks() but ready for the next challenge.

You want pictures? Sure.

The relationship between SOS_Node, SOS_Scheduler and their dispatching components

(For the sake of honesty, I should note that a node actually has separate SchedulerManagers for normal and hidden schedulers.)

Up next

This takes care of how tasks, workers, and threads interact – at least in thread mode, which is the only mode we probably care about. In the next blog post I will look into how tasks actually get instantiated.