The early life of a SQLOS thread

So I have checked off that bucket list item of speaking at a SQLSaturday. In the process of getting my act together, I learned a thing or two about the undocumented youth of SQLOS threads, between birth and entering the workplace. And you didn’t, which seems unfair.

We normally see stack traces while looking at the top of the stack, typically during a wait, at which point the thread is wearing full worker garb, and has been executing a task for a while. Today, let’s reflect on those happy times when our thread was in diapers.

Conception and birth

Threads are born because the system as a whole decides they are cute and it wants more of them. The decision is made in the SystemThreadDispatcher, which is a component of a SchedulerManager, itself a component of an SOS_Node, aka CPU node.

We can simplify this: Threads are born into nodes.

Now a thread isn’t created at the moment that it is needed, and it isn’t legally able to perform work right from birth. The idea is to have a reasonable number of grown-up threads in the population, ready to be put to work at short notice. We are just at the first step.

Thread creation is done through a CreateRemoteThreadEx() call, within the function SystemThreadDispatcher::CreateNewSysThreadIfRequired(), which is invoked as a side task by another thread when it leaves the pool of unemployed threads.

The function pointer passed in as thread entry point is SchedulerManager::ThreadEntryPoint(), and the parameter that will be passed to that entry point is a pointer to the target node’s SchedulerManager. In other words, when the function runs, it will be a completely normal instance method call on that SchedulerManager, parameterless except for the This pointer. And since the SchedulerManager knows what node it belongs to, our newborn thread will instinctively be able to crawl into the arms of the maternal SOS_Node.
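To make the "entry point plus This pointer" idea concrete, here is a minimal Python sketch. It is only an analogy: the class and method names mirror the ones above, but the bodies, the `seen` list and the `create_thread_for()` helper are my own inventions for illustration, not anything taken from SQLOS.

```python
import threading


class SOSNode:
    """Toy stand-in for a CPU node; it owns one SchedulerManager."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.scheduler_manager = SchedulerManager(self)


class SchedulerManager:
    def __init__(self, node):
        self.node = node   # the manager knows which node it belongs to
        self.seen = []     # hypothetical: names of threads that ran the entry point

    def thread_entry_point(self):
        # 'self' plays the role of the This pointer: it is the only
        # argument the newborn thread receives.
        self.seen.append(threading.current_thread().name)
        return self.node


def create_thread_for(manager):
    # Hypothetical helper. The entry point is passed as a plain function and
    # the manager as its sole parameter, much like a native thread start
    # routine that receives a single pointer-sized argument.
    thread = threading.Thread(
        target=SchedulerManager.thread_entry_point,
        args=(manager,),
        name=f"sys-thread-node-{manager.node.node_id}",
    )
    thread.start()
    return thread
```

Calling `create_thread_for(node.scheduler_manager)` starts a thread whose first act is an ordinary method call on that manager, from which it can reach the owning node.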

But I am getting ahead of myself here. Before even running that entry point function, the thread creation callback registered during SQLOS boot (SystemThread::DllMainCallback()) is invoked by the OS runtime in the context of the new thread. And that gives us a SystemThread associated with the thread, meaning it has – among other things – the Windows event that will let it participate in SQLOS context switching.

So the very first thing our newborn thread, cosily wrapped up in a SystemThread, does is to enlist itself in the parent SOS_Node – and by “enlist” I literally mean adding itself to a linked list. Strictly speaking, it enlists the SystemThread, which is now SQLOS’s proxy to the thread: whenever we want to refer to a thread, we do so through a pointer to its SystemThread. Looking at it from one direction, the SystemThread contains a handle to the thread. From the other direction, any running code can find the ambient SystemThread through a thread-local storage lookup.
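The two-way relationship between a thread and its SystemThread can be sketched roughly as follows. This is an assumption-laden model: `threading.local()` stands in for thread-local storage, a Python list stands in for the node's linked list, and `on_thread_start()`, `current_system_thread()` and `Node.enlist()` are hypothetical names of my own.

```python
import threading

_tls = threading.local()   # stand-in for the thread-local storage slot


class SystemThread:
    def __init__(self, os_thread):
        self.os_thread = os_thread       # analogue of the thread handle
        self.event = threading.Event()   # analogue of the personal Windows event


class Node:
    def __init__(self):
        self._lock = threading.Lock()
        self.system_threads = []         # stand-in for the node's linked list

    def enlist(self, system_thread):
        with self._lock:
            self.system_threads.append(system_thread)


def on_thread_start():
    # Hypothetical analogue of the thread creation callback: it runs in the
    # context of the new thread and wraps it in a SystemThread.
    _tls.system_thread = SystemThread(threading.current_thread())


def current_system_thread():
    # Any code running on the thread can find its ambient SystemThread.
    return _tls.system_thread


def thread_entry_point(node):
    on_thread_start()
    node.enlist(current_system_thread())
```

From one direction, each enlisted SystemThread holds a reference to its thread; from the other, `current_system_thread()` gets from running code back to the proxy.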

As it stands, the thread can’t do much that is useful in polite company yet, other than suspend itself. SystemThread::Suspend() is the most rudimentary of scheduling functions, just calling WaitForSingleObject() on the thread’s personal Event.
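A rudimentary suspend of this kind might look like the sketch below, with `threading.Event` standing in for the Windows event. The `resume()` method and the auto-reset behaviour (clearing the event after waking) are assumptions of mine rather than anything confirmed about SystemThread.

```python
import threading


class SystemThread:
    def __init__(self):
        self._event = threading.Event()   # the thread's personal event

    def suspend(self):
        # Block the calling thread until somebody signals its event.
        self._event.wait()
        # Assumption: behave like an auto-reset event, so that the next
        # suspend() blocks again.
        self._event.clear()

    def resume(self):
        # Hypothetical counterpart, called from another thread.
        self._event.set()


def demo():
    system_thread = SystemThread()
    log = []

    def body():
        log.append("suspending")
        system_thread.suspend()
        log.append("resumed")

    thread = threading.Thread(target=body)
    thread.start()
    system_thread.resume()   # safe even if it arrives before suspend()
    thread.join(timeout=5)
    return log
```

Because the event latches, a signal that arrives before the thread actually goes to sleep is not lost.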

When a thread loves a Worker

ThreadEntryPoint now calls SystemThreadDispatcher::ProcessWorker() on the SOS_Node’s SystemThreadDispatcher, i.e. the one within the current SchedulerManager.

The SystemThreadDispatcher shows itself to be a dating agency, keeping separate linked lists of unattached SystemThreads and idle Workers, and pairing them off according to supply and demand.

From the viewpoint of the thread running it, ProcessWorker() means “find me an unattached Worker so we can change the world together”. If there isn’t a spare Worker at this moment though, the thread goes to sleep through the aforementioned SystemThread::Suspend() call, only to be woken up when a fresh young Worker arrives on the dating scene. This moment is celebrated by ProcessWorker() moving on to call SystemThread::RunWorker().

Pairing the two up includes the SystemThread swearing a vow of loyalty to the Worker’s associated SOS_Scheduler. Up to this point, the thread was “in the SystemThreadDispatcher” and associated with an SOS_Node, but not a specific scheduler. From here onwards, the SystemThread and Worker are fully part of the family of workers for that scheduler.
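The dating agency can be modelled in a few lines. Treat this as a sketch under stated assumptions: the two deques stand in for the linked lists, `enqueue_worker()` is a name I made up for "a Worker arrives on the dating scene", and the scheduler is reduced to an integer id.

```python
import threading
from collections import deque


class Worker:
    def __init__(self, scheduler_id):
        self.scheduler_id = scheduler_id   # the Worker already belongs to a scheduler
        self.system_thread = None


class SystemThread:
    def __init__(self, name):
        self.name = name
        self.event = threading.Event()
        self.worker = None
        self.scheduler_id = None           # unset while "in the dispatcher"


class SystemThreadDispatcher:
    def __init__(self):
        self._lock = threading.Lock()
        self.unattached_threads = deque()
        self.idle_workers = deque()

    def process_worker(self, system_thread):
        """Run by the thread itself: find me a Worker, sleeping if need be."""
        with self._lock:
            if self.idle_workers:
                system_thread.worker = self.idle_workers.popleft()
            else:
                self.unattached_threads.append(system_thread)
        if system_thread.worker is None:
            system_thread.event.wait()     # the rudimentary suspend
        # Analogue of RunWorker(): bind the pair and adopt the Worker's scheduler.
        worker = system_thread.worker
        worker.system_thread = system_thread
        system_thread.scheduler_id = worker.scheduler_id
        return worker

    def enqueue_worker(self, worker):
        """Hypothetical: a Worker shows up and wants a thread."""
        with self._lock:
            if self.unattached_threads:
                system_thread = self.unattached_threads.popleft()
                system_thread.worker = worker
                system_thread.event.set()  # wake the sleeping thread
            else:
                self.idle_workers.append(worker)
```

Whichever party arrives first waits on its list; the second arrival completes the pairing, and only then does the thread learn which scheduler it belongs to.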

We now move on to SchedulerManager::WorkerEntryPoint() which initialises the Worker, e.g. setting timestamps and the first quantum target, before invoking the first SOS_Scheduler method, ProcessTasks().

Interesting aside regarding waits: The suspension of a thread within the SystemThreadDispatcher isn’t a measured wait, because waiting is measured at the level of workers and schedulers, neither of which has yet entered the picture.

Your task, should you choose to accept it…

Moving into the family home of the Worker, the first stop within ProcessTasks() is a courtesy call on that scheduler’s WorkDispatcher. If the SystemThreadDispatcher was a dating agency for Workers and SystemThreads, the WorkDispatcher is an employment agency for those couples, pairing them up with jobs in the form of SOS_Tasks.

Entering the WorkDispatcher initially, the pair generally wouldn’t find any pending tasks. At this point they (though the pair is now just viewed as a Worker by the scheduler) are put to sleep through a full-fledged scheduler method, SOS_Scheduler::SuspendNonPreemptive(). This means that the Worker ends up on a suspend queue, specifically the WorkDispatcher’s list of idle workers.

When a task is lobbed over the wall into the scheduler from elsewhere, the WorkDispatcher will assign it to an idle Worker, and the worker is made runnable. In due course it will be chosen as the next worker to run, continuing with the ProcessTasks() call to run the specific function specified through the task: this is SOS_Scheduler::RunTask() into SOS_Task::Param::Execute().

The task gets executed, through all the joys and heartaches of taskhood, and if parallelism is involved, child tasks may even be spawned. Ultimately though, the task will be done, and the pair return to the WorkDispatcher’s idle list, blocked in SOS_Scheduler::ProcessTasks() but ready for the next challenge.
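The employment agency follows the same pattern one level up. The sketch below is a deliberately simplified model: a plain `threading.Event` replaces the non-preemptive suspend and the runnable queue, tasks are bare callables, and `enqueue_task()`, `get_task()` and the `STOP` sentinel are hypothetical devices of mine so that the demo loop can end.

```python
import threading
from collections import deque

STOP = object()   # hypothetical sentinel; exists only so the demo can finish


class Worker:
    def __init__(self, name):
        self.name = name
        self.event = threading.Event()
        self.task = None


class WorkDispatcher:
    def __init__(self):
        self._lock = threading.Lock()
        self.idle_workers = deque()
        self.pending_tasks = deque()

    def get_task(self, worker):
        with self._lock:
            if self.pending_tasks:
                return self.pending_tasks.popleft()
            self.idle_workers.append(worker)   # nothing to do: join the idle list
        worker.event.wait()                    # sleep until a task is assigned
        worker.event.clear()
        task, worker.task = worker.task, None
        return task

    def enqueue_task(self, task):
        with self._lock:
            if self.idle_workers:
                worker = self.idle_workers.popleft()
                worker.task = task
                worker.event.set()             # "made runnable"
            else:
                self.pending_tasks.append(task)


def process_tasks(worker, dispatcher):
    # Loop: fetch a task, run it, return to the dispatcher for the next one.
    while True:
        task = dispatcher.get_task(worker)
        if task is STOP:
            return
        task()
```

A worker only ever joins the idle list when no tasks are pending, so with a single worker the tasks run in the order they were lobbed over the wall.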

You want pictures? Sure.

[Figure: The relationship between SOS_Node, SOS_Scheduler and their dispatching components]

(For the sake of honesty, I should note that a node actually has separate SchedulerManagers for normal and hidden schedulers.)

Up next

This takes care of how tasks, workers, and threads interact – at least in thread mode, which is probably the only mode we care about. In the next blog post I will look into how tasks actually get instantiated.

Speaking at SQL Saturday Manchester

I was surprised to find it within me to submit a session abstract to SQL Saturday 645 in Manchester. Not to mention delighted when my session on SQLOS scheduling got chosen.

This will be my first time speaking at a SQL Saturday, so I’m pretty excited about the experience. The plan is to cover some fundamentals, revisit a few things I have blogged about, and add bits that have probably never been covered anywhere before.

What can possibly go wrong if I get up on stage? Here endeth the SQL section of this blog post.

This can possibly go wrong when I get up on stage

In a previous life, I spent 7.5 years as a cruise ship musician, first playing in a lounge band, then moving to the show band. I ended up as music director, leading the show band from the piano, and vaguely responsible for supervising and scheduling all music around the ship. Fortunately the latter mostly involved letting people do what they’re good at, and I wasn’t in a position of any responsibility during the incident below.

Back in the mists of early 2001, I worked on a ship that was doing a series of musical theme cruises. One particular cruise was really juicy: the star attraction was a Frank Sinatra impersonator, and our normal 7-piece band got expanded to a 20-piece big band for the occasion. The act was great, the stage was satisfyingly full, and those classic arrangements were a joy to play.

The full show early in the week was over, and regular entertainment went on as usual, except that the star singer was doing a short segment in a variety show on the last night: a magician to open, the singer in the prime spot, and then a short production number from the dance cast.

In line with dinner, which was split between two seatings, shows were done twice each night, at 20:30 and then again at 22:15. Except for the last night, when the second seating had an early 18:45 show before their meal.

Somehow it never occurred to anyone to explicitly mention the flipped showtimes to the headliner. It was no big deal when we didn’t see his face before curtain up, because he still had twenty minutes to get backstage. But around this time we started asking whether he knew about the early show. Frantic phone calls were made, and towards the end of the magician’s act someone managed to get him on the phone just after he had stepped out of the shower.

With one minute to go, and the hope that he could get dressed and ready in three more, the cruise director shouted at the band to play something – anything – while we figured out what to do. The cast was roused from their backstage lazing and told they might have to go on in two.

It was around this point that things completely fell apart. Like any production deployment, stage management works on reasonable certainty and a pre-arranged set of cues. Improvisation rarely has a happy ending.

Keeping a token brave face, the cruise director did a hand-over to the band, who figured we may as well play “A-train” and see where it goes. And the curtain opened up as a half-naked dancer frantically ran across the stage, trailing bits of costume.

The singer turned up a minute after the production cast finished their bit, culminating with the stage being littered with the disgorged content of streamer cannons and unfit for further use. The cruise director, having completely lost the erstwhile brave face, had by then walked on to tell the audience that this was indeed it, and could they please swallow their disappointment as they fill in their feedback forms rating the week’s experience.

The curtain resolutely stayed down. Muffled swearing could be heard backstage. Fingers pointed impotently.

The good news

None of this will happen at SQL Saturday Manchester. I have learned about the dangers of confetti cannons, I will turn up on time, and I probably won’t be in drag.

But if you are interested in the gubbins of context switching, what a SystemThreadDispatcher really does for a living, and you don’t live too far away, do drop by.