The Latch Files 2: The spinlock that dares not speak its name

Spinlocks live among us. We see them on duty, in uniform, and greet them by name. When we interact, they show a badge and leave a receipt for the time they eroded from our working day. Or so we’d like to think.

[Image: a spinlock headbanging party in full spin]

When looking at the 2016 SOS_RWLock, we came across the one-bit spinlock buried within its Count member. Since it protects a very simple wait structure, someone evidently made the decision that it is cheap enough to spin aggressively until acquired, with no backoff logic. This suggests that a low degree of spinlock contention is anticipated, either because few threads are expected to try and acquire the lock simultaneously or because the amount of business to be done while holding the lock is very light and likely to finish quickly.

As a reminder, such aggressive spinning looks something like this:

CurrCount = Count with spinlock bit cleared
while ( !cmpxchg(&Count, CurrCount + spinlock, CurrCount) )
{
  pause
  CurrCount = Count with spinlock bit cleared
}

This keeps probing for the moment when we catch the spinlock in the clear state, at which point we can atomically set it and treat the successful compare-and-swap as confirmation that we are now the current spinlock owner. The pause instruction between probes backs us off the CPU ever so gently, giving priority to our sibling hyperthread sharing the core, since clearly we have no better use for the CPU resource.
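
To make the shape of this concrete, here is a minimal compilable C++ sketch of the same loop. It assumes a 64-bit Count whose lowest bit is the spinlock; the bit position and the names are mine, not anything SQLOS exposes:

#include <atomic>
#include <cstdint>
#include <immintrin.h>   // _mm_pause

constexpr uint64_t kSpinlockBit = 1;   // assumed position of the one-bit spinlock

void acquire_embedded_spinlock(std::atomic<uint64_t>& count)
{
    for (;;)
    {
        // Re-read Count on every attempt: other bits may be changing too.
        uint64_t curr = count.load(std::memory_order_relaxed) & ~kSpinlockBit;

        // Succeed only if we catch the spinlock bit clear AND the rest of
        // Count still matches what we just read.
        if (count.compare_exchange_weak(curr, curr | kSpinlockBit,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed))
            return;      // we are now the spinlock owner

        _mm_pause();     // gently yield pipeline resources to the sibling hyperthread
    }
}

The acquire ordering on the successful exchange is what lets the new owner safely read and modify whatever the spinlock protects.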

Exponential backoff review

As a reminder of how exponential backoff works for the fancy spinlocks we normally see (more detail here), the above busy spinning is interspersed with two forms of backoff. What follows describes the lifecycle from the moment we first try and get the spinlock until we successfully acquire it, i.e. accounting for the time and CPU cycles burned during a single spinlock acquisition.

Firstly, after each failed compare-and-swap attempt, we burn CPU with a loop of N pause instructions, followed by a backoff before the next attempt. N starts at a low number: the greater of 250 or half the average spin count on this node, up to a maximum of 2500. After each CPU-burning loop we double N, capping it at ten million. After 32 times around this doubling loop, N is reset to 250. Of course, we hope to be out of the “loop of loops” well before then.

The backoff itself is a call to the Windows Sleep or SwitchToThread function, which will allow scheduling of another thread by the Windows scheduler. Importantly though, this does not yield the SQLOS scheduler, which remains bound to the worker stuck within the spinlock acquisition code. What this broadly means is that the CPU is released to do useful outstanding work other than cooperatively scheduled SQL Server code, i.e. it can run non-SQL Server code (other applications and system housekeeping) or any bits of preemptively scheduled SQL Server work that happen to be runnable at the time. In other words, there is the silver lining that some CPU may need to do stuff that nobody else has time for, and this CPU may be that poor bugger. However, unless there is a large amount of such work, the “benefit” soon wears thin once this CPU has nothing useful left to do.
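
Sketched as C++, the lifecycle described in the last two paragraphs might look as follows. The constants (250, 2500, ten million, 32) come straight from the description above; the overall structure, the names, and the exact choice between SwitchToThread and Sleep are my own assumptions rather than decompiled SQLOS:

#include <windows.h>     // Sleep, SwitchToThread
#include <immintrin.h>   // _mm_pause
#include <atomic>
#include <algorithm>
#include <cstdint>

void acquire_with_backoff(std::atomic<uint32_t>& lock, uint32_t my_id,
                          uint64_t avg_spins_on_node)
{
    // N starts at the greater of 250 and half the node's average spin
    // count, capped at 2500.
    uint64_t n = std::min<uint64_t>(
        std::max<uint64_t>(250, avg_spins_on_node / 2), 2500);
    int doublings = 0;

    for (;;)
    {
        uint32_t expected = 0;                 // a free spinlock holds zero
        if (lock.compare_exchange_strong(expected, my_id))
            return;                            // acquired

        for (uint64_t i = 0; i < n; ++i)       // burn CPU with N pauses
            _mm_pause();

        // Back off: let Windows schedule someone else on this CPU. The
        // SQLOS scheduler we are bound to stays out of action regardless.
        if (!SwitchToThread())
            Sleep(0);                          // assumption: fall back to Sleep

        n = std::min<uint64_t>(n * 2, 10000000);  // double N, cap at ten million
        if (++doublings == 32)                 // after 32 doublings, reset
        {
            n = 250;
            doublings = 0;
        }
    }
}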

If you want to get more subtle about it, consider that the above description – phrased in terms of a CPU that is hard bound to an SQLOS scheduler – only holds true when hard CPU affinity is used. Without such affinity, it needs to be put a bit more vaguely. Let’s say you have 24 processors and have set the affinity mask to use them all, meaning that SQLOS will try and keep 24 non-preemptively scheduled tasks viable (in a non-wait state) at any given moment. Now assume that one scheduler is tied up through being stuck in a long spinlock acquisition. While actively spinning, the stuck worker is burning CPU. And when it is having a backoff respite, having indicated to the Windows scheduler that it needs to cede the CPU to someone else, it remains bound to its SQLOS scheduler, which therefore remains out of action. So even if another SQLOS worker thread ends up running on the ceded CPU, we still only have 23 out of 24 schedulers available to run cooperatively scheduled tasks, with goodness knows how much work piling up on the runnable queue of the indisposed one.

But I digress. The point is that backoff is Good People, and exponential backoff is Clever Good People, making the best of a bad situation.

A particularly bad situation arises when we’ve been doing the spin-and-backoff tango for a really long time. So how long is a really long time? SQLOS reckons that twenty seconds is long enough to get seriously worried. I mean, this is about seventeen generations in dog years, and not something to be taken lightly in a supposedly highly concurrent system.

The counting around spin-and-backoff is done in CPU clock cycles, but this 20s tracking is driven by old-fashioned wall clock time. Once twenty seconds has passed since the first backoff, the worker’s Sick flag is set, a spinlock warning is published, and the thread is put to sleep for a whole five seconds. Once the five seconds are over, during which time the Sick state would be clearly visible in sys.dm_os_workers (is_sick = 1) and sys.dm_os_tasks (task_state = ‘SPINLOOP’), the Sick flag is cleared and the cycle starts anew, letting another twenty seconds pass before publishing the warning again.
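
A loose model of that rhythm, with invented names (SpinWatchdog, publish_spinlock_warning) – only the twenty-second wall-clock threshold and the five-second sleep are taken from the behaviour described above:

#include <chrono>
#include <thread>

struct SpinWatchdog
{
    using clock = std::chrono::steady_clock;

    clock::time_point first_backoff = clock::now();
    bool is_sick = false;    // surfaces as is_sick = 1 / task_state = 'SPINLOOP'

    // Called from the backoff path on each failed acquisition attempt.
    void check()
    {
        if (clock::now() - first_backoff < std::chrono::seconds(20))
            return;

        is_sick = true;
        publish_spinlock_warning();            // stub for the real reporting
        std::this_thread::sleep_for(std::chrono::seconds(5));
        is_sick = false;                       // and the cycle starts anew
        first_backoff = clock::now();
    }

    static void publish_spinlock_warning() {}  // placeholder for the sketch
};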

Spinlock dynamics in latch acquisition

In my prior post on latch acquisition I only looked at the happy code path where the latch hasn’t been promoted to a superlatch and is amenable to the acquisition request – in this case the spinlock didn’t need to get involved. Sticking with the non-superlatch scenario, we now need to start considering what happens when the latch is currently held in an incompatible mode, meaning that the requesting thread has to suspend.

Although I’m not yet going into the act of suspension itself, the thing to note today is that the latch’s spinlock now comes into play. Because we’ll be updating the state of larger parts of the latch’s structure (specifically modifying the wait list) we briefly need exclusive ownership of the latch’s metadata, even if we aren’t getting ownership of the latch itself at this point.

So here we are then: we’re not in a position to modify much within the Count member, but we do need to gain ownership of the spinlock, which in this implementation means being the one who atomically finds its value as 0 and changes it to 1. This isn’t a million miles away from acquiring a normal spinlock (four bytes where we have to find a zero value and change it to our thread ID), although it has the extra twist that other parts of Count can also be changing under our noses, potentially invalidating a compare-and-swap attempt that might have succeeded had there been nothing but the spinlock state in the data member. This doesn’t fundamentally change anything though: we still have to keep retrying until we have acquired the spinlock.
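
To put the extra twist in code: a single attempt at grabbing that bit might look like the C++ below, again assuming the spinlock lives in the bottom bit of a 64-bit Count. Even with the bit clear, the attempt fails whenever any other bit of Count moved between our read and our compare-and-swap:

#include <atomic>
#include <cstdint>

constexpr uint64_t kSpinlockBit = 1;   // assumed bit position

// Returns true only if we atomically took the spinlock bit from 0 to 1
// while every other bit of Count stayed exactly as we read it.
bool try_grab_spinlock(std::atomic<uint64_t>& count)
{
    uint64_t snapshot = count.load(std::memory_order_relaxed);
    if (snapshot & kSpinlockBit)
        return false;                  // someone else owns it right now
    return count.compare_exchange_strong(snapshot, snapshot | kSpinlockBit,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
}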

This retry looping is a modification of what happens within a regular spinlock. We first attempt the compare-and-swap illustrated in the previous post to see if the current state will allow us to acquire the latch. As described there, this could succeed immediately if the held mode is compatible with the request mode and the spinlock isn’t held. Since that ground has been covered, we are now mostly concerned with the case where either the spinlock is currently held (trumping everything else) or we know our request to be incompatible with the held mode, in which case we have to follow through by acquiring the spinlock as a prerequisite for suspending.

Spinlock acquisition here also involves retrying a compare-and-swap in between CPU-burning puffs of N pause instructions. We start with N = 250, and the loop looks like this in pseudocode – I’m just omitting the bit where Count changed under our noses, necessitating going back to the start of the loop:

while (1==1)
{
  Compare-and-swap Count
  if success
  {
    consider latch acquired
    return success
  }
  if (spinlock==1) was blocking us
  {
    loop {pause} N times
    N = min(N * 2, 1000000)
    back to top of loop
  }
  if (promoted==1) was blocking us
  {
    exit into separate superlatch code path
  }
  if not allowed to suspend 
  {
    return failure
  }
  Compare-and-swap to grab spinlock
  if success
  {
    suspend
    return Suspend outcome
  }
}

Here we have two interlocking rhythms while the complete loop is being repeated:

  1. On any iteration, if Count has changed between being initially read and having compare-and-swap attempted, we go straight back to the top of the loop.
  2. If we have been blocked by someone else holding the spinlock, we pause N times and double N up to a maximum of one million (not ten million as with standalone spinlocks), then go back to the top of the loop.

So all in all, this is another highly specialised form of busy looping. It incorporates part of the thinking that went into the full spinlock design: running an exponentially increasing number of pause iterations between acquisition attempts. However, instead of ceding the CPU to the Windows scheduler after a pause loop, it goes straight into the next acquisition attempt. Furthermore, it lacks the logic twists of resetting N to its starting point or eventually self-diagnosing as “sick”. Not quite that sophisticated on the face of it then.
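
For completeness, here is one way the whole dance could be rendered as compilable C++. The loop shape and the N schedule (start at 250, double, cap at one million, no reset, no sleep, no sick diagnosis) follow the pseudocode above; the bit layout, the outcome labels and the mode-arithmetic stubs are placeholders of my own:

#include <atomic>
#include <algorithm>
#include <cstdint>
#include <immintrin.h>

constexpr uint64_t kSpinlockBit = 1ull << 0;   // assumed layout
constexpr uint64_t kPromotedBit = 1ull << 1;   // assumed layout

enum class Outcome { Acquired, SuperLatchPath, WouldSuspend, Suspended };

// Hypothetical stand-ins for the real latch mode arithmetic:
static bool compatible_with_held_mode(uint64_t) { return false; }
static uint64_t add_our_reference(uint64_t snap) { return snap + 4; }

Outcome acquire_latch(std::atomic<uint64_t>& count, bool may_suspend)
{
    uint64_t n = 250;
    for (;;)
    {
        uint64_t snap = count.load(std::memory_order_relaxed);

        if (snap & kSpinlockBit)
        {
            // Rhythm 2: blocked by the spinlock itself. Pause N times,
            // double N up to a million, then back to the top of the loop.
            for (uint64_t i = 0; i < n; ++i)
                _mm_pause();
            n = std::min<uint64_t>(n * 2, 1000000);
            continue;
        }
        if (snap & kPromotedBit)
            return Outcome::SuperLatchPath;    // separate superlatch code path

        if (compatible_with_held_mode(snap))
        {
            // Rhythm 1: if Count changed between read and compare-and-swap,
            // the CAS fails and we go straight back to the top.
            if (count.compare_exchange_weak(snap, add_our_reference(snap)))
                return Outcome::Acquired;      // consider latch acquired
            continue;
        }

        if (!may_suspend)
            return Outcome::WouldSuspend;

        // Grab the spinlock as a prerequisite for joining the wait list;
        // the real code would now suspend and later report the outcome.
        if (count.compare_exchange_weak(snap, snap | kSpinlockBit))
            return Outcome::Suspended;
        // Lost a race: Count changed under our noses, back to the top.
    }
}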

The spinlock that dares not speak its name

No matter how bad contention gets for normal spinlocks, at least we account for cycles spent spinning: this stuff gets exposed in sys.dm_os_spinlock_stats and can allow us to diagnose and quantify contention. However, spinning done on a latch’s spinlock gets no obviously visible accounting. As such, if we did somehow manage to accrue a large number of spins on a very hot latch, it wouldn’t be obvious that the time went into spinning. This is not to say of course that it is necessarily a common problem, just that it would be hard to prove one way or the other.

If I appear to be painting a bleak picture, I apologise. Given the importance of latches, especially buffer latches, one would assume that the SQL Server development teams would be on the constant lookout for opportunities to identify issues and mitigate them. And this is exactly the kind of scenario where some bright spark comes up with the idea of superlatches.

Today is not the day to crack open superlatches, but we have now opened the book on an interesting question: how does SQL Server sense that a hard-working latch deserves a promotion? This will be addressed in Part 3.
