The Latch Files 3: Superlatch promotion threshold

Here’s something odd. If you do an online search for “SQL Server latch promotion”, a number of top hits (e.g. this, this and this) don’t actually concern latch promotion, but rather an obscure informational message that seemed to come out of the woodwork in 2008 R2: “Warning: Failure to calculate super-latch promotion threshold.”

Somewhere among those hits you’ll find one of the very few sources around latch promotion, Bob Dorr’s excellent “How it works” blog post. One thing that bugs me about this picture is that latch promotion clearly isn’t that much talked about until people see unusual messages in their error logs.

Today I am going to provide a piece of the puzzle, namely the context around that warning message and what it means to calculate the latch promotion threshold successfully. Also, who doesn’t like a new trace flag to add to their collection of funny SQL Server tricks?

The diligent face of the lazywriter

My three-year old daughter introduced me to the wonders of the animated series Peppa Pig, which features among its characters the improbable Miss Rabbit. She is variously seen operating an ice cream van, working in a department store, piloting a rescue helicopter, and generally doing all the specialised jobs where the writers clearly decided to cap the size of the cast for mental economy.

Notwithstanding its rather specific job title, the lazywriter task exhibits some Miss Rabbit behaviour by doing bits and pieces that just need doing “occasionally”. And one of them is a regular requirement to calculate the cycle-based superlatch promotion threshold.

Now I wish I could use the phrase “cycle-based promotion threshold” in a tone that suggests we were all born knowing the context, but to be honest, I don’t yet have all the pieces. Here is the low-down as it stands in SQL Server 2014:

  • Everything I’m describing applies only to page latches.
  • A cycle-based promotion simply means one that is triggered by the observation that the average acquire time for a given page latch (i.e. the latch for a given page) has exceeded a threshold.
  • Because the times involved are so short, they are measured not in time units but in CPU ticks.
  • There exists a global flag that can enable cycle-based promotions, although I do not know what controls that flag.
  • If cycle-based promotion is disabled, there is another path to promotion; this will be discussed in Part 4.

Calculating the superlatch promotion threshold

Once per minute, as part of lazywriter work, the method BUF::UpdateCycleBasedThreshold is called. This does a little experiment to get a rough baseline of what latch acquisition ought to cost when all is going well, and without any contention. As such, the experimental conditions are very simple. The method creates a dummy latch as a local (thread stack-based) variable, meaning that nothing else will try and interact with it, and that it is allocated from memory local to the thread. It then does a number of timed acquire/release cycles, discards outlier samples, and calculates the average of what remains. The candidate promotion threshold is forty times this average; if this number is lower than the current global promotion threshold, the threshold is lowered to our new value.

Here is some further detail if you like meat. The actual acquisition method is LatchBase::AcquireInternal(), which takes a pointer to a tick-based timestamp (collected by the RDTSC instruction) as an optional parameter. If supplied, it will have been initialised with a starting timestamp by the caller, and AcquireInternal() will take an ending timestamp just before returning, writing back the calculated elapsed tick count into that variable. However, as a twist, AcquireInternal() only does this calculation a (semi-)random 10% of the time – on other calls, it writes back an elapsed time of zero.
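As a rough illustration of that sampling twist, here is a Python sketch. Everything in it is mine rather than SQL Server's – the function name is invented, and a monotonic nanosecond clock stands in for RDTSC:

```python
import random
import time

def acquire_with_timing(elapsed_out, sample_rate=0.1):
    """Illustrative sketch (not SQL Server's code) of the optional timing
    behaviour described above: the caller passes in a starting timestamp,
    and only about 10% of the time does the callee write back the real
    elapsed count; on other calls it writes back zero."""
    start = elapsed_out[0]              # caller-supplied starting timestamp
    # ... the actual latch acquisition work would happen here ...
    if random.random() < sample_rate:
        elapsed_out[0] = time.perf_counter_ns() - start
    else:
        elapsed_out[0] = 0              # most calls skip the measurement
```

The caller then treats a zero write-back as “no sample taken”, which is exactly why the loop below keeps going until it has enough nonzero samples.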

So with that in mind, the full picture within UpdateCycleBasedThreshold() will make more sense:

repeat up to 2500 times, or until 50 nonzero samples collected
  Acquire test latch as SH;
  Note tick count if nonzero;
  Release test latch;
  repeat {_pause} a random number of times, up to 1023;
Calculate average of nonzero samples;
Zero each sample that is > 20 * avg;
if <= 10 nonzero samples remain
  Log "Failure to calculate..." in errorlog;
else
  Calculate new average of remaining samples;
  candidate = 40 * avg;
  if candidate < current threshold
    sm_promotionThreshold = candidate;
    sm_promotionUpperBoundCpuTicks = 5 * candidate;
    if TF 847 set
      Log updated threshold in errorlog;
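
If you want to play with the shape of this calculation, here is a hedged Python sketch of the post-sampling arithmetic. The function name, the return convention and the use of statistics.mean() are all mine, not SQL Server's:

```python
import statistics

def update_cycle_based_threshold(samples, current_threshold):
    """Illustrative sketch (not SQL Server's code) of the calculation
    outlined above: discard outlier samples, then derive a candidate
    promotion threshold of 40x the average acquisition cost.
    Returns (threshold, message), where message is None on success."""
    nonzero = [s for s in samples if s > 0]
    if not nonzero:
        return current_threshold, "Failure to calculate super-latch promotion threshold."
    avg = statistics.mean(nonzero)
    # Discard outliers: anything more than 20x the initial average
    kept = [s for s in nonzero if s <= 20 * avg]
    if len(kept) <= 10:
        return current_threshold, "Failure to calculate super-latch promotion threshold."
    candidate = 40 * statistics.mean(kept)
    if candidate < current_threshold:
        return candidate, None      # the threshold only ever moves down here
    return current_threshold, None
```

Note how the threshold can only be lowered by this path; a batch of cheap, consistent samples tightens it, while a noisy batch simply leaves it alone.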

A few thoughts

While I have not gone as far as looking at the 2008 R2 equivalent, I expect to find similar logic, except perhaps with configuration values giving it a higher chance of not getting enough good samples on a given try, hence those errorlog entries that got people excited. This would have been completely benign, since a subsequent try would have succeeded, and unless the calculation was really going to yield a new lowered threshold, it wouldn’t have made any difference to the state of the system.

sm_promotionThreshold, a static field within the BUF class, is obviously the current global promotion threshold. I have not found anywhere it is ever reset to a higher number, although it is possible that this could happen. I’ll discuss sm_promotionUpperBoundCpuTicks in Part 4.

The actual value calculated as promotion threshold feels interesting. The code path being measured is a fairly straight arrow, so variation in cycle counts between test acquisitions is likely indicative of CPU cache volatility or interrupt activity (in cases where the latter is so short that it doesn’t yield an outlier tick count that gets discarded). As such, movement in the “candidate threshold” calculated at each iteration has the flavour of an obscure bit of system health information.

Finally, the obligatory note about trace flags. I have no reason to believe that TF 847 does anything other than this extra bit of informational logging, and given the nature of this calculation, it seems very lightweight and harmless. You may find it interesting to set it as a startup flag on a test system and observe what comes out, because it could tell you something about the state of the system. The text that is logged upon a lowering of the threshold is:

Super-latch promotion threshold updated to ###.

Next up in the series: a visit to the decision tree that leads to latch promotion.

The Latch Files 2: The spinlock that dares not speak its name

Spinlocks live among us. We see them on duty, in uniform, and greet them by name. When we interact, they show a badge and leave a receipt for the time they eroded from our working day. Or so we’d like to think.

A spinlock headbanging party in full spin

When looking at the 2016 SOS_RWLock, we came across the one-bit spinlock buried within its Count member. Since it protects a very simple wait structure, someone evidently made the decision that it is cheap enough to spin aggressively until acquired, with no backoff logic. This suggests that a low degree of spinlock contention is anticipated, either because few threads are expected to try and acquire the lock simultaneously or because the amount of business to be done while holding the lock is very light and likely to finish quickly.
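To make the “no backoff” point concrete, here is a toy sketch in Python. A try-acquire loop stands in for what is really a compare-and-swap on a single bit; everything about it is illustrative rather than SQLOS’s implementation:

```python
import threading

class NaiveSpinlock:
    """Illustrative sketch of a spin-until-acquired lock with no backoff,
    in the spirit of the one-bit spinlock described above. Python's
    Lock stands in for the atomically updated lock bit."""
    def __init__(self):
        self._flag = threading.Lock()   # stands in for the one-bit flag
        self.spins = 0                  # diagnostic: wasted iterations

    def acquire(self):
        # Spin aggressively: retry the test-and-set immediately,
        # with no backoff, yield or sleep between attempts.
        while not self._flag.acquire(blocking=False):
            self.spins += 1

    def release(self):
        self._flag.release()
```

Under genuine contention the `spins` counter is pure wasted CPU, which is why a design like this only makes sense when the protected work is tiny and collisions are rare.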
Continue reading “The Latch Files 2: The spinlock that dares not speak its name”

The Latch Files: Out for the count

Time to start chipping away at the monster subject of storage engine latches. If you’re anything like me, you were really impressed by the expositions of latches done by James Rowland-Jones (in Professional SQL Server 2008 Internals and Troubleshooting) and Bob Ward (PASS Summit “Inside Latches” session) when this information first started dribbling out. Now we have reached a point in history where latches seem to be used as a swear word. Well, for the record, I am still fascinated by them, and their internals are pretty darn marvellous.

Today I’m going to keep it comparatively focused, looking at nothing other than the Count member of the LatchBase class. Specifically, I’ll only be considering the act of acquiring an uncontended un-promoted latch, based on the SQL Server 2014 and 2016 latch implementation.
Continue reading “The Latch Files: Out for the count”

Unsung SQLOS: SOS_WaitableAddress

One of the more amusing words in the SQL Server synchronisation lexicon is “lightweight”. Locks bad. Nolocks good. Latches lightweight. The more spinlocks you eat, the more wait you lose!

If only things were that simple… But hey, I love the poetry of compromise. Check out the SOS_WaitableAddress for one of the many competing definitions of “lightweight”.
Continue reading “Unsung SQLOS: SOS_WaitableAddress”

Unsung SQLOS: the 2016 SOS_RWLock

Talk about serendipity. I’ve been working on a progression of blog posts that would include dealing with the SOS_RWLock in both 2014 and 2016 versions, and today is a perfect excuse to align it with the 2016-themed T-SQL Tuesday hosted by Michael J Swart.

The 2014 incarnation of the SOS_RWLock looked sensible enough, but since we’ve been told it was improved, it’s a great opportunity to see how one goes about tuning a lock algorithm. So with lock-picking tools in hand, follow me to the launch party of the Spring 2016 SQLOS collection to see what the hype is all about. Is the 2014 implementation truly Derelocte?
Continue reading “Unsung SQLOS: the 2016 SOS_RWLock”

Unsung SQLOS: the classic SOS_RWLock

Moving along with our bestiary of synchronisation classes, the SOS_RWLock, a reader-writer lock, feels like a logical next stop. It has been in the news recently, it has fairly simple semantics, and it is built upon primitives that we have already explored, namely spinlocks, linked lists and the EventInternal class. Its implementation is quite a leap from the simple SOS_Mutex and there is more scope for alternative implementations providing the same functionality. And, would you believe it, as called out by Bob Dorr, the 2012/2014 implementation has now been found wanting and got rewritten for 2016. Today we’re looking at the “classic” version though, because we then get the chance to understand the 2016 rewrite in terms of concrete design decisions.
Continue reading “Unsung SQLOS: the classic SOS_RWLock”

Unsung SQLOS: the SOS_Mutex

A mutex, short for “mutual exclusion”, is arguably the simplest waitable synchronisation construct you can imagine. It exposes methods for acquisition and release, and the semantics are straightforward:

  • Initially it isn’t “owned”, and anybody who asks to acquire it is granted ownership
  • While owned, anybody else who comes around to try and acquire it must wait her turn
  • When the owner is done with it, the mutex is released, which then transfers ownership to one waiter (if any) and unpends that waiter
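Those three rules can be demonstrated with any stock mutex; here is a minimal Python illustration, using threading.Lock purely as a stand-in (nothing below is SQLOS code):

```python
import threading

# A minimal demonstration of the ownership semantics listed above,
# with Python's threading.Lock standing in for the mutex.
owner_log = []
mutex = threading.Lock()

def worker(name):
    mutex.acquire()             # a second arrival waits its turn here
    try:
        owner_log.append(name)  # critical section: one owner at a time
    finally:
        mutex.release()         # hands ownership to a waiter, if any

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```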

A mutex can also validly be referred to as a critical section, in the sense that it protects a critical section of code, or more accurately, data. When programming libraries expose both a mutex and a critical section, as Windows does, it really just reflects different implementations of synchronisation objects with the same semantics. You could also consider a spinlock to be a flavour of mutex: while the name “spinlock” describes the mechanism by which competing threads jostle for exclusive ownership (it can’t be politely waited upon), the idea of mutual exclusion with at most one concurrent owner still applies.

SOS_Mutex class layout and interface

This class is directly derived from EventInternal<SuspendQSLock>, with three modifications:

  1. The addition of an ExclusiveOwner member.
  2. The override of the Wait() method to implement mutex-specific semantics, although the main act of waiting is still delegated to the base class method.
  3. The addition of an AddAsOwner() method, called by Wait(), which crowns the ambient task as the exclusive owner after a successful wait.
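
Purely as a mental model – with invented names, and Python's threading.Event standing in for the EventInternal base class – the shape described above might be sketched like this:

```python
import threading

class SketchEventMutex:
    """Hedged sketch of the structure described above: a mutex layered on
    an auto-reset-style event, adding an exclusive owner member and a
    wait() that crowns the winner. Names are mine, not SQL Server's."""
    def __init__(self):
        self._event = threading.Event()
        self._event.set()               # signalled == unowned
        self._guard = threading.Lock()  # stands in for the base spinlock
        self.exclusive_owner = None

    def wait(self):
        while True:
            self._event.wait()          # delegate the actual waiting
            with self._guard:
                if self._event.is_set():
                    self._event.clear() # consume the signal: we own it now
                    self._add_as_owner()
                    return

    def _add_as_owner(self):
        # Crown the calling thread as the exclusive owner after a
        # successful wait (the ambient task, in SQLOS terms).
        self.exclusive_owner = threading.current_thread().name

    def release(self):
        with self._guard:
            self.exclusive_owner = None
            self._event.set()           # wake a waiter, if any
```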

Continue reading “Unsung SQLOS: the SOS_Mutex”

Unsung SQLOS: the EventInternal

Today we’re taking a step towards scheduler territory by examining the EventInternal class, the granddaddy of SQLOS synchronisation objects. At the outset, let’s get one formality out of the way: although it is a template class taking a spinlock type as template parameter, we only see it instantiated as EventInternal<SuspendQSLock> as of SQL Server 2014. What this means is that spins on its internal spinlock are always going to show up as SOS_SUSPEND_QUEUE.

It’s a very simple class (deceptively so even) which can implement a few different event flavours, doing its waiting via SQLOS scheduler methods rather than directly involving the Windows kernel. The desire to keep things simple and – as far as possible – keep control flow out of kernel mode is a very common goal for threading libraries and frameworks. .Net is a good frame of reference here, because it is well documented, but the pattern exists within OS APIs too, where the power and generality of kernel-mode code has to be weighed against the cost of getting there.
Continue reading “Unsung SQLOS: the EventInternal”

Unsung SQLOS: the SystemThread

SystemThread, a class within sqldk.dll, can be considered to be at the root of SQLOS’s scheduling capabilities. While it doesn’t expose much obviously exciting functionality, it encapsulates a lot of the state that is necessary to give a thread a sense of self in SQLOS, serving as the beacon for any code to find its way to an associated SQLOS scheduler etc. I won’t go into much of the SQLOS object hierarchy here, but suffice it to say that everything becomes derivable by knowing one’s SystemThread. As such, this class jumps the gap between a Windows thread and the object-oriented SQLOS.
Continue reading “Unsung SQLOS: the SystemThread”

Windows, mirrors and a sense of self

In my previous post, Threading for humans, I ended with a brief look at TLS, thread-local storage. Given its prominent position in SQLOS, I’d like to take you on a deeper dive into TLS, including some x64 implementation details. Far from being a dry subject, this gets interesting when you look at how TLS helps to support the very abstraction of a thread, as well as practical questions like how cleanly and efficiently SQLOS encapsulates mechanisms peculiar to Windows on Intel, or for that matter Windows as opposed to Linux.
Continue reading “Windows, mirrors and a sense of self”