The Latch Files 3: Superlatch promotion threshold

Here’s something odd. If you do an online search for “SQL Server latch promotion”, a number of top hits (e.g. this, this and this) don’t actually concern latch promotion, but rather an obscure informational message that seemed to come out of the woodwork in 2008 R2: “Warning: Failure to calculate super-latch promotion threshold.”

Somewhere among those hits you’ll find one of the very few sources on latch promotion, Bob Dorr’s excellent “How it works” blog post. One thing that bugs me about this picture is that latch promotion clearly isn’t talked about much until people see unusual messages in their error logs.

Today I am going to provide a piece of the puzzle, namely the context around that warning message and what it means to calculate the latch promotion threshold successfully. Also, who doesn’t like a new trace flag to add to their collection of funny SQL Server tricks? If you want to fast-forward to what the threshold is used for, see my post on superlatch promotion rules.

The diligent face of the lazywriter

My three-year-old daughter introduced me to the wonders of the animated series Peppa Pig, which features among its characters the improbable Miss Rabbit. She is variously seen operating an ice cream van, working in a department store, piloting a rescue helicopter, and generally doing all the specialised jobs where the writers clearly decided to cap the size of the cast for mental economy.

Notwithstanding its rather specific job title, the lazywriter task exhibits some Miss Rabbit behaviour by doing bits and pieces that just need doing “occasionally”. And one of them is a regular requirement to calculate the cycle-based superlatch promotion threshold.

Now I wish I could use the phrase “cycle-based promotion threshold” in a tone that suggests we were all born knowing the context, but to be honest, I don’t yet have all the pieces. Here is the low-down as it stands in SQL Server 2014:

  • Everything I’m describing applies only to page latches.
  • A cycle-based promotion simply means one that is triggered by the observation that the average acquire time for a given page latch (i.e. the latch for a given page) has exceeded a threshold.
  • Because the times involved are so short, they are measured not in time units but in CPU ticks.
  • There exists a global flag that can enable cycle-based promotions, although I do not know what controls that flag.
  • If cycle-based promotion is disabled, there is another path to promotion; this will be discussed in Part 4.

Calculating the superlatch promotion threshold

Once per minute, as part of lazywriter work, the method BUF::UpdateCycleBasedThreshold is called. This does a little experiment to get a rough baseline of what latch acquisition ought to cost when all is going well, and without any contention. As such, the experimental conditions are very simple. The method creates a dummy latch as a local (thread stack-based) variable, meaning that nothing else will try and interact with it, and that it is allocated from memory local to the thread. It then does a number of timed acquire/release cycles, discards outlier samples, and calculates the average of what remains. The candidate promotion threshold is forty times this average; if this number is lower than the current global promotion threshold, the threshold is lowered to our new value.

Here is some further detail if you like meat. The actual acquisition method is LatchBase::AcquireInternal(), which takes a pointer to a tick-based timestamp (collected by the RDTSC instruction) as an optional parameter. If supplied, it will have been initialised with a starting timestamp by the caller, and AcquireInternal() will take an ending timestamp just before returning, writing back the calculated elapsed tick count into that variable. However, as a twist, AcquireInternal() only does this calculation a (semi-)random 10% of the time; on other calls, it writes back an elapsed time of zero.

So with that in mind, the full picture within UpdateCycleBasedThreshold() will make more sense:

repeat 2500 times or until 50 nonzero samples
  Acquire test latch as SH;
  Note tick count if nonzero;
  Release test latch;
  repeat {_pause} random # up to 1023;
Calculate average of nonzero samples;
Zero each sample that is > 20 * avg;
if <= 10 nonzero samples
  Log "Failure to calculate..." in errorlog;
else
  Calculate new average;
  candidate = 40 * avg;
  if candidate < current threshold
    sm_promotionThreshold = candidate;
    sm_promotionUpperBoundCpuTicks = 5 * candidate;
    if TF 847 set
      Log updated threshold in errorlog;
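To make the arithmetic concrete, here is a minimal Python sketch of the post-sampling part of that logic. It is only an illustration of the pseudocode above, not SQL Server's code: the function name, the tuple it returns, and the use of integer division are all assumptions made for the example.

```python
def update_cycle_based_threshold(samples, current_threshold):
    """Hypothetical model of the threshold update described in the pseudocode.

    samples: tick counts from the timed acquires (zero = not measured).
    Returns (ok, threshold, upper_bound); upper_bound is None unless the
    threshold was lowered on this call.
    """
    nonzero = [s for s in samples if s > 0]
    if not nonzero:
        return (False, current_threshold, None)

    # First pass: average of everything that was actually measured.
    avg = sum(nonzero) // len(nonzero)

    # Discard outliers, i.e. samples more than 20x the first-pass average.
    kept = [s for s in nonzero if s <= 20 * avg]

    # Ten or fewer surviving samples: this is the "Failure to calculate" case.
    if len(kept) <= 10:
        return (False, current_threshold, None)

    # Second pass: average of the survivors, scaled up by forty.
    avg = sum(kept) // len(kept)
    candidate = 40 * avg

    # The threshold only ever moves downwards.
    if candidate < current_threshold:
        return (True, candidate, 5 * candidate)
    return (True, current_threshold, None)


# 49 clean samples of 100 ticks, one interrupt-sized outlier, the rest unmeasured.
samples = [100] * 49 + [100_000] + [0] * 450
print(update_cycle_based_threshold(samples, 10_000))  # (True, 4000, 20000)
```

In this made-up run the first-pass average is dragged up to 2098 ticks by the outlier, which is then discarded for exceeding 20 times that figure; the remaining samples average 100 ticks, giving a candidate of 4000.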

A few thoughts

While I have not gone as far as looking at the 2008 R2 equivalent, I expect to find similar logic, but in a form that is more susceptible to not getting enough samples on a given attempt, hence those errorlog entries that got people excited (see Update below). This would have been completely benign, since a subsequent try would have succeeded, and unless the calculation was really going to yield a new lowered threshold, it wouldn’t have made any difference to the state of the system.

sm_promotionThreshold, a static field within the BUF class, is obviously the current global promotion threshold. I have not found anywhere it is ever reset to a higher number, although it is possible that this could happen. I’ll discuss sm_promotionUpperBoundCpuTicks in Part 4.

The actual value calculated as promotion threshold feels interesting. The code path being measured is a fairly straight arrow, so variation in cycle counts between test acquisitions is likely indicative of CPU cache volatility or interrupt activity (in cases where the latter is so short that it doesn’t yield an outlier tick count that gets discarded). As such, movement in the “candidate threshold” calculated at each iteration has the flavour of an obscure bit of system health information.

Finally, the obligatory note about trace flags. I have no reason to believe that TF 847 does anything other than this extra bit of informational logging, and given the nature of this calculation, it seems very lightweight and harmless. You may find it interesting to set it as a startup flag on a test system and observe what comes out, because it could tell you something about the state of the system. The text that is logged upon a lowering of the threshold is:

Super-latch promotion threshold updated to ###.

Next up in the series: a visit to the decision tree that leads to latch promotion.

Update: a theory about the 2008 bug

While I still haven’t had an opportunity to look into the 2008 logic, it seems very likely to me now that the random pause after each sample is the bug fix. The reason lies in how LatchBase::AcquireInternal() implements its 10% probability of measuring execution time on a given call. Because true randomisation is an unnecessarily expensive luxury in such a performance-critical method, it uses a very simple trick: an RDTSC (CPU cycle) timestamp is taken, and if the lower ten bits of this yield a number less than 100, we deem the one-in-ten (actually 100 in 1024) chance to have happened.
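As a rough illustration of that trick, here is a small Python sketch. The function and constant names are invented for the example, and a plain integer stands in for the RDTSC value; the only parts taken from the description above are the ten-bit mask and the cutoff of 100.

```python
LOW_TEN_BITS = 0x3FF   # mask for the low ten bits of the timestamp
CUTOFF = 100           # values 0..99 count as a "hit"


def should_measure(tsc: int) -> bool:
    """Hypothetical stand-in for the cheap 'roughly 10%' gate."""
    return (tsc & LOW_TEN_BITS) < CUTOFF


# Over one full cycle of the low ten bits, exactly 100 of 1024 values pass.
hits = sum(should_measure(t) for t in range(1024))
print(hits, hits / 1024)  # 100 0.09765625
```

The gate costs a mask and a compare, which is why it suits a hot code path, but it is only as random as the moments at which it is called.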

This is perfectly fine for most uses of this code path, which is invoked at more or less unpredictable times. However, the sampling code path above is atypical, since (without that random pause) it is a rapid-fire latch acquisition in a loop with fairly predictable periodicity. Now if it turns out that each loop cycle takes a number of CPU cycles that is close to a multiple of 1024, and we start calling it at a time when the low ten bits of RDTSC are outside the range 0-99, a long string of subsequent times through the loop will not get their “one in ten” chance.

Following through this thought, by adding a random pause up to 1023 CPU cycles, we shake things up to the point where the 10% sampling works as intended without needing to modify the much more critical AcquireInternal() itself.
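The theory is easy to play with in a toy simulation. The Python sketch below is hypothetical throughout: the timestamp is a simple counter, the loop is assumed to cost exactly 2048 cycles, the starting value is arbitrary, and the pause is modelled as a uniformly random number of cycles from 0 to 1023 rather than as repeated pause instructions.

```python
import random


def should_measure(tsc):
    # Same gate as described above: low ten bits of the timestamp below 100.
    return (tsc & 0x3FF) < 100


def count_measured(start_tsc, loop_cycles, iterations, rng=None):
    """Count how many loop iterations pass the sampling gate.

    If rng is given, a random pause of 0..1023 cycles follows each iteration.
    """
    tsc = start_tsc
    measured = 0
    for _ in range(iterations):
        if should_measure(tsc):
            measured += 1
        tsc += loop_cycles              # fixed cost of one acquire/release
        if rng is not None:
            tsc += rng.randrange(1024)  # the random pause
    return measured


# Loop period is a multiple of 1024 and the low ten bits start at 500.
no_pause = count_measured(500, 2048, 2500)
with_pause = count_measured(500, 2048, 2500, random.Random(0))
print(no_pause, with_pause)
```

Without the pause, the low ten bits never move from 500, so not a single one of the 2500 iterations is measured. With the pause, the low bits are effectively reshuffled every time around, and the count should land somewhere near a tenth of the iterations.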
