DBCC PAGE and buffer pool disfavoring

A page on a table, with "DBCC" in music notation written on it

While preparing material for my post on latch promotion rules, I found this very interesting Stack Exchange question by Jeremiah Peschka about SQL Server’s LRU-K algorithm and the metrics that support it. It turned out that SQL Server doesn’t expose those in a useful way, but I was really impressed by some experimental evidence provided by Martin Smith, and his excellent deductions. Martin has clearly been on the case for a while and has highlighted that LRU-K (or for that matter Time of Last Access) isn’t well documented at all.

I’m not going to look at the general case of aging out buffers today; instead I’m just confirming and extending Martin’s observations about how DBCC PAGE interacts with buffers.

The return of bUse1

Quick refresher: the lazywriter maintains an internal 16-bit “clock hand” that counts seconds and thus rolls over every eighteen hours or so. Its current value is used as a cheap and easy way to measure the progress of time in places where we don’t care about absolute time, but only need the ability to note how many seconds have passed since a logged event.

The case we looked at before was the resetting of certain BUF fields used in latch promotion logic if more than three seconds have elapsed since the previous reset. This was done by setting bUse1 to the “now” value of that clock hand upon reset, and then measuring the passage of time on subsequent touches of the page by seeing how much bigger “now” is than that saved value.

In that latch promotion use case, bUse1 tells us – give or take four seconds – when the page was last touched. And as such, the field fulfils its more obvious function of stating Time of Last Access, which drives the identification of older pages that are first in line for eviction from the buffer pool.

Buffer pool disfavoring

Paul Randal has given some background on buffer pool disfavoring, which is a way of marking a page as “less important than others” from the viewpoint of cache eviction. I’ve always thought it is a great idea, although it’s one of those things that one tends to forget about.

Clearly some folks do think about it a lot though; witness this very recent Connect item by David Baffaleuf. This touches on the one common case of buffer pool disfavoring, where some intelligence goes into the decision whether or not to disfavor.

However, there is at least one other case of disfavoring, which is what Martin came across: the use of DBCC PAGE marks a page as disfavored. And although I have yet to confirm it through finding more examples, I rather expect to find that this is true of many DBCC commands and the related sys.dm_db_index_physical_stats.

The actual mechanism involved is ingeniously simple, and might be described as instrumentation fraud. Immediately after touching and retrieving the page, which would guarantee that bUse1 is going to show a timestamp no more than four seconds old, another piece of code comes along and hides the evidence by setting it back by an hour.

Implementation details

We previously saw that the timestamping of a BUF access, as well as the maintenance of latch promotion-related metrics, live in the method BPool::Touch(), called by BPool::Get(), which in turn is the standard way to read a page, returning a pointer to the latched BUF which serves as our ticket to page access. It turns out that Touch() has a little brother BUF::Untouch(). This is the method which sets bUse1 to an hour in the past, and seems to be THE entry point into buffer pool disfavoring.

DBCC PAGE uses a helper class, sqlmin!LatchedBuf, which despite its generic-sounding name appears to be very specifically a DBCC helper. This wraps page access in such a way that you can’t accidentally forget to release the page latch, since the release is done in the LatchedBuf destructor. However, another thing you get for free is that the buffer is disfavored by having Untouch() invoked.

This all makes perfect sense for CHECKDB, and while the disfavoring probably wasn’t a primary design choice for DBCC PAGE, it isn’t surprising to find that a more obscure and semi-documented DBCC command would inherit bits of shared DBCC codebase.

Musings

Revisiting Paul’s description (“…a buffer can be marked as the least recently used of all the buffers in the buffer pool…”) it seems that this is strictly speaking true only when your page life expectancy is less than an hour. With modern memory sizes, it may very well be that your working set is so stable that it contains a lot of pages that have not been touched in more than an hour, in which case the disfavored pages might actually be more favored than those other dusty ones. Thus really old pages could be evicted in favour of supposedly disfavored pages, but hey, that probably won’t cause you any real tears. On the other hand, since I haven’t dug properly into the lazywriter yet, there may be more magic at work.

David’s Connect item also makes for an interesting exploration. Armed with the knowledge of how SQL Server implements disfavoring at the moment, I suspect that his wish for a special wait type can’t be granted without some significant BPool/BUF rework. Consider that PAGELATCH would need to be broken into even finer grain. Not only should the request for a page latch – via BPool::Get() – be tagged with the reason why the request went to sleep (IO or not), but in the IO case we’d now we need the historic knowledge that the page was previously in the buffer pool and was evicted for a specific reason.

Since nobody but the caller of BUF::Untouch() knows that a page was disfavored, this history has evaporated by the time the page gets evicted. But even if a new flag was created to note the disfavoring, we’ll now need to persist this knowledge after page eviction, either on disk (expensive) or in memory (wasteful). It does makes for a great thought experiment of the form “How would I design the buffer pool?” though!

The presence of bUse1 on the BUF::Dump() part of DBCC PAGE’s output may be a historic curiosity, since it doesn’t actually expose any useful information other than “give or take a few seconds, this is the current value of the lazywriter’s internal counter”. This is because, as Martin observed, it will have been updated during the act of retrieving the page for DBCC PAGE or during a normal page retrieval a few seconds before. And then, having displayed a “now” value, Untouch() is immediately invoked behind our backs to set bUse1 to a timestamp from one hour ago.

Apart from this special handling of bUse1 though, a DBCC PAGE access is pretty much a normal BUF access. If you somehow managed to rapid-fire a large number of DBCC PAGE requests at the same page, you might just see breferences creep up as described here, but to be honest, if you want to see that happen, an artificial workload querying the page, combined with a single DBCC PAGE check is the way to go.

Finally, the page access done by DBCC PAGE is an example of a beast I’ll cheekily label with (nolatch). Normally you would not read anything from a database page unless you held at least a shared latch, because without that latch in place, there is no guarantee against the page getting modified or even evicted while you’re walking through its data structures. However, given an understandable concern for minimising interference in a busy system, here the page latch is released as soon as the BUF header contents is dumped. This takes us into the exciting terrain of prancing around with a pointer to a data structure after de-registering our interest in it. Then again, DBCC PAGE is what it is, it can cope with the risks, and it is grown-up code that is trusted not to actually corrupt anything. Just don’t try this at home.

Update: How come I didn’t spot (or much better yet, simply assume) that Lonny was here? My crystal ball shows us batting this thing around a bit more.

4 thoughts on “DBCC PAGE and buffer pool disfavoring”

  1. Yup – DBCC PAGE has a lot of safeguards that I put in to ensure it’s not tripping over unformatted space. It’s also not possible for it to corrupt anything as it never changes the pages it’s dumping.

  2. Another excellent post!

    Something to think about on another day…
    One of the most interesting questions to me about buffer pool disfavoring:
    Since it is meant to prevent poisoning/churning the bpool, and when SQL Server is NUMA aware each node’s bpool can churn separately, is the bpool disfavor theshold in full table/index scans calculated based on server-wide aggregate bpool size, or bpool size for the NUMA node of the serial thread/coordinator thread of a parallel query? Since parallel queries are much more likely to cross NUMA node boundaries after SQL Server 2012, in some sense it may make more sense if calculated based on server-wide bpool size. But… since the disfavoring algorith was introduced well before SQL Server 2012 and *may* not have been updated since, perhaps its calculated on the “preferred” node only.

Leave a Reply

Your email address will not be published. Required fields are marked *