Fishing for wait types in WinDbg

Last night a #sqlhelp question from Monica Rathbun (@SQLEspresso) caught my eye:

@SQLEspresso Twitter question

Now some of us take way too much delight in worrying about obscure wait types, and since I’ve recently been in preemptive territory I thought I should take some degree of interest. Spoiler alert: I did nothing to solve Monica’s problem, but my attempt to figure out where the wait type might emanate from made me realise that this is worth a blog post.

Without getting hung up on the detail, here is a very crude and simple way to hunt for areas of SQL Server that may use a particular wait type. The only prerequisite is that you need to be willing and able to attach Windbg to SQL Server, and that you have public symbols loaded.

In this case I was looking for PREEMPTIVE_COM_RELEASE, and sys.dm_xe_map_values tells me that on my 2014 RTM instance it has an index of 01d4 hexadecimal. Crazy as it sounds, I’m going to do a simple search through the code to look for places that magic number is used. As a two-byte (word) pattern we’ll get lots of false positives, but fortunately wait types are internally doublewords, with only one bit set in the high-order word. In other words, we’re going to look for the pattern 000101d4, 000201d4, 000401d4 and so forth up to 800001d4. Ignore the meaning of when which bit is going to be set; with only sixteen permutations, it’s quick enough to try them all.

Let’s focus on sqllang as the likely source – the below would apply to any other module too.

Upon starting the debugger, the module load addresses are listed right away. In my case, I got this for sqllang:

ModLoad: 00007ffe`23870000 00007ffe`25ad7000   C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Binn\sqllang.dll

So we have a start and end memory address. Take note of the length in bytes, using Windbg as a calculator:

0:063> ? 07ffe`25ad7000 - 7ffe`23870000
Evaluate expression: 36073472 = 00000000`02267000

Great. Now we have everything we need. The s command searches for patterns in a range of memory, and we’ll use the -d flag to make it a doubleword search. First few tries come up empty:

0:063> s -d 0x7ffe`23870000 L0x2267000 000101d4
0:063> s -d 0x7ffe`23870000 L0x2267000 000201d4
0:063> s -d 0x7ffe`23870000 L0x2267000 000401d4
0:063> s -d 0x7ffe`23870000 L0x2267000 000801d4

But now we get one:

0:063> s -d 0x7ffe`23870000 L0x2267000 001001d4
00007ffe`287d39f8  001001d4 8948f633 48602474 68247489  ....3.H.t$`H.t$h

Ignore everything other than the address at the start of the line – we’re not expecting the byte dump to make sense to the human eye. Let’s see what piece of code this belongs to – the uf disassembles the function that this piece of memory falls in.

0:063> uf 0x7ffe`287d39f8

I’m not even going to show you the output, because this one turned out to be a red herring – experience and/or intuition needed to confirm that. But let’s go on…

 
0:063> s -d 0x7ffe`23870000 L0x2267000 002001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 004001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 008001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 010001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 020001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 040001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 080001d4
0:063> s -d 0x7ffe`23870000 L0x2267000 100001d4

Still nothing, but then we strike gold:

 
0:063> s -d 0x7ffe`23870000 L0x2267000 200001d4
00007ffe`246a3fe4  200001d4 244c8d48 32bee840 4890ff1d  ... H.L$@..2...H
00007ffe`246a6694  200001d4 244c8d48 0c0ee840 4890ff1d  ... H.L$@......H
00007ffe`246a8d44  200001d4 244c8d48 e55ee840 4890ff1c  ... H.L$@.^....H
00007ffe`246ac8a4  200001d4 244c8d48 a9fee840 4890ff1c  ... H.L$@......H
00007ffe`246b0404  200001d4 244c8d48 6e9ee840 4890ff1c  ... H.L$@..n...H
... and many more!

Try the uf trick again on the first one:

0:063> uf 0x7ffe`246a3fe4

And we get rewarded with a disassembly of the function sqllang!IWrapInterface<IAccessor>::Release – this one pretty much comes with flashing lights given that IAccessor reeks of COM and we were expecting something involving “RELEASE”. I’ll spare you the bulk of the assembly dump, but would like to highlight the bit that confirms the setup of a preemptive wait type:

00007ffe`246a3fe3 bad4010020      mov     edx,200001D4h
00007ffe`246a3fe8 488d4c2440      lea     rcx,[rsp+40h]
00007ffe`246a3fed e8be321dff      call    sqllang!AutoSwitchPreemptive::AutoSwitchPreemptive (00007ffe`238772b0)

That assignment to the edx register means that the encoded wait type is the second parameter to the AutoSwitchPreemptive constructor. And while it may not always be a recognisable setup, in this case I was already familiar with AutoSwitchPreemptive (see here).

Now this kind of trawling is by no means scientific. The wait type could have been loaded from a memory address, in which case it wouldn’t have been hard-coded in the function. And of course without the code running in context, it doesn’t tell you what kind of call stack it might show up in – only running the relevant code paths and catching the wait through a breakpoint or XEvent will do that. But as a quick and dirty way of hunting for wait type usage in a module up there on the marble slab? Hey, it works for me.

The DMV diaries: Worker, task, request and session state

We have all been there. You believe that a certain status (e.g. is the order shipped?) lives in a simple database column, only to find that it comes from a view built on a view with all kinds of creative CASE statements. And it may look ugly, but at the end of the day, you have to admit that it successfully serves the purpose of exposing business data in the way that users expect to see it.

Guess what: The “V” in “DMV” exists for a similar reason. Today I’ll be whizzing through the various ways in which the status of a running piece of work is exposed to us in sys.dm_os_workers, sys.dm_os_tasks, sys.dm_exec_requests, and sys.dm_exec_sessions.
Continue reading “The DMV diaries: Worker, task, request and session state”

Scheduler stories: Going Preemptive

SQLOS is built upon the idea of cooperative, AKA non-preemptive, scheduling: out of any given collection of threads belonging to a scheduler, only one will own the scheduler at a given moment. To the degree that these cooperative threads represent the only work done by the underlying CPU, this means that the thread owning the scheduler really owns the CPU. Of course, a CPU will occasionally get side-tracked into doing other work, so the SQLOS scheduler as “virtual CPU” only represents a chunk of the real CPU, but we live in the expectation that this is a suitably large chunk.

John Tenniel's White Rabbit from "Alice in Wonderland"

It is understandable when such competition for the CPU comes from outside of SQL Server, but it can also be instigated from within SQL Server itself. This is the world of preemptive – or if you prefer, antisocial – scheduling.
Continue reading “Scheduler stories: Going Preemptive”

Scheduler stories: Interacting with the Windows scheduler

In the previous post, The joy of fiber mode, we saw how a fiber mode scheduler firmly controls which worker runs on a thread at a given moment. While it can’t positively ensure that the thread in question remains running all the time, the soul of the scheduler lives in that one thread, and as long as the thread runs, the scheduler gets invoked by its team of fiber workers, dispatching them in an appropriate order.
Continue reading “Scheduler stories: Interacting with the Windows scheduler”

Scheduler stories: The joy of fiber mode

Probably the funniest thing I had ever seen on stage was a two-hander called “Frank ‘n Stein”. It’s a telling of the classic Frankenstein story, with the physical comedy of two actors having to rotate continuously between a large number of roles, including a whole crowd chasing the monster. This was all made possible by them never leaving the stage, but instead changing characters in front of the audience, using only rudimentary props to help differentiate the characters.

If this is the only thing you remember about fiber mode scheduling, it should see you through.
Continue reading “Scheduler stories: The joy of fiber mode”

Scheduler Stories: When does your scheduler run?

I am planning to burn a fair number of cycles on SQLOS scheduling internals for the foreseeable future, and with some luck, this turns into an interesting series. OS scheduling is already a subject that belongs “on the other side of the looking glass”, and this only gets more interesting when we look at user-mode SOS_Scheduler scheduling built on top of it.

If I don’t specifically mention a version, my frame of reference is SQL Server 2014. Yes, things changed since then, but the 2012-2014 scheduler is a good starting point, and the fundamental mechanisms I’ll initially cover have changed very little since the User Mode Scheduler (UMS) of SQL Server 7.0.
Continue reading “Scheduler Stories: When does your scheduler run?”

A spanking new ReaderWriterSpinlock

Well, I never. There is life in the old spinlock yet!

Being a synchronisation fetishist, I took great interest in last week’s great blog post by Sanjay Mishra and Arvind Shyamsundar about the ReaderWriterSpinlock added in 2016 CU2. Great story, happy ending, room for a sequel, all good news.

We have already seen recent improvement in the reader-writer lock algorithm, so clearly people are finding more corners to squeeze. Now what, pray tell, is a reader-writer spinlock?
Continue reading “A spanking new ReaderWriterSpinlock”

The Thread in the Head: Dr Sqlos explains context switching

The thread in the head

Oh, the things you can schedule,
the games you can play!
If you keep a clear head, you’ll
be well on your way.

You start with a blank sheet,
three nuts and a bolt,
a strong sense of fairness,
a large can of Jolt.
Continue reading “The Thread in the Head: Dr Sqlos explains context switching”

DBCC PAGE and buffer pool disfavoring

A page on a table, with "DBCC" in music notation written on it

While preparing material for my post on latch promotion rules, I found this very interesting Stack Exchange question by Jeremiah Peschka about SQL Server’s LRU-K algorithm and the metrics that support it. It turned out that SQL Server doesn’t expose those in a useful way, but I was really impressed by some experimental evidence provided by Martin Smith, and his excellent deductions. Martin has clearly been on the case for a while and has highlighted that LRU-K (or for that matter Time of Last Access) isn’t well documented at all.

I’m not going to look at the general case of aging out buffers today; instead I’m just confirming and extending Martin’s observations about how DBCC PAGE interacts with buffers.

The return of bUse1

Quick refresher: the lazywriter maintains an internal 16-bit “clock hand” that counts seconds and thus rolls over every eighteen hours or so. Its current value is used as a cheap and easy way to measure the progress of time in places where we don’t care about absolute time, but only need the ability to note how many seconds have passed since a logged event.
Continue reading “DBCC PAGE and buffer pool disfavoring”

King Arthur, Energizer bunnies, and the search for the SQLOS scheduler

ARTHUR: Who lives in that castle?
WOMAN: No one lives there.
ARTHUR: Then who is your lord?
WOMAN: We don’t have a lord.
ARTHUR: What?
DENNIS: I told you. We’re an anarcho-syndicalist commune. We take it in turns to act as a sort of executive officer for the week.

(from Monty Python and the Holy Grail)

Yeah, whatever. I want to hear about bunnies

Picture a world consisting of forty Energizer bunnies, grouped into four teams of ten. Each team has one battery between them, and the main rule of the game is that each bunny may only use the battery for a little while before transferring it to a teammate. There is no way for a sleeping player to be woken up except by being passed the battery, and each battery strictly stays within one team.
Continue reading “King Arthur, Energizer bunnies, and the search for the SQLOS scheduler”