This is the third part of a series about various places where “context” in some form or another crops up within SQL Server. The story so far:
- What the CPU sees in you takes us down to a very low level, below such towering abstractions as object instances and threads. Not a prerequisite to make sense of today’s post.
- A pretty stack frame is like a melody looks at the stack frame of a single invocation of a single function as a self-contained context within a string of other such contexts up the call chain. Although I’ll pick up examples I first brought up there, again you don’t need to have absorbed it to follow this one.
In this episode, I look at one aspect of polymorphic behaviour, whereby the address we use to refer to an object may change depending on what interface the called member function belongs to.
Today the world was just an address, a place for me to live in
In Part 2 my first and simplest example of a member function was the below. For a supplied security context token address in the rcx register, this returns the 32-bit “impersonation type”, which is a member variable, presumably of an enumeration type:
sqllang!CSecContextToken::GetImpersonationType: =============================================== mov eax,dword ptr [rcx+1188h] ret
The emphasis back then was on the implicit stack context – while only the ret(urn) instruction explicitly references the stack, the function was written for an ecosystem where one can rely on certain stack conventions being followed.
Now I’ll look at it from another angle: how sure am I that it is indeed the CSecContextToken instance’s address that goes into that first rcx parameter? As a reminder, when calling a class instance method in an object-oriented language, the first parameter is a pointer to the instance, i.e. the address of the block of memory storing its state. However, high-level languages normally hide this parameter from you. What the programmer might think of as the method’s first parameter is actually the second, with the compiler inserting a This pointer as a preceding parameter. This is nothing more than syntactic sugar, e.g. translating the equivalent of the “parameterless” function call h = orderDate.HourPart() into a call of the form h = HourPart(&orderDate). In fact, in .Net, extension method syntax exposes the true nature of “instance” methods.
In a simple class, assuming that rcx is indeed the address of the instance would be a safe guess, but this isn’t a simple class.
Try this pair of GetData() overloads for size. Both take a single address as a parameter, and return a different address; being overloads, I’ll include the function addresses to disambiguate them. Firstly this one which returns an address 0x1448 bytes lower than the supplied address:
sqllang!CSecContextToken::GetData (00007ffb`ee810e40): ====================================================== lea rax, [rcx-1448h] ret
Clearly if rcx is the This pointer, the return value is something that lives before the CSecContextToken in memory. But the function is within the CSecContextToken class, so that doesn’t look very plausible. We’ll have to assume that rcx doesn’t in fact contain the address of the CSecContextToken instance: that address is somewhere on the West side of rcx. (Spoiler alert. The returned address is actually the address of the CSecContextToken.)
Now for the second overload which calls the first as a tail call after a sleight-of-hand adjustment to rcx:
sqllang!CSecContextToken::GetData (00007ffb`ee810e30): ====================================================== movsxd rax, dword ptr [rcx-4] sub rcx, rax jmp 00007ffb`ee810e40
In English:
- For the supplied address, take the 32-bit value four bytes below it and sign-extend it to 64 bits. The use of movsxd rather than the zero-extend sibling movzxd tells us that this is a signed member variable, whether or not negative numbers will actually get used in practice.
- Deduct this number from the supplied address.
- Call the other overload with this adjusted rcx as its first parameter.
Since one calls the other, clearly the two overloads return the same value from the same object instance, but they do so from different starting point addresses:
- The first one deals with an rcx that is a fixed offset of 0x1448 bytes away from the address within the object it must return. This return value may be the address of the CSecContextToken, or it may itself be somewhere within it – there isn’t enough information supplied to be certain.
- The second one is a lot more complex. Its rcx parameter points somewhere within a structure which itself contains the offset from the address we need to pass to the first overload. In other words, while the first overload’s offset is known at compile time, the second one requires runtime information saved within the object instance to do that little calculation.
The nitty gritty of multiple inheritance
The broad explanation for all these different ways of accessing the same CSecContextToken is that it implements quite a number of interfaces due to multiple inheritance – seven of them to be precise – and each of them has its own virtual function table. I have previously looked at vftables in Indirection indigestion, virtual function calls and SQLOS, but now we have a more chunky context to chew on.
Let’s just back up that assertion by checking our public symbols in Windbg: x /n sqllang!CSecContextToken::`vftable’ yields the following:
00007ffb`f014e530 sqllang!CSecContextToken::`vftable' =00007ffb`eff985e0 sqllang!CSecContextToken::`vftable' = 00007ffb`f014e568 sqllang!CSecContextToken::`vftable' = 00007ffb`effd7ea0 sqllang!CSecContextToken::`vftable' = 00007ffb`eff99080 sqllang!CSecContextToken::`vftable' = 00007ffb`f014e550 sqllang!CSecContextToken::`vftable' = 00007ffb`effd7ee0 sqllang!CSecContextToken::`vftable' =
It turns out that it’s the last one which is of interest to us, because it is the only one which contains an entry for one of our GetData() method, namely the second one. Here is the evidence: dps 7ffb`effd7ee0:
00007ffb`effd7ee0 00007ffb`eead6e70 sqllang!CSecCtxtCacheEntry::GetKey 00007ffb`effd7ee8 00007ffb`ef4f7140 sqllang!CSecContextToken::Serialize 00007ffb`effd7ef0 00007ffb`ee810e30 sqllang!CSecContextToken::GetData
The first and simpler GetData() overload doesn’t show up in a vftable, but the second does. Oddly, the vftable for the second one lives at an offset of +0x1448 into the class instance – you’re going to have to trust me on this one. So the rcx passed into either variation will actually be the same one! But if the virtual version is called, it needs to find its position relative to +0x1448 dynamically, by doing a data lookup. We can confirm that by peeking at what is saved four bytes earlier at +0x1444, and that is indeed the value zero.
The setup of both vftables and dynamic offset lookups is done within the class constructor, with reference to other structures with the symbol of sqllang!CSecContextToken::`vbtable’. This seems to be the glue keeping compile-time and run-time structures in sync, with each vbtable containing the offsets needed to allow one vftable (interface) to reference another, in other words to be cast to it.
When it comes to virtual multiple inheritance and the implementation of multiple interfaces, the previously solid “this” pointer passed into member functions changes its colours. Either it adapts itself with a known compile-time offset or it does so by doing a runtime lookup.
To make an example of the vftable of the second case:
- Clearly a CSecContextToken can be upcast to a CSecCtxtCacheEntry<CSecContextTokenKeySid,1,CPartitionedRefManager>: the vftable tells us it is one of its base types.
- However, the act of casting actually adjusts the “this” pointer passed to the function – in this case by adding 0x1448 to it. Then the implementation of that function does further adjustments as needed so it can act upon the intended underlying object rather than on the interface itself.
Inheritance isn’t always this complicated. For single inheritance, even with virtual functions, an object still retains a single vftable, conventionally located at offset zero (in class constructors, you can see these progressively overwriting each other, revealing the hierarchy). But once multiple inheritance and multiple interfaces are involved, each one wants its own vftable, and the fun of static and dynamic address adjustment starts.
There’s a place for us
Long breath. I am not a C++ programmer, and my only prior exposure to multiple inheritance has been of the form “some languages make this possible, but it might fry your brain”. The implementation details sure are frying mine.
Then again, this is another illustration of the kind of thing high level languages do for us. If you can just focus on the right level of abstraction and trust the compiler to get the details right, you are riding a powerful beast. In a handful of cases, the benefit of the abstraction might be outweighed by the performance cost of moving too far from the metal (branch misprediction and memory stalls due to virtual function calls and the interposition of those “this”-adjusting functions), but most code of most people don’t live on that edge.
So without knowing where I was heading, I think SQL Server just taught me a few things about multiple inheritance and the context implications of coding with it. And it seems that those functions that do nothing more than adjust the incoming “this” pointer before delegating to an overload are in fact adjustor thunks. I read about them ages ago, but it went over my head, leaving only the catchy name, but Googling it now seems to make sense.
There is a moral in there. Somewhere. To find your place in the world, sometimes you just have to find your adjustor thunk.
Further reading
Inside the C++ object model by Stanley Lippman covers this kind of ground while staying in C++-rooted explanations. By the nature of the beast, I have been discovering Microsoft-specific implementation details, but now I feel ready to continue working my way through Lippman.