Enter, Leave, Tailcall Hooks Part 1: The Basics

The CLR Profiling API allows you to hook managed functions so that your profiler is called when a function is entered, returns, or exits via tailcall. We refer to these as Enter/Leave/Tailcall hooks, or “ELT” hooks. In this special multi-part investigative series, I will uncover the truth behind ELT. Today I'll cover some of the basics, NGEN, and what we call "slow-path" vs. "fast-path".

Setting up the hooks

1.     On initialization, your profiler must call SetEnterLeaveFunctionHooks(2) to specify which functions inside your profiler should be called whenever a managed function is entered, returns, or exits via tailcall, respectively.

(Profiler calls this…)

  HRESULT SetEnterLeaveFunctionHooks(
                [in] FunctionEnter *pFuncEnter,
                [in] FunctionLeave *pFuncLeave,
                [in] FunctionTailcall *pFuncTailcall);

 

     (Profiler implements these…)

typedef void FunctionEnter(
                FunctionID funcID);

typedef void FunctionLeave(
                FunctionID funcID);

typedef void FunctionTailcall(
                FunctionID funcID);

 

OR

 

(Profiler calls this…)

  HRESULT SetEnterLeaveFunctionHooks2(
                [in] FunctionEnter2 *pFuncEnter,
                [in] FunctionLeave2 *pFuncLeave,
                [in] FunctionTailcall2 *pFuncTailcall);

 

     (Profiler implements these…)

typedef void FunctionEnter2(
                FunctionID funcId,
                UINT_PTR clientData,
                COR_PRF_FRAME_INFO func,
                COR_PRF_FUNCTION_ARGUMENT_INFO *argumentInfo);

typedef void FunctionLeave2(
                FunctionID funcId,
                UINT_PTR clientData,
                COR_PRF_FRAME_INFO func,
                COR_PRF_FUNCTION_ARGUMENT_RANGE *retvalRange);

typedef void FunctionTailcall2(
                FunctionID funcId,
                UINT_PTR clientData,
                COR_PRF_FRAME_INFO func);

 

This step alone does not cause the enter/leave/tailcall (ELT) hooks to be called.  But you must do this on startup to get things rolling.

2.     At any time during the run, your profiler calls SetEventMask specifying COR_PRF_MONITOR_ENTERLEAVE in the bitmask.  Your profiler may set or clear this flag at any time to cause ELT hooks to be called or ignored, respectively.
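The two steps above can be sketched roughly as follows. This is a minimal sketch under several assumptions, not real profiler code: FakeProfilerInfo is a hypothetical stand-in for the ICorProfilerInfo2 pointer your profiler receives at initialization, the typedefs are simplified placeholders for what corprof.h provides, the flag value is quoted from memory of corprof.idl, and InitializeElt / EnableElt are names I made up. The hooks are plain empty functions so the sketch compiles anywhere; real hooks have to obey the rules described under "Writing your ELT hooks".

```cpp
#include <cassert>
#include <cstdint>

// Placeholder types so the sketch builds without the Windows SDK.
// A real profiler takes all of these from corprof.h.
using HRESULT = std::int32_t;
using DWORD = std::uint32_t;
using UINT_PTR = std::uintptr_t;
using FunctionID = std::uintptr_t;
using COR_PRF_FRAME_INFO = std::uintptr_t;
struct COR_PRF_FUNCTION_ARGUMENT_INFO;   // opaque in this sketch
struct COR_PRF_FUNCTION_ARGUMENT_RANGE;  // opaque in this sketch
constexpr HRESULT S_OK = 0;
// Assumed value; use the definition from corprof.h in real code.
constexpr DWORD COR_PRF_MONITOR_ENTERLEAVE = 0x1000;

typedef void FunctionEnter2(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO,
                            COR_PRF_FUNCTION_ARGUMENT_INFO*);
typedef void FunctionLeave2(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO,
                            COR_PRF_FUNCTION_ARGUMENT_RANGE*);
typedef void FunctionTailcall2(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO);

// Hypothetical stand-in for ICorProfilerInfo2: it only records requests.
struct FakeProfilerInfo {
    FunctionEnter2* enter = nullptr;
    FunctionLeave2* leave = nullptr;
    FunctionTailcall2* tailcall = nullptr;
    DWORD eventMask = 0;

    HRESULT SetEnterLeaveFunctionHooks2(FunctionEnter2* e, FunctionLeave2* l,
                                        FunctionTailcall2* t) {
        enter = e; leave = l; tailcall = t;
        return S_OK;
    }
    HRESULT GetEventMask(DWORD* mask) {
        assert(mask != nullptr);
        *mask = eventMask;
        return S_OK;
    }
    HRESULT SetEventMask(DWORD mask) { eventMask = mask; return S_OK; }
};

// Empty placeholder hooks (not naked, so not usable as real hooks on x86).
void OnEnter(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO,
             COR_PRF_FUNCTION_ARGUMENT_INFO*) {}
void OnLeave(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO,
             COR_PRF_FUNCTION_ARGUMENT_RANGE*) {}
void OnTailcall(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO) {}

// Step 1: register the hooks once, at startup. No hook fires yet.
HRESULT InitializeElt(FakeProfilerInfo& info) {
    return info.SetEnterLeaveFunctionHooks2(&OnEnter, &OnLeave, &OnTailcall);
}

// Step 2: turn ELT on or off later without disturbing the other flags.
HRESULT EnableElt(FakeProfilerInfo& info, bool enable) {
    DWORD mask = 0;
    HRESULT hr = info.GetEventMask(&mask);
    if (hr < 0) return hr;  // same test the FAILED() macro performs
    mask = enable ? (mask | COR_PRF_MONITOR_ENTERLEAVE)
                  : (mask & ~COR_PRF_MONITOR_ENTERLEAVE);
    return info.SetEventMask(mask);
}
```

Reading the current mask before writing it back is a design choice of the sketch: it keeps whatever other event flags the profiler already requested intact when ELT is toggled.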

FunctionIDMapper

In addition to the above two steps, your profiler may specify more granularly which managed functions should have ELT hooks compiled into them:

1.     At any time, your profiler may call ICorProfilerInfo2::SetFunctionIDMapper to specify a special hook to be called when a function is JITted.

(Profiler calls this…)

  HRESULT SetFunctionIDMapper(
                [in] FunctionIDMapper *pFunc);

 

     (Profiler implements this…)

typedef UINT_PTR __stdcall FunctionIDMapper(
                FunctionID funcId,
                BOOL *pbHookFunction);

 

2.     When FunctionIDMapper is called:

a.     Your profiler sets the pbHookFunction [out] parameter to indicate whether the function identified by funcId should have ELT hooks compiled into it.

b.     The primary purpose of FunctionIDMapper, of course, is to allow your profiler to specify an alternate ID for that function.  Your profiler does this by returning that ID from FunctionIDMapper.  The CLR will pass this alternate ID to your ELT hooks (as funcID if you're using the 1.x ELT, and as clientData if you're using the 2.x ELT).
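Here is one way a mapper along these lines might look. It is only a sketch: FunctionTable, g_functionTable, g_doNotHook and MyFunctionIDMapper are hypothetical names, the typedefs are placeholders for the corprof.h types, and __stdcall is left off so the code compiles on any platform. The alternate ID returned here is simply an index into a profiler-owned table, which also makes it cheap to get back to the original FunctionID.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Placeholder types; a real profiler gets these from corprof.h.
using UINT_PTR = std::uintptr_t;
using FunctionID = std::uintptr_t;
using BOOL = int;

// Hypothetical profiler-owned table: alternate ID == index into the table.
class FunctionTable {
public:
    // Returns a stable index for funcId, adding an entry on first sight.
    UINT_PTR IndexFor(FunctionID funcId) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto found = indexById_.find(funcId);
        if (found != indexById_.end()) return found->second;
        const UINT_PTR index = originalIds_.size();
        originalIds_.push_back(funcId);
        indexById_.emplace(funcId, index);
        return index;
    }

    // Maps an alternate ID back to the FunctionID the CLR handed us.
    FunctionID OriginalId(UINT_PTR index) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(index < originalIds_.size());
        return originalIds_[index];
    }

private:
    std::mutex mutex_;  // the mapper may be entered on several threads
    std::vector<FunctionID> originalIds_;
    std::unordered_map<FunctionID, UINT_PTR> indexById_;
};

FunctionTable g_functionTable;

// Hypothetical filter, assumed to be filled in before the mapper is
// registered and treated as read-only afterwards.
std::unordered_set<FunctionID> g_doNotHook;

UINT_PTR MyFunctionIDMapper(FunctionID funcId, BOOL* pbHookFunction) {
    if (g_doNotHook.count(funcId) != 0) {
        *pbHookFunction = 0;  // no ELT hooks compiled into this function
        return funcId;        // no ELT hook will ever receive this value
    }
    *pbHookFunction = 1;      // compile ELT hooks into this function
    return g_functionTable.IndexFor(funcId);
}
```

With the 2.x hooks, the index would arrive as clientData, so the hook can reach its per-function data with a single array lookup.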

Writing your ELT hooks

You may have noticed that corprof.idl warns that your implementations of these hooks must be __declspec(naked), and that you've got to save registers you use. Yikes! This keeps things nice and efficient on the CLR code generation side, but at the expense of making life a little more difficult for profilers. For great low-level details of writing the hooks (including yummy sample code!) visit Jonathan Keljo's blog entry here.

NGEN /Profile

The profiling API makes use of the fact that it can control the JITting of functions to enable features like ELT hooks. When managed code is NGENd, however, this assumption goes out the door. Managed code is already compiled before the process is run, so there’s no opportunity for the CLR to bake in calls to ELT hooks.

The solution is “NGEN /Profile”. For example, if you run this command against your assembly:

ngen install MyAssembly.dll /Profile

 

it will NGEN MyAssembly.dll with the “Profile” flavor (also called “profiler-enhanced”). This flavor causes extra hooks to be baked in to enable features like ELT hooks, loader callbacks, managed/unmanaged code transition callbacks, and the JITCachedFunctionSearchStarted/Finished callbacks.

The original NGENd versions of all your assemblies stay in your NGEN cache. NGEN /Profile simply generates an additional set of NGENd assemblies, marked as the “profiler-enhanced” set. At run-time, the CLR determines which flavor should be loaded. If a profiler is attached and enables features that only work with profiler-enhanced (not regular) NGENd assemblies (such as ELT via a call to SetEnterLeaveFunctionHooks(2), or any of several other features that are requested by setting particular event flags via SetEventMask), then the CLR will load only profiler-enhanced NGENd images. If none exist, the CLR falls back to JIT compilation in order to support the features requested by the profiler. In contrast, if the profiler does not request such features, or there is no profiler to begin with, then the CLR loads the regular-flavored NGENd assemblies.
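The load-time decision just described can be written down as a tiny toy model. To be clear, this is not CLR code: CodeSource, LoadContext and ChooseCodeSource are names invented for illustration, and the one branch not spelled out in the text (no regular image available, so the code is JITted) is an assumption based on ordinary NGEN behavior.

```cpp
#include <cassert>

// Where the code for an assembly ends up coming from.
enum class CodeSource { RegularNgenImage, ProfilerEnhancedNgenImage, Jit };

// Hypothetical summary of what is known when an assembly is loaded.
struct LoadContext {
    // True if a profiler is loaded and asked for anything that needs
    // profiler-enhanced images (ELT hooks, certain event flags, ...).
    bool profilerNeedsEnhancedImages;
    bool regularImageExists;   // plain "ngen install" was run
    bool enhancedImageExists;  // "ngen install ... /Profile" was run
};

CodeSource ChooseCodeSource(const LoadContext& ctx) {
    if (ctx.profilerNeedsEnhancedImages) {
        // Regular images are never used here; without an enhanced
        // image the runtime falls back to the JIT.
        return ctx.enhancedImageExists ? CodeSource::ProfilerEnhancedNgenImage
                                       : CodeSource::Jit;
    }
    // No profiler, or a profiler that needs nothing special.
    return ctx.regularImageExists ? CodeSource::RegularNgenImage
                                  : CodeSource::Jit;
}
```

Note that in this model a profiler that requests ELT in a process whose assemblies only have regular NGEN images ends up with everything JITted, which matches the fallback described above.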

So how does NGEN /Profile make ELT hooks work? Well, in a profiler-enhanced NGEN module, each function gets compiled with calls at enter, leave, and tailcall time to a thunk. At run-time, the CLR decides what this thunk does: either nothing (if no profiler requested ELT hooks), or a jmp to the profiler's ELT hook. For example, if a profiler is loaded, requesting ELT notifications, and the CPU is executing near the top of a function inside a profiler-enhanced NGEN module, the disassembly will look something like this:

5bcfb8b0 call mscorwks!JIT_Writeable_Thunks_Buf+0x1b8 (5d8401d8)

And where's the target of that call? Right here:

5d8401d8 jmp UnitTestSampleProfiler!Enter2Naked (023136b0)

As you may have guessed, I happen to have a profiler named "UnitTestSampleProfiler" loaded and responding to ELT notifications, so that thunk will jmp right into my Enter2 hook. When I return from my hook, control goes right back to the managed function that called the thunk.
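If it helps to see the idea in source form, the thunk can be pictured as a patchable slot that precompiled code always calls through. The following is only an analogy in portable C++ with made-up names (g_enterThunk, PatchEnterThunk, ManagedFunction); the real thunk is a machine-code jmp, so the hook returns straight to the managed caller, whereas a function pointer here adds an ordinary call and return.

```cpp
#include <cassert>
#include <cstdint>

using FunctionID = std::uintptr_t;
using HookFn = void (*)(FunctionID);

// Default thunk behavior: do nothing (no profiler asked for ELT).
void NoOpHook(FunctionID) {}

// The patchable slot that precompiled code calls through.
HookFn g_enterThunk = &NoOpHook;

// Stand-in for a profiler's enter hook; it just counts calls.
int g_enterCalls = 0;
void ProfilerEnterHook(FunctionID) { ++g_enterCalls; }

// What the runtime conceptually does once it knows whether a profiler
// wants ELT notifications: point the slot at the hook, or at nothing.
void PatchEnterThunk(HookFn target) {
    g_enterThunk = (target != nullptr) ? target : &NoOpHook;
}

// Stand-in for a function in a profiler-enhanced NGEN image: the call
// through the thunk was baked in long before the process started.
int ManagedFunction(int x) {
    g_enterThunk(0x1234);  // arbitrary ID for the sketch
    return x * 2;
}
```

The point of the indirection is that the precompiled function body never changes; only the slot's target does.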

Fast-path vs. Slow-path

There are two paths the CLR might take to get to your ELT hooks: fast & slow.  Fast means the JIT inserts a call from the JITted function directly into the profiler. (In profiler-enhanced NGEN modules, this translates to the thunk jumping directly to your ELT hook.) Slow means that some fixup must be done before control can be passed to your profiler, so the JIT inserts a call from the JITted function into helper functions in the CLR to do the fixup and finally forward the call to your profiler. (Or, in NGEN-land, the thunks jmp to those CLR helper functions.)

There are also two supported signatures for the ELT hooks: CLR 1.x-style (set via SetEnterLeaveFunctionHooks) and CLR 2.x-style (set via SetEnterLeaveFunctionHooks2).

If your profiler requests 1.x ELT hooks, then slow-path is used for them all, end of story.

If your profiler requests 2.x ELT hooks, then slow-path is used for them all if any of the following event flags were set by your profiler:

  • COR_PRF_ENABLE_STACK_SNAPSHOT:  “Slow” ensures that the CLR has an opportunity to do some housekeeping on the stack before your profiler is called so that if your profiler calls DoStackSnapshot from within the ELT hook, then the stack walk will have a marker to begin from.
  • COR_PRF_ENABLE_FUNCTION_ARGS: “Slow” gives the CLR an opportunity to gather the function’s arguments on the stack for passing to the profiler’s enter hook.
  • COR_PRF_ENABLE_FUNCTION_RETVAL: “Slow” gives the CLR an opportunity to gather the function’s return value on the stack for passing to your profiler’s leave hook.
  • COR_PRF_ENABLE_FRAME_INFO: “Slow” gives the CLR an opportunity to gather generics information into a COR_PRF_FRAME_INFO parameter to pass to your profiler.
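The rules above boil down to a small predicate. This is a sketch for reasoning about your own flag choices, not something the profiling API exposes: EltStyle, kSlowPathFlags and UsesSlowPath are invented names, and the flag values are quoted from memory of corprof.idl, so take the real definitions from corprof.h.

```cpp
#include <cassert>
#include <cstdint>

using DWORD = std::uint32_t;

// Assumed values; use the definitions from corprof.h in real code.
constexpr DWORD COR_PRF_ENABLE_FUNCTION_ARGS   = 0x02000000;
constexpr DWORD COR_PRF_ENABLE_FUNCTION_RETVAL = 0x04000000;
constexpr DWORD COR_PRF_ENABLE_FRAME_INFO      = 0x08000000;
constexpr DWORD COR_PRF_ENABLE_STACK_SNAPSHOT  = 0x10000000;

// Which registration call the profiler used.
enum class EltStyle {
    V1,  // SetEnterLeaveFunctionHooks
    V2   // SetEnterLeaveFunctionHooks2
};

// Any one of these flags forces the slow path for 2.x-style hooks.
constexpr DWORD kSlowPathFlags =
    COR_PRF_ENABLE_STACK_SNAPSHOT | COR_PRF_ENABLE_FUNCTION_ARGS |
    COR_PRF_ENABLE_FUNCTION_RETVAL | COR_PRF_ENABLE_FRAME_INFO;

bool UsesSlowPath(EltStyle style, DWORD eventMask) {
    if (style == EltStyle::V1) {
        return true;  // 1.x hooks always go through the CLR helpers
    }
    return (eventMask & kSlowPathFlags) != 0;
}
```

For example, a 2.x-style profiler whose event mask carries none of the four flags stays on the fast path, while adding COR_PRF_ENABLE_FRAME_INFO alone is enough to move every hook to the slow path.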

Why do you care? Well, it's always good to know what price you're paying. If you don't need any of the features above, then you're best off not specifying those flags, because then you'll see better performance: the managed code may call directly into your profiler without any gunk going on in the middle. Also, this information gives you some incentive to upgrade your profiler's old 1.x ELT hooks to the hip, new 2.x ELT style. Since 1.x ELT hooks always go through the slow path (so the CLR has an opportunity to rearrange the parameters to fit the old 1.x prototype before calling your profiler), you're better off using the 2.x style.

Next time...

That about covers it for the ELT basics. The next installment of this riveting series will talk about that enigma known as tailcall.

Comments

  • Anonymous
    March 22, 2007
    Hi, Thanks for your post. This topic is very interesting. Also thanks for your answer in the forum. It made me do some additional investigation, and soon I'll have more questions for you =). I'd like to know: is it guaranteed that the original function ID falls within some range of values? For example, I haven't seen a function ID with the first bit set. How many bits are free to change without colliding with the original IDs? Somewhere it was mentioned that the function ID is a memory address. Is that true? Does that mean it's possible to find that range?

  • Anonymous
    March 24, 2007
    Hi, Sergey. The following implementation details SHOULD NOT BE DEPENDED UPON.  I'm providing this for information only, so read and then forget what you read.  :-) The current implementation uses the address of an internal runtime structure as the value of the FunctionID.  This means that you'll find FunctionIDs will appear aligned by the number of bits that the compiler will normally align the structure (which in turn may be dictated by the architecture, like ia64/x64). Again, it is NOT RECOMMENDED to rely on this information, though.  The CLR team is free, at a whim, to completely change the underlying implementation in new releases.  In theory, we could even change this in an automatic servicing update (delivered via Windows Update).  It is probably unlikely that we'd make such a large change in a servicing update, but we still reserve the right to do that as well, and certainly in a new major release. If you'd like to make assumptions about the ID of a function in your ELT hooks, you should use FunctionIDMapper.

  • Anonymous
    March 25, 2007
    Ok. I'll write an answer and then forget what you've written =) As I understand it, the ID mapper was created in order to encode some additional information about the function into the function ID. We use it successfully and it really increases performance. It's great functionality, but it leads to some trouble when creating a shadow call stack. I suppose that converting from the new function ID back to the old one is a slow operation, so it's preferable to save the new function IDs in the shadow call stack. In this scenario the main problem with the function ID mapper is that the ExceptionUnwind callbacks use the old function ID. So the ID mapper has to be applied to every function ID received in ExceptionUnwind. There are two ways to solve this performance issue:

  1. Change the profiler (I DO understand that this is too difficult because of the current implementation of the profiler) in order to pass the new function ID to the exception unwind callbacks.
  2. Create a fast mapper from the new function ID to the old one. The fastest way to do this is to set some bits that are guaranteed not to be used in the old function ID. Some time ago I ran some tests. I generated a lot of methods and got a list of function IDs. All of them were aligned =), but all of them were also less than some value (I don't remember the concrete value right now). Is there some limit on the memory that can be used by the CLR? Or maybe runtime structures can be located only in a certain address space? What do you think about this problem? Do you think that it is a problem? =) Maybe there is another way to solve this performance issue?

  • Anonymous
    March 26, 2007
    Hi David, My comment hasn't been published, but I already know the answer to my question. A closer look at my callbacks shows that I use only one parameter, clientData, in order to support FW 1.1. Under FW 2.0 I can use both FunctionID and ClientData, can't I?

  • Anonymous
    April 17, 2007
    Hi, I'm looking forward to the next part of your article, about the Tailcall callback. I have some questions ahead of your article; maybe you will answer them there, or maybe you will answer here (or is it better to ask in the forum?). The questions are:

  1. How can I prevent a method from being tailcalled? Is it possible?
  2. Can a method with a global try{}catch block be tailcalled? Is it true that only methods with a return at the end can be tailcalled?

  • Anonymous
    April 26, 2007
    Hi, Sergey, sorry for the delay.  I've been out of town for most of the month, and am now unburying myself at work.  You can disable tailcalls, but only by disabling ALL optimizations (via COR_PRF_DISABLE_OPTIMIZATIONS), so this isn't a great thing to do unless you really want all codegen optimizations turned off. As for what conditions prevent or allow the tailcall optimization, I'm going to hold off on addressing that until the article, as there are many factors.  It'll probably be a while, though I'm hoping to get started on it in a couple weeks.

  • Anonymous
    May 02, 2007
    Hi, Dan.  In general, there's rarely an "easy" way to do anything inside a profiler.  :-)  You are correct that you'll need to grope through the function signature in the metadata to understand the types of each argument.  (In theory, if you're only probing a small set of functions, your profiler could then have a priori knowledge of their signatures hard-coded in.  Though that assumes tokens and signatures never change, and is fragile.)  Once your signature parser comes up with the types of the arguments, you can use profapi functions like GetClassLayout (or the equivalent in IMetaDataImport) to figure out the sizes of various types.

  • Anonymous
    May 07, 2007
    Thanks David! I was optimistic because every time I've looked at the COR_PRF_FUNCTION_ARGUMENT_INFO argument, the number of ranges has always exactly equaled the number of arguments. I was hoping I could just index into the ranges array to get the argument I'm interested in. But it sounds like what you're saying is that one COR_PRF_FUNCTION_ARGUMENT_RANGE can hold multiple argument values. :( Is that correct? Thanks again for your help! Dan

  • Anonymous
    May 07, 2007
    You are interpreting correctly.  Though you might be observing one range per argument, nothing guarantees that this will always be the case.

  • Anonymous
    January 14, 2009
    "For great low-level details of writing the hooks (including yummy sample code!) visit Jonathan Keljo's blog entry here." Unfortunately, the link to the FunctionHooks.zip file there is broken.

  • Anonymous
    February 14, 2009
    Sorry about the delay.  The new location for FunctionHooks.zip is now here: http://feeblah.members.winisp.net/direct/blog/FunctionHooks.zip.

  • Anonymous
    February 03, 2010
    Hi, how do I define a new method in the ngen /profile version of an assembly? I get a methodnotfound exception, but if the non-NGENd assembly is loaded I don't have any problem.

  • Anonymous
    February 04, 2010
    Hi!  This issue has recently been reported in our forum.  Subscribe to this thread for updates: http://social.msdn.microsoft.com/Forums/en-US/netfxtoolsdev/thread/1d3c17b7-9eaa-45d9-9e77-f95619be16a1

  • Anonymous
    February 03, 2011
    Hi, I need some clarification on the bitmask COR_PRF_MONITOR_ENTERLEAVE. If I set it during profiler initialization and then reset it later, will the callbacks happen for the methods JITted between the set and the reset? Your article above seems to indicate that this is the case. If so, is it true for both types of calls (fast-path and slow-path)? Thanks

  • Anonymous
    February 07, 2011
    Hi, Ramesh.  Could I ask you to post this question on our forum?  That ensures the most people can see it and help you. social.msdn.microsoft.com/.../threads