NGEN Primer

I am planning to write couple of NGEN/GAC related posts. I thought I’d share out some introductory notes about NGEN. This is for the a beginner managed developer.

Primer

Consider I have a math-library which has this simple C# code.

 namespace Abhinaba
{
    public class MathLibrary
    {
        public static int Adder(int a, int b)
        {
            return a + b;
        }
    }
}

The C# compiler compiles this code into processor independent CIL (Common Intermediate Language) instead of a machine specific (e.g. x86 or ARM) code. That CIL code can be seen by opening the dll generated by C# compiler in a IL disassembler like the default ildasm that comes with .NET. The CIL code looks as follows

 .method public hidebysig static int32  Adder(int32 a,
                                             int32 b) cil managed
{
  // Code size       9 (0x9)
  .maxstack  2
  .locals init ([0] int32 CS$1$0000)
  IL_0000:  nop
  IL_0001:  ldarg.0<br>  IL_0002:  ldarg.1  
IL_0003:  add
  IL_0004:  stloc.0
  IL_0005:  br.s       IL_0007
  IL_0007:  ldloc.0
  IL_0008:  ret
} // end of method MathLibrary::Adder

To abstract away machine architecture the .NET runtime defines a generic stack based processor and generates code for this make-belief processor. Stack based means that this virtual processor works on a stack and it has instructions to push/pop values on the stack and instructions to operate on the values already inside the stack. E.g. in this particular case to add two values it pushes both the arguments onto the stack using ldarg instructions and then issues an add instruction which automatically adds the value on the top of the stack and pushes in the result. The stack based architecture places no assumption on the number of registers (or even if the processor is register based) the final hardware will have.

Now obviously there is no processor in the real world which executes these CIL instructions. So someone needs to convert those to object code (machine instructions). These real world processors could be from the x86, x64 or ARM families (and many other supported platforms). To do this .NET employs Just In Time (JIT) compilation. JIT compilers responsibility is to generate native machine specific instructions from the generic IL instructions on demand, that is as a method is called for the first time JIT generates native instructions for it and hence enables the processor to execute that method. On my machine the JIT produces the following x86 code for the add

 02A826DF  mov         dword ptr [ebp-44h],edx  
02A826E2  nop  
02A826E3  mov         eax,dword ptr [ebp-3Ch]  
02A826E6  add         eax,dword ptr [ebp-40h]  

This process happens on-demand. That is if Main calls Adder, Adder will be JITed only when it is actually being called by Main. If a function is never called it’s in most cases never JITed. The call stack clearly shows this on-demand flow.

 clr!UnsafeJitFunction <------------- This will JIT Abhinaba.MathLibrary.Adder 
clr!MethodDesc::MakeJitWorker+0x535
clr!MethodDesc::DoPrestub+0xbd3
clr!PreStubWorker+0x332
clr!ThePreStub+0x11
App!ConsoleApplication1.Program.Main()+0x3c <----- This managed code drove that JIT

The benefits of this approach are

  1. It provides for a way to develop applications with a variety of different languages. Each of these languages can target the MSIL and hence interop seamlessly
  2. MSIL is processor architecture agnostic. So the MSIL based application could be made to run on any processor on which .NET runs (build once, run many places)
  3. Late binding. Binaries are bound to each other (say an exe to it’s dlls) late which results in allowing more significant lee-way on how loosely couple they could be
  4. Possibility of very machine specific optimization. As the compilation is happening on the exact same machine/device on which the application will run

JIT Overhead

The benefits mentioned above comes with the overhead of having to convert the MSIL before execution. The CLR does this on demand, that is when a method is just going to execute it is converted to native code. This “just in time” dynamic compilation or JITing adds to both application startup cost (a lot of methods are executing for the first time) as well as execution time performance. As a method is run many times, the initial cost of JITing fades away. The cost of executing a method n times can expressed as

Cost JIT + n * Cost Execution

At startup most methods are executing for the first time and n is 1. So the cost of JIT pre-dominates. This might result in slow startup. This effects scenarios like phone where slow application startup results in poor user experience or servers where slow startup may result in timeouts and failure to meet system SLAs.

Also another problem with JITing is that it is essentially generating instructions in RW data pages and then executing it. This does not allow the operating system to share the generated code across processes. So even if two applications is using the exact same managed code, each contains it’s own copy of JITed code.

NGEN: Reducing or eliminating JIT overhead

From the beginning .NET supports the concept of pre-compilation by a process called NGEN (derived from Native image GENeration). NGEN consumes a MSIL file and runs the JIT in offline mode and generates native instructions for all managed IL functions and store them in a native or NI file. Later applications can directly consume this NI file. NGEN is run on the same machine where the application will be used and run during installation of that application. This retains all the benefits of JIT and at the same time removes it’s overhead. Also since the file generated is a standard executable file the executable pages from it can be shared across processes.

 c:\Projects\ConsoleApplication1\ConsoleApplication1\bin\Debug>ngen install MyMathLibrary.dll
Microsoft (R) CLR Native Image Generator - Version 4.0.30319.33440
Copyright (c) Microsoft Corporation.  All rights reserved.
1>    Compiling assembly c:\Projects\bin\Debug\MyMathLibrary.dll (CLR v4.0.30319) ...

One of the problem with NGEN generated executables is that the file contains both the IL and NI code. The files can be quiet large in size. E.g. for mscorlib.dll I have the following sizes

Directory of C:\Windows\Microsoft.NET\Framework\v4.0.30319

09/29/2013  08:13 PM         5,294,672 mscorlib.dll

               1 File(s)      5,294,672 bytes

Directory of C:\Windows\Microsoft.NET\Framework\v4.0.30319\NativeImages

10/18/2013  12:34 AM        17,376,344 mscorlib.ni.dll

               1 File(s)     17,376,344 bytes

 

Read up on MPGO tool on how this can be optimized (https://msdn.microsoft.com/library/hh873180.aspx)

NGEN Fragility

Another problem NGEN faces is fragility. If something changes in the system the NGEN images become invalid and cannot be used. This is true especially for hardbound assemblies.

Consider the following code

 class MyBase
{
    public int a;
    public int b;
    public virtual void func() {}
}

static void Main()
{
    MyBase m = new MyBase();
    mb.a = 42;
    mb.b = 20;
}

Here we have a simple class whose variables have been modified. If we look into the MSIL code of the access it looks like

 L_0008: ldc.i4.s 0x2a
L_000a: stfld int32 ConsoleApplication1.MyBase::a
L_000f: ldloc.0 
L_0010: ldc.i4.s 20
L_0012: stfld int32 ConsoleApplication1.MyBase::b

The native code for the variable access can be as follows

             mb.a = 42;
0000004b  mov         eax,dword ptr [ebp-40h] 
0000004e  mov         dword ptr [eax+4],2Ah 
            mb.b = 20;
00000055  mov         eax,dword ptr [ebp-40h] 
00000058  mov         dword ptr [eax+8],14h 

The code generation engine essentially took a dependency of the layout of MyBase class while generating code to modify and update that. So the hard coded layout dependency is that compiler assumes that MyBase looks like

<base> MethodTable
<base> + 4 a
<base> + 8 b

The base address is stored in eax register and the updates are made at an offset of 4 and 8 bytes from that base. Now consider that MyBase is defined in assembly A and is accessed by some code in assembly B, and that Assembly A and B are NGENed. So if for some reason the MyBase class (and hence assembly A is modified so that the new definition becomes.

 class MyBase
{
    public int foo;
    public int a;
    public int b;
    public virtual void func() {}
}

If we looked from the perspective of MSIL code then the reference to these variables are on their symbolic names ConsoleApplication1.MyBase::a, so if the layout changes the JIT compiler at runtime will find their new location from the metadata located in the assembly and bind it to the correct updated location. However, from NGEN this all changes and hence the NGEN image of the accessor is invalid and have to be updated to match the new layout

<base> MethodTable
<base> + 4 foo
<base> + 8 a
<base> + 12 b

This means that when the CLR picks up a NGEN image is needs to be absolutely sure about it’s validity. More about that in a later post.

Comments

  • Anonymous
    December 10, 2013
    Any chance you can extend your brilliant posts to include whats going on with Windows Phone? ... Triton, cloud compiler, is of interest to me.. Some suggest it serves as an NGEN in the cloud with the last "linking" step being done on the device...

  • Anonymous
    December 10, 2013
    Jose the secret is that this is a cut down version of what was supposed to be a post titles "Evolution of codegen in CLR" which was supposed to cover Windows Phone cloud compilation and more. Unfortunately I am really out of time. I hope to get to that over the holidays though :)

  • Anonymous
    December 10, 2013
    Good post! I am investigating how to use ngen for my application. My problem is that we install it using ClickOnce. The application has around 80 assemblies and I read in an older web site that ngen and ClickOnce cannot be used together. Is this still the current situation? Could you please explain the reason if you know it? Thank you for you great posts Raul

  • Anonymous
    December 11, 2013
    OH ... "Evolution of codegen in CLR" ..... I would pay a princely sum for that post :) ... Can't  wait , but totally understand its a time constraint thing for you... Thanks again for your posts

  • Anonymous
    December 13, 2013
    Great post Abhinaba. I was wondering, how would you see the JIT produced x86 code that you show. Also, what tool would show the call stack that includes CLR steps?  

  • Anonymous
    December 13, 2013
    Hallow, the easiest was is to hit a breakpoint in any debugger like Visual Studio or windbg and use option to show disassembly. I think this is not available in the Visual Studio express versions.

  • Anonymous
    October 04, 2014
    Why does the NI file contain both IL and NI code? Any usage of IL inside NI assembly?