Read IL from MethodBody

Reflection in .NET 2.0 ships with a new class, MethodBody, which "provides access
to information about the local variables and exception-handling clauses in a method
body, and to the Microsoft intermediate language (MSIL) that makes up the method
body". (Thanks to Glenn, who wrote
this in the MSDN doc; I simply copied and pasted it.) The only method
in this class (the other members are properties) is:

public byte[] GetILAsByteArray()

which returns a byte array containing the IL for this method body.
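
As a minimal sketch of that call (the names MethodBodyDemo, IGreeter and GetIL are made up for illustration and are not part of any library), the following dumps the IL bytes of a small method. It also guards against GetMethodBody returning null, which is what happens for a method that has no IL body, such as an interface method:

```csharp
using System;
using System.Reflection;

// Hypothetical interface, used only to show a method without an IL body.
public interface IGreeter { void Greet(); }

public static class MethodBodyDemo
{
    public static int Add(int a, int b) { return a + b; }

    // Returns the raw IL bytes, or an empty array when the method has no body.
    public static byte[] GetIL(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        return body == null ? new byte[0] : body.GetILAsByteArray();
    }

    public static void Main()
    {
        byte[] il = GetIL(typeof(MethodBodyDemo).GetMethod("Add"));
        // The exact bytes depend on the compiler and on debug/release settings.
        Console.WriteLine("Add: {0} bytes: {1}", il.Length, BitConverter.ToString(il));

        byte[] none = GetIL(typeof(IGreeter).GetMethod("Greet"));
        Console.WriteLine("IGreeter.Greet: {0} bytes", none.Length);
    }
}
```

The byte array is only the instruction stream; local variable and exception-handling information comes from the other MethodBody properties.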

Recently we got some questions about reflecting over the IL stream: users want to check
where a specific method is used, or to build a call graph inside an assembly, and so on.
Reflection currently does not support this directly. Lutz Roeder's awesome tool,
.NET Reflector, can indeed show
the caller/callee graph for each method. I used to have the source code for his ILReader,
and noticed it did not use Reflection APIs (so we can certainly read IL without Reflection
APIs).

The code sketch below shows how we can use classes in the System.Reflection and
System.Reflection.Emit namespaces (and other useful APIs provided in .NET 2.0) to read
IL instructions. The ECMA-335 standard is the authoritative resource for IL, the method
body format and other CLI topics if you want to know more about them.

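using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public class ILReader : IEnumerable<ILInstruction>
{
   Byte[] m_byteArray;
   Int32 m_position;
   MethodBase m_enclosingMethod;

   // Lookup tables indexed by opcode byte: one-byte opcodes, and the
   // second byte of two-byte opcodes (those prefixed with 0xFE).
   static OpCode[] s_OneByteOpCodes = new OpCode[0x100];
   static OpCode[] s_TwoByteOpCodes = new OpCode[0x100];
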
   static ILReader()
   {
     // every opcode is exposed as a public static field of OpCodes
     foreach (FieldInfo fi in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
     {
       OpCode opCode = (OpCode)fi.GetValue(null);
       UInt16 value = (UInt16)opCode.Value;
       if (value < 0x100)
         s_OneByteOpCodes[value] = opCode;
       else if ((value & 0xff00) == 0xfe00)
         s_TwoByteOpCodes[value & 0xff] = opCode;
     }
   }

   public ILReader(MethodBase enclosingMethod)
   {
     this.m_enclosingMethod = enclosingMethod;
     MethodBody methodBody = m_enclosingMethod.GetMethodBody();
     // GetMethodBody returns null for methods that have no IL body
     this.m_byteArray = (methodBody == null) ? new Byte[0] : methodBody.GetILAsByteArray();
     this.m_position = 0;
   }

   public IEnumerator<ILInstruction> GetEnumerator()
   {
     // restart from the beginning, so that an enumeration abandoned early
     // does not leave the next one in the middle of the stream
     m_position = 0;
     while (m_position < m_byteArray.Length)
       yield return Next();
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return this.GetEnumerator(); }

   ILInstruction Next()
   {
     Int32 offset = m_position;
     OpCode opCode = OpCodes.Nop;
     Int32 token = 0;

     // read first 1 or 2 bytes as opCode
     Byte code = ReadByte();
     if (code != 0xFE)
       opCode = s_OneByteOpCodes[code];
     else
     {
       code = ReadByte();
       opCode = s_TwoByteOpCodes[code];
     }

     // the operand type tells us how many operand bytes follow
     switch (opCode.OperandType)
     {
       case OperandType.InlineNone:
         return new InlineNoneInstruction(m_enclosingMethod, offset, opCode);

       case OperandType.ShortInlineBrTarget:
         SByte shortDelta = ReadSByte();
         return new ShortInlineBrTargetInstruction(m_enclosingMethod, offset, opCode, shortDelta);

       case OperandType.InlineBrTarget: Int32 delta = ReadInt32();      return new ...;
       case OperandType.ShortInlineI:   Byte int8 = ReadByte();         return new ...;
       case OperandType.InlineI:        Int32 int32 = ReadInt32();      return new ...;
       case OperandType.InlineI8:       Int64 int64 = ReadInt64();      return new ...;
       case OperandType.ShortInlineR:   Single float32 = ReadSingle();  return new ...;
       case OperandType.InlineR:        Double float64 = ReadDouble();  return new ...;
       case OperandType.ShortInlineVar: Byte index8 = ReadByte();       return new ...;
       case OperandType.InlineVar:      UInt16 index16 = ReadUInt16();  return new ...;
       case OperandType.InlineString:   token = ReadInt32();            return new ...;
       case OperandType.InlineSig:      token = ReadInt32();            return new ...;
       case OperandType.InlineField:    token = ReadInt32();            return new ...;
       case OperandType.InlineType:     token = ReadInt32();            return new ...;
       case OperandType.InlineTok:      token = ReadInt32();            return new ...;

       case OperandType.InlineMethod:
         token = ReadInt32();
         return new InlineMethodInstruction(m_enclosingMethod, offset, opCode, token);

       case OperandType.InlineSwitch:
         // case count, followed by that many branch deltas
         Int32 cases = ReadInt32();
         Int32[] deltas = new Int32[cases];
         for (Int32 i = 0; i < cases; i++) deltas[i] = ReadInt32();
         return new InlineSwitchInstruction(m_enclosingMethod, offset, opCode, deltas);

       default:
         throw new BadImageFormatException("unexpected OperandType " + opCode.OperandType);
     }
   }

   // each reader advances m_position past the bytes it consumes
   Byte ReadByte() { return m_byteArray[m_position++]; }
   SByte ReadSByte() { return (SByte)ReadByte(); }

   UInt16 ReadUInt16() { m_position += 2; return BitConverter.ToUInt16(m_byteArray, m_position - 2); }
   UInt32 ReadUInt32() { m_position += 4; return BitConverter.ToUInt32(m_byteArray, m_position - 4); }
   UInt64 ReadUInt64() { m_position += 8; return BitConverter.ToUInt64(m_byteArray, m_position - 8); }

   Int32 ReadInt32() { m_position += 4; return BitConverter.ToInt32(m_byteArray, m_position - 4); }
   Int64 ReadInt64() { m_position += 8; return BitConverter.ToInt64(m_byteArray, m_position - 8); }

   Single ReadSingle() { m_position += 4; return BitConverter.ToSingle(m_byteArray, m_position - 4); }
   Double ReadDouble() { m_position += 8; return BitConverter.ToDouble(m_byteArray, m_position - 8); }
}

A few comments:

  • Definitions for ILInstruction and the derived XXXInstruction classes are not included
    here. Imagine that ILInstruction knows at least its Offset, OpCode, ...; the derived
    classes may carry other information. If only OpCodes.Call or OpCodes.Callvirt (which fall
    under the OperandType.InlineMethod case) were interesting to me, I would define a special
    CallInstruction or CallVirtInstruction class and create those explicitly. Module.ResolveMethod
    can then help to get the MethodBase from the operand (token). Yiru's
    blog has some posts related to this topic.
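  • The static constructor of ILReader uses Reflection to initialize 2 static OpCode
    arrays from the public static fields of the OpCodes class (OpCodes is a class with
    one field per opcode, not an enum). As I mentioned
    here, Enum.GetValues can do a similar job when the values live in an actual enum type.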
  • The public constructor accepts a MethodBase, gets its MethodBody, and then calls
    GetILAsByteArray() to set m_byteArray. Note that GetMethodBody returns null for an
    interface method and for other methods that have no IL body (abstract methods, for example).
  • The Next() method uses OpCode.OperandType to decide what kind of operand
    is expected and how many bytes to read.
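  • BitConverter.ToXXX reads the required number of bytes at a given index of the array
    and converts them to a primitive value, without an intermediate copy. Very handy!
    (It uses the machine's byte order; IL operands are little-endian, so the code above
    assumes little-endian hardware.)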
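
To make the first bullet more concrete, here is one possible shape for the omitted classes. This is only a sketch under my own assumptions: the member names (EnclosingMethod, Offset, Token, Method) are hypothetical, and only the constructor signatures are taken from the ILReader code above. InlineMethodInstruction resolves its token lazily through Module.ResolveMethod, passing the generic arguments of the enclosing type and method so that tokens inside generic code can be resolved too.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical base class: every instruction knows its method, offset and opcode.
public abstract class ILInstruction
{
    private readonly MethodBase m_enclosingMethod;
    private readonly int m_offset;
    private readonly OpCode m_opCode;

    protected ILInstruction(MethodBase enclosingMethod, int offset, OpCode opCode)
    {
        m_enclosingMethod = enclosingMethod;
        m_offset = offset;
        m_opCode = opCode;
    }

    public MethodBase EnclosingMethod { get { return m_enclosingMethod; } }
    public int Offset { get { return m_offset; } }
    public OpCode OpCode { get { return m_opCode; } }

    public override string ToString()
    {
        return string.Format("IL_{0:x4}: {1}", m_offset, m_opCode.Name);
    }
}

// Instruction without an operand (nop, ret, add, ...).
public class InlineNoneInstruction : ILInstruction
{
    public InlineNoneInstruction(MethodBase enclosingMethod, int offset, OpCode opCode)
        : base(enclosingMethod, offset, opCode) { }
}

// Instruction whose operand is a method token (call, callvirt, newobj, ldftn, ...).
public class InlineMethodInstruction : ILInstruction
{
    private readonly int m_token;
    private MethodBase m_method;

    public InlineMethodInstruction(MethodBase enclosingMethod, int offset, OpCode opCode, int token)
        : base(enclosingMethod, offset, opCode)
    {
        m_token = token;
    }

    public int Token { get { return m_token; } }

    // Resolved on first use; the token is only meaningful in the enclosing method's module.
    public MethodBase Method
    {
        get
        {
            if (m_method == null)
            {
                MethodBase owner = EnclosingMethod;
                Type declaring = owner.DeclaringType;
                Type[] typeArgs = (declaring != null && declaring.IsGenericType)
                    ? declaring.GetGenericArguments() : null;
                Type[] methodArgs = owner.IsGenericMethod ? owner.GetGenericArguments() : null;
                m_method = owner.Module.ResolveMethod(m_token, typeArgs, methodArgs);
            }
            return m_method;
        }
    }
}
```

The other derived classes (ShortInlineBrTargetInstruction, InlineSwitchInstruction and so on) would follow the same pattern, each storing whatever operand ILReader hands to its constructor.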

Then I can consume the ILInstruction sequence in C# as follows:

foreach (ILInstruction il in new ILReader(method))
{ /* do something with il */ }
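
For the "where is this method called" question from the beginning of the post, a self-contained variation may be easier to try out than the full reader. The sketch below does not use the ILReader or ILInstruction classes; CallFinder, FindCalls, OperandSize and Sample are names I made up for illustration. It walks the IL bytes with the same opcode-table technique, skips operands by size, and resolves the token of every call and callvirt instruction. It assumes well-formed IL and little-endian hardware.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class CallFinder
{
    static readonly OpCode[] s_oneByte = new OpCode[0x100];
    static readonly OpCode[] s_twoByte = new OpCode[0x100];

    static CallFinder()
    {
        // Same table-building approach as the ILReader static constructor.
        foreach (FieldInfo fi in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (fi.FieldType != typeof(OpCode)) continue;
            OpCode op = (OpCode)fi.GetValue(null);
            ushort value = unchecked((ushort)op.Value);
            if (value < 0x100) s_oneByte[value] = op;
            else if ((value & 0xff00) == 0xfe00) s_twoByte[value & 0xff] = op;
        }
    }

    // Operand bytes that follow the opcode (switch is variable-length and handled by the caller).
    static int OperandSize(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            default: return 4; // branch target, int32, float32 and all metadata tokens
        }
    }

    // Returns the targets of every call/callvirt instruction in the method body.
    public static List<MethodBase> FindCalls(MethodBase method)
    {
        List<MethodBase> result = new List<MethodBase>();
        MethodBody body = method.GetMethodBody();
        if (body == null) return result; // no IL: interface, abstract, ...

        byte[] il = body.GetILAsByteArray();
        Type declaring = method.DeclaringType;
        Type[] typeArgs = (declaring != null && declaring.IsGenericType) ? declaring.GetGenericArguments() : null;
        Type[] methodArgs = method.IsGenericMethod ? method.GetGenericArguments() : null;

        int pos = 0;
        while (pos < il.Length)
        {
            byte b = il[pos++];
            OpCode op = (b != 0xFE) ? s_oneByte[b] : s_twoByte[il[pos++]];
            if (op.OperandType == OperandType.InlineSwitch)
            {
                int cases = BitConverter.ToInt32(il, pos);
                pos += 4 + 4 * cases; // the count, then one 4-byte delta per case
                continue;
            }
            if (op == OpCodes.Call || op == OpCodes.Callvirt)
            {
                int token = BitConverter.ToInt32(il, pos);
                result.Add(method.Module.ResolveMethod(token, typeArgs, methodArgs));
            }
            pos += OperandSize(op.OperandType);
        }
        return result;
    }
}

public static class Sample
{
    public static void Hello() { Console.WriteLine("hi"); }

    public static void Main()
    {
        foreach (MethodBase m in CallFinder.FindCalls(typeof(Sample).GetMethod("Hello")))
            Console.WriteLine("{0}.{1}", m.DeclaringType, m.Name);
    }
}
```

Running FindCalls over every method of an assembly and recording the (caller, callee) pairs gives the raw material for a call graph. Other InlineMethod opcodes such as newobj or ldftn are deliberately ignored here; widening the filter is a one-line change.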

Comments

  • Anonymous
    March 20, 2006
    i'm running into something that seems a bit strange...

    You deal with the InlineSwitch instruction in the way that the documentation says you should (ECMA-335 Common Language Infrastructure, Partition III, para. 3.66). If I do this then, when I read what you call the 'deltas' in your code sketch, it seems like I have in fact eaten into the instructions associated with the first case statements of the switch.

    If it helps you to understand where i'm coming from, I'm testing  these things by reading the IL instructions and the exception handling info out of a MethodBody and trying to emit a duplicate of the method thru System.Reflection.Emit.ILGenerator. I'm then comparing the results of running ildasm on the original and the emitted version. You could call this round-tripping, right?

    anyway, here's the strange part - if i simply pass the ILGenerator the opCode for an InlineSwitch and the Int32 that followed it (the value called 'cases' in your code sketch) and never try to read the 'delta' values then the IL i output is an exact match of the original.
    I've tested this with a range of examples.

    can anyone explain this?

    If you follow me, then what i'm saying contradicts your example, the ECMA documentation and, in fact, all reason and common sense... where are the delta values coming from if they're not being read from the array of bytes in a method body?


    To cut a long story short - has anyone tested anything based on the above code sketch with switch instructions and found it to work?

    any feedback much appreciated, i'm quite puzzled by this,
    colin

    p.s. On a possibly related note, the ECMA doc I'm looking at (version 3) says the value following a switch instruction is an unsigned int32. You use a signed Int32. I find that if I read it as an unsigned int32 I get the wrong value, but if I follow your example and use a signed Int32 everything's cool... is the documentation wrong on this?
  • Anonymous
    March 23, 2006
    cheers - i await your findings with interest :)

    in the mean time, could i possibly run something by you to check if my understanding of .NET Reflection is correct? (I've already tried the msdn developer forums... )

    Given that, MethodBuilder.CreateMethodBody() is "currently not fully supported" as it cannot deal with exception handlers,
    then the only way the .NET platform currently provides for generating IL that includes exception handlers is through the, aptly named, ILGenerator class. Right?

    Therefore, it is impossible to "round-trip" IL code in the way I described above using Reflection and ILGenerator?

    The reason being that ILGenerator.EndExceptionBlock() and ILGenerator.BeginCatchBlock() always emit a branch instruction at the end of the current exception block and this is always an InlineBrTarget ('leave') with a 4-byte parameter. Compilers, on the other hand, may end an exception block with a ShortInlineBrTarget ('leave.s' with a 1-byte parameter).

    Is this correct? Is there another way to manipulate exception handling blocks in IL?
  • Anonymous
    March 24, 2006
    col, I answered a similar question at http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=215364 before. You may want to have a look.

    For your first comment, I tried and did not see any issue with the above code. Also, using ReadUInt32 or ReadInt32 returns the same value for “cases”.

    Btw, the new Rotor code (SSCLI 2.0) was released yesterday, if interested, you may want to take a look at the implementation details.
  • Anonymous
    March 26, 2006
    thanks,
    i will look at the code i'm using again, if i still think it's behaving weirdly i will try to post some sort of demo