IL Reading
[Personal note: this title is rather silly, isn't it? :-) It almost makes you think of magic or mental trickery; mind reading comes to mind when looking at it. Or perhaps I watch too much Derren Brown. Anyway, onwards to the post.]
I have always been fascinated by how virtual machines are built and how they work, so whenever I program I want to push the boundaries of a language or platform and go beyond the sandbox that the designers intended to give us. Some of the most awesome projects were born this way: code injection frameworks, aspect frameworks, decompilers, profilers and so on.
So this time, as a follow-up to my previous post about IL parsing, this post is dedicated to building some simple code that parses IL instructions, and hopefully turns them into code at some point.
IL parsing isn't that complex, as long as you read the ECMA specification (ECMA-335) or work out how the instructions are laid out in a byte array. The basics are these: there are single-byte and two-byte instructions (most of those used in code are single-byte), and each instruction can have an operand (just like in assembly) that is one, two, four or eight bytes long; the switch instruction is the exception, as it carries a variable-length jump table. To resolve the types that get called or the fields that get loaded we need the Module class, which can resolve the metadata tokens in the proper context.
The most basic way to get the bytes is to use the framework's built-in method GetILAsByteArray(). That is not as powerful as opening the assembly and pulling out the instructions by brute force (which would give us additional data), but the brute-force route requires a whole lot more knowledge about the platform and is subject to change with each new version of the framework, so for the purpose of this post I will stick to the easy way.
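To make the operand sizes concrete, here is a minimal sketch that maps each OperandType to the number of operand bytes that follow the opcode. The class and method names (OperandSizes, FixedSize) are my own invention for illustration and are not part of the reader built later in this post.

```csharp
using System;
using System.Reflection.Emit;

public static class OperandSizes
{
    // Returns the number of operand bytes that directly follow an opcode of
    // the given operand type. For InlineSwitch this is only the 4-byte case
    // count; the jump table after it adds another 4 bytes per case.
    public static int FixedSize(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone:
                return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
                return 1;
            case OperandType.InlineVar:
                return 2;
            case OperandType.InlineBrTarget:
            case OperandType.InlineField:
            case OperandType.InlineI:
            case OperandType.InlineMethod:
            case OperandType.InlineSig:
            case OperandType.InlineString:
            case OperandType.InlineSwitch:
            case OperandType.InlineTok:
            case OperandType.InlineType:
            case OperandType.ShortInlineR:
                return 4;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                return 8;
            default:
                throw new NotSupportedException(type.ToString());
        }
    }
}
```

For example, OperandSizes.FixedSize(OpCodes.Ldc_I4_S.OperandType) gives 1, while OpCodes.Ldc_I8 needs 8 bytes.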
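As a quick illustration of the easy way, the sketch below pulls the raw IL out of a method and prints it as hex. The helper names (IlBytes, GetIl, ToHex) and the sample Add method are hypothetical; only GetMethodBody() and GetILAsByteArray() come from the framework. The exact bytes depend on the compiler and build configuration, so treat any dump as an example rather than a fixed result.

```csharp
using System;
using System.Reflection;

public static class IlBytes
{
    // Sample method whose body we inspect.
    public static int Add(int a, int b) { return a + b; }

    // Returns the raw IL of a method, or an empty array when the method has
    // no managed body (for example abstract or extern methods).
    public static byte[] GetIl(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        if (body == null) return new byte[0];
        return body.GetILAsByteArray() ?? new byte[0];
    }

    // Formats the bytes as space-separated hex pairs.
    public static string ToHex(byte[] il)
    {
        return BitConverter.ToString(il).Replace("-", " ");
    }
}
```

Calling IlBytes.ToHex(IlBytes.GetIl(typeof(IlBytes).GetMethod("Add"))) shows the method body as a byte string; the last byte should be 2A, the encoding of ret.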
Prerequisites
To read the bytes once we have them, we need some kind of ByteBuffer class. I already designed a DataStream class that can do just that with a few simple modifications (I will probably post the extensions in my next post), but anything that can read a byte array and keeps a position index will do.
Building It
    using System;
    using System.Collections.Generic;
    using System.Reflection;
    using System.Reflection.Emit;
    using System.Text;
    // IStreamObject and ByteArrayObject come from the DataStream library mentioned above.

    public class IlReader
    {
        private DataStreams.DataStream stream;
        private OpCode[] singleByteOpCode;
        private OpCode[] doubleByteOpCode;
        private byte[] instructions;
        private IList<LocalVariableInfo> locals;
        private ParameterInfo[] parameters;
        private Type[] typeArgs = null;
        private Type[] methodArgs = null;
        private MethodBase currentMethod = null;
        private List<IlInstruction> ilInstructions = null;

        public IlReader()
        {
            CreateOpCodes();
        }

        // Builds the two lookup tables, indexed by the (low) opcode byte.
        private void CreateOpCodes()
        {
            singleByteOpCode = new OpCode[225];
            doubleByteOpCode = new OpCode[31];
            FieldInfo[] fields = GetOpCodeFields();
            for (int i = 0; i < fields.Length; i++)
            {
                OpCode code = (OpCode)fields[i].GetValue(null);
                if (code.OpCodeType == OpCodeType.Nternal)
                    continue;
                if (code.Size == 1)
                    singleByteOpCode[code.Value] = code;
                else
                    doubleByteOpCode[code.Value & 0xff] = code;
            }
        }

        public List<IlInstruction> ReadInstructions(MethodBase method)
        {
            ilInstructions = new List<IlInstruction>();
            this.currentMethod = method;
            MethodBody body = method.GetMethodBody();
            locals = body.LocalVariables;
            // The original never filled this field, so ldarg/starg operands failed.
            parameters = method.GetParameters();
            instructions = body.GetILAsByteArray();
            IStreamObject streamObject = new ByteArrayObject(instructions);
            stream = new DataStreams.DataStream(streamObject, Encoding.UTF8);
            // Constructors cannot be generic; GetGenericArguments() throws for them.
            // (Comparing GetType() with typeof(ConstructorInfo) never matches,
            // because the runtime type is a subclass.)
            methodArgs = method is ConstructorInfo ? null : method.GetGenericArguments();
            typeArgs = method.DeclaringType != null ? method.DeclaringType.GetGenericArguments() : null;
            while (stream.Position < stream.Length)
            {
                IlInstruction instruction = new IlInstruction();
                OpCode code = ReadOpCode();
                instruction.Operand = ReadOperand(code, method.Module);
                instruction.Name = code.Name;
                ilInstructions.Add(instruction);
            }
            return ilInstructions;
        }

        private object ReadOperand(OpCode code, Module module)
        {
            object operand = null;
            switch (code.OperandType)
            {
                case OperandType.InlineNone:
                    break;
                case OperandType.InlineSwitch:
                    int length = stream.ReadInt32();
                    int[] branches = new int[length];
                    int[] offsets = new int[length];
                    for (int i = 0; i < length; i++)
                        offsets[i] = stream.ReadInt32();
                    // Offsets are relative to the end of the whole jump table.
                    for (int i = 0; i < length; i++)
                        branches[i] = (int)stream.Position + offsets[i];
                    operand = branches; // the original forgot to assign the result
                    break;
                case OperandType.ShortInlineBrTarget:
                    // The offset is signed; an unsigned read breaks backward jumps.
                    operand = ((sbyte)stream.ReadOneByte() + stream.Position);
                    break;
                case OperandType.InlineBrTarget:
                    operand = (stream.ReadInt32() + stream.Position);
                    break;
                case OperandType.ShortInlineI:
                    if (code == OpCodes.Ldc_I4_S)
                        operand = (sbyte)stream.ReadOneByte();
                    else
                        operand = stream.ReadOneByte();
                    break;
                case OperandType.InlineI:
                    operand = stream.ReadInt32();
                    break;
                case OperandType.ShortInlineR:
                    operand = stream.ReadFloat();
                    break;
                case OperandType.InlineR:
                    operand = stream.ReadDouble();
                    break;
                case OperandType.InlineI8:
                    operand = stream.ReadInt64();
                    break;
                case OperandType.InlineSig:
                    operand = module.ResolveSignature(stream.ReadInt32());
                    break;
                case OperandType.InlineString:
                    operand = module.ResolveString(stream.ReadInt32());
                    break;
                case OperandType.InlineTok:
                case OperandType.InlineType:
                case OperandType.InlineMethod:
                case OperandType.InlineField:
                    operand = module.ResolveMember(stream.ReadInt32(), typeArgs, methodArgs);
                    break;
                case OperandType.ShortInlineVar:
                    operand = GetVariable(code, stream.ReadOneByte());
                    break;
                case OperandType.InlineVar:
                    operand = GetVariable(code, stream.ReadUInt16());
                    break;
                default:
                    throw new NotSupportedException();
            }
            return operand;
        }

        private OpCode ReadOpCode()
        {
            byte instruction = stream.ReadOneByte();
            // 254 (0xFE) is the prefix byte of every two-byte opcode.
            if (instruction != 254)
                return singleByteOpCode[instruction];
            else
                return doubleByteOpCode[stream.ReadOneByte()];
        }

        private object GetVariable(OpCode code, int index)
        {
            if (code.Name.Contains("loc"))
                return locals[index];
            if (!currentMethod.IsStatic)
            {
                // Argument 0 of an instance method is the implicit "this" reference,
                // which has no ParameterInfo; the original indexed parameters[-1] here.
                if (index == 0)
                    return "this";
                index--;
            }
            return parameters[index];
        }

        private FieldInfo[] GetOpCodeFields()
        {
            return typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static);
        }
    }

That's all of the code needed to read the instructions from a method, but a few words of explanation are needed here, so let's take it apart method by method.
CreateOpCodes:
Here we construct two lookup arrays of possible opcodes using .NET's built-in OpCode structure, placing each opcode in the proper array according to its size. Two-byte instructions have negative values, starting at -512 (0xFE00 read as a signed 16-bit number) and counting up from there, so [code.Value & 0xff] simply maps them to the correct slot in the array, from 0 to 30.
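The index arithmetic is easy to check against the framework's own OpCodes fields. The sketch below is only an illustration; the OpCodeIndex class and its TableIndex method are names I made up for it.

```csharp
using System.Reflection.Emit;

public static class OpCodeIndex
{
    // Maps an opcode to its slot in the matching lookup table.
    // Single-byte opcodes have Value 0x00..0xFF, so the mask changes nothing.
    // Two-byte opcodes have Value 0xFE00 + n, which is negative as a short;
    // masking with 0xff strips the 0xFE prefix and leaves n.
    public static int TableIndex(OpCode code)
    {
        return code.Value & 0xff;
    }
}
```

For instance, OpCodes.Arglist.Value is -512 and lands in slot 0, OpCodes.Ceq.Value is -511 and lands in slot 1, and OpCodes.Readonly lands in slot 30.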
ReadInstructions:
This is the main method of the show. Here we take the byte array out of the method body, take the module from the method, and read opcodes from the array one at a time. If the byte just read equals 254 (0xFE), we are dealing with a two-byte instruction, so we read one more byte.
Once we have the proper OpCode, all we need to do is determine its operand.
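For readers without my DataStream class, the opcode step can be sketched over a plain byte array and an integer position. This is an assumption-laden stand-in, not the reader from this post: OpCodeDecoder and Decode are hypothetical names, and the tables are sized at 256 entries for simplicity.

```csharp
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeDecoder
{
    private static readonly OpCode[] OneByte = new OpCode[256];
    private static readonly OpCode[] TwoByte = new OpCode[256];

    // Fill both tables once from the public static fields of OpCodes.
    static OpCodeDecoder()
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Static;
        foreach (FieldInfo field in typeof(OpCodes).GetFields(flags))
        {
            if (field.FieldType != typeof(OpCode)) continue;
            OpCode code = (OpCode)field.GetValue(null);
            if (code.OpCodeType == OpCodeType.Nternal) continue;
            if (code.Size == 1) OneByte[code.Value & 0xff] = code;
            else TwoByte[code.Value & 0xff] = code;
        }
    }

    // Decodes the opcode at il[position] and advances position past it.
    public static OpCode Decode(byte[] il, ref int position)
    {
        byte first = il[position++];
        if (first != 0xFE) return OneByte[first];
        return TwoByte[il[position++]];
    }
}
```

Recording the value of position before each Decode call also gives the instruction's address, which is what ILDasm prints as the IL_xxxx label.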
ReadOperand:
Based on the operand type we read the proper number of bytes to determine the operand value. The [I] types are integers and the [R] types are floating point values. For branch types we read one or four bytes depending on the kind of branch and add the stream position: the offset is signed and relative to the position right after the operand. The only exception to that rule is the inline switch, where we have to read a count followed by that many offsets, all relative to the end of the jump table, and our operand is the resulting array of branch targets. Another kind of operand covers member references and signatures, where we use the module we were given to resolve the metadata token in the proper context. The last kind covers locals and arguments, where we resolve the index into the locals or the parameters. For arguments we have to consider whether the method is static: instance methods carry the [this] pointer as argument 0, so for them the parameter index is smaller by one than the argument index.
The rest of the methods are fairly self-explanatory, so I won't cover them as that would be a waste of time :-). Now that we have this little piece of code, it is time to test it against ILDasm to see if it works.
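The branch arithmetic is the part that is easiest to get wrong, so here is a small self-contained sketch of it over a plain byte array. The BranchTargets class and its method names are hypothetical, and operandStart is assumed to be the index of the first operand byte, that is, the byte right after the opcode.

```csharp
public static class BranchTargets
{
    // IL stores multi-byte values in little-endian order.
    private static int ReadInt32(byte[] il, int p)
    {
        return il[p] | (il[p + 1] << 8) | (il[p + 2] << 16) | (il[p + 3] << 24);
    }

    // Short branches carry a signed 8-bit offset.
    public static int Short(byte[] il, int operandStart)
    {
        int next = operandStart + 1;            // address of the next instruction
        return next + (sbyte)il[operandStart];
    }

    // Long branches carry a signed 32-bit offset.
    public static int Long(byte[] il, int operandStart)
    {
        int next = operandStart + 4;
        return next + ReadInt32(il, operandStart);
    }

    // switch carries a count and then that many 32-bit offsets, all of them
    // relative to the first byte after the whole jump table.
    public static int[] Switch(byte[] il, int operandStart)
    {
        int count = ReadInt32(il, operandStart);
        int next = operandStart + 4 + 4 * count;
        int[] targets = new int[count];
        for (int i = 0; i < count; i++)
            targets[i] = next + ReadInt32(il, operandStart + 4 + 4 * i);
        return targets;
    }
}
```

Taking the br.s at IL_002b from the ILDasm listing in this post: its operand byte sits at 0x2c, so an offset of 0x1e gives 0x2d + 0x1e = 0x4b, which is 75 in decimal.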
Code
For the test I used the same class that I just posted here, as it is more advanced than some simple test method, and if the complex example passes, so will the simple ones.

    var method = typeof(IlReader).GetMethod("ReadOpCode", BindingFlags.NonPublic | BindingFlags.Instance);
    IlReader rdr = new IlReader();
    StringBuilder builder = new StringBuilder();
    foreach (var code in rdr.ReadInstructions(method))
    {
        builder.AppendLine(code.Name + " " + code.Operand);
    }
    Console.Write(builder.ToString());
And now let's compare the ILs of those two.
ILDasm
    IL_0000: nop
    IL_0001: ldarg.0
    IL_0002: ldfld class [DataStream]DataStreams.DataStream ConsoleApplication1.IlReader::'stream'
    IL_0007: callvirt instance uint8 [DataStream]DataStreams.DataStream::ReadOneByte()
    IL_000c: stloc.0
    IL_000d: ldloc.0
    IL_000e: ldc.i4 0xfe
    IL_0013: ceq
    IL_0015: stloc.2
    IL_0016: ldloc.2
    IL_0017: brtrue.s IL_002d
    IL_0019: ldarg.0
    IL_001a: ldfld valuetype [mscorlib]System.Reflection.Emit.OpCode[] ConsoleApplication1.IlReader::singleByteOpCode
    IL_001f: ldloc.0
    IL_0020: ldelema [mscorlib]System.Reflection.Emit.OpCode
    IL_0025: ldobj [mscorlib]System.Reflection.Emit.OpCode
    IL_002a: stloc.1
    IL_002b: br.s IL_004b
    IL_002d: ldarg.0
    IL_002e: ldfld valuetype [mscorlib]System.Reflection.Emit.OpCode[] ConsoleApplication1.IlReader::doubleByteOpCode
    IL_0033: ldarg.0
    IL_0034: ldfld class [DataStream]DataStreams.DataStream ConsoleApplication1.IlReader::'stream'
    IL_0039: callvirt instance uint8 [DataStream]DataStreams.DataStream::ReadOneByte()
    IL_003e: ldelema [mscorlib]System.Reflection.Emit.OpCode
    IL_0043: ldobj [mscorlib]System.Reflection.Emit.OpCode
    IL_0048: stloc.1
    IL_0049: br.s IL_004b
    IL_004b: ldloc.1
    IL_004c: ret
ILReader
    nop
    ldarg.0
    ldfld DataStreams.DataStream stream
    callvirt Byte ReadOneByte()
    stloc.0
    ldloc.0
    ldc.i4 254
    ceq
    stloc.2
    ldloc.2
    brtrue.s 45
    ldarg.0
    ldfld System.Reflection.Emit.OpCode[] singleByteOpCode
    ldloc.0
    ldelema System.Reflection.Emit.OpCode
    ldobj System.Reflection.Emit.OpCode
    stloc.1
    br.s 75
    ldarg.0
    ldfld System.Reflection.Emit.OpCode[] doubleByteOpCode
    ldarg.0
    ldfld DataStreams.DataStream stream
    callvirt Byte ReadOneByte()
    ldelema System.Reflection.Emit.OpCode
    ldobj System.Reflection.Emit.OpCode
    stloc.1
    br.s 75
    ldloc.1
    ret

As can be seen, the instructions and operands are identical; the branch targets are simply printed in decimal (45 is 0x2d, 75 is 0x4b). Our output is missing the addresses and some operand detail, such as the declaring types and assemblies, but other than that it matches. The missing data can also be read, by the way.
Ending Words
The next big thing would be to parse those instructions and create C# code from them. For this we would need to write a C# generator, which is a lot harder to do properly and requires lots of work and tears, but ultimately it would be 1000% more awesome, so if I find the time I may take a stab at it. Then again, the Mono Cecil project is at an advanced stage of building an IL-to-C# decompiler, so an even better way of doing it would be to contribute to that existing code base.