C# to IL 2: IL Basics
This chapter and the next couple focus on a simple belief of ours: if you really want to
understand C# code in earnest, the best way of doing so is by understanding the IL code
generated by the C# compiler.
So, we shall raise the curtain with a small C# program and then explain the IL code
generated by the compiler. In doing so, we will be able to kill two birds with one stone:
firstly, we will unravel the mysteries of IL, and secondly, we will obtain a more
intuitive understanding of the C# programming language.
We will first show you a .cs file and then a program written in IL by the C# compiler, whose
output will be the same as that of the .cs file. The output of the IL code will also be displayed.
This will enhance our understanding of not only C# but also IL. So, without much ado, let's
take the plunge.
The above code is generated by the IL disassembler, ildasm.
After executing ildasm on the exe file, we studied the IL code generated for the program.
Subsequently, we eliminated parts of the code that did not add to our understanding of
IL. This consisted of some comments, directives, functions etc. The remaining IL code
presented is as close to the original as possible.
The advantage of mastering IL by studying the IL code itself is that we are learning from
the master, i.e. the C# compiler, how to write decent IL code. We
cannot find a better authority than the C# compiler to enlighten us about IL.
The rules for creating a static function abc remain the same as any other function such as
Main or vijay. As abc is a static function, we have to use the static modifier in the .method
directive.
When we want to call a function, the following information has to be provided in the order
given below:
• the return data type.
• the class name.
• the function name to be called.
• the data types of the parameters.
The same rules also apply when we call the .ctor function from the base class. It is
mandatory to write the name of the class before the name of the function. In IL, no
assumptions are made about the name of the class. Unlike in C#, the name does not default
to the class we are in while calling the function.
Thus, the above program first displays "hi" using the WriteLine function and then calls the
static function abc. This function too uses the WriteLine function to display "bye".
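A minimal C# sketch consistent with the description above might look like this. The class name zzz and the function abc come from the text; the exact layout of the listing is an assumption. The comments note the four pieces of information that the IL call instruction spells out.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        Console.WriteLine("hi");
        // In IL the call names everything explicitly:
        // return type (void), class (zzz), function (abc), parameter types (none).
        abc();
    }

    // A static function: the .method directive in IL carries the static modifier.
    public static void abc()
    {
        Console.WriteLine("bye");
    }
}
```

Running the program prints hi followed by bye, each on its own line.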
Static constructors are always called before any other code is executed. In C#, a static
constructor is a function with the same name as the class, marked with the static modifier.
In IL, the name of the function changes to .cctor. You may also have observed that in the
earlier example, we got a free function called .ctor.
Whenever we have a class with no constructors, a free constructor with no parameters is
created. This free constructor is given the name .ctor. This knowledge should enhance our
ability as C# programmers, as we are now in a better position to comprehend what
goes on under the hood.
The static function gets called first and the function with the entrypoint directive gets
called thereafter.
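The ordering can be observed with a small C# program along the following lines. This is a sketch under the assumption that Main lives in the same class as the static constructor; the strings printed are illustrative.

```csharp
using System;

class zzz
{
    // Becomes the .cctor function in IL.
    static zzz()
    {
        Console.WriteLine("bye");
    }

    // Becomes the function carrying the .entrypoint directive in IL.
    public static void Main()
    {
        Console.WriteLine("hi");
    }
}
```

Here bye is printed before hi, because the static constructor of zzz runs before Main, a static member of zzz, starts executing.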
The keyword new in C# gets converted to the assembler instruction newobj. This provides
evidence that IL is not a low level assembler, since it can create objects in memory.
The instruction newobj creates a new object in memory. Even in IL, we are shielded from
what new or newobj really does. At the same time, IL is not just another high level
language either: it is designed in such a way that other modern languages can be compiled to
it.
The rules for using newobj are the same as that for calling a function. The full prototype of
the function name is required. In this case, we are calling the constructor without any
parameters, hence the function .ctor is called. In the constructor, the WriteLine function is
called.
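A C# program matching this description could be as small as the sketch below. The string printed in the constructor is an assumption; the class name zzz follows the text.

```csharp
using System;

class zzz
{
    // Instance constructor with no parameters: the .ctor function in IL.
    public zzz()
    {
        Console.WriteLine("hi");
    }

    public static void Main()
    {
        // new becomes newobj, which names the full prototype of .ctor.
        // The resulting object is not stored anywhere, so it is simply discarded.
        new zzz();
    }
}
```

The program prints hi once, from inside the constructor.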
As we had promised earlier, we are going to explain the instruction ldarg.0 here. Whenever
we create an object that is an instance of a class, it contains two basic entities:
• functions
• fields or variables i.e. data.
When a function gets called, it does not know or care where it is being called from or
who is calling it. It receives all its parameters off the stack. There is no point in having two
copies of a function in memory. If every object carried its own copy of the code, then for a
class containing a megabyte of code, each 'new' would occupy an additional megabyte of memory.
When new is called for the first time, memory gets allocated for the code and the variables.
But thereafter, with every call on new, fresh memory is allocated only for the variables.
Thus, if we have five instances of a class, there will be only one copy of the code, but five
separate copies of the variables.
Every non-static or instance function is passed a handle which indicates the location of the
variables of the object that has called this function. This handle is called the this pointer.
In IL, 'this' is argument number 0, and the instruction ldarg.0 places it on the stack. This
handle is always passed as the first parameter to every instance function. Since it is always
passed by default, it is not mentioned in the parameter list of a function.
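The idea of one copy of the code and separate copies of the variables can be illustrated in C#. The sketch below is only an illustration: the field i and the function Set are hypothetical names, not taken from the original listing.

```csharp
using System;

class zzz
{
    public int i;

    // One copy of this code exists, shared by all instances.
    // The hidden first argument (this, argument 0 in IL) says whose i to change.
    public void Set(int value)
    {
        this.i = value;
    }

    public static void Main()
    {
        zzz a = new zzz();
        zzz b = new zzz();
        a.Set(1);   // this refers to a
        b.Set(2);   // this refers to b
        Console.WriteLine(a.i);
        Console.WriteLine(b.i);
    }
}
```

The output is 1 followed by 2: the same function ran twice, but each call received a different this and therefore touched a different copy of i.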
All the action takes place on the stack. The instruction pop removes whatever is on the top
of the stack. In this example, we use it to remove the instance of zzz that has been placed
on top of the stack by the newobj instruction.
The static constructor always gets called first whereas the instance constructor gets called
only after new. IL enforces this sequence of execution. The calling of the base class
constructor is not mandatory. Hence, to save space in our book, we have not shown its
code in all the programs.
In some cases, if we do not include the code of a constructor, the programs do not work.
Only in these cases has the code of the constructor been included. The static constructor
does not call the base class constructor. Also, ‘this’ is not passed to static functions.
We have created two variables called i and j in our function Main in the C# program. They
are local variables and are created on the stack. On conversion to IL, you will notice that
the names of the variables are lost.
The variables get created in IL through the locals directive, which assigns its own names to
the variables, beginning with V_0 and V_1 and so on. The data types are also altered from
int to int32 and from long to int64. The basic types in C# are aliases. They all get converted
to data types that IL understands.
The task on hand is to initialize the variable i to a value of 6. This value has to be loaded
on the stack or evaluation stack. The instruction to do so is ldc.i4.value. An i4 takes up
four bytes of memory.
The value mentioned in the syntax above is the constant that has to be put on the stack.
After the value 6 has been loaded on to the stack, we now need to initialize the variable i to
this value. The variable i has been renamed as V_0 and is the first variable in the locals
directive.
The instruction stloc.0 takes the value present at the top of the stack i.e. 6 and initializes
the variable V_0 to it. The process of initializing a variable is definitely complicated.
The second ldc instruction copies the value of 7 onto the stack. On a 32 bit machine,
memory is allocated in chunks of 32 bits (4 bytes). In the same vein, on a 64 bit
machine, the memory is allocated in chunks of 64 bits (8 bytes).
The number 7 is stored as a constant and requires only 4 bytes, but a long requires 8
bytes. Thus, we need to convert the 4 bytes to 8 bytes. The instruction conv.i8 is used for
this purpose. It places an 8 byte number on the stack. Only after doing so do we use stloc.1 to
initialize the second variable V_1 to the value of 7.
Thus, the ldc series is used to place a constant number on the stack and stloc is utilized to
pick up what is on the stack and initialize a local to that value.
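The C# side of this example can be sketched as follows. The variables i and j and their values come from the text; the WriteLine call at the end is an addition so that the program produces visible output. The comments summarise the IL steps described above rather than quoting exact compiler output.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        int i = 6;    // load the constant 6 (ldc), then stloc.0 stores it in V_0 (int32)
        long j = 7;   // load the constant 7 (ldc), widen with conv.i8, then stloc.1 stores it in V_1 (int64)

        // Added for illustration only: i is widened to long and added to j.
        Console.WriteLine(i + j);
    }
}
```

The program prints 13.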
Now you will finally be able to see the light at the end of the tunnel and understand as to
why we wanted you to read this book in the first place.
Let us understand the above code, one field at a time. We have created a variable i that is
static and initialized it to the value of 6. Since the variable i has not been given an access
modifier, its access defaults to private. The static modifier of C# is applicable to variables in
IL also.
The real action begins now. The variable needs to be assigned an initial value. This value
must be assigned in the static constructor only, because the variable is static. We employ
ldc to place the value 6 on the stack. Note that the locals directive is not used here.
To initialize i, we use the instruction stsfld, which takes the value on top of the stack. The
operand that follows stsfld is the data type of the field, here int32, which tells us that
4 bytes are picked up from the stack to initialize the static variable.
The variable name is preceded by the name of the class. This is in contrast to the syntax of
local variables.
For the instance variable j, since its access modifier was public in C#, on conversion to IL,
its access modifier is retained as public. Since it is an instance variable, its value gets
initialized in the instance constructor. The instruction used here is stfld and not stsfld.
Here we need 8 bytes of the stack.
The rest of the code remains the same as before. Thus, we can see that the instruction
stloc is used to initialize locals and the instruction stfld is used to initialize fields.
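A C# class with the two fields discussed here might look like the sketch below. The names i and j, the types int and long, and the value 6 follow the text; the value 7 for j and the WriteLine calls are assumptions added to make the result visible.

```csharp
using System;

class zzz
{
    static int i = 6;    // no access modifier, so private; initialized in .cctor with stsfld
    public long j = 7;   // public instance field; initialized in .ctor with stfld

    public static void Main()
    {
        zzz a = new zzz();
        Console.WriteLine(i);
        Console.WriteLine(a.j);
    }
}
```

The program prints 6 followed by 7. Neither a static constructor nor an instance constructor is written in the C# source, yet the initializers have to live somewhere, which is why the compiler generates both.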
The main purpose of the above example is to verify whether the variable is initialized first
or the code contained in a constructor gets called first. The IL output demonstrates very
lucidly that, first all the variables get initialized and thereafter, the code in a constructor
gets executed.
You may have also noticed that the base class constructor gets executed next and then,
and only then, does the code that is written in a constructor get called.
This nugget of knowledge is sure to enhance your understanding of C# and IL.
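The sequence can also be verified from C# itself. In the sketch below, the base class yyy and the helper function Note are hypothetical names introduced only to log each step as it happens.

```csharp
using System;

class yyy
{
    public yyy()
    {
        Console.WriteLine("base constructor");
    }
}

class zzz : yyy
{
    // Field initializer. Note is a hypothetical helper that logs when it runs.
    int i = Note("field initializer");

    public zzz()
    {
        Console.WriteLine("constructor body");
    }

    static int Note(string message)
    {
        Console.WriteLine(message);
        return 10;
    }

    public static void Main()
    {
        new zzz();
    }
}
```

The three lines appear in the order field initializer, base constructor, constructor body, which is the order in which the generated .ctor performs these steps.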
We can print a number instead of a string because the WriteLine function is overloaded.
First, we push the value 10 onto the stack using the ldc family. Observe carefully, the
instruction now is ldc.i4.s followed by the value 10. The plain ldc.i4 instruction carries its
constant as a 4 byte operand, but the .s (short) form carries it in a single byte, which
suffices for values from -128 to 127.
Then the C# compiler calls the correct overloaded version of the WriteLine function, which
accepts an int32 value from the stack.
This is similar to printing strings.
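In C# the choice of overload is invisible; it only shows up in the IL call instruction, which names the parameter type. A minimal sketch, with the second call added purely for contrast:

```csharp
using System;

class zzz
{
    public static void Main()
    {
        Console.WriteLine(10);     // binds to the overload taking an int (int32 in IL)
        Console.WriteLine("10");   // binds to the overload taking a string
    }
}
```

Both calls print 10, but they reach two different WriteLine functions.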
We shall now delve into how to print a number on the screen.
The WriteLine function accepts a string followed by a variable number of objects. The {0}
prints the first object after the comma. Even though there is no variable in the C# code, on
conversion to IL code, a variable of type int32 is created.
The string {0} is loaded on the stack using our trustworthy ldstr. Then, we place the
number that is to be passed as a parameter to the WriteLine function, on the stack. To do
so, we use ldc.i4.s which loads the constant value on the stack. After this, we initialize the
variable V_0 to 20 with the stloc.0 instruction, and then ldloca.s loads the address of the
local variable on the stack.
The major roadblock that we experience here is that the WriteLine function accepts a string
followed by an object as the next parameter. In this case, the variable is of value type and
not reference type.
An int32 is a value type variable whereas the WriteLine function wants a full-fledged object
of a reference type.
How do we solve the dilemma of converting a value type into a reference type?
As informed earlier, we use the instruction ldloca.s to load the address of the local variable
V_0 onto the stack. Thus, our stack contains a string followed by the address of a value
type variable, V_0.
Next, we call an instruction called box. There are only two types of variables in the .NET
world i.e. value types and reference types. Boxing is the method that .NET uses to convert
a value type variable into a reference type variable.
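The box instruction takes an unboxed or value type variable and converts it into a boxed or
reference type variable. In the listing discussed here, the box instruction needs the address
of a value type on the stack and allocates space on the heap for its equivalent reference type.
Note that under the released CLI standard (ECMA-335), box takes the value itself from the
stack and not its address, so a current compiler loads the value directly before box.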
The heap is an area of memory used to store reference types. The values on the stack
disappear at the end of a function, but the heap is available for a much longer duration.
Once this space is allocated, the box instruction initializes the instance fields of the
reference object. Then, it places the memory location in the heap of this newly
constructed object on the stack. The box instruction requires the memory location of a local
variable on the stack.
The constant stored on the stack has no physical address. Thus, the variable V_0 is
created to provide the memory location.
This boxed version on the heap is similar to the reference type variable that we are familiar
with. It really does not have any type and thus looks like System.Object. To access its
specific values, we need to unbox it first. The WriteLine function does this internally.
The data type of the parameter that is to be boxed must be the same as that of the variable
whose address has been placed on the stack. We will subsequently explain these details.
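Boxing and unboxing can be seen from the C# side as well. In the sketch below, the first call corresponds to the example in the text (the value 20 and the format string {0}); the variables boxed and unboxed are hypothetical names added to show an explicit round trip.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        // 20 is a value type; it is boxed so that it can be passed
        // where WriteLine expects an object after the format string.
        Console.WriteLine("{0}", 20);

        object boxed = 20;          // explicit boxing: a copy of the int now lives on the heap
        int unboxed = (int)boxed;   // unboxing: the cast must name the exact value type
        Console.WriteLine(unboxed + 1);
    }
}
```

The program prints 20 and then 21.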
The above code is used to display the value of a static variable. The .cctor function
initializes the static variable to a value of 10. Then, the string {0} is stored on the stack.
The instruction ldsflda loads the address of a static variable of a certain data type on the
stack. Then, as usual, box takes over. The explanation regarding the functionality of 'box'
given above is relevant here also.
Static variables in IL work in the same way as instance variables. The only difference is in
the fact that they have their own set of instructions. Instructions like box need a memory
location on the stack without discriminating between static and instance variables.
The only variation that we indulged in from the earlier program is that we have removed
the static constructor. All static variables and instance variables get initialized internally to
zero. Thus, IL does not generate any error. Internally, even before the static constructor
gets called, the field i is assigned an initial value of zero.
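The same behaviour is visible from C#. A minimal sketch, assuming a static int field named i as in the text, with no initializer and no static constructor:

```csharp
using System;

class zzz
{
    static int i;   // never assigned: the runtime zeroes it before any code runs

    public static void Main()
    {
        Console.WriteLine("{0}", i);
    }
}
```

The program prints 0. The compiler may warn that the field is never assigned, but it is not an error.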
We have initialized the local i to a value of 10. This cannot be done in the constructor since
the variable i has been created on the stack. Then, stloc.0 has been used to assign the
value of 10 to V_0. Thereafter, ldloc.0 has been utilized to place the variable V_0 on the
stack, so that it is available to the WriteLine function.
The WriteLine function thereafter displays the value on the screen. A field and a local
behave in a similar manner, except that they use separate sets of instructions.
All local variables have to be initialized, or else the C# compiler will generate an unintelligible
error message. Here, even though we have eliminated the ldc and stloc instructions from the
IL code, no error is generated at runtime. Instead, a very large number is displayed.
The variable V_0 has not been initialized to any value. It was created on the stack and
contained whatever value was available at the memory location assigned to it. On your
machine, the output will be very different from ours.
In a similar situation, the C# compiler will give you an error and not allow you to proceed
further, because the variable has not been initialized. IL, on the other hand, is a strange
kettle of fish. It is much more lenient in its outlook. It does very few error or sanity checks
on the source code. This has its drawback, meaning that the programmer has to be much more
responsible and careful while using IL.
In the above example, a static variable has been initialised inside a function and not at the
time of its creation, as seen earlier. The function vijay calls the code present in the static
constructor.
The process given above is the only way to initialize a static or an instance variable.
The above program demonstrates as to how we can call a function with a single parameter.
The rules for placing parameters on the stack are similar to those for the WriteLine
function.
Now let us comprehend as to how a function receives parameters from the stack.
We begin by stating the data type and parameter name in the function declaration. This is
similar to the workings in C#.
Next, we use the instruction ldarga.s to load the address of the parameter i onto the stack.
box will then convert this value type into an object type, and finally the WriteLine
function uses these values to display the output on the screen.
In the above example, we have converted an int into an object because, the WriteLine
function requires the parameter to be of this data type.
The only method of achieving this conversion is by using the box instruction. The box
instruction converts an int into an object.
In the function abc, we accept a System.Object and we use the instruction ldarg and not
ldarga. The reason being, we require the value of the parameter and not its address. The
number after the dot signifies the parameter number. In order to place the values of
parameters on the stack, a new instruction is required.
Thus, IL handles locals, fields and parameters with their own set of instructions.
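Both variants of parameter passing can be sketched in one C# program. The function abc and its int parameter i follow the text; the second function pqr and the values 10 and 20 are hypothetical, introduced so that the int and System.Object cases can sit side by side.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        abc(10);   // the int is placed on the stack and received as parameter i
        pqr(20);   // the int is boxed at the call site, since pqr wants an object
    }

    // Receives a value type; it must be boxed before WriteLine can take it as an object.
    static void abc(int i)
    {
        Console.WriteLine("{0}", i);
    }

    // Receives a System.Object; the parameter is loaded as is (ldarg) and passed on.
    static void pqr(object o)
    {
        Console.WriteLine("{0}", o);
    }
}
```

The program prints 10 followed by 20. The difference between the two functions lies only in where the boxing takes place.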
Functions return values. Here, a static function abc has been called. We know from the
function's signature that it returns an int. Return values are stored on the stack.
Thus, the stloc.1 instruction picks up the value on the stack and places it in the local V_1.
In this specific case, it is the return value of the function.
Newobj is also like a function. It returns an object which, in our case, is an instance of the
class zzz, and puts it on the stack.
The stloc instruction has been used repeatedly to initialize all our local variables. Just to
refresh your memory, ldloc does the reverse of this process.
A function has to just place a value on the stack using the trustworthy ldc and then cease
execution using the ret instruction.
Thus, the stack has a dual role to play:
• It is used to pass values to functions.
• It receives the return values of the functions.
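In C#, the example described here might read as follows. The function abc, its int return type and the class zzz come from the text; the return value 20 and the WriteLine call are assumptions. The comments relate each statement to the instructions named above.

```csharp
using System;

class zzz
{
    public static void Main()
    {
        zzz a = new zzz();   // newobj leaves the new object on the stack; stloc.0 stores it in V_0
        int i = abc();       // call leaves the return value on the stack; stloc.1 stores it in V_1
        Console.WriteLine(i);
    }

    // Places a constant on the stack (ldc) and returns (ret).
    public static int abc()
    {
        return 20;
    }
}
```

The program prints 20.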
The only innovation and novelty that has been introduced in the above example is that the
return value of the function abc has been stored in an instance variable.
• Stloc assigns the value on the stack to a local variable.
• Ldloc, on the other hand, places the value of a local variable on the stack.
The object that looks like zzz is put on the stack again, not for the benefit of abc, which is
a static function and not an instance function, but for stfld. The stfld instruction expects
the object reference on the stack beneath the value that is to be stored. Mind you,
static functions are not passed the this pointer on the stack.
Thereafter, the function abc is called, which places the value 20 on the stack. The
instruction stfld picks up the value 20 from the stack, and initializes the instance variable i
with this value.
Local and instance variables are handled in a similar manner except that, the instructions
for their initialization are different.
The instruction ldfld does the reverse of what stfld does. It places the value of an instance
variable on the stack to make it available for the WriteLine function.
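A C# sketch of this last example is given below. The instance variable i, the function abc and the value 20 follow the text; the local name a is an assumption. The comments trace the stack operations described above.

```csharp
using System;

class zzz
{
    int i;

    public static void Main()
    {
        zzz a = new zzz();

        // The object reference goes on the stack first, then abc pushes 20;
        // stfld takes both and stores 20 in the field i of that object.
        a.i = abc();

        // The object reference goes on the stack, and ldfld replaces it with the value of i.
        Console.WriteLine(a.i);
    }

    public static int abc()
    {
        return 20;
    }
}
```

The program prints 20.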