Source Code Analysis of Target Code Generation in the HotSpot Template Interpreter

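Interpreted execution translates bytecode for the target platform one instruction at a time, which is slow. If what each bytecode "says" were written on a slip of paper in advance, the runtime would only need to hand the matching slip to the target platform, and execution would be noticeably faster. The template interpreter of the JVM's HotSpot virtual machine interprets bytecode in exactly this way. Before starting the analysis, here is a brief look at the ways a JVM can execute code.

(1). Interpret while running: each bytecode is interpreted and the native code for it is run, one at a time. This kind of execution engine is relatively slow.
(2). JIT (just-in-time compilation): faster, but needs more memory. When a method is first called, the native code compiled from its bytecode is cached, so the next call to that method fetches the cached native code and runs it directly.
(3). Adaptive optimization: for frequently called methods, the compiled native code is cached; other, rarely called code is still interpreted.
(4). On-chip virtual machine: the execution engine of the virtual machine is embedded directly in the chip.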

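The HotSpot virtual machine can be configured to run in the following modes:
-Xint: interpreted mode
-Xcomp: compiled mode
-Xmixed: mixed mode
(java -version shows which mode the virtual machine is running in)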

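At startup, HotSpot generates, for every bytecode, the machine code that interprets it on the target platform and stores that code in the CodeCache. While interpreting bytecode, it fetches this native machine code from the CodeCache and executes it.

The implementation details of HotSpot are worth learning from. If you find source code, let alone assembly, rather dry, you can still get a rough picture of the components involved and how they work together.

The analysis below starts from the initialization of the template interpreter and follows how HotSpot generates its interpretation code.
When the virtual machine is created, global module initialization calls interpreter_init() to initialize the template interpreter. This covers the initialization of the abstract interpreter AbstractInterpreter, the template table TemplateTable, the CodeCache stub queue StubQueue, and the interpreter generator InterpreterGenerator.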

 void TemplateInterpreter::initialize() {

   if (_code != NULL) return;

   // assertions

   //...

   AbstractInterpreter::initialize();

   TemplateTable::initialize();

   // generate interpreter

   { ResourceMark rm;

     TraceTime timer("Interpreter generation", TraceStartupTime);

     int code_size = InterpreterCodeSize;

     NOT_PRODUCT(code_size *= 4;)  // debug uses extra interpreter code space

     _code = new StubQueue(new InterpreterCodeletInterface, code_size, NULL,

                           "Interpreter");

     InterpreterGenerator g(_code);

     if (PrintInterpreter) print();

   }

   // initialize dispatch table

   _active_table = _normal_table;

 }

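1. AbstractInterpreter is the common base class of the assembly-based interpreters. It defines the abstract interface of the interpreter and the interpreter generator.
2. The template table TemplateTable holds the template of each bytecode (its target code generator function and argument).
TemplateTable initialization calls def() to store the generator function and argument of every bytecode in the template array _template_table or _template_table_wide (for wide instructions).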

  //                              interpr. templates

  // Java spec bytecodes          ubcp|disp|clvm|iswd  in    out   generator      argument

  def(Bytecodes::_nop           , ____|____|____|____, vtos, vtos, nop           ,  _      );

  def(Bytecodes::_aconst_null   , ____|____|____|____, vtos, atos, aconst_null   ,  _      );

  def(Bytecodes::_iconst_m1     , ____|____|____|____, vtos, itos, iconst        , -1     );

  def(Bytecodes::_iconst_0      , ____|____|____|____, vtos, itos, iconst        ,  0     );

  def(Bytecodes::_iconst_1      , ____|____|____|____, vtos, itos, iconst        ,  1     );

  def(Bytecodes::_iconst_2      , ____|____|____|____, vtos, itos, iconst        ,  2     );

//... template definitions for the remaining bytecodes

def() checks whether the corresponding array entry is still empty and, if so, initializes it.

Template* t = is_wide ? template_for_wide(code) : template_for(code);

  // setup entry

  t->initialize(flags, in, out, gen, arg);
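
To make the role of def() concrete, here is a minimal sketch of a template table that is filled once per opcode. It is a toy model, not HotSpot code: the table size kNumBytecodes, the reduced TosState enum, the bool return value of def(), and the recording generator are all assumptions made for illustration.

```cpp
#include <cassert>

// Toy stand-ins for HotSpot's types; only the field names follow the article.
enum TosState { itos, atos, vtos, ilgl };
typedef void (*generator)(int arg);

struct Template {
  int       flags   = 0;
  TosState  tos_in  = ilgl;
  TosState  tos_out = ilgl;
  generator gen     = nullptr;
  int       arg     = 0;

  bool is_valid() const { return gen != nullptr; }
  void initialize(int f, TosState in, TosState out, generator g, int a) {
    flags = f; tos_in = in; tos_out = out; gen = g; arg = a;
  }
};

const int kNumBytecodes = 256;             // hypothetical table size
Template template_table[kNumBytecodes];    // one slot per opcode

int last_generated = -999;                 // records what the toy generator saw
void iconst(int value) { last_generated = value; }

// Fill the slot for `code` if it is still empty; report whether it was filled.
bool def(int code, int flags, TosState in, TosState out, generator gen, int arg) {
  assert(0 <= code && code < kNumBytecodes);
  Template* t = &template_table[code];
  if (t->is_valid()) return false;         // already defined: leave it untouched
  t->initialize(flags, in, out, gen, arg);
  return true;
}
```

Calling def(0x03, 0, vtos, itos, iconst, 0) registers a template for iconst_0 (opcode 0x03); invoking template_table[0x03].gen with the stored argument then runs the generator, which is what Template::generate() does later in the article.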

Each entry of _template_table or _template_table_wide is a Template object, that is, the template of one bytecode. Template is structured as follows:

class Template VALUE_OBJ_CLASS_SPEC {

 private:

  enum Flags {

    uses_bcp_bit,                                // set if template needs the bcp pointing to bytecode

    does_dispatch_bit,                           // set if template dispatches on its own

    calls_vm_bit,                                // set if template calls the vm

    wide_bit                                     // set if template belongs to a wide instruction

  };

  typedef void (*generator)(int arg);

  int       _flags;                  // describes interpreter template properties (bcp unknown)

  TosState  _tos_in;                 // tos cache state before template execution

  TosState  _tos_out;                // tos cache state after  template execution

  generator _gen;                    // template code generator

  int       _arg;

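_flags holds the flag bits. Its low four bits mean:

uses_bcp_bit: the template needs the bytecode pointer (bcp, whose value is the bytecode base address plus the bytecode offset)
does_dispatch_bit: the template dispatches to the next bytecode on its own, as jump instructions do
calls_vm_bit: the template needs to call into JVM functions
wide_bit: the template belongs to a wide instruction (one that uses additional bytes to extend the local variable index)

_tos_in is the TosState before the template executes (TosState, short for top of stack, is the data type of the element on top of the operand stack; it is used to check that the input and output types declared by the template agree with those of the generator function, so that the top-of-stack element is used correctly)
_tos_out is the TosState after the template executes
_gen is the template generator (a function pointer)
_arg is the argument passed to the template generator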
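
As a small illustration of how the four flag bits can be packed into _flags, the following sketch builds masks from the enum positions and tests them. The mask names ubcp, disp, clvm, and iswd echo the column headings in the def() table; the name none (instead of the table's blank placeholder) and the helper test_bit are assumptions for this example.

```cpp
#include <cassert>

// Bit positions, in the same order as the Flags enum shown in the article.
enum Flags { uses_bcp_bit, does_dispatch_bit, calls_vm_bit, wide_bit };

// One mask per bit, so a def() row can be written as ubcp|none|clvm|none.
const int none = 0;
const int ubcp = 1 << uses_bcp_bit;
const int disp = 1 << does_dispatch_bit;
const int clvm = 1 << calls_vm_bit;
const int iswd = 1 << wide_bit;

// True if the given bit is set in a packed flags word.
bool test_bit(int flags, Flags bit) { return (flags & (1 << bit)) != 0; }
```

With these definitions, a template that reads its operand through bcp and calls into the VM would carry the flags word ubcp|none|clvm|none, and only test_bit(flags, uses_bcp_bit) and test_bit(flags, calls_vm_bit) would report true.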

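3. StubQueue is the queue of stubs that holds the generated native code. Each element of the queue is an InterpreterCodelet object. InterpreterCodelet derives from the abstract base class Stub and contains the native code for a bytecode together with some debugging and printing information.
Its memory layout is as follows: the InterpreterCodelet header comes first and, after alignment to CodeEntryAlignment, the generated target code follows immediately.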
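
The layout just described can be sketched as a small address computation. This is an illustration under assumptions: ToyCodelet is a hypothetical header with made-up fields, and the alignment value passed in stands in for CodeEntryAlignment.

```cpp
#include <cassert>
#include <cstdint>

// Round x up to the next multiple of alignment (alignment must be a power of two).
inline std::uintptr_t align_up(std::uintptr_t x, std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (x + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stub header; the real InterpreterCodelet has different fields.
struct ToyCodelet {
  int         size;
  const char* description;
  int         bytecode;
};

// Address of the first code byte: the end of the header, rounded up to the
// code entry alignment, so the generated code starts on an aligned boundary.
inline std::uintptr_t code_begin(std::uintptr_t stub_start,
                                 std::uintptr_t code_entry_alignment) {
  return align_up(stub_start + sizeof(ToyCodelet), code_entry_alignment);
}
```

The gap between the end of the header and code_begin() is padding, which is why the header and the code of one stub can be treated as a single contiguous block inside the queue.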

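4. Depending on the interpreter model the virtual machine uses, InterpreterGenerator is either a CppInterpreterGenerator or a TemplateInterpreterGenerator.
The implementation is platform specific. Taking x86_64 as an example, TemplateInterpreterGenerator is defined in /hotspot/src/cpu/x86/vm/templateInterpreter_x86_64.cpp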

 InterpreterGenerator::InterpreterGenerator(StubQueue* code)

   : TemplateInterpreterGenerator(code) {

    generate_all(); // down here so it can be "virtual"

 }

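(1). In generate_all(), TemplateInterpreterGenerator generates a set of common code pieces that the JVM executes at run time, plus the InterpreterCodelets for all bytecodes:

error exits: entries for exiting on error
bytecode tracing entries (when -XX:+TraceBytecodes is set)
return entries
JVMTI EarlyReturn entries
deoptimization return entries
entries for the handlers that process native call results
continuation entries
safepoint entries
exception handling entries
exception throwing entries
method entries (native and non-native methods)
bytecode entries

(2). Among these, set_entry_points_for_all_bytes() generates target code for every defined bytecode and sets the corresponding entry points (only the is_defined case is considered here)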

 void TemplateInterpreterGenerator::set_entry_points_for_all_bytes() {

   for (int i = 0; i < DispatchTable::length; i++) {

     Bytecodes::Code code = (Bytecodes::Code)i;

     if (Bytecodes::is_defined(code)) {

       set_entry_points(code);

     } else {

       // bytecode (opcode) that is not implemented

       set_unimplemented(i);

     }

   }

 }

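(3). set_entry_points() fetches the Template for the bytecode and calls set_short_entry_points() to process it, then stores the entry addresses in the dispatch table (DispatchTable) _normal_table, or in _wentry_point for the wide form of the instruction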

 void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code) {

   CodeletMark cm(_masm, Bytecodes::name(code), code);

   // initialize entry points

   // ... asserts

   address bep = _illegal_bytecode_sequence;

   address cep = _illegal_bytecode_sequence;

   address sep = _illegal_bytecode_sequence;

   address aep = _illegal_bytecode_sequence;

   address iep = _illegal_bytecode_sequence;

   address lep = _illegal_bytecode_sequence;

   address fep = _illegal_bytecode_sequence;

   address dep = _illegal_bytecode_sequence;

   address vep = _unimplemented_bytecode;

   address wep = _unimplemented_bytecode;

   // code for short & wide version of bytecode

   if (Bytecodes::is_defined(code)) {

     Template* t = TemplateTable::template_for(code);

     assert(t->is_valid(), "just checking");

     set_short_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);

   }

   if (Bytecodes::wide_is_defined(code)) {

     Template* t = TemplateTable::template_for_wide(code);

     assert(t->is_valid(), "just checking");

     set_wide_entry_point(t, wep);

   }

   // set entry points

   EntryPoint entry(bep, cep, sep, aep, iep, lep, fep, dep, vep);

   Interpreter::_normal_table.set_entry(code, entry);

   Interpreter::_wentry_point[code] = wep;

 }

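set_short_entry_points() is analyzed here for the non-wide case. bep (byte entry point), cep, sep, aep, iep, lep, fep, dep, and vep are the entry addresses used when the top-of-stack state before the instruction executes is byte/boolean, char, short, object reference (including arrays), int, long, float, double, and void, respectively.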
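
One way to picture these entry points is a table with one row per top-of-stack state and one column per opcode. The sketch below is a toy model under assumptions: the enumerator order simply follows the bep..vep argument order used in the article and is not claimed to match HotSpot's TosState enum, and string labels stand in for code addresses.

```cpp
#include <cassert>
#include <string>

// Assumed order: same as the bep, cep, sep, aep, iep, lep, fep, dep, vep arguments.
enum TosState { btos, ctos, stos, atos, itos, ltos, ftos, dtos, vtos, number_of_states };

typedef const char* address;               // a label stands in for a code address

struct EntryPoint {
  address entry[number_of_states];
};

struct ToyDispatchTable {
  address table[number_of_states][256];    // one row of 256 opcodes per TOS state

  void set_entry(int code, const EntryPoint& ep) {
    for (int s = 0; s < number_of_states; s++) table[s][code] = ep.entry[s];
  }
  address lookup(TosState state, int code) const { return table[state][code]; }
};

// Entry points for a template that expects vtos, such as iconst_0. byte, char,
// short and int share one entry because the first three are treated as int.
EntryPoint vtos_template_entries() {
  EntryPoint ep;
  ep.entry[btos] = ep.entry[ctos] = ep.entry[stos] = ep.entry[itos] = "push_i; body";
  ep.entry[atos] = "push_ptr; jmp body";
  ep.entry[ltos] = "push_l; jmp body";
  ep.entry[ftos] = "push_f; jmp body";
  ep.entry[dtos] = "push_d; jmp body";
  ep.entry[vtos] = "body";
  return ep;
}
```

A bytecode that finishes in state ftos would look up the next opcode in the ftos row, so it lands on the entry that first saves the cached float before running the template body.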

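(4). set_short_entry_points() switches on the top-of-stack type the template expects. byte, char, and short are all handled as int, so btos, ctos, and stos must never appear here. For the other non-void types it emits a vtos entry that pops the value from the stack, followed by the typed entry, and then calls generate_and_dispatch() to produce the target code. The handling of TOS is illustrated here with iconst_0:
For iconst, the expected _tos_in (top-of-stack type before execution) is void (vtos), and the expected _tos_out (top-of-stack type after execution) is int (itos)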

 void TemplateInterpreterGenerator::set_short_entry_points(Template* t, address& bep, address& cep, address& sep, address& aep, address& iep, address& lep, address& fep, address& dep, address& vep) {

   assert(t->is_valid(), "template must exist");

   switch (t->tos_in()) {

     case btos:

     case ctos:

     case stos:

       ShouldNotReachHere();  // btos/ctos/stos should use itos.

       break;

     case atos: vep = __ pc(); __ pop(atos); aep = __ pc(); generate_and_dispatch(t); break;

     case itos: vep = __ pc(); __ pop(itos); iep = __ pc(); generate_and_dispatch(t); break;

     case ltos: vep = __ pc(); __ pop(ltos); lep = __ pc(); generate_and_dispatch(t); break;

     case ftos: vep = __ pc(); __ pop(ftos); fep = __ pc(); generate_and_dispatch(t); break;

     case dtos: vep = __ pc(); __ pop(dtos); dep = __ pc(); generate_and_dispatch(t); break;

     case vtos: set_vtos_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);     break;

     default  : ShouldNotReachHere();                                                 break;

   }

 }

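Here __ is defined as follows:

# define __ _masm->

that is, the macro assembler of the template interpreter

(5). Taking vtos as the expected top-of-stack state, consider set_vtos_entry_points():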

 void TemplateInterpreterGenerator::set_vtos_entry_points(Template* t,

                                                          address& bep,

                                                          address& cep,

                                                          address& sep,

                                                          address& aep,

                                                          address& iep,

                                                          address& lep,

                                                          address& fep,

                                                          address& dep,

                                                          address& vep) {

   assert(t->is_valid() && t->tos_in() == vtos, "illegal template");

   Label L;

   aep = __ pc();  __ push_ptr();  __ jmp(L);

   fep = __ pc();  __ push_f();    __ jmp(L);

   dep = __ pc();  __ push_d();    __ jmp(L);

   lep = __ pc();  __ push_l();    __ jmp(L);

   bep = cep = sep =

   iep = __ pc();  __ push_i();

   vep = __ pc();

   __ bind(L);

   generate_and_dispatch(t);

 }
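
The order in which set_vtos_entry_points() lays out its entries is easier to see with a toy assembler whose pc() is just an instruction count. This is a sketch under assumptions: ToyAssembler, VtosEntries, and the instruction strings are invented for illustration and do not emit real machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyAssembler {
  std::vector<std::string> code;                    // one string per "instruction"
  std::size_t pc() const { return code.size(); }    // current emit position
  void emit(const std::string& insn) { code.push_back(insn); }
};

struct VtosEntries { std::size_t aep, fep, dep, lep, iep, vep; };

// Mirrors the shape of set_vtos_entry_points(): each typed entry pushes the
// cached value and jumps to L; the int entry simply falls through to L.
VtosEntries layout_vtos_entries(ToyAssembler& masm) {
  VtosEntries e;
  e.aep = masm.pc(); masm.emit("push_ptr"); masm.emit("jmp L");
  e.fep = masm.pc(); masm.emit("push_f");   masm.emit("jmp L");
  e.dep = masm.pc(); masm.emit("push_d");   masm.emit("jmp L");
  e.lep = masm.pc(); masm.emit("push_l");   masm.emit("jmp L");
  e.iep = masm.pc(); masm.emit("push_i");           // no jump needed
  e.vep = masm.pc();                                // label L is bound here
  masm.emit("template body");
  return e;
}
```

Because the int entry is placed last, it needs no jump: execution falls straight from push_i into the vtos entry, which is where the template body begins.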

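Take the ftos entry as an example (a vtos template expects no value cached in the top-of-stack register, so a value left there by the previous bytecode must first be pushed onto the stack). The instructions at this entry start with push_f():
It is defined in /hotspot/src/cpu/x86/vm/interp_masm_x86_64.cpp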

 void InterpreterMacroAssembler::push_f(XMMRegister r) {

   subptr(rsp, wordSize);

   movflt(Address(rsp, 0), r);

 }

Here r defaults to xmm0, and wordSize is the machine word size (8 bytes on a 64-bit machine)

subptr() in turn calls subq():

 void MacroAssembler::subptr(Register dst, int32_t imm32) {

   LP64_ONLY(subq(dst, imm32)) NOT_LP64(subl(dst, imm32));

 }

subq() is implemented as follows:

 void Assembler::subq(Register dst, int32_t imm32) {

   (void) prefixq_and_encode(dst->encoding());

   emit_arith(0x81, 0xE8, dst, imm32);

 }

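emit_arith() then calls emit_byte()/emit_long() to write the instruction bytes "83 EC 08". Because 8 fits in a signed 8-bit value, the first byte is 0x81 | 0x02, i.e. 0x83; rsp has register number 4, so the second byte is 0xE8 | 0x04, i.e. 0xEC; the third byte is 0x08 & 0xFF, i.e. 0x08. Preceded by the REX.W prefix (0x48) that prefixq_and_encode() emits for a 64-bit operand, these bytes encode sub $0x8,%rsp in AT&T syntax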

 void Assembler::emit_arith(int op1, int op2, Register dst, int32_t imm32) {

   assert(isByte(op1) && isByte(op2), "wrong opcode");

   assert((op1 & 0x01) == 1, "should be 32bit operation");

   assert((op1 & 0x02) == 0, "sign-extension bit should not be set");

   if (is8bit(imm32)) { // here imm32 is 8 (wordSize), which fits in 8 bits

     emit_byte(op1 | 0x02); // set sign bit

     emit_byte(op2 | encode(dst));

     emit_byte(imm32 & 0xFF);

   } else {

     emit_byte(op1);

     emit_byte(op2 | encode(dst));

     emit_long(imm32);

   }

 }
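
The byte arithmetic above can be checked with a short, self-contained encoder. This is a sketch, not HotSpot's Assembler: encode_subq is a hypothetical helper that only handles the eight low registers and hard-codes the REX.W prefix 0x48 instead of computing it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> CodeBytes;

inline bool is8bit(std::int32_t x) { return -128 <= x && x < 128; }

// Encode `sub $imm32, %reg` for a 64-bit register with encoding 0..7
// (rsp is 4). Hypothetical helper: no REX.B handling for r8..r15.
CodeBytes encode_subq(int reg_encoding, std::int32_t imm32) {
  assert(0 <= reg_encoding && reg_encoding < 8);
  CodeBytes out;
  out.push_back(0x48);                                          // REX.W prefix
  if (is8bit(imm32)) {
    out.push_back(0x81 | 0x02);                                 // 0x83: imm8 form
    out.push_back(static_cast<std::uint8_t>(0xE8 | reg_encoding));
    out.push_back(static_cast<std::uint8_t>(imm32 & 0xFF));
  } else {
    out.push_back(0x81);                                        // imm32 form
    out.push_back(static_cast<std::uint8_t>(0xE8 | reg_encoding));
    const std::uint32_t bits = static_cast<std::uint32_t>(imm32);
    for (int shift = 0; shift < 32; shift += 8) {               // little-endian
      out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
    }
  }
  return out;
}
```

For rsp and an immediate of 8 this yields 48 83 EC 08, the four bytes of sub $0x8,%rsp; an immediate that does not fit in 8 bits, such as 0x100, takes the longer 0x81 form with a full 32-bit immediate.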

emit_byte() is defined in /hotspot/src/share/vm/asm/assembler.inline.hpp:
This function copies the byte to the location _code_pos and then advances _code_pos

 inline void AbstractAssembler::emit_byte(int x) {

   assert(isByte(x), "not a byte");

   *(unsigned char*)_code_pos = (unsigned char)x;

   _code_pos += sizeof(unsigned char);

   sync();

 }

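So subq() writes the instruction sub $0x8,%rsp to the code buffer
Similarly, movflt() writes the instruction movss %xmm0,(%rsp) to the code buffer
jmp() writes the instruction jmpq <addr> to the code buffer (addr is the entry of the bytecode's native code)

The entry code produced by set_vtos_entry_points() looks like this: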

 push %rax        .....(atos entry)

 jmpq <addr>

 sub $0x8,%rsp     .....(ftos entry)

 movss %xmm0,(%rsp)

 jmpq <addr>       .....(addr is the entry of the bytecode's native code)

 sub $0x10,%rsp    .....(dtos entry)

 movsd %xmm0,(%rsp)

 jmpq <addr>

 sub $0x10,%rsp     .....(ltos entry)

 mov %rax,(%rsp)

 jmpq <addr>

 push %rax         ...(itos entry)

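At the end, set_vtos_entry_points() calls generate_and_dispatch(), which writes the interpretation code for the current bytecode and the logic that jumps to the next bytecode to continue execution

The main part of generate_and_dispatch() is as follows: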

 void TemplateInterpreterGenerator::generate_and_dispatch(Template* t, TosState tos_out) {

   // ...

   // generate template

   t->generate(_masm);

   // advance

   if (t->does_dispatch()) {

     //asserts

   } else {

     // dispatch to next bytecode

     __ dispatch_epilog(tos_out, step);

   }

 }

Taking iconst() as the target code generator, consider generate():

 void Template::generate(InterpreterMacroAssembler* masm) {

   // parameter passing

   TemplateTable::_desc = this;

   TemplateTable::_masm = masm;

   // code generation

   _gen(_arg);

   masm->flush();

 }

generate() calls the generator function _gen(_arg), which differs from platform to platform. On x86_64 it is defined in /hotspot/src/cpu/x86/vm/templateTable_x86_64.cpp

 void TemplateTable::iconst(int value) {

   transition(vtos, itos);

   if (value == 0) {

     __ xorl(rax, rax);

   } else {

     __ movl(rax, value);

   }

 }

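The iconst_i instruction pushes the constant i onto the operand stack. With top-of-stack caching, the generator iconst() leaves the value in rax, and the resulting state is itos. When i is 0, it does not load the immediate 0 into rax but clears the register with an exclusive OR, writing the instruction "xor %eax,%eax" to the code buffer (xorl is a 32-bit operation, and on x86_64 a write to eax also clears the upper half of rax). When i is not 0, it writes the instruction "mov $i,%eax"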
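
The size difference between the two forms can be seen by writing out the bytes. The sketch below is an illustration, not HotSpot's assembler: encode_iconst is a hypothetical helper, and 33 C0 is one valid encoding of xor %eax,%eax (31 C0 is another).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> CodeBytes;

// Bytes that load a 32-bit constant into eax, choosing the short XOR form
// for zero. Hypothetical helper for illustration only.
CodeBytes encode_iconst(std::int32_t value) {
  CodeBytes out;
  if (value == 0) {
    out.push_back(0x33);                    // xor r32, r/m32
    out.push_back(0xC0);                    // ModRM: mod=11, reg=eax, rm=eax
  } else {
    out.push_back(0xB8);                    // mov imm32 into eax (0xB8 + reg 0)
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    for (int shift = 0; shift < 32; shift += 8) {   // little-endian immediate
      out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
    }
  }
  return out;
}
```

The XOR form takes 2 bytes while the move with a 32-bit immediate takes 5, which is one common reason for clearing a register this way.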

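When the template does not dispatch on its own, dispatch_epilog() is called to generate the target code that fetches the next instruction and dispatches to it: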

 void InterpreterMacroAssembler::dispatch_epilog(TosState state, int step) {

   dispatch_next(state, step);

 }

dispatch_next() is implemented as follows:

 void InterpreterMacroAssembler::dispatch_next(TosState state, int step) {

   // load next bytecode (load before advancing r13 to prevent AGI)

   load_unsigned_byte(rbx, Address(r13, step));

   // advance r13

   increment(r13, step);

   dispatch_base(state, Interpreter::dispatch_table(state));

 }

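dispatch_next() first calls load_unsigned_byte(), which writes the instruction "movzbl step(%r13),%ebx" (the next opcode sits step bytes past the current bytecode pointer in r13). It then calls increment(), which writes an inc or add instruction on %r13, and finally calls dispatch_base(), which loads the address of the dispatch table for the outgoing top-of-stack state into %r10 and writes "jmp *(%r10,%rbx,8)". This resembles a PC that advances by the width of one instruction and then fetches and executes the next one.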
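
The fetch, advance, and jump sequence can be imitated in portable C++ with a table of handlers indexed by opcode. This is a toy model under assumptions: ToyVM, the handler functions, and the OP_HALT opcode (0xFF) are invented for illustration, and a central loop replaces the direct jump from one template to the next that the generated code performs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyVM {
  const std::uint8_t* bcp;      // bytecode pointer (the role r13 plays above)
  std::vector<int>    stack;    // operand stack
  bool                running;
};

typedef void (*Handler)(ToyVM&);

// iconst_0, iconst_1 and iadd use their real opcode values; OP_HALT is made up.
enum Opcode { OP_ICONST_0 = 0x03, OP_ICONST_1 = 0x04, OP_IADD = 0x60, OP_HALT = 0xFF };

int pop(ToyVM& vm) { int v = vm.stack.back(); vm.stack.pop_back(); return v; }

// Each handler does its work, then advances bcp by its own length (step = 1).
void op_iconst_0(ToyVM& vm) { vm.stack.push_back(0); vm.bcp += 1; }
void op_iconst_1(ToyVM& vm) { vm.stack.push_back(1); vm.bcp += 1; }
void op_iadd(ToyVM& vm) {
  int b = pop(vm);
  int a = pop(vm);
  vm.stack.push_back(a + b);
  vm.bcp += 1;
}
void op_halt(ToyVM& vm) { vm.running = false; }

// Run until OP_HALT and return the value left on top of the stack.
int run(const std::vector<std::uint8_t>& code) {
  Handler table[256] = {};                  // dispatch table, one slot per opcode
  table[OP_ICONST_0] = op_iconst_0;
  table[OP_ICONST_1] = op_iconst_1;
  table[OP_IADD]     = op_iadd;
  table[OP_HALT]     = op_halt;

  ToyVM vm = { code.data(), {}, true };
  while (vm.running) {
    Handler h = table[*vm.bcp];             // load the opcode, index the table
    assert(h != nullptr && "unimplemented bytecode");
    h(vm);                                  // "jump" to the handler
  }
  return vm.stack.back();
}
```

Running the sequence iconst_1, iconst_1, iadd, halt leaves 2 on the stack. In the template interpreter there is no such loop: the dispatch code is appended to every template, so each bytecode jumps directly to the next.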

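At this point a question arises: where is _code_pos? As mentioned earlier, StubQueue is the queue of stubs that holds the generated native code, each element of the queue is an InterpreterCodelet object, and an InterpreterCodelet contains the native code for a bytecode together with some debugging and printing information. So how does _code_pos relate to an InterpreterCodelet?

Notice that whether native code is generated for one of the JVM's entry routines or for a bytecode, a CodeletMark object is always constructed first

CodeletMark cm(_masm, Bytecodes::name(code), code);

The CodeletMark constructor is shown below. In its initializer list it calls StubQueue's request() to create an InterpreterCodelet object, and then constructs a CodeBuffer from the code address and size of that InterpreterCodelet to hold the generated target code.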

public:

  CodeletMark(

    InterpreterMacroAssembler*& masm,

    const char* description,

    Bytecodes::Code bytecode = Bytecodes::_illegal):

    _clet((InterpreterCodelet*)AbstractInterpreter::code()->request(codelet_size())),

    _cb(_clet->code_begin(), _clet->code_size())

  { // request all space (add some slack for Codelet data)

    assert (_clet != NULL, "we checked not enough space already");

    // initialize Codelet attributes

    _clet->initialize(description, bytecode);

    // create assembler for code generation

    masm  = new InterpreterMacroAssembler(&_cb);

    _masm = &masm;

  }

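At this point the target code has not been generated yet, so its size is unknown. The constructor therefore requests all of the free space in the StubQueue, leaving only a little slack for alignment. Note that the StubQueue is actually one contiguous region of memory, and all stubs are allocated from it.
It then initializes the description and the bytecode of the InterpreterCodelet, and constructs the macro assembler object InterpreterMacroAssembler with the CodeBuffer as its argument

With this it becomes clear that the assembler's _code_pos is the current write position of the generated code inside the CodeBuffer
The destructor of CodeletMark also deserves a mention. Once it has made sure that the code produced by the assembler is completely written to the CodeBuffer, it calls StubQueue's commit() to assign the space actually used to the current stub (InterpreterCodelet)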

~CodeletMark() {

    // align so printing shows nop's instead of random code at the end (Codelets are aligned)

    (*_masm)->align(wordSize);

    // make sure all code is in code buffer

    (*_masm)->flush();

    // commit Codelet

    AbstractInterpreter::code()->commit((*_masm)->code()->pure_insts_size());

    // make sure nobody can use _masm outside a CodeletMark lifespan

    *_masm = NULL;

  }
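
The request-everything-then-commit pattern of CodeletMark can be sketched as a small RAII wrapper around a bump allocator. This is a toy model under assumptions: ToyStubQueue and ToyCodeletMark are invented names, sizes are plain byte counts, and alignment and the codelet header are left out.

```cpp
#include <cassert>
#include <cstddef>

class ToyStubQueue {
 public:
  explicit ToyStubQueue(std::size_t capacity)
      : _capacity(capacity), _used(0), _pending(0) {}

  std::size_t available() const { return _capacity - _used; }
  std::size_t used() const { return _used; }

  // Reserve `size` bytes for the stub being built; returns its start offset.
  std::size_t request(std::size_t size) {
    assert(_pending == 0 && size <= available());
    _pending = size;
    return _used;
  }

  // Shrink the reservation to the bytes actually written and keep only those.
  void commit(std::size_t actual) {
    assert(actual <= _pending);
    _used += actual;
    _pending = 0;
  }

 private:
  std::size_t _capacity, _used, _pending;
};

class ToyCodeletMark {
 public:
  explicit ToyCodeletMark(ToyStubQueue& queue) : _queue(queue), _written(0) {
    _queue.request(_queue.available());     // size unknown yet: take everything
  }
  void emit_byte() { _written++; }          // stand-in for real code emission
  ~ToyCodeletMark() { _queue.commit(_written); }

 private:
  ToyStubQueue& _queue;
  std::size_t   _written;
};
```

Tying commit() to the destructor means the space is handed back as soon as the scope that generates one codelet ends, so the next ToyCodeletMark can again request all remaining space.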
