Architectural Overview
 
A modern optimizing compiler can be logically divided into four parts:
 
  1. The compiler front end
The front end includes the scanner and parser, which read the Java source and build an abstract syntax tree (AST) representation of the source code. The front end must also be able to read the symbol information in the Java ".class" files that are referenced by import statements. After converting the source into an AST, the front end resolves symbol declarations, performs semantic analysis and builds the symbol table and other supporting data structures. The output of the front end is an AST where each node is annotated with either type or symbol information.
 
     Java symbol table design issues
The symbol table is one of the core data structures in a compiler. Unlike the AST, which can be deleted after the flow graph is built, the symbol table "lives" as long as the Java source is being compiled. Java's scoping rules and the lack of unique names within a scope complicate symbol table construction.
 
  2. The middle pass
The middle pass performs tree to tree transformations and builds the control flow graph of basic blocks that the optimizer works on. An example of a tree to tree transformation is method inlining.
 
  3. The optimizer
The optimizer builds data structures that describe the variable usage throughout the control flow graph for the method (this is usually called global data flow). This information is used to optimize data references globally within a method.
 
  4. The code generator
The code generator generates instructions for the target processor. The code generation phase also does machine dependent optimization, including peephole optimization and load/store scheduling.
 
Java Symbol Table Design Issues
Many Java language processors do not read Java source. Instead they read the Java class file and build the symbol table and abstract syntax tree from the class file. The Java represented in the Java class file is already syntactically and semantically correct. As a result, the authors of these tools avoid the considerable difficulty involved in implementing a Java front end.
 
 
The designers of the Java programming language did not have ease of implementation in mind when they designed the language. This is as it should be, since ease of use in the language is more important. One of the difficulties encountered in designing a Java front end that does semantic analysis is symbol table design. This web page provides a somewhat rambling discussion of the issues involved in the design of a Java symbol table.
 
The front end phase of a compiler is responsible for:
  1. Parsing the source language to recognize correct programs and report syntax errors for incorrect language constructs. In the case of the BPI Java front end, this is done by a parser generated with the ANTLR parser generator. The output of the parser is an abstract syntax tree (AST) which includes all declarations that were in the source.
  2. Reading declaration information in Java class files and, for a native Java compiler, building ASTs from the byte code stream. This also involves following the transitive closure of the classes required to define the root class. (Def: transitive closure - all the nodes in a graph that are reachable from the root. In this case the graph is the tree of classes that are needed to define all the classes read by the compiler.)
  3. Processing the declarations in the AST and class files to build the symbol table. Once they are processed, the declarations are pruned from the AST.
 
The output of the front end is a syntactically and semantically correct AST where each node has a pointer to either an identifier (if it is a leaf) or a class type (if it is a non-terminal or a type reference like MyType.class).
The term "symbol table" is generic and usually refers to a data structure that is much more complex than a table (e.g., an array of structures). While symbols and types are being resolved, the symbol table must reflect the current scope of the AST being processed. For example, in the C code fragment below there are three variables named "x", all in different scopes.
 
static char x;
int foo() {
     int x;

     {
        float x;
     }
}
 
Resolving symbols and types requires traversing the AST to process the various declarations. As the traversal moves through the scopes in the AST, the symbol table reflects the current scope, so that when the symbol for "x" is looked up, the symbol in the current scope is returned.
 
The scoped structure of the symbol table is only important while symbols and types are being resolved. After names are resolved, the association between a name in the AST and its symbol can be found directly via a pointer.
 
Compilers for languages like Pascal and C, which have simple hierarchical scope, frequently use symbol tables that directly mirror the language scope. There is a symbol table for every scope. Each symbol table has a pointer to its parent scope. At the root of the symbol table hierarchy is the global symbol table, which contains global symbols and functions (or, in the case of Pascal, procedures). When a function scope is entered, a function symbol table is created. The function symbol table parent pointer points to the next scope "upward" in the hierarchy (either the global symbol table, or in the case of Pascal, an enclosing procedure or function). A block symbol table would point to its parent, which would be a function symbol table. Symbol search traverses upward, starting with the local scope and moving toward the global scope.
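
The parent-pointer arrangement described above can be sketched in a few lines of Java. This is only an illustration, not code from any particular compiler: the names ScopeChainDemo and Scope, and the use of a plain string to stand for a symbol's type, are assumptions made for the example. It models the three declarations of x from the C fragment.

```java
import java.util.HashMap;
import java.util.Map;

class ScopeChainDemo {
    /** One symbol table per scope, linked to the enclosing scope (hypothetical design). */
    static final class Scope {
        final String name;
        final Scope parent;   // null for the global scope
        private final Map<String, String> symbols = new HashMap<>();

        Scope(String name, Scope parent) {
            this.name = name;
            this.parent = parent;
        }

        void declare(String id, String type) {
            symbols.put(id, type);
        }

        /** Searches this scope first, then each enclosing scope up to the global one. */
        String lookup(String id) {
            for (Scope s = this; s != null; s = s.parent) {
                String type = s.symbols.get(id);
                if (type != null) {
                    return type;
                }
            }
            return null;   // not declared in any enclosing scope
        }
    }

    public static void main(String[] args) {
        Scope global = new Scope("global", null);
        global.declare("x", "char");

        Scope foo = new Scope("foo", global);
        foo.declare("x", "int");

        Scope block = new Scope("block", foo);
        block.declare("x", "float");

        System.out.println(block.lookup("x"));    // float
        System.out.println(foo.lookup("x"));      // int
        System.out.println(global.lookup("x"));   // char
    }
}
```

Each lookup returns the declaration nearest to the scope where the search starts, which is the behavior the upward traversal is meant to provide.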
 
The scope hierarchy is not needed once symbols and types have been resolved. However, the local scope for a method or a class remains important, and the symbol tables for these local scopes must remain accessible to allow the compiler to iterate over all symbols in a given scope. For example, to generate code to allocate a stack frame when a method is called, the compiler must be able to find all the variables associated with the method. A Java compiler must be able to keep track of the members of a class, since these variables will be allocated in garbage collected memory.
 
Scope for most object oriented languages is more complicated than the scope for procedural languages like C and Pascal. C++ supports multiple inheritance and Java supports multiple interface definitions (multiple inheritance done right). The symbol table must also be efficient so compiler performance is not hurt by symbol table lookup in the front end. Symbol table design considerations for a Java compiler include:
  1. Java has a large global scope, since all classes and packages are imported into the global name space. Global symbols must be stored in a high capacity data structure that supports fast lookup, ideally O(1) on average (a hash table, for example).
  2. Java has lots of local scopes (classes, methods and blocks) that have relatively few symbols (compared to the global scope). Data structures that support fast high capacity lookup tend to introduce overhead (in either memory use or code complexity). This is overkill for the local scope. The symbol table for the local scopes should be implemented with a data structure that is simple and relatively fast (e.g., O(log2 n)). Examples include balanced binary trees and skip lists.
  3. The symbol table must be able to support multiple definitions for a name within a given scope. The symbol table must also help the compiler resolve the error cases where the same kind of symbol (e.g., a method) is declared more than once in a given scope.
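
As a rough illustration of the first two considerations, the standard Java collections already offer both kinds of table. In the sketch below the names TableChoiceDemo, newGlobalTable and newLocalTable, the initial capacity, and the string-valued entries are all assumptions made for the example. java.util.HashMap gives expected constant-time lookup for a large global table, and java.util.TreeMap, which is a red-black tree, gives O(log n) lookup for a small local one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class TableChoiceDemo {
    /** Global scope: many symbols, so use a hash table sized generously up front. */
    static Map<String, String> newGlobalTable() {
        return new HashMap<>(4096);
    }

    /** Local scope: few symbols, so a balanced tree is small and fast enough. */
    static Map<String, String> newLocalTable() {
        return new TreeMap<>();
    }

    public static void main(String[] args) {
        Map<String, String> global = newGlobalTable();
        global.put("java.lang.String", "class");
        global.put("java.util.List", "interface");

        Map<String, String> local = newLocalTable();
        local.put("count", "int");
        local.put("area", "float");

        System.out.println(global.get("java.util.List"));   // interface
        System.out.println(local.get("area"));              // float
        System.out.println(local.keySet());                 // [area, count], TreeMap keeps keys sorted
    }
}
```

Neither map handles the third consideration, several definitions under one name, which needs a container per identifier rather than a single value.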
 
In C, names within a given scope must be unique. For example, in C a type named MyType and a function named MyType are not allowed in the same scope. In Java, names in a given scope are not required to be unique. Names are resolved by context. For example:

  

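class Rose {
    Rose( int val ) { juliette = val; }
    public int juliette;
} // Rose

class Venice {
    void thorn() {
            garden = new Rose( 42 );  // the type Rose and its constructor
            Rose( 86 );               // the method Rose declared below
            this.Rose( 94 );          // the same method, called through this
     }

     Rose Rose( int val ) { garden.juliette = val; return garden; }

     Rose garden;
} // Venice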

  

In this example there is a type named Rose, a Rose constructor, and a method named Rose that returns an object of type Rose. The compiler must know by context which is which. Also, note that the references to the Rose method and the garden field are references to declarations that appear later in the file.
Most of the symbol scope in Java can be described by a simple hierarchy where a lower scope points to the next scope up. The exception is the interface list that may be associated with a Java class. Note that interfaces may also inherit from super interfaces.
The scopes in Java are outlined below: 
 
 Global (objects imported via import statements)
        Parent Interface (this may be a list)
          Interface (there may be a list of interfaces)
             Parent class
               Class
                 Method
                   Block
The symbol table and the semantic analysis code that checks the Java AST returned by the parser must be able to resolve whether a symbol definition is semantically correct. Multiple definitions for a given name (e.g., multiple definitions of a class member) are allowed. However, ambiguous symbol use is not allowed:
 
Java Language Specification (JLS) 8.3.3.3
A class may inherit two or more fields with the same name, either from two interfaces or from its superclass and an interface. A compile-time error occurs on any attempt to refer to any ambiguously inherited field by its simple name. A qualified name or field access expression that contains the keyword super (15.10.2) may be used to access such fields unambiguously.
 
Both a parent class and an interface place symbols defined in the class or interface in the local scope. In the example below the symbol x is defined in both bar and fu. This is allowed, since x is not referenced in the class DoD.
 
 
interface bar {
  int x = 42;
}

class fu {
  double x;
}

class DoD extends fu implements bar {
   int y;  // No error, since there is no local reference to x
}
 
If x is referenced in the class DoD, the compiler must report an error, since the reference to x is ambiguous.
class DoD extends fu implements bar {
  int y; 

  DoD() {
    y = x + 1;   // Error, since the reference to x is ambiguous
  }
}
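
The JLS passage quoted above names two ways to remove the ambiguity: a qualified name, or a field access through super. A minimal sketch follows, reusing bar and fu from the example and adding two hypothetical classes (DoDFixed and AmbiguityDemo) that are not part of the original text.

```java
class AmbiguityDemo {
    public static void main(String[] args) {
        DoDFixed d = new DoDFixed();
        System.out.println(d.y);   // 43
        System.out.println(d.z);   // 1.0
    }
}

interface bar {
    int x = 42;
}

class fu {
    double x;   // an uninitialized double field defaults to 0.0
}

class DoDFixed extends fu implements bar {
    int y;
    double z;

    DoDFixed() {
        y = bar.x + 1;     // qualified name: the constant from interface bar
        z = super.x + 1;   // super: the field inherited from class fu
    }
}
```

Both references compile because each one tells the compiler which of the two inherited definitions of x is meant.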
Similar name ambiguity can exist with inner classes defined in an interface and a parent class:
interface BuildEmpire
{
  class KhubilaiKahn {
    public int a, b, c;
  }
}

class GengisKahn
{
  class KhubilaiKahn {
    public double x, y, z;
  }
}

class mongol extends GengisKahn implements BuildEmpire
{
  void mondo() {
    KhubilaiKahn TheKahn;  // Ambiguous reference to class KhubilaiKahn
  }
}
Java does not support multiple inheritance in the class hierarchy, but Java does allow a class to implement multiple interfaces or an interface to extend multiple interfaces.
 
 
Java Language Specification (JLS) 9.3
It is possible for an interface to inherit more than one field with the same name (8.3.3.3). Such a situation does not in itself cause a compile-time error. However, any attempt within the body of the interface to refer to either field by its simple name will result in a compile-time error, because such a reference is ambiguous.
For example, in the code below key is ambiguous.
 
interface Maryland
{
  String key = "General William Odom";
}

interface ProcurementOffice
{
  String key = "Admiral Bobby Inman";
}

interface NoSuchAgency extends Maryland, ProcurementOffice
{
  String RealKey = key + "42"; // ambiguous reference to key
}

  

When the semantic analysis phase looks up the symbol key, the symbol table must allow the semantic checking code to determine that there are two member definitions for key. The symbol table must group only like symbols in the same scope together (e.g., members with members and types with types). Unlike symbols (methods, classes and member variables) are not grouped together because they are distinguished by context.
 
Multiple definitions of a method do not cause a semantic error in Java, since there is no multiple inheritance. If a method of the same name is inherited from two interfaces, for example, the method must either be the same or must define an overloaded version of the method. If there is a local method with the same name and arguments (e.g., same type signature) as a method defined in a parent class, the local method will be in a "lower" scope and will override the definition of the parent.
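
A small runnable example of the cases described above, using hypothetical interfaces and classes (Opener, Starter, Door, SteelDoor, MethodScopeDemo) that do not appear in the original text. Both interfaces declare the same open() method, one of them adds an overloaded version, and a subclass overrides the definition it inherits.

```java
class MethodScopeDemo {
    public static void main(String[] args) {
        Door door = new SteelDoor();
        System.out.println(door.open());    // steel door opened
        System.out.println(door.open(3));   // opened 3 times
    }
}

interface Opener {
    String open();
}

interface Starter {
    String open();            // same signature as Opener.open(), so there is no conflict
    String open(int times);   // an overloaded version
}

class Door implements Opener, Starter {
    public String open() { return "door opened"; }
    public String open(int times) { return "opened " + times + " times"; }
}

class SteelDoor extends Door {
    @Override
    public String open() { return "steel door opened"; }   // overrides Door.open()
}
```

The single open() in Door satisfies both interfaces, and the call through a Door reference reaches the SteelDoor definition because the "lower" scope wins.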

  

Design of a Java Symbol Table
  
Symbol table requirements
Taking into account the issues discussed above, a Java symbol table must fulfill the following requirements:
  1. Support for multiple definitions for a given identifier.
  2. Fast lookup (O(1) on average) for a large global (e.g., package level) symbol base.
  3. Relatively fast lookup (O(log2 n)) for local symbols (e.g., local to a class, method or block)
  4. Support for Java hierarchical scope
  5. Searchable by symbol type (e.g., member, method, class).
  6. Quickly determine whether a symbol definition is ambiguous.
Symbol lifetime 
 
Languages like C can be compiled one function at a time. The global symbol table must retain the symbol information for the functions defined in the current file and their arguments, but other local symbol information can be discarded after the function is compiled. When the compiler has processed all the functions in a given .c file (and its referenced include files), all symbols can be discarded.
 
C++ can be compiled in a similar fashion. Class definitions are placed in header files (e.g., .h files), which are included by each source file (e.g., a .C or .cpp file) that references the class. When the file has been processed all symbols can be discarded.
Java is more complicated. The Java compiler must read the Java symbol definitions for the class tree that is needed to define all classes referenced by the current class being compiled (the transitive closure of the class hierarchy). In the case of the object containing the main method, this includes all classes referenced in the program.
In theory Java symbols could be discarded once all of the classes that reference them are compiled. In practice this is probably more trouble than it is worth on a modern computer system with lots of memory. So Java symbols live throughout the compile.
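
The transitive closure mentioned above can be computed with an ordinary worklist traversal. The sketch below is an assumption-laden illustration: the class ClassClosureDemo, the closure method and the class names in the map are hypothetical, and the references map stands in for information a real compiler would read from class files.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ClassClosureDemo {
    /** Returns every class reachable from root by following references. */
    static Set<String> closure(String root, Map<String, List<String>> references) {
        Set<String> loaded = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String cls = pending.remove();
            if (!loaded.add(cls)) {
                continue;   // already read; this also stops cycles
            }
            for (String ref : references.getOrDefault(cls, Collections.<String>emptyList())) {
                pending.add(ref);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("App", Arrays.asList("Parser", "SymbolTable"));
        refs.put("Parser", Arrays.asList("Lexer", "SymbolTable"));
        refs.put("Lexer", Arrays.asList("App"));       // a cycle back to the root
        refs.put("Unused", Arrays.asList("Parser"));   // not reachable from App

        System.out.println(closure("App", refs));      // [App, Parser, SymbolTable, Lexer]
    }
}
```

Every class in the returned set needs symbol information in memory while App is compiled, which is why discarding symbols early is awkward in Java.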
 
Building Symbol Table Scope 
 
Hierarchical scope in the symbol table only needs to be available during the semantic analysis phase. After this phase, every identifier node in the AST will point to the correct symbol. However, once scope is built, it is left in place.
Each local scope (e.g., block, method or class) has a local symbol table which points to the symbol table in the enclosing scope. At the root of the hierarchy is the global symbol table containing all global classes and imported symbols. During semantic analysis, symbol search starts with the local symbol table and moves upward in the hierarchy, searching each symbol table until the global symbol table is searched. If the global symbol table is searched and the symbol is not present, the symbol does not exist.
 
Java scope is not a simple hierarchy composed of unique symbols, as is the case with C. There may be multiple definitions for a symbol (e.g., a class member, a method and a class name). The symbols at a given scope level may come from more than one source. For example, in the Java code below the class gin and the interface tonic define symbols at the same level of hierarchy.
 interface tonic {
    int water = 1;
    int quinine = 2;
    int sugar = 3;
    int TheSame = 4;
  }

  class gin {
    public int water, alcohol, juniper;
    public float TheSame;
  }

  class g_and_t extends gin implements tonic {
    class contextName {
      public int x, y, z;
    } // contextName

    public int contextName( int x ) { return x; }
    public contextName contextName;
  }

Scope and Local Variables and Arguments

Local variables in Java are variables in methods. These variables are allocated in a stack frame and have a "life time" that exists as long as the method is active. A method may also have local scope created by blocks or statements. For example:
 
class bogus {
        public void foobar() {
          int a, b, c;

          { // this is a scope block
            int x, y, z;
          }
        }
}

  

Unlike C and C++, Java does not allow a local variable to be redeclared in a nested scope:
 
JLS 14.3.2
If a declaration of an identifier as a local variable appears within the scope of a parameter or local variable of the same name, a compile-time error occurs. Thus the following example does not compile:
 
class Test {
        public static void main( String[] args ) {
          int i;

          for (int i = 0; i < 10; i++)  // Error: local variable i is redeclared
            System.out.println(i);
        }
}

A local variable is, however, allowed to hide a class member of the same name. Detecting an illegal redeclaration is therefore a semantic check, performed in the semantic analysis phase.
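
One way to picture this check is a chain of local scopes that is searched before a new local is entered. The sketch below is an assumption, not a description of any existing compiler: RedeclarationDemo and LocalScope are hypothetical names, and class members are deliberately kept out of the chain so that a local may hide them.

```java
import java.util.HashSet;
import java.util.Set;

class RedeclarationDemo {
    /** The locals and parameters of one method or block. Class members are kept elsewhere. */
    static final class LocalScope {
        final LocalScope parent;   // enclosing block or method scope; null at the method level
        private final Set<String> names = new HashSet<>();

        LocalScope(LocalScope parent) {
            this.parent = parent;
        }

        /**
         * Declares a local variable. Returns false (a compile-time error in Java)
         * if the name is already a local or parameter in this or an enclosing local scope.
         */
        boolean declare(String name) {
            for (LocalScope s = this; s != null; s = s.parent) {
                if (s.names.contains(name)) {
                    return false;
                }
            }
            names.add(name);
            return true;
        }
    }

    public static void main(String[] args) {
        // A class member named i would live in the class scope, which declare()
        // never consults, so a local i may hide it.
        LocalScope method = new LocalScope(null);
        System.out.println(method.declare("i"));    // true

        LocalScope forLoop = new LocalScope(method);
        System.out.println(forLoop.declare("i"));   // false: i is already a local in the method

        // Sibling blocks do not enclose one another, so each may declare its own x.
        LocalScope first = new LocalScope(method);
        LocalScope second = new LocalScope(method);
        System.out.println(first.declare("x"));     // true
        System.out.println(second.declare("x"));    // true
    }
}
```

The second call corresponds to the for loop in the Test example above, where i is declared again inside the scope of the outer i.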

Forward reference of symbols 
 
A forward reference is a reference to a symbol that is defined textually later in the code.
 
When a class field is initialized, any field that the initializer refers to by its simple name must already have been declared. The following example (from JLS 6.3) results in a compile time error:
 
class Test {
    int i = j;  // compile-time error: incorrect forward reference
    int j = 1;
}

Nor is forward reference allowed for local variables. For example:

class geomancy {
    public float circleArea( float r ) {
      float area;

      area = pie * r * r;     // undefined variable 'pie'
      float pie = (float)Math.PI;

      return area;
    }
  }

However, forward reference is allowed from a local scope (e.g., a method) to a class member defined in the enclosing class. For example, in the Java below the method getHexChar makes a forward reference to the class member hexTab:

class HexStuff {

  public char getHexChar( byte digit ) {

    digit = (byte)(digit & 0xf);
    char ch = hexTab[digit];  // legal forward reference to class member

    return ch;
  } // getHexChar

  private static char hexTab[] = new char[] { '0', '1', '2', '3',
                                              '4', '5', '6', '7',
                                              '8', '9', 'a', 'b',
                                              'c', 'd', 'e', 'f' };

} // HexStuff

  

Packages 
 
The root compilation unit in Java is the package, either an explicitly named package or an unnamed package (e.g., the file containing the main method). All packages import the default packages which include java.lang.* and any other packages that may be required by the local system. The user may also explicitly import other packages.
 
When package A imports package B, package B provides: 
  • Class and interface definitions that have the public modifier.
  • Sub-packages (e.g., packages that are imported into package B).
If package B imports package X which contains the public class foo, the class foo is referred to via the qualified name X.foo.
 
Packages add yet another level of complexity to the symbol table. A package exists as an object that defines a set of classes, interfaces and sub-packages. Once a package has been read by the compiler, it does not need to be read again when subsequent import statements are encountered, since its definition is already known to the compiler. 
The classes, interfaces and packages defined by a package are "imported" into the global scope of the current package. In the Java source, the type names defined in the imported package are referenced via simple names (JLS 6.5.4) and type names defined in the sub-packages of an imported package are referenced via qualified names. However, in the symbol table all type names have an associated fully qualified name. 
Symbol Table Implementation Overview
  1. Support for multiple definitions for a given identifier.
All symbols that share the same identifier at a particular scope level are contained in a container. As noted above, an identifier may be a class member, a method and a local class definition. There may also be multiple instances of a given kind of definition. For example, in the Java above there are two definitions for the class member TheSame. The container is searchable by identifier type (member, method or class), and it can quickly be determined whether there is more than one definition of a given type (leading to an ambiguous reference). If the object is named, the symbol will have a field that points to the symbol for its parent (e.g., a method or class). For a block this pointer will be null. Note that the parent is not necessarily the parent scope. The symbols defined in the class gin and the interface tonic are in the same scope, but they may have different parents.
  2. Fast global lookup
The global symbol table is implemented by a hash table with a large capacity (the hash table can support a large number of symbols without developing long hash chains).
  3. Package information
Once a package is imported into the global scope, the package is not referenced again. The imported type names (classes and interfaces) are referenced as if they were defined in the current compilation unit (e.g., via simple type names). The sub-packages become objects in the global scope as well. Package type names and additional sub-packages are referenced via qualified names.
Package definitions are kept in a separate package table. Packages are imported into the global scope of the compilation unit from this table. Package information is live for as long as the main compilation unit is being compiled (e.g., throughout the compile process).
  4. Local lookup
In general the number of symbols in a local Java scope is small. Local symbol lookup must be fast, but it need not be as fast as the global lookup, since there will usually be fewer symbols.
I have considered three data structures for implementing the local symbol tables. For small symbol table sizes the search time does not differ much among them. The binary tree has the advantage of being the smallest and simplest algorithm, so it has been chosen for local symbol tables.
  5. Support for Java hierarchical scope
Each symbol table contains a pointer to the symbol table in the next scope up.
  6. Searchable by symbol type
The semantic analysis phase knows the context for the symbol it is searching for (e.g., whether the symbol should be a member, method or class). The symbol table hierarchy is searched by identifier and type.
  7. Quickly determine whether a symbol definition is ambiguous
Multiple symbol definitions for a given type of symbol (e.g., two member definitions) are chained together. If the next pointer is not NULL, there are multiple definitions. The error reporting code can use these definitions to report to the user where the clashing symbols were defined.
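
Several of the points above (lookup by identifier and kind, a pointer to the next scope up, and chaining of like definitions through a next pointer) can be put together in a short sketch. The names SymbolChainDemo, Symbol, SymbolTable, Kind and describe are hypothetical, and java.util.TreeMap stands in for the binary tree; this is one possible reading of the design, not the author's code. The data comes from the gin and tonic example.

```java
import java.util.Map;
import java.util.TreeMap;

class SymbolChainDemo {
    enum Kind { MEMBER, METHOD, TYPE }

    static final class Symbol {
        final String name;
        final Kind kind;
        final String definedIn;   // where the definition came from, for error messages
        Symbol next;              // next definition with the same name and kind, or null

        Symbol(String name, Kind kind, String definedIn) {
            this.name = name;
            this.kind = kind;
            this.definedIn = definedIn;
        }
    }

    static final class SymbolTable {
        final SymbolTable parent;                                    // next scope up, or null
        private final Map<String, Symbol> heads = new TreeMap<>();   // key is "KIND:name"

        SymbolTable(SymbolTable parent) {
            this.parent = parent;
        }

        private static String key(String name, Kind kind) {
            return kind + ":" + name;
        }

        /** Chains a new definition in front of any earlier one with the same name and kind. */
        void define(Symbol s) {
            s.next = heads.put(key(s.name, s.kind), s);
        }

        /** Searches by identifier and kind, from this scope toward the root. */
        Symbol lookup(String name, Kind kind) {
            for (SymbolTable t = this; t != null; t = t.parent) {
                Symbol s = t.heads.get(key(name, kind));
                if (s != null) {
                    return s;
                }
            }
            return null;
        }
    }

    /** Builds a message; a non-null next pointer means the reference is ambiguous. */
    static String describe(Symbol s) {
        if (s == null) {
            return "undefined";
        }
        if (s.next == null) {
            return s.name + " defined in " + s.definedIn;
        }
        StringBuilder msg = new StringBuilder("ambiguous reference to " + s.name + ", defined in");
        for (Symbol d = s; d != null; d = d.next) {
            msg.append(' ').append(d.definedIn);
        }
        return msg.toString();
    }

    public static void main(String[] args) {
        SymbolTable classScope = new SymbolTable(null);
        classScope.define(new Symbol("TheSame", Kind.MEMBER, "tonic"));
        classScope.define(new Symbol("TheSame", Kind.MEMBER, "gin"));
        classScope.define(new Symbol("contextName", Kind.TYPE, "g_and_t"));
        classScope.define(new Symbol("contextName", Kind.METHOD, "g_and_t"));
        classScope.define(new Symbol("contextName", Kind.MEMBER, "g_and_t"));

        SymbolTable methodScope = new SymbolTable(classScope);
        System.out.println(describe(methodScope.lookup("TheSame", Kind.MEMBER)));
        System.out.println(describe(methodScope.lookup("contextName", Kind.METHOD)));
    }
}
```

The three contextName symbols never collide because each has a different kind, while the two TheSame members end up on one chain and are reported as ambiguous.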
 
Symbol Table Construction
 
All class member declarations are processed and entered into the symbol table before methods are processed. This allows references to class members within a method to be properly resolved.
Declarations in a method are processed sequentially. If a name referenced in a method has not been "seen", an error will be reported (e.g., Undefined name).
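
The two-step ordering can be illustrated with a deliberately simplified model in which a method body is just a sequence of "declare a local" and "use a name" steps. Everything here (TwoPassDemo, Stmt, check, and the error text) is hypothetical and chosen for the example; the inputs mirror the HexStuff and geomancy examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoPassDemo {
    /** A simplified statement: it either declares a local or uses a name. */
    static final class Stmt {
        final boolean isDecl;
        final String name;

        private Stmt(boolean isDecl, String name) {
            this.isDecl = isDecl;
            this.name = name;
        }

        static Stmt decl(String name) { return new Stmt(true, name); }
        static Stmt use(String name) { return new Stmt(false, name); }
    }

    /**
     * Walks a method body in order. A use is legal if the name is a class member
     * (all members were entered beforehand) or a local that has already been seen.
     */
    static List<String> check(Set<String> classMembers, List<Stmt> body) {
        List<String> errors = new ArrayList<>();
        Set<String> seenLocals = new HashSet<>();
        for (Stmt s : body) {
            if (s.isDecl) {
                seenLocals.add(s.name);
            } else if (!seenLocals.contains(s.name) && !classMembers.contains(s.name)) {
                errors.add("Undefined name: " + s.name);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Step 1: class members are entered before any method body is examined.
        Set<String> members = new HashSet<>(Arrays.asList("hexTab"));

        // Step 2: getHexChar uses hexTab, which is declared later in the class.
        System.out.println(check(members, Arrays.asList(Stmt.use("hexTab"))));   // []

        // circleArea uses pie before its local declaration.
        System.out.println(check(members, Arrays.asList(
                Stmt.decl("area"), Stmt.use("pie"), Stmt.decl("pie"), Stmt.use("area"))));
        // [Undefined name: pie]
    }
}
```

The member reference succeeds regardless of textual order, while the local reference fails because locals only become visible as the walk reaches their declarations.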

Recursive Compilation and the Symbol Table

When a compilation unit (a package) is compiled, type and package information for all of the packages and classes that it references must be available. The Java Language Specification does not define exactly how this happens. The JLS states that compiled Java code may be stored in a database or in a directory hierarchy that mirrors the qualified names for imported packages and classes. Classes and packages must be accessible. The Java Virtual Machine Specification defines the information in a Java .class file, but it is silent on the issue of compile ordering. Although there is no specification for how Java should be compiled, there is "common practice". At least in the case of this design, "common practice" is based on Sun's javac compiler and Microsoft's Visual J++ compiler jvc.
When a compilation unit is compiled, all information about external classes referenced in the compilation unit is contained in .class files which are produced by compiling the associated Java code (usually stored in .java files). Class files may be packaged in .jar files, which are compressed archives of .class file hierarchies in zip file format. The .class or .jar files are located in reference to either the local directory or the CLASSPATH environment variable. For this scheme to work, file names must correspond to the associated type name (e.g., class FooBar is implemented by FooBar.java).
If, when searching for a type definition, the Java compiler finds only a .java file defining the type or the .java file has a newer time stamp (usually file date and time) than the associated Java .class file, the Java compiler will recompile the type definition.
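
The rule in the preceding paragraph reduces to a small decision function. The sketch below is an assumption about how such a check might be written, not a description of what javac or jvc actually do: RecompileCheck, isStale and needsRecompile are hypothetical names, and only standard java.nio.file calls are used to read the time stamps.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class RecompileCheck {
    /** The decision rule on its own, so it can be exercised without touching the disk. */
    static boolean isStale(boolean sourceExists, boolean classExists,
                           long sourceMillis, long classMillis) {
        if (!sourceExists) {
            return false;                    // nothing to recompile from
        }
        if (!classExists) {
            return true;                     // only the .java file was found
        }
        return sourceMillis > classMillis;   // the source is newer than the class file
    }

    /** Applies the rule to a .java file and its associated .class file. */
    static boolean needsRecompile(Path source, Path classFile) {
        try {
            boolean hasSource = Files.exists(source);
            boolean hasClass = Files.exists(classFile);
            long sourceMillis = hasSource ? Files.getLastModifiedTime(source).toMillis() : 0L;
            long classMillis = hasClass ? Files.getLastModifiedTime(classFile).toMillis() : 0L;
            return isStale(hasSource, hasClass, sourceMillis, classMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The directory to look in is an optional argument; the default is the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Path source = dir.resolve("FooBar.java");
        Path classFile = dir.resolve("FooBar.class");
        System.out.println("recompile FooBar: " + needsRecompile(source, classFile));
    }
}
```

Keeping the rule separate from the file access makes the three cases (no source, no class file, stale class file) easy to see and to test.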
 
While compiling the top level compilation unit, the Java compiler keeps track of package objects (where a package contains lists of types and sub-packages) imported by the compilation unit. Package type definitions that are not public are not kept by the compiler, since they cannot be seen outside the package.