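The javac File System

1. Files

While compiling, the Java compiler has to search for and locate many kinds of files. For example, it searches directories for .java source files and searches *.jar archives for .class files, and it also writes the binary files it generates to disk. The compiler has its own file classes and file-management classes to make these operations convenient.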

Why ct.sym is used is worth studying further: https://blog.csdn.net/blomule/article/details/40866271

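1.1 File-related implementation classes

A package is a directory or archive that holds a set of classes. The relationship between a package and its classes resembles that between a directory or archive and its files. Most of the Java class library is stored in archives such as *.jar; in fact, ct.sym under the lib directory is also an archive.

When javac compiles a class that uses library APIs provided by rt.jar in the JDK, it resolves them against ct.sym. This keeps developers from using certain internal APIs, so that client code does not stop working when the Java developers change those internal interfaces.

The files Java deals with most are those ending in .class and .java. Each of these files is represented by an object and needs dedicated management. The inheritance hierarchy of the file objects is as follows:

RegularFileObject represents the ordinary Java files we write ourselves, SymbolFileObject represents an entry in the ct.sym archive, and ZipFileObject represents an entry in any jar other than rt.jar. There is also ZipFileIndexFileObject, which shows that a fast index can be built for other ZipFile archives as well; for reasons of space it is not covered here.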

1.2 The ZipArchive class

ZipFileObject is a nested class of ZipArchive. Every jar other than rt.jar is represented by a ZipArchive object, and the class defines two important fields:

/**
 * The index for the contents of this archive.
 */
// maps each relative directory to the names of the files it contains
protected final Map<RelativeDirectory,List<String>> map;
/**
 * The zip file for the archive.
 */
public final ZipFile zfile;

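The constructor normally calls initMap(), which reads the entries of the ZipFile and fills map. The map records, for each RelativeDirectory (a relative directory path), the names of the files it contains.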

protected void initMap() throws IOException {
    for (Enumeration<? extends ZipEntry> e = zfile.entries(); e.hasMoreElements(); ) {
        ZipEntry entry;
        try {
            entry = e.nextElement();
        } catch (InternalError ex) {
            IOException io = new IOException();
            io.initCause(ex); // convenience constructors added in Mustang :-(
            throw io;
        }
        addZipEntry(entry);
    }
}

public void addZipEntry(ZipEntry entry) {
    String name = entry.getName();
    int i = name.lastIndexOf('/');
    String n = name.substring(0, i+1);
    RelativeDirectory dirname = new RelativeDirectory(n);
    String basename = name.substring(i+1);
    if (basename.length() == 0)
        return;
    List<String> list = map.get(dirname);
    if (list == null)
        list = List.nil();
    list = list.prepend(basename);
    map.put(dirname, list);
}
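
The indexing idea in addZipEntry() can be tried outside javac. The following is a minimal sketch, assuming plain strings in place of RelativeDirectory and java.util collections in place of javac's own List; ZipIndexSketch and index() are hypothetical names, not javac APIs. Unlike javac, which prepends each name, the sketch appends, so names keep their archive order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ZipIndexSketch {
    /** Groups entry names by their directory part, skipping pure directory entries. */
    static Map<String, List<String>> index(Iterable<String> entryNames) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String name : entryNames) {
            int i = name.lastIndexOf('/');
            String dir = name.substring(0, i + 1);   // "" for top-level entries
            String base = name.substring(i + 1);
            if (base.isEmpty()) {
                continue;                            // a directory entry such as "java/util/"
            }
            map.computeIfAbsent(dir, k -> new ArrayList<>()).add(base);
        }
        return map;
    }

    public static void main(String[] args) {
        // Entry names as they might be returned by ZipEntry.getName().
        System.out.println(index(Arrays.asList(
                "java/util/", "java/util/List.class", "java/util/Map.class", "Top.class")));
    }
}
```

With a real archive, the entry names would come from ZipFile.entries() and ZipEntry.getName(), as in initMap() above.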

1.3 The SymbolArchive class

SymbolArchive represents the ct.sym file and extends ZipArchive. Its constructor also calls initMap(), but the class overrides addZipEntry() as follows:

@Override
public void addZipEntry(ZipEntry entry) {
    String name = entry.getName();
    if (!name.startsWith(prefix.path)) {
        return;
    }
    name = name.substring(prefix.path.length());
    int i = name.lastIndexOf('/');
    RelativeDirectory dirname = new RelativeDirectory(name.substring(0, i+1));
    String basename = name.substring(i + 1);
    if (basename.length() == 0) {
        return;
    }
    List<String> list = map.get(dirname);
    if (list == null)
        list = List.nil();
    list = list.prepend(basename);
    map.put(dirname, list);
}

Its main effect is to ignore every entry whose relative path does not start with META-INF/sym/rt.jar/.
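
The prefix filtering can be isolated in a few lines. This is only a sketch under the assumption that the prefix is passed in as a plain string; SymbolPrefixSketch and stripPrefix() are hypothetical names.

```java
final class SymbolPrefixSketch {
    /** Returns the name relative to prefix, or null when the entry lies outside the prefix. */
    static String stripPrefix(String entryName, String prefix) {
        if (!entryName.startsWith(prefix)) {
            return null;                              // entry is ignored, as in addZipEntry()
        }
        return entryName.substring(prefix.length());
    }

    public static void main(String[] args) {
        String prefix = "META-INF/sym/rt.jar/";
        System.out.println(stripPrefix("META-INF/sym/rt.jar/java/util/List.class", prefix));
        System.out.println(stripPrefix("META-INF/MANIFEST.MF", prefix));
    }
}
```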

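2. File management

The class comment of JavacFileManager reads:

This class provides access to the source, class and other files used by the compiler and related tools.

File management is handled mainly by the JavacFileManager class, whose inheritance relationships are shown in the figure below.

The abstract class BaseFileManager contains shared method implementations that do not refer to concrete file objects or class paths, while the implemented interfaces StandardJavaFileManager and JavaFileManager are the interfaces Java defines specifically for file operations.

(2) The declaration only says that a List class is needed; it does not say in which file format that class exists, such as .class or .java. Which format should be searched for, and which one should be chosen when both .class and .java exist?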

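2.1 Obtaining a JavacFileManager instance

See the explanation and examples of the Context class.

2.2 The Location and Path classes

The main implementation of Location is StandardLocation, an enum that defines several important constants: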

public enum StandardLocation implements Location {

    /**
     * Location of new class files.
     */
    CLASS_OUTPUT,

    /**
     * Location of new source files.
     */
    SOURCE_OUTPUT,

    /**
     * Location to search for user class files.
     */
    CLASS_PATH,

    /**
     * Location to search for existing source files.
     */
    SOURCE_PATH,

    /**
     * Location to search for annotation processors.
     */
    ANNOTATION_PROCESSOR_PATH,

    /**
     * Location to search for platform classes.  Sometimes called
     * the boot class path.
     */
    PLATFORM_CLASS_PATH;
    //...
}

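The Java compiler may output source code or binary class files; where they are written is specified by SOURCE_OUTPUT and CLASS_OUTPUT respectively.

The search paths for .java and .class files fall into four main categories:

(1) PLATFORM_CLASS_PATH

(2) SOURCE_PATH

(3) CLASS_PATH

(4) ANNOTATION_PROCESSOR_PATH

PLATFORM_CLASS_PATH is searched first, for .class files; the search covers the jar files under <java_home>/lib and <java_home>/ext.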

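The SOURCE_PATH and CLASS_PATH categories only matter when -classpath or -sourcepath is specified.

1. When -sourcepath is not specified, both .class and .java files are searched for on the -classpath path.

2. When -sourcepath is specified, only .class files are searched for on the -classpath path; a matching .java file on the -classpath path is not considered.

3. -sourcepath is searched only for .java files, never for .class files. It is therefore best to avoid -sourcepath and use only -classpath to specify where both .class and .java files are found.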
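
The three rules above can be restated as a small decision function. This is a sketch that only encodes the rules as written in this article; SearchKindsSketch and kindsSearched() are hypothetical names and not part of javac.

```java
import java.util.EnumSet;
import java.util.Set;
import javax.tools.JavaFileObject.Kind;
import javax.tools.StandardLocation;

final class SearchKindsSketch {
    /** Which file kinds are looked up in a location, given whether -sourcepath was specified. */
    static Set<Kind> kindsSearched(StandardLocation location, boolean sourcePathGiven) {
        switch (location) {
            case CLASS_PATH:
                // Rules 1 and 2: the class path also serves sources only when no source path exists.
                return sourcePathGiven ? EnumSet.of(Kind.CLASS)
                                       : EnumSet.of(Kind.CLASS, Kind.SOURCE);
            case SOURCE_PATH:
                // Rule 3: the source path is only ever searched for .java files.
                return sourcePathGiven ? EnumSet.of(Kind.SOURCE)
                                       : EnumSet.noneOf(Kind.class);
            default:
                throw new IllegalArgumentException("not covered by these rules: " + location);
        }
    }

    public static void main(String[] args) {
        System.out.println(kindsSearched(StandardLocation.CLASS_PATH, false));
        System.out.println(kindsSearched(StandardLocation.CLASS_PATH, true));
    }
}
```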

The Paths class defines the search paths for the first three categories, as follows:

protected void lazy() {
    if (!inited) {
        warn = lint.isEnabled(Lint.LintCategory.PATH);
        pathsForLocation.put(PLATFORM_CLASS_PATH, computeBootClassPath());
        pathsForLocation.put(CLASS_PATH, computeUserClassPath());
        pathsForLocation.put(SOURCE_PATH, computeSourcePath());
        inited = true;
    }
}

(1) The search path for PLATFORM_CLASS_PATH is obtained by calling computeBootClassPath(), which is implemented as follows:

private Path computeBootClassPath() {
        defaultBootClassPathRtJar = null;
        Path path = new Path(this);

        String bootclasspathOpt = options.get(BOOTCLASSPATH); // -bootclasspath
        String endorseddirsOpt = options.get(ENDORSEDDIRS); // -endorseddirs
        String extdirsOpt = options.get(EXTDIRS); // -extdirs
        String xbootclasspathPrependOpt = options.get(XBOOTCLASSPATH_PREPEND); // -Xbootclasspath/p:
        String xbootclasspathAppendOpt = options.get(XBOOTCLASSPATH_APPEND); // -Xbootclasspath/a:

        path.addFiles(xbootclasspathPrependOpt);

        if (endorseddirsOpt != null) {
            path.addDirectories(endorseddirsOpt);
        }else {
            path.addDirectories(System.getProperty("java.endorsed.dirs"), false);
        }

        if (bootclasspathOpt != null) {
            path.addFiles(bootclasspathOpt);
        } else {
            // Standard system classes for this compiler's release.
            String files = System.getProperty("sun.boot.class.path");
            path.addFiles(files, false);
            File rt_jar = new File("rt.jar");
            for (File file : getPathEntries(files)) {
                if (new File(file.getName()).equals(rt_jar)) {
                    defaultBootClassPathRtJar = file;
                }
            }
        }

        path.addFiles(xbootclasspathAppendOpt);

        // Strictly speaking, standard extensions are not bootstrap
        // classes, but we treat them identically, so we'll pretend
        // that they are.
        if (extdirsOpt != null) {
            path.addDirectories(extdirsOpt);
        }else {
            path.addDirectories(System.getProperty("java.ext.dirs"), false);
        }

        isDefaultBootClassPath =
                (xbootclasspathPrependOpt == null) &&
                (bootclasspathOpt == null) &&
                (xbootclasspathAppendOpt == null);

        return path;
    }

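This method shows clearly how the various path options in javac relate to one another. We usually do not set these paths on the command line, so the defaults are used.

(2) SOURCE_PATH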

private Path computeSourcePath() {
        String sourcePathArg = options.get(SOURCEPATH);
        if (sourcePathArg == null) {
            return null;
        }

        return new Path(this).addFiles(sourcePathArg);
    }

(3) CLASS_PATH

 private Path computeUserClassPath() {
        String cp = options.get(CLASSPATH);

        // CLASSPATH environment variable when run from `javac'.
        if (cp == null) {
            cp = System.getProperty("env.class.path");
        }

        // If invoked via a java VM (not the javac launcher), use the
        // platform class path
        if (cp == null &&
                System.getProperty("application.home") == null) {
            cp = System.getProperty("java.class.path");
        }

        // Default to current working directory.
        if (cp == null) {
            cp = ".";
        }

        return new Path(this)
            .expandJarClassPaths(true)        // Only search user jars for Class-Paths
            .emptyPathDefault(new File("."))  // Empty path elt ==> current directory
            .addFiles(cp);
    }
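
The fallback chain in computeUserClassPath() is easier to see when the inputs are passed in explicitly. The sketch below assumes a Properties object standing in for the system properties; UserClassPathSketch and resolve() are hypothetical names, and the jar expansion done by Path is left out.

```java
import java.util.Properties;

final class UserClassPathSketch {
    /** Resolves the user class path string using the same order of fallbacks. */
    static String resolve(String classpathOption, Properties props) {
        String cp = classpathOption;                         // -classpath, if given
        if (cp == null) {
            cp = props.getProperty("env.class.path");        // CLASSPATH via the javac launcher
        }
        if (cp == null && props.getProperty("application.home") == null) {
            cp = props.getProperty("java.class.path");       // invoked through a plain java VM
        }
        if (cp == null) {
            cp = ".";                                        // default: current working directory
        }
        return cp;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, System.getProperties()));
    }
}
```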

JavacFileManager exposes the following public method:

public Iterable<? extends File> getLocation(Location location) {
        nullCheck(location);
        paths.lazy();
        if (location == CLASS_OUTPUT) {
            return (getClassOutDir() == null ? null : List.of(getClassOutDir()));
        } else if (location == SOURCE_OUTPUT) {
            return (getSourceOutDir() == null ? null : List.of(getSourceOutDir()));
        } else {
            return paths.getPathForLocation(location);
        }
    }

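2.3 The implementation of JavacFileManager

The compiler obtains source files, binary files and other files mainly through JavacFileManager. For example, source code usually declares the classes it depends on with the import keyword: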

package com.test20;

import java.util.List;

class TestHH{
	List<String> l = null;
}

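The import declaration shows that this class depends on the public class List in the java.util package. To find and load this dependency, the compiler looks in the java.util package for a file named List: because List is not in the same package as the current class it must be declared public, and Java requires a public class to have the same name as its file.

An actual implementation has to deal with the following two questions:

(1) java.util is only a package name, that is, a relative path for locating the class. To load a file its absolute path must be known, so how is the absolute path of this class obtained?

JavacFileManager provides an important search API, implemented as follows: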

public Iterable<JavaFileObject> list(Location location,
                                     String packageName,
                                     Set<JavaFileObject.Kind> kinds,
                                     boolean recurse) throws IOException {
    // validatePackageName(packageName);
    nullCheck(packageName);
    nullCheck(kinds);

    Iterable<? extends File> path = getLocation(location);
    if (path == null) {
        return List.nil();
    }
    RelativeDirectory subdirectory = RelativeDirectory.forPackage(packageName);
    ListBuffer<JavaFileObject> results = new ListBuffer<JavaFileObject>();

    for (File directory : path) {
        listContainer(directory, subdirectory, kinds, recurse, results);
    }
    return results.toList();
}
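
The same list() operation is reachable through the public javax.tools API, which makes it easy to experiment with. The sketch below assumes a JDK is available (ToolProvider.getSystemJavaCompiler() returns null on a plain JRE); ListPackageSketch and listSources() are hypothetical names, and the temporary directory layout exists only for the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

final class ListPackageSketch {
    /** Lists the simple file names of the .java files of one package under sourceRoot. */
    static List<String> listSources(File sourceRoot, String packageName) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler; a JDK is required");
        }
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            fm.setLocation(StandardLocation.SOURCE_PATH, Collections.singletonList(sourceRoot));
            List<String> names = new ArrayList<>();
            for (JavaFileObject fo : fm.list(StandardLocation.SOURCE_PATH, packageName,
                                             EnumSet.of(JavaFileObject.Kind.SOURCE), false)) {
                names.add(new File(fo.toUri()).getName());
            }
            return names;
        }
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("src").toFile();
        File pkg = new File(root, "com/test20");
        pkg.mkdirs();
        Files.write(new File(pkg, "TestHH.java").toPath(),
                    "package com.test20; class TestHH {}".getBytes(StandardCharsets.UTF_8));
        System.out.println(listSources(root, "com.test20"));
    }
}
```

The package name is turned into a relative directory and resolved against each root of the chosen location, which is the role RelativeDirectory.forPackage() plays in the javac code above.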

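This method involves several helper classes: Location, JavaFileObject.Kind and RelativeDirectory. Location identifies the category of paths to search and was described in detail in the previous section. RelativeDirectory represents a relative path; its parent class RelativePath has two main subclasses, RelativeFile and RelativeDirectory, as shown in the figure below.

RelativeFile represents the relative path of a file, and RelativeDirectory the relative path of a directory.

The JavaFileObject.Kind enum specifies the file format to search for: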

/**
     * Kinds of JavaFileObjects.
     */
    enum Kind {
        /**
         * Source files written in the Java programming language.  For
         * example, regular files ending with {@code .java}.
         */
        SOURCE(".java"),

        /**
         * Class files for the Java Virtual Machine.  For example,
         * regular files ending with {@code .class}.
         */
        CLASS(".class"),

        /**
         * HTML files.  For example, regular files ending with {@code
         * .html}.
         */
        HTML(".html"),

        // ... (remaining constants and members omitted)
}

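The most common kinds, however, are .java and .class files.

With these pieces we can search the paths of a Location category using the package's relative path (RelativeDirectory) and the requested file formats (JavaFileObject.Kind). As the list() method shows, it calls listContainer(), which is implemented as follows: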

/**
 * container is a directory, a zip file, or a non-existant path.
 * Insert all files in subdirectory subdirectory of container which
 * match fileKinds into resultList
 */
private void listContainer(File container,
                           RelativeDirectory subdirectory,
                           Set<JavaFileObject.Kind> fileKinds,
                           boolean recurse,
                           ListBuffer<JavaFileObject> resultList) {
    // the cached value is always ct.sym, a jar, or a generated index file; directories are never stored
    Archive archive = archives.get(container);
    if (archive == null) {
        // archives are not created for directories.
        // a jar such as resources.jar is not a directory
        if  (fsInfo.isDirectory(container)) {
            listDirectory(container,subdirectory,fileKinds,recurse,resultList);
            return;
        }

        // Not a directory; either a file or non-existant, create the archive
        try {
            // archive is null and the container is not a directory, so the archive may simply not have been opened yet
            archive = openArchive(container);
        } catch (IOException ex) {
            log.error("error.reading.file",container, getMessage(ex));
            return;
        }
    }
    listArchive(archive,subdirectory,fileKinds,recurse,resultList);
}

Both ct.sym and jar files are represented in the compiler as Archive objects.

Let us first look at the implementation of listDirectory():

/**
 * Insert all files in subdirectory subdirectory of directory directory
 * which match fileKinds into resultList
 */
private void listDirectory(File directory,
                           RelativeDirectory subdirectory,
                           Set<JavaFileObject.Kind> fileKinds,
                           boolean recurse,
                           ListBuffer<JavaFileObject> resultList) {

    // the path formed by appending subdirectory to directory
    File d = subdirectory.getFile(directory);
    if (!caseMapCheck(d, subdirectory)) {
        return;
    }

    File[] files = d.listFiles();
    if (files == null) {
        return;
    }

    for (File f: files) {
        String fname = f.getName();
        if (f.isDirectory()) { // a directory
            if (recurse && SourceVersion.isIdentifier(fname)) {
                // directory stays unchanged while recursing; only subdirectory is extended with the child name
                RelativeDirectory subDir = new RelativeDirectory(subdirectory, fname);
                // recursive call
                listDirectory(directory,subDir,fileKinds,recurse,resultList);
            }
        } else { // a file
            if (isValidFile(fname, fileKinds)) {
                File file = new File(d, fname);
                JavaFileObject fe = new RegularFileObject(this, fname,file);
                resultList.append(fe);
            }
        }
    }
}
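
The recursion pattern of listDirectory() can be reproduced with plain java.io.File. This is a simplified sketch: relative directories are plain strings ending in "/", a single file suffix replaces the set of kinds, and the case-sensitivity check is omitted; ListDirectorySketch is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.lang.model.SourceVersion;

final class ListDirectorySketch {
    /** Collects relative paths of files under root/relDir whose names end with suffix. */
    static void listDirectory(File root, String relDir, String suffix,
                              boolean recurse, List<String> out) {
        File d = relDir.isEmpty() ? root : new File(root, relDir);
        File[] files = d.listFiles();
        if (files == null) {
            return;                                   // not a directory, or not readable
        }
        for (File f : files) {
            String name = f.getName();
            if (f.isDirectory()) {
                // Only descend into directories that could be package name segments.
                if (recurse && SourceVersion.isIdentifier(name)) {
                    listDirectory(root, relDir + name + "/", suffix, recurse, out);
                }
            } else if (name.endsWith(suffix)) {
                out.add(relDir + name);
            }
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        listDirectory(new File("."), "", ".java", true, out);
        System.out.println(out);
    }
}
```

As in javac, the root stays fixed during recursion and only the relative directory grows, so every result can be reported relative to the search root.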

Next, look at the source of openArchive(), which listContainer() calls. The implementation is somewhat complex.

openArchive() first gives ct.sym special treatment:

if (!ignoreSymbolFile &&  // the symbol file is not being ignored
            paths.isDefaultBootClassPathRtJar(zipFileName) // zipFileName is rt.jar
     ){
        File file = zipFileName.getParentFile().getParentFile(); // ${java.home}
        // C:\Program Files\Java\jdk1.7.0_79\jre\lib\rt.jar
        if (new File(file.getName()).equals(new File("jre"))) {
            file = file.getParentFile(); // C:\Program Files\Java\jdk1.7.0_79
        }
        // file == ${jdk.home}
        // C:\Program Files\Java\jdk1.7.0_79\lib  =>  C:\Program Files\Java\jdk1.7.0_79\lib\ct.sym
        for (String name : symbolFileLocation) {
            file = new File(file, name);
        }
        // file == ${jdk.home}/lib/ct.sym
        if (file.exists()) {
            zipFileName = file; // the resulting zipFileName is C:\Program Files\Java\jdk1.7.0_79\lib\ct.sym
        }
    }

Although the code looks complicated, it simply derives the absolute path of ct.sym from the absolute path of rt.jar. On my machine rt.jar is at C:\Program Files\Java\jdk1.7.0_79\jre\lib\rt.jar, so zipFileName ends up as C:\Program Files\Java\jdk1.7.0_79\lib\ct.sym.
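
The path derivation can be written as a small pure function. This sketch assumes that rt.jar has at least two parent directories and that the symbol file location is lib/ct.sym, as in the comments of the excerpt; it returns the candidate path without checking that the file exists. SymbolFileSketch and symbolFileFor() are hypothetical names.

```java
import java.io.File;

final class SymbolFileSketch {
    /** Derives the expected ct.sym location from the location of rt.jar. */
    static File symbolFileFor(File rtJar) {
        File home = rtJar.getParentFile().getParentFile();   // .../jre, or the JDK home itself
        if (home.getName().equals("jre")) {
            home = home.getParentFile();                      // step up from jre to the JDK home
        }
        return new File(new File(home, "lib"), "ct.sym");
    }

    public static void main(String[] args) {
        System.out.println(symbolFileFor(new File("/opt/jdk1.7.0_79/jre/lib/rt.jar")));
    }
}
```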

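The ZipFile object is then wrapped in a ZipArchive or a SymbolArchive. As explained earlier, the constructors of ZipArchive and SymbolArchive initialize the map field and fill it.

Next, the mapping from File to Archive is stored in a map held by JavacFileManager: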

/** A directory of zip files already opened.
*/
Map<File, Archive> archives = new HashMap<File,Archive>();

Back in listContainer(), the last call is:

listArchive(archive,subdirectory,fileKinds,recurse,resultList);

The source of listArchive() is as follows:

/**
 * Insert all files in subdirectory subdirectory of archive archive
 * which match fileKinds into resultList
 */
private void listArchive(Archive archive,
                           RelativeDirectory subdirectory,
                           Set<JavaFileObject.Kind> fileKinds,
                           boolean recurse,
                           ListBuffer<JavaFileObject> resultList) {
    // Get the files directly in the subdir
    // get the names of the files directly in this subdirectory of the archive
    List<String> files = archive.getFiles(subdirectory);
    if (files != null) {
        for (; !files.isEmpty(); files = files.tail) {
            String file = files.head;
            if (isValidFile(file, fileKinds)) {
                JavaFileObject jfo = archive.getFileObject(subdirectory, file);
                resultList.append(jfo);
            }
        }
    }
    if (recurse) {
        // iterate over all directories in the archive
        for (RelativeDirectory s: archive.getSubdirectories()) {
            if (subdirectory.contains(s)) {
                // Because the archive map is a flat list of directories,
                // the enclosing loop will pick up all child subdirectories.
                // Therefore, there is no need to recurse deeper.
                listArchive(archive, s, fileKinds, false, resultList); // recursive call, with recurse disabled
            }
        }
    }
}
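
Because the archive index is a flat map keyed by directory, recursive listing reduces to a prefix test on the keys. The sketch below assumes an index of the form directory name (ending in "/") to file names, with a single suffix instead of a set of kinds; it folds the two steps of listArchive() into one loop. ListArchiveSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ListArchiveSketch {
    /** Lists indexed files in subdir, and in every nested directory when recurse is true. */
    static List<String> list(Map<String, List<String>> index, String subdir,
                             String suffix, boolean recurse) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : index.entrySet()) {
            String dir = e.getKey();
            // Keys end in "/", so "java/util/" does not match a sibling like "java/utility/".
            boolean selected = recurse ? dir.startsWith(subdir) : dir.equals(subdir);
            if (!selected) {
                continue;
            }
            for (String file : e.getValue()) {
                if (file.endsWith(suffix)) {
                    out.add(dir + file);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        index.put("java/util/", Arrays.asList("List.class"));
        index.put("java/util/concurrent/", Arrays.asList("Future.class"));
        System.out.println(list(index, "java/util/", ".class", true));
    }
}
```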

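3. A worked example

To analyze a source file, the compiler first has to locate it through its path and then obtain its characters. JavaCompiler contains the following call: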

/**
 * Parse contents of file.
 * @param filename The name of the file to be parsed.
 */
public JCCompilationUnit parse(JavaFileObject filename) {
    JavaFileObject prev = log.useSource(filename);
    try {
        CharSequence content = readSource(filename);
        JCCompilationUnit t = parse(filename, content);
        if (t.endPositions != null) {
            log.setEndPosTable(filename, t.endPositions);
        }
        return t;
    } finally {
        log.useSource(prev);
    }
}

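readSource() obtains the character content, which is then passed to the next phase, lexical analysis. An ordinary Java source file is normally wrapped in a RegularFileObject, and readSource() calls the file object's getCharContent() method, whose source is: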

@Override
public CharBuffer getCharContent(boolean ignoreEncodingErrors) throws IOException {
    CharBuffer cb = fileManager.getCachedContent(this);
    if (cb == null) {
        InputStream in = new FileInputStream(file);
        try {
            ByteBuffer bb = fileManager.makeByteBuffer(in);
            JavaFileObject prev = fileManager.log.useSource(this);
            try {
                cb = fileManager.decode(bb, ignoreEncodingErrors);
            } finally {
                fileManager.log.useSource(prev);
            }
            fileManager.recycleByteBuffer(bb);
            if (!ignoreEncodingErrors) {
                fileManager.cache(this, cb);
            }
         } finally {
            in.close();
        }
    }
    return cb;
}

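The method first tries to get the character content of the current file object from the JavacFileManager cache, which is simply a map from file objects to character buffers: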

protected final Map<JavaFileObject, ContentCacheEntry> contentCache = new HashMap<JavaFileObject, ContentCacheEntry>();

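ContentCacheEntry is a private static nested class defined in BaseFileManager, and it holds its buffer through a soft reference. Consequently fileManager.getCachedContent(this) may return null both on first load and after memory has run low.

If the result is null, execution enters the if statement, which is normally the case on first access. After an InputStream is created from the File, fileManager's makeByteBuffer() method is called to read the file's content into a buffer.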
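
A cache of this kind can be sketched with java.lang.ref.SoftReference. The version below is a simplification under stated assumptions: keys are plain strings rather than JavaFileObject instances, and no timestamp is recorded; SoftContentCacheSketch is a hypothetical name.

```java
import java.lang.ref.SoftReference;
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;

final class SoftContentCacheSketch {
    private final Map<String, SoftReference<CharBuffer>> cache = new HashMap<>();

    /** Stores content so that the garbage collector may still reclaim it under memory pressure. */
    void put(String key, CharBuffer content) {
        cache.put(key, new SoftReference<>(content));
    }

    /** Returns the cached content, or null if it was never stored or has been reclaimed. */
    CharBuffer get(String key) {
        SoftReference<CharBuffer> ref = cache.get(key);
        if (ref == null) {
            return null;                      // first access: nothing cached yet
        }
        CharBuffer cb = ref.get();
        if (cb == null) {
            cache.remove(key);                // referent was collected; drop the stale entry
        }
        return cb;
    }

    public static void main(String[] args) {
        SoftContentCacheSketch c = new SoftContentCacheSketch();
        System.out.println(c.get("A.java") == null);   // nothing stored yet
        c.put("A.java", CharBuffer.wrap("class A {}"));
        System.out.println(c.get("A.java"));
    }
}
```

Callers must always handle a null result by reading the file again, which is what the if branch of getCharContent() does.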

/**
 * Make a byte buffer from an input stream.
 */
public ByteBuffer makeByteBuffer(InputStream in) throws IOException {
    int limit = in.available();
    if (limit < 1024) {
        limit = 1024;
    }
    ByteBuffer result = byteBufferCache.get(limit); // result is a java.nio.HeapByteBuffer
    int position = 0;
    while (in.available() != 0) {
        if (position >= limit) {
            // expand buffer
            result = ByteBuffer.allocate(limit <<= 1).put((ByteBuffer) result.flip());
        }
        int count = in.read(result.array(),position,limit - position);
        if (count < 0) {
            break;
        }
        result.position(position += count);
    }
    return (ByteBuffer)result.flip();
}
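
The grow-by-doubling read loop can be written without relying on available(). This is an alternative sketch, not the javac implementation: it reads until end of stream and doubles the heap buffer whenever it is full; ReadFullySketch and readFully() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class ReadFullySketch {
    /** Reads the whole stream into a heap ByteBuffer, doubling its capacity as needed. */
    static ByteBuffer readFully(InputStream in, int initialCapacity) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Math.max(initialCapacity, 1));
        while (true) {
            if (!buf.hasRemaining()) {
                // Buffer is full: copy what has been read so far into one twice as large.
                ByteBuffer bigger = ByteBuffer.allocate(buf.capacity() * 2);
                buf.flip();
                bigger.put(buf);
                buf = bigger;
            }
            int n = in.read(buf.array(), buf.position(), buf.remaining());
            if (n < 0) {
                break;                                    // end of stream
            }
            buf.position(buf.position() + n);
        }
        buf.flip();                                       // make the content readable from position 0
        return buf;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer bb = readFully(new ByteArrayInputStream(new byte[100]), 16);
        System.out.println(bb.remaining());
    }
}
```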

The byteBufferCache used by makeByteBuffer() is implemented as follows:

/**
 * A single-element cache of direct byte buffers.
 */
private static class ByteBufferCache {
    private ByteBuffer cached;
    ByteBuffer get(int capacity) {
        if (capacity < 20480) {
            capacity = 20480;
        }
        ByteBuffer result;
        if (cached != null && cached.capacity() >= capacity){
            result = (ByteBuffer)cached.clear();
        }else{
            result = ByteBuffer.allocate(capacity + capacity>>1);
        }
        cached = null;
        return result;
    }
    void put(ByteBuffer x) {
        cached = x;
    }
}

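This defines a byte buffer cache class in which the cached field holds the actual buffer. As the get() method shows, a capacity of at least 20480 bytes is always requested.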