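Hadoop Source Code Reading: HDFS, Day 2

Yesterday I got as far as AbstractFileSystem and learned that applications access files through the FileContext class. Today I'll read the source of that class, starting with its very long class-level comment.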

 /**
  * The FileContext class provides an interface to the application writer for
  * using the Hadoop file system.
  * It provides a set of methods for the usual operation: create, open,
  * list, etc
  *
  * <p>
  * <b> *** Path Names *** </b>
  * <p>
  *
  * The Hadoop file system supports a URI name space and URI names.
  * It offers a forest of file systems that can be referenced using fully
  * qualified URIs.
  * Two common Hadoop file systems implementations are
  * <ul>
  * <li> the local file system: file:///path
  * <li> the hdfs file system hdfs://nnAddress:nnPort/path
  * </ul>
  *
  * While URI names are very flexible, it requires knowing the name or address
  * of the server. For convenience one often wants to access the default system
  * in one's environment without knowing its name/address. This has an
  * additional benefit that it allows one to change one's default fs
  *  (e.g. admin moves application from cluster1 to cluster2).
  * <p>
  *
  * To facilitate this, Hadoop supports a notion of a default file system.
  * The user can set his default file system, although this is
  * typically set up for you in your environment via your default config.
  * A default file system implies a default scheme and authority; slash-relative
  * names (such as /for/bar) are resolved relative to that default FS.
  * Similarly a user can also have working-directory-relative names (i.e. names
  * not starting with a slash). While the working directory is generally in the
  * same default FS, the wd can be in a different FS.
  * <p>
  *  Hence Hadoop path names can be one of:
  *  <ul>
  *  <li> fully qualified URI: scheme://authority/path
  *  <li> slash relative names: /path relative to the default file system
  *  <li> wd-relative names: path  relative to the working dir
  *  </ul>
  *  Relative paths with scheme (scheme:foo/bar) are illegal.
  *
  *  <p>
  *  <b>****The Role of the FileContext and configuration defaults****</b>
  *  <p>
  *  The FileContext provides file namespace context for resolving file names;
  *  it also contains the umask for permissions, In that sense it is like the
  *  per-process file-related state in Unix system.
  *  These two properties
  *  <ul>
  *  <li> default file system i.e your slash)
  *  <li> umask
  *  </ul>
  *  in general, are obtained from the default configuration file
  *  in your environment,  (@see {@link Configuration}).
  *
  *  No other configuration parameters are obtained from the default config as
  *  far as the file context layer is concerned. All file system instances
  *  (i.e. deployments of file systems) have default properties; we call these
  *  server side (SS) defaults. Operation like create allow one to select many
  *  properties: either pass them in as explicit parameters or use
  *  the SS properties.
  *  <p>
  *  The file system related SS defaults are
  *  <ul>
  *  <li> the home directory (default is "/user/userName")
  *  <li> the initial wd (only for local fs)
  *  <li> replication factor
  *  <li> block size
  *  <li> buffer size
  *  <li> encryptDataTransfer
  *  <li> checksum option. (checksumType and  bytesPerChecksum)
  *  </ul>
  *
  * <p>
  * <b> *** Usage Model for the FileContext class *** </b>
  * <p>
  * Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
  *   Unspecified values come from core-defaults.xml in the release jar.
  *  <ul>
  *  <li> myFContext = FileContext.getFileContext(); // uses the default config
  *                                                // which has your default FS
  *  <li>  myFContext.create(path, ...);
  *  <li>  myFContext.setWorkingDir(path)
  *  <li>  myFContext.open (path, ...);
  *  </ul>
  * Example 2: Get a FileContext with a specific URI as the default FS
  *  <ul>
  *  <li> myFContext = FileContext.getFileContext(URI)
  *  <li> myFContext.create(path, ...);
  *   ...
  * </ul>
  * Example 3: FileContext with local file system as the default
  *  <ul>
  *  <li> myFContext = FileContext.getLocalFSFileContext()
  *  <li> myFContext.create(path, ...);
  *  <li> ...
  *  </ul>
  * Example 4: Use a specific config, ignoring $HADOOP_CONFIG
  *  Generally you should not need use a config unless you are doing
  *   <ul>
  *   <li> configX = someConfigSomeOnePassedToYou.
  *   <li> myFContext = getFileContext(configX); // configX is not changed,
  *                                              // is passed down
  *   <li> myFContext.create(path, ...);
  *   <li>...
  *  </ul>
  *
  */
 @InterfaceAudience.Public
 @InterfaceStability.Evolving /*Evolving for a release,to be changed to Stable */
 public class FileContext {

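The FileContext class gives application writers an interface to the Hadoop file system and provides the usual operations: create, open, list, and so on.

Two common Hadoop file system implementations are:

the local file system: file:///path
the hdfs file system: hdfs://nnAddress:nnPort/path

URI names are very flexible, but they require knowing the name or address of the server. Hadoop therefore lets you address a default file system without knowing its name or address, which has the added benefit that the default FS can be changed (for example, when an admin moves an application from cluster1 to cluster2).

Hadoop supports the notion of a default file system. Users can set their own default file system, although it is typically set up in the environment through the default config.

A default file system implies a default scheme and authority; slash-relative names (such as /for/bar) are resolved relative to the default FS.

Similarly, a user can have working-directory-relative names (names that do not start with a slash). The working directory is usually in the default FS, but it can be in a different one.

Hence a Hadoop path name can be one of:

fully qualified URI: scheme://authority/path

slash-relative name: /path, relative to the default file system

wd-relative name: path, relative to the working directory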
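
To make the three kinds of names concrete, here is a minimal sketch that uses only java.net.URI from the JDK. It is not Hadoop's Path or FileContext code; the class PathNameSketch, its methods classify and qualify, and the nn1:8020 addresses are all hypothetical names chosen for illustration.

```java
import java.net.URI;

/** Illustrative only: NOT Hadoop's Path or FileContext implementation. */
class PathNameSketch {

  /** Classifies a path name into one of the three legal kinds. */
  static String classify(String name) {
    URI u = URI.create(name);
    if (u.getScheme() != null) {
      // "scheme:foo/bar" parses as an opaque URI, i.e. a relative path
      // with a scheme, which the Javadoc declares illegal.
      if (u.isOpaque()) {
        throw new IllegalArgumentException("relative path with scheme: " + name);
      }
      return "fully-qualified";
    }
    return name.startsWith("/") ? "slash-relative" : "wd-relative";
  }

  /**
   * Resolves a name against a default FS root and a working directory.
   * Both base URIs are assumed to end with "/" so relative resolution appends.
   */
  static URI qualify(String name, URI defaultFs, URI workingDir) {
    switch (classify(name)) {
      case "fully-qualified": return URI.create(name);
      case "slash-relative":  return defaultFs.resolve(name);
      default:                return workingDir.resolve(name);
    }
  }

  public static void main(String[] args) {
    URI defaultFs = URI.create("hdfs://nn1:8020/");             // hypothetical default FS
    URI workingDir = URI.create("hdfs://nn1:8020/user/alice/"); // hypothetical working dir
    for (String name : new String[] {"file:///tmp/x", "/for/bar", "logs/app.log"}) {
      System.out.println(name + " -> " + qualify(name, defaultFs, workingDir));
    }
  }
}
```

Because only the default FS and working directory change between environments, the same slash-relative or wd-relative name keeps working when the application moves to another cluster.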

 private FileContext(final AbstractFileSystem defFs,
     final FsPermission theUmask, final Configuration aConf) {
     defaultFS = defFs;
     umask = FsPermission.getUMask(aConf);
     conf = aConf;
     try {
       ugi = UserGroupInformation.getCurrentUser();
     } catch (IOException e) {
       LOG.error("Exception in getCurrentUser: ",e);
       throw new RuntimeException("Failed to get the current user " +
               "while creating a FileContext", e);
     }
     /*
      * Init the wd.
      * WorkingDir is implemented at the FileContext layer
      * NOT at the AbstractFileSystem layer.
      * If the DefaultFS, such as localFilesystem has a notion of
      *  builtin WD, we use that as the initial WD.
      *  Otherwise the WD is initialized to the home directory.
      */
     workingDir = defaultFS.getInitialWorkingDirectory();
     if (workingDir == null) {
       workingDir = defaultFS.getHomeDirectory();
     }
     resolveSymlinks = conf.getBoolean(
         CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
         CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
     util = new Util(); // for the inner class
   }

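The FileContext constructor takes three parameters:

defFs: the default FS of this FileContext
theUmask: apparently never used (a historical leftover?); the umask field is initialized from FsPermission.getUMask(aConf) instead
aConf: the configuration

Next, let's look at the common methods the comment mentions, starting with create, which again sits under a long block of Javadoc.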

 /**
    * Create or overwrite file on indicated path and returns an output stream for
    * writing into the file.
    *
    * @param f the file name to open
    * @param createFlag gives the semantics of create; see {@link CreateFlag}
    * @param opts file creation options; see {@link Options.CreateOpts}.
    *          <ul>
    *          <li>Progress - to report progress on the operation - default null
    *          <li>Permission - umask is applied against permisssion: default is
    *          FsPermissions:getDefault()
    *
    *          <li>CreateParent - create missing parent path; default is to not
    *          to create parents
    *          <li>The defaults for the following are SS defaults of the file
    *          server implementing the target path. Not all parameters make sense
    *          for all kinds of file system - eg. localFS ignores Blocksize,
    *          replication, checksum
    *          <ul>
    *          <li>BufferSize - buffersize used in FSDataOutputStream
    *          <li>Blocksize - block size for file blocks
    *          <li>ReplicationFactor - replication for blocks
    *          <li>ChecksumParam - Checksum parameters. server default is used
    *          if not specified.
    *          </ul>
    *          </ul>
    *
    * @return {@link FSDataOutputStream} for created file
    *
    * @throws AccessControlException If access is denied
    * @throws FileAlreadyExistsException If file <code>f</code> already exists
    * @throws FileNotFoundException If parent of <code>f</code> does not exist
    *           and <code>createParent</code> is false
    * @throws ParentNotDirectoryException If parent of <code>f</code> is not a
    *           directory.
    * @throws UnsupportedFileSystemException If file system for <code>f</code> is
    *           not supported
    * @throws IOException If an I/O error occurred
    *
    * Exceptions applicable to file systems accessed over RPC:
    * @throws RpcClientException If an exception occurred in the RPC client
    * @throws RpcServerException If an exception occurred in the RPC server
    * @throws UnexpectedServerException If server implementation throws
    *           undeclared exception to RPC server
    *
    * RuntimeExceptions:
    * @throws InvalidPathException If path <code>f</code> is not valid
    */
 public FSDataOutputStream create(final Path f,
       final EnumSet<CreateFlag> createFlag, Options.CreateOpts... opts)
       throws AccessControlException, FileAlreadyExistsException,
       FileNotFoundException, ParentNotDirectoryException,
       UnsupportedFileSystemException, IOException {
     Path absF = fixRelativePart(f);
     // If one of the options is a permission, extract it & apply umask
     // If not, add a default Perms and apply umask;
     // AbstractFileSystem#create
     CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
     FsPermission permission = (permOpt != null) ? permOpt.getValue() :
                                       FILE_DEFAULT_PERM;
     permission = permission.applyUMask(umask);
     final CreateOpts[] updatedOpts =
                       CreateOpts.setOpt(CreateOpts.perms(permission), opts);
     return new FSLinkResolver<FSDataOutputStream>() {
       @Override
       public FSDataOutputStream next(final AbstractFileSystem fs, final Path p)
         throws IOException {
         return fs.create(p, createFlag, updatedOpts);
       }
     }.resolve(this, absF);
   }

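The create method creates or overwrites a file at the indicated path and returns an output stream for writing to it. Before delegating, it makes the path absolute, picks the permission (the caller's Perms option if present, otherwise FILE_DEFAULT_PERM), and applies the umask.

The FSLinkResolver instantiated in the final return statement handles the case where the path contains a symbolic link.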
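
The permission step in create can be sketched with plain octal integers. This is a simplified stand-in for what permission.applyUMask(umask) does, not Hadoop's FsPermission code; the class UmaskSketch is hypothetical, and the values 0666 (file default) and 022 (umask) are assumptions chosen for illustration.

```java
/** Illustrative only: models permissions as 9-bit octal integers. */
class UmaskSketch {

  /** Clears every permission bit that is set in the umask: perm & ~umask. */
  static int applyUMask(int permission, int umask) {
    return permission & ~umask & 0777;
  }

  public static void main(String[] args) {
    int fileDefault = 0666; // assumed default permission for new files
    int umask = 0022;       // assumed umask taken from the configuration
    // 0666 & ~0022 = 0644, so this prints 644
    System.out.println(Integer.toOctalString(applyUMask(fileDefault, umask)));
  }
}
```

The umask can only remove bits, so a caller who passes a stricter permission than the default (for example 0600) gets it back unchanged.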

 /**
    * Generic helper function overridden on instantiation to perform a
    * specific operation on the given file system using the given path
    * which may result in an UnresolvedLinkException.
    * @param fs AbstractFileSystem to perform the operation on.
    * @param p Path given the file system.
    * @return Generic type determined by the specific implementation.
    * @throws UnresolvedLinkException If symbolic link <code>path</code> could
    *           not be resolved
    * @throws IOException an I/O error occurred
    */
   abstract public T next(final AbstractFileSystem fs, final Path p)
       throws IOException, UnresolvedLinkException;

 /**
    * Performs the operation specified by the next function, calling it
    * repeatedly until all symlinks in the given path are resolved.
    * @param fc FileContext used to access file systems.
    * @param path The path to resolve symlinks on.
    * @return Generic type determined by the implementation of next.
    * @throws IOException
    */
   public T resolve(final FileContext fc, final Path path) throws IOException {
     int count = 0;
     T in = null;
     Path p = path;
     // NB: More than one AbstractFileSystem can match a scheme, eg
     // "file" resolves to LocalFs but could have come by RawLocalFs.
     AbstractFileSystem fs = fc.getFSofPath(p);
     // Loop until all symlinks are resolved or the limit is reached
     for (boolean isLink = true; isLink;) {
       try {
         in = next(fs, p);
         isLink = false;
       } catch (UnresolvedLinkException e) {
         if (!fc.resolveSymlinks) {
           throw new IOException("Path " + path + " contains a symlink"
               + " and symlink resolution is disabled ("
               + CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY + ").", e);
         }
         if (!FileSystem.areSymlinksEnabled()) {
           throw new IOException("Symlink resolution is disabled in"
               + " this version of Hadoop.");
         }
         if (count++ > FsConstants.MAX_PATH_LINKS) {
           throw new IOException("Possible cyclic loop while " +
                                 "following symbolic link " + path);
         }
         // Resolve the first unresolved path component
         p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
         fs = fc.getFSofPath(p);
       }
     }
     return in;
   }

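next is a generic helper function that is overridden on instantiation to perform a specific operation on the given file system using the given path; it may throw UnresolvedLinkException.

resolve performs the operation specified by next, calling it repeatedly until every symbolic link in the path has been resolved. It gives up with an IOException if symlink resolution is disabled or if more than FsConstants.MAX_PATH_LINKS links have been followed, which suggests a cycle.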
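
The retry loop in resolve can be modeled without Hadoop at all. The sketch below is a simplified analogue under stated assumptions, not the FSLinkResolver implementation: symlinks are a Map from link path to target, the nested UnresolvedLink exception and Operation interface stand in for UnresolvedLinkException and next, and the limit of 32 is a value picked for this sketch. All of these names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;

/** Illustrative only: a toy analogue of the next/resolve pattern. */
class LinkResolverSketch {
  static final int MAX_PATH_LINKS = 32; // limit assumed for this sketch

  /** Signals that the operation hit a symlink (stand-in for UnresolvedLinkException). */
  static class UnresolvedLink extends IOException {
    UnresolvedLink(String path) { super("unresolved link: " + path); }
  }

  /** The operation to retry (stand-in for next(fs, p)). */
  interface Operation<T> {
    T apply(String path) throws IOException;
  }

  /** Calls op repeatedly, following one link per failure, until it succeeds. */
  static <T> T resolve(Map<String, String> links, String path, Operation<T> op)
      throws IOException {
    int count = 0;
    String p = path;
    while (true) {
      try {
        return op.apply(p);
      } catch (UnresolvedLink e) {
        if (count++ > MAX_PATH_LINKS) {
          throw new IOException("Possible cyclic loop while following symbolic link " + path);
        }
        String target = links.get(p);
        if (target == null) {
          throw e; // the operation reported a link we cannot follow
        }
        p = target; // retry the same operation on the link target
      }
    }
  }

  /** A sample operation: reads a file, refusing to look through symlinks itself. */
  static String read(Map<String, String> links, Map<String, String> files, String p)
      throws IOException {
    if (links.containsKey(p)) throw new UnresolvedLink(p);
    String data = files.get(p);
    if (data == null) throw new FileNotFoundException(p);
    return data;
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> links = Map.of("/a", "/b", "/b", "/data/file");
    Map<String, String> files = Map.of("/data/file", "hello");
    System.out.println(resolve(links, "/a", p -> read(links, files, p)));
  }
}
```

Keeping the retry loop in one place means each FileContext operation only has to supply its own next body, which is why create can be written as a short anonymous subclass.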
