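Hadoop Source Code Reading - HDFS - Day 2
Yesterday we got as far as AbstractFileSystem and saw that applications access files through the FileContext class. Today let's read that class's source, starting with its rather long class comment: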
/**
* The FileContext class provides an interface to the application writer for
* using the Hadoop file system.
* It provides a set of methods for the usual operation: create, open,
* list, etc
*
* <p>
* <b> *** Path Names *** </b>
* <p>
*
* The Hadoop file system supports a URI name space and URI names.
* It offers a forest of file systems that can be referenced using fully
* qualified URIs.
* Two common Hadoop file systems implementations are
* <ul>
* <li> the local file system: file:///path
* <li> the hdfs file system hdfs://nnAddress:nnPort/path
* </ul>
*
* While URI names are very flexible, it requires knowing the name or address
* of the server. For convenience one often wants to access the default system
* in one's environment without knowing its name/address. This has an
* additional benefit that it allows one to change one's default fs
* (e.g. admin moves application from cluster1 to cluster2).
* <p>
*
* To facilitate this, Hadoop supports a notion of a default file system.
* The user can set his default file system, although this is
* typically set up for you in your environment via your default config.
* A default file system implies a default scheme and authority; slash-relative
* names (such as /for/bar) are resolved relative to that default FS.
* Similarly a user can also have working-directory-relative names (i.e. names
* not starting with a slash). While the working directory is generally in the
* same default FS, the wd can be in a different FS.
* <p>
* Hence Hadoop path names can be one of:
* <ul>
* <li> fully qualified URI: scheme://authority/path
* <li> slash relative names: /path relative to the default file system
* <li> wd-relative names: path relative to the working dir
* </ul>
* Relative paths with scheme (scheme:foo/bar) are illegal.
*
* <p>
* <b>****The Role of the FileContext and configuration defaults****</b>
* <p>
* The FileContext provides file namespace context for resolving file names;
* it also contains the umask for permissions, In that sense it is like the
* per-process file-related state in Unix system.
* These two properties
* <ul>
* <li> default file system i.e your slash)
* <li> umask
* </ul>
* in general, are obtained from the default configuration file
* in your environment, (@see {@link Configuration}).
*
* No other configuration parameters are obtained from the default config as
* far as the file context layer is concerned. All file system instances
* (i.e. deployments of file systems) have default properties; we call these
* server side (SS) defaults. Operation like create allow one to select many
* properties: either pass them in as explicit parameters or use
* the SS properties.
* <p>
* The file system related SS defaults are
* <ul>
* <li> the home directory (default is "/user/userName")
* <li> the initial wd (only for local fs)
* <li> replication factor
* <li> block size
* <li> buffer size
* <li> encryptDataTransfer
* <li> checksum option. (checksumType and bytesPerChecksum)
* </ul>
*
* <p>
* <b> *** Usage Model for the FileContext class *** </b>
* <p>
* Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
* Unspecified values come from core-defaults.xml in the release jar.
* <ul>
* <li> myFContext = FileContext.getFileContext(); // uses the default config
* // which has your default FS
* <li> myFContext.create(path, ...);
* <li> myFContext.setWorkingDir(path)
* <li> myFContext.open (path, ...);
* </ul>
* Example 2: Get a FileContext with a specific URI as the default FS
* <ul>
* <li> myFContext = FileContext.getFileContext(URI)
* <li> myFContext.create(path, ...);
* ...
* </ul>
* Example 3: FileContext with local file system as the default
* <ul>
* <li> myFContext = FileContext.getLocalFSFileContext()
* <li> myFContext.create(path, ...);
* <li> ...
* </ul>
* Example 4: Use a specific config, ignoring $HADOOP_CONFIG
* Generally you should not need use a config unless you are doing
* <ul>
* <li> configX = someConfigSomeOnePassedToYou.
* <li> myFContext = getFileContext(configX); // configX is not changed,
* // is passed down
* <li> myFContext.create(path, ...);
* <li>...
* </ul>
*
*/
@InterfaceAudience.Public
@InterfaceStability.Evolving /*Evolving for a release,to be changed to Stable */
public class FileContext {
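The FileContext class provides application writers with an interface for using the Hadoop file system, offering the usual operations: create, open, list, etc.
Two common Hadoop file system implementations are:
- the local file system: file:///path
- the HDFS file system: hdfs://nnAddress:nnPort/path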
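URI names are very flexible, but they require knowing the server's name or address. For convenience, one often wants to access the default file system of one's environment without knowing its name or address; this has the additional benefit of allowing the default FS to be changed (e.g. an admin moves an application from cluster1 to cluster2).
Hadoop therefore supports the notion of a default file system. Users can set their own default file system, although it is usually set up for them through the default configuration.
A default file system implies a default scheme and authority; slash-relative names (e.g. /foo/bar) are resolved relative to the default FS.
Similarly, users can have working-directory-relative names (names not starting with a slash). The working directory is usually in the default FS, but it can be in a different one.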
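Hence a Hadoop path name can be one of:
- fully qualified URI: scheme://authority/path
- slash-relative name: /path, relative to the default file system
- wd-relative name: path, relative to the working directory
Relative paths with a scheme (scheme:foo/bar) are illegal.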
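To make the three forms concrete, here is a minimal sketch that qualifies a path string the way the Javadoc describes. It uses only java.net.URI; the class and method names (PathKinds, qualify) are hypothetical, and this is not Hadoop's own resolution code (that lives in Path and FileContext).

```java
import java.net.URI;

// Hypothetical sketch (not Hadoop code): qualify a path name using the
// three forms described in the FileContext Javadoc.
final class PathKinds {
    static URI qualify(String name, URI defaultFs, URI workingDir) {
        URI u = URI.create(name);
        if (u.getScheme() != null) {
            // "scheme:foo/bar" parses as an opaque URI: a relative path with a scheme.
            if (u.isOpaque()) {
                throw new IllegalArgumentException("Relative path with scheme is illegal: " + name);
            }
            return u; // fully qualified: scheme://authority/path
        }
        if (name.startsWith("/")) {
            // slash-relative: scheme and authority come from the default FS
            return defaultFs.resolve(name);
        }
        // wd-relative: resolve against the working directory, which may be in
        // a different FS. A trailing slash keeps the last directory component.
        URI base = workingDir.getPath().endsWith("/")
                ? workingDir
                : URI.create(workingDir + "/");
        return base.resolve(name);
    }
}
```

For example, with default FS hdfs://nn:8020 and working directory hdfs://nn:8020/user/alice, `/user/bob/f` becomes hdfs://nn:8020/user/bob/f and `data/x` becomes hdfs://nn:8020/user/alice/data/x.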
private FileContext(final AbstractFileSystem defFs,
final FsPermission theUmask, final Configuration aConf) {
defaultFS = defFs;
umask = FsPermission.getUMask(aConf);
conf = aConf;
try {
ugi = UserGroupInformation.getCurrentUser();
} catch (IOException e) {
LOG.error("Exception in getCurrentUser: ",e);
throw new RuntimeException("Failed to get the current user " +
"while creating a FileContext", e);
}
/*
* Init the wd.
* WorkingDir is implemented at the FileContext layer
* NOT at the AbstractFileSystem layer.
* If the DefaultFS, such as localFilesystem has a notion of
* builtin WD, we use that as the initial WD.
* Otherwise the WD is initialized to the home directory.
*/
workingDir = defaultFS.getInitialWorkingDirectory();
if (workingDir == null) {
workingDir = defaultFS.getHomeDirectory();
}
resolveSymlinks = conf.getBoolean(
CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
util = new Util(); // for the inner class
}
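The FileContext constructor takes three parameters:
- defFs: the default FS of this FileContext
- theUmask: apparently unused, perhaps a historical leftover; the umask field is instead initialized from the configuration via FsPermission.getUMask(aConf)
- aConf: the configuration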
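The working-directory initialization in the constructor is a simple fallback: use the FS's built-in initial WD if it has one (e.g. the local FS), otherwise the home directory. A minimal sketch, where MiniFs is a hypothetical two-method stand-in for AbstractFileSystem and paths are plain strings rather than Hadoop Path objects:

```java
// Hypothetical stand-in for the two AbstractFileSystem methods the
// FileContext constructor consults; not Hadoop's actual interface.
interface MiniFs {
    String initialWorkingDirectory(); // null if the FS has no built-in WD
    String homeDirectory();
}

final class WorkingDirInit {
    // Mirrors the constructor: prefer the FS's initial WD, else the home dir.
    static String initialWorkingDir(MiniFs fs) {
        String wd = fs.initialWorkingDirectory();
        return (wd != null) ? wd : fs.homeDirectory();
    }
}
```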
Next, let's look at the common operations it mentions, starting with create. Its long Javadoc comes first:
/**
* Create or overwrite file on indicated path and returns an output stream for
* writing into the file.
*
* @param f the file name to open
* @param createFlag gives the semantics of create; see {@link CreateFlag}
* @param opts file creation options; see {@link Options.CreateOpts}.
* <ul>
* <li>Progress - to report progress on the operation - default null
* <li>Permission - umask is applied against permisssion: default is
* FsPermissions:getDefault()
*
* <li>CreateParent - create missing parent path; default is to not
* to create parents
* <li>The defaults for the following are SS defaults of the file
* server implementing the target path. Not all parameters make sense
* for all kinds of file system - eg. localFS ignores Blocksize,
* replication, checksum
* <ul>
* <li>BufferSize - buffersize used in FSDataOutputStream
* <li>Blocksize - block size for file blocks
* <li>ReplicationFactor - replication for blocks
* <li>ChecksumParam - Checksum parameters. server default is used
* if not specified.
* </ul>
* </ul>
*
* @return {@link FSDataOutputStream} for created file
*
* @throws AccessControlException If access is denied
* @throws FileAlreadyExistsException If file <code>f</code> already exists
* @throws FileNotFoundException If parent of <code>f</code> does not exist
* and <code>createParent</code> is false
* @throws ParentNotDirectoryException If parent of <code>f</code> is not a
* directory.
* @throws UnsupportedFileSystemException If file system for <code>f</code> is
* not supported
* @throws IOException If an I/O error occurred
*
* Exceptions applicable to file systems accessed over RPC:
* @throws RpcClientException If an exception occurred in the RPC client
* @throws RpcServerException If an exception occurred in the RPC server
* @throws UnexpectedServerException If server implementation throws
* undeclared exception to RPC server
*
* RuntimeExceptions:
* @throws InvalidPathException If path <code>f</code> is not valid
*/
public FSDataOutputStream create(final Path f,
final EnumSet<CreateFlag> createFlag, Options.CreateOpts... opts)
throws AccessControlException, FileAlreadyExistsException,
FileNotFoundException, ParentNotDirectoryException,
UnsupportedFileSystemException, IOException {
Path absF = fixRelativePart(f);
// If one of the options is a permission, extract it & apply umask
// If not, add a default Perms and apply umask;
// AbstractFileSystem#create
CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
FsPermission permission = (permOpt != null) ? permOpt.getValue() :
FILE_DEFAULT_PERM;
permission = permission.applyUMask(umask);
final CreateOpts[] updatedOpts =
CreateOpts.setOpt(CreateOpts.perms(permission), opts);
return new FSLinkResolver<FSDataOutputStream>() {
@Override
public FSDataOutputStream next(final AbstractFileSystem fs, final Path p)
throws IOException {
return fs.create(p, createFlag, updatedOpts);
}
}.resolve(this, absF);
}
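The create method creates or overwrites a file at the given path and returns an FSDataOutputStream for writing to it. Before delegating, it turns a relative path into an absolute one with fixRelativePart, takes the permission from the Perms option (falling back to FILE_DEFAULT_PERM), applies the umask, and puts the result back into the options passed down to AbstractFileSystem#create.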
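The permission handling at the top of create reduces to bit arithmetic: take the requested permission, or the default if none was passed, then clear every bit that is set in the umask. The sketch below uses plain octal ints instead of FsPermission, assumes FILE_DEFAULT_PERM is 0666 and a typical umask of 022, and its class and method names are hypothetical.

```java
// Hypothetical sketch of create()'s permission computation using plain
// octal ints instead of FsPermission.
final class UmaskSketch {
    // Assumed to match FileContext.FILE_DEFAULT_PERM (rw-rw-rw-).
    static final int FILE_DEFAULT_PERM = 0666;

    // Clear every permission bit that is set in the umask.
    static int applyUMask(int perm, int umask) {
        return perm & ~umask & 0777;
    }

    // Explicit permission if one was given, else the default; then the umask.
    static int effectivePermission(Integer requested, int umask) {
        int perm = (requested != null) ? requested : FILE_DEFAULT_PERM;
        return applyUMask(perm, umask);
    }
}
```

Under these assumptions, a file created without an explicit permission ends up 0644 (rw-r--r--), and an explicit 0777 becomes 0755.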
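The FSLinkResolver instantiated in the final return statement handles paths that contain symbolic links. Its two core methods, next and resolve, are: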
/**
* Generic helper function overridden on instantiation to perform a
* specific operation on the given file system using the given path
* which may result in an UnresolvedLinkException.
* @param fs AbstractFileSystem to perform the operation on.
* @param p Path given the file system.
* @return Generic type determined by the specific implementation.
* @throws UnresolvedLinkException If symbolic link <code>path</code> could
* not be resolved
* @throws IOException an I/O error occurred
*/
abstract public T next(final AbstractFileSystem fs, final Path p)
throws IOException, UnresolvedLinkException;
/**
* Performs the operation specified by the next function, calling it
* repeatedly until all symlinks in the given path are resolved.
* @param fc FileContext used to access file systems.
* @param path The path to resolve symlinks on.
* @return Generic type determined by the implementation of next.
* @throws IOException
*/
public T resolve(final FileContext fc, final Path path) throws IOException {
int count = 0;
T in = null;
Path p = path;
// NB: More than one AbstractFileSystem can match a scheme, eg
// "file" resolves to LocalFs but could have come by RawLocalFs.
AbstractFileSystem fs = fc.getFSofPath(p);
// Loop until all symlinks are resolved or the limit is reached
for (boolean isLink = true; isLink;) {
try {
in = next(fs, p);
isLink = false;
} catch (UnresolvedLinkException e) {
if (!fc.resolveSymlinks) {
throw new IOException("Path " + path + " contains a symlink"
+ " and symlink resolution is disabled ("
+ CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY + ").", e);
}
if (!FileSystem.areSymlinksEnabled()) {
throw new IOException("Symlink resolution is disabled in"
+ " this version of Hadoop.");
}
if (count++ > FsConstants.MAX_PATH_LINKS) {
throw new IOException("Possible cyclic loop while " +
"following symbolic link " + path);
}
// Resolve the first unresolved path component
p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
fs = fc.getFSofPath(p);
}
}
return in;
}
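next is a generic helper that is overridden at instantiation time to perform a specific operation on the given file system with the given path; it may throw UnresolvedLinkException.
resolve performs the operation through next, calling it repeatedly until every symlink in the path has been resolved. Each time next throws UnresolvedLinkException, resolve checks that symlink resolution is allowed (the FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY setting and FileSystem.areSymlinksEnabled()), gives up with a "possible cyclic loop" error once more than FsConstants.MAX_PATH_LINKS links have been followed, and otherwise qualifies the link target and looks up the file system for the new path, which may differ from the original one.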
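The next/resolve split is a template-method retry loop: subclasses supply only the operation, and resolve retries it on each link target until no unresolved-link exception is raised. Below is a simplified sketch over a toy in-memory namespace where `links` maps a symlink path to its target. Everything here is hypothetical: UnresolvedLink stands in for UnresolvedLinkException, the whole path (not just its first unresolved component) is treated as the link, one namespace replaces the per-path getFSofPath lookup, the two "is resolution enabled" checks are omitted, and MAX_PATH_LINKS = 32 is an assumed limit.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical stand-in for Hadoop's UnresolvedLinkException.
class UnresolvedLink extends IOException {
    UnresolvedLink(String path) { super("Unresolved link: " + path); }
}

// Simplified sketch of the FSLinkResolver pattern (not Hadoop code).
abstract class LinkResolver<T> {
    static final int MAX_PATH_LINKS = 32; // assumed limit for this sketch

    // The operation to perform; throws UnresolvedLink if p is a symlink.
    abstract T next(Map<String, String> links, String p) throws IOException;

    // Retry next() on each link target until it succeeds or the limit is hit.
    T resolve(Map<String, String> links, String path) throws IOException {
        int count = 0;
        String p = path;
        while (true) {
            try {
                return next(links, p);
            } catch (UnresolvedLink e) {
                if (count++ > MAX_PATH_LINKS) {
                    throw new IOException("Possible cyclic loop while following symbolic link " + path);
                }
                p = links.get(p); // follow the link to its target
            }
        }
    }
}
```

Usage mirrors create: instantiate an anonymous subclass whose next performs the operation, then call resolve on it. With /a linking to /b and /b to /c, an "open" of /a ends up operating on /c, while a /x to /y to /x cycle ends in the "Possible cyclic loop" IOException.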