HDFS Command Overview

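HDFS is driven through two commands, hadoop and hdfs. By function, the operations also fall into two groups: HDFS file operation commands and HDFS administration commands.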

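Both are shell commands. The only real commands are hadoop and hdfs; there are no standalone ls/mv/cp/cat/mkdir… or dfs/setQuota/fsck… commands. Those names are all passed as arguments to hadoop and hdfs.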

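For the implementation, see bin/hadoop and bin/hdfs. Other commands in the Hadoop family, such as yarn, work in a similar way. The official documentation describes the FS shell as follows: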

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others. The FS shell is invoked by:
bin/hadoop fs <args>
All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the Local FS the scheme is file.
The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost).
Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout.
If HDFS is being used, hdfs dfs is a synonym.
Relative paths can be used. For HDFS, the current working directory is the HDFS home directory /user/<username> that often has to be created manually. The HDFS home directory can also be implicitly accessed, e.g., when using the HDFS trash folder, the .Trash directory in the home directory.
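
The URI rules quoted above can be illustrated with the JDK's own java.net.URI. This is only a sketch: PathUriSketch and qualify are hypothetical names, not Hadoop classes, and the sketch handles absolute paths only (Hadoop resolves relative paths against the working directory, which is not modelled here).

```java
import java.net.URI;

// Illustrative helper, not part of Hadoop: shows how a path argument maps onto
// scheme://authority/path and how a missing scheme falls back to a default.
class PathUriSketch {
    // Qualify an absolute path argument against a default file system URI.
    static URI qualify(URI defaultFs, String pathArg) {
        URI given = URI.create(pathArg);
        if (given.getScheme() != null) {
            return given;                 // already fully qualified, keep as-is
        }
        return defaultFs.resolve(given);  // borrow scheme and authority from the default
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenodehost/");
        URI full = qualify(defaultFs, "/parent/child");
        System.out.println(full.getScheme());     // hdfs
        System.out.println(full.getAuthority());  // namenodehost
        System.out.println(full.getPath());       // /parent/child
    }
}
```

With the default set to hdfs://namenodehost/, both /parent/child and hdfs://namenodehost/parent/child end up as the same URI, while file:///tmp/a keeps its own scheme.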

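How HDFS commands are implemented:
The user enters hadoop or hdfs followed by arguments. After parsing them, the script starts a Java program through java or jsvc. That program reads HDFS file information or passes on the requested operation, which is how the file system is managed.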

                                              Figure 1. Overview of the HDFS command execution flow
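
The dispatch step can be modelled in a few lines of Java. This is a simplified sketch, not the logic of the actual scripts: LauncherSketch and buildCommand are hypothetical names, only three subcommands are listed, and the real bin/hadoop and bin/hdfs also assemble the classpath and JVM options, which are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the subcommand dispatch done by bin/hadoop and bin/hdfs.
class LauncherSketch {
    // "launcher subcommand" -> Java main class (only a few entries shown).
    private static final Map<String, String> MAIN_CLASSES = new HashMap<>();
    static {
        MAIN_CLASSES.put("hadoop fs", "org.apache.hadoop.fs.FsShell");
        MAIN_CLASSES.put("hdfs dfs", "org.apache.hadoop.fs.FsShell");
        MAIN_CLASSES.put("hdfs dfsadmin", "org.apache.hadoop.hdfs.tools.DFSAdmin");
    }

    // Build the (simplified) java command line for input such as "hdfs dfs -ls /tmp".
    static List<String> buildCommand(String launcher, String... argv) {
        if (argv.length == 0) {
            throw new IllegalArgumentException("missing subcommand");
        }
        String mainClass = MAIN_CLASSES.get(launcher + " " + argv[0]);
        if (mainClass == null) {
            throw new IllegalArgumentException("unknown subcommand: " + argv[0]);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add(mainClass);
        // Everything after the subcommand is handed to the Java program untouched.
        cmd.addAll(Arrays.asList(argv).subList(1, argv.length));
        return cmd;
    }

    public static void main(String[] args) {
        // prints: java org.apache.hadoop.fs.FsShell -ls /tmp
        System.out.println(String.join(" ", buildCommand("hdfs", "dfs", "-ls", "/tmp")));
    }
}
```

The point of the sketch is that -ls never exists as a command of its own: it is just the first argument that FsShell receives.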

For the hadoop fs and hdfs dfs/dfsadmin commands, the implementation in hadoop-3.1.1 is as follows:

The relevant shell scripts are bin/hadoop and bin/hdfs.

The relevant Java classes are org/apache/hadoop/fs/FsShell.java, org/apache/hadoop/fs/shell/CommandFactory.java, org/apache/hadoop/util/ToolRunner.java, org/apache/hadoop/fs/shell/FsCommand.java, org/apache/hadoop/util/GenericOptionsParser.java, and org/apache/hadoop/hdfs/tools/DFSAdmin.java.

The DFSAdmin class extends FsShell.

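FsShell: the core of the hdfs command. It registers the classes that implement each operation (via reflection) and dispatches commands for execution.
FsCommand: registers the individual operation commands.
CommandFactory: implements the command registration mechanism, mainly through the factory pattern and reflection.
GenericOptionsParser: parses the command's input arguments.
ToolRunner: an intermediate step that launches the command; the command itself is still executed by FsShell.run().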

1. Command registration

//org/apache/hadoop/fs/FsShell.java
protected void init() throws IOException {
  getConf().setQuietMode(true);
  UserGroupInformation.setConfiguration(getConf());
  if (commandFactory == null) {
    commandFactory = new CommandFactory(getConf());
    commandFactory.addObject(new Help(), "-help");
    commandFactory.addObject(new Usage(), "-usage");
    registerCommands(commandFactory);
  }
}

protected void registerCommands(CommandFactory factory) {
  // TODO: DFSAdmin subclasses FsShell so need to protect the command
  // registration. This class should morph into a base class for
  // commands, and then this method can be abstract
  if (this.getClass().equals(FsShell.class)) {
    factory.registerCommands(FsCommand.class);
  }
}

//org/apache/hadoop/fs/shell/CommandFactory.java
/**
 * Invokes "static void registerCommands(CommandFactory)" on the given class.
 * This method abstracts the contract between the factory and the command
 * class. Do not assume that directly invoking registerCommands on the
 * given class will have the same effect.
 * @param registrarClass class to allow an opportunity to register
 */
public void registerCommands(Class<?> registrarClass) {
  try {
    registrarClass.getMethod(
        "registerCommands", CommandFactory.class
    ).invoke(null, this);
  } catch (Exception e) {
    throw new RuntimeException(StringUtils.stringifyException(e));
  }
}

//org/apache/hadoop/fs/shell/FsCommand.java
/**
 * Register the command classes used by the fs subcommand
 * @param factory where to register the class
 */
public static void registerCommands(CommandFactory factory) {
  factory.registerCommands(AclCommands.class);
  factory.registerCommands(CopyCommands.class);
  factory.registerCommands(Count.class);
  factory.registerCommands(Delete.class);
  factory.registerCommands(Display.class);
  factory.registerCommands(Find.class);
  factory.registerCommands(FsShellPermissions.class);
  factory.registerCommands(FsUsage.class);
  factory.registerCommands(Ls.class);
  factory.registerCommands(Mkdir.class);
  factory.registerCommands(MoveCommands.class);
  factory.registerCommands(SetReplication.class);
  factory.registerCommands(Stat.class);
  factory.registerCommands(Tail.class);
  factory.registerCommands(Head.class);
  factory.registerCommands(Test.class);
  factory.registerCommands(Touch.class);
  factory.registerCommands(Truncate.class);
  factory.registerCommands(SnapshotCommands.class);
  factory.registerCommands(XAttrCommands.class);
}
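
The registration chain above (FsShell asks the factory to register FsCommand, which in turn asks it to register each command class) can be reproduced in a self-contained miniature. The sketch below is modelled on that pattern but is not Hadoop code: MiniCommand, MiniLs, MiniFsCommand, and MiniFactory are hypothetical stand-ins, and the factory keeps only a name-to-class map.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a shell command base class.
abstract class MiniCommand {
    abstract int run(String[] args);
}

// A concrete command that knows under which name it wants to be registered.
class MiniLs extends MiniCommand {
    // Found and invoked reflectively by MiniFactory.registerCommands.
    public static void registerCommands(MiniFactory factory) {
        factory.addClass(MiniLs.class, "-ls");
    }

    @Override
    int run(String[] args) {
        return 0;
    }
}

// Registrar that fans out to the individual command classes.
class MiniFsCommand {
    public static void registerCommands(MiniFactory factory) {
        factory.registerCommands(MiniLs.class);
    }
}

class MiniFactory {
    private final Map<String, Class<? extends MiniCommand>> classMap = new HashMap<>();

    // Invoke "static void registerCommands(MiniFactory)" on the given class.
    public void registerCommands(Class<?> registrarClass) {
        try {
            registrarClass.getMethod("registerCommands", MiniFactory.class).invoke(null, this);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Map one or more command names to an implementing class.
    public void addClass(Class<? extends MiniCommand> cmdClass, String... names) {
        for (String name : names) {
            classMap.put(name, cmdClass);
        }
    }

    // Create a fresh instance for a command name, or return null if unknown.
    public MiniCommand getInstance(String name) {
        Class<? extends MiniCommand> cmdClass = classMap.get(name);
        if (cmdClass == null) {
            return null;
        }
        try {
            return cmdClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

After new MiniFactory().registerCommands(MiniFsCommand.class), a lookup of "-ls" yields a MiniLs instance. The factory never names a concrete command class itself, which is what lets new commands be added without touching it.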

2. Command parsing

//org/apache/hadoop/util/GenericOptionsParser.java
/**
 * Parse the user-specified options, get the generic options, and modify
 * configuration accordingly.
 *
 * @param opts Options to use for parsing args.
 * @param args User-specified arguments
 * @return true if the parse was successful
 */
private boolean parseGeneralOptions(Options opts, String[] args)
    throws IOException {
  opts = buildGeneralOptions(opts);
  CommandLineParser parser = new GnuParser();
  boolean parsed = false;
  try {
    commandLine = parser.parse(opts, preProcessForWindows(args), true);
    processGeneralOptions(commandLine);
    parsed = true;
  } catch(ParseException e) {
    LOG.warn("options parsing failed: "+e.getMessage());
    HelpFormatter formatter = new HelpFormatter();
    formatter.printHelp("general options are: ", opts);
  }
  return parsed;
}

//org/apache/commons/cli/CommandLine.java
/**
 * Retrieve any left-over non-recognized options and arguments
 *
 * @return remaining items passed in but not parsed as an array
 */
public String[] getArgs()
{
  String[] answer = new String[args.size()];
  args.toArray(answer);
  return answer;
}
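
The idea behind this step is to peel the generic options off the front of the argument list and leave the rest for the command itself. The sketch below shows that split by hand, without Apache Commons CLI. It is an assumption-laden simplification: GenericOptionsSketch is a hypothetical class, and it recognizes only leading "-D key=value" pairs, whereas the real parser supports further generic options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hand-rolled stand-in for generic option parsing; handles "-D key=value" only.
class GenericOptionsSketch {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        int i = 0;
        // Consume leading "-D key=value" pairs and stop at the first other token.
        while (i + 1 < args.length && args[i].equals("-D")) {
            String[] kv = args[i + 1].split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("expected key=value: " + args[i + 1]);
            }
            properties.put(kv[0], kv[1]);
            i += 2;
        }
        // Whatever is left belongs to the command (for example "-ls /tmp").
        remaining.addAll(Arrays.asList(args).subList(i, args.length));
    }

    // Left-over arguments that were not consumed as generic options.
    String[] getRemainingArgs() {
        return remaining.toArray(new String[0]);
    }
}
```

For the input -D dfs.replication=2 -ls /tmp, the property map holds dfs.replication=2 and the remaining arguments are -ls and /tmp, which is the part that reaches the command dispatch.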

3. Command execution

//org/apache/hadoop/fs/FsShell.java
/**
 * run
 */
@Override
public int run(String argv[]) throws Exception {
  // initialize FsShell
  init();
  Tracer tracer = new Tracer.Builder("FsShell").
      conf(TraceUtils.wrapHadoopConf(SHELL_HTRACE_PREFIX, getConf())).
      build();
  int exitCode = -1;
  if (argv.length < 1) {
    printUsage(System.err);
  } else {
    // the first argument is the command name, e.g. "-ls"
    String cmd = argv[0];
    Command instance = null;
    try {
      instance = commandFactory.getInstance(cmd);
      if (instance == null) {
        throw new UnknownCommandException();
      }
      TraceScope scope = tracer.newScope(instance.getCommandName());
      if (scope.getSpan() != null) {
        String args = StringUtils.join(" ", argv);
        if (args.length() > 2048) {
          args = args.substring(0, 2048);
        }
        scope.getSpan().addKVAnnotation("args", args);
      }
      try {
        // the remaining arguments are passed to the command itself
        exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
      } finally {
        scope.close();
      }
    } catch (IllegalArgumentException e) {
      if (e.getMessage() == null) {
        displayError(cmd, "Null exception message");
        e.printStackTrace(System.err);
      } else {
        displayError(cmd, e.getLocalizedMessage());
      }
      printUsage(System.err);
      if (instance != null) {
        printInstanceUsage(System.err, instance);
      }
    } catch (Exception e) {
      // instance.run catches IOE, so something is REALLY wrong if here
      LOG.debug("Error", e);
      displayError(cmd, "Fatal internal error");
      e.printStackTrace(System.err);
    }
  }
  tracer.close();
  return exitCode;
}
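
Stripped of tracing and usage output, the control flow of run() is: take the first argument as the command name, look it up, pass the remaining arguments to the command, and return its exit code, with -1 as the default when anything goes wrong. A minimal sketch of that flow follows. ShellCommandSketch and MiniShell are hypothetical names, and the sketch throws a plain IllegalArgumentException for an unknown command.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a command: takes its own arguments and returns an exit code.
interface ShellCommandSketch {
    int run(String[] args) throws Exception;
}

class MiniShell {
    private final Map<String, ShellCommandSketch> commands = new HashMap<>();

    void register(String name, ShellCommandSketch command) {
        commands.put(name, command);
    }

    // Dispatch argv[0] to a registered command; -1 signals failure.
    int run(String[] argv) {
        int exitCode = -1;
        if (argv.length < 1) {
            System.err.println("Usage: <command> [args...]");
            return exitCode;
        }
        String cmd = argv[0];
        try {
            ShellCommandSketch instance = commands.get(cmd);
            if (instance == null) {
                throw new IllegalArgumentException("Unknown command");
            }
            // The command sees only the arguments that follow its name.
            exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
        } catch (IllegalArgumentException e) {
            System.err.println(cmd + ": " + e.getMessage());
        } catch (Exception e) {
            System.err.println(cmd + ": Fatal internal error");
        }
        return exitCode;
    }
}
```

Registering a command with shell.register("-count", args -> args.length) and calling shell.run(new String[] {"-count", "a", "b"}) returns 2, while an unregistered name or an empty argument list returns -1.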

4. Formatted output

//org/apache/hadoop/fs/shell/Command.java
/** allows stdout to be captured if necessary */
public PrintStream out = System.out;
/** allows stderr to be captured if necessary */
public PrintStream err = System.err;
/** allows the command factory to be used if necessary */
private CommandFactory commandFactory = null;
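
Because out and err are plain public fields rather than hard-coded references to System.out and System.err, a caller can swap them for in-memory streams and capture what a command prints. The sketch below shows the idea with hypothetical classes (CapturableCommand, CaptureDemo) that are not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Stand-in for a command that writes through replaceable stream fields.
class CapturableCommand {
    public PrintStream out = System.out;
    public PrintStream err = System.err;

    // Print each argument on its own line via the "out" field.
    int run(String[] args) {
        for (String arg : args) {
            out.println(arg);
        }
        return 0;
    }
}

class CaptureDemo {
    // Run the command with "out" redirected to a buffer and return the text.
    static String capture(CapturableCommand cmd, String... args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        cmd.out = new PrintStream(buffer, true);
        cmd.run(args);
        return buffer.toString();
    }
}
```

This kind of redirection is mainly useful in tests, where the output of a command needs to be compared against an expected string instead of being written to the console.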
