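Running Hadoop Programs with ToolRunner: An Analysis of the Basic Principles
To simplify running jobs from the command line, Hadoop ships with several helper classes. GenericOptionsParser is a class that interprets the common Hadoop command-line options and sets the corresponding values on a Configuration object as needed. You usually do not use GenericOptionsParser directly; the more convenient approach is to implement the Tool interface and run the application through ToolRunner, which calls GenericOptionsParser internally.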
A utility to help run Tools.
ToolRunner can be used to run classes implementing Tool interface. It works in conjunction with GenericOptionsParser to parse the generic hadoop command line arguments and modifies the Configuration of the Tool.
The application-specific options are passed along without being modified.
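To make the split between generic and application-specific options concrete, here is a minimal stdlib-only sketch. It is not Hadoop's implementation: it assumes only the `-D key=value` form and ignores the other generic options (such as `-conf`, `-fs` and `-jt`). The class name `GenericArgsSketch` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the way generic "-D key=value" options are
// separated from application arguments. NOT Hadoop code; all names here
// are hypothetical and only the "-D key=value" form is handled.
class GenericArgsSketch {
    /** Properties collected from "-D key=value" pairs. */
    final Map<String, String> properties = new LinkedHashMap<>();
    /** Arguments left over for the application itself, in original order. */
    final List<String> remaining = new ArrayList<>();

    GenericArgsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            boolean isDefine = "-D".equals(args[i])
                    && i + 1 < args.length
                    && args[i + 1].contains("=");
            if (isDefine) {
                // Consume the following "key=value" token as a property.
                String[] kv = args[++i].split("=", 2);
                properties.put(kv[0], kv[1]);
            } else {
                // Anything else is passed along unmodified.
                remaining.add(args[i]);
            }
        }
    }

    public static void main(String[] args) {
        GenericArgsSketch parsed = new GenericArgsSketch(
                new String[] {"-D", "color=yello", "wcin2", "wcout2"});
        System.out.println(parsed.properties); // {color=yello}
        System.out.println(parsed.remaining);  // [wcin2, wcout2]
    }
}
```

In the real classes, the parsed properties end up in the Configuration and the leftover arguments are what `Tool.run(String[])` receives.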
run
- public static int run(Configuration conf,
- Tool tool,
- String[] args)
- throws Exception
- Runs the given Tool by Tool.run(String[]), after parsing with the given generic arguments. Uses the given Configuration, or builds one if null. Sets the Tool's configuration with the possibly modified version of the conf.
- Parameters:
- conf - Configuration for the Tool.
- tool - Tool to run.
- args - command-line arguments to the tool.
- Returns:
- exit code of the Tool.run(String[]) method.
- Throws:
- Exception
run
- public static int run(Tool tool,
- String[] args)
- throws Exception
- Runs the Tool with its Configuration. Equivalent to run(tool.getConf(), tool, args).
- Parameters:
- tool - Tool to run.
- args - command-line arguments to the tool.
- Returns:
- exit code of the Tool.run(String[]) method.
- Throws:
- Exception
Both are static methods, so they can be called directly through the class name.
Besides these, there is one more method:
static void
printGenericCommandUsage(PrintStream out)
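Prints generic command-line arguments and usage information.
4. ToolRunner performs the following two functions:
(1) It creates a Configuration object for the Tool if one is not supplied.
(2) It lets the program read its configuration parameters conveniently.
The complete source code of ToolRunner is as follows: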
- package org.apache.hadoop.util;
- import java.io.PrintStream;
- import org.apache.hadoop.conf.Configuration;
- /**
- * A utility to help run {@link Tool}s.
- *
- * <p><code>ToolRunner</code> can be used to run classes implementing
- * <code>Tool</code> interface. It works in conjunction with
- * {@link GenericOptionsParser} to parse the
- * <a href="{@docRoot}/org/apache/hadoop/util/GenericOptionsParser.html#GenericOptions">
- * generic hadoop command line arguments</a> and modifies the
- * <code>Configuration</code> of the <code>Tool</code>. The
- * application-specific options are passed along without being modified.
- * </p>
- *
- * @see Tool
- * @see GenericOptionsParser
- */
- public class ToolRunner {
- /**
- * Runs the given <code>Tool</code> by {@link Tool#run(String[])}, after
- * parsing with the given generic arguments. Uses the given
- * <code>Configuration</code>, or builds one if null.
- *
- * Sets the <code>Tool</code>'s configuration with the possibly modified
- * version of the <code>conf</code>.
- *
- * @param conf <code>Configuration</code> for the <code>Tool</code>.
- * @param tool <code>Tool</code> to run.
- * @param args command-line arguments to the tool.
- * @return exit code of the {@link Tool#run(String[])} method.
- */
- public static int run(Configuration conf, Tool tool, String[] args)
- throws Exception{
- if(conf == null) {
- conf = new Configuration();
- }
- GenericOptionsParser parser = new GenericOptionsParser(conf, args);
- //set the configuration back, so that Tool can configure itself
- tool.setConf(conf);
- //get the args w/o generic hadoop args
- String[] toolArgs = parser.getRemainingArgs();
- return tool.run(toolArgs);
- }
- /**
- * Runs the <code>Tool</code> with its <code>Configuration</code>.
- *
- * Equivalent to <code>run(tool.getConf(), tool, args)</code>.
- *
- * @param tool <code>Tool</code> to run.
- * @param args command-line arguments to the tool.
- * @return exit code of the {@link Tool#run(String[])} method.
- */
- public static int run(Tool tool, String[] args)
- throws Exception{
- return run(tool.getConf(), tool, args);
- }
- /**
- * Prints generic command-line argurments and usage information.
- *
- * @param out stream to write usage information to.
- */
- public static void printGenericCommandUsage(PrintStream out) {
- GenericOptionsParser.printGenericCommandUsage(out);
- }
- }
Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:
- core-default.xml : Read-only defaults for hadoop.
- core-site.xml: Site-specific configuration for a given hadoop installation.
- static{
- //print deprecation warning if hadoop-site.xml is found in classpath
- ClassLoader cL = Thread.currentThread().getContextClassLoader();
- if (cL == null) {
- cL = Configuration.class.getClassLoader();
- }
- if(cL.getResource("hadoop-site.xml")!=null) {
- LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
- "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
- + "mapred-site.xml and hdfs-site.xml to override properties of " +
- "core-default.xml, mapred-default.xml and hdfs-default.xml " +
- "respectively");
- }
- addDefaultResource("core-default.xml");
- addDefaultResource("core-site.xml");
- }
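The source code of Configuration.java contains the code above: a static initializer loads the parameters from core-default.xml and core-site.xml for the program.
It also checks whether hadoop-site.xml is still present on the classpath; if it is, a warning is logged that this configuration file is deprecated.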
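The "loaded in order" behaviour can be illustrated with a small stdlib-only sketch in which a resource loaded later overrides keys set by an earlier one. This is only an analogy, not how Configuration is implemented: it uses `java.util.Properties` text instead of Hadoop's XML resources and ignores features such as final properties. The class name `LayeredDefaultsSketch` and the `hdfs://localhost:9000` value are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Analogy for default-resource layering. NOT Hadoop code: Properties text
// stands in for the XML resources, and all names here are hypothetical.
class LayeredDefaultsSketch {
    /** Loads each resource in order; later resources overwrite earlier keys. */
    static Properties loadInOrder(String... resources) {
        Properties merged = new Properties();
        for (String resource : resources) {
            try {
                merged.load(new StringReader(resource));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for core-default.xml (read-only defaults).
        String coreDefault = "fs.default.name=file:///\nhadoop.native.lib=true\n";
        // Stand-in for core-site.xml (site-specific overrides); value is illustrative.
        String coreSite = "fs.default.name=hdfs://localhost:9000\n";

        Properties conf = loadInOrder(coreDefault, coreSite);
        System.out.println(conf.getProperty("fs.default.name"));   // hdfs://localhost:9000
        System.out.println(conf.getProperty("hadoop.native.lib")); // true
    }
}
```

The site value wins for the key both resources define, while a key that appears only in the defaults keeps its default value.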
- for (Entry<String, String> entry : conf){
- .....
- }
(4) About Tool
- package org.apache.hadoop.util;
- import org.apache.hadoop.conf.Configurable;
- public interface Tool extends Configurable {
- int run(String [] args) throws Exception;
- }
As shown, Tool itself declares only one method, run(String[]), and it inherits two methods from Configurable:
- package org.apache.hadoop.conf;
- public interface Configurable {
- void setConf(Configuration conf);
- Configuration getConf();
- }
2. The source of Configured is as follows:
- package org.apache.hadoop.conf;
- public class Configured implements Configurable {
- private Configuration conf;
- public Configured() {
- this(null);
- }
- public Configured(Configuration conf) {
- setConf(conf);
- }
- public void setConf(Configuration conf) {
- this.conf = conf;
- }
- public Configuration getConf() {
- return conf;
- }
- }
It has two constructors: one that takes a Configuration argument and one that takes no arguments.
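One consequence of the no-argument constructor is that a freshly constructed tool has a null configuration until ToolRunner supplies one. The sketch below reproduces that chain with stdlib types only. It is a miniature, not the Hadoop classes: a `Map` stands in for Configuration, option parsing is left out, and every name (`NullConfSketch`, `MiniTool`, `DemoTool` and so on) is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Miniature of the Configurable / Configured / Tool / ToolRunner relationship.
// NOT Hadoop code: a Map stands in for Configuration, option parsing is
// omitted, and all names are hypothetical.
class NullConfSketch {
    interface MiniConfigurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    interface MiniTool extends MiniConfigurable {
        int run(String[] args);
    }

    static class MiniConfigured implements MiniConfigurable {
        private Map<String, String> conf;
        MiniConfigured() { this(null); }                     // leaves conf as null
        MiniConfigured(Map<String, String> conf) { setConf(conf); }
        @Override public void setConf(Map<String, String> conf) { this.conf = conf; }
        @Override public Map<String, String> getConf() { return conf; }
    }

    /** Mirrors the three-argument run: build a conf if null, set it, run the tool. */
    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new HashMap<>();
        }
        tool.setConf(conf);
        return tool.run(args);
    }

    /** Mirrors the two-argument run: delegate using the tool's own conf. */
    static int run(MiniTool tool, String[] args) {
        return run(tool.getConf(), tool, args);
    }

    static class DemoTool extends MiniConfigured implements MiniTool {
        @Override public int run(String[] args) {
            // Exit code 0 only if a configuration was set before run().
            return getConf() != null ? 0 : 1;
        }
    }

    public static void main(String[] args) {
        DemoTool tool = new DemoTool();
        System.out.println(tool.getConf() == null); // true
        int exitCode = run(tool, args);
        System.out.println(exitCode);               // 0
        System.out.println(tool.getConf() == null); // false
    }
}
```

This is why code inside `run(String[])` can rely on `getConf()` when the tool is started through the runner, even though the tool was created with the no-argument constructor.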
- package org.jediael.hadoopdemo.toolrunnerdemo;
- import java.util.Map.Entry;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.conf.Configured;
- import org.apache.hadoop.util.Tool;
- import org.apache.hadoop.util.ToolRunner;
- public class ToolRunnerDemo extends Configured implements Tool {
- static {
- //Configuration.addDefaultResource("hdfs-default.xml");
- //Configuration.addDefaultResource("hdfs-site.xml");
- //Configuration.addDefaultResource("mapred-default.xml");
- //Configuration.addDefaultResource("mapred-site.xml");
- }
- @Override
- public int run(String[] args) throws Exception {
- Configuration conf = getConf();
- for (Entry<String, String> entry : conf) {
- System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
- }
- return 0;
- }
- public static void main(String[] args) throws Exception {
- int exitCode = ToolRunner.run(new ToolRunnerDemo(), args);
- System.exit(exitCode);
- }
- }
io.seqfile.compress.blocksize=1000000
keep.failed.task.files=false
mapred.disk.healthChecker.interval=60000
dfs.df.interval=60000
dfs.datanode.failed.volumes.tolerated=0
mapreduce.reduce.input.limit=-1
mapred.task.tracker.http.address=0.0.0.0:50060
mapred.used.genericoptionsparser=true
mapred.userlog.retain.hours=24
dfs.max.objects=0
mapred.jobtracker.jobSchedulable=org.apache.hadoop.mapred.JobSchedulable
mapred.local.dir.minspacestart=0
hadoop.native.lib=true
color=yello
wc
68 68 3028
<?xml version="1.0"?>
- package org.jediael.hadoopdemo.toolrunnerdemo;
- import java.io.IOException;
- import java.util.StringTokenizer;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.conf.Configured;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.Mapper;
- import org.apache.hadoop.mapreduce.Reducer;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
- import org.apache.hadoop.util.Tool;
- import org.apache.hadoop.util.ToolRunner;
- public class WordCount extends Configured implements Tool{
- public static class WordCountMap extends
- Mapper<LongWritable, Text, Text, IntWritable> {
- private final IntWritable one = new IntWritable(1);
- private Text word = new Text();
- public void map(LongWritable key, Text value, Context context)
- throws IOException, InterruptedException {
- String line = value.toString();
- StringTokenizer token = new StringTokenizer(line);
- while (token.hasMoreTokens()) {
- word.set(token.nextToken());
- context.write(word, one);
- }
- }
- }
- public static class WordCountReduce extends
- Reducer<Text, IntWritable, Text, IntWritable> {
- public void reduce(Text key, Iterable<IntWritable> values,
- Context context) throws IOException, InterruptedException {
- int sum = 0;
- for (IntWritable val : values) {
- sum += val.get();
- }
- context.write(key, new IntWritable(sum));
- }
- }
- @Override
- public int run(String[] args) throws Exception {
- // Use the Configuration that ToolRunner set on this Tool, so generic options (e.g. -D) take effect
- Configuration conf = getConf();
- Job job = new Job(conf);
- job.setJarByClass(WordCount.class);
- job.setJobName("wordcount");
- job.setOutputKeyClass(Text.class);
- job.setOutputValueClass(IntWritable.class);
- job.setMapperClass(WordCountMap.class);
- job.setReducerClass(WordCountReduce.class);
- job.setInputFormatClass(TextInputFormat.class);
- job.setOutputFormatClass(TextOutputFormat.class);
- FileInputFormat.addInputPath(job, new Path(args[0]));
- FileOutputFormat.setOutputPath(job, new Path(args[1]));
- return(job.waitForCompletion(true)?0:-1);
- }
- public static void main(String[] args) throws Exception {
- int exitCode = ToolRunner.run(new WordCount(), args);
- System.exit(exitCode);
- }
- }
Run the program:
- [root@jediael project]# hadoop fs -mkdir wcin2
- [root@jediael project]# hadoop fs -copyFromLocal /opt/jediael/apache-nutch-2.2.1/CHANGES.txt wcin2
- [root@jediael project]# hadoop jar wordcount2.jar org.jediael.hadoopdemo.toolrunnerdemo.WordCount wcin2 wcout2