A Makefile for Hadoop MapReduce Programs
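I recently needed to integrate a Hadoop MapReduce program into a large application framework written in C/C++, so the MapReduce application had to be compiled and packaged automatically as part of make.
Here I walk through the details using a simple WordCount1 example. Note: the Hadoop version is 2.4.0.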
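As a point of reference, the computation that the WordCount1 job distributes across map and reduce tasks can be sketched in plain Java without Hadoop. This is only an illustration: the class name LocalWordCount and its count method are hypothetical and do not appear in the article's source.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the article's source.
class LocalWordCount {

    // Mirrors the map step (tokenize on whitespace, emit (word, 1)) and the
    // reduce step (sum the 1s per word) in a single in-memory loop.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("1 2 3", "2 3", "3"));
        System.out.println(counts); // prints {1=1, 2=2, 3=3}
    }
}
```

The Hadoop version does the same work, except that the tokenizing runs in many map tasks and the summing runs in the combiner and reducer.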
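The source consists of two files. WordCount1.java implements the word-counting logic. CounterThread.java simply tallies and prints the number of lines processed so far. The code for both is in Appendix 1.
The key to writing the Makefile is getting the paths of all the jars that Hadoop ships onto the classpath. Many resources online write their own script that gathers every .jar file under the Hadoop directory into a single path and then compile against that, which is too much trouble. There are simpler approaches too, but they all target older Hadoop versions such as 0.20.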
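In fact, Hadoop provides a command, hadoop classpath, that prints a classpath covering all of its jars. So compiling only takes javac -classpath "`hadoop classpath`" *.java, after which jar -cvf packages the class files.
The complete Makefile is as follows: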
SRC_DIR = src/mypackage/*.java
CLASS_DIR = bin
TARGET_JAR = WordCount

# all and clean are not files; $(TARGET_JAR) is also never created as a file
# (the jar is $(TARGET_JAR).jar), so every make run recompiles and repackages.
.PHONY: all clean

all: $(TARGET_JAR)

$(TARGET_JAR): $(SRC_DIR)
	mkdir -p $(CLASS_DIR)
# javac -classpath `$(HADOOP) classpath` -d $(CLASS_DIR) $(SRC_DIR)
	javac -classpath "`hadoop classpath`" src/mypackage/*.java -d $(CLASS_DIR) -Xlint
	jar -cvf $(TARGET_JAR).jar -C $(CLASS_DIR) ./

clean:
	rm -rf $(CLASS_DIR) *.jar
Run make:
lichao@ubuntu:WordCount1$ make
mkdir -p bin
javac -classpath "`hadoop classpath`" src/mypackage/*.java -d bin -Xlint
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jaxb-api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/activation.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jsr173_1.0_api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jaxb1-impl.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/activation.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jsr173_1.0_api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb1-impl.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/contrib/capacity-scheduler/*.jar": no such file or directory
src/mypackage/WordCount1.java:61: warning: [deprecation] Job(Configuration,String) in Job has been deprecated
Job job = new Job(conf, "WordCount1"); // create a new job
^
10 warnings
jar -cvf WordCount.jar -C bin ./
added manifest
adding: mypackage/(in = 0) (out= 0)(stored 0%)
adding: mypackage/WordCount1.class(in = 1970) (out= 1037)(deflated 47%)
adding: mypackage/CounterThread.class(in = 1760) (out= 914)(deflated 48%)
adding: mypackage/WordCount1$IntSumReducer.class(in = 1762) (out= 749)(deflated 57%)
adding: mypackage/WordCount1$TokenizerMapper.class(in = 1759) (out= 762)(deflated 56%)
adding: log4j.properties(in = 476) (out= 172)(deflated 63%)
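There are warnings, but they do not affect the result. The [path] warnings are -Xlint flagging entries in the hadoop classpath output that do not exist on disk, and the [deprecation] warning is for the Job(Configuration, String) constructor, which Job.getInstance(conf, jobName) replaces.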
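To see ahead of time which classpath entries would trigger the "bad path element" warnings, the entries can be checked against the filesystem. The following is a minimal sketch, assuming the classpath string is passed in as an argument. The class ClasspathCheck and its missingEntries method are hypothetical names, and the check only approximates what javac reports. It treats a trailing "/*" as Java's directory wildcard, which is why an entry ending in "*.jar" (not a valid wildcard form) shows up as missing, just as in the output above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the article's source.
class ClasspathCheck {

    // Returns entries that point at nothing on disk. A trailing "/*" is treated
    // as Java's directory wildcard, so only the directory itself is checked.
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            String path = entry.endsWith("/*")
                    ? entry.substring(0, entry.length() - 2)
                    : entry;
            if (!new File(path).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the classpath to inspect is the first argument;
        // otherwise the JVM's own classpath is checked.
        String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : missingEntries(classpath)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

One way to use it would be java ClasspathCheck "`hadoop classpath`", which lists the entries that do not resolve to a file or directory.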
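With the build done, let's run a quick test.
First generate test data: while true; do seq 1 100000 >> tmpfile; done; then press Ctrl+C once the file is big enough.
Then put the data on HDFS: hadoop fs -put tmpfile /data/
Next, run the MapReduce program: hadoop jar WordCount.jar mypackage/WordCount1 /data/tmpfile /output2
The output looks like this (the 输入行数 lines, meaning "input line count", are printed by CounterThread):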
14/07/15 13:26:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/15 13:26:03 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
14/07/15 13:26:05 INFO input.FileInputFormat: Total input paths to process : 1
14/07/15 13:26:05 INFO mapreduce.JobSubmitter: number of splits:6
14/07/15 13:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1405397597558_0003
14/07/15 13:26:06 INFO impl.YarnClientImpl: Submitted application application_1405397597558_0003
14/07/15 13:26:06 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1405397597558_0003/
14/07/15 13:26:06 INFO mapreduce.Job: Running job: job_1405397597558_0003
14/07/15 13:26:20 INFO mapreduce.Job: Job job_1405397597558_0003 running in uber mode : false
14/07/15 13:26:20 INFO mapreduce.Job: map 0% reduce 0%
14/07/15 13:26:34 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
输入行数:0
14/07/15 13:26:48 INFO mapreduce.Job: map 2% reduce 0%
输入行数:3138474
14/07/15 13:26:51 INFO mapreduce.Job: map 5% reduce 0%
14/07/15 13:26:54 INFO mapreduce.Job: map 6% reduce 0%
14/07/15 13:26:55 INFO mapreduce.Job: map 8% reduce 0%
14/07/15 13:26:57 INFO mapreduce.Job: map 9% reduce 0%
14/07/15 13:26:58 INFO mapreduce.Job: map 11% reduce 0%
14/07/15 13:27:00 INFO mapreduce.Job: map 12% reduce 0%
14/07/15 13:27:01 INFO mapreduce.Job: map 13% reduce 0%
输入行数:23383595
14/07/15 13:27:05 INFO mapreduce.Job: map 14% reduce 0%
输入行数:23383595
14/07/15 13:27:23 INFO mapreduce.Job: map 15% reduce 0%
14/07/15 13:27:27 INFO mapreduce.Job: map 16% reduce 0%
14/07/15 13:27:28 INFO mapreduce.Job: map 18% reduce 0%
14/07/15 13:27:30 INFO mapreduce.Job: map 19% reduce 0%
14/07/15 13:27:31 INFO mapreduce.Job: map 21% reduce 0%
14/07/15 13:27:34 INFO mapreduce.Job: map 24% reduce 0%
输入行数:38430301
14/07/15 13:27:37 INFO mapreduce.Job: map 25% reduce 0%
14/07/15 13:27:40 INFO mapreduce.Job: map 26% reduce 0%
输入行数:42826322
14/07/15 13:27:57 INFO mapreduce.Job: map 27% reduce 0%
14/07/15 13:28:00 INFO mapreduce.Job: map 29% reduce 0%
14/07/15 13:28:02 INFO mapreduce.Job: map 30% reduce 0%
14/07/15 13:28:03 INFO mapreduce.Job: map 32% reduce 0%
输入行数:54513531
14/07/15 13:28:05 INFO mapreduce.Job: map 33% reduce 0%
14/07/15 13:28:06 INFO mapreduce.Job: map 34% reduce 0%
14/07/15 13:28:08 INFO mapreduce.Job: map 35% reduce 0%
14/07/15 13:28:09 INFO mapreduce.Job: map 36% reduce 0%
输入行数:60959081
14/07/15 13:28:22 INFO mapreduce.Job: map 42% reduce 0%
14/07/15 13:28:30 INFO mapreduce.Job: map 43% reduce 0%
14/07/15 13:28:31 INFO mapreduce.Job: map 44% reduce 0%
14/07/15 13:28:34 INFO mapreduce.Job: map 45% reduce 0%
14/07/15 13:28:35 INFO mapreduce.Job: map 46% reduce 0%
输入行数:69936159
14/07/15 13:28:37 INFO mapreduce.Job: map 47% reduce 0%
14/07/15 13:28:38 INFO mapreduce.Job: map 48% reduce 0%
14/07/15 13:28:41 INFO mapreduce.Job: map 49% reduce 0%
14/07/15 13:28:44 INFO mapreduce.Job: map 50% reduce 0%
输入行数:77160461
14/07/15 13:29:01 INFO mapreduce.Job: map 51% reduce 0%
14/07/15 13:29:04 INFO mapreduce.Job: map 52% reduce 0%
14/07/15 13:29:05 INFO mapreduce.Job: map 53% reduce 0%
输入行数:83000373
14/07/15 13:29:07 INFO mapreduce.Job: map 54% reduce 0%
14/07/15 13:29:09 INFO mapreduce.Job: map 55% reduce 0%
14/07/15 13:29:10 INFO mapreduce.Job: map 56% reduce 0%
14/07/15 13:29:13 INFO mapreduce.Job: map 57% reduce 0%
14/07/15 13:29:16 INFO mapreduce.Job: map 58% reduce 0%
输入行数:93361766
14/07/15 13:29:32 INFO mapreduce.Job: map 59% reduce 0%
输入行数:98194696
14/07/15 13:29:35 INFO mapreduce.Job: map 60% reduce 0%
14/07/15 13:29:37 INFO mapreduce.Job: map 61% reduce 0%
14/07/15 13:29:38 INFO mapreduce.Job: map 62% reduce 0%
14/07/15 13:29:40 INFO mapreduce.Job: map 63% reduce 0%
14/07/15 13:29:41 INFO mapreduce.Job: map 64% reduce 0%
14/07/15 13:29:44 INFO mapreduce.Job: map 65% reduce 0%
14/07/15 13:29:48 INFO mapreduce.Job: map 66% reduce 0%
输入行数:109562184
14/07/15 13:30:04 INFO mapreduce.Job: map 67% reduce 0%
输入行数:113362818
14/07/15 13:30:06 INFO mapreduce.Job: map 68% reduce 0%
14/07/15 13:30:08 INFO mapreduce.Job: map 69% reduce 0%
14/07/15 13:30:10 INFO mapreduce.Job: map 70% reduce 0%
14/07/15 13:30:12 INFO mapreduce.Job: map 71% reduce 0%
14/07/15 13:30:15 INFO mapreduce.Job: map 72% reduce 0%
输入行数:123074119
14/07/15 13:30:32 INFO mapreduce.Job: map 76% reduce 0%
14/07/15 13:30:33 INFO mapreduce.Job: map 80% reduce 0%
14/07/15 13:30:34 INFO mapreduce.Job: map 83% reduce 0%
14/07/15 13:30:35 INFO mapreduce.Job: map 84% reduce 0%
输入行数:123074119
14/07/15 13:30:37 INFO mapreduce.Job: map 89% reduce 0%
14/07/15 13:30:38 INFO mapreduce.Job: map 92% reduce 0%
14/07/15 13:30:39 INFO mapreduce.Job: map 95% reduce 0%
14/07/15 13:30:40 INFO mapreduce.Job: map 100% reduce 0%
输入行数:123074119
14/07/15 13:30:53 INFO mapreduce.Job: map 100% reduce 100%
14/07/15 13:30:53 INFO mapreduce.Job: Job job_1405397597558_0003 completed successfully
14/07/15 13:30:53 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=58256119
FILE: Number of bytes written=66039749
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=724520133
HDFS: Number of bytes written=1088895
HDFS: Number of read operations=21
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=2
Launched map tasks=8
Launched reduce tasks=1
Data-local map tasks=8
Total time spent by all maps in occupied slots (ms)=1528715
Total time spent by all reduces in occupied slots (ms)=17508
Total time spent by all map tasks (ms)=1528715
Total time spent by all reduce tasks (ms)=17508
Total vcore-seconds taken by all map tasks=1528715
Total vcore-seconds taken by all reduce tasks=17508
Total megabyte-seconds taken by all map tasks=1565404160
Total megabyte-seconds taken by all reduce tasks=17928192
Map-Reduce Framework
Map input records=123074119
Map output records=123074119
Map output bytes=1216795535
Map output materialized bytes=7133406
Input split bytes=594
Combine input records=127374119
Combine output records=4900000
Reduce input groups=100000
Reduce shuffle bytes=7133406
Reduce input records=600000
Reduce output records=100000
Spilled Records=5500000
Shuffled Maps =6
Failed Shuffles=0
Merged Map outputs=6
GC time elapsed (ms)=39761
CPU time spent (ms)=1397060
Physical memory (bytes) snapshot=1797943296
Virtual memory (bytes) snapshot=5082316800
Total committed heap usage (bytes)=1398800384
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=724519539
File Output Format Counters
Bytes Written=1088895
Appendix 1: Code for WordCount1.java and CounterThread.java
// WordCount1.java
package mypackage;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount1 {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1); // constant count of 1 emitted for every word
        private Text word = new Text(); // reusable holder for the current word

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString()); // split the input line into whitespace-delimited tokens
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken()); // set word to the next token
                context.write(word, one);  // emit the (word, 1) key-value pair
            }
            //System.out.println("read lines:"+context.getCounter("org.apache.hadoop.mapred.Task$Counter","MAP_INPUT_RECORDS").getValue());
            //System.out.println( "输入行数:" + context.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue() );
            //System.out.println( "输入行数:" + context.getCounters().findCounter("", "MAP_INPUT_RECORDS").getValue() );
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {

        private IntWritable result = new IntWritable(); // reusable holder for the summed count

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0; // running total for this key
            for (IntWritable val : values) {
                sum += val.get(); // add up all values for this key
            }
            result.set(sum);            // store the sum in result
            context.write(key, result); // emit the (key, sum) pair
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        //String[] newArgs = new String[]{"hdfs://localhost:9000/data/tmpfile","hdfs://localhost:9000/data/wc_output"};
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Deprecated constructor (source of the javac warning); Job.getInstance(conf, "WordCount1") replaces it.
        Job job = new Job(conf, "WordCount1"); // create a new job
        job.setJarByClass(WordCount1.class);
        job.setMapperClass(TokenizerMapper.class);   // set the mapper class
        job.setCombinerClass(IntSumReducer.class);   // set the combiner class
        job.setReducerClass(IntSumReducer.class);    // set the reducer class
        job.setOutputKeyClass(Text.class);           // output key type
        job.setOutputValueClass(IntWritable.class);  // output value type
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // input path (first argument)
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // output path (second argument)

        CounterThread ct = new CounterThread(job);
        ct.start();

        // System.exit also ends CounterThread, which otherwise loops forever;
        // the exit code reflects whether the job succeeded.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
// CounterThread.java
package mypackage;

import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;

public class CounterThread extends Thread {

    public CounterThread(Job job) {
        _job = job;
    }

    public void run() {
        // Loops forever; WordCount1.main ends it by calling System.exit.
        while (true) {
            try {
                Thread.sleep(1000 * 5); // poll every 5 seconds
            } catch (InterruptedException e1) {
                e1.printStackTrace();
            }
            try {
                // While the job is running, print the map input record count so far
                // ("输入行数" means "input line count"). This counter group name is
                // deprecated (see the WARN line in the job log);
                // org.apache.hadoop.mapreduce.TaskCounter replaces it.
                if (_job.getStatus().getState() == JobStatus.State.RUNNING) {
                    System.out.println( "输入行数:" + _job.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue() );
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    private Job _job;
}
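Because CounterThread never leaves its loop, the program depends on System.exit to stop it. One possible alternative, sketched here under stated assumptions, is a daemon polling thread that stops on its own once the job is no longer running. To keep the sketch runnable without Hadoop, the job is abstracted behind two suppliers. PollingReporter, its constructor parameters, and reportCount are all hypothetical names, not part of the article's source or of the Hadoop API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical variant of CounterThread, decoupled from Hadoop so it runs standalone.
class PollingReporter extends Thread {
    private final LongSupplier counter;    // stands in for the MAP_INPUT_RECORDS lookup
    private final BooleanSupplier running; // stands in for the "job state is RUNNING" check
    private final long intervalMillis;
    private final AtomicLong reports = new AtomicLong();

    PollingReporter(LongSupplier counter, BooleanSupplier running, long intervalMillis) {
        this.counter = counter;
        this.running = running;
        this.intervalMillis = intervalMillis;
        setDaemon(true); // a daemon thread never keeps the JVM alive on its own
    }

    @Override
    public void run() {
        // Leave the loop as soon as the job is no longer running.
        while (running.getAsBoolean()) {
            System.out.println("input lines: " + counter.getAsLong());
            reports.incrementAndGet();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop polling
                return;
            }
        }
    }

    long reportCount() {
        return reports.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong lines = new AtomicLong();
        long deadline = System.currentTimeMillis() + 300; // simulated job duration
        PollingReporter reporter = new PollingReporter(
                lines::get, () -> System.currentTimeMillis() < deadline, 50);
        reporter.start();
        while (System.currentTimeMillis() < deadline) {
            lines.addAndGet(1000); // simulated progress
            Thread.sleep(10);
        }
        reporter.join(); // returns once the reporter sees the job has finished
    }
}
```

In the real job, the two suppliers would wrap the same getStatus and getCounters calls that CounterThread already makes, with their checked exceptions handled inside the lambdas.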