Hadoop2.4.1入门实例:MaxTemperature
注意:以下内容在2.x版本与1.x版本同样适用,已在2.4.1与1.2.0进行测试。
一、前期准备
1、创建伪分布Hadoop环境,请参考官方文档。或者http://blog.csdn.net/jediael_lu/article/details/38637277
2、准备数据文件如下sample.txt:
123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356
123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456
123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456
123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456
123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456
123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456
123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456
二、编写代码
1、创建Map
package org.jediael.hadoopDemo.maxtemperature; import java.io.IOException; import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper; public class MaxTemperatureMapper extends
Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999; @Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15, 19);
int airTemperature;
if (line.charAt(87) == '+') { // parseInt doesn't like leading plus
// signs
airTemperature = Integer.parseInt(line.substring(88, 92));
} else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}
String quality = line.substring(92, 93);
if (airTemperature != MISSING && quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
2、创建Reduce
package org.jediael.hadoopDemo.maxtemperature; import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer; public class MaxTemperatureReducer extends
Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
3、创建main方法
package org.jediael.hadoopDemo.maxtemperature; import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class MaxTemperature {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err
.println("Usage: MaxTemperature <input path> <output path>");
System.exit(-1);
}
Job job = new Job();
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
4、导出成MaxTemp.jar,并上传至运行程序的服务器。
三、运行程序
1、创建input目录并将sample.txt复制到input目录
hadoop fs -put sample.txt /
2、运行程序
export HADOOP_CLASSPATH=MaxTemp.jar
hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
注意输出目录不能已经存在,否则会创建失败。
3、查看结果
(1)查看结果
[jediael@jediael44 code]$ hadoop fs -cat output10/*
14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1901 42
1902 212
1903 412
1904 32
1905 102
(2)运行时输出
[jediael@jediael44 code]$ hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
14/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:1
14/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_0001
14/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_0001
14/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/
14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_0001
14/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false
14/07/09 14:50:57 INFO mapreduce.Job: map 0% reduce 0%
14/07/09 14:51:05 INFO mapreduce.Job: map 100% reduce 0%
14/07/09 14:51:15 INFO mapreduce.Job: map 100% reduce 100%
14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully
14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=94
FILE: Number of bytes written=185387
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1051
HDFS: Number of bytes written=43
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5812
Total time spent by all reduces in occupied slots (ms)=7023
Total time spent by all map tasks (ms)=5812
Total time spent by all reduce tasks (ms)=7023
Total vcore-seconds taken by all map tasks=5812
Total vcore-seconds taken by all reduce tasks=7023
Total megabyte-seconds taken by all map tasks=5951488
Total megabyte-seconds taken by all reduce tasks=7191552
Map-Reduce Framework
Map input records=9
Map output records=8
Map output bytes=72
Map output materialized bytes=94
Input split bytes=97
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=94
Reduce input records=8
Reduce output records=5
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=154
CPU time spent (ms)=1450
Physical memory (bytes) snapshot=303112192
Virtual memory (bytes) snapshot=1685733376
Total committed heap usage (bytes)=136515584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=954
File Output Format Counters
Bytes Written=43
Hadoop2.4.1入门实例:MaxTemperature的更多相关文章
- React 入门实例教程(转载)
本人转载自: React 入门实例教程
- struts入门实例
入门实例 1 .下载struts-2.3.16.3-all .不摆了.看哈就会下载了. 2 . 解压 后 找到 apps 文件夹. 3. 打开后将 struts2-blank.war ...
- Vue.js2.0从入门到放弃---入门实例
最近,vue.js越来越火.在这样的大浪潮下,我也开始进入vue的学习行列中,在网上也搜了很多教程,按着教程来做,也总会出现这样那样的问题(坑啊,由于网上那些教程都是Vue.js 1.x版本的,现在用 ...
- wxPython中文教程入门实例
这篇文章主要为大家分享下python编程中有关wxPython的中文教程,分享一些wxPython入门实例,有需要的朋友参考下 wxPython中文教程入门实例 wx.Window 是一个基类 ...
- Omnet++ 4.0 入门实例教程
http://blog.sina.com.cn/s/blog_8a2bb17d01018npf.html 在网上找到的一个讲解omnet++的实例, 是4.0下面实现的. 我在4.2上试了试,可以用. ...
- Spring中IoC的入门实例
Spring中IoC的入门实例 Spring的模块化是很强的,各个功能模块都是独立的,我们可以选择的使用.这一章先从Spring的IoC开始.所谓IoC就是一个用XML来定义生成对象的模式,我们看看如 ...
- Node.js入门实例程序
在使用Node.js创建实际“Hello, World!”应用程序之前,让我们看看Node.js的应用程序的部分.Node.js应用程序由以下三个重要组成部分: 导入需要模块: 我们使用require ...
- Java AIO 入门实例(转)
Java7 AIO入门实例,首先是服务端实现: 服务端代码 SimpleServer: public class SimpleServer { public SimpleServer(int port ...
- Akka入门实例
Akka入门实例 Akka 是一个用 Scala 编写的库,用于简化编写容错的.高可伸缩性的 Java 和 Scala 的 Actor 模型应用. Actor模型并非什么新鲜事物,它由Carl Hew ...
随机推荐
- [转]shell中 source命令即点空格后面再跟可执行文件的说明
这里记录的是在一个shell脚本里面使用. ./file.sh 和./file.sh 的区别,本文参考了http://www.lslnet.com/linux/dosc1/39/linux-28353 ...
- Python学习笔记九-文件读写
1,读取文件: f=open('目录','读写模式',encoding='gbk,error='egiong') 后三项可以不写但是默认是' r'读模式:open函数打开的文件对象会自动加上read( ...
- 可以让javascript加快的脚本(收藏了)
<?php ob_start('ob_gzhandler'); header("Cache-Control: public"); h ...
- 《Programming WPF》翻译 第7章 5.可视化层编程
原文:<Programming WPF>翻译 第7章 5.可视化层编程 形状元素能提供一种便利的方式与图形一起工作,在一些情形中,添加表示绘图的元素到UI树中,可能是比它的价值更加麻烦.你 ...
- poj 3321
题目链接 题意:一开始1-n都有苹果,Q查询以x为根下存在多少. 树状数组+DFS+队列转换 这题纠结了2天,一开始一点思路都没有,看大神都是吧树状数组转换成队列来做 看了好久都不知道怎么转换的, 解 ...
- UML--建模
建模公式 这种精华的东西,一定是值得研读和实践的! myself:人,事,物,规则. 人,业务主角.业务工人.参与者.如果应用到教务系统中,就是管理员,主任,老师的关系. 事,业务用例,系统用例. 物 ...
- 深入浅出Node.js (4) - 异步编程
4.1 函数式编程 4.1.1 高阶函数 4.1.2 偏函数用法 4.2 异步编程的优势与难点 4.2.1 优势 4.2.2 难点 4.3 异步编程解决方案 4.3.1 事件发布/订阅模式 4.3.2 ...
- PHP与C++的不同
由于工作需要,需要学习一下PHP,由于3年的C++背景,在刚开始学习PHP的过程中,有些不习惯,经过一段时间的学习,总结了一些PHP与C++的不同. 1.应用场景 在谈两种语言不同的时候,首先需要了解 ...
- 精确覆盖DLX算法模板另一种写法
代码 struct DLX { int n,id; int L[maxn],R[maxn],U[maxn],D[maxn]; ]; int H[ms]; ) //传列长 { n=nn; ;i<= ...
- errno的基本用法
error是一个包含在 perror()和strerrot()函数可以把errno的值转化为有意义的字符输出. #include <stdio.h> #include <stdlib ...