Writing Your First MapReduce Program

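Since we started studying the system, we have not really written any code yet, and some of you aspiring Hadoop engineers are probably getting impatient. The environment is set up, and Xiaojiang is itching to get going too. Today, Xiaojiang will work with you to write our first MapReduce program.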

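As Xiaojiang has said before, the core of a Hadoop program is the Mapper class, the Reducer class, and the run() method. Much of the time you can simply follow an existing pattern, so today we will take the basic Hadoop program template as our model and write a MapReduce program from it.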

Hadoop Program Template (the Model)

Data source: weather data from hundreds to thousands of weather stations across the United States. A few sample lines from one station:

1985 07 31 02   200    94 10137   220    26     1     0 -9999
1985 07 31 03   172    94 10142   240     0     0     0 -9999
1985 07 31 04   156    83 10148   260    10     0     0 -9999
1985 07 31 05   133    78 -9999   250     0 -9999     0 -9999
1985 07 31 06   122    72 -9999    90     0 -9999     0     0
1985 07 31 07   117    67 -9999    60     0 -9999     0 -9999
1985 07 31 08   111    61 -9999    90     0 -9999     0 -9999
1985 07 31 09   111    61 -9999    60     5 -9999     0 -9999
1985 07 31 10   106    67 -9999    80     0 -9999     0 -9999
1985 07 31 11   100    56 -9999    50     5 -9999     0 -9999
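Each record is fixed-width, so the temperature can be read by character position rather than by splitting on spaces. The following is a minimal sketch in plain Java (no Hadoop needed) of that step, using the same column range and the same -9999 missing-value marker as the mapper in the template. The class name `RecordParser`, the method `parseTemperature`, and the second record in `main` (with a missing temperature) are made up for illustration.

```java
import java.util.OptionalInt;

/** Sketch: extract the temperature from one fixed-width weather record. */
class RecordParser {
    // Marker the sample data uses for a missing reading.
    static final int MISSING = -9999;

    /** Returns the temperature field, or empty if the reading is missing. */
    static OptionalInt parseTemperature(String line) {
        // Characters 14..18 (0-based) hold the temperature, right-aligned
        // and padded with spaces, so trim before parsing.
        int value = Integer.parseInt(line.substring(14, 19).trim());
        return value == MISSING ? OptionalInt.empty() : OptionalInt.of(value);
    }

    public static void main(String[] args) {
        // First record from the sample data above.
        String valid = "1985 07 31 02   200    94 10137   220    26     1     0 -9999";
        // Hypothetical record whose temperature field is missing.
        String missing = "1985 07 31 12 -9999    56 -9999    50     5 -9999     0 -9999";
        System.out.println(parseTemperature(valid));
        System.out.println(parseTemperature(missing));
    }
}
```

Returning an `OptionalInt` keeps the "skip missing readings" decision with the caller, which mirrors the `if (temperature != -9999)` check in the mapper.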

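Requirement: based on this data, compute the average temperature over 30 years for each US weather station. Part of the output looks like this: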

03103    82        // 03103 is the weather station ID; 82 is the average temperature (Fahrenheit)
03812    128
03813    178
03816    143
03820    173
03822    189
03856    160
03860    130
03870    156
03872    108
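Before looking at the Hadoop code, it can help to see the same computation on a single machine. The sketch below, in plain Java, imitates the three stages: the map stage emits (station ID, temperature) pairs, the framework groups the pairs by key, and the reduce stage averages each group. The class name `LocalAverage` and the pairing of station IDs with temperatures in `main` are made up for illustration; this is not how Hadoop itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: group (stationId, temperature) pairs by station and average them. */
class LocalAverage {
    static Map<String, Integer> average(List<Map.Entry<String, Integer>> pairs) {
        // "Shuffle": collect all values that share a key, sorted by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // "Reduce": one output per key. Integer division truncates the mean,
        // just like sum / count on int values in the template's reducer.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            long sum = 0;
            for (int temperature : group.getValue()) {
                sum += temperature;
            }
            result.put(group.getKey(), (int) (sum / group.getValue().size()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative pairs, as a mapper might emit them.
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("03103", 200),
                Map.entry("03103", 172),
                Map.entry("03812", 156));
        System.out.println(average(pairs)); // prints {03103=186, 03812=156}
    }
}
```

Because the division is done on integers, any fractional part of the mean is dropped rather than rounded.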

Hadoop template program:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Computes the 30-year average temperature for each US weather station.
 */
public class Temperature extends Configured implements Tool {

    public static class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Sample record: 1985 07 31 02   200    94 10137   220    26     1     0 -9999
            String line = value.toString(); // one line of input
            int temperature = Integer.parseInt(line.substring(14, 19).trim()); // temperature field
            if (temperature != -9999) { // skip missing readings
                FileSplit fileSplit = (FileSplit) context.getInputSplit();
                // Take the weather station ID from the input file name
                String weatherStationId = fileSplit.getPath().getName().substring(5, 10);
                // Map output: key = station ID, value = temperature
                context.write(new Text(weatherStationId), new IntWritable(temperature));
            }
        }
    }

    public static class TemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            // Loop over values, summing all temperatures for the same station
            for (IntWritable val : values) {
                sum += val.get();
                count++;
            }
            // Average for this station (integer division truncates)
            result.set(sum / count);
            // Reduce output: key = weatherStationId, value = mean(temperature)
            context.write(key, result);
        }
    }

    /**
     * Job driver: configures and submits the job.
     *
     * @param args args[0] is the input path, args[1] is the output path
     * @return 0 if the job succeeds, 1 otherwise
     * @throws Exception if the job cannot be configured or submitted
     */
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // configuration passed in by ToolRunner
        Path outputPath = new Path(args[1]);
        FileSystem hdfs = outputPath.getFileSystem(conf);
        if (hdfs.isDirectory(outputPath)) { // delete the output directory if it already exists
            hdfs.delete(outputPath, true);
        }
        Job job = Job.getInstance(conf, "temperature"); // create the job
        job.setJarByClass(Temperature.class); // main class
        FileInputFormat.addInputPath(job, new Path(args[0])); // input path
        FileOutputFormat.setOutputPath(job, outputPath); // output path
        job.setMapperClass(TemperatureMapper.class); // Mapper
        job.setReducerClass(TemperatureReducer.class); // Reducer
        job.setOutputKeyClass(Text.class); // output key type
        job.setOutputValueClass(IntWritable.class); // output value type
        // Submit the job, wait for it to finish, and report failure through the exit code
        return job.waitForCompletion(true) ? 0 : 1;
    }

    /**
     * Entry point. The input and output paths are hard-coded here, so the
     * command-line arguments are ignored.
     *
     * @param args command-line arguments (unused)
     * @throws Exception if the job fails to run
     */
    public static void main(String[] args) throws Exception {
        String[] args0 = {
                "hdfs://single.hadoop.dajiangtai.com:9000/weather/",
                "hdfs://single.hadoop.dajiangtai.com:9000/weather/out/"
        };
        int ec = ToolRunner.run(new Configuration(), new Temperature(), args0);
        System.exit(ec);
    }

}
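
The mapper reads the weather station ID from characters 5 through 9 of the input file's name rather than from the record itself. The article does not show an actual file name, so the sketch below assumes a hypothetical name, `30yr_03103.dat`, purely to illustrate how `substring(5, 10)` picks out a five-character ID. The class `StationId` and its length check are also illustrative additions, not part of the template.

```java
/** Sketch: extract a five-character station ID from a file name. */
class StationId {
    static String fromFileName(String fileName) {
        // substring(5, 10) needs at least 10 characters; fail with a clear
        // message instead of a StringIndexOutOfBoundsException.
        if (fileName.length() < 10) {
            throw new IllegalArgumentException("File name too short: " + fileName);
        }
        return fileName.substring(5, 10);
    }

    public static void main(String[] args) {
        // Hypothetical file name; the real naming scheme is not given in the article.
        System.out.println(fromFileName("30yr_03103.dat")); // prints 03103
    }
}
```

If the input files follow a different naming scheme, the two offsets in the mapper need to change accordingly.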
