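A while ago I took the MapReduce training course offered by 炼数成金 (Dataguru). The homework examples from the course are quite representative and work very well for explaining the typical problems.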

There is also a textbook on MR by foreign authors that I found quite useful.

I. MapReduce use cases

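What kinds of problems can MR solve? Generally speaking, the most common uses are log analysis and sorting or processing massive data sets. Recently our company has been using MR for offline, parallel analysis of large volumes of logs.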

II. How MapReduce works

If you are not yet familiar with how MR works, I recommend reading this blog post first: http://blog.csdn.net/athenaer/article/details/8203990
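The linked post covers the details. As a rough picture of the data flow only, the sketch below imitates the three phases of a job (map, shuffle/sort, reduce) inside a single JVM with plain collections, using a word count over a few made-up log lines. It is not how Hadoop is implemented; the class name MiniWordCount and the sample lines are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the MapReduce data flow (word count).
class MiniWordCount {

    static Map<String, Integer> run(List<String> lines) {
        // 1. Map: every input line becomes a list of (word, 1) pairs
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        // 2. Shuffle/sort: group the values by key; a TreeMap keeps the keys sorted, as Hadoop does
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : pairs) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // 3. Reduce: one call per key, with all of that key's values
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) {
                sum += v;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> logLines = Arrays.asList("error disk full", "warn disk slow", "error net down");
        System.out.println(run(logLines)); // {disk=2, down=1, error=2, full=1, net=1, slow=1, warn=1}
    }
}
```

In a real job the three phases run on different machines, and only the shuffle moves data between them.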

III. Common computation models

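Here is an example based on the DEPT and EMP tables that come with Oracle's default user SCOTT. For convenience, the two tables are written out directly as two TXT files: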

1. Department table (dept.txt)

DEPTNO,DNAME,LOC // department number, department name, location
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON

2. Employee table (emp.txt)

EMPNO,ENAME,JOB,HIREDATE,SAL,COMM,DEPTNO,MGR // employee number, name, job title, hire date, salary, commission, department, manager
7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902
7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698
7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698
7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839
7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698
7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839
7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839
7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,
7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698
7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698
7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566
7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782

3. Mapping each line to a bean

Both beans simply split the incoming comma-separated line and extract the corresponding fields from it.

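Emp.java

public class Emp {

    private String empno, ename, job, hiredate, sal, comm, deptno, mgr;

    public Emp(String inStr) {
        // A limit of -1 keeps trailing empty fields, so a line ending in "," (KING has no MGR)
        // still yields 8 elements instead of 7
        String[] split = inStr.split(",", -1);
        this.empno = split[0].trim();
        this.ename = split[1].trim();
        this.job = split[2].trim();
        this.hiredate = split[3].trim();
        this.sal = split[4].trim().isEmpty() ? "0" : split[4].trim(); // a missing salary counts as 0
        this.comm = split[5].trim();                                   // "" when there is no commission
        this.deptno = split[6].trim();
        this.mgr = split.length > 7 ? split[7].trim() : "";           // guard against a truncated line
    }

    public String getEmpno() { return empno; }
    public String getEname() { return ename; }
    public String getJob() { return job; }
    public String getHiredate() { return hiredate; }
    public String getSal() { return sal; }
    public String getComm() { return comm; }
    public String getDeptno() { return deptno; }
    public String getMgr() { return mgr; }
}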

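Dept.java

public class Dept {

    private String deptno, dname, loc;

    public Dept(String string) {
        String[] split = string.split(",");
        this.deptno = split[0].trim();
        this.dname = split[1].trim();
        this.loc = split[2].trim();   // e.g. "NEW YORK"; the space inside the value is kept
    }

    public String getDeptno() { return deptno; }
    public String getDname() { return dname; }
    public String getLoc() { return loc; }
}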
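The special handling of the last field in Emp comes from how String.split behaves: with the one-argument form, trailing empty strings are discarded, so KING's row, which ends in a comma because he has no MGR, splits into 7 elements instead of 8, and reading element 7 throws. Passing a negative limit keeps the trailing empty field. The small demo below shows this, together with trimming CLARK's blank COMM; SplitDemo and field are hypothetical names, not part of the original code.

```java
import java.util.Arrays;

// Hypothetical demo of why the MGR field of KING's row needs special handling.
class SplitDemo {
    static final String KING = "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,";
    static final String CLARK = "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839";

    // Returns field `index` of a comma-separated line, trimmed, or "" if the field is absent.
    // The limit -1 tells String.split to keep trailing empty strings.
    static String field(String line, int index) {
        String[] parts = line.split(",", -1);
        return index < parts.length ? parts[index].trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(KING.split(",").length);      // 7: the trailing empty MGR is dropped
        System.out.println(KING.split(",", -1).length);  // 8: the empty MGR is kept
        System.out.println(Arrays.toString(KING.split(",", -1)));
        System.out.println("[" + field(CLARK, 5) + "]"); // []: CLARK's blank COMM trims to ""
    }
}
```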

4. Model analysis

4.1 Sum

Compute the total salary of each department.

// All jobs below use the old org.apache.hadoop.mapred API (MapReduceBase, Mapper, Reducer,
// OutputCollector, Reporter). ErrCount and WriteErrLine are the author's own helpers, not shown in
// the post: a counter enum and, apparently, a small utility that records skipped lines.
public static class Map_1 extends MapReduceBase implements Mapper<Object, Text, Text, IntWritable> {
    public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        try {
            Emp emp = new Emp(value.toString());
            // { k = department number, v = employee salary }
            output.collect(new Text(emp.getDeptno()), new IntWritable(Integer.parseInt(emp.getSal())));
        } catch (Exception e) {
            // Unparseable line: count and record it instead of failing the task
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_1 extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();   // add up every salary in this department
        }
        output.collect(key, new IntWritable(sum));
    }
}

[Figure: execution result]
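To sanity-check what the sum job should produce without a cluster, the same grouping can be reproduced in plain Java on the sample EMP rows. The sketch below is only a local stand-in for Map_1/Reduce_1 (the class name DeptSalarySum is hypothetical). Since addition is associative and commutative, Reduce_1 could presumably also be registered as a combiner with JobConf.setCombinerClass to pre-aggregate salaries on the map side; that driver code is not part of the original post.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for Map_1/Reduce_1: total salary per department.
class DeptSalarySum {
    static final String[] EMP = {
        "7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902",
        "7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698",
        "7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698",
        "7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839",
        "7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698",
        "7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839",
        "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839",
        "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,",
        "7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698",
        "7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698",
        "7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566",
        "7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782"
    };

    static Map<String, Integer> sumByDept(String[] lines) {
        Map<String, Integer> sums = new TreeMap<>();   // sorted by DEPTNO, like the keys a reducer sees
        for (String line : lines) {
            String[] f = line.split(",", -1);
            // "map" emits (DEPTNO, SAL); merge plays the reducer and adds it to that department's total
            sums.merge(f[6], Integer.parseInt(f[4].trim()), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(sumByDept(EMP)); // {10=8750, 20=6775, 30=9400}
    }
}
```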

4.3 Average

Compute the headcount and average salary of each department.

public static class Map_2 extends MapReduceBase implements Mapper<Object, Text, Text, IntWritable> {
    public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        try {
            Emp emp = new Emp(value.toString());
            // { k = department number, v = salary }
            output.collect(new Text(emp.getDeptno()), new IntWritable(Integer.parseInt(emp.getSal())));
        } catch (Exception e) {
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_2 extends MapReduceBase implements Reducer<Text, IntWritable, Text, Text> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        double sum = 0;  // total salary of the department
        int count = 0;   // number of employees
        while (values.hasNext()) {
            count++;
            sum += values.next().get();
        }
        // Output: "<headcount> <average salary>"
        output.collect(key, new Text(count + " " + sum / count));
    }
}

[Figure: execution result]
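One caution when tuning this job: unlike Reduce_1, Reduce_2 cannot simply be reused as a combiner, because an average of partial averages differs from the overall average when the partial groups have different sizes. The usual remedy is for partial results to carry (sum, count) pairs and to divide only once in the final reducer. The sketch below illustrates the difference with department 30's salaries split across two hypothetical map tasks; all names in it are made up for illustration.

```java
// Hypothetical illustration: why averaging must carry (sum, count) through a combiner.
class AverageCombinerDemo {

    // What a combiner would emit if it reused the averaging reducer: one partial average.
    static double partialAverage(int[] sals) {
        long sum = 0;
        for (int s : sals) sum += s;
        return (double) sum / sals.length;
    }

    // Combiner-safe partial result: {sum, count}.
    static long[] partialSumCount(int[] sals) {
        long sum = 0;
        for (int s : sals) sum += s;
        return new long[] {sum, sals.length};
    }

    // Final reducer: merges the partial (sum, count) pairs and divides once.
    static double finalAverage(long[]... partials) {
        long sum = 0, count = 0;
        for (long[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        int[] partA = {1600, 1250};             // dept 30 salaries seen by one map task
        int[] partB = {1250, 2850, 1500, 950};  // ...and by another
        double wrong = (partialAverage(partA) + partialAverage(partB)) / 2;
        double right = finalAverage(partialSumCount(partA), partialSumCount(partB));
        System.out.println(wrong);  // 1531.25
        System.out.println(right);  // about 1566.67, i.e. 9400 / 6
    }
}
```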

4.4 Sorting within groups

Find the name of the earliest-hired employee in each department.

public static class Map_3 extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        try {
            Emp emp = new Emp(value.toString());
            // { k = department number, v = hire date + "~" + employee name }
            output.collect(new Text(emp.getDeptno()), new Text(emp.getHiredate() + "~" + emp.getEname()));
        } catch (Exception e) {
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_3 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // An explicit pattern instead of DateFormat.getDateInstance(), whose format depends on the JVM locale
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        Date minDate = null;    // earliest hire date seen so far in this department
        String minName = null;  // the employee hired on minDate
        while (values.hasNext()) {
            String[] strings = values.next().toString().split("~"); // [hire date, name]
            try {
                Date d = sdf.parse(strings[0].substring(0, 10));    // keep only "yyyy-MM-dd"
                if (minDate == null || d.before(minDate)) {
                    minDate = d;
                    minName = strings[1];  // remember the name with its date, not the last value seen
                }
            } catch (Exception e) {
                reporter.getCounter(ErrCount.LINESKIP).increment(1);  // malformed value: skip it
            }
        }
        if (minDate != null) {
            output.collect(key, new Text(sdf.format(minDate) + " " + minName));
        }
    }
}

[Figure: execution result]
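The crucial detail in this job is that the employee name has to be stored together with the earliest date, rather than taken from whichever value the iterator returned last. A minimal local sketch of the same logic, assuming Java 8's java.time is available (the class name EarliestHire is hypothetical); since HIREDATE begins with a yyyy-MM-dd prefix, LocalDate.parse can read its first 10 characters directly:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for Map_3/Reduce_3: earliest hire per department.
class EarliestHire {
    static final String[] EMP = {
        "7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902",
        "7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698",
        "7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698",
        "7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839",
        "7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698",
        "7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839",
        "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839",
        "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,",
        "7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698",
        "7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698",
        "7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566",
        "7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782"
    };

    // Returns DEPTNO -> "yyyy-MM-dd NAME" of the earliest hire in that department.
    static Map<String, String> earliestByDept(String[] lines) {
        Map<String, LocalDate> minDate = new TreeMap<>();
        Map<String, String> minName = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            String dept = f[6];
            LocalDate hired = LocalDate.parse(f[3].substring(0, 10)); // "1980-12-17 00:00:00.0" -> 1980-12-17
            LocalDate current = minDate.get(dept);
            if (current == null || hired.isBefore(current)) {
                minDate.put(dept, hired);
                minName.put(dept, f[1]);  // the name that belongs to this date
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (String dept : minDate.keySet()) {
            result.put(dept, minDate.get(dept) + " " + minName.get(dept));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(earliestByDept(EMP)); // {10=1981-06-09 CLARK, 20=1980-12-17 SMITH, 30=1981-02-20 ALLEN}
    }
}
```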

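4.5 Multi-table join

Compute the total salary of the employees in each city. Note that the reducer runs once per DEPTNO, so if two departments were located in the same city, the result would contain one line per department rather than one per city.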

public static class Map_4 extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        try {
            // Both files are inputs of the same job; the file name tells which table a line comes from
            String fileName = ((FileSplit) reporter.getInputSplit()).getPath().getName();
            if (fileName.equalsIgnoreCase("emp.txt")) {
                Emp emp = new Emp(value.toString());
                // Tag A# = EMP row: { k = DEPTNO, v = "A#" + salary }; parsing here rejects bad lines early
                output.collect(new Text(emp.getDeptno()), new Text("A#" + Integer.parseInt(emp.getSal())));
            }
            if (fileName.equalsIgnoreCase("dept.txt")) {
                Dept dept = new Dept(value.toString());
                // Tag B# = DEPT row: { k = DEPTNO, v = "B#" + location }
                output.collect(new Text(dept.getDeptno()), new Text("B#" + dept.getLoc()));
            }
        } catch (Exception e) {
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_4 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        List<String> empList = new ArrayList<String>();   // salaries from the EMP table
        List<String> deptList = new ArrayList<String>();  // locations from the DEPT table
        while (values.hasNext()) {
            String deptV = values.next().toString();
            if (deptV.startsWith("A#")) {
                empList.add(deptV.substring(2));
            } else if (deptV.startsWith("B#")) {
                deptList.add(deptV.substring(2));
            }
        }
        for (String location : deptList) {
            double sumSal = 0;  // reset for every location so totals never carry over
            for (String salary : empList) {
                sumSal += Integer.parseInt(salary);  // total salary of the employees in this city
            }
            output.collect(new Text(location), new Text(Double.toString(sumSal)));
        }
    }
}

[Figure: execution result]
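Map_4/Reduce_4 follow the usual reduce-side join pattern: every record is tagged with its source table (A# for EMP, B# for DEPT), the shuffle brings all records with the same DEPTNO together, and the reducer separates and matches them. The local sketch below imitates those steps; as an assumption beyond the original job, it also merges the per-department totals by city, so two departments in the same city would end up on a single line. Department 40 (BOSTON) has no employees, so its total is 0. The class name ReduceSideJoin is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local imitation of a reduce-side join: total salary per city.
class ReduceSideJoin {
    static final String[] DEPT = {"10,ACCOUNTING,NEW YORK", "20,RESEARCH,DALLAS", "30,SALES,CHICAGO", "40,OPERATIONS,BOSTON"};
    static final String[] EMP = {
        "7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902",
        "7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698",
        "7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698",
        "7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839",
        "7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698",
        "7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839",
        "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839",
        "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,",
        "7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698",
        "7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698",
        "7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566",
        "7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782"
    };

    static Map<String, Integer> salaryByCity(String[] emp, String[] dept) {
        // "Map" + shuffle: tag every record with its source table and group by DEPTNO
        Map<String, List<String>> byDept = new TreeMap<>();
        for (String line : emp) {
            String[] f = line.split(",", -1);
            byDept.computeIfAbsent(f[6], k -> new ArrayList<>()).add("A#" + f[4].trim());
        }
        for (String line : dept) {
            String[] f = line.split(",", -1);
            byDept.computeIfAbsent(f[0], k -> new ArrayList<>()).add("B#" + f[2]);
        }
        // "Reduce": per DEPTNO, split the tagged values back into the two tables and join them
        Map<String, Integer> byCity = new TreeMap<>();
        for (List<String> values : byDept.values()) {
            List<String> locations = new ArrayList<>();
            int deptTotal = 0;
            for (String v : values) {
                if (v.startsWith("A#")) {
                    deptTotal += Integer.parseInt(v.substring(2));
                } else if (v.startsWith("B#")) {
                    locations.add(v.substring(2));
                }
            }
            for (String city : locations) {
                byCity.merge(city, deptTotal, Integer::sum);  // merges departments located in the same city
            }
        }
        return byCity;
    }

    public static void main(String[] args) {
        System.out.println(salaryByCity(EMP, DEPT)); // {BOSTON=0, CHICAGO=9400, DALLAS=6775, NEW YORK=8750}
    }
}
```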

4.6 Single-table (self) join

Find the names and salaries of employees who earn more than their manager.

public static class Map_5 extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        try {
            Emp emp = new Emp(value.toString());
            int sal = Integer.parseInt(emp.getSal());  // rejects malformed lines here instead of in the reducer
            // "Employee table": { k = the manager's EMPNO, v = employee name~salary }
            output.collect(new Text(emp.getMgr()), new Text("A#" + emp.getEname() + "~" + sal));
            // "Manager table": { k = the employee's own EMPNO, v = employee name~salary }
            output.collect(new Text(emp.getEmpno()), new Text("B#" + emp.getEname() + "~" + sal));
        } catch (Exception e) {
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_5 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // key = an EMPNO; empList holds that person's direct reports, mgrList the person him/herself
        List<String> empList = new ArrayList<String>();
        List<String> mgrList = new ArrayList<String>();
        while (values.hasNext()) {
            String value = values.next().toString();
            if (value.startsWith("A#")) {
                empList.add(value.substring(2));
            } else if (value.startsWith("B#")) {
                mgrList.add(value.substring(2));
            }
        }
        for (String mgr : mgrList) {
            int mgrSal = Integer.parseInt(mgr.split("~")[1]);
            for (String employee : empList) {
                String[] empInfo = employee.split("~");  // [name, salary]
                if (Integer.parseInt(empInfo[1]) > mgrSal) {
                    output.collect(key, new Text(empInfo[0] + " " + empInfo[1]));
                }
            }
        }
    }
}

[Figure: execution result]
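The trick in Map_5 is that each employee is emitted twice: once under the manager's EMPNO (tag A#, the employee as a subordinate) and once under the employee's own EMPNO (tag B#, the employee as a potential manager). Each reduce group therefore contains one manager plus that manager's direct reports. The local sketch below reproduces this with hypothetical names; on the sample data, the only match is FORD (3000), whose manager JONES earns 2975.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local imitation of the self-join in Map_5/Reduce_5.
class SelfJoin {
    static final String[] EMP = {
        "7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902",
        "7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698",
        "7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698",
        "7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839",
        "7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698",
        "7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839",
        "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839",
        "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,",
        "7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698",
        "7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698",
        "7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566",
        "7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782"
    };

    // Returns "ENAME SAL" for every employee who earns more than his or her direct manager.
    static List<String> earnsMoreThanManager(String[] lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            String nameSal = f[1] + "~" + f[4].trim();
            groups.computeIfAbsent(f[7], k -> new ArrayList<>()).add("A#" + nameSal); // keyed by manager's EMPNO
            groups.computeIfAbsent(f[0], k -> new ArrayList<>()).add("B#" + nameSal); // keyed by own EMPNO
        }
        List<String> result = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String[]> subordinates = new ArrayList<>();
            List<String[]> managers = new ArrayList<>();  // at most one record per EMPNO
            for (String v : values) {
                String[] nameAndSal = v.substring(2).split("~");
                if (v.startsWith("A#")) subordinates.add(nameAndSal); else managers.add(nameAndSal);
            }
            for (String[] m : managers) {
                for (String[] e : subordinates) {
                    if (Integer.parseInt(e[1]) > Integer.parseInt(m[1])) {
                        result.add(e[0] + " " + e[1]);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(earnsMoreThanManager(EMP)); // [FORD 3000]
    }
}
```

KING's empty MGR becomes a group with no B# record, so he is simply never compared with anyone.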

4.7 TOP N

List the names and salaries of the three highest-paid employees.

public static class Map_8 extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        try {
            Emp emp = new Emp(value.toString());
            int sal = Integer.parseInt(emp.getSal());  // rejects malformed lines here instead of in the reducer
            // { k = any constant string or number, so every record reaches the same reducer, v = name~salary }
            output.collect(new Text("1"), new Text(emp.getEname() + "~" + sal));
        } catch (Exception e) {
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_8 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    private static final int N = 3;  // how many top earners to output

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // TreeMap sorts keys in ascending order by default, so the first N keys would be the LOWEST
        // salaries; reversing the order puts the highest salary first. A list per salary keeps
        // employees with equal salaries from overwriting each other.
        TreeMap<Integer, List<String>> emp = new TreeMap<Integer, List<String>>(Collections.reverseOrder());
        while (values.hasNext()) {
            String[] valStrings = values.next().toString().split("~");  // [name, salary]
            int sal = Integer.parseInt(valStrings[1]);
            List<String> names = emp.get(sal);
            if (names == null) {
                names = new ArrayList<String>();
                emp.put(sal, names);
            }
            names.add(valStrings[0]);
        }
        int count = 0;
        for (Map.Entry<Integer, List<String>> entry : emp.entrySet()) {
            for (String name : entry.getValue()) {
                if (count >= N) {
                    return;
                }
                output.collect(new Text(name), new Text(entry.getKey().toString()));  // { name, SAL }
                count++;
            }
        }
    }
}

[Figure: execution result]
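Because every record is emitted under the same key "1", a single reducer has to receive the whole table. A commonly used alternative for larger inputs is to keep only the current N largest records in a min-heap of size N, so memory stays bounded; the same trick can be applied inside each mapper to cut down the data shuffled to the reducer. The sketch below shows that bounded-heap technique with java.util.PriorityQueue. It is a swapped-in approach rather than the TreeMap-based reducer of the original, and the class name TopNSalaries is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-N sketch using a bounded min-heap instead of a full TreeMap.
class TopNSalaries {
    static final String[] EMP = {
        "7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902",
        "7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698",
        "7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698",
        "7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839",
        "7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698",
        "7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839",
        "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839",
        "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,",
        "7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698",
        "7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698",
        "7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566",
        "7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782"
    };

    // Returns the n highest-paid employees as "ENAME SAL", highest first.
    static List<String> topN(String[] lines, int n) {
        // Min-heap ordered by salary: the root is the smallest of the current top n
        PriorityQueue<String[]> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(Integer.parseInt(a[1]), Integer.parseInt(b[1])));
        for (String line : lines) {
            String[] f = line.split(",", -1);
            heap.offer(new String[] {f[1], f[4].trim()});
            if (heap.size() > n) {
                heap.poll();  // drop the smallest so that at most n records are kept
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            String[] e = heap.poll();  // comes out in ascending salary order
            result.add(e[0] + " " + e[1]);
        }
        Collections.reverse(result);   // highest salary first
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(EMP, 3)); // [KING 5000, FORD 3000, JONES 2975]
    }
}
```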

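4.8 Descending sort

Sort all employees by total income (salary + commission) from highest to lowest, listing each employee's name and total income.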

public static class Map_9 extends MapReduceBase implements Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        try {
            Emp emp = new Emp(value.toString());
            // COMM is empty (or only spaces, as in CLARK's row) for most employees; treat that as 0,
            // otherwise Integer.parseInt throws and the employee is silently skipped
            String comm = emp.getComm().trim();
            int totalSal = Integer.parseInt(emp.getSal().trim()) + (comm.isEmpty() ? 0 : Integer.parseInt(comm));
            // { k = constant "1" (everything goes to one reducer), v = name~total income }
            output.collect(new Text("1"), new Text(emp.getEname() + "~" + totalSal));
        } catch (Exception e) {
            reporter.getCounter(ErrCount.LINESKIP).increment(1);
            WriteErrLine.write("./input/" + this.getClass().getSimpleName() + "err_lines", reporter.getCounter(ErrCount.LINESKIP).getCounter() + " " + value.toString());
        }
    }
}

public static class Reduce_9 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // Reverse the TreeMap's natural (ascending) order so the highest total comes first;
        // a list per total keeps employees with equal income from overwriting each other
        TreeMap<Integer, List<String>> emp = new TreeMap<Integer, List<String>>(Collections.reverseOrder());
        while (values.hasNext()) {
            String[] valStrings = values.next().toString().split("~");  // [name, total income]
            int total = Integer.parseInt(valStrings[1]);
            List<String> names = emp.get(total);
            if (names == null) {
                names = new ArrayList<String>();
                emp.put(total, names);
            }
            names.add(valStrings[0]);
        }
        for (Map.Entry<Integer, List<String>> entry : emp.entrySet()) {
            for (String name : entry.getValue()) {
                output.collect(new Text(name), new Text(entry.getKey().toString()));  // { name, total income }
            }
        }
    }
}

[Figure: execution result]
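Two details decide whether this job really lists everyone: COMM is empty (or, in CLARK's row, only spaces) for most employees and has to be read as 0, since Integer.parseInt would otherwise throw and the row would land in the skipped-lines counter; and employees with equal totals must not overwrite each other. The local sketch below handles both and sorts the rows with a descending comparator; TotalIncomeSort and parseOrZero are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-in for Map_9/Reduce_9: employees by total income, descending.
class TotalIncomeSort {
    static final String[] EMP = {
        "7369,SMITH,CLERK,1980-12-17 00:00:00.0,800,,20,7902",
        "7499,ALLEN,SALESMAN,1981-02-20 00:00:00.0,1600,300,30,7698",
        "7521,WARD,SALESMAN,1981-02-22 00:00:00.0,1250,500,30,7698",
        "7566,JONES,MANAGER,1981-04-02 00:00:00.0,2975,,20,7839",
        "7654,MARTIN,SALESMAN,1981-09-28 00:00:00.0,1250,1400,30,7698",
        "7698,BLAKE,MANAGER,1981-05-01 00:00:00.0,2850,,30,7839",
        "7782,CLARK,MANAGER,1981-06-09 00:00:00.0,2450,    ,10,7839",
        "7839,KING,PRESIDENT,1981-11-17 00:00:00.0,5000,,10,",
        "7844,TURNER,SALESMAN,1981-09-08 00:00:00.0,1500,0,30,7698",
        "7900,JAMES,CLERK,1981-12-03 00:00:00.0,950,,30,7698",
        "7902,FORD,ANALYST,1981-12-03 00:00:00.0,3000,,20,7566",
        "7934,MILLER,CLERK,1982-01-23 00:00:00.0,1300,,10,7782"
    };

    // One (name, total income) record.
    static final class Row {
        final String name;
        final int total;
        Row(String name, int total) { this.name = name; this.total = total; }
    }

    // Parses a numeric field, treating an empty or blank value (e.g. a missing COMM) as 0.
    static int parseOrZero(String s) {
        String t = s.trim();
        return t.isEmpty() ? 0 : Integer.parseInt(t);
    }

    // Returns "ENAME TOTAL" for every employee, highest total income (SAL + COMM) first.
    static List<String> byTotalIncomeDesc(String[] lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            rows.add(new Row(f[1], parseOrZero(f[4]) + parseOrZero(f[5])));
        }
        // Descending by total; List.sort is stable, so equal totals keep their input order
        rows.sort((a, b) -> Integer.compare(b.total, a.total));
        List<String> result = new ArrayList<>();
        for (Row r : rows) {
            result.add(r.name + " " + r.total);
        }
        return result;
    }

    public static void main(String[] args) {
        byTotalIncomeDesc(EMP).forEach(System.out::println);  // KING 5000 first, SMITH 800 last
    }
}
```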

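IV. Summary

Rewriting the computation models commonly used in SQL as MR jobs is fairly tedious: in many cases, what a single line of SQL expresses takes a dozen or even several dozen lines of code, which feels clumsy. In terms of processing speed on large data volumes, however, MR, which spreads the work over many machines in parallel, is in a different league from SQL.

Still, there is no denying that every technology has its own range of applicability, and MR is not a cure-all. Which technology to choose depends on the specific use case.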
