Source: http://www.cnblogs.com/lucius/p/3442381.html

examples:

Overview

This document explains how to write unit tests for your MapReduce code, so that you can test your mapper and reducer logic on your desktop without setting up a Hadoop environment.

Let's look at some code

For testing your map and reduce logic, we will need 4 blocks of code: Mapper code, Reducer code, Driver code, and finally the Unit Testing code.

Sample Mapper

In our sample Mapper code, we simply count the frequency of words, emitting <word, 1> for each word found.

package com.kodkast.analytics;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;

public class UnitTestDemoMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {

    public static final Logger Log = Logger.getLogger(UnitTestDemoMapper.class.getName());
    private static final Text one = new Text("1");

    @Override
    public void configure(JobConf conf) {
        // mapper initialization code, if needed
    }

    // Emits <word, "1"> for every space-separated word in the input line.
    @Override
    public void map(Object key, Text value, OutputCollector<Text, Text> collector, Reporter rep) throws IOException {
        // An IOException from collect() is allowed to propagate instead of being
        // swallowed, so a failed write fails the task (and the unit test).
        String[] words = processInput(value.toString());
        for (String word : words) {
            collector.collect(new Text(word), one);
        }
    }

    // Splits the input line on single spaces.
    private String[] processInput(String input) {
        return input.split(" ");
    }
}

Sample Reducer

In our sample Reducer code, we simply add up all the word counts and emit the final result as <word, totalFrequency> for each word.

package com.kodkast.analytics;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class UnitTestDemoReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

    // Sums the per-word counts (stored as Text) and emits <word, total>.
    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        int count = 0;
        while (values.hasNext()) {
            count += Integer.parseInt(values.next().toString());
        }
        output.collect(key, new Text(String.valueOf(count)));
    }
}
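
To see what the mapper and reducer compute together, here is a minimal plain-Java sketch of the same word-count flow with no Hadoop types involved. The class name WordCountSketch and its methods are hypothetical and exist only for illustration; the grouping step stands in for Hadoop's shuffle under the assumption that all values for one key reach a single reduce call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of the original project.
public class WordCountSketch {

    // "Map" step: emit one (word, "1") pair per space-separated token.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add(new String[] {word, "1"});
        }
        return pairs;
    }

    // "Reduce" step: parse every value as an int and return the sum as a string.
    static String reduce(List<String> values) {
        int count = 0;
        for (String value : values) {
            count += Integer.parseInt(value);
        }
        return String.valueOf(count);
    }

    // Groups mapper output by key (a stand-in for the shuffle), then reduces each group.
    static Map<String, String> run(String line) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] pair : map(line)) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        Map<String, String> totals = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {Hadoop=1, is=2, nice=2, and=1, Java=1, also=1, very=1}
        System.out.println(run("Hadoop is nice and Java is also very nice"));
    }
}
```

The mapper emits "is" and "nice" twice each, which is why the reducer reports a total of 2 for both.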

Sample Driver

Simple invocation of Mapper and Reducer code.

package com.kodkast.analytics;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class UnitTestDemo {

    // args[0] is the input path, args[1] is the output path
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(UnitTestDemo.class);
        conf.setJobName("unit-test-demo");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(UnitTestDemoMapper.class);
        conf.setReducerClass(UnitTestDemoReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

Unit Testing Class

This is the new class that we add to test our mapper and reducer logic. It uses the MRUnit framework, which is built on top of JUnit.

package com.kodkast.analytics;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class UnitTestDemoTest {

    MapDriver<Object, Text, Text, Text> mapDriver;
    ReduceDriver<Text, Text, Text, Text> reduceDriver;

    @Before
    public void setUp() {

        // create mapper and reducer objects
        UnitTestDemoMapper mapper = new UnitTestDemoMapper();
        UnitTestDemoReducer reducer = new UnitTestDemoReducer();

        // call mapper initialization code
        mapper.configure(new JobConf());

        // create mapdriver and reducedriver objects for unit testing
        mapDriver = new MapDriver<Object, Text, Text, Text>();
        mapDriver.setMapper(mapper);
        reduceDriver = new ReduceDriver<Text, Text, Text, Text>();
        reduceDriver.setReducer(reducer);
    }

    // "throws IOException" is declared because some MRUnit versions declare it on runTest()
    @Test
    public void testMapper() throws IOException {

        // prepare mapper input
        String input = "Hadoop is nice and Java is also very nice";

        // test mapper logic: one <word, "1"> pair per word, in input order
        mapDriver.withInput(new LongWritable(1), new Text(input));
        mapDriver.withOutput(new Text("Hadoop"), new Text("1"));
        mapDriver.withOutput(new Text("is"), new Text("1"));
        mapDriver.withOutput(new Text("nice"), new Text("1"));
        mapDriver.withOutput(new Text("and"), new Text("1"));
        mapDriver.withOutput(new Text("Java"), new Text("1"));
        mapDriver.withOutput(new Text("is"), new Text("1"));
        mapDriver.withOutput(new Text("also"), new Text("1"));
        mapDriver.withOutput(new Text("very"), new Text("1"));
        mapDriver.withOutput(new Text("nice"), new Text("1"));
        mapDriver.runTest();
    }

    @Test
    public void testReducer() throws IOException {

        // prepare mapper output values: the word "nice" was seen twice
        List<Text> values = Arrays.asList(new Text("1"), new Text("1"));

        // test reducer logic
        reduceDriver.withInput(new Text("nice"), values);
        reduceDriver.withOutput(new Text("nice"), new Text("2"));
        reduceDriver.runTest();
    }

}

Add unit tests for testing the MapReduce logic

Using this framework is quite straightforward, especially in our business case, so I will just show the unit test code, with comments where necessary.
The unit test for the Mapper, ‘MapperTest’:

package net.pascalalma.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class MapperTest {

    MapDriver<Text, Text, Text, Text> mapDriver;

    @Before
    public void setUp() {
        WordMapper mapper = new WordMapper();
        mapDriver = MapDriver.newMapDriver(mapper);
    }

    @Test
    public void testMapper() throws IOException {
        // the mapper is expected to pass every <key, value> pair through unchanged
        mapDriver.withInput(new Text("a"), new Text("ein"));
        mapDriver.withInput(new Text("a"), new Text("zwei"));
        mapDriver.withInput(new Text("c"), new Text("drei"));
        mapDriver.withOutput(new Text("a"), new Text("ein"));
        mapDriver.withOutput(new Text("a"), new Text("zwei"));
        mapDriver.withOutput(new Text("c"), new Text("drei"));
        mapDriver.runTest();
    }
}

This test class is actually even simpler than the Mapper implementation itself. You just define the input of the mapper and the expected output, and then let the configured MapDriver run the test. In our case the Mapper doesn’t do anything specific, but you can see how easy it is to set up a test case.
For completeness, here is the test class of the Reducer:

package net.pascalalma.hadoop;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class ReducerTest {

    ReduceDriver<Text, Text, Text, Text> reduceDriver;

    @Before
    public void setUp() {
        AllTranslationsReducer reducer = new AllTranslationsReducer();
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
    }

    @Test
    public void testReducer() throws IOException {
        // all values for key "a" should be joined into one "|"-separated string
        List<Text> values = new ArrayList<Text>();
        values.add(new Text("ein"));
        values.add(new Text("zwei"));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new Text("|ein|zwei"));
        reduceDriver.runTest();
    }
}

Debugging MapReduce Programs With MRUnit

The distributed nature of MapReduce programs makes debugging a challenge. Attaching a debugger to a remote process is cumbersome, and the lack of a single console makes it difficult to inspect what is occurring when several distributed copies of a mapper or reducer are running concurrently. Furthermore, operations that work on small amounts of input (e.g., saving the inputs to a reducer in an array) fail when running at scale, causing out-of-memory exceptions or other unintended effects.

A full discussion of how to debug MapReduce programs is beyond the scope of a single blog post, but I’d like to introduce you to a tool we designed at Cloudera to assist you with MapReduce debugging: MRUnit.

MRUnit helps bridge the gap between MapReduce programs and JUnit by providing a set of interfaces and test harnesses, which allow MapReduce programs to be more easily tested using standard tools and practices.

While this doesn’t solve the problem of distributed debugging, many common bugs in MapReduce programs can be caught and debugged locally. For this purpose, developers often try to use JUnit to test their MapReduce programs. The current state of the art often involves writing a set of tests that each create a JobConf object, which is configured to use a mapper and reducer, and then set to use the LocalJobRunner (via JobConf.set("mapred.job.tracker", "local")). A MapReduce job will then run in a single thread, reading its input from test files stored on the local filesystem and writing its output to another local directory.

This process provides a solid mechanism for end-to-end testing, but has several drawbacks. Developing new tests requires adding test inputs to files that are stored alongside one’s program. Validating correct output also requires filesystem access and parsing of the emitted data files. This involves writing a great deal of test harness code, which itself may contain subtle bugs. Finally, this process is slow. Each test requires several seconds to run. Users often find themselves aggregating several unrelated inputs into a single test (violating a unit testing principle of isolating unrelated tests) or performing less exhaustive testing due to the high barriers to test authorship.

The easiest way to test MapReduce programs is to include as little Hadoop-specific code as possible in one’s application. Parsers can operate on instances of String instead of Text, and mappers should instantiate instances of MySpecificParser to tokenize input data rather than embed parsing code in the body of MyMapper.map(). Your MySpecificParser implementation can then be tested with ordinary JUnit tests. Another class or method could then be used to perform processing on parsed lines.
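
As a rough sketch of this idea, the parser below works only on String and has no Hadoop imports, so it can be exercised by any ordinary test. The class name MySpecificParser comes from the paragraph above, but its tokenize method and the whitespace-splitting rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free parser; the tokenize() method and its splitting rule are hypothetical.
public class MySpecificParser {

    // Splits a line on runs of whitespace and drops empty tokens.
    public List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        if (line == null) {
            return tokens;
        }
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        MySpecificParser parser = new MySpecificParser();
        // prints [Hadoop, is, nice]
        System.out.println(parser.tokenize("  Hadoop is   nice "));
    }
}
```

A mapper built this way would only convert Text to String, call tokenize, and emit the results, which keeps the Hadoop-dependent code small.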

But even with those components separately tested, your map() and reduce() calls should still be tested individually, as the composition of separate classes may cause unintended bugs to surface. MRUnit provides test drivers that accept programmatically specified inputs and outputs, which validate the correct behavior of mappers and reducers in isolation, as well as when composed in a MapReduce job. For instance, the following code checks whether the IdentityMapper emits the same (key, value) pair as output that it receives as input:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mrunit.MapDriver;
import org.junit.Before;
import org.junit.Test;

// Plain JUnit 4 test: the @Before/@Test annotations replace the JUnit 3 TestCase base class.
public class TestExample {

  private Mapper<Text, Text, Text, Text> mapper;
  private MapDriver<Text, Text, Text, Text> driver;

  @Before
  public void setUp() {
    mapper = new IdentityMapper<Text, Text>();
    driver = new MapDriver<Text, Text, Text, Text>(mapper);
  }

  @Test
  public void testIdentityMapper() throws Exception {
    // the identity mapper must emit exactly the record it receives
    driver.withInput(new Text("foo"), new Text("bar"))
            .withOutput(new Text("foo"), new Text("bar"))
            .runTest();
  }
}

The MapDriver orchestrates the test process, feeding the input record ("foo", "bar") to the IdentityMapper when its runTest() method is called. It also passes a mock OutputCollector implementation to the mapper. The driver then validates the output received by the OutputCollector against the expected output record ("foo", "bar"). If the actual and expected outputs do not match, a JUnit assertion failure is raised, informing the developer of the error. More test drivers exist for testing individual reducers, as well as mapper/reducer compositions.
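
The mechanism described above can be imitated in a few lines of plain Java. This is a simplified illustration of the idea only, not MRUnit's actual implementation: MiniMapDriver and SimpleMapper are hypothetical names, keys and values are plain strings, and only a single input record is supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical, heavily simplified stand-in for a map test driver.
public class MiniMapDriver {

    // Stand-in for a mapper: receives one record plus a collector for output pairs.
    public interface SimpleMapper {
        void map(String key, String value, BiConsumer<String, String> collector);
    }

    private final SimpleMapper mapper;
    private final List<String> expected = new ArrayList<>();
    private String inputKey;
    private String inputValue;

    public MiniMapDriver(SimpleMapper mapper) {
        this.mapper = mapper;
    }

    // Records the single input record to feed to the mapper.
    public MiniMapDriver withInput(String key, String value) {
        this.inputKey = key;
        this.inputValue = value;
        return this;
    }

    // Records one expected output pair; order matters.
    public MiniMapDriver withOutput(String key, String value) {
        expected.add(key + "\t" + value);
        return this;
    }

    // Runs the mapper with a collecting callback, then compares actual and expected output.
    public void runTest() {
        List<String> actual = new ArrayList<>();
        mapper.map(inputKey, inputValue, (k, v) -> actual.add(k + "\t" + v));
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        SimpleMapper identity = (k, v, out) -> out.accept(k, v);
        new MiniMapDriver(identity)
                .withInput("foo", "bar")
                .withOutput("foo", "bar")
                .runTest();
        System.out.println("identity mapper test passed");
    }
}
```

The collecting callback plays the role of the mock OutputCollector: the mapper cannot tell it apart from a real one, and the driver inspects what was collected once the call returns.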

End-to-end tests involving JobConf configuration code, InputFormat and OutputFormat implementations, filesystem access, and larger scale testing are still necessary. But many errors can be quickly identified with small tests involving a single, well-chosen input record, and a suite of regression tests allows correct behavior to be assured in the face of ongoing changes to your data processing pipeline. We hope MRUnit helps your organization test code, find bugs, and improve its use of Hadoop by facilitating faster and more thorough test cycles.

MRUnit is open source and is included in Cloudera’s Distribution for Hadoop. For more information about MRUnit, including where to get it and how to use its API, see the MRUnit documentation page.

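How to run MRUnit from the command line?

Note: you need to download and build MRUnit, then edit $HADOOP_HOME/libexec/hadoop-config.sh to add $MRUnit_HOME/lib/*.jar to the classpath, then run source $HADOOP_HOME/libexec/hadoop-config.sh, and finally execute the following commands: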

javac  -d class/  MaxTemperatureMapper.java  MaxTemperatureMapperTest.java
jar -cvf test.jar -C class ./
java -cp test.jar:$CLASSPATH org.junit.runner.JUnitCore  MaxTemperatureMapperTest  # or
yarn -cp test.jar:$CLASSPATH org.junit.runner.JUnitCore  MaxTemperatureMapperTest
