Study Notes on the Reduce Task

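Of the five major stages of MapReduce, I have now analyzed more than half. Analyzing the Map process last time took me a great deal of time, but the payoff was large and well worth it. This time I used the same approach to work through the Reduce process, so I can now say I have a solid grasp of the two most important ideas in MapReduce.

Enough preamble. Before studying the Reduce process I looked through the related material in books and online, and found that most of it is much the same and rarely analyzes the process against the source code. So I think it is worthwhile to explain some of the details more clearly here. The overall flow of Map and Reduce is very similar, so if you have read my earlier analysis of the Map Task, you should be able to follow this analysis of the Reduce process quickly.

The Reduce process is embodied in the Reduce Task. Like the Map Task, it comes in four kinds: Job-setup Task, Job-cleanup Task, Task-cleanup Task, and the Reduce Task proper. I mainly analyze the last one.

The Reduce Task is divided into 5 stages: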

Shuffle------------------->Merge------------------->Sort------------------->Reduce------------------->Write

The first three stages are the most important, and I spend most of the time describing them.

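Shuffle stage. As we know, the very first thing a Reduce task does is receive the intermediate results output by the Map tasks. Each key-value pair is assigned to a particular Reduce task by a partitioning algorithm, so the Reduce task has to copy the Map outputs from remote nodes. This process is called the Shuffle stage, and it is also known as the Copy stage. During the Shuffle stage a GetMapEventsThread periodically sends RPC requests to fetch the list of Map Tasks that have finished running remotely, and records their output locations in mapLocations.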
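The partitioning mentioned above decides which Reduce task receives each key. Here is a minimal sketch of the hash-modulo rule, which is also the rule Hadoop's default HashPartitioner applies. The class name PartitionSketch and the sample keys are made up for illustration.

```java
// Hypothetical helper, not a Hadoop class: shows the hash-modulo partitioning rule.
class PartitionSketch {
    // Clear the sign bit so the index is never negative, then wrap it
    // into the range [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed number of reduce tasks
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            // Equal keys always land on the same reduce task.
            System.out.println(key + " -> reduce task " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the rule depends only on the key, every Map task sends a given key to the same Reduce task, which is why each Reduce task only needs to fetch its own partition from every Map output.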

....
            // GetMapEventsThread makes a remote call to get the list of completed Map tasks
            int numNewMaps = getMapCompletionEvents();
            if (LOG.isDebugEnabled()) {
              if (numNewMaps > 0) {
                LOG.debug(reduceTask.getTaskID() + ": " +
                    "Got " + numNewMaps + " new map-outputs");
              }
            }
            Thread.sleep(SLEEP_TIME);
          }

Step into getMapCompletionEvents and keep reading:

...
        for (TaskCompletionEvent event : events) {
          switch (event.getTaskStatus()) {
            case SUCCEEDED:
            {
              URI u = URI.create(event.getTaskTrackerHttp());
              String host = u.getHost();
              TaskAttemptID taskId = event.getTaskAttemptId();
              URL mapOutputLocation = new URL(event.getTaskTrackerHttp() +
                                      "/mapOutput?job=" + taskId.getJobID() +
                                      "&map=" + taskId +
                                      "&reduce=" + getPartition());
              List<MapOutputLocation> loc = mapLocations.get(host);
              if (loc == null) {
                loc = Collections.synchronizedList
                  (new LinkedList<MapOutputLocation>());
                mapLocations.put(host, loc);
               }
              // add the newly completed mapOutputLocation to loc; mapLocations is shared globally
              loc.add(new MapOutputLocation(taskId, host, mapOutputLocation));
              numNewMaps ++;
            }
            break;
            ....
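The per-host bookkeeping in the loop above can be sketched in isolation. The example below groups output URLs by host and then randomizes the order in which hosts are visited, which is one simple way to spread requests out. HostGroupingSketch, its methods, and the URLs are hypothetical names for illustration, not Hadoop's own code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: group map-output URLs by host, then visit hosts in random order.
class HostGroupingSketch {
    // Each event is {host, url}; the result maps host -> list of URLs on that host.
    static Map<String, List<String>> groupByHost(List<String[]> events) {
        Map<String, List<String>> byHost = new HashMap<>();
        for (String[] event : events) {
            String host = event[0];
            String url = event[1];
            // Create the per-host list on first use, as the original loop does by hand.
            byHost.computeIfAbsent(host,
                    h -> Collections.synchronizedList(new LinkedList<String>())).add(url);
        }
        return byHost;
    }

    // Randomize the host order so reducers do not all hit the same host first.
    static List<String> shuffledHosts(Map<String, List<String>> byHost, Random random) {
        List<String> hosts = new ArrayList<>(byHost.keySet());
        Collections.shuffle(hosts, random);
        return hosts;
    }

    public static void main(String[] args) {
        List<String[]> events = new ArrayList<>();
        events.add(new String[] {"host-a", "http://host-a/mapOutput?map=m1"}); // made-up URLs
        events.add(new String[] {"host-b", "http://host-b/mapOutput?map=m2"});
        events.add(new String[] {"host-a", "http://host-a/mapOutput?map=m3"});
        Map<String, List<String>> byHost = groupByHost(events);
        for (String host : shuffledHosts(byHost, new Random())) {
            System.out.println(host + " -> " + byHost.get(host));
        }
    }
}
```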

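To avoid network hot spots, the Reduce Task shuffles the output locations and then stores them in scheduledCopies; all subsequent copy operations revolve around this list. The variable lives in a class called ReduceCopier. Determining where to copy from is only the first half of the Shuffle stage. Now let us see where execution starts. Back to the entry point of the Reduce Task, run():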

public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
    throws IOException, InterruptedException, ClassNotFoundException {
    this.umbilical = umbilical;
    job.setBoolean("mapred.skip.on", isSkipping());
    if (isMapOrReduce()) {
      // register the progress phases of the task
      copyPhase = getProgress().addPhase("copy");
      sortPhase  = getProgress().addPhase("sort");
      reducePhase = getProgress().addPhase("reduce");
    }
    // start thread that will handle communication with parent
    // create the task reporter used to communicate with the parent process
    TaskReporter reporter = new TaskReporter(getProgress(), umbilical,
        jvmContext);
    reporter.startCommunicationThread();
    // check whether the new API is in use
    boolean useNewApi = job.getUseNewReducer();
    initialize(job, getJobID(), reporter, useNewApi);
    // check if it is a cleanupJobTask
    // as with map tasks, there are 4 kinds of Task: Job-setup Task, Job-cleanup Task, Task-cleanup Task and ReduceTask
    if (jobCleanup) {
      // this runs the Job-cleanup Task
      runJobCleanupTask(umbilical, reporter);
      return;
    }
    if (jobSetup) {
      // this runs the Job-setup Task
      runJobSetupTask(umbilical, reporter);
      return;
    }
    if (taskCleanup) {
      // this runs the Task-cleanup Task
      runTaskCleanupTask(umbilical, reporter);
      return;
    }
    /* everything below runs the actual Reduce Task */
    // Initialize the codec
    codec = initCodec();
    boolean isLocal = "local".equals(job.get("mapred.job.tracker", "local"));
    if (!isLocal) {
      reduceCopier = new ReduceCopier(umbilical, job, reporter);
      if (!reduceCopier.fetchOutputs()) {
          ......

We have to pause at reduceCopier.fetchOutputs(), because both the rest of the Shuffle stage and the Merge stage are implemented here:

/**
     * Start n threads to remotely copy the Map output data
     * @return
     * @throws IOException
     */
    public boolean fetchOutputs() throws IOException {
      int totalFailures = 0;
      int            numInFlight = 0, numCopied = 0;
      DecimalFormat  mbpsFormat = new DecimalFormat("0.00");
      final Progress copyPhase =
        reduceTask.getProgress().phase();
      // separate thread that periodically merges files on local disk
      LocalFSMerger localFSMergerThread = null;
      // separate thread that periodically merges files held in memory
      InMemFSMergeThread inMemFSMergeThread = null;
      GetMapEventsThread getMapEventsThread = null;
      for (int i = 0; i < numMaps; i++) {
        copyPhase.addPhase();       // add sub-phase per file
      }
      // create the container that holds the copier threads
      copiers = new ArrayList<MapOutputCopier>(numCopiers);
      // start all the copying threads
      for (int i=0; i < numCopiers; i++) {
        // create the copier threads one by one
        MapOutputCopier copier = new MapOutputCopier(conf, reporter,
            reduceTask.getJobTokenSecret());
        copiers.add(copier);
        // after adding it to the list, start the thread
        copier.start();
      }
      //start the on-disk-merge thread
      localFSMergerThread = new LocalFSMerger((LocalFileSystem)localFileSys);
      //start the in memory merger thread
      inMemFSMergeThread = new InMemFSMergeThread();
      // the two periodic merge threads are started too, so the copy and merge stages run in parallel
      localFSMergerThread.start();
      inMemFSMergeThread.start();
      // start the map events thread
      getMapEventsThread = new GetMapEventsThread();
      getMapEventsThread.start();
      .....

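Many unfamiliar Thread classes appear in the code above; we can ignore them for now. Notice that getMapEventsThread is started here to fetch the latest locations. The threads that consume those locations are the copy threads, here called MapOutputCopier; the author puts them in a thread list and starts them one by one.

Let us look at the implementation to see how the remote copy is done.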
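The pattern of several copier threads draining one shared list can be sketched on its own. Below, worker threads block with wait() until a producer adds locations and calls notifyAll(). CopierPoolSketch and its method names are hypothetical, and the "copy" is just recording the location, so this illustrates the coordination only, not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of copier threads that wait on a shared list of locations.
class CopierPoolSketch {
    private final List<String> scheduledCopies = new LinkedList<String>();
    private final List<String> copied = Collections.synchronizedList(new ArrayList<String>());
    private final List<Thread> copiers = new ArrayList<Thread>();

    // Create the copier threads, keep them in a list, and start each one.
    void start(int numCopiers) {
        for (int i = 0; i < numCopiers; i++) {
            Thread copier = new Thread(this::copyLoop, "copier-" + i);
            copier.setDaemon(true);
            copiers.add(copier);
            copier.start();
        }
    }

    private void copyLoop() {
        try {
            while (true) {
                String loc;
                synchronized (scheduledCopies) {
                    // Sleep until a producer schedules at least one location.
                    while (scheduledCopies.isEmpty()) {
                        scheduledCopies.wait();
                    }
                    loc = scheduledCopies.remove(0);
                }
                copied.add(loc); // stand-in for the real remote copy
            }
        } catch (InterruptedException e) {
            // Interrupted by awaitAndStop(): leave the loop and let the thread end.
        }
    }

    // Producer side: publish new locations and wake up the waiting copiers.
    void schedule(List<String> locations) {
        synchronized (scheduledCopies) {
            scheduledCopies.addAll(locations);
            scheduledCopies.notifyAll();
        }
    }

    // Wait until the expected number of copies is done, then stop all copiers.
    List<String> awaitAndStop(int expected) {
        try {
            while (copied.size() < expected) {
                Thread.sleep(10);
            }
            for (Thread copier : copiers) copier.interrupt();
            for (Thread copier : copiers) copier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<String>(copied);
    }

    public static void main(String[] args) {
        CopierPoolSketch pool = new CopierPoolSketch();
        pool.start(3);
        pool.schedule(Arrays.asList("map-1", "map-2", "map-3", "map-4"));
        System.out.println("copied " + pool.awaitAndStop(4).size() + " outputs");
    }
}
```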

@Override
      public void run() {
        while (true) {
          try {
            MapOutputLocation loc = null;
            long size = -1;
            synchronized (scheduledCopies) {
              // take the location of a map task's output from the scheduledCopies list
              while (scheduledCopies.isEmpty()) {
                // if scheduledCopies is empty, wait
                scheduledCopies.wait();
              }
              // use the first entry in the list as the location to copy from
              loc = scheduledCopies.remove(0);
            }
            CopyOutputErrorType error = CopyOutputErrorType.OTHER_ERROR;
            readError = false;
            try {
              shuffleClientMetrics.threadBusy();
              // mark loc as the map output location currently being handled
              start(loc);
              // perform the actual copy; returns the number of bytes copied
              size = copyOutput(loc);
              shuffleClientMetrics.successFetch();
              // reaching this point means the copy succeeded, so set error to NO_ERROR
              error = CopyOutputErrorType.NO_ERROR;
            } catch (IOException e) {
              // an exception was thrown; handle it
              ....

A location is taken from the list and then copied; the core method is copyOutput(). Following it further:

.....
        // Copy the map output
        // fetch the data from the Map task's output location loc (over an HTTP connection)
        MapOutput mapOutput = getMapOutput(loc, tmpMapOutput,
                                           reduceId.getTaskID().getId());

Further in:

private MapOutput getMapOutput(MapOutputLocation mapOutputLoc,
                                     Path filename, int reduce)
      throws IOException, InterruptedException {
        // Connect
        // open a connection to the URL
        URL url = mapOutputLoc.getOutputLocation();
        URLConnection connection = url.openConnection();
        // obtain the input stream of the remote data
        InputStream input = setupSecureConnection(mapOutputLoc, connection);
        ......
        //We will put a file in memory if it meets certain criteria:
        //1. The size of the (decompressed) file should be less than 25% of
        //    the total inmem fs
        //2. There is space available in the inmem fs
        // Check if this map-output can be saved in-memory
        // ask the ShuffleRamManager whether the copied data fits in memory; if not, it goes to local disk
        boolean shuffleInMemory = ramManager.canFitInMemory(decompressedLength);
        // Shuffle
        MapOutput mapOutput = null;
        if (shuffleInMemory) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Shuffling " + decompressedLength + " bytes (" +
                compressedLength + " raw bytes) " +
                "into RAM from " + mapOutputLoc.getTaskAttemptId());
          }
          // it fits in memory, so read the input stream into memory
          mapOutput = shuffleInMemory(mapOutputLoc, connection, input,
                                      (int)decompressedLength,
                                      (int)compressedLength);
        } else {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Shuffling " + decompressedLength + " bytes (" +
                compressedLength + " raw bytes) " +
                "into Local-FS from " + mapOutputLoc.getTaskAttemptId());
          }
          // it does not fit, so write it to a file
          mapOutput = shuffleToDisk(mapOutputLoc, input, filename,
              compressedLength);
        }
        return mapOutput;
      }

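Here we see that Hadoop opens the URL to obtain a remote input stream and works on that. When copying to the local side there are two cases: if the data currently fits in memory, it is placed in memory; otherwise it is written to disk. This is also where ShuffleRamManager comes into play. With that, the Shuffle stage is complete.

The call chain is fairly deep, one layer after another.

Merge stage.

The Merge stage actually runs in parallel with the Shuffle stage. As we just saw, the relevant threads are all started together in fetchOutputs: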
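The memory-or-disk decision can be illustrated with a small budget tracker. This is a simplified sketch built only on the two criteria quoted in the code comments (a single output must be under 25% of the in-memory budget, and there must be space left). ShuffleBudgetSketch and its methods are hypothetical, and unlike a real RAM manager it falls back to disk rather than waiting for space.

```java
// Hypothetical sketch of the "fits in memory?" decision; not Hadoop's ShuffleRamManager.
class ShuffleBudgetSketch {
    private final long capacityBytes;         // total in-memory budget
    private final long maxSingleShuffleBytes; // limit for one map output (25% of budget)
    private long usedBytes;

    ShuffleBudgetSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
        this.maxSingleShuffleBytes = capacityBytes / 4;
    }

    // Criterion 1: a single output must be smaller than 25% of the budget.
    boolean canFitInMemory(long sizeBytes) {
        return sizeBytes < maxSingleShuffleBytes;
    }

    // Returns "RAM" and reserves space when both criteria hold, otherwise "DISK".
    String place(long sizeBytes) {
        // Criterion 2: there must still be room in the budget.
        if (canFitInMemory(sizeBytes) && usedBytes + sizeBytes <= capacityBytes) {
            usedBytes += sizeBytes;
            return "RAM";
        }
        return "DISK";
    }

    // Called after an in-memory merge has flushed data and freed space.
    void release(long sizeBytes) {
        usedBytes -= sizeBytes;
    }

    public static void main(String[] args) {
        ShuffleBudgetSketch budget = new ShuffleBudgetSketch(1000);
        long[] sizes = {100, 300, 249, 249, 249, 249};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + budget.place(size));
        }
    }
}
```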

public boolean fetchOutputs() throws IOException {
      int totalFailures = 0;
      int            numInFlight = 0, numCopied = 0;
      DecimalFormat  mbpsFormat = new DecimalFormat("0.00");
      final Progress copyPhase =
        reduceTask.getProgress().phase();
      // separate thread that periodically merges files on local disk
      LocalFSMerger localFSMergerThread = null;
      // separate thread that periodically merges files held in memory
      InMemFSMergeThread inMemFSMergeThread = null;
      ....

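Merge's main job is to combine data: when there are many files in memory or on disk, small files are merged into larger ones. Here is the run method of one of the two merge threads: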

....
      public void run() {
        LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
        try {
          boolean exit = false;
          do {
            exit = ramManager.waitForDataToMerge();
            if (!exit) {
              // perform the in-memory merge
              doInMemMerge();

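The purpose is clear: this is the run method of the in-memory merge thread, which performs the merge. LocalFSMerger is similar, so I will not analyze it. This merge processing runs in parallel with the Shuffle stage, and at this point both stages are complete.

It is still a little complicated. Below is a diagram of the related classes; the main thing is to understand what each of the 4 threads does.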
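Merging several already-sorted pieces into one sorted output is the core idea of this stage. Here is a minimal k-way merge sketch that uses a priority queue over in-memory lists of strings. KWayMergeSketch is a hypothetical name, and real segments would be streams of serialized key-value records rather than lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge over sorted runs.
class KWayMergeSketch {
    // A read position inside one sorted run.
    private static final class Cursor {
        final List<String> run;
        int pos;
        Cursor(List<String> run) { this.run = run; }
        String head() { return run.get(pos); }
    }

    // Merge sorted runs into one sorted list by always taking the smallest head.
    static List<String> merge(List<List<String>> sortedRuns) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<Cursor>((a, b) -> a.head().compareTo(b.head()));
        for (List<String> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }
        List<String> merged = new ArrayList<String>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head());
            smallest.pos++;
            // Put the run back while it still has elements left.
            if (smallest.pos < smallest.run.size()) {
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = new ArrayList<List<String>>();
        runs.add(Arrays.asList("a", "d"));
        runs.add(Arrays.asList("b", "c"));
        runs.add(Arrays.asList("a"));
        System.out.println(merge(runs)); // prints [a, a, b, c, d]
    }
}
```

Each output element costs one heap operation, so merging k runs with n elements in total takes on the order of n log k comparisons.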

[Figure: class relationship diagram]

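All 4 threads are started from ReduceCopier.fetchOutputs(). After the Shuffle and Merge stages comes the Sort stage.

Sort stage. The work here is light: it performs one overall merge across memory and disk, and sorting happens as part of that merge.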

....
    // mark the copy phase as complete
    copyPhase.complete();                         // copy is already complete
    setPhase(TaskStatus.Phase.SORT);
    statusUpdate(umbilical);
    // run the overall merge across memory and disk; the sort happens inside it
    final FileSystem rfs = FileSystem.getLocal(job).getRaw();
    RawKeyValueIterator rIter = isLocal
      ? Merger.merge(job, rfs, job.getMapOutputKeyClass(),
          job.getMapOutputValueClass(), codec, getMapFiles(rfs, true),
          !conf.getKeepFailedTaskFiles(), job.getInt("io.sort.factor", 100),
          new Path(getTaskID().toString()), job.getOutputKeyComparator(),
          reporter, spilledRecordsCounter, null)
      : reduceCopier.createKVIterator(job, rfs, reporter);

So where does the sorting happen? In createKVIterator, at the end of the snippet above:

private RawKeyValueIterator createKVIterator(
        JobConf job, FileSystem fs, Reporter reporter) throws IOException {
      .....
      // order the on-disk segments by length, smallest first, before the final merge
      Collections.sort(diskSegments, new Comparator<Segment<K,V>>() {
        public int compare(Segment<K, V> o1, Segment<K, V> o2) {
          if (o1.getLength() == o2.getLength()) {
            return 0;
          }
          return o1.getLength() < o2.getLength() ? -1 : 1;
        }
      });
      // build final list of segments from merged backed by disk + in-mem
      List<Segment<K,V>> finalSegments = new ArrayList<Segment<K,V>>();
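The comparator above orders segments by length only. A compact equivalent using Comparator.comparingLong is sketched below; SegmentOrderSketch and its tiny Segment class are hypothetical stand-ins for Hadoop's Segment<K,V>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order segments by length, smallest first.
class SegmentOrderSketch {
    // Minimal stand-in for a segment: just a name and a length in bytes.
    static final class Segment {
        final String name;
        final long length;
        Segment(String name, long length) {
            this.name = name;
            this.length = length;
        }
        long getLength() { return length; }
    }

    // Return a new list sorted by ascending length; the input list is left untouched.
    static List<Segment> smallestFirst(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<Segment>(segments);
        sorted.sort(Comparator.comparingLong(Segment::getLength));
        return sorted;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment("spill-2", 900), new Segment("spill-0", 100), new Segment("spill-1", 400));
        for (Segment s : smallestFirst(segments)) {
            System.out.println(s.name + " (" + s.length + " bytes)");
        }
    }
}
```

Using comparingLong also avoids writing the equal, less-than, and greater-than cases by hand.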

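That is all there is to the Sort stage. Below is the basic execution flow of the first three stages, which form the core of the Reduce Task.

[Figure: execution flow of the Shuffle, Merge, and Sort stages]

Reduce stage. Following the direction of that flow, the next step is to run the reduce() function over the key-value pairs: loop over them and call the function for each one.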

....
    // check whether the new API or the old API is in use
    if (useNewApi) {
      runNewReducer(job, umbilical, reporter, rIter, comparator,
                    keyClass, valueClass);
    } else {
      runOldReducer(job, umbilical, reporter, rIter, comparator,
                    keyClass, valueClass);
    }

Here one of the runReducer methods is executed. We follow the old API:

  private <INKEY,INVALUE,OUTKEY,OUTVALUE>
  void runOldReducer(JobConf job,
                     TaskUmbilicalProtocol umbilical,
                     final TaskReporter reporter,
                     RawKeyValueIterator rIter,
                     RawComparator<INKEY> comparator,
                     Class<INKEY> keyClass,
                     Class<INVALUE> valueClass) throws IOException {
    Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
      ReflectionUtils.newInstance(job.getReducerClass(), job);
    // make output collector
    String finalName = getOutputName(getPartition());
    // record writer for the output key-value pairs
    final RecordWriter<OUTKEY, OUTVALUE> out = new OldTrackingRecordWriter<OUTKEY, OUTVALUE>(
        reduceOutputCounter, job, reporter, finalName);
    OutputCollector<OUTKEY,OUTVALUE> collector =
      new OutputCollector<OUTKEY,OUTVALUE>() {
        public void collect(OUTKEY key, OUTVALUE value)
          throws IOException {
          // write the processed key and value to the output stream; they end up in HDFS as the final result
          out.write(key, value);
          // indicate that progress update needs to be sent
          reporter.progress();
        }
      };
    // apply reduce function
    try {
      //increment processed counter only if skipping feature is enabled
      boolean incrProcCount = SkipBadRecords.getReducerMaxSkipGroups(job)>0 &&
        SkipBadRecords.getAutoIncrReducerProcCount(job);
      // check whether skip-bad-records mode is on
      ReduceValuesIterator<INKEY,INVALUE> values = isSkipping() ?
          new SkippingReduceValuesIterator<INKEY,INVALUE>(rIter,
              comparator, keyClass, valueClass,
              job, reporter, umbilical) :
          new ReduceValuesIterator<INKEY,INVALUE>(rIter,
          job.getOutputValueGroupingComparator(), keyClass, valueClass,
          job, reporter);
      values.informReduceProgress();
      while (values.more()) {
        reduceInputKeyCounter.increment(1);
        // take each key group from the record iterator and run the user-defined reduce function; this is the Reduce stage
        reducer.reduce(values.getKey(), values, collector, reporter);
        if(incrProcCount) {
          reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
              SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS, 1);
        }
        // move on to the next key and its values
        values.nextKey();
        values.informReduceProgress();
      }
     //...

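This is very similar to the Map Task process. As expected, it iterates in a loop; this is the Reduce stage.

Write stage.

The Write stage is the last stage. In a user-defined reduce function, the user normally calls collector.collect, and that is when the write happens.

This write puts the final result into HDFS. The collect method of the OutputCollector was defined earlier: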
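The loop above can be reduced to its essentials: walk sorted key-value pairs, gather the values of each key, and hand them to a reduce function that emits results through a collector. The sketch below does this for a word-count style sum. ReduceLoopSketch and its nested types are hypothetical simplifications, not the Hadoop Reducer and OutputCollector interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reduce loop over key-sorted input.
class ReduceLoopSketch {
    static final class KV {
        final String key;
        final int value;
        KV(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    interface Collector { void collect(String key, int value); }

    interface Reducer { void reduce(String key, Iterator<Integer> values, Collector out); }

    // Call reducer.reduce once per distinct key; the input must already be sorted by key.
    static void runReducer(List<KV> sortedInput, Reducer reducer, Collector out) {
        int i = 0;
        while (i < sortedInput.size()) {
            String key = sortedInput.get(i).key;
            List<Integer> group = new ArrayList<Integer>();
            // Gather the run of consecutive pairs that share this key.
            while (i < sortedInput.size() && sortedInput.get(i).key.equals(key)) {
                group.add(sortedInput.get(i).value);
                i++;
            }
            reducer.reduce(key, group.iterator(), out);
        }
    }

    // Example reduce function: sum the values of each key.
    static Map<String, Integer> sumPerKey(List<KV> sortedInput) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        Reducer summing = (key, values, out) -> {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            out.collect(key, sum);
        };
        // The collector here just stores results in a map instead of writing to HDFS.
        runReducer(sortedInput, summing, (key, value) -> result.put(key, value));
        return result;
    }

    public static void main(String[] args) {
        List<KV> input = new ArrayList<KV>();
        input.add(new KV("apple", 1));
        input.add(new KV("apple", 2));
        input.add(new KV("banana", 5));
        System.out.println(sumPerKey(input)); // prints {apple=3, banana=5}
    }
}
```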

OutputCollector<OUTKEY,OUTVALUE> collector =
      new OutputCollector<OUTKEY,OUTVALUE>() {
        public void collect(OUTKEY key, OUTVALUE value)
          throws IOException {
          // write the processed key and value to the output stream; they end up in HDFS as the final result
          out.write(key, value);
          // indicate that progress update needs to be sent
          reporter.progress();
        }
      };

That completes all the stages of the Reduce task.

Below is a sequence diagram to aid understanding:

[Figure: sequence diagram of the Reduce Task]

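Understanding the core implementation of the two processes, Map and Reduce, helps us understand the whole flow of Hadoop job execution much better. The analysis may be a little dry, but there is some enjoyment to be found in the effort.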
