[Hadoop Source Code Walkthrough] (5) MapReduce: Writable-Related Classes

Having covered InputFormat, let's also look at Writable, even though it really belongs in the HDFS part of this series.

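To pass an object between processes or to persist it, the object has to be serialized into a byte stream. Conversely, a byte stream that has been received or read from disk has to be deserialized back into an object. Writable is Hadoop's serialization format, and Hadoop defines the following Writable interface: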

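 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;

 public interface Writable {
   void write(DataOutput out) throws IOException;      // serialize this object's fields to out
   void readFields(DataInput in) throws IOException;   // read fields from in into this object
 }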

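A class only needs to implement this interface to become serializable. The Writable class hierarchy is shown in the figure below, borrowed from <<Hadoop: The Definitive Guide>>.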
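The following minimal sketch makes the contract concrete with a serialization round trip. It does not depend on Hadoop. SimpleWritable is a local copy of the two-method interface above, and PointWritable is a hypothetical type invented for illustration. The one rule it demonstrates is that readFields() must read the fields in the same order that write() wrote them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for Hadoop's Writable so the sketch compiles without Hadoop on the classpath.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical example type: a point with two int coordinates.
class PointWritable implements SimpleWritable {
    int x;
    int y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);   // 4 bytes each, big-endian
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();  // same order as write()
        y = in.readInt();
    }

    // Serialize any SimpleWritable into a byte array.
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            w.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize into an existing object, reusing it instead of allocating a new one.
    static void deserialize(byte[] data, SimpleWritable w) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```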

Let's go through the hierarchy piece by piece, starting with IntWritable and LongWritable.

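The WritableComparable interface extends both Writable and Comparable so that its implementations can be compared. The hierarchy diagram shows that basic types such as IntWritable, LongWritable and ByteWritable all implement it.

The readFields() methods of IntWritable and LongWritable read binary data directly from an input stream that implements DataInput and rebuild an int or a long from it. Their write() methods convert the int or long value directly into a binary stream.

Both classes also contain a nested Comparator class. It compares values directly inside the serialized byte stream, without first deserializing them into objects. This is an optimization, because no objects need to be created. Here is a fragment of IntWritable: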

 public class IntWritable implements WritableComparable {
   private int value;

    //…… other methods
   public static class Comparator extends WritableComparator {
     public Comparator() {
       super(IntWritable.class);
     }

     public int compare(byte[] b1, int s1, int l1,
                        byte[] b2, int s2, int l2) {
       int thisValue = readInt(b1, s1);
       int thatValue = readInt(b2, s2);
       return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
     }
   }

   static {                                        // register this comparator
     WritableComparator.define(IntWritable.class, new Comparator());
   }
 }

The static block calls WritableComparator's static method define() to register the Comparator above. Registering means adding the Comparator to WritableComparator's comparators member, which is a static HashMap.

This tells WritableComparator: when I call WritableComparator.get(IntWritable.class), return the Comparator I registered (for IntWritable, that is IntWritable.Comparator). I can then call comparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) to compare b1 and b2 without deserializing them into objects, as in the code below.

The readInt() called inside compare() is inherited from WritableComparator. It reassembles IntWritable's value from the byte array using bit shifts.

//params byte[] b1, byte[] b2
RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
comparator.compare(b1,0,b1.length,b2,0,b2.length);
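The core of IntWritable.Comparator can be sketched without Hadoop. RawIntCompare is a hypothetical helper. Its readInt() reassembles a big-endian int from four bytes with shifts, which is how the text describes WritableComparator.readInt(). Its compare() uses the same six-argument shape as the raw compare() above.

```java
// Hypothetical standalone sketch of comparing serialized ints without creating objects.
class RawIntCompare {
    // Rebuild a big-endian int from 4 bytes; & 0xff keeps each byte unsigned before shifting.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Same argument layout as RawComparator.compare(); the lengths are unused for fixed-width ints.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int thisValue = readInt(b1, s1);
        int thatValue = readInt(b2, s2);
        return Integer.compare(thisValue, thatValue);  // -1, 0 or 1
    }
}
```

Rebuilding the int before comparing matters. If the four bytes were compared as unsigned values, -1 (0xFFFFFFFF) would sort after 1.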

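Note the fallback case. If comparators contains no registered Comparator for the class being compared, get() returns a default Comparator instead. That default's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) still has to deserialize b1 and b2 into objects before it can compare them. This is covered in more detail in the discussion of WritableComparator below.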
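The register-then-look-up behaviour, with its fallback, can be modelled with a static map. This is a simplified sketch and not Hadoop's code. RawCmp and ComparatorRegistry are hypothetical names, and the fallback is passed in explicitly here. The real get() builds a deserializing WritableComparator by itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical raw comparator type working on byte ranges, shaped like RawComparator.compare().
interface RawCmp {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical model of WritableComparator's static registry.
class ComparatorRegistry {
    // One shared map from key class to its optimized comparator.
    private static final Map<Class<?>, RawCmp> COMPARATORS = new HashMap<>();

    // Counterpart of define(): normally called from the key class's static initializer.
    static synchronized void define(Class<?> c, RawCmp comparator) {
        COMPARATORS.put(c, comparator);
    }

    // Counterpart of get(): the registered comparator if any, otherwise the (slower) fallback.
    static synchronized RawCmp get(Class<?> c, RawCmp fallback) {
        RawCmp registered = COMPARATORS.get(c);
        return registered != null ? registered : fallback;
    }
}
```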

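LongWritable's methods are essentially the same as IntWritable's, with two differences. First, LongWritable's value is a long. Second, it has an extra LongWritable.DecreasingComparator, which extends LongWritable.Comparator and simply negates the result of the ascending comparison. This is presumably meant for sorting in descending order.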

 public class LongWritable implements WritableComparable {
   private long value;
   //……others
   /** A decreasing Comparator optimized for LongWritable. */
   public static class DecreasingComparator extends Comparator {
     public int compare(WritableComparable a, WritableComparable b) {
       return -super.compare(a, b);
     }
     public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
       return -super.compare(b1, s1, l1, b2, s2, l2);
     }
   }
   static {                                       // register default comparator
     WritableComparator.define(LongWritable.class, new Comparator());
   }
 }
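All DecreasingComparator does is negate an ascending comparator, so its effect on sorting is easy to show with java.util.Comparator. DecreasingDemo is a hypothetical class, and it works on boxed Long values rather than on serialized bytes.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo of the DecreasingComparator idea on boxed longs.
class DecreasingDemo {
    // Ascending comparison that returns -1, 0 or 1, like LongWritable.Comparator.
    static final Comparator<Long> ASCENDING = Long::compare;

    // Negating is safe because ASCENDING never returns Integer.MIN_VALUE.
    static final Comparator<Long> DECREASING = (a, b) -> -ASCENDING.compare(a, b);

    // Return a copy of the values sorted in descending order.
    static Long[] sortDescending(Long... values) {
        Long[] copy = values.clone();
        Arrays.sort(copy, DECREASING);
        return copy;
    }
}
```

Negating a general comparator result is only safe when the result can never be Integer.MIN_VALUE. That holds here, because the ascending comparison returns only -1, 0 or 1.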

ByteWritable, BooleanWritable, FloatWritable and DoubleWritable all work in essentially the same way.

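Next come VIntWritable and VLongWritable. The two classes are nearly identical, and VIntWritable even encodes and decodes its value with the same methods that VLongWritable uses.

The main difference is the field type. VIntWritable stores its value in an int field and VLongWritable in a long field, as their value ranges require. Unlike the classes above, neither of them has a Comparator.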

It is enough to look at VLongWritable. Here is its source:

 public class VLongWritable implements WritableComparable {
   private long value;

   public VLongWritable() {}

   public VLongWritable(long value) { set(value); }

   /** Set the value of this LongWritable. */
   public void set(long value) { this.value = value; }

   /** Return the value of this LongWritable. */
   public long get() { return value; }

   public void readFields(DataInput in) throws IOException {
     value = WritableUtils.readVLong(in);
   }

   public void write(DataOutput out) throws IOException {
     WritableUtils.writeVLong(out, value);
   }

   /** Returns true iff <code>o</code> is a VLongWritable with the same value. */
   public boolean equals(Object o) {
     if (!(o instanceof VLongWritable))
       return false;
     VLongWritable other = (VLongWritable)o;
     return this.value == other.value;
   }

   public int hashCode() {
     return (int)value;
   }

   /** Compares two VLongWritables. */
   public int compareTo(Object o) {
     long thisValue = this.value;
     long thatValue = ((VLongWritable)o).value;
     return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
   }

   public String toString() {
     return Long.toString(value);
   }

 }

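As you can see, it encodes its value with WritableUtils.writeVLong(). WritableUtils contains encoding and decoding utilities, among other things. For now we only look at the parts that concern VIntWritable and VLongWritable.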

VIntWritable's value is in fact also encoded with writeVLong():

  public static void writeVInt(DataOutput stream, int i) throws IOException {
    writeVLong(stream, i);
  }

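An encoded VIntWritable takes 1 to 5 bytes, and an encoded VLongWritable takes 1 to 9 bytes.

A value in [-112, 127] is stored in a single byte, and that byte holds the value itself. {The Chinese edition of the Definitive Guide says on p. 91 that the range is [-127, 127]; my guess is that the encoding has changed since then.}

A value outside this range needs more bytes. The first byte then encodes the length (and the sign), and the remaining bytes store the value.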
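The size rule can be checked with a small function. VLongSize.encodedSize() is a sketch derived from the rule above, not Hadoop's source, although WritableUtils has a getVIntSize() helper that serves the same purpose. A negative value is first complemented. Its significant bits are then rounded up to whole bytes, and one byte is added for the length byte.

```java
// Hypothetical sketch: number of bytes the variable-length encoding described above uses for i.
class VLongSize {
    static int encodedSize(long i) {
        if (i >= -112 && i <= 127) {
            return 1;                                      // stored directly in one byte
        }
        if (i < 0) {
            i ^= -1L;                                      // one's complement: now non-negative
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(i);
        return (dataBits + 7) / 8 + 1;                     // value bytes + 1 length/sign byte
    }
}
```

Running it on Integer.MIN_VALUE and Integer.MAX_VALUE gives 5, and on Long.MIN_VALUE and Long.MAX_VALUE it gives 9. These results match the 1-5 and 1-9 byte ranges stated above.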

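The figure below shows how writeVLong() works, and the explanations are given as comments in the code. (I'm not sure they are clear enough. If this is hard to follow, you probably don't need every detail anyway.)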

Source of WritableUtils.writeVLong():

   public static void writeVLong(DataOutput stream, long i) throws IOException {
     if (i >= -112 && i <= 127) {
       stream.writeByte((byte)i);
       return;  // values in [-112, 127] use a single byte
     }

     int len = -112;
     if (i < 0) {
       i ^= -1L; // one's complement (i ^ -1L == ~i); the result i_2 satisfies i_2 + 1 == |i|,
                 // because a negative number's magnitude is its bits inverted (sign included) plus 1
       len = -120;
     }

     long tmp = i;  // i is now non-negative, in [0, 2^63 - 1]
     // This loop computes the length: the larger i is, the more bytes it needs,
     // and the further len moves down from its starting value (-112 or -120)
     while (tmp != 0) {
       tmp = tmp >> 8;
       len--;
     }
     // len now encodes the length: its distance from the starting value is the number of value bytes
     stream.writeByte((byte)len);

     len = (len < -120) ? -(len + 120) : -(len + 112); // number of value bytes, excluding the first (length) byte

     for (int idx = len; idx != 0; idx--) {  // write i 8 bits at a time, most significant byte first
       int shiftbits = (idx - 1) * 8;
       long mask = 0xFFL << shiftbits;
       stream.writeByte((byte)((i & mask) >> shiftbits));
     }
   }
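To inspect the actual bytes, the method above can be ported into a standalone class. VLongWriter is a hypothetical name. writeVLong() follows the Hadoop code shown above, and encode() is an added convenience wrapper that writes into a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical standalone port of WritableUtils.writeVLong() as shown in this article.
class VLongWriter {
    static void writeVLong(DataOutput stream, long i) throws IOException {
        if (i >= -112 && i <= 127) {
            stream.writeByte((byte) i);      // small value: one byte, stored as-is
            return;
        }
        int len = -112;                      // base marker for non-negative values
        if (i < 0) {
            i ^= -1L;                        // flip all bits; i is now non-negative
            len = -120;                      // base marker for negative values
        }
        long tmp = i;
        while (tmp != 0) {                   // one step per significant byte
            tmp = tmp >> 8;
            len--;
        }
        stream.writeByte((byte) len);        // first byte: sign + number of value bytes
        len = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = len; idx != 0; idx--) {   // most significant byte first
            int shiftbits = (idx - 1) * 8;
            long mask = 0xFFL << shiftbits;
            stream.writeByte((byte) ((i & mask) >> shiftbits));
        }
    }

    // Encode a single value into a fresh byte array.
    static byte[] encode(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeVLong(new DataOutputStream(bytes), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Two worked examples:

- 300 (0x012C) needs two value bytes. It is written as -114 (= -112 - 2), followed by 0x01 and 0x2C.
- -200 is first complemented to 199. It is written as -121 (= -120 - 1), followed by the single byte 199 (-57 as a signed byte).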

Now that we know how a value is written out, let's see how it is read back in, which is simply the reverse process.

WritableUtils.readVLong():

   public static long readVLong(DataInput stream) throws IOException {
     byte firstByte = stream.readByte();
     int len = decodeVIntSize(firstByte);
     if (len == 1) {
       return firstByte;
     }
     long i = 0;
     for (int idx = 0; idx < len-1; idx++) {
       byte b = stream.readByte();
       i = i << 8;
       i = i | (b & 0xFF);
     }
     return (isNegativeVInt(firstByte) ? (i ^ -1L) : i);
   }
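Here is a matching standalone decoder, again in a hypothetical class, VLongReader. Its readVLong() and decodeVIntSize() follow the code in this article. isNegativeVInt() is not shown above, so its body here is written from the encoding rules. A negative value either has a first byte in [-128, -121], or is a single byte in [-112, -1].

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical standalone port of readVLong() and its helpers.
class VLongReader {
    // Total encoded length, including the first byte.
    static int decodeVIntSize(byte value) {
        if (value >= -112) {
            return 1;
        } else if (value < -120) {
            return -119 - value;
        }
        return -111 - value;
    }

    // Written from the encoding rules: multi-byte negatives start in [-128, -121].
    static boolean isNegativeVInt(byte value) {
        return value < -120 || (value >= -112 && value < 0);
    }

    static long readVLong(DataInput stream) throws IOException {
        byte firstByte = stream.readByte();
        int len = decodeVIntSize(firstByte);
        if (len == 1) {
            return firstByte;                   // single-byte value, stored as-is
        }
        long i = 0;
        for (int idx = 0; idx < len - 1; idx++) {
            byte b = stream.readByte();
            i = i << 8;
            i = i | (b & 0xFF);                 // & 0xFF prevents sign extension
        }
        return isNegativeVInt(firstByte) ? (i ^ -1L) : i;
    }

    // Decode one value from the given bytes.
    static long decode(byte... encoded) {
        try {
            return readVLong(new DataInputStream(new ByteArrayInputStream(encoded)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, the bytes {-114, 1, 44} decode to 300, and {-121, -57} decode to -200.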

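readVLong() works in three steps:

1. It reads the first byte to get the total length, which includes the length byte itself.
2. It reads the remaining bytes from the input stream one at a time. The & 0xFF prevents sign extension when each byte is widened to long.
3. For a negative value, ^ -1L inverts all bits, sign included, to restore the original number.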

WritableUtils.decodeVIntSize() simply recovers the encoded length:

   public static int decodeVIntSize(byte value) {
     if (value >= -112) {
       return 1;
     } else if (value < -120) {
       return -119 - value;
     }
     return -111 - value;
   }

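It simply reverses the mapping shown in the figure above. The constants -119 and -111 are used instead of -120 and -112 so that the method returns the total encoded length, including the first length byte, rather than only the number of value bytes.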
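The constants follow directly from how writeVLong() builds the first byte b for a multi-byte value, that is, a value i outside [-112, 127] with n value bytes (1 ≤ n ≤ 8):

$$
\begin{aligned}
i \ge 0 &: \quad b = -112 - n \in [-120, -113], \qquad 1 + n = -111 - b \\
i < 0 &: \quad b = -120 - n \in [-128, -121], \qquad 1 + n = -119 - b
\end{aligned}
$$

These two ranges do not overlap each other or the single-byte range [-112, 127]. That is why the first byte alone determines both the length and the sign.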

Back to WritableComparator, mentioned earlier, which implements the RawComparator interface. RawComparator simply adds one raw compare() method to Comparator:

public interface RawComparator<T> extends Comparator<T> {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

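WritableComparator is a factory for RawComparator instances, for the Writable implementations that have registered one.

It also provides these Writable implementations with methods for deserializing raw bytes. These methods are fairly simple. The trickiest ones, readVInt() and readVLong(), follow the process described above.

WritableComparator also supplies a default implementation of compare() that deserializes before comparing. If WritableComparator.get() finds no registered Comparator, it creates a new one, which is in fact an instance of WritableComparator itself. When you then call public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), it reads the values using the readFields() method of the Writable implementation being compared.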

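For example, VIntWritable registers no Comparator, so get() constructs a WritableComparator and sets its key1, key2, buffer and keyClass fields. When you call public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), it uses VIntWritable.readFields() to read the values out of the encoded byte arrays, and only then compares them.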
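The fallback path can be sketched as a comparator that deserializes both byte ranges into reusable key objects and then calls compareTo(). All names here are hypothetical.

The sketch makes two simplifications:

- For brevity, the key holds a fixed-width int instead of a VIntWritable-style variable-length value.
- Hadoop's version reuses a single DataInputBuffer, whereas this sketch wraps each byte range in a new stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical stand-in for WritableComparable: readable from a stream, comparable to its own type.
interface ReadableKey<T> extends Comparable<T> {
    void readFields(DataInput in) throws IOException;
}

// Hypothetical key type holding one int.
class IntKey implements ReadableKey<IntKey> {
    int value;

    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    public int compareTo(IntKey other) {
        return Integer.compare(value, other.value);
    }
}

// Sketch of the deserialize-then-compare fallback.
class DeserializingComparator<T extends ReadableKey<T>> {
    private final T key1;   // created once and reused, like WritableComparator's key1/key2
    private final T key2;

    DeserializingComparator(Supplier<T> factory) {
        key1 = factory.get();
        key2 = factory.get();
    }

    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            key1.readFields(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
            key2.readFields(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key1.compareTo(key2);   // compare the deserialized objects
    }
}
```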

Next come ArrayWritable, TwoDArrayWritable and AbstractMapWritable.

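ArrayWritable and TwoDArrayWritable wrap a one-dimensional and a two-dimensional array, respectively. As you would expect, each holds an array of Writables together with the element type. Serialization and deserialization simply call the wrapped Writable implementation's write() and readFields() methods.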

 public class TwoDArrayWritable implements Writable {
   private Class valueClass;
   private Writable[][] values;

   //……others
   public void readFields(DataInput in) throws IOException {
     // construct matrix
     values = new Writable[in.readInt()][];
     for (int i = 0; i < values.length; i++) {
       values[i] = new Writable[in.readInt()];
     }

     // construct values
     for (int i = 0; i < values.length; i++) {
       for (int j = 0; j < values[i].length; j++) {
         Writable value;                             // construct value
         try {
           value = (Writable)valueClass.newInstance();
         } catch (InstantiationException e) {
           throw new RuntimeException(e.toString());
         } catch (IllegalAccessException e) {
           throw new RuntimeException(e.toString());
         }
         value.readFields(in);                       // read a value
         values[i][j] = value;                       // store it in values
       }
     }
   }

   public void write(DataOutput out) throws IOException {
     out.writeInt(values.length);                 // write values
     for (int i = 0; i < values.length; i++) {
       out.writeInt(values[i].length);
     }
     for (int i = 0; i < values.length; i++) {
       for (int j = 0; j < values[i].length; j++) {
         values[i][j].write(out);
       }
     }
   }
 }
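TwoDArrayWritable's wire layout can be reproduced with plain ints: the row count, then each row's length, then all elements row by row. TwoDIntArrayCodec is a hypothetical sketch that writes ints where the real class calls each element's write(). As in the code above, the element type is not written to the stream, so the reader must already know it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of TwoDArrayWritable's layout with ints as elements.
class TwoDIntArrayCodec {
    static byte[] write(int[][] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.writeInt(values.length);            // number of rows
            for (int[] row : values) {
                out.writeInt(row.length);           // each row's length (rows may be ragged)
            }
            for (int[] row : values) {
                for (int v : row) {
                    out.writeInt(v);                // elements, row by row
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static int[][] read(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            int[][] values = new int[in.readInt()][];   // construct the outer array
            for (int i = 0; i < values.length; i++) {
                values[i] = new int[in.readInt()];      // then each row
            }
            for (int[] row : values) {
                for (int j = 0; j < row.length; j++) {
                    row[j] = in.readInt();              // then fill in the values
                }
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```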

That's all there is to it.

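There are also TupleWritable, AbstractMapWritable -> {MapWritable, SortedMapWritable}, DBWritable, CompressedWritable, VersionedWritable, GenericWritable and others. I'll discuss them when the need arises. They work in much the same way and differ only in purpose.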

from: http://blog.csdn.net/posa88/article/details/7906426
