MapReduce: Writable and Related Classes

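When objects have to be passed between processes or persisted, they must be serialized into a byte stream; conversely, a byte stream that has been received or read from disk must be deserialized to get the object back. Writable is Hadoop's serialization format, and Hadoop defines it through the following Writable interface.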

public interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

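A class only has to implement this interface to be serializable. The Writable class hierarchy is shown in the figure below, borrowed from Hadoop: The Definitive Guide.

[Figure: Writable class hierarchy]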
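As a minimal sketch of what implementing the interface involves, the following stand-alone example mirrors the two-method contract using only the JDK. `WritableDemo`, `SimpleWritable` and `Point` are hypothetical names introduced here so that the example runs without Hadoop on the classpath; with Hadoop available one would implement `org.apache.hadoop.io.Writable` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WritableDemo {
  // Hypothetical stand-in mirroring the two methods of Hadoop's Writable.
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  // Hypothetical value type: fields are read back in the order they were written.
  static final class Point implements SimpleWritable {
    int x;
    int y;

    Point() {}

    Point(int x, int y) {
      this.x = x;
      this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(x);
      out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      x = in.readInt();
      y = in.readInt();
    }
  }

  // Serialize any SimpleWritable into a byte array.
  static byte[] serialize(SimpleWritable w) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      w.write(new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Fill an existing instance from serialized bytes.
  static void deserialize(SimpleWritable w, byte[] data) {
    try {
      w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = serialize(new Point(3, -7));
    Point copy = new Point();
    deserialize(copy, data);
    // prints "8 bytes, x=3, y=-7"
    System.out.println(data.length + " bytes, x=" + copy.x + ", y=" + copy.y);
  }
}
```

Note that deserialization fills in an existing object rather than returning a new one, which is why the sketch needs a no-argument constructor.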

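Let's go through them one at a time, starting with IntWritable and LongWritable.

The WritableComparable interface extends both Writable and Comparable in order to support comparison. As the hierarchy figure shows, IntWritable, LongWritable, ByteWritable and the other primitive wrappers all implement it. The readFields() methods of IntWritable and LongWritable read binary data directly from an input stream that implements DataInput and rebuild an int or a long respectively, while write() writes the int or long value directly as binary. IntWritable and LongWritable each contain an inner Comparator class, which compares records directly in the serialized byte stream without deserializing them into objects. This is an optimization, because no objects need to be created. Here is an excerpt from IntWritable: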

public class IntWritable implements WritableComparable {
  private int value;

  //…… other methods

  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(IntWritable.class);
    }

    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      int thisValue = readInt(b1, s1);
      int thatValue = readInt(b2, s2);
      return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
    }
  }

  static {                                        // register this comparator
    WritableComparator.define(IntWritable.class, new Comparator());
  }
}

The static block calls WritableComparator's static method define() to register this Comparator, that is, to add it to WritableComparator's comparators member, which is a static HashMap. This tells WritableComparator: when I call WritableComparator.get(IntWritable.class), return the Comparator I registered (for IntWritable, that is IntWritable.Comparator). I can then call comparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) to compare b1 and b2 without deserializing them into objects, as in the code below. The readInt() used inside that compare() method is inherited from WritableComparator; it rebuilds the IntWritable's value from the byte array by bit shifting.

// params: byte[] b1, byte[] b2
RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
comparator.compare(b1, 0, b1.length, b2, 0, b2.length);
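To make the raw comparison concrete, here is a rough stand-alone sketch using only the JDK. `RawIntCompare` and its `toBytes`, `readInt` and `compareRaw` methods are hypothetical helpers written for this illustration; they imitate, but are not, Hadoop's `WritableComparator.readInt()` and `IntWritable.Comparator`. The sketch assumes the int was serialized as 4 big-endian bytes, which is what `DataOutput.writeInt()` produces.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RawIntCompare {
  // Serialize an int the way DataOutput.writeInt() does: 4 bytes, big-endian.
  static byte[] toBytes(int value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DataOutputStream(bytes).writeInt(value);
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Rebuild the int from 4 bytes starting at 'start' by shifting and OR-ing.
  static int readInt(byte[] bytes, int start) {
    return ((bytes[start] & 0xff) << 24)
         | ((bytes[start + 1] & 0xff) << 16)
         | ((bytes[start + 2] & 0xff) << 8)
         | (bytes[start + 3] & 0xff);
  }

  // Compare two serialized ints without creating any wrapper objects.
  static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
    int thisValue = readInt(b1, s1);
    int thatValue = readInt(b2, s2);
    return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
  }

  public static void main(String[] args) {
    byte[] b1 = toBytes(-5);
    byte[] b2 = toBytes(163);
    System.out.println(compareRaw(b1, 0, b2, 0)); // prints -1
    System.out.println(compareRaw(b2, 0, b1, 0)); // prints 1
    System.out.println(compareRaw(b2, 0, b2, 0)); // prints 0
  }
}
```

The value is decoded rather than compared byte by byte because a plain unsigned byte-wise comparison would order negative numbers after positive ones.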

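Note that if no Comparator has been registered in comparators for the class being compared, a default Comparator is returned. When b1 and b2 are compared with that default Comparator's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method, they still have to be deserialized into objects; see the detailed discussion of WritableComparator later on.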

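LongWritable's methods are essentially the same as IntWritable's. The differences are that its value is a long and that it has an additional LongWritable.DecreasingComparator, which extends LongWritable.Comparator and simply returns the negation of what LongWritable.Comparator returns. This is presumably intended for sorting in descending order.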

public class LongWritable implements WritableComparable {
  private long value;

  //……others

  /** A decreasing Comparator optimized for LongWritable. */
  public static class DecreasingComparator extends Comparator {
    public int compare(WritableComparable a, WritableComparable b) {
      return -super.compare(a, b);
    }

    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      return -super.compare(b1, s1, l1, b2, s2, l2);
    }
  }

  static {                                       // register default comparator
    WritableComparator.define(LongWritable.class, new Comparator());
  }
}
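The effect of negating the comparison result can be seen in a small stand-alone sketch. `DescendingDemo` and its two comparators are hypothetical and use `java.util.Comparator` on boxed longs rather than Hadoop's classes. Negation is safe here only because the base comparator returns exactly -1, 0 or 1.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DescendingDemo {
  // Ascending comparator that only ever returns -1, 0 or 1.
  static final Comparator<Long> ASCENDING =
      (a, b) -> a < b ? -1 : (a.equals(b) ? 0 : 1);

  // Descending comparator obtained by negating the ascending result.
  static final Comparator<Long> DECREASING =
      (a, b) -> -ASCENDING.compare(a, b);

  // Return a copy of the input sorted from largest to smallest.
  static Long[] sortedDescending(Long... values) {
    Long[] copy = values.clone();
    Arrays.sort(copy, DECREASING);
    return copy;
  }

  public static void main(String[] args) {
    // prints [42, 7, 3, -1]
    System.out.println(Arrays.toString(sortedDescending(3L, -1L, 42L, 7L)));
  }
}
```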

ByteWritable, BooleanWritable, FloatWritable and DoubleWritable all work in much the same way.

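Next come VIntWritable and VLongWritable. The two classes are nearly identical, and VIntWritable's value is encoded and decoded with the same methods that VLongWritable uses. The main difference is that VIntWritable holds an int value while VLongWritable holds a long value, which follows from their value ranges. Unlike the classes above, neither of them has a Comparator.

Looking at VLongWritable alone is enough, so let's start with its source code.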

public class VLongWritable implements WritableComparable {
  private long value;

  public VLongWritable() {}

  public VLongWritable(long value) { set(value); }

  /** Set the value of this LongWritable. */
  public void set(long value) { this.value = value; }

  /** Return the value of this LongWritable. */
  public long get() { return value; }

  public void readFields(DataInput in) throws IOException {
    value = WritableUtils.readVLong(in);
  }

  public void write(DataOutput out) throws IOException {
    WritableUtils.writeVLong(out, value);
  }

  /** Returns true iff <code>o</code> is a VLongWritable with the same value. */
  public boolean equals(Object o) {
    if (!(o instanceof VLongWritable))
      return false;
    VLongWritable other = (VLongWritable)o;
    return this.value == other.value;
  }

  public int hashCode() {
    return (int)value;
  }

  /** Compares two VLongWritables. */
  public int compareTo(Object o) {
    long thisValue = this.value;
    long thatValue = ((VLongWritable)o).value;
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }

  public String toString() {
    return Long.toString(value);
  }
}

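As shown above, it is encoded with the WritableUtils.writeVLong() method. WritableUtils contains encoding and decoding utilities; for now we only look at the parts relevant to VIntWritable and VLongWritable.

The value of a VIntWritable is in fact also encoded with writeVLong():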

public static void writeVInt(DataOutput stream, int i) throws IOException {
  writeVLong(stream, i);
}

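First of all, an encoded VIntWritable is 1 to 5 bytes long and an encoded VLongWritable is 1 to 9 bytes long. A value in [-112, 127] takes a single byte, and that byte stores the value itself. (The Chinese edition of the Definitive Guide, p. 91, gives the range as [-127, 127]; my guess is that the encoding method was updated at some point.) A value outside this range needs more bytes: the first byte is used to store the length, and the remaining bytes store the value.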
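The size rule can be checked with a small stand-alone sketch. `VLongSize.encodedLength` is a hypothetical helper written for this illustration, not taken from Hadoop; it counts the bytes that the encoding described above would produce.

```java
final class VLongSize {
  // Total number of bytes (length byte included) of the variable-length encoding.
  static int encodedLength(long value) {
    if (value >= -112 && value <= 127) {
      return 1; // the single byte holds the value itself
    }
    if (value < 0) {
      value = ~value; // one's complement, so the magnitude is non-negative
    }
    int dataBytes = 0;
    while (value != 0) { // count the bytes needed for the magnitude
      value >>= 8;
      dataBytes++;
    }
    return 1 + dataBytes; // one leading length byte plus the data bytes
  }

  public static void main(String[] args) {
    long[] samples = {0, 127, 128, -112, -113, 65535, 65536,
                      Integer.MAX_VALUE, Integer.MIN_VALUE,
                      Long.MAX_VALUE, Long.MIN_VALUE};
    for (long v : samples) {
      System.out.println(v + " -> " + encodedLength(v) + " byte(s)");
    }
  }
}
```

The extremes of int come out at 5 bytes and the extremes of long at 9 bytes, matching the ranges stated above.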

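The figure below illustrates how writeVLong() works, and the explanation is given in the code comments (I am not sure it is clear enough; if it feels hard to follow, I personally think you do not need to understand every detail).

[Figure: the writeVLong() encoding process]

Source of WritableUtils.writeVLong():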

public static void writeVLong(DataOutput stream, long i) throws IOException {
  if (i >= -112 && i <= 127) {
    stream.writeByte((byte)i);
    return;  // -112..127 use only one byte
  }
  int len = -112;
  if (i < 0) {
    i ^= -1L;  // take one's complement: -1L is all ones in binary, so this flips every bit.
               // Call the result i_2; then i_2 + 1 = |i| (to negate a two's-complement
               // number, flip all bits including the sign bit and add 1).
    len = -120;
  }
  long tmp = i;  // at this point i is non-negative, in the range [0, 2^63 - 1]
  // This loop works out the length: the larger i is, the longer the encoding,
  // the further len moves from its starting value, and the smaller len becomes.
  while (tmp != 0) {
    tmp = tmp >> 8;
    len--;
  }
  // len now encodes the length; it is simply its distance from the starting value
  stream.writeByte((byte)len);
  len = (len < -120) ? -(len + 120) : -(len + 112);  // number of value bytes, excluding the first (length) byte
  for (int idx = len; idx != 0; idx--) {  // take i's bytes from most to least significant, 8 bits at a time, and write them to the stream
    int shiftbits = (idx - 1) * 8;
    long mask = 0xFFL << shiftbits;
    stream.writeByte((byte)((i & mask) >> shiftbits));
  }
}
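A few worked values may help. The sketch below re-implements the same steps against a `ByteArrayOutputStream` so that the resulting bytes can be printed; `VLongEncodeDemo.encode` is a hypothetical helper written for this illustration and is not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class VLongEncodeDemo {
  // Same steps as writeVLong(), with the output collected into a byte array.
  static byte[] encode(long i) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (i >= -112 && i <= 127) {
        out.writeByte((byte) i); // small values are stored directly
        return bytes.toByteArray();
      }
      int len = -112; // starting marker for positive values
      if (i < 0) {
        i = ~i;       // one's complement of a negative value
        len = -120;   // starting marker for negative values
      }
      for (long tmp = i; tmp != 0; tmp >>= 8) {
        len--; // one step per value byte
      }
      out.writeByte((byte) len);
      int dataBytes = (len < -120) ? -(len + 120) : -(len + 112);
      for (int idx = dataBytes; idx != 0; idx--) {
        // Most significant byte first; the cast keeps only the low 8 bits.
        out.writeByte((byte) (i >>> ((idx - 1) * 8)));
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(encode(127)));  // prints [127]
    System.out.println(Arrays.toString(encode(128)));  // prints [-113, -128]
    System.out.println(Arrays.toString(encode(300)));  // prints [-114, 1, 44]
    System.out.println(Arrays.toString(encode(-113))); // prints [-121, 112]
  }
}
```

For 300 (0x012C), the first byte -114 is two steps below -112, which means "positive, two value bytes", and the value bytes 0x01 and 0x2C follow.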

Now that we know how the value is written out, let's see how it is read back in, which is obviously the reverse process.

WritableUtils.readVLong():

public static long readVLong(DataInput stream) throws IOException {
  byte firstByte = stream.readByte();
  int len = decodeVIntSize(firstByte);
  if (len == 1) {
    return firstByte;
  }
  long i = 0;
  for (int idx = 0; idx < len-1; idx++) {
    byte b = stream.readByte();
    i = i << 8;
    i = i | (b & 0xFF);
  }
  return (isNegativeVInt(firstByte) ? (i ^ -1L) : i);
}

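It reads the first byte and decodes from it the total length (including the length byte itself), then reads the remaining bytes from the input stream one at a time. The & 0xFF prevents sign extension when a byte is automatically promoted to a wider type. Finally, for a negative value, the result is XORed with -1L, which flips every bit including the sign bit.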

WritableUtils.decodeVIntSize() returns the encoded length:

public static int decodeVIntSize(byte value) {
  if (value >= -112) {
    return 1;
  } else if (value < -120) {
    return -119 - value;
  }
  return -111 - value;
}

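This is simply the reverse of the encoding process above. The constants -119 and -111 are used (rather than -120 and -112) so that the result is the total encoded length instead of the length of the value alone, which would exclude the first byte that stores the length.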
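The decoding steps can be tried on hand-written byte sequences. The sketch below is a stand-alone approximation using only the JDK; `VLongDecodeDemo`, `decodeSize` and `decode` are hypothetical names, and the sign test is simplified to `first < -120` because it is only reached for multi-byte encodings.

```java
final class VLongDecodeDemo {
  // Total encoded length (length byte included), derived from the first byte.
  static int decodeSize(byte first) {
    if (first >= -112) {
      return 1;
    } else if (first < -120) {
      return -119 - first; // negative value: -121 -> 2, ..., -128 -> 9
    }
    return -111 - first;   // positive value: -113 -> 2, ..., -120 -> 9
  }

  // Decode one value from the start of the array.
  static long decode(byte[] data) {
    byte first = data[0];
    int len = decodeSize(first);
    if (len == 1) {
      return first; // the byte is the value itself
    }
    long i = 0;
    for (int idx = 1; idx < len; idx++) {
      // & 0xFF keeps the byte's 8 bits without sign extension.
      i = (i << 8) | (data[idx] & 0xFF);
    }
    // For multi-byte encodings a first byte below -120 marks a negative value.
    return (first < -120) ? ~i : i;
  }

  public static void main(String[] args) {
    System.out.println(decode(new byte[] {-113, -128}));  // prints 128
    System.out.println(decode(new byte[] {-121, 112}));   // prints -113
    System.out.println(decode(new byte[] {-114, 1, 44})); // prints 300
    // Without the mask, the byte -128 would widen to -128 instead of 128.
    System.out.println((long) (byte) -128);               // prints -128
  }
}
```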

Back to the WritableComparator mentioned earlier. It implements the RawComparator interface, which amounts to little more than a single compare() method.

public interface RawComparator<T> extends Comparator<T> {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

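WritableComparator is a factory for RawComparator instances (for the Writable implementations that have been registered). It also provides these Writable implementations with helper methods for deserialization. Most of them are simple; the harder ones, readVInt() and readVLong(), follow the process described above. WritableComparator additionally provides a default implementation of compare(), which deserializes before comparing. If WritableComparator.get() finds no registered Comparator, it creates a new one (actually an instance of WritableComparator itself), and when you then compare with public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), it uses the readFields() method of the Writable implementation being compared to read the values out.

For example, VIntWritable is not registered, so get() constructs a WritableComparator and sets up key1, key2, buffer and keyClass. When you call public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), it uses VIntWritable.readFields() to read the values from the encoded byte[] and then compares them.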
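The lookup-with-fallback behaviour can be sketched in plain Java. Everything here is hypothetical and heavily simplified: `ComparatorRegistryDemo`, its `RawCompare` interface and the `decoder` argument are inventions for this example (Hadoop's fallback instantiates the key class and calls readFields() rather than taking a decoder function), and the `decodeCalls` counter exists only to show which path was taken.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ComparatorRegistryDemo {
  // Hypothetical, simplified raw comparator: compares two serialized values.
  interface RawCompare {
    int compare(byte[] b1, byte[] b2);
  }

  // Static registry, playing the role of WritableComparator's comparators map.
  private static final Map<Class<?>, RawCompare> COMPARATORS = new HashMap<>();

  // Counts deserializations so the two code paths can be told apart.
  static int decodeCalls = 0;

  static void define(Class<?> type, RawCompare comparator) {
    COMPARATORS.put(type, comparator);
  }

  // Registered comparator if there is one; otherwise a fallback that
  // deserializes both arguments and compares the resulting objects.
  static <T extends Comparable<T>> RawCompare get(
      Class<T> type, Function<byte[], T> decoder) {
    RawCompare registered = COMPARATORS.get(type);
    if (registered != null) {
      return registered;
    }
    return (b1, b2) -> decoder.apply(b1).compareTo(decoder.apply(b2));
  }

  static byte[] intBytes(int value) {
    return ByteBuffer.allocate(4).putInt(value).array(); // big-endian by default
  }

  static Integer decodeInt(byte[] bytes) {
    decodeCalls++;
    return ByteBuffer.wrap(bytes).getInt();
  }

  static String decodeString(byte[] bytes) {
    decodeCalls++;
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Raw comparison: reads the ints straight from the bytes, no decoder involved.
  static int compareRawInts(byte[] b1, byte[] b2) {
    return Integer.compare(ByteBuffer.wrap(b1).getInt(), ByteBuffer.wrap(b2).getInt());
  }

  public static void main(String[] args) {
    define(Integer.class, ComparatorRegistryDemo::compareRawInts);

    RawCompare ints = get(Integer.class, ComparatorRegistryDemo::decodeInt);
    System.out.println(ints.compare(intBytes(-5), intBytes(163)) < 0); // prints true
    System.out.println(decodeCalls); // prints 0: the registered comparator was used

    RawCompare strings = get(String.class, ComparatorRegistryDemo::decodeString);
    byte[] a = "apple".getBytes(StandardCharsets.UTF_8);
    byte[] b = "banana".getBytes(StandardCharsets.UTF_8);
    System.out.println(strings.compare(a, b) < 0); // prints true
    System.out.println(decodeCalls); // prints 2: the fallback deserialized both keys
  }
}
```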

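Next come ArrayWritable, TwoDArrayWritable and AbstractMapWritable.

ArrayWritable and TwoDArrayWritable wrap a one-dimensional and a two-dimensional array respectively. As you would expect, each holds an array of Writable objects together with the type of its elements, and serialization and deserialization delegate to the write() and readFields() methods of the wrapped Writable implementation.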

public class TwoDArrayWritable implements Writable {
  private Class valueClass;
  private Writable[][] values;

  //……others

  public void readFields(DataInput in) throws IOException {
    // construct matrix
    values = new Writable[in.readInt()][];
    for (int i = 0; i < values.length; i++) {
      values[i] = new Writable[in.readInt()];
    }

    // construct values
    for (int i = 0; i < values.length; i++) {
      for (int j = 0; j < values[i].length; j++) {
        Writable value;                             // construct value
        try {
          value = (Writable)valueClass.newInstance();
        } catch (InstantiationException e) {
          throw new RuntimeException(e.toString());
        } catch (IllegalAccessException e) {
          throw new RuntimeException(e.toString());
        }
        value.readFields(in);                       // read a value
        values[i][j] = value;                       // store it in values
      }
    }
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(values.length);                 // write values
    for (int i = 0; i < values.length; i++) {
      out.writeInt(values[i].length);
    }
    for (int i = 0; i < values.length; i++) {
      for (int j = 0; j < values[i].length; j++) {
        values[i][j].write(out);
      }
    }
  }
}

That is all there is to it; nothing more needs to be said.
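The layout used by TwoDArrayWritable (row count, then every row length, then the values row by row) can be reproduced with a plain int[][] as a stand-alone sketch. `Jagged2DDemo` is a hypothetical class written for this illustration; it stores primitive ints instead of Writable elements, so no reflection is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class Jagged2DDemo {
  // Layout: row count, then every row length, then the values row by row.
  static byte[] write(int[][] values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(values.length);
      for (int[] row : values) {
        out.writeInt(row.length);
      }
      for (int[] row : values) {
        for (int v : row) {
          out.writeInt(v);
        }
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Rebuild the (possibly jagged) matrix shape first, then fill in the values.
  static int[][] read(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int[][] values = new int[in.readInt()][];
      for (int i = 0; i < values.length; i++) {
        values[i] = new int[in.readInt()];
      }
      for (int i = 0; i < values.length; i++) {
        for (int j = 0; j < values[i].length; j++) {
          values[i][j] = in.readInt();
        }
      }
      return values;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[][] original = {{1, 2, 3}, {}, {4}};
    byte[] data = write(original);
    System.out.println(data.length);                       // prints 32
    System.out.println(Arrays.deepToString(read(data)));   // prints [[1, 2, 3], [], [4]]
  }
}
```

Writing all row lengths before any value is what lets the reader allocate the whole matrix before it reads a single element.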

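There are also TupleWritable, AbstractMapWritable -> {MapWritable, SortedMapWritable}, DBWritable, CompressedWritable, VersionedWritable, GenericWritable and others. We will come back to them when necessary; they work in much the same way and differ only in purpose.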

References:

[1] Hadoop: The Definitive Guide, Chinese edition, 2nd edition

Reposted from: http://blog.csdn.net/posa88/article/details/7906426
