
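  In the previous post we walked through the MapReduce workflow, including the ideas behind the MapReduce V2 redesign, the new architecture, and how it differs from MapReduce V1. Today we look at serialization in Hadoop V2. The outline is as follows: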

  • The origin of serialization
  • The Hadoop serialization dependency diagram in detail
  • Common Writable implementation classes




package cn.hdfs.io;

import java.io.Serializable;

/**
 * @author dengjie
 * @date Apr 21, 2015
 * @description Defines a serializable class that holds App information
 */
public class AppInfo implements Serializable {

    /** Version identifier used by Java serialization. */
    private static final long serialVersionUID = 1L;
}


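  Both MapReduce and HDFS in Hadoop need to communicate, so the objects they exchange must be serialized. Hadoop also requires serialization to be fast, compact, and light on bandwidth. It is therefore well worth understanding how Hadoop serializes data, and the rest of this post studies that in more detail.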

  Note: this post does not cover Java's Serializable interface in detail. If you need more background, see the official documentation: http://docs.oracle.com/javase/7/docs/api/java/io/Serializable.html
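
  To make the size argument concrete, here is a minimal sketch that writes the same int twice: once with Java's built-in object serialization and once as raw data through DataOutput, which is the style Hadoop's Writable types use. The class name SerializationSizeDemo and its helper methods are made up for this illustration and are not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class (not part of Hadoop): compares serialized sizes of one int.
class SerializationSizeDemo {

    /** Size in bytes of an Integer written with Java's built-in object serialization. */
    static int javaSerializedSize(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(Integer.valueOf(value));
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size in bytes of the same int written as raw data, with no class metadata. */
    static int rawSize(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(value);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java serialization: " + javaSerializedSize(42) + " bytes");
        System.out.println("Raw DataOutput:     " + rawSize(42) + " bytes");
    }
}
```

  The raw form is always 4 bytes, while the Java-serialized form is larger because the stream also carries a header and a description of the class.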


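  In Hadoop's serialization mechanism, org.apache.hadoop.io defines a large number of serializable objects, all of which implement the Writable interface. The Writable interface has two methods:

  • write: writes the object to a byte stream.

  • readFields: reads the object's fields back from a byte stream.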


package org.apache.hadoop.io;

import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * A serializable object which implements a simple, efficient, serialization
 * protocol, based on {@link DataInput} and {@link DataOutput}.
 *
 * <p>Any <code>key</code> or <code>value</code> type in the Hadoop Map-Reduce
 * framework implements this interface.</p>
 *
 * <p>Implementations typically implement a static <code>read(DataInput)</code>
 * method which constructs a new instance, calls {@link #readFields(DataInput)}
 * and returns the instance.</p>
 *
 * <p>Example:</p>
 * <p><blockquote><pre>
 *     public class MyWritable implements Writable {
 *       // Some data
 *       private int counter;
 *       private long timestamp;
 *
 *       public void write(DataOutput out) throws IOException {
 *         out.writeInt(counter);
 *         out.writeLong(timestamp);
 *       }
 *
 *       public void readFields(DataInput in) throws IOException {
 *         counter = in.readInt();
 *         timestamp = in.readLong();
 *       }
 *
 *       public static MyWritable read(DataInput in) throws IOException {
 *         MyWritable w = new MyWritable();
 *         w.readFields(in);
 *         return w;
 *       }
 *     }
 * </pre></blockquote></p>
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>.
   *
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>.
   *
   * <p>For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.</p>
   *
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}
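
  The write/readFields contract can be tried out without a Hadoop installation. The sketch below is a stand-in under stated assumptions: SimpleWritable is a hypothetical local copy of the two-method interface (so the example compiles with the JDK alone), and EventWritable and WritableRoundTrip are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical driver: serializes an object to bytes and reads it back into a fresh instance.
class WritableRoundTrip {

    static byte[] serialize(SimpleWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                w.write(out);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void deserialize(SimpleWritable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new EventWritable(7, 1429574400000L));
        EventWritable copy = new EventWritable();   // reusable, empty instance
        deserialize(copy, data);
        System.out.println(data.length + " bytes, counter=" + copy.counter
                + ", timestamp=" + copy.timestamp);
    }
}

// Assumption: a local stand-in for org.apache.hadoop.io.Writable with the same two methods.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type with the same two fields as the Javadoc example.
class EventWritable implements SimpleWritable {
    int counter;
    long timestamp;

    EventWritable() {}

    EventWritable(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);      // 4 bytes
        out.writeLong(timestamp);   // 8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();     // fields must be read in the order they were written
        timestamp = in.readLong();
    }
}
```

  The serialized form is exactly 12 bytes (a 4-byte int plus an 8-byte long), and readFields fills an existing object instead of allocating a new one, which is the reuse the Javadoc recommends.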




package org.apache.hadoop.io;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * A {@link Writable} which is also {@link Comparable}.
 *
 * <p><code>WritableComparable</code>s can be compared to each other, typically
 * via <code>Comparator</code>s. Any type which is to be used as a
 * <code>key</code> in the Hadoop Map-Reduce framework should implement this
 * interface.</p>
 *
 * <p>Note that <code>hashCode()</code> is frequently used in Hadoop to partition
 * keys. It's important that your implementation of hashCode() returns the same
 * result across different instances of the JVM. Note also that the default
 * <code>hashCode()</code> implementation in <code>Object</code> does <b>not</b>
 * satisfy this property.</p>
 *
 * <p>Example:</p>
 * <p><blockquote><pre>
 *     public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
 *       // Some data
 *       private int counter;
 *       private long timestamp;
 *
 *       public void write(DataOutput out) throws IOException {
 *         out.writeInt(counter);
 *         out.writeLong(timestamp);
 *       }
 *
 *       public void readFields(DataInput in) throws IOException {
 *         counter = in.readInt();
 *         timestamp = in.readLong();
 *       }
 *
 *       public int compareTo(MyWritableComparable o) {
 *         int thisValue = this.counter;
 *         int thatValue = o.counter;
 *         return (thisValue &lt; thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
 *       }
 *
 *       public int hashCode() {
 *         final int prime = 31;
 *         int result = 1;
 *         result = prime * result + counter;
 *         result = prime * result + (int) (timestamp ^ (timestamp &gt;&gt;&gt; 32));
 *         return result;
 *       }
 *     }
 * </pre></blockquote></p>
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface WritableComparable<T> extends Writable, Comparable<T> {
}


package java.lang;
import java.util.*;

public interface Comparable<T> {
    public int compareTo(T o);
}
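
  The two requirements placed on key types, a total order through compareTo and a hashCode that is stable across JVMs, can be illustrated with plain JDK classes. In the sketch below, EventKey and KeyContractDemo are hypothetical names; partitionFor mirrors the formula used by Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, but it is a local helper and not the Hadoop class itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: sorts keys with compareTo and assigns partitions from hashCode.
class KeyContractDemo {

    /** Local helper mirroring the hash partitioning formula; masks the sign bit first. */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        List<EventKey> keys = new ArrayList<>();
        keys.add(new EventKey(3, 20L));
        keys.add(new EventKey(1, 5L));
        keys.add(new EventKey(3, 10L));
        Collections.sort(keys);   // uses EventKey.compareTo
        for (EventKey key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}

// Hypothetical key type; a real Hadoop key would also implement write and readFields.
final class EventKey implements Comparable<EventKey> {
    final int counter;
    final long timestamp;

    EventKey(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(EventKey o) {
        int byCounter = Integer.compare(counter, o.counter);
        return byCounter != 0 ? byCounter : Long.compare(timestamp, o.timestamp);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EventKey)) {
            return false;
        }
        EventKey other = (EventKey) o;
        return counter == other.counter && timestamp == other.timestamp;
    }

    /** Derived only from field values, so every JVM computes the same result. */
    @Override
    public int hashCode() {
        int result = 31 + counter;
        return 31 * result + (int) (timestamp ^ (timestamp >>> 32));
    }

    @Override
    public String toString() {
        return "EventKey(" + counter + ", " + timestamp + ")";
    }
}
```

  If EventKey relied on Object.hashCode(), two equal keys created in different JVMs could land in different partitions, which is exactly what the Javadoc above warns about.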





  • IntWritable

package org.apache.hadoop.io;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/** A WritableComparable for ints. */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class IntWritable implements WritableComparable<IntWritable> {
  private int value;

  public IntWritable() {}

  public IntWritable(int value) { set(value); }

  /** Set the value of this IntWritable. */
  public void set(int value) { this.value = value; }

  /** Return the value of this IntWritable. */
  public int get() { return value; }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  /** Returns true iff <code>o</code> is a IntWritable with the same value. */
  public boolean equals(Object o) {
    if (!(o instanceof IntWritable))
      return false;
    IntWritable other = (IntWritable)o;
    return this.value == other.value;
  }

  @Override
  public int hashCode() {
    return value;
  }

  /** Compares two IntWritables. */
  public int compareTo(IntWritable o) {
    int thisValue = this.value;
    int thatValue = o.value;
    return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
  }

  @Override
  public String toString() {
    return Integer.toString(value);
  }

  /** A Comparator optimized for IntWritable. */
  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(IntWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      int thisValue = readInt(b1, s1);
      int thatValue = readInt(b2, s2);
      return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
    }
  }

  static {                                        // register this comparator
    WritableComparator.define(IntWritable.class, new Comparator());
  }
}

  • LongWritable

package org.apache.hadoop.io;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/** A WritableComparable for longs. */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class LongWritable implements WritableComparable<LongWritable> {
  private long value;

  public LongWritable() {}

  public LongWritable(long value) { set(value); }

  /** Set the value of this LongWritable. */
  public void set(long value) { this.value = value; }

  /** Return the value of this LongWritable. */
  public long get() { return value; }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readLong();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(value);
  }

  /** Returns true iff <code>o</code> is a LongWritable with the same value. */
  public boolean equals(Object o) {
    if (!(o instanceof LongWritable))
      return false;
    LongWritable other = (LongWritable)o;
    return this.value == other.value;
  }

  @Override
  public int hashCode() {
    return (int)value;
  }

  /** Compares two LongWritables. */
  public int compareTo(LongWritable o) {
    long thisValue = this.value;
    long thatValue = o.value;
    return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
  }

  @Override
  public String toString() {
    return Long.toString(value);
  }

  /** A Comparator optimized for LongWritable. */
  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(LongWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      long thisValue = readLong(b1, s1);
      long thatValue = readLong(b2, s2);
      return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
    }
  }

  /** A decreasing Comparator optimized for LongWritable. */
  public static class DecreasingComparator extends Comparator {

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
      return -super.compare(a, b);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
      return -super.compare(b1, s1, l1, b2, s2, l2);
    }
  }

  static {                                       // register default comparator
    WritableComparator.define(LongWritable.class, new Comparator());
  }
}

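  As the IntWritable and LongWritable sources show, both classes contain an inner class Comparator, whose purpose is to support working on the data directly without deserializing it. The compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method in the source does not need to create any IntWritable objects, so it is more efficient than the object-level compareTo(IntWritable o).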
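
  The idea behind the raw comparator can be sketched with the JDK alone. The class RawIntComparatorDemo below is hypothetical: its readInt decodes a big-endian int straight from a byte array, which is the layout DataOutput.writeInt produces, and compareRaw orders two serialized values without building any wrapper object.

```java
import java.nio.ByteBuffer;

// Hypothetical demo (not Hadoop code): compares serialized ints directly on their bytes.
class RawIntComparatorDemo {

    /** Decodes the big-endian int stored at bytes[start .. start + 3]. */
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             |  (bytes[start + 3] & 0xff);
    }

    /** Compares two serialized ints in place; no object is deserialized. */
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    /** Serializes an int as 4 big-endian bytes (ByteBuffer is big-endian by default). */
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] a = serialize(-5);
        byte[] b = serialize(3);
        System.out.println("compare(-5, 3) sign: " + Integer.signum(compareRaw(a, 0, b, 0)));
        System.out.println("compare(3, 3) sign:  " + Integer.signum(compareRaw(b, 0, b, 0)));
    }
}
```

  During a sort, records are compared many times while they sit in byte buffers, so skipping the allocation and readFields call on every comparison is where the saving comes from.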

  • Text


public class Text extends BinaryComparable
    implements WritableComparable<BinaryComparable> {
  // Detailed code omitted ...
}


public abstract class BinaryComparable implements Comparable<BinaryComparable> {
  // Detailed code omitted ...
}


  /**
   * Compare bytes from {#getBytes()}.
   * @see org.apache.hadoop.io.WritableComparator#compareBytes(byte[],int,int,byte[],int,int)
   */
  public int compareTo(BinaryComparable other) {
    if (this == other)
      return 0;
    return WritableComparator.compareBytes(getBytes(), 0, getLength(),
             other.getBytes(), 0, other.getLength());
  }

  /**
   * Compare bytes from {#getBytes()} to those provided.
   */
  public int compareTo(byte[] other, int off, int len) {
    return WritableComparator.compareBytes(getBytes(), 0, getLength(),
             other, off, len);
  }
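
  Both compareTo methods delegate to a byte-level comparison. As a rough sketch of what such a comparison involves, the hypothetical class ByteOrderCompareDemo below compares two byte ranges lexicographically, treating each byte as unsigned. It is an illustration of the technique, not a copy of Hadoop's WritableComparator.compareBytes implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: unsigned lexicographic comparison of two byte ranges.
class ByteOrderCompareDemo {

    /** Returns a negative, zero, or positive value, like Comparable.compareTo. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff;   // mask so bytes >= 0x80 are not treated as negative
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;             // common prefix: the shorter range sorts first
    }

    /** Convenience wrapper: compares two strings by their UTF-8 encoded bytes. */
    static int compareUtf8(String x, String y) {
        byte[] bx = x.getBytes(StandardCharsets.UTF_8);
        byte[] by = y.getBytes(StandardCharsets.UTF_8);
        return compareBytes(bx, 0, bx.length, by, 0, by.length);
    }

    public static void main(String[] args) {
        System.out.println(Integer.signum(compareUtf8("apple", "banana")));
        System.out.println(Integer.signum(compareUtf8("abc", "ab")));
    }
}
```

  The unsigned mask matters for non-ASCII text: every byte of a multi-byte UTF-8 sequence has its high bit set, so a signed comparison would sort those characters before plain ASCII.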


/** This class stores text using standard UTF8 encoding.  It provides methods
 * to serialize, deserialize, and compare texts at byte level. The type of
 * length is integer and is serialized using zero-compressed format. <p>In
 * addition, it provides methods for string traversal without converting the
 * byte array to a string. <p>Also includes utilities for
 * serializing/deserializing a string, coding/decoding a string, checking if a
 * byte array contains valid UTF8 code, calculating the length of an encoded
 * string.
 */







