Official source-code comments of HFileBlock:

 
Reading HFile version 1 and 2 blocks, and writing version 2 blocks.
  • In version 1 all blocks are always compressed or uncompressed, as specified by the HFile's compression algorithm, with a type-specific magic record stored in the beginning of the compressed data (i.e. one needs to uncompress the compressed block to determine the block type). There is only a single compression algorithm setting for all blocks. Offset and size information from the block index are required to read a block.
  • In version 2 a block is structured as follows:
    • Magic record identifying the block type (8 bytes)
    • Compressed block size, header not included (4 bytes)
    • Uncompressed block size, header not included (4 bytes)
The offset of the previous block of the same type (8 bytes). This is used to be able to navigate to the previous block without going to the block index.
    • For minorVersions >=1, there is an additional 4 byte field bytesPerChecksum that records the number of bytes in a checksum chunk.
    • For minorVersions >=1, there is a 4 byte value to store the size of data on disk (excluding the checksums)
For minorVersions >=1, a series of 4 byte checksums, one each for the number of bytes specified by bytesPerChecksum.
    • Compressed data (or uncompressed data if compression is disabled). The compression algorithm is the same for all the blocks in the HFile, similarly to what was done in version 1.

The version 2 block representation in the block cache is the same as above, except that the data section is always uncompressed in the cache.

 
From the documentation above we can see that, as HBase evolved, two HFile formats appeared (version 1 and version 2), each with its own HFileBlock format. Only version 2 is considered here, with minorVersion at its default value of 1, which gives the HFileBlock version 2 layout below (a small arithmetic sketch of the header size follows the list):
 
magic record (8 bytes)
compressed block size (header not included, 4 bytes)
uncompressed block size (header not included, 4 bytes)
the offset of the previous block of the same type (8 bytes)
bytesPerChecksum (the number of bytes in a checksum chunk, 4 bytes)
data size on disk (excluding the checksums, 4 bytes)
a series of 4 byte checksums (one each for the number of bytes specified by bytesPerChecksum)
compressed data (uncompressed data if compression is disabled)
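 
Summing the field widths above gives the header size; a minimal arithmetic sketch (the class and constant names are illustrative, not actual HBase identifiers):
 
public final class HeaderSizeSketch {
     static final int MAGIC = 8;              // block type magic record
     static final int ON_DISK_SIZE = 4;       // compressed size, header excluded
     static final int UNCOMPRESSED_SIZE = 4;  // uncompressed size, header excluded
     static final int PREV_OFFSET = 8;        // offset of previous block of same type
 
     public static void main(String[] args) {
          // header size without the checksum fields (minor version 0): 24 bytes
          System.out.println(MAGIC + ON_DISK_SIZE + UNCOMPRESSED_SIZE + PREV_OFFSET);
     }
}
 
For minorVersion >= 1 the header is larger, since it also carries the checksum-related fields listed above.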
 
HFileBlock.Writer
 

Unified version 2 HFile block writer. The intended usage pattern is as follows:

From the documentation above, the recommended usage pattern of HFileBlock.Writer is:
 
(1) Construct an HFileBlock.Writer instance, supplying the compression algorithm, the data block encoding, whether MemstoreTS is included, the HBase minor version, the checksum type, and how many bytes of data produce one checksum;
(2) Call startWriting to obtain an output stream that block data can be written to;
(3) Call writeHeaderAndData as needed to write the block data to the output stream; getHeaderAndData then returns a byte array containing the block data just written;
(4) Repeat (2) and (3) to write more blocks (a sketch of the whole pattern follows below).
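 
A minimal sketch of this pattern against the 0.94 API; the constructor argument order is an assumption reconstructed from the field list below, and the values shown are the defaults discussed later:
 
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hbase.io.hfile.*;   // HFileBlock, BlockType, Compression, NoOpDataBlockEncoder
import org.apache.hadoop.hbase.util.ChecksumType;
 
void writeBlocks(FSDataOutputStream fsOut, int numBlocks) throws IOException {
     // (1) construct the Writer (argument order assumed, values are the defaults)
     HFileBlock.Writer writer = new HFileBlock.Writer(
          Compression.Algorithm.NONE,       // compression algorithm
          NoOpDataBlockEncoder.INSTANCE,    // data block encoder
          true,                             // includesMemstoreTS
          1,                                // minor version
          ChecksumType.CRC32,               // checksum type
          16 * 1024);                       // bytesPerChecksum
     for (int i = 0; i < numBlocks; ++i) {
          DataOutputStream out = writer.startWriting(BlockType.DATA); // (2)
          out.writeBytes("key/value payload " + i);                   //     block payload
          writer.writeHeaderAndData(fsOut);                           // (3)
     }                                                                // (4) repeat
}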
 
Class diagram
 
 
Core fields
 
private State state = State.INIT;
 
Writer state. Used to ensure the correct usage protocol.
 
Tracks the current Writer state, used to ensure the correct usage protocol across method calls (i.e. certain methods require the Writer to be in a specific state; see the guard sketch after this list). There are three states:
  • INIT
  • WRITING
  • BLOCK_READY
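The protocol is enforced by guard checks before state-dependent calls; a minimal sketch of such a guard (the helper name expectState follows the 0.94 source, but treat the body as illustrative):
 
private void expectState(State expectedState) {
     if (state != expectedState) {
          throw new IllegalStateException("Expected state: " + expectedState +
               ", actual state: " + state);
     }
}
 
For example, getUserDataStream() runs under expectState(State.WRITING), while writeHeaderAndData first calls ensureBlockReady().
 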
private final Compression.Algorithm compressAlgo;
 
Compression algorithm for all blocks this instance writes.
 
Identifies the compression algorithm this Writer uses; the possible algorithms are:
  • LZO
  • GZ
  • NONE
  • SNAPPY
  • LZ4
private final HFileDataBlockEncoder dataBlockEncoder;
 
Data block encoder used for data blocks.
 
The encoder used for data blocks.
 
private ByteArrayOutputStream baosInMemory;
 
The stream we use to accumulate data in uncompressed format for each block. We reset this stream at the end of each block and reuse it. The header is written as the first HFileBlock.headerSize(int) bytes into this stream.
 
This output stream accumulates each block's data in uncompressed form (a block's data consists of multiple KeyValues). Once a block has been written out, reset() is called on the stream so that it can be reused across blocks.
 
private Compressor compressor;
 
private CompressionOutputStream compressionStream;
    
private ByteArrayOutputStream compressedByteStream;
 
If compression is enabled, the following three fields come into play:
 
compressor:Compressor, which is also reused between consecutive blocks.
compressionStream:Compression output stream.
compressedByteStream:Underlying stream to write compressed bytes to.
 
compressionStream is built by wrapping compressor and compressedByteStream.
 
private BlockType blockType;
 
Current block type. Set in startWriting(BlockType). Could be changed in encodeDataBlockForDisk() from BlockType.DATA to BlockType.ENCODED_DATA.
 
Identifies the type of the current block.
 
private DataOutputStream userDataStream;
 
A stream that we write uncompressed bytes to, which compresses them and writes them to baosInMemory.
 
userDataStream is a wrapper around baosInMemory.
 
private byte[] onDiskBytesWithHeader;
 
Bytes to be written to the file system, including the header. Compressed if compression is turned on. It also includes the checksum data that immediately follows the block data. (header + data + checksums)
 
The final bytes written to the file system; compressed if compression is enabled, and including both the header and the checksum data (header + data + checksums).
 
private int onDiskDataSizeWithHeader;
 
The size of the data on disk that does not include the checksums. (header + data)
 
The size of the final on-disk data excluding the checksum data (header + data).
 
private byte[] onDiskChecksum;
 
The size of the checksum data on disk. It is used only if data is not compressed. If data is compressed, then the checksums are already part of onDiskBytesWithHeader. If data is uncompressed, then this variable stores the checksum data for this block.
 
The checksum data; it is used only when the data is not compressed. If the data is compressed, the checksums are already part of onDiskBytesWithHeader; if not, onDiskChecksum stores the checksum data for the current block.
 
private byte[] uncompressedBytesWithHeader;
 
Valid in the READY state. Contains the header and the uncompressed (but potentially encoded, if this is a data block) bytes, so the length is uncompressedSizeWithoutHeader + HFileBlock.headerSize(int). Does not store checksums.
 
Contains the header and the uncompressed bytes (the data may already be encoded); does not include checksum data.
 
private long startOffset;
 
Current block's start offset in the HFile. Set in writeHeaderAndData(FSDataOutputStream).
 
The current block's start offset within the HFile.
 
private long[] prevOffsetByType;
 
Offset of previous block by block type. Updated when the next block is started.
 
private long prevOffset;
 
The offset of the previous block of the same type.
 
Records the start offset, within the HFile, of the previous block of the same type as the current block. prevOffsetByType keeps the start offsets for every block type this Writer has written, updated when the next block is started.
 
private boolean includesMemstoreTS;
 
Whether we are including memstore timestamp after every key/value.
 
Indicates whether a MemstoreTS is written after every KeyValue.
 
private ChecksumType checksumType;
 
The checksum type; one of the following:
  • NULL
  • CRC32
  • CRC32C
private int bytesPerChecksum;
 
Specifies how many bytes of data produce one checksum.
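 
The amount of checksum data per block follows directly from this setting: the covered bytes (header + data) are split into bytesPerChecksum-sized chunks (the last possibly shorter), and each chunk gets one 4-byte checksum. A standalone sketch of that arithmetic (assumed to mirror what ChecksumUtil.numBytes computes):
 
// bytes of checksum data for a block whose on-disk size (header + data) is given
static long checksumBytes(long onDiskDataSize, int bytesPerChecksum) {
     long numChunks = (onDiskDataSize + bytesPerChecksum - 1) / bytesPerChecksum; // ceiling division
     return numChunks * 4;  // one 4-byte checksum per chunk
}
 
For example, checksumBytes(65569, 16 * 1024) returns 20: five chunks, four bytes each.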
 
private final int minorVersion;
 
The HBase minor version.
 
The HFileBlock.Writer construction (constructor) process
 
this.minorVersion = minorVersion;  // defaults to HFileReaderV2.MAX_MINOR_VERSION (1)
 
compressAlgo = compressionAlgorithm == null ? NONE : compressionAlgorithm; // defaults to Compression.Algorithm.NONE
 
this.dataBlockEncoder = dataBlockEncoder != null ? dataBlockEncoder : NoOpDataBlockEncoder.INSTANCE; // defaults to NoOpDataBlockEncoder.INSTANCE, which performs no transformation on the data at all.
 
baosInMemory = new ByteArrayOutputStream();
 
if (compressAlgo != NONE) {
     compressor = compressionAlgorithm.getCompressor();
 
     compressedByteStream = new ByteArrayOutputStream();
        
     try {
          compressionStream = compressionAlgorithm.createPlainCompressionStream(compressedByteStream, compressor);
     } catch (IOException e) {
          throw new RuntimeException("Could not create compression stream " + "for algorithm " + compressionAlgorithm, e);
     }
}
 
Create the byte output stream; if compression is enabled, also create the corresponding compression output stream. Either way, the underlying stream is a ByteArrayOutputStream. This shows that HFileBlock.Writer does not write data to the file system directly: it merely buffers the data in an in-memory byte array, and the caller, after obtaining that byte array (which goes through several processing steps such as encoding and compression), writes the data to the file system.
 
if (minorVersion > MINOR_VERSION_NO_CHECKSUM && bytesPerChecksum < HEADER_SIZE_WITH_CHECKSUMS) {
     throw new RuntimeException("Unsupported value of bytesPerChecksum. " +
            " Minimum is " + HEADER_SIZE_WITH_CHECKSUMS + " but the configured value is " +
            bytesPerChecksum);
}
 
Validate bytesPerChecksum.
 
prevOffsetByType = new long[BlockType.values().length];
 
for (int i = 0; i < prevOffsetByType.length; ++i)
     prevOffsetByType[i] = -1;
 
Initialize the prevOffsetByType array; its length equals the number of block types.
 
this.includesMemstoreTS = includesMemstoreTS;  // defaults to true
      
this.checksumType = checksumType;  // defaults to ChecksumType.CRC32
      
this.bytesPerChecksum = bytesPerChecksum;  // defaults to 16 * 1024
 
Typical processing flow and source-code analysis
 
1. startWriting
 
Method signature: public DataOutputStream startWriting(BlockType newBlockType) throws IOException
Method description: Starts writing into the block. The previous block's data is discarded.
Trigger: HFileWriterV2.newBlock()
Flow:
 
if (state == State.BLOCK_READY && startOffset != -1) {
     // We had a previous block that was written to a stream at a specific
     // offset. Save that offset as the last offset of a block of that type.
     prevOffsetByType[blockType.getId()] = startOffset;
}
 
Record, per block type, the start offset of the previous block in the HFile; this happens only if the previous block has finished writing (state is BLOCK_READY).
 
startOffset = -1;
blockType = newBlockType;
 
Reset the start offset and the block type.
 
baosInMemory.reset();
baosInMemory.write(getDummyHeaderForVersion(this.minorVersion));
 
Reset the byte array output stream and write a dummy header, i.e. a zero-filled byte array of header length (24 bytes for the no-checksum layout; for minorVersion >= 1 the header is larger, as it also carries the checksum-related fields).
 
state = State.WRITING;
 
Update the Writer state to WRITING.
 
// We will compress it later in finishBlock()
userDataStream = new DataOutputStream(baosInMemory);
 
Decorate baosInMemory with a DataOutputStream so that callers can write data conveniently.
 
return userDataStream;
 
Return the data output stream for external use.
 
2. Writing KeyValues
 
// Write length of key and value and then actual key and value bytes.
// Additionally, we may also write down the memstoreTS.
{
     DataOutputStream out = fsBlockWriter.getUserDataStream();
     out.writeInt(klength);
     totalKeyLength += klength;
     out.writeInt(vlength);
     totalValueLength += vlength;
     out.write(key, koffset, klength);
     out.write(value, voffset, vlength);
     if (this.includeMemstoreTS) {
          WritableUtils.writeVLong(out, memstoreTS);
     }
}
 
The code above actually lives in HFileWriterV2.append and is not covered in detail here. As it shows, the caller obtains the DataOutputStream instance out via HFileBlock.Writer's getUserDataStream, then writes its data through out.
 
3. writeHeaderAndData
 
Method signature: public void writeHeaderAndData(FSDataOutputStream out) throws IOException
Method description: Similar to writeHeaderAndData(DataOutputStream), but records the offset of this block so that it can be referenced in the next block of the same type. Its main job is to record the current block's start offset for use when the next block of the same type is handled; the rest of the work is delegated to the overload writeHeaderAndData(DataOutputStream). Overall, it writes the accumulated block data to the given output stream, i.e. to HDFS.
Trigger: invoked indirectly when the block size, obtained via blockSizeWritten() (i.e. userDataStream.size()), exceeds the configured limit (default 64KB); a sketch of this caller-side check follows this section.
Flow:
long offset = out.getPos();
if (startOffset != -1 && offset != startOffset) {
     throw new IOException("A " + blockType + " block written to a "
        + "stream twice, first at offset " + startOffset + ", then at "
        + offset);
}
 
startOffset = offset;
 
Record the current block's start offset in the HFile.
 
writeHeaderAndData((DataOutputStream) out);
 
The flow is handed off to the overload writeHeaderAndData(DataOutputStream), which proceeds as follows:
 
ensureBlockReady();
 
Transitions the block writer from the "writing" state to the "block ready" state. Does nothing if a block is already finished.
 
Transition the Writer state from "writing" to "block ready"; the actual work happens in finishBlock.
 
out.write(onDiskBytesWithHeader);
 
Write out the header and the data; if compression is enabled, this also includes the checksum data.
      
if (compressAlgo == NONE && minorVersion > MINOR_VERSION_NO_CHECKSUM) {
     if (onDiskChecksum == HConstants.EMPTY_BYTE_ARRAY) {
          throw new IOException("A " + blockType 
              + " without compression should have checksums " 
              + " stored separately.");
     }
     
     out.write(onDiskChecksum);
}
 
If compression is not enabled, write out the checksum data separately.
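 
For context, the trigger described above lives on the caller side: HFileWriterV2 checks the accumulated block size on every append and closes the block once the limit is crossed. A hedged sketch modeled on HFileWriterV2.checkBlockBoundary in 0.94 (names reproduced from memory, so treat them as illustrative):
 
private void checkBlockBoundary() throws IOException {
     if (fsBlockWriter.blockSizeWritten() < blockSize)  // blockSize defaults to 64KB
          return;
     finishBlock();            // internally calls fsBlockWriter.writeHeaderAndData(outputStream)
     writeInlineBlocks(false); // flush inline blocks (e.g. bloom filter chunks) if needed
     newBlock();               // calls fsBlockWriter.startWriting(BlockType.DATA)
}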
 
4. finishBlock
 
Method signature: private void finishBlock() throws IOException
Method description: An internal method that flushes the compressing stream (if using compression), serializes the header, and takes care of the separate uncompressed stream for caching on write, if applicable. Sets block write state to "block ready".
Trigger: called from ensureBlockReady
Flow:
 
userDataStream.flush();
 
Flush the output stream.
 
uncompressedBytesWithHeader = baosInMemory.toByteArray();
 
Copy the data: save the contents of the byte array output stream into a byte array.
 
prevOffset = prevOffsetByType[blockType.getId()];
 
Save the start offset of the previous block of the same type as the current block.
 
state = State.BLOCK_READY;
 
Update the Writer state to BLOCK_READY; note that when this runs, the data has not yet been encoded or compressed.
 
encodeDataBlockForDisk();
 
Encode the data. The default encoder is NoOpDataBlockEncoder, which performs no encoding at all.
 
doCompressionAndChecksumming();
 
Compress the data and generate checksums; the flow is:
 
private void doCompressionAndChecksumming() throws IOException {
     if ( minorVersion <= MINOR_VERSION_NO_CHECKSUM) {
          version20compression();
     } else {
          version21ChecksumAndCompression();
     }
}
 
minorVersion defaults to 1, so the flow jumps to version21ChecksumAndCompression, whose code follows:
 
// do the compression
if (compressAlgo != NONE) {
     // compression is enabled
 
     // reset the compressed output stream
     compressedByteStream.reset();
     // write a dummy header into the compressed output stream
     compressedByteStream.write(DUMMY_HEADER_WITH_CHECKSUM);
 
     // reset the compression stream state; compressionStream decorates compressedByteStream
     compressionStream.resetState();
 
     // write the data portion of uncompressedBytesWithHeader (header excluded) into the compression stream
     compressionStream.write(uncompressedBytesWithHeader, headerSize(this.minorVersion),
            uncompressedBytesWithHeader.length - headerSize(this.minorVersion));
 
     // flush and finish the compression stream
     compressionStream.flush();
     compressionStream.finish();
 
     // generate checksums
 
     // dummy checksum bytes are appended to the tail of the compressed stream below
     // as placeholders, reserving space for the real checksums computed later
     onDiskDataSizeWithHeader = compressedByteStream.size(); // data size
 
     // reserve space for checksums in the output byte stream
     ChecksumUtil.reserveSpaceForChecksums(compressedByteStream, 
       onDiskDataSizeWithHeader, bytesPerChecksum);
 
 
     // dummy header + (compressed) data + dummy checksum placeholders
     onDiskBytesWithHeader = compressedByteStream.toByteArray();
 
     // fill in the real header
     put21Header(onDiskBytesWithHeader, 0, onDiskBytesWithHeader.length,
            uncompressedBytesWithHeader.length, onDiskDataSizeWithHeader);
 
     // generate checksums for header and data. The checksums are
     // part of onDiskBytesWithHeader itself.
     // compute the real checksum data
     ChecksumUtil.generateChecksums(
       onDiskBytesWithHeader, 0, onDiskDataSizeWithHeader,
       onDiskBytesWithHeader, onDiskDataSizeWithHeader,
       checksumType, bytesPerChecksum);
 
     // Checksums are already part of onDiskBytesWithHeader
     onDiskChecksum = HConstants.EMPTY_BYTE_ARRAY;
 
     // set the header on the uncompressed copy (for cache-on-write); the data portion of uncompressedBytesWithHeader is not compressed
     //set the header for the uncompressed bytes (for cache-on-write)
     put21Header(uncompressedBytesWithHeader, 0,
           onDiskBytesWithHeader.length + onDiskChecksum.length,
           uncompressedBytesWithHeader.length, onDiskDataSizeWithHeader);
} else {
     // compression is disabled
     // If we are not using any compression, then the
     // checksums are written to its own array onDiskChecksum.
     onDiskBytesWithHeader = uncompressedBytesWithHeader;
 
     onDiskDataSizeWithHeader = onDiskBytesWithHeader.length;
 
     // compute the checksum length
     int numBytes = (int)ChecksumUtil.numBytes(
                       uncompressedBytesWithHeader.length,
                       bytesPerChecksum);
        
     onDiskChecksum = new byte[numBytes];
 
     // fill in the header
     //set the header for the uncompressed bytes
     put21Header(uncompressedBytesWithHeader, 0,
         onDiskBytesWithHeader.length + onDiskChecksum.length,
         uncompressedBytesWithHeader.length, onDiskDataSizeWithHeader);
 
     // compute the checksum data
     ChecksumUtil.generateChecksums(
       uncompressedBytesWithHeader, 0, uncompressedBytesWithHeader.length,
       onDiskChecksum, 0,
       checksumType, bytesPerChecksum);
}
 
At this point, if compression is not used, onDiskBytesWithHeader holds header + data and onDiskChecksum holds the corresponding checksum data; if compression is used, onDiskBytesWithHeader holds header + compressed data + checksums and onDiskChecksum is EMPTY_BYTE_ARRAY. Step 3, writeHeaderAndData, writes the block out (onDiskBytesWithHeader, then onDiskChecksum) accordingly.
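 
A worked example, using assumed figures rather than values from a real run: take a 64KB uncompressed data block, a 33-byte header (the 0.94 with-checksum header size: besides the two 4-byte fields listed earlier, it also stores a one-byte checksum type) and the default bytesPerChecksum of 16 * 1024. Then uncompressedBytesWithHeader.length = 33 + 65536 = 65569. Without compression, onDiskBytesWithHeader is those same 65569 bytes, the checksums cover ceil(65569 / 16384) = 5 chunks, so onDiskChecksum holds 5 * 4 = 20 bytes, and writeHeaderAndData emits 65569 + 20 = 65589 bytes in total. With compression, the checksums (computed over the compressed payload instead) would sit at the tail of onDiskBytesWithHeader, and onDiskChecksum would be empty.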
 
Repeating the steps above completes the HFile block write flow.
 
 
