Tools Provided by HBase

1. Compression test tool

hbase org.apache.hadoop.hbase.util.CompressionTest

Results for 1 GB of data under different compression algorithms and block encodings:

+--------------------+--------------+
| MODIFIER           | SIZE (bytes) |
+--------------------+--------------+
| none               |   1108553612 |
+--------------------+--------------+
| compression:SNAPPY |    427335534 |
+--------------------+--------------+
| compression:LZO    |    270422088 |
+--------------------+--------------+
| compression:GZ     |    152899297 |
+--------------------+--------------+
| codec:PREFIX       |   1993910969 |
+--------------------+--------------+
| codec:DIFF         |   1960970083 |
+--------------------+--------------+
| codec:FAST_DIFF    |   1061374722 |
+--------------------+--------------+
| codec:PREFIX_TREE  |   1066586604 |
+--------------------+--------------+
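
To compare the codecs more directly, the sizes in the table above can be expressed as a percentage of the uncompressed size. The snippet below is a minimal sketch using awk; the function name `ratio` is a hypothetical helper, and the numbers are copied from the table.

```shell
# ratio RAW SIZE: print SIZE as a percentage of RAW (hypothetical helper).
ratio() {
  awk -v raw="$1" -v size="$2" 'BEGIN { printf "%.1f%%\n", 100 * size / raw }'
}

raw=1108553612           # size of the "none" row
ratio "$raw" 427335534   # compression:SNAPPY
ratio "$raw" 270422088   # compression:LZO
ratio "$raw" 152899297   # compression:GZ
```

On these figures GZ produces the smallest files of the three compression codecs and SNAPPY the largest.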

(1) Install Snappy compression

export HBASE_LIBRARY_PATH=/pathtoyourhadoop/lib/native/Linux-amd64-64
Test Snappy compression:
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy
(2) Configure compression
In hbase-site.xml, set hbase.regionserver.codecs. Possible values include LZO, Snappy, and GZIP.
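
As an illustration, an hbase-site.xml entry for this setting might look like the fragment below. The value `snappy,lzo` is only an example, and the lowercase comma-separated form is an assumption; check which codec names your HBase version accepts.

```xml
<!-- Example only: codecs that must be available when a RegionServer starts -->
<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,lzo</value>
</property>
```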

2. HFile tool

View the contents of an HFile:

hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475

3. WAL tools

Dump a WAL file (FSHLog file):

hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012

Force a split of the WAL files:

hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/

HLogPrettyPrinter prints the contents of an HLog.

4. CopyTable tool

Copies a table from one cluster into another table. The target table must already exist in the destination cluster.

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase TestTable
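
The --starttime and --endtime values in the command above are timestamps in milliseconds, here exactly one hour apart (3600000 ms). As a rough sketch, a small shell function can derive the start of the window from its end and print the resulting command without running it. The function name `copytable_cmd` and its argument order are hypothetical.

```shell
# copytable_cmd END_MS HOURS PEER_ADR TABLE
# Print (do not run) a CopyTable command covering the HOURS before END_MS.
copytable_cmd() {
  end_ms=$1
  start_ms=$(( end_ms - $2 * 3600 * 1000 ))
  echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=${start_ms} --endtime=${end_ms} --peer.adr=$3 $4"
}

# Reproduces the one-hour window used in the command above.
copytable_cmd 1265878794289 1 "server1,server2,server3:2181:/hbase" TestTable
```

Printing the command first makes it easy to check the time range before submitting the job to the cluster.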

Options:

starttime Beginning of the time range. Without endtime means starttime to forever.
endtime End of the time range. Without endtime means starttime to forever.
versions Number of cell versions to copy.
new.name New table's name.
peer.adr Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
families Comma-separated list of ColumnFamilies to copy.
all.cells Also copy delete markers and uncollected deleted cells (advanced option).
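
The peer.adr value packs three settings into one colon-separated string. The following is a minimal sketch that splits it apart with POSIX parameter expansion; `split_peer_adr` is a hypothetical helper, the labels are the ones used in the option description, and the address is the one from the example command.

```shell
# split_peer_adr ADDR: print the three colon-separated parts of a peer.adr value.
split_peer_adr() {
  quorum=${1%%:*}    # everything before the first colon
  rest=${1#*:}       # everything after the first colon
  port=${rest%%:*}   # client port
  znode=${rest#*:}   # znode parent
  echo "hbase.zookeeper.quorum=${quorum}"
  echo "hbase.zookeeper.client.port=${port}"
  echo "zookeeper.znode.parent=${znode}"
}

split_peer_adr "server1,server2,server3:2181:/hbase"
```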

Scanner caching for the copy is configured with hbase.client.scanner.caching.

Online data backup with CopyTable: http://blog.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/

5. Export table data

hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

6. Import table data

hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>

Import table data exported from a different HBase version:

hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>

7. WALPlayer

WALPlayer replays WAL files into tables. It can first generate HFiles, which are then bulk loaded.

hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2

By default it runs as a distributed MapReduce job. It can be switched to local mode with -Dmapred.job.tracker=local

8. RowCounter and CellCounter

RowCounter is a MapReduce job that counts the rows in a table.

hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]

CellCounter reports the following:

Total number of rows in the table.
Total number of CFs across all rows.
Total qualifiers across all rows.
Total occurrence of each CF.
Total occurrence of each qualifier.
Total number of versions of each qualifier.

hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]

9. mlockall

export HBASE_REGIONSERVER_OPTS="-agentpath:./libmlockall_agent.so=user=hbase"

hbase --mlock user=hbase regionserver start

The JDK must have been installed by the root user.

10. Offline compaction tool

hbase org.apache.hadoop.hbase.regionserver.CompactionTool

11. Region merge tool

hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2>