导入数据到HBase的方式选择

Choosing the Right Import Method

If the data is already in an HBase table:

To move the data from one HBase cluster to another, use snapshot and either the clone_snapshot or ExportSnapshot utility; or, use the CopyTable utility.
To move the data from one HBase cluster to another without downtime on either cluster, use replication.
To migrate data between HBase version that are not wire compatible, such as from CDH 4 to CDH 5, see Importing HBase Data From CDH 4 to CDH 5.

If the data currently exists outside HBase:

If possible, write the data to HFile format, and use a BulkLoad to import it into HBase. The data is immediately available to HBase and you can bypass the normal write path, increasing efficiency.
If you prefer not to use bulk loads, and you are using a tool such as Pig, you can use it to import your data.

If you need to stream live data to HBase instead of import in bulk:

Write a Java client using the Java API, or use the Apache Thrift Proxy API to write a client in a language supported by Thrift.
Stream data directly into HBase using the REST Proxy API in conjunction with an HTTP client such as wget or curl.
Use Flume or Spark.

Most likely, at least one of these methods works in your situation. If not, you can use MapReduce directly. Test the most feasible methods with a subset of your data to determine which one is optimal.

摘自：http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_import.html

导入数据到HBase的方式选择的更多相关文章

批量导入数据到HBase
hbase一般用于大数据的批量分析,所以在很多情况下需要将大量数据从外部导入到hbase中,hbase提供了一种导入数据的方式,主要用于批量导入大量数据,即importtsv工具,用法如下: Us ...
通过phoenix导入数据到hbase出错记录
解决方法1 错误如下 -- ::, [hconnection-0x7b9e01aa-shared--pool11069-t114734] WARN org.apache.hadoop.hbase.ip ...
Hive导入数据到HBase,再与Phoenix映射同步
1. 创建HBase 表 create 'hbase_test','user' 2. 插入数据 put 'hbase_test','111','user:name','jack' put 'hbase ...
用spark导入数据到hbase
集群环境:一主三从,Spark为Spark On YARN模式 Spark导入hbase数据方式有多种 1.少量数据:直接调用hbase API的单条或者批量方法就可以 2.导入的数据量比较大,那就需 ...
importTSV工具导入数据到hbase
1.建立目标表test,确定好列族信息. create'test','info','address' 2.建立文件编写要导入的数据并上传到hdfs上 touch a.csv vi a.csv 数据内容 ...
使用Sqoop从MySQL导入数据到Hive和HBase 及近期感悟
使用Sqoop从MySQL导入数据到Hive和HBase 及近期感悟 Sqoop 大数据 Hive HBase ETL 使用Sqoop从MySQL导入数据到Hive和HBase 及近期感悟基础环境 ...
Hbase 学习（十一）使用hive往hbase当中导入数据
我们可以有很多方式可以把数据导入到hbase当中,比如说用map-reduce,使用TableOutputFormat这个类,但是这种方式不是最优的方式. Bulk的方式直接生成HFiles,写入到文 ...
教程 | 使用Sqoop从MySQL导入数据到Hive和HBase
基础环境 sqoop:sqoop-1.4.5+cdh5.3.6+78, hive:hive-0.13.1+cdh5.3.6+397, hbase:hbase-0.98.6+cdh5.3.6+115 S ...
1.6-1.10 使用Sqoop导入数据到HDFS及一些设置
一.导数据 1.import和export Sqoop可以在HDFS/Hive和关系型数据库之间进行数据的导入导出,其中主要使用了import和export这两个工具.这两个工具非常强大, 提供了很多 ...

随机推荐

XJOI3363 树3/Codeforces 682C Alyona and the Tree(dfs)
Alyona decided to go on a diet and went to the forest to get some apples. There she unexpectedly fou ...
sqlite初识
最近在部署PHP网站项目的时候,发现项目并没有使用传统的三大关系型数据库,而是采用了sqlite数据库,以前的时候,也见过sqlite,但是并没有深入了解其功能和用法,好奇心驱使,决定好好研究一下sq ...
Django Query
Making Qeries 一旦创建了数据模型,Django就会自动为您提供一个数据库抽象API,允许您创建.检索.更新和删除对象.本文档解释了如何使用这个API. The models 一个clas ...
如何使用OpenGL中的扩展
如果你在Windows平台下开发OpenGL程序,那么系统中自带的OpenGL库就是1.1的,如果想使用1.2或者更高版本的OpenGL库,那么只能使用OpenGL扩展,在网上关于如何使用OpenGL ...
POJ 1330 Nearest Common Ancestors（lca）
POJ 1330 Nearest Common Ancestors A rooted tree is a well-known data structure in computer science a ...
读优&&输优
很nb的技巧……但奇怪的是只能对文件使用…… 然而交到OJ上或者比赛的时候都没有关系→_→ 我大概也只能弄弄这些花里胡哨的东西了→_→ 原理不清楚,背个板子好了 //minamoto #include ...
php面向对象编程_1
1, php面向对象编程的三大特征: (1) 封装性,封装就是把抽象出的数据和对数据的操作封装在一起,数据被保护在内部,程序的其它部分只有通过被授权的操作(成员方法)才能对数据进行操作. (2) 继承 ...
Logstash IIS日志采集
Logstash IIS 日志采集,跟Linux上运行差不多,都需要java运行环境,装个jdk就好了,对于IIS日志暂时未处理X-forward-for,纠结怎么弄当中,貌似要装个插件,慢慢研究. ...
[Flex] 组件Tree系列 —— 阻止用户点击选中Tree中任何节点
mxml: <?xml version="1.0" encoding="utf-8"?> <!--功能描述:阻止用户点击选中Tree中任何节点 ...
iOS hook原理
OC中的method其实是一个结构体 struct objc_method{ SEL method_name char *method_types IMP method_imp } SEL是方法名,I ...

导入数据到HBase的方式选择

Choosing the Right Import Method

导入数据到HBase的方式选择的更多相关文章

随机推荐

热门专题