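HBase Scan and Get Usage
1. get help
The get usage below shows that get takes a table name and a rowkey, optionally followed by columns and other options such as a timestamp, a time range, or a filter. What get cannot do is return all the data in a table; for that you need a different command, scan.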
hbase(main):214:0> help 'get'
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:
hbase> get 'ns1:t1', 'r1'
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', 'c1', 'c2'
hbase> get 't1', 'r1', ['c1', 'c2']
hbase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
hbase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
2. scan help
hbase(main):221:0> help 'scan'
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of:
or COLUMNS, CACHE
If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'. The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.
Some examples:
hbase> scan 'hbase:meta'
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
hbase> scan 't1', {REVERSED => true}
hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
(QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
hbase> scan 't1', {FILTER =>
org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
For setting the Operation Attributes
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled. Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default. Example:
hbase> scan 't1', {RAW => true, VERSIONS => 10}
Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column. A user can define a FORMATTER by adding it to the column name in
the scan specification. The FORMATTER can be stipulated:
1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifer). You cannot
specify a FORMATTER for all columns of a column family.
Scan can also be used directly from a table, by first getting a reference to a
table, like such:
hbase> t = get_table 't'
hbase> t.scan
Note in the above situation, you can still provide all the filtering, columns,
options, etc as described above.
3. Fetching the data for a given rowkey with get and with scan
1. The get statement and the scan statement for the same rowkey
hbase(main):011:0> get 'liupeng:employee','1001'
contect:mail timestamp=1522202414649, value=liupliup@cn.ibm.com
contect:phone timestamp=1522202430196, value=15962459503
group:number timestamp=1522202455929, value=1
info:age timestamp=1522202371257, value=34
info:name timestamp=1522202364156, value=liupeng
[Scan]
hbase(main):010:0> scan 'liupeng:employee',FILTER=>"PrefixFilter('1001')"
1001 column=contect:mail, timestamp=1522202414649, value=liupliup@cn.ibm.com
1001 column=contect:phone, timestamp=1522202430196, value=15962459503
1001 column=group:number, timestamp=1522202455929, value=1
1001 column=info:age, timestamp=1522202371257, value=34
1001 column=info:name, timestamp=1522202364156, value=liupeng
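1 row(s) in 0.0590 seconds
Summary: the two approaches differ in their output. The scan result includes the rowkey itself, while the get result does not. In addition, get prints the column and the cell separately, whereas scan prints the two combined.
4. Differences between get and scan when fetching a single piece of data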
hbase(main):229:0> get "liupeng:employee",'1001','info:phone'
info:phone timestamp=1527914569028, value=15962459503
1 row(s) in 0.0320 seconds
hbase(main):230:0> scan "liupeng:employee",FILTER=>"PrefixFilter('1001')AND ValueFilter(=,'substring:159')"
1001 column=info:phone, timestamp=1527914569028, value=15962459503
1 row(s) in 0.1010 seconds
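## Note: both statements above return the cell in rowkey 1001 whose value contains '159', but they get there in completely different ways. get reaches it by naming the exact column 'info:phone'. scan reaches it by filtering on a rowkey prefix and on a substring of the value. If the PrefixFilter is dropped, the same ValueFilter matches cells in every row: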
hbase(main):026:0> scan 'liupeng:employee',FILTER=>"ValueFilter(=,'substring:159')"
1001 column=contect:phone, timestamp=1522202430196, value=15962459503
1002 column=contect:phone, timestamp=1522202527866, value=15977634464
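The difference between the two access patterns can be sketched with a small in-memory model. This is not the HBase client API; the dictionary layout and the function names `get` and `scan_value_substring` are assumptions made for illustration, and the sample values are copied from the outputs in this post. The substring match is written as case-insensitive, which is how I understand the `substring:` comparator to behave.

```python
# In-memory stand-in for the table: {rowkey: {"family:qualifier": value}}.
# A model for illustration only, not the HBase client API.
TABLE = {
    "1001": {"contect:phone": "15962459503", "info:name": "liupeng"},
    "1002": {"contect:phone": "15977634464", "info:name": "Jack_Ma"},
    "1003": {"contect:phone": "18665851263", "info:name": "kevin_shi"},
}


def get(table, rowkey, column=None):
    """Model of get: the rowkey is mandatory, the column is optional."""
    row = table.get(rowkey, {})
    return {c: v for c, v in row.items() if column is None or c == column}


def scan_value_substring(table, needle):
    """Model of scan with ValueFilter(=,'substring:...'): walk every row and
    keep only the cells whose value contains the needle (case-insensitive)."""
    hits = []
    for rowkey in sorted(table):
        for column, value in sorted(table[rowkey].items()):
            if needle.lower() in value.lower():
                hits.append((rowkey, column, value))
    return hits


print(get(TABLE, "1001", "contect:phone"))
print(scan_value_substring(TABLE, "159"))
```

The get model needs the rowkey up front and returns nothing about other rows, while the scan model visits every row and lets the value decide what is returned.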
The get forms from the help can also be run on a table reference, for example t = get_table 't1':
hbase> t.get 'r1'
hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
hbase> t.get 'r1', {COLUMN => 'c1'}
hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> t.get 'r1', 'c1'
hbase> t.get 'r1', 'c1', 'c2'
hbase> t.get 'r1', ['c1', 'c2']
5. scan can look up data directly without a rowkey being specified. More precisely, it can find which value sits in which column across all rows. get cannot do this.
hbase(main):038:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('name')"
1001 column=info:name, timestamp=1522202364156, value=liupeng
1002 column=info:name, timestamp=1522202474669, value=Jack_Ma
1003 column=info:name, timestamp=1522202561029, value=kevin_shi
3 row(s) in 0.0210 seconds
## Note: ColumnPrefixFilter selects columns by the prefix of the column qualifier. Here it matches the qualifier name, which belongs to the column family info.
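A minimal sketch of what ColumnPrefixFilter matches on, again as an in-memory model rather than HBase code. The function name `scan_column_prefix` and the table layout are assumptions; the point is that the prefix is compared with the qualifier (the part after the colon), not with the column family.

```python
# {rowkey: {"family:qualifier": value}}; values copied from the output above.
TABLE = {
    "1001": {"info:name": "liupeng", "info:age": "34"},
    "1002": {"info:name": "Jack_Ma"},
    "1003": {"info:name": "kevin_shi"},
}


def scan_column_prefix(table, prefix):
    """Model of ColumnPrefixFilter: keep cells whose qualifier starts with prefix."""
    hits = []
    for rowkey in sorted(table):
        for column, value in sorted(table[rowkey].items()):
            _family, qualifier = column.split(":", 1)
            if qualifier.startswith(prefix):
                hits.append((rowkey, column, value))
    return hits


print(scan_column_prefix(TABLE, "name"))
# The family name is not part of the comparison, so this finds nothing.
print(scan_column_prefix(TABLE, "info"))
```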
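6. The convenience of scan is that it can search on any combination of rowkey, column and value, and the conditions can be combined with AND and OR to find exactly the data you want.
The statement below uses AND and OR together. The same operator may appear more than once in a filter string (the scan help above chains two ANDs). AND binds more tightly than OR, so add parentheses when a different grouping is intended.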
hbase(main):060:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('ph')AND ValueFilter(=,'substring:15962')OR ValueFilter(=,'substring:186')"
1001 column=contect:phone, timestamp=1522202430196, value=15962459503
1003 column=contect:phone, timestamp=1522202605976, value=18665851263
2 row(s) in 0.0170 seconds
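How the combined filter above is grouped can be sketched with plain predicates. This is a simplified cell-level model, not HBase's FilterList implementation, and the helper names are assumptions. It relies on my understanding that in the filter language AND takes precedence over OR, so the statement reads as (ColumnPrefixFilter AND ValueFilter) OR ValueFilter. The last cell in the sample data is hypothetical and was added only to make the two groupings give different results.

```python
# Cells as (rowkey, "family:qualifier", value). The first three come from this
# post; the last one is HYPOTHETICAL, added to expose the grouping difference.
CELLS = [
    ("1001", "contect:phone", "15962459503"),
    ("1002", "contect:phone", "15977634464"),
    ("1003", "contect:phone", "18665851263"),
    ("1009", "info:address", "building 186"),
]


def qualifier_prefix(prefix):
    """Predicate modeling ColumnPrefixFilter."""
    return lambda cell: cell[1].split(":", 1)[1].startswith(prefix)


def value_substring(needle):
    """Predicate modeling ValueFilter(=,'substring:...')."""
    return lambda cell: needle.lower() in cell[2].lower()


def both(a, b):
    return lambda cell: a(cell) and b(cell)


def either(a, b):
    return lambda cell: a(cell) or b(cell)


ph = qualifier_prefix("ph")
v1 = value_substring("15962")
v2 = value_substring("186")

# As written in the shell statement: (ph AND v1) OR v2
as_written = [c for c in CELLS if either(both(ph, v1), v2)(c)]
# With explicit parentheses: ph AND (v1 OR v2)
regrouped = [c for c in CELLS if both(ph, either(v1, v2))(c)]

print([c[0] for c in as_written])
print([c[0] for c in regrouped])
```

On the table shown in this post the two groupings return the same two rows, which is why the difference is easy to miss. It only shows up once some non-phone column contains '186'.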
7. Using SingleColumnValueFilter with a search value to list every column and value of the rows that match
hbase(main):242:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:30')"}
1005 column=contect:mail, timestamp=1528420218800, value=zhangsan@163.com
1005 column=info:age, timestamp=1528439967493, value=30
1005 column=info:name, timestamp=1528420218800, value=zhangsan
1008 column=contect:mail, timestamp=1528681786126, value=www.kevin@alibaba.com
1008 column=info:age, timestamp=1528681786126, value=30
1008 column=info:name, timestamp=1528681786126, value=kevin
2 row(s) in 0.0110 seconds
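The output above contains whole rows, not only the info:age cells. A rough model of that behavior is sketched below; it is not the HBase implementation. The function name and the `filter_if_missing` parameter are assumptions, the latter modeled on the Java client's `setFilterIfMissing` option. To my understanding that option is off by default, so rows that lack the tested column are returned as well. Row 1001 is copied from earlier output in this post to give the filter something to reject.

```python
# {rowkey: {"family:qualifier": value}}; values copied from outputs in this post.
TABLE = {
    "1001": {"info:age": "34", "info:name": "liupeng"},
    "1005": {"contect:mail": "zhangsan@163.com", "info:age": "30", "info:name": "zhangsan"},
    "1008": {"contect:mail": "www.kevin@alibaba.com", "info:age": "30", "info:name": "kevin"},
}


def scan_single_column_value(table, column, needle, filter_if_missing=False):
    """Model of SingleColumnValueFilter with a substring comparator:
    test one column, then return the ENTIRE row when the test passes."""
    rows = {}
    for rowkey in sorted(table):
        row = table[rowkey]
        if column not in row:
            # Assumed default: rows without the column are kept.
            if not filter_if_missing:
                rows[rowkey] = dict(row)
            continue
        if needle.lower() in row[column].lower():
            rows[rowkey] = dict(row)
    return rows


result = scan_single_column_value(TABLE, "info:age", "30")
for rowkey, row in result.items():
    print(rowkey, row)
```

Compared with ValueFilter, which keeps only the matching cells, this filter uses one column as the test and the rest of the row comes along with it.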
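8. SingleColumnValueFilter can also be combined with the regexstring comparator for regular-expression queries. This allows fuzzy matching to find the corresponding rowkeys, columns and values.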
hbase(main):244:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','name',=,'regexstring:liu')"}
1001 column=contect:mail, timestamp=1527231141046, value=liupliup@cn.ibm.com
1001 column=info:address, timestamp=1527753987327, value=shanghai
1001 column=info:age, timestamp=1527231097033, value=34
1001 column=info:name, timestamp=1527231081262, value=liupeng
1001 column=info:phone, timestamp=1527914569028, value=15962459503
1004 column=contect:mail, timestamp=1527473497956, value=lqdong@jingdong.com
1004 column=info:address, timestamp=1527755135174, value=shenzhen
1004 column=info:age, timestamp=1527473477124, value=40
1004 column=info:name, timestamp=1527415665182, value=liuqiangdong
2 row(s) in 0.0080 seconds
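The regexstring match above can be approximated in Python with `re.search`. This is a sketch under the assumption that the comparator looks for the pattern anywhere in the value rather than requiring a full match, which is consistent with 'liu' matching both liupeng and liuqiangdong. The function name `rows_matching` is an assumption, and Python's regex dialect is not identical to Java's, so only simple patterns are used here.

```python
import re

# rowkey -> value of info:name, copied from outputs in this post.
NAMES = {
    "1001": "liupeng",
    "1002": "Jack_Ma",
    "1003": "kevin_shi",
    "1004": "liuqiangdong",
}


def rows_matching(names, pattern):
    """Return the rowkeys whose name contains a match for the pattern."""
    regex = re.compile(pattern)
    return [rowkey for rowkey in sorted(names) if regex.search(names[rowkey])]


print(rows_matching(NAMES, "liu"))          # unanchored: matches anywhere in the value
print(rows_matching(NAMES, "^liu.*dong$"))  # anchored: must span the whole value
```

Anchoring the pattern with ^ and $ is the way to narrow a match that would otherwise hit any value containing the fragment.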