http://www.cnblogs.com/skyl/p/4807793.html

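Comparison operators: CompareFilter.CompareOp
A comparison operator defines the relation to test. The available values are:

  • EQUAL: equal to
  • GREATER: greater than
  • GREATER_OR_EQUAL: greater than or equal to
  • LESS: less than
  • LESS_OR_EQUAL: less than or equal to
  • NOT_EQUAL: not equal to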

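Comparators: ByteArrayComparable
A comparator determines how a value is matched. The following subclasses are available:

  • BinaryComparator: compares against the complete byte array
  • BinaryPrefixComparator: compares against a prefix of the byte array
  • BitComparator: rarely used
  • NullComparator: rarely used
  • RegexStringComparator: matches a regular expression
  • SubstringComparator: matches a substring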
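
To make the difference between three of these comparators concrete, here is a minimal plain-Java sketch of their matching rules when used with EQUAL. The class and method names are hypothetical stand-ins written for illustration; they are not the HBase classes and only mimic the behaviour described in the list above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-ins that mimic the matching rules; not the HBase classes.
final class ComparatorSketch {
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // BinaryComparator + EQUAL: the whole byte array must be identical.
    static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    // BinaryPrefixComparator + EQUAL: only the leading bytes are compared.
    static boolean prefixEquals(byte[] value, byte[] prefix) {
        return value.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(value, prefix.length), prefix);
    }

    // SubstringComparator + EQUAL: case-insensitive "contains".
    static boolean substringEquals(byte[] value, String substr) {
        String text = new String(value, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
        return text.contains(substr.toLowerCase(Locale.ROOT));
    }
}
```

With these rules, "address" does not equal "add" under the binary comparison, but it does match the prefix "add", which is why a prefix comparator can select a family or column without spelling out its full name.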

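1. Multiple filters -- FilterList (no FilterList class in the Shell; its filter language uses AND / OR instead)
A FilterList represents a chain of filters that is applied to the target data set. The filters in the list are combined either with an AND relation (FilterList.Operator.MUST_PASS_ALL) or with an OR relation (FilterList.Operator.MUST_PASS_ONE).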

    // Combine filters: fetch all rows whose age is between 15 and 30
    private static void scanFilter() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // AND: a row must pass every filter in the list
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        // >= 15 (compared on the raw bytes of the string "15")
        SingleColumnValueFilter filter1 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.GREATER_OR_EQUAL, "15".getBytes());
        // <= 30 (compared on the raw bytes of the string "30")
        SingleColumnValueFilter filter2 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.LESS_OR_EQUAL, "30".getBytes());
        filterList.addFilter(filter1);
        filterList.addFilter(filter2);

        Scan scan = new Scan();
        // attach the filter list to the scan
        scan.setFilter(filterList);

        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }
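
One caveat about the range example above: the ages are stored as strings, so the two filters compare bytes lexicographically rather than numerically. The following plain-Java sketch reproduces that byte-wise ordering (the class and method names are hypothetical, and this is not HBase code) and shows that a value such as "200" or "2" would also fall inside the "15" to "30" range.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics an unsigned, byte-by-byte comparison of stored values.
final class LexicographicAgeCheck {
    // Compares two byte arrays lexicographically; on a tie the shorter array sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when low <= value <= high under byte-wise ordering.
    static boolean inRange(String value, String low, String high) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        byte[] lo = low.getBytes(StandardCharsets.UTF_8);
        byte[] hi = high.getBytes(StandardCharsets.UTF_8);
        return compareBytes(v, lo) >= 0 && compareBytes(v, hi) <= 0;
    }
}
```

If numeric ordering matters, one common workaround is to store fixed-width, zero-padded strings (for example "015" and "030"), so that byte order and numeric order agree.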

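2. Column value filter -- SingleColumnValueFilter
Tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (for example CompareOp.GREATER). Constructors:
2.1. Comparing against a byte array (Shell support not verified here)
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)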

    // SingleColumnValueFilter example: rows whose info:age equals "18"
    private static void scanFilter01() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, "18".getBytes());
        Scan scan = new Scan();
        scan.setFilter(scvf);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }
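
SingleColumnValueFilter decides row by row: when the tested column matches, every column of that row is returned, and by default a row that lacks the tested column also passes unless setFilterIfMissing(true) is called. The sketch below models that decision in plain Java. The class, its sample data, and the "family:qualifier" string keys are illustrative assumptions, not part of HBase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the row-level decision; not the HBase implementation.
final class SingleColumnValueSketch {
    // Builds one row from alternating column/value arguments.
    private static Map<String, String> row(String... kv) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            cells.put(kv[i], kv[i + 1]);
        }
        return cells;
    }

    // Hypothetical sample table: row key -> (column -> value).
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("r1", row("info:age", "18", "info:company", "alibaba"));
        rows.put("r2", row("info:age", "24"));
        rows.put("r3", row("address:city", "hangzhou")); // no info:age at all
        return rows;
    }

    // Returns the keys of the rows that pass an EQUAL test on one column.
    static List<String> matchingRowKeys(Map<String, Map<String, String>> rows,
                                        String column, String expected,
                                        boolean filterIfMissing) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : rows.entrySet()) {
            String value = entry.getValue().get(column);
            boolean keep = (value == null) ? !filterIfMissing : value.equals(expected);
            if (keep) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

In this model, testing info:age for "18" keeps r1 and r3 with the default setting and only r1 once missing columns are filtered.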

2.2. Comparing with a ByteArrayComparable comparator
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)

    // SingleColumnValueFilter example 2 -- RegexStringComparator
    private static void scanFilter02() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Compare the value against a regular expression -- RegexStringComparator
        // Intended to match info:age values ending in "4"; the pattern ".4" means any character followed by "4"
        RegexStringComparator comparator = new RegexStringComparator(".4");
        // The fourth argument is a comparator instead of a byte array
        SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
        Scan scan = new Scan();
        scan.setFilter(scvf);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):032:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'regexstring:.4')"}
    ROW COLUMN+CELL
    xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    3 row(s) in 0.0130 seconds
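
The pattern ".4" is looser than "ends with 4". Assuming RegexStringComparator looks for the pattern anywhere in the value (find semantics, which is how I understand its default Java engine to behave), the short sketch below uses java.util.regex directly to show what ".4" accepts and how an anchored pattern such as "4$" differs. The class name is hypothetical and no HBase code is involved.

```java
import java.util.regex.Pattern;

// Plain java.util.regex demo; assumes the comparator uses find() semantics.
final class RegexAnchorDemo {
    // True when the pattern occurs anywhere in the value.
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }
}
```

Under that assumption ".4" accepts "24" but also "140", and it rejects the single character "4", whereas "4$" accepts exactly the values whose last character is "4".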
    // SingleColumnValueFilter example 3 -- SubstringComparator
    private static void scanFilter03() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Test whether a substring occurs in the value (case-insensitive) -- SubstringComparator
        // Select the rows whose age value contains '4'
        SubstringComparator comparator = new SubstringComparator("4");
        // The fourth argument is a comparator instead of a byte array
        SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
        Scan scan = new Scan();
        scan.setFilter(scvf);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):033:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:4')"}
    ROW COLUMN+CELL
    xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    3 row(s) in 0.0180 seconds

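3. Column name filters
Because HBase stores its data internally as key-value pairs, a column name filter tests whether a given column name (ColumnFamily:Qualifier) exists in a row, in the same way the previous section tested column values.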

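3.1. FamilyFilter: filtering by column family
FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)

Notes:
1. If you are looking for a known column family, scan.addFamily(family) is more efficient than a filter.
2. According to the author, HBase's support for multiple column families was still limited at the time of writing, so this filter was of limited use.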

    // FamilyFilter: filter data by column family
    private static void scanFilter04() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Select the column family equal to 'address'
        //FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator("address".getBytes()));

        // Select the column families that start with 'add'
        FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryPrefixComparator("add".getBytes()));

        Scan scan = new Scan();
        scan.setFilter(familyFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):021:0> scan 'users',{FILTER=>"FamilyFilter(=,'binaryprefix:add')"}
    ROW COLUMN+CELL
    xiaoming column=address:city, timestamp=1441997498965, value=hangzhou
    xiaoming column=address:contry, timestamp=1441997498911, value=china
    xiaoming column=address:province, timestamp=1441997498939, value=zhejiang
    xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
    zhangyifei column=address:city, timestamp=1441997499108, value=jieyang
    zhangyifei column=address:contry, timestamp=1441997499077, value=china
    zhangyifei column=address:province, timestamp=1441997499093, value=guangdong
    zhangyifei column=address:town, timestamp=1441997500711, value=xianqiao
    3 row(s) in 0.0400 seconds

3.2. QualifierFilter: filtering by qualifier (column name)
QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)

Note: this filter is likely to be used more often than FamilyFilter.

    // QualifierFilter: filter data by qualifier (column name)
    private static void scanFilter05() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Select the columns whose qualifier equals 'age', in every row
        //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("age".getBytes()));

        // Select the columns whose qualifier starts with 'age' (including 'age' itself), in every row
        //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryPrefixComparator("age".getBytes()));

        // Select the columns whose qualifier contains 'age' (including 'age' itself), in every row
        //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new SubstringComparator("age"));

        // Select the columns whose qualifier matches the regular expression '.ge', in every row
        QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new RegexStringComparator(".ge"));

        Scan scan = new Scan();
        scan.setFilter(qualifierFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):020:0> scan 'users',{FILTER=>"QualifierFilter(=,'regexstring:.ge')"}
    ROW COLUMN+CELL
    xiaoming column=info:age, timestamp=1441997971945, value=38
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    zhangyifei column=info:age, timestamp=1442247255446, value=18
    5 row(s) in 0.0460 seconds

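3.3. ColumnPrefixFilter: filtering by column name prefix (QualifierFilter can achieve the same result)
ColumnPrefixFilter(byte[] prefix)
Note: the same column name can appear in several column families; this filter returns the matching columns from all of them.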

    // ColumnPrefixFilter example
    private static void scanFilter06() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Match every column whose name starts with 'ag'
        ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter("ag".getBytes());

        Scan scan = new Scan();
        scan.setFilter(columnPrefixFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):018:0> scan 'users',{FILTER=>"ColumnPrefixFilter('ag')"}
    ROW COLUMN+CELL
    xiaoming column=info:age, timestamp=1441997971945, value=38
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    zhangyifei column=info:age, timestamp=1442247255446, value=18
    5 row(s) in 0.0280 seconds

3.4. MultipleColumnPrefixFilter: filtering by several column name prefixes
MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but accepts more than one prefix.

    // MultipleColumnPrefixFilter example
    private static void scanFilter07() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Match every column whose name starts with 'a' or 'c' (prefixes passed as a two-dimensional byte array)
        byte[][] prefixes = new byte[][]{"a".getBytes(), "c".getBytes()};
        MultipleColumnPrefixFilter multipleColumnPrefixFilter = new MultipleColumnPrefixFilter(prefixes);

        Scan scan = new Scan();
        scan.setFilter(multipleColumnPrefixFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):017:0> scan 'users',{FILTER=>"MultipleColumnPrefixFilter('a','c')"}
    ROW COLUMN+CELL
    xiaoming column=address:city, timestamp=1441997498965, value=hangzhou
    xiaoming column=address:contry, timestamp=1441997498911, value=china
    xiaoming column=info:age, timestamp=1441997971945, value=38
    xiaoming column=info:company, timestamp=1441997498889, value=alibaba
    xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    zhangyifei column=address:city, timestamp=1441997499108, value=jieyang
    zhangyifei column=address:contry, timestamp=1441997499077, value=china
    zhangyifei column=info:age, timestamp=1442247255446, value=18
    zhangyifei column=info:company, timestamp=1441997499039, value=alibaba
    5 row(s) in 0.0430 seconds
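
The prefix filters test only the qualifier, never the family, which is why the output above mixes address: and info: columns. A minimal plain-Java model of that rule follows. The class name and the "family:qualifier" string convention are assumptions made for illustration; this is not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of qualifier-prefix matching across families; not HBase code.
final class ColumnPrefixSketch {
    // Columns are written as "family:qualifier"; only the qualifier part is tested.
    static List<String> select(List<String> columns, String... prefixes) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            String qualifier = column.substring(column.indexOf(':') + 1);
            for (String prefix : prefixes) {
                if (qualifier.startsWith(prefix)) {
                    kept.add(column);
                    break; // one matching prefix is enough
                }
            }
        }
        return kept;
    }
}
```

Applied to the six columns of row xiaoming with the prefixes "a" and "c", this model keeps address:city, address:contry, info:age and info:company, the same four columns the Shell returned for that row.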

3.5. ColumnRangeFilter: filtering by column range (not row range)

  1. Retrieves a range of columns. For example, if a row has a million columns but you only want to look at the columns named from bbbb to dddd.
  2. This filter was introduced in HBase 0.92.
  3. The same column name can appear in several column families; this filter returns the matching columns from all of them.

Constructor:
ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)
Parameters:

  • minColumn: the lower bound of the column range; if null, there is no lower bound
  • minColumnInclusive: whether the range includes minColumn
  • maxColumn: the upper bound of the column range; if null, there is no upper bound
  • maxColumnInclusive: whether the range includes maxColumn

    // ColumnRangeFilter example
    private static void scanFilter08() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Match every column from 'a' (inclusive) up to 'c' (exclusive)
        ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter("a".getBytes(), true, "c".getBytes(), false);

        Scan scan = new Scan();
        scan.setFilter(columnRangeFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):016:0> scan 'users',{FILTER=>"ColumnRangeFilter('a',true,'c',false)"}
    ROW COLUMN+CELL
    xiaoming column=info:age, timestamp=1441997971945, value=38
    xiaoming column=info:birthday, timestamp=1441997498851, value=1987-06-17
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    zhangyifei column=info:age, timestamp=1442247255446, value=18
    zhangyifei column=info:birthday, timestamp=1441997498990, value=1987-4-17
    5 row(s) in 0.0340 seconds
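
The range is taken over the sorted qualifier names, so "city" and "company" fall outside ['a', 'c') because they sort after the bare string "c". The sketch below models this with a sorted set in plain Java. The class name is hypothetical, and it assumes ASCII qualifiers, for which Java's String ordering agrees with byte ordering.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Simplified model of a column range over sorted qualifiers; assumes ASCII names.
final class ColumnRangeSketch {
    // Returns the qualifiers between min and max, honouring the two inclusive flags.
    static List<String> inRange(Collection<String> qualifiers,
                                String min, boolean minInclusive,
                                String max, boolean maxInclusive) {
        TreeSet<String> sorted = new TreeSet<>(qualifiers);
        return new ArrayList<>(sorted.subSet(min, minInclusive, max, maxInclusive));
    }
}
```

For the qualifiers age, birthday, city, company, contry and province, the range from "a" inclusive to "c" exclusive keeps only age and birthday, matching the Shell output above.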

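4. Row key filter -- RowFilter
When you need to fetch a range of rows based on the row key, using the Scan's startRow and stopRow is more efficient. However, startRow and stopRow can only match the leading characters of the row key, not characters in the middle. For more complex filtering on the row key, use RowFilter.
Constructor: RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)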

    // RowFilter example
    private static void scanFilter09() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Match every row whose row key contains '01'
        RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("01"));

        Scan scan = new Scan();
        scan.setFilter(rowFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

    hbase(main):013:0> scan 'users',{FILTER=>"RowFilter(=,'substring:01')"}
    ROW COLUMN+CELL
    xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming01 column=info:age, timestamp=1441998917568, value=24
    1 row(s) in 0.0190 seconds
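
When the condition is a row key prefix, the startRow / stopRow approach mentioned in this section needs a stop row that sits just past the prefix. A common way to derive it, sketched here in plain Java with a hypothetical helper name, is to drop any trailing 0xFF bytes and increment the last remaining byte. This is an illustration of the idea, not an HBase API.

```java
import java.util.Arrays;

// Illustrative helper for prefix scans; the name is hypothetical, not an HBase method.
final class PrefixScanSketch {
    // Returns the smallest key greater than every key starting with the prefix,
    // or an empty array when no such bound exists (the prefix is empty or all 0xFF).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int len = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so skip them.
        while (len > 0 && prefix[len - 1] == (byte) 0xFF) {
            len--;
        }
        if (len == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, len);
        stop[len - 1]++;
        return stop;
    }
}
```

For the prefix "xiaoming0" the computed stop row is "xiaoming1", so a scan from "xiaoming0" (inclusive) to "xiaoming1" (exclusive) would cover xiaoming01 through xiaoming03 without a filter. An empty stop row is conventionally treated by HBase as "scan to the end of the table".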

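5. PageFilter (Shell support not verified here)
Specifies a page size in rows and returns a result set of that size.
Note that this filter does not guarantee that the total number of rows returned stays within the page size, because the filter is applied separately on each region server. It only guarantees that the rows returned from the current region do not exceed the page size.
Constructor: PageFilter(long pageSize)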

    // PageFilter example
    private static void scanFilter10() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Starting at row key "xiaoming" (inclusive), take 3 rows
        PageFilter pageFilter = new PageFilter(3L);

        Scan scan = new Scan();
        scan.setStartRow("xiaoming".getBytes());
        scan.setFilter(pageFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }

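Note: because this filter cannot guarantee that the number of rows returned stays within the page size, a better way to return a fixed number of rows is ResultScanner.next(int nbRows):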

    // Modified version of the demo above
    private static void scanFilter11() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Starting at row key "xiaoming" (inclusive), take 3 rows
        //PageFilter pageFilter = new PageFilter(3L);

        Scan scan = new Scan();
        scan.setStartRow("xiaoming".getBytes());
        //scan.setFilter(pageFilter);
        ResultScanner rs = ht.getScanner(scan);
        // Ask the scanner for at most 3 rows
        for (Result result : rs.next(3)) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }
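
To see why a per-region limit can overshoot the page size, the following plain-Java sketch treats each region as a list of row keys and applies the limit to each list independently, as the PageFilter description in this section says. It is a deliberately simplified model with hypothetical names, not a description of HBase internals.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: each region enforces the page size on its own.
final class PageFilterSketch {
    // Every region contributes up to pageSize rows; the results are concatenated.
    static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            out.addAll(region.subList(0, Math.min(pageSize, region.size())));
        }
        return out;
    }

    // The client-side cap that makes the total exact, comparable to rs.next(n).
    static List<String> capClientSide(List<String> rows, int pageSize) {
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }
}
```

With two regions holding four and two rows and a page size of 3, the model returns five rows in total; only the client-side cap brings the result back down to three.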

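6. SkipFilter (the Shell filter language offers a SKIP operator rather than this class)
Applies the wrapped filter to every column of a row; if even one column fails the condition, the whole row is filtered out.
Constructor: SkipFilter(Filter filter)

For example, if every column in a row holds the weight of a different item, then in a realistic scenario all of these values must be greater than zero, and we want to drop every row that contains a column whose value is 0. In that case ValueFilter and SkipFilter can be combined:
scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes(0)))));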

    // SkipFilter example
    private static void scanFilter12() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Skip every row that has any cell whose value equals "24"
        SkipFilter skipFilter = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator("24".getBytes())));

        Scan scan = new Scan();
        scan.setFilter(skipFilter);
        ResultScanner rs = ht.getScanner(scan);
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
            }
        }
        rs.close();
        ht.close();
    }
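
The effect of wrapping a NOT_EQUAL value test in SkipFilter can be modelled in a few lines of plain Java: a row survives only if every one of its values passes the inner test. The class, method, and sample data below are illustrative assumptions loosely based on the users table in this article, not HBase code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of SkipFilter(ValueFilter(NOT_EQUAL, x)); not HBase code.
final class SkipSketch {
    // Hypothetical sample table: row key -> cell values.
    static Map<String, List<String>> sampleRows() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("xiaoming", Arrays.asList("hangzhou", "china", "38"));
        rows.put("xiaoming01", Arrays.asList("china", "24"));
        rows.put("xiaoming02", Arrays.asList("24"));
        rows.put("zhangyifei", Arrays.asList("jieyang", "18"));
        return rows;
    }

    // Keeps a row only when none of its values equals the banned value.
    static List<String> rowsWithoutValue(Map<String, List<String>> rows, String banned) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : rows.entrySet()) {
            boolean allPass = true;
            for (String value : entry.getValue()) {
                if (value.equals(banned)) {
                    allPass = false; // one failing cell discards the whole row
                    break;
                }
            }
            if (allPass) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

In this model, banning the value "24" leaves only xiaoming and zhangyifei, even though xiaoming01 also has a cell that would pass the inner test on its own.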

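7. Utility -- FirstKeyOnlyFilter
This filter returns only the first cell of each row, which makes it useful for counting rows efficiently. The author suspects it has little other practical use.
Constructor: public FirstKeyOnlyFilter()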

    // FirstKeyOnlyFilter example
    private static void scanFilter13() throws IOException,
            UnsupportedEncodingException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "ncst");
        HTable ht = new HTable(conf, "users");

        // Return only the first cell of each row
        FirstKeyOnlyFilter firstKeyOnlyFilter = new FirstKeyOnlyFilter();

        Scan scan = new Scan();
        scan.setFilter(firstKeyOnlyFilter);
        ResultScanner rs = ht.getScanner(scan);
        int i = 0;
        for (Result result : rs) {
            for (Cell cell : result.rawCells()) {
                System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                        + new String(CellUtil.cloneFamily(cell)) + "\t"
                        + new String(CellUtil.cloneQualifier(cell)) + "\t"
                        + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                        + cell.getTimestamp());
                i++;
            }
        }
        // Print the total number of rows (one cell is returned per row)
        System.out.println(i);
        rs.close();
        ht.close();
    }

    hbase(main):009:0> scan 'users',{FILTER=>'FirstKeyOnlyFilter()'}
    ROW COLUMN+CELL
    xiaoming column=address:city, timestamp=1441997498965, value=hangzhou
    xiaoming01 column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
    xiaoming02 column=info:age, timestamp=1441998917594, value=24
    xiaoming03 column=info:age, timestamp=1441998919607, value=24
    zhangyifei column=address:city, timestamp=1441997499108, value=jieyang
    5 row(s) in 0.0240 seconds
