HBase Filters and Their Shell Equivalents

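Comparison operators: CompareFilter.CompareOp
A comparison operator defines the relation to test. The available values are:

EQUAL - equal to
GREATER - greater than
GREATER_OR_EQUAL - greater than or equal to
LESS - less than
LESS_OR_EQUAL - less than or equal to
NOT_EQUAL - not equal to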

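Comparators: ByteArrayComparable
Comparators determine how the target bytes are matched. The following subclasses are available:

BinaryComparator - compares against the complete byte array
BinaryPrefixComparator - compares only the leading bytes (a prefix) of the byte array
BitComparator - rarely used
NullComparator - rarely used
RegexStringComparator - matches a regular expression
SubstringComparator - matches a substring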
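To make the difference between the three most common comparators concrete, here is a minimal plain-Java sketch of their matching rules. It does not use the HBase classes; ComparatorSketch and its method names are hypothetical, and the prefix and substring helpers only model the EQUAL case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Plain-Java model of comparator semantics; these are NOT the HBase classes. */
class ComparatorSketch {

    /** Unsigned lexicographic byte comparison, the ordering a binary comparison relies on. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Prefix match: only the first prefix.length bytes of the value take part. */
    static boolean hasPrefix(byte[] value, byte[] prefix) {
        if (value.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (value[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Substring match, ignoring case. */
    static boolean containsIgnoreCase(String value, String needle) {
        return value.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    /** Helper for the demo: encodes an ASCII string as bytes. */
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(compareBytes(ascii("address"), ascii("address")) == 0);
        System.out.println(hasPrefix(ascii("address"), ascii("add")));
        System.out.println(containsIgnoreCase("XiaoMing01", "ming"));
    }
}
```

All three checks in main print true: the full comparison needs every byte to agree, the prefix check stops after "add", and the substring check finds "ming" regardless of case.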

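1. Multiple filters -- FilterList (no Shell filter of this name; combine filters with AND / OR inside the filter string)
A FilterList is a chain of filters applied together to the target data set. The filters are combined with either an "and" relation (FilterList.Operator.MUST_PASS_ALL) or an "or" relation (FilterList.Operator.MUST_PASS_ONE).

// Imports shared by the examples in this post (older HBase client API with HTable and CompareOp)
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

// Combined filters: fetch all rows whose age is between 15 and 30
private static void scanFilter() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // "and": a row must pass every filter in the list
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    // age >= 15 (values are compared as raw bytes, i.e. lexicographically, not numerically,
    // so this only behaves like a numeric range when all ages have the same number of digits)
    SingleColumnValueFilter filter1 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.GREATER_OR_EQUAL, "15".getBytes());
    // age <= 30
    SingleColumnValueFilter filter2 = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.LESS_OR_EQUAL, "30".getBytes());
    filterList.addFilter(filter1);
    filterList.addFilter(filter2);
    Scan scan = new Scan();
    // attach the filter list to the scan
    scan.setFilter(filterList);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}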
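One caveat with the age range example is that the values "15" and "30" are compared as bytes, so the ordering is lexicographic rather than numeric. The sketch below models that in plain Java, without HBase. AgeRangeSketch, inRange and pad are hypothetical names, and the zero-padding scheme is an assumption about how the ages could be stored, not something the original table does.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java model of a ">= low AND <= high" check on string-encoded values. */
class AgeRangeSketch {

    /** Unsigned lexicographic byte comparison. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Both conditions must hold, like two filters under MUST_PASS_ALL. */
    static boolean inRange(String value, String low, String high) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return compareBytes(v, low.getBytes(StandardCharsets.UTF_8)) >= 0
                && compareBytes(v, high.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    /** Zero-pads a non-negative age to three digits so byte order equals numeric order. */
    static String pad(int age) {
        return String.format("%03d", age);
    }

    public static void main(String[] args) {
        // "24" is inside the range, as expected.
        System.out.println(inRange("24", "15", "30"));
        // "2" and "200" are also accepted, although 2 and 200 are outside 15..30.
        System.out.println(inRange("2", "15", "30"));
        System.out.println(inRange("200", "15", "30"));
        // With fixed-width values the byte comparison agrees with the numeric one.
        System.out.println(inRange(pad(2), pad(15), pad(30)));
        System.out.println(inRange(pad(24), pad(15), pad(30)));
    }
}
```

If ages can have different digit counts, storing them in a fixed-width form (zero-padded strings as here, or fixed-length binary numbers) is one way to keep byte-wise range filters meaningful.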

2. Column value filter -- SingleColumnValueFilter
Tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (for example CompareOp.GREATER). When a row matches, the whole row is returned, not only the tested column. Rows that lack the column also pass by default, unless setFilterIfMissing(true) is called. Constructors:
2.1. The value to compare against is a byte array (in the Shell, a 'binary:' comparator such as 'binary:18' gives the equivalent comparison)
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)

// SingleColumnValueFilter example
private static void scanFilter01() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Rows whose info:age value equals the bytes of "18"
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, "18".getBytes());
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

2.2. The value to compare against is a comparator (ByteArrayComparable)
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)

// SingleColumnValueFilter example 2 -- RegexStringComparator
private static void scanFilter02() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Compare the value against a regular expression -- RegexStringComparator
    // ".4" matches any info:age value that contains some character followed by '4'
    // (the pattern is searched for anywhere in the value; use "4$" to require a trailing '4')
    RegexStringComparator comparator = new RegexStringComparator(".4");
    // Only the fourth argument differs from the byte[] constructor
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):032:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'regexstring:.4')"}
ROW                                 COLUMN+CELL
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24
3 row(s) in 0.0130 seconds
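The pattern ".4" deserves a closer look. The sketch below uses only java.util.regex and assumes the comparator searches for the pattern anywhere in the value (find semantics), which is how RegexStringComparator is understood to behave; RegexFindSketch and found are hypothetical names.

```java
import java.util.regex.Pattern;

/** Plain-Java illustration of search (find) semantics for a regular expression. */
class RegexFindSketch {

    /** True if the pattern occurs anywhere inside the value. */
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        // ".4" needs one arbitrary character followed by '4'.
        System.out.println(found(".4", "24"));   // true
        System.out.println(found(".4", "245"));  // true: "24" occurs inside "245"
        System.out.println(found(".4", "4"));    // false: nothing precedes the '4'
        // "4$" anchors the '4' to the end of the value.
        System.out.println(found("4$", "24"));   // true
        System.out.println(found("4$", "245"));  // false
        System.out.println(found("4$", "4"));    // true
    }
}
```

Under that assumption, an anchored pattern such as "4$" is the closer fit when the goal is "values that end with 4".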

// SingleColumnValueFilter example 3 -- SubstringComparator
private static void scanFilter03() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Test whether a substring occurs in the value (case-insensitive) -- SubstringComparator
    // Select the rows whose info:age value contains '4'
    SubstringComparator comparator = new SubstringComparator("4");
    // Only the fourth argument differs from the byte[] constructor
    SingleColumnValueFilter scvf = new SingleColumnValueFilter("info".getBytes(), "age".getBytes(), CompareOp.EQUAL, comparator);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):033:0> scan 'users',{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:4')"}
ROW                                 COLUMN+CELL
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24
3 row(s) in 0.0180 seconds

3. Column name filters
Because HBase stores its data internally as key-value pairs, column name filters test whether a row's column names (ColumnFamily:Qualifier) match, in the same way the previous section tested column values.

3.1. FamilyFilter -- filtering by column family
FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)

Notes:
1. If you are looking for a single known column family, scan.addFamily(family) is more efficient than a filter.
2. Because HBase currently handles tables with many column families poorly, this filter is of limited use for now.

// FamilyFilter: filter data by column family
private static void scanFilter04() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Select the column family equal to 'address'
    //FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator("address".getBytes()));
    // Select the column families whose name starts with 'add'
    FamilyFilter familyFilter = new FamilyFilter(CompareOp.EQUAL, new BinaryPrefixComparator("add".getBytes()));
    Scan scan = new Scan();
    scan.setFilter(familyFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):021:0> scan 'users',{FILTER=>"FamilyFilter(=,'binaryprefix:add')"}
ROW                                 COLUMN+CELL
 xiaoming                           column=address:city, timestamp=1441997498965, value=hangzhou
 xiaoming                           column=address:contry, timestamp=1441997498911, value=china
 xiaoming                           column=address:province, timestamp=1441997498939, value=zhejiang
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
 zhangyifei                         column=address:city, timestamp=1441997499108, value=jieyang
 zhangyifei                         column=address:contry, timestamp=1441997499077, value=china
 zhangyifei                         column=address:province, timestamp=1441997499093, value=guangdong
 zhangyifei                         column=address:town, timestamp=1441997500711, value=xianqiao
3 row(s) in 0.0400 seconds

3.2. QualifierFilter -- filtering by column name (qualifier)
QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)

Note: this filter is likely to be used more often than FamilyFilter.

// QualifierFilter: filter data by qualifier (column name)
private static void scanFilter05() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Columns whose name equals 'age', across all rows
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("age".getBytes()));
    // Columns whose name starts with 'age' (including 'age' itself), across all rows
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new BinaryPrefixComparator("age".getBytes()));
    // Columns whose name contains 'age' (including 'age' itself), across all rows
    //QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new SubstringComparator("age"));
    // Columns whose name matches the regular expression '.ge', across all rows
    QualifierFilter qualifierFilter = new QualifierFilter(CompareOp.EQUAL, new RegexStringComparator(".ge"));
    Scan scan = new Scan();
    scan.setFilter(qualifierFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):020:0> scan 'users',{FILTER=>"QualifierFilter(=,'regexstring:.ge')"}
ROW                                 COLUMN+CELL
 xiaoming                           column=info:age, timestamp=1441997971945, value=38
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18
5 row(s) in 0.0460 seconds

3.3. ColumnPrefixFilter -- filtering by column name prefix (QualifierFilter can achieve the same)
ColumnPrefixFilter(byte[] prefix)
Note: the same column name can appear in several column families; this filter returns the matching columns from all of them.

// ColumnPrefixFilter example
private static void scanFilter06() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match every column whose name starts with 'ag'
    ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter("ag".getBytes());
    Scan scan = new Scan();
    scan.setFilter(columnPrefixFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):018:0> scan 'users',{FILTER=>"ColumnPrefixFilter('ag')"}
ROW                                 COLUMN+CELL
 xiaoming                           column=info:age, timestamp=1441997971945, value=38
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18
5 row(s) in 0.0280 seconds

3.4. MultipleColumnPrefixFilter -- filtering by several column name prefixes
MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but accepts several prefixes.

// MultipleColumnPrefixFilter example
private static void scanFilter07() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match every column whose name starts with 'a' or 'c' (passed as an array of byte arrays)
    byte[][] prefixes = new byte[][]{"a".getBytes(), "c".getBytes()};
    MultipleColumnPrefixFilter multipleColumnPrefixFilter = new MultipleColumnPrefixFilter(prefixes);
    Scan scan = new Scan();
    scan.setFilter(multipleColumnPrefixFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):017:0> scan 'users',{FILTER=>"MultipleColumnPrefixFilter('a','c')"}
ROW                                 COLUMN+CELL
 xiaoming                           column=address:city, timestamp=1441997498965, value=hangzhou
 xiaoming                           column=address:contry, timestamp=1441997498911, value=china
 xiaoming                           column=info:age, timestamp=1441997971945, value=38
 xiaoming                           column=info:company, timestamp=1441997498889, value=alibaba
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24
 zhangyifei                         column=address:city, timestamp=1441997499108, value=jieyang
 zhangyifei                         column=address:contry, timestamp=1441997499077, value=china
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18
 zhangyifei                         column=info:company, timestamp=1441997499039, value=alibaba
5 row(s) in 0.0430 seconds

3.5. ColumnRangeFilter -- filtering by a range of columns (not a range of rows)

Selects a range of columns. For example, if a row has a million columns but you only want to see those whose names fall between bbbb and dddd.
This filter was introduced in HBase 0.92.
The same column name can appear in several column families; this filter returns the matching columns from all of them.

Constructor:
ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)
Parameters:

minColumn - lower bound of the column range; if null, there is no lower bound
minColumnInclusive - whether the range includes minColumn
maxColumn - upper bound of the column range; if null, there is no upper bound
maxColumnInclusive - whether the range includes maxColumn

// ColumnRangeFilter example
private static void scanFilter08() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match every column from 'a' (inclusive) up to 'c' (exclusive)
    ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter("a".getBytes(), true, "c".getBytes(), false);
    Scan scan = new Scan();
    scan.setFilter(columnRangeFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):016:0> scan 'users',{FILTER=>"ColumnRangeFilter('a',true,'c',false)"}
ROW                                 COLUMN+CELL
 xiaoming                           column=info:age, timestamp=1441997971945, value=38
 xiaoming                           column=info:birthday, timestamp=1441997498851, value=1987-06-17
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
 xiaoming02                         column=info:age, timestamp=1441998917594, value=24
 xiaoming03                         column=info:age, timestamp=1441998919607, value=24
 zhangyifei                         column=info:age, timestamp=1442247255446, value=18
 zhangyifei                         column=info:birthday, timestamp=1441997498990, value=1987-4-17
5 row(s) in 0.0340 seconds
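The inclusive/exclusive bounds of the column range can be modelled with a sorted set in plain Java. This is only a sketch of the selection rule, not HBase code: ColumnRangeSketch and select are hypothetical names, the qualifier list is taken from the shell output in this post, String ordering stands in for byte ordering (equivalent for ASCII names), and null bounds are not handled.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Plain-Java model of selecting column names inside [min, max) style ranges. */
class ColumnRangeSketch {

    /** Returns the qualifiers between min and max, honoring the two inclusive flags. */
    static NavigableSet<String> select(Collection<String> qualifiers,
                                       String min, boolean minInclusive,
                                       String max, boolean maxInclusive) {
        return new TreeSet<>(qualifiers).subSet(min, minInclusive, max, maxInclusive);
    }

    public static void main(String[] args) {
        Collection<String> qualifiers = Arrays.asList(
                "age", "birthday", "city", "company", "contry", "province", "town");
        // 'a' inclusive up to 'c' exclusive keeps only names that sort before "c".
        System.out.println(select(qualifiers, "a", true, "c", false)); // [age, birthday]
    }
}
```

Names such as city and company sort after "c", so they fall outside the range, which matches the shell output where only info:age and info:birthday appear.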

4. RowKey
When you need a range of rows selected by row key, using the Scan's startRow and stopRow is more efficient. However, startRow and stopRow can only bound the leading bytes of the row key; they cannot match characters in the middle of it. For more complex filtering on the row key, use RowFilter.
Constructor: RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)

// RowFilter example
private static void scanFilter09() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Match every row whose row key contains '01'
    RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("01"));
    Scan scan = new Scan();
    scan.setFilter(rowFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

hbase(main):013:0> scan 'users',{FILTER=>"RowFilter(=,'substring:01')"}
ROW                                 COLUMN+CELL
 xiaoming01                         column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=address:country, timestamp=1442000228945, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming01                         column=info:age, timestamp=1441998917568, value=24
1 row(s) in 0.0190 seconds
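For the startRow/stopRow approach mentioned at the start of this section, a common trick for a row key prefix is to use the prefix as the start row and the prefix with its last byte incremented as the exclusive stop row. The sketch below computes that stop row in plain Java; PrefixRangeSketch and stopRowForPrefix are hypothetical names, and wiring the result into a Scan is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Computes an exclusive stop row that bounds all keys starting with a prefix. */
class PrefixRangeSketch {

    /**
     * Drops trailing 0xFF bytes, then increments the last remaining byte.
     * An empty result means "no upper bound" (the prefix was empty or all 0xFF).
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "xiaoming".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        // Every key beginning with "xiaoming" sorts at or after start and before stop.
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // xiaominh
    }
}
```

A range scan bounded this way only touches the rows that share the prefix, whereas a RowFilter still has to examine every row in the scanned range.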

5. PageFilter
Given a page size in rows, returns a result set of that many rows.
Note that this filter cannot guarantee that the total number of rows returned is at most the page size. Filters are applied separately on each region server, so the filter can only guarantee that the rows returned from the current region do not exceed the page size.
Constructor: PageFilter(long pageSize)
In the Shell, the filter string parser recognizes PageFilter as well, for example FILTER=>"PageFilter(3)".

// PageFilter example
private static void scanFilter10() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Starting at row key "xiaoming" (inclusive), fetch 3 rows
    PageFilter pageFilter = new PageFilter(3L);
    Scan scan = new Scan();
    scan.setStartRow("xiaoming".getBytes());
    scan.setFilter(pageFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}

Note: because this filter cannot guarantee that the total number of rows is at most the page size, a more reliable way to get a fixed number of rows is ResultScanner.next(int nbRows):

// Modified version of the demo above
private static void scanFilter11() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Starting at row key "xiaoming" (inclusive), fetch 3 rows
    //PageFilter pageFilter = new PageFilter(3L);
    Scan scan = new Scan();
    scan.setStartRow("xiaoming".getBytes());
    //scan.setFilter(pageFilter);
    ResultScanner rs = ht.getScanner(scan);
    // Ask the scanner for at most 3 rows
    for (Result result : rs.next(3)) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}
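Why a per-region limit can exceed the page size overall is easy to see in a toy model. The sketch below is plain Java and deliberately simplified: regions are just lists of row keys, PageLimitSketch and its methods are hypothetical names, and real scanner behaviour may differ between HBase versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model: a page limit enforced per region versus a cap applied by the client. */
class PageLimitSketch {

    /** Each region independently returns at most pageSize rows; results are concatenated. */
    static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            out.addAll(region.subList(0, Math.min(pageSize, region.size())));
        }
        return out;
    }

    /** Client-side cap, analogous in spirit to asking the scanner for n rows. */
    static List<String> firstN(List<String> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("r1", "r2", "r3", "r4"),
                Arrays.asList("r5", "r6"));
        List<String> merged = scanWithPerRegionLimit(regions, 3);
        System.out.println(merged);            // [r1, r2, r3, r5, r6]
        System.out.println(firstN(merged, 3)); // [r1, r2, r3]
    }
}
```

With a page size of 3, the two regions together still yield five rows in this model, so the final cap has to be applied where the results are consumed.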

6. SkipFilter (in the Shell, use the SKIP operator inside the filter string)
Filters on every column of a row: if any single column fails the condition, the entire row is filtered out.
Constructor: SkipFilter(Filter filter)

For example, suppose every column in a row holds the weight of a different item. In a real scenario all of these values must be greater than zero, and we want to drop every row that contains any column with the value 0. ValueFilter and SkipFilter can be combined for this (Bytes.toBytes(0) encodes the int 0 as 4 bytes, so this assumes the weights were stored as 4-byte ints):
scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes(0)))));

// SkipFilter example
private static void scanFilter12() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Skip every row that has any cell whose value equals "24"
    SkipFilter skipFilter = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator("24".getBytes())));
    Scan scan = new Scan();
    scan.setFilter(skipFilter);
    ResultScanner rs = ht.getScanner(scan);
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
        }
    }
    rs.close();
    ht.close();
}
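The difference between filtering cells and skipping whole rows can be shown with a small in-memory model. This is a plain-Java sketch, not HBase code: rows are maps from row key to a list of cell values, the sample data is made up to resemble the users table, and SkipSemanticsSketch, filterCells and skipRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** In-memory model of "drop failing cells" versus "drop the row if any cell fails". */
class SkipSemanticsSketch {

    /** Made-up sample rows: row key -> cell values. */
    static Map<String, List<String>> sample() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("xiaoming", Arrays.asList("hangzhou", "38"));
        rows.put("xiaoming01", Arrays.asList("china", "24"));
        rows.put("xiaoming02", Arrays.asList("24"));
        return rows;
    }

    /** Value-filter style: keep only passing cells; rows left with no cells disappear. */
    static Map<String, List<String>> filterCells(Map<String, List<String>> rows, Predicate<String> keep) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String value : e.getValue()) {
                if (keep.test(value)) {
                    kept.add(value);
                }
            }
            if (!kept.isEmpty()) {
                out.put(e.getKey(), kept);
            }
        }
        return out;
    }

    /** Skip style: a row survives only if every one of its cells passes. */
    static Map<String, List<String>> skipRows(Map<String, List<String>> rows, Predicate<String> keep) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            if (e.getValue().stream().allMatch(keep)) {
                out.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<String> not24 = value -> !value.equals("24");
        System.out.println(filterCells(sample(), not24)); // xiaoming01 keeps "china"
        System.out.println(skipRows(sample(), not24));    // only xiaoming remains
    }
}
```

In this model the cell-level filter still returns xiaoming01 with its remaining cell, while the skip variant removes both xiaoming01 and xiaoming02 entirely because each has a cell equal to "24".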

7. Utility -- FirstKeyOnlyFilter
This filter returns only the first cell of each row, which makes it useful for counting rows efficiently. It is probably of limited practical use beyond that.
Constructor: public FirstKeyOnlyFilter()

// FirstKeyOnlyFilter example
private static void scanFilter13() throws IOException, UnsupportedEncodingException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://ncst:9000/hbase");
    conf.set("hbase.zookeeper.quorum", "ncst");
    HTable ht = new HTable(conf, "users");
    // Return only the first cell of each row
    FirstKeyOnlyFilter firstKeyOnlyFilter = new FirstKeyOnlyFilter();
    Scan scan = new Scan();
    scan.setFilter(firstKeyOnlyFilter);
    ResultScanner rs = ht.getScanner(scan);
    int i = 0;
    for (Result result : rs) {
        for (Cell cell : result.rawCells()) {
            System.out.println(new String(CellUtil.cloneRow(cell)) + "\t"
                    + new String(CellUtil.cloneFamily(cell)) + "\t"
                    + new String(CellUtil.cloneQualifier(cell)) + "\t"
                    + new String(CellUtil.cloneValue(cell), "UTF-8") + "\t"
                    + cell.getTimestamp());
            i++;
        }
    }
    // Print the total row count (one cell per row, so i equals the number of rows)
    System.out.println(i);
    rs.close();
    ht.close();
}

hbase(main):009:0> scan 'users',{FILTER=>'FirstKeyOnlyFilter()'}
ROW                                COLUMN+CELL
 xiaoming                          column=address:city, timestamp=1441997498965, value=hangzhou
 xiaoming01                        column=address:contry, timestamp=1442000277200, value=\xE4\xB8\xAD\xE5\x9B\xBD
 xiaoming02                        column=info:age, timestamp=1441998917594, value=24
 xiaoming03                        column=info:age, timestamp=1441998919607, value=24
 zhangyifei                        column=address:city, timestamp=1441997499108, value=jieyang
5 row(s) in 0.0240 seconds
