Reading HBase Snapshot Data with TableSnapshotInputFormat

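This post reads the data stored in an HBase snapshot, locating the snapshot by name. I searched online and found few resources that describe a clear, working approach. After some trial and error I got it working, so I am sharing the code here in the hope that it helps others: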

// Assumes HBase 1.x (HBaseProtos-based snapshot API) and the old org.apache.hadoop.mapred API.
// Project-specific types (Pipeline, ReaderParam, Record, DiException, CommonErrorCode) and the
// class members limiter, M_BYTE_SIZE, recordReader, start, stop(), logger and result2Record(...)
// belong to the surrounding reader class and are not shown here.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import com.google.common.util.concurrent.RateLimiter;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos;
import org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException;
import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.FSUtils;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public void read(org.apache.hadoop.conf.Configuration hadoopConf, Pipeline pipeline, ReaderParam readerParam, int batchSize) {
    // Throttle reading to readerParam.getFetchSize() MB/s: one RateLimiter permit per byte
    limiter = RateLimiter.create(readerParam.getFetchSize() * M_BYTE_SIZE);
    JobConf conf = new JobConf(hadoopConf);
    String tableName = readerParam.getFileName();   // full table name, e.g. "namespace:table"
    logger.info(String.format("Start reading snapshot of HBase table [%s].", tableName));
    try {
        List<Record> recordList = new ArrayList<>(batchSize);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);   // must be disabled for offline/batch jobs
        // Serialize the Scan into the configuration, where the snapshot input format reads it
        conf.set(TableInputFormat.SCAN, Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray()));

        InputFormat<ImmutableBytesWritable, Result> in = new TableSnapshotInputFormat();

        // Expects "namespace:table"; a table in the default namespace has no ':' and would fail here
        String[] tableNameSplit = tableName.split(":");
        String namespace_table = tableNameSplit[0] + "_" + tableNameSplit[1];

        // Completed snapshots live in ${hbase.rootdir}/.hbase-snapshot/<snapshotName>
        Path rootPath = FSUtils.getRootDir(conf);
        FileSystem fs = rootPath.getFileSystem(conf);
        Path snapshotDir = SnapshotDescriptionUtils.getSnapshotRootDir(rootPath);
        FileStatus[] listStatus = fs.listStatus(snapshotDir);

        // Collect the snapshots taken of this table, skipping hidden entries such as .tmp
        List<String> snapshotList = new ArrayList<>();
        Arrays.stream(listStatus).filter(x -> !x.getPath().getName().startsWith(".")).forEach(x -> {
            Path snapshotPath = SnapshotDescriptionUtils.getCompletedSnapshotDir(x.getPath().getName(), rootPath);
            try {
                HBaseProtos.SnapshotDescription s = SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotPath);
                logger.info("tableName:" + s.getTable() + "\t snapshot:" + s.getName());
                if (s.getTable().equalsIgnoreCase(tableName)) {
                    snapshotList.add(s.getName());
                }
            } catch (CorruptedSnapshotException e) {
                logger.warn("Skipping unreadable snapshot " + snapshotPath, e);
            }
        });

        if (snapshotList.isEmpty()) {
            String message = String.format("No snapshot found for HBase table [%s]; please contact the system administrator.", tableName);
            logger.error(message);
            throw DiException.asDiException(CommonErrorCode.CONFIG_ERROR, message);
        }

        // Take the lexicographically greatest name; this assumes snapshot names sort chronologically
        String snapshotName = snapshotList.stream().max(Comparator.naturalOrder()).get();

        // Restore directory: the current user needs write access, it must not be under hbase.rootdir,
        // and it can be deleted once reading has finished
        String restoreTmp = String.format("%s/user/%s/restoretmp/%s", conf.get("fs.defaultFS"), "di", namespace_table);
        Path restorePath = new Path(restoreTmp);
        TableSnapshotInputFormatImpl.setInput(conf, snapshotName, restorePath);

        List<String> columns = Arrays.asList(readerParam.getReadColumns().split(","));

        // One split per region of the snapshot
        InputSplit[] splits = in.getSplits(conf, 1);
        for (InputSplit split : splits) {
            recordReader = in.getRecordReader(split, conf, Reporter.NULL);
            try {
                ImmutableBytesWritable key = recordReader.createKey();
                Result value = recordReader.createValue();
                while (start && recordReader.next(key, value)) {
                    Record record = result2Record(value, columns);
                    limiter.acquire(record.getMemorySize());
                    recordList.add(record);
                    // Handing recordList to the pipeline every batchSize records is not shown in the original
                }
            } finally {
                recordReader.close();
            }
        }
    } catch (Exception e) {
        String message = String.format("Failed to read HBase snapshot data of table [%s]; please contact the system administrator.", tableName);
        logger.error(message);
        throw DiException.asDiException(CommonErrorCode.CONFIG_ERROR, message, e);
    } finally {
        stop();
    }
}
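The RateLimiter created at the top of read() hands out one permit per byte, and limiter.acquire(record.getMemorySize()) blocks until enough permits are available, so read throughput is capped at fetchSize × M_BYTE_SIZE bytes per second. As a worked example, assuming M_BYTE_SIZE = 1,048,576 (its value is not shown in the post) and a hypothetical fetchSize of 10, the minimum time to read 1 GiB of records is:

$$
t_{\min} = \frac{\text{bytes read}}{\text{fetchSize} \times \text{M\_BYTE\_SIZE}} = \frac{1\,073\,741\,824}{10 \times 1\,048\,576} = 102.4\ \text{s}
$$

Here "bytes" means whatever getMemorySize() reports for each record (presumably an in-memory size estimate), not the on-disk size of the snapshot files.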
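The method builds namespace_table with split(":")[1], which throws ArrayIndexOutOfBoundsException for a table in the default namespace, because such a name has no ':'. Below is a minimal stdlib-only sketch of a more tolerant version, assuming such tables should map to default_<table>. The class and method names (RestoreDirs, flattenTableName, restoreDir) are hypothetical, and the example argument values are illustrative only.

```java
/**
 * Hypothetical helpers for deriving the restore directory used by read().
 * Assumption: a table name without ':' belongs to HBase's "default" namespace.
 */
final class RestoreDirs {
    private RestoreDirs() {}

    /** "ns:table" becomes "ns_table"; "table" becomes "default_table". */
    static String flattenTableName(String fullName) {
        int idx = fullName.indexOf(':');
        if (idx < 0) {
            return "default_" + fullName;
        }
        return fullName.substring(0, idx) + "_" + fullName.substring(idx + 1);
    }

    /** Same layout as read(): <defaultFS>/user/<user>/restoretmp/<ns_table>. */
    static String restoreDir(String defaultFs, String user, String fullTableName) {
        return String.format("%s/user/%s/restoretmp/%s", defaultFs, user, flattenTableName(fullTableName));
    }

    public static void main(String[] args) {
        // prints hdfs://RouterSit/user/di/restoretmp/ns_di_snapshot_test2
        System.out.println(restoreDir("hdfs://RouterSit", "di", "ns_di:snapshot_test2"));
        // prints hdfs://RouterSit/user/di/restoretmp/default_orders
        System.out.println(restoreDir("hdfs://RouterSit", "di", "orders"));
    }
}
```

Using indexOf rather than split also keeps the helper well defined if a name unexpectedly contains more than one ':'; everything after the first one is treated as the table part.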
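The method picks the snapshot whose name sorts last, which yields the newest snapshot only if the names embed a sortable timestamp. HBaseProtos.SnapshotDescription also carries getCreationTime(), so an alternative is to keep the description objects and choose by creation time. The sketch below models just the three fields involved with a hypothetical SnapshotInfo class so that it runs without HBase on the classpath; in real code you would collect the SnapshotDescription objects returned by readSnapshotInfo instead.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class LatestSnapshot {
    /** Hypothetical stand-in for the fields of HBaseProtos.SnapshotDescription used here. */
    static final class SnapshotInfo {
        final String name;
        final String table;
        final long creationTime;   // epoch millis, like SnapshotDescription.getCreationTime()

        SnapshotInfo(String name, String table, long creationTime) {
            this.name = name;
            this.table = table;
            this.creationTime = creationTime;
        }
    }

    /** Name of the newest snapshot of the given table (case-insensitive match, as in read()). */
    static Optional<String> latestFor(List<SnapshotInfo> snapshots, String table) {
        return snapshots.stream()
                .filter(s -> s.table.equalsIgnoreCase(table))
                // highest creation time wins; the name only breaks ties
                .max(Comparator.comparingLong((SnapshotInfo s) -> s.creationTime)
                        .thenComparing(s -> s.name))
                .map(s -> s.name);
    }

    public static void main(String[] args) {
        List<SnapshotInfo> all = Arrays.asList(
                new SnapshotInfo("snap_b", "ns:orders", 1_000L),
                new SnapshotInfo("snap_a", "ns:orders", 2_000L),
                new SnapshotInfo("snap_z", "ns:users", 3_000L));
        // Name order would pick snap_b; creation time picks snap_a
        System.out.println(latestFor(all, "ns:orders").orElse("<none>"));
    }
}
```

Returning an Optional also makes the "no snapshot for this table" case explicit, which read() handles by checking snapshotList.isEmpty().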

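If the column families in the snapshot use LZO compression, you may run into LZO decompression problems while reading the snapshot data (for example, UnsatisfiedLinkError: no gplcompression in java.library.path). See the follow-up post "Reading HBase snapshot data: problems encountered with LZO compression".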
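In read(), records accumulate in recordList, but the step that hands them to the pipeline every batchSize records is not part of the posted code; without it the list grows for the whole snapshot. Below is a minimal sketch of that batching logic, with a hypothetical Batcher class and a java.util.function.Consumer standing in for the pipeline call (the real Pipeline API is not shown in the post).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and passes them on in batches of batchSize. */
final class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // stand-in for handing a batch to the pipeline
    private List<T> buffer;

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Adds one item and emits a full batch as soon as batchSize items are buffered. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Emits whatever is buffered (no-op when empty); call once more after the last split. */
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(buffer);
            buffer = new ArrayList<>(batchSize);   // fresh list, so the consumer may keep the old one
        }
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        Batcher<String> b = new Batcher<>(2, batch -> sizes.add(batch.size()));
        for (String row : new String[] {"r1", "r2", "r3", "r4", "r5"}) {
            b.add(row);
        }
        b.flush();
        System.out.println(sizes);   // [2, 2, 1]
    }
}
```

In read(), recordList.add(record) would become batcher.add(record), followed by one final batcher.flush() after the split loop. Starting a new list after each hand-off, rather than calling clear(), avoids mutating a batch the consumer may still be processing.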
