2.8-2.10 Integrating HBase with MapReduce

I. Integrating HBase with MapReduce

1. List the jars HBase needs for MapReduce integration

[root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase mapredcp

2019-05-22 16:23:46,814 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-common-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/protobuf-java-2.5.0.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-client-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-hadoop-compat-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-protocol-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/high-scale-lib-1.1.1.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/zookeeper-3.4.5.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/guava-12.0.1.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/htrace-core-2.04.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/netty-3.6.6.Final.jar
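
The command prints the required jars as one colon-separated classpath string, which is why it can be assigned directly to HADOOP_CLASSPATH. As a minimal sketch of reading that string, the following plain-Java helper splits it and keeps only the jar file names. The class and method names (ClasspathDemo, jarNames) are hypothetical and are not part of HBase or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HBase or Hadoop.
class ClasspathDemo {

    // Split a colon-separated classpath and return just the jar file names.
    static List<String> jarNames(String classpath) {
        List<String> names = new ArrayList<>();
        for (String entry : classpath.split(":")) {
            if (entry.isEmpty()) {
                continue; // ignore empty entries such as a trailing colon
            }
            // Keep the part after the last '/', i.e. the file name
            names.add(entry.substring(entry.lastIndexOf('/') + 1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Two entries copied from the mapredcp output above
        String classpath = "/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-common-0.98.6-hadoop2.jar:"
                + "/opt/modules/hbase-0.98.6-hadoop2/lib/zookeeper-3.4.5.jar";
        for (String name : jarNames(classpath)) {
            System.out.println(name);
        }
    }
}
```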

2. Run the MapReduce programs bundled with HBase

## Start YARN

[root@hadoop-senior hadoop-2.5.0]# sbin/yarn-daemon.sh start nodemanager

[root@hadoop-senior hadoop-2.5.0]# sbin/mr-jobhistory-daemon.sh start historyserver

## The MapReduce programs that ship with HBase are all in hbase-server-0.98.6-hadoop2.jar, and several of them are quite useful

[root@hadoop-senior hbase-0.98.6-hadoop2]# export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2

[root@hadoop-senior hbase-0.98.6-hadoop2]# export HADOOP_HOME=/opt/modules/hadoop-2.5.0

[root@hadoop-senior hbase-0.98.6-hadoop2]# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar

An example program must be given as the first argument.

Valid program names are:

  CellCounter: Count cells in HBase table

  completebulkload: Complete a bulk data load.

  copytable: Export a table from local cluster to peer cluster

  export: Write table data to HDFS.

  import: Import data written by Export.

  importtsv: Import data in TSV format.

  rowcounter: Count rows in HBase table

  verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log.

#####

TSV

    Tab-separated values

    >>student.tsv

    1001	zhangsan	26	shanghai

CSV

    Comma-separated values

    >>student.csv

    1001,zhangsan,26,shanghai
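
To make the difference between the two formats concrete, here is a minimal sketch that splits one record of each kind into fields. It uses only the Java standard library. The class and method names (DelimitedLineDemo, splitFields) are hypothetical, and the sketch does not handle quoted CSV fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of HBase.
class DelimitedLineDemo {

    // Split a line on a literal separator; the -1 limit keeps trailing empty fields.
    static List<String> splitFields(String line, String separator) {
        return Arrays.asList(line.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        String tsvLine = "1001\tzhangsan\t26\tshanghai"; // fields separated by tabs
        String csvLine = "1001,zhangsan,26,shanghai";    // fields separated by commas
        System.out.println(splitFields(tsvLine, "\t"));
        System.out.println(splitFields(csvLine, ","));
    }
}
```

Both calls yield the same four fields; only the separator differs.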

II. Write a MapReduce program that reads from and writes to HBase tables

1. Prepare the data

## Prepare two tables: user, which contains data, and basic, which is empty

hbase(main):004:0> create 'basic', 'info'

0 row(s) in 0.4290 seconds

=> Hbase::Table - basic

hbase(main):005:0> list

TABLE

basic

user

2 row(s) in 0.0290 seconds

=> ["basic", "user"]

hbase(main):003:0> scan 'user'

ROW                                          COLUMN+CELL

 10002                                       column=info:age, timestamp=1558343570256, value=30

 10002                                       column=info:name, timestamp=1558343559457, value=wangwu

 10002                                       column=info:qq, timestamp=1558343612746, value=231294737

 10002                                       column=info:tel, timestamp=1558343607851, value=231294737

 10003                                       column=info:age, timestamp=1558577830484, value=35

 10003                                       column=info:name, timestamp=1558345826709, value=zhaoliu

 10004                                       column=info:address, timestamp=1558505387829, value=shanghai

 10004                                       column=info:age, timestamp=1558505387829, value=25

 10004                                       column=info:name, timestamp=1558505387829, value=zhaoliu

3 row(s) in 0.0190 seconds

hbase(main):006:0> scan 'basic'

ROW                                          COLUMN+CELL

0 row(s) in 0.0100 seconds

2. Write the MapReduce job that copies data from the user table into the basic table

package com.beifeng.senior.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.conf.Configured;

import org.apache.hadoop.hbase.Cell;

import org.apache.hadoop.hbase.CellUtil;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.client.Mutation;

import org.apache.hadoop.hbase.client.Put;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

import org.apache.hadoop.hbase.mapreduce.TableMapper;

import org.apache.hadoop.hbase.mapreduce.TableReducer;

import org.apache.hadoop.hbase.util.Bytes;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.util.Tool;

import org.apache.hadoop.util.ToolRunner;

public class User2BasicMapReduce extends Configured implements Tool {

    // Mapper Class

    public static class ReadUserMapper extends TableMapper<Text, Put> {

        private Text mapOutputKey = new Text();

        @Override

        public void map(ImmutableBytesWritable key, Result value,

                Mapper<ImmutableBytesWritable, Result, Text, Put>.Context context)

                        throws IOException, InterruptedException {

            // get rowkey

            String rowkey = Bytes.toString(key.get());

            // set

            mapOutputKey.set(rowkey);

            // --------------------------------------------------------

            Put put = new Put(key.get());

            // iterator

            for (Cell cell : value.rawCells()) {

                // add family : info

                if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {

                    // add column: name

                    if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {

                        put.add(cell);

                    }

                    // add column : age

                    if ("age".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {

                        put.add(cell);

                    }

                }

            }

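            // context write; skip rows that have neither info:name nor info:age,

            // because writing an empty Put is rejected ("No columns to insert")

            if (!put.isEmpty()) {

                context.write(mapOutputKey, put);

            }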

        }

    }

    // Reducer Class

    public static class WriteBasicReducer extends TableReducer<Text, Put, //

    ImmutableBytesWritable> {

        @Override

        public void reduce(Text key, Iterable<Put> values,

                Reducer<Text, Put, ImmutableBytesWritable, Mutation>.Context context)

                        throws IOException, InterruptedException {

            for(Put put: values){

                context.write(null, put);

            }

        }

    }

    // Driver

    public int run(String[] args) throws Exception {

        // create job

        Job job = Job.getInstance(this.getConf(), this.getClass().getSimpleName());

        // set run job class

        job.setJarByClass(this.getClass());

        // set job

        Scan scan = new Scan();

        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs

        scan.setCacheBlocks(false);  // don't set to true for MR jobs

        // set other scan attrs

        // set input and set mapper

        TableMapReduceUtil.initTableMapperJob(

          "user",        // input table

          scan,               // Scan instance to control CF and attribute selection

          ReadUserMapper.class,     // mapper class

          Text.class,         // mapper output key

          Put.class,  // mapper output value

          job //

         );

        // set reducer and output

        TableMapReduceUtil.initTableReducerJob(

          "basic",        // output table

          WriteBasicReducer.class,    // reducer class

          job//

         );

        job.setNumReduceTasks(1);   // at least one, adjust as required

        // submit job

        boolean isSuccess = job.waitForCompletion(true) ;

        return isSuccess ? 0 : 1;

    }

    public static void main(String[] args) throws Exception {

        // get configuration

        Configuration configuration = HBaseConfiguration.create();

        // submit job

        int status = ToolRunner.run(configuration,new User2BasicMapReduce(),args) ;

        // exit program

        System.exit(status);

    }

}

3. Run the job

## Package the job as a jar and upload it to $HADOOP_HOME/jars/

## Run

export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2

export HADOOP_HOME=/opt/modules/hadoop-2.5.0

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HADOOP_HOME/jars/hbase-mr-user2basic.jar

## Check the result

hbase(main):004:0> scan 'basic'

ROW                                          COLUMN+CELL

 10002                                       column=info:age, timestamp=1558343570256, value=30

 10002                                       column=info:name, timestamp=1558343559457, value=wangwu

 10003                                       column=info:age, timestamp=1558577830484, value=35

 10003                                       column=info:name, timestamp=1558345826709, value=zhaoliu

 10004                                       column=info:age, timestamp=1558505387829, value=25

 10004                                       column=info:name, timestamp=1558505387829, value=zhaoliu

3 row(s) in 0.0300 seconds
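
The basic table ends up with only info:name and info:age because ReadUserMapper copies just those two columns into each Put. The following plain-Java sketch models that filter without any HBase classes. It represents a row as a map from "family:qualifier" to value, which is an assumption made for illustration, and the names ColumnFilterDemo and keepNameAndAge are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of ReadUserMapper's column filter; no HBase classes are used.
class ColumnFilterDemo {

    // Keep only info:name and info:age from a row given as "family:qualifier" -> value.
    static Map<String, String> keepNameAndAge(Map<String, String> row) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String[] parts = cell.getKey().split(":", 2);
            boolean isInfo = parts.length == 2 && "info".equals(parts[0]);
            if (isInfo && ("name".equals(parts[1]) || "age".equals(parts[1]))) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Row 10002 as shown in the scan of 'user'
        Map<String, String> row = new LinkedHashMap<>();
        row.put("info:age", "30");
        row.put("info:name", "wangwu");
        row.put("info:qq", "231294737");
        row.put("info:tel", "231294737");
        // info:qq and info:tel are dropped, matching the scan of 'basic'
        System.out.println(keepNameAndAge(row));
    }
}
```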
