Spark读写HBase
Spark读写HBase示例
1、HBase shell查看表结构
hbase(main)::> desc 'SDAS_Person'
Table SDAS_Person is ENABLED
SDAS_Person
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf0', BLOOMFILTER => 'ROW', VERSIONS => '', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '', BLOCKCACHE =>
'true', BLOCKSIZE => '', REPLICATION_SCOPE => ''}
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '', BLOCKCACHE =>
'true', BLOCKSIZE => '', REPLICATION_SCOPE => ''}
{NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '', BLOCKCACHE =>
'true', BLOCKSIZE => '', REPLICATION_SCOPE => ''}
row(s) in 0.0810 seconds
hbase(main)::> desc 'RESULT'
Table RESULT is ENABLED
RESULT
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf0', BLOOMFILTER => 'ROW', VERSIONS => '', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '', BLOCKCACHE =>
'true', BLOCKSIZE => '', REPLICATION_SCOPE => ''}
row(s) in 0.0250 seconds
2、HBase shell插入数据
hbase(main)::> scan 'SDAS_Person'
ROW COLUMN+CELL
SDAS_1# column=cf0:Age, timestamp=, value=
SDAS_1# column=cf0:CompanyID, timestamp=, value=
SDAS_1# column=cf0:InDate, timestamp=, value=-- ::08.49
SDAS_1# column=cf0:Money, timestamp=, value=5.20
SDAS_1# column=cf0:Name, timestamp=, value=zhangsan
SDAS_1# column=cf0:PersonID, timestamp=, value=
3、pom.xml:
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
4、源码:
package com.zxth.sdas.spark.apps
import org.apache.spark._
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat object HBaseOp {
var total:Int = 0
def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("HBaseOp").setMaster("local")
val sc = new SparkContext(sparkConf) val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum","master,slave1,slave2")
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set(TableInputFormat.INPUT_TABLE, "SDAS_Person") //读取数据并转化成rdd
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result]) val count = hBaseRDD.count()
println("\n\n\n:" + count)
hBaseRDD.foreach{case (_,result) =>{
//获取行键
val key = Bytes.toString(result.getRow)
//通过列族和列名获取列
var obj = result.getValue("cf0".getBytes,"Name".getBytes)
val name = if(obj==null) "" else Bytes.toString(obj) obj = result.getValue("cf0".getBytes,"Age".getBytes);
val age:Int = if(obj == null) 0 else Bytes.toString(obj).toInt total = total + age
println("Row key:"+key+" Name:"+name+" Age:"+age+" total:"+total)
}}
var average:Double = total.toDouble/count.toDouble
println("" + total + "/" + count + " average age:" + average.toString()) //write hbase
conf.set(TableOutputFormat.OUTPUT_TABLE, "RESULT")
val job = new Job(conf)
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Result])
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]]) var arrResult:Array[String] = new Array[String](1)
arrResult(0) = "1," + total + "," + average;
//arrResult(0) = "1,100,11" val resultRDD = sc.makeRDD(arrResult)
val saveRDD = resultRDD.map(_.split(',')).map{arr=>{
val put = new Put(Bytes.toBytes(arr(0)))
put.add(Bytes.toBytes("cf0"),Bytes.toBytes("total"),Bytes.toBytes(arr(1)))
put.add(Bytes.toBytes("cf0"),Bytes.toBytes("average"),Bytes.toBytes(arr(2)))
(new ImmutableBytesWritable, put)
}}
println("getConfiguration")
var c = job.getConfiguration()
println("save")
saveRDD.saveAsNewAPIHadoopDataset(c) sc.stop()
}
}
5、maven打包
mvn clean scala:compile compile package
6、提交运算
bin/spark-submit \
--jars $(echo /opt/hbase-1.2./lib/*.jar | tr ' ' ',') \
--class com.zxth.sdas.spark.apps.HBaseOp \
--master local \
sdas-spark-1.0.0.jar
Spark读写HBase的更多相关文章
- Spark读写Hbase的二种方式对比
作者:Syn良子 出处:http://www.cnblogs.com/cssdongl 转载请注明出处 一.传统方式 这种方式就是常用的TableInputFormat和TableOutputForm ...
- spark读写hbase性能对比
一.spark写入hbase hbase client以put方式封装数据,并支持逐条或批量插入.spark中内置saveAsHadoopDataset和saveAsNewAPIHadoopDatas ...
- Spark读写HBase时出现的问题--RpcRetryingCaller: Call exception
问题描述 Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: ...
- Spark读写Hbase中的数据
def main(args: Array[String]) { val sparkConf = new SparkConf().setMaster("local").setAppN ...
- Spark-读写HBase,SparkStreaming操作,Spark的HBase相关操作
Spark-读写HBase,SparkStreaming操作,Spark的HBase相关操作 1.sparkstreaming实时写入Hbase(saveAsNewAPIHadoopDataset方法 ...
- Spark实战之读写HBase
1 配置 1.1 开发环境: HBase:hbase-1.0.0-cdh5.4.5.tar.gz Hadoop:hadoop-2.6.0-cdh5.4.5.tar.gz ZooKeeper:zooke ...
- 使用 Spark SQL 高效地读写 HBase
Apache Spark 和 Apache HBase 是两个使用比较广泛的大数据组件.很多场景需要使用 Spark 分析/查询 HBase 中的数据,而目前 Spark 内置是支持很多数据源的,其中 ...
- Spark读Hbase优化 --手动划分region提高并行数
一. Hbase的region 我们先简单介绍下Hbase的架构和Hbase的region: 从物理集群的角度看,Hbase集群中,由一个Hmaster管理多个HRegionServer,其中每个HR ...
- 开源大数据技术专场(上午):Spark、HBase、JStorm应用与实践
16日上午9点,2016云栖大会“开源大数据技术专场” (全天)在阿里云技术专家封神的主持下开启.通过封神了解到,在上午的专场中,阿里云高级技术专家无谓.阿里云技术专家封神.阿里巴巴中间件技术部高级技 ...
随机推荐
- 《ASP.NET Core In Action》读书笔记系列四 创建ASP.NET Core 应用步骤及相应CLI命令
一般情况下,我们都是从一个模板(template)开始创建应用的(模板:提供构建应用程序所需的基本代码).本节使用 Visual Studio 2017 .ASP.NET Core2.0和 Visua ...
- rpm 数据库
rpm 数据库 /var/lib/rpm
- Mac 下 实现终端跳转 服务器 不用输入密码
首先需要安装 expect 安装 expect 需要 tcl 依赖 第一步 下载tcl http://www.tcl.tk/software/tcltk/downloadnow84.tml 将下载好 ...
- java线程学习之notify方法和notifyAll方法
notify(通知)方法,会将等待队列中的一个线程取出.比如obj.notify();那么obj的等待队列中就会有一个线程选中并且唤醒,然后被唤醒的队列就会退出等待队列.活跃线程调用等待队列中的线程时 ...
- PLS-00357: Table,View Or Sequence reference 'SEQ_TRADE_RECODE.NEXTVAL' not allowed in this context
oracle数据库: 为了使ID自增,建了序列后,创建触发器: create or replace TRIGGER TRIG_INSERT_TRADE_RECODE BEFORE INSERT ON ...
- python 识别图片上的数字
https://blog.csdn.net/qq_31446377/article/details/81708006 ython 3.6 版本 Pytesseract 图像验证码识别 环境: (1) ...
- k8s构建镜像-基于centos的python环境+pip
FROM centos:7.4.1708 #维护者信息MAINTAINER by icdss # 标签LABEL version="1.0" # 安装依赖RUN yum -y up ...
- IDEA 创建 web项目
创建web步骤: 1.创建一个project File -> New Project -> 选择Java,Project SDK为1.7,勾选Web Application(创建web.x ...
- 0x13链表与邻接表之邻值查找
题目链接:https://www.acwing.com/problem/content/138/ 参考链接:https://blog.csdn.net/sdz20172133/article/deta ...
- BZOJ 5261 Rhyme
思路 考虑一个匹配的过程,当一个节点x向后拼接一个c的时候,为了满足题目条件的限制,应该向suflink中最深的len[x]+1>=k的节点转移(保证该后缀拼上一个c之后,长度为k的子串依然属于 ...