Transferring data between Hive and MySQL with Sqoop

Exporting data from Hive to MySQL

Create the MySQL table

use anticheat;

create table anticheat_blacklist(
    userid varchar(30) primary key,
    dt int,
    update_time timestamp,
    delete_flag int,
    operator varchar(30)
);

Full export

Use sqoop export to export all of the Hive table's data into MySQL. The command is as follows:

sqoop export -D mapred.job.queue.name=datacenter \
    --connect 'jdbc:mysql://localhost:3306/anticheat?tinyInt1isBit=false' \
    --username root \
    --password '^qn9DFYPm' \
    --table anticheat_blacklist \
    --input-fields-terminated-by '\t' \
    --input-null-string '\\N' \
    --input-null-non-string '\\N' \
    --num-mappers 10 \
    --export-dir hdfs://dc5/user/test/hive/online/anticheat_blacklist_mysql
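
The command above puts the password on the command line, where it ends up in shell history and process listings. Sqoop also accepts --password-file (and -P for an interactive prompt). The following is a minimal sketch of preparing such a file; the function name and the example path are assumptions made for illustration, and the file:// form assumes the file lives on the local filesystem rather than on HDFS.

```shell
#!/bin/sh
# Hypothetical helper: store a password in a file readable only by its owner.
# usage: write_password_file PASSWORD DEST_FILE
write_password_file() {
    mkdir -p "$(dirname "$2")"
    rm -f "$2"
    # printf '%s' writes no trailing newline, so the file holds
    # exactly the password characters and nothing else.
    ( umask 077; printf '%s' "$1" > "$2" )
    chmod 400 "$2"
}

# Example (path is a placeholder):
#   write_password_file 'example-password' "$HOME/.sqoop/anticheat.pwd"
#   sqoop export ... --username root \
#       --password-file "file://$HOME/.sqoop/anticheat.pwd" ...
```

The same file can then be reused by the import commands later in this post.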

Incremental export

sqoop export -D mapred.job.queue.name=datacenter \
    --connect 'jdbc:mysql://localhost:3306/anticheat?tinyInt1isBit=false' \
    --username root \
    --password '^qn9DFYPm' \
    --table anticheat_blacklist2 \
    --input-fields-terminated-by '\t' \
    --input-null-string '\\N' \
    --input-null-non-string '\\N' \
    --num-mappers 10 \
    --update-key update_time \
    --update-mode allowinsert \
    --export-dir hdfs://dc5/user/test/hive/online/anticheat_blacklist_mysql2
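
With --update-key and --update-mode allowinsert, a row whose key already exists in the target table is updated, and a row with a new key is inserted. The sketch below is only a toy model of that behaviour over two tab-separated files, not how Sqoop implements it; the function name, file names and sample rows are made up for illustration.

```shell
#!/bin/sh
# Toy model of "--update-mode allowinsert": merge incoming TSV rows into
# existing TSV rows, matching on one key column (1-based index).
# usage: upsert_tsv KEY_COL EXISTING_FILE INCOMING_FILE
upsert_tsv() {
    awk -F '\t' -v k="$1" '
        # A key seen for the first time is an insert; a repeated key
        # overwrites the stored row, which models an update.
        { key = $k; if (!(key in row)) order[++n] = key; row[key] = $0 }
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    ' "$2" "$3"
}

tmp=$(mktemp -d)
printf 'u1\t20190401\nu2\t20190402\n' > "$tmp/existing.tsv"
printf 'u2\t20190410\nu3\t20190411\n' > "$tmp/incoming.tsv"
# u1 is untouched, u2 is updated, u3 is inserted.
upsert_tsv 1 "$tmp/existing.tsv" "$tmp/incoming.tsv"
rm -r "$tmp"
```

Without allowinsert (the default updateonly mode), the u3 row would be skipped instead of inserted.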

Importing data from MySQL into Hive

Create the Hive table

Create the Hive table that mirrors the MySQL table:

CREATE TABLE test.anticheat_blacklist_mysql(
    key string,
    dt int,
    update_time timestamp,
    delete_flag int,
    operator string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'hdfs://dc5/user/test/hive/online/anticheat_blacklist_mysql';

Full import

Use sqoop import to import all of the MySQL table's data into the Hive table. The command is as follows:

sqoop import -D mapred.job.queue.name=datacenter \
    --connect 'jdbc:mysql://localhost:3306/anticheat?tinyInt1isBit=false' \
    --username root \
    --password '^qn9DFYPm' \
    --table anticheat_blacklist \
    --delete-target-dir \
    --beeline "jdbc:hive2://dsrv2.heracles.sohuno.com:10000/test;principal=hive/dsrv2.heracles.sohuno.com@HERACLE.SOHUNO.COM;" \
    --hive-import --fields-terminated-by '\t' \
    --hive-database test \
    --hive-table anticheat_blacklist_mysql \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --hive-overwrite \
    --num-mappers 1 \
    --outdir /home/test/data/anticheat/mysql2hive

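To have null values arrive in Hive as NULL, add the following two parameters:

--null-string: for columns of a string type, the given string is written in place of null values
--null-non-string: for columns of a non-string type, the given string is written in place of null values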
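
Hive's text format reads the two-character sequence \N as NULL by default, which is why both options are set to '\\N' (the backslash is doubled because Sqoop treats the value as an escaped string). As a quick check after an import, the sketch below counts the \N markers per column in a tab-separated file; the function name and the sample rows are assumptions made for illustration.

```shell
#!/bin/sh
# Count the \N (NULL) markers in each column of a tab-separated file.
# Prints "column_index<TAB>count" for every column that contains a marker.
# usage: count_null_markers FILE
count_null_markers() {
    awk -F '\t' '
        {
            if (NF > max) max = NF
            for (i = 1; i <= NF; i++) if ($i == "\\N") c[i]++
        }
        END { for (i = 1; i <= max; i++) if (c[i] > 0) printf "%d\t%d\n", i, c[i] }
    ' "$1"
}

tmp=$(mktemp -d)
# Two made-up rows in the layout of anticheat_blacklist_mysql.
printf 'u1\t20190401\t\\N\t0\tadmin\n' >  "$tmp/sample.tsv"
printf 'u2\t\\N\t\\N\t1\t\\N\n'        >> "$tmp/sample.tsv"
# Reports one marker in column 2, two in column 3 and one in column 5.
count_null_markers "$tmp/sample.tsv"
rm -r "$tmp"
```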

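Incremental import

Incremental import is driven by time. If the table has no time attribute, add a timestamp column to it.

Core parameters:

--check-column: the column that is examined during an incremental import to decide whether a row counts as new data, typically an auto-increment field or a timestamp in the relational database. Note: the column must not be of a character type (char, varchar and so on are not allowed). The Sqoop user guide describes this option as taking a single column.
--incremental: the incremental import mode, either append or lastmodified
--last-value: the maximum value of the check column from the previous import; only records with a value greater than this are imported

Note: all three parameters above must be supplied!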
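
The way these three parameters interact can be modelled on a tab-separated file: rows whose check column is greater than the last value are selected, and the largest value seen becomes the last value for the next run. This is a toy sketch, not Sqoop's implementation; the function names and sample rows are made up, and it relies on timestamps in the form YYYY-MM-DD HH:MM:SS sorting correctly as plain strings.

```shell
#!/bin/sh
# Select the rows whose check column is strictly greater than LAST_VALUE.
# usage: rows_after CHECK_COL LAST_VALUE FILE
rows_after() {
    # Appending "" forces a string comparison in awk.
    awk -F '\t' -v c="$1" -v last="$2" '($c "") > (last "")' "$3"
}

# Compute the value to pass as --last-value on the next run.
# usage: next_last_value CHECK_COL LAST_VALUE FILE
next_last_value() {
    awk -F '\t' -v c="$1" -v max="$2" '
        ($c "") > (max "") { max = $c }
        END { print max }
    ' "$3"
}

tmp=$(mktemp -d)
printf 'u1\t2019-04-10 09:00:00\n' >  "$tmp/rows.tsv"
printf 'u2\t2019-04-12 14:31:34\n' >> "$tmp/rows.tsv"
printf 'u3\t2019-04-13 08:15:00\n' >> "$tmp/rows.tsv"
# Only the u3 row is newer than the last value.
rows_after 2 '2019-04-12 14:31:34' "$tmp/rows.tsv"
# The next run would start from 2019-04-13 08:15:00.
next_last_value 2 '2019-04-12 14:31:34' "$tmp/rows.tsv"
rm -r "$tmp"
```

Because the comparison is strict, a row whose check column equals the last value is not imported again.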

Command:

sqoop import -D mapred.job.queue.name=datacenter \
    --connect 'jdbc:mysql://localhost:3306/anticheat?tinyInt1isBit=false' \
    --username root \
    --password '^qn9DFYPm' \
    --table anticheat_blacklist \
    --delete-target-dir \
    --hive-import --fields-terminated-by '\t' \
    --beeline "jdbc:hive2://dsrv2.heracles.sohuno.com:10000/test;principal=hive/dsrv2.heracles.sohuno.com@HERACLE.SOHUNO.COM;" \
    --hive-database test \
    --hive-table anticheat_blacklist_mysql \
    --null-string '\\N' \
    --hive-overwrite \
    --num-mappers 1 \
    --check-column update_time \
    --incremental lastmodified \
    --last-value "2019-04-12 14:31:34" \
    --outdir /home/test/data/anticheat/mysql2hive

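The command above uses lastmodified mode for the incremental import, and it fails:

Error message: --incremental lastmodified option for hive imports is not supported. Please remove the parameter --incremental lastmodified

Cause: Sqoop does not support lastmodified mode for incremental imports from MySQL into Hive, although the mode is supported when importing from MySQL into HDFS.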
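
For reference, a lastmodified import into a plain HDFS directory might look like the sketch below. It was not part of the original test run: the function name and the --target-dir path are hypothetical, -P prompts for the password instead of passing it inline, and --merge-key userid assumes the MySQL primary key is the right column for merging updated rows into a target directory that already exists.

```shell
#!/bin/sh
# Hypothetical wrapper around a lastmodified import into HDFS (no Hive).
# The SQOOP variable allows substituting another command for a dry run.
# usage: import_lastmodified_to_hdfs LAST_VALUE
import_lastmodified_to_hdfs() {
    "${SQOOP:-sqoop}" import -D mapred.job.queue.name=datacenter \
        --connect 'jdbc:mysql://localhost:3306/anticheat?tinyInt1isBit=false' \
        --username root \
        -P \
        --table anticheat_blacklist \
        --target-dir /user/test/sqoop/anticheat_blacklist_incr \
        --fields-terminated-by '\t' \
        --null-string '\\N' \
        --null-non-string '\\N' \
        --num-mappers 1 \
        --check-column update_time \
        --incremental lastmodified \
        --last-value "$1" \
        --merge-key userid
}

# Example:
#   import_lastmodified_to_hdfs '2019-04-12 14:31:34'
```

A Hive external table could then be pointed at the target directory, which is one way around the restriction on --hive-import.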

So we import using append mode instead:

sqoop import -D mapred.job.queue.name=datacenter \
    --connect 'jdbc:mysql://localhost:3306/anticheat?tinyInt1isBit=false' \
    --username root \
    --password '^qn9DFYPm' \
    --table anticheat_blacklist \
    --delete-target-dir \
    --hive-import --fields-terminated-by '\t' \
    --hive-database test \
    --hive-table anticheat_blacklist_mysql \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --num-mappers 1 \
    --check-column update_time \
    --incremental append \
    --last-value "2019-04-12 14:31:34" \
    --outdir /home/test/data/anticheat/mysql2hive

The incremental import succeeds!
