sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --table TABLE_NAME --null-string '\\N' --null-non-string '\\N' \
--hive-import --hive-table HIVEDB.HIVETABLENAME \
--num-mappers NUM_MAPPERS --verbose --password PWD --hive-drop-import-delims --hive-overwrite \
--fetch-size FETCH_SIZE

-D is not a Sqoop-specific parameter; it is a generic Hadoop option that sets a configuration property, and it must be placed before the tool-specific arguments.

oraoop.disabled=true

If this property is not set, the command may fail with: table or view does not exist.

Oraoop is a special plugin for Sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not publicly available. Quest Software partnered with Oracle to obtain those protocols, implemented them, and created Oraoop.

In our test environment the import works fine without this property; in another environment we hit this issue. Before the failure the log contained a message saying the URL could not be recognized as a valid thin URL, so it may be a driver issue.

Another thing to take care of: write the TABLE_NAME (or view name) and the username in UPPER CASE, or you may hit the same error: table or view does not exist.

--hive-drop-import-delims

This parameter addresses a known issue: fields in the RDBMS table may contain newlines (\r, \n) or special characters such as \001 in their content.

These break Hive's parsing, because Hive uses \001 as the default field separator and \n as the default row terminator.

Even if you specify your own field separator or row terminator, Hive will report an error: it currently supports only \n as the row terminator. So the fix is to drop the special characters and \r\n inside the fields, or to replace them (sketched below).
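A minimal sketch of the replace-instead-of-drop variant, reusing the placeholders from the command above; --hive-delims-replacement is Sqoop's companion to --hive-drop-import-delims, and the single space used as the replacement is just an illustrative choice:

# Replace embedded \n, \r and \001 with a space instead of dropping them
sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --password PWD --table TABLE_NAME \
--hive-import --hive-table HIVEDB.HIVETABLENAME \
--hive-delims-replacement ' '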

--hive-overwrite

This overwrites any existing data in the Hive table.

--fetch-size

This parameter's default value is 1000.

Once, when we loaded a wide view with about 80 columns, the sqoop command reported an error: out of memory.

The generated Java file had not been produced at that point. I don't know the exact reason, but the error occurred before any fetch size had been set, so I changed this parameter.

Finding the root cause would require digging into the source code.
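A minimal sketch of the workaround, assuming a wide view called WIDE_VIEW; the value 100 is illustrative, not the exact number we used:

# Lower the JDBC fetch size for a very wide view to reduce memory pressure
sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --password PWD --table WIDE_VIEW \
--hive-import --hive-table HIVEDB.WIDE_VIEW \
--fetch-size 100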

--null-string '\\N' --null-non-string '\\N'

Without these parameters, NULL values from the RDBMS are written out as the literal string 'null', which Hive does not treat as NULL; with them, the import writes \N, which Hive interprets as NULL in the table.

The sqoop command first generates the Java class and the Hadoop jar file in a temp path, and then executes the MapReduce job.

First it loads the data into a staging directory on HDFS, then it creates the Hive table, then it uses a LOAD command to move the data from HDFS into the warehouse folder.

If the command executes successfully, it cleans up the staging files.

If it fails while creating the Hive table or loading the data into Hive, the staging directory and files remain on HDFS.

If you rerun the same command, it will fail, reporting that the output directory already exists. So drop that directory first, or load the staged data yourself (see the sketch below).
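A minimal sketch of the cleanup, assuming the leftover staging directory is /user/USERNAME/TABLE_NAME (the actual path depends on your configuration):

# Remove the leftover staging directory before rerunning the import
hadoop fs -rm -r /user/USERNAME/TABLE_NAME

Alternatively, --delete-target-dir asks Sqoop to remove the directory itself before importing:

sqoop import --connect "jdbc:oracle:thin:@..." --username USERNAME --password PWD --table TABLE_NAME --delete-target-dir --hive-import --hive-table HIVEDB.HIVETABLENAME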

If you use --query (-e), you load the data with a free-form query instead of --table.

Demo: --query "select * from table where \$conditions". Inside double quotes you must escape the dollar sign as \$; inside single quotes the backslash is not needed.

You must also add --target-dir /hdfspath when you use --query.
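A minimal sketch of a free-form query import; the query, the split column ID, and the target path are illustrative assumptions (with more than one mapper, --split-by is required alongside --query):

# Free-form query import; \$conditions is escaped because the query is inside double quotes
sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --password PWD \
--query "select * from TABLE_NAME where \$conditions" \
--split-by ID \
--target-dir /user/USERNAME/TABLE_NAME_query \
--hive-import --hive-table HIVEDB.HIVETABLENAME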

When loading data from the RDBMS into Hive and letting Sqoop create the Hive table for you, you will find that integer columns are converted to DOUBLE.

So you need to do something about this yourself, for example override the column mapping (sketched below). Please take care.
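A minimal sketch of the override, assuming a column named ID that should keep an integer type in Hive; the column name and the BIGINT type are assumptions for illustration:

# Force the Hive type of a specific column instead of accepting the default DOUBLE mapping
sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --password PWD --table TABLE_NAME \
--hive-import --hive-table HIVEDB.HIVETABLENAME \
--map-column-hive ID=BIGINT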
