1. Scenario description

When I used Sqoop to import a MySQL table into Hive, I got the following error:

// :: WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not  assignment compatible
// :: WARN hcat.SqoopHCatUtilities: The HCatalog field submername has type string. Expected = varchar based on database column type : VARCHAR
// :: WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not assignment compatible
// :: INFO mapreduce.DataDrivenImportJob: Configuring mapper for HCatalog import job
// :: INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
// :: INFO client.RMProxy: Connecting to ResourceManager at hadoop-namenode01/192.168.1.101:
// :: WARN conf.HiveConf: HiveConf of name hive.server2.webui.host.port does not exist
// :: INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
// :: INFO db.DBInputFormat: Using read commited transaction isolation
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562229385371_50086
// :: INFO impl.YarnClientImpl: Submitted application application_1562229385371_50086
// :: INFO mapreduce.Job: The url to track the job: http://hadoop-namenode01:8088/proxy/application_1562229385371_50086/
// :: INFO mapreduce.Job: Running job: job_1562229385371_50086
// :: INFO hive.metastore: Closed a connection to metastore, current connections:
// :: INFO mapreduce.Job: Job job_1562229385371_50086 running in uber mode : false
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Task Id : attempt_1562229385371_50086_m_000000_0, Status : FAILED
Error: GC overhead limit exceeded

Why does the Sqoop import throw this exception?
The answer: during the import, the RDBMS (not Sqoop) fetches all of the rows in one shot and tries to load everything into memory. This exhausts the mapper's heap and throws the error. To overcome it, you need to tell the RDBMS to return the data in batches. Appending the parameters "?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true" to the JDBC connection string tells the database to fetch 10000 rows per batch.

The script I used for the import is as follows:

File: sqoop_order_detail.sh

#!/bin/bash

/home/lenmom/sqoop-1.4./bin/sqoop import \
--connect jdbc:mysql://lenmom-mysql:3306/inventory \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m

The target MySQL table has 10 billion records.

2. Solutions

2.1 Solution 1

Modify the MySQL JDBC URL to stream data from the server by appending the following parameters:

?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true

The defaultFetchSize value can be adjusted to your specific conditions. In my case, the whole script is:

#!/bin/bash

/home/lenmom/sqoop-1.4./bin/sqoop import \
--connect jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true\&defaultFetchSize=10000\&useCursorFetch=true\&useUnicode=yes\&characterEncoding=utf8 \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m

Don't forget to escape the & characters in the shell script, or alternatively wrap the whole JDBC URL in double quotes instead of escaping:

#!/bin/bash

/home/lenmom/sqoop-1.4./bin/sqoop import \
--connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8&characterEncoding=utf8" \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m

2.2 Solution 2

sqoop import -Dmapreduce.map.memory.mb= -Dmapreduce.map.java.opts=-Xmx1600m -Dmapreduce.task.io.sort.mb=

The above parameters need to be tuned to the data volume for a successful Sqoop pull. Note that the heap size set via mapreduce.map.java.opts must fit within the container size set via mapreduce.map.memory.mb.
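As a minimal sketch, assuming a 2048 MB map container, the 1600 MB heap from above, and a 256 MB sort buffer (these values are illustrative, not from the original job):

#!/bin/bash
# Illustrative values only (not from the original job): the container size
# (mapreduce.map.memory.mb) is set ~25% above the JVM heap (-Xmx) to leave
# headroom for off-heap memory; io.sort.mb buffers map output before spilling.
sqoop import \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.map.java.opts=-Xmx1600m \
  -Dmapreduce.task.io.sort.mb=256 \
  --connect "jdbc:mysql://lenmom-mysql:3306/inventory?useCursorFetch=true&defaultFetchSize=10000" \
  --username root --password root \
  --table order_detail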

2.3 Solution 3

Increase the number of mappers (the default is 4; it should not be greater than the number of DataNodes) so that each mapper imports a smaller slice of the table. To override the mapper count of a saved Sqoop job:

sqoop job --exec lenmom-job -- --num-mappers ;
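For a direct import, a sketch might look like the following; the mapper count of 8 and the split column id are assumptions for illustration, since Sqoop needs a primary key or an explicit --split-by column when more than one mapper is used:

#!/bin/bash
# Hypothetical values: 8 mappers and a numeric split column named "id".
# Sqoop partitions the id range into 8 query slices, one per mapper.
sqoop import \
  --connect "jdbc:mysql://lenmom-mysql:3306/inventory?useCursorFetch=true&defaultFetchSize=10000" \
  --username root --password root \
  --table order_detail \
  --split-by id \
  --num-mappers 8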

Reference:

https://stackoverflow.com/questions/26484873/cloudera-settings-sqoop-import-gives-java-heap-space-error-and-gc-overhead-limit
