Sqoop -- Free-form Query Imports: what the $CONDITIONS token does
Sqoop is a tool for transferring and converting data between HDFS and relational databases such as MySQL. When importing from MySQL into HDFS, you can restrict which data is extracted with options such as --table, --columns, and --where. Sqoop also supports extracting data with an arbitrary SQL statement (a free-form query), passed in via --query. Three points to keep in mind when using --query:
1. You must specify the target directory with --target-dir.
2. The query must contain the $CONDITIONS token.
3. You may optionally pick a splitting column with --split-by; the result set is then split across several output files, much like partitioning in MapReduce (see the sketch after this list).
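Putting the three requirements together, a minimal compliant command looks roughly like this (a sketch only -- the connection string and emp table reuse this post's environment, the output directory /sqoopout_sketch is made up for illustration, and -P prompts for the password instead of putting it on the command line):

sqoop import \
  --connect jdbc:mysql://server74:3306/Server74 --username root -P \
  --target-dir /sqoopout_sketch \
  --split-by id \
  --query 'select id,name,deg from emp where $CONDITIONS'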
Our main question here is what the $CONDITIONS token actually does.
Two quick observations:
1. Written literally, $CONDITIONS carries no condition of its own.
2. In the execution log below, we find it replaced with (1 = 0).
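Why an always-false predicate? As the log below suggests, the (1 = 0) substitution happens during code generation: executing the query with a condition that can never match returns zero rows, yet still hands Sqoop the result set's column names and types. The effect is easy to reproduce by hand in the mysql client against this post's emp table:

select id,name,deg from emp where id>1202 and (1 = 0);
-- matches no rows, but the (empty) result set's metadata still describes all three columns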
sqoop import --connect jdbc:mysql://server74:3306/Server74 --username root --password 123456 \
  --target-dir /sqoopout2 --m 1 --delete-target-dir \
  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
[root@server72 sqoop]# sqoop import --connect jdbc:mysql://server74:3306/Server74 --username root --password 123456 --target-dir /sqoopout2
--m 1 --delete-target-dir --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/11/10 13:42:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/11/10 13:42:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/11/10 13:42:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/11/10 13:42:16 INFO tool.CodeGenTool: Beginning code generation
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
17/11/10 13:42:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/11/10 13:42:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.jar
17/11/10 13:42:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/10 13:42:41 INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
17/11/10 13:42:41 INFO mapreduce.ImportJobBase: Beginning query import.
17/11/10 13:42:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/11/10 13:42:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/11/10 13:42:43 INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:8032
17/11/10 13:42:58 INFO db.DBInputFormat: Using read commited transaction isolation
17/11/10 13:42:58 INFO mapreduce.JobSubmitter: number of splits:1
17/11/10 13:43:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0011
17/11/10 13:43:03 INFO impl.YarnClientImpl: Submitted application application_1510279795921_0011
17/11/10 13:43:04 INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0011/
17/11/10 13:43:04 INFO mapreduce.Job: Running job: job_1510279795921_0011
17/11/10 13:44:01 INFO mapreduce.Job: Job job_1510279795921_0011 running in uber mode : false
17/11/10 13:44:01 INFO mapreduce.Job: map 0% reduce 0%
17/11/10 13:44:58 INFO mapreduce.Job: map 100% reduce 0%
17/11/10 13:45:00 INFO mapreduce.Job: Job job_1510279795921_0011 completed successfully
17/11/10 13:45:01 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=124473
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=61
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=45099
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=45099
Total vcore-milliseconds taken by all map tasks=45099
Total megabyte-milliseconds taken by all map tasks=46181376
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=370
CPU time spent (ms)=6380
Physical memory (bytes) snapshot=106733568
Virtual memory (bytes) snapshot=842854400
Total committed heap usage (bytes)=16982016
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=61
17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Transferred 61 bytes in 139.3429 seconds (0.4378 bytes/sec)
17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Retrieved 3 records.
Checking the output, we see that the rows with id greater than 1202 were extracted as expected:
[root@server72 sqoop]# hdfs dfs -cat /sqoopout2/part-m-00000
17/11/10 13:48:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1203,khalil,php dev
1204,prasanth,php dev
1205,kranthi,admin
From this run we learn one thing: although $CONDITIONS looks like a Linux shell variable, the single quotes around the query keep the shell from expanding it; it is Sqoop itself that rewrites the token, here substituting (1 = 0) -- which is why the SQL that actually executed looks so strange. Now let's find out what $CONDITIONS really is, starting from the official documentation, which says:
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by. For example:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
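A practical quoting note from the same section of the user guide: if you wrap the query in double quotes, you must write \$CONDITIONS, otherwise the shell expands $CONDITIONS as a (normally empty) shell variable before Sqoop ever sees it. Both of these pass the literal token through:

--query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
--query "select id,name,deg from emp where id>1202 and \$CONDITIONS"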
This may be hard to grasp in the abstract, so let me change a few options and compare the job logs.
sqoop import --connect jdbc:mysql://server74:3306/Server74 --username root --password 123456 \
  --target-dir /sqoopout2 --m 2 --delete-target-dir \
  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' \
  --split-by id
Following the documentation, I added --split-by id as the splitting column and set the number of map tasks to 2.
[root@server72 sqoop]# sqoop import --connect jdbc:mysql://server74:3306/Server74 --username root
--password 123456 --target-dir /sqoopout2 --m 2 --delete-target-dir --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' --split-by id
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
INFO tool.CodeGenTool: Beginning code generation
INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.jar
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
INFO mapreduce.ImportJobBase: Beginning query import.
INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:8032
INFO db.DBInputFormat: Using read commited transaction isolation
INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select id,name,deg from emp where id>1202 and (1 = 1) ) AS t1
INFO mapreduce.JobSubmitter: number of splits:2
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0012
INFO impl.YarnClientImpl: Submitted application application_1510279795921_0012
INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0012/
INFO mapreduce.Job: Running job: job_1510279795921_0012
INFO mapreduce.Job: Job job_1510279795921_0012 running in uber mode : false
INFO mapreduce.Job:  map 0% reduce 0%
INFO mapreduce.Job:  map 100% reduce 0%
INFO mapreduce.Job: Job job_1510279795921_0012 completed successfully
(job counters omitted)
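The difference from the first run is now clear. Before submitting the job, Sqoop replaces $CONDITIONS with (1 = 1) and wraps our query in a BoundingValsQuery -- SELECT MIN(id), MAX(id) FROM (...) -- to learn the range of the split column. It then divides that range among the map tasks, and at run time each mapper executes the query with $CONDITIONS replaced by its own slice of the range. So $CONDITIONS is neither a Linux variable nor an empty decoration: it is the slot into which Sqoop injects a different WHERE fragment at each stage -- (1 = 0) during code generation (metadata only, no rows), (1 = 1) in the bounding query (consider every row), and a per-split range during the import itself.

As an illustration only (the exact boundaries are computed by DataDrivenDBInputFormat and may differ), with MIN(id)=1203, MAX(id)=1205 and two map tasks, the two mappers would each run something like:

-- hypothetical condition for mapper 1:
select id,name,deg from emp where id>1202 and ( id >= 1203 ) AND ( id < 1204 );
-- hypothetical condition for mapper 2:
select id,name,deg from emp where id>1202 and ( id >= 1204 ) AND ( id <= 1205 );

This also explains why --split-by is required for parallel free-form imports: with --table, Sqoop can default to the table's primary key as the split column, but an arbitrary query gives it no way to guess which column to split on.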