sqoop的安装与使用

1.什么是Sqoop

Sqoop即 SQL to Hadoop ，是一款方便的在传统型数据库与Hadoop之间进行数据迁移的工具。充分利用MapReduce并行特点以批处理的方式加快传输数据。发展至今主要演化了二大版本号。Sqoop1和Sqoop2。

Sqoop工具是hadoop下连接关系型数据库和Hadoop的桥梁，支持关系型数据库和hive、hdfs。hbase之间数据的相互导入，能够使用全表导入和增量导入。

那么为什么选择Sqoop呢？

高效可控的利用资源，任务并行度，超时时间。数据类型映射与转化，可自己主动进行，用户也可自己定义支持多种主流数据库。MySQL,Oracle，SQL Server，DB2等等

2.Sqoop1和Sqoop2对照的异同之处

两个不同的版本号。全然不兼容版本号号划分差别。Apache版本号：1.4.x(Sqoop1); 1.99.x(Sqoop2) CDH版本号 : Sqoop-1.4.3-cdh4(Sqoop1) ; Sqoop2-1.99.2-cdh4.5.0 (Sqoop2)Sqoop2比Sqoop1的改进引入Sqoop server。集中化管理connector等多种訪问方式：CLI,Web UI，REST API 引入基于角色的安全机制

3.Sqoop1与Sqoop2的架构图

Sqoop架构图1

Sqoop架构图2

4.Sqoop1与Sqoop2的优缺点

比較	Sqoop1	Sqoop2
架构	只使用一个Sqoopclient	引入了Sqoop server集中化管理connector。以及rest api，web，UI，并引入权限安全机制
部署	部署简单，安装须要root权限，connector必须符合JDBC模型	架构稍复杂。配置部署更繁琐
使用	命令行方式easy出错，格式紧耦合。无法支持全部数据类型。安全机制不够完好。比如password暴漏	多种交互方式，命令行。web UI。rest API，conncetor集中化管理，全部的链接安装在Sqoop server上，完好权限管理机制。connector规范化，只负责数据的读写

5.Sqoop1的安装部署

5.0 安装环境

hadoop：hadoop-2.3.0-cdh5.1.2

sqoop：sqoop-1.4.4-cdh5.1.2

5.1 下载安装包及解压

tar -zxvf sqoop-1.4.4-cdh5.1.2.tar.gz

ln -s sqoop-1.4.4-cdh5.1.2 sqoop

5.2 配置环境变量和配置文件

<span style="font-size:18px;">cd sqoop/conf/

cat  sqoop-env-template.sh  >> sqoop-env.sh

vi sqoop-env.sh </span>

在sqoop-env.sh中加入例如以下代码

<span style="font-size:18px;"># Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#     http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# included in all the hadoop scripts with source command

# should not be executable directly

# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available

export HADOOP_COMMON_HOME=/home/hadoop/hadoop

#Set path to where hadoop-*-core.jar is available

export HADOOP_MAPRED_HOME=/home/hadoop/hadoop

#set the path to where bin/hbase is available

export HBASE_HOME=/home/hadoop/hbase

#Set the path to where bin/hive is available

export HIVE_HOME=/home/hadoop/hive

#Set the path for where zookeper config dir is

export ZOOCFGDIR=/home/hadoop/zookeeper

</span>

该配置文件里仅仅有HADOOP_COMMON_HOME的配置是必须的另外关于hbase和hive的配置假设用到须要配置不用的话就不用配置

5.3 加入须要的jar包到lib以下

这里的jar包指的是连接关系型数据库的jar 比方mysql oracle 这些jar包是须要自己加入到lib文件夹以下去的

<span style="font-size:18px;"> cp  ~/hive/lib/mysql-connector-java-5.1.30.jar   ~/sqoop/lib/</span>

5.4 加入环境变量

vi ~/.profile

加入例如以下内容

<span style="font-size:18px;">export SQOOP_HOME=/home/hadoop/sqoop

export SBT_HOME=/home/hadoop/sbt

export PATH=$PATH:$SBT_HOME/bin:$SQOOP_HOME/bin

export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib

</span>

source ~/.profile使配置文件生效

5.5 測试mysql数据库的连接使用

①连接mysql数据库，列出全部的数据库

<span style="font-size:18px;">hadoop@caozw:~/sqoop/conf$ sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:15:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

information_schema

XINGXUNTONG

XINGXUNTONG_HIVE

amon

hive

hmon

mahout

mysql

oozie

performance_schema

realworld

rman

scm

smon

</span>

-P表示输入password 能够直接使用--password来制定password

②mysql数据库的表导入到HDFS

hadoop@caozw:~/sqoop/conf$ sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test1

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:19:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:19:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/10/21 18:19:21 INFO tool.CodeGenTool: Beginning code generation

14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:19:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop

Note: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

14/10/21 18:19:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.jar

14/10/21 18:19:23 WARN manager.MySQLManager: It looks like you are importing from mysql.

14/10/21 18:19:23 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

14/10/21 18:19:23 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

14/10/21 18:19:23 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

14/10/21 18:19:23 INFO mapreduce.ImportJobBase: Beginning import of weblogs

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

14/10/21 18:19:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/10/21 18:19:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/10/21 18:19:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/10/21 18:19:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

14/10/21 18:19:40 INFO db.DBInputFormat: Using read commited transaction isolation

14/10/21 18:19:41 INFO mapreduce.JobSubmitter: number of splits:1

14/10/21 18:19:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0002

14/10/21 18:19:46 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0002

14/10/21 18:19:46 INFO mapreduce.Job: The url to track the job: N/A

14/10/21 18:19:46 INFO mapreduce.Job: Running job: job_1413879907572_0002

14/10/21 18:20:12 INFO mapreduce.Job: Job job_1413879907572_0002 running in uber mode : false

14/10/21 18:20:12 INFO mapreduce.Job:  map 0% reduce 0%

14/10/21 18:20:41 INFO mapreduce.Job:  map 100% reduce 0%

14/10/21 18:20:45 INFO mapreduce.Job: Job job_1413879907572_0002 completed successfully

14/10/21 18:20:46 INFO mapreduce.Job: Counters: 30

	File System Counters

		FILE: Number of bytes read=0

		FILE: Number of bytes written=107189

		FILE: Number of read operations=0

		FILE: Number of large read operations=0

		FILE: Number of write operations=0

		HDFS: Number of bytes read=87

		HDFS: Number of bytes written=251130

		HDFS: Number of read operations=4

		HDFS: Number of large read operations=0

		HDFS: Number of write operations=2

	Job Counters

		Launched map tasks=1

		Other local map tasks=1

		Total time spent by all maps in occupied slots (ms)=22668

		Total time spent by all reduces in occupied slots (ms)=0

		Total time spent by all map tasks (ms)=22668

		Total vcore-seconds taken by all map tasks=22668

		Total megabyte-seconds taken by all map tasks=23212032

	Map-Reduce Framework

		Map input records=3000

		Map output records=3000

		Input split bytes=87

		Spilled Records=0

		Failed Shuffles=0

		Merged Map outputs=0

		GC time elapsed (ms)=41

		CPU time spent (ms)=1540

		Physical memory (bytes) snapshot=133345280

		Virtual memory (bytes) snapshot=1201442816

		Total committed heap usage (bytes)=76021760

	File Input Format Counters

		Bytes Read=0

	File Output Format Counters

		Bytes Written=251130

14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Transferred 245.2441 KB in 80.7974 seconds (3.0353 KB/sec)

14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Retrieved 3000 records.

-m 表示启动几个map任务来读取数据假设数据库中的表没有主键这个參数是必须设置的并且仅仅能设定为1 否则会提示

14/10/21 18:18:27 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.

而这个參数设置为几会直接决定导入的文件在hdfs上面是分成几块的比方设置为1 则会产生一个数据文件

14/10/21 18:23:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:20 /user/sqoop/test1/_SUCCESS

-rw-r--r--   1 hadoop supergroup     251130 2014-10-21 18:20 /user/sqoop/test1/part-m-00000

这里加入主键：

mysql> desc weblogs;

+--------------+-------------+------+-----+---------+-------+

| Field        | Type        | Null | Key | Default | Extra |

+--------------+-------------+------+-----+---------+-------+

| md5          | varchar(32) | YES  |     | NULL    |       |

| url          | varchar(64) | YES  |     | NULL    |       |

| request_date | date        | YES  |     | NULL    |       |

| request_time | time        | YES  |     | NULL    |       |

| ip           | varchar(15) | YES  |     | NULL    |       |

+--------------+-------------+------+-----+---------+-------+

5 rows in set (0.00 sec)

mysql> alter table weblogs add primary key(md5,ip);

Query OK, 3000 rows affected (1.60 sec)

Records: 3000  Duplicates: 0  Warnings: 0

mysql> desc weblogs;

+--------------+-------------+------+-----+---------+-------+

| Field        | Type        | Null | Key | Default | Extra |

+--------------+-------------+------+-----+---------+-------+

| md5          | varchar(32) | NO   | PRI |         |       |

| url          | varchar(64) | YES  |     | NULL    |       |

| request_date | date        | YES  |     | NULL    |       |

| request_time | time        | YES  |     | NULL    |       |

| ip           | varchar(15) | NO   | PRI |         |       |

+--------------+-------------+------+-----+---------+-------+

5 rows in set (0.02 sec)

然后指定-m

hadoop@caozw:~/sqoop/conf$ sqoop import -m 2  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:22:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:24:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/10/21 18:24:04 INFO tool.CodeGenTool: Beginning code generation

14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:24:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop

Note: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

14/10/21 18:24:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.jar

14/10/21 18:24:07 WARN manager.MySQLManager: It looks like you are importing from mysql.

14/10/21 18:24:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

14/10/21 18:24:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

14/10/21 18:24:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

14/10/21 18:24:07 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.

hadoop@caozw:~/sqoop/conf$ sqoop import -m 2  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:30:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:30:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/10/21 18:30:07 INFO tool.CodeGenTool: Beginning code generation

14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:30:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop

Note: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

14/10/21 18:30:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.jar

14/10/21 18:30:09 WARN manager.MySQLManager: It looks like you are importing from mysql.

14/10/21 18:30:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

14/10/21 18:30:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

14/10/21 18:30:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.

14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.

14/10/21 18:30:09 INFO mapreduce.ImportJobBase: Beginning import of weblogs

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

14/10/21 18:30:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/10/21 18:30:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/10/21 18:30:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/10/21 18:30:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

14/10/21 18:30:17 INFO db.DBInputFormat: Using read commited transaction isolation

14/10/21 18:30:17 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`md5`), MAX(`md5`) FROM `weblogs`

14/10/21 18:30:17 WARN db.TextSplitter: Generating splits for a textual index column.

14/10/21 18:30:17 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

14/10/21 18:30:17 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

14/10/21 18:30:18 INFO mapreduce.JobSubmitter: number of splits:4

14/10/21 18:30:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0003

14/10/21 18:30:19 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0003

14/10/21 18:30:19 INFO mapreduce.Job: The url to track the job: N/A

14/10/21 18:30:19 INFO mapreduce.Job: Running job: job_1413879907572_0003

14/10/21 18:30:32 INFO mapreduce.Job: Job job_1413879907572_0003 running in uber mode : false

14/10/21 18:30:32 INFO mapreduce.Job:  map 0% reduce 0%

14/10/21 18:31:12 INFO mapreduce.Job:  map 50% reduce 0%

14/10/21 18:31:13 INFO mapreduce.Job:  map 75% reduce 0%

14/10/21 18:31:15 INFO mapreduce.Job:  map 100% reduce 0%

14/10/21 18:31:21 INFO mapreduce.Job: Job job_1413879907572_0003 completed successfully

14/10/21 18:31:22 INFO mapreduce.Job: Counters: 30

	File System Counters

		FILE: Number of bytes read=0

		FILE: Number of bytes written=429312

		FILE: Number of read operations=0

		FILE: Number of large read operations=0

		FILE: Number of write operations=0

		HDFS: Number of bytes read=532

		HDFS: Number of bytes written=251209

		HDFS: Number of read operations=16

		HDFS: Number of large read operations=0

		HDFS: Number of write operations=8

	Job Counters

		Launched map tasks=4

		Other local map tasks=4

		Total time spent by all maps in occupied slots (ms)=160326

		Total time spent by all reduces in occupied slots (ms)=0

		Total time spent by all map tasks (ms)=160326

		Total vcore-seconds taken by all map tasks=160326

		Total megabyte-seconds taken by all map tasks=164173824

	Map-Reduce Framework

		Map input records=3001

		Map output records=3001

		Input split bytes=532

		Spilled Records=0

		Failed Shuffles=0

		Merged Map outputs=0

		GC time elapsed (ms)=806

		CPU time spent (ms)=5450

		Physical memory (bytes) snapshot=494583808

		Virtual memory (bytes) snapshot=4805771264

		Total committed heap usage (bytes)=325058560

	File Input Format Counters

		Bytes Read=0

	File Output Format Counters

		Bytes Written=251209

14/10/21 18:31:22 INFO mapreduce.ImportJobBase: Transferred 245.3213 KB in 72.5455 seconds (3.3816 KB/sec)

这里产生的文件跟主键的字段个数以及-m的參数是相关的大致是-m的值乘以主键字段数，有待考证

hadoop@caozw:~/study/cdh5$ hadoop fs -ls /user/sqoop/test2/

14/10/21 18:32:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 5 items

-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/_SUCCESS

-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/part-m-00000

-rw-r--r--   1 hadoop supergroup     251130 2014-10-21 18:31 /user/sqoop/test2/part-m-00001

-rw-r--r--   1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/part-m-00002

-rw-r--r--   1 hadoop supergroup         79 2014-10-21 18:31 /user/sqoop/test2/part-m-00003

这里的主键设计的不合理导致数据分布不均匀～～有待改进

③数据导出Oracle和HBase

使用export可将hdfs中数据导入到远程数据库中

export --connect jdbc:oracle:thin:@192.168.**.**:**:**--username **--password=** -m1table VEHICLE--export-dir /user/root/VEHICLE

向Hbase导入数据

sqoop import --connect jdbc:oracle:thin:@192.168.**.**:**:**--username**--password=**--m 1 --table VEHICLE --hbase-create-table --hbase-table VEHICLE--hbase-row-key ID --column-family VEHICLEINFO --split-by ID

5.6 測试Mysql数据库的使用

前提：导入mysql jdbc的jar包

①測试数据库连接

sqoop list-databases –connect jdbc:mysql://192.168.10.63 –username root–password 123456

②Sqoop的使用

下面全部的命令每行之后都存在一个空格，不要忘记

（下面6中命令都没有进行过成功測试）

<1>mysql–>hdfs

sqoop export –connect

jdbc:mysql://192.168.10.63/ipj

–username root

–password 123456

–table ipj_flow_user

–export-dir hdfs://192.168.10.63:8020/user/flow/part-m-00000

前提：

(1)hdfs中文件夹/user/flow/part-m-00000必须存在

(2)假设集群设置了压缩方式lzo。那么本机必须得安装且配置成功lzo

(3)hadoop集群中每一个节点都要有对mysql的操作权限

<2>hdfs–>mysql

sqoop import –connect

jdbc:mysql://192.168.10.63/ipj

–table ipj_flow_user

<3>mysql–>hbase

sqoop import –connect

jdbc:mysql://192.168.10.63/ipj

–table ipj_flow_user

–hbase-table ipj_statics_test

–hbase-create-table

–hbase-row-key id

–column-family imei

<4>hbase–>mysql

关于将Hbase的数据导入到mysql里，Sqoop并非直接支持的，一般採用例如以下3种方法：

第一种：将Hbase数据扁平化成HDFS文件，然后再由Sqoop导入.

另外一种：将Hbase数据导入Hive表中，然后再导入mysql。

第三种：直接使用Hbase的Java API读取表数据。直接向mysql导入

不须要使用Sqoop。

<5>mysql–>hive

sqoop import –connect

jdbc:mysql://192.168.10.63/ipj

–table hive_table_test

–hive-import

–hive-table hive_test_table 或–create-hive-table hive_test_table

<6>hive–>mysql

sqoop export –connect

jdbc:mysql://192.168.10.63/ipj

–username hive

–password 123456

–table target_table

–export-dir /user/hive/warehouse/uv/dt=mytable

前提：mysql中表必须存在

③Sqoop其它操作

<1>列出mysql中的全部数据库

sqoop list-databases –connect jdbc:mysql://192.168.10.63:3306/ –usernameroot –password 123456

<2>列出mysql中某个库下全部表

sqoop list-tables –connect jdbc:mysql://192.168.10.63:3306/ipj –usernameroot –password 123456

6 Sqoop1的性能

測试数据：

表名：tb_keywords

行数：11628209

数据文件大小：1.4G

測试结果：

	HDFS--->DB	HDFS<---DB
Sqoop	428s	166s
HDFS<->FILE<->DB	209s	105s

从结果上来看，以FILE作为中转方式性能是要高于SQOOP的,原因例如以下：

本质上SQOOP使用的是JDBC，效率不会比MYSQL自带的导入\导出工具效率高以导入数据到DB为例。SQOOP的设计思想是分阶段提交，也就是说如果一个表有1K行。那么它会先读出100行（默认值），然后插入，提交。再读取100行……如此往复

即便如此。SQOOP也是有优势的。比方说使用的便利性，任务运行的容错性等。在一些測试环境中假设须要的话能够考虑把它拿来作为一个工具使用。

sqoop的安装与使用的更多相关文章

Hadoop 2.6.0-cdh5.4.0集群环境搭建和Apache-Hive、Sqoop的安装
搭建此环境主要用来hadoop的学习,因此我们的操作直接在root用户下,不涉及HA. Software: Hadoop 2.6.0-cdh5.4.0 Apache-hive-2.1.0-bin Sq ...
sqoop的安装
Sqoop是一个用来完成Hadoop和关系型数据库中的数据相互转移的工具, 他可以将关系型数据库(MySql,Oracle,Postgres等)中的数据导入Hadoop的HDFS中, 也可以将HDFS ...
Hive/Hbase/Sqoop的安装教程
Hive/Hbase/Sqoop的安装教程 HIVE INSTALL 1.下载安装包:https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3 ...
Sqoop的安装及简单使用
SQOOP是用于对数据进行导入导出的. (1)把MySQL.Oracle等数据库中的数据导入到HDFS.Hive.HBase中 (2)把HDFS.Hive.HBase中的数据导出到MySQL.Or ...
Sqoop的安装配置及使用
一.Sqoop基础:连接关系型数据库与Hadoop的桥梁 1.1 Sqoop的基本概念 Hadoop正成为企业用于大数据分析的最热门选择,但想将你的数据移植过去并不容易.Apache Sqoop正在加 ...
Sqoop环境安装
环境下载首先将下载的 sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz放到 /usr/hadoop/目录下(该目录可以自定义,一般为Hadoop集群安装目录),然 ...
Sqoop的安装和验证
Sqoop是一个用来完成Hadoop和关系型数据库中的数据相互转移的工具,它可以将关系型数据库中的数据导入到Hadoop的HDFS中,也可以将HDFS的数据导入到关系型数据库中. Kafka是一个开源 ...
【sqoop】安装配置测试sqoop1
3.1.1 下载sqoop1:sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 3.1.2 解压并查看目录: [hadoop@hadoop01 ~]$ tar -zxvf sq ...
大数据之路week07--day06 （Sqoop 的安装及配置）
Sqoop 的安装配置比较简单. 提供安装需要的安装包和连接mysql的驱动的百度云链接: 链接:https://pan.baidu.com/s/1pdFj0u2lZVFasgoSyhz-yQ 提取码 ...
Sqoop 之安装
Sqoop 之安装前言安装 Sqoop 的前提是已经具备 Java 和 Hadoop 的环境. 一.下载并解压 1) 下载地址:http://mirrors.hust.edu.cn/apache ...

随机推荐

《C++ primer》--第7章
删除指针后,该指针就变成了悬垂指针.悬垂指针指向曾经存放对象的内存,但该对象已经不再存在了. 习题7.8 举一个例子说明什么时候应该将形参定义为引用类型.再举一个例子说明什么时候不应该将形参定义为引用 ...
细雨学习笔记：Jmeter之post processors（后置处理器）
后置处理器
我的Myeclipse黑色主题
Cloudera的安装
To enable these parts of the tutorial, choose one of the following options: To use Cloudera Express ...
【原】Storm 入门教程目录
Storm入门教程 1. Storm基础 Storm Storm主要特点 Storm基本概念 Storm调度器 Storm配置 Guaranteeing Message Processing(消息处理 ...
login：用户登陆的意思
login:用户登陆的意思在思科的设备上有两种登录方式: 一种是本地方式,使用console口: 一种是远程方式(或者叫做网络方式):使用的是telnet等 1.默认情况下,思科的远程访问是禁止的. ...
LeetCode题解——4Sum
题目: 给定一个数组,找出其中和为0的所有4个数组合,每个组合内的4个数非递降. 解法: ①先排序,然后利用4个指针,前两个遍历,后两个在第二个指针之后的部分里夹逼,时间O(N3). ②或者利用一个哈 ...
CSS快速制作图片轮播的焦点
来源:http://www.ido321.com/858.html 效果图: 演示地址:http://jsfiddle.net/Web_Code/q5qfd8aL/embedded/result/ 代 ...
无序数组a，求a[i]-a[j]的最大值，且i<j
一道面试题:对于无序数组a,求a[i]-a[j]的最大值,其中i<j package test; import java.util.Arrays; public class FindMax { ...
叉积判断 POJ1696
// 叉积判断 POJ1696 #include <iostream> #include <algorithm> #include <cstring> #inclu ...

sqoop的安装与使用

sqoop的安装与使用的更多相关文章

随机推荐

热门专题