Hive/HBase/Sqoop Installation Guide
HIVE INSTALL
1.Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
2.Upload it to the target directory on Linux and extract it:
- mkdir hive
- mv apache-hive-2.3.3-bin.tar.gz hive
- cd hive
- tar -zxvf apache-hive-2.3.3-bin.tar.gz
- mv apache-hive-2.3.3-bin apache-hive-2.3.3
### The installation directory is /app/hive/apache-hive-2.3.3
3.Configure the environment variables:
sudo vi /etc/profile
Add the following:
- export HIVE_HOME=/app/hive/apache-hive-2.3.3
- export PATH=$PATH:$HIVE_HOME/bin
:wq #save and exit
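After editing /etc/profile, reload it and confirm the variables took effect. A minimal check, assuming Hadoop is already installed and on the PATH (see the Hadoop section later in this post):
```bash
source /etc/profile      # reload the profile in the current shell
echo $HIVE_HOME          # should print /app/hive/apache-hive-2.3.3
hive --version           # prints the Hive version once Hadoop is reachable
```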
4.Edit the Hive configuration files:
hive-env.sh (modify the existing entries; add any that are missing):
- cd /app/hive/apache-hive-2.3.3/conf
- cp hive-env.sh.template hive-env.sh
- ### Add the following to the file -- uncomment the lines and change the paths to your own directories
- export HADOOP_HEAPSIZE=1024
- export HADOOP_HOME=/app/hadoop/hadoop-2.7.7 #Hadoop installation directory
- export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
- export HIVE_HOME=/app/hive/apache-hive-2.3.3
- export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
- export JAVA_HOME=/app/lib/jdk
Create the HDFS directories:
- cd /app/hive/apache-hive-2.3.3
- mkdir hive_site_dir
- cd hive_site_dir
- hdfs dfs -mkdir -p warehouse #this assumes Hadoop is already installed and HDFS is running
- hdfs dfs -mkdir -p tmp
- hdfs dfs -mkdir -p log
- hdfs dfs -chmod -R 777 warehouse
- hdfs dfs -chmod -R 777 tmp
- hdfs dfs -chmod -R 777 log
- Create a local temp directory:
- cd /app/hive/apache-hive-2.3.3
- mkdir tmp
hive-site.xml (modify the existing entries):
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
>>Configure the metastore database settings: ConnectionURL/ConnectionUserName/ConnectionPassword/ConnectionDriverName
- <!--mysql database connection setting -->
- <property>
- <name>javax.jdo.option.ConnectionDriverName</name>
- <value>com.mysql.jdbc.Driver</value>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionURL</name>
- <value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionUserName</name>
- <value>szprd</value>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionPassword</name>
- <value>szprd</value>
- </property>
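The metastore database user referenced above must exist on the MySQL server before initialization. A hedged sketch of preparing it (the host 10.28.85.149 and the szprd user/password come from the settings above; createDatabaseIfNotExist=true creates the hive database itself):
```bash
# Run as a MySQL administrative user
mysql -u root -p -e "CREATE USER 'szprd'@'%' IDENTIFIED BY 'szprd';"
mysql -u root -p -e "GRANT ALL PRIVILEGES ON hive.* TO 'szprd'@'%'; FLUSH PRIVILEGES;"
```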
>>Configure the HDFS and scratch directories
- <property>
- <name>hive.exec.scratchdir</name>
- <!--<value>/tmp/hive</value>-->
- <value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
- <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
- </property>
- <property>
- <name>hive.metastore.warehouse.dir</name>
- <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
- </property>
- <property>
- <name>hive.exec.local.scratchdir</name>
- <!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
- <value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
- <description>Local scratch space for Hive jobs</description>
- </property>
- <property>
- <name>hive.downloaded.resources.dir</name>
- <!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
- <value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
- <description>Temporary local directory for added resources in the remote file system.</description>
- </property>
- <property>
- <name>hive.querylog.location</name>
- <!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
- <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
- <description>Location of Hive run time structured log file</description>
- </property>
- <property>
- <name>hive.metastore.schema.verification</name>
- <value>false</value>
- <description>
- Enforce metastore schema version consistency.
- True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
- schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
- proper metastore schema migration. (Default)
- False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
- </description>
- </property>
After editing the configuration file, save and exit with :wq
5.Download a matching version of the MySQL JDBC driver and copy it into the lib directory of the Hive installation:
https://dev.mysql.com/downloads/connector/j/
6.Initialize the metastore database (run this before starting Hive for the first time; if it fails, double-check the database settings above):
- cd /app/hive/apache-hive-2.3.3/bin
- ./schematool -initSchema -dbType mysql
7.Start Hive:
hive #with the environment variables set in /etc/profile, this can be run from any directory
8.To start Hive with its logs streamed to the console (run from the bin directory of the Hive installation):
./hive -hiveconf hive.root.logger=DEBUG,console
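A quick smoke test once the CLI is up; a minimal sketch in which the table name t_smoke is only an example:
```bash
hive -e "CREATE TABLE IF NOT EXISTS t_smoke (id INT, name STRING); SHOW TABLES; DROP TABLE t_smoke;"
```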
HBASE INSTALL
1.Download the HBase package: http://hbase.apache.org/downloads.html
2.Extract it: tar -zxvf hbase-1.2.6.1-bin.tar.gz
3.Configure the environment variables (append at the end):
vi /etc/profile
- #HBase Setting
- export HBASE_HOME=/app/hbase/hbase-1.2.6.1
- export PATH=$PATH:$HBASE_HOME/bin
4.Edit the configuration file hbase-env.sh:
- export HBASE_MANAGES_ZK=false #false means HBase uses an external ZooKeeper instead of its built-in one
- export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids #create this directory first if it does not exist
- export JAVA_HOME=/app/lib/jdk #JDK installation directory
Edit the configuration file hbase-site.xml:
Add the following inside the configuration node:
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://192.168.1.202:9000/hbase</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.dataDir</name>
- <value>/home/vc/dev/MQ/ZK/zookeeper-3.4.12</value>
- </property>
- <property>
- <name>zookeeper.znode.parent</name>
- <value>/hbase</value>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
- <property>
- <name>hbase.unsafe.stream.capability.enforce</name>
- <value>false</value>
- <description>
- Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
- WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
- </description>
- </property>
5.Start ZooKeeper:
Go to the bin directory of the ZooKeeper installation and run: ./zkServer.sh start
Then start the client: ./zkCli.sh
Once connected, run: create /hbase hbase
6.Start HBase:
Go to the bin directory of the HBase installation: ./start-hbase.sh
./hbase shell #once the shell starts you can run HBase commands
list #if this returns without errors, the installation works
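A short end-to-end check in the HBase shell; a sketch with a made-up table name:
```
create 't_demo', 'cf'                     # table with one column family
put 't_demo', 'row1', 'cf:msg', 'hello'
scan 't_demo'                             # should show the row just written
disable 't_demo'
drop 't_demo'
```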
7.Access the HBase web UI: http://10.28.85.149:16010/master-status #use the IP of the current server; the port is 16010
SQOOP INSTALL
1.Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
2.Extract it: tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Rename the directory: mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0
3.Configure the environment variables in /etc/profile:
- #Sqoop Setting
- export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
- export PATH=$PATH:$SQOOP_HOME/bin
4.Copy the MySQL JDBC driver into the lib directory of the Sqoop installation:
https://dev.mysql.com/downloads/connector/j/
5.Edit the configuration file in the conf directory of the Sqoop installation:
vi sqoop-env.sh
- #Set path to where bin/hadoop is available
- export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7
- #Set path to where hadoop-*-core.jar is available
- export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7
- #set the path to where bin/hbase is available
- export HBASE_HOME=/app/hbase/hbase-1.2.6.1
- #Set the path to where bin/hive is available
- export HIVE_HOME=/app/hive/apache-hive-2.3.3
- #Set the path for where zookeper config dir is
- export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
6.Run the following commands:
sqoop help #list the available sqoop commands
sqoop version #show the sqoop version
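To verify MySQL connectivity end to end, a hedged example; the host, user and password below are placeholders for your own MySQL connection details:
```bash
sqoop list-databases \
  --connect jdbc:mysql://10.28.85.149:3306/ \
  --username szprd \
  --password szprd
```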
P.S.:
To stop HBase use stop-hbase.sh; if you hit a PID-related error, see this post: https://blog.csdn.net/xiao_jun_0820/article/details/35222699
Hadoop installation guide: http://note.youdao.com/noteshare?id=0cae2da671de0f7175376abb8e705406
ZooKeeper installation guide: http://note.youdao.com/noteshare?id=33e37b0967da40660920f755ba2c03f0
- # Hadoop pseudo-distributed installation
- # Prerequisite: the JDK is installed and working
- # Download hadoop 2.7.7
- ```
- cd /home/vc/dev/hadoop
- wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
- ```
- # Extract the archive
- ```
- tar -zxvf hadoop-2.7.7.tar.gz
- ```
- ## Configure the Hadoop environment variables by appending the following to /etc/profile
- ```
- # hadoop home setting
- export HADOOP_HOME=/app/hadoop/hadoop-2.7.7
- export HADOOP_INSTALL=${HADOOP_HOME}
- export PATH=$PATH:$HADOOP_HOME/bin
- export PATH=$PATH:$HADOOP_HOME/sbin
- export HADOOP_MAPRED_HOME=${HADOOP_HOME}
- export HADOOP_COMMON_HOME=${HADOOP_HOME}
- export HADOOP_HDFS_HOME=${HADOOP_HOME}
- export YARN_HOME=${HADOOP_HOME}
- export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
- export HADOOP_INSTALL=$HADOOP_HOME
- export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
- ```
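- After appending these variables, reload the profile and run a quick sanity check (a minimal sketch):
```bash
source /etc/profile
hadoop version     # should report Hadoop 2.7.7
echo $HADOOP_HOME
```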
- ## Edit etc/hadoop/hadoop-env.sh under the Hadoop installation directory
- ```
- # The java implementation to use.
- export JAVA_HOME=/home/vc/dev/jdk/jdk1.8.0_161
- ```
- ### etc/hadoop/core-site.xml under the Hadoop installation directory
- ```
- <configuration>
- <!-- Directory where Hadoop stores the files it produces at runtime, i.e. the base for its data files. -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/vc/dev/hadoop/hadoop-2.7.7/tmp</value>
- <description>A base for other temporary directories.</description>
- </property>
- <!-- Address of the NameNode; this sets the default file system. -->
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://192.168.1.202:9000</value>
- </property>
- </configuration>
- ```
- ### Configure HDFS: etc/hadoop/hdfs-site.xml
- ```
- <configuration>
- <!-- NameNode storage path -->
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/name</value>
- </property>
- <!-- HDFS replication factor -->
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <!-- DataNode storage path -->
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/data</value>
- </property>
- </configuration>
- ```
- ### Set up passwordless SSH for pseudo-distributed mode. Passwordless SSH between Hadoop cluster nodes must work, otherwise all kinds of problems follow.
- On a single node, test passwordless login with `ssh localhost`; if it does not succeed, run the following commands:
- ```
- ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- chmod 0600 ~/.ssh/authorized_keys
- ```
- ### In pseudo-distributed mode there is no need to edit /etc/hosts; in a real cluster each hostname must be mapped to its IP.
- # Starting Hadoop in pseudo-distributed mode
- ## Initialize and start HDFS
- ```
- # The first start requires formatting HDFS; answer Y to any Y/N prompts
- bin/hdfs namenode -format
- # Start the NameNode and DataNode daemons; this command brings up the single-node HDFS cluster
- sbin/start-dfs.sh
- ```
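- A quick way to confirm that the daemons are actually up (a minimal check):
```bash
jps   # expect NameNode, DataNode and SecondaryNameNode in the output
```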
- Once HDFS is started you can browse the NameNode status in its web UI (by default at http://<namenode-host>:50070).
- ```
- # Create a directory on HDFS with the hadoop command
- hadoop fs -mkdir /test
- # or equivalently with the hdfs command
- hdfs dfs -mkdir /user
- # Upload a file, e.g. hadoop fs -put <local-file> /test
- ```
- 
- ## Stop HDFS
- ```
- ./sbin/stop-dfs.sh
- ```
- ## Configure YARN
- ### etc/hadoop/mapred-site.xml
- ```
- <configuration>
- <!-- Tell the MapReduce framework to run on YARN -->
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- </configuration>
- ```
- ### etc/hadoop/yarn-site.xml
- ```
- <configuration>
- <!-- Site specific YARN configuration properties -->
- <!-- Reducers fetch map output via mapreduce_shuffle -->
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
- </property>
- </configuration>
- ```
- 
- 
- ## Start and stop YARN
- ```
- ./sbin/start-yarn.sh
- ./sbin/stop-yarn.sh
- ```
- ## Check the cluster status
- ```
- ./bin/hadoop dfsadmin -report
- ```
- # Testing the pseudo-distributed setup
- ```
- # Create a directory on the server
- mkdir ~/input
- # Enter it and copy the hadoop configuration files there as sample input data
- cd ~/input
- cp /app/hadoop/hadoop-2.7.7/etc/hadoop/*.xml ./
- # Create the target directory and upload the files under input to /one on HDFS
- hdfs dfs -mkdir -p /one
- hdfs dfs -put ./* /one
- # Check the uploaded files
- hdfs dfs -ls /one
- # Run the example jar; the output directory /output must not already exist on HDFS, otherwise the job fails
- hadoop jar /app/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep /one /output 'dfs[a-z.]+'
- # Fetch the result directory from HDFS to the local machine
- hdfs dfs -get /output
- # View the results
- cat output/*
- ```
- ---
- # ZooKeeper installation
- # Download and extract ZooKeeper (zookeeper-3.4.9.tar.gz)
- # Set the environment variables
- # Edit the configuration file (it lives under $ZOOKEEPER_HOME/conf/; rename zoo_sample.cfg to zoo.cfg)
- Configuration notes:
- - tickTime: the basic heartbeat interval between ZooKeeper servers, and between clients and servers; one heartbeat is sent every tickTime.
- - dataDir: the directory where ZooKeeper stores its data; by default the transaction log is also written here.
- - clientPort: the port ZooKeeper listens on for client connections.
- 4.1 Standalone mode
- - After downloading the ZooKeeper package, extract it to a suitable directory, go into its conf subdirectory, create the configuration file from the template with `cp zoo_sample.cfg zoo.cfg`, and set the following parameters:
- - tickTime=2000
- - dataDir=/home/vc/dev/MQ/ZK/data
- - dataLogDir=/home/vc/dev/MQ/ZK/log
- - clientPort=2181
- ## What each parameter means
- - tickTime: the basic tick time unit used by ZooKeeper, in milliseconds.
- - dataDir: the data directory; can be any directory.
- - dataLogDir: the log directory; can also be any directory. If not set, it defaults to the same value as dataDir.
- - clientPort: the port to listen on for client connections.
- # Start ZooKeeper
- `/dev/Zk/zookeeper-3.4.9/bin$ ./zkServer.sh start`
- 
- # 查看是否起来
- 使用命令:`netstat -antp | grep 2181`
- 
- # 通过zCl.sh链接到zk服务
- ```
- ./zkCli.sh -server localhost:2181   # connect to the local ZooKeeper service
- history                             # list the commands executed in this session
- quit                                # disconnect the client from the server
- ```
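- A few basic zkCli.sh commands for a quick check once connected (a minimal sketch; /demo is a throwaway test znode):
```
ls /
create /demo "hi"
get /demo
delete /demo
```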
- 
- # 关闭Zk服务
- `./zkServer.sh stop`
- ---
- # [Blog post for the HIVE/SQOOP/HBASE installation:](https://www.cnblogs.com/DFX339/p/9550213.html)
- # HIVE-INSTALL
- - Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
- - Upload it to the target directory on Linux and extract it:
- ```
- mkdir hive
- mv apache-hive-2.3.3-bin.tar.gz hive
- cd hive
- tar -zxvf apache-hive-2.3.3-bin.tar.gz
- mv apache-hive-2.3.3-bin apache-hive-2.3.3
- ### The installation directory is /app/hive/apache-hive-2.3.3
- ```
- - Configure the environment variables:
- ```
- sudo vi /etc/profile
- Add: export HIVE_HOME=/app/hive/apache-hive-2.3.3
- export PATH=$PATH:$HIVE_HOME/bin
- :wq #save and exit
- ```
- - Edit the Hive configuration files:
- - hive-env.sh (modify the existing entries; add any that are missing):
- ```
- cd /app/hive/apache-hive-2.3.3/conf
- cp hive-env.sh.template hive-env.sh
- Add the following to the file (uncomment the lines and change the paths to your own directories)
- export HADOOP_HEAPSIZE=1024
- export HADOOP_HOME=/app/hadoop/hadoop-2.7.7 #Hadoop installation directory
- export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
- export HIVE_HOME=/app/hive/apache-hive-2.3.3
- export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
- export JAVA_HOME=/app/lib/jdk
- ```
- - Create the HDFS directories:
- ```
- cd /app/hive/apache-hive-2.3.3
- mkdir hive_site_dir
- cd hive_site_dir
- hdfs dfs -mkdir -p warehouse #this assumes Hadoop is already installed and HDFS is running
- hdfs dfs -mkdir -p tmp
- hdfs dfs -mkdir -p log
- hdfs dfs -chmod -R 777 warehouse
- hdfs dfs -chmod -R 777 tmp
- hdfs dfs -chmod -R 777 log
- Create a local temp directory:
- cd /app/hive/apache-hive-2.3.3
- mkdir tmp
- ```
- - hive-site.xml (modify the existing entries):
- ```
- cp hive-default.xml.template hive-site.xml
- vi hive-site.xml
- ```
- - Configure the metastore database settings: ConnectionURL/ConnectionUserName/ConnectionPassword/ConnectionDriverName
- ```
- <!--mysql database connection setting -->
- <property>
- <name>javax.jdo.option.ConnectionDriverName</name>
- <value>com.mysql.jdbc.Driver</value>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionURL</name>
- <value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionUserName</name>
- <value>szprd</value>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionPassword</name>
- <value>szprd</value>
- </property>
- ```
- - Configure the HDFS and scratch directories
- ```
- <property>
- <name>hive.exec.scratchdir</name>
- <!--<value>/tmp/hive</value>-->
- <value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
- <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
- </property>
- <property>
- <name>hive.metastore.warehouse.dir</name>
- <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
- </property>
- <property>
- <name>hive.exec.local.scratchdir</name>
- <!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
- <value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
- <description>Local scratch space for Hive jobs</description>
- </property>
- <property>
- <name>hive.downloaded.resources.dir</name>
- <!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
- <value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
- <description>Temporary local directory for added resources in the remote file system.</description>
- </property>
- <property>
- <name>hive.querylog.location</name>
- <!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
- <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
- <description>Location of Hive run time structured log file</description>
- </property>
- <property>
- <name>hive.metastore.schema.verification</name>
- <value>false</value>
- <description>
- Enforce metastore schema version consistency.
- True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
- schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
- proper metastore schema migration. (Default)
- False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
- </description>
- </property>
- ```
- **After editing hive-site.xml, save and exit with :wq**
- - Download a matching version of the MySQL JDBC driver and put it into the lib directory of the Hive installation:
- https://dev.mysql.com/downloads/connector/j/
- - Initialize the metastore database (run this before starting Hive for the first time; if it fails, double-check the database settings above):
- ```
- cd /app/hive/apache-hive-2.3.3/bin
- ./schematool -initSchema -dbType mysql
- ```
- - Start Hive
- `hive #with the environment variables set in /etc/profile, this can be run from any directory`
- - To start Hive with its logs streamed to the console (run from the bin directory of the Hive installation): `./hive -hiveconf hive.root.logger=DEBUG,console`
- ---
- # HBASE INSTALL
- - [Download the HBase package:](http://hbase.apache.org/downloads.html)
- - Extract it: `tar -zxvf hbase-1.2.6.1-bin.tar.gz`
- - Configure the environment variables (append at the end):
- ```
- vi /etc/profile
- #HBase Setting
- export HBASE_HOME=/app/hbase/hbase-1.2.6.1
- export PATH=$PATH:$HBASE_HOME/bin
- ```
- - Edit the configuration file `hbase-env.sh`:
- ```
- # Defaults to true (HBase manages its built-in ZooKeeper); false means an external ZooKeeper is used
- export HBASE_MANAGES_ZK=false
- export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids #create this directory first if it does not exist
- export JAVA_HOME=/app/lib/jdk #JDK installation directory
- ```
- - Edit the configuration file `hbase-site.xml`
- Add the following inside the configuration node:
- ```
- <configuration>
- <!-- Number of data replicas -->
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <!-- Root directory for HBase on HDFS -->
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://10.28.85.149:9000/hbase</value>
- </property>
- <!-- ZooKeeper client port; must match the port the ZooKeeper service listens on -->
- <property>
- <name>hbase.zookeeper.property.clientPort</name>
- <value>2181</value>
- </property>
- <!-- Must match the dataDir setting in the ZooKeeper configuration file -->
- <property>
- <name>hbase.zookeeper.property.dataDir</name>
- <value>/app/zookeeper/data</value>
- </property>
- <!-- Root znode for HBase in ZooKeeper -->
- <property>
- <name>zookeeper.znode.parent</name>
- <value>/hbase</value>
- </property>
- <!-- Whether HBase runs as a distributed cluster -->
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
- <!-- Set this property to false only if you run on the local file system (LocalFileSystem) -->
- <property>
- <name>hbase.unsafe.stream.capability.enforce</name>
- <value>true</value>
- <description>
- Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
- WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
- </description>
- </property>
- </configuration>
- ```
- - Start ZooKeeper
- Go to the bin directory of the ZooKeeper installation and run `./zkServer.sh start`
- Then start the client: `./zkCli.sh`
- Once connected, run: `create /hbase hbase`
- - Start HBase
- Go to the bin directory of the HBase installation and run `./start-hbase.sh`
- ```
- ./hbase shell #once the shell starts you can run HBase commands
- list #list all tables in HBase; if this returns without errors, the installation works
- ```
- - Access the HBase web UI: http://10.28.85.149:16010/master-status #use the IP of the current server; the port is 16010
- ---
- # SQOOP INSTALL
- - [Download the package](https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/)
- - Extract it: `tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz`
- Rename the directory: `mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0`
- - Configure the environment variables:
- ```
- #Sqoop Setting
- export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
- export PATH=$PATH:$SQOOP_HOME/bin
- ```
- - Copy the MySQL JDBC driver into the lib directory of the Sqoop installation
- Download: https://dev.mysql.com/downloads/connector/j/
- - Edit the configuration file in the conf directory of the Sqoop installation:
- ```
- vi sqoop-env.sh
- #Set path to where bin/hadoop is available
- export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7
- #Set path to where hadoop-*-core.jar is available
- export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7
- #set the path to where bin/hbase is available
- export HBASE_HOME=/app/hbase/hbase-1.2.6.1
- #Set the path to where bin/hive is available
- export HIVE_HOME=/app/hive/apache-hive-2.3.3
- #Set the path for where zookeper config dir is
- export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
- ```
- - Test the Sqoop installation
- - sqoop help #lists the available sqoop commands
- - Test the Sqoop connection: list all databases visible through this connection
- ```
- sqoop list-databases \
- --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
- --username root \
- --password Abcd1234
- ```
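- As a follow-up, a hedged sketch of importing a MySQL table into Hive with Sqoop; the table name t_demo is a placeholder, and the connection details mirror the example above:
```bash
sqoop import \
  --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
  --username root \
  --password Abcd1234 \
  --table t_demo \
  --hive-import \
  --hive-table t_demo \
  -m 1
```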
- ---
- # Oozie installation
- # Based on the oozie-4.0.0-cdh5.3.6.tar.gz release
- Prerequisites:
- - a working MySQL database
- - a Hadoop cluster that is already installed
- - the pre-built Oozie package ships an `oozie-server` directory that is itself a Tomcat environment, so no separate Tomcat installation is needed.
- ## Installation
- - Download the pre-built tarball: `wget http://archive.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.6.tar.gz`
- - Extract it into the target directory: `tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz`; the directory used here is `/app/oozie`
- - Set the global environment variables: `sudo vim /etc/profile`
- ```
- #oozie setting
- export OOZIE_HOME=/app/oozie/oozie-4.0.0-cdh5.3.6
- export PATH=$PATH:$OOZIE_HOME/bin
- ```
- - Set the environment variables in `conf/oozie-env.sh` under the Oozie installation directory.
- The port of the Oozie web console is also configured here:
- `OOZIE_HTTP_PORT` sets the listening port of the Oozie web service; the default is 11000
- ```
- export OOZIE_CONF=${OOZIE_HOME}/conf
- export OOZIE_DATA=${OOZIE_HOME}/data
- export OOZIE_LOG=${OOZIE_HOME}/logs
- export CATALINA_BASE=${OOZIE_HOME}/oozie-server
- export CATALINA_HOME=${OOZIE_HOME}/oozie-server
- ```
- - Create a libext folder under the Oozie root directory and move the third-party jars Oozie depends on into it: `mkdir libext`
- - Copy the downloaded ExtJS 2.2 archive into the libext directory: `cp ext-2.2.zip /app/oozie/oozie-4.0.0-cdh5.3.6/libext/`
- - Add the Hadoop jars to the libext directory; from inside libext run `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/*.jar ./` and `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/lib/*.jar ./`
- - Add the JDBC driver for the MySQL database that stores the metadata (`mysql-connector-java-5.1.41.jar`)
- - Configure the Oozie proxy user in Hadoop, replacing xxx with the user name that submits Oozie jobs:
- - hadoop.proxyuser.**xxx**.hosts
- - hadoop.proxyuser.**xxx**.groups
- ```
- <!-- oozie -->
- <property>
- <name>hadoop.proxyuser.imodule.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.imodule.groups</name>
- <value>*</value>
- </property>
- ```
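- After changing the proxy-user settings they have to be reloaded. One option, assuming the cluster is already running, is to refresh them without a full restart (otherwise restart HDFS/YARN):
```bash
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
```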
- - Install the Oozie shared jar folder (sharelib) on HDFS.
- Hadoop's default NameNode port is 8020; I changed it to 9000, so mind the port here.
- One problem I hit: the NameNode was in safe mode, which has to be turned off first: `hdfs dfsadmin -safemode leave`
- ```
- oozie-setup.sh sharelib create -fs hdfs://10.28.85.149:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
- ```
- - Build the Oozie war file
- With the Hadoop jars, the MySQL driver and the ExtJS archive in the libext folder, run `oozie-setup.sh prepare-war` to build the war package.
- - Edit conf/oozie-site.xml under the Oozie installation directory
- Set the oozie.service.HadoopAccessorService.hadoop.configurations property to the path of the local Hadoop configuration directory:
- ```
- <configuration>
- <property>
- <name>oozie.services</name>
- <value>
- org.apache.oozie.service.JobsConcurrencyService,
- org.apache.oozie.service.SchedulerService,
- org.apache.oozie.service.InstrumentationService,
- org.apache.oozie.service.MemoryLocksService,
- org.apache.oozie.service.CallableQueueService,
- org.apache.oozie.service.UUIDService,
- org.apache.oozie.service.ELService,
- org.apache.oozie.service.AuthorizationService,
- org.apache.oozie.service.UserGroupInformationService,
- org.apache.oozie.service.HadoopAccessorService,
- org.apache.oozie.service.URIHandlerService,
- org.apache.oozie.service.DagXLogInfoService,
- org.apache.oozie.service.SchemaService,
- org.apache.oozie.service.LiteWorkflowAppService,
- org.apache.oozie.service.JPAService,
- org.apache.oozie.service.StoreService,
- org.apache.oozie.service.CoordinatorStoreService,
- org.apache.oozie.service.SLAStoreService,
- org.apache.oozie.service.DBLiteWorkflowStoreService,
- org.apache.oozie.service.CallbackService,
- org.apache.oozie.service.ActionService,
- org.apache.oozie.service.ShareLibService,
- org.apache.oozie.service.ActionCheckerService,
- org.apache.oozie.service.RecoveryService,
- org.apache.oozie.service.PurgeService,
- org.apache.oozie.service.CoordinatorEngineService,
- org.apache.oozie.service.BundleEngineService,
- org.apache.oozie.service.DagEngineService,
- org.apache.oozie.service.CoordMaterializeTriggerService,
- org.apache.oozie.service.StatusTransitService,
- org.apache.oozie.service.PauseTransitService,
- org.apache.oozie.service.GroupsService,
- org.apache.oozie.service.ProxyUserService,
- org.apache.oozie.service.XLogStreamingService,
- org.apache.oozie.service.JvmPauseMonitorService
- </value>
- </property>
- <!-- Path of the local Hadoop etc/hadoop configuration directory -->
- <property>
- <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
- <value>*=/app/hadoop/hadoop-2.7.7/etc/hadoop</value>
- </property>
- <property>
- <name>oozie.service.JPAService.create.db.schema</name>
- <value>true</value>
- </property>
- <property>
- <name>oozie.service.JPAService.jdbc.driver</name>
- <value>com.mysql.jdbc.Driver</value>
- </property>
- <property>
- <name>oozie.service.JPAService.jdbc.url</name>
- <value>jdbc:mysql://10.28.85.148:3306/ooize?createDatabaseIfNotExist=true</value>
- </property>
- <property>
- <name>oozie.service.JPAService.jdbc.username</name>
- <value>root</value>
- </property>
- <property>
- <name>oozie.service.JPAService.jdbc.password</name>
- <value>Abcd1234</value>
- </property>
- </configuration>
- ```
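- If the metastore schema is not created automatically, it can be initialized explicitly with the ooziedb.sh tool shipped with Oozie (a hedged sketch; the database in the JDBC URL above must be reachable):
```bash
ooziedb.sh create -sqlfile oozie.sql -run
```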
- - Run the Oozie service and check that the installation is complete
- `oozied.sh run` or `oozied.sh start` (the former runs in the foreground, the latter in the background)
- - Stop the Oozie service: `oozied.sh stop`
- - Check the Oozie web status from the command line (`oozie admin -oozie http://10.28.85.149:11000/oozie -status`); it should return `System mode: NORMAL`
- - Then inspect the shared library with the shareliblist command: `oozie admin -shareliblist -oozie http://localhost:11000/oozie`
- - Web access: `http://10.28.85.149:11000/oozie/`
- **A problem I ran into**
- ```
- Sep 03, 2018 4:36:47 PM org.apache.catalina.core.StandardWrapperValve invoke
- SEVERE: Servlet.service() for servlet jsp threw exception
- java.lang.NullPointerException
- at org.apache.jsp.index_jsp._jspInit(index_jsp.java:25)
- at org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)
- at org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)
- at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
- at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
- at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
- at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
- at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
- at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:154)
- at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:594)
- at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:553)
- at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:159)
- at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
- at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
- at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
- at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
- at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
- at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
- at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
- at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
- at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
- at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
- at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
- at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
- at java.lang.Thread.run(Thread.java:745)
- ```
- The cause is that servlet-api.jar and jsp-api.jar exist both in the webapp's `WEB-INF/lib` directory and in Tomcat's own lib directory.
- The same jars under `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib` and `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/lib` conflict with each other. The `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server` directory is the Tomcat environment of oozie-server, and its lib directory holds the Tomcat runtime jars.
- Fix: delete the three files servlet-api-2.5-6.1.14.jar, servlet-api-2.5.jar and jsp-api-2.1.jar from `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib`.
- After that, Oozie starts up without problems.
- 
- ---
- Pig installation
- # Prerequisites
- ### hadoop 2.7.7 is installed
- ### JDK 1.7+
- # Installation
- ```
- tar -xzvf pig-0.17.0.tar.gz
- # Pig setting
- export PIG_HOME=/app/pig/pig-0.17.0
- export PATH=$PATH:$PIG_HOME/bin
- ```
- # Testing
- ```
- # local mode
- pig -x local
- # mapreduce mode
- pig -x mapreduce
- ```
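- A tiny local-mode smoke test; a hedged sketch in which the input file and the aliases are made up for illustration:
```bash
echo -e "1\n2\n3" > /tmp/nums.txt
pig -x local -e "A = LOAD '/tmp/nums.txt' AS (n:int); B = FILTER A BY n > 1; DUMP B;"
```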
- 
- ---