Tags: Hadoop

Installing distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (CDH 5.14.2 offline tarball installation)

Host environment

Basic configuration:

Nodes: 5
OS: CentOS Linux release 7.5.1804 (Core)
Memory: 8GB

Recommended configuration:

Nodes: 5
OS: CentOS Linux release 7.5.1804 (Core)
Memory: 16GB

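Note: In production, allocate memory according to actual needs. If you are only building virtual machines in VMware, 1-2 GB of memory per host is enough.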

Software environment

Software Version
jdk jdk-8u172-linux-x64
hadoop hadoop-2.6.0-cdh5.14.2
zookeeper zookeeper-3.4.5-cdh5.14.2
hbase hbase-1.2.0-cdh5.14.2
hive hive-1.1.0-cdh5.14.2

Note: All CDH5 software can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/

Host planning

The roles of the 5 nodes are planned as follows:

Hostname CDHNode1 CDHNode2 CDHNode3 CDHNode4 CDHNode5
IP 192.168.223.201 192.168.223.202 192.168.223.203 192.168.223.204 192.168.223.205
namenode yes yes no no no
dataNode no no yes yes yes
resourcemanager yes yes no no no
journalnode yes yes yes yes yes
zookeeper yes yes yes no no
hmaster(hbase) yes yes no no no
regionserver(hbase) no no yes yes yes
hive(hiveserver2) no no yes yes yes

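Note: Keep the number of JournalNodes and ZooKeeper nodes odd, and use at least 3 of each for high availability. Both need a majority of their members to be up, so 2n+1 nodes tolerate n failures; the details will be covered in a later post.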
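Every later step refers to the nodes only by hostname (CDHNode1-5), so each node must be able to resolve them. The guide does not show how; a minimal sketch, assuming resolution through /etc/hosts rather than DNS, that prints the entries from the planning table above (the function name `gen_hosts` is hypothetical):

```shell
# Hypothetical helper: print /etc/hosts entries for the five nodes in the
# planning table (192.168.223.201-205 -> CDHNode1-5).
# Assumption: the nodes resolve each other via /etc/hosts, not DNS.
gen_hosts() {
  local prefix="192.168.223." first=201 i
  for i in 1 2 3 4 5; do
    printf '%s%d\tCDHNode%d\n' "$prefix" $((first + i - 1)) "$i"
  done
}
# Usage (as root on every node):
#   gen_hosts >> /etc/hosts
#   getent hosts CDHNode3   # should print 192.168.223.203
```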

Host preparation before installation

  1. Disable SELinux on all nodes
sed -i 's/^SELINUX=.*$/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
  2. Disable the firewall (firewalld or iptables) on all nodes
systemctl disable firewalld;
systemctl stop firewalld;
systemctl disable iptables;
systemctl stop iptables;
  3. Enable time synchronization with ntpdate on all nodes
echo "*/5 * * * * /usr/sbin/ntpdate asia.pool.ntp.org | logger -t NTP" >> /var/spool/cron/root
  4. Set the locale and time zone on all nodes
echo 'export TZ=Asia/Shanghai' >> /etc/profile
echo 'export LANG=en_US.UTF-8' >> /etc/profile
. /etc/profile
  5. Add a hadoop user on all nodes
useradd -m hadoop
echo '123456' | passwd --stdin hadoop
# Set PS1
su - hadoop
echo 'export PS1="\u@\h:\$PWD>"' >> ~/.bash_profile
echo "alias mv='mv -i'
alias rm='rm -i'" >> ~/.bash_profile
. ~/.bash_profile
  6. Set up passwordless SSH login between the hadoop users. First, generate a key pair on CDHNode1:
su - hadoop
ssh-keygen -t rsa # press Enter at every prompt to generate the hadoop user's public and private keys
cd .ssh
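vi id_rsa.pub # remove the trailing comment hadoop@CDHNode1 from the end of the public key (cosmetic; the comment does not affect authentication)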
cat id_rsa.pub > authorized_keys
chmod 600 authorized_keys

Compress the .ssh directory:

su - hadoop
zip -r ssh.zip .ssh

Then copy ssh.zip to the hadoop user's home directory on CDHNode2-5 and unzip it there; this completes the passwordless login setup.

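  7. Tune kernel parameters, the maximum number of open files, the maximum number of processes, and similar limits. The right values differ from host to host, so no specific tuning is given here; but if the Hadoop environment is used in production, tuning is mandatory, because the Linux defaults can make the cluster perform poorly.
  8. On the datanodes (CDHNode3-5), mount a 15 GB data disk at /chunk1; after mounting, give ownership of the directory to the hadoop user (for example chown hadoop:hadoop /chunk1).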

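Note: The steps above must be run as root. At this point the operating system is ready; the actual installation starts below, and unless stated otherwise all following steps use the hadoop user.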

Install JDK 1.8

Install it on every node in the same way. Extract jdk-8u172-linux-x64.tar.gz:

tar zxvf jdk-8u172-linux-x64.tar.gz
mkdir -p /home/hadoop/app
mv jdk1.8.0_172 /home/hadoop/app/jdk   # the tarball extracts to jdk1.8.0_172
rm -f jdk-8u172-linux-x64.tar.gz

Configure environment variables: run vi ~/.bash_profile and add the following:

#java
export JAVA_HOME=/home/hadoop/app/jdk
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

Load the environment variables:

. ~/.bash_profile

Check the installation with java -version:

java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

If you see output like the above, the installation succeeded.

Install ZooKeeper

Install it on CDHNode1 first.

Extract zookeeper-3.4.5-cdh5.14.2.tar.gz:

tar zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz
mv zookeeper-3.4.5-cdh5.14.2 /home/hadoop/app/zookeeper
rm -f zookeeper-3.4.5-cdh5.14.2.tar.gz

Set environment variables: run vi ~/.bash_profile and add the following:

#zk
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Load the environment variables:

. ~/.bash_profile

Create the configuration file with vi /home/hadoop/app/zookeeper/conf/zoo.cfg and add the following:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# data directory and transaction log directory
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
# server.<id>=<hostname>:<port for sync/communication between ZooKeeper nodes>:<leader election port>
server.1=CDHNode1:2888:3888
server.2=CDHNode2:2888:3888
server.3=CDHNode3:2888:3888
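# When nodes change, just add or remove the corresponding server lines here (on all nodes), then start the added nodes or stop the removed ones
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.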
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

Create the required directories:

mkdir -p /home/hadoop/data/zookeeper/zkdata
mkdir -p /home/hadoop/data/zookeeper/zkdatalog
mkdir -p /home/hadoop/app/zookeeper/logs

Create the myid file with vim /home/hadoop/data/zookeeper/zkdata/myid and add:

1

Note: This number is the 1 after "server." in the zoo.cfg line server.1=CDHNode1:2888:3888, so CDHNode2 gets 2 and CDHNode3 gets 3.
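Since myid must always equal the N of the node's own server.N line, it can be read from zoo.cfg instead of typed by hand. A minimal sketch; the function name `zk_myid_for` is hypothetical, and it assumes short hostnames without dots, exactly as written in zoo.cfg:

```shell
# Hypothetical helper: print the ZooKeeper server id for a host, i.e. the N
# in a zoo.cfg line "server.N=<host>:2888:3888". Commented lines such as
# "#server...." are ignored. Returns non-zero if the host is not listed.
# Usage: zk_myid_for <hostname> [path/to/zoo.cfg]
zk_myid_for() {
  local host="$1" cfg="${2:-/home/hadoop/app/zookeeper/conf/zoo.cfg}"
  awk -F'[.=:]' -v h="$host" '
    $1 == "server" && $3 == h { print $2; found = 1 }
    END { exit !found }' "$cfg"
}
# Usage on each ZooKeeper node (assumes `hostname -s` returns CDHNode1/2/3):
#   id=$(zk_myid_for "$(hostname -s)") && echo "$id" > /home/hadoop/data/zookeeper/zkdata/myid
```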

Configure the log directory: run vim /home/hadoop/app/zookeeper/libexec/zkEnv.sh and set the following parameters:

ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

Note: /home/hadoop/app/zookeeper/libexec/zkEnv.sh has the same content as /home/hadoop/app/zookeeper/bin/zkEnv.sh. The startup script /home/hadoop/app/zookeeper/bin/zkServer.sh reads /home/hadoop/app/zookeeper/libexec/zkEnv.sh first and falls back to /home/hadoop/app/zookeeper/bin/zkEnv.sh only when the former does not exist.

Run vim /home/hadoop/app/zookeeper/conf/log4j.properties and set the following parameters:

zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=/home/hadoop/app/zookeeper/logs
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender

Copy ZooKeeper to CDHNode2-3:

scp ~/.bash_profile CDHNode2:/home/hadoop
scp ~/.bash_profile CDHNode3:/home/hadoop
scp -pr /home/hadoop/app/zookeeper CDHNode2:/home/hadoop/app
scp -pr /home/hadoop/app/zookeeper CDHNode3:/home/hadoop/app
ssh CDHNode2 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode2 "echo 2 > /home/hadoop/data/zookeeper/zkdata/myid"
ssh CDHNode3 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode3 "echo 3 > /home/hadoop/data/zookeeper/zkdata/myid"

Start ZooKeeper on all 3 nodes:

/home/hadoop/app/zookeeper/bin/zkServer.sh start

Check the node status:

/home/hadoop/app/zookeeper/bin/zkServer.sh status

If one node is the leader and the other two are followers, ZooKeeper is installed correctly.
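The leader/follower check above can be automated by collecting the status output of all three nodes and counting the roles. A minimal sketch, assuming `zkServer.sh status` reports the role on a line of the form `Mode: leader` / `Mode: follower`; the function name `zk_quorum_ok` is hypothetical:

```shell
# Hypothetical helper: read the combined "zkServer.sh status" output of all
# ZooKeeper nodes on stdin; succeed only if exactly one node reports
# "Mode: leader" and every other reporting node is "Mode: follower".
zk_quorum_ok() {
  awk '/^Mode:/ { n++; if ($2 == "leader") l++; else if ($2 == "follower") f++ }
       END { exit !(n > 0 && l == 1 && f == n - 1) }'
}
# Usage from CDHNode1 (~/.bash_profile is sourced because non-interactive
# ssh sessions do not read it, and zkServer.sh needs java on the PATH):
#   for h in CDHNode1 CDHNode2 CDHNode3; do
#     ssh "$h" ". ~/.bash_profile; /home/hadoop/app/zookeeper/bin/zkServer.sh status" 2>&1
#   done | zk_quorum_ok && echo "ZooKeeper quorum OK"
```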

Check the processes:

jps

The QuorumPeerMain process is ZooKeeper.

Stop ZooKeeper:

/home/hadoop/app/zookeeper/bin/zkServer.sh stop

Install Hadoop

Install it on CDHNode1 first, then copy it to the other nodes. Extract hadoop-2.6.0-cdh5.14.2.tar.gz:

tar zxvf hadoop-2.6.0-cdh5.14.2.tar.gz
mv hadoop-2.6.0-cdh5.14.2 /home/hadoop/app/hadoop
rm -f hadoop-2.6.0-cdh5.14.2.tar.gz

Set environment variables: run vi ~/.bash_profile and add the following:

#hadoop
HADOOP_HOME=/home/hadoop/app/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH

Load the environment variables:

. ~/.bash_profile

Configure HDFS

Edit /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh and change the following:

export JAVA_HOME=/home/hadoop/app/jdk

Edit /home/hadoop/app/hadoop/etc/hadoop/core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster1</value>
</property>
<!-- the default HDFS path; the nameservice is named cluster1 -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/tmp</value>
</property>
<!-- Hadoop temporary directory; separate multiple directories with commas. The data directory must be created manually -->
<property>
<name>ha.zookeeper.quorum</name>
<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
</property>
<!-- ZooKeeper quorum used by HDFS for automatic NameNode failover -->
</configuration>

Edit /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- 3 replicas per data block -->
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/hdfs/name</value>
</property>
<!-- metadata directory; separate multiple directories with ',' -->
<property>
<name>dfs.data.dir</name>
<value>/chunk1</value>
</property>
<!-- data directory; separate multiple directories with ',' -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- disable HDFS permission checking -->
<property>
<name>dfs.nameservices</name>
<value>cluster1</value>
</property>
<!-- nameservice; its value must match fs.defaultFS. With NameNode HA there are two NameNodes, and cluster1 is the single logical entry point exposed to clients -->
<property>
<name>dfs.ha.namenodes.cluster1</name>
<value>CDHNode1,CDHNode2</value>
</property>
<!-- the NameNodes of nameservice cluster1; these are also logical names, any unique names will do -->
<property>
<name>dfs.namenode.rpc-address.cluster1.CDHNode1</name>
<value>CDHNode1:9000</value>
</property>
<!-- CDHNode1 RPC address -->
<property>
<name>dfs.namenode.http-address.cluster1.CDHNode1</name>
<value>CDHNode1:50070</value>
</property>
<!-- CDHNode1 HTTP address -->
<property>
<name>dfs.namenode.rpc-address.cluster1.CDHNode2</name>
<value>CDHNode2:9000</value>
</property>
<!-- CDHNode2 RPC address -->
<property>
<name>dfs.namenode.http-address.cluster1.CDHNode2</name>
<value>CDHNode2:50070</value>
</property>
<!-- CDHNode2 HTTP address -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- enable automatic failover -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://CDHNode1:8485;CDHNode2:8485;CDHNode3:8485;CDHNode4:8485;CDHNode5:8485/cluster1</value>
</property>
<!-- JournalNodes that hold the shared NameNode edit log -->
<property>
<name>dfs.client.failover.proxy.provider.cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- class that HDFS clients use to find the active NameNode of cluster1 and to fail over when it changes -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/journaldata/jn</value>
</property>
<!-- local disk path where each JournalNode stores the shared NameNode edits -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
</property>
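<!-- fencing (split-brain protection) settings; shell(/bin/true) always reports success, so the two ssh settings only take effect if sshfence is used instead -->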
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
</configuration>

Edit /home/hadoop/app/hadoop/etc/hadoop/slaves:

CDHNode3
CDHNode4
CDHNode5
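The logical name cluster1 has to line up in several places: fs.defaultFS must be hdfs:// plus dfs.nameservices, and later in this guide hbase.rootdir and Hive's scratch/warehouse directories must also start with it. A minimal consistency check, assuming the values are read with `hdfs getconf -confKey` (which only reads the local configuration); the function name `check_fs_consistency` is hypothetical:

```shell
# Hypothetical helper: succeed only if fs.defaultFS equals hdfs://<nameservice>
# and every further URI given (e.g. hbase.rootdir, Hive warehouse dir) lives
# on that filesystem.
# Usage: check_fs_consistency <fs.defaultFS> <dfs.nameservices> [uri...]
check_fs_consistency() {
  local default_fs="$1" nameservice="$2" uri
  shift 2
  [ "$default_fs" = "hdfs://$nameservice" ] || return 1
  for uri in "$@"; do
    case "$uri" in
      "$default_fs"/*) ;;                          # same filesystem: OK
      *) echo "mismatch: $uri" >&2; return 1 ;;
    esac
  done
}
# Usage on any node with Hadoop on the PATH:
#   check_fs_consistency "$(hdfs getconf -confKey fs.defaultFS)" \
#     "$(hdfs getconf -confKey dfs.nameservices)" hdfs://cluster1/hbase && echo "consistent"
```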

Configure YARN

Edit /home/hadoop/app/hadoop/etc/hadoop/mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- run MapReduce on YARN (a difference from Hadoop 1) -->
</configuration>

Edit /home/hadoop/app/hadoop/etc/hadoop/yarn-site.xml:

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Site specific YARN configuration properties -->
<configuration>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<!-- retry interval (ms) when connecting to the ResourceManager -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- enable high availability -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- enable automatic failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<!-- name the YARN cluster yarn-rm-cluster -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- logical names of the ResourceManagers: rm1, rm2 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>CDHNode1</value>
</property>
<!-- hostname of ResourceManager rm1 -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>CDHNode2</value>
</property>
<!-- hostname of ResourceManager rm2 -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- enable ResourceManager state recovery -->
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
</property>
<!-- ZooKeeper addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
</property>
<!-- ZooKeeper addresses -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>CDHNode1:8032</value>
</property>
<!-- rm1 client address and port -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>CDHNode1:8034</value>
</property>
<!-- rm1 scheduler address and port -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>CDHNode1:8088</value>
</property>
<!-- rm1 web UI address and port -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>CDHNode2:8032</value>
</property>
<!-- rm2 client address and port -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>CDHNode2:8034</value>
</property>
<!-- rm2 scheduler address and port -->
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>CDHNode2:8088</value>
</property>
<!-- rm2 web UI address and port -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- shuffle service required to run MapReduce -->
</configuration>

Create the corresponding directories:

mkdir -p /home/hadoop/data/tmp
mkdir -p /home/hadoop/data/hdfs/name
mkdir -p /home/hadoop/data/journaldata/jn
mkdir -p /home/hadoop/data/pid
touch /home/hadoop/app/hadoop/etc/hadoop/excludes

Copy Hadoop to CDHNode2-5:

scp ~/.bash_profile CDHNode2:/home/hadoop
scp ~/.bash_profile CDHNode3:/home/hadoop
scp ~/.bash_profile CDHNode4:/home/hadoop
scp ~/.bash_profile CDHNode5:/home/hadoop

scp -pr /home/hadoop/app/hadoop CDHNode2:/home/hadoop/app
scp -pr /home/hadoop/app/hadoop CDHNode3:/home/hadoop/app
scp -pr /home/hadoop/app/hadoop CDHNode4:/home/hadoop/app
scp -pr /home/hadoop/app/hadoop CDHNode5:/home/hadoop/app

ssh CDHNode2 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
ssh CDHNode3 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
ssh CDHNode4 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"
ssh CDHNode5 "mkdir -p /home/hadoop/data/tmp;mkdir -p /home/hadoop/data/hdfs/name;mkdir -p /home/hadoop/data/journaldata/jn;mkdir -p /home/hadoop/data/pid;touch /home/hadoop/app/hadoop/etc/hadoop/excludes"

Cluster initialization

Start ZooKeeper on CDHNode1-3:

/home/hadoop/app/zookeeper/bin/zkServer.sh start

Start the JournalNode on CDHNode1-5:

/home/hadoop/app/hadoop/sbin/hadoop-daemon.sh start journalnode

Run jps; if a JournalNode process is listed, it started correctly.
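Checking jps by eye works on one host but gets tedious across five. jps prints one `<pid> <MainClass>` line per JVM, so the check reduces to matching the second column. A minimal sketch with a hypothetical function name `jps_has`, reading jps output on stdin:

```shell
# Hypothetical helper: succeed if a JVM with the given main-class name
# (JournalNode, QuorumPeerMain, NameNode, DataNode, HMaster, ...) appears
# in jps output read from stdin.
jps_has() {
  awk -v n="$1" '$2 == n { found = 1 } END { exit !found }'
}
# Usage on each of CDHNode1-5:
#   jps | jps_has JournalNode && echo "JournalNode is running"
```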

First, run the formatting on the primary node CDHNode1:

/home/hadoop/app/hadoop/bin/hdfs namenode -format    # format the NameNode
/home/hadoop/app/hadoop/bin/hdfs zkfc -formatZK      # initialize the HA state in ZooKeeper
/home/hadoop/app/hadoop/bin/hdfs namenode            # start the NameNode in the foreground

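Note: After these commands the NameNode keeps running in the foreground. Only after the next step has finished on CDHNode2, press Ctrl+C to stop this NameNode process.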

On CDHNode2, synchronize the NameNode metadata:

/home/hadoop/app/hadoop/bin/hdfs namenode -bootstrapStandby	# copy the metadata from the primary NameNode to the standby

When synchronization has finished, press Ctrl+C on CDHNode1 to stop the NameNode process.

Then stop the JournalNode on all nodes:

/home/hadoop/app/hadoop/sbin/hadoop-daemon.sh stop journalnode

Start HDFS

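If everything above went well, you can start all HDFS processes with a single script from any host in the cluster; running it on the primary NameNode is recommended: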

/home/hadoop/app/hadoop/sbin/start-dfs.sh

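Note: start-dfs.sh logs in to each node over passwordless SSH to start the processes, so the first SSH connection to each host may stop at the host-key confirmation prompt; keep this in mind.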

If starting HDFS prints the warning WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable, note that it is only a warning and does not affect normal operation. To get rid of it: download hadoop-2.6.0+cdh5.14.2+2748-1.cdh5.14.2.p0.11.el7.x86_64.rpm, unpack it on Windows with 7-Zip, take all files under \usr\lib\hadoop\lib\native, upload them to /home/hadoop/app/hadoop/lib/native on all nodes, and then run the following on all nodes:

cd /home/hadoop/app/hadoop/lib/native
rm -f libhadoop.so
rm -f libnativetask.so
rm -f libsnappy.so
rm -f libsnappy.so.1
cp libhadoop.so.1.0.0 libhadoop.so
cp libnativetask.so.1.0.0 libnativetask.so
cp libsnappy.so.1.1.4 libsnappy.so
cp libsnappy.so.1.1.4 libsnappy.so.1

The next time HDFS starts, the warning no longer appears.
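Besides restarting HDFS, `hadoop checknative -a` reports which native libraries Hadoop can load. A minimal sketch that lists the ones reported as unavailable; the function name `missing_native_libs` is hypothetical, and the parsing assumes each library is reported on a line such as `snappy:  true /path/libsnappy.so.1`:

```shell
# Hypothetical helper: read "hadoop checknative -a" output on stdin and print
# the names of native libraries reported as "false".
# Assumption: report lines look like "<name>:  true|false [path]".
missing_native_libs() {
  awk '$1 ~ /:$/ && $2 == "false" { sub(/:$/, "", $1); print $1 }'
}
# Usage on any node:
#   hadoop checknative -a 2>/dev/null | missing_native_libs
```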

Check the NameNode status in the web UI:

http://CDHNode1:50070 http://CDHNode2:50070

Check the DataNode status in the web UI:

http://cdhnode3:50075 http://cdhnode4:50075 http://cdhnode5:50075

Test by uploading a file to HDFS. Create a local file with vi test.txt containing:

hadoop CDH
hello world
CDH hadoop

hdfs dfs -mkdir /test          # create a directory in HDFS
hdfs dfs -put test.txt /test   # upload the file to HDFS
hdfs dfs -ls /test             # check that test.txt was uploaded

If the commands above work, HDFS is configured correctly.

Start YARN

First run on CDHNode1:

/home/hadoop/app/hadoop/sbin/start-yarn.sh

Then run on CDHNode2:

/home/hadoop/app/hadoop/sbin/yarn-daemon.sh start resourcemanager

Check the ResourceManager status in the web UI:

http://CDHNode1:8088 http://CDHNode2:8088

Check the NodeManager status in the web UI:

http://cdhnode3:8042/node http://cdhnode4:8042/node http://cdhnode5:8042/node

Check the ResourceManager HA state:

/home/hadoop/app/hadoop/bin/yarn rmadmin -getServiceState rm1
/home/hadoop/app/hadoop/bin/yarn rmadmin -getServiceState rm2

active marks the primary node and standby the backup node.
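For an HA pair to be healthy, exactly one member should report active. The same rule applies to the ResourceManagers (rm1/rm2) and to the NameNodes (IDs CDHNode1/CDHNode2 in dfs.ha.namenodes.cluster1), whose state `hdfs haadmin -getServiceState` reports. A minimal sketch with a hypothetical function name `exactly_one_active`:

```shell
# Hypothetical helper: given the states reported for the members of an HA
# pair, succeed only if exactly one of them is "active".
exactly_one_active() {
  local active=0 state
  for state in "$@"; do
    if [ "$state" = "active" ]; then
      active=$((active + 1))
    fi
  done
  [ "$active" -eq 1 ]
}
# ResourceManagers:
#   exactly_one_active "$(yarn rmadmin -getServiceState rm1)" "$(yarn rmadmin -getServiceState rm2)"
# NameNodes:
#   exactly_one_active "$(hdfs haadmin -getServiceState CDHNode1)" "$(hdfs haadmin -getServiceState CDHNode2)"
```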

WordCount example test

hadoop jar /home/hadoop/app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/test.txt /test/out

If this runs without errors, YARN is installed correctly.
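"Runs without errors" can be tightened by comparing the job output with a count computed locally. A minimal sketch, assuming the default single reducer (output file part-r-00000) and plain whitespace-separated words; the function name `local_wordcount` is hypothetical. LC_ALL=C sorts byte-wise, so uppercase words come before lowercase ones, matching the order of the job's Text keys:

```shell
# Hypothetical helper: count words in a local file and print them as
# "word<TAB>count", sorted byte-wise like the WordCount job output.
local_wordcount() {
  tr -s ' \t' '\n\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
# Compare with the job result:
#   diff <(local_wordcount test.txt) <(hdfs dfs -cat /test/out/part-r-00000) && echo "WordCount OK"
```

For the three-line test.txt above, both sides should contain CDH 2, hadoop 2, hello 1, world 1.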

Startup and shutdown order for the whole cluster

Start

Start ZooKeeper on CDHNode1-3

/home/hadoop/app/zookeeper/bin/zkServer.sh start

Start HDFS

/home/hadoop/app/hadoop/sbin/start-dfs.sh

Start YARN

# first, on CDHNode1:
/home/hadoop/app/hadoop/sbin/start-yarn.sh
# then, on CDHNode2:
/home/hadoop/app/hadoop/sbin/yarn-daemon.sh start resourcemanager

Stop

Stop YARN

# first, on CDHNode2:
/home/hadoop/app/hadoop/sbin/yarn-daemon.sh stop resourcemanager
# then, on CDHNode1:
/home/hadoop/app/hadoop/sbin/stop-yarn.sh

Stop HDFS

/home/hadoop/app/hadoop/sbin/stop-dfs.sh

Stop ZooKeeper on CDHNode1-3

/home/hadoop/app/zookeeper/bin/zkServer.sh stop
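The start order (ZooKeeper, then HDFS, then YARN) and the reverse stop order can be driven from one host. A minimal sketch with hypothetical function names, assuming passwordless SSH for the hadoop user; ~/.bash_profile is sourced explicitly because non-interactive SSH sessions do not read it. By default it only prints the commands (RUN=echo):

```shell
# Hypothetical helpers: print (RUN=echo, the default) or run (RUN= empty) the
# cluster start/stop sequence, each command on the right host over ssh.
ZK_BIN=/home/hadoop/app/zookeeper/bin
HADOOP_SBIN=/home/hadoop/app/hadoop/sbin

on_host() {  # on_host <host> <command>
  ${RUN-echo} ssh "$1" ". ~/.bash_profile; $2"
}

cluster_start() {
  local h
  for h in CDHNode1 CDHNode2 CDHNode3; do on_host "$h" "$ZK_BIN/zkServer.sh start"; done
  on_host CDHNode1 "$HADOOP_SBIN/start-dfs.sh"
  on_host CDHNode1 "$HADOOP_SBIN/start-yarn.sh"
  on_host CDHNode2 "$HADOOP_SBIN/yarn-daemon.sh start resourcemanager"
}

cluster_stop() {
  local h
  on_host CDHNode2 "$HADOOP_SBIN/yarn-daemon.sh stop resourcemanager"
  on_host CDHNode1 "$HADOOP_SBIN/stop-yarn.sh"
  on_host CDHNode1 "$HADOOP_SBIN/stop-dfs.sh"
  for h in CDHNode1 CDHNode2 CDHNode3; do on_host "$h" "$ZK_BIN/zkServer.sh stop"; done
}
# Review:   cluster_start
# Execute:  RUN= cluster_start
```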

HBase installation

Install it on CDHNode1 first.

Extract hbase-1.2.0-cdh5.14.2.tar.gz:

tar zxvf hbase-1.2.0-cdh5.14.2.tar.gz
mv hbase-1.2.0-cdh5.14.2 /home/hadoop/app/hbase
rm -f hbase-1.2.0-cdh5.14.2.tar.gz

Set environment variables: run vi ~/.bash_profile and add the following:

#hbase
export HBASE_HOME=/home/hadoop/app/hbase
export PATH=$PATH:$HBASE_HOME/bin

Load the environment variables:

. ~/.bash_profile

Edit the configuration with vi /home/hadoop/app/hbase/conf/hbase-env.sh and change the following:

export JAVA_HOME=/home/hadoop/app/jdk
export HBASE_MANAGES_ZK=false # use the external ZooKeeper instead of the one bundled with HBase

Create the configuration file with vi /home/hadoop/app/hbase/conf/hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://cluster1/hbase</value>
</property>
<!-- the HDFS address here must match fs.defaultFS in Hadoop's core-site.xml -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
</property>
<!-- ZooKeeper quorum -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/data/hbase/zookeeper</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/data/hbase/tmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>

Add the RegionServer hosts: run vi /home/hadoop/app/hbase/conf/regionservers and change its content to:

CDHNode3
CDHNode4
CDHNode5

Copy Hadoop's hdfs-site.xml and core-site.xml into $HBASE_HOME/conf:

cp /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml /home/hadoop/app/hbase/conf
cp /home/hadoop/app/hadoop/etc/hadoop/core-site.xml /home/hadoop/app/hbase/conf

Create the related directories:

mkdir -p /home/hadoop/data/hbase/zookeeper
mkdir -p /home/hadoop/data/hbase/tmp

Copy HBase to CDHNode2-5:

scp ~/.bash_profile CDHNode2:/home/hadoop
scp ~/.bash_profile CDHNode3:/home/hadoop
scp ~/.bash_profile CDHNode4:/home/hadoop
scp ~/.bash_profile CDHNode5:/home/hadoop

scp -pr /home/hadoop/app/hbase CDHNode2:/home/hadoop/app
scp -pr /home/hadoop/app/hbase CDHNode3:/home/hadoop/app
scp -pr /home/hadoop/app/hbase CDHNode4:/home/hadoop/app
scp -pr /home/hadoop/app/hbase CDHNode5:/home/hadoop/app

ssh CDHNode2 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
ssh CDHNode3 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
ssh CDHNode4 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"
ssh CDHNode5 "mkdir -p /home/hadoop/data/hbase/zookeeper;mkdir -p /home/hadoop/data/hbase/tmp;"

Start HBase:

/home/hadoop/app/hbase/bin/start-hbase.sh

With JDK 8 or later, you will see the following warnings:

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

Fix: on all nodes, run vi /home/hadoop/app/hbase/conf/hbase-env.sh and comment out the following lines:

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"

After startup, CDHNode1 runs an additional HMaster process, and CDHNode3-5 each run an additional HRegionServer process (the nodes listed in the regionservers file).

Start the backup HMaster on CDHNode2:

/home/hadoop/app/hbase/bin/hbase-daemon.sh start master

Remark: to start a single RegionServer on its own, use a similar command:

/home/hadoop/app/hbase/bin/hbase-daemon.sh start regionserver

Check the HMaster web UI:

http://cdhnode1:60010/master-status http://cdhnode2:60010/master-status

The page shows that CDHNode2 is the backup HMaster.

Check the HRegionServer web UI:

http://cdhnode3:60030/rs-status http://cdhnode4:60030/rs-status http://cdhnode5:60030/rs-status

Verify:

hbase shell
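Opening the shell only proves the client starts; a short write/read round trip exercises the HMaster and a RegionServer as well. A minimal sketch that generates an HBase shell script (the function name `hbase_smoke_script`, the table name smoke_test and the column family cf are illustrative assumptions); piping a script into `hbase shell` runs it non-interactively:

```shell
# Hypothetical helper: print an HBase shell script that creates a table,
# writes and scans one cell, then disables and drops the table again.
hbase_smoke_script() {
  local t="${1:-smoke_test}"
  cat <<EOF
create '$t', 'cf'
put '$t', 'row1', 'cf:a', 'value1'
scan '$t'
disable '$t'
drop '$t'
exit
EOF
}
# Usage: hbase_smoke_script | hbase shell
```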

Stop HBase: first stop the HMaster on CDHNode2:

/home/hadoop/app/hbase/bin/hbase-daemon.sh stop master

Then stop all other related processes:

/home/hadoop/app/hbase/bin/stop-hbase.sh

Hive installation

First install MySQL on CDHNode5; see my blog post "Installing MySQL 8.0.11 on CentOS 7.5".

Create the Hive database and user in MySQL:

mysql > create database hivedb character set latin1 collate latin1_bin;    # the hivedb character set must be latin1 here

# mysql > grant all privileges on hivedb.* to 'hive'@'%' identified by 'hive'; # works before MySQL 8.0; MySQL 8.0 rejects it, so use the statements below instead

mysql > create user 'hive'@'%' identified by 'hive';
mysql > grant all privileges on hivedb.* to 'hive'@'%';
mysql > flush privileges;

On CDHNode3, extract hive-1.1.0-cdh5.14.2.tar.gz:

tar zxvf hive-1.1.0-cdh5.14.2.tar.gz
mv hive-1.1.0-cdh5.14.2 /home/hadoop/app/hive
rm -f hive-1.1.0-cdh5.14.2.tar.gz

Download the MySQL JDBC driver and copy it into /home/hadoop/app/hive/lib (download: mysql-connector-java-8.0.11):

tar zxvf mysql-connector-java-8.0.11.tar.gz
cp mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar /home/hadoop/app/hive/lib/
rm -f mysql-connector-java-8.0.11.tar.gz
  • MySQL 8.0.11 is used here, so the matching mysql-connector-java 8.0.11 is downloaded; if you use another MySQL version, download the corresponding driver.

Set environment variables: run vi ~/.bash_profile and add the following:

#hive
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$PATH:$HIVE_HOME/bin

Load the environment variables:

. ~/.bash_profile

Go to /home/hadoop/app/hive/conf and create hive-env.sh:

cp hive-env.sh.template hive-env.sh

Run vi /home/hadoop/app/hive/conf/hive-env.sh and set the following:

export HADOOP_HEAPSIZE=1024
HADOOP_HOME=/home/hadoop/app/hadoop
export HIVE_CONF_DIR=/home/hadoop/app/hive/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/app/hive/lib

Create the Hive configuration file with vi /home/hadoop/app/hive/conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://cluster1/hive/scratchdir</value>
<description>HDFS path for the execution plans of the individual map/reduce stages and their intermediate output.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://cluster1/hive/warehouse</value>
<description>HDFS path for Hive data files</description>
</property>
<!-- log directory settings -->
<property>
<name>hive.querylog.location</name>
<value>/home/hadoop/app/hive/logs</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/hadoop/data/hive/local/${hive.session.id}_resources</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/hadoop/app/hive/logs/operation_logs</value>
</property>
<!-- MySQL connection settings for the metastore -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!--<value>jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=UTF-8&amp;createDatabaseIfNotExist=true</value>-->
<value>jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=latin1&amp;createDatabaseIfNotExist=true</value>
<description>encoding setting; in XML the &amp; character must be written as &amp;amp;</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<!-- enable Hive DELETE/UPDATE operations -->
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>false</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<!--<value>1</value>-->
<value>5</value>
</property>
<property>
<name>hive.in.test</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask.size</name>
<value>10000000</value>
</property>
<!-- HWI (Hive Web Interface) settings -->
<property>
<name>hive.hwi.listen.host</name>
<value>CDHNode3</value>
<description>HWI listen address; different on each node</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>listen port</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-1.2.2.war</value>
<description>location of the WAR file; must not be an absolute path.</description>
</property>
<!-- HiveServer2 settings -->
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2</value>
<description>ZooKeeper namespace</description>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>CDHNode3</value>
<description>HiveServer2 listen address; different on each node</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
<description>all HiveServer2 instances must use the same port</description>
</property>
</configuration>
  • The HDFS addresses in hive.exec.scratchdir and hive.metastore.warehouse.dir in hive-site.xml must match fs.defaultFS in Hadoop's core-site.xml, i.e. hdfs://cluster1.
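The JDBC URL in javax.jdo.option.ConnectionURL is the one value above that needs escaping: in XML a bare & starts an entity reference, so each & in the URL must be written as &amp;. A minimal sketch that escapes a URL before pasting it into hive-site.xml; the function name `xml_escape` is hypothetical:

```shell
# Hypothetical helper: escape &, < and > for use inside an XML <value>.
# "&" is replaced first so the entities added afterwards are not re-escaped.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
# Usage (appending useSSL=false, as the MySQL warnings in the schematool
# output suggest, is optional):
#   xml_escape 'jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=latin1&createDatabaseIfNotExist=true'
```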

Create the related directories:

mkdir -p /home/hadoop/data/hive/local
mkdir -p /home/hadoop/app/hive/logs

Configure log4j output: in /home/hadoop/app/hive/conf, create hive-exec-log4j.properties and hive-log4j.properties:

cp hive-exec-log4j.properties.template hive-exec-log4j.properties
cp hive-log4j.properties.template hive-log4j.properties

Edit hive-exec-log4j.properties and hive-log4j.properties and set the following (in both files):

hive.log.dir=/home/hadoop/app/hive/logs
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

Initialize the MySQL metastore schema:

hadoop@CDHNode3:/home/hadoop>schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/app/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://CDHNode5:3306/hivedb?characterEncoding=latin1&createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Fri Jun 29 11:37:02 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 1.1.0-cdh5.14.2
Initialization script hive-schema-1.1.0.mysql.sql
Fri Jun 29 11:37:03 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Fri Jun 29 11:37:05 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
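  • This step is mandatory. If it is skipped, Hive still starts normally, but data operations may hang. The cause is that the MySQL metastore database was never initialized, so Hive hits a metadata lock when it reads from and writes to that database.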

Copy Hive to CDHNode4-5

scp ~/.bash_profile CDHNode4:/home/hadoop
scp ~/.bash_profile CDHNode5:/home/hadoop
scp -pr /home/hadoop/app/hive CDHNode4:/home/hadoop/app
scp -pr /home/hadoop/app/hive CDHNode5:/home/hadoop/app
ssh CDHNode4 "mkdir -p /home/hadoop/data/hive/local;"
ssh CDHNode5 "mkdir -p /home/hadoop/data/hive/local;"

Note: after the transfer, edit hive-site.xml on CDHNode4-5 and change the listen address of hwi and hiveserver2 to the local host.
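Editing the two listen addresses by hand on each node is easy to get wrong. The following is a minimal sketch, assuming GNU sed and that in hive-site.xml each `<value>` line directly follows its `<name>` line. The function name `set_listen_host` is hypothetical, and it assumes the two addresses live in `hive.server2.thrift.bind.host` and `hive.hwi.listen.host`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the hiveserver2 and HWI listen addresses in a
# hive-site.xml at HOST.
# Assumptions: GNU sed (-i and the {n;...} block), and each <value> line
# comes directly after its <name> line.
set_listen_host() {
  local file="$1" host="$2" prop
  for prop in hive.server2.thrift.bind.host hive.hwi.listen.host; do
    # On the <name> line, read the next line (n) and rewrite its <value>,
    # keeping the original indentation.
    sed -i "/<name>${prop}<\/name>/{n;s|<value>.*</value>|<value>${host}</value>|;}" "$file"
  done
}
```

Run it on CDHNode4 as `set_listen_host /home/hadoop/app/hive/conf/hive-site.xml CDHNode4`, and likewise on CDHNode5 with its own hostname. A property that is missing from the file is left untouched rather than added.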

Three ways to start Hive

  1. Hive CLI mode: used for command-line queries on Linux. The query syntax is broadly similar to MySQL's.
hive

Basic operations

hive> show databases;
OK
default
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> create database hive;
OK
Time taken: 0.18 seconds
hive> show databases;
OK
default
hive
hive> use hive;
OK
Time taken: 0.089 seconds
hive> create table test(id int,name string);
OK
Time taken: 0.331 seconds
hive> show tables;
OK
test
Time taken: 0.082 seconds, Fetched: 1 row(s)
hive> insert into test values (1,'hello hive');
Query ID = hadoop_20180628105757_e64fc58b-37f6-4087-a823-738d5d933454
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1530153811198_0001, Tracking URL = http://CDHNode1:8088/proxy/application_1530153811198_0001/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1530153811198_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-28 10:57:38,636 Stage-1 map = 0%, reduce = 0%
2018-06-28 10:57:53,893 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.27 sec
MapReduce Total cumulative CPU time: 1 seconds 270 msec
Ended Job = job_1530153811198_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster1/hive/warehouse/hive.db/test/.hive-staging_hive_2018-06-28_10-57-11_240_3145632387075179354-1/-ext-10000
Loading data to table hive.test
Table hive.test stats: [numFiles=1, numRows=1, totalSize=13, rawDataSize=12]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.27 sec HDFS Read: 3706 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 270 msec
OK
Time taken: 44.177 seconds
hive> select * from test;
OK
1 hello hive
Time taken: 0.146 seconds, Fetched: 1 row(s)

Note: with the default configuration Hive does not support UPDATE and DELETE; attempting them fails with:

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Fix: add the relevant parameters to hive-site.xml. For the exact configuration, see /home/hadoop/app/hive/conf/hive-site.xml.
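The post does not list the parameters it added. As a hedged sketch, the settings below are those the Apache Hive transactions documentation lists for Hive 0.14 to 1.x. Whether they match the author's hive-site.xml is an assumption, and the helper names `hive_prop` and `acid_props` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one hive-site.xml <property> element.
hive_prop() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical helper: print the transaction (ACID) settings described in the
# Apache Hive transactions documentation for Hive 0.14-1.x (an assumption,
# not necessarily the exact set the author used).
acid_props() {
  hive_prop hive.support.concurrency true
  hive_prop hive.enforce.bucketing true                # needed on Hive 1.x
  hive_prop hive.exec.dynamic.partition.mode nonstrict
  hive_prop hive.txn.manager org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  # Compaction settings, read by the metastore service.
  hive_prop hive.compactor.initiator.on true
  hive_prop hive.compactor.worker.threads 1
}
```

Paste the output of `acid_props` inside `<configuration>` in hive-site.xml on each Hive node, then restart Hive.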

After restarting Hive, running the DELETE statement still fails:

FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table hive.test that does not use an AcidOutputFormat or is not bucketed

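The error says that the table test targeted by the DELETE does not use an AcidOutputFormat or is not bucketed. That suggests the output format must be an AcidOutputFormat and the table must be bucketed. Searching online confirms this: currently only the ORC file format supports AcidOutputFormat, and on top of that the table must be created with the property ('transactional'='true'). Quite a hassle...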

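Recreate the table following the approach found online (the earlier non-transactional test table has to be dropped first, otherwise the CREATE fails because the table already exists):

hive> drop table test;
hive> create table test(id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');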
hive> insert into table test values (1,'row1'),(2,'row2'),(3,'row3');
hive> delete from test where id = 1;
hive> delete from test where name = 'row2';
hive> update test set name = 'Raj' where id = 3;

The DELETE and UPDATE statements now run normally.

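  2. Hive web interface (HWI): start it with bin/hive --service hwi & (the & runs it in the background). It lets you access Hive from a browser, though in practice it is of limited use; the address is 127.0.0.1:9999/hwi. Starting it requires the hive-hwi-*.war package, which the CDH build of Hive does not ship. To install it, first download the matching Hive source package, apache-hive-1.2.2-src.tar.gz, from the official site (no 1.1.0 source package could be found, so 1.2.2 is used), then extract it and build the war: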
tar zxvf apache-hive-1.2.2-src.tar.gz
cd apache-hive-1.2.2-src/hwi/web
jar -cvf hive-hwi-1.2.2.war *
cp hive-hwi-1.2.2.war /home/hadoop/app/hive/lib
cp /home/hadoop/app/jdk/lib/tools.jar /home/hadoop/app/hive/lib/

Start it again with bin/hive --service hwi & and it can be reached in a browser at:

http://CDHNode3:9999/hwi

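  3. Hive remote service (default port 10000): start it with bin/hive --service hiveserver & or bin/hive --service hiveserver2 & (the & runs it in the background). This is the mode to use when Java, Python or other programs access Hive through JDBC or similar drivers, and it is the one programmers need most. The port can be specified with -p or set directly in the configuration file.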

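The differences between hiveserver and hiveserver2 are as follows. Both let remote clients written in various languages, such as Java or Python, submit requests to Hive and fetch results without starting the CLI. HiveServer is no longer supported from Hive 0.15 onward, but it is still worth describing here. Both HiveServer and HiveServer2 are based on Thrift, yet HiveServer is sometimes called the Thrift server while HiveServer2 is not. Why is HiveServer2 needed when HiveServer already exists? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface HiveServer uses and cannot be fixed by modifying HiveServer's code. Hive 0.11.0 therefore rewrote the HiveServer code as HiveServer2, which solves the problem. HiveServer2 supports concurrency and authentication for multiple clients and offers better support for open API clients such as JDBC and ODBC.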

HiveServer version Connection URL Driver Class
HiveServer2 jdbc:hive2://<host>:<port> org.apache.hive.jdbc.HiveDriver
HiveServer1 jdbc:hive://<host>:<port> org.apache.hadoop.hive.jdbc.HiveDriver
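This article uses two hiveserver2 URL forms: a direct host:port URL, and the ZooKeeper service-discovery URL passed to beeline further down. Small helpers can produce both. This is a sketch. The names `hs2_url` and `hs2_zk_url` are hypothetical, and the default port 10000 and namespace `hiveserver2` are assumptions matching the defaults mentioned here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: direct HiveServer2 URL.
# Usage: hs2_url HOST [PORT] [DATABASE]
hs2_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-}"
}

# Hypothetical helper: ZooKeeper service-discovery URL; the client picks one of
# the hiveserver2 instances registered under the namespace znode.
# Usage: hs2_zk_url ZK_QUORUM [NAMESPACE]
hs2_zk_url() {
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
    "$1" "${2:-hiveserver2}"
}
```

For example, `beeline -u "$(hs2_zk_url CDHNode1:2181,CDHNode2:2181,CDHNode3:2181)" -e 'show databases;'` runs one statement without the interactive prompt. The instances in this cluster registered port 10001, so a direct connection would use `hs2_url CDHNode3 10001`.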

Configuring hiveserver2: hiveserver2 can be configured in hive-site.xml. The relevant parameters are:

hive.server2.thrift.min.worker.threads: minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads: maximum number of worker threads, default 500.
hive.server2.thrift.port: TCP listen port, default 10000.
hive.server2.thrift.bind.host: host the TCP listener binds to, default localhost.

In hive-site.xml a parameter is configured like this:

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>listen port</description>
</property>
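Before starting hiveserver2 on several nodes, it helps to confirm what each node's hive-site.xml actually sets. A minimal sketch, assuming GNU sed and that `<value>` sits on the line right after `<name>`, as in the snippet above. The names `get_hive_prop` and `show_hs2_settings` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property NAME in FILE.
# Usage: get_hive_prop FILE NAME   (prints nothing if NAME is not set)
get_hive_prop() {
  sed -n "/<name>$2<\/name>/{n;s|.*<value>\(.*\)</value>.*|\1|p;}" "$1"
}

# Print the four hiveserver2 Thrift settings listed above.
# An empty value means the property is absent and Hive's default applies.
show_hs2_settings() {
  local file="${1:-/home/hadoop/app/hive/conf/hive-site.xml}" p
  for p in hive.server2.thrift.bind.host hive.server2.thrift.port \
           hive.server2.thrift.min.worker.threads hive.server2.thrift.max.worker.threads; do
    printf '%s=%s\n' "$p" "$(get_hive_prop "$file" "$p")"
  done
}
```

Running `show_hs2_settings` on each of CDHNode3-5 should show that node's own hostname for the bind host.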

Starting hiveserver2. When Hive is used in production, serving it through HiveServer2 is strongly recommended, for several reasons:

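  1. No Hadoop or Hive client needs to be deployed on the application side;
  2. Unlike hive-cli, HiveServer2 does not expose HDFS and the metastore directly to users;
  3. It has an authentication mechanism and supports custom permission checks;
  4. Combined with ZooKeeper it provides HA, which addresses concurrency and load balancing for applications;
  5. Through JDBC it can be used from any language, which makes data exchange with applications easy;
  6. Since Hive 2.0, HiveServer2 provides a web UI.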

Start hiveserver2 on each of CDHNode3-5:

nohup hiveserver2 > /home/hadoop/app/hive/logs/hiveserver2.log 2>&1 &
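Starting the three instances one node at a time can also be scripted from a single host. A sketch, assuming passwordless ssh for the hadoop user (set up earlier) and that `~/.bash_profile` puts `hiveserver2` on PATH, as the command above implies. `start_hs2_all` is a hypothetical name. Because ssh runs a non-login shell, the profile is sourced explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start hiveserver2 in the background on each given node.
# Assumes passwordless ssh and that ~/.bash_profile sets PATH/JAVA_HOME.
start_hs2_all() {
  local node
  for node in "$@"; do
    # stdin, stdout and stderr are all redirected so ssh returns immediately
    # instead of waiting on the backgrounded server.
    ssh "$node" '. ~/.bash_profile; nohup hiveserver2 > /home/hadoop/app/hive/logs/hiveserver2.log 2>&1 < /dev/null &'
  done
}
```

Usage: `start_hs2_all CDHNode3 CDHNode4 CDHNode5`.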

Start the ZooKeeper CLI and check the registered hiveserver2 instances:

/home/hadoop/app/zookeeper/bin/zkCli.sh -server CDHNode1:2181,CDHNode2:2181,CDHNode3:2181
[zk: CDHNode1:2181,CDHNode2:2181,CDHNode3:2181(CONNECTED) 1] ls /hiveserver2
[serverUri=CDHNode4:10001;version=1.1.0-cdh5.14.2;sequence=0000000001, serverUri=CDHNode3:10001;version=1.1.0-cdh5.14.2;sequence=0000000002,serverUri=CDHNode5:10001;version=1.1.0-cdh5.14.2;sequence=0000000003]
[zk: CDHNode1:2181,CDHNode2:2181,CDHNode3:2181(CONNECTED) 2]
  • The hiveserver2 instances on all three hosts have registered.
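When checking registrations repeatedly, the host:port pairs can be extracted from the `ls /hiveserver2` listing. A sketch. `hs2_instances` is a hypothetical name, and it relies only on the `serverUri=host:port;...` znode naming shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of "ls /hiveserver2" on stdin and print
# one registered host:port per line.
hs2_instances() {
  # Each znode name starts with serverUri=HOST:PORT followed by ';'.
  grep -o 'serverUri=[^;]*' | cut -d= -f2
}
```

For example: `/home/hadoop/app/zookeeper/bin/zkCli.sh -server CDHNode1:2181 ls /hiveserver2 | hs2_instances`. This assumes this zkCli.sh runs a trailing command non-interactively, as it does in ZooKeeper 3.4.x.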

Verify hiveserver2 with beeline

hadoop@CDHNode3:/home/hadoop/app/hive/conf>beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/app/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Beeline version 1.1.0-cdh5.14.2 by Apache Hive
beeline> !connect jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
scan complete in 1ms
Connecting to jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2:
Enter password for jdbc:hive2://CDHNode1:2181,CDHNode2:2181,CDHNode3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2:
18/06/29 11:14:14 [main]: INFO jdbc.HiveConnection: Connected to CDHNode4:10001
Connected to: Apache Hive (version 1.1.0-cdh5.14.2)
Driver: Hive JDBC (version 1.1.0-cdh5.14.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://CDHNode1:2181,CDHNode2:2181,C>
0: jdbc:hive2://CDHNode1:2181,CDHNode2:2181,C> create database leffss;
INFO : Compiling command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33): create database leffss
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33); Time taken: 0.023 seconds
INFO : Executing command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33): create database leffss
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hadoop_20180629111515_c662a537-ec49-4380-8328-058a6b0f5c33); Time taken: 0.12 seconds
INFO : OK
No rows affected (0.163 seconds)
0: jdbc:hive2://CDHNode1:2181,CDHNode2:2181,C> show databases;
INFO : Compiling command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd): show databases
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd); Time taken: 0.025 seconds
INFO : Executing command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hadoop_20180629111717_a91cdf08-2431-47a5-be89-3926cb0731fd); Time taken: 0.046 seconds
INFO : OK
+----------------+--+
| database_name |
+----------------+--+
| default |
| leffss |
+----------------+--+
2 rows selected (0.117 seconds)

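Account authentication is not yet enabled for hiveserver2, so the username and password prompts are simply left empty. The data created this way has the following permissions in HDFS: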

hadoop@CDHNode1:/home/hadoop>hadoop fs -ls /hive/warehouse
Found 2 items
drwx-wx-wx - anonymous supergroup 0 2018-06-29 11:15 /hive/warehouse/leffss.db

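Using HiveServer2 like this is very dangerous, because anyone can connect and operate on Hive and HDFS data as a superuser. Enabling Hive user authentication will be covered later.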

Stopping hiveserver2: find the process ID of hiveserver2, then kill that ID.
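The manual find-and-kill step can be wrapped in a helper. This is a sketch. It assumes the process command line contains the HiveServer2 main class `org.apache.hive.service.server.HiveServer2`, and `hs2_pids` and `stop_hs2` are hypothetical names. Plain `kill` sends SIGTERM, a normal shutdown request, rather than SIGKILL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "PID ARGS" lines (as printed by ps -eo pid=,args=)
# on stdin and print the PIDs running the HiveServer2 main class.
# The escaped dots keep awk's own command line from matching itself.
hs2_pids() {
  awk '/org\.apache\.hive\.service\.server\.HiveServer2/ { print $1 }'
}

# Stop every hiveserver2 on this host; returns 1 if none is running.
stop_hs2() {
  local pids
  pids=$(ps -eo pid=,args= | hs2_pids)
  if [ -z "$pids" ]; then
    echo "hiveserver2 is not running" >&2
    return 1
  fi
  kill $pids   # unquoted on purpose: one argument per PID
}
```

Run `stop_hs2` on each of CDHNode3-5.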
