1. Host Planning

No.  Hostname  IP Address     Roles
1    nn-1      192.168.9.21   NameNode, mr-jobhistory, zookeeper, JournalNode
2    nn-2      192.168.9.22   Standby NameNode, JournalNode
3    dn-1      192.168.9.23   DataNode, JournalNode, zookeeper, ResourceManager, NodeManager
4    dn-2      192.168.9.24   DataNode, zookeeper, ResourceManager, NodeManager
5    dn-3      192.168.9.25   DataNode, NodeManager

Cluster notes:
(1) For clusters of 7 nodes or fewer, NameNode HA can be skipped.
(2) An HA cluster needs at least 3 JournalNodes; 5 is recommended. JournalNode is a lightweight service, so for locality two of them are co-located with the two NameNodes, and the remaining JournalNodes are spread across other nodes.
(3) An HA cluster needs at least 3 zookeeper nodes; 5 or 7 is recommended. zookeeper can be co-located with DataNodes.
(4) In an HA cluster, ResourceManager ideally gets a node of its own. For larger clusters with spare hosts, ResourceManager HA is worth considering.

2. Host Environment Setup

2.1 Configure the JDK


Uninstall OpenJDK:
  # Check the Java version
  [root@dtgr ~]# java -version
  java version "1.7.0_45"
  OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
  OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
  # Find the installed package
  [root@dtgr ~]# rpm -qa | grep java
  java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
  # Remove it
  [root@dtgr ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
  # Verify the removal
  [root@dtgr ~]# rpm -qa | grep java
  [root@dtgr ~]# java -version
  -bash: /usr/bin/java: No such file or directory

Install the JDK:
  # Move and extract the JDK tarball
  [root@dtgr java]# mkdir /usr/local/java
  [root@dtgr java]# mv jdk-7u79-linux-x64.tar.gz /usr/local/java
  [root@dtgr java]# cd /usr/local/java
  [root@dtgr java]# tar xvf jdk-7u79-linux-x64.tar.gz
  [root@dtgr java]# ls
  jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
  # Add environment variables
  [root@dtgr java]# vim /etc/profile
  [root@dtgr java]# tail /etc/profile
  export JAVA_HOME=/usr/local/java/jdk1.7.0_79
  export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
  export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
  export PATH=$JAVA_HOME/bin:$PATH
  # Reload the environment
  [root@dtgr ~]# source /etc/profile
  # Verify
  [root@dtgr ~]# java -version
  java version "1.7.0_79"
  Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
  Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
  [root@dtgr ~]# javac -version
  javac 1.7.0_79

2.2 Set the Hostname and Configure Name Resolution

Set the hostname on every node per the plan, and add every hostname to /etc/hosts.
Set the hostname:
  [root@dn-3 ~]# cat /etc/sysconfig/network
  NETWORKING=yes
  HOSTNAME=dn-3
  [root@dn-3 ~]# hostname dn-3

Configure /etc/hosts and distribute it to all nodes:
  [root@dn-3 ~]# cat /etc/hosts
  127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  192.168.9.21 nn-1
  192.168.9.22 nn-2
  192.168.9.23 dn-1
  192.168.9.24 dn-2
  192.168.9.25 dn-3
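Typing the host list by hand on every node invites typos; a minimal sketch that generates the entries from the section 1 plan into a staging file first (hosts.cluster is a hypothetical file name):

```shell
# Build the cluster host list from the section 1 plan into hosts.cluster,
# then append it to /etc/hosts on each node after a quick review.
HOSTS_FILE=hosts.cluster
: > "$HOSTS_FILE"
for entry in "192.168.9.21 nn-1" "192.168.9.22 nn-2" \
             "192.168.9.23 dn-1" "192.168.9.24 dn-2" "192.168.9.25 dn-3"; do
    echo "$entry" >> "$HOSTS_FILE"
done
# To apply on a node (as root): cat hosts.cluster >> /etc/hosts
```

Keeping the list in one generated file makes it easy to scp the same copy to all five nodes.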

2.3 Create the hadoop Account

The user and group are both hadoop, the password is hadoop, and the home directory is /hadoop:
  [root@dn-3 ~]# useradd -d /hadoop hadoop
  [root@dn-3 ~]# passwd hadoop

2.4 Configure NTP Time Synchronization

Use nn-1 as the time source; on the other nodes, edit /etc/ntp.conf:
#vi  /etc/ntp.conf
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
server nn-1

Enable the ntp service at boot:
#chkconfig ntpd on
Start the ntp service:
#service ntpd start

2.5 Disable the iptables Firewall and SELinux

(1) Disable iptables:
  [root@dn-3 ~]# service iptables stop
  [root@dn-3 ~]# chkconfig iptables off
  [root@dn-3 ~]# chkconfig --list | grep iptables
  iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off

(2) Disable SELinux (setenforce turns it off immediately; the config change makes it permanent across reboots):
  [root@dn-3 ~]# setenforce 0
  setenforce: SELinux is disabled
  [root@dn-3 ~]# vim /etc/sysconfig/selinux
  SELINUX=disabled

2.6 Configure Passwordless SSH Login

(1) Generate a key pair on every node.
On every node, switch to the hadoop user and generate a key pair, pressing Enter through all prompts:
  [hadoop@nn-1 ~]$ ssh-keygen -t rsa

(2) On nn-1, append each node's public key to the file authorized_keys:
  $ ssh <hostname> 'cat ./.ssh/id_rsa.pub' >> authorized_keys
Replace <hostname> with the actual hostname, and run the command on nn-1 once for every host, including nn-1 itself. For example:
  [hadoop@nn-1 ~]$ ssh nn-1 'cat ./.ssh/id_rsa.pub' >> authorized_keys
  hadoop@nn-1's password:

(3) Set permissions:
  [hadoop@nn-1 .ssh]$ chmod 644 authorized_keys

(4) Distribute authorized_keys to $HOME/.ssh/ on all nodes. For example:
  [hadoop@nn-1 .ssh]$ scp authorized_keys hadoop@nn-2:/hadoop/.ssh/
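Running the collect, chmod, and distribute steps once per host is tedious; one option is to generate the commands into a small script first, which also leaves a record of exactly what was run. A sketch, assuming the five hosts from the plan (push_keys.sh is a hypothetical name):

```shell
# Generate the per-host commands into push_keys.sh for review;
# afterwards run it on nn-1 as the hadoop user with: sh push_keys.sh
NODES="nn-1 nn-2 dn-1 dn-2 dn-3"
{
  # Step 2: collect every node's public key on nn-1.
  for h in $NODES; do
    echo "ssh $h 'cat ./.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys"
  done
  # Step 3: set permissions on the merged file.
  echo "chmod 644 ~/.ssh/authorized_keys"
  # Step 4: push the merged file back to every node.
  for h in $NODES; do
    echo "scp ~/.ssh/authorized_keys hadoop@$h:/hadoop/.ssh/"
  done
} > push_keys.sh
```

Each ssh/scp line still prompts for the hadoop password once per host, exactly as in the manual steps above.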

3. Install and Configure Hadoop


Note: make all configuration changes on nn-1 first, then distribute them to the other nodes in one batch.

3.1 Upload the hadoop and zookeeper Packages

Copy the packages into the /hadoop directory.
Extract the hadoop package: [hadoop@nn-1 ~]$ tar -xzvf hadoop2-js-0121.tar.gz

3.2 Edit hadoop-env.sh

  export JAVA_HOME=/usr/local/java/jdk1.7.0_79
  export HADOOP_HEAPSIZE=2000
  export HADOOP_NAMENODE_INIT_HEAPSIZE=10000
  export HADOOP_OPTS="-server $HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
  export HADOOP_NAMENODE_OPTS="-Xmx15000m -Xms15000m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

3.3 Edit core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://dpi</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/hadoop/hdfs/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>nn-1:2181,dn-1:2181,dn-2:2181</value>
  </property>
</configuration>
Note: per the plan in section 1 (and zoo.cfg in section 4.2), the zookeeper nodes are nn-1, dn-1 and dn-2, so the quorum lists those three hosts.

3.4 Edit hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>nn-1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/data,file:/hadoopdata/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>dpi</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.dpi</name>
    <value>nn-1,nn-2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.dpi.nn-1</name>
    <value>nn-1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.dpi.nn-1</name>
    <value>nn-1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.dpi.nn-2</name>
    <value>nn-2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.dpi.nn-2</name>
    <value>nn-2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.dpi.nn-1</name>
    <value>nn-1:53310</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.dpi.nn-2</name>
    <value>nn-2:53310</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nn-1:8485;nn-2:8485;dn-1:8485/dpi</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.dpi</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/hdfs/journal</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/hadoop/.ssh/id_rsa</value>
  </property>
</configuration>
Create the directories referenced in the configuration:
  mkdir -p /hadoop/hdfs/name
  mkdir -p /hadoop/hdfs/data
  mkdir -p /hadoop/hdfs/temp
  mkdir -p /hadoop/hdfs/journal
  chmod 755 /hadoop/hdfs
  mkdir -p /hadoopdata/hdfs/data
  chmod 755 /hadoopdata/hdfs

Change the owner and group of these directories to hadoop:hadoop.
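The directory creation, permissions, and ownership change have to be repeated on every node, so they are worth wrapping in one script. A minimal sketch: PREFIX defaults to the current directory so the script can be rehearsed safely (set PREFIX=/ and run as root on the real nodes), and the chown is skipped if the hadoop user from section 2.3 does not exist yet.

```shell
# Create the HDFS directories referenced in hdfs-site.xml and core-site.xml.
PREFIX=${PREFIX:-.}
for d in hadoop/hdfs/name hadoop/hdfs/data hadoop/hdfs/temp \
         hadoop/hdfs/journal hadoopdata/hdfs/data; do
    mkdir -p "$PREFIX/$d"
done
chmod 755 "$PREFIX/hadoop/hdfs" "$PREFIX/hadoopdata/hdfs"
# Hand ownership to the hadoop account (skipped if it doesn't exist yet).
if id hadoop >/dev/null 2>&1; then
    chown -R hadoop:hadoop "$PREFIX/hadoop" "$PREFIX/hadoopdata"
fi
```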


3.5 Edit mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>nn-1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nn-1:19888</value>
  </property>
</configuration>


3.6 Edit yarn-site.xml

Enable YARN HA; per the plan, dn-1 and dn-2 are the ResourceManager nodes:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>dn-1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>dn-2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>nn-1:2181,dn-1:2181,dn-2:2181</value>
    <description>For multiple zk services, separate them with commas</description>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-ha</value>
  </property>
</configuration>
Note: as in core-site.xml, the zk-address lists the three zookeeper hosts from the plan (nn-1, dn-1, dn-2).

3.7 Edit slaves

Add all the DataNode hosts to the slaves file:
  dn-1
  dn-2
  dn-3


3.8 Edit yarn-env.sh

  # some Java parameters
  # export JAVA_HOME=/home/y/libexec/jdk1.6.0/
  if [ "$JAVA_HOME" != "" ]; then
    #echo "run java in $JAVA_HOME"
    JAVA_HOME=/usr/local/java/jdk1.7.0_79
  fi
  JAVA_HEAP_MAX=-Xmx15000m
  YARN_HEAPSIZE=15000
  export YARN_RESOURCEMANAGER_HEAPSIZE=5000
  export YARN_TIMELINESERVER_HEAPSIZE=10000
  export YARN_NODEMANAGER_HEAPSIZE=10000

3.9 Distribute the Configured hadoop Directory to All Nodes

  [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@nn-2:/hadoop
  [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-1:/hadoop
  [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-2:/hadoop
  [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-3:/hadoop

4. Install and Configure zookeeper

Work under the /hadoop directory. Per the plan, the three zookeeper nodes are nn-1, dn-1 and dn-2.
Configure zookeeper on nn-1 first, then distribute it to the three zookeeper nodes.

4.1 Upload and Extract zookeeper on nn-1


4.2 Edit /hadoop/zookeeper/conf/zoo.cfg

  dataDir=/hadoop/zookeeper/data/
  dataLogDir=/hadoop/zookeeper/log/
  # the port at which the clients will connect
  clientPort=2181
  server.1=nn-1:2887:3887
  server.2=dn-1:2888:3888
  server.3=dn-2:2889:3889

4.3 Distribute the Configured zookeeper Directory from nn-1 to the Other Nodes

  [hadoop@nn-1 ~]$ scp -rp zookeeper hadoop@dn-1:/hadoop
  [hadoop@nn-1 ~]$ scp -rp zookeeper hadoop@dn-2:/hadoop

4.4 Create the Data Directories on All zk Nodes

  [hadoop@dn-1 ~]$ mkdir /hadoop/zookeeper/data/
  [hadoop@dn-1 ~]$ mkdir /hadoop/zookeeper/log/

4.5 Create myid

On every zk node, change to /hadoop/zookeeper/data and create a myid file.
Note: the myid content is the number after "server." in zoo.cfg (i.e. 1 on nn-1, 2 on dn-1, 3 on dn-2).
On nn-1:
  [hadoop@nn-1 data]$ echo 1 > /hadoop/zookeeper/data/myid

Create the myid file on the other zk nodes in the same way.
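Since each myid must match its server.N number in zoo.cfg exactly, a small helper that derives the id from the hostname removes one chance for a mismatch. A sketch (zk_myid is a hypothetical helper name):

```shell
# zk_myid HOST DATADIR -- write DATADIR/myid with HOST's id per zoo.cfg
# (server.1=nn-1, server.2=dn-1, server.3=dn-2).
zk_myid() {
    case "$1" in
        nn-1) id=1 ;;
        dn-1) id=2 ;;
        dn-2) id=3 ;;
        *) echo "not a zk node: $1" >&2; return 1 ;;
    esac
    mkdir -p "$2" && echo "$id" > "$2/myid"
}
# On each zk node: zk_myid "$(hostname -s)" /hadoop/zookeeper/data
```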


4.6 Set Environment Variables

  $ echo "export ZOOKEEPER_HOME=/hadoop/zookeeper" >> $HOME/.bash_profile
  $ echo "export PATH=\$ZOOKEEPER_HOME/bin:\$PATH" >> $HOME/.bash_profile
  $ source $HOME/.bash_profile
Note: the backslash before $ZOOKEEPER_HOME matters; without it the variable expands (to nothing) when the echo runs, not when the profile is sourced.


5. Cluster Startup

5.1 Start zookeeper

Per the plan, the zk nodes are nn-1, dn-1 and dn-2; start zk on each of these three nodes.

Start command:
  [hadoop@nn-1 ~]$ /hadoop/zookeeper/bin/zkServer.sh start
  JMX enabled by default
  Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
  Starting zookeeper ... STARTED

Check the processes; you should see QuorumPeerMain:
  [hadoop@nn-1 ~]$ jps
  9382 QuorumPeerMain
  9407 Jps

Check the status; "Mode: follower" means this node is a zk follower:
  [hadoop@nn-1 ~]$ /hadoop/zookeeper/bin/zkServer.sh status
  JMX enabled by default
  Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
  Mode: follower

"Mode: leader" means this node is the zk leader:
  [hadoop@dn-1 data]$ /hadoop/zookeeper/bin/zkServer.sh status
  JMX enabled by default
  Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
  Mode: leader

5.2 Format the zookeeper Cluster (once only, on nn-1)

  [hadoop@nn-1 ~]$ /hadoop/hadoop/bin/hdfs zkfc -formatZK
There is one interactive prompt; answer Y.

Open a zk shell and check that the znode was created:
  [hadoop@nn-1 bin]$ ./zkCli.sh

5.3 Start zkfc (on nn-1 and nn-2)

  [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
  starting zkfc, logging to /hadoop/hadoop/logs/hadoop-hadoop-zkfc-nn-1.out

With jps you should see the DFSZKFailoverController process:
  [hadoop@nn-1 ~]$ jps
  9681 Jps
  9638 DFSZKFailoverController
  9382 QuorumPeerMain

5.4 Start the journalnodes

Per the plan, the journalnode hosts are nn-1, nn-2 and dn-1; start the service on each of them with:
  [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
  starting journalnode, logging to /hadoop/hadoop/logs/hadoop-hadoop-journalnode-nn-1.out

With jps you should see the JournalNode process:
  [hadoop@nn-1 ~]$ jps
  9714 JournalNode
  9638 DFSZKFailoverController
  9382 QuorumPeerMain
  9762 Jps
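Every startup step ends with the same manual jps check, so it can be scripted. The helper below only inspects jps-style text, which means it can be rehearsed on captured output and used identically for JournalNode, NameNode, DataNode and the rest (has_daemon is a hypothetical name):

```shell
# has_daemon NAME JPS_OUTPUT -- succeed if NAME appears as a process name
# in the given `jps` output ("<pid> <name>" per line).
has_daemon() {
    printf '%s\n' "$2" | awk -v n="$1" '$2 == n { found = 1 } END { exit !found }'
}
# On a live node: has_daemon JournalNode "$(jps)" || echo "JournalNode missing"
```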

5.5 Format the NameNode (on nn-1)

  [hadoop@nn-1 ~]$ /hadoop/hadoop/bin/hadoop namenode -format

Check the log output to confirm the format succeeded.

5.6 Start the NameNode (on nn-1)

  [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
  starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-1.out
With jps you should see the NameNode process:
  [hadoop@nn-1 ~]$ jps
  9714 JournalNode
  9638 DFSZKFailoverController
  9382 QuorumPeerMain
  10157 NameNode
  10269 Jps

5.7 Bootstrap the Standby NameNode (on nn-2)

  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs namenode -bootstrapStandby
Check the log output to confirm the bootstrap succeeded.

5.8 Start the NameNode (on nn-2)

  [hadoop@nn-2 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
  starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-2.out
With jps you should see the NameNode process:
  [hadoop@nn-2 ~]$ jps
  53990 NameNode
  54083 Jps
  53824 JournalNode
  53708 DFSZKFailoverController

5.9 Start the DataNodes (on dn-1 through dn-3)

  [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start datanode
With jps you should see the DataNode process:
  [hadoop@dn-1 temp]$ jps
  57007 Jps
  56927 DataNode
  56223 QuorumPeerMain


5.10 Start the ResourceManagers

Per the plan, ResourceManager runs in HA on dn-1 and dn-2; start it on both nodes:
  [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
  starting resourcemanager, logging to /hadoop/hadoop/logs/yarn-hadoop-resourcemanager-dn-1.out

With jps you should see the ResourceManager process:
  [hadoop@dn-1 ~]$ jps
  57173 QuorumPeerMain
  58317 Jps
  57283 JournalNode
  58270 ResourceManager
  58149 DataNode

5.11 Start the JobHistory Server

Per the plan, the jobhistory service runs on nn-1; start it with:
  [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
  starting historyserver, logging to /hadoop/hadoop/logs/mapred-hadoop-historyserver-nn-1.out

With jps you should see the JobHistoryServer process:
  [hadoop@nn-1 ~]$ jps
  11210 JobHistoryServer
  9714 JournalNode
  9638 DFSZKFailoverController
  9382 QuorumPeerMain
  11039 NameNode
  11303 Jps

5.12 Start the NodeManagers

Per the plan, dn-1, dn-2 and dn-3 are the NodeManager hosts; start NodeManager on each of them:
  [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/yarn-daemon.sh start nodemanager
  starting nodemanager, logging to /hadoop/hadoop/logs/yarn-hadoop-nodemanager-dn-1.out

With jps you should see the NodeManager process:
  [hadoop@dn-1 ~]$ jps
  58559 NodeManager
  57173 QuorumPeerMain
  58668 Jps
  57283 JournalNode
  58270 ResourceManager
  58149 DataNode


6. Post-installation Checks and Verification

6.1 HDFS Commands

Check a NameNode's HA state:
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -getServiceState nn-1

Manual failover, switching the active NameNode from nn-1 to nn-2:
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -failover nn-1 nn-2

NameNode health check:
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-1
After killing one of the NameNodes, run the health check again and compare.


List all DataNodes:
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report | more

List live DataNodes:
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report -live
  17/03/01 22:49:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Configured Capacity: 224954695680 (209.51 GB)
  Present Capacity: 180557139968 (168.16 GB)
  DFS Remaining: 179963428864 (167.60 GB)
  DFS Used: 593711104 (566.21 MB)
  DFS Used%: 0.33%
  Under replicated blocks: 2
  Blocks with corrupt replicas: 0
  Missing blocks: 0
  -------------------------------------------------
  Live datanodes (3):
  Name: 192.168.9.23:50010 (dn-1)
  Hostname: dn-1
  Rack: /rack2
  Decommission Status : Normal
  Configured Capacity: 74984898560 (69.84 GB)
  DFS Used: 197902336 (188.73 MB)
  Non DFS Used: 14869356544 (13.85 GB)
  DFS Remaining: 59917639680 (55.80 GB)
  DFS Used%: 0.26%
  DFS Remaining%: 79.91%
  Configured Cache Capacity: 0 (0 B)
  Cache Used: 0 (0 B)
  Cache Remaining: 0 (0 B)
  Cache Used%: 100.00%
  Cache Remaining%: 0.00%
  Xceivers: 1
  Last contact: Wed Mar 01 22:49:42 CST 2017

List dead DataNodes:
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report -dead

Health-check both NameNodes (only the NativeCodeLoader warning and no error means the check passed):
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-2
  17/03/01 22:55:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-1
  17/03/01 22:55:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


6.2 YARN Commands

Check the ResourceManager states:
  [hadoop@dn-1 hadoop]$ yarn rmadmin -getServiceState rm1
  active
  [hadoop@dn-1 hadoop]$ yarn rmadmin -getServiceState rm2
  standby
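Scripts that need to talk to the active ResourceManager can resolve it by probing both ids with the same rmadmin command. A sketch (active_rm is a hypothetical helper; on a machine without a cluster, `yarn` can be stubbed with a shell function to exercise the logic):

```shell
# active_rm -- print the id (rm1 or rm2) of the active ResourceManager.
active_rm() {
    for id in rm1 rm2; do
        if [ "$(yarn rmadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; then
            echo "$id"
            return 0
        fi
    done
    return 1
}
```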

List all YARN nodes:
  [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -all -list
  17/03/01 23:06:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Total Nodes:3
  Node-Id Node-State Node-Http-Address Number-of-Running-Containers
  dn-2:55506 RUNNING dn-2:8042 0
  dn-1:56447 RUNNING dn-1:8042 0
  dn-3:37533 RUNNING dn-3:8042 0

List healthy (RUNNING) YARN nodes:
  [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -list
  17/03/01 23:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Total Nodes:3
  Node-Id Node-State Node-Http-Address Number-of-Running-Containers
  dn-2:55506 RUNNING dn-2:8042 0
  dn-1:56447 RUNNING dn-1:8042 0
  dn-3:37533 RUNNING dn-3:8042 0

Show details for a specific node:
/hadoop/hadoop/bin/yarn node -status <NodeId>
  [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -status dn-2:55506
  17/03/01 23:08:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Node Report :
  Node-Id : dn-2:55506
  Rack : /default-rack
  Node-State : RUNNING
  Node-Http-Address : dn-2:8042
  Last-Health-Update : Wed 01/Mar/17 11:06:21:373CST
  Health-Report :
  Containers : 0
  Memory-Used : 0MB
  Memory-Capacity : 8192MB
  CPU-Used : 0 vcores
  CPU-Capacity : 8 vcores
  Node-Labels :

List the currently running MapReduce applications:
  [hadoop@dn-2 ~]$ /hadoop/hadoop/bin/yarn application -list
  17/03/01 23:10:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
  Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
  application_1488375590901_0004 QuasiMonteCarlo MAPREDUCE hadoop default RUNNING UNDEFINED


6.3 Run the Bundled Example

  [hadoop@dn-1 ~]$ cd hadoop/
  [hadoop@dn-1 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200
  Number of Maps = 2
  Samples per Map = 200
  17/02/28 01:51:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Wrote input for Map #0
  Wrote input for Map #1
  Starting Job
  17/02/28 01:51:15 INFO input.FileInputFormat: Total input paths to process : 2
  17/02/28 01:51:15 INFO mapreduce.JobSubmitter: number of splits:2
  17/02/28 01:51:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488216892564_0001
  17/02/28 01:51:16 INFO impl.YarnClientImpl: Submitted application application_1488216892564_0001
  17/02/28 01:51:16 INFO mapreduce.Job: The url to track the job: http://dn-1:8088/proxy/application_1488216892564_0001/
  17/02/28 01:51:16 INFO mapreduce.Job: Running job: job_1488216892564_0001
  17/02/28 01:51:24 INFO mapreduce.Job: Job job_1488216892564_0001 running in uber mode : false
  17/02/28 01:51:24 INFO mapreduce.Job: map 0% reduce 0%
  17/02/28 01:51:38 INFO mapreduce.Job: map 100% reduce 0%
  17/02/28 01:51:49 INFO mapreduce.Job: map 100% reduce 100%
  17/02/28 01:51:49 INFO mapreduce.Job: Job job_1488216892564_0001 completed successfully
  17/02/28 01:51:50 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=50
    FILE: Number of bytes written=326922
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=510
    HDFS: Number of bytes written=215
    HDFS: Number of read operations=11
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
  Job Counters
    Launched map tasks=2
    Launched reduce tasks=1
    Data-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=25604
    Total time spent by all reduces in occupied slots (ms)=7267
    Total time spent by all map tasks (ms)=25604
    Total time spent by all reduce tasks (ms)=7267
    Total vcore-seconds taken by all map tasks=25604
    Total vcore-seconds taken by all reduce tasks=7267
    Total megabyte-seconds taken by all map tasks=26218496
    Total megabyte-seconds taken by all reduce tasks=7441408
  Map-Reduce Framework
    Map input records=2
    Map output records=4
    Map output bytes=36
    Map output materialized bytes=56
    Input split bytes=274
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=56
    Reduce input records=4
    Reduce output records=0
    Spilled Records=8
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=419
    CPU time spent (ms)=6940
    Physical memory (bytes) snapshot=525877248
    Virtual memory (bytes) snapshot=2535231488
    Total committed heap usage (bytes)=260186112
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=236
  File Output Format Counters
    Bytes Written=97
  Job Finished in 35.466 seconds
  Estimated value of Pi is 3.17000000000000000000

6.4 View the NameNode Web UIs

The links are http://192.168.9.21:50070/ and http://192.168.9.22:50070/ respectively (the HTTP addresses configured in hdfs-site.xml).

192.168.9.21 and 192.168.9.22 are the addresses of the two NameNodes (active and standby).

6.5 Verify NameNode HA Failover

Kill the active NameNode process on nn-1 and verify that the NameNode on nn-2 switches from standby to active.

6.6 View the RM Web UI

On the Nodes page, 192.168.9.23 is the node running the active ResourceManager service.

Run a test job and check whether YARN HA fails over automatically:
  [hadoop@dn-2 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200

While the job is running, kill the ResourceManager process on rm1 and check that the failover works.
The HA state of the standby RM becomes active.

In the log below, the job hits an error while rm1 is down, retries ("Trying to fail over immediately"), and ultimately completes successfully:
  [hadoop@dn-2 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200
  Number of Maps = 2
  Samples per Map = 200
  17/02/28 02:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Wrote input for Map #0
  Wrote input for Map #1
  Starting Job
  17/02/28 02:11:12 INFO input.FileInputFormat: Total input paths to process : 2
  17/02/28 02:11:12 INFO mapreduce.JobSubmitter: number of splits:2
  17/02/28 02:11:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488216892564_0002
  17/02/28 02:11:13 INFO impl.YarnClientImpl: Submitted application application_1488216892564_0002
  17/02/28 02:11:13 INFO mapreduce.Job: The url to track the job: http://dn-1:8088/proxy/application_1488216892564_0002/
  17/02/28 02:11:13 INFO mapreduce.Job: Running job: job_1488216892564_0002
  17/02/28 02:11:18 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over immediately.
  java.io.EOFException: End of File Exception between local host is: "dn-2/192.168.9.24"; destination host is: "dn-1":8032; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:153)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:322)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:422)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:575)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:610)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1355)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1317)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1071)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
  17/02/28 02:11:18 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
  17/02/28 02:11:18 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 40859ms.
  java.net.ConnectException: Call From dn-2/192.168.9.24 to dn-2:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:153)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:322)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:422)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:575)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:610)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1355)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1317)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
  114. at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
  115. at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
  116. at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
  117. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  118. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  119. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  120. at java.lang.reflect.Method.invoke(Method.java:606)
  121. at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
  122. at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  123. Caused by: java.net.ConnectException: 拒绝连接
  124. at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  125. at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
  126. at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  127. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
  128. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
  129. at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
  130. at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
  131. at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
  132. at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
  133. at org.apache.hadoop.ipc.Client.call(Client.java:1438)
  134. ... 43 more
  135. 17/02/28 02:11:59 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
  136. 17/02/28 02:11:59 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm1 after 2 fail over attempts. Trying to fail over after sleeping for 17213ms.
  137. java.net.ConnectException: Call From dn-2/192.168.9.24 to dn-1:8032 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
  138. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  139. at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  140. at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  141. at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  142. at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
  143. at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
  144. at org.apache.hadoop.ipc.Client.call(Client.java:1472)
  145. at org.apache.hadoop.ipc.Client.call(Client.java:1399)
  146. at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
  147. at com.sun.proxy.$Proxy14.getApplicationReport(Unknown Source)
  148. at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
  149. at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
  150. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  151. at java.lang.reflect.Method.invoke(Method.java:606)
  152. at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
  153. at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  154. at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
  155. at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
  156. at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:302)
  157. at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:153)
  158. at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:322)
  159. at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:422)
  160. at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:575)
  161. at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
  162. at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
  163. at java.security.AccessController.doPrivileged(Native Method)
  164. at javax.security.auth.Subject.doAs(Subject.java:415)
  165. at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  166. at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
  167. at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:610)
  168. at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1355)
  169. at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1317)
  170. at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
  171. at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
  172. at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  173. at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
  174. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  175. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  176. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  177. at java.lang.reflect.Method.invoke(Method.java:606)
  178. at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
  179. at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
  180. at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
  181. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  182. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  183. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  184. at java.lang.reflect.Method.invoke(Method.java:606)
  185. at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
  186. at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  187. Caused by: java.net.ConnectException: 拒绝连接
  188. at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  189. at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
  190. at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  191. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
  192. at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
  193. at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
  194. at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
  195. at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
  196. at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
  197. at org.apache.hadoop.ipc.Client.call(Client.java:1438)
  198. ... 42 more
  199. 17/02/28 02:12:16 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
  200. 17/02/28 02:12:18 INFO mapreduce.Job: Job job_1488216892564_0002 running in uber mode : false
  201. 17/02/28 02:12:18 INFO mapreduce.Job: map 0% reduce 0%
  202. 17/02/28 02:12:22 INFO mapreduce.Job: map 100% reduce 0%
  203. 17/02/28 02:12:28 INFO mapreduce.Job: map 100% reduce 100%
  204. 17/02/28 02:12:28 INFO mapreduce.Job: Job job_1488216892564_0002 completed successfully
  205. 17/02/28 02:12:28 INFO mapreduce.Job: Counters: 49
  206. File System Counters
  207. FILE: Number of bytes read=50
  208. FILE: Number of bytes written=326931
  209. FILE: Number of read operations=0
  210. FILE: Number of large read operations=0
  211. FILE: Number of write operations=0
  212. HDFS: Number of bytes read=510
  213. HDFS: Number of bytes written=215
  214. HDFS: Number of read operations=11
  215. HDFS: Number of large read operations=0
  216. HDFS: Number of write operations=3
  217. Job Counters
  218. Launched map tasks=2
  219. Launched reduce tasks=1
  220. Data-local map tasks=2
  221. Total time spent by all maps in occupied slots (ms)=22713
  222. Total time spent by all reduces in occupied slots (ms)=3213
  223. Total time spent by all map tasks (ms)=22713
  224. Total time spent by all reduce tasks (ms)=3213
  225. Total vcore-seconds taken by all map tasks=22713
  226. Total vcore-seconds taken by all reduce tasks=3213
  227. Total megabyte-seconds taken by all map tasks=23258112
  228. Total megabyte-seconds taken by all reduce tasks=3290112
  229. Map-Reduce Framework
  230. Map input records=2
  231. Map output records=4
  232. Map output bytes=36
  233. Map output materialized bytes=56
  234. Input split bytes=274
  235. Combine input records=0
  236. Combine output records=0
  237. Reduce input groups=2
  238. Reduce shuffle bytes=56
  239. Reduce input records=4
  240. Reduce output records=0
  241. Spilled Records=8
  242. Shuffled Maps =2
  243. Failed Shuffles=0
  244. Merged Map outputs=2
  245. GC time elapsed (ms)=233
  246. CPU time spent (ms)=12680
  247. Physical memory (bytes) snapshot=517484544
  248. Virtual memory (bytes) snapshot=2548441088
  249. Total committed heap usage (bytes)=260186112
  250. Shuffle Errors
  251. BAD_ID=0
  252. CONNECTION=0
  253. IO_ERROR=0
  254. WRONG_LENGTH=0
  255. WRONG_MAP=0
  256. WRONG_REDUCE=0
  257. File Input Format Counters
  258. Bytes Read=236
  259. File Output Format Counters
  260. Bytes Written=97
  261. Job Finished in 76.447 seconds
  262. Estimated value of Pi is 3.17000000000000000000
  263. [hadoop@dn-2 hadoop]$


7、Install Spark


Plan: install a Spark cluster on top of the existing Hadoop cluster:
Master node: nn-1
Worker nodes: nn-2, dn-1, dn-2, dn-3.

7.1 Install and configure Scala

Upload the Scala package to the /hadoop directory on nn-1 and extract it:
  1. [hadoop@nn-1 ~]$ tar -xzvf scala-2.11.7.tgz
The environment variables are configured together later.

7.2 Install Spark


Upload the package spark-1.6.0-bin-hadoop2.6.tgz to the /hadoop directory on nn-1 and extract it:
  1. [hadoop@nn-1 ~]$ tar -xzvf spark-1.6.0-bin-hadoop2.6.tgz

Go into the directory /hadoop/spark-1.6.0-bin-hadoop2.6/conf
and create spark-env.sh and slaves from their templates:
  1. [hadoop@nn-1 conf]$ pwd
  2. /hadoop/spark-1.6.0-bin-hadoop2.6/conf
  3. [hadoop@nn-1 conf]$ cp spark-env.sh.template spark-env.sh
  4. [hadoop@nn-1 conf]$ cp slaves.template slaves
Edit spark-env.sh and add the following:
  1. export JAVA_HOME=/usr/local/java/jdk1.7.0_79
  2. export SCALA_HOME=/hadoop/scala-2.11.7
  3. export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
  4. export SPARK_MASTER_IP=nn-1
  5. export SPARK_WORKER_MEMORY=2g
  6. export HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop
Set SPARK_WORKER_MEMORY according to the memory actually available on the worker hosts.

Edit slaves and add the following:
  1. nn-2
  2. dn-1
  3. dn-2
  4. dn-3
The slaves file lists the worker nodes.

7.3 Configure environment variables

  1. [hadoop@nn-1 ~]$ vim .bash_profile
Append the following:
  1. export HADOOP_HOME=/hadoop/hadoop
  2. export SCALA_HOME=/hadoop/scala-2.11.7
  3. export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
  4. export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
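Once the exports are in place, a quick sanity check is to reload the profile and confirm the Spark directories landed on PATH. The sketch below mirrors the exports above (adjust the paths if your layout differs):

```shell
# Re-apply the profile exports, then verify the Spark bin directory is on PATH.
# Paths mirror the .bash_profile lines above; they are assumptions about your layout.
export HADOOP_HOME=/hadoop/hadoop
export SCALA_HOME=/hadoop/scala-2.11.7
export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
echo "$PATH" | grep -q "$SPARK_HOME/bin" && echo "spark on PATH"
```

On the cluster itself this is simply `source ~/.bash_profile` followed by `which spark-submit`.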

7.4 Distribute the configured Scala and Spark directories to the other nodes

  1. [hadoop@nn-1 bin]$ cd /hadoop
  2. [hadoop@nn-1 ~]$ scp -rp spark-1.6.0-bin-hadoop2.6 hadoop@dn-1:/hadoop
  3. [hadoop@nn-1 ~]$ scp -rp scala-2.11.7 hadoop@dn-1:/hadoop
The commands above copy to dn-1 only; repeat them for the remaining worker nodes (nn-2, dn-2, dn-3).
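A small loop over the worker hostnames from the plan in section 1 saves retyping the scp commands, assuming passwordless SSH for the hadoop user is already set up. Shown here as a dry run with a leading echo:

```shell
# Dry run: print the copy command for every worker node.
# Remove the leading 'echo' to actually perform the copies
# (requires passwordless SSH from nn-1 to each host).
for host in nn-2 dn-1 dn-2 dn-3; do
  echo scp -rp /hadoop/spark-1.6.0-bin-hadoop2.6 "hadoop@$host:/hadoop"
  echo scp -rp /hadoop/scala-2.11.7 "hadoop@$host:/hadoop"
done
```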

7.5 Start the Spark cluster

  1. [hadoop@nn-1 ~]$ /hadoop/spark-1.6.0-bin-hadoop2.6/sbin/start-all.sh

Check the processes on nn-1 and on the slave nodes.
On nn-1 you can see the Master process:
  1. [hadoop@nn-1 ~]$ jps
  2. 2473 JournalNode
  3. 2541 NameNode
  4. 4401 Jps
  5. 2399 DFSZKFailoverController
  6. 2687 JobHistoryServer
  7. 2775 Master
  8. 2351 QuorumPeerMain

On the slave nodes you can see the Worker process:
  1. [hadoop@dn-1 ~]$ jps
  2. 2522 NodeManager
  3. 3449 Jps
  4. 2007 QuorumPeerMain
  5. 2141 DataNode
  6. 2688 Worker
  7. 2061 JournalNode
  8. 2258 ResourceManager

View the Spark master web UI (by default at http://nn-1:8080; the screenshot is omitted here).

7.6 Run a test job

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn --deploy-mode cluster \
--driver-memory 100M \
--executor-memory 200M \
--executor-cores 1 \
--queue default \
lib/spark-examples*.jar 10

Or, letting the driver and executor memory take their defaults:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn --deploy-mode cluster \
--executor-cores 1 \
--queue default \
lib/spark-examples*.jar 10
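SparkPi estimates π by Monte Carlo sampling: each task throws random points at the unit square and counts the fraction that falls inside the quarter circle, which is why the earlier run printed an approximate value (3.17) rather than π. A minimal local sketch of the same arithmetic:

```python
# Local sketch of the Monte Carlo estimate that SparkPi performs per partition.
import random

random.seed(42)          # fixed seed so the run is repeatable
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_estimate = 4.0 * inside / n
print(pi_estimate)       # close to 3.14, but noisy, like the 3.17 above
```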

8、Configure rack awareness

Add the following to /hadoop/hadoop/etc/hadoop/core-site.xml on nn-1 and nn-2 (topology.script.file.name is the legacy key name; Hadoop 2.x maps it to net.topology.script.file.name, so either works):
  1. <property>
  2. <name>topology.script.file.name</name>
  3. <value>/hadoop/hadoop/etc/hadoop/RackAware.py</value>
  4. </property>
Create the file /hadoop/hadoop/etc/hadoop/RackAware.py with the following content:
  1. #!/usr/bin/python
  2. #-*-coding:UTF-8 -*-
  3. import sys
  4. rack = {"dn-1":"rack2",
  5.         "dn-2":"rack1",
  6.         "dn-3":"rack1",
  7.         "192.168.9.23":"rack2",
  8.         "192.168.9.24":"rack1",
  9.         "192.168.9.25":"rack1",
  10.         }
  11. if __name__=="__main__":
  12.     print "/" + rack.get(sys.argv[1],"rack0")
Note that Hadoop may pass more than one host per invocation of the topology script; this minimal version only resolves the first argument.
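The rack table can be sanity-checked off-cluster before copying it to the NameNodes. The sketch below re-implements the lookup in Python 3 syntax (the script above targets the system Python 2) and returns one path per host:

```python
# Python 3 re-implementation of the RackAware.py lookup, for a quick local check.
rack = {
    "dn-1": "rack2", "dn-2": "rack1", "dn-3": "rack1",
    "192.168.9.23": "rack2", "192.168.9.24": "rack1", "192.168.9.25": "rack1",
}

def resolve(hosts):
    # One rack path per host; unknown hosts fall back to the default /rack0.
    return ["/" + rack.get(h, "rack0") for h in hosts]

print(resolve(["dn-1", "192.168.9.25", "some-new-node"]))
# → ['/rack2', '/rack1', '/rack0']
```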
Make the script executable:
  1. [root@nn-1 hadoop]# chmod +x RackAware.py
  2. [root@nn-1 hadoop]# ll RackAware.py
  3. -rwxr-xr-x 1 hadoop hadoop 294 3月 1 21:24 RackAware.py

Restart the NameNode service on nn-1 and nn-2:
  1. [hadoop@nn-1 ~]$ hadoop-daemon.sh stop namenode
  2. stopping namenode
  3. [hadoop@nn-1 ~]$ hadoop-daemon.sh start namenode
  4. starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-1.out

Check the NameNode log:
  1. [root@nn-1 logs]# pwd
  2. /hadoop/hadoop/logs
  3. [root@nn-1 logs]# vim hadoop-hadoop-namenode-nn-1.log

Use the following command to print the topology:
  1. [hadoop@dn-3 ~]$ hdfs dfsadmin -printTopology
  2. 17/03/02 00:21:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  3. Rack: /rack1
  4. 192.168.9.24:50010 (dn-2)
  5. 192.168.9.25:50010 (dn-3)
  6. Rack: /rack2
  7. 192.168.9.23:50010 (dn-1)






