Hadoop 2.x Cluster Setup

For details that repeat from earlier material, see the Hadoop 1.x fully distributed cluster deployment guide.

1 HADOOP Cluster Setup

1.1 Cluster Overview

A HADOOP cluster is really two clusters: an HDFS cluster and a YARN cluster. The two are logically separate but are usually co-located on the same physical machines.

  • HDFS cluster: stores massive data sets; its main roles are NameNode / DataNode
  • YARN cluster: schedules resources for computation over that data; its main roles are ResourceManager / NodeManager

This walkthrough builds a 5-node cluster with the following role assignment:

Node   Role                          IP
node1  NameNode, SecondaryNameNode   192.168.33.200
node2  ResourceManager               192.168.33.201
node3  DataNode, NodeManager         192.168.33.202
node4  DataNode, NodeManager         192.168.33.203
node5  DataNode, NodeManager         192.168.33.204

Deployment diagram: (figure not reproduced in this copy)

1.2 Server Preparation

This walkthrough uses virtual machines as the servers for the HADOOP cluster. Software and versions:

★ Parallels Desktop 12

★ CentOS 6.5 64-bit

1.3 Network Setup

  • Use NAT networking
  • Gateway address: 192.168.33.1
  • IP addresses for the 5 server nodes (a static-IP sketch follows this list):
    • 192.168.33.200
    • 192.168.33.201
    • 192.168.33.202
    • 192.168.33.203
    • 192.168.33.204
  • Subnet mask: 255.255.255.0
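A minimal static-IP sketch for CentOS 6, shown for node1 and assuming the NIC is eth0 (the device name and per-node IPADDR are assumptions, not from the original):

vi /etc/sysconfig/network-scripts/ifcfg-eth0

# Static addressing; change IPADDR on each node.
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.33.200
NETMASK=255.255.255.0
GATEWAY=192.168.33.1

Apply with service network restart, and repeat on each node with its own IPADDR.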

1.4 Server System Configuration

  • Add a hadoop user
  • Grant the hadoop user sudoer privileges
  • Set the hostnames:
    • node1
    • node2
    • node3
    • node4
    • node5
  • Configure the internal hostname mappings:
    • 192.168.33.200  node1
    • 192.168.33.201  node2
    • 192.168.33.202  node3
    • 192.168.33.203  node4
    • 192.168.33.204  node5
  • Configure passwordless SSH login
  • Configure the firewall (a command sketch for all of these steps follows this list)
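A command sketch for the steps above, run as root on each node unless noted (the choice to disable iptables outright is an assumption suitable only for a private lab cluster):

# Create the hadoop user and grant sudo (editing via visudo is safer).
useradd hadoop
passwd hadoop
echo 'hadoop ALL=(ALL) ALL' >> /etc/sudoers

# Set the hostname; repeat per node with node2..node5.
sed -i 's/^HOSTNAME=.*/HOSTNAME=node1/' /etc/sysconfig/network
hostname node1

# Internal hostname mapping, identical on every node.
cat >> /etc/hosts <<'EOF'
192.168.33.200 node1
192.168.33.201 node2
192.168.33.202 node3
192.168.33.203 node4
192.168.33.204 node5
EOF

# As the hadoop user on node1: passwordless SSH to every node.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for h in node1 node2 node3 node4 node5; do ssh-copy-id hadoop@$h; done

# Firewall: open the Hadoop ports, or disable it on a private lab network.
service iptables stop
chkconfig iptables off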

1.5 Environment Installation

  • Upload the JDK package
  • Plan the install directory: /home/hadoop/apps/jdk_1.7.65
  • Unpack the package
  • Configure environment variables in /etc/profile (see the sketch below)
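A minimal sketch, assuming the uploaded archive is jdk-7u65-linux-x64.tar.gz (the file name is an assumption; it unpacks to jdk1.7.0_65, which the "jdk_1.7.65" above appears to abbreviate):

# Unpack the JDK into the planned directory.
mkdir -p /home/hadoop/apps
tar -zxvf jdk-7u65-linux-x64.tar.gz -C /home/hadoop/apps/

# Append to /etc/profile, then reload and verify.
export JAVA_HOME=/home/hadoop/apps/jdk1.7.0_65
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile
java -version

Note that hadoop-env.sh below points JAVA_HOME at /usr/local/jdk1.7.0_65; use whichever path you actually unpacked to, consistently.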

1.6 HADOOP Installation and Deployment

  • Upload the HADOOP package
  • Plan the install directory: /home/hadoop/apps/hadoop-2.6.1
  • Unpack the package (sketched after this list)
  • Edit the configuration files under $HADOOP_HOME/etc/hadoop/
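A matching sketch, assuming the archive is hadoop-2.6.1.tar.gz in the hadoop user's home directory (the file name is an assumption):

# Unpack Hadoop into the planned directory.
tar -zxvf hadoop-2.6.1.tar.gz -C /home/hadoop/apps/

# Append to /etc/profile so the bin/ and sbin/ commands resolve.
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile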

The minimal configuration is as follows:

vi hadoop-env.sh

# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.7.0_65

Create the /home/hd2/tmp directory first; hadoop.tmp.dir below points at it.

vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hd2/tmp</value>
  </property>
  <property>
    <name>hadoop.logfile.size</name>
    <value>10000000</value>
    <description>The max size of each log file</description>
  </property>
  <property>
    <name>hadoop.logfile.count</name>
    <value>10</value>
    <description>The max number of log files</description>
  </property>
</configuration>

vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hd2/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hd2/data/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>node1:50090</value>
  </property>
</configuration>

(dfs.secondary.http.address is the deprecated 1.x key; Hadoop 2.x still accepts it and maps it to dfs.namenode.secondary.http-address.)

vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
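The 2.x distribution ships only a template for this file; if mapred-site.xml does not exist yet, create it from the template first (a step the original omits):

cp mapred-site.xml.template mapred-site.xml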

vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Note: the role table above places the ResourceManager on node2, while this value and the job log in section 1.9 both use node1; point this at whichever node actually hosts the ResourceManager.

vi slaves

node3
node4
node5
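The same configuration must be present on every node. A sketch of distributing it from node1 (paths as planned above):

# Copy the configured Hadoop directory to the remaining nodes.
for h in node2 node3 node4 node5; do
  scp -r /home/hadoop/apps/hadoop-2.6.1 hadoop@$h:/home/hadoop/apps/
done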

1.7 Start the Cluster

Format HDFS (run once, on node1 only):

bin/hadoop namenode -format

In 2.x this form is deprecated but still works; bin/hdfs namenode -format is the current equivalent.

Start HDFS (typically from node1):

sbin/start-dfs.sh

Start YARN (run this on the ResourceManager node):

sbin/start-yarn.sh
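As a sanity check, run jps on each node and compare against the role table (expected daemons sketched below; this is not output from the original):

jps
# node1: NameNode, SecondaryNameNode
# ResourceManager node: ResourceManager
# node3..node5: DataNode, NodeManager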

1.8 Verify the Cluster

Open http://192.168.33.200:50070 in a browser to reach the NameNode web UI; the ResourceManager web UI listens on port 8088.
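A command-line check to complement the web UI; with the slaves file above it should report three live datanodes:

bin/hdfs dfsadmin -report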



1.9 Test the Cluster with the wordcount Program

1. Create a test directory (a relative path resolves under /user/hd2):

[hd2@node1 hadoop-2.4.1]$ hadoop fs -mkdir input

2. Verify that the input directory was created:

[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2014-08-18 09:02 input

3. Create a test file:

[hd2@node1 hadoop-2.4.1]$ vi test.txt

hello hadoop
hello World
Hello Java
Hey man
i am a programmer

4. Put the test file into the test directory:

[hd2@node1 hadoop-2.4.1]$ hadoop fs -put test.txt input/

5. Verify that test.txt was uploaded:

[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls input/
Found 1 items
-rw-r--r-- 1 root supergroup 62 2014-08-18 09:03 input/test.txt

6. Run the wordcount program:

[hd2@node1 hadoop-2.4.1]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount input/ output/

Execution log:

17/04/19 21:07:19 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.33.200:8032
17/04/19 21:07:19 INFO input.FileInputFormat: Total input paths to process : 2
17/04/19 21:07:20 INFO mapreduce.JobSubmitter: number of splits:2
17/04/19 21:07:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492605823444_0003
17/04/19 21:07:20 INFO impl.YarnClientImpl: Submitted application application_1492605823444_0003
17/04/19 21:07:20 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1492605823444_0003/
17/04/19 21:07:20 INFO mapreduce.Job: Running job: job_1492605823444_0003
17/04/19 21:07:26 INFO mapreduce.Job: Job job_1492605823444_0003 running in uber mode : false
17/04/19 21:07:26 INFO mapreduce.Job: map 0% reduce 0%
17/04/19 21:07:33 INFO mapreduce.Job: map 100% reduce 0%
17/04/19 21:07:40 INFO mapreduce.Job: map 100% reduce 100%
17/04/19 21:07:42 INFO mapreduce.Job: Job job_1492605823444_0003 completed successfully
17/04/19 21:07:42 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=68
FILE: Number of bytes written=279333
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=246
HDFS: Number of bytes written=25
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8579
Total time spent by all reduces in occupied slots (ms)=5101
Total time spent by all map tasks (ms)=8579
Total time spent by all reduce tasks (ms)=5101
Total vcore-seconds taken by all map tasks=8579
Total vcore-seconds taken by all reduce tasks=5101
Total megabyte-seconds taken by all map tasks=8784896
Total megabyte-seconds taken by all reduce tasks=5223424
Map-Reduce Framework
Map input records=2
Map output records=6
Map output bytes=62
Map output materialized bytes=74
Input split bytes=208
Combine input records=6
Combine output records=5
Reduce input groups=3
Reduce shuffle bytes=74
Reduce input records=5
Reduce output records=3
Spilled Records=10
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=430
CPU time spent (ms)=1550
Physical memory (bytes) snapshot=339206144
Virtual memory (bytes) snapshot=1087791104
Total committed heap usage (bytes)=242552832
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=25

Result:

[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls /user/hd2/out/
Found 2 items
-rw-r--r-- 3 hd2 supergroup 0 2017-04-19 21:07 /user/hd2/out/_SUCCESS
-rw-r--r-- 3 hd2 supergroup 25 2017-04-19 21:07 /user/hd2/out/part-r-00000
[hd2@node1 hadoop-2.4.1]$ hadoop fs -cat /user/hd2/out/part-r-00000
hadoop 2
hello 3
world 1
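To inspect the result outside HDFS, copy it to the local filesystem (a usage sketch; wordcount-result.txt is an arbitrary local name):

hadoop fs -get /user/hd2/out/part-r-00000 ./wordcount-result.txt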
