1. Hardware configuration

The cluster consists of three virtual machines.

Node name  IP address   Memory  Disk  Roles
node1      192.168.1.6  2GB     10GB  NameNode, ResourceManager
node2      192.168.1.7  2GB     10GB  DataNode, NodeManager, SecondaryNameNode
node3      192.168.1.8  2GB     10GB  DataNode, NodeManager

2. Software versions

Software  Version
JDK       jdk-8u271
Hadoop    hadoop-3.2.1

3. Preparation

3.1. Create the virtual machines and set their network adapters to bridged mode

3.2. Change the hostname

[root@node1 ~]# vi /etc/hostname
[root@node1 ~]# reboot
[root@node1 ~]# cat /etc/hostname
node1
[root@node1 ~]# hostname
node1
[root@node2 ~]# hostname # same procedure on the other nodes
node2
[root@node3 ~]# hostname
node3

3.3. Map hostnames to IP addresses so the hosts can reach each other by name

Run the following steps on node1:

[root@node1 ~]# vi /etc/hosts # append the three "IP address  node name" lines shown below
[root@node1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.6 node1
192.168.1.7 node2
192.168.1.8 node3
# copy the hosts file to node2 and node3
[root@node1 ~]# scp /etc/hosts node2:/etc/
The authenticity of host 'node2 (192.168.1.7)' can't be established.
ECDSA key fingerprint is SHA256:8MU51OTPEjoMAEsg3eOMgAJBy3L4nuSMX1RGWN8ew/w.
ECDSA key fingerprint is MD5:00:2a:ce:9a:66:9b:42:af:a6:8e:74:07:a9:01:52:dc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.1.7' (ECDSA) to the list of known hosts.
hosts
[root@node1 ~]# ping node2
PING node2 (192.168.1.7) 56(84) bytes of data.
64 bytes from node2 (192.168.1.7): icmp_seq=1 ttl=64 time=0.404 ms
64 bytes from node2 (192.168.1.7): icmp_seq=2 ttl=64 time=0.617 ms
64 bytes from node2 (192.168.1.7): icmp_seq=3 ttl=64 time=0.828 ms
^C
--- node2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2016ms
rtt min/avg/max/mdev = 0.404/0.616/0.828/0.174 ms
[root@node1 ~]# ping node3
PING node3 (192.168.1.8) 56(84) bytes of data.
64 bytes from node3 (192.168.1.8): icmp_seq=1 ttl=64 time=1.59 ms
64 bytes from node3 (192.168.1.8): icmp_seq=2 ttl=64 time=0.496 ms
64 bytes from node3 (192.168.1.8): icmp_seq=3 ttl=64 time=0.443 ms
^C
--- node3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2015ms
rtt min/avg/max/mdev = 0.443/0.843/1.592/0.530 ms
[root@node2 ~]# ping node1
PING node1 (192.168.1.6) 56(84) bytes of data.
64 bytes from node1 (192.168.1.6): icmp_seq=1 ttl=64 time=0.325 ms
64 bytes from node1 (192.168.1.6): icmp_seq=2 ttl=64 time=0.864 ms
^C
--- node1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.325/0.594/0.864/0.270 ms
[root@node2 ~]# ping node3
PING node3 (192.168.1.8) 56(84) bytes of data.
64 bytes from node3 (192.168.1.8): icmp_seq=1 ttl=64 time=1.58 ms
64 bytes from node3 (192.168.1.8): icmp_seq=2 ttl=64 time=0.728 ms
^C
--- node3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.728/1.158/1.589/0.431 ms
[root@node3 ~]# ping node1
PING node1 (192.168.1.6) 56(84) bytes of data.
64 bytes from node1 (192.168.1.6): icmp_seq=1 ttl=64 time=0.372 ms
64 bytes from node1 (192.168.1.6): icmp_seq=2 ttl=64 time=0.395 ms
^C
--- node1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.372/0.383/0.395/0.022 ms
[root@node3 ~]# ping node2
PING node2 (192.168.1.7) 56(84) bytes of data.
64 bytes from node2 (192.168.1.7): icmp_seq=1 ttl=64 time=0.874 ms
64 bytes from node2 (192.168.1.7): icmp_seq=2 ttl=64 time=1.03 ms
^C
--- node2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1006ms
rtt min/avg/max/mdev = 0.874/0.955/1.036/0.081 ms
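
Editing /etc/hosts by hand on every node is easy to get wrong (a missing line break fuses two entries into one). The following is a minimal sketch, not part of the original procedure, of a helper that appends the three cluster entries only when the hostname is not already listed. The function names `add_host_entry` and `add_cluster_hosts` are made up for illustration, and the target file is passed in as an argument so the sketch can be tried on a scratch file first.

```shell
# Sketch: add "IP hostname" lines to a hosts-style file,
# skipping hostnames that are already listed.
add_host_entry() {
    local ip="$1" name="$2" file="$3"
    touch "$file"
    # Compare whole fields after the IP column, so "node1" does not match "node10"
    if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Addresses taken from the hardware table in section 1
add_cluster_hosts() {
    local file="$1"
    add_host_entry 192.168.1.6 node1 "$file"
    add_host_entry 192.168.1.7 node2 "$file"
    add_host_entry 192.168.1.8 node3 "$file"
}
```

Running `add_cluster_hosts /etc/hosts` as root on each node would have the same effect as the manual edit, and running it twice leaves the file unchanged.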

3.4. Disable the firewall

[root@node1 ~]# systemctl stop firewalld.service
[root@node1 ~]# firewall-cmd --state
not running
[root@node1 ~]# systemctl disable firewalld.service # keep firewalld from starting at boot

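3.5. Configure the hosts file on the host machine

This lets the host machine and the virtual machines ping each other by name.

Add the following entries to the file C:\Windows\System32\drivers\etc\hosts:

192.168.1.6 node1
192.168.1.7 node2
192.168.1.8 node3

C:\Users\zgg>ping node1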

Pinging node1 [192.168.1.6] with 32 bytes of data:
Reply from 192.168.1.6: bytes=32 time<1ms TTL=64
Reply from 192.168.1.6: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.1.6:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
Control-C
^C
C:\Users\zgg>ping node2

Pinging node2 [192.168.1.7] with 32 bytes of data:
Reply from 192.168.1.7: bytes=32 time<1ms TTL=64
Reply from 192.168.1.7: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.1.7:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
Control-C
^C
C:\Users\zgg>ping node3

Pinging node3 [192.168.1.8] with 32 bytes of data:
Reply from 192.168.1.8: bytes=32 time<1ms TTL=64
Reply from 192.168.1.8: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.1.8:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
Control-C
^C

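3.6. Configure SSH for passwordless login between nodes

Passwordless login means that from node1 you can run ssh node2 or ssh node3 and log in to the other machine without typing a password.

On each of the three virtual machines, run the following in the /root directory:

ssh-keygen -t rsa

This generates the SSH key pair and sets where it is stored. The path is ~/.ssh.

Change into the .ssh directory and run the following command to put the public key into authorized_keys:

cp id_rsa.pub  authorized_keys

Copy the authorized_keys file from node1 into the ~/.ssh directory of the other virtual machines:

scp authorized_keys node2:~/.ssh/
scp authorized_keys node3:~/.ssh/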
[root@node1 ~]# ssh node2
Last login: Thu Nov 12 15:38:28 2020 from node1
[root@node2 ~]# exit
logout
Connection to node2 closed.
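
Note that cp and scp, as used above, overwrite any authorized_keys file that already exists on the target. The following is a minimal sketch of an alternative that appends a key instead and sets the restrictive permissions that sshd expects by default. The function name `authorize_key` is hypothetical, and the directory is passed as an argument so the sketch can be tried outside ~/.ssh. The standard `ssh-copy-id root@node2` tool does a similar job across the network.

```shell
# Sketch: add one public key to an authorized_keys file
# without discarding the keys that are already there.
authorize_key() {
    local pubkey_file="$1" ssh_dir="$2"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    # -F: fixed string, -x: whole-line match, so the key is appended only once
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
}
```

For example, after copying node1's id_rsa.pub to node2 as /tmp/node1.pub (an assumed location), running `authorize_key /tmp/node1.pub ~/.ssh` on node2 would allow logins from node1 while keeping node2's own entries.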

4. Install the JDK

On node1, download and unpack the JDK, then configure the environment variables:

[root@node1 opt]# tar -zxvf jdk-8u271-linux-x64.tar.gz
...
[root@node1 opt]# vi /etc/profile
[root@node1 opt]# source /etc/profile
[root@node1 opt]# java -version
java version "1.8.0_271"
Java(TM) SE Runtime Environment (build 1.8.0_271-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)
[root@node1 opt]# cat /etc/profile
# /etc/profile
...
export JAVA_HOME=/opt/jdk1.8.0_271
export PATH=.:$JAVA_HOME/bin:$PATH
...

Copy jdk1.8.0_271 to node2 and node3:

[root@node1 opt]# scp -r jdk1.8.0_271/  node2:/opt/
[root@node1 opt]# scp -r jdk1.8.0_271/ node3:/opt/

Copy /etc/profile to node2 and node3:

[root@node1 opt]# scp  /etc/profile  node2:/etc/
profile 100% 1890 1.4MB/s 00:00
[root@node1 opt]# scp /etc/profile node3:/etc/
profile 100% 1890 1.7MB/s 00:00
[root@node2 opt]# source /etc/profile
[root@node3 opt]# source /etc/profile

5. Install Hadoop

On node1, download and unpack Hadoop, then configure the environment variables:

[root@node1 opt]# tar -zxvf hadoop-3.2.1.tar.gz
...
[root@node1 opt]# vi /etc/profile
[root@node1 opt]# source /etc/profile
[root@node1 opt]# cat /etc/profile
# /etc/profile
...
export JAVA_HOME=/opt/jdk1.8.0_271
export HADOOP_HOME=/opt/hadoop-3.2.1
export PATH=.:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

Copy /etc/profile to node2 and node3:

[root@node1 opt]# scp  /etc/profile  node2:/etc/
profile 100% 1945 1.6MB/s 00:00
[root@node1 opt]# scp /etc/profile node3:/etc/
profile 100% 1945 1.5MB/s 00:00
[root@node2 opt]# source /etc/profile
[root@node3 opt]# source /etc/profile

After editing the configuration files (listed in section 10), copy hadoop-3.2.1 to node2 and node3:

[root@node1 opt]# scp -r hadoop-3.2.1/  node2:/opt/
[root@node1 opt]# scp -r hadoop-3.2.1/ node3:/opt/
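
The same copy-to-every-worker step appears several times in this walkthrough (the JDK, /etc/profile, the Hadoop directory). As a sketch only, a small loop can generate those commands from one worker list. The `WORKERS` variable and the `plan_distribution` function are illustrative names, not part of Hadoop. The function prints the scp commands instead of running them, so they can be reviewed first.

```shell
# Worker hostnames, as listed in the hardware table in section 1
WORKERS="node2 node3"

# Sketch: print (not run) one recursive scp command per worker.
plan_distribution() {
    local src="$1" dest_dir="$2" node
    for node in $WORKERS; do
        printf 'scp -r %s %s:%s\n' "$src" "$node" "$dest_dir"
    done
}
```

Piping the output to a shell, for example `plan_distribution /opt/hadoop-3.2.1 /opt/ | sh`, would then execute the copies.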

6. Format the NameNode

On node1:

[root@node1 hadoop-3.2.1]# hdfs namenode -format  # format the NameNode
2020-11-12 21:43:16,999 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = node1/192.168.1.6
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.1
STARTUP_MSG: classpath = /opt/hadoop-3.2.1/etc/
...
2020-11-12 21:43:20,696 INFO common.Storage: Storage directory /opt/hadoop-3.2.1/dfs/namenode has been successfully formatted.
2020-11-12 21:43:20,762 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop-3.2.1/dfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2020-11-12 21:43:20,859 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop-3.2.1/dfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
2020-11-12 21:43:20,866 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-11-12 21:43:20,874 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2020-11-12 21:43:20,874 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.1.6
************************************************************/

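To format again, first delete the dfs/namenode directory on the NameNode and the dfs/datanode directory on each DataNode; otherwise the DataNodes keep the old cluster ID and cannot register with the newly formatted NameNode.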
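
The cleanup before a second format can be sketched as a small function. It assumes the storage directories sit under the Hadoop installation directory as dfs/namenode and dfs/datanode, matching the hdfs-site.xml in section 10; `clean_dfs_dirs` is a hypothetical name. The installation directory is a required argument, so the function never runs against an empty path.

```shell
# Sketch: remove the HDFS storage directories before reformatting.
# WARNING: this deletes all HDFS metadata and block data under the given directory.
clean_dfs_dirs() {
    local hadoop_home="${1:?usage: clean_dfs_dirs HADOOP_HOME}" dir
    for dir in dfs/namenode dfs/datanode; do
        if [ -d "$hadoop_home/$dir" ]; then
            rm -rf -- "$hadoop_home/$dir"
            echo "removed $hadoop_home/$dir"
        fi
    done
}
```

With the daemons stopped, `clean_dfs_dirs /opt/hadoop-3.2.1` would be run on every node before `hdfs namenode -format` is run on node1.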

7. Start the cluster

You can start all daemons at once or start them one by one.

[root@node1 hadoop-3.2.1]# sbin/start-all.sh
Starting namenodes on [node1]
Last login: Sat Nov 14 20:28:04 CST 2020 on pts/0
Starting datanodes
Last login: Sat Nov 14 20:30:51 CST 2020 on pts/0
Starting secondary namenodes [node2]
Last login: Sat Nov 14 20:30:54 CST 2020 on pts/0
Starting resourcemanager
Last login: Sat Nov 14 20:30:59 CST 2020 on pts/0
Starting nodemanagers
Last login: Sat Nov 14 20:31:05 CST 2020 on pts/0
[root@node1 hadoop-3.2.1]# mapred --daemon start historyserver
[root@node1 hadoop]# jps
11524 ResourceManager
11927 Jps
11899 JobHistoryServer
11100 NameNode
[root@node2 hadoop-3.2.1]# jps
8210 DataNode
8312 SecondaryNameNode
8393 NodeManager
8507 Jps
[root@node3 hadoop-3.2.1]# jps
17760 DataNode
17981 Jps
17870 NodeManager
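
Comparing jps output against the role table by eye works for three nodes, but the check can also be scripted. The sketch below reads jps output on standard input and prints the expected daemons that are missing; `missing_daemons` is a made-up helper name. It compares the whole class-name field, so NameNode is not mistaken for SecondaryNameNode.

```shell
# Sketch: read `jps` output on stdin and print the expected daemons that are absent.
missing_daemons() {
    local jps_output daemon missing=""
    jps_output="$(cat)"
    for daemon in "$@"; do
        # jps prints "<pid> <MainClass>"; match the second field exactly
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    echo "${missing# }"
}
```

On node2, for instance, `jps | missing_daemons DataNode NodeManager SecondaryNameNode` prints an empty line when all three daemons are running.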

8. Test with wordcount

[root@node1 hadoop-3.2.1]# hadoop jar /opt/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /in/wc.txt /out
2020-11-15 00:43:37,940 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.1.6:8032
2020-11-15 00:43:38,763 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1605372113315_0001
2020-11-15 00:43:38,945 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-11-15 00:43:39,647 INFO input.FileInputFormat: Total input files to process : 1
2020-11-15 00:43:39,695 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-11-15 00:43:39,731 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-11-15 00:43:39,770 INFO mapreduce.JobSubmitter: number of splits:1
2020-11-15 00:43:39,960 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-11-15 00:43:39,999 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1605372113315_0001
2020-11-15 00:43:39,999 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-11-15 00:43:40,196 INFO conf.Configuration: resource-types.xml not found
2020-11-15 00:43:40,196 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-15 00:43:40,664 INFO impl.YarnClientImpl: Submitted application application_1605372113315_0001
2020-11-15 00:43:40,808 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1605372113315_0001/
2020-11-15 00:43:40,809 INFO mapreduce.Job: Running job: job_1605372113315_0001
2020-11-15 00:43:52,004 INFO mapreduce.Job: Job job_1605372113315_0001 running in uber mode : false
2020-11-15 00:43:52,005 INFO mapreduce.Job: map 0% reduce 0%
2020-11-15 00:43:59,092 INFO mapreduce.Job: map 100% reduce 0%
2020-11-15 00:44:05,137 INFO mapreduce.Job: map 100% reduce 100%
2020-11-15 00:44:06,168 INFO mapreduce.Job: Job job_1605372113315_0001 completed successfully
2020-11-15 00:44:06,284 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=67
FILE: Number of bytes written=452639
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=149
HDFS: Number of bytes written=41
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4936
Total time spent by all reduces in occupied slots (ms)=3495
Total time spent by all map tasks (ms)=4936
Total time spent by all reduce tasks (ms)=3495
Total vcore-milliseconds taken by all map tasks=4936
Total vcore-milliseconds taken by all reduce tasks=3495
Total megabyte-milliseconds taken by all map tasks=5054464
Total megabyte-milliseconds taken by all reduce tasks=3578880
Map-Reduce Framework
Map input records=3
Map output records=9
Map output bytes=93
Map output materialized bytes=67
Input split bytes=92
Combine input records=9
Combine output records=5
Reduce input groups=5
Reduce shuffle bytes=67
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=162
CPU time spent (ms)=1180
Physical memory (bytes) snapshot=322658304
Virtual memory (bytes) snapshot=5471531008
Total committed heap usage (bytes)=170004480
Peak Map Physical memory (bytes)=210452480
Peak Map Virtual memory (bytes)=2732470272
Peak Reduce Physical memory (bytes)=112205824
Peak Reduce Virtual memory (bytes)=2739060736
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=57
File Output Format Counters
Bytes Written=41
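
The job above reads /in/wc.txt, but the upload of that file is not shown. A minimal sketch of the preparation step follows. The sample text is invented for illustration and is not the author's input file, and the function names are hypothetical. The `hdfs dfs` subcommands used (`-mkdir -p`, `-put -f`, `-rm -r -f`) are standard.

```shell
# Sketch: write a small sample input file (contents are made up for illustration).
make_wc_input() {
    local out="$1"
    printf '%s\n' 'hello hadoop' 'hello yarn' 'hello hdfs' > "$out"
}

# Sketch: upload the input to HDFS. Requires a running cluster and the hdfs CLI on PATH.
upload_wc_input() {
    local file="$1"
    hdfs dfs -mkdir -p /in
    hdfs dfs -put -f "$file" /in/wc.txt
    # The output directory must not exist, or the job refuses to start
    hdfs dfs -rm -r -f /out
}
```

After the job finishes, the word counts can be read with `hdfs dfs -cat /out/part-r-00000`.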

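The following problems came up:

(1) Submitting a job through YARN produced: Failed while trying to construct the redirect url to the log server. Log Server url may not be configured

The cause is that the historyserver service was not configured. Set the following properties: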

<!-- mapred-site.xml-->
<property>
<!-- MapReduce JobHistory Server IPC host:port -->
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<!-- MapReduce JobHistory Server Web UI host:port -->
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
<!-- yarn-site.xml-->
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>

(2) Running the job failed with: Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

[root@node1 hadoop-3.2.1]# hadoop classpath
/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*

Add the value printed above to the following property in yarn-site.xml:

<property>
<name>yarn.application.classpath</name>
<value>/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*</value>
</property>
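
Copying a classpath of this length by hand invites truncation errors. As a sketch, the property element can be generated from the command output instead; `classpath_property` is an illustrative helper, not a Hadoop tool.

```shell
# Sketch: wrap a classpath string in a yarn.application.classpath <property> element.
classpath_property() {
    local cp="$1"
    printf '<property>\n<name>yarn.application.classpath</name>\n<value>%s</value>\n</property>\n' "$cp"
}
```

On a node with Hadoop installed, `classpath_property "$(hadoop classpath)"` prints the element, ready to paste inside the configuration element of yarn-site.xml.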

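(3) Running the job failed with the error: The auxService:mapreduce_shuffle does not exist

The cause was that the yarn.nodemanager.aux-services property had been left out when yarn-site.xml was copied.

(4) On the first run, the log output stayed stuck at INFO mapreduce.Job: Running job: job_1605371813670_0001. For this problem, first check that the configuration files are correct, then check YARN's resource allocation.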

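9. Notes

(1) If a daemon fails to start, check whether a configuration file is wrong, or whether the previous cluster's ID was left in place when reformatting.

(2) If startup reports ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation., the variable has not been set in hadoop-env.sh. See the configuration files below for the exact settings.

(3) In Hadoop 3.x, the NameNode web UI port changed to 9870 (it was 50070 in Hadoop 2.x).

(4) When writing the configuration files, also consult the official cluster setup guide and the official references for core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml.

(5) When running jobs, pay attention to resource allocation.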

10. Configuration files

[root@node1 hadoop]# pwd
/opt/hadoop-3.2.1/etc/hadoop
[root@node1 hadoop]# ls
capacity-scheduler.xml hadoop-user-functions.sh.example kms-log4j.properties ssl-client.xml.example
configuration.xsl hdfs-site.xml kms-site.xml ssl-server.xml.example
container-executor.cfg httpfs-env.sh log4j.properties user_ec_policies.xml.template
core-site.xml httpfs-log4j.properties mapred-env.cmd workers
hadoop-env.cmd httpfs-signature.secret mapred-env.sh yarn-env.cmd
hadoop-env.sh httpfs-site.xml mapred-queues.xml.template yarn-env.sh
hadoop-metrics2.properties kms-acls.xml mapred-site.xml yarnservice-log4j.properties
hadoop-policy.xml kms-env.sh shellprofile.d yarn-site.xml

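Administrators should customize the environment of the Hadoop daemons through etc/hadoop/hadoop-env.sh and, optionally, etc/hadoop/mapred-env.sh and etc/hadoop/yarn-env.sh, for example to set how much heap memory the NameNode uses.

At a minimum, JAVA_HOME must be specified on every remote node.

# On node1, node2 and node3: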
[root@node1 hadoop]# vi hadoop-env.sh
...
###
# Generic settings for HADOOP
###

# Technically, the only required environment variable is JAVA_HOME.
# All others are optional. However, the defaults are probably not
# preferred. Many sites configure these options outside of Hadoop,
# such as in /etc/profile.d

# The java implementation to use. ...
# export JAVA_HOME=
export JAVA_HOME=/opt/jdk1.8.0_271
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
# On node1, node2 and node3:
[root@node1 hadoop]# cat core-site.xml
...
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- URI of the default file system, i.e. the address at which the NameNode serves HDFS -->
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<!-- Size of the file I/O buffer in bytes; 131072 bytes is 128 KB (the Hadoop default is 4096) -->
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<!-- Hadoop temporary directory -->
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.2.1/tmp</value>
</property>
</configuration>
# On node1, node2 and node3:
[root@node1 hadoop]# cat hdfs-site.xml
...
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- Path on the local filesystem where the NameNode persistently stores the namespace and transaction logs -->
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop-3.2.1/dfs/namenode</value>
</property>
<property>
<!-- List of permitted DataNodes. -->
<name>dfs.hosts</name>
<value>/opt/hadoop-3.2.1/etc/hadoop/workers</value>
</property>
<property>
<!-- Run the SecondaryNameNode on node2 -->
<name>dfs.namenode.secondary.http-address</name>
<value>node2:9868</value>
</property>
<property>
<!-- Comma-separated list of local filesystem directories where the DataNode stores its blocks, which allows several storage directories to be configured -->
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop-3.2.1/dfs/datanode</value>
</property>
<property>
<!-- Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. -->
<name>dfs.namenode.checkpoint.dir</name>
<value>/opt/hadoop-3.2.1/dfs/namesecondary</value>
</property>
</configuration>
# On node1, node2 and node3:
[root@node1 hadoop]# cat mapred-site.xml
...
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- Run MapReduce on the YARN framework -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<!-- Directory where history files are written by MapReduce jobs -->
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<!-- Directory where history files are managed by the MR JobHistory Server -->
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
<property>
<!-- MapReduce JobHistory Server IPC host:port -->
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<!-- MapReduce JobHistory Server Web UI host:port -->
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
# On node1, node2 and node3:
[root@node1 hadoop]# cat yarn-site.xml
...
<configuration>
<property>
<!-- Configuration to enable or disable log aggregation -->
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>node1</value>
</property>
<property>
<!-- ResourceManager host:port for clients to submit jobs. -->
<name>yarn.resourcemanager.address</name>
<value>node1:8032</value>
</property>
<property>
<!-- ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. -->
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<!-- ResourceManager host:port for NodeManagers. -->
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8031</value>
</property>
<property>
<!-- ResourceManager host:port for administrative commands. -->
<name>yarn.resourcemanager.admin.address</name>
<value>node1:8033</value>
</property>
<property>
<!-- ResourceManager web-ui host:port. -->
<name>yarn.resourcemanager.webapp.address</name>
<value>node1:8088</value>
</property>
<property>
<!-- List of permitted NodeManagers. -->
<name>yarn.resourcemanager.nodes.include-path</name>
<value>/opt/hadoop-3.2.1/etc/hadoop/workers</value>
</property>
<property>
<!-- Comma-separated list of paths on the local filesystem where intermediate data is written. -->
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/hadoop-3.2.1/tmp</value>
</property>
<property>
<!-- Comma-separated list of paths on the local filesystem where logs are written. -->
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/hadoop-3.2.1/logs</value>
</property>
<property>
<!-- Auxiliary service that runs on the NodeManager -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<!-- URL for log aggregation server -->
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*</value>
</property>
</configuration>
# On node1, node2 and node3:
[root@node1 hadoop]# cat workers
node2
node3
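
Because dfs.hosts and yarn.resourcemanager.nodes.include-path both point at this workers file, a hostname that does not resolve keeps that node out of the cluster. The sketch below cross-checks a workers file against a hosts file and prints every worker that has no entry. `unresolved_workers` is a hypothetical helper, and it only inspects the given hosts file; it does not query DNS.

```shell
# Sketch: print each hostname from a workers file that has no entry in a hosts file.
unresolved_workers() {
    local workers_file="$1" hosts_file="$2" name
    # "|| [ -n "$name" ]" also handles a last line without a trailing newline
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue
        # Strip comments, then compare every field after the IP column
        if ! awk -v n="$name" '{ sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done < "$workers_file"
}
```

Running `unresolved_workers /opt/hadoop-3.2.1/etc/hadoop/workers /etc/hosts` on each node should print nothing when the hosts file from section 3.3 is in place.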

10.1. Notes on selected configuration properties

dfs.datanode.data.dir

The default is file://${hadoop.tmp.dir}/dfs/data

Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.
