前提:

①已经搭建好zk

②已经安装好JDK

正文开始:

首先从官网下载hadoop 2.7.3 (虽然官网3.0都出了。但是目前还没经过完全的测试。。待测试后。。。)

一、hadoop-env.sh(环境变量相关)

export JAVA_HOME=/app/jdk/jdk1.8.0_92
export HOME=/app/hadoop
export HADOOP_HOME=$HOME
export HADOOP_COMMON_HOME=$HOME
export HADOOP_MAPRED_HOME=$HOME
export HADOOP_HDFS_HOME=$HOME
export YARN_HOME=$HOME
export CLASSPATH=.:$HADOOP_HOME/lib:$SQOOP_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/app/hadoop/tmp
export YARN_PID_DIR=/app/hadoop/tmp
export HADOOP_LOG_DIR="/log/hadoop"
export YARN_LOG_DIR=/log/yarn

#export HADOOP_HEAPSIZE=4096
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
export HADOOP_NAMENODE_OPTS="-Xmx80G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_DATANODE_OPTS="-Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

二、core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bdp-core</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>2880</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/app/hadoop/etc/hadoop/rack.sh</value>
</property>
</configuration>

三、hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>bdp-core</value>
</property>
<property>
<name>dfs.ha.namenodes.bdp-core</name>
<value>nn1,nn2</value>
</property>
<!--NameNode1 的地址-->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:8020</value>
</property>
<!--NameNode2 的地址-->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:8020</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/dfs/nn/local</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:50070</value>
</property>
<!--journal node的地址-->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://CNSZ17PL1523:8485;CNSZ17PL1524:8485;CNSZ17PL1525:8485;CNSZ17PL1526:8485;CNSZ17PL1527:8485;CNSZ17PL1528:8485;CNSZ17PL1529:8485/bdp-core</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/dfs/jn</value>
</property>
<property>
<name>dfs.qjournal.start-segment.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.prepare-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.accept-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.finalize-segment.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.select-input-streams.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.get-journal-state.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.new-epoch.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.write-txns.timeout.ms</name>
<value>60000</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.bdp-core</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
<description>Number of replication for each chunk.</description>
</property>
<!--需要根据实际配置进行修改-->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/HDATA/12/dfs/local,/HDATA/11/dfs/local,/HDATA/10/dfs/local,/HDATA/9/dfs/local,/HDATA/8/dfs/local,/HDATA/7/dfs/local,/HDATA/6/dfs/local,/HDATA/5/dfs/local,/HDATA/4/dfs/local,/HDATA/3/dfs/local,/HDATA/2/dfs/local,/HDATA/1/dfs/local</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/app/hadoop/etc/hadoop/exclude.list</value>
<description> List of nodes to decommission </description>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
<value>10737418240</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>0.75</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>1000</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
<value>10000</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/app/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>300</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>40</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

四、yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>List of directories to store localized files in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/yarn/local</value>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/yarn/log</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://bdp-core/var/log/hadoop-yarn/apps</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<!-- resourcemanager config -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>CNSZ17PL1523</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>CNSZ17PL1524</value>
</property>
<!-- fair scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/app/hadoop/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/app/hadoop/etc/hadoop/yarn.exclude</value>
<final>true</final>
</property>
<!-- ZKRMStateStore config -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<!--
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>cnsz23pl0073:2181,cnsz23pl0069:2181,cnsz23pl0070:2181,cnsz23pl0071:2181,cnsz23pl0072:2181</value>
</property>
-->
<!-- applications manager interface -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>CNSZ17PL1523:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>CNSZ17PL1524:23140</value>
</property>
<!-- scheduler interface -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>CNSZ17PL1523:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>CNSZ17PL1524:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>CNSZ17PL1523:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>CNSZ17PL1524:23141</value>
</property>
<!-- RM resource-tracker interface -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>CNSZ17PL1523:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>CNSZ17PL1524:23125</value>
</property>
<!-- RM web application interface -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>CNSZ17PL1523:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>CNSZ17PL1524:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>CNSZ17PL1523:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>CNSZ17PL1524:23189</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://CNSZ17PL1525:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>CNSZ17PL1525:54315</value>
</property>
<!-- Node Manager Configs -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///HDATA/11/yarn/local,file:///HDATA/10/yarn/local,file:///HDATA/9/yarn/local,file:///HDATA/8/yarn/local,file:///HDATA/7/yarn/local,file:///HDATA/6/yarn/local,file:///HDATA/5/yarn/local,file:///HDATA/4/yarn/local,file:///HDATA/3/yarn/local,file:///HDATA/2/yarn/local,file:///HDATA/1/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///HDATA/11/yarn/logs,file:///HDATA/10/yarn/logs,file:///HDATA/9/yarn/logs,file:///HDATA/8/yarn/logs,file:///HDATA/7/yarn/logs,file:///HDATA/6/yarn/logs,file:///HDATA/5/yarn/logs,file:///HDATA/4/yarn/logs,file:///HDATA/3/yarn/logs,file:///HDATA/2/yarn/logs,file:///HDATA/1/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>
<!-- tuning -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>120000</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
<!-- tuning yarn container -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>1209600</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>CNSZ17PL1525:54315</value>
</property>
</configuration>

五、mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>CNSZ17PL1525:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>CNSZ17PL1525:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>

<!-- tuning mapreduce -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>5120</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx4096m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>13312</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx10649m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.jobhistory.max-age-ms</name>
<value>1296000000</value>
<source>mapred-default.xml</source>
</property>
<property>
<name>mapreduce.jobhistory.joblist.cache.size</name>
<value>200000</value>
<source>mapred-default.xml</source>
</property>
</configuration>

5个配置文件配完就ok了。其中的参数意思会有专门帖子讲,现在分发到每台机器上,执行脚本新建运行的时候的data  log等需要的目录,   dir.sh  (每台机器都执行。改成上述配置文件的hadoop2.7.3安装包也都每台机器都分发。每台机器的环境变量都需要增加。)

环境变量类似:

export ZOOKEEPER_HOME=/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export JAVA_HOME=/app/jdk
export JRE_HOME=/app/jdk/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME

(dir,sh 脚本)

#!/bin/bash
groupadd hadoop
groupadd hdfs
useradd -g hdfs -G hadoop hdfs

echo "sf#123456" | passwd --stdin hdfs

for i in 1 2 3 4 5 6 7 8 9 10 11
do
datadir=/HDATA/$i/dfs/local
mrdir=/HDATA/$i/mapred/local
yarndir=/HDATA/$i/yarn/local
yarnlog=/HDATA/$i/yarn/logs
mkdir -p $datadir
mkdir -p $mrdir
mkdir -p $yarndir
mkdir -p $yarnlog
echo "$datadir $mrdir $yarndir $yarnlog make over and chown hdfs:hadoop"
chown hdfs:hadoop -R $datadir $mrdir

chown yarn:yarn -R  $yarndir $yarnlog

done

#log
mkdir -p /data/dfs/nn/local
chown hdfs:hadoop /data/dfs/nn/local
mkdir -p /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs
chown hdfs:hadoop /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs

最后将hadoop的应用目录赋给hdfs:hadoop

然后启动过程:

1. 启动 ZooKeeper 集群
在集群中安装 ZooKeeper 的主机上启动 ZooKeeper 服务。在本教程中也就是在 slave51、slave52、slave53 的主机上启动相应进程。分别登陆到三台机子上执行:
zookeeper的启动在每台zookeeper节点执行这句命令

zkServer.sh start

2. 格式化 ZooKeeper 集群
在任意的 namenode 上都可以执行,笔者还是选择了 master1 主机执行格式化命令(namenode1上执行)

hdfs zkfc -formatZK

3. 启动 JournalNode 集群
分别在 slave1、slave2、slave3 上执行以下命令(所有的journal节点)

hadoop-daemon.sh start journalnode

4. 格式化集群的 NameNode

在 master1 的主机上执行以下命令,以格式化 namenode:(namenode1节点执行)

hdfs namenode -format

5. 启动刚格式化的 NameNode
刚在 master1 上格式化了 namenode ,故就在 master1上执行(namenode1节点执行)

hadoop-daemon.sh start namenode

6. 同步 NameNode1 元数据到 NameNode2 上
复制你 NameNode 上的元数据目录到另一个 NameNode,也就是此处的 master5 复制元数据到 master52 上。在 master52 上执行以下命令:(namenode2节点执行)

hdfs namenode -bootstrapStandby

7. 启动 NameNode2
master2 主机拷贝了元数据之后,就接着启动 namenode 进程了,执行(namenode2节点执行)

hadoop-daemon.sh start namenode

8. 启动集群中所有的DataNode(所有datanode节点执行)

hadoop-daemon.sh start datanode

9. 启动 ZKFC

在 master1 和 master2 的主机上分别执行如下命令:((namenode1节点执行)&&(namenode2节点执行))

hadoop-daemon.sh start zkfc

10. 开启历史日志服务

在 master1和 master2 的主机上执行((namenode1节点执行)&&(namenode2节点执行))

mr-jobhistory-daemon.sh start historyserver

11. 在 RM1 启动 YARN

在 master1的主机上执行以下命令:((namenode1节点执行))

yarn-daemon.sh start resourcemanager

12. 在 RM2 单独启动 YARN

虽然上一步启动了 YARN ,但是在 master2 上是没有相应的 ResourceManager 进程,故需要在 master2 主机上单独启动:(namenode2节点执行)

yarn-daemon.sh start resourcemanager

13.启动所有datanode 的 nodemanager(所有datanode节点)

yarn-daemon.sh start nodemanager

Hadoop生产环境配置文件的更多相关文章

  1. Hadoop生产环境搭建(含HA、Federation)

    Hadoop生产环境搭建 1. 将安装包hadoop-2.x.x.tar.gz存放到某一目录下,并解压. 2. 修改解压后的目录中的文件夹etc/hadoop下的配置文件(若文件不存在,自己创建.) ...

  2. SpringBoot yml 配置 多配置文件,开发环境,生产环境配置文件分开

    原文地址:https://www.cnblogs.com/baoyi/p/SpringBoot_YML.html 1. 在 spring boot 中,有两种配置文件,一种是application.p ...

  3. 转 通过 spring 容器内建的 profile 功能实现开发环境、测试环境、生产环境配置自动切换

                                      软件开发的一般流程为工程师开发 -> 测试 -> 上线,因此就涉及到三个不同的环境,开发环境.测试环境以及生产环境,通常 ...

  4. webpack(7)-生产环境

    development(开发环境) 和 production(生产环境) 这两个环境下的构建目标存在着巨大差异.在开发环境中,我们需要:强大的 source map 和一个有着 live reload ...

  5. Spring.profile配合Jenkins发布War包,实现开发、测试和生产环境的按需切换

    前两篇不错 Spring.profile实现开发.测试和生产环境的配置和切换 - Strugglion - 博客园https://www.cnblogs.com/strugglion/p/709102 ...

  6. 一种简单的生产环境部署Node.js程序方法

    最近在部署Node.js程序时,写了段简单的脚本,发觉还挺简单的,忍不住想与大家分享. 配置文件 首先,本地测试环境和生产环境的数据库连接这些配置信息是不一样的,需要将其分开为两个文件存储 到conf ...

  7. spring boot区分生产环境和开发环境

    回顾一下spring boot使用基础,做个笔记. 通过配置文件,设置项目的开发环境和生成环境. 项目目录结构: application-dev.yml是开发环境配置文件,application-pr ...

  8. Spring.profile实现开发、测试和生产环境的配置和切换

    软件开发过程一般涉及“开发 -> 测试 -> 部署上线”多个阶段,每个阶段的环境的配置参数会有不同,如数据源,文件路径等.为避免每次切换环境时都要进行参数配置等繁琐的操作,可以通过spri ...

  9. 使用Asset Pipeline管理rails生产环境静态资源实现步骤

    1.    修改项目中指向静态资源文件的链接 a)     访问静态资源文件 <%= stylesheet_link_tag "application", media: &q ...

随机推荐

  1. layui laydate设置当前日期往后不可选

    layui laydate设置当前日期往后不可选 laydate.render({ elem: '#demo', max: maxDate() }); // 设置最大可选的日期 function ma ...

  2. 第四十一天 socker server和 event

    今日内容 1.基于TCP的socketserver 2.基于UDP的socketserver 3.event 一.TCP的socketserver #服务器 import socketserver f ...

  3. codeforces 1051F The Shortest Statement

    题目链接:codeforces 1051F The Shortest Statement 题意:\(q\)组询问,求任意两点之间的最短路,图满足\(m-n\leq 20\) 分析:一开始看这道题:fl ...

  4. hdu 5510 Bazinga (KMP+暴力标记)

    题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=5510 思路: 一开始直接用KMP莽了发,超时了,后面发现如果前面的字符串被后面的字符串包含,那么我们就 ...

  5. 概率DP自学

    转自https://blog.csdn.net/zy691357966/article/details/46776199 zy691357966的blog 有关概率和期望问题的研究 摘要 在各类信息学 ...

  6. 搭建web定时任务管理平台

    需要安装mysql及gityum -y install git mysql-server 下载安装go官网:https://golang.org/dl/wget https://redirector. ...

  7. nfs的配置文件/etc/exports

    /etc/exports  文件格式 <输出目录> [客户端1 选项(访问权限,用户映射,其他)] [客户端2 选项(访问权限,用户映射,其他)] a. 输出目录:输出目录是指NFS系统中 ...

  8. 【BZOJ3691】游行(网络流)

    [BZOJ3691]游行(网络流) 题面 BZOJ 然而权限题. Description 每年春季,在某岛屿上都会举行游行活动. 在这个岛屿上有N个城市,M条连接着城市的有向道路. 你要安排英雄们的巡 ...

  9. [JLOI2014]聪明的燕姿(搜索)

    城市中人们总是拿着号码牌,不停寻找,不断匹配,可是谁也不知道自己等的那个人是谁. 可是燕姿不一样,燕姿知道自己等的人是谁,因为燕姿数学学得好!燕姿发现了一个神奇的算法:假设自己的号码牌上写着数字 S, ...

  10. C/C++ 控制台窗口暂停

    为什么我看不到控制台的输出结果? 在编写C++程序中,经常会出现,控制台窗口一闪就消失了的情况 为什么会这样? 原因简单到有点可笑:因为程序运行结束了 对于控制台程序,操作系统让它开始运行前会为它造一 ...