Prerequisites:

① A ZooKeeper cluster is already set up.

② The JDK is already installed.

Main content:

First, download Hadoop 2.7.3 from the official site. (Hadoop 3.0 has been released, but it has not been fully tested here yet; it will be evaluated later.)
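For reference, a minimal download-and-unpack sketch, assuming the /app/hadoop install path used throughout this post (the mirror URL is only an example; pick one from hadoop.apache.org):

# Download Hadoop 2.7.3 and unpack it to /app/hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -zxf hadoop-2.7.3.tar.gz -C /app
mv /app/hadoop-2.7.3 /app/hadoop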

I. hadoop-env.sh (environment variables)

export JAVA_HOME=/app/jdk/jdk1.8.0_92
export HOME=/app/hadoop
export HADOOP_HOME=$HOME
export HADOOP_COMMON_HOME=$HOME
export HADOOP_MAPRED_HOME=$HOME
export HADOOP_HDFS_HOME=$HOME
export YARN_HOME=$HOME
export CLASSPATH=.:$HADOOP_HOME/lib:$SQOOP_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/app/hadoop/tmp
export YARN_PID_DIR=/app/hadoop/tmp
export HADOOP_LOG_DIR="/log/hadoop"
export YARN_LOG_DIR=/log/yarn

#export HADOOP_HEAPSIZE=4096
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
export HADOOP_NAMENODE_OPTS="-Xmx80G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_DATANODE_OPTS="-Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

II. core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bdp-core</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>2880</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/app/hadoop/etc/hadoop/rack.sh</value>
</property>
</configuration>
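core-site.xml above points net.topology.script.file.name at /app/hadoop/etc/hadoop/rack.sh, which is not shown in this post. A minimal sketch of such a topology script, assuming a simple two-column host-to-rack mapping file (both the mapping file name and its format are assumptions):

#!/bin/bash
# rack.sh - the NameNode passes one or more hostnames/IPs as arguments and
# expects one rack id per argument on stdout.
MAPFILE=/app/hadoop/etc/hadoop/rack.data   # format: <host-or-ip> <rack-id>
while [ $# -gt 0 ]; do
  rack=$(awk -v h="$1" '$1 == h {print $2}' "$MAPFILE")
  echo "${rack:-/default-rack}"            # fall back when the host is unlisted
  shift
done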

III. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>bdp-core</value>
</property>
<property>
<name>dfs.ha.namenodes.bdp-core</name>
<value>nn1,nn2</value>
</property>
<!-- Address of NameNode1 -->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:8020</value>
</property>
<!-- Address of NameNode2 -->
<property>
<name>dfs.namenode.rpc-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:8020</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/dfs/nn/local</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn1</name>
<value>CNSZ17PL1523:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.bdp-core.nn2</name>
<value>CNSZ17PL1524:50070</value>
</property>
<!-- JournalNode addresses -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://CNSZ17PL1523:8485;CNSZ17PL1524:8485;CNSZ17PL1525:8485;CNSZ17PL1526:8485;CNSZ17PL1527:8485;CNSZ17PL1528:8485;CNSZ17PL1529:8485/bdp-core</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/dfs/jn</value>
</property>
<property>
<name>dfs.qjournal.start-segment.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.prepare-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.accept-recovery.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.finalize-segment.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.select-input-streams.timeout.ms</name>
<value>60000</value>
</property>
<property>
<name>dfs.qjournal.get-journal-state.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.new-epoch.timeout.ms</name>
<value>240000</value>
</property>
<property>
<name>dfs.qjournal.write-txns.timeout.ms</name>
<value>60000</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.bdp-core</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
<description>Enable POSIX-style ACL support on the NameNode.</description>
</property>
<!-- Adjust the fencing settings below to match your actual environment -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/HDATA/12/dfs/local,/HDATA/11/dfs/local,/HDATA/10/dfs/local,/HDATA/9/dfs/local,/HDATA/8/dfs/local,/HDATA/7/dfs/local,/HDATA/6/dfs/local,/HDATA/5/dfs/local,/HDATA/4/dfs/local,/HDATA/3/dfs/local,/HDATA/2/dfs/local,/HDATA/1/dfs/local</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/app/hadoop/etc/hadoop/exclude.list</value>
<description> List of nodes to decommission </description>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
<value>10737418240</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>0.75</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>1000</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
<value>10000</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/app/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>300</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>40</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
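hdfs-site.xml references /app/hadoop/etc/hadoop/exclude.list, and yarn-site.xml below references /app/hadoop/etc/hadoop/yarn.exclude. The daemons may refuse to start if these files are missing, so it is safest to create them empty up front:

# Create empty decommission/exclude files referenced by the configs
touch /app/hadoop/etc/hadoop/exclude.list
touch /app/hadoop/etc/hadoop/yarn.exclude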

IV. yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- These two properties are overridden by the per-disk HDATA settings further down (the later value wins), so they are commented out here.
<property>
<description>List of directories to store localized files in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/yarn/local</value>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/yarn/log</value>
</property>
-->
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://bdp-core/var/log/hadoop-yarn/apps</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/*,
$HADOOP_COMMON_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*,
$HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/*,
$HADOOP_COMMON_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<!-- resourcemanager config -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>CNSZ17PL1523</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>CNSZ17PL1524</value>
</property>
<!-- fair scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/app/hadoop/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/app/hadoop/etc/hadoop/yarn.exclude</value>
<final>true</final>
</property>
<!-- ZKRMStateStore config -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>CNSZ17PL1523:2181,CNSZ17PL1524:2181,CNSZ17PL1525:2181,CNSZ17PL1526:2181,CNSZ17PL1527:2181,CNSZ17PL1528:2181,CNSZ17PL1529:2181</value>
</property>
<!--
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>cnsz23pl0073:2181,cnsz23pl0069:2181,cnsz23pl0070:2181,cnsz23pl0071:2181,cnsz23pl0072:2181</value>
</property>
-->
<!-- applications manager interface -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>CNSZ17PL1523:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>CNSZ17PL1524:23140</value>
</property>
<!-- scheduler interface -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>CNSZ17PL1523:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>CNSZ17PL1524:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>CNSZ17PL1523:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>CNSZ17PL1524:23141</value>
</property>
<!-- RM resource-tracker interface -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>CNSZ17PL1523:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>CNSZ17PL1524:23125</value>
</property>
<!-- RM web application interface -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>CNSZ17PL1523:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>CNSZ17PL1524:23188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>CNSZ17PL1523:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>CNSZ17PL1524:23189</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://CNSZ17PL1525:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>CNSZ17PL1525:54315</value>
</property>
<!-- Node Manager Configs -->
<property>
<description>Address where the localizer IPC is.</description>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<description>NM Webapp address.</description>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///HDATA/11/yarn/local,file:///HDATA/10/yarn/local,file:///HDATA/9/yarn/local,file:///HDATA/8/yarn/local,file:///HDATA/7/yarn/local,file:///HDATA/6/yarn/local,file:///HDATA/5/yarn/local,file:///HDATA/4/yarn/local,file:///HDATA/3/yarn/local,file:///HDATA/2/yarn/local,file:///HDATA/1/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///HDATA/11/yarn/logs,file:///HDATA/10/yarn/logs,file:///HDATA/9/yarn/logs,file:///HDATA/8/yarn/logs,file:///HDATA/7/yarn/logs,file:///HDATA/6/yarn/logs,file:///HDATA/5/yarn/logs,file:///HDATA/4/yarn/logs,file:///HDATA/3/yarn/logs,file:///HDATA/2/yarn/logs,file:///HDATA/1/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property>
<!-- tuning -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>120000</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
<!-- tuning yarn container -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>1209600</value>
</property>
</configuration>
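yarn-site.xml points yarn.scheduler.fair.allocation.file at /app/hadoop/etc/hadoop/fair-scheduler.xml, which is not included in this post. A minimal allocation file sketch (the queue name, resource figures and policy are placeholders, not the real production queues):

# Write a minimal fair-scheduler.xml; adjust queues and limits to your own plan
cat > /app/hadoop/etc/hadoop/fair-scheduler.xml <<'EOF'
<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <minResources>4096 mb,1 vcores</minResources>
    <maxResources>120000 mb,16 vcores</maxResources>
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
EOF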

V. mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>CNSZ17PL1525:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>CNSZ17PL1525:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>

<!-- tuning mapreduce -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>5120</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx4096m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>13312</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx10649m -Dfile.encoding=UTF-8</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.jobhistory.max-age-ms</name>
<value>1296000000</value>
</property>
<property>
<name>mapreduce.jobhistory.joblist.cache.size</name>
<value>200000</value>
</property>
</configuration>
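Note that the Hadoop 2.7.3 tarball ships only a mapred-site.xml.template, so this file usually has to be created first:

cd /app/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml   # then add the properties above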

Once these five configuration files are in place, the configuration itself is done (what each parameter means will be covered in a dedicated post). Now distribute the hadoop-2.7.3 package containing the configuration above to every machine, add the environment variables on every machine, and run the dir.sh script below on every machine to create the data, log and other directories needed at runtime.
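One way to push the configured package to every node (the host list below only covers the hosts named in the configs and is an example; extend it with all DataNodes, and passwordless SSH is assumed):

# Sync the configured Hadoop directory to every node
for host in CNSZ17PL1523 CNSZ17PL1524 CNSZ17PL1525 CNSZ17PL1526 CNSZ17PL1527 CNSZ17PL1528 CNSZ17PL1529; do
  rsync -a /app/hadoop/ ${host}:/app/hadoop/
done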

The environment variables look roughly like this:

export ZOOKEEPER_HOME=/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export JAVA_HOME=/app/jdk
export JRE_HOME=/app/jdk/jre
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME

(dir.sh script)

#!/bin/bash
# Create the users/groups and the data/log directories that the configuration above expects.
groupadd hadoop
groupadd hdfs
useradd -g hdfs -G hadoop hdfs

echo "sf#123456" | passwd --stdin hdfs

# The chown yarn:yarn below needs a yarn user/group; create them if they do not already exist.
groupadd yarn
useradd -g yarn -G hadoop yarn

# Per-disk directories for DataNode, MapReduce and NodeManager
for i in 1 2 3 4 5 6 7 8 9 10 11
do
  datadir=/HDATA/$i/dfs/local
  mrdir=/HDATA/$i/mapred/local
  yarndir=/HDATA/$i/yarn/local
  yarnlog=/HDATA/$i/yarn/logs
  mkdir -p $datadir $mrdir $yarndir $yarnlog
  echo "$datadir $mrdir $yarndir $yarnlog make over and chown hdfs:hadoop"
  chown hdfs:hadoop -R $datadir $mrdir
  chown yarn:yarn -R $yarndir $yarnlog
done

# dfs.datanode.data.dir in hdfs-site.xml also lists /HDATA/12/dfs/local, which the 1-11 loop above skips.
mkdir -p /HDATA/12/dfs/local
chown hdfs:hadoop -R /HDATA/12/dfs/local

# JournalNode edits directory (dfs.journalnode.edits.dir)
mkdir -p /data/dfs/jn
chown hdfs:hadoop /data/dfs/jn

# NameNode metadata, log, pid/tmp and short-circuit-read socket directories
mkdir -p /data/dfs/nn/local
chown hdfs:hadoop /data/dfs/nn/local
mkdir -p /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs
chown hdfs:hadoop /log/hadoop /log/yarn /log/yarn-log /log/balant /log/hadoop-datanode-log/ /app/hadoop/tmp /app/var/run/hadoop-hdfs

Finally, give ownership of the Hadoop installation directory to hdfs:hadoop.
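For example, assuming the /app/hadoop installation path from hadoop-env.sh:

chown -R hdfs:hadoop /app/hadoop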

Then the startup procedure:

1. Start the ZooKeeper cluster
Start the ZooKeeper service on every host that runs ZooKeeper, i.e. every node of the quorum listed in core-site.xml. Log in to each ZooKeeper node and run:

zkServer.sh start
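To confirm the quorum is healthy before moving on:

zkServer.sh status   # expect "Mode: leader" on one node and "Mode: follower" on the rest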

2. Format the ZooKeeper HA state
This can be run from either NameNode; here it is run on namenode1:

hdfs zkfc -formatZK
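Formatting creates the /hadoop-ha/bdp-core znode; a quick check from any node with the ZooKeeper client installed:

zkCli.sh -server CNSZ17PL1523:2181 ls /hadoop-ha   # should list [bdp-core]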

3. Start the JournalNode cluster
Run the following on every JournalNode (all seven hosts listed in dfs.namenode.shared.edits.dir):

hadoop-daemon.sh start journalnode

4. Format the NameNode

Run the following on namenode1 to format it:

hdfs namenode -format

5. Start the freshly formatted NameNode
Since namenode1 was just formatted, start the NameNode process there:

hadoop-daemon.sh start namenode

6. Sync the NameNode1 metadata to NameNode2
Copy the metadata from the newly formatted NameNode to the other one by running the following on namenode2:

hdfs namenode -bootstrapStandby

7. Start NameNode2
With the metadata in place, start the NameNode process on namenode2:

hadoop-daemon.sh start namenode

8. Start all DataNodes in the cluster (run on every DataNode):

hadoop-daemon.sh start datanode

9. Start the ZKFC

Run the following on both namenode1 and namenode2:

hadoop-daemon.sh start zkfc
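Once both ZKFCs are running, one NameNode should have been elected active, which can be verified from either NameNode:

hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2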

10. Start the MapReduce job history server

mapred-site.xml points the JobHistory server at CNSZ17PL1525, so start it on that host:

mr-jobhistory-daemon.sh start historyserver

11. Start YARN on RM1

On rm1 (CNSZ17PL1523), run:

yarn-daemon.sh start resourcemanager

12. Start YARN on RM2 separately

The previous step does not start a ResourceManager process on rm2, so start it there separately (CNSZ17PL1524):

yarn-daemon.sh start resourcemanager

13. Start the NodeManager on all DataNodes (run on every DataNode):

yarn-daemon.sh start nodemanager
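A rough sanity check once everything is up (process names as printed by jps). One extra point worth noting: because yarn.web-proxy.address is set in yarn-site.xml, the YARN web proxy runs as a standalone daemon and should be started separately on CNSZ17PL1525.

# On the NameNode/ResourceManager hosts expect NameNode, DFSZKFailoverController,
# ResourceManager, JournalNode and QuorumPeerMain; on DataNodes expect DataNode and NodeManager
jps

# ResourceManager HA state
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# Overall HDFS health
hdfs dfsadmin -report

# Standalone YARN web proxy (yarn.web-proxy.address = CNSZ17PL1525:54315)
yarn-daemon.sh start proxyserver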
