【Date】November 19, 2014

【Platform】CentOS 6.5

【Tools】scp

【Software】jdk-7u67-linux-x64.rpm

    CDH5.2.0-hadoop2.5.0

【Steps】

    1. Prerequisites

      (1) Cluster plan

          Host role   IP address      Domain name
          master      192.168.50.10   master.hadoop.com
          slave1      192.168.50.11   slave1.hadoop.com
          slave2      192.168.50.12   slave2.hadoop.com
          slave3      192.168.50.13   slave3.hadoop.com

   

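      (2) Log in to the operating system as root.

      (3) Set the hostname on every host in the cluster. In the lines below, replace * with the host's own name from the cluster plan (master, slave1, slave2 or slave3).

          hostname *.hadoop.com

          Then edit /etc/sysconfig/network so that the name survives a reboot:

          HOSTNAME=*.hadoop.com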

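      (4) Edit /etc/hosts so that it contains the following entries, matching the cluster plan:

          192.168.50.10 master.hadoop.com
          192.168.50.11 slave1.hadoop.com
          192.168.50.12 slave2.hadoop.com
          192.168.50.13 slave3.hadoop.com

          Then copy the hosts file to every other host in the cluster. scp does not expand a wildcard in the host part, so run the command once per host, replacing * with 11, 12 and 13:

          scp /etc/hosts 192.168.50.*:/etc/hosts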
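The per-host copy can be wrapped in a small loop. The following is only a sketch: the `SLAVES` list is taken from the cluster plan, while the `push_file` helper and the `DRY_RUN` switch are names made up for this illustration. With `DRY_RUN=1` (the default here) it only prints the scp commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: copy one file to the same path on every slave in the cluster plan.
# push_file and DRY_RUN are illustrative names, not part of any tool.
SLAVES="192.168.50.11 192.168.50.12 192.168.50.13"   # from the cluster plan

push_file() {
    local file="$1"
    local host
    for host in $SLAVES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            echo "scp $file $host:$file"
        else
            scp "$file" "$host:$file"
        fi
    done
}

push_file /etc/hosts
```

Run it with `DRY_RUN=0` once the printed commands look right. Each scp prompts for the root password unless SSH keys are set up.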

      (5) Install the JDK.

          rpm -ivh jdk-7u67-linux-x64.rpm

          Create an environment file and load it into the current shell:

          echo -e "export JAVA_HOME=/usr/java/default\nexport PATH=\$JAVA_HOME/bin:\$PATH" > /etc/profile.d/java-env.sh

          . /etc/profile.d/java-env.sh
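As an alternative to escaping `$` inside `echo -e`, the same file can be written with a quoted heredoc. This is a minimal sketch: `OUT` is a variable introduced here so the snippet can be tried against a scratch file, and on a real host it would be set to /etc/profile.d/java-env.sh.

```shell
#!/usr/bin/env bash
# Sketch: write the Java environment file with a quoted heredoc.
# OUT is an illustrative variable; it defaults to a scratch file.
OUT="${OUT:-$(mktemp)}"

# The quoted 'EOF' keeps $JAVA_HOME and $PATH literal in the file.
cat > "$OUT" <<'EOF'
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
EOF

# Load it into the current shell and show the result.
. "$OUT"
echo "JAVA_HOME=$JAVA_HOME"
```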

      (6) Stop iptables and keep it from starting at boot.

          service iptables stop

          chkconfig iptables off

      (7) Disable SELinux. Set the following line in /etc/selinux/config, then reboot the operating system:

          SELINUX=disabled
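The edit to /etc/selinux/config can also be scripted. The sketch below assumes GNU sed (as shipped with CentOS); `disable_selinux` is a helper name invented for this example, and the demonstration runs on a scratch copy with made-up content so that nothing on the system changes.

```shell
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled in an SELinux config file, keeping a backup.
# disable_selinux is an illustrative helper, not a system command.
disable_selinux() {
    local cfg="${1:-/etc/selinux/config}"
    cp "$cfg" "$cfg.bak"
    # Replace whatever mode is configured; SELINUXTYPE= lines do not match.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Demonstration on a scratch file with sample content.
demo="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
disable_selinux "$demo"
cat "$demo"
```

On a real host, call `disable_selinux` with no argument as root and then reboot, as described in the step above.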

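    2. Installation (with YARN). The yum commands assume that the CDH 5 yum repository is already configured on each host.

      (1) On master.hadoop.com, run:

          yum install hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-hdfs-namenode

          Optionally install the secondary NameNode. Do not install this package if you use HA.

          yum install hadoop-hdfs-secondarynamenode

      (2) On every slave*.hadoop.com host, run:

          yum install hadoop-yarn-nodemanager hadoop-mapreduce hadoop-hdfs-datanode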

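    3. Configuration. Edit the files below on the master, then copy them to every host in the cluster with scp.

      (1) Create the configuration directory and register it with alternatives. The scp in step (4) copies into /etc/hadoop/conf.my_cluster on the other hosts, so the directory has to exist there too; run these commands on every host. alternatives --install requires a priority as its last argument (50 here).

          cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
          alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
          alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster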

      (2) Create the required local directories. These are the local paths named in hdfs-site.xml and yarn-site.xml below: /data/1/dfs/nn on the master for the NameNode, and dfs/dn, yarn/local and yarn/logs under /data/1 through /data/4 on each slave. The HDFS directories (/tmp, /var/log, /user and so on) cannot be created at this point because HDFS is not running yet; they are created in step 6.
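A possible way to create the local directories is sketched below. The paths come from hdfs-site.xml and yarn-site.xml in this post; the owners (hdfs:hdfs and yarn:yarn) and the 700 mode on the NameNode directory are assumptions based on the usual CDH layout. `ROOT`, `CHOWN` and `make_dirs` are names introduced only for this example, so that it can be tried in a scratch directory first.

```shell
#!/usr/bin/env bash
# Sketch: create the local storage directories named in the config files.
# ROOT is an illustrative prefix; it defaults to a scratch directory.
# On a real node, run as root with ROOT="" and CHOWN=1.
ROOT="${ROOT-$(mktemp -d)}"
CHOWN="${CHOWN:-0}"

make_dirs() {
    local owner="$1"; shift
    local d
    for d in "$@"; do
        mkdir -p "$ROOT$d"
        # Changing ownership only works where the hdfs/yarn accounts exist.
        if [ "$CHOWN" = "1" ]; then chown -R "$owner" "$ROOT$d"; fi
    done
}

# Master: NameNode metadata directory (assumed mode 700).
make_dirs hdfs:hdfs /data/1/dfs/nn
chmod 700 "$ROOT/data/1/dfs/nn"

# Slaves: DataNode and NodeManager directories on each of the four disks.
for n in 1 2 3 4; do
    make_dirs hdfs:hdfs "/data/$n/dfs/dn"
    make_dirs yarn:yarn "/data/$n/yarn/local" "/data/$n/yarn/logs"
done

find "$ROOT/data" -type d | sort
```

On the master only the first `make_dirs` call is needed, and on the slaves only the loop.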

      (3) Edit the configuration files. Add the properties below inside the <configuration> element of each file.

        1) core-site.xml. The two LZO codecs listed in io.compression.codecs come from the separate hadoop-lzo package; remove them from the list if that package is not installed.

          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master.hadoop.com:8020</value>
          </property>
          <property>
            <name>fs.trash.interval</name>
            <value>1440</value>
          </property>
          <property>
            <name>fs.trash.checkpoint.interval</name>
            <value>720</value>
          </property>
          <property>
            <name>hadoop.proxyuser.mapred.groups</name>
            <value>*</value>
          </property>
          <property>
            <name>hadoop.proxyuser.mapred.hosts</name>
            <value>*</value>
          </property>
          <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
          </property>

        2) hdfs-site.xml

          <property>
            <name>dfs.permissions.superusergroup</name>
            <value>hadoop</value>
          </property>
          <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///data/1/dfs/nn</value>
          </property>
          <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
          </property>
          <property>
            <name>dfs.datanode.failed.volumes.tolerated</name>
            <value>3</value>
          </property>
          <property>
            <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
            <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
          </property>
          <property>
            <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
            <value>10737418240</value>
          </property>
          <property>
            <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
            <value>0.75</value>
          </property>
          <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
          </property>
          <property>
            <name>dfs.webhdfs.user.provider.user.pattern</name>
            <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
          </property>

        3) yarn-site.xml

          <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master.hadoop.com</value>
          </property>
          <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
          </property>
          <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
          </property>
          <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
          </property>
          <property>
            <description>List of directories to store localized files in.</description>
            <name>yarn.nodemanager.local-dirs</name>
            <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local,/data/4/yarn/local</value>
          </property>
          <property>
            <description>Where to store container logs.</description>
            <name>yarn.nodemanager.log-dirs</name>
            <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs,/data/4/yarn/logs</value>
          </property>
          <property>
            <description>Where to aggregate logs to.</description>
            <name>yarn.nodemanager.remote-app-log-dir</name>
            <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
          </property>
          <property>
            <description>Classpath for typical applications.</description>
            <name>yarn.application.classpath</name>
            <value>
              $HADOOP_CONF_DIR,
              $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
              $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
              $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
              $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
            </value>
          </property>
          <property>
            <name>yarn.web-proxy.address</name>
            <value>master.hadoop.com</value>
          </property>
          <property>
            <description>Memory on this node that can be allocated to containers, not the machine's total physical memory.</description>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>5120</value>
          </property>
          <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>512</value>
          </property>
          <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>10240</value>
          </property>
          <property>
            <name>yarn.app.mapreduce.am.resource.mb</name>
            <value>512</value>
          </property>
          <property>
            <name>yarn.app.mapreduce.am.command-opts</name>
            <value>-Xmx512m</value>
          </property>
          <property>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>2.1</value>
          </property>
          <property>
            <name>yarn.nodemanager.resource.cpu-vcores</name>
            <value>4</value>
          </property>
          <property>
            <name>yarn.scheduler.minimum-allocation-vcores</name>
            <value>1</value>
          </property>
          <property>
            <name>yarn.scheduler.maximum-allocation-vcores</name>
            <value>10</value>
          </property>
          <property>
            <name>yarn.scheduler.increment-allocation-mb</name>
            <value>512</value>
          </property>
          <property>
            <name>yarn.scheduler.increment-allocation-vcores</name>
            <value>1</value>
          </property>
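To see what the memory values in yarn-site.xml imply for a single NodeManager, here is a rough arithmetic sketch. The numbers are copied from the properties in this post; the variable names are made up for the example, and the container count considers memory only (vcores are ignored).

```shell
#!/usr/bin/env bash
# Sketch: what the yarn-site.xml memory settings imply for one node.
node_mb=5120        # yarn.nodemanager.resource.memory-mb
min_mb=512          # yarn.scheduler.minimum-allocation-mb
max_mb=10240        # yarn.scheduler.maximum-allocation-mb
vmem_ratio_x10=21   # yarn.nodemanager.vmem-pmem-ratio (2.1), scaled by 10 for integer math

# Upper bound on containers per node if each one is the minimum size.
max_containers=$(( node_mb / min_mb ))

# Largest single request that one node can actually hold.
largest_fit=$(( max_mb < node_mb ? max_mb : node_mb ))

# Virtual memory limit for a minimum-size container (rounded down).
vmem_limit_mb=$(( min_mb * vmem_ratio_x10 / 10 ))

echo "containers per node at ${min_mb} MB each: $max_containers"
echo "largest container that fits on one node: ${largest_fit} MB"
echo "virtual memory limit for a ${min_mb} MB container: ${vmem_limit_mb} MB"
```

The scheduler maximum (10240 MB) is larger than what any node offers (5120 MB), so a request above 5120 MB would most likely stay pending on this cluster.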

        4) mapred-site.xml

          <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.address</name>
            <value>master.hadoop.com:10020</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>master.hadoop.com:19888</value>
          </property>
          <property>
            <name>yarn.app.mapreduce.am.staging-dir</name>
            <value>/user/history</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.intermediate-done-dir</name>
            <value>/user/history/intermediate-done-dir</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.done-dir</name>
            <value>/user/history/done-dir</value>
          </property>

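      (4) Copy the configuration files to every host in the cluster. Run the command once per host, replacing the * in the address with 11, 12 and 13 (the * in *-site.xml is an ordinary file glob and stays as it is):

          scp /etc/hadoop/conf.my_cluster/*-site.xml  192.168.50.*:/etc/hadoop/conf.my_cluster/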

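    4. Format HDFS. Run this once, on the master only:

          sudo -u hdfs hdfs namenode -format

    5. Start HDFS. Run this on every host; it starts whichever HDFS services are installed there:

          for x in `cd /etc/init.d ; ls hadoop-hdfs-*`; do service $x start; done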

    6. Create the required directories on HDFS. Run these once, after HDFS has started. chmod needs a mode argument; the modes shown (1777 for shared directories, 1775 for /var/log, 755 for home directories) are assumptions, so adjust them to your own policy.

sudo -u hdfs hadoop fs -mkdir -p /tmp && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var
sudo -u hdfs hadoop fs -mkdir -p /var/log && sudo -u hdfs hadoop fs -chmod -R 1775 /var/log && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn/apps
sudo -u hdfs hadoop fs -mkdir -p /user
sudo -u hdfs hadoop fs -mkdir -p /user/history && sudo -u hdfs hadoop fs -chown mapred /user/history
sudo -u hdfs hadoop fs -mkdir -p /user/test && sudo -u hdfs hadoop fs -chmod -R 755 /user/test && sudo -u hdfs hadoop fs -chown test /user/test
sudo -u hdfs hadoop fs -mkdir -p /user/root && sudo -u hdfs hadoop fs -chmod -R 755 /user/root && sudo -u hdfs hadoop fs -chown root /user/root

    7. Operate YARN

       Run the commands below on each machine in the cluster. The ResourceManager, history server and proxy server are installed only on the master, and the NodeManager only on the slaves, so on any given host the service command reports the missing services as unrecognized; that is harmless.

      (1) Start

service hadoop-yarn-resourcemanager start;service hadoop-mapreduce-historyserver start;service hadoop-yarn-proxyserver start;service hadoop-yarn-nodemanager start

      (2) Check status

service hadoop-yarn-resourcemanager status;service hadoop-mapreduce-historyserver status;service hadoop-yarn-proxyserver status;service hadoop-yarn-nodemanager status

      (3) Stop

service hadoop-yarn-resourcemanager stop;service hadoop-mapreduce-historyserver stop;service hadoop-yarn-proxyserver stop;service hadoop-yarn-nodemanager stop

      (4) Restart

service hadoop-yarn-resourcemanager restart;service hadoop-mapreduce-historyserver restart;service hadoop-yarn-proxyserver restart;service hadoop-yarn-nodemanager restart
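Since the master and the slaves run different YARN services, one option is to pick the service list from the hostname. The sketch below assumes the naming scheme of the cluster plan (master.hadoop.com, slaveN.hadoop.com); `yarn_services_for`, `yarn_ctl` and `DRY_RUN` are names invented for this illustration. With `DRY_RUN=1` (the default here) it only prints the service commands.

```shell
#!/usr/bin/env bash
# Sketch: choose the YARN services for a host by its name, then apply one action.
# yarn_services_for, yarn_ctl and DRY_RUN are illustrative names.
yarn_services_for() {
    case "$1" in
        master.*) echo "hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
        slave*)   echo "hadoop-yarn-nodemanager" ;;
        *)        echo "" ;;
    esac
}

yarn_ctl() {
    local action="$1"
    local host="${2:-$(hostname)}"
    local svc
    for svc in $(yarn_services_for "$host"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            echo "service $svc $action"
        else
            service "$svc" "$action"
        fi
    done
}

yarn_ctl status master.hadoop.com
yarn_ctl status slave2.hadoop.com
```

On a real host, `DRY_RUN=0 yarn_ctl start` uses the local hostname and starts only the services that belong there.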

    8. Install the Hadoop client

      (1) Install CentOS 6.5.

      (2) Log in as root and run the following commands. passwd takes no password argument; enter the test user's password when prompted.

rpm -ivh jdk-7u67-linux-x64.rpm

yum install hadoop-client

cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
scp 192.168.50.10:/etc/hadoop/conf.my_cluster/*-site.xml /etc/hadoop/conf.my_cluster/
scp 192.168.50.10:/etc/hosts /etc/
scp 192.168.50.10:/etc/profile.d/java-env.sh /etc/profile.d/
. /etc/profile
useradd -u 700 -g hadoop test
passwd test

    9. Test Hadoop with YARN

su - test

# Word count over the configuration files
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output

# Estimate Pi
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100

# Run the grep job; it writes to its own directory because a job fails if its output directory already exists
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output-grep 'dfs[a-z.]+'
hadoop fs -ls output-grep
hadoop fs -cat output-grep/part-r-00000 | head

【References】

    1) Official Cloudera installation documentation     http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_command_line.html

 
