Note: everything is installed as a regular (non-root) user, so first allow that user to run certain commands via sudo.

0. For the initial network configuration of the VM, see:

  http://www.cnblogs.com/qlqwjy/p/7783253.html

1. Allow the hadoop user to run commands via sudo

visudo
or
vim /etc/sudoers

Add an entry for the hadoop user right below the existing root entry, for example:
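A minimal sketch of the relevant /etc/sudoers lines (the root line already exists; only the hadoop line is added):

root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL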

Log in as the hadoop user and check:

[hadoop@localhost java]$ sudo -l  # list which commands the current user may run via sudo
Matching Defaults entries for hadoop on this host:
requiretty, !visiblepw, always_set_home, env_reset, env_keep="COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR
LS_COLORS", env_keep+="MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE", env_keep+="LC_COLLATE
LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES", env_keep+="LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE",
env_keep+="LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY", secure_path=/sbin\:/bin\:/usr/sbin\:/usr/bin User hadoop may run the following commands on this host:
(ALL) ALL

 ------------------------ Install the Hadoop runtime environment (switch to the hadoop user) ----------------------

  I upload all files over sftp; installing Git is handy because it ships with ssh and sftp clients. Pay attention to your Linux architecture: I initially installed a 64-bit JDK on a 32-bit Linux and it would not run.

Check the architecture:

uname -a
or
getconf LONG_BIT
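For reference, this is roughly what a 32-bit box reports (hostname and kernel build below are illustrative, not from this machine):

[hadoop@localhost ~]$ uname -a            # look for i686/i386 (32-bit) vs x86_64 (64-bit)
Linux localhost ... i686 i686 i386 GNU/Linux
[hadoop@localhost ~]$ getconf LONG_BIT    # prints 32 or 64
32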

1. Install the JDK

(1) Upload the tarball to the server and extract it

sudo tar -zxvf ./jdk-7u65-linux-i586.tar.gz 

(2) Check the installation directory:

[hadoop@localhost jdk1.7.0_65]$ pwd
/opt/java/jdk1.7.0_65

(3) Configure the environment variables:

[hadoop@localhost jdk1.7.0_65]$ tail -4 ~/.bashrc
export JAVA_HOME=/opt/java/jdk1.7.0_65
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${PATH}

Reload the environment variables:

[hadoop@localhost jdk1.7.0_65]$ source ~/.bashrc
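A quick sanity check that the variables resolved (paths as configured above):

echo $JAVA_HOME    # should print /opt/java/jdk1.7.0_65
which java         # should point into $JAVA_HOME/bin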

 (4) Run java or javac to test (note that the first command below mistypes -version as -vsersion, hence the error; javac -version works):

[hadoop@localhost jdk1.7.0_65]$ java -vsersion
Unrecognized option: -vsersion
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
[hadoop@localhost jdk1.7.0_65]$ javac -version
javac 1.7.0_65

2. Install Hadoop 2.4.1

(1) Upload the tarball to the server

sftp> put hadoop-2.4.1.tar.gz

(2) Extract it

sudo tar -zxvf ./hadoop-2.4.1.tar.gz

(3) Inspect the extracted directory:

[hadoop@localhost hadoop-2.4.1]$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share

  The Java jar files live under the share directory; its docs subdirectory is not needed and can simply be deleted.

  bin contains the user-facing executables.

  etc contains the Hadoop configuration files.

  lib and libexec contain the native libraries and helper scripts.

  sbin contains the administration scripts that start and stop the daemons.

 

(4) Edit the configuration files (in Hadoop 2.x they live in $HADOOP_HOME/etc/hadoop); each property below goes inside the <configuration> element of its file, and a complete core-site.xml sketch follows this list.

  • Edit hadoop-env.sh (set the JDK path)

# around line 27

export JAVA_HOME=/opt/java/jdk1.7.0_65
  • Edit core-site.xml
        <!-- The default filesystem URI, i.e. the address of the NameNode -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
        <!-- Directory where Hadoop stores its runtime files -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/hadoop/hadoop-2.4.1/data/</value>
        </property>
  • Edit hdfs-site.xml (the default values are documented in hdfs-default.xml)
        <!-- Number of HDFS block replicas -->
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
    • Edit mapred-site.xml (MapReduce)

First rename mapred-site.xml.template to mapred-site.xml, otherwise Hadoop will not read it:

[hadoop@localhost hadoop]$ sudo mv ./mapred-site.xml.template ./mapred-site.xml

Then add:

        <!-- Run MapReduce on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
    • Edit yarn-site.xml (YARN)
        <!-- Address of YARN's master, the ResourceManager -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>localhost</value>
        </property>
        <!-- How reducers fetch data from the mappers -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
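As a sketch (assuming the paths used above), the finished core-site.xml looks like the following, and the configured hadoop.tmp.dir should exist and be writable by the hadoop user:

<?xml version="1.0"?>
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/hadoop/hadoop-2.4.1/data/</value>
        </property>
</configuration>

sudo mkdir -p /opt/hadoop/hadoop-2.4.1/data
sudo chown -R hadoop:hadoop /opt/hadoop/hadoop-2.4.1    # assumes the group is also named hadoop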

(5) Stop the Linux firewall:

[root@localhost ~]# service iptables stop  # stop the firewall
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@localhost ~]# ls
anaconda-ks.cfg install.log install.log.syslog
[root@localhost ~]# service iptables status  # check the firewall status
iptables: Firewall is not running.
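To keep iptables from starting again after a reboot (optional; CentOS 6 style service management assumed):

chkconfig iptables off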

3. Start and test Hadoop

(1) Preparation

  • First add Hadoop to the environment variables so the hadoop commands can be run from any directory:
export JAVA_HOME=/opt/java/jdk1.7.0_65
export HADOOP_HOME=/opt/hadoop/hadoop-2.4.1
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
  • Format the NameNode (this initializes it):
hdfs namenode -format    (the older form: hadoop namenode -format)

After the command finishes, a dfs/name/current/ directory is created under the Hadoop temporary directory we configured, containing four files:

[root@localhost data]# ll ./dfs/name/current/
-rw-r--r--. root root ... fsimage_0000000000000000000
-rw-r--r--. root root ... fsimage_0000000000000000000.md5
-rw-r--r--. root root ... seen_txid
-rw-r--r--. root root ... VERSION

(2) Start Hadoop (it is best to set up SSH key-based login first, otherwise you will be prompted for the password many times; you could also write a shell script that calls the HDFS and YARN start scripts).
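A minimal sketch of setting up passwordless SSH to localhost for the hadoop user, so the start scripts do not prompt for a password:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # generate a key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the key for login to this host
chmod 600 ~/.ssh/authorized_keys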

  • Start HDFS

Start HDFS first, from the sbin directory of the Hadoop installation (/opt/hadoop/hadoop-2.4.1/sbin):

sbin/start-dfs.sh

Verify that it started:

[root@localhost sbin]# jps
SecondaryNameNode
Jps
DataNode
NameNode

Explanation: the start script also launches daemons on localhost because the slaves file under etc/hadoop in the Hadoop installation lists the hosts on which the worker daemons (DataNodes) are started; here it contains only localhost.

For a proper distributed cluster with multiple nodes, add the extra hosts to this file:

[root@localhost hadoop]# cat ./slaves
localhost
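For example, a three-worker cluster's slaves file would simply list one hostname per line (the hostnames are illustrative):

node01
node02
node03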
  • Start YARN
[root@localhost sbin]# ./start-yarn.sh

Check again:

[root@localhost sbin]# jps
NodeManager
ResourceManager
SecondaryNameNode
DataNode
Jps
NameNode
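For completeness, the matching stop scripts live in the same sbin directory:

sbin/stop-yarn.sh
sbin/stop-dfs.sh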

(3) Test the HDFS and YARN services started above

http://192.168.2.136:50070 (HDFS web UI)
http://192.168.2.136:8088 (MapReduce/YARN web UI)

  • Test HDFS

HDFS files can also be browsed through the web UI.

First upload a file:

[root@localhost ~]# ll
-rw-------. root root ... anaconda-ks.cfg
-rw-r--r--. root root ... install.log
-rw-r--r--. root root ... install.log.syslog
[root@localhost ~]# hadoop fs -put install.log hdfs://localhost:9000/  # upload install.log from the current directory to the HDFS root
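The upload can also be confirmed from the shell instead of the web UI:

[root@localhost ~]# hadoop fs -ls hdfs://localhost:9000/    # install.log should show up in the listing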

Refreshing the web UI now shows the file, and clicking it also lets you download it.

Now delete the local install.log and download it back from HDFS:

[root@localhost ~]# rm -rf ./install.log  # delete the local copy
[root@localhost ~]# ls
anaconda-ks.cfg  install.log.syslog
[root@localhost ~]# hadoop fs -get hdfs://localhost:9000/install.log  # download the file from HDFS
[root@localhost ~]# ls
anaconda-ks.cfg  install.log  install.log.syslog
  • Test MapReduce

Since we have not written any MapReduce programs of our own yet, we use the examples that ship with Hadoop: one job estimates the value of Pi and one counts how often each word occurs in a text.

Change into Hadoop's mapreduce examples directory:

[root@localhost mapreduce]# pwd
/opt/hadoop/hadoop-2.4.1/share/hadoop/mapreduce

Example 1: the Pi estimation job

[root@localhost mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 5 5  # run the Pi example with 5 map tasks and 5 samples per map
Number of Maps = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523441540916_0001
INFO impl.YarnClientImpl: Submitted application application_1523441540916_0001
INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523441540916_0001/
INFO mapreduce.Job: Running job: job_1523441540916_0001
INFO mapreduce.Job: Job job_1523441540916_0001 running in uber mode : false
...  (progress lines and counter output omitted)
INFO mapreduce.Job: Job job_1523441540916_0001 completed successfully
Job Finished in 188.318 seconds
Estimated value of Pi is 3.68000000000000000000  # computed result

Example 2: the wordcount job (given an English text, it counts how many times each word appears)

(1) Create a small English text file

[root@localhost mapreduce]# cat ./test.txt
hello lll
hello kkk
hello meinv
hello

(2) The input file has to be uploaded to HDFS before the job can read it

First create a directory in HDFS (two ways of writing the path):

[root@localhost mapreduce]# hadoop fs -mkdir hdfs://localhost:9000/wordcount  # first form: full HDFS URI
[root@localhost mapreduce]# hadoop fs -mkdir /wordcount/input          # second form: / refers to the HDFS root
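The two commands create /wordcount and then /wordcount/input; with the -p flag both levels can be created in one go:

hadoop fs -mkdir -p /wordcount/input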

The new directories now show up in the HDFS web UI (the tmp and user directories were created by the previous Pi job).

 Next, upload the English text file to /wordcount/input in HDFS:

[root@localhost mapreduce]# hadoop fs -put test.txt /wordcount/input

Check the directory in the web UI.

Run the wordcount job (MapReduce start-up is slow because many processes have to be launched).

It counts the words in every file under /wordcount/input in HDFS and writes the result to /wordcount/output (/ is the HDFS root):

[root@localhost mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /wordcount/input /wordcount/output
INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523441540916_0002
INFO impl.YarnClientImpl: Submitted application application_1523441540916_0002
INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523441540916_0002/
INFO mapreduce.Job: Running job: job_1523441540916_0002
INFO mapreduce.Job: Job job_1523441540916_0002 running in uber mode : false
...  (progress lines and counter output omitted)
INFO mapreduce.Job: Job job_1523441540916_0002 completed successfully

 Look at the files under /wordcount/output in HDFS:

[root@localhost mapreduce]# hadoop fs -ls /wordcount/output  # list the output directory
Found 2 items
-rw-r--r--   root supergroup ... /wordcount/output/_SUCCESS
-rw-r--r--   root supergroup ... /wordcount/output/part-r-00000

 View the result file:

[root@localhost mapreduce]# hadoop fs -cat /wordcount/output/part-r-00000
hello   4
kkk     1
lll     1
meinv   1

 The result can also be downloaded and viewed through the web UI.
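Note that the job refuses to run if the output directory already exists, so remove it before re-running (Hadoop 2.x shell syntax):

hadoop fs -rm -r /wordcount/output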
