Setting Up a Hadoop Environment on Ubuntu (Standalone Mode + Pseudo-Distributed Mode) (repost)
Hadoop has unique advantages for analyzing massive data sets. Today I spent some time setting up pseudo-distributed mode on my own Linux machine; the process had plenty of twists and turns, so I am summarizing my experience here.
First, understand Hadoop's three installation modes:
1. Standalone mode. This is Hadoop's default mode. When the configuration files are empty, Hadoop runs entirely on the local machine. Because it does not need to interact with other nodes, standalone mode uses neither HDFS nor any of the Hadoop daemons. This mode is mainly used for developing and debugging the application logic of MapReduce programs.
2. Pseudo-distributed mode. The Hadoop daemons all run on the local machine, simulating a small cluster. On top of standalone mode this adds debugging facilities: it lets you inspect memory usage, HDFS input/output, and interaction with the other daemons.
3. Fully distributed mode. The Hadoop daemons run on an actual cluster.
References:
1. Installing Hadoop 1.0.0 on Ubuntu 11.10 (standalone and pseudo-distributed)
5. Setting up a Hadoop environment on Ubuntu (standalone mode + pseudo-distributed mode)
6. Hadoop quick start: setting up a Hadoop environment on Ubuntu (standalone mode + pseudo-distributed mode)
I strongly recommend references 5 and 6: both tutorials progress from simple to advanced, give detailed steps, and include worked examples. Below I retrace my own installation; to save time, much of the text is pasted from references 5 and 6 — thanks again to both authors for sharing their installation experience. A few further articles on Hadoop's overall architecture are also worth reading; they help you understand why each step below is done the way it is.
My setup: Ubuntu 12.04, username derek, machine name derekUbn, Hadoop version hadoop-1.1.2.tar.gz. Without further ado, the steps are as follows:
I. Creating a hadoop group and user on Ubuntu
1. Add a hadoop user to the system
- derek@derekUbun:~$ sudo addgroup hadoop
- derek@derekUbun:~$ sudo adduser --ingroup hadoop hadoop
2. We have only added a user named hadoop so far; it does not have administrator privileges yet. To grant hadoop those privileges, open the /etc/sudoers file:
- derek@derekUbun:~$ sudo gedit /etc/sudoers
Below the line root ALL=(ALL:ALL) ALL, add hadoop ALL=(ALL:ALL) ALL
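A note of caution: a syntax error in /etc/sudoers can lock you out of sudo entirely, so it is safer to edit it with visudo, which validates the file before saving, rather than with gedit. The sketch below only prints the line to be added, for reference:

```shell
# The privilege line for the hadoop user. In practice, open the file with
# `sudo visudo` (which syntax-checks before saving) and insert this line
# below the root entry; here we just print it for reference.
echo 'hadoop ALL=(ALL:ALL) ALL'
```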
II. Configuring SSH
SSH is configured so that the machines can run commands on each other without a login password. Passwordless login is essential; otherwise, the master node would prompt for a password every time it tried to access another node.
How passwordless SSH works: the master (namenode/jobtracker) acts as the client. To connect to a slave (datanode/tasktracker) server with passwordless public-key authentication, generate a key pair on the master — one public key and one private key — and copy the public key to every slave. When the master connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master. The master decrypts it with its private key and sends the decrypted number back to the slave; once the slave confirms the number is correct, it allows the master to connect. That is the public-key authentication process, and no password has to be typed at any point. The key step is copying the master's public key to the slaves.
1. Install ssh
1) Hadoop communicates over ssh, so install ssh first. Note that I first switched from the derek user to hadoop.
- derek@derekUbun:~$ su - hadoop
- Password:
- hadoop@derekUbun:~$ sudo apt-get install openssh-server
- [sudo] password for hadoop:
- Reading package lists... Done
- Building dependency tree
- Reading state information... Done
- openssh-server is already the newest version.
- The following packages were automatically installed and are no longer required:
- kde-l10n-de language-pack-kde-de language-pack-kde-en ssh-krb5
- language-pack-de-base language-pack-kde-zh-hans language-pack-kde-en-base
- kde-l10n-engb language-pack-kde-de-base kde-l10n-zhcn firefox-locale-de
- language-pack-de language-pack-kde-zh-hans-base
- Use 'apt-get autoremove' to remove them.
- 0 upgraded, 0 newly installed, 0 to remove and 505 not upgraded.
Since my machine already had the latest version of ssh installed, this step actually did nothing in my case.
2) With ssh installed, start the service. Once started, you can check with a command that the service is running properly:
- hadoop@derekUbun:~$ sudo /etc/init.d/ssh start
- Rather than invoking init scripts through /etc/init.d, use the service(8)
- utility, e.g. service ssh start
- Since the script you are attempting to invoke has been converted to an
- Upstart job, you may also use the start(8) utility, e.g. start ssh
- hadoop@derekUbun:~$ ps -e |grep ssh
- 759 ? 00:00:00 sshd
- 1691 ? 00:00:00 ssh-agent
- 12447 ? 00:00:00 ssh
- 12448 ? 00:00:00 sshd
- 12587 ? 00:00:00 sshd
- hadoop@derekUbun:~$
3) ssh is a secure communication protocol; it can generate keys in either rsa or dsa form (rsa by default). Normal use requires a password, so we set up passwordless login instead by generating a private/public key pair:
- hadoop@derekUbun:~$ ssh-keygen -t rsa -P ""
- Generating public/private rsa key pair.
- Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
- /home/hadoop/.ssh/id_rsa already exists.
- Overwrite (y/n)? y
- Your identification has been saved in /home/hadoop/.ssh/id_rsa.
- Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
- The key fingerprint is:
- c7:36:c7:77:91:a2:32:28:35:a6:9f:36:dd:bd:dc:4f hadoop@derekUbun
- The key's randomart image is:
- +--[ RSA 2048]----+
- | |
- | .|
- | + . o |
- | + o. .. . .|
- | o .So=.o . .|
- | o oo+o.. . |
- | = . . . E|
- | . . . o. |
- | o .o|
- +-----------------+
- hadoop@derekUbun:~$
(Note: after pressing Enter, two files are generated under ~/.ssh/: id_rsa and id_rsa.pub. They come as a pair — the former is the private key, the latter the public key.)
Go into the ~/.ssh/ directory and append the public key id_rsa.pub to the authorized_keys authorization file; there is no authorized_keys file at first. (authorized_keys holds the public keys of every user allowed to log in over ssh as the current user.)
- hadoop@derekUbun:~$ cat ~/.ssh/id_rsa.pub>> ~/.ssh/authorized_keys
- Now ssh in to confirm that future logins require no password:
- hadoop@derekUbun:~$ ssh localhost
- Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-27-generic-pae i686)
- * Documentation: https://help.ubuntu.com/
- 512 packages can be updated.
- 151 updates are security updates.
- Last login: Mon Mar 11 15:56:15 2013 from localhost
- hadoop@derekUbun:~$
(Note: once you ssh into a remote machine, you are controlling that remote machine; you must run the exit command before you regain control of the local host.)
Log out: ~$ exit
From now on, logins no longer require a password.
- hadoop@derekUbun:~$ exit
- Connection to localhost closed.
- hadoop@derekUbun:~$
III. Installing Java
Install Java as the derek user. Java is already installed on my machine, under /usr/java/jdk1.7.0_17, so I can simply display the installed version.
- hadoop@derekUbun:~$ su - derek
- Password:
- derek@derekUbun:~$ java -version
- java version "1.7.0_17"
- Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
- Java HotSpot(TM) Server VM (build 23.7-b01, mixed mode)
IV. Installing hadoop-1.1.2
Download the Hadoop source tarball from the official site — I downloaded the latest version, hadoop-1.1.2.tar.gz — and extract it into the desired directory. I put hadoop-1.1.2.tar.gz under /usr/local and renamed the extracted folder to hadoop.
- hadoop@derekUbun:/usr/local$ sudo tar xzf hadoop-1.1.2.tar.gz (note: I had already copied hadoop-1.1.2.tar.gz to /usr/local, then switched to the hadoop user)
- hadoop@derekUbun:/usr/local$ sudo mv hadoop-1.1.2 /usr/local/hadoop
To make sure every operation is performed as the hadoop user, change the owner of the hadoop folder to the hadoop user:
- hadoop@derekUbun:/usr/local$ sudo chown -R hadoop:hadoop hadoop
V. Configuring hadoop-env.sh (the Java install path)
Log in as the hadoop user, change into the /usr/local/hadoop directory, open conf/hadoop-env.sh, and add the following (find the line #export JAVA_HOME=..., remove the #, and append your machine's JDK path):
export JAVA_HOME=/usr/java/jdk1.7.0_17 (this depends on where Java is installed on your machine; mine is /usr/java/jdk1.7.0_17)
export HADOOP_INSTALL=/usr/local/hadoop (note: I use HADOOP_INSTALL rather than HADOOP_HOME, because the latter is deprecated in newer versions and triggers a warning if used)
export PATH=$PATH:/usr/local/hadoop/bin
- hadoop@derekUbun:/usr/local/hadoop$ sudo vi conf/hadoop-env.sh
- # Set Hadoop-specific environment variables here.
- # The only required environment variable is JAVA_HOME. All others are
- # optional. When running a distributed configuration it is best to
- # set JAVA_HOME in this file, so that it is correctly defined on
- # remote nodes.
- # The java implementation to use. Required.
- # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
- export JAVA_HOME=/usr/java/jdk1.7.0_17
- export HADOOP_INSTALL=/usr/local/hadoop
- export PATH=$PATH:/usr/local/hadoop/bin
- # Extra Java CLASSPATH elements. Optional.
- # export HADOOP_CLASSPATH=
- # The maximum amount of heap to use, in MB. Default is 1000.
- # export HADOOP_HEAPSIZE=2000
- # Extra Java runtime options. Empty by default.
- # export HADOOP_OPTS=-server
- "conf/hadoop-env.sh" 57L, 2356C
Then source the file so the environment-variable settings take effect:
- hadoop@derekUbun:/usr/local/hadoop$ source /usr/local/hadoop/conf/hadoop-env.sh
At this point, Hadoop's standalone mode is installed and working. You can display the Hadoop version as follows:
- hadoop@derekUbun:/usr/local/hadoop$ hadoop version
- Hadoop 1.1.2
- Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782
- Compiled by hortonfo on Thu Jan 31 02:03:24 UTC 2013
- From source with checksum c720ddcf4b926991de7467d253a79b8b
- hadoop@derekUbun:/usr/local/hadoop$
Now run Hadoop's bundled WordCount example to get a feel for the MapReduce process.
Create an input folder in the hadoop directory:
- hadoop@derekUbun:/usr/local/hadoop$ mkdir input
Copy all the files in conf into the input folder:
- hadoop@derekUbun:/usr/local/hadoop$ cp conf/* input
Run the WordCount program and save the results to output:
- hadoop@derekUbun:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.1.2.jar wordcount input output
View the results:
- hadoop@derekUbun:/usr/local/hadoop$ cat output/*
You will see the words from all the conf files listed along with their frequencies.
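What WordCount computes can be mimicked locally with coreutils — split the input into one word per line, then count duplicates. This is only a sketch of the semantics, not of how MapReduce distributes the work across mappers and reducers:

```shell
# Local sketch of the WordCount semantics: tokenize, sort, count.
printf 'hello world\nhello hadoop\n' \
  | tr -s ' \t' '\n' \
  | sort             \
  | uniq -c          \
  | sort -rn
```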
VI. Configuration for pseudo-distributed mode
Three files need to be set up, all in the /usr/local/hadoop/conf directory: core-site.xml, hdfs-site.xml, and mapred-site.xml.
core-site.xml: configuration for Hadoop Core, such as the I/O settings shared by HDFS and MapReduce.
hdfs-site.xml: configuration for the HDFS daemons: the namenode, the secondary namenode, and the datanodes.
mapred-site.xml: configuration for the MapReduce daemons: the jobtracker and the tasktrackers.
1. Edit the three files:
1). core-site.xml:
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/usr/local/hadoop/tmp</value>
- </property>
- </configuration>
2). hdfs-site.xml:
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>2</value>
- </property>
- <property>
- <name>dfs.name.dir</name>
- <value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
- </property>
- </configuration>
3). mapred-site.xml:
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:9001</value>
- </property>
- </configuration>
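A typo in any of these XML files only shows up when the daemons start, so it is worth grepping for the key property names before formatting HDFS. The sketch below demonstrates the check on an inline copy of core-site.xml written to /tmp; to check the real files, point the grep at /usr/local/hadoop/conf instead:

```shell
# Sanity-check sketch: confirm a key property survived editing.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep -c '<name>fs.default.name</name>' /tmp/core-site.xml   # prints 1
```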
2. Load the Hadoop environment settings, then format the namenode:
- hadoop@derekUbun:/usr/local/hadoop$ source /usr/local/hadoop/conf/hadoop-env.sh
- hadoop@derekUbun:/usr/local/hadoop$ hadoop namenode -format
The following message indicates that the HDFS file system was formatted successfully:
- 13/03/11 23:08:01 INFO common.Storage: Storage directory /usr/local/hadoop/datalog2 has been successfully formatted.
- 13/03/11 23:08:01 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at derekUbun/127.0.1.1
- ************************************************************/
3. Start Hadoop
Next, run start-all.sh to start all the services, including the namenode and datanode; the start-all.sh script loads all the daemons.
- hadoop@derekUbun:/usr/local/hadoop$ cd bin
- hadoop@derekUbun:/usr/local/hadoop/bin$ start-all.sh
- starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-namenode-derekUbun.out
- localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-datanode-derekUbun.out
- localhost: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-derekUbun.out
- starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-derekUbun.out
- localhost: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-derekUbun.out
- hadoop@derekUbun:/usr/local/hadoop/bin$
Use Java's jps command to list all the daemons and verify the installation succeeded:
- hadoop@derekUbun:/usr/local/hadoop$ jps
If a list like the following appears, the installation succeeded:
- hadoop@derekUbun:/usr/local/hadoop$ jps
- 8431 JobTracker
- 8684 TaskTracker
- 7821 NameNode
- 8915 Jps
- 8341 SecondaryNameNode
- hadoop@derekUbun:/usr/local/hadoop$
4. Check the running state
All the settings are complete and Hadoop is running. You can now verify that the services are working through the web interfaces Hadoop provides for monitoring cluster health:
http://localhost:50030/ - Hadoop jobtracker administration interface
http://localhost:50060/ - Hadoop tasktracker status
http://localhost:50070/ - Hadoop DFS (namenode) status
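The three pages can also be probed from the command line. A sketch using curl (assuming curl is installed): with the daemons running, each URL should return HTTP status 200, while 000 means nothing is listening on that port yet:

```shell
# Probe the monitoring UIs; prints one HTTP status code per port.
for port in 50030 50060 50070; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:$port/" || true)
  echo "$port -> $code"
done
```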
Hadoop's pseudo-distributed mode is now installed. Let's run the bundled WordCount example once more, this time in pseudo-distributed mode, to see the MapReduce process again.
Note that the program now runs on the DFS file system, and the files it creates also live there.
First, create the input directory in DFS:
- hadoop@derekUbun:/usr/local/hadoop$ hadoop dfs -mkdir input
Copy the files from conf into input on DFS:
- hadoop@derekUbun:/usr/local/hadoop$ hadoop dfs -copyFromLocal conf/* input
(Note: you can use hadoop dfs -ls and hadoop dfs -rmr to view and delete files in DFS.)
Run WordCount in pseudo-distributed mode:
- hadoop@derekUbun:/usr/local/hadoop$ hadoop jar hadoop-examples-1.1.2.jar wordcount input output
- 13/03/12 09:26:05 INFO input.FileInputFormat: Total input paths to process : 16
- 13/03/12 09:26:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
- 13/03/12 09:26:05 WARN snappy.LoadSnappy: Snappy native library not loaded
- 13/03/12 09:26:05 INFO mapred.JobClient: Running job: job_201303120920_0001
- 13/03/12 09:26:06 INFO mapred.JobClient: map 0% reduce 0%
- 13/03/12 09:26:10 INFO mapred.JobClient: map 12% reduce 0%
- 13/03/12 09:26:13 INFO mapred.JobClient: map 25% reduce 0%
- 13/03/12 09:26:15 INFO mapred.JobClient: map 37% reduce 0%
- 13/03/12 09:26:17 INFO mapred.JobClient: map 50% reduce 0%
- 13/03/12 09:26:18 INFO mapred.JobClient: map 62% reduce 0%
- 13/03/12 09:26:19 INFO mapred.JobClient: map 62% reduce 16%
- 13/03/12 09:26:20 INFO mapred.JobClient: map 75% reduce 16%
- 13/03/12 09:26:22 INFO mapred.JobClient: map 87% reduce 16%
- 13/03/12 09:26:24 INFO mapred.JobClient: map 100% reduce 16%
- 13/03/12 09:26:28 INFO mapred.JobClient: map 100% reduce 29%
- 13/03/12 09:26:30 INFO mapred.JobClient: map 100% reduce 100%
- 13/03/12 09:26:30 INFO mapred.JobClient: Job complete: job_201303120920_0001
- 13/03/12 09:26:30 INFO mapred.JobClient: Counters: 29
- 13/03/12 09:26:30 INFO mapred.JobClient: Job Counters
- 13/03/12 09:26:30 INFO mapred.JobClient: Launched reduce tasks=1
- 13/03/12 09:26:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29912
- 13/03/12 09:26:30 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 13/03/12 09:26:30 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 13/03/12 09:26:30 INFO mapred.JobClient: Launched map tasks=16
- 13/03/12 09:26:30 INFO mapred.JobClient: Data-local map tasks=16
- 13/03/12 09:26:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19608
- 13/03/12 09:26:30 INFO mapred.JobClient: File Output Format Counters
- 13/03/12 09:26:30 INFO mapred.JobClient: Bytes Written=15836
- 13/03/12 09:26:30 INFO mapred.JobClient: FileSystemCounters
- 13/03/12 09:26:30 INFO mapred.JobClient: FILE_BYTES_READ=23161
- 13/03/12 09:26:30 INFO mapred.JobClient: HDFS_BYTES_READ=29346
- 13/03/12 09:26:30 INFO mapred.JobClient: FILE_BYTES_WRITTEN=944157
- 13/03/12 09:26:30 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=15836
- 13/03/12 09:26:30 INFO mapred.JobClient: File Input Format Counters
- 13/03/12 09:26:30 INFO mapred.JobClient: Bytes Read=27400
- 13/03/12 09:26:30 INFO mapred.JobClient: Map-Reduce Framework
- 13/03/12 09:26:30 INFO mapred.JobClient: Map output materialized bytes=23251
- 13/03/12 09:26:30 INFO mapred.JobClient: Map input records=778
- 13/03/12 09:26:30 INFO mapred.JobClient: Reduce shuffle bytes=23251
- 13/03/12 09:26:30 INFO mapred.JobClient: Spilled Records=2220
- 13/03/12 09:26:30 INFO mapred.JobClient: Map output bytes=36314
- 13/03/12 09:26:30 INFO mapred.JobClient: Total committed heap usage (bytes)=2736914432
- 13/03/12 09:26:30 INFO mapred.JobClient: CPU time spent (ms)=6550
- 13/03/12 09:26:30 INFO mapred.JobClient: Combine input records=2615
- 13/03/12 09:26:30 INFO mapred.JobClient: SPLIT_RAW_BYTES=1946
- 13/03/12 09:26:30 INFO mapred.JobClient: Reduce input records=1110
- 13/03/12 09:26:30 INFO mapred.JobClient: Reduce input groups=804
- 13/03/12 09:26:30 INFO mapred.JobClient: Combine output records=1110
- 13/03/12 09:26:30 INFO mapred.JobClient: Physical memory (bytes) snapshot=2738036736
- 13/03/12 09:26:30 INFO mapred.JobClient: Reduce output records=804
- 13/03/12 09:26:30 INFO mapred.JobClient: Virtual memory (bytes) snapshot=6773346304
- 13/03/12 09:26:30 INFO mapred.JobClient: Map output records=2615
- hadoop@derekUbun:/usr/local/hadoop$
Display the output:
- hadoop@derekUbun:/usr/local/hadoop$ hadoop dfs -cat output/*
When you are finished with Hadoop, shut down its daemons with the stop-all.sh script:
- hadoop@derekUbun:/usr/local/hadoop$ bin/stop-all.sh
Now start your Hadoop journey and implement some algorithms!
Notes:
1. In pseudo-distributed mode, you can view the contents of input with hadoop dfs -ls.
2. In pseudo-distributed mode, you can delete input with hadoop dfs -rmr.
3. In pseudo-distributed mode, both input and output live in the Hadoop DFS file system.