Running Hadoop's Built-in WordCount Example

Hadoop's built-in WordCount example counts how many times each word occurs in a set of text files.
The steps below show how to run it.

1. Start Hadoop

[root@hadoop ~]# start-all.sh # start Hadoop
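
Before going further it can help to confirm that the daemons actually came up. The sketch below is only an illustration: the helper name missing_daemons is hypothetical, and the list of expected daemons assumes a single-node (pseudo-distributed) Hadoop 2.x setup in which all five run on this host. It reads the output of the JDK tool jps and prints every expected daemon that is not listed.

```shell
#!/usr/bin/env bash
# Daemons expected on a single-node (pseudo-distributed) setup. This list is an
# assumption; adjust it for a multi-node cluster.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Hypothetical helper: reads `jps` output ("<pid> <ClassName>" per line) on
# stdin and prints the name of every expected daemon that is not listed.
missing_daemons() {
    local running daemon
    running=$(awk '{print $2}')
    for daemon in $EXPECTED_DAEMONS; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Demo with made-up jps output in which NodeManager is absent.
printf '%s\n' "2101 NameNode" "2234 DataNode" "2410 SecondaryNameNode" \
    "2588 ResourceManager" "3001 Jps" | missing_daemons
```

On the Hadoop host the helper would be used as `jps | missing_daemons`; empty output means every expected daemon is running.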

2. Create a local directory and two files

[root@hadoop ~]# mkdir input

[root@hadoop ~]# cd input/

[root@hadoop input]# echo "hello world">test1.txt # create two test files

[root@hadoop input]# echo "hello hadoop">test2.txt

3. From the home directory, copy the local input directory into the HDFS root directory under the new name in

[root@hadoop ~]# hdfs dfs -put input/ /in

[root@hadoop ~]# hdfs dfs -ls / # list the root directory

Found 1 items

drwxr-xr-x - root supergroup 0 2018-07-20 03:06 /in

[root@hadoop ~]# hdfs dfs -ls /in # list the /in directory

Found 2 items

-rw-r--r-- 1 root supergroup 12 2018-07-20 03:06 /in/test1.txt

-rw-r--r-- 1 root supergroup 13 2018-07-20 03:06 /in/test2.txt
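
The sizes in the listing (12 and 13 bytes) are one byte larger than the visible text because echo appends a newline. As a quick local cross-check, the following sketch recreates the two files in a scratch directory (the temporary path is chosen by mktemp and is not part of the original walkthrough) and prints their sizes.

```shell
#!/usr/bin/env bash
# Recreate the two test files in a scratch directory and print their sizes.
workdir=$(mktemp -d)
echo "hello world"  > "$workdir/test1.txt"
echo "hello hadoop" > "$workdir/test2.txt"

# `wc -c < file` prints only the byte count; tr strips any padding spaces.
size1=$(wc -c < "$workdir/test1.txt" | tr -d ' ')
size2=$(wc -c < "$workdir/test2.txt" | tr -d ' ')
echo "test1.txt: $size1 bytes"
echo "test2.txt: $size2 bytes"

rm -r "$workdir"
```

The two sizes add up to 25 bytes, which is the amount of input the job has to read.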

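4. Run the following commands

[root@hadoop ~]# cd /usr/local/hadoop/share/hadoop/mapreduce/ # the examples jar is stored in this directory

[root@hadoop mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount /in /out # /out is the output directory; it must not exist before the job runs, otherwise the job fails with an error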

18/07/30 14:02:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.42.133:8032

18/07/30 14:02:13 INFO input.FileInputFormat: Total input paths to process : 2

18/07/30 14:02:13 INFO mapreduce.JobSubmitter: number of splits:2

18/07/30 14:02:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532913019648_0002

18/07/30 14:02:14 INFO impl.YarnClientImpl: Submitted application application_1532913019648_0002

18/07/30 14:02:14 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1532913019648_0002/

18/07/30 14:02:14 INFO mapreduce.Job: Running job: job_1532913019648_0002

18/07/30 14:02:36 INFO mapreduce.Job: Job job_1532913019648_0002 running in uber mode : false

18/07/30 14:02:36 INFO mapreduce.Job:  map 0% reduce 0%

18/07/30 14:04:37 INFO mapreduce.Job:  map 67% reduce 0%

18/07/30 14:04:42 INFO mapreduce.Job:  map 100% reduce 0%

18/07/30 14:05:21 INFO mapreduce.Job:  map 100% reduce 100%

18/07/30 14:05:23 INFO mapreduce.Job: Job job_1532913019648_0002 completed successfully

18/07/30 14:05:26 INFO mapreduce.Job: Counters: 49

    File System Counters

        FILE: Number of bytes read=55

        FILE: Number of bytes written=368074

        FILE: Number of read operations=0

        FILE: Number of large read operations=0

        FILE: Number of write operations=0

        HDFS: Number of bytes read=217

        HDFS: Number of bytes written=25

        HDFS: Number of read operations=9

        HDFS: Number of large read operations=0

        HDFS: Number of write operations=2

    Job Counters

        Launched map tasks=2

        Launched reduce tasks=1

        Data-local map tasks=2

        Total time spent by all maps in occupied slots (ms)=259093

        Total time spent by all reduces in occupied slots (ms)=21736

        Total time spent by all map tasks (ms)=259093

        Total time spent by all reduce tasks (ms)=21736

        Total vcore-milliseconds taken by all map tasks=259093

        Total vcore-milliseconds taken by all reduce tasks=21736

        Total megabyte-milliseconds taken by all map tasks=265311232

        Total megabyte-milliseconds taken by all reduce tasks=22257664

    Map-Reduce Framework

        Map input records=2

        Map output records=4

        Map output bytes=41

        Map output materialized bytes=61

        Input split bytes=192

        Combine input records=4

        Combine output records=4

        Reduce input groups=3

        Reduce shuffle bytes=61

        Reduce input records=4

        Reduce output records=3

        Spilled Records=8

        Shuffled Maps =2

        Failed Shuffles=0

        Merged Map outputs=2

        GC time elapsed (ms)=847

        CPU time spent (ms)=4390

        Physical memory (bytes) snapshot=461631488

        Virtual memory (bytes) snapshot=6226669568

        Total committed heap usage (bytes)=277356544

    Shuffle Errors

        BAD_ID=0

        CONNECTION=0

        IO_ERROR=0

        WRONG_LENGTH=0

        WRONG_MAP=0

        WRONG_REDUCE=0

    File Input Format Counters

        Bytes Read=25

    File Output Format Counters

        Bytes Written=25
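
Several of the counters above can be reproduced by hand. The sketch below is a plain Unix pipeline standing in for the job, not how Hadoop executes it: it splits the two input lines into words, sorts them, and counts each run of equal words. It assumes the job writes one word, a tab, and the count per line.

```shell
#!/usr/bin/env bash
# The same two lines that step 2 wrote into test1.txt and test2.txt.
input=$(printf '%s\n' "hello world" "hello hadoop")

# "Map": one word per line.
words=$(printf '%s\n' "$input" | tr -s ' ' '\n')

# "Shuffle" and "reduce": sort the words, then count each run of equal words
# and print "word<TAB>count".
result=$(printf '%s\n' "$words" | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}')

echo "total words:    $(printf '%s\n' "$words" | wc -l | tr -d ' ')"
echo "distinct words: $(printf '%s\n' "$result" | wc -l | tr -d ' ')"
echo "output bytes:   $(printf '%s\n' "$result" | wc -c | tr -d ' ')"
printf '%s\n' "$result"
```

The script reports 4 words, 3 distinct words, and 25 output bytes, which line up with Map output records=4, Reduce input groups=3, and Bytes Written=25 in the log.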

The MapReduce progress is printed while the command runs.
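
After this run, /out exists on HDFS, so submitting the same command again would be rejected: the job requires that its output directory not exist yet. A minimal sketch of a cleanup helper follows. The function name remove_if_exists is hypothetical, and the helper is destructive, so it should only be pointed at a path whose contents are no longer needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: removes an HDFS path if it exists so that a job can
# write to it again. Destructive: everything under the path is deleted.
remove_if_exists() {
    local path=$1
    # `hdfs dfs -test -e` exits with status 0 when the path exists.
    if hdfs dfs -test -e "$path"; then
        hdfs dfs -rm -r "$path"
    else
        echo "$path does not exist, nothing to remove"
    fi
}

# Usage on the Hadoop host, before resubmitting the job:
#   remove_if_exists /out
```

Writing each run to a fresh directory (for example /out2) avoids the deletion altogether and keeps earlier results around.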

5. View the output

1) View the output files directly on HDFS

[root@hadoop mapreduce]# hdfs dfs -ls /out

Found 2 items

-rw-r--r--   1 root supergroup          0 2018-07-30 14:05 /out/_SUCCESS

-rw-r--r--   1 root supergroup         25 2018-07-30 14:05 /out/part-r-00000

[root@hadoop mapreduce]# hdfs dfs -cat /out/part-r-00000

hadoop    1

hello    2

world    1

2) Alternatively, print every file in the output directory with a single command

[root@hadoop mapreduce]# hdfs dfs -cat /out/*

hadoop    1

hello    2

world    1

3) You can also copy the files to the local file system and view them there

[root@hadoop mapreduce]# hdfs dfs -get /out /root/output

[root@hadoop mapreduce]# cd  /root/output/

[root@hadoop output]# ll

total 4

-rw-r--r-- 1 root root 25 Jul 30 17:18 part-r-00000

-rw-r--r-- 1 root root  0 Jul 30 17:18 _SUCCESS

[root@hadoop output]# cat part-r-00000

hadoop    1

hello    2

world    1
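
The job writes its result ordered by word. With real data it is often more useful to see the most frequent words first. The sketch below shows one way to do that with standard tools; the function name sort_by_count is hypothetical, and the input is assumed to be tab-separated word and count pairs like the result above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sorts WordCount output ("word<TAB>count" per line, read
# from stdin) by count, highest first; ties are ordered alphabetically by word.
sort_by_count() {
    sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Demo using the three result lines shown above.
printf 'hadoop\t1\nhello\t2\nworld\t1\n' | sort_by_count
```

Against the real result this would be run as `sort_by_count < /root/output/part-r-00000`, or directly on HDFS with `hdfs dfs -cat /out/part-r-00000 | sort_by_count`.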
