Running the Hadoop WordCount example, and basic HDFS operations
1. Check the Hadoop version
[hadoop@ltt1 sbin]$ hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on --29T11:33Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar
2. Hadoop ships with an examples jar that makes it easy to test basic functionality.
To see the list of MapReduce programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar, run the jar without arguments:
[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
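Any of the programs above can be run by passing its name as the first argument. For example, the pi estimator makes a quick smoke test; a minimal sketch, where the trailing 2 and 10 (number of map tasks and samples per map) are illustrative values:
# Estimate Pi with 2 map tasks and 10 samples per map (illustrative values)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar pi 2 10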
3. Create a directory on HDFS
hadoop fs -mkdir /input
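If the parent directories do not exist yet, -p creates the whole path in one go; a minimal sketch (the nested path below is only an example):
# -p creates missing parents and does not fail if the directory already exists
hadoop fs -mkdir -p /user/hadoop/demo/input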
4. List the root directory of HDFS
[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp
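To walk the whole tree rather than a single level, -ls also accepts an -R flag; a small sketch:
# Recursively list everything under the HDFS root
hadoop fs -ls -R /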
5. Upload local files to HDFS
hadoop fs -put $HADOOP_HOME/*.txt /input
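One way to confirm the upload is to check the file sizes; a hedged sketch using -du, whose -h flag prints sizes in human-readable units:
# Show the size of each file under /input
hadoop fs -du -h /input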
6. List the files under the input directory on HDFS
[hadoop@ltt1 ~]$ hadoop fs -ls /input
Found 3 items
-rw-r--r--   hadoop supergroup   /input/LICENSE.txt
-rw-r--r--   hadoop supergroup   /input/NOTICE.txt
-rw-r--r--   hadoop supergroup   /input/README.txt
7. A simple WordCount test.
[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
INFO input.FileInputFormat: Total input paths to process : 3
INFO mapreduce.JobSubmitter: number of splits:3
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
INFO mapreduce.Job: Running job: job_1505605169997_0002
INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: map 100% reduce 0%
INFO mapreduce.Job: map 100% reduce 100%
INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
INFO mapreduce.Job: Counters: 50
(The 50 counters, covering File System Counters, Job Counters, Map-Reduce Framework, Shuffle Errors, File Input Format Counters, and File Output Format Counters, are omitted here for brevity.)
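Note that MapReduce refuses to write into an output directory that already exists; rerunning the job fails with FileAlreadyExistsException unless /output is removed first. A minimal sketch:
# Remove the old output directory before rerunning the job
hadoop fs -rm -r /output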
8. View the WordCount results (the full output is long, so only part of it is shown)
[hadoop@ltt1 ~]$ hadoop fs -cat /output/*
worldwide, 4
would 1
writing 2
writing, 4
written 19
xmlenc 1
year 1
you 12
your 5
zlib 1
252.227-7014(a)(1)) 1
§ 1
“AS 1
“Contributor 1
“Contributor” 1
“Covered 1
“Executable” 1
“Initial 1
“Larger 1
“Licensable” 1
“License” 1
“Modifications” 1
“Original 1
“Participant”) 1
“Patent 1
“Source 1
“Your”) 1
“You” 2
“commercial 3
“control” 1
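The reducers write their results as part files inside /output; they can be listed, or the whole directory copied to the local filesystem (the local target path below is only an example):
# List the part-r-* files produced by the reducers
hadoop fs -ls /output
# Copy the result directory to the local filesystem
hadoop fs -get /output ./wordcount-output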
That's it. Through this small WordCount example, we've practiced creating a directory on HDFS, uploading files, listing directories, and running the WordCount job.