Hadoop Benchmark Testing (Part 2)
Hadoop Examples
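Besides the tests covered in "Hadoop Benchmark Testing (Part 1)", Hadoop ships with a number of example programs, such as WordCount and TeraSort. They are packaged in hadoop-examples-2.6.0-mr1-cdh5.16.1.jar and hadoop-examples.jar. Run the following command: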
hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar
It lists all the available example programs:
bash-4.2$ hadoop jar hadoop-examples-2.6.0-mr1-cdh5.16.1.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
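The program name goes in the first argument, followed by that program's own arguments. The list includes three TeraSort programs. The hypothetical helper below is a sketch of how they fit together, running teragen, terasort and teravalidate in sequence. The default jar name comes from this article. The function name run_terasort_suite and the /tmp/terasort-* HDFS paths are assumptions to adapt to your cluster.

```shell
#!/usr/bin/env bash
# Sketch: the teragen -> terasort -> teravalidate sequence from the program list.
# EXAMPLES_JAR default is the jar named in this article; run_terasort_suite and
# the /tmp/terasort-* paths are hypothetical and should be adapted.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-examples-2.6.0-mr1-cdh5.16.1.jar}"

run_terasort_suite() {
  local rows="$1" base="${2:-/tmp/terasort}"
  # teragen writes <rows> rows of 100 bytes each into a new HDFS directory.
  hadoop jar "$EXAMPLES_JAR" teragen "$rows" "$base-input" || return 1
  # terasort sorts the generated data into another new directory.
  hadoop jar "$EXAMPLES_JAR" terasort "$base-input" "$base-output" || return 1
  # teravalidate checks that the sorted output is in order and writes a report.
  hadoop jar "$EXAMPLES_JAR" teravalidate "$base-output" "$base-validate"
}

# Usage (roughly 100 MB of generated data):
#   run_terasort_suite 1000000
```

Because teragen writes 100-byte rows, run_terasort_suite 1000000 generates about 100 MB of input. Each step's output directory must not exist before that step runs.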
Word Count Test
Go to the directory ** created by the hdfs user, run vim words.txt, and enter the following content:
hello hadoop hbase mytest hadoop-node1 hadoop-master hadoop-node2 this is my test
Then run:
../bin/hadoop fs -put words.txt /tmp/
This uploads the file to the /tmp directory in HDFS.
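It helps to know what result to expect before you submit the job. The sketch below is an illustration rather than part of the original walkthrough. It writes the same words.txt without vim and computes the word counts locally with standard Unix tools. Splitting on whitespace roughly mirrors how the WordCount example tokenizes each line. The function name local_wordcount is hypothetical.

```shell
#!/usr/bin/env bash
# Non-interactive alternative to typing words.txt in vim (same content as above).
printf '%s\n' 'hello hadoop hbase mytest hadoop-node1 hadoop-master hadoop-node2 this is my test' > words.txt

# Hypothetical helper: local preview of WordCount's result, one "word<TAB>count"
# line per distinct whitespace-separated token, sorted in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

local_wordcount words.txt
```

Every word in this file occurs once, so each line of the local result ends in a count of 1. On the cluster, the same word-and-count pairs are spread across the part-r-* files rather than printed as one sorted list.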
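Run the following command to count the words in the file with MapReduce and write the results to the given output path. Despite its .txt suffix, /tmp/words_result.txt is created as a directory of part files: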
hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount /tmp/words.txt /tmp/words_result.txt
The command prints the following:
bash-4.2$ hadoop jar ../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar wordcount /tmp/words.txt /tmp/words_result.txt
INFO client.RMProxy: Connecting to ResourceManager at node1/
INFO input.FileInputFormat: Total input paths to process :
INFO mapreduce.JobSubmitter: number of splits:
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552358721447_0060
INFO impl.YarnClientImpl: Submitted application application_1552358721447_0060
INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552358721447_0060/
INFO mapreduce.Job: Running job: job_1552358721447_0060
INFO mapreduce.Job: Job job_1552358721447_0060 running in uber mode : false
INFO mapreduce.Job: map % reduce %
INFO mapreduce.Job: map % reduce %
INFO mapreduce.Job: map % reduce %
INFO mapreduce.Job: map % reduce %
INFO mapreduce.Job: Job job_1552358721447_0060 completed successfully
INFO mapreduce.Job: Counters:
  File System Counters
    FILE: Number of bytes read=
    FILE: Number of bytes written=
    FILE: Number of read operations=
    FILE: Number of large read operations=
    FILE: Number of
    HDFS: Number of bytes read=
    HDFS: Number of bytes written=
    HDFS: Number of read operations=
    HDFS: Number of large read operations=
    HDFS: Number of
  Job Counters
    Launched map tasks=
    Launched reduce tasks=
    Data-local map tasks=
    Total
    Total
    Total
    Total
    Total vcore-milliseconds taken by all map tasks=
    Total vcore-milliseconds taken by all reduce tasks=
    Total megabyte-milliseconds taken by all map tasks=
    Total megabyte-milliseconds taken by all reduce tasks=
  Map-Reduce Framework
    Map input records=
    Map output records=
    Map output bytes=
    Map output materialized bytes=
    Input
    Combine input records=
    Combine output records=
    Reduce input
    Reduce shuffle bytes=
    Reduce input records=
    Reduce output records=
    Spilled Records=
    Shuffled Maps =
    Failed Shuffles=
    Merged Map outputs=
    GC
    CPU
    Physical memory (bytes) snapshot=
    Virtual memory (bytes) snapshot=
    Total committed heap usage (bytes)=
  Shuffle Errors
    BAD_ID=
    CONNECTION=
    IO_ERROR=
    WRONG_LENGTH=
    WRONG_MAP=
    WRONG_REDUCE=
  File Input Format Counters
    Bytes Read=
  File Output Format Counters
    Bytes Written=
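FileOutputFormat refuses to write into an output directory that already exists. As a result, running the same command a second time fails until /tmp/words_result.txt is removed. Below is a minimal sketch of a rerun helper that assumes the jar path used above. The names run_wordcount and EXAMPLES_JAR are hypothetical. The helper checks for old output with hadoop fs -test -d and deletes it with hadoop fs -rm -r before resubmitting.

```shell
#!/usr/bin/env bash
# Hypothetical rerun helper; EXAMPLES_JAR defaults to the jar path used above.
EXAMPLES_JAR="${EXAMPLES_JAR:-../jars/hadoop-examples-2.6.0-mr1-cdh5.16.1.jar}"

run_wordcount() {
  local input="$1" output="$2"
  # The job aborts if the output directory exists, so remove a previous run's
  # results first (hadoop fs -test -d succeeds only for an existing directory).
  if hadoop fs -test -d "$output"; then
    hadoop fs -rm -r "$output" || return 1
  fi
  hadoop jar "$EXAMPLES_JAR" wordcount "$input" "$output"
}

# Usage:
#   run_wordcount /tmp/words.txt /tmp/words_result.txt
```

If HDFS trash is enabled, -rm -r moves the old results into the user's .Trash directory instead of deleting them immediately.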
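The job writes its result files into that HDFS directory. The part files are numbered from 00000 to 00047, 48 in total, and each part file holds the output of one reducer.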
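Which part file a word ends up in is not random. Assume the job used Hadoop's default HashPartitioner with Text keys. Then the partition is (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Text.hashCode() starts from h = 1 and computes h = 31*h + b over the key's bytes, with 32-bit overflow. The bash sketch below reproduces that arithmetic for ASCII words. The function name wordcount_partition is hypothetical. The reducer count of 48 comes from the 48 part files above.

```shell
#!/usr/bin/env bash
# Sketch, assuming the default HashPartitioner and Text keys (ASCII words only):
#   Text.hashCode():  h = 1; for each byte b: h = 31*h + b   (32-bit int overflow)
#   partition:        (h & Integer.MAX_VALUE) % numReduceTasks
wordcount_partition() {
  local word="$1" reducers="$2" h=1 i byte
  for (( i = 0; i < ${#word}; i++ )); do
    # Numeric value of the i-th character (equals its byte value for ASCII).
    printf -v byte '%d' "'${word:i:1}"
    # Keep only the low 32 bits to emulate Java int overflow.
    h=$(( (h * 31 + byte) & 0xFFFFFFFF ))
  done
  # Clearing bit 31 matches Java's (h & Integer.MAX_VALUE) on the signed int.
  echo $(( (h & 0x7FFFFFFF) % reducers ))
}

for w in is this hadoop hbase my mytest; do
  printf 'part-r-%05d  %s\n' "$(wordcount_partition "$w" 48)" "$w"
done
```

The function returns 11, 15, 22, 24, 45 and 47 for is, this, hadoop, hbase, my and mytest. These match the part files where those words appear in the results shown below. With only 11 distinct words spread over 48 reducers, most part files are necessarily empty.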
Run the following command to view the job's results:
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-*****
The output is as follows:
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00000
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00011
is
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00015
this
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00022
hadoop
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00024
hbase
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00040
hadoop-node1
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00041
hadoop-master
hadoop-node2
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00045
my
bash-4.2$ hadoop fs -cat hdfs:///tmp/words_result.txt/part-r-00047
mytest
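With many reducers the counts are scattered across part files, and each file is sorted only by word. The sketch below reads them as a single list. It assumes the word<TAB>count line format of the WordCount example, and top_words is a hypothetical helper. The part files are concatenated first and the combined stream is sorted by count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the N most
# frequent words (default 10), highest count first, ties broken by word.
top_words() {
  LC_ALL=C sort -t $'\t' -k2,2nr -k1,1 | head -n "${1:-10}"
}

# Usage on the cluster (paths from this article):
#   hadoop fs -cat /tmp/words_result.txt/part-r-* | top_words 5
```

Alternatively, hadoop fs -getmerge /tmp/words_result.txt words_result.txt copies all part files into a single local file that can be piped through the same function.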
Reference: https://jeoygin.org/2012/02/22/running-hadoop-on-centos-single-node-cluster/