HiBench Learning Notes (7): Reading "The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis"
Selected excerpts from "The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis"
We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.
Key points: speed, throughput, HDFS bandwidth, system resource utilization, data access patterns
the last one is an enhanced version of the DFSIO benchmark that we have extended to evaluate the aggregated bandwidth delivered by HDFS.
Key points: evaluate the aggregated bandwidth delivered by HDFS
As shown in Fig. 1, the aggregated throughput curve has a warm-up period and a cool-down period where map tasks are launching up and shutting down respectively. Between these two periods, there is a steady period where the aggregated throughput values are stable across different time slots. Therefore, the Enhanced DFSIO workload computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period. In Enhanced DFSIO, when the number of concurrent map tasks at a time slot is above a specified percentage (e.g., 50% is used in our benchmarking) of the total map task slots in the Hadoop cluster, the slot is considered to be in the steady period.
Key points: warm-up period, cool-down period, steady period; the aggregated HDFS throughput is computed by averaging the throughput value of each time slot in the steady period (a sketch of this computation follows below)
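The steady-period averaging that Enhanced DFSIO performs can be captured in a few lines. Below is a minimal sketch, not the actual Enhanced DFSIO code: the sample format, function name, and cluster size are assumptions; the 50% threshold is the one the paper reports using.

```python
# Minimal sketch of Enhanced DFSIO's steady-period computation (not the
# actual benchmark code). Each sample describes one time slot:
# (concurrent_map_tasks, aggregated_throughput) -- names are assumed.
def steady_period_throughput(samples, total_map_slots, threshold=0.5):
    """Average the aggregated throughput over the steady period: slots
    where the number of concurrent map tasks is above the given fraction
    (50% in the paper's benchmarking) of the cluster's map task slots."""
    steady = [tput for maps, tput in samples
              if maps > threshold * total_map_slots]
    if not steady:
        raise ValueError("no time slot qualifies as steady")
    return sum(steady) / len(steady)

# Hypothetical trace: warm-up, steady period, cool-down on a cluster
# with 8 map task slots.
samples = [(2, 120.0), (5, 480.0), (8, 500.0), (8, 510.0), (3, 200.0)]
print(steady_period_throughput(samples, total_map_slots=8))  # ~496.67
```

Slots in the warm-up and cool-down periods are excluded automatically, since their concurrent-map counts fall below the threshold.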
In essence, the TeraSort workload is similar to Sort and therefore is I/O bound in nature. However, we have compressed its shuffle data (i.e., map output) in the experiment so as to minimize the disk and network I/O during shuffle, as shown in Table III. Consequently, TeraSort has very high CPU utilization and moderate disk I/O during the map and shuffle phases, and moderate CPU utilization and heavy disk I/O during the reduce phase, as shown in Fig. 4.
Key points: map stage, shuffle phase, reduce phase; high vs. moderate CPU utilization and disk I/O in each phase (see the compression sketch below)
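For reference, shuffle compression of the kind Table III describes is a single job-level switch on Hadoop of that era (0.20/1.x property names). The sketch below is an assumption about how such a run could be launched; the jar name and HDFS paths are hypothetical, not taken from the paper.

```python
# Hedged sketch: launch TeraSort with map-output (shuffle) compression
# enabled. Property names are from Hadoop 0.20/1.x; the jar name and
# HDFS paths are hypothetical.
import subprocess

subprocess.check_call([
    "hadoop", "jar", "hadoop-examples.jar", "terasort",
    # Compress the map output to minimize disk and network I/O
    # during the shuffle, at the cost of extra CPU.
    "-Dmapred.compress.map.output=true",
    "-Dmapred.map.output.compression.codec="
    "org.apache.hadoop.io.compress.DefaultCodec",
    "/terasort/input", "/terasort/output",
])
```

The CPU/disk trade-off in Fig. 4 follows directly from this switch: compression moves work off the disks and network and onto the CPUs during the map and shuffle phases.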
The best performance (total running time) of Hadoop workloads is usually obtained by accurately estimating the size of the map output, shuffle data and reduce input data, and properly allocating memory buffers to prevent multiple spilling (to disk) of those data.
Key points: estimating the size of the map output, shuffle data, and reduce input data; allocating memory buffers to prevent multiple spills (a sizing sketch follows below)
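One way to make that estimate concrete: on Hadoop 0.20/1.x, a map task spills once its sort buffer (io.sort.mb) fills to io.sort.spill.percent, so sizing the buffer above the expected map output keeps each task to a single spill. The arithmetic below is a back-of-the-envelope sketch under assumed numbers, not a method from the paper.

```python
# Back-of-the-envelope buffer sizing (assumed numbers, not from the
# paper): choose io.sort.mb so one map task's output stays below the
# spill threshold and is written to disk only once.
input_split_mb   = 128   # data processed by one map task (hypothetical)
map_output_ratio = 1.0   # map output size / input size for the workload
spill_percent    = 0.80  # io.sort.spill.percent default in Hadoop 1.x

map_output_mb = input_split_mb * map_output_ratio
# The buffer must hold the whole output (record-accounting overhead is
# ignored here) before the spill threshold is reached.
io_sort_mb = int(map_output_mb / spill_percent) + 1
print("suggested io.sort.mb =", io_sort_mb)  # -> 161
```

The same reasoning applies on the reduce side through the shuffle and reduce input buffer settings.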