Top 10 IDs based on their value

First, we need to set the number of reducers to 1 (for example with job.setNumReduceTasks(1)), so that a single reducer sees the candidates from every mapper and can produce the global top 10. It is not a good idea for each map task to output every key/value pair. Instead, each mapper can output only its local top 10 IDs by value, so less data is written to disk and transferred to the reducer. To find the top 10 for a mapper task, we have to iterate over its whole split. In the map function, we add each ID/value pair to a data structure that keeps its elements sorted, such as a red-black tree, and keep only the top 10. In the cleanup function, which runs once after the last record of the split, we output the result.

// Hadoop's run() method for a reduce task (the map task's run() has the same shape).
// Note that cleanup() is called once, after all keys have been processed.
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        while (context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
        }
    } finally {
        cleanup(context);
    }
}

In the map task, the sorted IDs are written out in the cleanup function.
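The original listing for the map task is not available here, so the following is only a minimal sketch of the idea in plain Java, without the Hadoop dependency. The class name TopTenMapperSketch and the simplified map(id, value) and cleanup() signatures are hypothetical stand-ins for the real Mapper methods, and the sketch assumes each ID appears at most once per split. A TreeSet (a red-black tree) ordered by value, then by ID, holds the current candidates.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java stand-in for a Hadoop Mapper (hypothetical names, no Hadoop types).
// map() is called once per record; cleanup() is called once at the end of the split.
public class TopTenMapperSketch {
    private static final int N = 10;

    // Order by value, then by ID, so two IDs with the same value do not collide.
    private static final Comparator<Map.Entry<String, Long>> BY_VALUE_THEN_ID = (a, b) -> {
        int c = Long.compare(a.getValue(), b.getValue());
        return c != 0 ? c : a.getKey().compareTo(b.getKey());
    };

    // TreeSet is backed by a red-black tree and keeps its entries sorted.
    private final TreeSet<Map.Entry<String, Long>> top = new TreeSet<>(BY_VALUE_THEN_ID);

    // Collect one ID/value pair and keep only the N largest seen so far.
    public void map(String id, long value) {
        top.add(new SimpleEntry<String, Long>(id, value));
        if (top.size() > N) {
            top.pollFirst(); // drop the entry with the smallest value
        }
    }

    // Return the local top N, largest value first.
    public List<Map.Entry<String, Long>> cleanup() {
        return new ArrayList<>(top.descendingSet());
    }

    public static void main(String[] args) {
        TopTenMapperSketch mapper = new TopTenMapperSketch();
        for (int i = 1; i <= 25; i++) {
            mapper.map("id" + i, i * 3L);
        }
        for (Map.Entry<String, Long> e : mapper.cleanup()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real Hadoop Mapper, cleanup(Context) would presumably call context.write(...) once per retained entry instead of returning a list.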

The reduce task follows similar logic. (Note: there is only one reducer.)
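As a rough illustration of the reduce side, the sketch below merges the local top lists from several mappers into one global top N, again in plain Java rather than against the Hadoop API. The class TopTenReducerSketch and the method mergeTopN are hypothetical names, and the sketch assumes the same ID does not arrive from two different mappers (otherwise the values would have to be combined first).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java stand-in for the single reducer (hypothetical names, no Hadoop types).
public class TopTenReducerSketch {

    // Merge every mapper's local top list and keep only the n largest values overall.
    public static List<Map.Entry<String, Long>> mergeTopN(
            List<List<Map.Entry<String, Long>>> perMapperTopLists, int n) {
        // Same ordering as on the map side: by value, then by ID to break ties.
        Comparator<Map.Entry<String, Long>> byValueThenId = (a, b) -> {
            int c = Long.compare(a.getValue(), b.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        TreeSet<Map.Entry<String, Long>> top = new TreeSet<>(byValueThenId);

        for (List<Map.Entry<String, Long>> local : perMapperTopLists) {
            for (Map.Entry<String, Long> entry : local) {
                top.add(entry);
                if (top.size() > n) {
                    top.pollFirst(); // drop the entry with the smallest value
                }
            }
        }
        // Largest value first.
        return new ArrayList<>(top.descendingSet());
    }
}
```

Because every mapper sends at most 10 pairs, the single reducer only has to handle 10 pairs per map task, which is why one reducer is enough for this job.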

Reference: https://www.youtube.com/watch?v=Bj6-maOjB8M
