原文地址:https://blog.csdn.net/liyong199012/article/details/25423221

一、    概念知识介绍

Hadoop MapReduce是一个用于处理海量数据的分布式计算框架。这个框架解决了诸如数据分布式存储、作业调度、容错、机器间通信等复杂问题,可以使没有并行 处理或者分布式计算经验的工程师,也能很轻松地写出结构简单的、应用于成百上千台机器处理大规模数据的并行分布式程序。

Hadoop MapReduce基于“分而治之”的思想,将计算任务抽象成map和reduce两个计算过程,可以简单理解为“分散运算—归并结果”的过程。一个 MapReduce程序首先会把输入数据分割成不相关的若干键/值对(key1/value1)集合,这些键/值对会由多个map任务来并行地处理。 MapReduce会对map的输出(一些中间键/值对key2/value2集合)按照key2进行排序,排序是用memcmp的方式对key在内存中 字节数组比较后进行升序排序,并将属于同一个key2的所有value2组合在一起作为reduce任务的输入,由reduce任务计算出最终结果并输出 key3/value3。作为一个优化,同一个计算节点上的key2/value2会通过combine在本地归并。基本流程如下:

Hadoop和单机程序计算流程对比:

常计算任务的输入和输出都是存放在文件里的,并且这些文件被存放在Hadoop分布式文件系统HDFS(Hadoop Distributed File System)中,系统会尽量调度计算任务到数据所在的节点上运行,而不是尽量将数据移动到计算节点上,减少大量数据在网络中传输,尽量节省带宽消耗。

应用程序开发人员一般情况下需要关心的是图中灰色的部分,单机程序需要处理数据读取和写入、数据处理;Hadoop程序需要实现map和 reduce,而数据读取和写入、map和reduce之间的数据传输、容错处理等由Hadoop MapReduce和HDFS自动完成。

二、    开发环境搭建

Map/Reduce程序依赖Hadoop集群,另外Eclipse需要安装依赖的hadoop包。

Hadoop集群搭建:参考Hadoop 2.2.0集群搭建

1.   安装、配置Eclipse

在官网下载合适的Eclipse,将hadoop开发所依赖的插件jar包拷贝到eclipse的安装文件夹plugins下。下载地址参考:hadoop2.2.0开发依赖的jar包,当然也可以自己编译。

启动eclipse,选择Window—>Prefrances,若出现如下Hadoop Map/Reduce说明插件安装成功

2.   配置DFS,主要是数据文件的输入输出管理。

Window—>Open Perspective—>other—>Map/Reduce,显示Map/Reduce视图。点击Map/Reduce Locations 的小象图标,新建Hadoop Location,输入如下:

项目视图会出现DFS Location,用来管理输入、输出数据文件。

需要配置hadoop安装文件夹:新建Map/Reduce工程单击Configure Hadoop install direction,输入hadoop的安装路径。

右键单击DFS Location下的空文件夹上传一个文本文件,然后刷新,若文件出现了则说明环境配置成功。

三、    编程模型

MapReduce编程模型的原理是:利用一个输入key/value pair集合来产生一个输出的key/value pair集合。MapReduce库的用户用两个函数表达这个计算:Map和Reduce。

用户自定义的Map函数接受一个输入的key/value pair值,然后产生一个中间key/value pair值的集合。MapReduce库把所有具有相同中间key值I的中间value值集合在一起后传递给reduce函数。

用户自定义的Reduce函数接受一个中间key的值I和相关的一个value值的集合。Reduce函数合并这些value值,形成一个较小的 value值的集合。一般的,每次Reduce函数调用只产生0或1个输出value值。通常我们通过一个迭代器把中间value值提供给Reduce函 数,这样我们就可以处理无法全部放入内存中的大量的value值的集合。

四、    小例子

1.      数据准备

以Tomcat日志为例,日志格式如下:

  1. 127.0.0.1,-,-,[08/May/2014:13:42:40 +0800],GET / HTTP/1.1,200,11444
  2. 127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
  3. 127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
  4. 127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
  5. 127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
  6. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
  7. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
  8. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
  9. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
  10. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
  11. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
  12. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
  13. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
  14. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
  15. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
  16. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
  17. 127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
  18. 127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
  19. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
  20. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
  21. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
  22. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
  23. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
  24. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
  25. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
  26. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
  27. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
  28. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
  29. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
  30. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
  31. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
  32. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
  33. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
  34. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
  35. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
  36. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
  37. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
  38. 127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1 HTTP/1.1,200,597
  39. 127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= HTTP/1.1,200,21
  40. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:26 +0800],GET /jyglFront/graduate_initGraduateBatch HTTP/1.1,200,8766
  41. 127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
  42. 127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
  43. 127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
  44. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:28 +0800],GET /jyglFront/graduate_initGraduateQulifyCheck HTTP/1.1,200,26397
  45. 127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
  46. 127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
  47. 127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
  48. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:29 +0800],GET /jyglFront/graduate_initLeaveSchoolInfo HTTP/1.1,200,20125
  49. 127.0.0.1,-,-,[08/May/2014:13:43:30 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
  50. 127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
  51. 127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
  52. 127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch HTTP/1.1,200,597
  53. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:31 +0800],GET /jyglFront/graduate_initGraduateInfo HTTP/1.1,200,28464
  54. 127.0.0.1,-,-,[08/May/2014:14:27:10 +0800],GET / HTTP/1.1,200,11444
  55. 127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
  56. 127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
  57. 127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
  58. 127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
  59. 127.0.0.1,-,-,[08/May/2014:14:27:35 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
  60. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:35 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
  61. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:37 +0800],GET /jyglFront/exam_initsubstudentsubscribe HTTP/1.1,500,3900
  62. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:41 +0800],GET /jyglFront/supervisor/intoInitAssignmentDetail HTTP/1.1,200,1808
  63. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
  64. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
  65. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
  66. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
  67. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
  68. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
  69. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
  70. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
  71. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
  72. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
  73. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
  74. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
  75. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
  76. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
  77. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
  78. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
  79. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:43 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
  80. 127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 HTTP/1.1,200,374
  81. 127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getTotalNations HTTP/1.1,200,22
  82. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/baseInfo_nationInfoList HTTP/1.1,200,7471
  83. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/menuStyle2.css HTTP/1.1,404,1060
  84. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/basic.css HTTP/1.1,200,1476
  85. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:45 +0800],GET /jyglFront/common/css/_images/botton2.gif HTTP/1.1,404,1075
  86. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
  87. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
  88. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= HTTP/1.1,200,12061
  89. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject HTTP/1.1,200,6006
  90. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:48 +0800],GET /jyglFront/teaching/openReplaceChooseCourse HTTP/1.1,200,26455
  91. 127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
  92. 127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
  93. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:49 +0800],GET /jyglFront/teaching/openChooseCourse HTTP/1.1,200,1611
  94. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo HTTP/1.1,200,473
  95. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
  96. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
  97. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 HTTP/1.1,200,20
  98. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce HTTP/1.1,200,20
  99. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= HTTP/1.1,200,4849
  100. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/teaching/teachingPlanList HTTP/1.1,200,22794
  101. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/js/jquery.form.js HTTP/1.1,200,30330
  102. 127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
  103. 127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
  104. 127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
  105. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:28:02 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
  106. 127.0.0.1,-,-,[08/May/2014:14:28:19 +0800],POST /jygl/jaxrs/right/addUserLog HTTP/1.1,200,-
  107. 127.0.0.1,-,-,[08/May/2014:14:31:42 +0800],GET /jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1 HTTP/1.1,200,-

2.      要解决的问题:统计资源(URL)被访问的次数。

3.      编程实现

想法:解析Tomcat日志,map的工作是将每一行日志中的URL截取作为key值,value为1表示1次,reduce的工作是将相同key值的行合并,value为总次数。

代码如下:

package org.ly.ccnu;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class SecondTest extends Configured implements Tool{
enum Counter{
LINESKIP,
}
public static class Map extends Mapper<LongWritable,Text,Text,IntWritable>{
private static final IntWritable one = new IntWritable(1);
public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException{
String line = value.toString();
try{
String[] lineSplit = line.split(",");
String requestUrl = lineSplit[4];
requestUrl = requestUrl.substring(requestUrl.indexOf(' ')+1, requestUrl.lastIndexOf(' '));
Text out = new Text(requestUrl);
context.write(out,one);
}catch(java.lang.ArrayIndexOutOfBoundsException e){
context.getCounter(Counter.LINESKIP).increment(1);
}
}
}
public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
public void reduce(Text key, Iterable<IntWritable> values,Context context)throws IOException{
int count = 0;
for(IntWritable v : values){
count = count + 1;
}
try {
context.write(key, new IntWritable(count));
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
@Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
Job job = new Job(conf, "logAnalysis");
job.setJarByClass(SecondTest.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputFormatClass(TextOutputFormat.class);
//keep the same format with the output of Map and Reduce
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.waitForCompletion(true);
return job.isSuccessful()?0:1;
}
public static void main(String[] args)throws Exception{
int res = ToolRunner.run(new Configuration(), new SecondTest(),args);
System.exit(res);
}
}

4.      处理结果

  1. / 2
  2. /jygl/jaxrs/article/getArticleList/10-1 3
  3. /jygl/jaxrs/article/getTotalArticleRecords 3
  4. /jygl/jaxrs/enroll/educationLevelService/allEducationLevels 5
  5. /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos 2
  6. /jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo 1
  7. /jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject 1
  8. /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest 2
  9. /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd 2
  10. /jygl/jaxrs/exam/examParameterService/getAllGradeInfo 3
  11. /jygl/jaxrs/exam/examParameterService/getAllStudyCenters 3
  12. /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 2
  13. /jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1 1
  14. /jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch 1
  15. /jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1 1
  16. /jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= 1
  17. /jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 1
  18. /jygl/jaxrs/nationInfo/getTotalNations 1
  19. /jygl/jaxrs/right/addUserLog 1
  20. /jygl/jaxrs/right/getUserByLoginName/admin 3
  21. /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin 3
  22. /jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= 1
  23. /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce 1
  24. /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 1
  25. /jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= 1
  26. /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 2
  27. /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO 2
  28. /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO 2
  29. /jyglFront/baseInfo_articleList?flag=1 3
  30. /jyglFront/baseInfo_nationInfoList 1
  31. /jyglFront/common/css/_images/botton2.gif 1
  32. /jyglFront/common/css/basic.css 1
  33. /jyglFront/common/css/menuStyle2.css 1
  34. /jyglFront/exam_initgroupsubscribestatistic 2
  35. /jyglFront/exam_initsubstudentsubscribe 1
  36. /jyglFront/graduate_initGraduateBatch 1
  37. /jyglFront/graduate_initGraduateInfo 1
  38. /jyglFront/graduate_initGraduateQulifyCheck 1
  39. /jyglFront/graduate_initLeaveSchoolInfo 1
  40. /jyglFront/js/jquery.form.js 1
  41. /jyglFront/mainView/navigate/images/allmenu.gif 3
  42. /jyglFront/mainView/navigate/images/leftmenu_bg.gif 3
  43. /jyglFront/mainView/navigate/images/logo.png 3
  44. /jyglFront/mainView/navigate/images/toggle_menu.gif 3
  45. /jyglFront/mainView/navigate/js/frame.js 3
  46. /jyglFront/mainView/navigate/js/jquery.js 3
  47. /jyglFront/mainView/navigate/js/tree.js 3
  48. /jyglFront/mainView/navigate/menuList.jsp 3
  49. /jyglFront/mainView/navigate/style/images/header_bg.jpg 3
  50. /jyglFront/mainView/navigate/style/style.css 3
  51. /jyglFront/mainView/studentView/style/images/nav_10.png 3
  52. /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 3
  53. /jyglFront/supervisor/intoInitAssignmentDetail 1
  54. /jyglFront/teaching/openChooseCourse 1
  55. /jyglFront/teaching/openReplaceChooseCourse 1
  56. /jyglFront/teaching/teachingPlanList 1

Hadoop学习:Map/Reduce初探与小Demo实现的更多相关文章

  1. 大文本 通过 hadoop spark map reduce 获取 特征列 的 属性值 计算速度

    大文本 通过 hadoop spark map reduce   获取 特征列  的 属性值  计算速度

  2. Hadoop 少量map/reduce任务执行慢问题

    最近在做报表统计,跑hadoop任务. 之前也跑过map/reduce但是数据量不大,遇到某些map/reduce执行时间特别长的问题. 执行时间长有几种可能性: 1. 单个map/reduce任务处 ...

  3. python 学习 map /reduce

    python 内建了map()和reduce()函数 map()函数接收两个参数,一个是函数,一个是Iterable,map将传入的函数依次作用到序列的每个元素,并把结果作为新的Iterator返回. ...

  4. hadoop编译map/reduce时的问题

    参考链接 http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html http://blog.endlesscode.com/20 ...

  5. hadoop入门级总结二:Map/Reduce

    在上一篇博客:hadoop入门级总结一:HDFS中,简单的介绍了hadoop分布式文件系统HDFS的整体框架及文件写入读出机制.接下来,简要的总结一下hadoop的另外一大关键技术之一分布式计算框架: ...

  6. [BigData]关于Hadoop学习笔记第一天(PPT总结)(一)

    适合大数据的分布式存储与计算平台 l作者:Doug Cutting l受Google三篇论文的启发   lApache 官方版本(1.0.4) lCloudera 使用下载最多的版本,稳定,有商业支持 ...

  7. Map Reduce和流处理

    欢迎大家前往腾讯云+社区,获取更多腾讯海量技术实践干货哦~ 本文由@从流域到海域翻译,发表于腾讯云+社区 map()和reduce()是在集群式设备上用来做大规模数据处理的方法,用户定义一个特定的映射 ...

  8. Hadoop学习笔记2 - 第一和第二个Map Reduce程序

    转载请标注原链接http://www.cnblogs.com/xczyd/p/8608906.html 在Hdfs学习笔记1 - 使用Java API访问远程hdfs集群中,我们已经可以完成了访问hd ...

  9. hadoop学习WordCount+Block+Split+Shuffle+Map+Reduce技术详解

    转自:http://blog.csdn.net/yczws1/article/details/21899007 纯干货:通过WourdCount程序示例:详细讲解MapReduce之Block+Spl ...

随机推荐

  1. LInux 解压缩文件

    常用命令有2个,一个是tar,一个是zip,二选一就行 有的服务器没有安装zip命令,就只有tar可以用,我个人建议还是安装一个zip好一些,tar实在太繁琐 1.解压 tar -zxvf ./xxx ...

  2. python 读取单所有json数据写入mongodb(单个)

    <--------------主函数-------------------> from pymongo import MongoClientfrom bson.objectid impor ...

  3. linux下更改主机名方法hostname

    一.永久修改修改/etc/sysconfig/network,在里面指定主机名称HOSTNAME=然后执行命令hostname 主机名这个时候可以注销一下系统,再重登录之后就行了. 或者修改/etc/ ...

  4. JAVA 报错exe4j中this executable was created with an evaluation 怎么办

    如果使用未破解注册的exe4j打包JAR文件为EXE,运行EXE的时候就会出现下面的提示   打开exe4j软件,Change License或者是输入序列号,然后用注册机算一个注册码即可  

  5. springboot项目在Eclipse/Myeclipse中Debug启动跳转至断点(exitCurrentThread)

    Spring Boot项目使用了spring-boot-devtools工具且在Eclipse中Debug调试会自动跳转到这个方法: public static void exitCurrentThr ...

  6. 在Windows中监视IO性能

    附:在Windows中监视IO性能 本来准备写一篇windows中监视IO性能的,后来发现好像可写的内容不多,windows在细节这方面做的不是那么的好,不过那些基本信息还是有的. 在Windows中 ...

  7. vsphere 5.1 性能最佳实践。

    1.关于CPU负载.extop显示的结果 如果CPU load average>=1,说明主机过载了. 如果PCPU used%在80%左右说明良好,90%以上就临近过载了. VM赋予过多的vC ...

  8. Linux--U盘安装Ubuntu12.04[转]

    http://www.cnblogs.com/plokmju/p/linux_installubuntu.html 最近一直在研究Android内核驱动开发的相关事宜,使用VMware虚拟机虽然可以更 ...

  9. JAVA的CLASS文件详解

    一.事例 1.1 Test.java public class Test { public static void main(String[] args) { System.out.println(& ...

  10. filter中的DelegatingFilterProxy使用事例

    最近发现在filter内使用DelegatingFilterProxy过滤内容,那么为什么不用自带的Filter而使用Spring的DelegatingFilterProxy哪?最后才明白是因为fil ...