http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/

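The Hadoop Family series introduces the products of the Hadoop family. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.

Since 2011, China has entered an era of surging big data, and the software family led by Hadoop has claimed a large share of big data processing. Across the open source community and commercial vendors, nearly every data product is moving toward Hadoop. Hadoop has gone from a niche, elite field to the standard for big data development. On top of the original Hadoop technology, the Hadoop family of products has emerged, innovating continuously around the "big data" concept and driving technical progress.

As developers in the IT industry, we should keep up with the pace, seize the opportunity, and rise together with Hadoop!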

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, JavaScript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the source when reposting:
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/

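Preface

Mahout is a member of the Hadoop family and inherits the characteristics of Hadoop programs: it supports HDFS access and distributed MapReduce algorithms. Version 0.7 brought a major upgrade to Mahout. The single-machine, in-memory implementations of some algorithms were removed, leaving only the Hadoop-based MapReduce parallel implementations.

This shows Mahout's determination to move toward big data and commit to parallelization. I believe that, within the larger Hadoop framework, Mahout can eventually become a star product of big data!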

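Contents

  1. Mahout development environment
  2. Mahout's Hadoop-based distributed environment
  3. Implementing ItemCF collaborative filtering with Mahout
  4. Uploading the template project to GitHub

1. Mahout development environment

In the article "Building a Mahout Project with Maven", we set up a Maven-based Mahout development environment. Here we continue from that setup and develop a distributed Mahout program.

This article uses Mahout 0.8.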

Development environment:

  • Win7 64bit
  • Java 1.6.0_45
  • Maven 3
  • Eclipse Juno Service Release 2
  • Mahout 0.8
  • Hadoop 1.1.2

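Find pom.xml and change the Mahout version to 0.8:

    <mahout.version>0.8</mahout.version>

Then download the dependencies:

    ~ mvn clean install

The class org.conan.mymahout.cluster06.Kmeans.java was written against mahout-0.6, so it will fail to compile. We can comment out this file for now.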

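2. Mahout's Hadoop-based distributed environment

As shown in the figure above, we can develop on either Win7 or Linux and debug in the local environment during development. The standard tools are Maven and Eclipse.

At run time, Mahout automatically deploys the packaged MapReduce algorithm to the Hadoop cluster, so this way of developing and running is close to a real production environment.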

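3. Implementing ItemCF collaborative filtering with Mahout

Steps:

  • 1. Prepare the data file: item.csv
  • 2. Java program: HdfsDAO.java
  • 3. Java program: ItemCFHadoop.java
  • 4. Run the program
  • 5. Interpret the recommendation results

1). Prepare the data file: item.csv
Upload the test data to HDFS. For the single-machine, in-memory experiment, see the article "Building a Mahout Project with Maven".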


    ~ hadoop fs -mkdir /user/hdfs/userCF
    ~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /user/hdfs/userCF
    ~ hadoop fs -cat /user/hdfs/userCF/item.csv
    1,101,5.0
    1,102,3.0
    1,103,2.5
    2,101,2.0
    2,102,2.5
    2,103,5.0
    2,104,2.0
    3,101,2.5
    3,104,4.0
    3,105,4.5
    3,107,5.0
    4,101,5.0
    4,103,3.0
    4,104,4.5
    4,106,4.0
    5,101,4.0
    5,102,3.0
    5,103,2.0
    5,104,4.0
    5,105,3.5
    5,106,4.0
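Each row of item.csv has the form userID,itemID,rating. As a rough illustration of how such rows group into per-user preference vectors, here is a minimal single-machine sketch in plain Java (assuming Java 8 or later). The class name `ItemCsvSummary` and its method are hypothetical names for this illustration and are not part of Mahout or of the project in this article.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ItemCsvSummary {
    // The 21 rows of item.csv shown above: userID,itemID,rating
    static final String[] ROWS = {
        "1,101,5.0", "1,102,3.0", "1,103,2.5",
        "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
        "3,101,2.5", "3,104,4.0", "3,105,4.5", "3,107,5.0",
        "4,101,5.0", "4,103,3.0", "4,104,4.5", "4,106,4.0",
        "5,101,4.0", "5,102,3.0", "5,103,2.0", "5,104,4.0", "5,105,3.5", "5,106,4.0"
    };

    /** Groups the rows by user: userID -> (itemID -> rating). */
    static Map<Long, Map<Long, Double>> userVectors(String[] rows) {
        Map<Long, Map<Long, Double>> users = new TreeMap<>();
        for (String row : rows) {
            String[] f = row.split(",");
            long user = Long.parseLong(f[0].trim());
            long item = Long.parseLong(f[1].trim());
            double rating = Double.parseDouble(f[2].trim());
            users.computeIfAbsent(user, k -> new TreeMap<>()).put(item, rating);
        }
        return users;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> users = userVectors(ROWS);
        // Collect the distinct item IDs across all users.
        Set<Long> items = new TreeSet<>();
        for (Map<Long, Double> prefs : users.values()) {
            items.addAll(prefs.keySet());
        }
        System.out.println("users=" + users.size() + ", items=" + items.size());
        users.forEach((u, prefs) -> System.out.println(u + " -> " + prefs));
    }
}
```

For this data set the sketch finds 5 users and 7 items across 21 rows, which is consistent with the counters in the console output later in this article (Map input records=21, USERS=5, ROWS=7).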

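2). Java program: HdfsDAO.java
HdfsDAO.java is an HDFS utility class that implements the common HDFS commands through the Hadoop API. See the article "Calling HDFS from Hadoop Programs" for details.

Here we use the following methods of the HdfsDAO class: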


    HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
    hdfs.rmr(inPath);
    hdfs.mkdirs(inPath);
    hdfs.copyFile(localFile, inPath);
    hdfs.ls(inPath);
    hdfs.cat(inFile);

3). Java program: ItemCFHadoop.java
To implement the distributed algorithm with Mahout, we follow the explanation given in Mahout in Action.

Implementation:


    package org.conan.mymahout.recommendation;

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
    import org.conan.mymahout.hdfs.HdfsDAO;

    public class ItemCFHadoop {

        private static final String HDFS = "hdfs://192.168.1.210:9000";

        public static void main(String[] args) throws Exception {
            String localFile = "datafile/item.csv";
            String inPath = HDFS + "/user/hdfs/userCF";
            String inFile = inPath + "/item.csv";
            String outPath = HDFS + "/user/hdfs/userCF/result/";
            String outFile = outPath + "/part-r-00000";
            String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

            // Recreate the input directory on HDFS and upload the test data.
            JobConf conf = config();
            HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
            hdfs.rmr(inPath);
            hdfs.mkdirs(inPath);
            hdfs.copyFile(localFile, inPath);
            hdfs.ls(inPath);
            hdfs.cat(inFile);

            // Build the command-line options for RecommenderJob.
            StringBuilder sb = new StringBuilder();
            sb.append("--input ").append(inPath);
            sb.append(" --output ").append(outPath);
            sb.append(" --booleanData true");
            sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
            sb.append(" --tempDir ").append(tmpPath);
            args = sb.toString().split(" ");

            // Run the ItemCF job, then print the result file.
            RecommenderJob job = new RecommenderJob();
            job.setConf(conf);
            job.run(args);
            hdfs.cat(outFile);
        }

        // Job configuration, with the Hadoop settings files bundled in the project.
        public static JobConf config() {
            JobConf conf = new JobConf(ItemCFHadoop.class);
            conf.setJobName("ItemCFHadoop");
            conf.addResource("classpath:/hadoop/core-site.xml");
            conf.addResource("classpath:/hadoop/hdfs-site.xml");
            conf.addResource("classpath:/hadoop/mapred-site.xml");
            return conf;
        }
    }
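The program assembles its options as one string and splits it on single spaces. That works here because none of the values contains a space. As an alternative, the following sketch builds the argument array one token at a time, so it would stay correct even if a path contained a space. `RecommenderArgs` and `build` are hypothetical names introduced for this illustration; the option names are the ones used in the program, and the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

final class RecommenderArgs {
    /** Builds the same options as the program, one array element per token. */
    static String[] build(String input, String output, String tempDir,
                          boolean booleanData, String similarityClass) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);
        args.add("--output");
        args.add(output);
        args.add("--booleanData");
        args.add(String.valueOf(booleanData));
        args.add("--similarityClassname");
        args.add(similarityClass);
        args.add("--tempDir");
        args.add(tempDir);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String hdfs = "hdfs://192.168.1.210:9000";
        String[] args = build(
            hdfs + "/user/hdfs/userCF",
            hdfs + "/user/hdfs/userCF/result/",
            hdfs + "/tmp/" + System.currentTimeMillis(),
            true,
            "org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        // Print the tokens so they can be compared with the string built in the program.
        System.out.println(String.join(" ", args));
    }
}
```

The resulting array could be passed to `job.run(args)` in the same way as the split string.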

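RecommenderJob.java in fact encapsulates the execution of the entire distributed parallel algorithm shown in the figure above. Without this wrapper, we would have to implement the 8 MapReduce steps in the figure ourselves.

For an in-depth analysis of the algorithm above, see the article "Implementing the MapReduce Collaborative Filtering Algorithm in R".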
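To give a feel for what the wrapped steps compute, here is a simplified single-machine sketch of the co-occurrence idea described in Mahout in Action: count how many users rated each pair of items, multiply that matrix by one user's preference vector, and keep the scores of the items the user has not rated yet. This is not Mahout's code, and the job configured above uses EuclideanDistanceSimilarity with --booleanData, so its scores will differ. The class and method names are hypothetical, the data is the first three users of item.csv, and Java 8 or later is assumed.

```java
import java.util.Map;
import java.util.TreeMap;

final class CooccurrenceSketch {
    /** First three users of item.csv: userID -> (itemID -> rating). */
    static Map<Long, Map<Long, Double>> sampleUsers() {
        double[][] rows = {
            {1, 101, 5.0}, {1, 102, 3.0}, {1, 103, 2.5},
            {2, 101, 2.0}, {2, 102, 2.5}, {2, 103, 5.0}, {2, 104, 2.0},
            {3, 101, 2.5}, {3, 104, 4.0}, {3, 105, 4.5}, {3, 107, 5.0}
        };
        Map<Long, Map<Long, Double>> users = new TreeMap<>();
        for (double[] r : rows) {
            users.computeIfAbsent((long) r[0], k -> new TreeMap<>()).put((long) r[1], r[2]);
        }
        return users;
    }

    /** item -> (other item -> number of users who rated both). */
    static Map<Long, Map<Long, Integer>> cooccurrence(Map<Long, Map<Long, Double>> users) {
        Map<Long, Map<Long, Integer>> matrix = new TreeMap<>();
        for (Map<Long, Double> prefs : users.values()) {
            for (Long a : prefs.keySet()) {
                for (Long b : prefs.keySet()) {
                    matrix.computeIfAbsent(a, k -> new TreeMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
        return matrix;
    }

    /** Scores each unrated item: sum over the user's rated items of count * rating. */
    static Map<Long, Double> scores(Map<Long, Map<Long, Integer>> matrix,
                                    Map<Long, Double> prefs) {
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, Map<Long, Integer>> row : matrix.entrySet()) {
            long candidate = row.getKey();
            if (prefs.containsKey(candidate)) {
                continue; // the user has already rated this item
            }
            double score = 0.0;
            for (Map.Entry<Long, Double> p : prefs.entrySet()) {
                score += row.getValue().getOrDefault(p.getKey(), 0) * p.getValue();
            }
            result.put(candidate, score);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> users = sampleUsers();
        Map<Long, Map<Long, Integer>> matrix = cooccurrence(users);
        // Candidate scores for user 1, who has rated items 101, 102 and 103.
        System.out.println(scores(matrix, users.get(1L)));
    }
}
```

In this small sample, item 104 scores highest for user 1 because it was rated together with each of the three items that user 1 has already rated.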

4). Run the program
Console output:


    Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
    Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
    copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
    ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
    ==========================================================
    name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
    ==========================================================
    cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
    1,101,5.0
    1,102,3.0
    1,103,2.5
    2,101,2.0
    2,102,2.5
    2,103,5.0
    2,104,2.0
    3,101,2.5
    3,104,4.0
    3,105,4.5
    3,107,5.0
    4,101,5.0
    4,103,3.0
    4,104,4.5
    4,106,4.0
    5,101,4.0
    5,102,3.0
    5,103,2.0
    5,104,4.0
    5,105,3.5
    5,106,4.0
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    2013-10-14 10:26:35 org.apache.hadoop.util.NativeCodeLoader
    WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2013-10-14 10:26:35 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
    INFO: Total input paths to process : 1
    2013-10-14 10:26:35 org.apache.hadoop.io.compress.snappy.LoadSnappy
    WARNING: Snappy native library not loaded
    2013-10-14 10:26:36 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Running job: job_local_0001
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: io.sort.mb = 100
    2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: data buffer = 79691776/99614720
    2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: record buffer = 262144/327680
    2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
    INFO: Starting flush of map output
    2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getCompressor
    INFO: Got brand-new compressor
    2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
    INFO: Finished spill 0
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0001_m_000000_0' done.
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Merging 1 sorted segments
    2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getDecompressor
    INFO: Got brand-new decompressor
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
    2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task commit
    INFO: Task attempt_local_0001_r_000000_0 is allowed to commit now
    2013-10-14 10:26:36 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
    INFO: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/itemIDIndex
    2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO: reduce > reduce
    2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0001_r_000000_0' done.
    2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: map 100% reduce 100%
    2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Job complete: job_local_0001
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Counters: 19
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: File Output Format Counters
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Written=187
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: FileSystemCounters
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_READ=3287330
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_READ=916
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_WRITTEN=3443292
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_WRITTEN=645
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: File Input Format Counters
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Read=229
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Map-Reduce Framework
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Map output materialized bytes=46
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Map input records=21
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Reduce shuffle bytes=0
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Spilled Records=14
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Map output bytes=84
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Total committed heap usage (bytes)=376569856
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: SPLIT_RAW_BYTES=116
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Combine input records=21
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input records=7
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input groups=7
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Combine output records=7
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Reduce output records=7
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
    INFO: Map output records=21
    2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
    INFO: Total input paths to process : 1
    2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Running job: job_local_0002
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: io.sort.mb = 100
    2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: data buffer = 79691776/99614720
    2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: record buffer = 262144/327680
    2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
    INFO: Starting flush of map output
    2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
    INFO: Finished spill 0
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0002_m_000000_0' done.
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Merging 1 sorted segments
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Down to the last merge-pass, with 1 segments left of total size: 68 bytes
    2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task commit
    INFO: Task attempt_local_0002_r_000000_0 is allowed to commit now
    2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
    INFO: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/userVectors
    2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO: reduce > reduce
    2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0002_r_000000_0' done.
    2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: map 100% reduce 100%
    2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Job complete: job_local_0002
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Counters: 20
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: USERS=5
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: File Output Format Counters
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Written=288
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: FileSystemCounters
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_READ=6574274
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_READ=1374
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_WRITTEN=6887592
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_WRITTEN=1120
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: File Input Format Counters
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Read=229
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Map-Reduce Framework
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Map output materialized bytes=72
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Map input records=21
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Reduce shuffle bytes=0
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Spilled Records=42
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Map output bytes=63
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Total committed heap usage (bytes)=575930368
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: SPLIT_RAW_BYTES=116
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Combine input records=0
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input records=21
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input groups=5
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Combine output records=0
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Reduce output records=5
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
    INFO: Map output records=21
    2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
    INFO: Total input paths to process : 1
    2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Running job: job_local_0003
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: io.sort.mb = 100
    2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: data buffer = 79691776/99614720
    2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: record buffer = 262144/327680
    2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
    INFO: Starting flush of map output
    2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
    INFO: Finished spill 0
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0003_m_000000_0' done.
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Merging 1 sorted segments
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Down to the last merge-pass, with 1 segments left of total size: 89 bytes
    2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task commit
    INFO: Task attempt_local_0003_r_000000_0 is allowed to commit now
    2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
    INFO: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/ratingMatrix
    2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO: reduce > reduce
    2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0003_r_000000_0' done.
    2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: map 100% reduce 100%
    2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Job complete: job_local_0003
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Counters: 21
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: File Output Format Counters
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Written=335
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: USER_RATINGS_NEGLECTED=0
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: USER_RATINGS_USED=21
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: FileSystemCounters
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_READ=9861349
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_READ=1950
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_WRITTEN=10331958
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_WRITTEN=1751
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: File Input Format Counters
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Read=288
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Map-Reduce Framework
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Map output materialized bytes=93
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Map input records=5
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Reduce shuffle bytes=0
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Spilled Records=14
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Map output bytes=336
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Total committed heap usage (bytes)=775290880
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: SPLIT_RAW_BYTES=157
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Combine input records=21
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input records=7
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input groups=7
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Combine output records=7
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Reduce output records=7
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
    INFO: Map output records=21
    2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
    INFO: Total input paths to process : 1
    2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Running job: job_local_0004
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: io.sort.mb = 100
    2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: data buffer = 79691776/99614720
    2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: record buffer = 262144/327680
    2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
    INFO: Starting flush of map output
    2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
    INFO: Finished spill 0
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0004_m_000000_0' done.
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Merging 1 sorted segments
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Down to the last merge-pass, with 1 segments left of total size: 118 bytes
    2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task commit
    INFO: Task attempt_local_0004_r_000000_0 is allowed to commit now
    2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
    INFO: Saved output of task 'attempt_local_0004_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/weights
    2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO: reduce > reduce
    2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0004_r_000000_0' done.
    2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: map 100% reduce 100%
    2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Job complete: job_local_0004
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Counters: 20
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: File Output Format Counters
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Written=381
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: FileSystemCounters
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_READ=13148476
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_READ=2628
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_WRITTEN=13780408
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_WRITTEN=2551
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: File Input Format Counters
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Read=335
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: ROWS=7
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Map-Reduce Framework
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Map output materialized bytes=122
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Map input records=7
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Reduce shuffle bytes=0
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Spilled Records=16
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Map output bytes=516
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Total committed heap usage (bytes)=974651392
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: SPLIT_RAW_BYTES=158
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Combine input records=24
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input records=8
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input groups=8
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Combine output records=8
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Reduce output records=5
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
    INFO: Map output records=24
    2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
    INFO: Total input paths to process : 1
    2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Running job: job_local_0005
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: io.sort.mb = 100
    2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: data buffer = 79691776/99614720
    2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    INFO: record buffer = 262144/327680
    2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
    INFO: Starting flush of map output
    2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
    INFO: Finished spill 0
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0005_m_000000_0' done.
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize
    INFO: Using ResourceCalculatorPlugin : null
    2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Merging 1 sorted segments
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge
    INFO: Down to the last merge-pass, with 1 segments left of total size: 121 bytes
    2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done
    INFO: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting
    2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO:
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task commit
    INFO: Task attempt_local_0005_r_000000_0 is allowed to commit now
    2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
    INFO: Saved output of task 'attempt_local_0005_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/pairwiseSimilarity
    2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
    INFO: reduce > reduce
    2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone
    INFO: Task 'attempt_local_0005_r_000000_0' done.
    2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: map 100% reduce 100%
    2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Job complete: job_local_0005
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Counters: 21
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: File Output Format Counters
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Written=392
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: FileSystemCounters
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_READ=16435577
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_READ=3488
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: FILE_BYTES_WRITTEN=17230010
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: HDFS_BYTES_WRITTEN=3408
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: File Input Format Counters
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Bytes Read=381
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: PRUNED_COOCCURRENCES=0
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: COOCCURRENCES=57
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Map-Reduce Framework
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Map output materialized bytes=125
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Map input records=5
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Reduce shuffle bytes=0
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Spilled Records=14
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Map output bytes=744
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Total committed heap usage (bytes)=1174011904
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: SPLIT_RAW_BYTES=129
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Combine input records=21
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input records=7
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Reduce input groups=7
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Combine output records=7
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
    INFO: Reduce output records=7
    2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  539. 信息: Map output records=21
  540. 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  541. 信息: Total input paths to process : 1
  542. 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  543. 信息: Running job: job_local_0006
  544. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize
  545. 信息: Using ResourceCalculatorPlugin : null
  546. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  547. 信息: io.sort.mb = 100
  548. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  549. 信息: data buffer = 79691776/99614720
  550. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  551. 信息: record buffer = 262144/327680
  552. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  553. 信息: Starting flush of map output
  554. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  555. 信息: Finished spill 0
  556. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done
  557. 信息: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting
  558. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  559. 信息:
  560. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone
  561. 信息: Task 'attempt_local_0006_m_000000_0' done.
  562. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize
  563. 信息: Using ResourceCalculatorPlugin : null
  564. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  565. 信息:
  566. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge
  567. 信息: Merging 1 sorted segments
  568. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge
  569. 信息: Down to the last merge-pass, with 1 segments left of total size: 158 bytes
  570. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  571. 信息:
  572. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done
  573. 信息: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting
  574. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  575. 信息:
  576. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task commit
  577. 信息: Task attempt_local_0006_r_000000_0 is allowed to commit now
  578. 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  579. 信息: Saved output of task 'attempt_local_0006_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/similarityMatrix
  580. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  581. 信息: reduce > reduce
  582. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone
  583. 信息: Task 'attempt_local_0006_r_000000_0' done.
  584. 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  585. 信息: map 100% reduce 100%
  586. 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  587. 信息: Job complete: job_local_0006
  588. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  589. 信息: Counters: 19
  590. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  591. 信息: File Output Format Counters
  592. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  593. 信息: Bytes Written=554
  594. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  595. 信息: FileSystemCounters
  596. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  597. 信息: FILE_BYTES_READ=19722740
  598. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  599. 信息: HDFS_BYTES_READ=4342
  600. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  601. 信息: FILE_BYTES_WRITTEN=20674772
  602. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  603. 信息: HDFS_BYTES_WRITTEN=4354
  604. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  605. 信息: File Input Format Counters
  606. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  607. 信息: Bytes Read=392
  608. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  609. 信息: Map-Reduce Framework
  610. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  611. 信息: Map output materialized bytes=162
  612. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  613. 信息: Map input records=7
  614. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  615. 信息: Reduce shuffle bytes=0
  616. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  617. 信息: Spilled Records=14
  618. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  619. 信息: Map output bytes=599
  620. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  621. 信息: Total committed heap usage (bytes)=1373372416
  622. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  623. 信息: SPLIT_RAW_BYTES=140
  624. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  625. 信息: Combine input records=25
  626. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  627. 信息: Reduce input records=7
  628. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  629. 信息: Reduce input groups=7
  630. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  631. 信息: Combine output records=7
  632. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  633. 信息: Reduce output records=7
  634. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  635. 信息: Map output records=25
  636. 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  637. 信息: Total input paths to process : 1
  638. 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  639. 信息: Total input paths to process : 1
  640. 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  641. 信息: Running job: job_local_0007
  642. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize
  643. 信息: Using ResourceCalculatorPlugin : null
  644. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  645. 信息: io.sort.mb = 100
  646. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  647. 信息: data buffer = 79691776/99614720
  648. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  649. 信息: record buffer = 262144/327680
  650. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  651. 信息: Starting flush of map output
  652. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  653. 信息: Finished spill 0
  654. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done
  655. 信息: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting
  656. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  657. 信息:
  658. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone
  659. 信息: Task 'attempt_local_0007_m_000000_0' done.
  660. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize
  661. 信息: Using ResourceCalculatorPlugin : null
  662. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  663. 信息: io.sort.mb = 100
  664. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  665. 信息: data buffer = 79691776/99614720
  666. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  667. 信息: record buffer = 262144/327680
  668. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  669. 信息: Starting flush of map output
  670. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  671. 信息: Finished spill 0
  672. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done
  673. 信息: Task:attempt_local_0007_m_000001_0 is done. And is in the process of commiting
  674. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  675. 信息:
  676. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone
  677. 信息: Task 'attempt_local_0007_m_000001_0' done.
  678. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize
  679. 信息: Using ResourceCalculatorPlugin : null
  680. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  681. 信息:
  682. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge
  683. 信息: Merging 2 sorted segments
  684. 2013-10-14 10:26:42 org.apache.hadoop.io.compress.CodecPool getDecompressor
  685. 信息: Got brand-new decompressor
  686. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge
  687. 信息: Down to the last merge-pass, with 2 segments left of total size: 233 bytes
  688. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  689. 信息:
  690. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done
  691. 信息: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting
  692. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  693. 信息:
  694. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task commit
  695. 信息: Task attempt_local_0007_r_000000_0 is allowed to commit now
  696. 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  697. 信息: Saved output of task 'attempt_local_0007_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/partialMultiply
  698. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  699. 信息: reduce > reduce
  700. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone
  701. 信息: Task 'attempt_local_0007_r_000000_0' done.
  702. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  703. 信息: map 100% reduce 100%
  704. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  705. 信息: Job complete: job_local_0007
  706. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  707. 信息: Counters: 19
  708. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  709. 信息: File Output Format Counters
  710. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  711. 信息: Bytes Written=572
  712. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  713. 信息: FileSystemCounters
  714. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  715. 信息: FILE_BYTES_READ=34517913
  716. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  717. 信息: HDFS_BYTES_READ=8751
  718. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  719. 信息: FILE_BYTES_WRITTEN=36182630
  720. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  721. 信息: HDFS_BYTES_WRITTEN=7934
  722. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  723. 信息: File Input Format Counters
  724. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  725. 信息: Bytes Read=0
  726. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  727. 信息: Map-Reduce Framework
  728. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  729. 信息: Map output materialized bytes=241
  730. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  731. 信息: Map input records=12
  732. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  733. 信息: Reduce shuffle bytes=0
  734. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  735. 信息: Spilled Records=56
  736. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  737. 信息: Map output bytes=453
  738. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  739. 信息: Total committed heap usage (bytes)=2558459904
  740. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  741. 信息: SPLIT_RAW_BYTES=665
  742. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  743. 信息: Combine input records=0
  744. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  745. 信息: Reduce input records=28
  746. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  747. 信息: Reduce input groups=7
  748. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  749. 信息: Combine output records=0
  750. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  751. 信息: Reduce output records=7
  752. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  753. 信息: Map output records=28
  754. 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  755. 信息: Total input paths to process : 1
  756. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  757. 信息: Running job: job_local_0008
  758. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize
  759. 信息: Using ResourceCalculatorPlugin : null
  760. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  761. 信息: io.sort.mb = 100
  762. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  763. 信息: data buffer = 79691776/99614720
  764. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  765. 信息: record buffer = 262144/327680
  766. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  767. 信息: Starting flush of map output
  768. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  769. 信息: Finished spill 0
  770. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done
  771. 信息: Task:attempt_local_0008_m_000000_0 is done. And is in the process of commiting
  772. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  773. 信息:
  774. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone
  775. 信息: Task 'attempt_local_0008_m_000000_0' done.
  776. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize
  777. 信息: Using ResourceCalculatorPlugin : null
  778. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  779. 信息:
  780. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge
  781. 信息: Merging 1 sorted segments
  782. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge
  783. 信息: Down to the last merge-pass, with 1 segments left of total size: 206 bytes
  784. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  785. 信息:
  786. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done
  787. 信息: Task:attempt_local_0008_r_000000_0 is done. And is in the process of commiting
  788. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  789. 信息:
  790. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task commit
  791. 信息: Task attempt_local_0008_r_000000_0 is allowed to commit now
  792. 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  793. 信息: Saved output of task 'attempt_local_0008_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/userCF/result
  794. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  795. 信息: reduce > reduce
  796. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone
  797. 信息: Task 'attempt_local_0008_r_000000_0' done.
  798. 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  799. 信息: map 100% reduce 100%
  800. 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  801. 信息: Job complete: job_local_0008
  802. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  803. 信息: Counters: 19
  804. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  805. 信息: File Output Format Counters
  806. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  807. 信息: Bytes Written=217
  808. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  809. 信息: FileSystemCounters
  810. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  811. 信息: FILE_BYTES_READ=26299802
  812. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  813. 信息: HDFS_BYTES_READ=7357
  814. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  815. 信息: FILE_BYTES_WRITTEN=27566408
  816. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  817. 信息: HDFS_BYTES_WRITTEN=6269
  818. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  819. 信息: File Input Format Counters
  820. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  821. 信息: Bytes Read=572
  822. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  823. 信息: Map-Reduce Framework
  824. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  825. 信息: Map output materialized bytes=210
  826. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  827. 信息: Map input records=7
  828. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  829. 信息: Reduce shuffle bytes=0
  830. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  831. 信息: Spilled Records=42
  832. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  833. 信息: Map output bytes=927
  834. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  835. 信息: Total committed heap usage (bytes)=1971453952
  836. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  837. 信息: SPLIT_RAW_BYTES=137
  838. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  839. 信息: Combine input records=0
  840. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  841. 信息: Reduce input records=21
  842. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  843. 信息: Reduce input groups=5
  844. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  845. 信息: Combine output records=0
  846. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  847. 信息: Reduce output records=5
  848. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  849. 信息: Map output records=21
  850. cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
  851. 1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
  852. 2 [106:1.560478,105:1.4795978,107:0.69935876]
  853. 3 [103:1.2475469,106:1.1944525,102:1.1462644]
  854. 4 [102:1.6462644,105:1.5277859,107:0.69935876]
  855. 5 [107:1.1993587]

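5). Interpreting the recommendation results
The log above can be broken into three parts:

  • a. Initializing the environment
  • b. Running the algorithm
  • c. Printing the recommendation results

a. Initializing the environment
Initialize the HDFS data directory and working directory, then upload the data file.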


  Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
  Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
  copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
  ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
  ==========================================================
  name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
  ==========================================================
  cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv

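b. Running the algorithm
The 8 MapReduce jobs shown in the figure above are run one after another. Each one reports its completion in the log: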
Job complete: job_local_0001
Job complete: job_local_0002
Job complete: job_local_0003
Job complete: job_local_0004
Job complete: job_local_0005
Job complete: job_local_0006
Job complete: job_local_0007
Job complete: job_local_0008
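
When the console output is long, it can help to check mechanically that every job finished. The following is a minimal sketch, not part of the original project: the class name `JobLogScanner` and its method are hypothetical, and it only assumes that each finished job prints a `Job complete: <job id>` line, as seen in the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not in the original project): scans console output
// for the "Job complete: job_local_000N" lines printed by JobClient.
public class JobLogScanner {
    private static final Pattern JOB_COMPLETE =
            Pattern.compile("Job complete: (job_\\w+)");

    /** Returns the IDs of all jobs reported complete, in log order. */
    public static List<String> completedJobs(String log) {
        List<String> ids = new ArrayList<String>();
        Matcher m = JOB_COMPLETE.matcher(log);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // A short excerpt in the same shape as the log above.
        String log = "Job complete: job_local_0005\n"
                   + "Running job: job_local_0006\n"
                   + "Job complete: job_local_0006\n";
        System.out.println(completedJobs(log));
    }
}
```

For the run shown in this article, the returned list would be expected to contain eight IDs, job_local_0001 through job_local_0008.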

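c. Printing the recommendation results

The program prints the result file so that the computed recommendations are easy to inspect.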


  cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
  1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
  2 [106:1.560478,105:1.4795978,107:0.69935876]
  3 [103:1.2475469,106:1.1944525,102:1.1462644]
  4 [102:1.6462644,105:1.5277859,107:0.69935876]
  5 [107:1.1993587]
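
Each result line consists of a user ID followed by a bracketed list of `item:score` pairs, listed from the highest score to the lowest in the output above. To use these results in another program, the lines need to be parsed. The sketch below is one possible way to do that; the class `RecommendationLineParser` and its methods are hypothetical names introduced for illustration, and the code assumes the user ID and the bracketed list are separated by whitespace, as in the output shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not in the original project): parses one line of
// the result file, e.g. "5 [107:1.1993587]".
public class RecommendationLineParser {

    /** Returns the user ID, i.e. everything before the opening bracket. */
    public static long parseUserId(String line) {
        return Long.parseLong(line.substring(0, line.indexOf('[')).trim());
    }

    /**
     * Parses the "[item:score,item:score,...]" part into a map from
     * item ID to score, keeping the order in which the pairs appear.
     */
    public static Map<Long, Double> parseItems(String line) {
        Map<Long, Double> items = new LinkedHashMap<Long, Double>();
        int open = line.indexOf('[');
        int close = line.lastIndexOf(']');
        if (open < 0 || close <= open + 1) {
            return items; // no recommendations on this line
        }
        for (String pair : line.substring(open + 1, close).split(",")) {
            String[] kv = pair.split(":");
            items.put(Long.parseLong(kv[0].trim()),
                      Double.parseDouble(kv[1].trim()));
        }
        return items;
    }

    public static void main(String[] args) {
        String line = "5 [107:1.1993587]";
        System.out.println(parseUserId(line) + " -> " + parseItems(line));
    }
}
```

A `LinkedHashMap` is used so that the ranking order from the file is preserved when iterating over the parsed items.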

4. Template project on GitHub

https://github.com/bsspirit/maven_mahout_template/tree/mahout-0.8

You can download this project and use it as a starting point for your own development.


  ~ git clone https://github.com/bsspirit/maven_mahout_template
  ~ cd maven_mahout_template
  ~ git checkout mahout-0.8

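We have now implemented item-based collaborative filtering (ItemCF) as a distributed algorithm. The next article covers the distributed implementation of K-means in Mahout: Mahout Distributed Program Development: K-means Clustering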


 
