
  mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh

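However, my very first run failed. I was not running as root, so I changed the path first: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, so that the user mahout has permission to work there. It still failed, this time reporting that the curl command could not be found. Fine, I had not installed it: sudo apt-get install curl, and done. Ubuntu really is convenient.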
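The two fixes above can be scripted. This is only a minimal sketch: the function names `relocate_workdir` and `ensure_curl` are my own (hypothetical), and it assumes the script contains the literal prefix /tmp/mahout-work- and that you are on a Debian/Ubuntu system with sudo rights.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the work-directory prefix inside the example
# script so it points at a location the current user can write to.
#   $1 = path to classify-20newsgroups.sh
#   $2 = new prefix, e.g. /home/mahout/mahout-work-
# Note: the new prefix must not contain '|' or '&', which are special to sed here.
relocate_workdir() {
  script="$1"
  new_prefix="$2"
  # -i.bak edits in place and keeps a backup copy next to the script.
  sed -i.bak "s|/tmp/mahout-work-|${new_prefix}|g" "$script"
}

# Hypothetical helper: install curl only when it is not already on PATH.
# Assumes apt-get (Debian/Ubuntu) and sudo rights.
ensure_curl() {
  command -v curl >/dev/null 2>&1 || sudo apt-get install -y curl
}
```

From ~/mahout-d-0.7/examples/bin this would be called as `relocate_workdir ./classify-20newsgroups.sh /home/mahout/mahout-work-` followed by `ensure_curl`. The .bak copy makes it easy to restore the original script if needed.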


  + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq


  1. mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh
  2. Please select a number to choose the corresponding task to run
  3. 1. cnaivebayes
  4. 2. naivebayes
  5. 3. sgd
  6. 4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
  7. Enter your choice : 2
  8. ok. You chose 2 and we'll use naivebayes
  9. creating work directory at /home/mahout/mahout-work-mahout
  10. + echo 'Preparing 20newsgroups data'
  11. Preparing 20newsgroups data
  12. + rm -rf /home/mahout/mahout-work-mahout/20news-all
  13. + mkdir /home/mahout/mahout-work-mahout/20news-all
  14. + cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware 
/home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
  15. + echo 'Creating sequence files from 20newsgroups data'
  16. Creating sequence files from 20newsgroups data
  17. + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
  18. Warning: $HADOOP_HOME is deprecated.
  20. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  21. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  22. Warning: $HADOOP_HOME is deprecated.
  24. SLF4J: Class path contains multiple SLF4J bindings.
  25. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  26. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  27. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  28. 13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
  29. 13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
  30. + echo 'Converting sequence files to vectors'
  31. Converting sequence files to vectors
  32. + ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
  33. Warning: $HADOOP_HOME is deprecated.
  35. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  36. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  37. Warning: $HADOOP_HOME is deprecated.
  39. SLF4J: Class path contains multiple SLF4J bindings.
  40. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  41. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  42. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  43. 13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
  44. 13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
  45. 13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
  46. 13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1
  47. 13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056
  48. 13/08/26 23:43:18 INFO mapred.JobClient: map 0% reduce 0%
  49. 13/08/26 23:43:45 INFO mapred.JobClient: map 78% reduce 0%
  50. 13/08/26 23:43:51 INFO mapred.JobClient: map 100% reduce 0%
  51. 13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056
  52. 13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19
  53. 13/08/26 23:43:56 INFO mapred.JobClient: Job Counters
  54. 13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32883
  55. 13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  56. 13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  57. 13/08/26 23:43:56 INFO mapred.JobClient: Launched map tasks=1
  58. 13/08/26 23:43:56 INFO mapred.JobClient: Data-local map tasks=1
  59. 13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
  60. 13/08/26 23:43:56 INFO mapred.JobClient: File Output Format Counters
  61. 13/08/26 23:43:56 INFO mapred.JobClient: Bytes Written=27503580
  62. 13/08/26 23:43:56 INFO mapred.JobClient: FileSystemCounters
  63. 13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_READ=36694022
  64. 13/08/26 23:43:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21899
  65. 13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27503580
  66. 13/08/26 23:43:56 INFO mapred.JobClient: File Input Format Counters
  67. 13/08/26 23:43:56 INFO mapred.JobClient: Bytes Read=36693889
  68. 13/08/26 23:43:56 INFO mapred.JobClient: Map-Reduce Framework
  69. 13/08/26 23:43:56 INFO mapred.JobClient: Map input records=18846
  70. 13/08/26 23:43:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=75157504
  71. 13/08/26 23:43:56 INFO mapred.JobClient: Spilled Records=0
  72. 13/08/26 23:43:56 INFO mapred.JobClient: CPU time spent (ms)=5730
  73. 13/08/26 23:43:56 INFO mapred.JobClient: Total committed heap usage (bytes)=15859712
  74. 13/08/26 23:43:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=974381056
  75. 13/08/26 23:43:56 INFO mapred.JobClient: Map output records=18846
  76. 13/08/26 23:43:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=133
  77. 13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1
  78. 13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057
  79. 13/08/26 23:43:57 INFO mapred.JobClient: map 0% reduce 0%
  80. 13/08/26 23:44:15 INFO mapred.JobClient: map 3% reduce 0%
  81. 13/08/26 23:44:18 INFO mapred.JobClient: map 23% reduce 0%
  82. 13/08/26 23:44:21 INFO mapred.JobClient: map 60% reduce 0%
  83. 13/08/26 23:44:24 INFO mapred.JobClient: map 100% reduce 0%
  84. 13/08/26 23:44:48 INFO mapred.JobClient: map 100% reduce 100%
  85. 13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057
  86. 13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29
  87. 13/08/26 23:44:53 INFO mapred.JobClient: Job Counters
  88. 13/08/26 23:44:53 INFO mapred.JobClient: Launched reduce tasks=1
  89. 13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31312
  90. 13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  91. 13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  92. 13/08/26 23:44:53 INFO mapred.JobClient: Launched map tasks=1
  93. 13/08/26 23:44:53 INFO mapred.JobClient: Data-local map tasks=1
  94. 13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18422
  95. 13/08/26 23:44:53 INFO mapred.JobClient: File Output Format Counters
  96. 13/08/26 23:44:53 INFO mapred.JobClient: Bytes Written=2315037
  97. 13/08/26 23:44:53 INFO mapred.JobClient: FileSystemCounters
  98. 13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_READ=11857906
  99. 13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_READ=27503742
  100. 13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15440401
  101. 13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037
  102. 13/08/26 23:44:53 INFO mapred.JobClient: File Input Format Counters
  103. 13/08/26 23:44:53 INFO mapred.JobClient: Bytes Read=27503580
  104. 13/08/26 23:44:53 INFO mapred.JobClient: Map-Reduce Framework
  105. 13/08/26 23:44:53 INFO mapred.JobClient: Map output materialized bytes=3538084
  106. 13/08/26 23:44:53 INFO mapred.JobClient: Map input records=18846
  107. 13/08/26 23:44:53 INFO mapred.JobClient: Reduce shuffle bytes=0
  108. 13/08/26 23:44:53 INFO mapred.JobClient: Spilled Records=849345
  109. 13/08/26 23:44:53 INFO mapred.JobClient: Map output bytes=39462740
  110. 13/08/26 23:44:53 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
  111. 13/08/26 23:44:53 INFO mapred.JobClient: CPU time spent (ms)=14080
  112. 13/08/26 23:44:53 INFO mapred.JobClient: Combine input records=3026242
  113. 13/08/26 23:44:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
  114. 13/08/26 23:44:53 INFO mapred.JobClient: Reduce input records=192904
  115. 13/08/26 23:44:53 INFO mapred.JobClient: Reduce input groups=192904
  116. 13/08/26 23:44:53 INFO mapred.JobClient: Combine output records=554873
  117. 13/08/26 23:44:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=283111424
  118. 13/08/26 23:44:53 INFO mapred.JobClient: Reduce output records=93563
  119. 13/08/26 23:44:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  120. 13/08/26 23:44:53 INFO mapred.JobClient: Map output records=2664273
  121. 13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1
  122. 13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058
  123. 13/08/26 23:44:56 INFO mapred.JobClient: map 0% reduce 0%
  124. 13/08/26 23:45:13 INFO mapred.JobClient: map 94% reduce 0%
  125. 13/08/26 23:45:16 INFO mapred.JobClient: map 100% reduce 0%
  126. 13/08/26 23:45:43 INFO mapred.JobClient: map 100% reduce 100%
  127. 13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058
  128. 13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29
  129. 13/08/26 23:45:48 INFO mapred.JobClient: Job Counters
  130. 13/08/26 23:45:48 INFO mapred.JobClient: Launched reduce tasks=1
  131. 13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21298
  132. 13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  133. 13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  134. 13/08/26 23:45:48 INFO mapred.JobClient: Launched map tasks=1
  135. 13/08/26 23:45:48 INFO mapred.JobClient: Data-local map tasks=1
  136. 13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24763
  137. 13/08/26 23:45:48 INFO mapred.JobClient: File Output Format Counters
  138. 13/08/26 23:45:48 INFO mapred.JobClient: Bytes Written=29314118
  139. 13/08/26 23:45:48 INFO mapred.JobClient: FileSystemCounters
  140. 13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_READ=27274291
  141. 13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_READ=29440826
  142. 13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54595105
  143. 13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
  144. 13/08/26 23:45:48 INFO mapred.JobClient: File Input Format Counters
  145. 13/08/26 23:45:48 INFO mapred.JobClient: Bytes Read=27503580
  146. 13/08/26 23:45:48 INFO mapred.JobClient: Map-Reduce Framework
  147. 13/08/26 23:45:48 INFO mapred.JobClient: Map output materialized bytes=27274291
  148. 13/08/26 23:45:48 INFO mapred.JobClient: Map input records=18846
  149. 13/08/26 23:45:48 INFO mapred.JobClient: Reduce shuffle bytes=0
  150. 13/08/26 23:45:48 INFO mapred.JobClient: Spilled Records=37692
  151. 13/08/26 23:45:48 INFO mapred.JobClient: Map output bytes=27199343
  152. 13/08/26 23:45:48 INFO mapred.JobClient: Total committed heap usage (bytes)=215695360
  153. 13/08/26 23:45:48 INFO mapred.JobClient: CPU time spent (ms)=12980
  154. 13/08/26 23:45:48 INFO mapred.JobClient: Combine input records=0
  155. 13/08/26 23:45:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
  156. 13/08/26 23:45:48 INFO mapred.JobClient: Reduce input records=18846
  157. 13/08/26 23:45:48 INFO mapred.JobClient: Reduce input groups=18846
  158. 13/08/26 23:45:48 INFO mapred.JobClient: Combine output records=0
  159. 13/08/26 23:45:48 INFO mapred.JobClient: Physical memory (bytes) snapshot=332349440
  160. 13/08/26 23:45:48 INFO mapred.JobClient: Reduce output records=18846
  161. 13/08/26 23:45:48 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  162. 13/08/26 23:45:48 INFO mapred.JobClient: Map output records=18846
  163. 13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1
  164. 13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059
  165. 13/08/26 23:45:50 INFO mapred.JobClient: map 0% reduce 0%
  166. 13/08/26 23:46:10 INFO mapred.JobClient: map 100% reduce 0%
  167. 13/08/26 23:46:25 INFO mapred.JobClient: map 100% reduce 92%
  168. 13/08/26 23:46:31 INFO mapred.JobClient: map 100% reduce 100%
  169. 13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059
  170. 13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29
  171. 13/08/26 23:46:36 INFO mapred.JobClient: Job Counters
  172. 13/08/26 23:46:36 INFO mapred.JobClient: Launched reduce tasks=1
  173. 13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18217
  174. 13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  175. 13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  176. 13/08/26 23:46:36 INFO mapred.JobClient: Launched map tasks=1
  177. 13/08/26 23:46:36 INFO mapred.JobClient: Data-local map tasks=1
  178. 13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20981
  179. 13/08/26 23:46:36 INFO mapred.JobClient: File Output Format Counters
  180. 13/08/26 23:46:36 INFO mapred.JobClient: Bytes Written=29314118
  181. 13/08/26 23:46:36 INFO mapred.JobClient: FileSystemCounters
  182. 13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_READ=29059398
  183. 13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_READ=29314278
  184. 13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58163419
  185. 13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
  186. 13/08/26 23:46:36 INFO mapred.JobClient: File Input Format Counters
  187. 13/08/26 23:46:36 INFO mapred.JobClient: Bytes Read=29314118
  188. 13/08/26 23:46:36 INFO mapred.JobClient: Map-Reduce Framework
  189. 13/08/26 23:46:36 INFO mapred.JobClient: Map output materialized bytes=29059398
  190. 13/08/26 23:46:36 INFO mapred.JobClient: Map input records=18846
  191. 13/08/26 23:46:36 INFO mapred.JobClient: Reduce shuffle bytes=0
  192. 13/08/26 23:46:36 INFO mapred.JobClient: Spilled Records=37692
  193. 13/08/26 23:46:36 INFO mapred.JobClient: Map output bytes=28984080
  194. 13/08/26 23:46:36 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
  195. 13/08/26 23:46:36 INFO mapred.JobClient: CPU time spent (ms)=8650
  196. 13/08/26 23:46:36 INFO mapred.JobClient: Combine input records=0
  197. 13/08/26 23:46:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
  198. 13/08/26 23:46:37 INFO mapred.JobClient: Reduce input records=18846
  199. 13/08/26 23:46:37 INFO mapred.JobClient: Reduce input groups=18846
  200. 13/08/26 23:46:37 INFO mapred.JobClient: Combine output records=0
  201. 13/08/26 23:46:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=313606144
  202. 13/08/26 23:46:37 INFO mapred.JobClient: Reduce output records=18846
  203. 13/08/26 23:46:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  204. 13/08/26 23:46:37 INFO mapred.JobClient: Map output records=18846
  205. 13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
  206. 13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1
  207. 13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060
  208. 13/08/26 23:46:38 INFO mapred.JobClient: map 0% reduce 0%
  209. 13/08/26 23:46:56 INFO mapred.JobClient: map 100% reduce 0%
  210. 13/08/26 23:47:14 INFO mapred.JobClient: map 100% reduce 100%
  211. 13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060
  212. 13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29
  213. 13/08/26 23:47:19 INFO mapred.JobClient: Job Counters
  214. 13/08/26 23:47:19 INFO mapred.JobClient: Launched reduce tasks=1
  215. 13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21504
  216. 13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  217. 13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  218. 13/08/26 23:47:19 INFO mapred.JobClient: Launched map tasks=1
  219. 13/08/26 23:47:19 INFO mapred.JobClient: Data-local map tasks=1
  220. 13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14273
  221. 13/08/26 23:47:19 INFO mapred.JobClient: File Output Format Counters
  222. 13/08/26 23:47:19 INFO mapred.JobClient: Bytes Written=1890073
  223. 13/08/26 23:47:19 INFO mapred.JobClient: FileSystemCounters
  224. 13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_READ=4880788
  225. 13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_READ=29314271
  226. 13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6235019
  227. 13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073
  228. 13/08/26 23:47:19 INFO mapred.JobClient: File Input Format Counters
  229. 13/08/26 23:47:19 INFO mapred.JobClient: Bytes Read=29314118
  230. 13/08/26 23:47:19 INFO mapred.JobClient: Map-Reduce Framework
  231. 13/08/26 23:47:19 INFO mapred.JobClient: Map output materialized bytes=1309902
  232. 13/08/26 23:47:19 INFO mapred.JobClient: Map input records=18846
  233. 13/08/26 23:47:19 INFO mapred.JobClient: Reduce shuffle bytes=0
  234. 13/08/26 23:47:19 INFO mapred.JobClient: Spilled Records=442187
  235. 13/08/26 23:47:19 INFO mapred.JobClient: Map output bytes=31005336
  236. 13/08/26 23:47:19 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
  237. 13/08/26 23:47:19 INFO mapred.JobClient: CPU time spent (ms)=9210
  238. 13/08/26 23:47:19 INFO mapred.JobClient: Combine input records=2838837
  239. 13/08/26 23:47:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
  240. 13/08/26 23:47:19 INFO mapred.JobClient: Reduce input records=93564
  241. 13/08/26 23:47:19 INFO mapred.JobClient: Reduce input groups=93564
  242. 13/08/26 23:47:19 INFO mapred.JobClient: Combine output records=348623
  243. 13/08/26 23:47:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=284684288
  244. 13/08/26 23:47:19 INFO mapred.JobClient: Reduce output records=93564
  245. 13/08/26 23:47:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  246. 13/08/26 23:47:19 INFO mapred.JobClient: Map output records=2583778
  247. 13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1
  248. 13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061
  249. 13/08/26 23:47:20 INFO mapred.JobClient: map 0% reduce 0%
  250. 13/08/26 23:47:38 INFO mapred.JobClient: map 100% reduce 0%
  251. 13/08/26 23:47:53 INFO mapred.JobClient: map 100% reduce 67%
  252. 13/08/26 23:47:59 INFO mapred.JobClient: map 100% reduce 100%
  253. 13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061
  254. 13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29
  255. 13/08/26 23:48:04 INFO mapred.JobClient: Job Counters
  256. 13/08/26 23:48:04 INFO mapred.JobClient: Launched reduce tasks=1
  257. 13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18292
  258. 13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  259. 13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  260. 13/08/26 23:48:04 INFO mapred.JobClient: Launched map tasks=1
  261. 13/08/26 23:48:04 INFO mapred.JobClient: Data-local map tasks=1
  262. 13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19293
  263. 13/08/26 23:48:04 INFO mapred.JobClient: File Output Format Counters
  264. 13/08/26 23:48:04 INFO mapred.JobClient: Bytes Written=28689283
  265. 13/08/26 23:48:04 INFO mapred.JobClient: FileSystemCounters
  266. 13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_READ=29059398
  267. 13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_READ=31204324
  268. 13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58165045
  269. 13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
  270. 13/08/26 23:48:04 INFO mapred.JobClient: File Input Format Counters
  271. 13/08/26 23:48:04 INFO mapred.JobClient: Bytes Read=29314118
  272. 13/08/26 23:48:04 INFO mapred.JobClient: Map-Reduce Framework
  273. 13/08/26 23:48:04 INFO mapred.JobClient: Map output materialized bytes=29059398
  274. 13/08/26 23:48:04 INFO mapred.JobClient: Map input records=18846
  275. 13/08/26 23:48:04 INFO mapred.JobClient: Reduce shuffle bytes=0
  276. 13/08/26 23:48:04 INFO mapred.JobClient: Spilled Records=37692
  277. 13/08/26 23:48:04 INFO mapred.JobClient: Map output bytes=28984080
  278. 13/08/26 23:48:04 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
  279. 13/08/26 23:48:04 INFO mapred.JobClient: CPU time spent (ms)=8770
  280. 13/08/26 23:48:04 INFO mapred.JobClient: Combine input records=0
  281. 13/08/26 23:48:04 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
  282. 13/08/26 23:48:04 INFO mapred.JobClient: Reduce input records=18846
  283. 13/08/26 23:48:04 INFO mapred.JobClient: Reduce input groups=18846
  284. 13/08/26 23:48:04 INFO mapred.JobClient: Combine output records=0
  285. 13/08/26 23:48:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=320401408
  286. 13/08/26 23:48:04 INFO mapred.JobClient: Reduce output records=18846
  287. 13/08/26 23:48:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  288. 13/08/26 23:48:04 INFO mapred.JobClient: Map output records=18846
  289. 13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1
  290. 13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062
  291. 13/08/26 23:48:06 INFO mapred.JobClient: map 0% reduce 0%
  292. 13/08/26 23:48:24 INFO mapred.JobClient: map 100% reduce 0%
  293. 13/08/26 23:48:36 INFO mapred.JobClient: map 100% reduce 33%
  294. 13/08/26 23:48:39 INFO mapred.JobClient: map 100% reduce 86%
  295. 13/08/26 23:48:48 INFO mapred.JobClient: map 100% reduce 100%
  296. 13/08/26 23:48:53 INFO mapred.JobClient: Job complete: job_201308212334_0062
  297. 13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29
  298. 13/08/26 23:48:53 INFO mapred.JobClient: Job Counters
  299. 13/08/26 23:48:53 INFO mapred.JobClient: Launched reduce tasks=1
  300. 13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18225
  301. 13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  302. 13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  303. 13/08/26 23:48:53 INFO mapred.JobClient: Launched map tasks=1
  304. 13/08/26 23:48:53 INFO mapred.JobClient: Data-local map tasks=1
  305. 13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21045
  306. 13/08/26 23:48:53 INFO mapred.JobClient: File Output Format Counters
  307. 13/08/26 23:48:53 INFO mapred.JobClient: Bytes Written=28689283
  308. 13/08/26 23:48:53 INFO mapred.JobClient: FileSystemCounters
  309. 13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_READ=28437750
  310. 13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_READ=28689443
  311. 13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56920127
  312. 13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
  313. 13/08/26 23:48:53 INFO mapred.JobClient: File Input Format Counters
  314. 13/08/26 23:48:53 INFO mapred.JobClient: Bytes Read=28689283
  315. 13/08/26 23:48:53 INFO mapred.JobClient: Map-Reduce Framework
  316. 13/08/26 23:48:53 INFO mapred.JobClient: Map output materialized bytes=28437750
  317. 13/08/26 23:48:53 INFO mapred.JobClient: Map input records=18846
  318. 13/08/26 23:48:53 INFO mapred.JobClient: Reduce shuffle bytes=0
  319. 13/08/26 23:48:53 INFO mapred.JobClient: Spilled Records=37692
  320. 13/08/26 23:48:53 INFO mapred.JobClient: Map output bytes=28362505
  321. 13/08/26 23:48:53 INFO mapred.JobClient: Total committed heap usage (bytes)=204603392
  322. 13/08/26 23:48:53 INFO mapred.JobClient: CPU time spent (ms)=8340
  323. 13/08/26 23:48:53 INFO mapred.JobClient: Combine input records=0
  324. 13/08/26 23:48:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
  325. 13/08/26 23:48:53 INFO mapred.JobClient: Reduce input records=18846
  326. 13/08/26 23:48:53 INFO mapred.JobClient: Reduce input groups=18846
  327. 13/08/26 23:48:53 INFO mapred.JobClient: Combine output records=0
  328. 13/08/26 23:48:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=313868288
  329. 13/08/26 23:48:53 INFO mapred.JobClient: Reduce output records=18846
  330. 13/08/26 23:48:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  331. 13/08/26 23:48:53 INFO mapred.JobClient: Map output records=18846
  332. 13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
  333. 13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035)
  334. + echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
  335. Creating training and holdout set with a random 80-20 split of the generated vector dataset
  336. + ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
  337. Warning: $HADOOP_HOME is deprecated.
  339. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  340. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  341. Warning: $HADOOP_HOME is deprecated.
  343. SLF4J: Class path contains multiple SLF4J bindings.
  344. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  345. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  346. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  347. 13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
  348. 13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]}
  349. 13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines
  350. 13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
  351. 13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  352. 13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
  353. 13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
  354. 13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
  355. 13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0
  356. 13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631)
  357. + echo 'Training Naive Bayes model'
  358. Training Naive Bayes model
  359. + ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow
  360. Warning: $HADOOP_HOME is deprecated.
  362. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  363. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  364. Warning: $HADOOP_HOME is deprecated.
  366. SLF4J: Class path contains multiple SLF4J bindings.
  367. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  368. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  369. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  370. 13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
  371. 13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
  372. 13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
  373. 13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  374. 13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
  375. 13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
  376. 13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
  377. 13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
  378. 13/08/26 23:49:27 INFO mapred.JobClient: map 0% reduce 0%
  379. 13/08/26 23:49:49 INFO mapred.JobClient: map 43% reduce 0%
  380. 13/08/26 23:49:52 INFO mapred.JobClient: map 100% reduce 0%
  381. 13/08/26 23:50:13 INFO mapred.JobClient: map 100% reduce 100%
  382. 13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
  383. 13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
  384. 13/08/26 23:50:18 INFO mapred.JobClient: Job Counters
  385. 13/08/26 23:50:18 INFO mapred.JobClient: Launched reduce tasks=1
  386. 13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22816
  387. 13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  388. 13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  389. 13/08/26 23:50:18 INFO mapred.JobClient: Launched map tasks=1
  390. 13/08/26 23:50:18 INFO mapred.JobClient: Data-local map tasks=1
  391. 13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20680
  392. 13/08/26 23:50:18 INFO mapred.JobClient: File Output Format Counters
  393. 13/08/26 23:50:18 INFO mapred.JobClient: Bytes Written=2718605
  394. 13/08/26 23:50:18 INFO mapred.JobClient: FileSystemCounters
  395. 13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_READ=1404371
  396. 13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_READ=12669237
  397. 13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2854477
  398. 13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2718605
  399. 13/08/26 23:50:18 INFO mapred.JobClient: File Input Format Counters
  400. 13/08/26 23:50:18 INFO mapred.JobClient: Bytes Read=12668431
  401. 13/08/26 23:50:18 INFO mapred.JobClient: Map-Reduce Framework
  402. 13/08/26 23:50:18 INFO mapred.JobClient: Map output materialized bytes=1404363
  403. 13/08/26 23:50:18 INFO mapred.JobClient: Map input records=11321
  404. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce shuffle bytes=1404363
  405. 13/08/26 23:50:18 INFO mapred.JobClient: Spilled Records=40
  406. 13/08/26 23:50:18 INFO mapred.JobClient: Map output bytes=16682576
  407. 13/08/26 23:50:18 INFO mapred.JobClient: Total committed heap usage (bytes)=176164864
  408. 13/08/26 23:50:18 INFO mapred.JobClient: CPU time spent (ms)=8190
  409. 13/08/26 23:50:18 INFO mapred.JobClient: Combine input records=11321
  410. 13/08/26 23:50:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
  411. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce input records=20
  412. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce input groups=20
  413. 13/08/26 23:50:18 INFO mapred.JobClient: Combine output records=20
  414. 13/08/26 23:50:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=294400000
  415. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce output records=20
  416. 13/08/26 23:50:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
  417. 13/08/26 23:50:18 INFO mapred.JobClient: Map output records=11321
  418. 13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
  419. 13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
  420. 13/08/26 23:50:19 INFO mapred.JobClient: map 0% reduce 0%
  421. 13/08/26 23:50:40 INFO mapred.JobClient: map 100% reduce 0%
  422. 13/08/26 23:51:01 INFO mapred.JobClient: map 100% reduce 100%
  423. 13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
  424. 13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
  425. 13/08/26 23:51:06 INFO mapred.JobClient: Job Counters
  426. 13/08/26 23:51:06 INFO mapred.JobClient: Launched reduce tasks=1
  427. 13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24609
  428. 13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  429. 13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  430. 13/08/26 23:51:06 INFO mapred.JobClient: Launched map tasks=1
  431. 13/08/26 23:51:06 INFO mapred.JobClient: Data-local map tasks=1
  432. 13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=15258
  433. 13/08/26 23:51:06 INFO mapred.JobClient: File Output Format Counters
  434. 13/08/26 23:51:06 INFO mapred.JobClient: Bytes Written=893560
  435. 13/08/26 23:51:06 INFO mapred.JobClient: FileSystemCounters
  436. 13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_READ=362674
  437. 13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_READ=2718737
  438. 13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=771195
  439. 13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=893560
  440. 13/08/26 23:51:06 INFO mapred.JobClient: File Input Format Counters
  441. 13/08/26 23:51:06 INFO mapred.JobClient: Bytes Read=2718605
  442. 13/08/26 23:51:06 INFO mapred.JobClient: Map-Reduce Framework
  443. 13/08/26 23:51:06 INFO mapred.JobClient: Map output materialized bytes=362666
  444. 13/08/26 23:51:06 INFO mapred.JobClient: Map input records=20
  445. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce shuffle bytes=362666
  446. 13/08/26 23:51:06 INFO mapred.JobClient: Spilled Records=4
  447. 13/08/26 23:51:06 INFO mapred.JobClient: Map output bytes=893434
  448. 13/08/26 23:51:06 INFO mapred.JobClient: Total committed heap usage (bytes)=223264768
  449. 13/08/26 23:51:06 INFO mapred.JobClient: CPU time spent (ms)=5370
  450. 13/08/26 23:51:06 INFO mapred.JobClient: Combine input records=2
  451. 13/08/26 23:51:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=132
  452. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce input records=2
  453. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce input groups=2
  454. 13/08/26 23:51:06 INFO mapred.JobClient: Combine output records=2
  455. 13/08/26 23:51:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=300597248
  456. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce output records=2
  457. 13/08/26 23:51:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
  458. 13/08/26 23:51:06 INFO mapred.JobClient: Map output records=2
  459. 13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
  460. + echo 'Self testing on training set'
  461. Self testing on training set
  462. + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
  463. Warning: $HADOOP_HOME is deprecated.
  465. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  466. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  467. Warning: $HADOOP_HOME is deprecated.
  469. SLF4J: Class path contains multiple SLF4J bindings.
  470. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  471. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  472. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  473. 13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
  474. 13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
  475. 13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
  476. 13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
  477. 13/08/26 23:51:22 INFO mapred.JobClient: map 0% reduce 0%
  478. 13/08/26 23:51:45 INFO mapred.JobClient: map 51% reduce 0%
  479. 13/08/26 23:51:48 INFO mapred.JobClient: map 89% reduce 0%
  480. 13/08/26 23:51:54 INFO mapred.JobClient: map 100% reduce 0%
  481. 13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
  482. 13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
  483. 13/08/26 23:51:58 INFO mapred.JobClient: Job Counters
  484. 13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34216
  485. 13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  486. 13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  487. 13/08/26 23:51:58 INFO mapred.JobClient: Launched map tasks=1
  488. 13/08/26 23:51:58 INFO mapred.JobClient: Data-local map tasks=1
  489. 13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
  490. 13/08/26 23:51:58 INFO mapred.JobClient: File Output Format Counters
  491. 13/08/26 23:51:58 INFO mapred.JobClient: Bytes Written=2132486
  492. 13/08/26 23:51:58 INFO mapred.JobClient: FileSystemCounters
  493. 13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_READ=16279896
  494. 13/08/26 23:51:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22523
  495. 13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2132486
  496. 13/08/26 23:51:58 INFO mapred.JobClient: File Input Format Counters
  497. 13/08/26 23:51:58 INFO mapred.JobClient: Bytes Read=12668431
  498. 13/08/26 23:51:58 INFO mapred.JobClient: Map-Reduce Framework
  499. 13/08/26 23:51:58 INFO mapred.JobClient: Map input records=11321
  500. 13/08/26 23:51:58 INFO mapred.JobClient: Physical memory (bytes) snapshot=87547904
  501. 13/08/26 23:51:58 INFO mapred.JobClient: Spilled Records=0
  502. 13/08/26 23:51:58 INFO mapred.JobClient: CPU time spent (ms)=9380
  503. 13/08/26 23:51:58 INFO mapred.JobClient: Total committed heap usage (bytes)=28131328
  504. 13/08/26 23:51:58 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
  505. 13/08/26 23:51:58 INFO mapred.JobClient: Map output records=11321
  506. 13/08/26 23:51:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
  507. 13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
  508. Summary
  509. -------------------------------------------------------
  510. Correctly Classified Instances : 11256 99.4258%
  511. Incorrectly Classified Instances : 65 0.5742%
  512. Total Classified Instances : 11321
  514. =======================================================
  515. Confusion Matrix
  516. -------------------------------------------------------
  517. a b c d e f g h i j k l m n o p q r s t <--Classified as
  518. 454 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 | 458 a = alt.atheism
  519. 0 588 0 3 0 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 | 595 b = comp.graphics
  520. 0 3 553 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 563 c = comp.os.ms-windows.misc
  521. 0 0 0 592 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 | 595 d = comp.sys.ibm.pc.hardware
  522. 0 0 0 1 593 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 594 e = comp.sys.mac.hardware
  523. 0 2 0 1 0 576 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 581 f = comp.windows.x
  524. 0 1 0 0 0 0 579 0 0 0 0 0 1 0 0 0 0 0 0 0 | 581 g = misc.forsale
  525. 0 0 0 0 0 0 1 594 0 0 0 0 1 0 0 0 0 0 0 0 | 596 h = rec.autos
  526. 0 0 0 0 0 0 1 2 591 0 0 0 0 0 0 0 0 0 0 0 | 594 i = rec.motorcycles
  527. 0 0 0 0 0 0 0 0 0 615 1 0 0 0 0 0 0 0 0 0 | 616 j = rec.sport.baseball
  528. 0 0 0 0 0 0 1 0 0 1 581 0 0 0 0 0 0 0 0 0 | 583 k = rec.sport.hockey
  529. 0 0 1 0 0 0 0 0 0 0 0 627 1 0 0 0 0 1 0 0 | 630 l = sci.crypt
  530. 0 0 0 2 0 0 1 0 0 0 0 0 588 0 0 0 0 0 0 0 | 591 m = sci.electronics
  531. 0 1 0 0 0 0 0 0 0 0 0 0 0 586 1 0 0 0 0 0 | 588 n = sci.med
  532. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 0 | 615 o = sci.space
  533. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 619 1 0 0 0 | 620 p = soc.religion.christian
  534. 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 541 0 0 0 | 543 q = talk.politics.mideast
  535. 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 560 0 0 | 561 r = talk.politics.guns
  536. 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 1 351 0 | 359 s = talk.religion.misc
  537. 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 0 453 | 458 t = talk.politics.misc
  539. 13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
  540. + echo 'Testing on holdout set'
  541. Testing on holdout set
  542. + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
  543. Warning: $HADOOP_HOME is deprecated.
  545. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  546. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  547. Warning: $HADOOP_HOME is deprecated.
  549. SLF4J: Class path contains multiple SLF4J bindings.
  550. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  551. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  552. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  553. 13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
  554. 13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
  555. 13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
  556. 13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
  557. 13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
  558. 13/08/26 23:52:12 INFO mapred.JobClient: map 0% reduce 0%
  559. 13/08/26 23:52:30 INFO mapred.JobClient: map 85% reduce 0%
  560. 13/08/26 23:52:36 INFO mapred.JobClient: map 100% reduce 0%
  561. 13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
  562. 13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
  563. 13/08/26 23:52:41 INFO mapred.JobClient: Job Counters
  564. 13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=25113
  565. 13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  566. 13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  567. 13/08/26 23:52:41 INFO mapred.JobClient: Launched map tasks=1
  568. 13/08/26 23:52:41 INFO mapred.JobClient: Data-local map tasks=1
  569. 13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
  570. 13/08/26 23:52:41 INFO mapred.JobClient: File Output Format Counters
  571. 13/08/26 23:52:41 INFO mapred.JobClient: Bytes Written=1417942
  572. 13/08/26 23:52:41 INFO mapred.JobClient: FileSystemCounters
  573. 13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_READ=12148944
  574. 13/08/26 23:52:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22522
  575. 13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1417942
  576. 13/08/26 23:52:41 INFO mapred.JobClient: File Input Format Counters
  577. 13/08/26 23:52:41 INFO mapred.JobClient: Bytes Read=8537480
  578. 13/08/26 23:52:41 INFO mapred.JobClient: Map-Reduce Framework
  579. 13/08/26 23:52:41 INFO mapred.JobClient: Map input records=7525
  580. 13/08/26 23:52:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=85057536
  581. 13/08/26 23:52:41 INFO mapred.JobClient: Spilled Records=0
  582. 13/08/26 23:52:41 INFO mapred.JobClient: CPU time spent (ms)=6630
  583. 13/08/26 23:52:41 INFO mapred.JobClient: Total committed heap usage (bytes)=28155904
  584. 13/08/26 23:52:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
  585. 13/08/26 23:52:41 INFO mapred.JobClient: Map output records=7525
  586. 13/08/26 23:52:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=147
  587. 13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
  588. Summary
  589. -------------------------------------------------------
  590. Correctly Classified Instances : 6801 90.3787%
  591. Incorrectly Classified Instances : 724 9.6213%
  592. Total Classified Instances : 7525
  594. =======================================================
  595. Confusion Matrix
  596. -------------------------------------------------------
  597. a b c d e f g h i j k l m n o p q r s t <--Classified as
  598. 318 0 0 0 1 0 0 0 1 0 0 0 0 0 1 4 0 0 15 1 | 341 a = alt.atheism
  599. 1 318 7 20 4 7 7 2 0 1 0 1 1 2 6 0 0 0 0 1 | 378 b = comp.graphics
  600. 0 25 277 78 12 15 5 0 0 0 0 2 4 0 1 0 0 0 0 3 | 422 c = comp.os.ms-windows.misc
  601. 1 4 3 336 20 3 8 0 0 0 0 1 11 0 0 0 0 0 0 0 | 387 d = comp.sys.ibm.pc.hardware
  602. 0 3 1 6 350 1 3 0 0 0 0 1 3 1 0 0 0 0 0 0 | 369 e = comp.sys.mac.hardware
  603. 1 20 3 6 7 365 3 0 0 0 0 1 0 0 0 0 1 0 0 0 | 407 f = comp.windows.x
  604. 0 1 1 19 8 0 329 13 1 0 0 2 14 0 4 0 0 1 1 0 | 394 g = misc.forsale
  605. 0 2 1 2 3 1 10 361 8 0 0 0 4 0 0 0 0 1 0 1 | 394 h = rec.autos
  606. 0 0 0 1 0 0 2 3 393 1 0 0 0 0 0 0 0 1 0 1 | 402 i = rec.motorcycles
  607. 0 0 0 1 0 0 2 3 0 360 6 0 2 2 1 0 0 0 0 1 | 378 j = rec.sport.baseball
  608. 0 1 0 2 1 0 0 0 2 5 401 0 1 0 0 1 0 0 0 2 | 416 k = rec.sport.hockey
  609. 1 1 0 1 3 2 1 1 0 0 0 344 1 1 2 0 1 1 0 1 | 361 l = sci.crypt
  610. 0 5 0 15 14 0 5 1 1 0 0 2 348 1 1 0 0 0 0 0 | 393 m = sci.electronics
  611. 1 2 1 1 1 0 1 0 0 1 0 1 4 381 5 0 0 1 1 1 | 402 n = sci.med
  612. 1 4 0 0 2 0 2 1 0 0 0 1 2 1 356 0 0 1 0 1 | 372 o = sci.space
  613. 5 0 0 1 1 0 0 1 0 0 1 0 0 1 0 359 3 0 4 1 | 377 p = soc.religion.christian
  614. 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 2 389 0 0 2 | 397 q = talk.politics.mideast
  615. 0 0 1 0 1 1 0 1 0 0 0 2 1 1 0 0 0 335 0 6 | 349 r = talk.politics.guns
  616. 29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc
  617. 2 0 0 0 2 0 0 1 0 1 1 1 0 1 2 0 2 17 3 284 | 317 t = talk.politics.misc
  619. 13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)




