According to the official Mahout wiki at https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups, running this algorithm takes just a single command:

  mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh

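However, my very first run failed. Because I'm not running as root, I first changed the work path: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, so that the user mahout has permission to operate there. It still failed, this time reporting that the curl command could not be found. Fair enough, I hadn't installed it; sudo apt-get install curl and that was sorted. Ubuntu really is convenient.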
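Both fixes can also be done from the shell. The following is a minimal sketch, assuming bash, a sed that supports -i.bak (GNU sed on Ubuntu does), and that the script contains the literal prefix /tmp/mahout-work-. The helper name patch_work_dir is hypothetical and not part of Mahout:

```shell
#!/usr/bin/env bash
# Sketch: move the example's work area from /tmp to the mahout user's home.
# Assumption: the script spells the path with the literal prefix
# "/tmp/mahout-work-" (the log's "mahout-work-mahout" suggests the
# suffix is the user name).

# Hypothetical helper: rewrite the prefix in place, keeping <file>.bak.
patch_work_dir() {
  local script="$1"
  sed -i.bak 's#/tmp/mahout-work-#/home/mahout/mahout-work-#g' "$script"
}

# Run from mahout-d-0.7/examples/bin, where the script lives.
if [ -f classify-20newsgroups.sh ]; then
  patch_work_dir classify-20newsgroups.sh
fi

# The script downloads the data set with curl; only report if it is missing
# rather than installing anything automatically.
if ! command -v curl >/dev/null 2>&1; then
  echo "curl not found; install it with: sudo apt-get install curl" >&2
fi
```

Keeping the .bak copy makes it easy to diff or revert the change later.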

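Then I ran it again, and this time it failed about two-thirds of the way through. The detailed output showed that the number of map input records was 0! What did that mean? Most likely local file operations and HDFS file operations had been mixed up: the script builds 20news-all on the local disk, but the Mahout jobs run on Hadoop and read their input from HDFS. So before executing

  + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq

the local 20news-all directory has to be uploaded to HDFS; after that, simply rerun the first command. The full output is below (it's very long, so I'm not sure it will all fit):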
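A minimal sketch of that upload step, assuming the Hadoop 1.0.4 hadoop command shown in the log and that absolute paths like /home/mahout/... resolve to HDFS on this cluster (which would explain why seqdirectory saw no input). The function upload_20news and the HADOOP_CMD override are hypothetical names introduced here, not part of classify-20newsgroups.sh:

```shell
#!/usr/bin/env bash
# Sketch: copy the locally prepared 20news-all directory to the same
# absolute path on HDFS, so "mahout seqdirectory -i ..." finds input files.
# Assumptions: Hadoop 1.x "hadoop fs" syntax; HADOOP_CMD is a hypothetical
# override that defaults to the hadoop binary on PATH.
WORK_DIR=/home/mahout/mahout-work-mahout
HADOOP_CMD=${HADOOP_CMD:-hadoop}

upload_20news() {
  local dir="$WORK_DIR/20news-all"   # same path locally and on HDFS
  # Drop any stale HDFS copy; otherwise -put would nest a second 20news-all
  # inside the existing directory. -rmr is the Hadoop 1.x recursive delete.
  "$HADOOP_CMD" fs -rmr "$dir" >/dev/null 2>&1 || true
  # Hadoop 1.x "fs -mkdir" also creates missing parents; ignore "exists".
  "$HADOOP_CMD" fs -mkdir "$WORK_DIR" >/dev/null 2>&1 || true
  "$HADOOP_CMD" fs -put "$dir" "$dir" || return 1
  # Quick check: the newsgroup folders should now be listed on HDFS.
  "$HADOOP_CMD" fs -ls "$dir"
}

if command -v "$HADOOP_CMD" >/dev/null 2>&1; then
  upload_20news
else
  echo "hadoop not found; add /home/mahout/hadoop-1.0.4/bin to PATH" >&2
fi
```

Running this once and then rerunning ./classify-20newsgroups.sh gives seqdirectory real input; the log of that rerun continues below.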

  mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh
  Please select a number to choose the corresponding task to run
  1. cnaivebayes
  2. naivebayes
  3. sgd
  4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
  Enter your choice : 2
  ok. You chose 2 and we'll use naivebayes
  creating work directory at /home/mahout/mahout-work-mahout
  + echo 'Preparing 20newsgroups data'
  Preparing 20newsgroups data
  + rm -rf /home/mahout/mahout-work-mahout/20news-all
  + mkdir /home/mahout/mahout-work-mahout/20news-all
  + cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
  + echo 'Creating sequence files from 20newsgroups data'
  Creating sequence files from 20newsgroups data
  + ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
  Warning: $HADOOP_HOME is deprecated.

  Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  Warning: $HADOOP_HOME is deprecated.

  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
  13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
  + echo 'Converting sequence files to vectors'
  Converting sequence files to vectors
  + ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
  Warning: $HADOOP_HOME is deprecated.

  Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  Warning: $HADOOP_HOME is deprecated.

  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
  13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
  13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
  13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056
  13/08/26 23:43:18 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:43:45 INFO mapred.JobClient: map 78% reduce 0%
  13/08/26 23:43:51 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056
  13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19
  13/08/26 23:43:56 INFO mapred.JobClient: Job Counters
  13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32883
  13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:43:56 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:43:56 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
  13/08/26 23:43:56 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:43:56 INFO mapred.JobClient: Bytes Written=27503580
  13/08/26 23:43:56 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_READ=36694022
  13/08/26 23:43:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21899
  13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27503580
  13/08/26 23:43:56 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:43:56 INFO mapred.JobClient: Bytes Read=36693889
  13/08/26 23:43:56 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:43:56 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:43:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=75157504
  13/08/26 23:43:56 INFO mapred.JobClient: Spilled Records=0
  13/08/26 23:43:56 INFO mapred.JobClient: CPU time spent (ms)=5730
  13/08/26 23:43:56 INFO mapred.JobClient: Total committed heap usage (bytes)=15859712
  13/08/26 23:43:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=974381056
  13/08/26 23:43:56 INFO mapred.JobClient: Map output records=18846
  13/08/26 23:43:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=133
  13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057
  13/08/26 23:43:57 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:44:15 INFO mapred.JobClient: map 3% reduce 0%
  13/08/26 23:44:18 INFO mapred.JobClient: map 23% reduce 0%
  13/08/26 23:44:21 INFO mapred.JobClient: map 60% reduce 0%
  13/08/26 23:44:24 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:44:48 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057
  13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29
  13/08/26 23:44:53 INFO mapred.JobClient: Job Counters
  13/08/26 23:44:53 INFO mapred.JobClient: Launched reduce tasks=1
  13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31312
  13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:44:53 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:44:53 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18422
  13/08/26 23:44:53 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:44:53 INFO mapred.JobClient: Bytes Written=2315037
  13/08/26 23:44:53 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_READ=11857906
  13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_READ=27503742
  13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15440401
  13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037
  13/08/26 23:44:53 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:44:53 INFO mapred.JobClient: Bytes Read=27503580
  13/08/26 23:44:53 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:44:53 INFO mapred.JobClient: Map output materialized bytes=3538084
  13/08/26 23:44:53 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:44:53 INFO mapred.JobClient: Reduce shuffle bytes=0
  13/08/26 23:44:53 INFO mapred.JobClient: Spilled Records=849345
  13/08/26 23:44:53 INFO mapred.JobClient: Map output bytes=39462740
  13/08/26 23:44:53 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
  13/08/26 23:44:53 INFO mapred.JobClient: CPU time spent (ms)=14080
  13/08/26 23:44:53 INFO mapred.JobClient: Combine input records=3026242
  13/08/26 23:44:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
  13/08/26 23:44:53 INFO mapred.JobClient: Reduce input records=192904
  13/08/26 23:44:53 INFO mapred.JobClient: Reduce input groups=192904
  13/08/26 23:44:53 INFO mapred.JobClient: Combine output records=554873
  13/08/26 23:44:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=283111424
  13/08/26 23:44:53 INFO mapred.JobClient: Reduce output records=93563
  13/08/26 23:44:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  13/08/26 23:44:53 INFO mapred.JobClient: Map output records=2664273
  13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058
  13/08/26 23:44:56 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:45:13 INFO mapred.JobClient: map 94% reduce 0%
  13/08/26 23:45:16 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:45:43 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058
  13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29
  13/08/26 23:45:48 INFO mapred.JobClient: Job Counters
  13/08/26 23:45:48 INFO mapred.JobClient: Launched reduce tasks=1
  13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21298
  13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:45:48 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:45:48 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24763
  13/08/26 23:45:48 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:45:48 INFO mapred.JobClient: Bytes Written=29314118
  13/08/26 23:45:48 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_READ=27274291
  13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_READ=29440826
  13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54595105
  13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
  13/08/26 23:45:48 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:45:48 INFO mapred.JobClient: Bytes Read=27503580
  13/08/26 23:45:48 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:45:48 INFO mapred.JobClient: Map output materialized bytes=27274291
  13/08/26 23:45:48 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:45:48 INFO mapred.JobClient: Reduce shuffle bytes=0
  13/08/26 23:45:48 INFO mapred.JobClient: Spilled Records=37692
  13/08/26 23:45:48 INFO mapred.JobClient: Map output bytes=27199343
  13/08/26 23:45:48 INFO mapred.JobClient: Total committed heap usage (bytes)=215695360
  13/08/26 23:45:48 INFO mapred.JobClient: CPU time spent (ms)=12980
  13/08/26 23:45:48 INFO mapred.JobClient: Combine input records=0
  13/08/26 23:45:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
  13/08/26 23:45:48 INFO mapred.JobClient: Reduce input records=18846
  13/08/26 23:45:48 INFO mapred.JobClient: Reduce input groups=18846
  13/08/26 23:45:48 INFO mapred.JobClient: Combine output records=0
  13/08/26 23:45:48 INFO mapred.JobClient: Physical memory (bytes) snapshot=332349440
  13/08/26 23:45:48 INFO mapred.JobClient: Reduce output records=18846
  13/08/26 23:45:48 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  13/08/26 23:45:48 INFO mapred.JobClient: Map output records=18846
  13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059
  13/08/26 23:45:50 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:46:10 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:46:25 INFO mapred.JobClient: map 100% reduce 92%
  13/08/26 23:46:31 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059
  13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29
  13/08/26 23:46:36 INFO mapred.JobClient: Job Counters
  13/08/26 23:46:36 INFO mapred.JobClient: Launched reduce tasks=1
  13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18217
  13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:46:36 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:46:36 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20981
  13/08/26 23:46:36 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:46:36 INFO mapred.JobClient: Bytes Written=29314118
  13/08/26 23:46:36 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_READ=29059398
  13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_READ=29314278
  13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58163419
  13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
  13/08/26 23:46:36 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:46:36 INFO mapred.JobClient: Bytes Read=29314118
  13/08/26 23:46:36 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:46:36 INFO mapred.JobClient: Map output materialized bytes=29059398
  13/08/26 23:46:36 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:46:36 INFO mapred.JobClient: Reduce shuffle bytes=0
  13/08/26 23:46:36 INFO mapred.JobClient: Spilled Records=37692
  13/08/26 23:46:36 INFO mapred.JobClient: Map output bytes=28984080
  13/08/26 23:46:36 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
  13/08/26 23:46:36 INFO mapred.JobClient: CPU time spent (ms)=8650
  13/08/26 23:46:36 INFO mapred.JobClient: Combine input records=0
  13/08/26 23:46:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
  13/08/26 23:46:37 INFO mapred.JobClient: Reduce input records=18846
  13/08/26 23:46:37 INFO mapred.JobClient: Reduce input groups=18846
  13/08/26 23:46:37 INFO mapred.JobClient: Combine output records=0
  13/08/26 23:46:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=313606144
  13/08/26 23:46:37 INFO mapred.JobClient: Reduce output records=18846
  13/08/26 23:46:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  13/08/26 23:46:37 INFO mapred.JobClient: Map output records=18846
  13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
  13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060
  13/08/26 23:46:38 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:46:56 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:47:14 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060
  13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29
  13/08/26 23:47:19 INFO mapred.JobClient: Job Counters
  13/08/26 23:47:19 INFO mapred.JobClient: Launched reduce tasks=1
  13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21504
  13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:47:19 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:47:19 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14273
  13/08/26 23:47:19 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:47:19 INFO mapred.JobClient: Bytes Written=1890073
  13/08/26 23:47:19 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_READ=4880788
  13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_READ=29314271
  13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6235019
  13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073
  13/08/26 23:47:19 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:47:19 INFO mapred.JobClient: Bytes Read=29314118
  13/08/26 23:47:19 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:47:19 INFO mapred.JobClient: Map output materialized bytes=1309902
  13/08/26 23:47:19 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:47:19 INFO mapred.JobClient: Reduce shuffle bytes=0
  13/08/26 23:47:19 INFO mapred.JobClient: Spilled Records=442187
  13/08/26 23:47:19 INFO mapred.JobClient: Map output bytes=31005336
  13/08/26 23:47:19 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
  13/08/26 23:47:19 INFO mapred.JobClient: CPU time spent (ms)=9210
  13/08/26 23:47:19 INFO mapred.JobClient: Combine input records=2838837
  13/08/26 23:47:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
  13/08/26 23:47:19 INFO mapred.JobClient: Reduce input records=93564
  13/08/26 23:47:19 INFO mapred.JobClient: Reduce input groups=93564
  13/08/26 23:47:19 INFO mapred.JobClient: Combine output records=348623
  13/08/26 23:47:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=284684288
  13/08/26 23:47:19 INFO mapred.JobClient: Reduce output records=93564
  13/08/26 23:47:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  13/08/26 23:47:19 INFO mapred.JobClient: Map output records=2583778
  13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061
  13/08/26 23:47:20 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:47:38 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:47:53 INFO mapred.JobClient: map 100% reduce 67%
  13/08/26 23:47:59 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061
  13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29
  13/08/26 23:48:04 INFO mapred.JobClient: Job Counters
  13/08/26 23:48:04 INFO mapred.JobClient: Launched reduce tasks=1
  13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18292
  13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:48:04 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:48:04 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19293
  13/08/26 23:48:04 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:48:04 INFO mapred.JobClient: Bytes Written=28689283
  13/08/26 23:48:04 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_READ=29059398
  13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_READ=31204324
  13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58165045
  13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
  13/08/26 23:48:04 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:48:04 INFO mapred.JobClient: Bytes Read=29314118
  13/08/26 23:48:04 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:48:04 INFO mapred.JobClient: Map output materialized bytes=29059398
  13/08/26 23:48:04 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:48:04 INFO mapred.JobClient: Reduce shuffle bytes=0
  13/08/26 23:48:04 INFO mapred.JobClient: Spilled Records=37692
  13/08/26 23:48:04 INFO mapred.JobClient: Map output bytes=28984080
  13/08/26 23:48:04 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
  13/08/26 23:48:04 INFO mapred.JobClient: CPU time spent (ms)=8770
  13/08/26 23:48:04 INFO mapred.JobClient: Combine input records=0
  13/08/26 23:48:04 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
  13/08/26 23:48:04 INFO mapred.JobClient: Reduce input records=18846
  13/08/26 23:48:04 INFO mapred.JobClient: Reduce input groups=18846
  13/08/26 23:48:04 INFO mapred.JobClient: Combine output records=0
  13/08/26 23:48:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=320401408
  13/08/26 23:48:04 INFO mapred.JobClient: Reduce output records=18846
  13/08/26 23:48:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  13/08/26 23:48:04 INFO mapred.JobClient: Map output records=18846
  13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062
  13/08/26 23:48:06 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:48:24 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:48:36 INFO mapred.JobClient: map 100% reduce 33%
  13/08/26 23:48:39 INFO mapred.JobClient: map 100% reduce 86%
  13/08/26 23:48:48 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:48:53 INFO mapred.JobClient: Job complete: job_201308212334_0062
  13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29
  13/08/26 23:48:53 INFO mapred.JobClient: Job Counters
  13/08/26 23:48:53 INFO mapred.JobClient: Launched reduce tasks=1
  13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18225
  13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/08/26 23:48:53 INFO mapred.JobClient: Launched map tasks=1
  13/08/26 23:48:53 INFO mapred.JobClient: Data-local map tasks=1
  13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21045
  13/08/26 23:48:53 INFO mapred.JobClient: File Output Format Counters
  13/08/26 23:48:53 INFO mapred.JobClient: Bytes Written=28689283
  13/08/26 23:48:53 INFO mapred.JobClient: FileSystemCounters
  13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_READ=28437750
  13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_READ=28689443
  13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56920127
  13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
  13/08/26 23:48:53 INFO mapred.JobClient: File Input Format Counters
  13/08/26 23:48:53 INFO mapred.JobClient: Bytes Read=28689283
  13/08/26 23:48:53 INFO mapred.JobClient: Map-Reduce Framework
  13/08/26 23:48:53 INFO mapred.JobClient: Map output materialized bytes=28437750
  13/08/26 23:48:53 INFO mapred.JobClient: Map input records=18846
  13/08/26 23:48:53 INFO mapred.JobClient: Reduce shuffle bytes=0
  13/08/26 23:48:53 INFO mapred.JobClient: Spilled Records=37692
  13/08/26 23:48:53 INFO mapred.JobClient: Map output bytes=28362505
  13/08/26 23:48:53 INFO mapred.JobClient: Total committed heap usage (bytes)=204603392
  13/08/26 23:48:53 INFO mapred.JobClient: CPU time spent (ms)=8340
  13/08/26 23:48:53 INFO mapred.JobClient: Combine input records=0
  13/08/26 23:48:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
  13/08/26 23:48:53 INFO mapred.JobClient: Reduce input records=18846
  13/08/26 23:48:53 INFO mapred.JobClient: Reduce input groups=18846
  13/08/26 23:48:53 INFO mapred.JobClient: Combine output records=0
  13/08/26 23:48:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=313868288
  13/08/26 23:48:53 INFO mapred.JobClient: Reduce output records=18846
  13/08/26 23:48:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
  13/08/26 23:48:53 INFO mapred.JobClient: Map output records=18846
  13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
  13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035)
  + echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
  Creating training and holdout set with a random 80-20 split of the generated vector dataset
  + ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
  Warning: $HADOOP_HOME is deprecated.

  Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  Warning: $HADOOP_HOME is deprecated.

  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
  13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]}
  13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines
  13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
  13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
  13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
  13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
  13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0
  13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631)
  + echo 'Training Naive Bayes model'
  Training Naive Bayes model
  + ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow
  Warning: $HADOOP_HOME is deprecated.

  Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  Warning: $HADOOP_HOME is deprecated.

  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
  13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
  13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
  13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
  13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
  13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
  13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
  13/08/26 23:49:27 INFO mapred.JobClient: map 0% reduce 0%
  13/08/26 23:49:49 INFO mapred.JobClient: map 43% reduce 0%
  13/08/26 23:49:52 INFO mapred.JobClient: map 100% reduce 0%
  13/08/26 23:50:13 INFO mapred.JobClient: map 100% reduce 100%
  13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
  13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
  13/08/26 23:50:18 INFO mapred.JobClient: Job Counters
  385. 13/08/26 23:50:18 INFO mapred.JobClient: Launched reduce tasks=1
  386. 13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22816
  387. 13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  388. 13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  389. 13/08/26 23:50:18 INFO mapred.JobClient: Launched map tasks=1
  390. 13/08/26 23:50:18 INFO mapred.JobClient: Data-local map tasks=1
  391. 13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20680
  392. 13/08/26 23:50:18 INFO mapred.JobClient: File Output Format Counters
  393. 13/08/26 23:50:18 INFO mapred.JobClient: Bytes Written=2718605
  394. 13/08/26 23:50:18 INFO mapred.JobClient: FileSystemCounters
  395. 13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_READ=1404371
  396. 13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_READ=12669237
  397. 13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2854477
  398. 13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2718605
  399. 13/08/26 23:50:18 INFO mapred.JobClient: File Input Format Counters
  400. 13/08/26 23:50:18 INFO mapred.JobClient: Bytes Read=12668431
  401. 13/08/26 23:50:18 INFO mapred.JobClient: Map-Reduce Framework
  402. 13/08/26 23:50:18 INFO mapred.JobClient: Map output materialized bytes=1404363
  403. 13/08/26 23:50:18 INFO mapred.JobClient: Map input records=11321
  404. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce shuffle bytes=1404363
  405. 13/08/26 23:50:18 INFO mapred.JobClient: Spilled Records=40
  406. 13/08/26 23:50:18 INFO mapred.JobClient: Map output bytes=16682576
  407. 13/08/26 23:50:18 INFO mapred.JobClient: Total committed heap usage (bytes)=176164864
  408. 13/08/26 23:50:18 INFO mapred.JobClient: CPU time spent (ms)=8190
  409. 13/08/26 23:50:18 INFO mapred.JobClient: Combine input records=11321
  410. 13/08/26 23:50:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
  411. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce input records=20
  412. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce input groups=20
  413. 13/08/26 23:50:18 INFO mapred.JobClient: Combine output records=20
  414. 13/08/26 23:50:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=294400000
  415. 13/08/26 23:50:18 INFO mapred.JobClient: Reduce output records=20
  416. 13/08/26 23:50:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
  417. 13/08/26 23:50:18 INFO mapred.JobClient: Map output records=11321
  418. 13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
  419. 13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
  420. 13/08/26 23:50:19 INFO mapred.JobClient: map 0% reduce 0%
  421. 13/08/26 23:50:40 INFO mapred.JobClient: map 100% reduce 0%
  422. 13/08/26 23:51:01 INFO mapred.JobClient: map 100% reduce 100%
  423. 13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
  424. 13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
  425. 13/08/26 23:51:06 INFO mapred.JobClient: Job Counters
  426. 13/08/26 23:51:06 INFO mapred.JobClient: Launched reduce tasks=1
  427. 13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24609
  428. 13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  429. 13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  430. 13/08/26 23:51:06 INFO mapred.JobClient: Launched map tasks=1
  431. 13/08/26 23:51:06 INFO mapred.JobClient: Data-local map tasks=1
  432. 13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=15258
  433. 13/08/26 23:51:06 INFO mapred.JobClient: File Output Format Counters
  434. 13/08/26 23:51:06 INFO mapred.JobClient: Bytes Written=893560
  435. 13/08/26 23:51:06 INFO mapred.JobClient: FileSystemCounters
  436. 13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_READ=362674
  437. 13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_READ=2718737
  438. 13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=771195
  439. 13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=893560
  440. 13/08/26 23:51:06 INFO mapred.JobClient: File Input Format Counters
  441. 13/08/26 23:51:06 INFO mapred.JobClient: Bytes Read=2718605
  442. 13/08/26 23:51:06 INFO mapred.JobClient: Map-Reduce Framework
  443. 13/08/26 23:51:06 INFO mapred.JobClient: Map output materialized bytes=362666
  444. 13/08/26 23:51:06 INFO mapred.JobClient: Map input records=20
  445. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce shuffle bytes=362666
  446. 13/08/26 23:51:06 INFO mapred.JobClient: Spilled Records=4
  447. 13/08/26 23:51:06 INFO mapred.JobClient: Map output bytes=893434
  448. 13/08/26 23:51:06 INFO mapred.JobClient: Total committed heap usage (bytes)=223264768
  449. 13/08/26 23:51:06 INFO mapred.JobClient: CPU time spent (ms)=5370
  450. 13/08/26 23:51:06 INFO mapred.JobClient: Combine input records=2
  451. 13/08/26 23:51:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=132
  452. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce input records=2
  453. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce input groups=2
  454. 13/08/26 23:51:06 INFO mapred.JobClient: Combine output records=2
  455. 13/08/26 23:51:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=300597248
  456. 13/08/26 23:51:06 INFO mapred.JobClient: Reduce output records=2
  457. 13/08/26 23:51:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
  458. 13/08/26 23:51:06 INFO mapred.JobClient: Map output records=2
  459. 13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
  460. + echo 'Self testing on training set'
  461. Self testing on training set
  462. + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
  463. Warning: $HADOOP_HOME is deprecated.
  464.  
  465. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  466. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  467. Warning: $HADOOP_HOME is deprecated.
  468.  
  469. SLF4J: Class path contains multiple SLF4J bindings.
  470. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  471. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  472. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  473. 13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
  474. 13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
  475. 13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
  476. 13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
  477. 13/08/26 23:51:22 INFO mapred.JobClient: map 0% reduce 0%
  478. 13/08/26 23:51:45 INFO mapred.JobClient: map 51% reduce 0%
  479. 13/08/26 23:51:48 INFO mapred.JobClient: map 89% reduce 0%
  480. 13/08/26 23:51:54 INFO mapred.JobClient: map 100% reduce 0%
  481. 13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
  482. 13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
  483. 13/08/26 23:51:58 INFO mapred.JobClient: Job Counters
  484. 13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34216
  485. 13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  486. 13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  487. 13/08/26 23:51:58 INFO mapred.JobClient: Launched map tasks=1
  488. 13/08/26 23:51:58 INFO mapred.JobClient: Data-local map tasks=1
  489. 13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
  490. 13/08/26 23:51:58 INFO mapred.JobClient: File Output Format Counters
  491. 13/08/26 23:51:58 INFO mapred.JobClient: Bytes Written=2132486
  492. 13/08/26 23:51:58 INFO mapred.JobClient: FileSystemCounters
  493. 13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_READ=16279896
  494. 13/08/26 23:51:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22523
  495. 13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2132486
  496. 13/08/26 23:51:58 INFO mapred.JobClient: File Input Format Counters
  497. 13/08/26 23:51:58 INFO mapred.JobClient: Bytes Read=12668431
  498. 13/08/26 23:51:58 INFO mapred.JobClient: Map-Reduce Framework
  499. 13/08/26 23:51:58 INFO mapred.JobClient: Map input records=11321
  500. 13/08/26 23:51:58 INFO mapred.JobClient: Physical memory (bytes) snapshot=87547904
  501. 13/08/26 23:51:58 INFO mapred.JobClient: Spilled Records=0
  502. 13/08/26 23:51:58 INFO mapred.JobClient: CPU time spent (ms)=9380
  503. 13/08/26 23:51:58 INFO mapred.JobClient: Total committed heap usage (bytes)=28131328
  504. 13/08/26 23:51:58 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
  505. 13/08/26 23:51:58 INFO mapred.JobClient: Map output records=11321
  506. 13/08/26 23:51:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
  507. 13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
  508. Summary
  509. -------------------------------------------------------
  510. Correctly Classified Instances : 11256 99.4258%
  511. Incorrectly Classified Instances : 65 0.5742%
  512. Total Classified Instances : 11321
  513.  
  514. =======================================================
  515. Confusion Matrix
  516. -------------------------------------------------------
  517. a b c d e f g h i j k l m n o p q r s t <--Classified as
  518. 454 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 | 458 a = alt.atheism
  519. 0 588 0 3 0 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 | 595 b = comp.graphics
  520. 0 3 553 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 563 c = comp.os.ms-windows.misc
  521. 0 0 0 592 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 | 595 d = comp.sys.ibm.pc.hardware
  522. 0 0 0 1 593 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 594 e = comp.sys.mac.hardware
  523. 0 2 0 1 0 576 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 581 f = comp.windows.x
  524. 0 1 0 0 0 0 579 0 0 0 0 0 1 0 0 0 0 0 0 0 | 581 g = misc.forsale
  525. 0 0 0 0 0 0 1 594 0 0 0 0 1 0 0 0 0 0 0 0 | 596 h = rec.autos
  526. 0 0 0 0 0 0 1 2 591 0 0 0 0 0 0 0 0 0 0 0 | 594 i = rec.motorcycles
  527. 0 0 0 0 0 0 0 0 0 615 1 0 0 0 0 0 0 0 0 0 | 616 j = rec.sport.baseball
  528. 0 0 0 0 0 0 1 0 0 1 581 0 0 0 0 0 0 0 0 0 | 583 k = rec.sport.hockey
  529. 0 0 1 0 0 0 0 0 0 0 0 627 1 0 0 0 0 1 0 0 | 630 l = sci.crypt
  530. 0 0 0 2 0 0 1 0 0 0 0 0 588 0 0 0 0 0 0 0 | 591 m = sci.electronics
  531. 0 1 0 0 0 0 0 0 0 0 0 0 0 586 1 0 0 0 0 0 | 588 n = sci.med
  532. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 0 | 615 o = sci.space
  533. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 619 1 0 0 0 | 620 p = soc.religion.christian
  534. 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 541 0 0 0 | 543 q = talk.politics.mideast
  535. 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 560 0 0 | 561 r = talk.politics.guns
  536. 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 1 351 0 | 359 s = talk.religion.misc
  537. 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 0 453 | 458 t = talk.politics.misc
  538.  
  539. 13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
  540. + echo 'Testing on holdout set'
  541. Testing on holdout set
  542. + ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
  543. Warning: $HADOOP_HOME is deprecated.
  544.  
  545. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
  546. MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
  547. Warning: $HADOOP_HOME is deprecated.
  548.  
  549. SLF4J: Class path contains multiple SLF4J bindings.
  550. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  551. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  552. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  553. 13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
  554. 13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
  555. 13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
  556. 13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
  557. 13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
  558. 13/08/26 23:52:12 INFO mapred.JobClient: map 0% reduce 0%
  559. 13/08/26 23:52:30 INFO mapred.JobClient: map 85% reduce 0%
  560. 13/08/26 23:52:36 INFO mapred.JobClient: map 100% reduce 0%
  561. 13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
  562. 13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
  563. 13/08/26 23:52:41 INFO mapred.JobClient: Job Counters
  564. 13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=25113
  565. 13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  566. 13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  567. 13/08/26 23:52:41 INFO mapred.JobClient: Launched map tasks=1
  568. 13/08/26 23:52:41 INFO mapred.JobClient: Data-local map tasks=1
  569. 13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
  570. 13/08/26 23:52:41 INFO mapred.JobClient: File Output Format Counters
  571. 13/08/26 23:52:41 INFO mapred.JobClient: Bytes Written=1417942
  572. 13/08/26 23:52:41 INFO mapred.JobClient: FileSystemCounters
  573. 13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_READ=12148944
  574. 13/08/26 23:52:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22522
  575. 13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1417942
  576. 13/08/26 23:52:41 INFO mapred.JobClient: File Input Format Counters
  577. 13/08/26 23:52:41 INFO mapred.JobClient: Bytes Read=8537480
  578. 13/08/26 23:52:41 INFO mapred.JobClient: Map-Reduce Framework
  579. 13/08/26 23:52:41 INFO mapred.JobClient: Map input records=7525
  580. 13/08/26 23:52:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=85057536
  581. 13/08/26 23:52:41 INFO mapred.JobClient: Spilled Records=0
  582. 13/08/26 23:52:41 INFO mapred.JobClient: CPU time spent (ms)=6630
  583. 13/08/26 23:52:41 INFO mapred.JobClient: Total committed heap usage (bytes)=28155904
  584. 13/08/26 23:52:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
  585. 13/08/26 23:52:41 INFO mapred.JobClient: Map output records=7525
  586. 13/08/26 23:52:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=147
  587. 13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results: =======================================================
  588. Summary
  589. -------------------------------------------------------
  590. Correctly Classified Instances : 6801 90.3787%
  591. Incorrectly Classified Instances : 724 9.6213%
  592. Total Classified Instances : 7525
  593.  
  594. =======================================================
  595. Confusion Matrix
  596. -------------------------------------------------------
  597. a b c d e f g h i j k l m n o p q r s t <--Classified as
  598. 318 0 0 0 1 0 0 0 1 0 0 0 0 0 1 4 0 0 15 1 | 341 a = alt.atheism
  599. 1 318 7 20 4 7 7 2 0 1 0 1 1 2 6 0 0 0 0 1 | 378 b = comp.graphics
  600. 0 25 277 78 12 15 5 0 0 0 0 2 4 0 1 0 0 0 0 3 | 422 c = comp.os.ms-windows.misc
  601. 1 4 3 336 20 3 8 0 0 0 0 1 11 0 0 0 0 0 0 0 | 387 d = comp.sys.ibm.pc.hardware
  602. 0 3 1 6 350 1 3 0 0 0 0 1 3 1 0 0 0 0 0 0 | 369 e = comp.sys.mac.hardware
  603. 1 20 3 6 7 365 3 0 0 0 0 1 0 0 0 0 1 0 0 0 | 407 f = comp.windows.x
  604. 0 1 1 19 8 0 329 13 1 0 0 2 14 0 4 0 0 1 1 0 | 394 g = misc.forsale
  605. 0 2 1 2 3 1 10 361 8 0 0 0 4 0 0 0 0 1 0 1 | 394 h = rec.autos
  606. 0 0 0 1 0 0 2 3 393 1 0 0 0 0 0 0 0 1 0 1 | 402 i = rec.motorcycles
  607. 0 0 0 1 0 0 2 3 0 360 6 0 2 2 1 0 0 0 0 1 | 378 j = rec.sport.baseball
  608. 0 1 0 2 1 0 0 0 2 5 401 0 1 0 0 1 0 0 0 2 | 416 k = rec.sport.hockey
  609. 1 1 0 1 3 2 1 1 0 0 0 344 1 1 2 0 1 1 0 1 | 361 l = sci.crypt
  610. 0 5 0 15 14 0 5 1 1 0 0 2 348 1 1 0 0 0 0 0 | 393 m = sci.electronics
  611. 1 2 1 1 1 0 1 0 0 1 0 1 4 381 5 0 0 1 1 1 | 402 n = sci.med
  612. 1 4 0 0 2 0 2 1 0 0 0 1 2 1 356 0 0 1 0 1 | 372 o = sci.space
  613. 5 0 0 1 1 0 0 1 0 0 1 0 0 1 0 359 3 0 4 1 | 377 p = soc.religion.christian
  614. 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 2 389 0 0 2 | 397 q = talk.politics.mideast
  615. 0 0 1 0 1 1 0 1 0 0 0 2 1 1 0 0 0 335 0 6 | 349 r = talk.politics.guns
  616. 29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc
  617. 2 0 0 0 2 0 0 1 0 1 1 1 0 1 2 0 2 17 3 284 | 317 t = talk.politics.misc
  618.  
  619. 13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)
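The "Summary" block is derived directly from the confusion matrix, so reading the two together is useful. The sketch below uses only the diagonal counts and the row totals copied from the holdout matrix above. From these it recomputes the overall accuracy and the per-class recall. The names `summarize`, `DIAGONAL` and `ROW_TOTALS` are illustrative and do not come from Mahout.

```python
# Recompute the holdout "Summary" numbers from the confusion matrix above.
# DIAGONAL = correctly classified count per class (matrix diagonal),
# ROW_TOTALS = number after "|" on each row; labels a..t follow the matrix legend.
LABELS = "abcdefghijklmnopqrst"
DIAGONAL = [318, 318, 277, 336, 350, 365, 329, 361, 393, 360,
            401, 344, 348, 381, 356, 359, 389, 335, 197, 284]
ROW_TOTALS = [341, 378, 422, 387, 369, 407, 394, 394, 402, 378,
              416, 361, 393, 402, 372, 377, 397, 349, 269, 317]


def summarize(diagonal, row_totals):
    """Return (correct, total, accuracy in %, per-label recall)."""
    correct = sum(diagonal)   # diagonal cells are the correct predictions
    total = sum(row_totals)   # each test document is counted in exactly one row
    accuracy = 100.0 * correct / total
    recall = {label: d / t for label, d, t in zip(LABELS, diagonal, row_totals)}
    return correct, total, accuracy, recall


if __name__ == "__main__":
    correct, total, accuracy, recall = summarize(DIAGONAL, ROW_TOTALS)
    print(f"correct={correct} total={total} accuracy={accuracy:.4f}%")
    worst = min(recall, key=recall.get)
    print(f"lowest recall: {worst} = {recall[worst]:.3f}")
```

Running the script reproduces the log's 6801 / 7525 / 90.3787% figures. It also shows that the weakest class is c (comp.os.ms-windows.misc), with a recall of about 0.656. In that row, 78 of the misclassified documents went to column d (comp.sys.ibm.pc.hardware), a closely related topic.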

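The Hadoop job information page lists every job that was run, as follows:

[Figure: list of all jobs on the Hadoop job page]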
Next, match each job against its corresponding mapper and reducer classes; that is how the algorithm can be analyzed step by step.
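The sketch below is a simplified in-memory version of what the trainnb and testnb steps in the log appear to compute, and can serve as a reading aid for that analysis. It rests on two assumptions, both my interpretation of the logged record counts rather than Mahout's actual code.

- The first trainnb job (job_201308212334_0063: 11321 input records, 20 output records) sums the TF-IDF vectors per label.
- The second job (job_201308212334_0064: 20 input records, 2 output records) totals those weights per label and per feature.

The scoring follows my understanding of Mahout 0.7's standard (non-complementary) naive Bayes, with the `--alphaI=[1.0]` smoothing shown in the log and no class-prior term. `train`, `score`, `classify` and the toy data are hypothetical names.

```python
import math
from collections import defaultdict


def train(docs, alpha_i=1.0):
    """docs: iterable of (label, {feature: weight}) pairs, e.g. TF-IDF vectors."""
    # Step 1 (cf. the first trainnb job): sum all vectors that share a label.
    per_label = defaultdict(lambda: defaultdict(float))
    for label, vector in docs:
        for feature, weight in vector.items():
            per_label[label][feature] += weight
    # Step 2 (cf. the second trainnb job): total weight per label and per feature.
    label_weight = {label: sum(v.values()) for label, v in per_label.items()}
    feature_weight = defaultdict(float)
    for vector in per_label.values():
        for feature, weight in vector.items():
            feature_weight[feature] += weight
    return {"per_label": per_label, "label_weight": label_weight,
            "num_features": len(feature_weight), "alpha_i": alpha_i}


def score(model, label, vector):
    """Smoothed log-likelihood of `vector` under `label` (assumed: no prior term)."""
    counts = model["per_label"][label]
    denominator = model["label_weight"][label] + model["alpha_i"] * model["num_features"]
    return sum(value * math.log((counts.get(feature, 0.0) + model["alpha_i"]) / denominator)
               for feature, value in vector.items())


def classify(model, vector):
    """Return the label with the highest score."""
    return max(model["label_weight"], key=lambda label: score(model, label, vector))


if __name__ == "__main__":
    docs = [("sports", {"ball": 2.0, "goal": 1.0}),
            ("tech", {"cpu": 2.0, "ram": 1.0})]
    model = train(docs)
    print(classify(model, {"ball": 1.0}))  # sports
```

The dict-of-dicts layout is used here only for readability. As far as I know, Mahout stores the per-label sums as a matrix indexed by integer label and feature ids, and it works on sparse `Vector` objects.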

Share, enjoy, grow.

Please credit the source when reposting: http://blog.csdn.net/fansy1990
