According to the Mahout wiki (https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups), running this algorithm should take just a single command:

mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh 

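But my very first run failed. I am not running as root, so I changed the work path first: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, which gives the user mahout permission to work there. It still failed, this time complaining that the curl command could not be found. Fine, I had not installed it: sudo apt-get install curl, and done. Ubuntu really is convenient.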
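The two fixes above can be scripted. This is only a rough sketch: the function names relocate_work_dir and require_cmd are made up for illustration, and it assumes the script contains the literal prefix /tmp/mahout-work- as described above.

```shell
#!/usr/bin/env bash
# Sketch: point the example script's work directory at a writable location.
# Assumption: the script uses the literal prefix /tmp/mahout-work-;
# adjust the pattern if your copy differs.
relocate_work_dir() {
  local script="$1" new_prefix="$2"
  # Keep a backup (.bak) so the original script can be restored.
  sed -i.bak "s|/tmp/mahout-work-|${new_prefix}|g" "$script"
}

# Report a missing command up front instead of failing halfway through a run.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing: $1 (on Ubuntu: sudo apt-get install $1)" >&2
    return 1
  }
}

# Usage (paths as in this post):
#   relocate_work_dir ./classify-20newsgroups.sh /home/mahout/mahout-work-
#   require_cmd curl
```

Checking for curl before starting saves a failed run, since the script only needs it once it reaches the download step.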

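Then I ran it again, and it failed about two thirds of the way through. Looking at the detailed output, the number of map input records was 0. What does that mean? Most likely local file operations and HDFS file operations had been mixed up. Before this step runs:

+ ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq

the local 20news-all directory should be uploaded to HDFS; after that, rerunning the first command works. The full output follows (it is long, so I am not sure all of it will fit):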
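For reference, the upload could look roughly like the sketch below. It assumes the hadoop command is on the PATH and HDFS is running; HADOOP_CMD and upload_to_hdfs are illustrative names of my own, not part of the Mahout script.

```shell
#!/usr/bin/env bash
# Sketch: copy the locally prepared corpus into HDFS under the same path,
# so the MapReduce steps that read from HDFS can find it.
# Assumption: `hadoop` is on PATH; HADOOP_CMD can be overridden for testing.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

upload_to_hdfs() {
  local src="$1" dst_parent
  dst_parent="$(dirname "$src")"
  # On Hadoop 1.x, -mkdir also creates missing parents; newer releases need -p.
  # Errors are ignored here because the directory may already exist.
  $HADOOP_CMD fs -mkdir "$dst_parent" || true
  # Copies the local directory into the HDFS parent directory.
  $HADOOP_CMD fs -put "$src" "$dst_parent"
  # Listing the target is a quick check that the files actually arrived.
  $HADOOP_CMD fs -ls "$src"
}

# Usage (path as in this post):
#   upload_to_hdfs /home/mahout/mahout-work-mahout/20news-all
```

If the listing comes back empty or the path is missing, the later jobs would again see zero map input records.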

mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
Enter your choice : 2
ok. You chose 2 and we'll use naivebayes
creating work directory at /home/mahout/mahout-work-mahout
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /home/mahout/mahout-work-mahout/20news-all
+ mkdir /home/mahout/mahout-work-mahout/20news-all
+ cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware 
/home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
+ echo 'Converting sequence files to vectors'
Converting sequence files to vectors
+ ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056
13/08/26 23:43:18 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:43:45 INFO mapred.JobClient: map 78% reduce 0%
13/08/26 23:43:51 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056
13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19
13/08/26 23:43:56 INFO mapred.JobClient: Job Counters
13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32883
13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:43:56 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:43:56 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:43:56 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:43:56 INFO mapred.JobClient: Bytes Written=27503580
13/08/26 23:43:56 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_READ=36694022
13/08/26 23:43:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21899
13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27503580
13/08/26 23:43:56 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:43:56 INFO mapred.JobClient: Bytes Read=36693889
13/08/26 23:43:56 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:43:56 INFO mapred.JobClient: Map input records=18846
13/08/26 23:43:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=75157504
13/08/26 23:43:56 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:43:56 INFO mapred.JobClient: CPU time spent (ms)=5730
13/08/26 23:43:56 INFO mapred.JobClient: Total committed heap usage (bytes)=15859712
13/08/26 23:43:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=974381056
13/08/26 23:43:56 INFO mapred.JobClient: Map output records=18846
13/08/26 23:43:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=133
13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057
13/08/26 23:43:57 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:44:15 INFO mapred.JobClient: map 3% reduce 0%
13/08/26 23:44:18 INFO mapred.JobClient: map 23% reduce 0%
13/08/26 23:44:21 INFO mapred.JobClient: map 60% reduce 0%
13/08/26 23:44:24 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:44:48 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057
13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29
13/08/26 23:44:53 INFO mapred.JobClient: Job Counters
13/08/26 23:44:53 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31312
13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:44:53 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:44:53 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18422
13/08/26 23:44:53 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:44:53 INFO mapred.JobClient: Bytes Written=2315037
13/08/26 23:44:53 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_READ=11857906
13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_READ=27503742
13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15440401
13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037
13/08/26 23:44:53 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:44:53 INFO mapred.JobClient: Bytes Read=27503580
13/08/26 23:44:53 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:44:53 INFO mapred.JobClient: Map output materialized bytes=3538084
13/08/26 23:44:53 INFO mapred.JobClient: Map input records=18846
13/08/26 23:44:53 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:44:53 INFO mapred.JobClient: Spilled Records=849345
13/08/26 23:44:53 INFO mapred.JobClient: Map output bytes=39462740
13/08/26 23:44:53 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
13/08/26 23:44:53 INFO mapred.JobClient: CPU time spent (ms)=14080
13/08/26 23:44:53 INFO mapred.JobClient: Combine input records=3026242
13/08/26 23:44:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
13/08/26 23:44:53 INFO mapred.JobClient: Reduce input records=192904
13/08/26 23:44:53 INFO mapred.JobClient: Reduce input groups=192904
13/08/26 23:44:53 INFO mapred.JobClient: Combine output records=554873
13/08/26 23:44:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=283111424
13/08/26 23:44:53 INFO mapred.JobClient: Reduce output records=93563
13/08/26 23:44:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:44:53 INFO mapred.JobClient: Map output records=2664273
13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058
13/08/26 23:44:56 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:45:13 INFO mapred.JobClient: map 94% reduce 0%
13/08/26 23:45:16 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:45:43 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058
13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29
13/08/26 23:45:48 INFO mapred.JobClient: Job Counters
13/08/26 23:45:48 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21298
13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:45:48 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:45:48 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24763
13/08/26 23:45:48 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:45:48 INFO mapred.JobClient: Bytes Written=29314118
13/08/26 23:45:48 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_READ=27274291
13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_READ=29440826
13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54595105
13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
13/08/26 23:45:48 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:45:48 INFO mapred.JobClient: Bytes Read=27503580
13/08/26 23:45:48 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:45:48 INFO mapred.JobClient: Map output materialized bytes=27274291
13/08/26 23:45:48 INFO mapred.JobClient: Map input records=18846
13/08/26 23:45:48 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:45:48 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:45:48 INFO mapred.JobClient: Map output bytes=27199343
13/08/26 23:45:48 INFO mapred.JobClient: Total committed heap usage (bytes)=215695360
13/08/26 23:45:48 INFO mapred.JobClient: CPU time spent (ms)=12980
13/08/26 23:45:48 INFO mapred.JobClient: Combine input records=0
13/08/26 23:45:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
13/08/26 23:45:48 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:45:48 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:45:48 INFO mapred.JobClient: Combine output records=0
13/08/26 23:45:48 INFO mapred.JobClient: Physical memory (bytes) snapshot=332349440
13/08/26 23:45:48 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:45:48 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:45:48 INFO mapred.JobClient: Map output records=18846
13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059
13/08/26 23:45:50 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:46:10 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:46:25 INFO mapred.JobClient: map 100% reduce 92%
13/08/26 23:46:31 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059
13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29
13/08/26 23:46:36 INFO mapred.JobClient: Job Counters
13/08/26 23:46:36 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18217
13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:46:36 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:46:36 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20981
13/08/26 23:46:36 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:46:36 INFO mapred.JobClient: Bytes Written=29314118
13/08/26 23:46:36 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_READ=29059398
13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_READ=29314278
13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58163419
13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
13/08/26 23:46:36 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:46:36 INFO mapred.JobClient: Bytes Read=29314118
13/08/26 23:46:36 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:46:36 INFO mapred.JobClient: Map output materialized bytes=29059398
13/08/26 23:46:36 INFO mapred.JobClient: Map input records=18846
13/08/26 23:46:36 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:46:36 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:46:36 INFO mapred.JobClient: Map output bytes=28984080
13/08/26 23:46:36 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
13/08/26 23:46:36 INFO mapred.JobClient: CPU time spent (ms)=8650
13/08/26 23:46:36 INFO mapred.JobClient: Combine input records=0
13/08/26 23:46:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
13/08/26 23:46:37 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:46:37 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:46:37 INFO mapred.JobClient: Combine output records=0
13/08/26 23:46:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=313606144
13/08/26 23:46:37 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:46:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:46:37 INFO mapred.JobClient: Map output records=18846
13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060
13/08/26 23:46:38 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:46:56 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:47:14 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060
13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29
13/08/26 23:47:19 INFO mapred.JobClient: Job Counters
13/08/26 23:47:19 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21504
13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:47:19 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:47:19 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14273
13/08/26 23:47:19 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:47:19 INFO mapred.JobClient: Bytes Written=1890073
13/08/26 23:47:19 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_READ=4880788
13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_READ=29314271
13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6235019
13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073
13/08/26 23:47:19 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:47:19 INFO mapred.JobClient: Bytes Read=29314118
13/08/26 23:47:19 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:47:19 INFO mapred.JobClient: Map output materialized bytes=1309902
13/08/26 23:47:19 INFO mapred.JobClient: Map input records=18846
13/08/26 23:47:19 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:47:19 INFO mapred.JobClient: Spilled Records=442187
13/08/26 23:47:19 INFO mapred.JobClient: Map output bytes=31005336
13/08/26 23:47:19 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
13/08/26 23:47:19 INFO mapred.JobClient: CPU time spent (ms)=9210
13/08/26 23:47:19 INFO mapred.JobClient: Combine input records=2838837
13/08/26 23:47:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
13/08/26 23:47:19 INFO mapred.JobClient: Reduce input records=93564
13/08/26 23:47:19 INFO mapred.JobClient: Reduce input groups=93564
13/08/26 23:47:19 INFO mapred.JobClient: Combine output records=348623
13/08/26 23:47:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=284684288
13/08/26 23:47:19 INFO mapred.JobClient: Reduce output records=93564
13/08/26 23:47:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:47:19 INFO mapred.JobClient: Map output records=2583778
13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061
13/08/26 23:47:20 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:47:38 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:47:53 INFO mapred.JobClient: map 100% reduce 67%
13/08/26 23:47:59 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061
13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29
13/08/26 23:48:04 INFO mapred.JobClient: Job Counters
13/08/26 23:48:04 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18292
13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:48:04 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:48:04 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19293
13/08/26 23:48:04 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:48:04 INFO mapred.JobClient: Bytes Written=28689283
13/08/26 23:48:04 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_READ=29059398
13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_READ=31204324
13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58165045
13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
13/08/26 23:48:04 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:48:04 INFO mapred.JobClient: Bytes Read=29314118
13/08/26 23:48:04 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:48:04 INFO mapred.JobClient: Map output materialized bytes=29059398
13/08/26 23:48:04 INFO mapred.JobClient: Map input records=18846
13/08/26 23:48:04 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:48:04 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:48:04 INFO mapred.JobClient: Map output bytes=28984080
13/08/26 23:48:04 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
13/08/26 23:48:04 INFO mapred.JobClient: CPU time spent (ms)=8770
13/08/26 23:48:04 INFO mapred.JobClient: Combine input records=0
13/08/26 23:48:04 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
13/08/26 23:48:04 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:48:04 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:48:04 INFO mapred.JobClient: Combine output records=0
13/08/26 23:48:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=320401408
13/08/26 23:48:04 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:48:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:48:04 INFO mapred.JobClient: Map output records=18846
13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062
13/08/26 23:48:06 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:48:24 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:48:36 INFO mapred.JobClient: map 100% reduce 33%
13/08/26 23:48:39 INFO mapred.JobClient: map 100% reduce 86%
13/08/26 23:48:48 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:48:53 INFO mapred.JobClient: Job complete: job_201308212334_0062
13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29
13/08/26 23:48:53 INFO mapred.JobClient: Job Counters
13/08/26 23:48:53 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18225
13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:48:53 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:48:53 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21045
13/08/26 23:48:53 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:48:53 INFO mapred.JobClient: Bytes Written=28689283
13/08/26 23:48:53 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_READ=28437750
13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_READ=28689443
13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56920127
13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
13/08/26 23:48:53 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:48:53 INFO mapred.JobClient: Bytes Read=28689283
13/08/26 23:48:53 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:48:53 INFO mapred.JobClient: Map output materialized bytes=28437750
13/08/26 23:48:53 INFO mapred.JobClient: Map input records=18846
13/08/26 23:48:53 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:48:53 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:48:53 INFO mapred.JobClient: Map output bytes=28362505
13/08/26 23:48:53 INFO mapred.JobClient: Total committed heap usage (bytes)=204603392
13/08/26 23:48:53 INFO mapred.JobClient: CPU time spent (ms)=8340
13/08/26 23:48:53 INFO mapred.JobClient: Combine input records=0
13/08/26 23:48:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
13/08/26 23:48:53 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:48:53 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:48:53 INFO mapred.JobClient: Combine output records=0
13/08/26 23:48:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=313868288
13/08/26 23:48:53 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:48:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:48:53 INFO mapred.JobClient: Map output records=18846
13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035)
+ echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
Creating training and holdout set with a random 80-20 split of the generated vector dataset
+ ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]}
13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines
13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0
13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631)
+ echo 'Training Naive Bayes model'
Training Naive Bayes model
+ ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
13/08/26 23:49:27 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:49:49 INFO mapred.JobClient: map 43% reduce 0%
13/08/26 23:49:52 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:50:13 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
13/08/26 23:50:18 INFO mapred.JobClient: Job Counters
13/08/26 23:50:18 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22816
13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:50:18 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20680
13/08/26 23:50:18 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:50:18 INFO mapred.JobClient: Bytes Written=2718605
13/08/26 23:50:18 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_READ=1404371
13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_READ=12669237
13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2854477
13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2718605
13/08/26 23:50:18 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:50:18 INFO mapred.JobClient: Bytes Read=12668431
13/08/26 23:50:18 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:50:18 INFO mapred.JobClient: Map output materialized bytes=1404363
13/08/26 23:50:18 INFO mapred.JobClient: Map input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: Reduce shuffle bytes=1404363
13/08/26 23:50:18 INFO mapred.JobClient: Spilled Records=40
13/08/26 23:50:18 INFO mapred.JobClient: Map output bytes=16682576
13/08/26 23:50:18 INFO mapred.JobClient: Total committed heap usage (bytes)=176164864
13/08/26 23:50:18 INFO mapred.JobClient: CPU time spent (ms)=8190
13/08/26 23:50:18 INFO mapred.JobClient: Combine input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
13/08/26 23:50:18 INFO mapred.JobClient: Reduce input records=20
13/08/26 23:50:18 INFO mapred.JobClient: Reduce input groups=20
13/08/26 23:50:18 INFO mapred.JobClient: Combine output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=294400000
13/08/26 23:50:18 INFO mapred.JobClient: Reduce output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
13/08/26 23:50:18 INFO mapred.JobClient: Map output records=11321
13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
13/08/26 23:50:19 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:50:40 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:51:01 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
13/08/26 23:51:06 INFO mapred.JobClient: Job Counters
13/08/26 23:51:06 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24609
13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:51:06 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=15258
13/08/26 23:51:06 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:51:06 INFO mapred.JobClient: Bytes Written=893560
13/08/26 23:51:06 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_READ=362674
13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_READ=2718737
13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=771195
13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=893560
13/08/26 23:51:06 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:51:06 INFO mapred.JobClient: Bytes Read=2718605
13/08/26 23:51:06 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:51:06 INFO mapred.JobClient: Map output materialized bytes=362666
13/08/26 23:51:06 INFO mapred.JobClient: Map input records=20
13/08/26 23:51:06 INFO mapred.JobClient: Reduce shuffle bytes=362666
13/08/26 23:51:06 INFO mapred.JobClient: Spilled Records=4
13/08/26 23:51:06 INFO mapred.JobClient: Map output bytes=893434
13/08/26 23:51:06 INFO mapred.JobClient: Total committed heap usage (bytes)=223264768
13/08/26 23:51:06 INFO mapred.JobClient: CPU time spent (ms)=5370
13/08/26 23:51:06 INFO mapred.JobClient: Combine input records=2
13/08/26 23:51:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=132
13/08/26 23:51:06 INFO mapred.JobClient: Reduce input records=2
13/08/26 23:51:06 INFO mapred.JobClient: Reduce input groups=2
13/08/26 23:51:06 INFO mapred.JobClient: Combine output records=2
13/08/26 23:51:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=300597248
13/08/26 23:51:06 INFO mapred.JobClient: Reduce output records=2
13/08/26 23:51:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
13/08/26 23:51:06 INFO mapred.JobClient: Map output records=2
13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
+ echo 'Self testing on training set'
Self testing on training set
+ ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
13/08/26 23:51:22 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:51:45 INFO mapred.JobClient: map 51% reduce 0%
13/08/26 23:51:48 INFO mapred.JobClient: map 89% reduce 0%
13/08/26 23:51:54 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
13/08/26 23:51:58 INFO mapred.JobClient: Job Counters
13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34216
13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:51:58 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:51:58 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:51:58 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:51:58 INFO mapred.JobClient: Bytes Written=2132486
13/08/26 23:51:58 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_READ=16279896
13/08/26 23:51:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22523
13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2132486
13/08/26 23:51:58 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:51:58 INFO mapred.JobClient: Bytes Read=12668431
13/08/26 23:51:58 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:51:58 INFO mapred.JobClient: Map input records=11321
13/08/26 23:51:58 INFO mapred.JobClient: Physical memory (bytes) snapshot=87547904
13/08/26 23:51:58 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:51:58 INFO mapred.JobClient: CPU time spent (ms)=9380
13/08/26 23:51:58 INFO mapred.JobClient: Total committed heap usage (bytes)=28131328
13/08/26 23:51:58 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
13/08/26 23:51:58 INFO mapred.JobClient: Map output records=11321
13/08/26 23:51:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11256 99.4258%
Incorrectly Classified Instances : 65 0.5742%
Total Classified Instances : 11321
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
454 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 | 458 a = alt.atheism
0 588 0 3 0 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 | 595 b = comp.graphics
0 3 553 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 563 c = comp.os.ms-windows.misc
0 0 0 592 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 | 595 d = comp.sys.ibm.pc.hardware
0 0 0 1 593 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 594 e = comp.sys.mac.hardware
0 2 0 1 0 576 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 581 f = comp.windows.x
0 1 0 0 0 0 579 0 0 0 0 0 1 0 0 0 0 0 0 0 | 581 g = misc.forsale
0 0 0 0 0 0 1 594 0 0 0 0 1 0 0 0 0 0 0 0 | 596 h = rec.autos
0 0 0 0 0 0 1 2 591 0 0 0 0 0 0 0 0 0 0 0 | 594 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 615 1 0 0 0 0 0 0 0 0 0 | 616 j = rec.sport.baseball
0 0 0 0 0 0 1 0 0 1 581 0 0 0 0 0 0 0 0 0 | 583 k = rec.sport.hockey
0 0 1 0 0 0 0 0 0 0 0 627 1 0 0 0 0 1 0 0 | 630 l = sci.crypt
0 0 0 2 0 0 1 0 0 0 0 0 588 0 0 0 0 0 0 0 | 591 m = sci.electronics
0 1 0 0 0 0 0 0 0 0 0 0 0 586 1 0 0 0 0 0 | 588 n = sci.med
0 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 0 | 615 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 619 1 0 0 0 | 620 p = soc.religion.christian
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 541 0 0 0 | 543 q = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 560 0 0 | 561 r = talk.politics.guns
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 1 351 0 | 359 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 0 453 | 458 t = talk.politics.misc
13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
13/08/26 23:52:12 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:52:30 INFO mapred.JobClient: map 85% reduce 0%
13/08/26 23:52:36 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
13/08/26 23:52:41 INFO mapred.JobClient: Job Counters
13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=25113
13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:52:41 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:52:41 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:52:41 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:52:41 INFO mapred.JobClient: Bytes Written=1417942
13/08/26 23:52:41 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_READ=12148944
13/08/26 23:52:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22522
13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1417942
13/08/26 23:52:41 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:52:41 INFO mapred.JobClient: Bytes Read=8537480
13/08/26 23:52:41 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:52:41 INFO mapred.JobClient: Map input records=7525
13/08/26 23:52:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=85057536
13/08/26 23:52:41 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:52:41 INFO mapred.JobClient: CPU time spent (ms)=6630
13/08/26 23:52:41 INFO mapred.JobClient: Total committed heap usage (bytes)=28155904
13/08/26 23:52:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
13/08/26 23:52:41 INFO mapred.JobClient: Map output records=7525
13/08/26 23:52:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=147
13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6801 90.3787%
Incorrectly Classified Instances : 724 9.6213%
Total Classified Instances : 7525
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
318 0 0 0 1 0 0 0 1 0 0 0 0 0 1 4 0 0 15 1 | 341 a = alt.atheism
1 318 7 20 4 7 7 2 0 1 0 1 1 2 6 0 0 0 0 1 | 378 b = comp.graphics
0 25 277 78 12 15 5 0 0 0 0 2 4 0 1 0 0 0 0 3 | 422 c = comp.os.ms-windows.misc
1 4 3 336 20 3 8 0 0 0 0 1 11 0 0 0 0 0 0 0 | 387 d = comp.sys.ibm.pc.hardware
0 3 1 6 350 1 3 0 0 0 0 1 3 1 0 0 0 0 0 0 | 369 e = comp.sys.mac.hardware
1 20 3 6 7 365 3 0 0 0 0 1 0 0 0 0 1 0 0 0 | 407 f = comp.windows.x
0 1 1 19 8 0 329 13 1 0 0 2 14 0 4 0 0 1 1 0 | 394 g = misc.forsale
0 2 1 2 3 1 10 361 8 0 0 0 4 0 0 0 0 1 0 1 | 394 h = rec.autos
0 0 0 1 0 0 2 3 393 1 0 0 0 0 0 0 0 1 0 1 | 402 i = rec.motorcycles
0 0 0 1 0 0 2 3 0 360 6 0 2 2 1 0 0 0 0 1 | 378 j = rec.sport.baseball
0 1 0 2 1 0 0 0 2 5 401 0 1 0 0 1 0 0 0 2 | 416 k = rec.sport.hockey
1 1 0 1 3 2 1 1 0 0 0 344 1 1 2 0 1 1 0 1 | 361 l = sci.crypt
0 5 0 15 14 0 5 1 1 0 0 2 348 1 1 0 0 0 0 0 | 393 m = sci.electronics
1 2 1 1 1 0 1 0 0 1 0 1 4 381 5 0 0 1 1 1 | 402 n = sci.med
1 4 0 0 2 0 2 1 0 0 0 1 2 1 356 0 0 1 0 1 | 372 o = sci.space
5 0 0 1 1 0 0 1 0 0 1 0 0 1 0 359 3 0 4 1 | 377 p = soc.religion.christian
0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 2 389 0 0 2 | 397 q = talk.politics.mideast
0 0 1 0 1 1 0 1 0 0 0 2 1 1 0 0 0 335 0 6 | 349 r = talk.politics.guns
29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc
2 0 0 0 2 0 0 1 0 1 1 1 0 1 2 0 2 17 3 284 | 317 t = talk.politics.misc
13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)
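
The percentages in the two summaries can be recomputed from the raw counts, and a single confusion-matrix row gives the per-class recall. The following is only a minimal sketch: the helper names `accuracy` and `row_recall` are made up for illustration, and the row text is copied from the holdout matrix above (row `s`, talk.religion.misc, whose diagonal entry sits at zero-based column 18).

```python
def accuracy(correct, total):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


def row_recall(row_text, class_index):
    """Recall (%) for one confusion-matrix row as printed in the log.

    Everything before the '|' is the list of per-class counts; the entry at
    class_index is the diagonal (correctly classified) count for that row.
    """
    counts = [int(tok) for tok in row_text.split("|")[0].split()]
    return 100.0 * counts[class_index] / sum(counts)


# Counts taken from the two "Summary" blocks in the log.
train_acc = accuracy(11256, 11321)    # self test on the training set
holdout_acc = accuracy(6801, 7525)    # test on the holdout set

# Row "s" of the holdout confusion matrix, copied from the log.
row_s = "29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc"
recall_s = row_recall(row_s, 18)

print(f"training set: {train_acc:.4f}%  holdout set: {holdout_acc:.4f}%")
print(f"talk.religion.misc recall on holdout: {recall_s:.2f}%")
```

The gap between the two accuracy figures is expected, since the first run scores the model on the same documents it was trained on.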

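The job information page lists every job that this run launched.

Going through each job's information and reading the corresponding mapper and reducer is then enough to analyze how the algorithm works.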
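
To line up each job with its counters without clicking through the web page, the console output itself can be scanned. This is a rough sketch that assumes the output was saved as text; the function name `summarize_jobs` is hypothetical, and the two regular expressions are based only on the Hadoop 1.0.4 `JobClient` line format visible in the log above.

```python
import re
from collections import OrderedDict

# "Running job: job_201308212334_0066" marks the start of a job in the log.
JOB_START = re.compile(r"Running job: (job_\d+_\d+)")
# Counter lines look like "INFO mapred.JobClient: Map input records=7525".
COUNTER = re.compile(r"INFO mapred\.JobClient:\s+([^=]+)=(\d+)\s*$")


def summarize_jobs(log_lines):
    """Group JobClient counters under the job id that precedes them."""
    jobs = OrderedDict()
    current = None
    for line in log_lines:
        start = JOB_START.search(line)
        if start:
            current = start.group(1)
            jobs[current] = {}
            continue
        counter = COUNTER.search(line)
        if counter and current is not None:
            jobs[current][counter.group(1).strip()] = int(counter.group(2))
    return jobs


# A few lines copied from the holdout test job in the log above.
sample = [
    "13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066",
    "13/08/26 23:52:12 INFO mapred.JobClient: map 0% reduce 0%",
    "13/08/26 23:52:41 INFO mapred.JobClient: Launched map tasks=1",
    "13/08/26 23:52:41 INFO mapred.JobClient: Map input records=7525",
    "13/08/26 23:52:41 INFO mapred.JobClient: Map output records=7525",
]

for job_id, counters in summarize_jobs(sample).items():
    print(job_id, counters)
```

A job whose `Map input records` comes out as 0 is the symptom described at the start of this post, where the input had not been uploaded to HDFS.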

Share, enjoy, grow.

Please credit the source when reposting: http://blog.csdn.net/fansy1990
