According to the Mahout wiki page at https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups , a single command is all it takes to run this algorithm:

mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh 

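However, my first run failed. Because I am not running as root, I started by changing the work path: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, so that the user mahout has permission to operate on the work directory. It still failed, this time reporting that the curl command could not be found. Fair enough, I had never installed it: sudo apt-get install curl, and done. Ubuntu really is convenient.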
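The two fixes above can be sketched as shell commands. This is only an illustration under stated assumptions: the WORK_DIR line written into the scratch file is my guess at what the script contains, and the substitution is shown on a temporary copy rather than on the real classify-20newsgroups.sh.

```shell
# Work on a scratch file so the real script is not touched.
script=$(mktemp)
# Assumed form of the work-directory line in classify-20newsgroups.sh.
printf 'WORK_DIR=/tmp/mahout-work-${USER}\n' > "$script"

# Replace the /tmp prefix with a directory the user mahout can write to.
sed 's|/tmp/mahout-work-|/home/mahout/mahout-work-|g' "$script" > "$script.new"
cat "$script.new"

# Check whether curl is available before running the example.
# If it is missing, install it with: sudo apt-get install curl
if command -v curl >/dev/null 2>&1; then
  curl_status="found"
else
  curl_status="missing"
fi
echo "curl: $curl_status"
```

To apply the same substitution to the real script, run the sed command against classify-20newsgroups.sh and replace the original with the result, after making a backup.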

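I ran it again, and it failed about two-thirds of the way through. Looking at the detailed output, I saw that the number of map input records was 0. What does that mean? Most likely, local file operations and HDFS file operations had been mixed up. Consider the step that executes:

+ ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq

Before this step, the local 20news-all directory should first be uploaded to HDFS. After that, rerunning the first command works. The full output follows (it is long, so I am not sure all of it can be pasted):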
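The upload described above might look like the sketch below. It uses the standard `hadoop fs -mkdir`, `-put`, and `-ls` subcommands. The choice of the same path on HDFS as on the local disk is an assumption based on the seqdirectory arguments, and the `run` and `upload_to_hdfs` helpers are hypothetical names introduced only for this example. The sketch also assumes the HDFS directories do not exist yet.

```shell
# Paths taken from the seqdirectory command above.
LOCAL_DIR=/home/mahout/mahout-work-mahout/20news-all
HDFS_PARENT=/home/mahout/mahout-work-mahout   # assumed: same path on HDFS

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create the parent directory, upload, then list the result.
upload_to_hdfs() {
  run hadoop fs -mkdir "$HDFS_PARENT"
  run hadoop fs -put "$LOCAL_DIR" "$HDFS_PARENT/20news-all"
  run hadoop fs -ls "$HDFS_PARENT/20news-all"
}

upload_to_hdfs
```

With the default dry-run setting the block prints the three hadoop commands without touching the cluster, which makes it easy to check the paths before running them for real.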

mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
Enter your choice : 2
ok. You chose 2 and we'll use naivebayes
creating work directory at /home/mahout/mahout-work-mahout
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /home/mahout/mahout-work-mahout/20news-all
+ mkdir /home/mahout/mahout-work-mahout/20news-all
+ cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated. SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
+ echo 'Converting sequence files to vectors'
Converting sequence files to vectors
+ ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated. SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056
13/08/26 23:43:18 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:43:45 INFO mapred.JobClient: map 78% reduce 0%
13/08/26 23:43:51 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056
13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19
13/08/26 23:43:56 INFO mapred.JobClient: Job Counters
13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32883
13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:43:56 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:43:56 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:43:56 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:43:56 INFO mapred.JobClient: Bytes Written=27503580
13/08/26 23:43:56 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_READ=36694022
13/08/26 23:43:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21899
13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27503580
13/08/26 23:43:56 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:43:56 INFO mapred.JobClient: Bytes Read=36693889
13/08/26 23:43:56 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:43:56 INFO mapred.JobClient: Map input records=18846
13/08/26 23:43:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=75157504
13/08/26 23:43:56 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:43:56 INFO mapred.JobClient: CPU time spent (ms)=5730
13/08/26 23:43:56 INFO mapred.JobClient: Total committed heap usage (bytes)=15859712
13/08/26 23:43:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=974381056
13/08/26 23:43:56 INFO mapred.JobClient: Map output records=18846
13/08/26 23:43:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=133
13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057
13/08/26 23:43:57 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:44:15 INFO mapred.JobClient: map 3% reduce 0%
13/08/26 23:44:18 INFO mapred.JobClient: map 23% reduce 0%
13/08/26 23:44:21 INFO mapred.JobClient: map 60% reduce 0%
13/08/26 23:44:24 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:44:48 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057
13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29
13/08/26 23:44:53 INFO mapred.JobClient: Job Counters
13/08/26 23:44:53 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31312
13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:44:53 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:44:53 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18422
13/08/26 23:44:53 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:44:53 INFO mapred.JobClient: Bytes Written=2315037
13/08/26 23:44:53 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_READ=11857906
13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_READ=27503742
13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15440401
13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037
13/08/26 23:44:53 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:44:53 INFO mapred.JobClient: Bytes Read=27503580
13/08/26 23:44:53 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:44:53 INFO mapred.JobClient: Map output materialized bytes=3538084
13/08/26 23:44:53 INFO mapred.JobClient: Map input records=18846
13/08/26 23:44:53 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:44:53 INFO mapred.JobClient: Spilled Records=849345
13/08/26 23:44:53 INFO mapred.JobClient: Map output bytes=39462740
13/08/26 23:44:53 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
13/08/26 23:44:53 INFO mapred.JobClient: CPU time spent (ms)=14080
13/08/26 23:44:53 INFO mapred.JobClient: Combine input records=3026242
13/08/26 23:44:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
13/08/26 23:44:53 INFO mapred.JobClient: Reduce input records=192904
13/08/26 23:44:53 INFO mapred.JobClient: Reduce input groups=192904
13/08/26 23:44:53 INFO mapred.JobClient: Combine output records=554873
13/08/26 23:44:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=283111424
13/08/26 23:44:53 INFO mapred.JobClient: Reduce output records=93563
13/08/26 23:44:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:44:53 INFO mapred.JobClient: Map output records=2664273
13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058
13/08/26 23:44:56 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:45:13 INFO mapred.JobClient: map 94% reduce 0%
13/08/26 23:45:16 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:45:43 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058
13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29
13/08/26 23:45:48 INFO mapred.JobClient: Job Counters
13/08/26 23:45:48 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21298
13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:45:48 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:45:48 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24763
13/08/26 23:45:48 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:45:48 INFO mapred.JobClient: Bytes Written=29314118
13/08/26 23:45:48 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_READ=27274291
13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_READ=29440826
13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54595105
13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
13/08/26 23:45:48 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:45:48 INFO mapred.JobClient: Bytes Read=27503580
13/08/26 23:45:48 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:45:48 INFO mapred.JobClient: Map output materialized bytes=27274291
13/08/26 23:45:48 INFO mapred.JobClient: Map input records=18846
13/08/26 23:45:48 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:45:48 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:45:48 INFO mapred.JobClient: Map output bytes=27199343
13/08/26 23:45:48 INFO mapred.JobClient: Total committed heap usage (bytes)=215695360
13/08/26 23:45:48 INFO mapred.JobClient: CPU time spent (ms)=12980
13/08/26 23:45:48 INFO mapred.JobClient: Combine input records=0
13/08/26 23:45:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=162
13/08/26 23:45:48 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:45:48 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:45:48 INFO mapred.JobClient: Combine output records=0
13/08/26 23:45:48 INFO mapred.JobClient: Physical memory (bytes) snapshot=332349440
13/08/26 23:45:48 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:45:48 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:45:48 INFO mapred.JobClient: Map output records=18846
13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059
13/08/26 23:45:50 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:46:10 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:46:25 INFO mapred.JobClient: map 100% reduce 92%
13/08/26 23:46:31 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059
13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29
13/08/26 23:46:36 INFO mapred.JobClient: Job Counters
13/08/26 23:46:36 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18217
13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:46:36 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:46:36 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20981
13/08/26 23:46:36 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:46:36 INFO mapred.JobClient: Bytes Written=29314118
13/08/26 23:46:36 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_READ=29059398
13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_READ=29314278
13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58163419
13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118
13/08/26 23:46:36 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:46:36 INFO mapred.JobClient: Bytes Read=29314118
13/08/26 23:46:36 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:46:36 INFO mapred.JobClient: Map output materialized bytes=29059398
13/08/26 23:46:36 INFO mapred.JobClient: Map input records=18846
13/08/26 23:46:36 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:46:36 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:46:36 INFO mapred.JobClient: Map output bytes=28984080
13/08/26 23:46:36 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
13/08/26 23:46:36 INFO mapred.JobClient: CPU time spent (ms)=8650
13/08/26 23:46:36 INFO mapred.JobClient: Combine input records=0
13/08/26 23:46:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
13/08/26 23:46:37 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:46:37 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:46:37 INFO mapred.JobClient: Combine output records=0
13/08/26 23:46:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=313606144
13/08/26 23:46:37 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:46:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:46:37 INFO mapred.JobClient: Map output records=18846
13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060
13/08/26 23:46:38 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:46:56 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:47:14 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060
13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29
13/08/26 23:47:19 INFO mapred.JobClient: Job Counters
13/08/26 23:47:19 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21504
13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:47:19 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:47:19 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14273
13/08/26 23:47:19 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:47:19 INFO mapred.JobClient: Bytes Written=1890073
13/08/26 23:47:19 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_READ=4880788
13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_READ=29314271
13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6235019
13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073
13/08/26 23:47:19 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:47:19 INFO mapred.JobClient: Bytes Read=29314118
13/08/26 23:47:19 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:47:19 INFO mapred.JobClient: Map output materialized bytes=1309902
13/08/26 23:47:19 INFO mapred.JobClient: Map input records=18846
13/08/26 23:47:19 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:47:19 INFO mapred.JobClient: Spilled Records=442187
13/08/26 23:47:19 INFO mapred.JobClient: Map output bytes=31005336
13/08/26 23:47:19 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
13/08/26 23:47:19 INFO mapred.JobClient: CPU time spent (ms)=9210
13/08/26 23:47:19 INFO mapred.JobClient: Combine input records=2838837
13/08/26 23:47:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
13/08/26 23:47:19 INFO mapred.JobClient: Reduce input records=93564
13/08/26 23:47:19 INFO mapred.JobClient: Reduce input groups=93564
13/08/26 23:47:19 INFO mapred.JobClient: Combine output records=348623
13/08/26 23:47:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=284684288
13/08/26 23:47:19 INFO mapred.JobClient: Reduce output records=93564
13/08/26 23:47:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:47:19 INFO mapred.JobClient: Map output records=2583778
13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061
13/08/26 23:47:20 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:47:38 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:47:53 INFO mapred.JobClient: map 100% reduce 67%
13/08/26 23:47:59 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061
13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29
13/08/26 23:48:04 INFO mapred.JobClient: Job Counters
13/08/26 23:48:04 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18292
13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:48:04 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:48:04 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19293
13/08/26 23:48:04 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:48:04 INFO mapred.JobClient: Bytes Written=28689283
13/08/26 23:48:04 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_READ=29059398
13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_READ=31204324
13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58165045
13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
13/08/26 23:48:04 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:48:04 INFO mapred.JobClient: Bytes Read=29314118
13/08/26 23:48:04 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:48:04 INFO mapred.JobClient: Map output materialized bytes=29059398
13/08/26 23:48:04 INFO mapred.JobClient: Map input records=18846
13/08/26 23:48:04 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:48:04 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:48:04 INFO mapred.JobClient: Map output bytes=28984080
13/08/26 23:48:04 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984
13/08/26 23:48:04 INFO mapred.JobClient: CPU time spent (ms)=8770
13/08/26 23:48:04 INFO mapred.JobClient: Combine input records=0
13/08/26 23:48:04 INFO mapred.JobClient: SPLIT_RAW_BYTES=153
13/08/26 23:48:04 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:48:04 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:48:04 INFO mapred.JobClient: Combine output records=0
13/08/26 23:48:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=320401408
13/08/26 23:48:04 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:48:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:48:04 INFO mapred.JobClient: Map output records=18846
13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062
13/08/26 23:48:06 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:48:24 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:48:36 INFO mapred.JobClient: map 100% reduce 33%
13/08/26 23:48:39 INFO mapred.JobClient: map 100% reduce 86%
13/08/26 23:48:48 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:48:53 INFO mapred.JobClient: Job complete: job_201308212334_0062
13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29
13/08/26 23:48:53 INFO mapred.JobClient: Job Counters
13/08/26 23:48:53 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18225
13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:48:53 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:48:53 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21045
13/08/26 23:48:53 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:48:53 INFO mapred.JobClient: Bytes Written=28689283
13/08/26 23:48:53 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_READ=28437750
13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_READ=28689443
13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56920127
13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283
13/08/26 23:48:53 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:48:53 INFO mapred.JobClient: Bytes Read=28689283
13/08/26 23:48:53 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:48:53 INFO mapred.JobClient: Map output materialized bytes=28437750
13/08/26 23:48:53 INFO mapred.JobClient: Map input records=18846
13/08/26 23:48:53 INFO mapred.JobClient: Reduce shuffle bytes=0
13/08/26 23:48:53 INFO mapred.JobClient: Spilled Records=37692
13/08/26 23:48:53 INFO mapred.JobClient: Map output bytes=28362505
13/08/26 23:48:53 INFO mapred.JobClient: Total committed heap usage (bytes)=204603392
13/08/26 23:48:53 INFO mapred.JobClient: CPU time spent (ms)=8340
13/08/26 23:48:53 INFO mapred.JobClient: Combine input records=0
13/08/26 23:48:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=160
13/08/26 23:48:53 INFO mapred.JobClient: Reduce input records=18846
13/08/26 23:48:53 INFO mapred.JobClient: Reduce input groups=18846
13/08/26 23:48:53 INFO mapred.JobClient: Combine output records=0
13/08/26 23:48:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=313868288
13/08/26 23:48:53 INFO mapred.JobClient: Reduce output records=18846
13/08/26 23:48:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896
13/08/26 23:48:53 INFO mapred.JobClient: Map output records=18846
13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0
13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035)
+ echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
Creating training and holdout set with a random 80-20 split of the generated vector dataset
+ ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated. SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]}
13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines
13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor
13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0
13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631)
+ echo 'Training Naive Bayes model'
Training Naive Bayes model
+ ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow
Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated. SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
13/08/26 23:49:27 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:49:49 INFO mapred.JobClient: map 43% reduce 0%
13/08/26 23:49:52 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:50:13 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
13/08/26 23:50:18 INFO mapred.JobClient: Job Counters
13/08/26 23:50:18 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22816
13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:50:18 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20680
13/08/26 23:50:18 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:50:18 INFO mapred.JobClient: Bytes Written=2718605
13/08/26 23:50:18 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_READ=1404371
13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_READ=12669237
13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2854477
13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2718605
13/08/26 23:50:18 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:50:18 INFO mapred.JobClient: Bytes Read=12668431
13/08/26 23:50:18 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:50:18 INFO mapred.JobClient: Map output materialized bytes=1404363
13/08/26 23:50:18 INFO mapred.JobClient: Map input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: Reduce shuffle bytes=1404363
13/08/26 23:50:18 INFO mapred.JobClient: Spilled Records=40
13/08/26 23:50:18 INFO mapred.JobClient: Map output bytes=16682576
13/08/26 23:50:18 INFO mapred.JobClient: Total committed heap usage (bytes)=176164864
13/08/26 23:50:18 INFO mapred.JobClient: CPU time spent (ms)=8190
13/08/26 23:50:18 INFO mapred.JobClient: Combine input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
13/08/26 23:50:18 INFO mapred.JobClient: Reduce input records=20
13/08/26 23:50:18 INFO mapred.JobClient: Reduce input groups=20
13/08/26 23:50:18 INFO mapred.JobClient: Combine output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=294400000
13/08/26 23:50:18 INFO mapred.JobClient: Reduce output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
13/08/26 23:50:18 INFO mapred.JobClient: Map output records=11321
13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
13/08/26 23:50:19 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:50:40 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:51:01 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
13/08/26 23:51:06 INFO mapred.JobClient: Job Counters
13/08/26 23:51:06 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24609
13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:51:06 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=15258
13/08/26 23:51:06 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:51:06 INFO mapred.JobClient: Bytes Written=893560
13/08/26 23:51:06 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_READ=362674
13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_READ=2718737
13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=771195
13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=893560
13/08/26 23:51:06 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:51:06 INFO mapred.JobClient: Bytes Read=2718605
13/08/26 23:51:06 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:51:06 INFO mapred.JobClient: Map output materialized bytes=362666
13/08/26 23:51:06 INFO mapred.JobClient: Map input records=20
13/08/26 23:51:06 INFO mapred.JobClient: Reduce shuffle bytes=362666
13/08/26 23:51:06 INFO mapred.JobClient: Spilled Records=4
13/08/26 23:51:06 INFO mapred.JobClient: Map output bytes=893434
13/08/26 23:51:06 INFO mapred.JobClient: Total committed heap usage (bytes)=223264768
13/08/26 23:51:06 INFO mapred.JobClient: CPU time spent (ms)=5370
13/08/26 23:51:06 INFO mapred.JobClient: Combine input records=2
13/08/26 23:51:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=132
13/08/26 23:51:06 INFO mapred.JobClient: Reduce input records=2
13/08/26 23:51:06 INFO mapred.JobClient: Reduce input groups=2
13/08/26 23:51:06 INFO mapred.JobClient: Combine output records=2
13/08/26 23:51:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=300597248
13/08/26 23:51:06 INFO mapred.JobClient: Reduce output records=2
13/08/26 23:51:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
13/08/26 23:51:06 INFO mapred.JobClient: Map output records=2
13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
+ echo 'Self testing on training set'
Self testing on training set
+ ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
13/08/26 23:51:22 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:51:45 INFO mapred.JobClient: map 51% reduce 0%
13/08/26 23:51:48 INFO mapred.JobClient: map 89% reduce 0%
13/08/26 23:51:54 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
13/08/26 23:51:58 INFO mapred.JobClient: Job Counters
13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34216
13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:51:58 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:51:58 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:51:58 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:51:58 INFO mapred.JobClient: Bytes Written=2132486
13/08/26 23:51:58 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_READ=16279896
13/08/26 23:51:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22523
13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2132486
13/08/26 23:51:58 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:51:58 INFO mapred.JobClient: Bytes Read=12668431
13/08/26 23:51:58 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:51:58 INFO mapred.JobClient: Map input records=11321
13/08/26 23:51:58 INFO mapred.JobClient: Physical memory (bytes) snapshot=87547904
13/08/26 23:51:58 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:51:58 INFO mapred.JobClient: CPU time spent (ms)=9380
13/08/26 23:51:58 INFO mapred.JobClient: Total committed heap usage (bytes)=28131328
13/08/26 23:51:58 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
13/08/26 23:51:58 INFO mapred.JobClient: Map output records=11321
13/08/26 23:51:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11256 99.4258%
Incorrectly Classified Instances : 65 0.5742%
Total Classified Instances : 11321
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
454 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 | 458 a = alt.atheism
0 588 0 3 0 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 | 595 b = comp.graphics
0 3 553 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 563 c = comp.os.ms-windows.misc
0 0 0 592 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 | 595 d = comp.sys.ibm.pc.hardware
0 0 0 1 593 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 594 e = comp.sys.mac.hardware
0 2 0 1 0 576 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 581 f = comp.windows.x
0 1 0 0 0 0 579 0 0 0 0 0 1 0 0 0 0 0 0 0 | 581 g = misc.forsale
0 0 0 0 0 0 1 594 0 0 0 0 1 0 0 0 0 0 0 0 | 596 h = rec.autos
0 0 0 0 0 0 1 2 591 0 0 0 0 0 0 0 0 0 0 0 | 594 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 615 1 0 0 0 0 0 0 0 0 0 | 616 j = rec.sport.baseball
0 0 0 0 0 0 1 0 0 1 581 0 0 0 0 0 0 0 0 0 | 583 k = rec.sport.hockey
0 0 1 0 0 0 0 0 0 0 0 627 1 0 0 0 0 1 0 0 | 630 l = sci.crypt
0 0 0 2 0 0 1 0 0 0 0 0 588 0 0 0 0 0 0 0 | 591 m = sci.electronics
0 1 0 0 0 0 0 0 0 0 0 0 0 586 1 0 0 0 0 0 | 588 n = sci.med
0 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 0 | 615 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 619 1 0 0 0 | 620 p = soc.religion.christian
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 541 0 0 0 | 543 q = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 560 0 0 | 561 r = talk.politics.guns
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 1 351 0 | 359 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 0 453 | 458 t = talk.politics.misc
13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
13/08/26 23:52:12 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:52:30 INFO mapred.JobClient: map 85% reduce 0%
13/08/26 23:52:36 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
13/08/26 23:52:41 INFO mapred.JobClient: Job Counters
13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=25113
13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:52:41 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:52:41 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:52:41 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:52:41 INFO mapred.JobClient: Bytes Written=1417942
13/08/26 23:52:41 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_READ=12148944
13/08/26 23:52:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22522
13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1417942
13/08/26 23:52:41 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:52:41 INFO mapred.JobClient: Bytes Read=8537480
13/08/26 23:52:41 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:52:41 INFO mapred.JobClient: Map input records=7525
13/08/26 23:52:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=85057536
13/08/26 23:52:41 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:52:41 INFO mapred.JobClient: CPU time spent (ms)=6630
13/08/26 23:52:41 INFO mapred.JobClient: Total committed heap usage (bytes)=28155904
13/08/26 23:52:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
13/08/26 23:52:41 INFO mapred.JobClient: Map output records=7525
13/08/26 23:52:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=147
13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6801 90.3787%
Incorrectly Classified Instances : 724 9.6213%
Total Classified Instances : 7525
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
318 0 0 0 1 0 0 0 1 0 0 0 0 0 1 4 0 0 15 1 | 341 a = alt.atheism
1 318 7 20 4 7 7 2 0 1 0 1 1 2 6 0 0 0 0 1 | 378 b = comp.graphics
0 25 277 78 12 15 5 0 0 0 0 2 4 0 1 0 0 0 0 3 | 422 c = comp.os.ms-windows.misc
1 4 3 336 20 3 8 0 0 0 0 1 11 0 0 0 0 0 0 0 | 387 d = comp.sys.ibm.pc.hardware
0 3 1 6 350 1 3 0 0 0 0 1 3 1 0 0 0 0 0 0 | 369 e = comp.sys.mac.hardware
1 20 3 6 7 365 3 0 0 0 0 1 0 0 0 0 1 0 0 0 | 407 f = comp.windows.x
0 1 1 19 8 0 329 13 1 0 0 2 14 0 4 0 0 1 1 0 | 394 g = misc.forsale
0 2 1 2 3 1 10 361 8 0 0 0 4 0 0 0 0 1 0 1 | 394 h = rec.autos
0 0 0 1 0 0 2 3 393 1 0 0 0 0 0 0 0 1 0 1 | 402 i = rec.motorcycles
0 0 0 1 0 0 2 3 0 360 6 0 2 2 1 0 0 0 0 1 | 378 j = rec.sport.baseball
0 1 0 2 1 0 0 0 2 5 401 0 1 0 0 1 0 0 0 2 | 416 k = rec.sport.hockey
1 1 0 1 3 2 1 1 0 0 0 344 1 1 2 0 1 1 0 1 | 361 l = sci.crypt
0 5 0 15 14 0 5 1 1 0 0 2 348 1 1 0 0 0 0 0 | 393 m = sci.electronics
1 2 1 1 1 0 1 0 0 1 0 1 4 381 5 0 0 1 1 1 | 402 n = sci.med
1 4 0 0 2 0 2 1 0 0 0 1 2 1 356 0 0 1 0 1 | 372 o = sci.space
5 0 0 1 1 0 0 1 0 0 1 0 0 1 0 359 3 0 4 1 | 377 p = soc.religion.christian
0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 2 389 0 0 2 | 397 q = talk.politics.mideast
0 0 1 0 1 1 0 1 0 0 0 2 1 1 0 0 0 335 0 6 | 349 r = talk.politics.guns
29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc
2 0 0 0 2 0 0 1 0 1 1 1 0 1 2 0 2 17 3 284 | 317 t = talk.politics.misc
13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)
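
The two summaries above can be cross-checked by hand: accuracy is simply the correctly classified count divided by the total, and each confusion-matrix row gives the recall for one newsgroup (the diagonal entry divided by the row total after the `|`). The following is a minimal Python sketch of that arithmetic, not part of Mahout; the function names `accuracy_percent`, `parse_matrix_row` and `recall_percent` are made up for illustration, and the two sample rows are copied from the holdout matrix in the log.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Share of correctly classified instances, as a percentage."""
    return 100.0 * correct / total


def parse_matrix_row(line: str):
    """Parse one confusion-matrix row as printed by testnb.

    Returns (label, letter, counts, total): counts[i] is how many documents
    of this class were assigned to column i, total is the row total.
    """
    left, right = line.split("|", 1)
    counts = [int(tok) for tok in left.split()]
    # The right-hand side looks like: "269 s = talk.religion.misc"
    total_str, letter, _equals, label = right.split()[:4]
    return label, letter, counts, int(total_str)


def recall_percent(line: str):
    """Per-class recall: diagonal entry divided by the row total."""
    label, letter, counts, total = parse_matrix_row(line)
    diagonal = counts[ord(letter) - ord("a")]  # column 'a' is index 0
    return label, 100.0 * diagonal / total


# Two rows copied verbatim from the holdout confusion matrix above.
HOLDOUT_ROWS = [
    "0 25 277 78 12 15 5 0 0 0 0 2 4 0 1 0 0 0 0 3 | 422 c = comp.os.ms-windows.misc",
    "29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc",
]

if __name__ == "__main__":
    print(f"training set: {accuracy_percent(11256, 11321):.4f}%")
    print(f"holdout set:  {accuracy_percent(6801, 7525):.4f}%")
    for row in HOLDOUT_ROWS:
        label, recall = recall_percent(row)
        print(f"{label}: recall {recall:.2f}%")
```

Run this way, the numbers show where the drop from the training set to the holdout set comes from: classes such as comp.os.ms-windows.misc and talk.religion.misc lose many documents to closely related groups.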

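The job information lists every job launched by this run.

By going through each job in turn and looking at its mapper and reducer, you can work out how the algorithm operates.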
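
One way to line the jobs up without the web page is to group the counters in the console log by job ID. The sketch below is a hypothetical helper (`counters_by_job` is not a Hadoop or Mahout API) that assumes the `mapred.JobClient` line format shown in the log above; the sample lines are taken from the two jobs started by `trainnb`.

```python
import re

# "Running job: job_201308212334_0063" marks the start of a job's output.
JOB_START = re.compile(r"Running job: (job_\d+_\d+)")
# Counter lines look like "... INFO mapred.JobClient: Map input records=11321".
COUNTER = re.compile(r"mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def counters_by_job(log_lines):
    """Group JobClient counters by the job they were printed for."""
    jobs = {}
    current = None
    for line in log_lines:
        started = JOB_START.search(line)
        if started:
            current = started.group(1)
            jobs[current] = {}
            continue
        counter = COUNTER.search(line)
        if counter and current is not None:
            jobs[current][counter.group(1)] = int(counter.group(2))
    return jobs


# Lines copied from the trainnb part of the log above.
SAMPLE = """\
13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
13/08/26 23:50:18 INFO mapred.JobClient: Map input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: Reduce output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
13/08/26 23:51:06 INFO mapred.JobClient: Map input records=20
13/08/26 23:51:06 INFO mapred.JobClient: Reduce output records=2
""".splitlines()

if __name__ == "__main__":
    for job_id, counters in counters_by_job(SAMPLE).items():
        print(job_id, counters)
```

Read this way, the first training job takes the 11321 training vectors and emits 20 records, which presumably correspond to the 20 newsgroup labels, and the second job takes those 20 records as input and emits 2.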

Share, enjoy, grow.

Please credit the source when reposting: http://blog.csdn.net/fansy1990
