Running Mahout's Naive Bayes Classifier from the Command Line
landen@landen-Lenovo:~/文档/20news$ mahout trainclassifier --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
Usage:
[--gramSize <gramSize> --help --input <input> --output <output>
--classifierType <classifierType> --dataSource <dataSource> --alpha <a> --minDf
<minDf> --minSupport <minSupport> --skipCleanup]
Options
  --gramSize (-ng) gramSize                Size of the n-gram. Default Value: 1
  --help (-h)                              Print out help
  --input (-i) input                       Path to job input directory.
  --output (-o) output                     The directory pathname for output.
  --classifierType (-type) classifierType  Type of classifier: bayes|cbayes. Default: bayes
  --dataSource (-source) dataSource        Location of model: hdfs. Default Value: hdfs
  --alpha (-a) a                           Smoothing parameter Default Value: 1.0
  --minDf (-mf) minDf                      Minimum Term Document Frequency: 1
  --minSupport (-ms) minSupport            Minimum Support (Term Frequency): 1
  --skipCleanup (-sc)                      Skip cleanup of feature extraction output
13/07/12 16:32:22 INFO driver.MahoutDriver: Program took 52 ms (Minutes: 9.5E-4)
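The transcript never shows the training run that produced /20news/model. Below is a minimal sketch of what it may have looked like, built only from options listed in the trainclassifier help above. The HDFS paths come from the hadoop fs -ls /20news listing further down. The sketch assumes /20news/20news-train already holds training data in the format trainclassifier expects; the test job reports 20 input paths, which suggests one prepared file per newsgroup. The gram size is kept in one variable because the test run uses -ng 3, and a model should presumably be tested with the same gram size it was trained with. run, train_model, test_model and DRY_RUN are hypothetical helper names, not part of Mahout or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: the train/test pair for the 20news example with the Mahout 0.6 CLI.
# ASSUMPTIONS: HDFS paths are taken from the `hadoop fs -ls /20news` listing;
# /20news/20news-train is assumed to already contain prepared training data.
# run/train_model/test_model/DRY_RUN are hypothetical helpers, not Mahout tools.

TRAIN_DIR=/20news/20news-train
TEST_DIR=/20news/20news-test
MODEL_DIR=/20news/model
NGRAM=3   # keep the n-gram size identical for training and testing

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Train a standard (non-complementary) Bayes model; options from `trainclassifier --help`.
train_model() {
  run mahout trainclassifier -i "$TRAIN_DIR" -o "$MODEL_DIR" \
    -type bayes -ng "$NGRAM" -source hdfs
}

# Same invocation as the testclassifier run in the transcript.
test_model() {
  run mahout testclassifier -m "$MODEL_DIR" -d "$TEST_DIR" \
    -type bayes -ng "$NGRAM" -source hdfs -method mapreduce
}
```

Call train_model and then test_model from a shell where mahout is on the PATH; with DRY_RUN=1 train_model the command is only printed, which is a cheap way to check the options before launching a long MapReduce job.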
landen@landen-Lenovo:~/文档/20news$ mahout testclassifier --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
Usage:
[--defaultCat <defaultCat> --testDir <testDir> --encoding <encoding>
--gramSize <gramSize> --model <model> --classifierType <classifierType>
--dataSource <dataSource> --help --method <method> --verbose --alpha <a>
--confusionMatrix <confusionMatrix>]
Options
  --defaultCat (-default) defaultCat       The default category Default Value: unknown
  --testDir (-d) testDir                   The directory where test documents resides in
  --encoding (-e) encoding                 The file encoding. Defaults to UTF-8
  --gramSize (-ng) gramSize                Size of the n-gram. Default Value: 1
  --model (-m) model                       The path on HDFS as defined by the -source parameter
  --classifierType (-type) classifierType  Type of classifier: bayes|cbayes. Default Value: bayes
  --dataSource (-source) dataSource        Location of model: hdfs
  --help (-h)                              Print out help
  --method (-method) method                Method of Classification: sequential|mapreduce. Default Value: mapreduce
  --verbose (-v)                           Output which values were correctly and incorrectly classified
  --alpha (-a) a                           Smoothing parameter Default Value: 1.0
  --confusionMatrix (-cm) confusionMatrix  Export ConfusionMatrix as SequenceFile
13/07/12 16:32:37 INFO driver.MahoutDriver: Program took 42 ms (Minutes: 7.0E-4)
landen@landen-Lenovo:~/文档/20news$ hadoop fs -ls /20news
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - landen supergroup 0 2013-07-11 17:16 /20news/20news-test
drwxr-xr-x - landen supergroup 0 2013-07-11 17:16 /20news/20news-train
drwxr-xr-x - landen supergroup 0 2013-07-11 21:54 /20news/model
landen@landen-Lenovo:~/文档/20news$ mahout testclassifier -m /20news/model -d /20news/20news-test -type bayes -ng 3 -source hdfs -method mapreduce
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
13/07/12 16:39:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 16:40:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/12 16:40:00 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/12 16:40:00 INFO mapred.FileInputFormat: Total input paths to process : 20
13/07/12 16:40:01 INFO mapred.JobClient: Running job: job_201307111633_0009
13/07/12 16:40:02 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 16:43:18 INFO mapred.JobClient: map 3% reduce 0%
13/07/12 16:43:22 INFO mapred.JobClient: map 5% reduce 0%
13/07/12 16:43:28 INFO mapred.JobClient: map 6% reduce 0%
13/07/12 16:43:37 INFO mapred.JobClient: map 8% reduce 0%
13/07/12 16:43:42 INFO mapred.JobClient: map 4% reduce 0%
13/07/12 16:43:56 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000001_0, Status : FAILED
13/07/12 16:44:06 INFO mapred.JobClient: map 5% reduce 1%
13/07/12 16:44:13 INFO mapred.JobClient: map 6% reduce 1%
13/07/12 16:44:23 INFO mapred.JobClient: map 7% reduce 1%
13/07/12 16:44:29 INFO mapred.JobClient: map 8% reduce 1%
13/07/12 16:44:35 INFO mapred.JobClient: map 11% reduce 1%
13/07/12 16:44:38 INFO mapred.JobClient: map 12% reduce 1%
13/07/12 16:44:44 INFO mapred.JobClient: map 13% reduce 1%
13/07/12 16:44:47 INFO mapred.JobClient: map 9% reduce 1%
13/07/12 16:44:53 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000002_0, Status : FAILED
Error: Java heap space
attempt_201307111633_0009_m_000002_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201307111633_0009_m_000002_0: log4j:WARN Please initialize the log4j system properly.
13/07/12 16:45:03 INFO mapred.JobClient: map 9% reduce 3%
13/07/12 16:45:28 INFO mapred.JobClient: map 14% reduce 3%
13/07/12 16:45:31 INFO mapred.JobClient: map 17% reduce 3%
13/07/12 16:45:34 INFO mapred.JobClient: map 20% reduce 3%
13/07/12 16:45:37 INFO mapred.JobClient: map 20% reduce 5%
13/07/12 16:45:46 INFO mapred.JobClient: map 20% reduce 6%
13/07/12 16:45:55 INFO mapred.JobClient: map 22% reduce 6%
13/07/12 16:45:58 INFO mapred.JobClient: map 24% reduce 6%
13/07/12 16:46:01 INFO mapred.JobClient: map 25% reduce 6%
13/07/12 16:46:07 INFO mapred.JobClient: map 25% reduce 8%
13/07/12 16:46:22 INFO mapred.JobClient: map 26% reduce 8%
13/07/12 16:46:25 INFO mapred.JobClient: map 27% reduce 8%
13/07/12 16:46:31 INFO mapred.JobClient: map 28% reduce 8%
13/07/12 16:46:40 INFO mapred.JobClient: map 29% reduce 8%
13/07/12 16:47:04 INFO mapred.JobClient: map 30% reduce 8%
13/07/12 16:47:16 INFO mapred.JobClient: map 30% reduce 10%
13/07/12 16:47:32 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000007_0, Status : FAILED
Error: Java heap space
13/07/12 16:47:56 INFO mapred.JobClient: map 34% reduce 10%
13/07/12 16:48:13 INFO mapred.JobClient: map 34% reduce 11%
13/07/12 16:48:19 INFO mapred.JobClient: map 39% reduce 11%
13/07/12 16:48:22 INFO mapred.JobClient: map 40% reduce 11%
13/07/12 16:48:34 INFO mapred.JobClient: map 40% reduce 13%
13/07/12 16:48:43 INFO mapred.JobClient: map 44% reduce 13%
13/07/12 16:48:46 INFO mapred.JobClient: map 45% reduce 13%
13/07/12 16:48:58 INFO mapred.JobClient: map 45% reduce 15%
13/07/12 16:49:04 INFO mapred.JobClient: map 48% reduce 15%
13/07/12 16:49:07 INFO mapred.JobClient: map 50% reduce 15%
13/07/12 16:49:13 INFO mapred.JobClient: map 50% reduce 16%
13/07/12 16:49:25 INFO mapred.JobClient: map 53% reduce 16%
13/07/12 16:49:28 INFO mapred.JobClient: map 54% reduce 16%
13/07/12 16:49:43 INFO mapred.JobClient: map 59% reduce 18%
13/07/12 16:49:58 INFO mapred.JobClient: map 59% reduce 20%
13/07/12 16:50:04 INFO mapred.JobClient: map 64% reduce 20%
13/07/12 16:50:13 INFO mapred.JobClient: map 64% reduce 21%
13/07/12 16:50:25 INFO mapred.JobClient: map 69% reduce 21%
13/07/12 16:50:43 INFO mapred.JobClient: map 69% reduce 23%
13/07/12 16:50:46 INFO mapred.JobClient: map 73% reduce 23%
13/07/12 16:50:49 INFO mapred.JobClient: map 75% reduce 23%
13/07/12 16:50:58 INFO mapred.JobClient: map 75% reduce 25%
13/07/12 16:51:08 INFO mapred.JobClient: map 78% reduce 25%
13/07/12 16:51:11 INFO mapred.JobClient: map 80% reduce 25%
13/07/12 16:51:23 INFO mapred.JobClient: map 80% reduce 26%
13/07/12 16:51:29 INFO mapred.JobClient: map 83% reduce 26%
13/07/12 16:51:32 INFO mapred.JobClient: map 85% reduce 26%
13/07/12 16:51:44 INFO mapred.JobClient: map 85% reduce 28%
13/07/12 16:51:50 INFO mapred.JobClient: map 89% reduce 28%
13/07/12 16:51:53 INFO mapred.JobClient: map 90% reduce 28%
13/07/12 16:52:14 INFO mapred.JobClient: map 90% reduce 30%
13/07/12 16:52:20 INFO mapred.JobClient: map 95% reduce 30%
13/07/12 16:52:26 INFO mapred.JobClient: map 95% reduce 31%
13/07/12 16:52:49 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_0, Status : FAILED
org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File exists
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:167)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:312)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: EEXIST: File exists
    at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
    ... 7 more
attempt_201307111633_0009_m_000004_0: Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: Java heap space
attempt_201307111633_0009_m_000004_0: at java.util.Arrays.copyOfRange(Arrays.java:2694)
attempt_201307111633_0009_m_000004_0: at java.lang.String.<init>(String.java:203)
attempt_201307111633_0009_m_000004_0: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread for syncLogs"
13/07/12 16:53:02 INFO mapred.JobClient: map 97% reduce 31%
13/07/12 16:53:05 INFO mapred.JobClient: map 95% reduce 31%
13/07/12 16:53:10 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_1, Status : FAILED
Error: Java heap space
13/07/12 16:53:20 INFO mapred.JobClient: map 96% reduce 31%
13/07/12 16:53:23 INFO mapred.JobClient: map 98% reduce 31%
13/07/12 16:53:26 INFO mapred.JobClient: map 100% reduce 31%
13/07/12 16:53:35 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 16:53:41 INFO mapred.JobClient: Job complete: job_201307111633_0009
13/07/12 16:53:41 INFO mapred.JobClient: Counters: 30
13/07/12 16:53:41 INFO mapred.JobClient: Job Counters
13/07/12 16:53:41 INFO mapred.JobClient: Launched reduce tasks=1
13/07/12 16:53:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1153539
13/07/12 16:53:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 16:53:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 16:53:41 INFO mapred.JobClient: Launched map tasks=25
13/07/12 16:53:41 INFO mapred.JobClient: Data-local map tasks=25
13/07/12 16:53:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=596582
13/07/12 16:53:41 INFO mapred.JobClient: File Input Format Counters
13/07/12 16:53:41 INFO mapred.JobClient: Bytes Read=10399829
13/07/12 16:53:41 INFO mapred.JobClient: File Output Format Counters
13/07/12 16:53:41 INFO mapred.JobClient: Bytes Written=13482
13/07/12 16:53:41 INFO mapred.JobClient: FileSystemCounters
13/07/12 16:53:41 INFO mapred.JobClient: FILE_BYTES_READ=11889
13/07/12 16:53:41 INFO mapred.JobClient: HDFS_BYTES_READ=421848302
13/07/12 16:53:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=497127
13/07/12 16:53:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=13482
13/07/12 16:53:41 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 16:53:41 INFO mapred.JobClient: Map output materialized bytes=12003
13/07/12 16:53:41 INFO mapred.JobClient: Map input records=7532
13/07/12 16:53:41 INFO mapred.JobClient: Reduce shuffle bytes=11395
13/07/12 16:53:41 INFO mapred.JobClient: Spilled Records=460
13/07/12 16:53:41 INFO mapred.JobClient: Map output bytes=377830
13/07/12 16:53:41 INFO mapred.JobClient: Total committed heap usage (bytes)=2999517184
13/07/12 16:53:41 INFO mapred.JobClient: CPU time spent (ms)=293160
13/07/12 16:53:41 INFO mapred.JobClient: Map input bytes=10399829
13/07/12 16:53:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=2273
13/07/12 16:53:41 INFO mapred.JobClient: Combine input records=7532
13/07/12 16:53:41 INFO mapred.JobClient: Reduce input records=230
13/07/12 16:53:41 INFO mapred.JobClient: Reduce input groups=230
13/07/12 16:53:41 INFO mapred.JobClient: Combine output records=230
13/07/12 16:53:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=3793125376
13/07/12 16:53:41 INFO mapred.JobClient: Reduce output records=230
13/07/12 16:53:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8323325952
13/07/12 16:53:41 INFO mapred.JobClient: Map output records=7532
13/07/12 16:53:43 INFO bayes.BayesClassifierDriver: =======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
 381   0   0   0   0   9   2   0   1   0   1   0   1   0   0   0   0   0   3   0 | 398 a = rec.motorcycles
   1 284   0   0   0   1   4   0   6   2  11   0   3  65   0   0   5   0   3  10 | 395 b = comp.windows.x
   1   0 340   3   0   2   6   1   0   0   0   0   1   1  12   0   7   0   2   0 | 376 c = talk.politics.mideast
   4   0   1 330   0   2   2   0   0   2   1   1   3   0   1   3  12   0   2   0 | 364 d = talk.politics.guns
   3   0   4  31  37   6   9   1   0  10   0   0   0   6  93   9   6  36   0   0 | 251 e = talk.religion.misc
   7   0   0   0   0 361   2   2   0   1   3   0   6   1   0   1   0   0  11   1 | 396 f = rec.autos
   0   0   0   0   0   1 383   9   1   0   0   0   0   0   0   0   0   0   3   0 | 397 g = rec.sport.baseball
   1   0   0   0   0   0   8 382   1   0   0   0   2   1   1   0   2   0   1   0 | 399 h = rec.sport.hockey
   1   0   0   0   0   3   3   0 335   4   5   0  10   4   0   0   2   0  10   8 | 385 i = comp.sys.mac.hardware
   0   3   0   0   0   0   1   0   0 367   0   0   5  10   1   3   2   0   2   0 | 394 j = sci.space
   0   0   0   0   0   2   1   0  27   1 300   0  19  11   0   0   0   0  11  20 | 392 k = comp.sys.ibm.pc.hardware
   6   0   2 110   0   6  11   4   1  14   0 104   2   1  11  10  26   1   1   0 | 310 l = talk.politics.misc
   6   0   1   0   0   4   1   0   8   2  16   0 314   9   0   4  15   0   5   8 | 393 m = sci.electronics
   0  13   1   0   0   2   6   0  11   5  11   0  11 304   0   2  10   0   5   8 | 389 n = comp.graphics
   2   0   0   0   0   0   5   1   0   2   1   0   1   3 373   5   0   2   1   2 | 398 o = soc.religion.christian
   3   0   0   1   0   2   3   3   2   3   2   0  12  10   8 337   1   0   9   0 | 396 p = sci.med
   0   1   0   1   0   0   4   0   3   0   1   0   3   8   0   2 370   0   2   1 | 396 q = sci.crypt
   9   0   4  10   1   4   6   1   2   4   2   0   0   2  77  14  12 170   0   1 | 319 r = alt.atheism
   4   0   0   0   0   9   1   1   9   1  12   0   6   3   0   2   0   0 340   2 | 390 s = misc.forsale
   6   5   0   0   0   1   8   0   8   5  50   0   2  39   1   0   8   0   3 258 | 394 t = comp.os.ms-windows.misc
13/07/12 16:53:43 INFO driver.MahoutDriver: Program took 824521 ms (Minutes: 13.742016666666666)
landen@landen-Lenovo:~/文档/20news$
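Mahout prints only raw counts in the confusion matrix. Summing its diagonal gives the overall accuracy of this run: 6070 of the 7532 test documents (about 80.6%) are classified correctly, and the total of 7532 agrees with the Map input records counter of the job. The weakest rows are talk.religion.misc (37 of 251, with 93 assigned to soc.religion.christian), talk.politics.misc (104 of 310, with 110 assigned to talk.politics.guns) and alt.atheism (170 of 319, with 77 assigned to soc.religion.christian). The awk sketch below computes these figures from the matrix text. It assumes each data row has the layout printed above (the counts, then |, the row total, the category letter, = and the category name), and matrix_stats is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Reads the "Confusion Matrix" text printed by testclassifier on stdin and
# reports per-category recall plus overall accuracy.
# ASSUMED row layout: <count_1> ... <count_n> | <row total> <letter> = <category>
# The k-th data row belongs to the k-th category, so its diagonal cell is field k.
matrix_stats() {
  awk '
    # Only data rows end with: "|" <row total> <letter> "=" <category>
    NF > 5 && $(NF - 4) == "|" && $(NF - 1) == "=" {
      k++                          # k-th data row: its diagonal cell is field k
      row_total = $(NF - 3)        # the number printed right after "|"
      correct += $k
      total   += row_total
      recall = row_total > 0 ? 100 * $k / row_total : 0
      printf "%-28s recall %6.2f%%\n", $NF, recall
    }
    END {
      if (total > 0)
        printf "overall accuracy %d/%d = %.2f%%\n", correct, total, 100 * correct / total
    }
  '
}
```

Save the matrix lines to a file such as matrix.txt and run matrix_stats < matrix.txt. The header, the dashed separator and the surrounding log lines are skipped because they lack the "| total letter =" tail, so pasting the whole block from the console is fine.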