Running Mahout Naive Bayes from the Command Line
landen@landen-Lenovo:~/文档/20news$ mahout trainclassifier --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
Usage:
[--gramSize <gramSize> --help --input <input> --output <output>
--classifierType <classifierType> --dataSource <dataSource> --alpha <a> --minDf
<minDf> --minSupport <minSupport> --skipCleanup]
Options
  --gramSize (-ng) gramSize                Size of the n-gram. Default Value: 1
  --help (-h)                              Print out help
  --input (-i) input                       Path to job input directory.
  --output (-o) output                     The directory pathname for output.
  --classifierType (-type) classifierType  Type of classifier: bayes|cbayes. Default: bayes
  --dataSource (-source) dataSource        Location of model: hdfs. Default Value: hdfs
  --alpha (-a) a                           Smoothing parameter Default Value: 1.0
  --minDf (-mf) minDf                      Minimum Term Document Frequency: 1
  --minSupport (-ms) minSupport            Minimum Support (Term Frequency): 1
  --skipCleanup (-sc)                      Skip cleanup of feature extraction output
13/07/12 16:32:22 INFO driver.MahoutDriver: Program took 52 ms (Minutes: 9.5E-4)
landen@landen-Lenovo:~/文档/20news$ mahout testclassifier --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
Usage:
[--defaultCat <defaultCat> --testDir <testDir> --encoding <encoding>
--gramSize <gramSize> --model <model> --classifierType <classifierType>
--dataSource <dataSource> --help --method <method> --verbose --alpha <a>
--confusionMatrix <confusionMatrix>]
Options
  --defaultCat (-default) defaultCat       The default category Default Value: unknown
  --testDir (-d) testDir                   The directory where test documents resides in
  --encoding (-e) encoding                 The file encoding. Defaults to UTF-8
  --gramSize (-ng) gramSize                Size of the n-gram. Default Value: 1
  --model (-m) model                       The path on HDFS as defined by the -source parameter
  --classifierType (-type) classifierType  Type of classifier: bayes|cbayes. Default Value: bayes
  --dataSource (-source) dataSource        Location of model: hdfs
  --help (-h)                              Print out help
  --method (-method) method                Method of Classification: sequential|mapreduce. Default Value: mapreduce
  --verbose (-v)                           Output which values were correctly and incorrectly classified
  --alpha (-a) a                           Smoothing parameter Default Value: 1.0
  --confusionMatrix (-cm) confusionMatrix  Export ConfusionMatrix as SequenceFile
13/07/12 16:32:37 INFO driver.MahoutDriver: Program took 42 ms (Minutes: 7.0E-4)
landen@landen-Lenovo:~/文档/20news$ hadoop fs -ls /20news
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - landen supergroup 0 2013-07-11 17:16 /20news/20news-test
drwxr-xr-x - landen supergroup 0 2013-07-11 17:16 /20news/20news-train
drwxr-xr-x - landen supergroup 0 2013-07-11 21:54 /20news/model
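The options listed by `mahout testclassifier --help` map directly onto the command that follows. As a minimal sketch, the Python below assembles that argument list from named parameters; `build_testclassifier_cmd` is a hypothetical helper written for illustration, not part of Mahout, and it uses only flags printed in the help output above.

```python
# Sketch: assemble a `mahout testclassifier` argument list from named options.
# build_testclassifier_cmd is a hypothetical helper, not a Mahout API.
# Every flag used here appears in the `mahout testclassifier --help` output.

def build_testclassifier_cmd(model, test_dir, classifier_type="bayes",
                             gram_size=1, source="hdfs", method="mapreduce"):
    """Return the argv list for `mahout testclassifier`."""
    return [
        "mahout", "testclassifier",
        "-m", model,                # --model: model path on the data source
        "-d", test_dir,             # --testDir: directory of test documents
        "-type", classifier_type,   # --classifierType: bayes | cbayes
        "-ng", str(gram_size),      # --gramSize: n-gram size (default 1)
        "-source", source,          # --dataSource: hdfs
        "-method", method,          # --method: sequential | mapreduce
    ]


cmd = build_testclassifier_cmd("/20news/model", "/20news/20news-test", gram_size=3)
print(" ".join(cmd))
```

The printed string can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` on a machine where `mahout` is on the `PATH`.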
landen@landen-Lenovo:~/文档/20news$ mahout testclassifier -m /20news/model -d /20news/20news-test -type bayes -ng 3 -source hdfs -method mapreduce
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
13/07/12 16:39:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 16:40:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/12 16:40:00 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/12 16:40:00 INFO mapred.FileInputFormat: Total input paths to process : 20
13/07/12 16:40:01 INFO mapred.JobClient: Running job: job_201307111633_0009
13/07/12 16:40:02 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 16:43:18 INFO mapred.JobClient: map 3% reduce 0%
13/07/12 16:43:22 INFO mapred.JobClient: map 5% reduce 0%
13/07/12 16:43:28 INFO mapred.JobClient: map 6% reduce 0%
13/07/12 16:43:37 INFO mapred.JobClient: map 8% reduce 0%
13/07/12 16:43:42 INFO mapred.JobClient: map 4% reduce 0%
13/07/12 16:43:56 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000001_0, Status : FAILED
13/07/12 16:44:06 INFO mapred.JobClient: map 5% reduce 1%
13/07/12 16:44:13 INFO mapred.JobClient: map 6% reduce 1%
13/07/12 16:44:23 INFO mapred.JobClient: map 7% reduce 1%
13/07/12 16:44:29 INFO mapred.JobClient: map 8% reduce 1%
13/07/12 16:44:35 INFO mapred.JobClient: map 11% reduce 1%
13/07/12 16:44:38 INFO mapred.JobClient: map 12% reduce 1%
13/07/12 16:44:44 INFO mapred.JobClient: map 13% reduce 1%
13/07/12 16:44:47 INFO mapred.JobClient: map 9% reduce 1%
13/07/12 16:44:53 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000002_0, Status : FAILED
Error: Java heap space
attempt_201307111633_0009_m_000002_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201307111633_0009_m_000002_0: log4j:WARN Please initialize the log4j system properly.
13/07/12 16:45:03 INFO mapred.JobClient: map 9% reduce 3%
13/07/12 16:45:28 INFO mapred.JobClient: map 14% reduce 3%
13/07/12 16:45:31 INFO mapred.JobClient: map 17% reduce 3%
13/07/12 16:45:34 INFO mapred.JobClient: map 20% reduce 3%
13/07/12 16:45:37 INFO mapred.JobClient: map 20% reduce 5%
13/07/12 16:45:46 INFO mapred.JobClient: map 20% reduce 6%
13/07/12 16:45:55 INFO mapred.JobClient: map 22% reduce 6%
13/07/12 16:45:58 INFO mapred.JobClient: map 24% reduce 6%
13/07/12 16:46:01 INFO mapred.JobClient: map 25% reduce 6%
13/07/12 16:46:07 INFO mapred.JobClient: map 25% reduce 8%
13/07/12 16:46:22 INFO mapred.JobClient: map 26% reduce 8%
13/07/12 16:46:25 INFO mapred.JobClient: map 27% reduce 8%
13/07/12 16:46:31 INFO mapred.JobClient: map 28% reduce 8%
13/07/12 16:46:40 INFO mapred.JobClient: map 29% reduce 8%
13/07/12 16:47:04 INFO mapred.JobClient: map 30% reduce 8%
13/07/12 16:47:16 INFO mapred.JobClient: map 30% reduce 10%
13/07/12 16:47:32 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000007_0, Status : FAILED
Error: Java heap space
13/07/12 16:47:56 INFO mapred.JobClient: map 34% reduce 10%
13/07/12 16:48:13 INFO mapred.JobClient: map 34% reduce 11%
13/07/12 16:48:19 INFO mapred.JobClient: map 39% reduce 11%
13/07/12 16:48:22 INFO mapred.JobClient: map 40% reduce 11%
13/07/12 16:48:34 INFO mapred.JobClient: map 40% reduce 13%
13/07/12 16:48:43 INFO mapred.JobClient: map 44% reduce 13%
13/07/12 16:48:46 INFO mapred.JobClient: map 45% reduce 13%
13/07/12 16:48:58 INFO mapred.JobClient: map 45% reduce 15%
13/07/12 16:49:04 INFO mapred.JobClient: map 48% reduce 15%
13/07/12 16:49:07 INFO mapred.JobClient: map 50% reduce 15%
13/07/12 16:49:13 INFO mapred.JobClient: map 50% reduce 16%
13/07/12 16:49:25 INFO mapred.JobClient: map 53% reduce 16%
13/07/12 16:49:28 INFO mapred.JobClient: map 54% reduce 16%
13/07/12 16:49:43 INFO mapred.JobClient: map 59% reduce 18%
13/07/12 16:49:58 INFO mapred.JobClient: map 59% reduce 20%
13/07/12 16:50:04 INFO mapred.JobClient: map 64% reduce 20%
13/07/12 16:50:13 INFO mapred.JobClient: map 64% reduce 21%
13/07/12 16:50:25 INFO mapred.JobClient: map 69% reduce 21%
13/07/12 16:50:43 INFO mapred.JobClient: map 69% reduce 23%
13/07/12 16:50:46 INFO mapred.JobClient: map 73% reduce 23%
13/07/12 16:50:49 INFO mapred.JobClient: map 75% reduce 23%
13/07/12 16:50:58 INFO mapred.JobClient: map 75% reduce 25%
13/07/12 16:51:08 INFO mapred.JobClient: map 78% reduce 25%
13/07/12 16:51:11 INFO mapred.JobClient: map 80% reduce 25%
13/07/12 16:51:23 INFO mapred.JobClient: map 80% reduce 26%
13/07/12 16:51:29 INFO mapred.JobClient: map 83% reduce 26%
13/07/12 16:51:32 INFO mapred.JobClient: map 85% reduce 26%
13/07/12 16:51:44 INFO mapred.JobClient: map 85% reduce 28%
13/07/12 16:51:50 INFO mapred.JobClient: map 89% reduce 28%
13/07/12 16:51:53 INFO mapred.JobClient: map 90% reduce 28%
13/07/12 16:52:14 INFO mapred.JobClient: map 90% reduce 30%
13/07/12 16:52:20 INFO mapred.JobClient: map 95% reduce 30%
13/07/12 16:52:26 INFO mapred.JobClient: map 95% reduce 31%
13/07/12 16:52:49 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_0, Status : FAILED
org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: 文件已存在
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:167)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:312)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: EEXIST: 文件已存在
at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
... 7 more
attempt_201307111633_0009_m_000004_0: Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: Java heap space
attempt_201307111633_0009_m_000004_0: at java.util.Arrays.copyOfRange(Arrays.java:2694)
attempt_201307111633_0009_m_000004_0: at java.lang.String.<init>(String.java:203)
attempt_201307111633_0009_m_000004_0: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread for syncLogs"
13/07/12 16:53:02 INFO mapred.JobClient: map 97% reduce 31%
13/07/12 16:53:05 INFO mapred.JobClient: map 95% reduce 31%
13/07/12 16:53:10 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_1, Status : FAILED
Error: Java heap space
13/07/12 16:53:20 INFO mapred.JobClient: map 96% reduce 31%
13/07/12 16:53:23 INFO mapred.JobClient: map 98% reduce 31%
13/07/12 16:53:26 INFO mapred.JobClient: map 100% reduce 31%
13/07/12 16:53:35 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 16:53:41 INFO mapred.JobClient: Job complete: job_201307111633_0009
13/07/12 16:53:41 INFO mapred.JobClient: Counters: 30
13/07/12 16:53:41 INFO mapred.JobClient: Job Counters
13/07/12 16:53:41 INFO mapred.JobClient: Launched reduce tasks=1
13/07/12 16:53:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1153539
13/07/12 16:53:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 16:53:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 16:53:41 INFO mapred.JobClient: Launched map tasks=25
13/07/12 16:53:41 INFO mapred.JobClient: Data-local map tasks=25
13/07/12 16:53:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=596582
13/07/12 16:53:41 INFO mapred.JobClient: File Input Format Counters
13/07/12 16:53:41 INFO mapred.JobClient: Bytes Read=10399829
13/07/12 16:53:41 INFO mapred.JobClient: File Output Format Counters
13/07/12 16:53:41 INFO mapred.JobClient: Bytes Written=13482
13/07/12 16:53:41 INFO mapred.JobClient: FileSystemCounters
13/07/12 16:53:41 INFO mapred.JobClient: FILE_BYTES_READ=11889
13/07/12 16:53:41 INFO mapred.JobClient: HDFS_BYTES_READ=421848302
13/07/12 16:53:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=497127
13/07/12 16:53:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=13482
13/07/12 16:53:41 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 16:53:41 INFO mapred.JobClient: Map output materialized bytes=12003
13/07/12 16:53:41 INFO mapred.JobClient: Map input records=7532
13/07/12 16:53:41 INFO mapred.JobClient: Reduce shuffle bytes=11395
13/07/12 16:53:41 INFO mapred.JobClient: Spilled Records=460
13/07/12 16:53:41 INFO mapred.JobClient: Map output bytes=377830
13/07/12 16:53:41 INFO mapred.JobClient: Total committed heap usage (bytes)=2999517184
13/07/12 16:53:41 INFO mapred.JobClient: CPU time spent (ms)=293160
13/07/12 16:53:41 INFO mapred.JobClient: Map input bytes=10399829
13/07/12 16:53:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=2273
13/07/12 16:53:41 INFO mapred.JobClient: Combine input records=7532
13/07/12 16:53:41 INFO mapred.JobClient: Reduce input records=230
13/07/12 16:53:41 INFO mapred.JobClient: Reduce input groups=230
13/07/12 16:53:41 INFO mapred.JobClient: Combine output records=230
13/07/12 16:53:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=3793125376
13/07/12 16:53:41 INFO mapred.JobClient: Reduce output records=230
13/07/12 16:53:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8323325952
13/07/12 16:53:41 INFO mapred.JobClient: Map output records=7532
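The job log above reports several failed map attempts, most of them with `Error: Java heap space`, yet the job still completes because Hadoop relaunches failed attempts. A minimal sketch for pulling those attempts out of JobClient output follows; the lines in `LOG` are copied from the transcript, and `failed_attempts` and `int_after` are hypothetical helper names.

```python
import re

# A few JobClient lines copied verbatim from the transcript above.
LOG = """\
13/07/12 16:40:00 INFO mapred.FileInputFormat: Total input paths to process : 20
13/07/12 16:43:56 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000001_0, Status : FAILED
13/07/12 16:44:53 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000002_0, Status : FAILED
13/07/12 16:47:32 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000007_0, Status : FAILED
13/07/12 16:52:49 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_0, Status : FAILED
13/07/12 16:53:10 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_1, Status : FAILED
13/07/12 16:53:41 INFO mapred.JobClient: Launched map tasks=25
"""

# attempt_<jobtracker start>_<job number>_<m|r>_<task index>_<attempt number>
FAILED = re.compile(r"Task Id : attempt_\d+_\d+_([mr])_(\d+)_(\d+), Status : FAILED")


def failed_attempts(log_text):
    """Return (task_kind, task_index, attempt_number) for each FAILED attempt."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in FAILED.finditer(log_text)]


def int_after(name, log_text):
    """Return the integer that follows `name`, separated by ':' or '='."""
    m = re.search(re.escape(name) + r"\s*[:=]\s*(\d+)", log_text)
    return int(m.group(1)) if m else None


failures = failed_attempts(LOG)
input_paths = int_after("Total input paths to process", LOG)
launched_maps = int_after("Launched map tasks", LOG)

print("failed attempts:", failures)
print("input paths:", input_paths, "| launched map tasks:", launched_maps)
print("launched minus failed:", launched_maps - len(failures))
```

Five failed attempts plus 20 input paths equals the 25 launched map tasks, which is consistent with one map task per input path and one relaunch per failure, although the log does not state the split count directly. In Hadoop 1.x the heap of each task JVM is set through the `mapred.child.java.opts` property; whether raising it would remove these failures is not something this log confirms.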
13/07/12 16:53:43 INFO bayes.BayesClassifierDriver: =======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
 381   0   0   0   0   9   2   0   1   0   1   0   1   0   0   0   0   0   3   0  |  398  a = rec.motorcycles
   1 284   0   0   0   1   4   0   6   2  11   0   3  65   0   0   5   0   3  10  |  395  b = comp.windows.x
   1   0 340   3   0   2   6   1   0   0   0   0   1   1  12   0   7   0   2   0  |  376  c = talk.politics.mideast
   4   0   1 330   0   2   2   0   0   2   1   1   3   0   1   3  12   0   2   0  |  364  d = talk.politics.guns
   3   0   4  31  37   6   9   1   0  10   0   0   0   6  93   9   6  36   0   0  |  251  e = talk.religion.misc
   7   0   0   0   0 361   2   2   0   1   3   0   6   1   0   1   0   0  11   1  |  396  f = rec.autos
   0   0   0   0   0   1 383   9   1   0   0   0   0   0   0   0   0   0   3   0  |  397  g = rec.sport.baseball
   1   0   0   0   0   0   8 382   1   0   0   0   2   1   1   0   2   0   1   0  |  399  h = rec.sport.hockey
   1   0   0   0   0   3   3   0 335   4   5   0  10   4   0   0   2   0  10   8  |  385  i = comp.sys.mac.hardware
   0   3   0   0   0   0   1   0   0 367   0   0   5  10   1   3   2   0   2   0  |  394  j = sci.space
   0   0   0   0   0   2   1   0  27   1 300   0  19  11   0   0   0   0  11  20  |  392  k = comp.sys.ibm.pc.hardware
   6   0   2 110   0   6  11   4   1  14   0 104   2   1  11  10  26   1   1   0  |  310  l = talk.politics.misc
   6   0   1   0   0   4   1   0   8   2  16   0 314   9   0   4  15   0   5   8  |  393  m = sci.electronics
   0  13   1   0   0   2   6   0  11   5  11   0  11 304   0   2  10   0   5   8  |  389  n = comp.graphics
   2   0   0   0   0   0   5   1   0   2   1   0   1   3 373   5   0   2   1   2  |  398  o = soc.religion.christian
   3   0   0   1   0   2   3   3   2   3   2   0  12  10   8 337   1   0   9   0  |  396  p = sci.med
   0   1   0   1   0   0   4   0   3   0   1   0   3   8   0   2 370   0   2   1  |  396  q = sci.crypt
   9   0   4  10   1   4   6   1   2   4   2   0   0   2  77  14  12 170   0   1  |  319  r = alt.atheism
   4   0   0   0   0   9   1   1   9   1  12   0   6   3   0   2   0   0 340   2  |  390  s = misc.forsale
   6   5   0   0   0   1   8   0   8   5  50   0   2  39   1   0   8   0   3 258  |  394  t = comp.os.ms-windows.misc
13/07/12 16:53:43 INFO driver.MahoutDriver: Program took 824521 ms (Minutes: 13.742016666666666)
landen@landen-Lenovo:~/文档/20news$
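The confusion matrix is the only accuracy information in this run's output, so the overall and per-class figures have to be derived from it. The sketch below copies the matrix rows into a string and computes accuracy and per-class recall, treating each row as the true class and each column as the predicted class (which matches the `<--Classified as` header and the row totals). The function names are hypothetical helpers, not part of Mahout.

```python
# Rows copied from the confusion matrix printed by BayesClassifierDriver above.
# Each row: counts per predicted class | row total, letter = true class label.
MATRIX = """\
381 0 0 0 0 9 2 0 1 0 1 0 1 0 0 0 0 0 3 0 | 398 a = rec.motorcycles
1 284 0 0 0 1 4 0 6 2 11 0 3 65 0 0 5 0 3 10 | 395 b = comp.windows.x
1 0 340 3 0 2 6 1 0 0 0 0 1 1 12 0 7 0 2 0 | 376 c = talk.politics.mideast
4 0 1 330 0 2 2 0 0 2 1 1 3 0 1 3 12 0 2 0 | 364 d = talk.politics.guns
3 0 4 31 37 6 9 1 0 10 0 0 0 6 93 9 6 36 0 0 | 251 e = talk.religion.misc
7 0 0 0 0 361 2 2 0 1 3 0 6 1 0 1 0 0 11 1 | 396 f = rec.autos
0 0 0 0 0 1 383 9 1 0 0 0 0 0 0 0 0 0 3 0 | 397 g = rec.sport.baseball
1 0 0 0 0 0 8 382 1 0 0 0 2 1 1 0 2 0 1 0 | 399 h = rec.sport.hockey
1 0 0 0 0 3 3 0 335 4 5 0 10 4 0 0 2 0 10 8 | 385 i = comp.sys.mac.hardware
0 3 0 0 0 0 1 0 0 367 0 0 5 10 1 3 2 0 2 0 | 394 j = sci.space
0 0 0 0 0 2 1 0 27 1 300 0 19 11 0 0 0 0 11 20 | 392 k = comp.sys.ibm.pc.hardware
6 0 2 110 0 6 11 4 1 14 0 104 2 1 11 10 26 1 1 0 | 310 l = talk.politics.misc
6 0 1 0 0 4 1 0 8 2 16 0 314 9 0 4 15 0 5 8 | 393 m = sci.electronics
0 13 1 0 0 2 6 0 11 5 11 0 11 304 0 2 10 0 5 8 | 389 n = comp.graphics
2 0 0 0 0 0 5 1 0 2 1 0 1 3 373 5 0 2 1 2 | 398 o = soc.religion.christian
3 0 0 1 0 2 3 3 2 3 2 0 12 10 8 337 1 0 9 0 | 396 p = sci.med
0 1 0 1 0 0 4 0 3 0 1 0 3 8 0 2 370 0 2 1 | 396 q = sci.crypt
9 0 4 10 1 4 6 1 2 4 2 0 0 2 77 14 12 170 0 1 | 319 r = alt.atheism
4 0 0 0 0 9 1 1 9 1 12 0 6 3 0 2 0 0 340 2 | 390 s = misc.forsale
6 5 0 0 0 1 8 0 8 5 50 0 2 39 1 0 8 0 3 258 | 394 t = comp.os.ms-windows.misc
"""


def parse_confusion_matrix(text):
    """Split each row into its integer counts and its class label."""
    labels, rows = [], []
    for line in text.strip().splitlines():
        counts, _, tail = line.partition("|")
        labels.append(tail.split("=", 1)[1].strip())
        rows.append([int(x) for x in counts.split()])
    return labels, rows


def accuracy(rows):
    """Return (correctly classified, total) from the diagonal and the full sum."""
    correct = sum(rows[i][i] for i in range(len(rows)))
    total = sum(sum(row) for row in rows)
    return correct, total


def recall_per_class(labels, rows):
    """Fraction of each true class that landed on the diagonal."""
    return {label: row[i] / sum(row)
            for i, (label, row) in enumerate(zip(labels, rows))}


labels, rows = parse_confusion_matrix(MATRIX)
correct, total = accuracy(rows)
print(f"accuracy: {correct}/{total} = {correct / total:.2%}")  # 6070/7532 = 80.59%

worst = sorted(recall_per_class(labels, rows).items(), key=lambda kv: kv[1])[:3]
for label, recall in worst:
    print(f"{label}: recall {recall:.2%}")
```

The total of 7532 matches the `Map input records=7532` counter. The three weakest classes by recall are `talk.religion.misc`, `talk.politics.misc` and `alt.atheism`; the matrix shows them being absorbed mostly by `soc.religion.christian` and `talk.politics.guns`.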