Reference: http://www.360doc.com/content/14/0117/09/1200324_345883534.shtml

Precondition: start the Hadoop cluster

bin/hdfs namenode -format (formats the NameNode; first-time setup only, since reformatting wipes the existing HDFS metadata)

sbin/start-dfs.sh (starts the NameNode, the DataNodes, and related daemons)

sbin/start-yarn.sh (starts the ResourceManager and the NodeManagers)

bin/hdfs dfsadmin -safemode leave (leaves safe mode)

Note: every input file and input directory passed to bin/mahout must be a path in HDFS.

In addition, Mahout only processes files in SequenceFile format, so a txt file must first be converted to a SequenceFile, as follows:

kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0$  bin/mahout seqdirectory --input /TagOutput/txtFile.txt --output /TagOutput/seqFile.txt --charset UTF-8

The steps in practice are as follows:

(Convert the SequenceFile into a readable txt file; -i is the input path, -o the output path)

kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0$  bin/mahout seqdumper -i /TagOutput/clusteredPoints/part-m-00000 -o ./TagOutput/clusterPoints.txt
Running on hadoop, using /home/kelvin/UntarFile/hadoop2CDH4//bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/kelvin/UntarFile/mahout-0.7-cdh4.5.0/mahout-examples-0.7-cdh4.5.0-job.jar
14/06/06 02:18:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/TagOutput/clusteredPoints/part-m-00000], --output=[./TagOutput/clusterPoints.txt], --startPhase=[0], --tempDir=[temp]}
14/06/06 02:18:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/06 02:18:08 INFO driver.MahoutDriver: Program took 1162 ms (Minutes: 0.019366666666666667)
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0$ cd TagOutput/
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0/TagOutput$ ll
total 12
drwxrwxr-x  2 kelvin kelvin 4096  6月 6 02:12 ./
drwxr-xr-x 16 kelvin kelvin 4096  6月 6 02:15 ../
-rw-rw-r--  1 kelvin kelvin 2767  6月 6 02:18 clusterPoints.txt
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0/TagOutput$ cat clusterPoints.txt
Input Path: /TagOutput/clusteredPoints/part-m-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable
Key: 14: Value: wt: 1.0 distance: 13.357142857142858  vec: 2012000317 = [24.000]
Key: 14: Value: wt: 1.0 distance: 6.642857142857143  vec: 2012000318 = [4.000]
Key: 25: Value: wt: 1.0 distance: 11.615384615384592  vec: 2012000319 = [56.000]
Key: 14: Value: wt: 1.0 distance: 7.642857142857142  vec: 2012000320 = [3.000]
Key: 14: Value: wt: 1.0 distance: 11.357142857142858  vec: 2012000321 = [22.000]
Key: 25: Value: wt: 1.0 distance: 0.6153846153846132  vec: 2012000322 = [45.000]
Key: 14: Value: wt: 1.0 distance: 13.357142857142858  vec: 2012000323 = [24.000]
Key: 14: Value: wt: 1.0 distance: 6.642857142857143  vec: 2012000324 = [4.000]
Key: 25: Value: wt: 1.0 distance: 11.615384615384592  vec: 2012000325 = [56.000]
Key: 14: Value: wt: 1.0 distance: 7.642857142857142  vec: 2012000326 = [3.000]
Key: 14: Value: wt: 1.0 distance: 11.357142857142858  vec: 2012000327 = [22.000]
Key: 25: Value: wt: 1.0 distance: 2.384615384615403  vec: 2012000328 = [42.000]
Key: 25: Value: wt: 1.0 distance: 4.61538461538464  vec: 2012000329 = [49.000]
Key: 25: Value: wt: 1.0 distance: 3.384615384615356  vec: 2012000330 = [41.000]
Key: 14: Value: wt: 1.0 distance: 5.642857142857143  vec: 2012000331 = [5.000]
Key: 14: Value: wt: 1.0 distance: 7.642857142857142  vec: 2012000332 = [3.000]
Key: 14: Value: wt: 1.0 distance: 10.357142857142858  vec: 2012000333 = [21.000]
Key: 25: Value: wt: 1.0 distance: 10.38461538461539  vec: 2012000334 = [34.000]
Key: 25: Value: wt: 1.0 distance: 15.384615384615387  vec: 2012000335 = [29.000]
Key: 25: Value: wt: 1.0 distance: 1.3846153846153868  vec: 2012000336 = [43.000]
Key: 25: Value: wt: 1.0 distance: 9.615384615384606  vec: 2012000337 = [54.000]
Key: 25: Value: wt: 1.0 distance: 8.384615384615397  vec: 2012000338 = [36.000]
Key: 14: Value: wt: 1.0 distance: 8.642857142857142  vec: 2012000339 = [2.000]
Key: 14: Value: wt: 1.0 distance: 5.642857142857143  vec: 2012000340 = [5.000]
Key: 20: Value: wt: 1.0 distance: 1.7999999999999972  vec: 2012000341 = [78.000]
Key: 25: Value: wt: 1.0 distance: 9.615384615384606  vec: 2012000342 = [54.000]
Key: 20: Value: wt: 1.0 distance: 13.800000000000018  vec: 2012000343 = [66.000]
Key: 14: Value: wt: 1.0 distance: 3.6428571428571423  vec: 2012000344 = [7.000]
Key: 20: Value: wt: 1.0 distance: 29.200000000000053  vec: 2012000345 = [109.000]
Key: 20: Value: wt: 1.0 distance: 11.800000000000068  vec: 2012000346 = [68.000]
Key: 20: Value: wt: 1.0 distance: 1.7999999999999972  vec: 2012000347 = [78.000]
Key: 25: Value: wt: 1.0 distance: 6.384615384615375  vec: 2012000348 = [38.000]
Count: 32
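Each "Key:" line above carries the cluster id first and the point name after "vec:", so the dump can be regrouped by cluster with a few lines of code. The following is only a sketch in plain Java, with no Mahout dependency: the class SeqDumpGrouper and its group method are hypothetical names, and the regular expression assumes the exact line layout shown in this dump.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Mahout): groups seqdumper text lines by cluster id.
class SeqDumpGrouper {
    // Assumed layout, copied from the dump above:
    // Key: 14: Value: wt: 1.0 distance: 13.357142857142858  vec: 2012000317 = [24.000]
    private static final Pattern LINE = Pattern.compile(
        "^Key: (\\d+): Value: wt: \\S+ distance: \\S+\\s+vec: (\\S+) = \\[[^\\]]+\\]$");

    // Returns cluster id -> point names, in the order the points appear.
    static Map<Integer, List<String>> group(List<String> lines) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip non-point lines such as "Input Path:" and "Count:"
            }
            int clusterId = Integer.parseInt(m.group(1));
            byCluster.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(m.group(2));
        }
        return byCluster;
    }

    public static void main(String[] args) {
        // Three lines taken from the dump above.
        List<String> sample = Arrays.asList(
            "Key: 14: Value: wt: 1.0 distance: 13.357142857142858  vec: 2012000317 = [24.000]",
            "Key: 14: Value: wt: 1.0 distance: 6.642857142857143  vec: 2012000318 = [4.000]",
            "Key: 25: Value: wt: 1.0 distance: 11.615384615384592  vec: 2012000319 = [56.000]");
        System.out.println(group(sample));
    }
}
```

Run against the full clusterPoints.txt, the per-cluster list sizes should add up to the 32 reported by "Count:".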
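
(Dump the clusters together with their points. --input is a directory, not a file: the clusters from the final iteration. --pointsDir holds the points from the final clustering pass.)

kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0$  bin/mahout clusterdump --input /TagOutput/*final --pointsDir /TagOutput/clusteredPoints --output ./TagOutput/clusterResult.txt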
Running on hadoop, using /home/kelvin/UntarFile/hadoop2CDH4//bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/kelvin/UntarFile/mahout-0.7-cdh4.5.0/mahout-examples-0.7-cdh4.5.0-job.jar
14/06/06 02:50:11 INFO common.AbstractJob: Command line arguments: {--dictionaryType=[text], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/TagOutput/*final], --output=[./TagOutput/clusterResult.txt], --outputFormat=[TEXT], --pointsDir=[/TagOutput/clusteredPoints], --startPhase=[0], --tempDir=[temp]}
14/06/06 02:50:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/06 02:50:14 INFO clustering.ClusterDumper: Wrote 3 clusters
14/06/06 02:50:14 INFO driver.MahoutDriver: Program took 2809 ms (Minutes: 0.046816666666666666)
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0$ cd TagOutput/
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0/TagOutput$ ls
clusterPoints.txt  clusterResult.txt
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0/TagOutput$ ll
total 16
drwxrwxr-x  2 kelvin kelvin 4096  6月 6 02:30 ./
drwxr-xr-x 16 kelvin kelvin 4096  6月 6 02:15 ../
-rw-rw-r--  1 kelvin kelvin 2767  6月 6 02:18 clusterPoints.txt
-rw-rw-r--  1 kelvin kelvin 2108  6月 6 02:50 clusterResult.txt
kelvin@Master:~/UntarFile/mahout-0.7-cdh4.5.0/TagOutput$ cat clusterResult.txt
VL-14{n=14 c=[10.643] r=[9.013]}
        Weight : [props - optional]:  Point:
        1.0 : [distance=13.357142857142858]: 2012000317 = [24.000]
        1.0 : [distance=6.642857142857143]: 2012000318 = [4.000]
        1.0 : [distance=7.642857142857142]: 2012000320 = [3.000]
        1.0 : [distance=11.357142857142858]: 2012000321 = [22.000]
        1.0 : [distance=13.357142857142858]: 2012000323 = [24.000]
        1.0 : [distance=6.642857142857143]: 2012000324 = [4.000]
        1.0 : [distance=7.642857142857142]: 2012000326 = [3.000]
        1.0 : [distance=11.357142857142858]: 2012000327 = [22.000]
        1.0 : [distance=5.642857142857143]: 2012000331 = [5.000]
        1.0 : [distance=7.642857142857142]: 2012000332 = [3.000]
        1.0 : [distance=10.357142857142858]: 2012000333 = [21.000]
        1.0 : [distance=8.642857142857142]: 2012000339 = [2.000]
        1.0 : [distance=5.642857142857143]: 2012000340 = [5.000]
        1.0 : [distance=3.6428571428571423]: 2012000344 = [7.000]
VL-20{n=5 c=[79.800] r=[15.419]}
        Weight : [props - optional]:  Point:
        1.0 : [distance=1.7999999999999972]: 2012000341 = [78.000]
        1.0 : [distance=13.800000000000018]: 2012000343 = [66.000]
        1.0 : [distance=29.200000000000053]: 2012000345 = [109.000]
        1.0 : [distance=11.800000000000068]: 2012000346 = [68.000]
        1.0 : [distance=1.7999999999999972]: 2012000347 = [78.000]
VL-25{n=13 c=[44.385] r=[8.553]}
        Weight : [props - optional]:  Point:
        1.0 : [distance=11.615384615384592]: 2012000319 = [56.000]
        1.0 : [distance=0.6153846153846132]: 2012000322 = [45.000]
        1.0 : [distance=11.615384615384592]: 2012000325 = [56.000]
        1.0 : [distance=2.384615384615403]: 2012000328 = [42.000]
        1.0 : [distance=4.61538461538464]: 2012000329 = [49.000]
        1.0 : [distance=3.384615384615356]: 2012000330 = [41.000]
        1.0 : [distance=10.38461538461539]: 2012000334 = [34.000]
        1.0 : [distance=15.384615384615387]: 2012000335 = [29.000]
        1.0 : [distance=1.3846153846153868]: 2012000336 = [43.000]
        1.0 : [distance=9.615384615384606]: 2012000337 = [54.000]
        1.0 : [distance=8.384615384615397]: 2012000338 = [36.000]
        1.0 : [distance=9.615384615384606]: 2012000342 = [54.000]
        1.0 : [distance=6.384615384615375]: 2012000348 = [38.000]
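The numbers in this dump can be sanity-checked by hand. For this one-dimensional data, the c value of each cluster appears to be the arithmetic mean of its member points, and each distance appears to be the absolute difference between the point and that mean. This is an observation about the data shown here, not a statement about how Mahout computes these fields internally. A minimal sketch of the check for VL-20, in plain Java (CentroidCheck and its methods are hypothetical names):

```java
// Hypothetical check (plain Java, no Mahout dependency): recompute a centroid
// and the point distances from the values printed in the dump above.
class CentroidCheck {
    // Arithmetic mean of the 1-D point values in a cluster.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Distance between a 1-D point and the centroid (absolute difference).
    static double distance(double point, double centroid) {
        return Math.abs(point - centroid);
    }

    public static void main(String[] args) {
        // Values of the five points listed under VL-20.
        double[] vl20 = {78, 66, 109, 68, 78};
        double c = mean(vl20);
        System.out.printf("c=%.3f%n", c);
        for (double v : vl20) {
            System.out.printf("%.0f -> %.3f%n", v, distance(v, c));
        }
    }
}
```

The same check on VL-14 (14 points summing to 149) and VL-25 (13 points summing to 577) gives means of about 10.643 and 44.385, which agree with the c values printed for those clusters.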

The output above is the same as what the ClusterDumper call in the following code, written in Eclipse, produces:

public static void run(Configuration conf, Path input, Path output, DistanceMeasure measure, double t1, double t2,
    double convergenceDelta, int maxIterations) throws Exception {
  // Convert the raw input into vectors under the output directory.
  Path directoryContainingConvertedInput = new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);
  log.info("Preparing Input");
  TagInputDriver.runJob(input, directoryContainingConvertedInput, "org.apache.mahout.math.RandomAccessSparseVector");
  // Canopy clustering supplies the initial clusters for k-means.
  log.info("Running Canopy to get initial clusters");
  Path canopyOutput = new Path(output, "canopies");
  CanopyDriver.run(new Configuration(), directoryContainingConvertedInput, canopyOutput, measure, t1, t2, false, 0.0,
      false);
  log.info("Running KMeans");
  TagKMeansDriver.run(conf, directoryContainingConvertedInput, new Path(canopyOutput, Cluster.INITIAL_CLUSTERS_DIR
      + "-final"), output, convergenceDelta, maxIterations, true, 0.0, false);
  // Run ClusterDumper on the final clusters and the clustered points.
  ClusterDumper clusterDumper = new ClusterDumper(new Path(output, "clusters-*-final"), new Path(output,
      "clusteredPoints"));
  clusterDumper.printClusters(null);
}
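
To make the "Running KMeans" step less of a black box, here is a minimal sketch of the assign-and-update loop that k-means performs, written in plain Java for one-dimensional values. It is not Mahout's implementation and it skips the canopy step: the class OneDimKMeans, its method names, the initial centers, and the sample data are all made up for illustration. The parameters convergenceDelta and maxIterations echo the names used in run above, but how Mahout applies them internally is an assumption here.

```java
import java.util.Arrays;

// Hypothetical illustration (not Mahout code): k-means on 1-D points.
class OneDimKMeans {
    // Index of the center closest to p; ties go to the lower index.
    static int nearest(double[] centers, double p) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (Math.abs(p - centers[k]) < Math.abs(p - centers[best])) {
                best = k;
            }
        }
        return best;
    }

    // Repeats assign/update until no center moves by more than
    // convergenceDelta, or maxIterations passes have run.
    static double[] run(double[] points, double[] initialCenters,
                        double convergenceDelta, int maxIterations) {
        double[] centers = initialCenters.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            // Assignment step: each point joins its nearest center.
            for (double p : points) {
                int k = nearest(centers, p);
                sum[k] += p;
                count[k]++;
            }
            // Update step: each center moves to the mean of its points.
            double maxShift = 0.0;
            for (int k = 0; k < centers.length; k++) {
                if (count[k] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double updated = sum[k] / count[k];
                maxShift = Math.max(maxShift, Math.abs(updated - centers[k]));
                centers[k] = updated;
            }
            if (maxShift <= convergenceDelta) {
                break;
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centers = run(points, new double[] {0, 15}, 0.001, 10);
        System.out.println(Arrays.toString(centers));
    }
}
```

In the pipeline above, the initial centers come from the canopy output rather than being chosen by hand as they are in this toy example.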

Other prerequisite steps:

kelvin@Master:~/UntarFile/hadoop2CDH4$ bin/hadoop fs -put ./../mahout-0.7-cdh4.5.0/output/* /TagOutput
14/06/06 01:53:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
kelvin@Master:~/UntarFile/hadoop2CDH4$ bin/hadoop fs -ls /TagOutput
14/06/06 01:53:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 15 items
-rw-r--r--   1 kelvin supergroup        194 2014-06-06 01:53 /TagOutput/_policy
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusteredPoints
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-0
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-1
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-10-final
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-2
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-3
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-4
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-5
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-6
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-7
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-8
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/clusters-9
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/data
drwxr-xr-x   - kelvin supergroup          0 2014-06-06 01:53 /TagOutput/random-seeds
