Learning Mahout (Part 1)

Official Mahout download (mirror): http://apache.fayea.com/apache-mirror/mahout/

Environment: Ubuntu 12.04, Hadoop 1.2.1, Mahout 0.9, 2 GB of memory

1 First, unpack the tarball

tar -zxvf /mnt/hgfs/mnt/mahout-distribution-0.9.tar.gz -C /opt/hadoop/

2 Add the environment variables

export HADOOP_HOME=/opt/hadoop/hadoop-1.2.

export HADOOP_CONF_DIR=${HADOOP_HOME}/conf

export MAHOUT_HOME=/opt/hadoop/mahout-distribution-0.9

You can also add these new environment variables to your ~/.bashrc file so they persist across sessions.
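As a minimal sketch of that step, the helper below appends a line to a file only when the exact line is not already present, so running it twice does not pile up duplicate exports. The name append_once is hypothetical (it is not part of Mahout or Hadoop), and the example calls are left as comments so nothing is written until you run them yourself.

```shell
# append_once LINE FILE
# Hypothetical helper: add LINE to FILE unless an identical line already exists.
append_once() {
    line=$1
    file=$2
    touch "$file"    # create the file if it does not exist yet
    # -x: match the whole line, -F: treat LINE as a fixed string, not a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (uncomment to use; add the HADOOP_HOME line with your own path first,
# because HADOOP_CONF_DIR refers to it):
# append_once 'export HADOOP_CONF_DIR=${HADOOP_HOME}/conf' "$HOME/.bashrc"
# append_once 'export MAHOUT_HOME=/opt/hadoop/mahout-distribution-0.9' "$HOME/.bashrc"
```

Open a new shell, or run source ~/.bashrc, for the variables to take effect.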

3 Start your Hadoop services. I won't repeat the steps here; see: http://www.cnblogs.com/chenfool/p/3574789.html

4 Run mahout

cd /opt/hadoop/mahout-distribution-0.9

bin/mahout --help

It fails with the following error:

Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.

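Open bin/mahout in vi and search for JAVA_HEAP_MAX=-X

The value turns out to be hard-coded: JAVA_HEAP_MAX=-Xmx3g

Seriously, what machine can casually hand over 3 GB of heap? With only 2 GB of memory here, change it to JAVA_HEAP_MAX=-Xmx1g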

Then search for mapred.map.child.java.opts and mapred.reduce.child.java.opts: both are set to 4096m, which leaves a low-end machine no chance.
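To make the mismatch concrete, here is a rough sanity check that converts a JVM size such as 3g or 4096m to megabytes and compares it with a memory budget. The function names heap_to_mb and fits_in_memory are hypothetical, and the comparison is only an approximation: whether the JVM can actually reserve a heap also depends on things like a 32-bit versus 64-bit JVM and what else is running.

```shell
# heap_to_mb SIZE
# Convert a JVM-style size (e.g. 3g, 4096m, 512k, or plain bytes) to whole megabytes.
heap_to_mb() {
    value=$1
    number=${value%[kKmMgG]}    # strip a trailing unit letter, if any
    case $value in
        *[gG]) echo $((number * 1024)) ;;
        *[mM]) echo "$number" ;;
        *[kK]) echo $((number / 1024)) ;;
        *)     echo $((number / 1024 / 1024)) ;;    # no suffix: value is in bytes
    esac
}

# fits_in_memory SIZE TOTAL_MB
# Succeeds only if SIZE is strictly smaller than TOTAL_MB megabytes.
fits_in_memory() {
    [ "$(heap_to_mb "$1")" -lt "$2" ]
}

# Example for the 2 GB machine used in this post (2048 MB):
# fits_in_memory 3g 2048    || echo "3g is too large"
# fits_in_memory 4096m 2048 || echo "4096m is too large"
# fits_in_memory 1g 2048    && echo "1g fits"
```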

Adjust these values to suit your own machine, then save and quit.
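If you would rather script the heap change than edit the file by hand, the following sketch does the same substitution with sed and keeps a backup. The function name set_heap_max is hypothetical, and it assumes the setting appears on its own line in the form JAVA_HEAP_MAX=-Xmx3g, as seen above; it does not touch the mapred.*.child.java.opts values.

```shell
# set_heap_max FILE SIZE
# Hypothetical helper: rewrite the JAVA_HEAP_MAX=-Xmx... line in FILE to use SIZE
# (e.g. 1g). The original file is kept as FILE.bak.
set_heap_max() {
    file=$1
    size=$2
    cp "$file" "$file.bak" &&
    # Writing back through a redirect keeps FILE's permissions (its executable bit).
    sed "s/^\([[:space:]]*\)JAVA_HEAP_MAX=-Xmx[0-9][0-9]*[kKmMgG]/\1JAVA_HEAP_MAX=-Xmx$size/" \
        "$file.bak" > "$file"
}

# Example (run from the Mahout install directory):
# set_heap_max bin/mahout 1g
# grep -n 'JAVA_HEAP_MAX=-X' bin/mahout    # confirm the new value
```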

Run it again:

bin/mahout --help

arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence

This listing of the available programs shows that the Mahout environment has been deployed successfully.
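The same check can be scripted. The sketch below reads the help output on standard input and succeeds only if every program name you pass appears in the listing, in the "name: : description" form shown above. The function name has_programs is hypothetical, and which names to check is up to you.

```shell
# has_programs NAME...
# Hypothetical helper: read `bin/mahout --help` output on stdin and succeed only
# if each NAME appears as a "NAME: : description" entry.
has_programs() {
    listing=$(cat)
    for name in "$@"; do
        printf '%s\n' "$listing" | grep -q "^[[:space:]]*$name: : " || return 1
    done
}

# Example (run from the Mahout install directory):
# bin/mahout --help 2>&1 | has_programs kmeans canopy seq2sparse && echo "Mahout looks OK"
```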

References:

http://blog.sina.com.cn/s/blog_916b71bb0101jq44.html

http://samchu.logdown.com/posts/192574-mahout-09-installation-verification-records
