Before installing Hama, first make sure Hadoop is already installed on the system; the cluster used here runs hadoop-2.3.0.

I. Download and extract Hama

  Download from http://www.apache.org/dyn/closer.cgi/hama; this guide uses the latest release at the time of writing, hama-0.6.4. Extract it to a location of your choice.
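A minimal sketch of the download-and-extract step; the mirror URL and archive name below are assumptions, so take the actual link from the closer.cgi page:

wget http://archive.apache.org/dist/hama/hama-0.6.4/hama-0.6.4.tar.gz
tar -xzf hama-0.6.4.tar.gz -C /usr/local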

II. Edit the configuration files

  1. Add the JAVA_HOME variable to hama-env.sh, e.g. export JAVA_HOME=/usr/java/jdk1.7.0 (a placeholder path; in distributed mode, set it to each machine's own value).
  2. Configure hama-site.xml (in distributed mode, the configuration is identical on all machines).

bsp.master.address is the address of the BSP master. fs.default.name must be set to the address of the Hadoop NameNode. hama.zookeeper.quorum and hama.zookeeper.property.clientPort are ZooKeeper-related: point them at the ZooKeeper quorum servers; in single-machine pseudo-distributed mode that is simply the local host. For example:
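As a minimal sketch, a pseudo-distributed hama-site.xml might look like this; the host names and ports are placeholders (40000 is Hama's default BSP master port, 2181 is ZooKeeper's default client port), so substitute your own NameNode and ZooKeeper addresses:

<configuration>
  <property>
    <name>bsp.master.address</name>
    <value>localhost:40000</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hama.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hama.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>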

  3. Configure the groomservers file. Hama has a master/slave structure similar to Hadoop's, and this file holds the IP addresses of the slave nodes, one per line. (In distributed mode it only needs to be configured on the machine where the BSPMaster runs.) For example:
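A cluster with two slave nodes would have a groomservers file containing nothing but their addresses (the IPs below are placeholders):

192.168.1.101
192.168.1.102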

  4. The Hadoop core jar bundled with hama-0.6.4 is version 1.2.0, which conflicts with the cluster's hadoop-2.3.0 and must be replaced: locate hadoop-core-2.3.0*.jar and hadoop-test-2.3.0*.jar under Hadoop's lib folder, copy them into Hama's lib directory, and delete hadoop-core-1.2.0.jar and hadoop-test-1.2.0.jar, as sketched below.
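Assuming HAMA_HOME and HADOOP_HOME point at the two installations, the swap could look like this sketch (jar locations differ between Hadoop distributions, so verify the paths first):

cd $HAMA_HOME/lib
rm hadoop-core-1.2.0.jar hadoop-test-1.2.0.jar
cp $HADOOP_HOME/lib/hadoop-core-2.3.0*.jar .
cp $HADOOP_HOME/lib/hadoop-test-2.3.0*.jar .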


  5. You may now get "class not found" errors, in which case the missing jars must be added (copy the jars whose names begin with hadoop, together with protobuf-java-2.5.0.jar, into hama/lib).

III. Writing a Hama job

Create a new Java Project in Eclipse and add all the jars required by the Hama installation to the project's build path.

The Pi-estimation example from the official site:

package pi;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPJob;
import org.apache.hama.bsp.BSPJobClient;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.ClusterStatus;
import org.apache.hama.bsp.FileOutputFormat;
import org.apache.hama.bsp.NullInputFormat;
import org.apache.hama.bsp.TextOutputFormat;
import org.apache.hama.bsp.sync.SyncException;

public class PiEstimator {

  private static Path TMP_OUTPUT = new Path("/tmp/pi-" + System.currentTimeMillis());

  public static class MyEstimator extends
      BSP<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> {

    public static final Log LOG = LogFactory.getLog(MyEstimator.class);
    private String masterTask;
    private static final int iterations = 100000;

    @Override
    public void bsp(
        BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
        throws IOException, SyncException, InterruptedException {
      // First round: every peer samples random points and sends its estimate to the master.
      int in = 0;
      for (int i = 0; i < iterations; i++) {
        double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
        if (Math.sqrt(x * x + y * y) < 1.0) {
          in++;
        }
      }
      double data = 4.0 * in / iterations;
      peer.send(masterTask, new DoubleWritable(data));
      peer.sync();

      if (peer.getPeerName().equals(masterTask)) {
        double pi = 0.0;
        int numPeers = peer.getNumCurrentMessages();
        DoubleWritable received;
        while ((received = peer.getCurrentMessage()) != null) {
          pi += received.get();
        }
        pi = pi / numPeers;
        peer.write(new Text("Estimated value1 of PI is"), new DoubleWritable(pi));
      }
      peer.sync();

      // Second round: repeat the estimate across another pair of supersteps.
      int in2 = 0;
      for (int i = 0; i < iterations; i++) {
        double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
        if (Math.sqrt(x * x + y * y) < 1.0) {
          in2++;
        }
      }
      double data2 = 4.0 * in2 / iterations;
      peer.send(masterTask, new DoubleWritable(data2));
      peer.sync();

      if (peer.getPeerName().equals(masterTask)) {
        double pi2 = 0.0;
        int numPeers = peer.getNumCurrentMessages();
        DoubleWritable received;
        while ((received = peer.getCurrentMessage()) != null) {
          pi2 += received.get();
        }
        pi2 = pi2 / numPeers;
        peer.write(new Text("Estimated value2 of PI is"), new DoubleWritable(pi2));
      }
      peer.sync();
    }

    @Override
    public void setup(
        BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
        throws IOException {
      // Choose one as a master
      this.masterTask = peer.getPeerName(peer.getNumPeers() / 2);
    }

    @Override
    public void cleanup(
        BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
        throws IOException {
      // if (peer.getPeerName().equals(masterTask)) {
      // double pi = 0.0;
      // int numPeers = peer.getNumCurrentMessages();
      // DoubleWritable received;
      // while ((received = peer.getCurrentMessage()) != null) {
      // pi += received.get();
      // }
      //
      // pi = pi / numPeers;
      // peer.write(new Text("Estimated value of PI is"),
      // new DoubleWritable(pi));
      // }
    }
  }

  static void printOutput(HamaConfiguration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] files = fs.listStatus(TMP_OUTPUT);
    for (int i = 0; i < files.length; i++) {
      if (files[i].getLen() > 0) {
        FSDataInputStream in = fs.open(files[i].getPath());
        IOUtils.copyBytes(in, System.out, conf, false);
        in.close();
        break;
      }
    }
    fs.delete(TMP_OUTPUT, true);
  }

  public static void main(String[] args) throws InterruptedException,
      IOException, ClassNotFoundException {
    // BSP job configuration
    HamaConfiguration conf = new HamaConfiguration();
    BSPJob bsp = new BSPJob(conf, PiEstimator.class);
    // Set the job name
    bsp.setJobName("Pi Estimation Example");
    bsp.setBspClass(MyEstimator.class);
    bsp.setInputFormat(NullInputFormat.class);
    bsp.setOutputKeyClass(Text.class);
    bsp.setOutputValueClass(DoubleWritable.class);
    bsp.setOutputFormat(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(bsp, TMP_OUTPUT);

    BSPJobClient jobClient = new BSPJobClient(conf);
    ClusterStatus cluster = jobClient.getClusterStatus(true);
    if (args.length > 0) {
      bsp.setNumBspTask(Integer.parseInt(args[0]));
    } else {
      // Set to maximum
      bsp.setNumBspTask(cluster.getMaxTasks());
    }

    long startTime = System.currentTimeMillis();
    if (bsp.waitForCompletion(true)) {
      printOutput(conf);
      System.out.println("Job Finished in "
          + (System.currentTimeMillis() - startTime) / 1000.0 + " seconds");
    }
  }
}


Export the project as a jar file and ship it to the cluster. Run it with:

$HAMA_HOME/bin/hama jar jarName.jar
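If the jar's manifest does not name a main class, pass the class explicitly; an optional trailing argument sets the number of BSP tasks, as read by main() in the listing above (the package and class names follow that listing):

$HAMA_HOME/bin/hama jar jarName.jar pi.PiEstimator 3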

Output:

Current supersteps number: 0

Current supersteps number: 4

The total number of supersteps: 4  (total count of supersteps)

Counters: 8  (eight counters in total, shown below; a complete list of all Hama counters is still to be filled in)

  org.apache.hama.bsp.JobInProgress$JobCounter

    SUPERSTEPS=4  (number of supersteps seen by the BSPMaster)

    LAUNCHED_TASKS=3  (number of launched tasks)

  org.apache.hama.bsp.BSPPeerImpl$PeerCounter

    SUPERSTEP_SUM=12  (total supersteps over all tasks, i.e. number of tasks × BSPMaster supersteps)

    MESSAGE_BYTES_TRANSFERED=48  (bytes of messages transferred)

    TIME_IN_SYNC_MS=657  (time spent in synchronization, in ms)

    TOTAL_MESSAGES_SENT=6  (messages sent)

    TOTAL_MESSAGES_RECEIVED=6  (messages received)

    TASK_OUTPUT_RECORDS=2  (task output records)

The PageRank example:

package pi;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.HashPartitioner;
import org.apache.hama.bsp.TextOutputFormat;
import org.apache.hama.graph.AverageAggregator;
import org.apache.hama.graph.Edge;
import org.apache.hama.graph.GraphJob;
import org.apache.hama.graph.Vertex;
import org.apache.hama.graph.VertexInputReader;

/**
 * Real pagerank with dangling node contribution.
 */
public class PageRank {

  public static class PageRankVertex extends
      Vertex<Text, NullWritable, DoubleWritable> {

    static double DAMPING_FACTOR = 0.85;
    static double MAXIMUM_CONVERGENCE_ERROR = 0.001;

    @Override
    public void setup(HamaConfiguration conf) {
      String val = conf.get("hama.pagerank.alpha");
      if (val != null) {
        DAMPING_FACTOR = Double.parseDouble(val);
      }
      val = conf.get("hama.graph.max.convergence.error");
      if (val != null) {
        MAXIMUM_CONVERGENCE_ERROR = Double.parseDouble(val);
      }
    }

    @Override
    public void compute(Iterable<DoubleWritable> messages) throws IOException {
      // initialize this vertex to 1 / count of global vertices in this graph
      if (this.getSuperstepCount() == 0) {
        this.setValue(new DoubleWritable(1.0 / this.getNumVertices()));
      } else if (this.getSuperstepCount() >= 1) {
        double sum = 0;
        for (DoubleWritable msg : messages) {
          sum += msg.get();
        }
        double alpha = (1.0d - DAMPING_FACTOR) / this.getNumVertices();
        this.setValue(new DoubleWritable(alpha + (sum * DAMPING_FACTOR)));
      }

      // if we have not reached our global error yet, then proceed.
      DoubleWritable globalError = this.getAggregatedValue(0);
      if (globalError != null && this.getSuperstepCount() > 2
          && MAXIMUM_CONVERGENCE_ERROR > globalError.get()) {
        voteToHalt();
        return;
      }

      // in each superstep we are going to send a new rank to our neighbours
      sendMessageToNeighbors(new DoubleWritable(this.getValue().get()
          / this.getEdges().size()));
    }
  }

  public static GraphJob createJob(String[] args, HamaConfiguration conf)
      throws IOException {
    GraphJob pageJob = new GraphJob(conf, PageRank.class);
    pageJob.setJobName("Pagerank");

    pageJob.setVertexClass(PageRankVertex.class);
    pageJob.setInputPath(new Path(args[0]));
    pageJob.setOutputPath(new Path(args[1]));

    // set the defaults
    pageJob.setMaxIteration(30);
    pageJob.set("hama.pagerank.alpha", "0.85");
    // reference vertices to itself, because we don't have a dangling node
    // contribution here
    pageJob.set("hama.graph.self.ref", "true");
    pageJob.set("hama.graph.max.convergence.error", "1");

    if (args.length == 3) {
      pageJob.setNumBspTask(Integer.parseInt(args[2]));
    }

    // error
    pageJob.setAggregatorClass(AverageAggregator.class);

    // Vertex reader
    pageJob.setVertexInputReaderClass(PagerankTextReader.class);

    pageJob.setVertexIDClass(Text.class);
    pageJob.setVertexValueClass(DoubleWritable.class);
    pageJob.setEdgeValueClass(NullWritable.class);

    pageJob.setPartitioner(HashPartitioner.class);
    pageJob.setOutputFormat(TextOutputFormat.class);
    pageJob.setOutputKeyClass(Text.class);
    pageJob.setOutputValueClass(DoubleWritable.class);
    return pageJob;
  }

  private static void printUsage() {
    System.out.println("Usage: <input> <output> [tasks]");
    System.exit(-1);
  }

  public static class PagerankTextReader extends
      VertexInputReader<LongWritable, Text, Text, NullWritable, DoubleWritable> {

    @Override
    public boolean parseVertex(LongWritable key, Text value,
        Vertex<Text, NullWritable, DoubleWritable> vertex) throws Exception {
      // Each line: the vertex id, then its outgoing neighbours, tab-separated.
      String[] split = value.toString().split("\t");
      for (int i = 0; i < split.length; i++) {
        if (i == 0) {
          vertex.setVertexID(new Text(split[i]));
        } else {
          vertex.addEdge(new Edge<Text, NullWritable>(new Text(split[i]), null));
        }
      }
      return true;
    }
  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {
    if (args.length < 2)
      printUsage();

    HamaConfiguration conf = new HamaConfiguration(new Configuration());
    GraphJob pageJob = createJob(args, conf);

    long startTime = System.currentTimeMillis();
    if (pageJob.waitForCompletion(true)) {
      System.out.println("Job Finished in "
          + (System.currentTimeMillis() - startTime) / 1000.0 + " seconds");
    }
  }
}
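PagerankTextReader expects one vertex per line: the vertex ID first, then its outgoing neighbours, all tab-separated. A toy input file might therefore look like this (the site names are made up):

site1	site2	site3
site2	site3
site3	site1

The job can then be submitted with, for example:

$HAMA_HOME/bin/hama jar jarName.jar pi.PageRank /tmp/pagerank-input /tmp/pagerank-output 3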


Output:

