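1. Read through http://spark.incubator.apache.org/docs/latest/spark-standalone.html

2. Install Spark under /opt/spark on every machine (the commands below are run from /opt/spark/latest).

3. Start the Spark master on the first machine: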

[root@jfp3-1 latest]# ./sbin/start-master.sh

Check the log in the logs directory:

[root@jfp3-1 latest]# tail -100f logs/spark-root-org.apache.spark.deploy.master.Master-1-jfp3-1.out
Spark Command: /usr/java/default/bin/java -cp :/opt/spark/spark-0.9.0-incubating-bin-hadoop2/conf:/opt/spark/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip jfp3-1 --port 7077 --webui-port 8080
========================================

log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 04:59:50 INFO Master: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 04:59:50 INFO Master: Starting Spark master at spark://jfp3-1:7077
14/02/21 04:59:51 INFO MasterWebUI: Started Master web UI at http://jfp3-1:8080
14/02/21 04:59:51 INFO Master: I have been elected leader! New state: ALIVE

Open http://jfp3-1:8080 in a browser to check the cluster status.

4. Start a Spark worker on the second, third, and fourth machines:

[root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.71:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 05:05:09 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 05:05:09 INFO Worker: Starting Spark worker jfp3-2:53344 with 32 cores, 61.9 GB RAM
14/02/21 05:05:09 INFO Worker: Spark home: /opt/spark/latest
14/02/21 05:05:09 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
14/02/21 05:05:09 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:05:30 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:05:50 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:06:10 ERROR Worker: All masters are unresponsive! Giving up.

At the same time, errors also appear in the master log:

14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
]
14/02/21 05:06:23 INFO Master: akka.tcp://sparkWorker@jfp3-3:53721 got disassociated, removing it.
14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
]

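Connecting to the Spark master by IP address fails, so use the hostname instead. The master was started with --ip jfp3-1 (see the Spark Command line in its log), and Akka remoting generally only accepts messages addressed to the exact host name it was bound to, which is the likely reason the registration sent to spark://192.168.0.71:7077 was never answered: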

[root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 05:08:41 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 05:08:41 INFO Worker: Starting Spark worker jfp3-2:60198 with 32 cores, 61.9 GB RAM
14/02/21 05:08:41 INFO Worker: Spark home: /opt/spark/latest
14/02/21 05:08:41 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
14/02/21 05:08:41 INFO Worker: Connecting to master spark://jfp3-1:7077...
14/02/21 05:08:41 INFO Worker: Successfully registered with master spark://jfp3-1:7077
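
Since the worker only registered when the master URL used the hostname, it may help to confirm on each worker machine that the master's hostname resolves before starting the worker. The following is a minimal sketch, not part of the original setup: the helper names master_url and host_resolves are hypothetical, and it assumes getent is available (as on most Linux systems).

```shell
#!/bin/sh
# Hypothetical helper: build the master URL from a hostname and an optional port.
# 7077 is the port shown in the master log above.
master_url() {
    printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}

# Hypothetical helper: succeed only if the hostname resolves on this machine.
# getent consults /etc/hosts and DNS according to nsswitch.conf.
host_resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# Assumption: jfp3-1 is the master host, as in this walkthrough.
MASTER_HOST="${MASTER_HOST:-jfp3-1}"

if host_resolves "$MASTER_HOST"; then
    echo "ok: start the worker with $(master_url "$MASTER_HOST")"
else
    echo "warning: $MASTER_HOST does not resolve; check /etc/hosts or DNS"
fi
```

If the hostname does not resolve, adding an entry for it to /etc/hosts on the worker machine is one common fix.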

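5. Check the cluster status in the Spark master web UI; three workers now appear.

6. Start the HDFS cluster.

7. Enter the spark-shell:

[root@jfp3-1 latest]# MASTER=spark://jfp3-1:7077 ./bin/spark-shell

Count the lines of a file on HDFS that contain the string 2144: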

scala> val textFile = sc.textFile("hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201")
14/02/21 10:16:18 INFO MemoryStore: ensureFreeSpace(146579) called with curMem=0, maxMem=308713881
14/02/21 10:16:18 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 143.1 KB, free 294.3 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val targetRows = textFile.filter(line => line.contains("2144"))
targetRows: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14

scala> targetRows.count()
14/02/21 10:18:27 INFO FileInputFormat: Total input paths to process : 1
14/02/21 10:18:27 INFO SparkContext: Starting job: count at <console>:17
14/02/21 10:18:27 INFO DAGScheduler: Got job 0 (count at <console>:17) with 11 output partitions (allowLocal=false)
14/02/21 10:18:27 INFO DAGScheduler: Final stage: Stage 0 (count at <console>:17)
14/02/21 10:18:27 INFO DAGScheduler: Parents of final stage: List()
14/02/21 10:18:27 INFO DAGScheduler: Missing parents: List()
14/02/21 10:18:27 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at <console>:14), which has no missing parents
14/02/21 10:18:27 INFO DAGScheduler: Submitting 11 missing tasks from Stage 0 (FilteredRDD[2] at filter at <console>:14)
14/02/21 10:18:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 11 tasks
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:0 as 1716 bytes in 5 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:1 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:2 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:3 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:4 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:5 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:6 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:7 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:8 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:9 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:10 as TID 10 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:10 as 1716 bytes in 1 ms
14/02/21 10:18:30 INFO TaskSetManager: Finished TID 10 in 2850 ms on jfp3-2 (progress: 0/11)
14/02/21 10:18:30 INFO DAGScheduler: Completed ResultTask(0, 10)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 5 in 3188 ms on jfp3-4 (progress: 1/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 5)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 8 in 3188 ms on jfp3-4 (progress: 2/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 8)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 1 in 3237 ms on jfp3-2 (progress: 3/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 1)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 7 in 3234 ms on jfp3-2 (progress: 4/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 7)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 2 in 3269 ms on jfp3-4 (progress: 5/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 2)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 9 in 3300 ms on jfp3-3 (progress: 6/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 9)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 4 in 3362 ms on jfp3-2 (progress: 7/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 4)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 3 in 3423 ms on jfp3-3 (progress: 8/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 3)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 6 in 3439 ms on jfp3-3 (progress: 9/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 6)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 0 in 3458 ms on jfp3-3 (progress: 10/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 0)
14/02/21 10:18:31 INFO TaskSchedulerImpl: Remove TaskSet 0.0 from pool
14/02/21 10:18:31 INFO DAGScheduler: Stage 0 (count at <console>:17) finished in 3.466 s
14/02/21 10:18:31 INFO SparkContext: Job finished: count at <console>:17, took 3.593541623 s
res0: Long = 12129
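
To cross-check a result like this without Spark, the same count can be taken by streaming the file through grep, since grep -c also counts matching lines rather than individual matches. This is only a sketch: count_matching_lines is a hypothetical helper, and the commented HDFS usage assumes the hdfs command-line client is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that contain a literal string.
# -F treats the pattern as a fixed string, -c prints the number of matching lines.
# grep exits with status 1 when nothing matches, so "|| true" keeps the
# helper usable under "set -e" while still printing 0.
count_matching_lines() {
    grep -c -F -- "$1" || true
}

# Small local demonstration: two of the three lines contain 2144.
printf 'a2144\nb\n2144 2144\n' | count_matching_lines 2144

# Assumed usage against the file from the spark-shell session:
# hdfs dfs -cat hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201 \
#     | count_matching_lines 2144
```

Unlike the Spark job, this reads the whole file through a single process, so it is only practical as a sanity check on files of moderate size.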

Appendix:

Command reference:

Start the master:

/opt/spark/latest/sbin/start-master.sh

Start a worker:

/opt/spark/latest/bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
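
Running the worker command by hand on three machines gets repetitive, so it could be wrapped in a small loop over ssh. The sketch below is an assumption-heavy illustration rather than the original procedure: worker_command and start_workers are hypothetical helpers, passwordless ssh to each host is assumed, /tmp/spark-worker.log is an arbitrary log path, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Paths and master URL taken from this walkthrough; override via environment.
SPARK_HOME="${SPARK_HOME:-/opt/spark/latest}"
MASTER_URL="${MASTER_URL:-spark://jfp3-1:7077}"

# Hypothetical helper: print the worker start command used above.
worker_command() {
    printf '%s/bin/spark-class org.apache.spark.deploy.worker.Worker %s\n' \
        "$SPARK_HOME" "$MASTER_URL"
}

# Hypothetical helper: start a worker on every host given as an argument.
# With DRY_RUN=1 (the default) the ssh commands are only printed.
start_workers() {
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $host $(worker_command)"
        else
            # Assumed log location; nohup keeps the worker alive after ssh exits.
            ssh "$host" "nohup $(worker_command) > /tmp/spark-worker.log 2>&1 < /dev/null &"
        fi
    done
}

start_workers jfp3-2 jfp3-3 jfp3-4
```

Running it once with the default dry run shows exactly what would be executed on each host before anything is started.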
