• Environment preparation:

In a virtual machine environment, set up three Linux Ubuntu 14.04 Server x64 systems (download: http://releases.ubuntu.com/14.04.2/ubuntu-14.04.2-server-amd64.iso):

192.168.1.200 master

192.168.1.201 node1

192.168.1.202 node2
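These hostnames are assumed to resolve on all three machines. A minimal /etc/hosts sketch (the same three lines appended on master, node1, and node2):

  192.168.1.200 master
  192.168.1.201 node1
  192.168.1.202 node2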

  • Install the Spark environment on Master:

Spark cluster environment setup:

The Hadoop cluster here uses Hadoop 2.6.4 (Hadoop is already installed; for installation details, see my article "Hadoop: Setting up a Hadoop cluster").

This walkthrough uses Spark 1.6.2 (spark-1.6.2-bin-hadoop2.6.tgz).

1. Download the installation package to the master virtual server:

Online download:

  hadoop@master:~$ wget http://mirror.bit.edu.cn/apache/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.6.tgz

Or upload offline to the cluster:
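If the archive was downloaded elsewhere, it can be pushed to the master over SSH. A sketch (assuming the hadoop user and the master IP from above):

  scp spark-1.6.2-bin-hadoop2.6.tgz hadoop@192.168.1.200:~/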

2. Extract the Spark package to /usr/local/spark on the master virtual server and assign permissions:

  # extract to /usr/local/
  hadoop@master:~$ sudo tar -zxvf spark-1.6.2-bin-hadoop2.6.tgz -C /usr/local/
  hadoop@master:~$ cd /usr/local/
  hadoop@master:/usr/local$ ls
  bin games include man share src
  etc hadoop lib sbin spark-1.6.2-bin-hadoop2.6
  # rename to spark
  hadoop@master:/usr/local$ sudo mv spark-1.6.2-bin-hadoop2.6/ spark/
  hadoop@master:/usr/local$ ls
  bin etc games hadoop include lib man sbin share spark src
  # assign ownership to the hadoop user
  hadoop@master:/usr/local$ sudo chown -R hadoop:hadoop spark
  hadoop@master:/usr/local$
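To confirm the ownership change was applied recursively, a quick check (sketch):

  hadoop@master:/usr/local$ ls -ld spark spark/bin
  # both entries should now show hadoop hadoop as owner and group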

3. Add the Spark environment variables to /etc/profile on the master virtual server:

Edit the /etc/profile file:

  sudo vim /etc/profile

Append the $SPARK_HOME variable at the end. After the change, the tail of my /etc/profile looks like this:

  export JAVA_HOME=/usr/lib/jvm/java-8-oracle
  export JRE_HOME=/usr/lib/jvm/java-8-oracle
  export SCALA_HOME=/opt/scala/scala-2.10.5
  # add hadoop bin/ directory to PATH
  export HADOOP_HOME=/usr/local/hadoop
  export SPARK_HOME=/usr/local/spark
  export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
  export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Apply the changes:

  source /etc/profile
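To confirm the new variables are active in the current shell, a quick check (sketch):

  echo $SPARK_HOME        # should print /usr/local/spark
  which spark-submit      # should print /usr/local/spark/bin/spark-submit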
  • Configure Spark on Master:

1. Configure the spark-env.sh file on the master virtual server:

  sudo vim /usr/local/spark/conf/spark-env.sh

Note: by default, spark-env.sh and slaves do not exist; only *.template files do, and they need to be renamed (or copied):

  hadoop@master:/usr/local/spark/conf$ ls
  docker.properties.template metrics.properties.template spark-env.sh
  fairscheduler.xml.template slaves.template
  log4j.properties.template spark-defaults.conf.template
  hadoop@master:/usr/local/spark/conf$ sudo vim spark-env.sh
  hadoop@master:/usr/local/spark/conf$ mv slaves.template slaves
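If your conf directory still shows only templates, create the working copies first (a sketch; cp keeps the templates around for reference):

  hadoop@master:/usr/local/spark/conf$ cp spark-env.sh.template spark-env.sh
  hadoop@master:/usr/local/spark/conf$ cp slaves.template slaves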

Append the following to the end of spark-env.sh:

  export STANDALONE_SPARK_MASTER_HOST=192.168.1.200
  export SPARK_MASTER_IP=192.168.1.200
  export SPARK_WORKER_CORES=1
  # number of worker instances to start on each slave node
  export SPARK_WORKER_INSTANCES=1
  export SPARK_MASTER_PORT=7077
  export SPARK_WORKER_MEMORY=1g
  export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
  export SCALA_HOME=/opt/scala/scala-2.10.5
  export JAVA_HOME=/usr/lib/jvm/java-8-oracle
  export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://192.168.1.200:9000/SparkEventLog"
  export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
  export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
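The history-server directory referenced by SPARK_HISTORY_OPTS must already exist in HDFS. A sketch for creating it (assumes HDFS is running and Hadoop's bin is on PATH):

  hadoop@master:~$ hdfs dfs -mkdir -p /SparkEventLog

Note that for applications to actually write event logs there, spark.eventLog.enabled and spark.eventLog.dir would also need to be set in spark-defaults.conf; that step is not shown in this walkthrough.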

2. Configure the slaves file on the master virtual server:

  sudo vim /usr/local/spark/conf/slaves

The slaves file should contain:

  192.168.1.201
  192.168.1.202

Note: write one machine's IP (or resolvable hostname) per line.

3. On the master VM, create a logs folder under /usr/local/spark/ and give it 777 permissions:

  hadoop@master:/usr/local/spark$ mkdir logs
  hadoop@master:/usr/local/spark$ chmod 777 logs
  • Copy the files under /usr/local/spark on the master virtual server to all slave nodes (node1, node2):

1. Copy the /usr/local/spark/ installation files from the master virtual server to each slave (node1, node2):

Note: before copying, SSH to each slave node (node1, node2), create the /usr/local/spark/ directory there, and give it 777 permissions; otherwise scp fails with "Permission denied", as the transcript below shows. A condensed loop version appears after the transcript.

  hadoop@master:/usr/local/spark/conf$ cd ~/
  hadoop@master:~$ sudo chmod 777 /usr/local/spark
  hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
  scp: /usr/local/spark: Permission denied
  hadoop@master:~$ sudo scp -r /usr/local/spark hadoop@node1:/usr/local
  hadoop@node1's password:
  scp: /usr/local/spark: Permission denied
  hadoop@master:~$ sudo chmod 777 /usr/local/spark
  hadoop@master:~$ ssh node1
  Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)

  * Documentation: https://help.ubuntu.com/

  System information as of Fri Sep 23 16:40:31 UTC 2016

  System load: 0.08 Processes: 400
  Usage of /: 12.2% of 17.34GB Users logged in: 0
  Memory usage: 5% IP address for eth0: 192.168.1.201
  Swap usage: 0%

  Graph this data and manage this system at:
  https://landscape.canonical.com/

  New release '16.04.1 LTS' available.
  Run 'do-release-upgrade' to upgrade to it.

  Last login: Wed Sep 21 16:19:25 2016 from master
  hadoop@node1:~$ cd /usr/local/
  hadoop@node1:/usr/local$ sudo mkdir spark
  [sudo] password for hadoop:
  hadoop@node1:/usr/local$ ls
  bin etc games hadoop include lib man sbin share spark src
  hadoop@node1:/usr/local$ sudo chmod 777 ./spark
  hadoop@node1:/usr/local$ exit
  hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
  ...........
  hadoop@master:~$ ssh node2
  Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)

  * Documentation: https://help.ubuntu.com/

  System information as of Fri Sep 23 16:15:03 UTC 2016

  System load: 0.08 Processes: 435
  Usage of /: 13.0% of 17.34GB Users logged in: 0
  Memory usage: 6% IP address for eth0: 192.168.1.202
  Swap usage: 0%

  Graph this data and manage this system at:
  https://landscape.canonical.com/

  Last login: Wed Sep 21 16:19:47 2016 from master
  hadoop@node2:~$ cd /usr/local
  hadoop@node2:/usr/local$ sudo mkdir spark
  [sudo] password for hadoop:
  hadoop@node2:/usr/local$ sudo chmod 777 ./spark
  hadoop@node2:/usr/local$ exit
  logout
  Connection to node2 closed.
  hadoop@master:~$ scp -r /usr/local/spark hadoop@node2:/usr/local
  ...........
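The per-node preparation and copy above can be condensed into a short loop (a sketch; ssh -t allocates a terminal so sudo can prompt for a password on each node, and chown to the hadoop user is used here instead of chmod 777, which equally avoids the permission errors):

  for node in node1 node2; do
    # create the target directory and hand it to the hadoop user
    ssh -t hadoop@$node 'sudo mkdir -p /usr/local/spark && sudo chown -R hadoop:hadoop /usr/local/spark'
    # copy the contents of the master's Spark directory
    scp -r /usr/local/spark/* hadoop@$node:/usr/local/spark/
  done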

2. Edit /etc/profile on all slave nodes (node1, node2) and append the $SPARK_HOME environment variables:

Note: permission problems are common here, so it is best to log in to each slave node (node1, node2) and edit /etc/profile manually.

  hadoop@master:~$ ssh node1
  Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)

  * Documentation: https://help.ubuntu.com/

  System information as of Fri Sep 23 16:42:44 UTC 2016

  System load: 0.01 Processes: 400
  Usage of /: 12.2% of 17.34GB Users logged in: 0
  Memory usage: 5% IP address for eth0: 192.168.1.201
  Swap usage: 0%

  Graph this data and manage this system at:
  https://landscape.canonical.com/

  New release '16.04.1 LTS' available.
  Run 'do-release-upgrade' to upgrade to it.

  Last login: Fri Sep 23 16:40:52 2016 from master
  hadoop@node1:~$ sudo vim /etc/profile
  [sudo] password for hadoop:
  hadoop@node1:~$ exit
  logout
  Connection to node1 closed.
  hadoop@master:~$ ssh node2
  Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)

  * Documentation: https://help.ubuntu.com/

  System information as of Fri Sep 23 16:44:42 UTC 2016

  System load: 0.0 Processes: 400
  Usage of /: 13.0% of 17.34GB Users logged in: 0
  Memory usage: 5% IP address for eth0: 192.168.1.202
  Swap usage: 0%

  Graph this data and manage this system at:
  https://landscape.canonical.com/

  New release '16.04.1 LTS' available.
  Run 'do-release-upgrade' to upgrade to it.

  Last login: Fri Sep 23 16:43:31 2016 from master
  hadoop@node2:~$ sudo vim /etc/profile
  [sudo] password for hadoop:
  hadoop@node2:~$ exit
  logout
  Connection to node2 closed.
  hadoop@master:~$

After editing, /etc/profile on every slave should match the /etc/profile configuration on the master node; a quick way to verify is sketched below.
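A quick comparison of the appended lines over SSH (a sketch; adjust the line count to match how many lines you appended to /etc/profile):

  diff <(tail -n 8 /etc/profile) <(ssh node1 'tail -n 8 /etc/profile')
  diff <(tail -n 8 /etc/profile) <(ssh node2 'tail -n 8 /etc/profile')

No output means the file tails are identical.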

  • Start Spark on Master and verify the configuration:

1. Start command:

Make sure Hadoop is already running before starting Spark. Also note that both Hadoop and Spark ship a start-all.sh, so invoke Spark's with an explicit ./sbin/ path rather than relying on PATH:

  hadoop@master:~$ cd /usr/local/spark/
  hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
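The matching shutdown script, also shipped in Spark's sbin/, stops the master and all workers:

  hadoop@master:/usr/local/spark$ ./sbin/stop-all.sh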

2. Verify the startup succeeded:

Method 1: jps. The master should show a Master process; each slave should show a Worker process:

  hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
  starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master--master.out
  192.168.1.201: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker--node1.out
  192.168.1.202: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker--node2.out
  hadoop@master:/usr/local/spark$ jps
  NameNode
  SecondaryNameNode
  Jps
  ResourceManager
  Master
  hadoop@master:/usr/local/spark$ cd ~/
  hadoop@master:~$ ssh node1
  Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
  ...
  hadoop@node1:~$ jps
  1392 DataNode
  2449 Jps
  2330 Worker
  2079 NodeManager
  hadoop@node1:~$ exit
  logout
  Connection to node1 closed.
  hadoop@master:~$ ssh node2
  Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
  ...
  hadoop@node2:~$ jps
  Worker
  NodeManager
  DataNode
  Jps
  hadoop@node2:~$

Method 2: open the web UI at http://192.168.1.200:8080 and check that the master is up and both workers are registered.
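As a final smoke test, you can submit the bundled SparkPi example to the standalone master (a sketch; in the spark-1.6.2-bin-hadoop2.6 distribution the examples jar lives under lib/, but check the exact file name in your unpacked directory):

  hadoop@master:~$ /usr/local/spark/bin/spark-submit \
    --master spark://192.168.1.200:7077 \
    --class org.apache.spark.examples.SparkPi \
    /usr/local/spark/lib/spark-examples-1.6.2-hadoop2.6.0.jar 10

If the cluster is healthy, the driver output includes a line like "Pi is roughly 3.14...", and the finished application appears in the web UI.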
