1. Official website

http://spark.apache.org/

[screenshot: Spark homepage]

2. Download

Download the latest version, currently 2.4.3.
This build is pre-built for Hadoop 2.7 and later. I installed Hadoop 3.1.2 earlier, so I'm not sure yet whether it's compatible; I'll test that later.
Download page: http://spark.apache.org/downloads.html

[screenshot: Spark download page]

This takes you to a mirror-selection page; pick a download address:

[screenshot: mirror selection page]
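
If you'd rather download straight onto the server instead of going through a browser, wget against the Apache archive works too; the URL below follows the archive's standard layout for the 2.4.3 release:

  # Download the 2.4.3 release pre-built for Hadoop 2.7 directly onto the node
  wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz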

Upload the downloaded Spark package to the virtual machine.

[screenshot: uploading the package to the VM]

Upload complete:

  [shaozhiqi@hadoop102 opt]$ cd software/
  [shaozhiqi@hadoop102 software]$ ll
  total 739668
  -rw-rw-r--. 1 shaozhiqi shaozhiqi 332433589 Jun 23 19:59 hadoop-3.1.2.tar.gz
  -rw-rw-r--. 1 shaozhiqi shaozhiqi 194990602 Jun 23 19:59 jdk-8u211-linux-x64.tar.gz
  -rw-rw-r--. 1 shaozhiqi shaozhiqi 229988313 Jun 30 17:46 spark-2.4.3-bin-hadoop2.7.tgz
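
Before extracting, it's worth verifying the download against the SHA-512 checksum Apache publishes alongside each release; compute the local digest and compare it by eye with the published value (the .sha512 URL layout is an assumption based on the archive's usual structure):

  # Compute the local digest and compare it with the value published at
  # https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz.sha512
  sha512sum spark-2.4.3-bin-hadoop2.7.tgz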

Extract it:

  [shaozhiqi@hadoop102 software]$ tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz -C /opt/module/
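
Optionally, a version-neutral symlink keeps paths in later scripts and environment variables stable across Spark upgrades; /opt/module/spark is a name chosen here for illustration, not part of the original setup:

  # Stable path pointing at the versioned install
  ln -s /opt/module/spark-2.4.3-bin-hadoop2.7 /opt/module/spark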

Go into the extracted Spark directory:

  [shaozhiqi@hadoop102 module]$ pwd
  /opt/module
  [shaozhiqi@hadoop102 module]$ ll
  total 12
  drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:48 hadoop-3.1.2
  drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:46 jdk1.8.0_211
  drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 May 1 13:19 spark-2.4.3-bin-hadoop2.7
  [shaozhiqi@hadoop102 module]$ cd spark-2.4.3-bin-hadoop2.7/
  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ ls
  bin data jars LICENSE NOTICE R RELEASE yarn
  conf examples kubernetes licenses python README.md sbin
  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$

3. Overview of the files

3.1 There are bin and sbin directories; sbin holds the scripts that manage the cluster:

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ cd sbin/
  [shaozhiqi@hadoop102 sbin]$ ls
  slaves.sh start-mesos-shuffle-service.sh stop-mesos-dispatcher.sh
  spark-config.sh start-shuffle-service.sh stop-mesos-shuffle-service.sh
  spark-daemon.sh start-slave.sh stop-shuffle-service.sh
  spark-daemons.sh start-slaves.sh stop-slave.sh
  start-all.sh start-thriftserver.sh stop-slaves.sh
  start-history-server.sh stop-all.sh stop-thriftserver.sh
  start-master.sh stop-history-server.sh
  start-mesos-dispatcher.sh stop-master.sh
  [shaozhiqi@hadoop102 sbin]$

3.2 bin holds the day-to-day Spark commands, such as submitting jobs:

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ cd bin/
  [shaozhiqi@hadoop102 bin]$ ls
  beeline load-spark-env.sh spark-class spark-shell spark-submit
  beeline.cmd pyspark spark-class2.cmd spark-shell2.cmd spark-submit2.cmd
  docker-image-tool.sh pyspark2.cmd spark-class.cmd spark-shell.cmd spark-submit.cmd
  find-spark-home pyspark.cmd sparkR spark-sql
  find-spark-home.cmd run-example sparkR2.cmd spark-sql2.cmd
  load-spark-env.cmd run-example.cmd sparkR.cmd spark-sql.cmd
  [shaozhiqi@hadoop102 bin]$

3.3 conf holds Spark's configuration files:

  [shaozhiqi@hadoop102 conf]$ ll
  total 36
  -rw-r--r--. 1 shaozhiqi shaozhiqi 996 May 1 13:19 docker.properties.template
  -rw-r--r--. 1 shaozhiqi shaozhiqi 1105 May 1 13:19 fairscheduler.xml.template
  -rw-r--r--. 1 shaozhiqi shaozhiqi 2025 May 1 13:19 log4j.properties.template
  -rw-r--r--. 1 shaozhiqi shaozhiqi 7801 May 1 13:19 metrics.properties.template
  -rw-r--r--. 1 shaozhiqi shaozhiqi 865 May 1 13:19 slaves.template
  -rw-r--r--. 1 shaozhiqi shaozhiqi 1292 May 1 13:19 spark-defaults.conf.template
  -rwxr-xr-x. 1 shaozhiqi shaozhiqi 4221 May 1 13:19 spark-env.sh.template
  [shaozhiqi@hadoop102 conf]$ pwd
  /opt/module/spark-2.4.3-bin-hadoop2.7/conf
  [shaozhiqi@hadoop102 conf]$

4. Setup

4.1 Rename these three configuration files:

  [shaozhiqi@hadoop102 conf]$ mv slaves.template slaves
  [shaozhiqi@hadoop102 conf]$ mv spark-defaults.conf.template spark-defaults.conf
  [shaozhiqi@hadoop102 conf]$ mv spark-env.sh.template spark-env.sh

4.2 Edit slaves (configure the workers)

  [shaozhiqi@hadoop102 conf]$ vim slaves
  # A Spark Worker will be started on each of the machines listed below.
  hadoop102
  hadoop103
  hadoop104
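
Note that start-all.sh reaches each host listed in slaves over SSH, so the master needs passwordless SSH to every worker; that should already be in place from the Hadoop installation. If it isn't, something like this distributes the key (user and hostnames follow the cluster above):

  # Allow the master to ssh into each worker without a password
  ssh-copy-id shaozhiqi@hadoop102
  ssh-copy-id shaozhiqi@hadoop103
  ssh-copy-id shaozhiqi@hadoop104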

4.3 Edit spark-env.sh (configure the master)

  [shaozhiqi@hadoop102 conf]$ vim spark-env.sh
  SPARK_MASTER_HOST=hadoop102
  SPARK_MASTER_PORT=7077
  # Options for the daemons used in the standalone deploy mode
  # - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
  # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
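
spark-env.sh can also cap the resources each worker offers. These are standard standalone-mode variables; the values below are only illustrative and were not part of the original setup:

  # Optional per-worker limits (example values, not from the original config)
  export SPARK_WORKER_CORES=2       # cores each worker offers to executors
  export SPARK_WORKER_MEMORY=2g     # memory each worker offers to executors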

4.4 Distribute Spark to the other machines

  [shaozhiqi@hadoop102 module]$ testxsync spark-2.4.3-bin-hadoop2.7/
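
testxsync is the custom distribution script from the earlier Hadoop setup. If you don't have such a script, a plain rsync loop does the same job; this is just a sketch using the hostnames of this cluster:

  # Copy the Spark directory to the other two nodes
  for host in hadoop103 hadoop104; do
    rsync -av /opt/module/spark-2.4.3-bin-hadoop2.7/ $host:/opt/module/spark-2.4.3-bin-hadoop2.7/
  done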

4.5 Verify the distribution

hadoop103 now has spark-2.4.3-bin-hadoop2.7:

  [shaozhiqi@hadoop103 module]$ ll
  total 12
  drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:30 hadoop-3.1.2
  drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:19 jdk1.8.0_211
  drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 Jun 30 18:35 spark-2.4.3-bin-hadoop2.7
  [shaozhiqi@hadoop103 module]$

hadoop104 as well:

  [shaozhiqi@hadoop104 ~]$ cd /opt/module/
  [shaozhiqi@hadoop104 module]$ ll
  total 12
  drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:27 hadoop-3.1.2
  drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:23 jdk1.8.0_211
  drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 Jun 30 18:35 spark-2.4.3-bin-hadoop2.7
  [shaozhiqi@hadoop104 module]$

4.6 Start Spark on its own (Hadoop's NameNode and DataNode are not running)

  [shaozhiqi@hadoop102 hadoop-3.1.2]$ jps
  12022 Jps
  [shaozhiqi@hadoop102 hadoop-3.1.2]$

From the Spark directory:

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-all.sh
  starting org.apache.spark.deploy.master.Master, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.master.Master-1-hadoop102.out
  hadoop104: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
  hadoop103: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
  hadoop102: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
  hadoop104: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
  hadoop104: JAVA_HOME is not set
  hadoop104: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
  hadoop103: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
  hadoop103: JAVA_HOME is not set
  hadoop103: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
  hadoop102: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
  hadoop102: JAVA_HOME is not set
  hadoop102: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$

The startup log shows failures; check the web UI:

[screenshot: master web UI with no workers]

No workers from the other machines registered; the startup failed.
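
The reason is that the sbin scripts launch each worker through a non-interactive SSH session, which normally does not source the login profile where JAVA_HOME is exported; Spark's daemons only see variables set in conf/spark-env.sh. A quick way to see what such a session gets (assuming passwordless SSH is in place):

  # A non-interactive ssh command typically skips the login profile,
  # so this prints an empty JAVA_HOME even if it is set in ~/.bash_profile
  ssh hadoop103 'echo JAVA_HOME=$JAVA_HOME'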

4.7 Fix the configuration; stop Spark first, then set JAVA_HOME in spark-env.sh

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/stop-all.sh

spark-env.sh now contains:

  export JAVA_HOME=/opt/module/jdk1.8.0_211
  export SPARK_MASTER_HOST=hadoop102
  export SPARK_MASTER_PORT=7077

4.8 Redistribute the modified configuration

  [shaozhiqi@hadoop102 module]$ testxsync spark-2.4.3-bin-hadoop2.7/

4.9 Restart Spark

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-all.sh
  starting org.apache.spark.deploy.master.Master, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.master.Master-1-hadoop102.out
  hadoop103: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
  hadoop104: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
  hadoop102: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$

4.10 Verify:

[screenshot: master web UI showing all three workers]

4.11 Check the processes

hadoop102:

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ jps
  13217 Worker
  13297 Jps
  13135 Master
  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$

hadoop103:

  [shaozhiqi@hadoop103 conf]$ jps
  10528 Worker
  10601 Jps
  [shaozhiqi@hadoop103 conf]$

hadoop104:

  [shaozhiqi@hadoop104 module]$ jps
  11814 Jps
  11741 Worker
  [shaozhiqi@hadoop104 module]$
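
Rather than logging into each machine, one loop can check every node at once (assuming passwordless SSH and that jps resolves in the remote shell):

  # Check the JVM processes on all three nodes in one go
  for host in hadoop102 hadoop103 hadoop104; do
    echo "== $host =="
    ssh $host jps
  done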

4.12 Run one of the official examples

Check the bundled example jar's version:

  [shaozhiqi@hadoop102 examples]$ cd jars
  [shaozhiqi@hadoop102 jars]$ ll
  total 2132
  -rw-r--r--. 1 shaozhiqi shaozhiqi 153982 May 1 13:19 scopt_2.11-3.7.0.jar
  -rw-r--r--. 1 shaozhiqi shaozhiqi 2023919 May 1 13:19 spark-examples_2.11-2.4.3.jar

Submit a job. An annotated breakdown of the command (the runnable version follows below):

  bin/spark-submit
  --class org.apache.spark.examples.SparkPi        // the main class to run
  --master spark://hadoop102:7077                  // the cluster to submit to
  --executor-memory 1G                             // memory per executor (optional)
  --total-executor-cores 2                         // total number of executor cores
  ./examples/jars/spark-examples_2.11-2.4.3.jar    // the application jar
  100                                              // argument passed to the application

  bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop102:7077 \
  --executor-memory 1G \
  --total-executor-cores 2 \
  ./examples/jars/spark-examples_2.11-2.4.3.jar \
  100
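
SparkPi prints its estimate on a line starting with "Pi is roughly", buried in the INFO logging; piping the output through grep makes the result easy to spot:

  # Filter the driver output down to the result line
  bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop102:7077 \
  ./examples/jars/spark-examples_2.11-2.4.3.jar \
  100 2>&1 | grep "Pi is roughly"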

Check the Spark monitoring UI: the job we just submitted shows up as running.

[screenshot: the SparkPi application running in the master UI]

4.13 spark-shell can also submit jobs. It opens a Scala REPL, so we can write code directly and run it on the cluster.

  [shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ bin/spark-shell --master spark://hadoop102:7077
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  Setting default log level to "WARN".
  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  Spark context Web UI available at http://hadoop102:4040
  Spark context available as 'sc' (master = spark://hadoop102:7077, app id = app-20190630044455-0001).
  Spark session available as 'spark'.
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/ '_/
     /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
        /_/
  Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
  Type in expressions to have them evaluated.
  Type :help for more information.
  scala>
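
From the scala> prompt, code runs on the cluster right away. A tiny sanity check typed into the shell (the result line is what the REPL should print for this expression):

  scala> sc.parallelize(1 to 1000).sum()
  res0: Double = 500500.0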

Visit the web UI mentioned in 4.13 at http://hadoop102:4040.

The hostname is replaced with an IP here because our Windows 10 machine has no hostname-to-IP mapping configured. I'll cover what this page is for in a later post.

[screenshot: application web UI at hadoop102:4040]
