spark下载安装,运行examples(spark一)
1.官方网址
2.点击下载
下载最新版本目前是(2.4.3)
此spark预设为hadoop2.7或者更高版本,我前面安装的是hadoop3.1.2后面试一下不知道兼容不
具体地址:http://spark.apache.org/downloads.html
跳转到此页面继续选择一个下载地址
选择我们下载好的spark安装包上传到我们的虚拟机
上传成功
[shaozhiqi@hadoop102 opt]$ cd software/
[shaozhiqi@hadoop102 software]$ ll
total 739668
-rw-rw-r--. 1 shaozhiqi shaozhiqi 332433589 Jun 23 19:59 hadoop-3.1.2.tar.gz
-rw-rw-r--. 1 shaozhiqi shaozhiqi 194990602 Jun 23 19:59 jdk-8u211-linux-x64.tar.gz
-rw-rw-r--. 1 shaozhiqi shaozhiqi 229988313 Jun 30 17:46 spark-2.4.3-bin-hadoop2.7.tgz
解压
[shaozhiqi@hadoop102 software]$ tar -zxvf spark-2.4.3-bin-hadoop2.7.tgz -C /opt/module/
进入解压后的spark目录
[shaozhiqi@hadoop102 module]$ pwd
/opt/module
[shaozhiqi@hadoop102 module]$ ll
total 12
drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:48 hadoop-3.1.2
drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:46 jdk1.8.0_211
drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 May 1 13:19 spark-2.4.3-bin-hadoop2.7
[shaozhiqi@hadoop102 module]$ cd spark-2.4.3-bin-hadoop2.7/
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ ls
bin data jars LICENSE NOTICE R RELEASE yarn
conf examples kubernetes licenses python README.md sbin
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
3 相关文件解释
3.1 有bin目录和sbin目录,sbin目录里放的都是负责管理集群的命令
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ cd sbin/
[shaozhiqi@hadoop102 sbin]$ ls
slaves.sh start-mesos-shuffle-service.sh stop-mesos-dispatcher.sh
spark-config.sh start-shuffle-service.sh stop-mesos-shuffle-service.sh
spark-daemon.sh start-slave.sh stop-shuffle-service.sh
spark-daemons.sh start-slaves.sh stop-slave.sh
start-all.sh start-thriftserver.sh stop-slaves.sh
start-history-server.sh stop-all.sh stop-thriftserver.sh
start-master.sh stop-history-server.sh
start-mesos-dispatcher.sh stop-master.sh
[shaozhiqi@hadoop102 sbin]$
3.2 bin目录里面是一些spark具体的操作命令,如提交任务等
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ cd bin/
[shaozhiqi@hadoop102 bin]$ ls
beeline load-spark-env.sh spark-class spark-shell spark-submit
beeline.cmd pyspark spark-class2.cmd spark-shell2.cmd spark-submit2.cmd
docker-image-tool.sh pyspark2.cmd spark-class.cmd spark-shell.cmd spark-submit.cmd
find-spark-home pyspark.cmd sparkR spark-sql
find-spark-home.cmd run-example sparkR2.cmd spark-sql2.cmd
load-spark-env.cmd run-example.cmd sparkR.cmd spark-sql.cmd
[shaozhiqi@hadoop102 bin]$
3.3 Conf主要是spark的配置文件
[shaozhiqi@hadoop102 conf]$ ll
total 36
-rw-r--r--. 1 shaozhiqi shaozhiqi 996 May 1 13:19 docker.properties.template
-rw-r--r--. 1 shaozhiqi shaozhiqi 1105 May 1 13:19 fairscheduler.xml.template
-rw-r--r--. 1 shaozhiqi shaozhiqi 2025 May 1 13:19 log4j.properties.template
-rw-r--r--. 1 shaozhiqi shaozhiqi 7801 May 1 13:19 metrics.properties.template
-rw-r--r--. 1 shaozhiqi shaozhiqi 865 May 1 13:19 slaves.template
-rw-r--r--. 1 shaozhiqi shaozhiqi 1292 May 1 13:19 spark-defaults.conf.template
-rwxr-xr-x. 1 shaozhiqi shaozhiqi 4221 May 1 13:19 spark-env.sh.template
[shaozhiqi@hadoop102 conf]$ pwd
/opt/module/spark-2.4.3-bin-hadoop2.7/conf
[shaozhiqi@hadoop102 conf]$
4. 操作
4.1 重命名这三个配置文件:
[shaozhiqi@hadoop102 conf]$ mv slaves.template slaves
[shaozhiqi@hadoop102 conf]$ mv spark-defaults.conf.template spark-defaults.conf
[shaozhiqi@hadoop102 conf]$ mv spark-env.sh.template spark-env.sh
4.2修改slaves(配置worker)
[shaozhiqi@hadoop102 conf]$ vim slaves
# A Spark Worker will be started on each of the machines listed below.
hadoop102
hadoop103
hadoop104
4.3修改spark-env.sh,配置marster
[shaozhiqi@hadoop102 conf]$ vim spark-env.sh
SPARK_MASTER_HOST=hadoop102
SPARK_MASTER_PORT=7077
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
4.4分发到我们的其他机器
[shaozhiqi@hadoop102 module]$ testxsync spark-2.4.3-bin-hadoop2.7/
4.5检查是否分发成功
103成功多了spark-2.4.3-bin-hadoop2.7
[shaozhiqi@hadoop103 module]$ ll
total 12
drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:30 hadoop-3.1.2
drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:19 jdk1.8.0_211
drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 Jun 30 18:35 spark-2.4.3-bin-hadoop2.7
[shaozhiqi@hadoop103 module]$
104成功
[shaozhiqi@hadoop104 ~]$ cd /opt/module/
[shaozhiqi@hadoop104 module]$ ll
total 12
drwxr-xr-x. 15 shaozhiqi shaozhiqi 4096 Jun 30 10:27 hadoop-3.1.2
drwxr-xr-x. 7 shaozhiqi shaozhiqi 4096 Jun 23 15:23 jdk1.8.0_211
drwxr-xr-x. 13 shaozhiqi shaozhiqi 4096 Jun 30 18:35 spark-2.4.3-bin-hadoop2.7
[shaozhiqi@hadoop104 module]$
4.6单独启动spark(Hadoop的namenode和datanode都没有启动)
[shaozhiqi@hadoop102 hadoop-3.1.2]$ jps
12022 Jps
[shaozhiqi@hadoop102 hadoop-3.1.2]$
到spark目录
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.master.Master-1-hadoop102.out
hadoop104: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
hadoop103: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
hadoop102: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
hadoop104: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
hadoop104: JAVA_HOME is not set
hadoop104: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
hadoop103: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
hadoop103: JAVA_HOME is not set
hadoop103: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
hadoop102: failed to launch: nice -n 0 /opt/module/spark-2.4.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop102:7077
hadoop102: JAVA_HOME is not set
hadoop102: full log in /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
日志中也有fail,验证下页面:
Workers没有其他机器,启动失败
4.7重新修改下我们的配置文件,先停掉spark
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/stop-all.sh
export JAVA_HOME=/opt/module/jdk1.8.0_211
export SPARK_MASTER_HOS=hadoop102
export SPARK_MASTER_PORT=7077
4.8重新分发下修改的配置
[shaozhiqi@hadoop102 module]$ testxsync spark-2.4.3-bin-hadoop2.7/
4.9重新启动spark
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.master.Master-1-hadoop102.out
hadoop103: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop103.out
hadoop104: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop104.out
hadoop102: starting org.apache.spark.deploy.worker.Worker, logging to /opt/module/spark-2.4.3-bin-hadoop2.7/logs/spark-shaozhiqi-org.apache.spark.deploy.worker.Worker-1-hadoop102.out
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
4.10验证:
4.11查看进程:
102
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ jps
13217 Worker
13297 Jps
13135 Master
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$
103
[shaozhiqi@hadoop103 conf]$ jps
10528 Worker
10601 Jps
[shaozhiqi@hadoop103 conf]$
104
[shaozhiqi@hadoop104 module]$ jps
11814 Jps
11741 Worker
[shaozhiqi@hadoop104 module]$
4.12跑一个官方的示例
查看示例版本
[shaozhiqi@hadoop102 examples]$ cd jars
[shaozhiqi@hadoop102 jars]$ ll
total 2132
-rw-r--r--. 1 shaozhiqi shaozhiqi 153982 May 1 13:19 scopt_2.11-3.7.0.jar
-rw-r--r--. 1 shaozhiqi shaozhiqi 2023919 May 1 13:19 spark-examples_2.11-2.4.3.jar
提交任务
bin/spark-submit
--class org.apache.spark.examples.SparkPi \ //指定一个主类
--master spark://hadoop102:7077 \ //指明也提交给那个集群
--executor-memory 1G \ //任务执行时的内存可不指定
--total-executor-cores 2 // 执行executor个数
./examples/jars/spark-examples_2.11-2.4.3.jar \ //那个jar包执行
100 //参数
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop102:7077 \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.4.3.jar \
100
查看我们的spark监控:发现了我们刚刚执行的任务在执行中
4.13 Spark-shell也可以提交任务。会打开我们的Scala代码编辑器,这样我们可以直接写代码进行提交任务
[shaozhiqi@hadoop102 spark-2.4.3-bin-hadoop2.7]$ bin/spark-shell --master spark://hadoop102:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop102:4040
Spark context available as 'sc' (master = spark://hadoop102:7077, app id = app-20190630044455-0001).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.3
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
访问4.13.中的web ui http://hadoop102:4040
之所以要替换成IP是因为我们的win10没有配置ip和机器名的映射,此页面的作用我后续会补充
spark下载安装,运行examples(spark一)的更多相关文章
- 大数据学习day18----第三阶段spark01--------0.前言(分布式运算框架的核心思想,MR与Spark的比较,spark可以怎么运行,spark提交到spark集群的方式)1. spark(standalone模式)的安装 2. Spark各个角色的功能 3.SparkShell的使用,spark编程入门(wordcount案例)
0.前言 0.1 分布式运算框架的核心思想(此处以MR运行在yarn上为例) 提交job时,resourcemanager(图中写成了master)会根据数据的量以及工作的复杂度,解析工作量,从而 ...
- MongoDB下载+安装+运行
一. 官网下载安装 MongoDB 提供了 OSX 平台上 64 位的安装包,你可以在官网下载安装包. 下载地址:MongoDB官网-Community Server 选择适合自己平台的版本, 下载对 ...
- Win10-64位 免安装版Mysql8下载安装运行
今天忙活了很久去下载安装Mysql,感觉网上的那些教程怎么都对不上呢,很奇怪,不过我乱点一通至少能用了,先凑和着用吧... 记录一下, 要是不对的,以后再修改...windows10系统 2018-5 ...
- Spark下载与入门(Spark自学二)
2.1 下载Spark 略 2.2 Spark中Python和Scala的shell Spark shell可用来与分布式存储在许多机器的内存或者硬盘上的数据进行交互,并且处理过程的分发由Spark自 ...
- spark 卡在spark context,运行出现spark Exception encountered while connecting to the server : javax.security.sasl.SaslException
原因: 使用root用户运行spark代码 解决方法:使用非管理员账户运行spark即可 [userone@localhost bin]$ ./add-user.sh What type of use ...
- Elasticsearch-6.7.0系列(一)9200端口 .tar.gz版本centos7环境--下载安装运行
https://www.elastic.co/guide/index.html(推荐) ES官方英文原版文档,一般会更新到最新版本 https://www.elastic.co/cn/d ...
- mac下Spark的安装与使用
每次接触一个新的知识之前我都抱有恐惧之心,因为总认为自己没有接触到的知识都很高大上,比如上篇介绍到的Hadoop的安装与使用与本篇要介绍的Spark,其实在自己真正琢磨以后才发现本以为高大上的知识其实 ...
- 使用Ghost版本Windows7系统下载安装virtualBox和centos7异常解决
使用Ghost版本Windows7系统下载安装virtualBox和centos7异常解决: 下载安装运行virtualBox时出现获取VirtualBox对象严重错误(如图): 解决方案步骤: 在开 ...
- Windows上安装运行Spark
1.下载Scala: https://www.scala-lang.org/download/ ①注意:必须下载官方要求的JDK版本,并设置JAVA_HOME,否则后面将出现很多麻烦! ②Scala当 ...
随机推荐
- .NET Core学习笔记(5)——WebAPI从Server端push消息到Client
标题起得有点厉害,汉字夹杂着E文,不符合教育部公布的“向社会推荐使用的外语词中文译名”规范.不过他管不着我.写本篇的起因,是重构一个现有的WinForms程序,将Server端的部分逻辑从raw so ...
- 程序员找工作必备 PHP 基础面试题
1.优化 MYSQL 数据库的方法 (1) 选取最适用的字段属性,尽可能减少定义字段长度,尽量把字段设置 NOT NULL, 例如’省份,性别’, 最好设置为 ENUM (2) 使用连接(JOIN)来 ...
- Windows主机与centOS虚拟机之间"ping不通"
为什么要遇到这个问题 这是我重新安装centOS7.5虚拟机之后遇到的问题——我需要安装一个SecureCRT工具,结果主机与虚拟机没有ping通. 在安装这个工具之前需要进行主机与虚拟机的相互pin ...
- VMware Tools失效的处理方案
VMware Tools是一个实现主机与虚拟机文件分享,具有可支持自由拖拽的功能的工具,如果没有VM tools,那么没有了复制粘贴切换的虚拟机是很不方便的. 长时间未开的虚拟机,一次尝试拖拽Wind ...
- gRPC(2):客户端创建和调用原理
1. gRPC 客户端创建流程 1.1 背景 gRPC 是在 HTTP/2 之上实现的 RPC 框架,HTTP/2 是第 7 层(应用层)协议,它运行在 TCP(第 4 层 - 传输层)协议之上,相比 ...
- Python-函数练习题1
# coding=utf-8 '''定义一个方法get_num(num),num参数是列表类型,判断列表里面的元素为数字类型.其他类型则报错, 并且返回一个偶数列表:(注:列表里面的元素为偶数).'' ...
- 强化学习之七:Visualizing an Agent’s Thoughts and Actions
本文是对Arthur Juliani在Medium平台发布的强化学习系列教程的个人中文翻译,该翻译是基于个人分享知识的目的进行的,欢迎交流!(This article is my personal t ...
- WordPress 迁移站点更换域名为新域名
使用 wp-cli 工具搜索替换域名的方式更换 WordPress 域名 wp-cli 是一个命令行工具,可以让我们通过命令行安装.更新 WordPress,对 WordPress 执行一些批量操作, ...
- OLED的使用-4线SPI驱动
一 .OLED屏 1.OLED屏(七针) 2.OLED电路图 3.0.96'OLED简介 该模块特点: 1.三色可选,模块有两种单色和黄蓝双色两种颜色可选,单色为纯白色和纯蓝色,双 色为黄蓝双色: 2 ...
- Mayor's posters POJ - 2528 线段树(离散化处理大数?)
题意:输入t组数据,输入n代表有n块广告牌,按照顺序贴上去,输入左边和右边到达的地方,问贴完以后还有多少块广告牌可以看到(因为有的被完全覆盖了). 思路:很明显就是线段树更改区间,不过这个区间的跨度有 ...