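The Road to Advanced Spark: Setting Up Standalone Mode

                                   Author: Yin Zhengjie (尹正杰)

Copyright notice: this is an original work. Reproduction is declined, and violations will be pursued legally.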

I. Preparing the environment for the Spark cluster

1>. Master node information (s101)

2>. Worker node information (s102)

3>. Worker node information (s103)

4>. Worker node information (s104)

II. Setting up Spark in Standalone mode

1>. Download the Spark package

  Spark download URL: https://archive.apache.org/dist/spark/

[yinzhengjie@s101 download]$ sudo yum -y install wget
[sudo] password for yinzhengjie:
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package wget.x86_64 :1.14-.el7_4. will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
wget x86_64 1.14-.el7_4. base k

Transaction Summary
================================================================================
Install Package

Total download size: k
Installed size: 2.0 M
Downloading packages:
wget-1.14-.el7_4..x86_64.rpm | kB ::
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : wget-1.14-.el7_4..x86_64 /
Verifying : wget-1.14-.el7_4..x86_64 /

Installed:
wget.x86_64 :1.14-.el7_4.

Complete!
[yinzhengjie@s101 download]$

Install the wget package ([yinzhengjie@s101 download]$ sudo yum -y install wget)

[yinzhengjie@s101 download]$ wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz    # download whichever version you need
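The archive URL above follows a regular pattern, so switching versions only means changing two numbers. A minimal sketch, assuming other releases on the archive use the same layout as the 2.1.1 URL shown above; `spark_url` is a hypothetical helper, not something shipped with Spark:

```shell
# spark_url is a hypothetical helper: build the archive URL for a given
# Spark version and Hadoop build, following the pattern of the URL above.
spark_url() {
    spark_version=$1     # e.g. 2.1.1
    hadoop_version=$2    # e.g. 2.7
    echo "https://archive.apache.org/dist/spark/spark-${spark_version}/spark-${spark_version}-bin-hadoop${hadoop_version}.tgz"
}

# Print the URL for the build used in this article.
spark_url 2.1.1 2.7
```

It could then be combined with the download step as `wget "$(spark_url 2.1.1 2.7)"`.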

2>. Extract the installation package

[yinzhengjie@s101 download]$ ll
total
-rw-r--r--. yinzhengjie yinzhengjie Aug hadoop-2.7..tar.gz
-rw-r--r--. yinzhengjie yinzhengjie May jdk-8u131-linux-x64.tar.gz
-rw-r--r--. yinzhengjie yinzhengjie Jul spark-2.1.1-bin-hadoop2.7.tgz
-rw-r--r--. yinzhengjie yinzhengjie Jun : zookeeper-3.4..tar.gz
[yinzhengjie@s101 download]$
[yinzhengjie@s101 download]$ tar -xf spark-2.1.1-bin-hadoop2.7.tgz -C /soft/              # extract the Spark package into the target directory
[yinzhengjie@s101 download]$ ll /soft/
total
lrwxrwxrwx. yinzhengjie yinzhengjie Aug : hadoop -> /soft/hadoop-2.7./
drwxr-xr-x. yinzhengjie yinzhengjie Aug : hadoop-2.7.
lrwxrwxrwx. yinzhengjie yinzhengjie Aug : jdk -> /soft/jdk1..0_131/
drwxr-xr-x. yinzhengjie yinzhengjie Mar jdk1..0_131
drwxr-xr-x. yinzhengjie yinzhengjie Apr spark-2.1.1-bin-hadoop2.7
lrwxrwxrwx. yinzhengjie yinzhengjie Aug : zk -> /soft/zookeeper-3.4./
drwxr-xr-x. yinzhengjie yinzhengjie Mar : zookeeper-3.4.
[yinzhengjie@s101 download]$ ll /soft/spark-2.1.1-bin-hadoop2.7/                    # inspect the directory layout
total
drwxr-xr-x. yinzhengjie yinzhengjie Apr bin
drwxr-xr-x. yinzhengjie yinzhengjie Apr conf
drwxr-xr-x. yinzhengjie yinzhengjie Apr data
drwxr-xr-x. yinzhengjie yinzhengjie Apr examples
drwxr-xr-x. yinzhengjie yinzhengjie Apr jars
-rw-r--r--. yinzhengjie yinzhengjie Apr LICENSE
drwxr-xr-x. yinzhengjie yinzhengjie Apr licenses
-rw-r--r--. yinzhengjie yinzhengjie Apr NOTICE
drwxr-xr-x. yinzhengjie yinzhengjie Apr python
drwxr-xr-x. yinzhengjie yinzhengjie Apr R
-rw-r--r--. yinzhengjie yinzhengjie Apr README.md
-rw-r--r--. yinzhengjie yinzhengjie Apr RELEASE
drwxr-xr-x. yinzhengjie yinzhengjie Apr sbin
drwxr-xr-x. yinzhengjie yinzhengjie Apr yarn
[yinzhengjie@s101 download]$

3>. Edit the slaves configuration file and enter the hostnames of the worker nodes (the default is localhost)

[yinzhengjie@s101 download]$ cd /soft/spark-2.1.1-bin-hadoop2.7/conf/
[yinzhengjie@s101 conf]$ ll
total
-rw-r--r--. yinzhengjie yinzhengjie Apr docker.properties.template
-rw-r--r--. yinzhengjie yinzhengjie Apr fairscheduler.xml.template
-rw-r--r--. yinzhengjie yinzhengjie Apr log4j.properties.template
-rw-r--r--. yinzhengjie yinzhengjie Apr metrics.properties.template
-rw-r--r--. yinzhengjie yinzhengjie Apr slaves.template
-rw-r--r--. yinzhengjie yinzhengjie Apr spark-defaults.conf.template
-rwxr-xr-x. yinzhengjie yinzhengjie Apr spark-env.sh.template
[yinzhengjie@s101 conf]$ cp slaves.template slaves
[yinzhengjie@s101 conf]$ vi slaves
[yinzhengjie@s101 conf]$ cat slaves
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
s102
s103
s104
[yinzhengjie@s101 conf]$
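The slaves file is just one worker hostname per line, so it can also be generated without opening an editor. A minimal sketch under that assumption; `write_slaves` is a hypothetical helper, and the demo writes to a scratch file rather than the real conf/slaves:

```shell
# write_slaves is a hypothetical helper: overwrite the target file with
# one worker hostname per line.
write_slaves() {
    target=$1
    shift
    printf '%s\n' "$@" > "$target"
}

# Demo against a scratch file; on the cluster the target would be the
# slaves file in Spark's conf directory.
demo_slaves=$(mktemp)
write_slaves "$demo_slaves" s102 s103 s104
cat "$demo_slaves"
```

Unlike the copy of slaves.template edited above, this drops the license header and the default localhost entry, leaving only the three worker names.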

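4>. Edit spark-env.sh to specify the master node and port (the /soft/spark path used here is the symbolic link created in step 6, so create it first)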

[yinzhengjie@s101 ~]$ cp /soft/spark/conf/spark-env.sh.template /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ echo export JAVA_HOME=/soft/jdk >> /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$ echo SPARK_MASTER_HOST=s101 >> /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$ echo SPARK_MASTER_PORT=7077 >> /soft/spark/conf/spark-env.sh
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ grep -v ^# /soft/spark/conf/spark-env.sh | grep -v ^$
export JAVA_HOME=/soft/jdk
SPARK_MASTER_HOST=s101
SPARK_MASTER_PORT=7077
[yinzhengjie@s101 ~]$
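Appending with `echo ... >>` adds a duplicate line every time the step is repeated. A minimal sketch of an append that is safe to re-run; `append_once` is a hypothetical helper, the demo uses a scratch file instead of the real spark-env.sh, and the port 7077 is taken from the spark://s101:7077 URL used later in this article:

```shell
# append_once is a hypothetical helper: append a line to a file only when
# that exact line is not already present.
append_once() {
    line=$1
    file=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demo against a scratch file standing in for conf/spark-env.sh.
demo_env=$(mktemp)
append_once 'export JAVA_HOME=/soft/jdk' "$demo_env"
append_once 'SPARK_MASTER_HOST=s101' "$demo_env"
append_once 'SPARK_MASTER_PORT=7077' "$demo_env"
append_once 'SPARK_MASTER_HOST=s101' "$demo_env"   # repeated call changes nothing
cat "$demo_env"
```

The `-x` and `-F` options make grep compare whole lines literally, so a value such as s101 is not mistaken for a prefix of another setting.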

5>. Distribute the Spark configuration on s101 to the worker nodes

[yinzhengjie@s101 ~]$ more `which xrsync.sh`
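#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com

# Check whether the user passed an argument
if [ $# -lt 1 ];then
    echo "请输入参数";    # "please supply an argument"
    exit
fi

# Get the file path
file=$@

# Get the last path component (the file or directory name)
filename=`basename $file`

# Get the parent path
dirpath=`dirname $file`

# Resolve the absolute parent path
cd $dirpath
fullpath=`pwd -P`

# Sync the file to the other nodes (s102 through s104)
for (( i=102;i<=104;i++ ))
do
    # Turn the terminal text green
    tput setaf 2
    # %file is printed literally; $file was probably intended
    echo =========== s$i %file ===========
    # Restore the default light-gray terminal color
    tput setaf 7
    # Copy to the remote host with rsync
    rsync -lr $filename `whoami`@s$i:$fullpath
    # Report whether the command succeeded
    if [ $? == 0 ];then
        echo "命令执行成功"    # "command succeeded"
    fi
done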
[yinzhengjie@s101 ~]$

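Passwordless SSH login must be configured before running the script to synchronize ([yinzhengjie@s101 ~]$ more `which xrsync.sh`)

  For how to configure passwordless login, see my earlier notes: https://www.cnblogs.com/yinzhengjie/p/9065191.html. Once passwordless login is set up, run the script above to synchronize the data.

[yinzhengjie@s101 ~]$ xrsync.sh /soft/spark-2.1.1-bin-hadoop2.7/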
=========== s102 %file ===========
命令执行成功
=========== s103 %file ===========
命令执行成功
=========== s104 %file ===========
命令执行成功
[yinzhengjie@s101 ~]$
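Before copying a whole installation to several machines it can help to preview the commands. A minimal sketch that only prints the rsync command for each host and copies nothing; `sync_plan` is a hypothetical helper, the directory name assumes it matches the downloaded archive, and unlike xrsync.sh it does not resolve symbolic links with `pwd -P`:

```shell
# sync_plan is a hypothetical helper: print (but do not run) one rsync
# command per host, using the same basename/dirname split as xrsync.sh.
sync_plan() {
    path=${1%/}                   # drop a trailing slash, if any
    user=$2
    first=$3                      # first host number, e.g. 102
    last=$4                       # last host number, e.g. 104
    name=$(basename "$path")
    parent=$(dirname "$path")
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "rsync -lr $name $user@s$i:$parent"
        i=$((i + 1))
    done
}

sync_plan /soft/spark-2.1.1-bin-hadoop2.7/ yinzhengjie 102 104
```

Each printed line corresponds to one iteration of the loop in xrsync.sh, which runs the command from inside the parent directory.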

6>. Edit the profile to add the Spark scripts to the system environment variables

[yinzhengjie@s101 ~]$ ln -s /soft/spark-2.1.1-bin-hadoop2.7/ /soft/spark      # create a symbolic link to shorten the directory name
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ sudo vi /etc/profile                      # edit the system-wide environment configuration file
[sudo] password for yinzhengjie:
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ tail -3 /etc/profile
#ADD SPARK_PATH by yinzhengjie
export SPARK_HOME=/soft/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ source /etc/profile                      # reload the profile so the variables take effect in the current shell
[yinzhengjie@s101 ~]$
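Every time /etc/profile is sourced again, the export line above appends the Spark directories to PATH once more. A minimal sketch of a guard that avoids the duplicates; `add_to_path` is a hypothetical helper, not part of Spark or the shell:

```shell
# add_to_path is a hypothetical helper: append a directory to PATH only
# when it is not already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

SPARK_HOME=/soft/spark
add_to_path "$SPARK_HOME/bin"
add_to_path "$SPARK_HOME/sbin"
add_to_path "$SPARK_HOME/bin"     # repeated call leaves PATH unchanged
export SPARK_HOME PATH
```

Wrapping PATH in colons before matching means a directory is only treated as present when the whole entry matches, not when it is a prefix of another entry.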

7>. Start the Spark cluster

[yinzhengjie@s101 ~]$ more `which xcall.sh`
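#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com

# Check whether the user passed an argument
if [ $# -lt 1 ];then
    echo "请输入参数"    # "please supply an argument"
    exit
fi

# Get the command entered by the user
cmd=$@

for (( i=101;i<=104;i++ ))
do
    # Turn the terminal text green
    tput setaf 2
    echo ============= s$i $cmd ============
    # Restore the default light-gray terminal color
    tput setaf 7
    # Run the command on the remote host
    ssh s$i $cmd
    # Report whether the command succeeded
    if [ $? == 0 ];then
        echo "命令执行成功"    # "command succeeded"
    fi
done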
[yinzhengjie@s101 ~]$

[yinzhengjie@s101 ~]$ /soft/spark/sbin/start-all.sh       # start the Spark cluster
starting org.apache.spark.deploy.master.Master, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.master.Master--s101.out
s102: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker--s102.out
s103: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker--s103.out
s104: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker--s104.out
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ xcall.sh jps              # check whether the Master and Worker processes are up
============= s101 jps ============
Jps
Master
命令执行成功
============= s102 jps ============
Jps
Worker
命令执行成功
============= s103 jps ============
Jps
Worker
命令执行成功
============= s104 jps ============
Jps
Worker
命令执行成功
[yinzhengjie@s101 ~]$
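Reading the jps output by eye works for four nodes, but the same check can be scripted. A minimal sketch, assuming the usual jps output of a process ID followed by the main class name on each line; `has_process` is a hypothetical helper and the PIDs in the sample are made up:

```shell
# has_process is a hypothetical helper: succeed when jps-style output on
# stdin contains a JVM whose name (second column) equals the argument.
has_process() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# Sample input in "PID name" form; these PIDs are invented for the demo.
sample='4001 Jps
4002 Worker'

if echo "$sample" | has_process Worker; then
    echo "Worker is running"
fi
```

On the cluster the input would come from the node itself, for example `ssh s102 jps | has_process Worker`.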

8>. Check the Spark web UI

9>. Start spark-shell

III. Running WordCount on the Spark cluster

1>. Connect to the master ([yinzhengjie@s101 ~]$ spark-shell --master spark://s101:7077)
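The WordCount job itself is typed into spark-shell as Scala. A minimal sketch of one way to run it from the command line: the shell script writes the Scala statements to a file and feeds them to spark-shell on standard input. The input file /soft/spark/README.md, the scratch file name, and the choice to print ten results are assumptions for illustration, and the input file must exist at the same path on every worker:

```shell
# Write a small WordCount job to a scratch file (the file name is arbitrary).
wordcount_script="${TMPDIR:-/tmp}/wordcount.scala"
cat > "$wordcount_script" <<'EOF'
// Split each line into words, pair every word with 1, then sum per word.
val counts = sc.textFile("file:///soft/spark/README.md").
  flatMap(_.split("\\s+")).
  filter(_.nonEmpty).
  map(word => (word, 1)).
  reduceByKey(_ + _)
counts.take(10).foreach(println)
EOF

# Run it against the cluster only when spark-shell is on PATH.
if command -v spark-shell >/dev/null 2>&1; then
    spark-shell --master spark://s101:7077 < "$wordcount_script"
fi
```

The method calls end each line with a trailing dot so that the REPL treats the whole chain as a single expression. While the job runs, the application appears in the master's web UI, which is what the following steps inspect.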

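2>. Log in to the web UI and view the running application

3>. View the application details

4>. View the job information

5>. View the stages

6>. View the detailed information

7>. Exit spark-shell

8>. View Spark's completed applications: the logs are gone?

  So here is the question: how do you view the logs? For details, see: https://www.cnblogs.com/yinzhengjie/p/9410989.html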
