8. Spark Core Advanced, Part 1

Cluster manager: an external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).

In "cluster" mode, the framework launches the driver inside of the cluster.

In "client" mode, the submitter launches the driver outside of the cluster.
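The two deploy modes are usually selected on the `spark-submit` command line with the `--deploy-mode` flag. The following is a minimal sketch that only assembles the argument list; the helper name `build_spark_submit`, the application file `my_app.py`, and the choice of `yarn` as master are assumptions made for illustration and do not come from the original post.

```python
import shlex


def build_spark_submit(app, master, deploy_mode="client", conf=None):
    """Return the argv list for a spark-submit call (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    argv = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Extra Spark properties are passed as repeated --conf key=value pairs.
    for key, value in sorted((conf or {}).items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    return argv


# "client": the driver runs in the submitting process, outside the cluster.
client_cmd = build_spark_submit("my_app.py", "yarn")
# "cluster": the framework launches the driver inside the cluster.
cluster_cmd = build_spark_submit("my_app.py", "yarn", deploy_mode="cluster")

print(shlex.join(client_cmd))
print(shlex.join(cluster_cmd))
```

The sketch builds the command without running it, so it can be inspected or handed to `subprocess.run` on a machine where Spark is installed.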

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program).

Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos or YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
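The four steps above (connect to a cluster manager, acquire executors, ship the application code, send tasks) can be pictured with a toy model in plain Python. This is not Spark code: the classes `ToyClusterManager`, `ToyExecutor`, and `ToyDriver` and the node names are invented for illustration, and thread pools stand in for executor processes.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyExecutor:
    """Stand-in for an executor: holds the app code and runs tasks in threads."""

    def __init__(self, node, threads=2):
        self.node = node
        self.pool = ThreadPoolExecutor(max_workers=threads)
        self.code = None

    def receive_code(self, func):
        self.code = func

    def submit(self, partition):
        return self.pool.submit(self.code, partition)


class ToyClusterManager:
    """Stand-in for a cluster manager: hands out executors on its nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count):
        return [ToyExecutor(node) for node in self.nodes[:count]]


class ToyDriver:
    """Stand-in for the driver program and its SparkContext."""

    def __init__(self, manager, num_executors):
        # Steps 1 and 2: connect to the manager and acquire executors.
        self.executors = manager.allocate(num_executors)

    def run(self, func, partitions):
        # Step 3: send the application code to every executor.
        for executor in self.executors:
            executor.receive_code(func)
        # Step 4: send one task per partition, round-robin across executors.
        futures = [
            self.executors[i % len(self.executors)].submit(part)
            for i, part in enumerate(partitions)
        ]
        return [future.result() for future in futures]

    def stop(self):
        for executor in self.executors:
            executor.pool.shutdown()


manager = ToyClusterManager(["node-a", "node-b", "node-c"])
driver = ToyDriver(manager, num_executors=2)
partial_sums = driver.run(sum, [[1, 2], [3, 4], [5, 6]])
driver.stop()
print(partial_sums)
```

Each `ToyDriver` owns its executors for its whole lifetime, which mirrors the one-set-of-executors-per-application design described in the text.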

There are several useful things to note about this architecture:

Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads. This has the benefit of isolating applications from each other, on both the scheduling side (each driver schedules its own tasks) and executor side (tasks from different applications run in different JVMs). However, it also means that data cannot be shared across different Spark applications (instances of SparkContext) without writing it to an external storage system.

Spark is agnostic to the underlying cluster manager. As long as it can acquire executor processes, and these communicate with each other, it is relatively easy to run it even on a cluster manager that also supports other applications (e.g. Mesos/YARN).

The driver program must listen for and accept incoming connections from its executors throughout its lifetime (e.g., see spark.driver.port in the network config section). As such, the driver program must be network addressable from the worker nodes.

Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you’d like to send requests to the cluster remotely, it’s better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes.
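The requirement that the driver be network addressable from the worker nodes can be checked with a plain TCP probe. The sketch below is an assumption-laden illustration, not part of Spark: `driver_reachable` is a hypothetical helper, and a local listening socket stands in for the driver so the example runs on its own. On a real cluster the probe would be run from a worker node against the driver's host and the port configured through `spark.driver.port` (which is chosen at random unless it is set explicitly).

```python
import socket


def driver_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo only: a local listener plays the role of the driver's RPC port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

reachable_while_listening = driver_reachable("127.0.0.1", demo_port)
listener.close()

print(reachable_while_listening)
```

A successful probe only shows that the port accepts connections; it does not confirm that a Spark driver is the process behind it.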
