Spark源码分析 – SparkEnv
SparkEnv在两个地方会被创建, 由于SparkEnv中包含了很多重要的模块, 比如BlockManager, 所以SparkEnv很重要
Driver端, 在SparkContext初始化的时候, SparkEnv会被创建
// Create the Spark execution environment (cache, map output tracker, etc)
private[spark] val env = SparkEnv.createFromSystemProperties(
"<driver>", // 表示是driver, 下面的executor则是executorid
System.getProperty("spark.driver.host"),
System.getProperty("spark.driver.port").toInt,
true,
isLocal)
SparkEnv.set(env)
Executor端, 在executor初始化时被创建
// Initialize Spark environment (using system properties read above)
val env = SparkEnv.createFromSystemProperties(executorId, slaveHostname, 0, false, false)
SparkEnv.set(env)
SparkEnv Class
用于hold所有Spark运行时的环境对象, serializer, Akka actor system, block manager, and map output tracker等
/**
* Holds all the runtime environment objects for a running Spark instance (either master or worker),
* including the serializer, Akka actor system, block manager, map output tracker, etc. Currently
* Spark code finds the SparkEnv through a thread-local variable, so each thread that accesses these
* objects needs to have the right SparkEnv set. You can get the current environment with
* SparkEnv.get (e.g. after creating a SparkContext) and set it with SparkEnv.set.
*/
class SparkEnv (
val executorId: String,
val actorSystem: ActorSystem,
val serializerManager: SerializerManager,
val serializer: Serializer,
val closureSerializer: Serializer,
val cacheManager: CacheManager,
val mapOutputTracker: MapOutputTracker,
val shuffleFetcher: ShuffleFetcher,
val broadcastManager: BroadcastManager,
val blockManager: BlockManager,
val connectionManager: ConnectionManager,
val httpFileServer: HttpFileServer,
val sparkFilesDir: String,
val metricsSystem: MetricsSystem) {
}
SparkEnv Object
scala使用伴生object当作类接口
除了基本的get和set
就是在createFromSystemProperties中创建了一堆很关键的对象
object SparkEnv extends Logging {
private val env = new ThreadLocal[SparkEnv] // ThreadLocal,所以每个线程各访问各的
@volatile private var lastSetSparkEnv : SparkEnv = _ // 缓存最新更新的SparkEnv,并且volatile,便于其他线程获得 def set(e: SparkEnv) {
lastSetSparkEnv = e
env.set(e)
} /**
* Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
* previously set in any thread.
*/
def get: SparkEnv = {
Option(env.get()).getOrElse(lastSetSparkEnv) // 没有local时, 可以用lastSetSparkEnv
} /**
* Returns the ThreadLocal SparkEnv.
*/
def getThreadLocal : SparkEnv = {
env.get() // 只取到local的
} def createFromSystemProperties(
executorId: String,
hostname: String,
port: Int,
isDriver: Boolean,
isLocal: Boolean): SparkEnv = { val (actorSystem, boundPort) = AkkaUtils.createActorSystem("spark", hostname, port) val classLoader = Thread.currentThread.getContextClassLoader // Create an instance of the class named by the given Java system property, or by
// defaultClassName if the property is not set, and return it as a T
def instantiateClass[T](propertyName: String, defaultClassName: String): T = {
val name = System.getProperty(propertyName, defaultClassName)
Class.forName(name, true, classLoader).newInstance().asInstanceOf[T]
} val serializerManager = new SerializerManager val serializer = serializerManager.setDefault(
System.getProperty("spark.serializer", "org.apache.spark.serializer.JavaSerializer")) val closureSerializer = serializerManager.get(
System.getProperty("spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")) val connectionManager = blockManager.connectionManager val broadcastManager = new BroadcastManager(isDriver) val cacheManager = new CacheManager(blockManager)
// BlockManager
val blockManagerMaster = new BlockManagerMaster(registerOrLookup( // registerOrLookup表示只有在master上创建Actor对象, slave上只是创建ref
"BlockManagerMaster",
new BlockManagerMasterActor(isLocal)))
val blockManager = new BlockManager(executorId, actorSystem, blockManagerMaster, serializer)
// MapOutputTracker
val mapOutputTracker = new MapOutputTracker()
mapOutputTracker.trackerActor = registerOrLookup( // 同样只有在master创建actor对象
"MapOutputTracker",
new MapOutputTrackerActor(mapOutputTracker))
// ShuffleFetcher
val shuffleFetcher = instantiateClass[ShuffleFetcher](
"spark.shuffle.fetcher", "org.apache.spark.BlockStoreShuffleFetcher") val httpFileServer = new HttpFileServer()
httpFileServer.initialize()
System.setProperty("spark.fileserver.uri", httpFileServer.serverUri) val metricsSystem = if (isDriver) {
MetricsSystem.createMetricsSystem("driver")
} else {
MetricsSystem.createMetricsSystem("executor")
}
metricsSystem.start() new SparkEnv(
executorId,
actorSystem,
serializerManager,
serializer,
closureSerializer,
cacheManager,
mapOutputTracker,
shuffleFetcher,
broadcastManager,
blockManager,
connectionManager,
httpFileServer,
sparkFilesDir,
metricsSystem)
}
}
Spark源码分析 – SparkEnv的更多相关文章
- Spark源码分析 – 汇总索引
http://jerryshao.me/categories.html#architecture-ref http://blog.csdn.net/pelick/article/details/172 ...
- Spark源码分析之Spark-submit和Spark-class
有了前面spark-shell的经验,看这两个脚本就容易多啦.前面总结的Spark-shell的分析可以参考: Spark源码分析之Spark Shell(上) Spark源码分析之Spark She ...
- Spark源码分析:多种部署方式之间的区别与联系(转)
原文链接:Spark源码分析:多种部署方式之间的区别与联系(1) 从官方的文档我们可以知道,Spark的部署方式有很多种:local.Standalone.Mesos.YARN.....不同部署方式的 ...
- Spark 源码分析 -- task实际执行过程
Spark源码分析 – SparkContext 中的例子, 只分析到sc.runJob 那么最终是怎么执行的? 通过DAGScheduler切分成Stage, 封装成taskset, 提交给Task ...
- Spark源码分析 – Shuffle
参考详细探究Spark的shuffle实现, 写的很清楚, 当前设计的来龙去脉 Hadoop Hadoop的思路是, 在mapper端每次当memory buffer中的数据快满的时候, 先将memo ...
- Spark源码分析 – BlockManager
参考, Spark源码分析之-Storage模块 对于storage, 为何Spark需要storage模块?为了cache RDD Spark的特点就是可以将RDD cache在memory或dis ...
- Spark源码分析 -- TaskScheduler
Spark在设计上将DAGScheduler和TaskScheduler完全解耦合, 所以在资源管理和task调度上可以有更多的方案 现在支持, LocalSheduler, ClusterSched ...
- Spark源码分析 – DAGScheduler
DAGScheduler的架构其实非常简单, 1. eventQueue, 所有需要DAGScheduler处理的事情都需要往eventQueue中发送event 2. eventLoop Threa ...
- Spark源码分析之九:内存管理模型
Spark是现在很流行的一个基于内存的分布式计算框架,既然是基于内存,那么自然而然的,内存的管理就是Spark存储管理的重中之重了.那么,Spark究竟采用什么样的内存管理模型呢?本文就为大家揭开Sp ...
随机推荐
- 获取Oracle数据库中字段信息
select t.DATA_PRECISION,t.DATA_SCALE,t.DATA_LENGTH,t.DATA_TYPE,t.COLUMN_NAME, t.NULLABLE,t.DATA_DEFA ...
- nyoj16矩形嵌套(第一道dp关于dag的题目)
http://acm.nyist.net/JudgeOnline/problem.php?pid=16 题意:有n个矩形,每个矩形可以用a,b来描述,表示长和宽.矩形X(a,b)可以嵌套在矩形Y(c, ...
- 最新PHPcms9.6.0 任意文件上传漏洞
在用户注册处抓包: 然后发送到repeater POC: siteid=&modelid=&username=z1aaaac121&password=aasaewee311as ...
- 演练:创建和注册自定义 HTTP 模块
本演练演示自定义 HTTP 模块的基本功能. 对于每个请求,都需要调用 HTTP 模块以响应 BeginRequest 和 EndRequest 事件. 因此,该模块在处理请求之前和之后运行. 如果 ...
- 如何找到文件的家-打开文件对话框openFileDialog
private void button1_Click(object sender, EventArgs e) { openFileDialog1.Filter = "*.txt|*.txt& ...
- per-cpu
What is percpu data? percpu data 是内核为smp系统中不同CPU之间的数据保护方式,系统为每个CPU维护一段私有的空间,在这段空间中的数据只有这个CPU能访问.但是这种 ...
- gin入门
Download and install it: $ go get github.com/gin-gonic/gin Import it in your code: import "gith ...
- Android应用双击返回键退出
@Override public void onBackPressed() { // TODO 退出提示 if (System.currentTimeMillis() - mExitTime > ...
- 点击edittext并显示其内容
package com.example.sum;//sum import com.example.sum.R;//sum import android.app.Activity; import and ...
- 关于VS2013编辑器的问题
如果输出报错 This function or variable may be unsafe. 解决方法 1.用VS2013打开出现错误的代码文件 2.在工程文件名处右击鼠标打开快捷菜单,找到“属性” ...