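Overview

Work has been crazy lately... I actually read through the broadcast-variable code a while ago but never got around to posting my notes.

This article is based on the Spark 1.0 source code and covers how broadcast variables are initialized, created, read, and cleaned up.

Class relationships

The BroadcastManager class holds a reference to a BroadcastFactory object, and most of its operations are implemented by calling methods on that BroadcastFactory.

BroadcastFactory is a trait with two direct implementations, TorrentBroadcastFactory and HttpBroadcastFactory. These two factories wrap TorrentBroadcast and HttpBroadcast respectively, and both of those classes extend the abstract class Broadcast.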
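The relationships described above can be sketched in a few lines of code. This is a simplified stand-in, not the Spark source: the classes carry a Sketch suffix to mark them as hypothetical, the constructors are reduced to a value and an id, and the isLocal flag, ClassTag bounds, and all storage logic are left out.

```scala
import java.util.concurrent.atomic.AtomicLong

object ClassSketch {
  // Stand-in for the abstract Broadcast class.
  abstract class BroadcastSketch[T](val id: Long) {
    def value: T
  }

  // The two concrete broadcast types both extend the abstract class.
  class HttpBroadcastSketch[T](v: T, bid: Long) extends BroadcastSketch[T](bid) {
    def value: T = v
  }
  class TorrentBroadcastSketch[T](v: T, bid: Long) extends BroadcastSketch[T](bid) {
    def value: T = v
  }

  // Stand-in for the BroadcastFactory trait and its two implementations.
  trait FactorySketch {
    def newBroadcast[T](value: T, id: Long): BroadcastSketch[T]
  }
  class HttpFactorySketch extends FactorySketch {
    def newBroadcast[T](value: T, id: Long): BroadcastSketch[T] =
      new HttpBroadcastSketch[T](value, id)
  }
  class TorrentFactorySketch extends FactorySketch {
    def newBroadcast[T](value: T, id: Long): BroadcastSketch[T] =
      new TorrentBroadcastSketch[T](value, id)
  }

  // The manager only holds a factory reference and hands out increasing ids.
  class ManagerSketch(factory: FactorySketch) {
    private val nextBroadcastId = new AtomicLong(0)
    def newBroadcast[T](value: T): BroadcastSketch[T] =
      factory.newBroadcast(value, nextBroadcastId.getAndIncrement())
  }
}
```

Swapping HttpFactorySketch for TorrentFactorySketch changes which concrete class is created without touching the manager, which is the point of the factory indirection.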

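A diagram... I'll skip drawing one.

Initializing BroadcastManager

When SparkContext is initialized it creates a SparkEnv object, env. During that process the BroadcastManager constructor is called, and the resulting object is stored as a member of env: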

val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)

Constructing a BroadcastManager calls its initialize method, which sets the broadcastFactory member according to the configuration and then calls that factory's initialize method.

val broadcastFactoryClass =
  conf.get("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")
broadcastFactory =
  Class.forName(broadcastFactoryClass).newInstance.asInstanceOf[BroadcastFactory]
// Initialize appropriate BroadcastFactory and BroadcastObject
broadcastFactory.initialize(isDriver, conf, securityManager)
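The pattern in the snippet above is "read a class name from configuration, fall back to a default, instantiate it reflectively". A minimal sketch of that pattern follows. To stay self-contained it loads java.util.List implementations instead of broadcast factories, and the configuration key and the plain Map used as configuration are assumptions made for illustration. Class.newInstance is deprecated on newer JDKs, so the sketch goes through getDeclaredConstructor instead.

```scala
object FactoryLoadingSketch {
  // Hypothetical key for this sketch; the key Spark reads is "spark.broadcast.factory".
  val FactoryKey = "sketch.list.factory"
  // Default implementation used when the key is absent.
  val DefaultClass = "java.util.ArrayList"

  def load(conf: Map[String, String]): java.util.List[String] = {
    val className = conf.getOrElse(FactoryKey, DefaultClass)
    // Look the class up by name and call its no-argument constructor.
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[java.util.List[String]]
  }
}
```

With an empty configuration the default class is used; setting the key to "java.util.LinkedList" switches the implementation without any code change.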

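The initialize method of each factory class simply delegates to the initialize method of its corresponding broadcast class. The two are covered separately below.

The initialize method of HttpBroadcast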

def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) {
  synchronized {
    if (!initialized) {
      bufferSize = conf.getInt("spark.buffer.size", 65536)
      compress = conf.getBoolean("spark.broadcast.compress", true)
      securityManager = securityMgr
      if (isDriver) {
        createServer(conf)
        conf.set("spark.httpBroadcast.uri", serverUri)
      }
      serverUri = conf.get("spark.httpBroadcast.uri")
      cleaner = new MetadataCleaner(MetadataCleanerType.HTTP_BROADCAST, cleanup, conf)
      compressionCodec = CompressionCodec.createCodec(conf)
      initialized = true
    }
  }
}

Apart from initializing a few variables, this does two main things: it calls createServer (only on the driver), and it creates a MetadataCleaner object.

createServer

private def createServer(conf: SparkConf) {
  broadcastDir = Utils.createTempDir(Utils.getLocalDir(conf))
  server = new HttpServer(broadcastDir, securityManager)
  server.start()
  serverUri = server.uri
  logInfo("Broadcast server started at " + serverUri)
}

It first creates a temporary directory to hold broadcast variables. By default the parent directory is

conf.get("spark.local.dir",  System.getProperty("java.io.tmpdir")).split(',')(0)

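It then creates and starts an HttpServer object (a wrapper around Jetty). Startup includes loading the resource files, opening a port, and starting threads that listen for requests. The details live in the org.apache.spark.HttpServer class and are not expanded on here.

Creating the MetadataCleaner object

A MetadataCleaner wraps a scheduled Timer that runs a callback at a fixed interval. The callback passed in here is cleanup: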

private def cleanup(cleanupTime: Long) {
  val iterator = files.internalMap.entrySet().iterator()
  while(iterator.hasNext) {
    val entry = iterator.next()
    val (file, time) = (entry.getKey, entry.getValue)
    if (time < cleanupTime) {
      iterator.remove()
      deleteBroadcastFile(file)
    }
  }
}

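That is, it deletes broadcast files that have existed for longer than a given duration. When no duration is configured (the default), nothing is cleaned: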

if (delaySeconds > 0) {
  logDebug(
    "Starting metadata cleaner for " + name + " with delay of " + delaySeconds + " seconds " +
    "and period of " + periodSeconds + " secs")
  timer.schedule(task, periodSeconds * 1000, periodSeconds * 1000)
}
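The two pieces above, a time-threshold cleanup and a timer that is only started for a positive delay, can be combined in a small self-contained sketch. The Registry class, its method names, and the idea of passing "now minus delay" as the threshold are assumptions for illustration, not the MetadataCleaner implementation.

```scala
import java.util.{Timer, TimerTask}
import scala.collection.mutable

object CleanupSketch {
  class Registry {
    // file name -> time (in milliseconds) at which the file was recorded
    private val files = mutable.Map[String, Long]()

    def record(file: String, time: Long): Unit = synchronized { files(file) = time }

    // Remove every entry recorded before cleanupTime and return the removed names.
    def cleanup(cleanupTime: Long): Seq[String] = synchronized {
      val expired = files.collect { case (file, time) if time < cleanupTime => file }.toSeq.sorted
      files --= expired
      expired
    }

    // Start a periodic cleanup only when a positive delay is configured.
    def maybeSchedule(delaySeconds: Int, periodSeconds: Int): Option[Timer] =
      if (delaySeconds > 0) {
        val timer = new Timer("cleanup-sketch", true) // daemon thread
        val task = new TimerTask {
          override def run(): Unit = {
            cleanup(System.currentTimeMillis() - delaySeconds * 1000L)
            ()
          }
        }
        timer.schedule(task, periodSeconds * 1000L, periodSeconds * 1000L)
        Some(timer)
      } else None
  }
}
```

Calling maybeSchedule with a non-positive delay returns None and leaves every recorded file in place, which mirrors the "not cleaned by default" behavior described above.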

The initialize method of TorrentBroadcast

def initialize(_isDriver: Boolean, conf: SparkConf) {
  TorrentBroadcast.conf = conf // TODO: we might have to fix it in tests
  synchronized {
    if (!initialized) {
      initialized = true
    }
  }
}

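Torrent does almost nothing here, which already shows how it differs from Http: the Torrent approach is peer-to-peer and decentralized, whereas Http is a centralized service that has to start a server to accept requests.

Creating a broadcast variable

A broadcast variable is created by calling def broadcast[T: ClassTag](value: T): Broadcast[T] on SparkContext, which is implemented as follows: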

def broadcast[T: ClassTag](value: T): Broadcast[T] = {
  val bc = env.broadcastManager.newBroadcast[T](value, isLocal)
  cleaner.foreach(_.registerBroadcastForCleanup(bc))
  bc
}

This calls the newBroadcast method of broadcastManager:

def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean) = {
  broadcastFactory.newBroadcast[T](value_, isLocal, nextBroadcastId.getAndIncrement())
}

which in turn calls the factory class's newBroadcast method and returns a Broadcast object.

newBroadcast in HttpBroadcastFactory

def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
  new HttpBroadcast[T](value_, isLocal, id)

This creates and returns a new HttpBroadcast object.

The constructor does two main things:

HttpBroadcast.synchronized {
  SparkEnv.get.blockManager.putSingle(
    blockId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
}
if (!isLocal) {
  HttpBroadcast.write(id, value_)
}

1. It puts the block id and the value into the blockManager, without notifying the master.

2. It calls the write method of the companion object:

def write(id: Long, value: Any) {
  val file = getFile(id)
  val out: OutputStream = {
    if (compress) {
      compressionCodec.compressedOutputStream(new FileOutputStream(file))
    } else {
      new BufferedOutputStream(new FileOutputStream(file), bufferSize)
    }
  }
  val ser = SparkEnv.get.serializer.newInstance()
  val serOut = ser.serializeStream(out)
  serOut.writeObject(value)
  serOut.close()
  files += file
}

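The write method writes the value to the target file using the configured compression and serialization.

The directory containing this file is the HttpServer's resource directory. The mapping between id and file name is: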

case class BroadcastBlockId(broadcastId: Long, field: String = "") extends BlockId {
  def name = "broadcast_" + broadcastId + (if (field == "") "" else "_" + field)
}
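To make the write path concrete, here is a small round trip that serializes a value into a file named after the broadcast id, with optional compression, and reads it back. It is a sketch under stated assumptions: GZIP and plain Java serialization stand in for Spark's configurable compression codec and serializer, and the object and method names are hypothetical.

```scala
import java.io._
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object HttpWriteSketch {
  // Same naming scheme as BroadcastBlockId with an empty field.
  def fileName(id: Long): String = "broadcast_" + id

  def write(dir: File, id: Long, value: AnyRef, compress: Boolean): File = {
    val file = new File(dir, fileName(id))
    val raw = new FileOutputStream(file)
    // Either compress the stream or just buffer it.
    val out: OutputStream =
      if (compress) new GZIPOutputStream(raw) else new BufferedOutputStream(raw, 65536)
    val serOut = new ObjectOutputStream(out)
    try serOut.writeObject(value) finally serOut.close()
    file
  }

  def read[T](file: File, compress: Boolean): T = {
    val raw = new FileInputStream(file)
    val in: InputStream =
      if (compress) new GZIPInputStream(raw) else new BufferedInputStream(raw, 65536)
    val serIn = new ObjectInputStream(in)
    try serIn.readObject().asInstanceOf[T] finally serIn.close()
  }
}
```

The reader has to use the same compress flag as the writer, which is why both sides in HttpBroadcast take it from the same configuration entry.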

The newBroadcast method of TorrentBroadcastFactory

def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
  new TorrentBroadcast[T](value_, isLocal, id)

Likewise, this creates and returns a TorrentBroadcast object, whose constructor runs:

TorrentBroadcast.synchronized {
  SparkEnv.get.blockManager.putSingle(
    broadcastId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
}
if (!isLocal) {
  sendBroadcast()
}

It does two things. The first step is the same as in Http; the second is:

def sendBroadcast() {
  val tInfo = TorrentBroadcast.blockifyObject(value_)
  totalBlocks = tInfo.totalBlocks
  totalBytes = tInfo.totalBytes
  hasBlocks = tInfo.totalBlocks
  // Store meta-info
  val metaId = BroadcastBlockId(id, "meta")
  val metaInfo = TorrentInfo(null, totalBlocks, totalBytes)
  TorrentBroadcast.synchronized {
    SparkEnv.get.blockManager.putSingle(
      metaId, metaInfo, StorageLevel.MEMORY_AND_DISK, tellMaster = true)
  }
  // Store individual pieces
  for (i <- 0 until totalBlocks) {
    val pieceId = BroadcastBlockId(id, "piece" + i)
    TorrentBroadcast.synchronized {
      SparkEnv.get.blockManager.putSingle(
        pieceId, tInfo.arrayOfBlocks(i), StorageLevel.MEMORY_AND_DISK, tellMaster = true)
    }
  }
}

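As you can see, the metadata is cached in the blockManager first, followed by the individual blocks.

At the start there is a chunking step, which calls the blockifyObject method of the companion object: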

def blockifyObject[T](obj: T): TorrentInfo

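This method splits obj into blocks (the default block size is 4 MB) and returns a TorrentInfo object. A TorrentInfo holds an array of TorrentBlock objects (each containing a block ID and the block's byte array), the number of blocks, and the total length of obj's byte stream.

For the metadata, the block id is the broadcast id plus the "meta" suffix, and the value carries the total number of blocks and the total number of bytes.

The data itself is cached block by block. Each block's id is the broadcast id plus the "piece" suffix and the block index, and its value is a TorrentBlock object.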
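The chunking idea can be illustrated with a short sketch that splits a byte array into fixed-size blocks and reassembles it. The Block and Info case classes are simplified stand-ins for TorrentBlock and TorrentInfo, and the function bodies are an assumption about how such a split could be written, not the code in TorrentBroadcast.

```scala
object BlockifySketch {
  // Simplified stand-ins for TorrentBlock and TorrentInfo.
  final case class Block(index: Int, bytes: Array[Byte])
  final case class Info(blocks: Array[Block], totalBlocks: Int, totalBytes: Int)

  // Split data into chunks of at most blockSize bytes, numbering them from 0.
  def blockify(data: Array[Byte], blockSize: Int): Info = {
    require(blockSize > 0, "blockSize must be positive")
    val blocks = data.grouped(blockSize).zipWithIndex
      .map { case (chunk, i) => Block(i, chunk) }
      .toArray
    Info(blocks, blocks.length, data.length)
  }

  // Reassemble the original bytes; blocks may arrive in any order.
  def unBlockify(blocks: Array[Block], totalBytes: Int): Array[Byte] = {
    val out = new Array[Byte](totalBytes)
    var offset = 0
    for (b <- blocks.sortBy(_.index)) {
      System.arraycopy(b.bytes, 0, out, offset, b.bytes.length)
      offset += b.bytes.length
    }
    out
  }
}
```

Because each block carries its index, a receiver can fetch pieces in random order and still rebuild the value, which is what makes the random piece ordering on the read side possible.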

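Reading the value of a broadcast variable

The value is obtained by calling bc.value. The main logic lives in the deserialization method readObject, which runs when the broadcast object is deserialized.

Deserialization in HttpBroadcast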

HttpBroadcast.synchronized {
  SparkEnv.get.blockManager.getSingle(blockId) match {
    case Some(x) => value_ = x.asInstanceOf[T]
    case None => {
      logInfo("Started reading broadcast variable " + id)
      val start = System.nanoTime
      value_ = HttpBroadcast.read[T](id)
      /*
       * We cache broadcast data in the BlockManager so that subsequent tasks using it
       * do not need to re-fetch. This data is only used locally and no other node
       * needs to fetch this block, so we don't notify the master.
       */
      SparkEnv.get.blockManager.putSingle(
        blockId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
      val time = (System.nanoTime - start) / 1e9
      logInfo("Reading broadcast variable " + id + " took " + time + " s")
    }
  }
}

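It first checks whether the blockManager already holds the value; if so, it uses it directly. Otherwise it calls the read method of the companion object: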

def read[T: ClassTag](id: Long): T = {
  logDebug("broadcast read server: " + serverUri + " id: broadcast-" + id)
  val url = serverUri + "/" + BroadcastBlockId(id).name
  var uc: URLConnection = null
  if (securityManager.isAuthenticationEnabled()) {
    logDebug("broadcast security enabled")
    val newuri = Utils.constructURIForAuthentication(new URI(url), securityManager)
    uc = newuri.toURL.openConnection()
    uc.setAllowUserInteraction(false)
  } else {
    logDebug("broadcast not using security")
    uc = new URL(url).openConnection()
  }
  val in = {
    uc.setReadTimeout(httpReadTimeout)
    val inputStream = uc.getInputStream
    if (compress) {
      compressionCodec.compressedInputStream(inputStream)
    } else {
      new BufferedInputStream(inputStream, bufferSize)
    }
  }
  val ser = SparkEnv.get.serializer.newInstance()
  val serIn = ser.deserializeStream(in)
  val obj = serIn.readObject[T]()
  serIn.close()
  obj
}

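Using serverUri and the file name derived from the block id, it opens an HTTP connection, fetches the data from the central server, and then decompresses and deserializes it with the configured compression and serialization mechanisms.

Notice that every executor that needs the broadcast value has to pull its content from the driver.

Once fetched, the value is cached in the blockManager for later use.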
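The "check the local cache, otherwise fetch remotely and cache the result" flow can be reduced to a few lines. In this sketch LocalCache and fetchRemote are hypothetical names: the map stands in for the local blockManager and the function argument stands in for the HTTP read from the driver.

```scala
import scala.collection.mutable

object ReadThroughSketch {
  class LocalCache[V](fetchRemote: Long => V) {
    private val cache = mutable.Map[Long, V]()
    // Counts how often the remote side was actually contacted.
    var remoteFetches = 0

    def get(id: Long): V = synchronized {
      cache.get(id) match {
        case Some(v) => v // already cached locally, no network access
        case None =>
          remoteFetches += 1
          val v = fetchRemote(id)
          cache(id) = v // keep it so later tasks on this node reuse it
          v
      }
    }
  }
}
```

A second get for the same id is served from the cache, so each node contacts the remote side only once per broadcast variable.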

Deserialization in TorrentBroadcast

private def readObject(in: ObjectInputStream) {
  in.defaultReadObject()
  TorrentBroadcast.synchronized {
    SparkEnv.get.blockManager.getSingle(broadcastId) match {
      case Some(x) =>
        value_ = x.asInstanceOf[T]
      case None =>
        val start = System.nanoTime
        logInfo("Started reading broadcast variable " + id)
        // Initialize @transient variables that will receive garbage values from the master.
        resetWorkerVariables()
        if (receiveBroadcast()) {
          value_ = TorrentBroadcast.unBlockifyObject[T](arrayOfBlocks, totalBytes, totalBlocks)
          /* Store the merged copy in cache so that the next worker doesn't need to rebuild it.
           * This creates a trade-off between memory usage and latency. Storing copy doubles
           * the memory footprint; not storing doubles deserialization cost. Also,
           * this does not need to be reported to BlockManagerMaster since other executors
           * does not need to access this block (they only need to fetch the chunks,
           * which are reported).
           */
          SparkEnv.get.blockManager.putSingle(
            broadcastId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
          // Remove arrayOfBlocks from memory once value_ is on local cache
          resetWorkerVariables()
        } else {
          logError("Reading broadcast variable " + id + " failed")
        }
        val time = (System.nanoTime - start) / 1e9
        logInfo("Reading broadcast variable " + id + " took " + time + " s")
    }
  }
}

As with Http, it first checks whether the blockManager has the value cached. If not, it calls the receiveBroadcast method:

def receiveBroadcast(): Boolean = {
  // Receive meta-info about the size of broadcast data,
  // the number of chunks it is divided into, etc.
  val metaId = BroadcastBlockId(id, "meta")
  var attemptId = 10
  while (attemptId > 0 && totalBlocks == -1) {
    TorrentBroadcast.synchronized {
      SparkEnv.get.blockManager.getSingle(metaId) match {
        case Some(x) =>
          val tInfo = x.asInstanceOf[TorrentInfo]
          totalBlocks = tInfo.totalBlocks
          totalBytes = tInfo.totalBytes
          arrayOfBlocks = new Array[TorrentBlock](totalBlocks)
          hasBlocks = 0
        case None =>
          Thread.sleep(500)
      }
    }
    attemptId -= 1
  }
  if (totalBlocks == -1) {
    return false
  }
  /*
   * Fetch actual chunks of data. Note that all these chunks are stored in
   * the BlockManager and reported to the master, so that other executors
   * can find out and pull the chunks from this executor.
   */
  val recvOrder = new Random().shuffle(Array.iterate(0, totalBlocks)(_ + 1).toList)
  for (pid <- recvOrder) {
    val pieceId = BroadcastBlockId(id, "piece" + pid)
    TorrentBroadcast.synchronized {
      SparkEnv.get.blockManager.getSingle(pieceId) match {
        case Some(x) =>
          arrayOfBlocks(pid) = x.asInstanceOf[TorrentBlock]
          hasBlocks += 1
          SparkEnv.get.blockManager.putSingle(
            pieceId, arrayOfBlocks(pid), StorageLevel.MEMORY_AND_DISK, tellMaster = true)
        case None =>
          throw new SparkException("Failed to get " + pieceId + " of " + broadcastId)
      }
    }
  }
  hasBlocks == totalBlocks
}

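Like writing, reading has two parts: fetch the metadata first, then use it to read the actual blocks. Note that both are read from the blockManager, so here is a brief look at blockManager.getSingle.

The call stack ends in the BlockManager.doGetRemote method, which contains this statement:

val locations = Random.shuffle(master.getLocations(blockId))

That is, the list of nodes holding this block is shuffled randomly, and the block is then fetched with:

val data = BlockManagerWorker.syncGetBlock(
  GetBlock(blockId), ConnectionManagerId(loc.host, loc.port))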

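From this we can see that the Torrent approach first splits the broadcast data into blocks and stores them in the BlockManager. When a node needs the broadcast variable, it reads it block by block: for each block it looks up the block's locations, then picks a random node that holds the block and gets it from there. After reading, each node reports the blocks it now holds to the BlockManagerMaster, so the local node becomes a peer in the broadcast network as well.

In sharp contrast to the Http approach, this is a decentralized network that only needs to maintain a tracker, which is the idea behind P2P.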
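The tracker-and-peers behavior can be simulated in memory. In the sketch below the Tracker class and the fetchAll function are hypothetical: the tracker only records which nodes hold which piece, and a fetching node picks a random holder for each piece and then reports itself as a new holder. No data is actually transferred.

```scala
import scala.collection.mutable
import scala.util.Random

object TrackerSketch {
  // pieceId -> names of the nodes that currently hold that piece
  class Tracker {
    private val locations = mutable.Map[String, mutable.Set[String]]()

    def report(pieceId: String, node: String): Unit = {
      locations.getOrElseUpdate(pieceId, mutable.Set.empty[String]) += node
    }

    def getLocations(pieceId: String): Seq[String] =
      locations.get(pieceId).map(_.toSeq).getOrElse(Seq.empty)
  }

  // Fetch every piece in random order from a random holder, then register
  // this node as a holder. Returns pieceId -> node the piece was taken from.
  def fetchAll(node: String, pieceIds: Seq[String], tracker: Tracker, rnd: Random): Map[String, String] =
    rnd.shuffle(pieceIds).map { pieceId =>
      val source = rnd.shuffle(tracker.getLocations(pieceId)).headOption
        .getOrElse(throw new IllegalStateException("Failed to get " + pieceId))
      tracker.report(pieceId, node)
      pieceId -> source
    }.toMap
}
```

The first executor can only fetch from the driver, but every later executor has more holders to choose from, so the load spreads out as the broadcast propagates.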

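Cleaning up broadcast variables

Right after a broadcast variable is created, this line runs:

cleaner.foreach(_.registerBroadcastForCleanup(bc))

cleaner is a ContextCleaner object, and the newly created broadcast variable is registered with it. The call chain is: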

def registerBroadcastForCleanup[T](broadcast: Broadcast[T]) {
  registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
}

private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask) {
  referenceBuffer += new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue)
}

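Each registration wraps the broadcast variable in a weak reference (for background on weak references, see http://blog.csdn.net/lyfi01/article/details/6415726). Once the variable is only weakly reachable and the garbage collector reclaims it, the reference is placed on referenceQueue. The cleaner that consumes this queue is started when SparkContext is initialized, by: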

cleaner.foreach(_.start())

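The start method launches a cleaning thread that runs keepCleaning, which takes the registered cleanup tasks (RDD, shuffle, and broadcast) off the queue and processes them one by one: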

private def keepCleaning(): Unit = Utils.logUncaughtExceptions {
  while (!stopped) {
    try {
      val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
        .map(_.asInstanceOf[CleanupTaskWeakReference])
      reference.map(_.task).foreach { task =>
        logDebug("Got cleaning task " + task)
        referenceBuffer -= reference.get
        task match {
          case CleanRDD(rddId) =>
            doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
          case CleanShuffle(shuffleId) =>
            doCleanupShuffle(shuffleId, blocking = blockOnCleanupTasks)
          case CleanBroadcast(broadcastId) =>
            doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
        }
      }
    } catch {
      case e: Exception => logError("Error in cleaning thread", e)
    }
  }
}
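The weak-reference mechanism behind this loop can be tried out in isolation. The sketch below uses only java.lang.ref and hypothetical names (TaskRef, register, pollOnce). Because garbage collection timing is not deterministic, the usage example calls enqueue() by hand to simulate the collector noticing that the object is gone; in a running application the JVM does this on its own.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

object WeakCleanupSketch {
  final case class CleanBroadcast(broadcastId: Long)

  // A weak reference that remembers which cleanup task belongs to its referent.
  class TaskRef(val task: CleanBroadcast, referent: AnyRef, queue: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, queue)

  val referenceQueue = new ReferenceQueue[AnyRef]
  // Keeps the TaskRef objects themselves alive until they have been processed.
  val referenceBuffer = mutable.Set[TaskRef]()

  def register(obj: AnyRef, task: CleanBroadcast): TaskRef = {
    val ref = new TaskRef(task, obj, referenceQueue)
    referenceBuffer += ref
    ref
  }

  // Wait up to timeoutMs (must be positive) for one enqueued reference.
  def pollOnce(timeoutMs: Long): Option[CleanBroadcast] =
    Option(referenceQueue.remove(timeoutMs)).map { r =>
      val ref = r.asInstanceOf[TaskRef]
      referenceBuffer -= ref
      ref.task
    }
}
```

Usage, with the manual enqueue() standing in for garbage collection:

```scala
val ref = WeakCleanupSketch.register(new Object, WeakCleanupSketch.CleanBroadcast(5L))
ref.enqueue() // simulate the collector enqueueing the cleared reference
val task = WeakCleanupSketch.pollOnce(100L)
```

The buffer is needed because a weak reference object that is itself unreachable would be collected before it could ever be enqueued.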

doCleanupBroadcast calls the following statement:

broadcastManager.unbroadcast(broadcastId, true, blocking)

which leads to:

def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
  broadcastFactory.unbroadcast(id, removeFromDriver, blocking)
}

Each factory class then calls the unpersist method on the companion object of its corresponding broadcast class.

Cleanup in HttpBroadcast

def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = synchronized {
  SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
  if (removeFromDriver) {
    val file = getFile(id)
    files.remove(file)
    deleteBroadcastFile(file)
  }
}

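First it removes the cached copies from the blockManager. Second, when removeFromDriver is set, it deletes the locally persisted file.

Cleanup in TorrentBroadcast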

def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = synchronized {
  SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
}

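Summary

Broadcast is suited to cases where executors use the same data many times (a dictionary, for example). Http and Torrent correspond to the traditional client-server access pattern and the P2P access pattern respectively. When the broadcast variable is large or used frequently, the latter reduces the load on the driver.

Here the BlockManager plays the role of the tracker in P2P. It is not described in detail in this post; a dedicated follow-up will cover that part.

Statement: this article is original work and may not be used for any commercial purpose. If you repost it, please credit the source: http://blog.csdn.net/asongoficeandfire/article/details/37584643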
