Spark Notes 15: ShuffleManager, the bridge between the stages/tasks on the two sides of a shuffle
In both Hadoop and Spark, the shuffle is one of the factors that most strongly determines performance. When the amount of shuffling cannot be reduced any further, choosing a good shuffle manager is still an important way to tune performance.
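The implementation is selected through the spark.shuffle.manager setting (see the ShuffleManager comment quoted below). As a minimal sketch of how that looks from user code, assuming a Spark 1.x deployment where "hash" and "sort" are the built-in values, and where the application name and master are purely illustrative:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: pick the shuffle implementation before the SparkContext is created.
// "hash" and "sort" are the built-in 1.x choices; a fully qualified class name of a
// custom ShuffleManager can also be supplied here.
val conf = new SparkConf()
  .setAppName("shuffle-manager-demo")   // hypothetical application name
  .setMaster("local[2]")                // hypothetical master, for a local test run
  .set("spark.shuffle.manager", "sort")
val sc = new SparkContext(conf)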
The hash-based implementation, HashShuffleManager, is described in the Spark source as follows:

/**
 * A ShuffleManager using hashing, that creates one output file per reduce partition on each
 * mapper (possibly reusing these across waves of tasks).
 */
private[spark] class HashShuffleManager(conf: SparkConf) extends ShuffleManager {
  // ... (method bodies omitted)
}

Both it and the sort-based manager implement the pluggable ShuffleManager interface:

/**
* Pluggable interface for shuffle systems. A ShuffleManager is created in SparkEnv on both the
* driver and executors, based on the spark.shuffle.manager setting. The driver registers shuffles
* with it, and executors (or tasks running locally in the driver) can ask to read and write data.
*
* NOTE: this will be instantiated by SparkEnv so its constructor can take a SparkConf and
* boolean isDriver as parameters.
*/
private[spark] trait ShuffleManager {
  /**
   * Register a shuffle with the manager and obtain a handle for it to pass to tasks.
   */
  def registerShuffle[K, V, C](
      shuffleId: Int,
      numMaps: Int,
      dependency: ShuffleDependency[K, V, C]): ShuffleHandle

  /** Get a writer for a given partition. Called on executors by map tasks. */
  def getWriter[K, V](handle: ShuffleHandle, mapId: Int, context: TaskContext): ShuffleWriter[K, V]

  /**
   * Get a reader for a range of reduce partitions (startPartition to endPartition-1, inclusive).
   * Called on executors by reduce tasks.
   */
  def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext): ShuffleReader[K, C]

  /** Remove a shuffle's metadata from the ShuffleManager. */
  def unregisterShuffle(shuffleId: Int)

  /** Shut down this ShuffleManager. */
  def stop(): Unit
}
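To make the contract concrete, here is a skeleton of what a custom implementation would have to provide. This is only a sketch, not Spark code: the package and class name are made up, BaseShuffleHandle is reused merely to capture the registration parameters, and the reader/writer factories are left unimplemented because a real manager would return its own ShuffleReader/ShuffleWriter classes.

package org.apache.spark.shuffle.demo  // must live under org.apache.spark: the trait is private[spark]

import org.apache.spark.{ShuffleDependency, SparkConf, TaskContext}
import org.apache.spark.shuffle._

private[spark] class DemoShuffleManager(conf: SparkConf) extends ShuffleManager {

  // The driver calls this once per shuffle; the handle travels to the tasks.
  override def registerShuffle[K, V, C](
      shuffleId: Int,
      numMaps: Int,
      dependency: ShuffleDependency[K, V, C]): ShuffleHandle =
    new BaseShuffleHandle(shuffleId, numMaps, dependency)

  // Map tasks ask for a writer for their own mapId.
  override def getWriter[K, V](
      handle: ShuffleHandle,
      mapId: Int,
      context: TaskContext): ShuffleWriter[K, V] =
    ???  // return a ShuffleWriter that persists this map task's output

  // Reduce tasks ask for a reader over [startPartition, endPartition).
  override def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext): ShuffleReader[K, C] =
    ???  // return a ShuffleReader that fetches and merges the map outputs

  override def unregisterShuffle(shuffleId: Int): Unit = {}  // drop any per-shuffle state

  override def stop(): Unit = {}                             // release manager-wide resources
}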
/**
* :: DeveloperApi ::
* Represents a dependency on the output of a shuffle stage. Note that in the case of shuffle,
* the RDD is transient since we don't need it on the executor side.
*
* @param _rdd the parent RDD
* @param partitioner partitioner used to partition the shuffle output
* @param serializer [[org.apache.spark.serializer.Serializer Serializer]] to use. If set to None,
* the default serializer, as specified by `spark.serializer` config option, will
* be used.
*/
@DeveloperApi
class ShuffleDependency[K, V, C](
    @transient _rdd: RDD[_ <: Product2[K, V]],
    val partitioner: Partitioner,
    val serializer: Option[Serializer] = None,
    val keyOrdering: Option[Ordering[K]] = None,
    val aggregator: Option[Aggregator[K, V, C]] = None,
    val mapSideCombine: Boolean = false)
  extends Dependency[Product2[K, V]] {

  override def rdd = _rdd.asInstanceOf[RDD[Product2[K, V]]]

  val shuffleId: Int = _rdd.context.newShuffleId()

  val shuffleHandle: ShuffleHandle = _rdd.context.env.shuffleManager.registerShuffle(
    shuffleId, _rdd.partitions.size, this)

  _rdd.sparkContext.cleaner.foreach(_.registerShuffleForCleanup(this))
}
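A user never builds a ShuffleDependency directly; operators such as reduceByKey create one through ShuffledRDD. Still, a small sketch of how the fields are typically filled in makes them concrete. Everything below is illustrative: sc is the SparkContext from the earlier sketch, the input path and the partition count of 4 are arbitrary, and enabling map-side combine with a summing Aggregator mirrors what reduceByKey does.

import org.apache.spark.{Aggregator, HashPartitioner, ShuffleDependency}
import org.apache.spark.rdd.RDD

// Hypothetical word-count style input: (word, 1) pairs.
val pairs: RDD[(String, Int)] = sc.textFile("input.txt").flatMap(_.split(" ")).map(word => (word, 1))

// The three functions an Aggregator needs: seed a combiner, fold a value in, merge combiners.
val agg = new Aggregator[String, Int, Int](
  createCombiner = (v: Int) => v,
  mergeValue = (c: Int, v: Int) => c + v,
  mergeCombiners = (c1: Int, c2: Int) => c1 + c2)

val dep = new ShuffleDependency[String, Int, Int](
  pairs,
  new HashPartitioner(4),   // 4 reduce partitions, chosen arbitrarily
  serializer = None,        // fall back to the spark.serializer default
  keyOrdering = None,       // no reduce-side sorting requested
  aggregator = Some(agg),
  mapSideCombine = true)    // pre-aggregate in the map tasks, as reduceByKey does

// Registration already happened in the constructor: dep.shuffleHandle is ready to hand to tasks.
println(dep.shuffleId)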
/**
* A basic ShuffleHandle implementation that just captures registerShuffle's parameters.
*/
private[spark] class BaseShuffleHandle[K, V, C](
    shuffleId: Int,
    val numMaps: Int,
    val dependency: ShuffleDependency[K, V, C])
  extends ShuffleHandle(shuffleId)
/**
* Class that keeps track of the location of the map output of
* a stage. This is abstract because different versions of MapOutputTracker
* (driver and worker) use different HashMap to store its metadata.
*/
private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging {
  // ... (body omitted)
}
/**
* Result returned by a ShuffleMapTask to a scheduler. Includes the block manager address that the
* task ran on as well as the sizes of outputs for each reducer, for passing on to the reduce tasks.
* The map output sizes are compressed using MapOutputTracker.compressSize.
*/
private[spark] class MapStatus(var location: BlockManagerId, var compressedSizes: Array[Byte])
  extends Externalizable {
  // ... (readExternal/writeExternal omitted)
}
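The compressedSizes array packs each per-reducer output size into a single byte, trading precision for a small MapStatus. The following is a sketch of the scheme in the spirit of the 1.x MapOutputTracker.compressSize/decompressSize helpers; treat the exact constants as an assumption (a log base of 1.1 lets one byte cover sizes up to roughly 35 GB at about 10% error).

// Sketch: log-scale size compression, so one byte per reducer is enough in MapStatus.
val LOG_BASE = 1.1

def compressSize(size: Long): Byte =
  if (size == 0) 0
  else if (size <= 1L) 1
  else math.min(255, math.ceil(math.log(size) / math.log(LOG_BASE)).toInt).toByte

def decompressSize(compressed: Byte): Long =
  if (compressed == 0) 0L
  else math.pow(LOG_BASE, compressed & 0xFF).toLong

println(compressSize(1000000L))                   // a ~1 MB output size fits in one byte
println(decompressSize(compressSize(1000000L)))   // decodes back to within ~10% of 1 MB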
private[spark] class BlockMessage() {
  // Un-initialized: typ = 0
  // GetBlock: typ = 1
  // GotBlock: typ = 2
  // PutBlock: typ = 3
  private var typ: Int = BlockMessage.TYPE_NON_INITIALIZED
  private var id: BlockId = null
  private var data: ByteBuffer = null
  private var level: StorageLevel = null
  // ... (remainder omitted)
}
private[spark] class HashShuffleReader[K, C](
    handle: BaseShuffleHandle[K, _, C],
    startPartition: Int,
    endPartition: Int,
    context: TaskContext)
  extends ShuffleReader[K, C] {

  require(endPartition == startPartition + 1,
    "Hash shuffle currently only supports fetching one partition")

  private val dep = handle.dependency

  /** Read the combined key-values for this reduce task */
  override def read(): Iterator[Product2[K, C]] = {
    val readMetrics = context.taskMetrics.createShuffleReadMetricsForDependency()
    val ser = Serializer.getSerializer(dep.serializer)
    val iter = BlockStoreShuffleFetcher.fetch(handle.shuffleId, startPartition, context, ser,
      readMetrics)

    // The block below picks the aggregation path: the dependency decides whether combining
    // already happened on the map side (merge pre-combined values with combineCombinersByKey)
    // or only happens here on the reduce side (combineValuesByKey).
    val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        new InterruptibleIterator(context, dep.aggregator.get.combineCombinersByKey(iter, context))
      } else {
        new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
      }
    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
      throw new IllegalStateException("Aggregator is empty for map-side combine")
    } else {
      // Convert the Product2s to pairs since this is what downstream RDDs currently expect
      iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
    }

    // Sort the output if there is a sort ordering defined.
    dep.keyOrdering match {
      case Some(keyOrd: Ordering[K]) =>
        // A user-supplied key ordering: sort on the reduce side with an ExternalSorter.
        // Note that if spark.shuffle.spill is disabled, the ExternalSorter won't spill to disk.
        val sorter = new ExternalSorter[K, C, C](ordering = Some(keyOrd), serializer = Some(ser))
        sorter.insertAll(aggregatedIter)
        context.taskMetrics.memoryBytesSpilled += sorter.memoryBytesSpilled
        context.taskMetrics.diskBytesSpilled += sorter.diskBytesSpilled
        sorter.iterator
      case None =>
        aggregatedIter
    }
  }

  /** Close this reader */
  override def stop(): Unit = ???
}
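For context, this is roughly how a reduce-side task ends up here: ShuffledRDD.compute asks the ShuffleManager for a reader over its single partition and simply iterates the result (paraphrased from memory, not a verbatim quote of the Spark source).

// Paraphrase of ShuffledRDD.compute: one reduce partition -> one reader -> one iterator.
override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
  val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
  SparkEnv.get.shuffleManager
    .getReader(dep.shuffleHandle, split.index, split.index + 1, context)
    .read()
    .asInstanceOf[Iterator[(K, C)]]
}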
private[spark] class HashShuffleWriter[K, V](
    handle: BaseShuffleHandle[K, V, _],
    mapId: Int,
    context: TaskContext)
  extends ShuffleWriter[K, V] with Logging {

  private val dep = handle.dependency
  private val numOutputSplits = dep.partitioner.numPartitions

  private val writeMetrics = new ShuffleWriteMetrics()
  context.taskMetrics.shuffleWriteMetrics = Some(writeMetrics)

  private val blockManager = SparkEnv.get.blockManager
  private val shuffleBlockManager = blockManager.shuffleBlockManager
  private val ser = Serializer.getSerializer(dep.serializer.getOrElse(null))
  private val shuffle = shuffleBlockManager.forMapTask(dep.shuffleId, mapId, numOutputSplits, ser,
    writeMetrics)

  /** Write a bunch of records to this task's output */
  override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
    val iter = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        dep.aggregator.get.combineValuesByKey(records, context)
      } else {
        records
      }
    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
      throw new IllegalStateException("Aggregator is empty for map-side combine")
    } else {
      records
    }
    // Each record goes to the bucket (one output file per reduce partition) chosen by the partitioner.
    for (elem <- iter) {
      val bucketId = dep.partitioner.getPartition(elem._1)
      shuffle.writers(bucketId).write(elem)
    }
  }
  // ... (stop() and remaining members omitted)
}
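This per-mapper, per-reducer fan-out is the main weakness of the hash shuffle: the file count grows multiplicatively. A back-of-the-envelope sketch with made-up task counts; the reuse "across waves of tasks" mentioned in the HashShuffleManager comment corresponds to the spark.shuffle.consolidateFiles option, which bounds the count by concurrent map slots rather than total map tasks.

// Hypothetical job sizes, only to show how quickly hash-shuffle file counts grow.
val numMapTasks = 1000
val numReducePartitions = 200
val numConcurrentMapSlots = 16   // e.g. executors * cores, assumed for illustration

println(s"without consolidation: ${numMapTasks * numReducePartitions} files")          // 200000
println(s"with consolidation:    ${numConcurrentMapSlots * numReducePartitions} files") // 3200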
private[spark] class SortShuffleWriter[K, V, C](
    handle: BaseShuffleHandle[K, V, C],
    mapId: Int,
    context: TaskContext)
  extends ShuffleWriter[K, V] with Logging {

  private val dep = handle.dependency
  private val numPartitions = dep.partitioner.numPartitions

  private val blockManager = SparkEnv.get.blockManager
  private val ser = Serializer.getSerializer(dep.serializer.orNull)

  private val conf = SparkEnv.get.conf
  private val fileBufferSize = conf.getInt("spark.shuffle.file.buffer.kb", 32) * 1024

  private var sorter: ExternalSorter[K, V, _] = null
  private var outputFile: File = null
  private var indexFile: File = null

  // Are we in the process of stopping? Because map tasks can call stop() with success = true
  // and then call stop() with success = false if they get an exception, we want to make sure
  // we don't try deleting files, etc twice.
  private var stopping = false

  private var mapStatus: MapStatus = null

  private val writeMetrics = new ShuffleWriteMetrics()
  context.taskMetrics.shuffleWriteMetrics = Some(writeMetrics)

  /** Write a bunch of records to this task's output */
  override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
    if (dep.mapSideCombine) {
      if (!dep.aggregator.isDefined) {
        throw new IllegalStateException("Aggregator is empty for map-side combine")
      }
      sorter = new ExternalSorter[K, V, C](
        dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer)
      sorter.insertAll(records)
    } else {
      // In this case we pass neither an aggregator nor an ordering to the sorter, because we don't
      // care whether the keys get sorted in each partition; that will be done on the reduce side
      // if the operation being run is sortByKey.
      sorter = new ExternalSorter[K, V, V](
        None, Some(dep.partitioner), None, dep.serializer)
      sorter.insertAll(records)
    }

    // Create a single shuffle file with reduce ID 0 that we'll write all results to. We'll later
    // serve different ranges of this file using an index file that we create at the end.
    val blockId = ShuffleBlockId(dep.shuffleId, mapId, 0)
    outputFile = blockManager.diskBlockManager.getFile(blockId)
    indexFile = blockManager.diskBlockManager.getFile(blockId.name + ".index")

    val partitionLengths = sorter.writePartitionedFile(blockId, context)

    // Register our map output with the ShuffleBlockManager, which handles cleaning it over time
    blockManager.shuffleBlockManager.addCompletedMap(dep.shuffleId, mapId, numPartitions)

    mapStatus = new MapStatus(blockManager.blockManagerId,
      partitionLengths.map(MapOutputTracker.compressSize))
  }
  // ... (stop() and remaining members omitted)
}
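The single data file is sliced for the reducers with the help of the index file, which essentially stores the cumulative offsets derived from partitionLengths. A minimal sketch of that bookkeeping, not the Spark code itself; partitionLengths stands in for the array returned by writePartitionedFile above and the sizes are made up:

// Sketch: deriving each reduce partition's byte range from partitionLengths.
// offsets(i) is where partition i starts in the single output file; offsets(i + 1) is where it ends.
val partitionLengths = Array[Long](120L, 0L, 342L, 58L)   // hypothetical per-partition sizes in bytes
val offsets = partitionLengths.scanLeft(0L)(_ + _)        // Array(0, 120, 120, 462, 520)
for (reduceId <- partitionLengths.indices) {
  println(s"reduce $reduceId -> bytes [${offsets(reduceId)}, ${offsets(reduceId + 1)})")
}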