Intro

  • This post walks through the log of a Spark (Streaming) job, to build a deeper understanding of how a Spark application actually runs and where it can be optimized.

Here to Start

  • Let's start from a Spark Streaming program written by me, a beginner...
  • package com.wttttt.spark
    
    /**
      * Created with IntelliJ IDEA.
      * Description:
      * Author: wttttt
      * Github: https://github.com/wttttt-wang/hadoop_inaction
      * Date: 2017-05-19
      * Time: 09:56
      */
    import java.util.regex.Pattern
    
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Milliseconds, StreamingContext}
    import org.slf4j.LoggerFactory
    
    import scala.collection.mutable
    
    object LocalTest {
      val logger = LoggerFactory.getLogger("LocalTest")
    
      def main(args: Array[String]) {
        val batchInterval = Milliseconds(10000)
        val slideInterval = Milliseconds(5000)
    
        val conf = new SparkConf()
          .setMaster("local[2]")
          .setAppName("LocalTest")
        // WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data,
        // otherwise Spark jobs will not get resources to process the received data.
        val sc = new StreamingContext(conf, Milliseconds(5000))
        sc.checkpoint("flumeCheckpoint/")
    
        val stream = sc.socketTextStream("localhost", 9998)
    
        val counts = stream.mapPartitions { events =>
          val pattern = Pattern.compile("\\?Input=[^\\s]*\\s")
          val map = new mutable.HashMap[String, Int]()
          logger.info("Handling events, events is empty: " + events.isEmpty)
          while (events.hasNext) {  // events is an Iterator!!!
            val line = events.next()
            val m = pattern.matcher(line)
            if (m.find()) {
              val words = line.substring(m.start(), m.end()).split("=")(1).toLowerCase()
              logger.info(s"Processing words $words")
              map.put(words, map.getOrElse(words, 0) + 1)
            }
          }
          map.iterator
        }
    
        val window = counts.reduceByKeyAndWindow(_ + _, _ - _, batchInterval, slideInterval)
        // window.print()
        // transform and its variant transformWith apply an arbitrary RDD-to-RDD function to a DStream;
        // they can be used for RDD operations that are not exposed in the DStream API.
        val sorted = window.transform(rdd => {
          val sortRdd = rdd.map(t => (t._2, t._1)).sortByKey(false).map(t => (t._2, t._1))
          val more = sortRdd.take(2)
          more.foreach(println)
          sortRdd
        })
    
        sorted.print()
    
        sc.start()
        sc.awaitTermination()
      }
    }
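  • To reproduce a run like the one below without a real Flume/log pipeline, something has to write lines to localhost:9998 in the shape the "\\?Input=[^\\s]*\\s" pattern expects. The following is only a minimal feeder I sketched for local testing; the object name TestFeeder and the sample request lines are my assumptions, not part of the original setup:
  • import java.io.PrintWriter
    import java.net.ServerSocket
    
    // Hypothetical feeder: accepts the connection from socketTextStream on port 9998
    // and keeps pushing request-like lines containing "?Input=<word> " so that the
    // regex in LocalTest matches them. Start it before starting LocalTest.
    object TestFeeder {
      def main(args: Array[String]): Unit = {
        val server = new ServerSocket(9998)
        val socket = server.accept()                      // LocalTest's receiver connects here
        val out = new PrintWriter(socket.getOutputStream, true)
        val samples = Seq(
          "GET /search?Input=test1 HTTP/1.1",
          "GET /search?Input=test2 HTTP/1.1",
          "GET /search?Input=test2 HTTP/1.1",
          "GET /search?Input=test3 HTTP/1.1")
        while (true) {                                    // one burst of sample lines per second
          samples.foreach(s => out.println(s))
          Thread.sleep(1000)
        }
      }
    }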
  • Now take a look at the run log (an excerpt):
  • 17/05/20 10:20:13 INFO Utils: Successfully started service 'sparkDriver' on port 52300.
    
    17/05/20 10:20:17 INFO ReducedWindowedDStream: Checkpoint interval automatically set to 10000 ms
    
    17/05/20 10:20:17 INFO SocketInputDStream: Slide time = 5000 ms
    
    17/05/20 10:20:17 INFO ReceiverTracker: Receiver 0 started
    
    17/05/20 10:20:17 INFO DAGScheduler: Got job 0 (start at LocalTest.scala:66) with 1 output partitions
    
    17/05/20 10:20:17 INFO JobScheduler: Started JobScheduler
    
    17/05/20 10:20:17 INFO StreamingContext: StreamingContext started
    
    17/05/20 10:20:20 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 2 (mapPartitions at LocalTest.scala:36)
    
    17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 4 (reduceByKeyAndWindow at LocalTest.scala:52)
    
    17/05/20 10:20:20 INFO DAGScheduler: Got job 1 (sortByKey at LocalTest.scala:58) with 2 output partitions
    
    17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 3 (sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1, ShuffleMapStage 2)
    
    17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52)
    
    17/05/20 10:20:20 INFO DAGScheduler: Job 1 finished: sortByKey at LocalTest.scala:58, took 0.314005 s
    
    17/05/20 10:20:20 INFO SparkContext: Starting job: take at LocalTest.scala:59
    
    17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 7 (map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO DAGScheduler: Got job 2 (take at LocalTest.scala:59) with 1 output partitions
    
    17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 7 (take at LocalTest.scala:59)
    
    17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
    
    17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 6)
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 5, localhost, executor driver, partition 0, PROCESS_LOCAL, 6109 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 6.0 (TID 5)
    
    17/05/20 10:20:20 INFO BlockManager: Found block rdd_6_0 locally
    
    17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 6.0 (TID 5). 1062 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 6, localhost, executor driver, partition 1, PROCESS_LOCAL, 6109 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 1.0 in stage 6.0 (TID 6)
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 5) in 14 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:20 INFO BlockManager: Found block rdd_6_1 locally
    
    17/05/20 10:20:20 INFO Executor: Finished task 1.0 in stage 6.0 (TID 6). 1062 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 6) in 9 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:20 INFO DAGScheduler: ShuffleMapStage 6 (map at LocalTest.scala:58) finished in 0.024 s
    
    17/05/20 10:20:20 INFO DAGScheduler: looking for newly runnable stages
    
    17/05/20 10:20:20 INFO DAGScheduler: running: Set(ResultStage 0)
    
    17/05/20 10:20:20 INFO DAGScheduler: waiting: Set(ResultStage 7)
    
    17/05/20 10:20:20 INFO DAGScheduler: failed: Set()
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.0 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:20 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 5903 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 7.0 (TID 7)
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 1718 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 7) in 146 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:20 INFO DAGScheduler: Job 2 finished: take at LocalTest.scala:59, took 0.198460 s
    
    17/05/20 10:20:20 INFO SparkContext: Starting job: print at LocalTest.scala:64
    
    17/05/20 10:20:20 INFO DAGScheduler: Got job 3 (print at LocalTest.scala:64) with 1 output partitions
    
    17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 11 (print at LocalTest.scala:64)
    
    17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 10)
    
    17/05/20 10:20:20 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue
    
    17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:20 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000'
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 4.0 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:20 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 11.0 with 1 tasks
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 6672 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 11.0 (TID 8)
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 11.0 (TID 8). 1718 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 8) in 39 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:20 INFO DAGScheduler: ResultStage 11 (print at LocalTest.scala:64) finished in 0.041 s
    
    17/05/20 10:20:20 INFO DAGScheduler: Job 3 finished: print at LocalTest.scala:64, took 0.053225 s
    
    Time: 1495246820000 ms
    
    17/05/20 10:20:20 INFO JobScheduler: Finished job streaming job 1495246820000 ms.0 from job set of time 1495246820000 ms
    
    17/05/20 10:20:20 INFO JobScheduler: Total delay: 0.954 s for time 1495246820000 ms (execution: 0.064 s)
    
    17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000.bk
    
    17/05/20 10:20:21 INFO JobGenerator: Checkpointing graph for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Updating checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000', took 3942 bytes and 125 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Updated checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue
    
    17/05/20 10:20:21 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000'
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast_4_piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.6.90:52302 in memory (size: 1266.0 B, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.6.90:52302 in memory (size: 2.7 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.6.90:52302 in memory (size: 2.9 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast_5_piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000
    
    17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000', took 3938 bytes and 48 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Clearing checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Cleared checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO ReceivedBlockTracker: Deleting batches: 
    
    17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Attempting to clear 2 old log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000: file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246055009-1495246115009
    
    file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246115167-1495246175167
    
    17/05/20 10:20:21 INFO InputInfoTracker: remove old batch metadata: 
    
    17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000
    
    17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000
    
    17/05/20 10:20:21 INFO MemoryStore: Block input-0-1495246821600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Added input-0-1495246821600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB)
    
    17/05/20 10:20:21 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
    
    17/05/20 10:20:21 WARN BlockManager: Block input-0-1495246821600 replicated to only 0 peer(s) instead of 1 peers
    
    17/05/20 10:20:21 INFO BlockGenerator: Pushed block input-0-1495246821600
    
    17/05/20 10:20:24 INFO MemoryStore: Block input-0-1495246824600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB)
    
    17/05/20 10:20:24 INFO BlockManagerInfo: Added input-0-1495246824600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB)
    
    17/05/20 10:20:24 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
    
    17/05/20 10:20:24 WARN BlockManager: Block input-0-1495246824600 replicated to only 0 peer(s) instead of 1 peers
    
    17/05/20 10:20:24 INFO BlockGenerator: Pushed block input-0-1495246824600
    
    17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246815000 ms to 1495246815000 ms (aligned to 1495246815000 ms and 1495246815000 ms)
    
    17/05/20 10:20:25 INFO ShuffledDStream: Time 1495246815000 ms is invalid as zeroTime is 1495246815000 ms , slideDuration is 5000 ms and difference is 0 ms
    
    17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246825000 ms to 1495246825000 ms (aligned to 1495246825000 ms and 1495246825000 ms)
    
    17/05/20 10:20:25 INFO ReducedWindowedDStream: Marking RDD 16 for time 1495246825000 ms for checkpointing
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 13 (mapPartitions at LocalTest.scala:36)
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 4 (sortByKey at LocalTest.scala:58) with 2 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 15 (sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 12, ShuffleMapStage 13, ShuffleMapStage 14)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 13)
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 2.6 KB, free 912.2 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 1667.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.6.90:52302 (size: 1667.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 13.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 9, localhost, executor driver, partition 0, ANY, 6020 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 13.0 (TID 9)
    
    17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246821600 locally
    
    17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test1 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 13.0 (TID 9). 1498 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 13.0 (TID 10, localhost, executor driver, partition 1, ANY, 6020 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 9) in 28 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 13.0 (TID 10)
    
    17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246824600 locally
    
    17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test1 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 13.0 (TID 10). 1498 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 13.0 (TID 10) in 22 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 13 (mapPartitions at LocalTest.scala:36) finished in 0.048 s
    
    17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages
    
    17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0)
    
    17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 15)
    
    17/05/20 10:20:25 INFO DAGScheduler: failed: Set()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 6.4 KB, free 912.2 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 15.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 11, localhost, executor driver, partition 0, PROCESS_LOCAL, 6185 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 15.0 (TID 11)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd_6_0 locally
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd_14_0 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd_14_0 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd_16_0 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd_16_0 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 15.0 (TID 11). 2799 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID 12, localhost, executor driver, partition 1, PROCESS_LOCAL, 6185 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 11) in 31 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 15.0 (TID 12)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd_6_1 locally
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd_14_1 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd_14_1 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd_16_1 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd_16_1 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 15.0 (TID 12). 2803 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID 12) in 39 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 15 (sortByKey at LocalTest.scala:58) finished in 0.069 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 4 finished: sortByKey at LocalTest.scala:58, took 0.156155 s
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 127.1 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 14.3 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 8 from sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 83 bytes
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 157 bytes
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 156 bytes
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 5 (sortByKey at LocalTest.scala:58) with 2 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 19 (sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 16, ShuffleMapStage 17, ShuffleMapStage 18)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 6.5 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 19.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 6122 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 19.0 (TID 13)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd_16_0 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 19.0 (TID 13). 1085 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 19.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 6122 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 19.0 (TID 13) in 30 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 19.0 (TID 14)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd_16_1 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 19.0 (TID 14). 1172 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 19.0 (TID 14) in 85 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 19 (sortByKey at LocalTest.scala:58) finished in 0.115 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 5 finished: sortByKey at LocalTest.scala:58, took 0.129602 s
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 127.1 KB, free 912.0 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 14.3 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 10 from sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO ReliableRDDCheckpointData: Done checkpointing RDD 16 to file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/874437fe-0c21-4a9d-8793-8ae914f3f38b/rdd-16, new parent is RDD 20
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59
    
    17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 17 (map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 6 (take at LocalTest.scala:59) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 21 (take at LocalTest.scala:59)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 20)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 20)
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 6.1 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 3.5 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.6.90:52302 (size: 3.5 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 20.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 20.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 6169 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 20.0 (TID 15)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd_16_0 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 20.0 (TID 15). 1413 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 20.0 (TID 16, localhost, executor driver, partition 1, PROCESS_LOCAL, 6169 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 20.0 (TID 15) in 44 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 20.0 (TID 16)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd_16_1 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 20.0 (TID 16). 1413 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 20.0 (TID 16) in 58 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 20 (map at LocalTest.scala:58) finished in 0.101 s
    
    17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages
    
    17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0)
    
    17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 21)
    
    17/05/20 10:20:25 INFO DAGScheduler: failed: Set()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 21.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 21.0 (TID 17, localhost, executor driver, partition 0, ANY, 5903 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 21.0 (TID 17)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 21.0 (TID 17). 1869 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 21.0 (TID 17) in 28 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 21.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 21 (take at LocalTest.scala:59) finished in 0.029 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 6 finished: take at LocalTest.scala:59, took 0.150449 s
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 4 is 160 bytes
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 7 (take at LocalTest.scala:59) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 23 (take at LocalTest.scala:59)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 22)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 23.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 23.0 (TID 18, localhost, executor driver, partition 1, ANY, 5903 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 23.0 (TID 18)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 23.0 (TID 18). 1869 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 23.0 (TID 18) in 75 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 23 (take at LocalTest.scala:59) finished in 0.077 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 7 finished: take at LocalTest.scala:59, took 0.087657 s
    
    17/05/20 10:20:25 INFO JobScheduler: Added jobs for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO JobGenerator: Checkpointing graph for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO DStreamGraph: Updating checkpoint data for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO JobScheduler: Starting job streaming job 1495246825000 ms.0 from job set of time 1495246825000 ms
    
    (test3 ,8)
    
    (test2 ,4)
    
    17/05/20 10:20:25 INFO DStreamGraph: Updated checkpoint data for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64
    
    17/05/20 10:20:25 INFO CheckpointWriter: Saving checkpoint for time 1495246825000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000'
    
    17/05/20 10:20:25 INFO CheckpointWriter: Submitted checkpoint of time 1495246825000 ms to writer queue
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 8 (print at LocalTest.scala:64) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 25 (print at LocalTest.scala:64)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 24)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 25.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 25.0 (TID 19, localhost, executor driver, partition 0, ANY, 6672 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 25.0 (TID 19)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246115000.bk
    
    17/05/20 10:20:25 INFO CheckpointWriter: Checkpoint for time 1495246825000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000', took 4010 bytes and 37 ms
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 25.0 (TID 19). 1869 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 25.0 (TID 19) in 37 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 25.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 25 (print at LocalTest.scala:64) finished in 0.039 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 8 finished: print at LocalTest.scala:64, took 0.046366 s
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 9 (print at LocalTest.scala:64) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 27 (print at LocalTest.scala:64)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 26)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 27.0 (TID 20, localhost, executor driver, partition 1, ANY, 6672 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 27.0 (TID 20)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    Time: 1495246825000 ms
    
    (test3 ,8)
    
    (test2 ,4)
    
    (test1 ,2)
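
  • One detail the log makes visible: because an inverse function (_ - _) is passed to reduceByKeyAndWindow, the ReducedWindowedDStream updates each window incrementally from the previous one instead of re-reducing every batch in the window; this is also why checkpointing is mandatory here and why RDDs keep getting marked for checkpointing above. A rough conceptual sketch of that incremental update in plain Scala (an illustration of the idea only, not Spark's actual code):
  • // New window counts = previous window counts
    //                     - counts of the batch that slid out of the window
    //                     + counts of the batch that slid into the window
    def nextWindow(prevWindow: Map[String, Int],
                   leaving: Map[String, Int],
                   entering: Map[String, Int]): Map[String, Int] = {
      val keys = prevWindow.keySet ++ leaving.keySet ++ entering.keySet
      keys.map { k =>
        k -> (prevWindow.getOrElse(k, 0) - leaving.getOrElse(k, 0) + entering.getOrElse(k, 0))
      }.toMap.filter(_._2 != 0)
    }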
    

Configuring the JobHistory Server

  • spark-defaults.conf:
spark.eventLog.enabled  true
spark.eventLog.dir hdfs://host99:9000/sparkLogHistory
spark.eventLog.compress true
  • The eventLog.dir directory has to be created by hand: hadoop fs -mkdir /sparkLogHistory
  • spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://host99:9000/sparkLogHistory"

  With these options the history server keeps only the three most recent applications and serves its UI on port 7777 (start it with sbin/start-history-server.sh).

  • Run a job and check the history UI; a minimal sketch follows below.
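  • The same three settings can also be set programmatically on the SparkConf of a throwaway job, so that its event log lands in the directory configured above. A minimal sketch (the object name HistoryTest is mine; the HDFS path mirrors the spark-defaults.conf entries):
  • import org.apache.spark.{SparkConf, SparkContext}
    
    // Tiny batch job whose event log should show up in the history server UI
    // (port 7777 as configured above). Adjust the HDFS path to your cluster.
    object HistoryTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("HistoryTest")
          .set("spark.eventLog.enabled", "true")
          .set("spark.eventLog.dir", "hdfs://host99:9000/sparkLogHistory")
          .set("spark.eventLog.compress", "true")
        val sc = new SparkContext(conf)
        val sum = sc.parallelize(1 to 1000, 4).map(_ * 2).reduce(_ + _)
        println(s"sum = $sum")
        sc.stop()   // the event log is finalized when the context stops
      }
    }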
