<Spark><Spark Streaming><作业分析><JobHistory>
Intro
- 这篇是对一个Spark (Streaming)作业的log进行分析。用来加深对Spark application运行过程,优化空间的各种理解。
Here to Start
- 从我这个初学者写得一个Spark Streaming程序开始...
package com.wttttt.spark /**
* Created with IntelliJ IDEA.
* Description:
* Author: wttttt
* Github: https://github.com/wttttt-wang/hadoop_inaction
* Date: 2017-05-19
* Time: 09:56
*/
import java.util.regex.Pattern import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.slf4j.LoggerFactory import scala.collection.mutable object LocalTest {
val logger = LoggerFactory.getLogger("LocalTest")
def main(args: Array[String]) { val batchInterval = Milliseconds(10000)
val slideInterval = Milliseconds(5000) val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("LocalTest")
// WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data,
// otherwise Spark jobs will not get resources to process the received data.
val sc = new StreamingContext(conf, Milliseconds(5000))
sc.checkpoint("flumeCheckpoint/") val stream = sc.socketTextStream("localhost", 9998) val counts = stream.mapPartitions{ events =>
val pattern = Pattern.compile("\\?Input=[^\\s]*\\s")
val map = new mutable.HashMap[String, Int]()
logger.info("Handling events, events is empty: " + events.isEmpty)
while (events.hasNext){ // par is an Iterator!!!
val line = events.next()
val m = pattern.matcher(line)
if (m.find()) {
val words = line.substring(m.start(), m.end()).split("=")(1).toLowerCase()
logger.info(s"Processing words $words")
map.put(words, map.getOrElse(words, 0) + 1)
}
}
map.iterator
} val window = counts.reduceByKeyAndWindow(_+_, _-_, batchInterval, slideInterval)
// window.print() // transform和它的变体trnasformWith运行在DStream上任意的RDD-to-RDD函数;
// 可以用来使用那些不包含在DStrema API中RDD操作
val sorted = window.transform(rdd =>{
val sortRdd = rdd.map(t => (t._2, t._1)).sortByKey(false).map(t => (t._2, t._1))
val more = sortRdd.take(2)
more.foreach(println)
sortRdd
}) sorted.print() sc.start()
sc.awaitTermination()
}
}- 看运行log (摘取的一部分)
17/05/20 10:20:13 INFO Utils: Successfully started service 'sparkDriver' on port 52300. 17/05/20 10:20:17 INFO ReducedWindowedDStream: Checkpoint interval automatically set to 10000 ms 17/05/20 10:20:17 INFO SocketInputDStream: Slide time = 5000 ms 17/05/20 10:20:17 INFO ReceiverTracker: Receiver 0 started 17/05/20 10:20:17 INFO DAGScheduler: Got job 0 (start at LocalTest.scala:66) with 1 output partitions 17/05/20 10:20:17 INFO JobScheduler: Started JobScheduler 17/05/20 10:20:17 INFO StreamingContext: StreamingContext started 17/05/20 10:20:20 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58 17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 2 (mapPartitions at LocalTest.scala:36) 17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 4 (reduceByKeyAndWindow at LocalTest.scala:52) 17/05/20 10:20:20 INFO DAGScheduler: Got job 1 (sortByKey at LocalTest.scala:58) with 2 output partitions 17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 3 (sortByKey at LocalTest.scala:58) 17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1, ShuffleMapStage 2) 17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2) 17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents 17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52) 17/05/20 10:20:20 INFO DAGScheduler: Job 1 finished: sortByKey at LocalTest.scala:58, took 0.314005 s 17/05/20 10:20:20 INFO SparkContext: Starting job: take at LocalTest.scala:59 17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 7 (map at LocalTest.scala:58) 17/05/20 10:20:20 INFO DAGScheduler: Got job 2 (take at LocalTest.scala:59) with 1 output partitions 17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 7 (take at LocalTest.scala:59) 17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6) 17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 6) 17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks 17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 5, localhost, executor driver, partition 0, PROCESS_LOCAL, 6109 bytes) 17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 6.0 (TID 5) 17/05/20 10:20:20 INFO BlockManager: Found block rdd60 locally 17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 6.0 (TID 5). 1062 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 6, localhost, executor driver, partition 1, PROCESS_LOCAL, 6109 bytes) 17/05/20 10:20:20 INFO Executor: Running task 1.0 in stage 6.0 (TID 6) 17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 5) in 14 ms on localhost (executor driver) (1/2) 17/05/20 10:20:20 INFO BlockManager: Found block rdd61 locally 17/05/20 10:20:20 INFO Executor: Finished task 1.0 in stage 6.0 (TID 6). 1062 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 6) in 9 ms on localhost (executor driver) (2/2) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 17/05/20 10:20:20 INFO DAGScheduler: ShuffleMapStage 6 (map at LocalTest.scala:58) finished in 0.024 s 17/05/20 10:20:20 INFO DAGScheduler: looking for newly runnable stages 17/05/20 10:20:20 INFO DAGScheduler: running: Set(ResultStage 0) 17/05/20 10:20:20 INFO DAGScheduler: waiting: Set(ResultStage 7) 17/05/20 10:20:20 INFO DAGScheduler: failed: Set() 17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:20 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.0 KB, free 912.2 MB) 17/05/20 10:20:20 INFO MemoryStore: Block broadcast4piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB) 17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast4piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:20 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks 17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 5903 bytes) 17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 7.0 (TID 7) 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 1718 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 7) in 146 ms on localhost (executor driver) (1/1) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 17/05/20 10:20:20 INFO DAGScheduler: Job 2 finished: take at LocalTest.scala:59, took 0.198460 s 17/05/20 10:20:20 INFO SparkContext: Starting job: print at LocalTest.scala:64 17/05/20 10:20:20 INFO DAGScheduler: Got job 3 (print at LocalTest.scala:64) with 1 output partitions 17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 11 (print at LocalTest.scala:64) 17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 10) 17/05/20 10:20:20 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue 17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:20 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000' 17/05/20 10:20:20 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 4.0 KB, free 912.2 MB) 17/05/20 10:20:20 INFO MemoryStore: Block broadcast5piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB) 17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast5piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:20 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 11.0 with 1 tasks 17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 6672 bytes) 17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 11.0 (TID 8) 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks 17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 11.0 (TID 8). 1718 bytes result sent to driver 17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 8) in 39 ms on localhost (executor driver) (1/1) 17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 17/05/20 10:20:20 INFO DAGScheduler: ResultStage 11 (print at LocalTest.scala:64) finished in 0.041 s 17/05/20 10:20:20 INFO DAGScheduler: Job 3 finished: print at LocalTest.scala:64, took 0.053225 s Time: 1495246820000 ms 17/05/20 10:20:20 INFO JobScheduler: Finished job streaming job 1495246820000 ms.0 from job set of time 1495246820000 ms 17/05/20 10:20:20 INFO JobScheduler: Total delay: 0.954 s for time 1495246820000 ms (execution: 0.064 s) 17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000.bk 17/05/20 10:20:21 INFO JobGenerator: Checkpointing graph for time 1495246820000 ms 17/05/20 10:20:21 INFO DStreamGraph: Updating checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000', took 3942 bytes and 125 ms 17/05/20 10:20:21 INFO DStreamGraph: Updated checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue 17/05/20 10:20:21 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000' 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast4piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast1piece0 on 192.168.6.90:52302 in memory (size: 1266.0 B, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast2piece0 on 192.168.6.90:52302 in memory (size: 2.7 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast3piece0 on 192.168.6.90:52302 in memory (size: 2.9 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast5piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB) 17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000 17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000', took 3938 bytes and 48 ms 17/05/20 10:20:21 INFO DStreamGraph: Clearing checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO DStreamGraph: Cleared checkpoint data for time 1495246820000 ms 17/05/20 10:20:21 INFO ReceivedBlockTracker: Deleting batches: 17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Attempting to clear 2 old log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000: file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246055009-1495246115009 file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246115167-1495246175167 17/05/20 10:20:21 INFO InputInfoTracker: remove old batch metadata: 17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000 17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000 17/05/20 10:20:21 INFO MemoryStore: Block input-0-1495246821600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB) 17/05/20 10:20:21 INFO BlockManagerInfo: Added input-0-1495246821600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB) 17/05/20 10:20:21 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s. 17/05/20 10:20:21 WARN BlockManager: Block input-0-1495246821600 replicated to only 0 peer(s) instead of 1 peers 17/05/20 10:20:21 INFO BlockGenerator: Pushed block input-0-1495246821600 17/05/20 10:20:24 INFO MemoryStore: Block input-0-1495246824600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB) 17/05/20 10:20:24 INFO BlockManagerInfo: Added input-0-1495246824600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB) 17/05/20 10:20:24 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s. 17/05/20 10:20:24 WARN BlockManager: Block input-0-1495246824600 replicated to only 0 peer(s) instead of 1 peers 17/05/20 10:20:24 INFO BlockGenerator: Pushed block input-0-1495246824600 17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246815000 ms to 1495246815000 ms (aligned to 1495246815000 ms and 1495246815000 ms) 17/05/20 10:20:25 INFO ShuffledDStream: Time 1495246815000 ms is invalid as zeroTime is 1495246815000 ms , slideDuration is 5000 ms and difference is 0 ms 17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246825000 ms to 1495246825000 ms (aligned to 1495246825000 ms and 1495246825000 ms) 17/05/20 10:20:25 INFO ReducedWindowedDStream: Marking RDD 16 for time 1495246825000 ms for checkpointing 17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 13 (mapPartitions at LocalTest.scala:36) 17/05/20 10:20:25 INFO DAGScheduler: Got job 4 (sortByKey at LocalTest.scala:58) with 2 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 15 (sortByKey at LocalTest.scala:58) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 12, ShuffleMapStage 13, ShuffleMapStage 14) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 13) 17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 2.6 KB, free 912.2 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast6piece0 stored as bytes in memory (estimated size 1667.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast6piece0 in memory on 192.168.6.90:52302 (size: 1667.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 13.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 9, localhost, executor driver, partition 0, ANY, 6020 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 13.0 (TID 9) 17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246821600 locally 17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false 17/05/20 10:20:25 INFO LocalTest: Processing words test1 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 13.0 (TID 9). 1498 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 13.0 (TID 10, localhost, executor driver, partition 1, ANY, 6020 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 9) in 28 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 13.0 (TID 10) 17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246824600 locally 17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false 17/05/20 10:20:25 INFO LocalTest: Processing words test1 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test2 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO LocalTest: Processing words test3 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 13.0 (TID 10). 1498 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 13.0 (TID 10) in 22 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 13 (mapPartitions at LocalTest.scala:36) finished in 0.048 s 17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages 17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0) 17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 15) 17/05/20 10:20:25 INFO DAGScheduler: failed: Set() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 6.4 KB, free 912.2 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast7piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast7piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 15.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 11, localhost, executor driver, partition 0, PROCESS_LOCAL, 6185 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 15.0 (TID 11) 17/05/20 10:20:25 INFO BlockManager: Found block rdd60 locally 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO MemoryStore: Block rdd140 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd140 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO MemoryStore: Block rdd160 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd160 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 15.0 (TID 11). 2799 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID 12, localhost, executor driver, partition 1, PROCESS_LOCAL, 6185 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 11) in 31 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 15.0 (TID 12) 17/05/20 10:20:25 INFO BlockManager: Found block rdd61 locally 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO MemoryStore: Block rdd141 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd141 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO MemoryStore: Block rdd161 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd161 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB) 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 15.0 (TID 12). 2803 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID 12) in 39 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 15 (sortByKey at LocalTest.scala:58) finished in 0.069 s 17/05/20 10:20:25 INFO DAGScheduler: Job 4 finished: sortByKey at LocalTest.scala:58, took 0.156155 s 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 127.1 KB, free 912.1 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast8piece0 stored as bytes in memory (estimated size 14.3 KB, free 912.1 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast8piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 8 from sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 83 bytes 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 157 bytes 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 156 bytes 17/05/20 10:20:25 INFO DAGScheduler: Got job 5 (sortByKey at LocalTest.scala:58) with 2 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 19 (sortByKey at LocalTest.scala:58) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 16, ShuffleMapStage 17, ShuffleMapStage 18) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 6.5 KB, free 912.1 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast9piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.1 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast9piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 19.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 6122 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 19.0 (TID 13) 17/05/20 10:20:25 INFO BlockManager: Found block rdd160 locally 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 19.0 (TID 13). 1085 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 19.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 6122 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 19.0 (TID 13) in 30 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 19.0 (TID 14) 17/05/20 10:20:25 INFO BlockManager: Found block rdd161 locally 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 19.0 (TID 14). 1172 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 19.0 (TID 14) in 85 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 19 (sortByKey at LocalTest.scala:58) finished in 0.115 s 17/05/20 10:20:25 INFO DAGScheduler: Job 5 finished: sortByKey at LocalTest.scala:58, took 0.129602 s 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 127.1 KB, free 912.0 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast10piece0 stored as bytes in memory (estimated size 14.3 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast10piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 10 from sortByKey at LocalTest.scala:58 17/05/20 10:20:25 INFO ReliableRDDCheckpointData: Done checkpointing RDD 16 to file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/874437fe-0c21-4a9d-8793-8ae914f3f38b/rdd-16, new parent is RDD 20 17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59 17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 17 (map at LocalTest.scala:58) 17/05/20 10:20:25 INFO DAGScheduler: Got job 6 (take at LocalTest.scala:59) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 21 (take at LocalTest.scala:59) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 20) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 20) 17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 6.1 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast11piece0 stored as bytes in memory (estimated size 3.5 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast11piece0 in memory on 192.168.6.90:52302 (size: 3.5 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 20.0 with 2 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 20.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 6169 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 20.0 (TID 15) 17/05/20 10:20:25 INFO BlockManager: Found block rdd160 locally 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 20.0 (TID 15). 1413 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 20.0 (TID 16, localhost, executor driver, partition 1, PROCESS_LOCAL, 6169 bytes) 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 20.0 (TID 15) in 44 ms on localhost (executor driver) (1/2) 17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 20.0 (TID 16) 17/05/20 10:20:25 INFO BlockManager: Found block rdd161 locally 17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 20.0 (TID 16). 1413 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 20.0 (TID 16) in 58 ms on localhost (executor driver) (2/2) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 20 (map at LocalTest.scala:58) finished in 0.101 s 17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages 17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0) 17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 21) 17/05/20 10:20:25 INFO DAGScheduler: failed: Set() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast12piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast12piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 21.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 21.0 (TID 17, localhost, executor driver, partition 0, ANY, 5903 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 21.0 (TID 17) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 21.0 (TID 17). 1869 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 21.0 (TID 17) in 28 ms on localhost (executor driver) (1/1) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 21.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 21 (take at LocalTest.scala:59) finished in 0.029 s 17/05/20 10:20:25 INFO DAGScheduler: Job 6 finished: take at LocalTest.scala:59, took 0.150449 s 17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59 17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 4 is 160 bytes 17/05/20 10:20:25 INFO DAGScheduler: Got job 7 (take at LocalTest.scala:59) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 23 (take at LocalTest.scala:59) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 22) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast13piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast13piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 23.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 23.0 (TID 18, localhost, executor driver, partition 1, ANY, 5903 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 23.0 (TID 18) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 23.0 (TID 18). 1869 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 23.0 (TID 18) in 75 ms on localhost (executor driver) (1/1) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 23 (take at LocalTest.scala:59) finished in 0.077 s 17/05/20 10:20:25 INFO DAGScheduler: Job 7 finished: take at LocalTest.scala:59, took 0.087657 s 17/05/20 10:20:25 INFO JobScheduler: Added jobs for time 1495246825000 ms 17/05/20 10:20:25 INFO JobGenerator: Checkpointing graph for time 1495246825000 ms 17/05/20 10:20:25 INFO DStreamGraph: Updating checkpoint data for time 1495246825000 ms 17/05/20 10:20:25 INFO JobScheduler: Starting job streaming job 1495246825000 ms.0 from job set of time 1495246825000 ms (test3 ,8) (test2 ,4) 17/05/20 10:20:25 INFO DStreamGraph: Updated checkpoint data for time 1495246825000 ms 17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64 17/05/20 10:20:25 INFO CheckpointWriter: Saving checkpoint for time 1495246825000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000' 17/05/20 10:20:25 INFO CheckpointWriter: Submitted checkpoint of time 1495246825000 ms to writer queue 17/05/20 10:20:25 INFO DAGScheduler: Got job 8 (print at LocalTest.scala:64) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 25 (print at LocalTest.scala:64) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 24) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast14piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast14piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 25.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 25.0 (TID 19, localhost, executor driver, partition 0, ANY, 6672 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 25.0 (TID 19) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 17/05/20 10:20:25 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246115000.bk 17/05/20 10:20:25 INFO CheckpointWriter: Checkpoint for time 1495246825000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000', took 4010 bytes and 37 ms 17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 25.0 (TID 19). 1869 bytes result sent to driver 17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 25.0 (TID 19) in 37 ms on localhost (executor driver) (1/1) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 25.0, whose tasks have all completed, from pool 17/05/20 10:20:25 INFO DAGScheduler: ResultStage 25 (print at LocalTest.scala:64) finished in 0.039 s 17/05/20 10:20:25 INFO DAGScheduler: Job 8 finished: print at LocalTest.scala:64, took 0.046366 s 17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64 17/05/20 10:20:25 INFO DAGScheduler: Got job 9 (print at LocalTest.scala:64) with 1 output partitions 17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 27 (print at LocalTest.scala:64) 17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 26) 17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List() 17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents 17/05/20 10:20:25 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 4.0 KB, free 911.9 MB) 17/05/20 10:20:25 INFO MemoryStore: Block broadcast15piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB) 17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast15piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB) 17/05/20 10:20:25 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:996 17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58) 17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks 17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 27.0 (TID 20, localhost, executor driver, partition 1, ANY, 6672 bytes) 17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 27.0 (TID 20) 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks 17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms Time: 1495246825000 ms (test3 ,8) (test2 ,4) (test1 ,2)
配置JobHistory
- spark-defaults.conf:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://host99:9000/sparkLogHistory
spark.eventLog.compress true
- eventLog.dir要手动创建: hadoop fs -mkdir /sparkLogHistory
- spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://host99:9000/sparkLogHistory"
默认只保留最近三次的jobHistory,端口号是7777
- 运行一个job试试
<Spark><Spark Streaming><作业分析><JobHistory>的更多相关文章
- 简单物联网:外网访问内网路由器下树莓派Flask服务器
最近做一个小东西,大概过程就是想在教室,宿舍控制实验室的一些设备. 已经在树莓上搭了一个轻量的flask服务器,在实验室的路由器下,任何设备都是可以访问的:但是有一些限制条件,比如我想在宿舍控制我种花 ...
- 利用ssh反向代理以及autossh实现从外网连接内网服务器
前言 最近遇到这样一个问题,我在实验室架设了一台服务器,给师弟或者小伙伴练习Linux用,然后平时在实验室这边直接连接是没有问题的,都是内网嘛.但是回到宿舍问题出来了,使用校园网的童鞋还是能连接上,使 ...
- 外网访问内网Docker容器
外网访问内网Docker容器 本地安装了Docker容器,只能在局域网内访问,怎样从外网也能访问本地Docker容器? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Docker容器 ...
- 外网访问内网SpringBoot
外网访问内网SpringBoot 本地安装了SpringBoot,只能在局域网内访问,怎样从外网也能访问本地SpringBoot? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装Java 1 ...
- 外网访问内网Elasticsearch WEB
外网访问内网Elasticsearch WEB 本地安装了Elasticsearch,只能在局域网内访问其WEB,怎样从外网也能访问本地Elasticsearch? 本文将介绍具体的实现步骤. 1. ...
- 怎样从外网访问内网Rails
外网访问内网Rails 本地安装了Rails,只能在局域网内访问,怎样从外网也能访问本地Rails? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Rails 默认安装的Rails端口 ...
- 怎样从外网访问内网Memcached数据库
外网访问内网Memcached数据库 本地安装了Memcached数据库,只能在局域网内访问,怎样从外网也能访问本地Memcached数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装 ...
- 怎样从外网访问内网CouchDB数据库
外网访问内网CouchDB数据库 本地安装了CouchDB数据库,只能在局域网内访问,怎样从外网也能访问本地CouchDB数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Cou ...
- 怎样从外网访问内网DB2数据库
外网访问内网DB2数据库 本地安装了DB2数据库,只能在局域网内访问,怎样从外网也能访问本地DB2数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动DB2数据库 默认安装的DB2 ...
- 怎样从外网访问内网OpenLDAP数据库
外网访问内网OpenLDAP数据库 本地安装了OpenLDAP数据库,只能在局域网内访问,怎样从外网也能访问本地OpenLDAP数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动 ...
随机推荐
- Elasticsearch SQL
es sql是一个X-pack组件 ,允许对es执行类似sql的查询,可以将Elasticsearch SQL理解为一个编译器,既能理解es,又能理解sql.可以通过利用es,实施大规模实时读取和处理 ...
- logstash配置文件
1. 安装 logstash 安装过程很简单,直接参照官方文档: https://www.elastic.co/guide/en/logstash/current/installing-logsta ...
- Forbidden Subwords
pro: sol: 建出ac自动机. 一个合法的答案对应一条路径满足从一个scc走到另一个scc的路径. 发现这个题的方案数有可能是无限的. 会在以下两种情况无限: 因此,去掉无限情况后,环只有简单环 ...
- Leetcode 999. 车的可用捕获量
999. 车的可用捕获量 显示英文描述 我的提交返回竞赛 用户通过次数255 用户尝试次数260 通过次数255 提交次数357 题目难度Easy 在一个 8 x 8 的棋盘上,有一个白色车(r ...
- myeclipse debug模式 报错source not found
myeclipse debug模式下,启动报错 source not found:SignatureParser.current() line: 解决方法:将debug视图下的右上方的jar有断点的地 ...
- Spring Boot: remove jsessionid from url
参考代码 :Spring Boot: remove jsessionid from url 我的SpringBoot用2.0.*,答案中的第一二个方案亲测无效. 应该在继承了Configuration ...
- Nodejs--url模块
由于GET请求直接被嵌入在路径中,URL是完整的请求路径,包括了?后面的部分,因此你可以手动解析后面的内容作为GET请求的参数. url 模块中的 parse 函数可以用于解析url中的参数. url ...
- iOS 如何优化项目
原文 前言 在项目业务趋于稳定的时候,开发完迭代需求后,我们可能会无所适从,进入一段空白期,但是对于攻城狮来说闲暇不是件好事,所以我们可能总想学点什么,却又没有头绪.这个时候我们就可以考虑完善和优化我 ...
- 把旧系统迁移到.Net Core 2.0 日记(8) - EASYUI datagrid+ Dapper+ 导出Excel
迁移也没太大变化,有一个, 之前的Request.QueryString 是返回NameValueCollection, 现在则是返回整个字符串. 你要改成Request.Query[“key”] 直 ...
- 逆袭之旅DAY20.XIA.程序调试
2018-07-16 20:25:50 F5:进入方法 F6:单步执行