
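import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
object NetWorkStream {
  def main(args: Array[String]): Unit = {
    // The master URL is truncated in these notes; complete it for your own cluster.
    val conf = new SparkConf().setMaster("spark://").setAppName("netWorkStream")
    // The batch interval was elided here; 1 second matches the Seconds(1) used later in these notes.
    val ssc = new StreamingContext(conf, Seconds(1))
    // Host and port were elided; as an assumption they are read from the program arguments.
    val lines = ssc.socketTextStream(args(0), args(1).toInt)
    val words = lines.flatMap { line => line.split(" ") }
    // Pair each word with 1, then sum the counts per word within each batch.
    val wordCount = words.map { w => (w, 1) }.reduceByKey(_ + _)
    wordCount.print() // output operation; shows up as "print at NetWorkStream.scala" in the log
    ssc.start()
    ssc.awaitTermination()
  }
}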
nc -lk
zhang xing sheng zhang
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 128.0 (TID ) in  ms on (/)
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 128.0, whose tasks have all completed, from pool
// :: INFO scheduler.DAGScheduler: ResultStage (print at NetWorkStream.scala:) finished in 0.031 s
// :: INFO scheduler.DAGScheduler: Job finished: print at NetWorkStream.scala:, took 0.080836 s
// :: INFO spark.SparkContext: Starting job: print at NetWorkStream.scala:
// :: INFO scheduler.DAGScheduler: Got job (print at NetWorkStream.scala:) with output partitions
// :: INFO scheduler.DAGScheduler: Final stage: ResultStage (print at NetWorkStream.scala:)
// :: INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage )
// :: INFO scheduler.DAGScheduler: Missing parents: List()
// :: INFO scheduler.DAGScheduler: Submitting ResultStage (ShuffledRDD[] at reduceByKey at NetWorkStream.scala:), which has no missing parents
// :: INFO memory.MemoryStore: Block broadcast_67 stored as values in memory (estimated size 2.8 KB, free 366.2 MB)
// :: INFO memory.MemoryStore: Block broadcast_67_piece0 stored as bytes in memory (estimated size 1711.0 B, free 366.2 MB)
// :: INFO storage.BlockManagerInfo: Added broadcast_67_piece0 in memory on (size: 1711.0 B, free: 366.3 MB)
// :: INFO spark.SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO scheduler.DAGScheduler: Submitting missing tasks from ResultStage (ShuffledRDD[] at reduceByKey at NetWorkStream.scala:)
// :: INFO scheduler.TaskSchedulerImpl: Adding task set 130.0 with tasks
// :: INFO scheduler.TaskSetManager: Starting task 0.0 in stage 130.0 (TID ,, partition , NODE_LOCAL, bytes)
// :: INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task on executor id: hostname:
// :: INFO storage.BlockManagerInfo: Added broadcast_67_piece0 in memory on (size: 1711.0 B, free: 366.3 MB)
// :: INFO scheduler.TaskSetManager: Finished task 0.0 in stage 130.0 (TID ) in ms on (/)
// :: INFO scheduler.TaskSchedulerImpl: Removed TaskSet 130.0, whose tasks have all completed, from pool
// :: INFO scheduler.DAGScheduler: ResultStage (print at NetWorkStream.scala:) finished in 0.014 s
// :: INFO scheduler.DAGScheduler: Job finished: print at NetWorkStream.scala:, took 0.022658 s
Time: ms
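The per-batch computation can be checked without a cluster. The following is a minimal sketch on a plain Scala collection, not the Spark job itself: `WordCountSketch` and `countWords` are hypothetical names, and `groupBy` plus a sum stands in for `reduceByKey(_ + _)`.

```scala
object WordCountSketch {
  // Same steps as the streaming job, applied to one batch held in a plain Seq.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .map(w => (w, 1))        // pair each word with a count of 1
      .groupBy(_._1)           // group the pairs by word (stand-in for the shuffle)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum the counts per word

  def main(args: Array[String]): Unit = {
    // The line typed into nc in the notes above.
    val counts = countWords(Seq("zhang xing sheng zhang"))
    counts.foreach(println)
  }
}
```

For the input line `zhang xing sheng zhang`, the resulting map holds `zhang` with a count of 2 and `xing` and `sheng` with a count of 1 each; the order in which a `Map` prints its entries is not guaranteed.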
val conf = new SparkConf()
new StreamingContext(conf, Seconds(1)) // create the context
  After a context is defined, you have to do the following:
  1. Define the input sources by creating input DStreams.
  2. Define the streaming computations by applying transformations and output operations to DStreams.
  3. Start receiving and processing data with streamingContext.start().
  4. Wait for the processing to stop (manually or because of an error) with streamingContext.awaitTermination().
  5. The processing can also be stopped manually with streamingContext.stop().
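The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the job from these notes: it runs in local mode (`local[2]`), feeds one batch through `queueStream` instead of a socket so it needs no `nc` session, and uses `awaitTerminationOrTimeout` so the program ends by itself. The object name `LifecycleSketch` and the 5 second timeout are made up for the example; Spark Streaming must be on the classpath.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LifecycleSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, so no cluster is required.
    val conf = new SparkConf().setMaster("local[2]").setAppName("lifecycleSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Step 1: define the input source. A queue of RDDs stands in for the socket stream.
    val queue = mutable.Queue[RDD[String]]()
    queue += ssc.sparkContext.parallelize(Seq("zhang xing sheng zhang"))
    val lines = ssc.queueStream(queue)

    // Step 2: define the transformations and one output operation.
    val counts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    counts.print()

    // Step 3: start receiving and processing data.
    ssc.start()
    // Step 4: wait for termination, here bounded by a timeout in milliseconds.
    ssc.awaitTerminationOrTimeout(5000)
    // Step 5: stop the processing manually.
    ssc.stop()
  }
}
```

At least one output operation such as `print()` has to be registered before `start()` is called, which is why step 2 comes before step 3.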
  Points to remember:
  1. Once a context has been started, no new streaming computations can be set up or added to it.
  2. Once a context has been stopped, it cannot be restarted.
  3. Only one StreamingContext can be active in a JVM at the same time.
  4. stop() on a StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional stopSparkContext parameter of stop() to false, for example ssc.stop(false).
  5. A SparkContext can be reused to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next one is created.
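The last two points can be illustrated with a short sketch. It assumes local mode and a `queueStream` input so it runs without a socket; `ReuseSketch`, `newContext`, and the 3 second timeouts are hypothetical names and values chosen for the example, and Spark Streaming must be on the classpath.

```scala
import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReuseSketch {
  // Hypothetical helper: builds a StreamingContext on an existing SparkContext
  // with one queued batch as input and print() as the output operation.
  def newContext(sc: SparkContext): StreamingContext = {
    val ssc = new StreamingContext(sc, Seconds(1))
    val queue = mutable.Queue[RDD[String]](sc.parallelize(Seq("zhang xing sheng zhang")))
    ssc.queueStream(queue).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("reuseSketch"))

    val first = newContext(sc)
    first.start()
    first.awaitTerminationOrTimeout(3000)
    // Stop only the StreamingContext; the SparkContext stays alive.
    first.stop(false)

    // The first context is stopped, so a second one may be created on the same SparkContext.
    val second = newContext(sc)
    second.start()
    second.awaitTerminationOrTimeout(3000)
    // stop() without arguments also stops the SparkContext by default.
    second.stop()
  }
}
```

The first context is not started again after `first.stop(false)`, since a stopped context cannot be restarted; a fresh StreamingContext is built on the same SparkContext instead.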

