Spark Streaming Official Documentation Study Notes (Part 2)
def getWordBlacklist(sparkContext):
    if ('wordBlacklist' not in globals()):
        globals()['wordBlacklist'] = sparkContext.broadcast(["a", "b", "c"])
    return globals()['wordBlacklist']

def getDroppedWordsCounter(sparkContext):
    if ('droppedWordsCounter' not in globals()):
        globals()['droppedWordsCounter'] = sparkContext.accumulator(0)
    return globals()['droppedWordsCounter']

def echo(time, rdd):
    # Get or register the blacklist Broadcast
    blacklist = getWordBlacklist(rdd.context)
    # Get or register the droppedWordsCounter Accumulator
    droppedWordsCounter = getDroppedWordsCounter(rdd.context)

    # Use blacklist to drop words and use droppedWordsCounter to count them
    def filterFunc(wordCount):
        if wordCount[0] in blacklist.value:
            droppedWordsCounter.add(wordCount[1])
            return False  # drop blacklisted words
        else:
            return True  # keep everything else

    counts = "Counts at time %s %s" % (time, rdd.filter(filterFunc).collect())
    print(counts)

wordCounts.foreachRDD(echo)
from pyspark.sql import Row, SparkSession

# Lazily instantiated global instance of SparkSession
def getSparkSessionInstance(sparkConf):
    if ('sparkSessionSingletonInstance' not in globals()):
        globals()['sparkSessionSingletonInstance'] = SparkSession\
            .builder\
            .config(conf=sparkConf)\
            .getOrCreate()
    return globals()['sparkSessionSingletonInstance']

...

# DataFrame operations inside your streaming program

words = ...  # DStream of strings

def process(time, rdd):
    print("========= %s =========" % str(time))
    try:
        # Get the singleton instance of SparkSession
        spark = getSparkSessionInstance(rdd.context.getConf())

        # Convert RDD[String] to RDD[Row] to DataFrame
        rowRdd = rdd.map(lambda w: Row(word=w))
        wordsDataFrame = spark.createDataFrame(rowRdd)

        # Creates a temporary view using the DataFrame
        wordsDataFrame.createOrReplaceTempView("words")

        # Do word count on table using SQL and print it
        wordCountsDataFrame = spark.sql("select word, count(*) as total from words group by word")
        wordCountsDataFrame.show()
    except Exception:
        pass

words.foreachRDD(process)
- Metadata checkpointing - Saving of the information defining the streaming computation to fault-tolerant storage like HDFS. This is used to recover from failure of the node running the driver of the streaming application. Metadata includes:
  - Configuration - The configuration that was used to create the streaming application.
  - DStream operations - The set of DStream operations that define the streaming application.
  - Incomplete batches - Batches whose jobs are queued but have not completed yet.
- Data checkpointing - Saving of the generated RDDs to reliable storage.
- Usage of stateful transformations - If either updateStateByKey or reduceByKeyAndWindow (with inverse function) is used in the application, then the checkpoint directory must be provided to allow for periodic RDD checkpointing.
- Recovering from failures of the driver running the application - Metadata checkpoints are used to recover with progress information.
- When the program is being started for the first time, it will create a new StreamingContext, set up all the streams and then call start().
- When the program is being restarted after failure, it will re-create a StreamingContext from the checkpoint data in the checkpoint directory.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Function to create and setup a new StreamingContext
def functionToCreateContext():
    sc = SparkContext(...)  # new context
    ssc = StreamingContext(...)
    lines = ssc.socketTextStream(...)  # create DStreams
    ...
    ssc.checkpoint(checkpointDirectory)  # set checkpoint directory
    return ssc

# Get StreamingContext from checkpoint data or create a new one
context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext)

# Do additional setup on context that needs to be done,
# irrespective of whether it is being started or restarted
context. ...

# Start the context
context.start()
context.awaitTermination()
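To create a StreamingContext explicitly from existing checkpoint data, without a setup function to fall back on, use StreamingContext.getOrCreate(checkpointDirectory, None).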
- Cluster with a cluster manager
- Package the application JAR - If you are using spark-submit to start the application, then you will not need to provide Spark and Spark Streaming in the JAR. However, if your application uses advanced sources (e.g. Kafka, Flume), then you will have to package the extra artifact they link to, along with their dependencies, in the JAR that is used to deploy the application.
- Configuring sufficient memory for the executors - Note that if you are doing 10 minute window operations, the system has to keep at least the last 10 minutes of data in memory. So the memory requirements for the application depend on the operations used in it.
- Configuring checkpointing
- Configuring automatic restart of the application driver
  - Spark Standalone - The Standalone cluster manager can be instructed to supervise the driver, and relaunch it if the driver fails either due to a non-zero exit code or due to failure of the node running the driver.
  - YARN - YARN supports automatically restarting an application.
  - Mesos - Marathon has been used to achieve this with Mesos.
- Configuring write ahead logs - If enabled, all the data received from a receiver gets written into a write ahead log in the configured checkpoint directory.
- Setting the max receiving rate
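Several of the deployment requirements above come down to spark-submit flags and configuration properties. The following is a minimal sketch, assuming a Spark Standalone cluster in cluster deploy mode. The helper name `build_submit_command`, the master URL, the JAR name, the main class, and the rate value are hypothetical placeholders; `--supervise`, `spark.streaming.receiver.writeAheadLog.enable`, and `spark.streaming.receiver.maxRate` are standard Spark options.

```python
def build_submit_command(master_url, app_jar, main_class, max_rate_per_receiver):
    """Assemble a spark-submit argument list for a supervised streaming driver."""
    conf = {
        # Write all received data to a write ahead log under the checkpoint directory.
        "spark.streaming.receiver.writeAheadLog.enable": "true",
        # Cap each receiver at this many records per second.
        "spark.streaming.receiver.maxRate": str(max_rate_per_receiver),
    }
    command = [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        # Ask the Standalone cluster manager to relaunch the driver if it fails.
        "--supervise",
        "--class", main_class,
    ]
    for key, value in sorted(conf.items()):
        command += ["--conf", "%s=%s" % (key, value)]
    command.append(app_jar)
    return command


# Hypothetical values, for illustration only.
command = build_submit_command(
    "spark://master:7077", "streaming-app.jar", "com.example.StreamingApp", 1000
)
print(" ".join(command))
```

Building the command as a list keeps each flag and its value separate, so it can be passed to `subprocess.run` without shell quoting issues.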
- Run the upgraded application in parallel with the existing one. Once the new one (receiving the same data as the old one) has been warmed up and is ready for prime time, the old one can be brought down. This requires a data source that can send data to two destinations.
- Shut down the existing application gracefully, that is, stop it only after the data it has already received has been completely processed. Then the upgraded application can be started, which will start processing from the same point where the earlier application left off. This requires a data source that can buffer data.
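The graceful-shutdown option can be sketched as follows. This is a minimal sketch, assuming a shutdown is requested by creating a marker file on the machine where the driver runs; the function name `run_until_marker`, the marker path, and the polling interval are hypothetical. `start`, `awaitTerminationOrTimeout`, and `stop(stopSparkContext, stopGraceFully)` are methods of PySpark's `StreamingContext`.

```python
import os


def run_until_marker(ssc, marker_path, poll_seconds=10):
    """Run a streaming context until a marker file appears, then stop gracefully.

    Returns True if this function stopped the context because of the marker,
    False if the context terminated for some other reason.
    """
    ssc.start()
    # awaitTerminationOrTimeout returns True once the context has terminated.
    while not ssc.awaitTerminationOrTimeout(poll_seconds):
        if os.path.exists(marker_path):
            # Finish processing the data already received before shutting down.
            ssc.stop(stopSparkContext=True, stopGraceFully=True)
            return True
    return False
```

Once the DStreams are set up, a call such as `run_until_marker(ssc, "/tmp/stop-streaming-app")` would replace the usual `start()` and `awaitTermination()` pair; the path here is only an example.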
- Reducing the processing time of each batch of data by efficiently using cluster resources.
- Setting the right batch size such that the batches of data can be processed as fast as they are received (that is, data processing keeps up with the data ingestion).
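The second point can be made concrete with a little arithmetic. This is a minimal sketch in plain Python, not Spark code, assuming batches are processed one at a time in order; the function name `scheduling_delays` and the sample numbers are hypothetical. In a running application the corresponding figures appear in the streaming UI as processing time and scheduling delay.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return the scheduling delay (in seconds) seen by each batch.

    A batch can start only after the previous one has finished, so any
    processing time beyond the batch interval is carried over as delay
    for the next batch.
    """
    delays = []
    delay = 0.0
    for processing_time in processing_times:
        delays.append(delay)
        delay = max(0.0, delay + processing_time - batch_interval)
    return delays


# Processing keeps up: every batch finishes within the 2-second interval.
print(scheduling_delays(2.0, [1.5, 1.8, 1.6]))
# Processing falls behind: the delay grows with every batch.
print(scheduling_delays(2.0, [2.5, 2.5, 2.5]))
```

If the delay keeps growing, the batch interval is too small for the workload, or the processing time per batch needs to be reduced.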