spark streaming的容错：防止数据丢失

官方这么说的

[Since Spark 1.2] Configuring write ahead logs - Since Spark 1.2, we have introduced write ahead logs for achieving strong fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into a write ahead log in the configuration checkpoint directory. This prevents data loss on driver recovery, thus ensuring zero data loss (discussed in detail in the Fault-tolerance Semantics section). This can be enabled by setting the configuration parameter spark.streaming.receiver.writeAheadLog.enable to true. However, these stronger semantics may come at the cost of the receiving throughput of individual receivers. This can be corrected by running more receivers in parallel to increase aggregate throughput. Additionally, it is recommended that the replication of the received data within Spark be disabled when the write ahead log is enabled as the log is already stored in a replicated storage system. This can be done by setting the storage level for the input stream to StorageLevel.MEMORY_AND_DISK_SER.

我理解，当worker或者driver挂掉后，可能会将receive的数据丢失，那么官方给的方案就是将接受的数据checkpoint到本地。

通过使用spark.streaming.receiver.writeAheadLog.enable=true来启用。另外，如果启动这个的话，那么streaming的存储策略就没有必要多个复本了，官方推荐使用StorageLevel.MEMORY_AND_DISK_SER即可

spark streaming的容错：防止数据丢失的更多相关文章

Spark Streaming的容错和数据无丢失机制
spark是迭代式的内存计算框架,具有很好的高可用性.sparkStreaming作为其模块之一,常被用于进行实时的流式计算.实时的流式处理系统必须是7*24运行的,同时可以从各种各样的系统错误中恢复 ...
62、Spark Streaming：容错机制以及事务语义
一. 容错机制 1.背景要理解Spark Streaming提供的容错机制,先回忆一下Spark RDD的基础容错语义: 1.RDD,Ressilient Distributed Dataset,是 ...
Spark Streaming 的容错
Spark Streaming 为了实现容错特性,接收到的数据需要在集群的多个Worker 节点上的 executors 之间保存副本(默认2份).当故障发生时,有两种数据需要恢复: 1. 已接收并且 ...
3.spark streaming Job 架构和容错解析
一.Spark streaming Job 架构 SparkStreaming框架会自动启动Job并每隔BatchDuration时间会自动触发Job的调用. Spark Streaming的Job ...
Spark入门实战系列--7.Spark Streaming（上）--实时流计算Spark Streaming原理介绍
[注]该系列文章以及使用到安装包/测试数据可以在<倾情大奉送--Spark入门实战系列>获取 .Spark Streaming简介 1.1 概述 Spark Streaming 是Spa ...
Spark Streaming编程指南
Overview A Quick Example Basic Concepts Linking Initializing StreamingContext Discretized Streams (D ...
Spark Streaming和Kafka整合保证数据零丢失
当我们正确地部署好Spark Streaming,我们就可以使用Spark Streaming提供的零数据丢失机制.为了体验这个关键的特性,你需要满足以下几个先决条件: 1.输入的数据来自可靠的数据源 ...
.Spark Streaming（上）--实时流计算Spark Streaming原理介
Spark入门实战系列--7.Spark Streaming(上)--实时流计算Spark Streaming原理介绍 http://www.cnblogs.com/shishanyuan/p/474 ...
spark streaming的理解和应用
1.Spark Streaming简介官方网站解释:http://spark.apache.org/docs/latest/streaming-programming-guide.html 该博客转 ...

随机推荐

自动更新前加密：Clickonce用法
一.加密dll 新建一个windows form application: static void Main(string[] args) { Process. ...
jq禁用html标签
原文:http://www.jb51.net/article/105154.htm 移除或禁用html元素的点击事件可以通过css实现也可以通过js或jQuery实现. 一.CSS方法 .disabl ...
C#生成和识别二维码
用到外部一个DLL文件(ThoughtWorks.QRCode.dll),看效果生成截图识别截图生成二维码后右键菜单可以保存二维码图片,然后可以到识别模式下进行识别,当然生成后可以用手机扫描识别 ...
INF文件详解
安装信息(Setup Information)文件是Windows系统支持的一种安装信息存放文件,一般以INF作为扩展名,因此也叫INF文件.安装信息INF文件与Windows内建的安装服务引擎(AP ...
Linux 磁盘自动挂载
磁盘代号或者装置的Label 挂载点档案系统格式档案系统参数是否用dump备份是否用fsck检查扇区 0 0 1 1 2 2 下面来写一个代表的 ...
比特币全节点（bitcoind） eth 全节点
运行全节点的用途: 1.挖矿 2.钱包运行全节点,可以做关于btc的任何事情,例如创建钱包地址.管理钱包地址.发送交易.查询全网的交易信息等等选个节点钱包:bitcoind 1.配置文件: ...
ZSTU OJ 3770: 黑帽子归纳总结
Description 一群非常聪明的人开舞会,每人头上都戴着一顶帽子.帽子只有黑白两种,黑的至少有一顶.每个人都能看到其它人帽子的颜色,却看不到自己的.主持人先让大家看看别人头上戴的是什幺帽子,然 ...
python 基础元组()
# 元组应用场景 # 尽管 Python的列表中可以存储不同类型的数据 # 但是在开发中,更多的应用场景是 # 1.列表存储相同类型的数据 # 2.通过迭代遍历,在循环体内部,针对列表中的每一项元素 ...
ASP.NET MVC学习笔记-----Filter(2)
接上篇ASP.NET MVC学习笔记-----Filter(1) Action Filter Action Filter可以基于任何目的使用,它需要实现IActionFilter接口: public ...
chrome 隐藏技能之 base64 图片转换
有时候我们要转换图片为base64,或者将base64转回图片,可能都需要找一些在线工具或者软件类型的工具才行.当然 chrome 也算是软件,但是好在做前端的都有 chrome.好了,来看下简单的例 ...

spark streaming的容错：防止数据丢失

spark streaming的容错：防止数据丢失的更多相关文章

随机推荐

热门专题