spark streaming 5: InputDStream
/**
* This is the abstract base class for all input streams. This class provides methods
* start() and stop() which is called by Spark Streaming system to start and stop receiving data.
* Input streams that can generate RDDs from new data by running a service/thread only on
* the driver node (that is, without running a receiver on worker nodes), can be
* implemented by directly inheriting this InputDStream. For example,
* FileInputDStream, a subclass of InputDStream, monitors a HDFS directory from the driver for
* new files and generates RDDs with the new files. For implementing input streams
* that requires running a receiver on the worker nodes, use
* [[org.apache.spark.streaming.dstream.ReceiverInputDStream]] as the parent class.
*
* @param ssc_ Streaming context that will execute this input stream
*/
abstract class InputDStream[T: ClassTag] (@transient ssc_ : StreamingContext)
extends DStream[T](ssc_) {
private[streaming] var lastValidTime: Time = null
ssc.graph.addInputStream(this)
/**
* Abstract class for defining any [[org.apache.spark.streaming.dstream.InputDStream]]
* that has to start a receiver on worker nodes to receive external data.
* Specific implementations of NetworkInputDStream must
* define `the getReceiver()` function that gets the receiver object of type
* [[org.apache.spark.streaming.receiver.Receiver]] that will be sent
* to the workers to receive data.
* @param ssc_ Streaming context that will execute this input stream
* @tparam T Class type of the object of this stream
*/
abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
extends InputDStream[T](ssc_) {
/** Keeps all received blocks information */
private lazy val receivedBlockInfo = new HashMap[Time, Array[ReceivedBlockInfo]]
/** This is an unique identifier for the network input stream. */
val id = ssc.getNewReceiverStreamId()
/**
* Gets the receiver object that will be sent to the worker nodes
* to receive data. This method needs to defined by any specific implementation
* of a NetworkInputDStream.
*/
def getReceiver(): Receiver[T]
/** Ask ReceiverInputTracker for received data blocks and generates RDDs with them. */
override def compute(validTime: Time): Option[RDD[T]] = {
// If this is called for any time before the start time of the context,
// then this returns an empty RDD. This may happen when recovering from a
// master failure
if (validTime >= graph.startTime) {
val blockInfo = ssc.scheduler.receiverTracker.getReceivedBlockInfo(id)
receivedBlockInfo(validTime) = blockInfo
val blockIds = blockInfo.map(_.blockId.asInstanceOf[BlockId])
Some(new BlockRDD[T](ssc.sc, blockIds))
} else {
Some(new BlockRDD[T](ssc.sc, Array[BlockId]()))
}
}
spark streaming 5: InputDStream的更多相关文章
- Spark Streaming源码分析 – InputDStream
对于NetworkInputDStream而言,其实不是真正的流方式,将数据读出来后不是直接去处理,而是先写到blocks中,后面的RDD再从blocks中读取数据继续处理这就是一个将stream离散 ...
- Spark Streaming消费Kafka Direct方式数据零丢失实现
使用场景 Spark Streaming实时消费kafka数据的时候,程序停止或者Kafka节点挂掉会导致数据丢失,Spark Streaming也没有设置CheckPoint(据说比较鸡肋,虽然可以 ...
- spark streaming kafka1.4.1中的低阶api createDirectStream使用总结
转载:http://blog.csdn.net/ligt0610/article/details/47311771 由于目前每天需要从kafka中消费20亿条左右的消息,集群压力有点大,会导致job不 ...
- spark streaming集成kafka
Kakfa起初是由LinkedIn公司开发的一个分布式的消息系统,后成为Apache的一部分,它使用Scala编写,以可水平扩展和高吞吐率而被广泛使用.目前越来越多的开源分布式处理系统如Clouder ...
- spark streaming从指定offset处消费Kafka数据
spark streaming从指定offset处消费Kafka数据 -- : 770人阅读 评论() 收藏 举报 分类: spark() 原文地址:http://blog.csdn.net/high ...
- Spark streaming + Kafka 流式数据处理,结果存储至MongoDB、Solr、Neo4j(自用)
KafkaStreaming.scala文件 import kafka.serializer.StringDecoder import org.apache.spark.SparkConf impor ...
- spark streaming 接收kafka消息之一 -- 两种接收方式
源码分析的spark版本是1.6. 首先,先看一下 org.apache.spark.streaming.dstream.InputDStream 的 类说明: This is the abstrac ...
- Spark Streaming消费Kafka Direct保存offset到Redis,实现数据零丢失和exactly once
一.概述 上次写这篇文章文章的时候,Spark还是1.x,kafka还是0.8x版本,转眼间spark到了2.x,kafka也到了2.x,存储offset的方式也发生了改变,笔者根据上篇文章和网上文章 ...
- Error- Overloaded method value createDirectStream in error Spark Streaming打包报错
直接上代码 StreamingExamples.setStreamingLogLevels() val Array(brokers, topics) = args // Create context ...
随机推荐
- Java是面向对象的编程语言。它不仅吸收了C++语言的优点
Java是面向对象的编程语言.它不仅吸收了C++语言的优点,而且摒弃了C++中难于理解的多继承和指针的概念.因此,Java语言具有功能强大.使用方便的特点.Java语言作为静态面向对象的编程语言的代表 ...
- 解决'androidx.arch.core:core-runtime' has different version for the compile (2.0.0) and runtime (2.0.1)
先说原因,我们引用的包版本不同产生了冲突,所以编译不通过.解决的办法是在引用的时候排除一个版本,只留一个版本. 解决过程: 先找出哪些库引用了相同的库,仅仅是版本不同. gradle app:depe ...
- JavaJDBC【二、Mysql包加载与使用】
连接数据库前提条件是: 1.加载mysql驱动 2.获取连接 加载驱动前需要将mysql的jar包引入项目 Demo: package JDBC; import java.sql.DriverMana ...
- Java注解【二、Java中的常见注解】
JDK自带注解 @Override 重写 @Deprecated 已过期 @Suppvisewarnings 压制警告 Demo: public interface Person { public S ...
- Oracle【序列、索引、视图、分页】
1.Oracle序列语法:create sequence 序列名 特点1:默认是无值,指针指向没有值的位置 特点2:序列名.nextval 每次执行值会自增一次,步长为 1 特点3:序列名.currv ...
- Kaggle_泰坦尼克乘客存活预测
转载 逻辑回归应用之Kaggle泰坦尼克之灾 此转载只为保存!!! ————————————————版权声明:本文为CSDN博主「寒小阳」的原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附 ...
- usb发送字节
- js创建对象的几种方式(工厂模式、构造函数模式、原型模式)
普通方法创建对象 var obj = { name:"猪八戒", sayname:function () { alert(this.name); } } var obj1 = { ...
- itop4412编译内核时出现“recipe for target 'arch/arm/mach-exynos/cpu-exynos4.o' failed”的解决方法
依次执行如下命令 #su root 输入root用户密码 #cd #vim .bashrc 到达最底行,确保环境变量如下图所示 保存退出后,执行如下指令 #source .bashrc 重启Termi ...
- PHP中使用PDO的预处理功能避免SQL注入
不使用预处理功能 <?php $id = $_GET['id']; $dsn = 'mysql:host=localhost;port=3306;dbname=database'; try { ...