问题:spark中如果有两个DataFrame(或者DataSet),DataFrameA依赖DataFrameB,并且两个DataFrame都进行了cache,将DataFrameB unpersist之后,DataFrameA的cache也会失效,官方解释如下: When invalidating a cache, we invalid other caches dependent on this cache to ensure cached data is up to date. For
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data.Dstream本质就是离散化的stream,将stream离散化成一组RDD的list,所以基本的操作仍然是以RDD为基础下面看到DStream的基本定义,对于普通的
参考,http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html Overview SparkStreaming支持多种流输入,like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets,并且可以在上面进行transform操作,最终数据存入HDFS,数据库或dashboard另外可以把Spark's in-built machine le