RDD, Resilient Distributed Dataset,弹性分布式数据集, 是Spark的核心概念. 对于RDD的原理性的知识,可以参阅Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing 和 An Architecture for Fast and General Data Processing on Large Clusters 这两篇论文. 这篇…