该部分分为两篇,分别介绍RDD与Dataset/DataFrame: 一.RDD 二.DataSet/DataFrame 先来看下官网对RDD.DataSet.DataFrame的解释: 1.RDD Resilient distributed dataset(RDD),which is a fault-tolerant collection of elements that can be operated on in parallel RDD——弹性分布式数据集,分布在集群的各个结点上具有容错性…