org.apache.spark.rdd.RDD
abstract class RDD[T] extends Serializable with Logging
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. Thi…
RDD stands for Resilient Distributed Dataset. When constructing the dataset, the input is a List (linked list) or an Array.
/* Create RDDs with makeRDD */
/* List */
val rdd01 = sc.makeRDD(List(,,,,,))
val r01 = rdd01.map { x => x * x }
println(r01.collect().mkString(","))
/* Array */
val rdd02 = sc.makeRDD(Array(,,,,,))
val…
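The makeRDD calls above lost their element values, so here is a minimal self-contained sketch of the same idea. The values 1 to 6, the object name `MakeRddSketch`, the helper names `squaresOf` and `evensOf`, and the local-mode settings are all assumptions for illustration and are not taken from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MakeRddSketch {
  // Hypothetical helper: builds an RDD from a Seq with makeRDD and squares each element.
  def squaresOf(sc: SparkContext, xs: Seq[Int]): Array[Int] =
    sc.makeRDD(xs).map(x => x * x).collect()

  // Hypothetical helper: builds an RDD from an Array and keeps the even elements.
  def evensOf(sc: SparkContext, xs: Array[Int]): Array[Int] =
    sc.makeRDD(xs).filter(_ % 2 == 0).collect()

  def main(args: Array[String]): Unit = {
    // Local mode; the app name and master URL are assumptions for this sketch.
    val conf = new SparkConf().setAppName("MakeRddSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Illustrative input values, not the original post's data.
      println(squaresOf(sc, List(1, 2, 3, 4, 5, 6)).mkString(","))  // prints 1,4,9,16,25,36
      println(evensOf(sc, Array(1, 2, 3, 4, 5, 6)).mkString(","))   // prints 2,4,6
    } finally {
      sc.stop() // release the local Spark resources
    }
  }
}
```

Both List and Array work here because makeRDD accepts any Seq, and `collect()` returns the elements to the driver in partition order, so the printed order matches the input order.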