原文地址:https://blog.csdn.net/helloxiaozhe/article/details/80492933 1.创建一个RDD变量,通过help函数,查看相关函数定义和例子: >>> a = sc.parallelize([(1,2),(3,4),(5,6)]) >>> a ParallelCollectionRDD[21] at parallelize at PythonRDD.scala:475 >>> help(a.map)
map将函数作用到数据集的每一个元素上,生成一个新的分布式的数据集(RDD)返回 map函数的源码: def map(self, f, preservesPartitioning=False): """ Return a new RDD by applying a function to each element of this RDD. >>> rdd = sc.parallelize(["b", "a", &q