在Spark中有许多聚类操作是基于combineByKey的,例如group那个家族的操作等.所以combineByKey这个函数也是比较重要,所以下午花了点时间看来下这个函数.也参考了http://www.tuicool.com/articles/miueaqv这篇博客. 先看下combineByKey定义: /** * Generic function to combine the elements for each key using a custom set of aggregat…
一.函数的源码 /** * Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the * existing partitioner/parallelism level. This method is here for backward compatibility. It * does not provide combiner classtag informatio…