java实现spark常用算子之join
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2; import java.util.Arrays;
import java.util.List; /**
* join(otherDataSet,[numTasks]) 算子:
* 同样的也是按照key将两个RDD中进行汇总操作,会对每个key所对应的两个RDD中的数据进行笛卡尔积计算。
*
*按照key进行分类汇总,并且做笛卡尔积
*/
public class JoinOperator { public static void main(String[] args) {
SparkConf conf = new SparkConf().setMaster("local").setAppName("join");
JavaSparkContext sc = new JavaSparkContext(conf);
List<Tuple2<String,String>> stus = Arrays.asList(
new Tuple2<>("w1","1"),
new Tuple2<>("w2","2"),
new Tuple2<>("w3","3"),
new Tuple2<>("w2","22"),
new Tuple2<>("w1","11")
);
List<Tuple2<String,String>> scores = Arrays.asList(
new Tuple2<>("w1","a1"),
new Tuple2<>("w2","a2"),
new Tuple2<>("w2","a22"),
new Tuple2<>("w1","a11"),
new Tuple2<>("w3","a3")
); JavaPairRDD<String,String> stusRdd = sc.parallelizePairs(stus);
JavaPairRDD<String,String> scoresRdd = sc.parallelizePairs(scores);
JavaPairRDD<String,Tuple2<String,String>> result = stusRdd.join(scoresRdd); result.foreach(new VoidFunction<Tuple2<String, Tuple2<String, String>>>() {
@Override
public void call(Tuple2<String, Tuple2<String, String>> tuple) throws Exception {
System.err.println(tuple._1+":"+tuple._2);
}
}); }
}
微信扫描下图二维码加入博主知识星球,获取更多大数据、人工智能、算法等免费学习资料哦!
java实现spark常用算子之join的更多相关文章
- java实现spark常用算子之Union
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之TakeSample
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之SaveAsTextFile
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之Repartitions
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之mapPartitionsWithIndex
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之map
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之intersection
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之frist
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- java实现spark常用算子之flatmap
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
随机推荐
- GUI输入数据并保存
from tkinter import * def write_to_file(): fileContent = open("deliveries.txt","a&quo ...
- excel中如何设置只打印第一页
在打印表格时,怎样设置只打印第一页呢,操作很简单,下面,小编说下操作方法. 方法/步骤 打开要打印的工作表, 再点击“文件” 弹出的页面中,在左侧这里,点击“打印” 在右边弹出与打 ...
- linux系统创建windows启动盘
平时工作中用到linux的操作命令较多,因此为了方便,就给电脑装了双系统,一般工作的时候,都选择进入linux系统.但是今天有件工作之外的事情需要解决下:创建一个windows启动盘.如果按照往常来说 ...
- yarn application命令介绍
yarn application 1.-list 列出所有 application 信息 示例:yarn application -list 2.-appStates <Stat ...
- C++面试出现频率最高的30道题目
http://blog.csdn.net/wangshihui512/article/details/9092439 1.new.delete.malloc.free关系 delete会调用对象的析构 ...
- html5内容快速学习
accessKey 快捷键 <input type="text" accessType="m"/> <!-- chrome按下快捷键alt+m ...
- 阶段3 3.SpringMVC·_03.SpringMVC常用注解_2 RequestBody注解
拿整个请求体的数据
- Rxjava2实战--第四章 Rxjava的线程操作
Rxjava2实战--第四章 Rxjava的线程操作 1 调度器(Scheduler)种类 1.1 RxJava线程介绍 默认情况下, 1.2 Scheduler Sheduler 作用 single ...
- Maven打包时出现无法下载org.apache.maven.plugins插件
解决方式: 方式1:使用 mvn clean package -U 打包即可(注意:出于性能原因,Maven缓存插件无法下载的信息.根据您的设置,您可能需要通过将标志添加-U到命令行来清除此缓存,以使 ...
- CentOS的SVN服务器搭建与自动部署全过程
CentOS的SVN服务器搭建与自动部署全过程 http://www.jb51.net/article/106218.htm authz-db = authz 引起的 svn 认证失败 http:// ...