Spark2 Dataset之collect_set与collect_list
collect_set去除重复元素;collect_list不去除重复元素
select gender,
concat_ws(',', collect_set(children)),
concat_ws(',', collect_list(children))
from Affairs
group by gender
// 创建视图
data.createOrReplaceTempView("Affairs") val df3= spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field] df3.show // collect_set去除重复元素;collect_list不去除重复元素
+------+-----------------------------------+------------------------------------+
|gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
+------+-----------------------------------+------------------------------------+
|female| no,yes| no,yes,no,no,yes|
| male| no,yes| no,yes,no,yes,no|
+------+-----------------------------------+------------------------------------+
Spark2 Dataset之collect_set与collect_list的更多相关文章
- Spark2 DataSet 创建新行之flatMap
val dfList = List(("Hadoop", "Java,SQL,Hive,HBase,MySQL"), ("Spark", & ...
- Spark2 Dataset行列操作和执行计划
Dataset是一个强类型的特定领域的对象,这种对象可以函数式或者关系操作并行地转换.每个Dataset也有一个被称为一个DataFrame的类型化视图,这种DataFrame是Row类型的Datas ...
- Spark2 Dataset DataFrame空值null,NaN判断和处理
import org.apache.spark.sql.SparkSession import org.apache.spark.sql.Dataset import org.apache.spark ...
- Spark2 Dataset分析函数--排名函数row_number,rank,dense_rank,percent_rank
select gender, age, row_number() over(partition by gender order by age) as rowNumber, ...
- Spark2 Dataset多维度统计cube与rollup
val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by ...
- Spark2 Dataset统计指标:mean均值,variance方差,stddev标准差,corr(Pearson相关系数),skewness偏度,kurtosis峰度
val df4=spark.sql("SELECT mean(age),variance(age),stddev(age),corr(age,yearsmarried),skewness(a ...
- Spark2 Dataset之视图与SQL
// 创建视图 data.createOrReplaceTempView("Affairs") val df1 = spark.sql("SELECT * FROM Af ...
- Spark2 Dataset聚合操作
data.groupBy("gender").agg(count($"age"),max($"age").as("maxAge&q ...
- Spark2 Dataset去重、差集、交集
import org.apache.spark.sql.functions._ // 对整个DataFrame的数据去重 data.distinct() data.dropDuplicates() / ...
随机推荐
- HQS——Half Quadratic Splitting半二次方分裂
变量分裂法 变量分裂法(Variable Splitting),解决目标函数是两个函数之和的优化问题. 1)其中g是n维向量到d维向量的一个映射. 变量分裂将上式变为: 问题(2)可能比(1)更容易或 ...
- [转]gluProject 和 gluUnproject 的详解
gluProject 和 gluUnproject 的详解 简介: 三维空间中,经常需要将 3D 空间中的点转换到 2D(屏幕坐标),或者将 2D 点转换到 3D 空间中.当你使用 OpenGL 的时 ...
- mySql的desc与explain分析性能(主要分析索引)
desc select * from A where id =‘110’; 查询结果的含义请参考:http://www.2cto.com/database/201209/156466.html
- 【MySQL】[Err] [Imp] 2006 - MySQL server has gone away .
wait_timeout= interactive_timeout = max_allowed_packet=10M my.ini 后面增加 就可以解决
- 【jmeter】 jmeter 测试HTTP接口
到apache官网下载jmeter:http://jmeter.apache.org/download_jmeter.cgi 1.运行 bin/jmeter.bat ,添加线程组 2.添加HTTP请求 ...
- [web] spring boot 整合MyBatis
1.maven依赖 <?xml version="1.0" encoding="UTF-8"?> <project xmlns="h ...
- CSS3制作美丽的3D表单
<!DOCTYPE HTML> <html lang="en-US"> <head> <meta charset="UTF-8& ...
- std::string与std::wstring互相转换
作者:zzandyc来源:CSDN原文:https ://blog.csdn.net/zzandyc/article/details/77540056 版权声明:本文为博主原创文章,转载请附上博文链接 ...
- Unity Shader 获取模型空间坐标
CGPROGRAM // Physically based Standard lighting model, and enable shadows on all light types #pragma ...
- wireshark----linux
1.[root@lc~]# tshark Running as user "root" and group "root". This could be da ...