collect_set去除重复元素;collect_list不去除重复元素
select gender,
       concat_ws(',', collect_set(children)),
       concat_ws(',', collect_list(children))
  from Affairs
 group by gender

// 创建视图
data.createOrReplaceTempView("Affairs") val df3= spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field] df3.show // collect_set去除重复元素;collect_list不去除重复元素
+------+-----------------------------------+------------------------------------+
|gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
+------+-----------------------------------+------------------------------------+
|female| no,yes| no,yes,no,no,yes|
| male| no,yes| no,yes,no,yes,no|
+------+-----------------------------------+------------------------------------+

Spark2 Dataset之collect_set与collect_list的更多相关文章

  1. Spark2 DataSet 创建新行之flatMap

    val dfList = List(("Hadoop", "Java,SQL,Hive,HBase,MySQL"), ("Spark", & ...

  2. Spark2 Dataset行列操作和执行计划

    Dataset是一个强类型的特定领域的对象,这种对象可以函数式或者关系操作并行地转换.每个Dataset也有一个被称为一个DataFrame的类型化视图,这种DataFrame是Row类型的Datas ...

  3. Spark2 Dataset DataFrame空值null,NaN判断和处理

    import org.apache.spark.sql.SparkSession import org.apache.spark.sql.Dataset import org.apache.spark ...

  4. Spark2 Dataset分析函数--排名函数row_number,rank,dense_rank,percent_rank

    select gender,       age,       row_number() over(partition by gender order by age) as rowNumber,    ...

  5. Spark2 Dataset多维度统计cube与rollup

    val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by ...

  6. Spark2 Dataset统计指标:mean均值,variance方差,stddev标准差,corr(Pearson相关系数),skewness偏度,kurtosis峰度

    val df4=spark.sql("SELECT mean(age),variance(age),stddev(age),corr(age,yearsmarried),skewness(a ...

  7. Spark2 Dataset之视图与SQL

    // 创建视图 data.createOrReplaceTempView("Affairs") val df1 = spark.sql("SELECT * FROM Af ...

  8. Spark2 Dataset聚合操作

    data.groupBy("gender").agg(count($"age"),max($"age").as("maxAge&q ...

  9. Spark2 Dataset去重、差集、交集

    import org.apache.spark.sql.functions._ // 对整个DataFrame的数据去重 data.distinct() data.dropDuplicates() / ...

随机推荐

  1. HQS——Half Quadratic Splitting半二次方分裂

    变量分裂法 变量分裂法(Variable Splitting),解决目标函数是两个函数之和的优化问题. 1)其中g是n维向量到d维向量的一个映射. 变量分裂将上式变为: 问题(2)可能比(1)更容易或 ...

  2. [转]gluProject 和 gluUnproject 的详解

    gluProject 和 gluUnproject 的详解 简介: 三维空间中,经常需要将 3D 空间中的点转换到 2D(屏幕坐标),或者将 2D 点转换到 3D 空间中.当你使用 OpenGL 的时 ...

  3. mySql的desc与explain分析性能(主要分析索引)

    desc select * from A where id =‘110’; 查询结果的含义请参考:http://www.2cto.com/database/201209/156466.html

  4. 【MySQL】[Err] [Imp] 2006 - MySQL server has gone away .

    wait_timeout= interactive_timeout = max_allowed_packet=10M my.ini 后面增加 就可以解决

  5. 【jmeter】 jmeter 测试HTTP接口

    到apache官网下载jmeter:http://jmeter.apache.org/download_jmeter.cgi 1.运行 bin/jmeter.bat ,添加线程组 2.添加HTTP请求 ...

  6. [web] spring boot 整合MyBatis

    1.maven依赖 <?xml version="1.0" encoding="UTF-8"?> <project xmlns="h ...

  7. CSS3制作美丽的3D表单

    <!DOCTYPE HTML> <html lang="en-US"> <head> <meta charset="UTF-8& ...

  8. std::string与std::wstring互相转换

    作者:zzandyc来源:CSDN原文:https ://blog.csdn.net/zzandyc/article/details/77540056 版权声明:本文为博主原创文章,转载请附上博文链接 ...

  9. Unity Shader 获取模型空间坐标

    CGPROGRAM // Physically based Standard lighting model, and enable shadows on all light types #pragma ...

  10. wireshark----linux

    1.[root@lc~]# tshark   Running as user "root" and group "root". This could be da ...