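Suppose we have the following set of search-result data.

Fields: tenant, platform, logged-in user, search keyword, and the list of products returned by the search.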

{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

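We now need to work out which keywords led to each product. The final result looks like this:

The key point is mapping each SKU to the keywords that hit it:

Step 1:

Turn the goodsList column of the given data into multiple rows, one row per product, as shown below. The main tool here is lateral view explode.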

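    -- The gl['...'] lookups imply that each goodsList element is a map;
    -- map keys are case-sensitive and must match the keys as stored in the table.
    select tenantcode,
           nvl(platform, '0') as platform,   -- replace a NULL platform with '0'
           keywords,
           'day' as dim_code,
           '' as dim_value,
           gl['skucode'] as skucode,
           gl['skuname'] as skuname,
           gl['spucode'] as spucode,
           gl['spuname'] as spuname
    from dw_mdl.m_search_result2
    lateral view explode(goodsList) gl as gl  -- one output row per goodsList element
    where dt = '';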
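To make the effect of lateral view explode concrete, here is a minimal plain-Python sketch of the same flattening, run against two of the sample records above (trimmed to the fields used here). The helper name `explode_goods` is hypothetical and exists only for illustration; the sketch mimics the query's row shape and is not how Hive executes it.

```python
import json

# Two of the sample records from the post, trimmed to the fields used below.
raw_lines = [
    '{"tenantcode":"0000001","platform":"IOS","keywords":"手机",'
    '"goodsList":[{"skuCode":"sku00001","skuName":"skuname1"},'
    '{"skuCode":"sku00002","skuName":"skuname2"}]}',
    '{"tenantcode":"0000001","platform":"IOS","keywords":"外国手机","goodsList":[]}',
]


def explode_goods(record):
    """Yield one flat row per goodsList element (hypothetical helper)."""
    platform = record.get("platform")
    for goods in record["goodsList"]:
        yield {
            "tenantcode": record["tenantcode"],
            # Same idea as nvl(platform, '0'): replace a missing platform.
            "platform": platform if platform is not None else "0",
            "keywords": record["keywords"],
            "skucode": goods["skuCode"],
            "skuname": goods["skuName"],
        }


rows = [row for line in raw_lines for row in explode_goods(json.loads(line))]
for row in rows:
    print(row)
```

The record whose goodsList is empty produces no rows at all, which matches a plain (non-OUTER) lateral view: searches that returned nothing simply drop out of the result.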

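The output looks like this:

Step 2:

Aggregate the keyword column per product, taking dimensions such as platform and time into account.

grouping sets: aggregates the data over several groupings in a single query

collect_set: merges multiple rows into one array and removes duplicates

collect_list: merges multiple rows into one array without removing duplicates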

    with tmp_a as (
        select tenantcode,
               nvl(platform, '0') as platform,
               keywords,
               'day' as dim_code,
               '' as dim_value,
               gl['skucode'] as skucode,
               gl['skuname'] as skuname,
               gl['spucode'] as spucode,
               gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = ''
    )
    select tenantcode,
           nvl(platform, 'all') as platform,  -- NULL here marks the all-platform grouping
           skucode,
           dim_code,
           dim_value,
           count(skuname) as search_times,
           collect_set(keywords) as keywords  -- de-duplicated keywords per group
    from tmp_a
    group by tenantcode, platform, skucode, dim_code, dim_value
    grouping sets (
        (tenantcode, platform, skucode, dim_code, dim_value),  -- per platform
        (tenantcode, skucode, dim_code, dim_value)             -- across all platforms
    );
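The two grouping sets in this query behave like the union of two ordinary group-by queries, one per platform and one across all platforms. The following plain-Python sketch illustrates that idea on already-exploded rows. The first four rows follow from the sample data; the Android row and the helper name `aggregate` are hypothetical and added only to show de-duplication. The keywords are sorted purely to make the printed output deterministic; Hive's collect_set makes no ordering guarantee.

```python
from collections import defaultdict

# Exploded rows: (tenantcode, platform, skucode, keywords).
# The last row (Android) is hypothetical and not part of the sample data.
rows = [
    ("0000001", "IOS", "sku00001", "手机"),
    ("0000001", "IOS", "sku00002", "手机"),
    ("0000001", "IOS", "sku00001", "手机壳"),
    ("0000001", "IOS", "sku00003", "手机壳"),
    ("0000001", "Android", "sku00001", "手机"),
]


def aggregate(rows, keep_platform):
    """Evaluate one grouping set: group by (tenant, platform or 'all', sku)."""
    groups = defaultdict(list)
    for tenant, platform, sku, keyword in rows:
        key = (tenant, platform if keep_platform else "all", sku)
        groups[key].append(keyword)
    # len(...) plays the role of count(skuname) (assuming skuname is never NULL);
    # sorted(set(...)) plays the role of collect_set(keywords).
    return {key: (len(kws), sorted(set(kws))) for key, kws in groups.items()}


# grouping sets = per-platform results plus all-platform results.
result = {**aggregate(rows, True), **aggregate(rows, False)}
for key in sorted(result):
    print(key, result[key])
```

For sku00001 the all-platform row counts three searches but keeps only two distinct keywords, which is the difference between counting rows and collecting a set.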

Step 3:

Convert the array to a string: concat_ws('separator', array)

    with tmp_a as (
        select tenantcode,
               nvl(platform, '0') as platform,
               keywords,
               'day' as dim_code,
               '' as dim_value,
               gl['skucode'] as skucode,
               gl['skuname'] as skuname,
               gl['spucode'] as spucode,
               gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) gl as gl
        where dt = ''
    ),
    tmp_b as (
        select tenantcode,
               nvl(platform, 'all') as platform,
               skucode,
               dim_code,
               dim_value,
               count(skuname) as search_times,
               -- join the de-duplicated keywords into one comma-separated string
               concat_ws(',', collect_set(keywords)) as keywords
        from tmp_a
        group by tenantcode, platform, skucode, dim_code, dim_value
        grouping sets (
            (tenantcode, platform, skucode, dim_code, dim_value),
            (tenantcode, skucode, dim_code, dim_value)
        )
    )
    select * from tmp_b;
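As a rough illustration of the last step, the sketch below joins per-SKU keyword arrays into strings the way concat_ws(',', ...) is used in the query. The keyword lists are what the three sample records at the top would yield; the variable names are hypothetical, and the element order is fixed by hand here, whereas collect_set does not guarantee one.

```python
# Keywords per SKU, derived by hand from the three sample records.
keyword_sets = {
    "sku00001": ["手机", "手机壳"],
    "sku00002": ["手机"],
    "sku00003": ["手机壳"],
}

# str.join plays the role of concat_ws(',', array_of_strings).
keyword_strings = {sku: ",".join(kws) for sku, kws in keyword_sets.items()}

for sku, joined in keyword_strings.items():
    print(sku, joined)
```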

Pretty simple, isn't it?
