hive之案例分析(grouping sets,lateral view explode, concat_ws)
有这样一组搜索结果数据:
租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

这里最关键的是sku对应到命中的关键词:
操作步骤1:
将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = '';
显示如下:

操作步骤2:
根据商品,汇总关键词列,这里考虑到平台,时间维度等。
grouping sets 分组汇总数据
collect_set 多行合并并且去重
collect_list 多行合并不去重
with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
) select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
collect_set(keywords) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

操作步骤3:
数组转字符串: concat_ws('分隔符',数组)
with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
),
tmp_b as (
select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
concat_ws(',',collect_set(keywords)) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
)
select * from tmp_b;

是不是太简单了。
hive之案例分析(grouping sets,lateral view explode, concat_ws)的更多相关文章
- Hive lateral view explode
select 'hello', x from dual lateral view explode(array(1,2,3,4,5)) vt as x 结果是: hello 1 hello 2 ...
- hive lateral view 与 explode详解
ref:https://blog.csdn.net/bitcarmanlee/article/details/51926530 1.explode hive wiki对于expolde的解释如下: e ...
- hive splict, explode, lateral view, concat_ws
hive> create table arrays (x array<string>) > row format delimited fields terminated by ...
- hive 使用笔记(table format;lateral view)
1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...
- hive 使用笔记(table format;lateral view横表转纵表)
1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...
- hive中的lateral view 与 explode函数的使用
hive中的lateral view 与 explode函数的使用 背景介绍: explode与lateral view在关系型数据库中本身是不该出现的. 因为他的出现本身就是在操作不满足第一范式的数 ...
- 【Hive学习之六】Hive Lateral View &视图&索引
环境 虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk8 hadoop-3.1.1 apache-hive-3.1.1 ...
- hive grouping sets 实现原理
先下结论: 看了hive 1.1.0 grouping sets 实现(从源码及执行计划都可以看出与kylin实现不一样),(前提是可累加,如sum函数)他并没有像kylin一样先按照group by ...
- 【hive】lateral view的使用
当使用UDTF函数的时候,hive只允许对拆分字段进行访问的 例如: select id,explode(arry1) from table; —错误 会报错FAILED: SemanticExcep ...
随机推荐
- POJ 2115:C Looooops
C Looooops Time Limit: 1000MS Memory Limit: 65536K Total Submissions: 19536 Accepted: 5204 Descr ...
- jQuery的dialog弹窗实现
jQuery实现dialog弹窗: html代码如下: <input type="button" onclick="performances();" va ...
- django class-based view 考古
django 中的view中进化史: 1.在“天地初开”的时候django中的view是通过函数来定义的.函数接收一个request并以一个response作为返回: 对于这个request是通过po ...
- Traefik Kubernetes 初试
traefik 是一个前端负载均衡器,对于微服务架构尤其是 kubernetes 等编排工具具有良好的支持:同 nginx 等相比,traefik 能够自动感知后端容器变化,从而实现自动服务发现:今天 ...
- Kinect v2 记录
最多可同时识别跟踪 6 人,每人可识别到 25 个关节数据.可以根据上身 10 个关节数据来判断坐姿状态. 物理极限识别范围:0.5m – 4.5m,最佳识别范围:0.8m – 3.5m. 深度数据可 ...
- [hihoCoder] 第五十周: 欧拉路·二
题目1 : 欧拉路·二 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 在上一回中小Hi和小Ho控制着主角收集了分散在各个木桥上的道具,这些道具其实是一块一块骨牌. 主角 ...
- mysql 数据类型TIMESTAMP用法
在mysql数据库中,timestamp数据类型是一个比较特殊的数据类型,可以自动在不使用程序更新情况下只要更新了记录timestamp会自动更新时间. 通常表中会有一个Create date 创建日 ...
- 使用putty部署远程J2EE环境
以前没弄过,开个帖子记录一下. 基本上要做的就是安装JDK.安装tomcat.安装sql. 1.安装JDK JDK在本机上,需要传输到远程linux服务器上.为了存放我们上传的文件.打开putty,进 ...
- KVM虚拟机安装报错 KVM is not available
在linux系统上使用kvm安装系统时,如果你的cpu不支持虚拟化技术那么可能会报以下错误: Warning:KVM is not available. This may mean the KVM p ...
- javascript基础拾遗(七)
1.对象的继承__proto__ var Language = { name: 'program', score: 8.0, popular: function () { return this.sc ...