A Custom Hive UDAF for Removing Adjacent Duplicates

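Hive has two built-in aggregate functions (UDAFs) for gathering values across rows:

collect_list(): gathers the values of multiple rows into a single array, keeping duplicates
collect_set(): gathers the values of multiple rows into a single array and removes all duplicates
The custom UDAF Concat() concatenates the values of multiple rows into one string and removes only adjacent duplicates.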
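The difference between the three behaviors is easiest to see on a small input. The sketch below is plain Java with no Hive dependency; the class name AdjacentDedupDemo and the helper dedupAdjacent are hypothetical names chosen for illustration, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class; it is not part of concat_udaf.jar.
class AdjacentDedupDemo {

    // Keeps a value only when it differs from the value immediately before it.
    static List<String> dedupAdjacent(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(v)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "a", "b", "b", "a");

        // collect_list-like: every value is kept.
        System.out.println(String.join(",", rows));                       // a,a,b,b,a

        // collect_set-like: all duplicates are removed. LinkedHashSet keeps
        // first-seen order here only to make the demo output readable.
        System.out.println(String.join(",", new LinkedHashSet<>(rows)));  // a,b

        // Adjacent deduplication: the trailing "a" survives because it does
        // not directly follow another "a".
        System.out.println(String.join(",", dedupAdjacent(rows)));        // a,b,a
    }
}
```

Adjacent deduplication is the right fit when repeats carry meaning once something else has happened in between, for example a sequence of states in which returning to an earlier state should still be visible.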

Source code, packaged as concat_udaf.jar:

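package com.tcc.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

// Concatenates the values of a group into one delimited string and
// collapses runs of identical adjacent values.
public class Concat extends UDAF
{
    public static class ConcatUDAFEvaluator
        implements UDAFEvaluator
    {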
        // Running state for one group; stays null until the first non-null value arrives.
        private PartialResult partial;

        public void init()
        {
            this.partial = null;
        }

        // Called once per input row. Null values are skipped; a null or empty
        // delimiter falls back to ",".
        public boolean iterate(String value, String deli)
        {
            if (value == null) {
                return true;
            }
            if (this.partial == null) {
                this.partial = new PartialResult();
                this.partial.result = "";
                if ((deli == null) || deli.isEmpty())
                {
                    this.partial.delimiter = ",";
                }
                else
                {
                    this.partial.delimiter = deli;
                }
            }

            // Separate this value from whatever has been accumulated so far.
            if (this.partial.result.length() > 0)
            {
                this.partial.result = this.partial.result.concat(this.partial.delimiter);
            }

            this.partial.result = this.partial.result.concat(value);

            return true;
        }

        // Hands the intermediate state to Hive so it can be merged elsewhere.
        public PartialResult terminatePartial() {
            return this.partial;
        }

        // Appends another evaluator's intermediate string to this one.
        // Duplicates are not removed here; that happens once, in terminate().
        public boolean merge(PartialResult other) {
            if (other == null) {
                return true;
            }
            if (this.partial == null) {
                this.partial = new PartialResult();
                this.partial.result = other.result;
                this.partial.delimiter = other.delimiter;
            }
            else
            {
                if (this.partial.result.length() > 0)
                {
                    this.partial.result = this.partial.result.concat(this.partial.delimiter);
                }
                this.partial.result = this.partial.result.concat(other.result);
            }
            return true;
        }

        // Produces the final value: splits the accumulated string on the
        // delimiter and keeps a token only when it differs from the previous one.
        public String terminate() {
            // No non-null value reached this group: return NULL instead of
            // throwing a NullPointerException.
            if (this.partial == null) {
                return null;
            }
            // Quote the delimiter so that characters such as "|" or "." are not
            // interpreted as regular-expression syntax by split().
            String[] str = this.partial.result.split(
                java.util.regex.Pattern.quote(this.partial.delimiter));

            StringBuilder sb = new StringBuilder();
            String prev = null;
            for (String token : str) {
                if (token.equals(prev)) {
                    continue;  // adjacent duplicate, skip it
                }
                if (prev != null) {
                    sb.append(this.partial.delimiter);
                }
                sb.append(token);
                prev = token;
            }
            return sb.toString();
        }

        // Intermediate state passed between evaluators.
        public static class PartialResult
        {
            String result;
            String delimiter;
        }
    }
}
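To see how the evaluator methods fit together, the following sketch walks through the init, iterate, terminatePartial, merge, and terminate sequence locally. It is an illustration under assumptions, not the code from the jar: ConcatLifecycleDemo, Evaluator, Partial, and run are hypothetical stand-ins with the Hive interfaces removed, and the split of rows into two map-side evaluators is invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for ConcatUDAFEvaluator without the Hive interfaces,
// so the call sequence can be run with a plain JDK.
class ConcatLifecycleDemo {

    static class Partial {
        String result = "";
        String delimiter = ",";
    }

    static class Evaluator {
        Partial partial;

        void init() { partial = null; }

        void iterate(String value, String deli) {
            if (value == null) return;
            if (partial == null) {
                partial = new Partial();
                if (deli != null && !deli.isEmpty()) partial.delimiter = deli;
            }
            if (!partial.result.isEmpty()) partial.result += partial.delimiter;
            partial.result += value;
        }

        Partial terminatePartial() { return partial; }

        void merge(Partial other) {
            if (other == null) return;
            if (partial == null) {
                partial = new Partial();
                partial.result = other.result;
                partial.delimiter = other.delimiter;
                return;
            }
            if (!partial.result.isEmpty()) partial.result += partial.delimiter;
            partial.result += other.result;
        }

        String terminate() {
            if (partial == null) return null;
            StringBuilder sb = new StringBuilder();
            String prev = null;
            // The delimiter is quoted so "|" is treated literally by split().
            for (String token : partial.result.split(Pattern.quote(partial.delimiter))) {
                if (token.equals(prev)) continue;
                if (prev != null) sb.append(partial.delimiter);
                sb.append(token);
                prev = token;
            }
            return sb.toString();
        }
    }

    // Simulates two map-side evaluators feeding one reduce-side evaluator.
    static String run(String[] firstSplit, String[] secondSplit, String deli) {
        Evaluator m1 = new Evaluator();
        m1.init();
        for (String v : firstSplit) m1.iterate(v, deli);

        Evaluator m2 = new Evaluator();
        m2.init();
        for (String v : secondSplit) m2.iterate(v, deli);

        Evaluator reducer = new Evaluator();
        reducer.init();
        reducer.merge(m1.terminatePartial());
        reducer.merge(m2.terminatePartial());
        return reducer.terminate();
    }

    public static void main(String[] args) {
        // "b" ends the first split and starts the second. The duplicate across
        // the boundary is still removed because deduplication runs on the
        // merged string.
        System.out.println(run(new String[] {"a", "a", "b"}, new String[] {"b", "c"}, "|"));  // a|b|c
    }
}
```

Deferring deduplication to terminate() is what makes boundary duplicates disappear. The result still depends on the order in which rows and partial results arrive, which the query itself does not fix.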

Usage:

add jar concat_udaf.jar;
create temporary function Concat as 'com.tcc.udaf.Concat';
select a,concat(b,',') from concat_test group by a;
————————————————
Reposted from: https://me.csdn.net/chuangchuangtao
Original article: https://blog.csdn.net/chuangchuangtao/article/details/77455675
