Hive: UDF and UDAF

1. UDF

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    return new Text(s.toString().toLowerCase());
  }
}

add jar my_jar.jar;

create temporary function my_lower as 'com.example.hive.udf.Lower';

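The steps above show how to implement a UDF: first write the UDF class, then compile it into a jar and add that jar to Hive's classpath, and finally create a temporary function name through which Hive queries can call it.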
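Before packaging the jar, the evaluate logic can be checked locally. The following is only a minimal sketch under stated assumptions: the class name LowerSketch is hypothetical, and it works on String rather than org.apache.hadoop.io.Text so that it runs without the Hive or Hadoop jars. It also passes Locale.ROOT to toLowerCase so the result does not depend on the JVM's default locale, which the Lower class above does not do.

```java
import java.util.Locale;

public class LowerSketch {

  // Same shape as Lower.evaluate, but on String (an assumption made so the
  // sketch compiles without Hive/Hadoop on the classpath).
  public String evaluate(final String s) {
    if (s == null) { return null; }        // NULL in, NULL out
    return s.toLowerCase(Locale.ROOT);     // locale-independent lowercasing
  }

  public static void main(String[] args) {
    LowerSketch udf = new LowerSketch();
    System.out.println(udf.evaluate("Hello HIVE"));  // prints hello hive
    System.out.println(udf.evaluate(null));          // prints null
  }
}
```

The null check matters because a column value can be NULL for any row; returning null keeps the function's result NULL for that row instead of throwing a NullPointerException.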

2. UDAF

package org.apache.hadoop.hive.contrib.udaf.example;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * This is a simple UDAF that calculates average.
 *
 * It should be very easy to follow and can be used as an example for writing
 * new UDAFs.
 *
 * Note that Hive internally uses a different mechanism (called GenericUDAF) to
 * implement built-in aggregation functions, which are harder to program but
 * more efficient.
 */
public final class UDAFExampleAvg extends UDAF {

  /**
   * The internal state of an aggregation for average.
   *
   * Note that this is only needed if the internal state cannot be represented
   * by a primitive.
   *
   * The internal state can also contain fields with types like
   * ArrayList<String> and HashMap<String,Double> if needed.
   */
  public static class UDAFAvgState {
    private long mCount;
    private double mSum;
  }

  /**
   * The actual class for doing the aggregation. Hive will automatically look
   * for all internal classes of the UDAF that implement UDAFEvaluator.
   */
  public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {

    UDAFAvgState state;

    public UDAFExampleAvgEvaluator() {
      super();
      state = new UDAFAvgState();
      init();
    }

    /**
     * Reset the state of the aggregation.
     */
    public void init() {
      state.mSum = 0;
      state.mCount = 0;
    }

    /**
     * Iterate through one row of original data.
     *
     * The number and type of arguments need to be the same as when we call
     * this UDAF from the Hive command line.
     *
     * This function should always return true.
     */
    public boolean iterate(Double o) {
      if (o != null) {
        state.mSum += o;
        state.mCount++;
      }
      return true;
    }

    /**
     * Terminate a partial aggregation and return the state. If the state is a
     * primitive, just return primitive Java classes like Integer or String.
     */
    public UDAFAvgState terminatePartial() {
      // This is SQL standard - average of zero items should be null.
      return state.mCount == 0 ? null : state;
    }

    /**
     * Merge with a partial aggregation.
     *
     * This function should always have a single argument which has the same
     * type as the return value of terminatePartial().
     */
    public boolean merge(UDAFAvgState o) {
      if (o != null) {
        state.mSum += o.mSum;
        state.mCount += o.mCount;
      }
      return true;
    }

    /**
     * Terminate the aggregation and return the final result.
     */
    public Double terminate() {
      // This is SQL standard - average of zero items should be null.
      return state.mCount == 0 ? null : Double.valueOf(state.mSum
          / state.mCount);
    }
  }

  private UDAFExampleAvg() {
    // prevent instantiation
  }
}

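Points to note when developing a UDAF:

1. Import org.apache.hadoop.hive.ql.exec.UDAF and org.apache.hadoop.hive.ql.exec.UDAFEvaluator; both are required.

2. The function class must extend the UDAF class, and its inner Evaluator class must implement the UDAFEvaluator interface.

3. The Evaluator must implement the init, iterate, terminatePartial, merge, and terminate functions:

1) init works much like a constructor: it initializes (resets) the aggregation state of the UDAF.

2) iterate receives the arguments passed in and folds each row into the internal state. Its return type is boolean.

3) terminatePartial takes no arguments. It is called once iterate has finished its pass and returns the partially aggregated data. Together, iterate and terminatePartial play a role similar to a Hadoop Combiner.

4) merge receives the value returned by terminatePartial and merges it into the current state. Its return type is boolean.

5) terminate returns the final result of the aggregate function.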
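The order in which these five functions are called can be traced without a Hive cluster. The sketch below is a plain-Java simulation under stated assumptions: UdafLifecycleSketch, AvgState, AvgEvaluator, partial, and combine are hypothetical names, the evaluator does not implement Hive's UDAFEvaluator interface, and the two-phase driver only imitates how partial results from separate input splits might be merged. It is not how Hive itself schedules the calls.

```java
import java.util.Arrays;
import java.util.List;

public class UdafLifecycleSketch {

  // Partial state passed between the two phases (mirrors UDAFAvgState).
  static class AvgState {
    long count;
    double sum;
  }

  // Stand-in evaluator with the same five methods as the example UDAF.
  static class AvgEvaluator {
    private final AvgState state = new AvgState();

    void init() { state.sum = 0; state.count = 0; }

    boolean iterate(Double value) {
      if (value != null) { state.sum += value; state.count++; }
      return true;
    }

    AvgState terminatePartial() { return state.count == 0 ? null : state; }

    boolean merge(AvgState other) {
      if (other != null) { state.sum += other.sum; state.count += other.count; }
      return true;
    }

    Double terminate() {
      return state.count == 0 ? null : Double.valueOf(state.sum / state.count);
    }
  }

  // Phase 1: init, then iterate over one split of rows, then terminatePartial.
  static AvgState partial(List<Double> rows) {
    AvgEvaluator evaluator = new AvgEvaluator();
    evaluator.init();
    for (Double row : rows) { evaluator.iterate(row); }
    return evaluator.terminatePartial();
  }

  // Phase 2: init, then merge every partial state, then terminate.
  static Double combine(List<AvgState> partials) {
    AvgEvaluator evaluator = new AvgEvaluator();
    evaluator.init();
    for (AvgState p : partials) { evaluator.merge(p); }
    return evaluator.terminate();
  }

  public static void main(String[] args) {
    AvgState a = partial(Arrays.asList(1.0, 2.0, null)); // null row is skipped
    AvgState b = partial(Arrays.asList(3.0, 6.0));
    System.out.println(combine(Arrays.asList(a, b)));    // prints 3.0
  }
}
```

Shipping the pair (sum, count) rather than a per-split average is what makes the merge step correct: averaging the two partial averages (1.5 and 4.5) happens to give 3.0 here only because both splits contribute two non-null rows, and would be wrong for splits of unequal size.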
