get top k elements of the same key in hive
key points:
1. group by key and sort by using distribute by and sort by.
2. get top k elements by a UDF (user defined function) RANK
---------Here is the source code.--------------
package com.example.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;
public final class Rank extends UDF{
private int counter;
private String last_key;
public int evaluate(final String key){
if ( !key.equalsIgnoreCase(this.last_key) ) {
this.counter = 0;
this.last_key = key;
}
return this.counter++;
}
}
The details are as the following.
---original data, table region(region_nbr, region_id)---
100 10
200 12
300 33
100 4
100 8
200 20
300 31
300 3
400 4
200 2
-----what I need is as below-----
100 10
100 8
200 20
200 12
300 33
300 31
400 4
---
1. step1. compile java with a shell, compile_udf.sh.
#!/bin/bash
if [ $# != 1 ]; then
echo "Usage: $0 <java file>"
exit 1
fi
CNAME=${1%.java}
JARNAME=$CNAME.jar
JARDIR=/tmp/hive_jars/$CNAME
HIVE_HOME2="/usr/local/hive-0.9.0"
CLASSPATH=$(ls $HIVE_HOME2/lib/hive-serde-*.jar):$(ls $HIVE_HOME2/lib/hive-exec-*.jar):$(ls /home/oicq/hadoop/hadoop-1.0.2/hadoop-core-*.jar)
function tell {
echo
echo "$1 successfully compiled. In Hive run:"
#echo "$> add jar $JARNAME;"
#echo "$> create temporary function $CNAME as 'com.example.hive.udf.$CNAME';"
echo
}
mkdir -p $JARDIR
javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1
step 2. run hive
hive -e "add jar /data/ginobili/UDF/Rank.jar; create temporary function Rank as 'com.example.hive.udf.Rank'; select region_nbr, region_id from ( select region_nbr, region_id, Rank(region_nbr) as rank from (select * from test_gino.region distribute by region_nbr sort by region_nbr, region_id desc)a )b where rank < 2"
REFERENCE
http://stackoverflow.com/questions/9390698/hive-getting-top-n-records-in-group-by-query
get top k elements of the same key in hive的更多相关文章
- Leetcode 347. Top K Frequent Elements
Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...
- 347. Top K Frequent Elements
Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...
- [Swift]LeetCode347. 前K个高频元素 | Top K Frequent Elements
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
- C#版(打败99.28%的提交) - Leetcode 347. Top K Frequent Elements - 题解
版权声明: 本文为博主Bravo Yeung(知乎UserName同名)的原创文章,欲转载请先私信获博主允许,转载时请附上网址 http://blog.csdn.net/lzuacm. C#版 - L ...
- 347. Top K Frequent Elements (sort map)
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
- [LeetCode] Top K Frequent Elements 前K个高频元素
Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...
- [leetcode]347. Top K Frequent Elements K个最常见元素
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
- Top K Frequent Elements 前K个高频元素
Top K Frequent Elements 347. Top K Frequent Elements [LeetCode] Top K Frequent Elements 前K个高频元素
- [LeetCode] 347. Top K Frequent Elements 前K个高频元素
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
随机推荐
- mysql连接的空闲时间超过8小时后 MySQL自动断开该连接解决方案
在连接字符串中 添加设置节点 ConnectionLifeTime(计量单位为 秒).超过设定的连接会话 会被杀死! Connection Lifetime, ConnectionLifeTime ...
- 【HDOJ】2526 浪漫手机
字符串大水题. #include <cstdio> #include <cstring> #include <cstdlib> #define MAXN 105 t ...
- 【转】Java 多线程(四) 多线程访问成员变量与局部变量
原文网址:http://www.cnblogs.com/mengdd/archive/2013/02/16/2913659.html 先看一个程序例子: public class HelloThrea ...
- Android输入法界面管理(打开/关闭/状态获取)
最近做一个带发表情的聊天界面,需要管理系统输入法的状态, 一.打开输入法窗口: InputMethodManager inputMethodManager = (InputMethodManager) ...
- FZU2179/Codeforces 55D beautiful number 数位DP
题目大意: 求 1(m)到n直接有多少个数字x满足 x可以整出这个数字的每一位上的数字 思路: 整除每一位.只需要整除每一位的lcm即可 但是数字太大,dp状态怎么表示呢 发现 1~9的LCM 是2 ...
- 个性化品牌开始繁荣?为设计师和代工厂牵线的平台Maker's Row获得100万美元融资 | 36氪
个性化品牌开始繁荣?为设计师和代工厂牵线的平台Maker's Row获得100万美元融资 | 36氪 个性化品牌开始繁荣?为设计师和代工厂牵线的平台Maker's Row获得100万美元融资
- 控制反转(IOC)/依赖注入(DI)理解
个人学习笔记,来自Acode. 1.术语 控制反转/反向控制,英文全称“Inversion of Control”,简称IoC. 依赖注入,英文全称“Dependency Injection”,简称D ...
- Mac Dock 效果及原理(勾股定理)
这个是苹果机上的 Dock 效果,Windows 上也有一款专门的模拟软件——RocketDock. 代码如下: <!doctype html> <html> <head ...
- php 回调函数
publicfunction transaction(Closure $callback){ $this->beginTransaction(); // We'll simply ...
- Python之基础(二)
1.内建函数enumerate friends = ['john', 'pat', 'gary', 'michael'] for i, name in enumerate(friends): prin ...