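The order by operation in Hive

Common advanced queries in Hive include group by, order by, join, distribute by, sort by, cluster by, and union all. This post looks at order by, which sorts the result set by one or more columns. The syntax is: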

select col1, col2, ...
from tableName
where condition
order by col1 [asc|desc], col2 [asc|desc]
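
To make the multi-column semantics concrete, here is a minimal Python sketch of what sorting on several keys, each with its own direction, means. It is only an illustration of the ordering rule, not of how Hive executes the query. The rows and the helper name order_by are hypothetical; only the salary values echo the sample data used later in this post.

```python
# Hypothetical rows: (name, salary). "chen" is made up to create a salary tie.
rows = [
    ("liao", 18000.0),
    ("zhang", 19000.0),
    ("lavimer", 15000.0),
    ("chen", 18000.0),
]


def order_by(rows, keys):
    """Sort rows by a list of (column_index, descending) pairs, in the spirit
    of ORDER BY col_a [ASC|DESC], col_b [ASC|DESC]."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for index, descending in reversed(keys):
        result.sort(key=lambda row: row[index], reverse=descending)
    return result


# Comparable in spirit to: ORDER BY salary DESC, name ASC
ordered = order_by(rows, [(1, True), (0, False)])
for row in ordered:
    print(row)
```

The second key only matters where the first key ties: the two rows with salary 18000.0 are ordered by name.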

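Notes:

(1) order by can sort on several columns. Each column sorts in ascending order unless desc is given, and string values are compared lexicographically.

(2) order by is a global sort.

(3) order by needs a reduce phase, and it always runs with exactly one reducer regardless of configuration, because rows split across several reducers would only be sorted within each reducer and not globally.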
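
The reasoning in note (3) can be sketched in Python. This is a toy simulation under stated assumptions, not Hive's actual execution: the function run_reducers and the modulo partitioner are hypothetical stand-ins for a job that spreads rows over reducers and sorts inside each one.

```python
def run_reducers(rows, num_reducers):
    """Partition rows across reducers, sort inside each reducer, and
    concatenate the reducer outputs in reducer order."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # Toy partitioner (an assumption): the value modulo the reducer count.
        partitions[row % num_reducers].append(row)
    output = []
    for part in partitions:
        output.extend(sorted(part))
    return output


values = [7, 2, 9, 4, 1, 8]

# Two reducers: each reducer's slice is sorted, but the whole is not.
print(run_reducers(values, 2))

# One reducer: every row passes through the same sort, so the order is global.
print(run_reducers(values, 1))
```

With two reducers the even and odd values are each sorted on their own, so the concatenated output is not in order. With one reducer the output matches a plain sort of the input.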

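order by is constrained by the following property:

set hive.mapred.mode=nonstrict; (default value in the Hive version used here)
set hive.mapred.mode=strict;

Note: in strict mode, an order by statement must also include a limit clause. order by runs on a single reducer, so sorting a very large result set can take a very long time.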
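
One way to see why a limit makes the single reducer tolerable is the following Python sketch. It is an illustration of the general top-N idea, not a description of Hive's actual execution plan: it assumes each map task forwards only its own n largest rows, and the function names local_top_n and order_by_desc_limit are hypothetical.

```python
import heapq


def local_top_n(rows, n):
    """What a single map task could forward: only its n largest rows."""
    return heapq.nlargest(n, rows)


def order_by_desc_limit(map_inputs, n):
    """Merge the per-mapper candidates in one reducer and keep the global
    top n. Also return how many rows the reducer had to look at."""
    candidates = []
    for rows in map_inputs:
        candidates.extend(local_top_n(rows, n))
    return heapq.nlargest(n, candidates), len(candidates)


# Three hypothetical map inputs holding salary values.
map_inputs = [
    [15000, 18000, 9000, 7000],
    [19000, 3000, 12000],
    [11000, 17000, 5000, 1000],
]

top, rows_seen = order_by_desc_limit(map_inputs, 2)
print(top, rows_seen)
```

Under this assumption the reducer handles at most n rows per map task instead of the whole table, which is the kind of bound that an unlimited order by does not have.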

The following example shows order by in practice.

The database contains an employees table with the following data:

hive> select * from employees;
OK
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love

Now sort by the second column (salary) in descending order:

hive> select * from employees order by salary desc;
// MapReduce job runs
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.62 sec HDFS Read: 415 HDFS Write: 245 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 620 msec
OK
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
Time taken: 20.484 seconds
hive>

At this point the hive.mapred.mode property is:

hive> set hive.mapred.mode;
hive.mapred.mode=nonstrict
hive>

Now change it to strict and run the order by query again:

hive> set hive.mapred.mode=strict;
hive> select * from employees order by salary desc;
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'salary'
hive>

Note: in strict mode the query must include a limit clause.

hive> select * from employees order by salary desc limit 3;
FAILED: Error in semantic analysis: No partition predicate found for Alias "employees" Table "employees"

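Note: strict mode also restricts queries on partitioned tables. The fix is to specify the partition in the where clause.

First, look at the partitions: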

hive> show partitions employees;
OK
date_time=2015-01-24/type=love
Time taken: 0.096 seconds

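Query with order by in strict mode. The first attempt puts a partition(...) clause in where, which the parser rejects there. Partition columns are filtered like ordinary columns, as the second attempt shows: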

hive> select * from employees where partition(date_time='2015-01-24',type='love') order by salary desc limit 3;
FAILED: Parse Error: line 1:30 cannot recognize input near 'partition' '(' 'date_time' in expression specification
hive> select * from employees where date_time='2015-01-24' and type='love' order by salary desc limit 3;
// MapReduce job runs
Total MapReduce CPU Time Spent: 3 seconds 510 msec
OK
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
Time taken: 19.861 seconds
hive>
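
The two strict-mode rules seen in this session can be summarized as a small Python model. This is a hypothetical sketch, not Hive code: the Query class, the strict_mode_errors function, and the error strings are made up for illustration, and it assumes that a filter on at least one partition column satisfies the partition check. Unlike the session above, where Hive stops at the first failure, the sketch collects every violation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:
    """Hypothetical, heavily simplified description of a select statement."""
    table: str
    order_by: list = field(default_factory=list)
    limit: Optional[int] = None
    filter_columns: set = field(default_factory=set)


def strict_mode_errors(query, partition_columns, mode="strict"):
    """Return the strict-mode violations for a query.

    partition_columns maps a table name to its set of partition columns.
    """
    if mode != "strict":
        return []
    errors = []
    if query.order_by and query.limit is None:
        errors.append("ORDER BY without LIMIT")
    table_partitions = partition_columns.get(query.table, set())
    # Assumption: one filtered partition column is enough to pass the check.
    if table_partitions and not (table_partitions & query.filter_columns):
        errors.append("no partition predicate")
    return errors


partitions = {"employees": {"date_time", "type"}}

no_limit = Query("employees", order_by=["salary"])
with_limit = Query("employees", order_by=["salary"], limit=3)
with_partition = Query("employees", order_by=["salary"], limit=3,
                       filter_columns={"date_time", "type"})

for q in (no_limit, with_limit, with_partition):
    print(strict_mode_errors(q, partitions))
```

The three sample queries mirror the three attempts in the session: no limit, a limit without a partition filter, and a limit with filters on both partition columns.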
