1 数据来源

本次实战的数据来自于"YouTube视频统计与社交网络"的数据集,是西蒙弗雷泽大学计算机学院在2008年所爬取的数据

数据集地址

1. 1 Youtube视频表格式如下:

列名 注释
视频ID 一个11位字符串,是唯一的
上传 一个字符串的视频上传者的用户名
年龄 视频上传日期和2007年2月15日之间的整数天(YouTube的设立)
类别 由上传者选择的视频类别的字符串
长度 视频长度的整数v
观看数 一整数的视图
一个浮点数的视频速率
评分 整数的评分
评论数 一整数的评论
相关视频ID 最多20个字符串的相关视频ID

数据之间采用"\t"作为分隔符

具体数据如下:

video ID uploader age category length views rate ratings comments related IDs
ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

数据量大小为1G,条数为500万+

1.2 用户表

列名 uploader videos friends
类型 string int int
解释 上传者 上传视频数 朋友数

2 实战演练准备

2.1 环境搭建

使用环境为

hive-1.1.0-cdh5.4.5

hadoop-2.6.0-cdh5.4.5

演示形式为使用hive shell

2.2 数据清洗

我们一起来看看数据

video ID uploader age category length views rate ratings comments related IDs
ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

主要的问题在于category和relatedIDs处理,由于Hive是支持array格式的,所以我们想到的是使用array来存储category和relatedIDs,但是我们发现category的分割符是"&"而realatedIDs的分隔符是"\t",我们在创建表格的时候能够指定array的分隔符,但是只能指定一个,所以再将数据导入到Hive表格之前我们需要对数据进行一定转换和清洗

并且数据中肯定会存在一些不完整数据和一些奇怪的格式,所以数据的清洗是必要的,我在这里所使用的数据清洗方式是使用Spark进行清洗,也可以使用自定义UDF函数来进行清洗

数据清洗注意点

1)我们可以看到每行数据以"\t"作为分隔符,每行有十列数据,最后一列关联ID可以为空,那么我们对数据进行split之后数组的大小要大于8

2)数据中存在"uNiKXDA8eyQ KRQE 1035 News & Politics 107"这样格式的数据,所以在处理category时需要注意 News & Politics中间的&

处理后的数据如下:

video ID uploader age category length views rate ratings comments related IDs
PkGUU_ggO3k theresident 704 Entertainment 262 11235 3.85 247 280 PkGUU_ggO3k&EYC5bWF0ss8&...shU2hfHKmU0&p0lq5-8IDqY
RX24KLBhwMI lemonette 697 People&Blogs 512 24149 4.22 315 474 t60tW0WevkE&WZgoejVDZlo&...s8xf4QX1UvA&2cKd9ERh5-8

下面的实战都是基于数据清洗后的数据进行的

2.3 创建表格和数据导入

2.3.1 youtube表格的创建和导入

1)youtube1的创建,文件格式为textfile

create table youtube1(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)

row format delimited

fields terminated by "\t"

collection items terminated by "&"

stored as textfile;

2)youtube2的创建,文件格式为orc

create table youtube2(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)

row format delimited

fields terminated by "\t"

collection items terminated by "&"

stored as orc;

3)youtube3的创建,文件格式为orc,进行桶分区

create table youtube3(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)

clustered by (uploader) into 8 buckets

row format delimited

fields terminated by "\t"

collection items terminated by "&"

stored as orc;

数据导入:

1)load data inpath "path" into table youtube1;

2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从youtube1导入到youtube2和youtube3:

insert into table youtube2 select * from youtube1;

insert into table youtube3 select * from youtube1;

2.3.2 user表格的创建和导入

1)user_tmp的创建,文件格式textfile,24buckets

create table user_tmp(uploader string,videos int,friends int)

clustered by (uploader) into 24 buckets

row format delimited

fields terminated by "\t"

stored as textfile;

2)user的创建,文件格式orc,24buckets

create table user(uploader string,videos int,friends int)

clustered by (uploader) into 24 buckets

row format delimited

fields terminated by "\t"

stored as orc;

user表的数据导入也是同理

数据导入:

1)load data inpath "path" into table user_tmp;

2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从user_tmp导入到user:

insert into table user select * from user_tmp;

3 实战需求

1)统计出观看数最多的10个视频

2)统计出视频类别热度的前10个类型

3)统计出视频观看数最高的50个视频的所属类别

4)统计出观看数最多的前N个视频所关联的视频的所属类别排行

5)筛选出每个类别中热度最高的前10个视频

6)筛选出每个类别中评分最高的前10个视频

7)找出用户中上传视频最多的10个用户的所有视频

8)筛选出每个类别中观看数Top10

4 实战演练

4.1 统计出观看数最多的10个视频

select * from youtube3 order by views desc limit 10;

结果如下:

hive> select * from youtube3 order by views desc limit 10;
Query ID = hadoop_20170710155353_4bc057f0-bbd5-4bfe-a66c-ec0e17cb3ca9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0101, Tracking URL = http://master:8088/proxy/application_1499153664137_0101/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0101
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 15:53:55,297 Stage-1 map = 0%, reduce = 0%
...
2017-07-10 15:56:06,210 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 106.93 sec
MapReduce Total cumulative CPU time: 1 minutes 46 seconds 930 msec
Ended Job = job_1499153664137_0101
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 Reduce: 1 Cumulative CPU: 106.93 sec HDFS Read: 574632526 HDFS Write: 2916 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 46 seconds 930 msec
OK
dMH0bHeiRNg judsonlaipply 415 ["Comedy"] 360 79897120 4.65 287260 131356 ["HSoVKUVOnfQ","pv5zWaTEVkI","v1U8f7TmYkA","ZCT3MIUTc3o","mDiBOF8XI44","P9LmHXXWiJs","fo_QVq2lGMs","EtGQgSY9Nn4","5P6UU6m3cqk","61heClTFc5w","h3gdSHGcUU4","OPmYbP0F4Zw","gsOaQGF7kiQ","H2gw9VE16mo","rGkIUlYEQT8","innfyQZHPpo","cu8tUy14zQo","cQ25-glGRzI","WqDbn2iLwwY","jvz0bvYmnto"]
cQ25-glGRzI RCARecords 742 ["Music"] 227 77674728 4.45 188154 186858 ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
12Z3J1uzd0Q kaejane 404 ["Film","Animation"] 615 65341925 3.03 9189 5508 ["innfyQZHPpo","-_CSo1gOd48","1Al4crLW9EM","nhSZs-aAZbo","Af9aPlG2WFw","5P6UU6m3cqk","uBHsOTYO6AI","lW19DnWz6vg","vr3x_RRJdd4","5GE82tqcYYQ","D2kJZOfq7zk","lsO6D1rwrKc","dMH0bHeiRNg","kNLXjXxj3J8","vQbYvjmfbr4","G-HonZGBWus","MuOvqeABHvQ","-2caf6KlSyw","cQ25-glGRzI","pv5zWaTEVkI"]
LpAI8TzQDes IMVUinc 848 ["Entertainment"] 68 65078772 1.38 5678 2499 ["c2eP-UADAi8","5P6UU6m3cqk","cQ25-glGRzI","FYbqcgd97NQ","12Z3J1uzd0Q","VS4wFOWFUt8","RB-wUgnyGv0","sBQLq2VmZcA","w2xUzv6iZWo","oOF3T9Zvizc","244qR7SvvX0","pv5zWaTEVkI","dMH0bHeiRNg","b3u65f4CRLk"]
7AVHXe-ol-s internmarket 603 ["Music"] 264 60349673 3.16 1033 594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
244qR7SvvX0 donotasyoudo 960 ["Entertainment"] 6 57790943 1.45 66412 14913 ["v3ARyAb_1Bs","nhSZs-aAZbo","2pNTrYd-4FQ","dMH0bHeiRNg","kHmvkRoEowc","yh0xYO3dYx8","5GE82tqcYYQ","ktUSIJEiOug","6PrDw6T3-rM","-xEzGIuY7kw","innfyQZHPpo","pv5zWaTEVkI","E7QOUJfRs3Y","9oRQJK61MWs"]
ePyRrb2-fzs 1988basti 826 ["Music"] 204 45984219 4.87 51039 27065 ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
xsRWpK4pf90 universalmusicgroup 902 ["Music"] 240 44614530 4.73 99373 53441 ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
ktUSIJEiOug aliciakeys 953 ["Music"] 251 43583367 4.85 103074 59672 ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
b3u65f4CRLk srcrecords 729 ["Music"] 256 43511791 4.76 55148 27818 ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
Time taken: 172.566 seconds, Fetched: 10 row(s)

4.2 统计出视频类别热度的前10个类型

select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;
Query ID = hadoop_20170710155757_614ec9dd-ffa0-465d-8d62-c47b5ad585f0
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0102, Tracking URL = http://master:8088/proxy/application_1499153664137_0102/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0102
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 2
2017-07-10 15:58:23,797 Stage-1 map = 0%, reduce = 0%
...
2017-07-10 15:59:16,341 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.51 sec
MapReduce Total cumulative CPU time: 19 seconds 510 msec
Ended Job = job_1499153664137_0102
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0103, Tracking URL = http://master:8088/proxy/application_1499153664137_0103/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0103
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-07-10 15:59:51,645 Stage-2 map = 0%, reduce = 0%
2017-07-10 16:00:18,373 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.97 sec
2017-07-10 16:00:48,410 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.57 sec
MapReduce Total cumulative CPU time: 2 seconds 570 msec
Ended Job = job_1499153664137_0103
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 Reduce: 2 Cumulative CPU: 19.51 sec HDFS Read: 47327614 HDFS Write: 900 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.57 sec HDFS Read: 5596 HDFS Write: 148 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 80 msec
OK
Entertainment 1304724
Music 1274825
Comedy 449652
Blogs 447581
People 447581
Film 442109
Animation 442109
Sports 390619
Politics 186753
News 186753
Time taken: 185.399 seconds, Fetched: 10 row(s)

4.3 统计出视频观看数最高的50个视频的所属类别

select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;
Query ID = hadoop_20170710160909_c6cbbe29-4df3-4c0b-ad70-bd34857acc80
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0104, Tracking URL = http://master:8088/proxy/application_1499153664137_0104/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0104
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 16:09:48,197 Stage-1 map = 0%, reduce = 0%
2017-07-10 16:10:17,734 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 3.09 sec
...
2017-07-10 16:10:59,048 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 47.74 sec
MapReduce Total cumulative CPU time: 47 seconds 740 msec
Ended Job = job_1499153664137_0104
Launching Job 2 out of 3
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0105, Tracking URL = http://master:8088/proxy/application_1499153664137_0105/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0105
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 2
2017-07-10 16:11:35,860 Stage-2 map = 0%, reduce = 0%
2017-07-10 16:12:04,633 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.07 sec
2017-07-10 16:12:23,187 Stage-2 map = 100%, reduce = 50%, Cumulative CPU 2.61 sec
2017-07-10 16:12:32,480 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.93 sec
MapReduce Total cumulative CPU time: 3 seconds 930 msec
Ended Job = job_1499153664137_0105
Launching Job 3 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0106, Tracking URL = http://master:8088/proxy/application_1499153664137_0106/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0106
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2017-07-10 16:13:09,244 Stage-3 map = 0%, reduce = 0%
2017-07-10 16:13:37,263 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.18 sec
2017-07-10 16:14:05,048 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 3.5 sec
MapReduce Total cumulative CPU time: 3 seconds 500 msec
Ended Job = job_1499153664137_0106
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 Reduce: 1 Cumulative CPU: 47.74 sec HDFS Read: 57839263 HDFS Write: 228 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 2 Cumulative CPU: 3.93 sec HDFS Read: 6065 HDFS Write: 324 SUCCESS
Stage-Stage-3: Map: 2 Reduce: 1 Cumulative CPU: 3.5 sec HDFS Read: 6600 HDFS Write: 53 SUCCESS
Total MapReduce CPU Time Spent: 55 seconds 170 msec
OK
Music 12
Comedy 3
Entertainment 3
Film 2
Animation 2
Time taken: 297.466 seconds, Fetched: 5 row(s)

4.4 统计出观看数最多的前50个视频所关联的视频的所属类别排行

思路:

  1. 首先筛选出前50个视频所关联的视频,

    select * from youtube3 order by views desc limit 50
  2. 再将结果和youtube3进行join

    select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  3. 然后去重

    select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  4. 最后得出所有视频的类别排行

    select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;

结果:

hive> select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;
Query ID = hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb
Total jobs = 7
Launching Job 1 out of 7
Number of reduce tasks determined at compile time: 1 Starting Job = job_1499153664137_0111, Tracking URL = http://master:8088/proxy/application_1499153664137_0111/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0111
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 17:08:48,662 Stage-1 map = 0%, reduce = 0%
2017-07-10 17:09:25,175 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 20.83 sec
...
2017-07-10 17:10:15,276 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 81.01 sec
MapReduce Total cumulative CPU time: 1 minutes 21 seconds 10 msec
Ended Job = job_1499153664137_0111
Stage-10 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
17/07/10 17:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb.log
2017-07-10 05:10:23 Starting to launch local task to process map join; maximum memory = 518979584
2017-07-10 05:10:34 Dump the side-table for tag: 0 with group count: 743 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable
2017-07-10 05:10:34 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable (22611 bytes)
2017-07-10 05:10:34 End of local task; Time Taken: 11.392 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 7
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0112, Tracking URL = http://master:8088/proxy/application_1499153664137_0112/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0112
Hadoop job information for Stage-8: number of mappers: 4; number of reducers: 0
2017-07-10 17:11:13,493 Stage-8 map = 0%, reduce = 0%
2017-07-10 17:11:57,229 Stage-8 map = 100%, reduce = 0%, Cumulative CPU 15.79 sec
MapReduce Total cumulative CPU time: 15 seconds 790 msec
Ended Job = job_1499153664137_0112
Launching Job 4 out of 7
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0113, Tracking URL = http://master:8088/proxy/application_1499153664137_0113/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0113
Hadoop job information for Stage-3: number of mappers: 3; number of reducers: 2
2017-07-10 17:12:31,982 Stage-3 map = 0%, reduce = 0%
2017-07-10 17:13:17,467 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 6.53 sec
MapReduce Total cumulative CPU time: 6 seconds 530 msec
Ended Job = job_1499153664137_0113
Launching Job 5 out of 7
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0114, Tracking URL = http://master:8088/proxy/application_1499153664137_0114/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0114
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 2
2017-07-10 17:13:55,123 Stage-4 map = 0%, reduce = 0%
2017-07-10 17:14:22,876 Stage-4 map = 50%, reduce = 0%, Cumulative CPU 0.91 sec
2017-07-10 17:14:23,905 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 1.99 sec
2017-07-10 17:14:40,601 Stage-4 map = 100%, reduce = 50%, Cumulative CPU 3.35 sec
2017-07-10 17:14:52,033 Stage-4 map = 100%, reduce = 100%, Cumulative CPU 4.96 sec
MapReduce Total cumulative CPU time: 4 seconds 960 msec
Ended Job = job_1499153664137_0114
Launching Job 6 out of 7
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0115, Tracking URL = http://master:8088/proxy/application_1499153664137_0115/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0115
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 1
2017-07-10 17:15:27,647 Stage-5 map = 0%, reduce = 0%
2017-07-10 17:15:54,560 Stage-5 map = 50%, reduce = 0%, Cumulative CPU 0.91 sec
2017-07-10 17:15:55,587 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 2.03 sec
2017-07-10 17:16:23,357 Stage-5 map = 100%, reduce = 100%, Cumulative CPU 3.59 sec
MapReduce Total cumulative CPU time: 3 seconds 590 msec
Ended Job = job_1499153664137_0115
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 Reduce: 1 Cumulative CPU: 81.01 sec HDFS Read: 463074822 HDFS Write: 26949 SUCCESS
Stage-Stage-8: Map: 4 Cumulative CPU: 15.79 sec HDFS Read: 47322670 HDFS Write: 30829 SUCCESS
Stage-Stage-3: Map: 3 Reduce: 2 Cumulative CPU: 6.53 sec HDFS Read: 43694 HDFS Write: 1126 SUCCESS
Stage-Stage-4: Map: 2 Reduce: 2 Cumulative CPU: 4.96 sec HDFS Read: 8888 HDFS Write: 725 SUCCESS
Stage-Stage-5: Map: 2 Reduce: 1 Cumulative CPU: 3.59 sec HDFS Read: 7001 HDFS Write: 207 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 51 seconds 880 msec
OK
Music 399
Entertainment 103
Comedy 94
People 36
Blogs 36
Film 19
Animation 19
Pets 12
Animals 12
UNA 11
Style 6
Politics 6
News 6
Howto 6
Sports 4
Vehicles 3
Autos 3
Travel 2
Events 2
Technology 1
Science 1
Time taken: 496.822 seconds, Fetched: 21 row(s)

4.5 筛选出某个类别(如music)中热度最高的前10个视频

思路:

  • 创建一个表格,存储每个视频对应一个标签的信息

    create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array)

    row format delimited

    fields terminated by "\t"

    collection items terminated by "&"

    stored as orc;
  • 将转换后的数据进行插入

    insert into table youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
  • 根据观看数和类别进行查询

    select * from youtube_category where categoryId="Music" order by views desc limit 10;

结果如下:

hive> create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
> row format delimited
> fields terminated by "\t"
> collection items terminated by "&"
> stored as orc;
OK
Time taken: 0.256 seconds
hive> insert into youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
Query ID = hadoop_20170711091010_7c76d7bd-5af0-43f9-812f-e30c612ee60b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0116, Tracking URL = http://master:8088/proxy/application_1499153664137_0116/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0116
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2017-07-11 09:11:18,741 Stage-1 map = 0%, reduce = 0%
...
2017-07-11 09:19:05,239 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 402.71 sec
MapReduce Total cumulative CPU time: 6 minutes 42 seconds 710 msec
Ended Job = job_1499153664137_0116
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://bydcluster1/user/hive/warehouse/youtube_category/.hive-staging_hive_2017-07-11_09-10-38_970_8191962088714912782-1/-ext-10000
Loading data to table default.youtube_category
Table default.youtube_category stats: [numFiles=4, numRows=6742209, totalSize=608748677, rawDataSize=10187373874]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 Cumulative CPU: 404.25 sec HDFS Read: 574634998 HDFS Write: 608749047 SUCCESS
Total MapReduce CPU Time Spent: 6 minutes 44 seconds 250 msec
OK
Time taken: 511.914 seconds
hive> select * from youtube_category where categoryId="music" order by views desc limit 10;
Query ID = hadoop_20170711091919_a48aa98b-9cfa-4f58-a106-9da85484f0dd
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0117, Tracking URL = http://master:8088/proxy/application_1499153664137_0117/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0117
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:20:37,607 Stage-1 map = 0%, reduce = 0%
...
2017-07-11 09:21:32,382 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 36.15 sec
MapReduce Total cumulative CPU time: 36 seconds 150 msec
Ended Job = job_1499153664137_0117
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 36.15 sec HDFS Read: 607706126 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 36 seconds 150 msec
OK
Time taken: 96.836 seconds
hive> select * from youtube_category where catagoryId="Music" order by views desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryId': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
hive> select * from youtube_category where catagoryid="Music" order by views desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryid': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
hive> select * from youtube_category where categoryid="Music" order by views desc limit 10;
Query ID = hadoop_20170711092525_53f32ccf-7d04-4615-b446-9009cf16dc7f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0118, Tracking URL = http://master:8088/proxy/application_1499153664137_0118/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0118
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:26:08,727 Stage-1 map = 0%, reduce = 0%
...
2017-07-11 09:27:17,297 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 57.89 sec
MapReduce Total cumulative CPU time: 57 seconds 890 msec
Ended Job = job_1499153664137_0118
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 58.14 sec HDFS Read: 607706243 HDFS Write: 3043 SUCCESS
Total MapReduce CPU Time Spent: 58 seconds 140 msec
OK
cQ25-glGRzI RCARecords 742 Music 227 77674728 4.45 188154 186858 ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
7AVHXe-ol-s internmarket 603 Music 264 60349673 3.16 1033 594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
ePyRrb2-fzs 1988basti 826 Music 204 45984219 4.87 51039 27065 ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
xsRWpK4pf90 universalmusicgroup 902 Music 240 44614530 4.73 99373 53441 ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
ktUSIJEiOug aliciakeys 953 Music 251 43583367 4.85 103074 59672 ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
b3u65f4CRLk srcrecords 729 Music 256 43511791 4.76 55148 27818 ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
iWg3IMN_rhU TimbalandMusic 853 Music 216 43323757 4.8 58761 36142 ["SIM4DCn7AlE","GojTUmjxVHU","ePyRrb2-fzs","2__Qdd11rfA","P_TKpULdDvo","Dn6kGzRSjhU","1iLDIj0pDHk","xsRWpK4pf90","XewBty95JVg","y4jiyjDQGLY","h8oBykb_Pqs","xNp7OxgFJJM","b3u65f4CRLk","S-783rzHIS4","cZd1Js0QaOI","kZGEgVxyPHU","43o0vwAmFM8","BMMUWvavORI","v_-1peCW6Ok","9gkjyM7ZOUc"]
innfyQZHPpo chai0322 468 Music 210 41564032 2.61 13564 6126 ["nhSZs-aAZbo","hh0nVc0NYTE","12Z3J1uzd0Q","V1s9queYhF8","1Al4crLW9EM","61e1h4vALS0","8h98jb9Lk74","Z-HjmP7BCVc","o_pIQIV_NuU","82BDV1gYjdM","Af9aPlG2WFw","cC27PTFP4V8","-_CSo1gOd48","00c08ijIlHo","cIKxlWuTviE","dMH0bHeiRNg","iWg3IMN_rhU","v3ARyAb_1Bs","5P6UU6m3cqk","QjA5faZF1A8"]
Lt6o8NlrbHg seankingston 867 Music 257 41171303 4.74 101352 67579 ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
QjA5faZF1A8 guitar90 308 Music 320 40294882 4.83 329108 169563 ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
Time taken: 111.631 seconds, Fetched: 10 row(s)

4.6 筛选出每个类别中评分最高的前10个视频

select * from youtube_category where categoryId="Music" order by ratings desc limit 10;

结果如下:

hive> select * from youtube_category where categoryId="Music" order by ratings desc limit 10;
Query ID = hadoop_20170711093838_6ab5573a-964f-486e-b0dc-77e5853fcfb5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0119, Tracking URL = http://master:8088/proxy/application_1499153664137_0119/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0119
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:39:27,903 Stage-1 map = 0%, reduce = 0%
...
2017-07-11 09:40:36,298 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 57.41 sec
MapReduce Total cumulative CPU time: 57 seconds 410 msec
Ended Job = job_1499153664137_0119
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 57.41 sec HDFS Read: 607706243 HDFS Write: 2957 SUCCESS
Total MapReduce CPU Time Spent: 57 seconds 410 msec
OK
QjA5faZF1A8 guitar90 308 Music 320 40294882 4.83 329108 169563 ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
cQ25-glGRzI RCARecords 742 Music 227 77674728 4.45 188154 186858 ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
pv5zWaTEVkI OkGo 531 Music 184 32022043 4.83 121516 39974 ["dMH0bHeiRNg","x49WZRyXGe0","RbdbVhBGETQ","yRmqZRPgK1w","DjCL0_0Il7w","gq7r3F1SoX0","k66epna2Sss","1LNbzqoOPu4","vr3x_RRJdd4","ERV-wh4VwZI","xsRWpK4pf90","iAQZ_uui1SY","5P6UU6m3cqk","bav63MWNUKg"]
ktUSIJEiOug aliciakeys 953 Music 251 43583367 4.85 103074 59672 ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
Lt6o8NlrbHg seankingston 867 Music 257 41171303 4.74 101352 67579 ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
xsRWpK4pf90 universalmusicgroup 902 Music 240 44614530 4.73 99373 53441 ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
K2cYWfq--Nw FrEckleStudios 841 Music 224 17005186 4.82 90991 50788 ["SyIC3Munnyw","cZd1Js0QaOI","lLYD_-A_X5E","alqM0IYeH54","wQVEPFzkhaM","oGECJP3phyY","nPLOiBM8hLk","VpBDqtUEWcM","MJPdVVOmbz4","bPZJYQXQsm8","nPBmXEO3yUU","3jzSh_MLNcY","_EXeBLvmllg","Cva_sGN_0VA","Sr2JneittqQ","SYpYmkcadRA","71eBjNbdXDg","DgBgnoEY4iM","bl6RJyZdBSU","FAK_jtOf70g"]
-xEzGIuY7kw alyankovic 580 Music 171 24095019 4.84 90091 45517 ["HYokLWfqbaU","N26KWq7MmSc","JCAt9WcCFbM","Rt1_6uz_sVU","Nh9mVsBKwYs","p9Zt8mn14hY","8n7ncJEFuSw","FT060JGp9sQ","E6Zc9NyYH-k","ixyTNd-Ln38","zIllRdSzSug","GsfVw9xxoNY","XkDeJgGrtdU","fqz1ojIQTBk","5GE82tqcYYQ","ODdGhOOUOpI","v3ARyAb_1Bs","XbVtbc_XzrI","U1ULxKM75rY","Jw00EUh0GT4"]
EwTZ2xpQwpA TayZonday 796 Music 292 16841569 4.23 83514 129200 ["2x2W12A8Qow","P6dUCOS1bM0","NattlyH0IeM","nTQOpibv_OA","9mSKBgvHdoE","hjD6iigdB-g","caIBKOztlAo","xUz2YMmiq0k","aWY3eYOX3U0","mD5_GUovjiM","1oFS-q8BIps","deXAEN70CDY","0pElTyjfxe0","eyDuGwlrFRs","m6SjPfc_xNA","qYGvGWY1FDs","N0amCfgnwY8","JPu4uErBFks","pgSA-ErKd8c","ZZgGGlOGyUg"]
xWHf_vYZzQ8 universalmusicgroup 771 Music 262 25607299 4.79 83419 70529 ["jvz0bvYmnto","zElEs8yw7fw","9FcBnaLjxY4","95wgKdSJGDo","ueOgZPBXY4A","k3O5uy-MBBk","O3sGTaQ9s9c","aT434G38OBg","4PdDPrwIwhI","Y-VifE8EK8w","9wpxno6qUd0","B17SYROT3GI","wJtUuxmm-B0","LBDnkJ5h1ho","CfzpBjKpSPE","QF2AmC2xyXM","eoa6Gx4HxTc","mvPvcV44rCc"]
Time taken: 109.053 seconds, Fetched: 10 row(s)

4.7 找出用户中上传视频最多的10个用户的所有视频

思路:

  • 筛选出用户表中上传视频数最多的前十位

    select * from user order by videos desc limit 10;
  • 将筛选结果跟youtube3进行join得出结果

    select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;

结果如下:

hive> select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;
Query ID = hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0125, Tracking URL = http://master:8088/proxy/application_1499153664137_0125/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0125
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-07-11 10:17:30,594 Stage-1 map = 0%, reduce = 0%
2017-07-11 10:18:03,439 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.03 sec
2017-07-11 10:18:25,025 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.28 sec
MapReduce Total cumulative CPU time: 10 seconds 280 msec
Ended Job = job_1499153664137_0125
Stage-8 is filtered out by condition resolver.
Stage-9 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
17/07/11 10:18:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe.log
2017-07-11 10:18:33 Starting to launch local task to process map join; maximum memory = 518979584
2017-07-11 10:18:44 Dump the side-table for tag: 0 with group count: 10 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable
2017-07-11 10:18:44 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable (611 bytes)
2017-07-11 10:18:44 End of local task; Time Taken: 11.244 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0126, Tracking URL = http://master:8088/proxy/application_1499153664137_0126/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0126
Hadoop job information for Stage-6: number of mappers: 4; number of reducers: 0
2017-07-11 10:19:25,285 Stage-6 map = 0%, reduce = 0%
...
2017-07-11 10:20:26,162 Stage-6 map = 100%, reduce = 0%, Cumulative CPU 44.09 sec
MapReduce Total cumulative CPU time: 44 seconds 90 msec
Ended Job = job_1499153664137_0126
Launching Job 4 out of 5
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0127, Tracking URL = http://master:8088/proxy/application_1499153664137_0127/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0127
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2017-07-11 10:21:01,746 Stage-3 map = 0%, reduce = 0%
2017-07-11 10:21:30,462 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.27 sec
2017-07-11 10:21:48,968 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 6.36 sec
MapReduce Total cumulative CPU time: 6 seconds 360 msec
Ended Job = job_1499153664137_0127
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 10.28 sec HDFS Read: 9796823 HDFS Write: 434 SUCCESS
Stage-Stage-6: Map: 4 Cumulative CPU: 44.09 sec HDFS Read: 574641054 HDFS Write: 2133846 SUCCESS
Stage-Stage-3: Map: 2 Reduce: 1 Cumulative CPU: 6.36 sec HDFS Read: 2144607 HDFS Write: 6335 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 0 seconds 730 msec
OK
lUOe76YPY7M expertvillage 801 ["Howto","Style"] 119 1014983 3.36 596 689 ["5cEsa-rswrg","HOoQPbcF8Rk","ht3DMDx1spo","VFnf17cp3Eg","IP35rm_31RQ","hXuiuwZPX0M","s8pH5jgSuP8","KIEkwm_6DD4","ovFSahIPaVQ","rQtwcSDh3Hk","COeT3WR7SFc","ZNLBKRpWt7w","qncPBZ9DRRk","tOBqg3yDlyE","YgfWUXsbr7w","qBB3AsgjN1M","XX8ouWx6xm8","BtF9dYRuiGU","NRanI5qR81I","evfl0NGLjVE"] 86228 5659
2Aj2wx9PYxI expertvillage 815 ["Howto","Style"] 170 918553 3.82 1596 1533 ["hb5QaCfm7bg","YVDPSZe21KI","C1elpMjZ4wE","8gD1RG-tnNk","d-IDrRSVkCQ","IAOr6SYGAyw","3mbx03mP5eg","o-pN8qAiZhQ","K_qw03-3gFg","JgsO3A_vbtk","ffKr98Ium-M","WKYcBKa8_7Q","dgeX8CZmqvo","jgUCs72RDHs","PaNhROW-wEo","WhRCVm-1r2k","5I5O8P-r5Rk","reJSIZ3ugsE","-51iHvqP0Rs","9wItsn3r_kc"] 86228 5659
VBGHer3yFyc libertaddigitaltv 998 ["News","Politics"] 29 828616 4.68 1337 10763 ["HL9p8I3lOFA","Fb3HZ7K1gTk","AARX6XOA5h0","t3DPDKbRxio","VAmz8MNZdlk","WrC6_uBe4eA","d9X8DYOA5DE","pZuZv8UK7YA","NwswzwoA4pA","g1J2gvjp4fs","3SMhrQZdABc","h6M42Il4kN8","izM_JwEkWPs","_FM_IoPIFq4","uMZCCPtl_Do","5cZFinVFubQ","om0iHwO4d6o","lUZxlXkbaxM","iSJ3qKpC-3o","tRdq9_Si0Ws"] 6874 1
07jnqD8wvyE expertvillage 776 ["Howto","Style"] 217 574365 4.55 1073 1526 ["3wNpOp50uGI","-taU9d26wT4","_kly-fVUi1I","kZtRmnly_sU","LKe50AJvhz8","0GAUnuuBkW4","eM6Bpz6-dio","i9xf62PKC5M","mwFfk7igYC0","ilLTA4p4MMw","qyoLuTjguJA","aC-KOYQsIvU","0V6LWQZjYRk","_uZkvzYEXp0","auQbi_fkdGE","qgiUSEpg8Xc","4I79yc3uKfE","7Lz-XGjynN8","taHOwsdQXWI","2Aj2wx9PYxI"] 86228 5659
NDwZcLGekE8 expertvillage 801 ["Howto","Style"] 94 491582 4.35 553 588 ["CEmYX76UJy0","0sF-PoXR9ZU","qDfoNYpozxk","T4P3yp6mFfY","rzcj_Wq6BIE","A2LTVhqHAdo","4gyH0mJPqY8","m3PsQCMz3sY","wckYj_PAhgM","YkvwdM9Xstc","Ph2UAjGfeB4","vT_uXPfSAVE","89mAouUu8YM","0t5pPZ4AXX0","sLNFN75CYbw","QQGHXEPfMcQ","p6gcz4hkOmw","qk6R-X-YmUw","NR_9VlHFOu8","0gA_3BAxtVM"] 86228 5659
CEmYX76UJy0 expertvillage 801 ["Howto","Style"] 90 422284 4.48 448 459 ["NDwZcLGekE8","jRAKgd50Jh0","0sF-PoXR9ZU","IjjJWXgqWL4","m3PsQCMz3sY","8Di58l1dfPA","DpEVqy954OY","dmYMeImNy1s","qk6R-X-YmUw","wckYj_PAhgM","qZSD3JLPURk","YkvwdM9Xstc","fdSFh_z0HJY","n5LDyWyHJTo","Dmm5O1DPQYc","A2LTVhqHAdo","I2mM0r76b7o","iE16Z9Gh3TQ","rzcj_Wq6BIE","8szPfWiK-ag"] 86228 5659
E6VWi7IaroA expertvillage 946 ["Howto","Style"] 228 338889 4.46 300 251 ["fLy3M2jfe4w","cC1FBmbYgMk","xEJhInSxsQg","RxMLoqzKZ8s","Z9a--4RQ9O8","7neOXKLrgzk","JS-lS9ngMtY","aKIFfgPLY8I","EsqPNhtkWIo","NJUPNtZN2rc","vV7pajY0zOk","GTtZapaRDbg","-0UVDD6ZwYE","B9jNaUP59Vo","2QDGPe50E4Y","lJDZO_pX4aw","wuD2CemlvP0","ZTx2ZFI1spg","CzZghYdupiQ","5ueEKSJ8J-U"] 86228 5659
0sF-PoXR9ZU expertvillage 801 ["Sports"] 77 326365 4.21 512 663 ["24FJ58Q22D4","m3PsQCMz3sY","qk6R-X-YmUw","CEmYX76UJy0","NDwZcLGekE8","QiIunH47qew","I2mM0r76b7o","c_qSPdLPReM","5eYBaQ5Cg-c","fdSFh_z0HJY","8szPfWiK-ag","Dmm5O1DPQYc","A2LTVhqHAdo","n-Cvzk84X8o","T4P3yp6mFfY","1vo3CCBIcaY","fnGFR1AFCJc","n5LDyWyHJTo","YkvwdM9Xstc","emNfLmLTQb0"] 86228 5659
ohdiRHLSw5Y expertvillage 447 ["Howto","Style"] 78 321521 3.79 139 102 ["12_nJamoyTk","dXLhjYgMZ68","ukzFwRoFj4k","sOUgrJo2kIg","HhO39nCDfMg","3iVP0tzwhVc","MFNA0PqLynY","ZlcMzLhiBjg","7F8ajh_DDYs","6vaPIz6S6sk","-L_uhGXOtF0","-libzR5AV58","CSdSH7XKTtQ","KZX5jXPAWIk","Vkj5fbrQpNs","YCgnFIk5Acg","hjPDmVf6KJw","BZnhMl85dq4","iuqVpMdb1NM","GO2_3q6euug"] 86228 5659
0sF-PoXR9ZU expertvillage 801 ["Howto","Style"] 77 294737 4.21 438 599 ["24FJ58Q22D4","m3PsQCMz3sY","CEmYX76UJy0","NDwZcLGekE8","A2LTVhqHAdo","qk6R-X-YmUw","5eYBaQ5Cg-c","fdSFh_z0HJY","jGdXcitOUzY","n-Cvzk84X8o","wckYj_PAhgM","I2mM0r76b7o","QMZaxtjhZ5k","Dmm5O1DPQYc","vT_uXPfSAVE","8szPfWiK-ag","QiIunH47qew","YkvwdM9Xstc","DpEVqy954OY","c_qSPdLPReM"] 86228 5659
3Xc-kxSznZ0 expertvillage 430 ["Howto","Style"] 107 292329 3.33 565 964 ["9YAKfkVvvEc","wEkcPjBaHjs","zMl3ixv1kHw","Q4ZUPEgbzu0","-jIEGZwLPvo","YxdxGLBCCRA","Jp0LaJ_ftT0","PgCRWvwdFHM","yZL2MOFk6I0","qDJk3-ofbk0","e4nHAAuDiPE","cuRk9bTbU6A","z53cPMciVio","MREw0dIWaQ0","237AQAvXtw8","a3dnOupmt7Y","3-va5g_QVss","mFdaiY8YhEY","D6li4tYmKf0","bEVHBHKqBfQ"] 86228 5659
VxYkdycN3WE expertvillage 801 ["Howto","Style"] 77 257702 4.19 153 137 ["ZpCE7mnYJK0","NYDdCGtbr8E","904S9W9AZg8","Rl5KEODnCfI","7-i2gl9288I","TC1XlhRwhsU","rD_4nCKf5Ns","pFKZZ120fyU","oN4Esih54V8","N1Ov6cCUXkc","rDM97S4jPiw","IK2uLNND3i0","OZXbM6kRuuA","5pvyXkFTa18","bExUJAnCLZw","VoREkZvkyI8","BWhfpzAItx4","EEybT3IeyzA","QteH0KOJx0g","dbXHvd_SGkk"] 86228 5659
-libzR5AV58 expertvillage 447 ["Howto","Style"] 391 249682 3.34 73 62 ["-L_uhGXOtF0","4LvTYBbgMjE","-wO07skEauo","CSdSH7XKTtQ","kzQDoXKDgTM","e1kzLj-FZ6k","Guhgb3pWRAQ","KIbyySDSGEc","DGUBzs-GNxE","KZX5jXPAWIk","ohdiRHLSw5Y","6vaPIz6S6sk","5MrAuq_F2dU","gOv-yPqZ3LE","BZnhMl85dq4","oCDdDCqygOg","12_nJamoyTk","7F8ajh_DDYs","HhO39nCDfMg","X3-6IT-7S80"] 86228 5659
HL9p8I3lOFA libertaddigitaltv 998 ["News","Politics"] 17 235313 4.61 114 581 ["VBGHer3yFyc","Fb3HZ7K1gTk","Z3Pe8ff37gk","t3DPDKbRxio","pZuZv8UK7YA","_FM_IoPIFq4","1P_EQjd7lq4","yEzNx2g7ors","_neKPvLdG8Y","7i_WDprxACU","zCM0nsym0cE","om0iHwO4d6o","h6M42Il4kN8","Bjk1McjrWtg","VAmz8MNZdlk","-t2hwljZmA8","gH6PHFrA6g8","ysAwzxqbWD8","U5lX_EIFOOA","4zu_X_3qHLA"] 6874 1
oE5ZhBHy6Rs expertvillage 776 ["Howto","Style"] 143 234551 4.64 358 280 ["rCWRUtqJgzI","476vNb6thyM","eg3eo8x_9rw","grK43Poye1U","HGncDLMNPRI","iylaKSDfLSc","JFqiDcvRW2Y","XrbXGRLIDTc","NmxoR9lsu-c","JW8FJefw82k","n0Mi550FDng","QXsW44Wh6tY","6oJDYT5MlEA","IJy05ssELhg","TvT-M7tTQHE","GYNVEWCO_X0","5pJ_o19GVRA","q_Nn_CnvD9s","uqpQNkFxVAU","KKP8vWeEc30"] 86228 5659
wOnP_oAXUMA expertvillage 429 ["Howto","Style"] 120 228842 3.56 236 601 ["BlDWdfTAx8o","WF1kRLJ_4B8","6bu9csQC45c","xTAYAl4g7HE","871VTEF7qIY","UheCchftswc","TKg_cdwq9l4","ADQ4Bjq42wE","cSJCDcAKShA","SYklbxHP2tI","fktuPPNkQpQ","L6267-JZu3E","_IwJ3Aj2XrQ","hf-4ppNmH-U","aO8evzfTR8E","E-kNUEv0YgA","Hsr1-xcXFL8","VCiLZMjo_PQ","rMEgKgZOJi8","YeYU_OE5oIU"] 86228 5659
MVPCc4eODys expertvillage 448 ["Howto","Style"] 69 225879 4.29 171 315 ["iPPgmvcsJNw","xc8sysT31mQ","svt_fnKE_80","kBjUDCyDCuI","ysa50-plo48","yutDlQL9Ct0","gCra4qOrjFw","CN1QJDGinwE","dFJjaj7pXsA","LvJzFdcYSag","4I79yc3uKfE","AcryZ1o4RQE","WyyB0M6wAV8","A0sOVKbYR20","8XCyd0XEejc","5BmVKKpu_CU","8_QNzZ9WPRo","pNiX_l-HEGM","CQJSZs-euZU","ZwHHYzK8UCg"] 86228 5659
7neOXKLrgzk expertvillage 946 ["Howto","Style"] 111 215943 4.07 215 314 ["xEJhInSxsQg","PSR5kinMX3Y","E6VWi7IaroA","5ovGW0CzQp4","DeMcNhz0Vz8","CzZghYdupiQ","72EYsQEw35k","tScm-eZInBE","Hz4lnt7d88s","dgLUbXSqwSs","G3P4VFrB_zM","pU0O2-o67C0","pDGP8Q3Zzb8","hQhdK8l6CuY","VK9VflDsunI","LWgzG8I9bWY","csL7WNTWK9U","LE93Q5k5-fI","Uirspi_t3xM","wGLNJK4zcdE"] 86228 5659
AaO1aLwk53Y expertvillage 443 ["Howto","Style"] 156 204450 4.14 210 290 ["Q8YYZGvKM-8","Bn59zha-uAQ","BjnkKuI-YAY","aQeXHXkL6ow","Qtp2ibwd2Ms","nstbrkjk3OM","Odj4ATUPh9w","v2mm2-9JQ-I","328YVJVtMKU","B-jzkyKxDLo","yB-yGNxNEZg","b0msCCnzmNA","sfh5AJhRto8","SEILiSkGzrY","jm4uxgVDU6o","svOBjuvvy-k","y_1nj8fBQOE","zb8ArkvxOxo","VgWayeJdV0w","Yor1dHH6orE"] 86228 5659
m3PsQCMz3sY expertvillage 801 ["Howto","Style"] 99 203912 4.38 177 207 ["CEmYX76UJy0","0sF-PoXR9ZU","NDwZcLGekE8","qk6R-X-YmUw","DpEVqy954OY","Ph2UAjGfeB4","fdSFh_z0HJY","YkvwdM9Xstc","8szPfWiK-ag","Dmm5O1DPQYc","T4P3yp6mFfY","I2mM0r76b7o","7GtVwKZBf9U","mugebyUqK6o","qDfoNYpozxk","XA7dRqkp8YA","a28QcMXjHyI","n5LDyWyHJTo","ovoNU_CbmfA","Q2j6MlACIoc"] 86228 5659
Time taken: 300.414 seconds, Fetched: 20 row(s)

4.8 筛选出每个类别中观看数Top10

** select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;**

hive> select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;
Query ID = hadoop_20170713170101_76ffdd80-e72c-4c7f-b9ad-9b8c3f7e17ab
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
Starting Job = job_1499153664137_0143, Tracking URL = http://master:8088/proxy/application_1499153664137_0143/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job -kill job_1499153664137_0143
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2017-07-13 17:01:52,690 Stage-1 map = 0%, reduce = 0%
2017-07-13 17:02:24,374 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 10.49 sec
...
2017-07-13 17:03:08,803 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 89.07 sec
MapReduce Total cumulative CPU time: 1 minutes 29 seconds 70 msec
Ended Job = job_1499153664137_0143
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 3 Cumulative CPU: 89.07 sec HDFS Read: 73070026 HDFS Write: 7487 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 29 seconds 70 msec
OK
qruSOZq-wJg Activism 2673823 1
tFxk7glmMbo Activism 1063928 2
dYN278GB2Kg Activism 831836 3
n-Akqik3hME Activism 778987 4
LtqRAYWjz2Q Activism 768422 5
2v-whrgAbf4 Activism 742172 6
ja4d02s0WKQ Activism 733014 7
IpAOYskH1s8 Activism 714580 8
ieuVTOJyLBs Activism 640994 9
6CXBgbP_ZKs Activism 562163 10
12Z3J1uzd0Q Animation 65341925 1
bFytHZHFXhA Animation 38937813 2
sdUUx5FdySs Animation 16151661 3
vQbYvjmfbr4 Animation 12114231 4
rNKefRRvV7g Animation 11884311 5
6HIavxnUHls Animation 11735507 6
gm5DHKI8o5o Animation 9381674 7
316BF17k5d8 Animation 9372064 8
ZuQMn6Z0k_w Animation 9195588 9
6B26asyGKDo Animation 8915176 10
dMH0bHeiRNg Comedy 79897120 1
5P6UU6m3cqk Comedy 42525795 2
Tx1XIm6q4r4 Comedy 38378910 3
Q5im0Ssyyus Comedy 21170471 4
k66epna2Sss Comedy 17944367 5
AYxu_MQSTTY Comedy 15665340 6
pYak2F1hUYA Comedy 15634357 7
_OBlgSz8sSM Comedy 14567743 8
wCF3ywukQYA Comedy 14227802 9
sHzdsFiBbFc Comedy 13606292 10
Mz9BlgiV45I Education 2807232 1
VnGb6UQvwJs Education 2441432 2
j6lADdhmAec Education 2306689 3
pYfTQpsErsM Education 2066429 4
7GNE6AhL0qI Education 2037257 5
NiGZf15W9xc Education 1867126 6
-FeAK-q5Cok Education 1741504 7
N5P6o2YaqVI Education 1656512 8
Dl-agXA63nw Education 1529566 9
JX3VmDgiFnY Education 1373008 10
cQ25-glGRzI Music 77674728 1
7AVHXe-ol-s Music 60349673 2
ePyRrb2-fzs Music 45984219 3
xsRWpK4pf90 Music 44614530 4
ktUSIJEiOug Music 43583367 5
b3u65f4CRLk Music 43511791 6
iWg3IMN_rhU Music 43323757 7
innfyQZHPpo Music 41564032 8
Lt6o8NlrbHg Music 41171303 9
QjA5faZF1A8 Music 40294882 10
qruSOZq-wJg Nonprofits 2673823 1
tFxk7glmMbo Nonprofits 1063928 2
dYN278GB2Kg Nonprofits 831836 3
n-Akqik3hME Nonprofits 778987 4
LtqRAYWjz2Q Nonprofits 768422 5
2v-whrgAbf4 Nonprofits 742172 6
ja4d02s0WKQ Nonprofits 733014 7
IpAOYskH1s8 Nonprofits 714580 8
ieuVTOJyLBs Nonprofits 640994 9
6CXBgbP_ZKs Nonprofits 562163 10
SkELRp4wKPs Politics 12730681 1
AFFQrUyi8-s Politics 11953598 2
YgW7or1TuFk Politics 8516433 3
JgiGrXpOhYg Politics 8211043 4
up5jmbSjWkw Politics 7869023 5
hr23tpWX8lM Politics 7209885 6
Kje7NUNebL8 Politics 6707868 7
I4u3449L5VI Politics 6675798 8
jjXyqcx-mYY Politics 6462051 9
a9Vde3FHMmc Politics 6028642 10
8h98jb9Lk74 UNA 33880568 1
AR5yq0aMxI8 UNA 18409445 2
DH7FVB15EPU UNA 15487752 3
Qvfx6UiAAG0 UNA 13977592 4
n-FdoZdfpmE UNA 12241555 5
MddPeH1DAvY UNA 12211192 6
sI8Sus_KRpY UNA 10460047 7
14oqm9ywLIw UNA 10344366 8
LytRWuhn4BM UNA 9899270 9
iSByZARPJSU UNA 9316897 10
Oro28yg7W74 Gaming 727015 1
0to1HDxtkM4 Gaming 529346 2
M-w-gLVfi30 Gaming 396471 3
i6aH2F1WjsY Gaming 382350 4
EgbUSsblCSQ Gaming 266718 5
7SMnWr9MqwQ Gaming 210147 6
NQMBIRipp5A Gaming 207783 7
vi1lVqJSbsM Gaming 186709 8
aId2hK4pI2M Gaming 186054 9
tWPWcyLdrko Gaming 181979 10
sLGLum5SyKQ Howto 31121122 1
eMhGpzyFdhE Howto 16320501 2
KPOOWvP_dd8 Howto 11131527 3
91wuBqlny50 Howto 8579613 4
mr5ghuaTK14 Howto 8361812 5
GfPJeDssBOM Howto 6101232 6
6gmP4nk0EOE Howto 4970382 7
mM-30cmM33s Howto 4936417 8
STQ3nhXuuEM Howto 4655542 9
XZGgeGHU1Bs Howto 4424698 10
LU8DDYz68kM Pets 27721690 1
epUk3T2Kfno Pets 10352882 2
z3U0udLH974 Pets 9461084 3
kkT7A3jegBc Pets 9269896 4
TZ860P4iTaM Pets 9009434 5
7tRWRSfcDuQ Pets 8538635 6
Qit3ALTelOo Pets 7939352 7
Zi9GOvR3Ynw Pets 7351184 8
Kxa0mnDj0bs Pets 7289545 9
PadauuWF94w Pets 6271287 10
W1czBcnX1Ww Science 3234852 1
D99NHb6B03s Science 3176792 2
tk_F2Y-F2kE Science 3121903 3
nhyH7lQ6D2k Science 2879861 4
JCbKv9yiLiQ Science 2672391 5
U5vs2ly_grk Science 2611389 6
M0ODskdEPnQ Science 2555284 7
p4ebtj1jR7c Science 2536109 8
8wTlureUMP8 Science 2477804 9
NZNTgglPbUA Science 2230729 10
vt4X7zFfv4k Sports 12598542 1
P-bWsOK-h98 Sports 12101588 2
OS5tQvQOB-Y Sports 9047732 3
P9LmHXXWiJs Sports 7415813 4
NIKdK-T-jZM Sports 7175403 5
euMu1SKi-ak Sports 7040023 6
zKQgTiqhPbw Sports 7017699 7
yeXoxNP8_xY Sports 6422039 8
q8t7iSGAKik Sports 6312667 9
COcczatkNP4 Sports 6258611 10
sLGLum5SyKQ Style 31121122 1
eMhGpzyFdhE Style 16320501 2
KPOOWvP_dd8 Style 11131527 3
91wuBqlny50 Style 8579613 4
mr5ghuaTK14 Style 8361812 5
GfPJeDssBOM Style 6101232 6
6gmP4nk0EOE Style 4970382 7
mM-30cmM33s Style 4936417 8
STQ3nhXuuEM Style 4655542 9
XZGgeGHU1Bs Style 4424698 10
ZeBd_F2Bz5Y Vehicles 8623041 1
ju6t-yyoU8s Vehicles 7325889 2
uUurALr_Ckk Vehicles 7258142 3
WShY1ObPvhQ Vehicles 7047156 4
q3idQKi5EqM Vehicles 6470816 5
9JWywFpZkg4 Vehicles 6465830 6
tth9krDtxII Vehicles 6394563 7
npTRXr4Sgxg Vehicles 5450317 8
aCamHfJwSGU Vehicles 5354163 9
S-ppGiwc0wQ Vehicles 5039415 10
LU8DDYz68kM Animals 27721690 1
epUk3T2Kfno Animals 10352882 2
z3U0udLH974 Animals 9461084 3
kkT7A3jegBc Animals 9269896 4
TZ860P4iTaM Animals 9009434 5
7tRWRSfcDuQ Animals 8538635 6
Qit3ALTelOo Animals 7939352 7
Zi9GOvR3Ynw Animals 7351184 8
Kxa0mnDj0bs Animals 7289545 9
PadauuWF94w Animals 6271287 10
ZeBd_F2Bz5Y Autos 8623041 1
ju6t-yyoU8s Autos 7325889 2
uUurALr_Ckk Autos 7258142 3
WShY1ObPvhQ Autos 7047156 4
q3idQKi5EqM Autos 6470816 5
9JWywFpZkg4 Autos 6465830 6
tth9krDtxII Autos 6394563 7
npTRXr4Sgxg Autos 5450317 8
aCamHfJwSGU Autos 5354163 9
S-ppGiwc0wQ Autos 5039415 10
v3ARyAb_1Bs Blogs 31812447 1
5GE82tqcYYQ Blogs 30209692 2
ervaMPt4Ha0 Blogs 23859297 3
uWow42TCwzg Blogs 22389389 4
-_CSo1gOd48 Blogs 21176701 5
D2kJZOfq7zk Blogs 20458159 6
nhSZs-aAZbo Blogs 20423158 7
GuMMfgWhm3g Blogs 18634621 8
4jbkRGPxvaM Blogs 15980887 9
hMnk7lh9M3o Blogs 13553069 10
LpAI8TzQDes Entertainment 65078772 1
244qR7SvvX0 Entertainment 57790943 2
1uwOL4rB-go Entertainment 39883413 3
w2xUzv6iZWo Entertainment 25222946 4
vr3x_RRJdd4 Entertainment 25093671 5
lj3iNxZ8Dww Entertainment 23737579 6
RB-wUgnyGv0 Entertainment 23067889 7
lsO6D1rwrKc Entertainment 21758300 8
7iYWxfNSjYk Entertainment 19029677 9
5pGJCkCDK5A Entertainment 18858695 10
p0aQvKDA1K0 Events 12239023 1
bNF_P281Uu4 Events 9125026 2
AlPqL7IUT6M Events 6441558 3
J833f9fqWBA Events 4776832 4
xIvIWJbzimo Events 4284770 5
QuTj9a04o-s Events 4053317 6
eejQPUyeNiY Events 3573812 7
3QL97xldoXc Events 3010296 8
r43yCiKlbCo Events 2806995 9
z42fchrzhHY Events 2565257 10
12Z3J1uzd0Q Film 65341925 1
bFytHZHFXhA Film 38937813 2
sdUUx5FdySs Film 16151661 3
vQbYvjmfbr4 Film 12114231 4
rNKefRRvV7g Film 11884311 5
6HIavxnUHls Film 11735507 6
gm5DHKI8o5o Film 9381674 7
316BF17k5d8 Film 9372064 8
ZuQMn6Z0k_w Film 9195588 9
6B26asyGKDo Film 8915176 10
SkELRp4wKPs News 12730681 1
AFFQrUyi8-s News 11953598 2
YgW7or1TuFk News 8516433 3
JgiGrXpOhYg News 8211043 4
up5jmbSjWkw News 7869023 5
hr23tpWX8lM News 7209885 6
Kje7NUNebL8 News 6707868 7
I4u3449L5VI News 6675798 8
jjXyqcx-mYY News 6462051 9
a9Vde3FHMmc News 6028642 10
v3ARyAb_1Bs People 31812447 1
5GE82tqcYYQ People 30209692 2
ervaMPt4Ha0 People 23859297 3
uWow42TCwzg People 22389389 4
-_CSo1gOd48 People 21176701 5
D2kJZOfq7zk People 20458159 6
nhSZs-aAZbo People 20423158 7
GuMMfgWhm3g People 18634621 8
4jbkRGPxvaM People 15980887 9
hMnk7lh9M3o People 13553069 10
W1czBcnX1Ww Technology 3234852 1
D99NHb6B03s Technology 3176792 2
tk_F2Y-F2kE Technology 3121903 3
nhyH7lQ6D2k Technology 2879861 4
JCbKv9yiLiQ Technology 2672391 5
U5vs2ly_grk Technology 2611389 6
M0ODskdEPnQ Technology 2555284 7
p4ebtj1jR7c Technology 2536109 8
8wTlureUMP8 Technology 2477804 9
NZNTgglPbUA Technology 2230729 10
p0aQvKDA1K0 Travel 12239023 1
bNF_P281Uu4 Travel 9125026 2
AlPqL7IUT6M Travel 6441558 3
J833f9fqWBA Travel 4776832 4
xIvIWJbzimo Travel 4284770 5
QuTj9a04o-s Travel 4053317 6
eejQPUyeNiY Travel 3573812 7
3QL97xldoXc Travel 3010296 8
r43yCiKlbCo Travel 2806995 9
z42fchrzhHY Travel 2565257 10
Time taken: 121.381 seconds, Fetched: 250 row(s)

5 总结

上面的8个例子向大家展示Hive中简单和稍微复杂的操作,有的启动一个Job就可以完成,有的则需要启动多个Job才能完成,我们也可以看到,启动的Job越多运行时间就会越长,但是实际工作中的操作只会远比我们所演示要更加复杂,越是复杂的操作就更加需要去优化,来达到减少运行时间的目的,所以下一篇我们来看看Hive的优化实践

Hive实战之Youtube数据集的更多相关文章

  1. Spark入门实战系列--5.Hive(下)--Hive实战

    [注]该系列文章以及使用到安装包/测试数据 可以在<倾情大奉送--Spark入门实战系列>获取 1.Hive操作演示 1.1 内部表 1.1.1 创建表并加载数据 第一步   启动HDFS ...

  2. 60分钟内从零起步驾驭Hive实战学习笔记

    本博文的主要内容是: 1. Hive本质解析 2. Hive安装实战 3. 使用Hive操作搜索引擎数据实战 SparkSQL前身是Shark,Shark强烈依赖于Hive.Spark原来没有做SQL ...

  3. 60分钟内从零起步驾驭Hive实战学习笔记(Ubuntu里安装mysql)

    本博文的主要内容是: 1. Hive本质解析 2. Hive安装实战 3. 使用Hive操作搜索引擎数据实战 SparkSQL前身是Shark,Shark强烈依赖于Hive.Spark原来没有做SQL ...

  4. Hive实战UDF 外部依赖文件找不到的问题

    目录 关于外部依赖文件找不到的问题 为什么要使用外部依赖 为什么idea 里面可以运行上线之后不行 依赖文件直接打包在jar 包里面不香吗 学会独立思考并且解决问题 继承DbSearcher 读取文件 ...

  5. Hive实战—时间滑动窗口计算

    关注公众号:大数据技术派,回复: 资料,领取1024G资料. 目录 时间滑动计算 外部调用实现时间循环 自关联实现滑动时间窗口 扩展基于自然周的的滚动时间窗口计算 总结 时间滑动计算 今天遇到一个需求 ...

  6. Hive 实战(1)--hive数据导入/导出基础

    前沿: Hive也采用类SQL的语法, 但其作为数据仓库, 与面向OLTP的传统关系型数据库(Mysql/Oracle)有着天然的差别. 它用于离线的数据计算分析, 而不追求高并发/低延时的应用场景. ...

  7. Hive 实战(2)--hive分区分桶实战

    前言: 互联网应用, 当Mysql单机遇到性能瓶颈时, 往往采用的优化策略是分库分表. 由于互联网应用普遍的弱事务性, 这种优化效果非常的显著.而Hive作为数据仓库, 当数据量达到一定数量时, 查询 ...

  8. Python之大数据库hive实战

    今天和大家分享的是Python如何连接hive数据库来进行hivesql的查询操作.   step1:环境准备 Python版本:3.6.2 Windows版本:Windows10版本的64位 ste ...

  9. HIve实战分析Hadoop的日志

    1.日志格式分析首先分析 Hadoop 的日志格式, 日志是一行一条, 日志格式可以依次描述为:日期.时间.级别.相关类和提示信息.如下所示: -03-06 15:23:48,132 INFO org ...

随机推荐

  1. 教你怎么把iconfont转换成png透明图片

    1.进入iconfont图标库,登陆.http://www.iconfont.cn/ 2.选择想要的图标加入购物车. 3.直接选中下载图标为png格式.

  2. 打印杨辉三角--for循环

    要求打印7行直角杨辉三角 杨辉三角特点: 第1行和第2行数字都为1: 从第三行开始,除去开头和结尾数字为1,中间数字为上一行斜对角两个数字的和. 如下图: 打印结果: 代码如下: package 杨辉 ...

  3. css块级元素居中

    <!DOCTYPE html> <html> <head> <title>index</title> </head> <b ...

  4. C# 委托的理解

    1.什么是委托 委托可以理解为持有一个或多个方法的对象.如果执行委托的话,委托会 执行它所"持有"的方法.委托可以避免程序中大量使用if-else语句,使 程序拥有更好的扩展性. ...

  5. android蓝牙学习

    学习路线 1 蓝牙权限 <uses-permission android:name="android.permission.BLUETOOTH" /> <uses ...

  6. solr5Ik分词2

    <!--IK分词器--><fieldType name="text_ik" class="solr.TextField"><ana ...

  7. 收集一些工作中常用的经典SQL语句

    作为一枚程序员来说和数据库打交道是不可避免的,现收集一下工作中常用的SQL语句,希望能给大家带来一些帮助,当然不全面,欢迎补充! 1.执行插入语句,获取自动生成的递增的ID值 INSERT INTO ...

  8. openvpn配置注意事项

    1.安装VPN安装结束后,需要配置CONFIG文件夹服务端及客户端的配置文件,建议从sample文件里直接拷贝修改,网上的一些案例会引起无法启动的问题,没仔细研究过是哪里错了,反正最后从sample里 ...

  9. Linux crontab定时执行任务命令格式与详细例子

    基本格式 : * * * * * command 分 时 日 月 周 命令 第1列表示分钟1-59 每分钟用*或者 */1表示 第2列表示小时1-23(0表示0点) 第3列表示日期1-31 第4列表示 ...

  10. CSS3 @keyframes 规则

    今天来给大家分享一下CSS3 @keyframes 规则! 在你了解CSS3 @keyframes 规则时我先来给大家说说什么是css3中的动画 动画是使元素从一种样式逐渐变化为另一种样式的效果. 您 ...