Chapter 2: Basic Usage of Impala

1. Using Impala

1.1 impala-shell Syntax

1.1.1 impala-shell External Command Options

These options can be used directly on the command line, without entering the interactive impala-shell session.

impala-shell accepts many options:

-h: show the help text

impala-shell -h

[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -h
Usage: impala_shell.py [options]

Options:
-h, --help show this help message and exit
-i IMPALAD, --impalad=IMPALAD
<host:port> of impalad to connect to
[default: node03.hadoop.com:21000]
-q QUERY, --query=QUERY
Execute a query without the shell [default: none]
-f QUERY_FILE, --query_file=QUERY_FILE
Execute the queries in the query file, delimited by ;.
If the argument to -f is "-", then queries are read
from stdin and terminated with ctrl-d. [default: none]
-k, --kerberos Connect to a kerberized impalad [default: False]
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
If set, query results are written to the g

-r: re-load all metadata when connecting; with a large amount of metadata this is expensive for the server

impala-shell -r

# Output
[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -r
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Invalidating Metadata
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan 6 13:27:16 PST 2018)
The HISTORY command lists all shell commands in chronological order.
***********************************************************************************
+==========================================================================+
| DEPRECATION WARNING: |
| -r/--refresh_after_connect is deprecated and will be removed in a future |
| version of Impala shell. |
+==========================================================================+
Query: invalidate metadata
Query submitted at: 2019-08-22 14:45:28 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=ce4db858e1dfd774:814fabac00000000
Fetched 0 row(s) in 5.04s

-B: turn off pretty-printed formatting; improves performance when fetching large result sets

--print_header: with -B, still print the column names as a header row

--output_delimiter: set the field delimiter used in -B output
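For example, these three options are commonly combined to export a query result as plain delimited text. This is a sketch only; it assumes a reachable impalad and the ods_click_pageviews table used later in this chapter:

```shell
# Unformatted, tab-delimited export with a header row; -B skips the
# ASCII table rendering, which is much faster for large result sets.
impala-shell -B --print_header --output_delimiter='\t' \
    -q 'select * from ods_click_pageviews limit 10' \
    -o pageviews.tsv
```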

-v: show the version

impala-shell -v -V

# Output
[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -v -V
Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan 6 13:27:16 PST 2018

-f: execute the queries in a file

--query_file: long form of -f; specifies the query file

cd /export/servers
vim impala-shell.sql    # put the following two lines into the file:
use weblog;
select * from ods_click_pageviews limit 10;
chmod 755 impala-shell.sql    # grant execute permission
impala-shell -f impala-shell.sql    # run the query file via -f

# Output
[root@node03 hivedatas]# impala-shell -f imapala-shell.sql
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:29:54 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a4d51930cf99b9d:21f02c4e00000000
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| session | remote_addr | remote_user | time_local | request | visit_step | page_staylong | http_referer | http_user_agent | body_bytes_sent | status | datestr |
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| d1328698-d475-4973-86ee-15ad9da8c860 | 1.80.249.223 | - | 2013-09-18 07:57:33 | /hadoop-hive-intro/ | 1 | 60 | "http://www.google.com.hk/url?sa=t&rct=j&q=hive%E7%9A%84%E5%AE%89%E8%A3%85&source=web&cd=2&ved=0CC4QFjAB&url=%68%74%74%70%3a%2f%2f%62%6c%6f%67%2e%66%65%6e%73%2e%6d%65%2f%68%61%64%6f%6f%70%2d%68%69%76%65%2d%69%6e%74%72%6f%2f&ei=5lw5Uo-2NpGZiQfCwoG4BA&usg=AFQjCNF8EFxPuCMrm7CvqVgzcBUzrJZStQ&bvm=bv.52164340,d.aGc&cad=rjt" | "Mozilla/5.0(WindowsNT5.2;rv:23.0)Gecko/20100101Firefox/23.0" | 14764 | 200 | 20130918 |
| 0370aa09-ebd6-4d31-b6a5-469050a7fe61 | 101.226.167.201 | - | 2013-09-18 09:30:36 | /hadoop-mahout-roadmap/ | 1 | 60 | "http://blog.fens.me/hadoop-mahout-roadmap/"

-i: connect to a specific impalad

    --impalad: long form of -i; specifies the impalad that executes the queries

-o: save query results to a file

    --output_file: long form of -o; specifies the output file name

impala-shell -f impala-shell.sql -o fizz.txt

# Output
[root@node03 hivedatas]# impala-shell -f imapala-shell.sql -o fizz.txt
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:31:45 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=7c421ab5d208f3b1:dec5a09300000000
Fetched 10 row(s) in 0.13s

# The current directory now contains a new file, fizz.txt:
[root@node03 hivedatas]# ll
total 2592
-rw-r--r-- 1 root root 511 Aug 21 2017 dim_time_dat.txt
-rw-r--r-- 1 root root 9926 Aug 22 15:31 fizz.txt
-rwxr-xr-x 1 root root 57 Aug 22 15:29 imapala-shell.sql
-rwxrwxrwx 1 root root 133 Aug 20 00:36 movie.txt
-rw-r--r-- 1 root root 18372 Jun 17 18:33 pageview2
-rwxr-xr-x 1 root root 154 Aug 20 00:32 test.txt
-rw-r--r-- 1 root root 327 Aug 20 02:37 user_table
-rw-r--r-- 1 root root 10361 Jun 18 09:00 visit2
-rw-r--r-- 1 root root 2587511 Jun 17 18:05 weblog2

-p: show the query execution plan

impala-shell -f impala-shell.sql -p

-q: execute a SQL statement passed directly on the command line

impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"

[root@node03 hivedatas]# impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:36:58 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=b443d56565419f60:a149235700000000
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| session | remote_addr | remote_user | time_local | request | visit_step | page_staylong | http_referer | http_user_agent | body_bytes_sent | status | datestr |

1.1.2 impala-shell Internal Commands

Commands that can be run after entering the interactive impala-shell session.

Enter impala-shell:

impala-shell    # from any directory

# Output
[root@node03 hivedatas]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan 6 13:27:16 PST 2018)
To see more tips, run the TIP command.
***********************************************************************************
[node03.hadoop.com:21000] >

The help command

Shows the built-in help:

[node03.hadoop.com:21000] > help;

Documented commands (type help <topic>):
========================================
compute describe explain profile rerun set show unset values with
connect exit history quit select shell tip use version

Undocumented commands:
======================
alter delete drop insert source summary upsert
create desc help load src update

The connect command

connect hostname: connect to the impalad on the given host and run queries there.

connect node02;

# Output
[node03.hadoop.com:21000] > connect node02;
Connected to node02:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
[node02:21000] >

The refresh command

refresh dbname.tablename: incremental refresh; reloads the metadata of a single table. Use it when the data inside an existing Hive table has changed.

refresh movie_info;

# Output
[node03:21000] > refresh movie_info;
Query: refresh movie_info
Query submitted at: 2019-08-22 15:49:24 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=f74330d533ff2402:27364f7600000000
Fetched 0 row(s) in 0.27s

The invalidate metadata command

invalidate metadata: full refresh; fairly expensive. Mainly used after a new database or table has been created in Hive.

invalidate metadata;

# Output
[node03:21000] > invalidate metadata;
Query: invalidate metadata
Query submitted at: 2019-08-22 15:48:04 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a431748d41bc369:7eeb053400000000
Fetched 0 row(s) in 2.87s

The explain command

Shows the execution plan of a SQL statement:

explain select * from user_table;

# Output
[node03:21000] > explain select * from user_table;
Query: explain select * from user_table
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B |
| Per-Host Resource Estimates: Memory=32.00MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| hivesql.user_table |
| |
| PLAN-ROOT SINK |
| | |
| 01:EXCHANGE [UNPARTITIONED] |
| | |
| 00:SCAN HDFS [hivesql.user_table] |
| partitions=1/1 files=1 size=327B |
+------------------------------------------------------------------------------------+
Fetched 11 row(s) in 3.99s

explain_level can be set to 0, 1, 2, or 3; level 3 is the highest and prints the most complete information.

set explain_level=3;

# Output
[node03:21000] > set explain_level=3;
EXPLAIN_LEVEL set to 3
[node03:21000] >

The profile command

Run it immediately after a SQL statement to print a detailed breakdown of how that statement executed; mainly used for inspecting query execution and for cluster tuning.

select * from user_table;
profile;    # partial output excerpt
[node03:21000] > profile;
Query Runtime Profile:
Query (id=ff4799938b710fbb:7997836800000000):
Summary:
Session ID: a14d3b3894050309:7f300ddf8dcd8584
Session Type: BEESWAX
Start Time: 2019-08-22 15:58:22.786612000
End Time: 2019-08-22 15:58:24.558806000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
User: root
Connected User: root
Delegated User:
Network Address: ::ffff:192.168.52.120:48318
Default Db: hivesql
Sql Statement: select * from user_table
Coordinator: node03.hadoop.com:22000
Query Options (set by configuration): EXPLAIN_LEVEL=3
Query Options (set by configuration and planner): EXPLAIN_LEVEL=3,MT_DOP=0
Plan:

Note: data inserted, or databases and tables created, through the Hive shell cannot be queried directly in Impala; the metadata must be refreshed first. Data inserted through impala-shell, however, is immediately queryable in Impala with no refresh. This behavior is provided by the catalog service (catalogd), a module added in Impala 1.2 whose main job is to synchronize metadata between the Impala daemons.
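As a rule of thumb (the table name here is illustrative):

```sql
-- Data files of an existing table changed on the Hive/HDFS side:
refresh dbname.tablename;   -- cheap, per-table

-- A new database or table was created in Hive:
invalidate metadata;        -- expensive, cluster-wide
```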

1.2 Creating a Database

1.2.1 Enter the Impala interactive shell

impala-shell    # enter the Impala interactive shell

1.2.2 List all databases

show databases;

1.2.3 Create and drop a database

Create a database:

CREATE DATABASE IF NOT EXISTS mydb1;

Drop a database:

drop database if exists mydb;

1.3 Creating Tables

Create the student table:

CREATE TABLE IF NOT EXISTS mydb1.student (name STRING, age INT, contact INT);

Create the employee table:

create table employee (id INT, name STRING, age INT, address STRING, salary BIGINT);

1.3.1 Inserting Data into a Table

insert into employee (id, name, age, address, salary) values (1, 'Ramesh', 32, 'Ahmedabad', 20000);
insert into employee values (2, 'Khilan', 25, 'Delhi', 15000);
insert into employee values (3, 'kaushik', 23, 'Kota', 30000);
insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000);
insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000);
insert into employee values (6, 'Komal', 22, 'MP', 32000);

Overwriting data:

insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000);

After the overwrite, this row is the only one left in the table.

Another way to create a table (create-table-as-select):

create table customer as select * from employee;

1.3.2 Querying Data

select * from employee;
select name, age from employee;

1.3.3 Dropping a Table

drop table mydb1.employee;

1.3.4 Truncating a Table

truncate employee;

1.3.5 Creating a View

CREATE VIEW IF NOT EXISTS employee_view AS select name, age from employee;

1.3.6 Querying a View

select * from employee_view;

1.4 The order by Clause

Basic syntax:

select * from table_name order by col_name [ASC|DESC] [NULLS FIRST|NULLS LAST];
select * from employee order by id asc;

1.5 The group by Clause

select name, sum(salary) from employee group by name;

1.6 The having Clause

Basic syntax:

select col, aggregate_function(col) from table_name group by col having condition;

Group the table by age, take the maximum salary within each group, and keep only groups whose maximum salary exceeds 20000:

select max(salary) from employee group by age having max(salary) > 20000;

1.7 The limit Clause

select * from employee order by id limit 4;

2. Ways to Load Data into Impala Tables

Method 1: load data from HDFS into an Impala table

create table user(id int, name string, age int) row format delimited fields terminated by "\t";

Prepare the data file user.txt and upload it to the /user/impala directory in HDFS.

Upload user.txt to HDFS:

hdfs dfs -put user.txt /user/impala/

Verify the upload:

hdfs dfs -ls /user/impala

# Contents of user.txt (tab-separated):
1   kasha   15
2   fizz    20
3   pheonux 30
4   manzi   50
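Since the table was declared with fields terminated by "\t", user.txt must be tab-delimited. A minimal Python sketch (not part of the original steps) that writes the sample file in the right shape:

```python
# Write user.txt with tab-separated fields, matching the table's
# "row format delimited fields terminated by '\t'" declaration.
rows = [
    (1, "kasha", 15),
    (2, "fizz", 20),
    (3, "pheonux", 30),
    (4, "manzi", 50),
]
with open("user.txt", "w") as f:
    for user_id, name, age in rows:
        f.write(f"{user_id}\t{name}\t{age}\n")
```

The resulting file can then be pushed to HDFS with `hdfs dfs -put user.txt /user/impala/` as shown above.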

Load the data:

load data inpath '/user/impala/' into table user;

Query the loaded data:

select * from user;

If the data does not show up, refresh the table first:

refresh user;

Method 2: create-table-as-select

create table user2 as select * from user;

Method 3:

insert into ... values    # not recommended: it generates a large number of small files

Never use Impala as if it were an ordinary (OLTP) database.

Method 4:

insert into ... select    # the most commonly used approach
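A minimal sketch of the pattern, reusing the user and user2 tables from Method 2 (the column list is illustrative):

```sql
-- Copy rows in one statement: each node writes a few large files,
-- instead of one small file per INSERT ... VALUES statement.
insert into user2 select id, name, age from user;
```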
