Hive Command Execution

On any server with Hive installed, go to the Hive bin directory. It contains the following files (partial listing):

-rwxr-xr-x 1 root root 1297 Jun 28 14:29 beeline
-rwxr-xr-x 1 root root 2487 Jun 28 14:29 beeline.cmd
-rwxr-xr-x 1 root root 9627 Nov 18 11:21 hive
-rwxr-xr-x 1 root root 8365 Jun 28 14:29 hive.cmd
-rwxr-xr-x 1 root root 1523 Jun 28 14:29 hive-config.cmd
-rwxr-xr-x 1 root root 1958 Nov 18 11:21 hive-config.sh
-rwxr-xr-x 1 root root 885 Jun 28 14:29 hiveserver2
-rwxr-xr-x 1 root root 546 Nov 18 11:21 hiveserver2-listener.sh
-rwxr-xr-x 1 root root 1030 Jun 28 14:29 hplsql
-rwxr-xr-x 1 root root 2220 Jun 28 14:29 hplsql.cmd
-rwxr-xr-x 1 root root 543 Nov 18 11:21 metastore-listener.sh
-rwxr-xr-x 1 root root 832 Jun 28 14:29 metatool
-rwxr-xr-x 1 root root 884 Jun 28 14:29 schematool

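To connect to Hive and run SQL, the two main tools are hive (the Hive CLI) and beeline. Beeline was developed to work with the newer server, HiveServer2. The Hive CLI is a client based on Apache Thrift, whereas Beeline is a JDBC client based on the SQLLine CLI. The Hive CLI connects to a remote HiveServer1 instance over the Thrift protocol; Beeline connects to a remote HiveServer2 instance over JDBC. The rest of this article covers hive and beeline.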

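Interactive execution

Running the hive command with no arguments starts the interactive shell. This article focuses on non-interactive use, so interactive mode is not covered further.

Non-interactive execution

Hive

The hive file in the bin directory is a shell tool that can run Hive queries in either interactive or batch mode. Its options can be listed with ./hive -H: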

-d,--define <key=value>          Variable subsitution to apply to hive
                                 commands. e.g. -d A=B or --define A=B
--database <databasename>        Specify the database to use
-e <quoted-query-string>         SQL from command line
-f <filename>                    SQL from files
-H,--help                        Print help information
--hiveconf <property=value>      Use value for given property
--hivevar <key=value>            Variable subsitution to apply to hive
                                 commands. e.g. --hivevar A=B
-i <filename>                    Initialization SQL file (run before entering interactive mode)
-S,--silent                      Silent mode in interactive shell
-v,--verbose                     Verbose mode (echo executed SQL to the console)
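The --hivevar option listed above can be combined with -f to parameterize a script. The following is a minimal sketch rather than part of the original setup: the file name count_rows.sql and the table name store_sales are assumptions chosen for illustration, and the final command is only printed, not executed, so the snippet can be tried on a machine without Hive.

```shell
#!/bin/sh
# Sketch: parameterize a Hive script with --hivevar.
# sql_file and the table name store_sales are hypothetical examples.
sql_file="${TMPDIR:-/tmp}/count_rows.sql"

# The quoted delimiter ('EOF') keeps the shell from expanding ${hivevar:...};
# Hive itself substitutes these references when it runs the script.
cat > "$sql_file" <<'EOF'
USE ${hivevar:db};
SELECT COUNT(*) FROM ${hivevar:tbl};
EOF

# Print the command that would run the script (remove "echo" to execute it).
echo ./hive --hivevar db=tpcds_database_parquet_100 --hivevar tbl=store_sales -f "$sql_file"
```

Quoting the heredoc delimiter matters here: with an unquoted EOF the shell would try to expand ${hivevar:db} itself before Hive ever sees the file.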

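With hive, you can connect and run SQL statements without entering a username or password. Run the commands below from the bin directory.

- Specify a database and list its tables

./hive --database tpcds_database_parquet_100 -e "show tables"

- Run a SQL statement

./hive -e "show databases;"

- Run a single SQL script

./hive --database tpcds_database_parquet_100 -v -f "/home/hive-testbench-hive14/sample-queries-tpcds/query12.sql"

The SQL script can also be a file on HDFS:

./hive -f hdfs://SERVICE-HADOOP-admin/tmp/simple.sql

- Write the results to a file

./hive --database tpcds_database_parquet_100 -v -e "show tables;" > ./a.txt

Note: -e and -f cannot be used together. -i can be combined with any other option, and multiple -i options can be given to run several initialization scripts.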
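The rule that -e and -f are mutually exclusive can be enforced in a small wrapper before the command reaches Hive. This is a sketch under assumptions: hive_batch, HIVE_BIN, and DRY_RUN are names introduced here for illustration, and by default the function only prints the command it would run.

```shell
#!/bin/sh
# Sketch: run hive in batch mode with exactly one of -e or -f.
# hive_batch, HIVE_BIN and DRY_RUN are hypothetical names for this example.
hive_batch() {
    # usage: hive_batch <database> (-e <query> | -f <file>)
    if [ "$#" -ne 3 ]; then
        echo "usage: hive_batch <database> (-e <query> | -f <file>)" >&2
        return 2
    fi
    db="$1"; mode="$2"; arg="$3"
    case "$mode" in
        -e|-f) ;;                                   # exactly one mode accepted
        *) echo "mode must be -e or -f" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): print the command instead of running it.
        printf '%s --database %s %s "%s"\n' "${HIVE_BIN:-./hive}" "$db" "$mode" "$arg"
    else
        "${HIVE_BIN:-./hive}" --database "$db" "$mode" "$arg"
    fi
}

hive_batch tpcds_database_parquet_100 -e "show tables"
```

With DRY_RUN=0 set, the same call executes hive, and its output can be redirected to a file in the same way as the ./a.txt example above.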

beeline

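Hive ships with two services, HiveServer and HiveServer2. Both let clients written in a variety of programming languages connect, but HiveServer cannot handle concurrent requests from multiple clients. HiveServer2 was introduced to address this and supports concurrent multi-client access and authentication. HiveServer2 has its own CLI, Beeline, which is a JDBC client based on SQLLine. Most options supported by the Hive CLI are also available in Beeline. Run ./beeline --help to see its usage: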

-u <database url>                the JDBC URL to connect to
-r                               reconnect to last saved connect url (in conjunction with !save)
-n <username>                    the username to connect as
-p <password>                    the password to connect as
-d <driver class>                the driver class to use
-i <init file>                   script file for initialization
-e <query>                       query that should be executed
-f <exec file>                   script file that should be executed
-w (or) --password-file <password file>  the password file to read password from
--hiveconf property=value        Use value for given property
--hivevar name=value             hive variable name and value
                                 This is Hive specific settings in which variables
                                 can be set at session level and referenced in Hive
                                 commands or queries.
--property-file=<property-file>  the file to read connection properties (url, driver, user, password) from
--color=[true/false]             control whether color is used for display
--showHeader=[true/false]        show column names in query results
--headerInterval=ROWS;           the interval between which heades are displayed
--fastConnect=[true/false]       skip building table/column list for tab-completion
--autoCommit=[true/false]        enable/disable automatic transaction commit
--verbose=[true/false]           show verbose error messages and debug info
--showWarnings=[true/false]      display connection warnings
--showDbInPrompt=[true/false]    display the current database name in the prompt
--showNestedErrs=[true/false]    display nested errors
--numberFormat=[pattern]         format numbers using DecimalFormat pattern
--force=[true/false]             continue running script even after errors
--maxWidth=MAXWIDTH              the maximum width of the terminal
--maxColumnWidth=MAXCOLWIDTH     the maximum width to use when displaying columns
--silent=[true/false]            be more silent
--autosave=[true/false]          automatically save preferences
--outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                 Note that csv, and tsv are deprecated - use csv2, tsv2 instead
--incremental=[true/false]       Defaults to false. When set to false, the entire result set
                                 is fetched and buffered before being displayed, yielding optimal
                                 display column sizing. When set to true, result rows are displayed
                                 immediately as they are fetched, yielding lower latency and
                                 memory usage at the price of extra display column padding.
                                 Setting --incremental=true is recommended if you encounter an OutOfMemory
                                 on the client side (due to the fetched result set size being large).
                                 Only applicable if --outputformat=table.
--incrementalBufferRows=NUMROWS  the number of rows to buffer when printing rows on stdout,
                                 defaults to 1000; only applicable if --incremental=true
                                 and --outputformat=table
--truncateTable=[true/false]     truncate table column when it exceeds length
--delimiterForDSV=DELIMITER      specify the delimiter for delimiter-separated values output format (default: |)
--isolation=LEVEL                set the transaction isolation level
--nullemptystring=[true/false]   set to true to get historic behavior of printing null as empty string
--showConnectedUrl=[true/false]  Prompt HiveServer2s URI to which this beeline connected.
                                 Only works for HiveServer2 cluster mode.
--maxHistoryRows=MAXHISTORYROWS  The maximum number of rows to store beeline history.
--convertBinaryArrayToString=[true/false]  display binary column data as string or as byte array
--help                           display this message

Example:

1. Connect using simple authentication to HiveServer2 on localhost:10000

$ beeline -u jdbc:hive2://localhost:10000 username password

2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password

$ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal

$ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"

4. Connect using SSL connection to HiveServer2 on localhost at 10000

$ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

5. Connect using LDAP authentication

$ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>
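JDBC URLs that carry session parameters (principal, ssl, and so on) contain semicolons, which the shell treats as command separators, so such URLs need to be quoted on the command line. The sketch below illustrates the difference; echo stands in for beeline so that it can be run without a Hive installation.

```shell
#!/bin/sh
# echo stands in for beeline so the effect is visible without a Hive install.

# Unquoted: the shell ends the command at ';' and runs ssl=true as a separate
# variable assignment, so the program never sees the SSL parameter.
unquoted=$(sh -c 'echo beeline -u jdbc:hive2://localhost:10000/default;ssl=true')

# Quoted: the whole URL, parameters included, is passed as one argument.
quoted=$(sh -c 'echo beeline -u "jdbc:hive2://localhost:10000/default;ssl=true"')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
# unquoted: beeline -u jdbc:hive2://localhost:10000/default
# quoted:   beeline -u jdbc:hive2://localhost:10000/default;ssl=true
```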

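Option reference

Option	Description
-u <database URL>	The JDBC URL to connect to. Usage: beeline -u db_URL
-r	Reconnect to the most recently used URL (if the user has previously used !connect to a URL and !save to a beeline.properties file). Usage: beeline -r Version: 2.1.0
-n <username>	The username to connect as. Usage: beeline -n valid_user
-p <password>	The password to connect with. Usage: beeline -p valid_password Optional password mode: starting with Hive 2.2.0, the argument to -p is optional. Usage: beeline -p [valid_password] If no password is given after -p, Beeline prompts for one when it initiates the connection. If a password is given, Beeline uses it to initiate the connection without prompting.
-d <driver class>	The driver class to use. Usage: beeline -d driver_class
-e <query>	The query to execute. Enclose the query string in single or double quotes. This option can be given multiple times. Usage: beeline -e "query_string" Multiple SQL statements can be run in a single query string, separated by semicolons. Bug fix (null pointer exception): 0.13.0 Bug fix (--headerInterval not honored): 0.14.0 Bug fix (running -e in background): 1.3.0 and 2.0.0; workaround available for earlier versions
-f <file>	The script file to execute. Usage: beeline -f filepath Version: 0.12.0 Note: if the script contains tabs, query compilation fails in version 0.12.0; this bug is fixed in 0.13.0. Bug fix (running -f in background): 1.3.0 and 2.0.0; workaround available for earlier versions
-i (or) --init <file or files>	The init file or files to run for initialization. Usage: beeline -i /tmp/initfile Single file: Version: 0.14.0 Multiple files: Version: 2.1.0
-w (or) --password-file <password file>	The file to read the password from. Version: 1.2.0
-a (or) --authType <auth type>	The authentication type, passed to JDBC as an auth property. Version: 0.13.0
--property-file <file>	The file to read configuration properties from. Usage: beeline --property-file /tmp/a Version: 2.2.0
--hiveconf property=value	Use the given value for the given configuration property. Properties listed in hive.conf.restricted.list cannot be reset with hiveconf. Usage: beeline --hiveconf prop1=value1 Version: 0.13.0
--hivevar name=value	Hive variable name and value. This is a Hive-specific setting in which variables can be set at session level and referenced in Hive commands or queries. Usage: beeline --hivevar var1=value1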
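--color=[true/false]	Control whether color is used for display. Default is false. Usage: beeline --color=true (not supported for separated-value output formats)
--showHeader=[true/false]	Show column names in query results (true) or not (false). Default is true. Usage: beeline --showHeader=false
--headerInterval=ROWS	The interval, in number of rows, at which column headers are redisplayed when the output format is table. Default is 100. Usage: beeline --headerInterval=50 (not supported for separated-value output formats)
--fastConnect=[true/false]	When connecting, skip building the list of all tables and columns used for tab-completion of HiveQL statements (true) or build the list (false). Default is true. Usage: beeline --fastConnect=false
--autoCommit=[true/false]	Enable or disable automatic transaction commit. Default is false. Usage: beeline --autoCommit=true
--verbose=[true/false]	Show verbose error messages and debug information (true) or not (false). Default is false. Usage: beeline --verbose=true
--showWarnings=[true/false]	Display warnings reported on the connection after issuing any HiveQL command. Default is false. Usage: beeline --showWarnings=true
--showDbInPrompt=[true/false]	Display the current database name in the prompt. Default is false. Usage: beeline --showDbInPrompt=true
--showNestedErrs=[true/false]	Display nested errors. Default is false. Usage: beeline --showNestedErrs=true
--numberFormat=[pattern]	Format numbers using a DecimalFormat pattern. Usage: beeline --numberFormat="#,###,##0.00"
--force=[true/false]	Continue running the script after errors (true) or stop (false). Default is false. Usage: beeline --force=true
--maxWidth=MAXWIDTH	The maximum width to display before truncating data when the output format is table. By default Beeline queries the terminal for its current width and falls back to 80. Usage: beeline --maxWidth=150
--maxColumnWidth=MAXCOLWIDTH	The maximum column width when the output format is table. Default is 50 in Hive 2.2.0 and later, and 15 in earlier versions. Usage: beeline --maxColumnWidth=25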
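--silent=[true/false]	Reduce the amount of informational messages displayed (true) or not (false). It also stops displaying the log messages for the query from HiveServer2 (Hive 0.14 and later) and for HiveQL commands (Hive 1.2.0 and later). Default is false. Usage: beeline --silent=true
--autosave=[true/false]	Automatically save preferences (true) or not (false). Default is false. Usage: beeline --autosave=true
--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2]	Format mode for result display. Default is table. See the Separated-Value Output Formats section of the Hive Beeline documentation for details and recommended options. Usage: beeline --outputformat=tsv
--truncateTable=[true/false]	If true, truncate table columns in the console when the table exceeds the console width.
--delimiterForDSV=DELIMITER	The delimiter used by the delimiter-separated values output format. Default is '|'.
--isolation=LEVEL	Set the transaction isolation level to TRANSACTION_READ_COMMITTED or TRANSACTION_SERIALIZABLE. See the "Field Detail" section of the Java Connection documentation. Usage: beeline --isolation=TRANSACTION_SERIALIZABLE
--nullemptystring=[true/false]	Use the historic behavior of printing null as an empty string (true) or the current behavior of printing null as NULL (false). Default is false. Usage: beeline --nullemptystring=false
--incremental=[true/false]	Defaults to true from Hive 2.3 onward; before that the default was false. When set to false, the entire result set is fetched and buffered before being displayed, which gives optimal display column sizing. When set to true, result rows are displayed as soon as they are fetched, which gives lower latency and memory usage at the price of extra display column padding. Setting --incremental=true is recommended if you run into an OutOfMemory error on the client side (because the fetched result set is very large).
--incrementalBufferRows=NUMROWS	The number of rows to buffer when printing rows to stdout. Default is 1000. Only applicable if --incremental=true and --outputformat=table. Usage: beeline --incrementalBufferRows=1000
--maxHistoryRows=NUMROWS	The maximum number of rows to store in the Beeline history.
--delimiter=;	Set the delimiter for queries written in Beeline. Multi-character delimiters are allowed, but quotation marks, slashes, and -- are not. Default is ; Usage: beeline --delimiter=$$
--convertBinaryArrayToString=[true/false]	Display binary column data as a string or as a byte array. Usage: beeline --convertBinaryArrayToString=true
--help	Display a usage message. Usage: beeline --help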
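Several of the options in the table are typically combined when a query result is exported to a file. The following is a sketch under assumptions: beeline_to_csv and BEELINE_BIN are names introduced here for illustration, and authentication options (-p or -w) are left out and would need to be added as the deployment requires.

```shell
#!/bin/sh
# Sketch: export a query result as CSV with Beeline.
# beeline_to_csv and BEELINE_BIN are hypothetical names for this example.
beeline_to_csv() {
    # usage: beeline_to_csv <jdbc-url> <username> <query> <output-file>
    if [ "$#" -ne 4 ]; then
        echo "usage: beeline_to_csv <jdbc-url> <username> <query> <output-file>" >&2
        return 2
    fi
    # --silent=true        reduces informational messages in the output
    # --showHeader=true    keeps the column names as the first row
    # --outputformat=csv2  selects the non-deprecated CSV format
    "${BEELINE_BIN:-./beeline}" -u "$1" -n "$2" \
        --silent=true --showHeader=true --outputformat=csv2 \
        -e "$3" > "$4"
}
```

A call might look like beeline_to_csv "jdbc:hive2://ip:10000/default" admin "show databases" ./databases.csv, where the output file name is an arbitrary example.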

- Connect to Hive

./beeline -u jdbc:hive2://ip:10000 -n admin -p hhh12345+

or

./beeline -n admin -p hhh12345+ -d "org.apache.hive.jdbc.HiveDriver" -u "jdbc:hive2://ip:10000/default"

- Run a SQL statement

./beeline -u jdbc:hive2://ip:10000 -n admin -p hhh12345+ -e "show databases"

- Run a SQL script

./beeline -u jdbc:hive2://ip:10000/tpcds_database_parquet_100 -n admin -p hhh12345+ --silent=true -f "/home/hive-testbench-hive14/sample-queries-tpcds/query12.sql"
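Passing the password with -p puts it on the command line, where it can show up in shell history and process listings. The -w (--password-file) option from the help output is one way around that. A minimal sketch, assuming a file that holds only the password and is readable only by the invoking user; beeline_with_pwfile and BEELINE_BIN are hypothetical names.

```shell
#!/bin/sh
# Sketch: run a Beeline query with the password read from a file (-w)
# instead of passing it with -p on the command line.
# beeline_with_pwfile and BEELINE_BIN are hypothetical names for this example.
beeline_with_pwfile() {
    # usage: beeline_with_pwfile <jdbc-url> <username> <password-file> <query>
    if [ "$#" -ne 4 ]; then
        echo "usage: beeline_with_pwfile <jdbc-url> <username> <password-file> <query>" >&2
        return 2
    fi
    # Fail early with a clear message if the password file cannot be read.
    if [ ! -r "$3" ]; then
        echo "password file not readable: $3" >&2
        return 1
    fi
    "${BEELINE_BIN:-./beeline}" -u "$1" -n "$2" -w "$3" -e "$4"
}
```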

Running Hive SQL from JMeter

All of the hive and beeline commands above can be run from JMeter through an SSH Command sampler.
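When a command is run remotely from a test tool, it helps if the output states plainly whether the command succeeded and how long it took. The helper below is a sketch, not part of the original setup: run_timed is a hypothetical name, and it assumes the sampler returns the command's standard output as response data so that an assertion can match on the status line.

```shell
#!/bin/sh
# Sketch: run a command and report its exit status and wall-clock time on one
# line, so a calling tool can match on that line.
# run_timed is a hypothetical helper name.
run_timed() {
    start=$(date +%s)
    "$@"                      # run the wrapped command with its arguments
    status=$?
    end=$(date +%s)
    echo "status=$status elapsed_s=$((end - start))"
    return "$status"          # propagate the wrapped command's exit status
}
```

For example, the SSH command could be run_timed ./hive --database tpcds_database_parquet_100 -f /home/hive-testbench-hive14/sample-queries-tpcds/query12.sql, with a JMeter Response Assertion checking that the response contains status=0.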
