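Using SparkSQL: the Spark SQL CLI

About the Spark SQL CLI

The Spark SQL CLI makes it more convenient to query Hive directly from SparkSQL through the Hive metastore. In the current version, the Spark SQL CLI cannot interact with the ThriftServer.

Before using the Spark SQL CLI, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory.

2. Add the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh: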

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.-bin.jar
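Both prerequisites can be checked before launching spark-sql. The following is a minimal sketch, not part of Spark: the helper name check_prereqs is hypothetical, and it assumes the metastore driver is a MySQL connector jar, as in the export line above.

```shell
#!/bin/sh
# check_prereqs CONF_DIR CLASSPATH  (hypothetical helper, not part of Spark)
# Prints what is missing and returns non-zero if a prerequisite is not met.
check_prereqs() {
    conf_dir=$1
    classpath=$2
    rc=0
    # Prerequisite 1: hive-site.xml must be in the Spark conf directory.
    if [ ! -f "$conf_dir/hive-site.xml" ]; then
        echo "missing: $conf_dir/hive-site.xml"
        rc=1
    fi
    # Prerequisite 2: a MySQL connector jar must appear on the classpath
    # (assumption: the Hive metastore is backed by MySQL).
    case "$classpath" in
        *mysql-connector-java*) ;;
        *) echo "missing: mysql-connector-java jar on classpath"; rc=1 ;;
    esac
    return $rc
}
```

It could be called as check_prereqs "$SPARK_HOME/conf" "$SPARK_CLASSPATH" after sourcing spark-env.sh.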

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin

spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: ).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: ).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: ).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)

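If no master is specified when starting spark-sql, it runs in local mode. The master can be either the address of a standalone cluster or yarn.

When the master is set to yarn (spark-sql --master yarn), the execution of the whole job can be monitored at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, spark-sql runs on the standalone cluster even when no master is specified at startup.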
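The default-master behaviour described in the note can be set up with a small script. This is a sketch only: set_default_master is a hypothetical helper, and it simply appends a spark.master line to a spark-defaults.conf style file when none is present.

```shell
#!/bin/sh
# set_default_master CONF_FILE MASTER_URL  (hypothetical helper)
# Record spark.master in the given file unless an entry already exists.
set_default_master() {
    conf_file=$1
    master_url=$2
    if grep -q '^spark\.master[[:space:]]' "$conf_file" 2>/dev/null; then
        echo "spark.master already set in $conf_file"
    else
        printf 'spark.master %s\n' "$master_url" >> "$conf_file"
    fi
}

# Explicit alternatives on the command line (host name taken from the article):
#   spark-sql --master local
#   spark-sql --master spark://hadoop000:7077
#   spark-sql --master yarn
```

Calling set_default_master "$SPARK_HOME/conf/spark-defaults.conf" spark://hadoop000:7077 would reproduce the configuration used in this article.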

Using spark-sql

Start spark-sql. Because I have already set spark.master spark://hadoop000:7077 in spark-defaults.conf, I do not specify a master when starting spark-sql:

cd $SPARK_HOME/bin

spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = - limit ;

SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit ;

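The table used by the two SQL statements above already exists in Hive in my environment. If you do not have it, create it manually; the statements for creating the table and loading the data are as follows: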

create table page_views(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;
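Because the table is declared with tab-delimited fields, the input file needs exactly seven tab-separated values per line. The sketch below writes a tiny sample file and checks that layout; the helper names and all row values are made up for illustration and are not the article's data.

```shell
#!/bin/sh
# write_sample_page_views FILE  (hypothetical helper)
# Write two made-up rows, one value per table column, separated by tabs.
# Columns: track_time, url, session_id, referer, ip, end_user_id, city_id
write_sample_page_views() {
    # printf reuses the format for each group of seven arguments.
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        '2014-01-01 00:00:00' 'http://example.com/a' 'session-1' \
        'http://example.com/' '10.0.0.1' 'user-1' '1' \
        '2014-01-01 00:00:05' 'http://example.com/b' 'session-1' \
        'http://example.com/a' '10.0.0.1' 'user-1' '1' > "$1"
}

# count_bad_rows FILE: print how many lines do not have exactly
# seven tab-separated fields.
count_bad_rows() {
    awk -F '\t' 'NF != 7 { bad++ } END { print bad + 0 }' "$1"
}
```

A file that passes count_bad_rows with a result of 0 can be used as the inpath of the load data statement above.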
