Thrift JDBC Server Overview

The Thrift JDBC Server is based on the HiveServer2 implementation from Hive 0.12. You can interact with it using the beeline script that ships with Spark or with Hive 0.12. By default, the Thrift JDBC Server listens on port 10000.

Before using the Thrift JDBC Server, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Add the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
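
For context, here is a minimal hive-site.xml sketch, assuming the Hive metastore is backed by MySQL; the host, database name, and credentials are placeholders to replace with your own. The MySQL connector jar added above is needed precisely because of the driver configured here:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop000:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>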

Thrift JDBC Server command-line help:

cd $SPARK_HOME/sbin
start-thriftserver.sh --help
Usage: ./sbin/start-thriftserver [options] [thrift server options]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE    Whether to launch the driver program locally ("client") or
                               on one of the worker machines inside the cluster ("cluster")
                               (Default: client).
  --class CLASS_NAME           Your application's main class (for Java / Scala apps).
  --name NAME                  A name of your application.
  --jars JARS                  Comma-separated list of local jars to include on the driver
                               and executor classpaths.
  --py-files PY_FILES          Comma-separated list of .zip, .egg, or .py files to place
                               on the PYTHONPATH for Python apps.
  --files FILES                Comma-separated list of files to be placed in the working
                               directory of each executor.
  --conf PROP=VALUE            Arbitrary Spark configuration property.
  --properties-file FILE       Path to a file from which to load extra properties. If not
                               specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM          Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options        Extra Java options to pass to the driver.
  --driver-library-path        Extra library path entries to pass to the driver.
  --driver-class-path          Extra class path entries to pass to the driver. Note that
                               jars added with --jars are automatically included in the
                               classpath.
  --executor-memory MEM        Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                   Show this help message and exit
  --verbose, -v                Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM           Cores for driver (Default: ).
  --supervise                  If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM   Total cores for all executors.

 YARN-only:
  --executor-cores NUM         Number of cores per executor (Default: ).
  --queue QUEUE_NAME           The YARN queue to submit to (Default: "default").
  --num-executors NUM          Number of executors to launch (Default: ).
  --archives ARCHIVES          Comma separated list of archives to be extracted into the
                               working directory of each executor.

Thrift server options:
  --hiveconf <property=value>  Use value for given property

The meaning of --master is the same as for the Spark SQL CLI.
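
For example, a hypothetical invocation that submits the server to a standalone master with explicit memory settings (the master URL is a placeholder for your own cluster):

start-thriftserver.sh --master spark://hadoop000:7077 --driver-memory 1G --executor-memory 1G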

beeline command-line help:

cd $SPARK_HOME/bin
beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine
-u <database url> the JDBC URL to connect to
-n <username> the username to connect as
-p <password> the password to connect as
-d <driver class> the driver class to use
-e <query> query that should be executed
-f <file> script file that should be executed
--color=[true/false] control whether color is used for display
--showHeader=[true/false] show column names in query results
--headerInterval=ROWS; the interval between which headers are displayed
--fastConnect=[true/false] skip building table/column list for tab-completion
--autoCommit=[true/false] enable/disable automatic transaction commit
--verbose=[true/false] show verbose error messages and debug info
--showWarnings=[true/false] display connection warnings
--showNestedErrs=[true/false] display nested errors
--numberFormat=[pattern] format numbers using DecimalFormat pattern
--force=[true/false] continue running script even after errors
--maxWidth=MAXWIDTH the maximum width of the terminal
--maxColumnWidth=MAXCOLWIDTH the maximum width to use when displaying columns
--silent=[true/false] be more silent
--autosave=[true/false] automatically save preferences
--outputformat=[table/vertical/csv/tsv] format mode for result display
--isolation=LEVEL set the transaction isolation level
--help display this message

Starting the Thrift JDBC Server / beeline

Start the Thrift JDBC Server (the default port is 10000):

cd $SPARK_HOME/sbin
start-thriftserver.sh

To change the Thrift JDBC Server's default listening port, use --hiveconf:

start-thriftserver.sh --hiveconf hive.server2.thrift.port=<port>
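
For example, to have the server listen on port 14000 instead (the port number is arbitrary, chosen only for illustration):

start-thriftserver.sh --hiveconf hive.server2.thrift.port=14000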

For details on HiveServer2 clients, see: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Start beeline:

cd $SPARK_HOME/bin
beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop
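
Besides beeline, any JDBC client can talk to the server. Below is a minimal Java sketch, assuming the Hive JDBC driver and its dependencies are on the client classpath; the host hadoop000, the user hadoop, and the page_views table simply mirror the examples in this post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftJdbcClient {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (shipped with Hive 0.12 / Spark's Hive dependencies)
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Connect to the Thrift JDBC Server on its default port 10000
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hadoop000:10000/default", "hadoop", "");
        Statement stmt = conn.createStatement();

        // Run one of the test queries from this post
        ResultSet rs = stmt.executeQuery(
                "SELECT session_id, count(*) c FROM page_views "
                + "GROUP BY session_id ORDER BY c DESC LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}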

SQL test queries:

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;
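
The same statements can also be run non-interactively through beeline's -e and -f switches; for example (the file name page_views.sql is hypothetical):

beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop -e "SELECT count(*) FROM page_views"
beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop -f page_views.sql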
