SparkSQL and Hive: will SparkSQL replace Hive?

https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html

https://cwiki.apache.org/confluence/display/Hive/Home

【Serves the data warehouse; supports a strong SQL standard】

Apache Hive

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax.

【Spark is among the execution engines】

Built on top of Apache Hadoop™, Hive provides the following features:

  • Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis (a minimal sketch follows this list).
  • A mechanism to impose structure on a variety of data formats
  • Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase

  • Query execution via Apache Tez, Apache Spark, or MapReduce
  • Procedural language with HPL-SQL
  • Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider.
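
To make the SQL-access point concrete, here is a minimal PySpark sketch, assuming a Hive-enabled Spark build and a reachable metastore; the table name web_logs and its columns are hypothetical:

```python
from pyspark.sql import SparkSession

# A session that can read Hive tables (requires Spark built with Hive
# support and a metastore it can reach; see the metastore note below).
spark = (SparkSession.builder
         .appName("hive-etl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# A typical warehousing query: plain SQL over a (hypothetical) Hive table.
daily_counts = spark.sql("""
    SELECT dt, COUNT(*) AS hits
    FROM web_logs
    GROUP BY dt
    ORDER BY dt
""")
daily_counts.show()
```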

Hive provides standard SQL functionality, including many of the later SQL:2003 and SQL:2011 features for analytics. 
Hive's SQL can also be extended with user code via user defined functions (UDFs), user defined aggregates (UDAFs), and user defined table functions (UDTFs).
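
In Hive itself a UDF is a Java class wired in with CREATE FUNCTION; Spark SQL exposes the same extension point from Python. A hedged sketch, where the function name normalize_url is hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Register a Python function under a SQL-callable name.
# "normalize_url" is an illustrative name, not a built-in.
spark.udf.register(
    "normalize_url",
    lambda url: url.lower().rstrip("/") if url is not None else None,
    StringType(),
)

spark.sql("SELECT normalize_url('https://Example.com/Path/') AS u").show()
```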

There is not a single "Hive format" in which data must be stored. Hive comes with built-in connectors for comma- and tab-separated values (CSV/TSV) text files, Apache Parquet, Apache ORC, and other formats. 
Users can extend Hive with connectors for other formats. Please see File Formats and Hive SerDe in the Developer Guide for details.
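
As an illustration from the Spark side, a sketch that reads a tab-separated text file and rewrites it as Parquet; the file paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a tab-separated file; header/schema options depend on the data.
df = spark.read.csv("/data/in/visits.tsv", sep="\t", header=True,
                    inferSchema=True)

# Persist as Parquet (ORC works the same way: .format("orc")).
df.write.mode("overwrite").parquet("/data/out/visits_parquet")
```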

Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks. 
Hive is designed to maximize scalability (scale out with more machines added dynamically to the Hadoop cluster), performance, extensibility, fault-tolerance, and loose-coupling with its input formats.

Components of Hive include HCatalog and WebHCat.

  • HCatalog is a component of Hive. It is a table and storage management layer for Hadoop that enables users with different data processing tools — including Pig and MapReduce — to more easily read and write data on the grid.
  • WebHCat provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, Hive jobs or perform Hive metadata operations using an HTTP (REST style) interface.
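
To give a feel for that REST interface, a sketch using Python's requests library; the host and user name are placeholders, 50111 is WebHCat's usual default port, and the ddl/database endpoint lists Hive databases:

```python
import requests

# WebHCat (Templeton) listens on port 50111 by default; adjust the host.
BASE = "http://webhcat-host.example.com:50111/templeton/v1"

# List Hive databases through the REST DDL endpoint.
resp = requests.get(f"{BASE}/ddl/database", params={"user.name": "alice"})
resp.raise_for_status()
print(resp.json())
```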

https://issues.apache.org/jira/browse/HIVE-7292

Spark, as an open-source data analytics cluster computing framework, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that they can consolidate their backend.

Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.

【Good performance in multi-reducer stages】

Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does.

This is an umbrella JIRA which will cover many coming subtasks. A design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated!

【Shares Hive's metadata】

Does SparkSQL have no metadata of its own? Does it create temporary metadata on the fly, or use Hive's metastore directly?
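
Both paths exist: without Hive configuration, Spark SQL keeps catalog data in a local embedded metastore of its own, and with enableHiveSupport() plus a hive-site.xml on the classpath it reuses Hive's metastore directly. A minimal sketch, where the warehouse path is a placeholder:

```python
from pyspark.sql import SparkSession

# With hive-site.xml on Spark's classpath, this session talks to the
# same metastore Hive uses; otherwise Spark falls back to a local,
# embedded (Derby-backed) metastore of its own.
spark = (SparkSession.builder
         .appName("shared-metastore-sketch")
         .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
         .enableHiveSupport()
         .getOrCreate())

# Tables defined in Hive are now visible to Spark SQL, and vice versa.
spark.sql("SHOW DATABASES").show()
```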

