Spark SQL: Using Hive as a Data Source
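According to the official documentation, copy hive-site.xml, core-site.xml, and hdfs-site.xml into Spark's conf directory. Also make sure MySQL, which presumably backs the Hive metastore in this setup, is already running.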
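Before submitting a job, it can help to confirm that the three files really are in the conf directory. The following is a minimal sketch using only the Java standard library. The class name `HiveConfCheck`, the helper `missingFiles`, and the use of the `SPARK_HOME` environment variable to locate the conf directory are assumptions made for illustration; they are not part of Spark.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class HiveConfCheck {
    // The three files that should be copied into Spark's conf directory.
    static final String[] REQUIRED = {"hive-site.xml", "core-site.xml", "hdfs-site.xml"};

    /** Returns the names of the required files that are not present in confDir. */
    static List<String> missingFiles(Path confDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(confDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the conf directory is passed as the first argument,
        // or else lives under $SPARK_HOME/conf (falling back to ./conf).
        String sparkHome = System.getenv("SPARK_HOME");
        Path confDir = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(sparkHome == null ? "." : sparkHome, "conf");

        List<String> missing = missingFiles(confDir);
        if (missing.isEmpty()) {
            System.out.println("All required config files found in " + confDir);
        } else {
            System.out.println("Missing in " + confDir + ": " + missing);
        }
    }
}
```

This only checks that the files exist; it does not validate their contents or the metastore connection.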
java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Demo {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder().appName("demo").enableHiveSupport()
                .config("spark.sql.warehouse.dir", "/user/hive/warehouse").getOrCreate();

        // Create the student info table and load the data file into it
        session.sql("drop table if exists students_info");
        session.sql("create table if not exists students_info(name string, age int) "
                + "row format delimited fields terminated by '\t'");
        session.sql("load data local inpath '/opt/module/spark-test/data/student_infos.txt' "
                + "into table default.students_info");

        // Create the student score table and load the data file into it
        session.sql("drop table if exists students_score");
        session.sql("create table if not exists students_score(name string, score int) "
                + "row format delimited fields terminated by '\t'");
        session.sql("load data local inpath '/opt/module/spark-test/data/student_scores.txt' "
                + "into table default.students_score");

        // Query the students whose score is above 80
        Dataset<Row> dataset = session.sql("select s1.name, s1.age, s2.score from students_info s1 "
                + "join students_score s2 on s1.name = s2.name where s2.score > 80");

        // Save the Dataset to Hive
        session.sql("drop table if exists students_result");
        dataset.write().saveAsTable("students_result");

        // Read the Hive table back as a Dataset to check that the data was saved
        Dataset<Row> table = session.table("students_result");
        table.show();

        session.stop();
    }
}
scala
import org.apache.spark.sql.SparkSession

object Demo {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("demo").enableHiveSupport()
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse").getOrCreate()

    // Create the student info table and load the data file into it
    session.sql("drop table if exists students_info")
    session.sql("create table if not exists students_info(name string, age int) row format delimited fields terminated by '\t'")
    session.sql("load data local inpath '/opt/module/spark-test/data/student_infos.txt' into table default.students_info")

    // Create the student score table and load the data file into it
    session.sql("drop table if exists students_score")
    session.sql("create table if not exists students_score(name string, score int) row format delimited fields terminated by '\t'")
    session.sql("load data local inpath '/opt/module/spark-test/data/student_scores.txt' into table default.students_score")

    // Save the query result (students scoring above 90) to Hive
    session.sql("drop table if exists students_result")
    session.sql("select s1.name, s1.age, s2.score from students_info s1 join students_score s2 on s1.name = s2.name where s2.score > 90").write.saveAsTable("students_result")

    // Check that the data was saved
    val df = session.table("students_result")
    df.show()

    session.stop()
  }
}
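Both tables are declared with `fields terminated by '\t'`, so the files given to `load data local inpath` need one record per line with tab-separated fields. The post does not show the contents of student_infos.txt or student_scores.txt, so the sketch below writes small files in that layout using only the Java standard library. The class `SampleData`, the helper `writeTsv`, and every row of data are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SampleData {
    /** Writes one line per row, joining the row's fields with a tab character. */
    static void writeTsv(Path file, List<String[]> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] row : rows) {
            lines.add(String.join("\t", row));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the output directory is the first argument, else the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");

        // Hypothetical (name, age) rows matching students_info(name string, age int)
        writeTsv(dir.resolve("student_infos.txt"), Arrays.asList(
                new String[]{"alice", "18"},
                new String[]{"bob", "19"}));

        // Hypothetical (name, score) rows matching students_score(name string, score int)
        writeTsv(dir.resolve("student_scores.txt"), Arrays.asList(
                new String[]{"alice", "95"},
                new String[]{"bob", "72"}));
    }
}
```

Run it with /opt/module/spark-test/data as the argument to produce files at the paths the examples load from.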