Spark Notes: A Simple Example of Reading Hive Data from a Local Spark Program
Note: copy the MySQL JDBC driver jar into spark/lib, copy hive-site.xml into the project's resources directory, and use the cluster's IP address rather than its hostname when debugging remotely.
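For reference, a minimal hive-site.xml sketch is shown below. The only entry this example strictly needs is hive.metastore.uris (the same setting that appears commented out in the code further down); the thrift address reuses this example's cluster IP and must be adjusted to your own metastore service.

<configuration>
  <property>
    <!-- address of the remote Hive metastore service; adjust host and port to your cluster -->
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.66.66:9083</value>
  </property>
</configuration>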
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import java.io.FileNotFoundException
import java.io.IOException

object HiveSelect {
  def main(args: Array[String]) {
    System.setProperty("hadoop.home.dir", "D:\\hadoop") // local Hadoop home; on Windows this is where winutils.exe is found
    val conf = new SparkConf().setAppName("HiveApp").setMaster("spark://192.168.66.66:7077")
      .set("spark.executor.memory", "1g")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .setJars(Seq("D:\\workspace\\scala\\out\\scala.jar")) // ship the application jar to the remote Spark cluster
      //.set("hive.metastore.uris", "thrift://192.168.66.66:9083") // remote Hive metastore address
      //.set("spark.driver.extraClassPath", "D:\\json\\mysql-connector-java-5.1.39.jar")
    val sparkContext = new SparkContext(conf)
    try {
      val hiveContext = new HiveContext(sparkContext)
      hiveContext.sql("USE siat") // switch to the target database
      hiveContext.sql("DROP TABLE IF EXISTS src") // drop the table if it already exists
      hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) " +
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'") // create the table
      hiveContext.sql("LOAD DATA LOCAL INPATH 'D:\\workspace\\scala\\src.txt' INTO TABLE src") // load the data file
      hiveContext.sql("SELECT * FROM src").collect().foreach(println) // query and print every row
    }
    catch {
      // order matters: Scala tries cases top to bottom, so the specific
      // exceptions must precede Exception and Throwable or they are unreachable
      case e: FileNotFoundException => println("Missing file exception")
      case e: IOException => println("IO Exception")
      case e: NumberFormatException => println(e)
      case e: ArithmeticException => println(e)
      case e: IllegalArgumentException => println("illegal argument exception")
      case e: IllegalStateException => println("illegal state exception")
      case e: Exception => println(e)
      case e: Throwable => println("found an unknown exception: " + e)
    }
    finally {
      sparkContext.stop()
    }
}
}
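Because the table is declared with tab-delimited fields, src.txt must contain tab-separated key/value pairs. A hypothetical two-line sample (columns separated by a single tab character):

1	hello
2	world

With that input, collect() returns an Array[Row], and each Row prints in bracketed form, so the final query would output something like:

[1,hello]
[2,world]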
Appendix 1: Scala Spark API reference: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package