Spark SQL External Data Sources: Testing the Official JDBC Write Implementation

This post tests writing RDD data into a MySQL database through the official Spark SQL External Data Sources JDBC implementation.

Key APIs in jdbc.scala:
/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
* If you pass `true` for `allowExisting`, it will drop any table with the
* given name; if you pass `false`, it will throw if the table already
* exists.
*/
def createJDBCTable(url: String, table: String, allowExisting: Boolean)

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* Assumes the table already exists and has a compatible schema. If you
* pass `true` for `overwrite`, it will `TRUNCATE` the table before
* performing the `INSERT`s.
*
* The table must already exist on the database. It must have a schema
* that is compatible with the schema of this RDD; inserting the rows of
* the RDD in order via the simple statement
* `INSERT INTO table VALUES (?, ?, ..., ?)` should not fail.
*/
def insertIntoJDBC(url: String, table: String, overwrite: Boolean)
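Both methods assume the MySQL JDBC driver is visible to Spark. The session below was run in spark-shell; a typical launch (the connector jar path here is illustrative) might look like:

bin/spark-shell --driver-class-path /path/to/mysql-connector-java-5.1.34-bin.jar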
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)
import sqlContext._

// Prepare the test data
val url = "jdbc:mysql://hadoop000:3306/test?user=root&password=root"
val arr2x2 = Array[Row](Row.apply("dave", 42), Row.apply("mary", 222))
val arr1x2 = Array[Row](Row.apply("fred", 3))
val schema2 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: Nil)
val arr2x3 = Array[Row](Row.apply("dave", 42, 1), Row.apply("mary", 222, 2))
val schema3 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: StructField("seq", IntegerType) :: Nil)

import org.apache.spark.sql.jdbc._
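Before writing anything, the prepared schemas can be sanity-checked by applying them and printing the result; a quick sketch:

val check = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
check.printSchema()
root
 |-- name: string (nullable = true)
 |-- id: integer (nullable = true)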
================================CREATE======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
srdd.createJDBCTable(url, "person", false)
sqlContext.jdbcRDD(url, "person").collect.foreach(println)
[dave,42]
[mary,222]
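For reference, the statements that createJDBCTable issues for schema2 should look roughly like the following (the column types come from Spark's default type mapping and may differ by version):

CREATE TABLE person (name TEXT, id INTEGER)
INSERT INTO person VALUES (?, ?)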
==============================CREATE with overwrite========================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person2", false)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[mary,222,2]
[dave,42,1]

val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd2.createJDBCTable(url, "person2", true)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[fred,3]

Because allowExisting was true, createJDBCTable dropped the existing three-column person2 and recreated it with the two-column schema2.

================================CREATE then INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]

srdd2.insertIntoJDBC(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]
[fred,3]

================================CREATE then INSERT to truncate======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person4", false)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[dave,42]
[mary,222]

srdd2.insertIntoJDBC(url, "person4", true)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[fred,3]

Note the difference: insertIntoJDBC(url, table, true) truncates the existing table and keeps its schema, whereas createJDBCTable(url, table, true) drops and recreates it.

================================Incompatible INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person5", false)
srdd2.insertIntoJDBC(url, "person5", true)
java.sql.SQLException: Column count doesn't match value count at row 1
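One way to recover (a sketch, reusing the definitions above): project the three-column data down to the two-column layout of person5 before inserting, so that INSERT INTO person5 VALUES (?, ?) lines up with the table.

val projected = sqlContext.applySchema(
  srdd2.map(r => Row(r.getString(0), r.getInt(1))),  // keep name and id, drop seq
  schema2)
projected.insertIntoJDBC(url, "person5", false)
sqlContext.jdbcRDD(url, "person5").collect.foreach(println)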