Use the Spark SQL External Data Sources JDBC support to write the data of an RDD into a MySQL database.

Key APIs in jdbc.scala:

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
* If you pass `true` for `allowExisting`, it will drop any table with the
* given name; if you pass `false`, it will throw if the table already
* exists.
*/
def createJDBCTable(url: String, table: String, allowExisting: Boolean)

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* Assumes the table already exists and has a compatible schema. If you
* pass `true` for `overwrite`, it will `TRUNCATE` the table before
* performing the `INSERT`s.
*
* The table must already exist on the database. It must have a schema
* that is compatible with the schema of this RDD; inserting the rows of
* the RDD in order via the simple statement
* `INSERT INTO table VALUES (?, ?, ..., ?)` should not fail.
*/
def insertIntoJDBC(url: String, table: String, overwrite: Boolean)
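
Both methods generate plain SQL through the JDBC driver (a `CREATE TABLE` plus a batch of `INSERT INTO` statements), so the MySQL connector jar must be on the classpath. Here is a minimal, self-contained sketch of the two calls used together, mirroring the imports from the session below; the object name, the table name `people`, and the connection URL are placeholder assumptions, not values from the test run that follows:

import org.apache.spark.SparkContext
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._
import org.apache.spark.sql.jdbc._   // brings the JDBC write support into scope

object JdbcWriteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "jdbc-write-sketch")
    val sqlContext = new SQLContext(sc)
    import sqlContext._

    // Placeholder connection string; point it at your own MySQL instance.
    val url = "jdbc:mysql://localhost:3306/test?user=root&password=root"

    val schema = StructType(StructField("name", StringType) ::
                            StructField("id", IntegerType) :: Nil)
    val srdd = sqlContext.applySchema(
      sc.parallelize(Seq(Row("dave", 42), Row("mary", 222))), schema)

    // CREATE TABLE people + INSERTs; throws if "people" exists (allowExisting = false).
    srdd.createJDBCTable(url, "people", false)

    // Append into the existing table; overwrite = false leaves current rows in place.
    val more = sqlContext.applySchema(sc.parallelize(Seq(Row("fred", 3))), schema)
    more.insertIntoJDBC(url, "people", false)

    sc.stop()
  }
}

The spark-shell session below exercises the same two calls case by case.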
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)
import sqlContext._

// Prepare the test data and the target MySQL URL.
val url = "jdbc:mysql://hadoop000:3306/test?user=root&password=root"
val arr2x2 = Array[Row](Row.apply("dave", 42), Row.apply("mary", 222))
val arr1x2 = Array[Row](Row.apply("fred", 3))
val arr2x3 = Array[Row](Row.apply("dave", 42, 1), Row.apply("mary", 222, 2))
val schema2 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: Nil)
val schema3 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: StructField("seq", IntegerType) :: Nil)

import org.apache.spark.sql.jdbc._

================================CREATE======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
srdd.createJDBCTable(url, "person", false)
sqlContext.jdbcRDD(url, "person").collect.foreach(println)
[dave,42]
[mary,222]

==============================CREATE with overwrite========================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person2", false)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[mary,222,2]
[dave,42,1]

val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd2.createJDBCTable(url, "person2", true)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[fred,3]

================================CREATE then INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]

srdd2.insertIntoJDBC(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]
[fred,3]

================================CREATE then INSERT to truncate======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person4", false)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[dave,42]
[mary,222]

srdd2.insertIntoJDBC(url, "person4", true)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[fred,3]

================================Incompatible INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person5", false)
srdd2.insertIntoJDBC(url, "person5", true)

java.sql.SQLException: Column count doesn't match value count at row 1
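
The exception is expected: person5 was created from schema2 with two columns (name, id), while every row of srdd2 carries three values, so the generated `INSERT INTO person5 VALUES (?, ?, ?)` no longer matches the table. A minimal sketch of one workaround, rebuilding the rows against the target schema before inserting (my own illustration, not part of the original test):

// Drop the extra "seq" value so each Row matches person5's (name, id) columns.
val compatRows = arr2x3.map(r => Row(r(0), r(1)))
val compat = sqlContext.applySchema(sc.parallelize(compatRows), schema2)

// The generated INSERT INTO person5 VALUES (?, ?) now lines up with the table.
compat.insertIntoJDBC(url, "person5", false)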
