Write RDD data into a MySQL database using the Spark SQL External Data Sources JDBC implementation.

Key APIs in jdbc.scala:

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
* If you pass `true` for `allowExisting`, it will drop any table with the
* given name; if you pass `false`, it will throw if the table already
* exists.
*/
def createJDBCTable(url: String, table: String, allowExisting: Boolean)

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* Assumes the table already exists and has a compatible schema. If you
* pass `true` for `overwrite`, it will `TRUNCATE` the table before
* performing the `INSERT`s.
*
* The table must already exist on the database. It must have a schema
* that is compatible with the schema of this RDD; inserting the rows of
* the RDD in order via the simple statement
* `INSERT INTO table VALUES (?, ?, ..., ?)` should not fail.
*/
def insertIntoJDBC(url: String, table: String, overwrite: Boolean)
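As the comments state, createJDBCTable derives a CREATE TABLE statement from the schema of the SchemaRDD and then issues a batch of INSERTs. The snippet below is only a rough sketch of that idea (not the actual Spark source), mapping just the two column types used in this post:

// Rough sketch (not the real Spark implementation) of how a CREATE TABLE
// statement could be derived from a StructType; only IntegerType and
// StringType are mapped here.
import org.apache.spark.sql.types._

def ddlFor(table: String, schema: StructType): String = {
  val cols = schema.fields.map { f =>
    val sqlType = f.dataType match {
      case IntegerType => "INTEGER"
      case StringType  => "TEXT"
      case other       => sys.error(s"unmapped type: $other")
    }
    s"${f.name} $sqlType"
  }.mkString(", ")
  s"CREATE TABLE $table ($cols)"
}

// ddlFor("person", schema2) would yield something like:
//   CREATE TABLE person (name TEXT, id INTEGER)

The spark-shell session below also assumes the MySQL JDBC driver jar (e.g. mysql-connector-java) is already on the classpath.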
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)
import sqlContext._

// Data preparation
val url = "jdbc:mysql://hadoop000:3306/test?user=root&password=root"

val arr2x2 = Array[Row](Row.apply("dave", 42), Row.apply("mary", 222))
val arr1x2 = Array[Row](Row.apply("fred", 3))
val schema2 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: Nil)

val arr2x3 = Array[Row](Row.apply("dave", 42, 1), Row.apply("mary", 222, 2))
val schema3 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: StructField("seq", IntegerType) :: Nil)

import org.apache.spark.sql.jdbc._
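Before writing anything to MySQL, it can help to confirm that the Rows and the StructType line up. This quick check is not part of the original transcript, just an optional sanity step:

// Optional sanity check (not in the original session): pair the rows with the
// schema and inspect the result locally before any JDBC write happens.
val check = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
check.printSchema()
check.collect().foreach(println)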

================================CREATE======================================

val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
srdd.createJDBCTable(url, "person", false)

sqlContext.jdbcRDD(url, "person").collect.foreach(println)
[dave,42]
[mary,222]
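To double-check what actually landed in MySQL independently of Spark, the table can also be read back with plain JDBC from the driver. This is an extra verification step, not from the original post; the host, database and credentials simply mirror the `url` defined above:

import java.sql.DriverManager

// Direct JDBC read of the table Spark just created; assumes the MySQL driver
// jar is on the driver classpath and the connection settings match `url`.
val conn = DriverManager.getConnection("jdbc:mysql://hadoop000:3306/test", "root", "root")
try {
  val rs = conn.createStatement().executeQuery("SELECT name, id FROM person")
  while (rs.next()) {
    println(rs.getString("name") + "," + rs.getInt("id"))
  }
} finally {
  conn.close()
}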

==============================CREATE with overwrite========================================

val srdd = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person2", false)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[mary,222,2]
[dave,42,1]

val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd2.createJDBCTable(url, "person2", true)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[fred,3]

================================CREATE then INSERT to append======================================

val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]

srdd2.insertIntoJDBC(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]
[fred,3]

================================CREATE then INSERT to truncate======================================

val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person4", false)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[dave,42]
[mary,222]

srdd2.insertIntoJDBC(url, "person4", true)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[fred,3]

================================Incompatible INSERT to append======================================

val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person5", false)
srdd2.insertIntoJDBC(url, "person5", true)

java.sql.SQLException: Column count doesn't match value count at row 1
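The failure is expected: person5 was created with the two-column schema2, while srdd2 carries three columns. One possible workaround (not shown in the original post) is to project the three-column rows down to (name, id) before inserting, reusing only the data already prepared above:

// Hypothetical fix: keep only the name and id fields so the rows match
// person5's two-column schema, then append (overwrite = false).
val projected = sqlContext.applySchema(
  sc.parallelize(arr2x3.map(r => Row(r(0), r(1)))),
  schema2)
projected.insertIntoJDBC(url, "person5", false)
sqlContext.jdbcRDD(url, "person5").collect.foreach(println)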
