今天有哥们问到怎样对Spark进行单元測试。如今将Sbt的測试方法写出来,例如以下:

    对Spark的test case进行測试的时候能够用sbt的test命令:

    一、測试所有test case

     sbt/sbt test

    二、測试单个test case

     sbt/sbt "test-only *DriverSuite*" 

以下举个样例:

这个Test Case是位于$SPARK_HOME/core/src/test/scala/org/apache/spark/DriverSuite.scala 

FunSuit是scalatest里面的測试Suit。要继承它。

这里主要是一个回归測试,測试Spark程序正常结束后,Driver会不会正常退出。

注:我就拿这个样例模拟一下,測试成功和測试失败的情景,这个样例和DriverSuite的測试目的全然不一致。仅仅是演示作用。 :)

以下是正常执行退出的样例:

package org.apache.spark

import java.io.File

import org.apache.log4j.Logger
import org.apache.log4j.Level import org.scalatest.FunSuite
import org.scalatest.concurrent.Timeouts
import org.scalatest.prop.TableDrivenPropertyChecks._
import org.scalatest.time.SpanSugar._ import org.apache.spark.util.Utils import scala.language.postfixOps class DriverSuite extends FunSuite with Timeouts { test("driver should exit after finishing") {
val sparkHome = sys.env.get("SPARK_HOME").orElse(sys.props.get("spark.home")).get
// Regression test for SPARK-530: "Spark driver process doesn't exit after finishing"
val masters = Table(("master"), ("local"), ("local-cluster[2,1,512]"))
forAll(masters) { (master: String) =>
failAfter(60 seconds) {
Utils.executeAndGetOutput(
Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
new File(sparkHome),
Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
}
}
}
} /**
* Program that creates a Spark driver but doesn't call SparkContext.stop() or
* Sys.exit() after finishing.
*/
object DriverWithoutCleanup {
def main(args: Array[String]) {
Logger.getRootLogger().setLevel(Level.WARN)
val sc = new SparkContext(args(0), "DriverWithoutCleanup")
sc.parallelize(1 to 100, 4).count()
}
}

executeAndGetOutput方法接受一个command命令,调用spark-class来执行DriverWithoutCleanup类。

 /**
* Execute a command and get its output, throwing an exception if it yields a code other than 0.
*/
def executeAndGetOutput(command: Seq[String], workingDir: File = new File("."),
extraEnvironment: Map[String, String] = Map.empty): String = {
val builder = new ProcessBuilder(command: _*)
.directory(workingDir)
val environment = builder.environment()
for ((key, value) <- extraEnvironment) {
environment.put(key, value)
}
val process = builder.start() //启动一个进程来执行spark job
new Thread("read stderr for " + command(0)) {
override def run() {
for (line <- Source.fromInputStream(process.getErrorStream).getLines) {
System.err.println(line)
}
}
}.start()
val output = new StringBuffer
val stdoutThread = new Thread("read stdout for " + command(0)) { //读取spark job的输出
override def run() {
for (line <- Source.fromInputStream(process.getInputStream).getLines) {
output.append(line)
}
}
}
stdoutThread.start()
val exitCode = process.waitFor()
stdoutThread.join() // Wait for it to finish reading output
if (exitCode != 0) {
throw new SparkException("Process " + command + " exited with code " + exitCode)
}
output.toString //返回spark job的输出
}

执行第二个命令能够看到执行结果:

sbt/sbt "test-only *DriverSuite*" 

执行结果:    

[info] Compiling 1 Scala source to /app/hadoop/spark-1.0.1/core/target/scala-2.10/test-classes...
[info] DriverSuite: //执行DriverSuit这个TestSuit
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:20:15 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+. Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath 14/08/14 18:20:15 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:20:15 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:20:19 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+. Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath 14/08/14 18:20:19 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:20:19 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Spark assembly has been built with Hive, including Datanucleus jars on classpath
[info] - driver should exit after finishing
[info] ScalaTest
[info] Run completed in 12 seconds, 586 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 76 s, completed Aug 14, 2014 6:20:26 PM

測试通过, Total 1, Failed 0, Errors 0。 Passed 1。

这里假设我们略微将test case 改改。让spark job抛异常,那么这个,这样test case 就会failed掉。例如以下:

object DriverWithoutCleanup {
def main(args: Array[String]) {
Logger.getRootLogger().setLevel(Level.WARN)
val sc = new SparkContext(args(0), "DriverWithoutCleanup")
sc.parallelize(1 to 100, 4).count()
throw new RuntimeException("OopsOutOfMemory, haha, not real OOM, don't worry!") //加入此行
}

那么。再次执行測试:

会发现错误

 [info] DriverSuite:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:40:07 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+. Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath 14/08/14 18:40:07 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:40:07 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Exception in thread "main" java.lang.RuntimeException: OopsOutOfMemory, haha, not real OOM, don't worry! //自己定义抛异常使spark job执行失败,打印出了异常堆栈,測试用例失败
at org.apache.spark.DriverWithoutCleanup$.main(DriverSuite.scala:60)
at org.apache.spark.DriverWithoutCleanup.main(DriverSuite.scala)
[info] - driver should exit after finishing *** FAILED ***
[info] SparkException was thrown during property evaluation. (DriverSuite.scala:40)
[info] Message: Process List(./bin/spark-class, org.apache.spark.DriverWithoutCleanup, local) exited with code 1
[info] Occurred at table row 0 (zero based, not counting headings), which had values (
[info] master = local
[info] )
[info] ScalaTest
[info] Run completed in 4 seconds, 765 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] org.apache.spark.DriverSuite
[error] (core/test:testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 14 s, completed Aug 14, 2014 6:40:10 PM

能够看到TEST FAILED。

  三、 总结:

  本文主要解说了,怎样执行spark的測试用例,执行所有test case,和执行单个test case的命令,并通过一个样例解说其执行正常和失败的具体情景,具体细节还须要继续摸索。

假设想做contributor,这一关必须过了。

——EOF——

原创文章,转载请注明,出自http://blog.csdn.net/oopsoom/article/details/38555173

Run Test Case on Spark的更多相关文章

  1. Spark术语

    1.resilient distributed dataset (RDD) The core programming abstraction in Spark, consisting of a fau ...

  2. Spark技术内幕: Shuffle详解(三)

    前两篇文章写了Shuffle Read的一些实现细节.但是要想彻底理清楚这里边的实现逻辑,还是需要更多篇幅的:本篇开始,将按照Job的执行顺序,来讲解Shuffle.即,结果数据(ShuffleMap ...

  3. APACHE SPARK 2.0 API IMPROVEMENTS: RDD, DATAFRAME, DATASET AND SQL

    What’s New, What’s Changed and How to get Started. Are you ready for Apache Spark 2.0? If you are ju ...

  4. How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache Kafka

    Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the perf ...

  5. Building Lambda Architecture with Spark Streaming

    The versatility of Apache Spark’s API for both batch/ETL and streaming workloads brings the promise ...

  6. 在Java Web中使用Spark MLlib训练的模型

    PMML是一种通用的配置文件,只要遵循标准的配置文件,就可以在Spark中训练机器学习模型,然后再web接口端去使用.目前应用最广的就是基于Jpmml来加载模型在javaweb中应用,这样就可以实现跨 ...

  7. Spark记录-官网学习配置篇(一)

    参考http://spark.apache.org/docs/latest/configuration.html Spark提供三个位置来配置系统: Spark属性控制大多数应用程序参数,可以使用Sp ...

  8. Spark Structured Stream 2

    ❤Limitations of DStream API Batch Time Constraint application级别的设置. 不支持EventTime event time 比process ...

  9. spark属性

    应用属性 属性名 缺省值 意义 spark.app.name (none) The name of your application. This will appear in the UI and i ...

随机推荐

  1. Module ngx_http_v2_module

    官方配置说明: http://nginx.org/en/docs/http/ngx_http_v2_module.html#example ngx_http_v2_module模块指令中文说明 ngx ...

  2. GO -- 正则表达式

    str := "880218end" match, _ := regexp.MatchString("\\d{16}", str) //六位连续的数字 fmt. ...

  3. ios成长之每日一遍(day 7)

    今天到UITabBarController 结合 UIPickView, 这里一共有5个实现, 由浅到易. 其实在IB上面使用UITabBarController很简单, 就像平常拖控件一样拖到界面上 ...

  4. SQLite中的增删改查

    虽然android提供了sql查询的封装方法,但是理解起来还是麻烦,所以我这里用sql语句来完成工作. 首先是建立一个类,继承SQLiteOpenHelper 这里面会建立一个数据库,并且初始化一个表 ...

  5. TextView字体,行距,html格式,超链接,最大长度的设定

    颜色,大小 <!-- 设置字体的大小,推荐用sp做单位:字体颜色以#开头 --> <TextView android:id="@+id/textView1" an ...

  6. 微信Access Token 缓存方法

    微信Access Token默认缓存是2小时,但是需要特别强调,微信服务号和微信企业号缓存并不相同. (1)微信公众号号:每次Http请求Access Token 系统会返回不同的Token,并附带超 ...

  7. [转载][HASS.IO] 【HASSOS安装】成功安装HASSOS 1.9(避开了大部分坑版)

    7月20日HA官方放出HASSOS说明时,我开始入坑HASSOS,经历了安装没流量.打开主页:8123没显示.HASS.IO边栏不显示.安装不了HASS.IO插件等问题之后,在8月6日总算避开了大坑进 ...

  8. 线程 Thread Runnable 守护线程 join MD

    Markdown版本笔记 我的GitHub首页 我的博客 我的微信 我的邮箱 MyAndroidBlogs baiqiantao baiqiantao bqt20094 baiqiantao@sina ...

  9. java.sql.SQLException: ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB

    总是报:ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB 是由于这个a.progressAndPlan字段clob字段. 第一种解决方法: a.progressAndPlan 改 ...

  10. 大数据开发实战:Hive优化实战2-大表join小表优化

    4.大表join小表优化 和join相关的优化主要分为mapjoin可以解决的优化(即大表join小表)和mapjoin无法解决的优化(即大表join大表),前者相对容易解决,后者较难,比较麻烦. 首 ...