Spark job succeeds in local mode but throws NullPointerException on the cluster
A Spark job I wrote long ago had only ever run in local mode. Recently I started working with this kind of data again, so I packaged the job and submitted it to the cluster, where it failed repeatedly with NullPointerExceptions. At first I assumed the code was calling a method on null somewhere, but after ruling that out I found the real cause: the driver object extended scala.App. With App, vals defined in the object body are initialized through delayedInit, which only runs as part of main() on the driver; when an executor JVM loads the object, that initialization never runs, so those fields are null inside any closure that references them. In local mode driver and executors share one JVM, which is why the job used to work.
The problem itself is not hard. In fact we already knew that a Spark application should define a main method instead of extending App (the official docs warn that subclasses of scala.App may not work correctly), and we had read the relevant part of the source. The error is just easy to mistake for a bug in one's own code. It looks like this (a minimal sketch reproducing the pattern follows the stack trace):
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, node, executor 1): java.lang.NullPointerException
at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:132)
at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:128)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1353)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.take(RDD.scala:1326)
at org.apache.spark.ml.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:112)
at org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:105)
at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:94)
at org.apache.spark.mllib.tree.RandomForest$.trainClassifier(RandomForest.scala:129)
at org.apache.spark.mllib.tree.RandomForest$.trainClassifier(RandomForest.scala:171)
at com.daxin.stat.har.OffLineTrainModel$.delayedEndpoint$com$daxin$stat$har$OffLineTrainModel$1(OffLineTrainModel.scala:145)
at com.daxin.stat.har.OffLineTrainModel$delayedInit$body.apply(OffLineTrainModel.scala:17)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.daxin.stat.har.OffLineTrainModel$.main(OffLineTrainModel.scala:17)
at com.daxin.stat.har.OffLineTrainModel.main(OffLineTrainModel.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:132)
at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:128)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
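The telltale frames are scala.App$$anonfun$main$1.apply(App.scala:76) and OffLineTrainModel$delayedInit$body: the job body is being run through App's delayedInit machinery rather than an ordinary main method. Below is a minimal sketch of the failing pattern and the standard fix; the class, field, and path names are hypothetical, not taken from the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Failing pattern: extending scala.App.
// Vals in the object body are assigned by delayedInit, which only runs
// inside main() on the driver. An executor JVM re-creates the object
// without running that body, so `separator` is null there and every
// task that touches it throws a NullPointerException.
object BadJob extends App { // hypothetical job, for illustration
  val separator = ","       // assigned on the driver only
  val sc = new SparkContext(new SparkConf().setAppName("BadJob"))
  sc.textFile("hdfs:///tmp/input") // hypothetical path
    .map(_.split(separator))       // NPE here when run on a cluster
    .count()
}

// Fix: define an explicit main method. Locals are captured by value
// when the closure is serialized, so they arrive intact on executors.
object GoodJob {
  def main(args: Array[String]): Unit = {
    val separator = "," // a plain local, shipped with the closure
    val sc = new SparkContext(new SparkConf().setAppName("GoodJob"))
    sc.textFile("hdfs:///tmp/input")
      .map(_.split(separator))
      .count()
    sc.stop()
  }
}
```

If extending App cannot be avoided, copying each field into a local val just before the closure also works, because the closure then captures the local's value instead of reading the object's (possibly null) field on the executor.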