This is a Spark job I wrote a long time ago, back when it ran in local mode. Recently I needed to process this kind of data again, so I packaged the job and submitted it to the cluster, where it kept dying with NullPointerExceptions. At first I assumed there was a null dereference somewhere in my own code, but after ruling that out, the culprit turned out to be that the object extends App: App defers the object's body to delayedInit, so on the executors the vals defined in that body are never initialized and are still null.

The problem itself is not hard. In fact we already knew that an application submitted to a cluster should not extend App, and had even read that part of the source; the trap is that the error looks exactly like a bug in your own program. The error is as follows:

    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, node, executor 1): java.lang.NullPointerException
    at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:132)
    at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:128)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
    at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1353)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1326)
    at org.apache.spark.ml.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:112)
    at org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:105)
    at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:94)
    at org.apache.spark.mllib.tree.RandomForest$.trainClassifier(RandomForest.scala:129)
    at org.apache.spark.mllib.tree.RandomForest$.trainClassifier(RandomForest.scala:171)
    at com.daxin.stat.har.OffLineTrainModel$.delayedEndpoint$com$daxin$stat$har$OffLineTrainModel$1(OffLineTrainModel.scala:145)
    at com.daxin.stat.har.OffLineTrainModel$delayedInit$body.apply(OffLineTrainModel.scala:17)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.daxin.stat.har.OffLineTrainModel$.main(OffLineTrainModel.scala:17)
    at com.daxin.stat.har.OffLineTrainModel.main(OffLineTrainModel.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.NullPointerException
    at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:132)
    at com.daxin.stat.har.OffLineTrainModel$$anonfun$2.apply(OffLineTrainModel.scala:128)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
    at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1353)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
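The driver-side frames give the cause away: delayedEndpoint$..., OffLineTrainModel$delayedInit$body and scala.App$class.main show that the object's body is compiled into App's delayedInit, which executes exactly once, when main is invoked in the driver JVM. An executor JVM loads the same object only to run the shipped closure; delayedInit never runs there, so every val defined in the object body still holds its JVM default, null. In local mode driver and executors share one JVM in which main has already run, which is why the job used to work. Below is a minimal sketch of the broken shape; BrokenJob, sep and the input path are illustrative stand-ins, not the real OffLineTrainModel code:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical minimal reproduction of the failure mode.
    object BrokenJob extends App {
      // Compiled into delayedInit: assigned only when main() runs on the driver.
      // Executor JVMs load BrokenJob without ever calling main(), so there this
      // field keeps its default value, null.
      val sep = ","

      val sc = new SparkContext(new SparkConf().setAppName("broken"))
      sc.textFile("hdfs:///data/input")     // made-up path
        .map(line => line.split(sep))       // NPE on the cluster: sep is null here
        .count()
    }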

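The fix is simply to drop extends App and define an explicit main method; the Spark quick-start guide itself warns that applications should define a main() method instead of extending scala.App, since subclasses of scala.App may not work correctly. The same sketch, corrected (names and path are again hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object FixedJob {
      def main(args: Array[String]): Unit = {
        // A local val of main is captured into the closure and serialized to
        // the executors along with it, so it arrives fully initialized.
        val sep = ","

        val sc = new SparkContext(new SparkConf().setAppName("fixed"))
        val n = sc.textFile("hdfs:///data/input")
          .map(line => line.split(sep))
          .count()
        println(n)
        sc.stop()
      }
    }

With an explicit main, the object no longer carries any state whose initialization depends on delayedInit, and the job behaves identically in local mode and on the cluster.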