Spark on YARN troubleshooting notes
Problem 1:
18/03/15 07:59:23 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1521099425266_0002 failed 2 times due to AM Container for appattempt_1521099425266_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://spark1:8088/proxy/application_1521099425266_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1521099425266_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
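This problem is usually memory-related: increase the memory allocated to the job.
Then disable the NodeManager's physical and virtual memory checks in yarn-site.xml: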
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
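As a rough guide to how much memory "increase the memory" involves, the sketch below estimates the container size YARN has to grant for one Spark driver or executor JVM. It assumes the Spark 2.x default memory overhead of max(384 MB, 10% of the JVM memory); the function name and parameters are hypothetical, and the default should be checked against the Spark version in use.

```python
def yarn_container_mb(jvm_memory_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Estimate the memory (MB) requested from YARN for one Spark JVM.

    Assumption: overhead = max(min_overhead_mb, overhead_factor * jvm_memory_mb),
    which mirrors the documented Spark 2.x default for memoryOverhead.
    """
    overhead_mb = max(min_overhead_mb, int(jvm_memory_mb * overhead_factor))
    return jvm_memory_mb + overhead_mb


if __name__ == "__main__":
    # Container size for a 1 GB JVM and for an 8 GB JVM.
    for memory_mb in (1024, 8192):
        print(memory_mb, "->", yarn_container_mb(memory_mb), "MB")
```

The result has to fit within yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb, otherwise the container cannot be scheduled. YARN may also round the request up to its allocation increment.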
Problem 2:
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.kfk
start time: 1521115132862
final status: FAILED
tracking URL: http://spark1:8088/cluster/app/application_1521099425266_0002
user: kfk
Exception in thread "main" org.apache.spark.SparkException: Application application_1521099425266_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/03/15 07:59:23 INFO util.ShutdownHookManager: Shutdown hook called
18/03/15 07:59:23 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-edf48e42-1bda-41b6-8a1b-7f9e176da728
This problem is usually caused by cluster configuration. Check the JDK installation and the YARN configuration files.
Problem 3:
diagnostics: Application application_1521099425266_0004 failed 2 times due to Error launching appattempt_1521099425266_0004_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1521213771615 found 1521138303131
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
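Synchronizing the cluster clocks fixes this. My cluster's clocks had in fact always been synchronized, but for some unknown reason the third node's clock suddenly went wrong, and its JDK version was wrong as well.
Problem 4: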
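To see how far apart the clocks are, the two millisecond timestamps in the "This token is expired" message can be compared directly. This is a minimal sketch: the function name is hypothetical, and it assumes the message keeps the "current time is ... found ..." wording shown in the log above.

```python
import re


def token_time_gap_ms(diagnostics):
    """Return (current time - found time) in ms, or None if the message is absent."""
    match = re.search(r"current time is (\d+) found (\d+)", diagnostics)
    if match is None:
        return None
    current_ms, found_ms = (int(group) for group in match.groups())
    return current_ms - found_ms


if __name__ == "__main__":
    message = ("This token is expired. "
               "current time is 1521213771615 found 1521138303131")
    gap_ms = token_time_gap_ms(message)
    print(gap_ms, "ms, about", round(gap_ms / 3_600_000, 2), "hours")
```

For the log above the gap is 75468484 ms, roughly 21 hours, which is far beyond normal token lifetimes and points to a node with a wrong system clock rather than a slow launch.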
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
2018-03-16 11:59:29,345 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1521214648009_0003 State change from FINAL_SAVING to FAILED
2018-03-16 11:59:29,346 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1521214648009_0003 failed 2 times due to AM Container for appattempt_1521214648009_0003_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://spark2:8088/proxy/application_1521214648009_0003/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1521214648009_0003_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) Container exited with a non-zero exit code 15
Failing this attempt. Failing the application. APPID=application_1521214648009_0003
2018-03-16 11:59:29,346 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1521214648009_0003,name=com.spark.test.MyScalaWordCout,user=kfk,queue=root.kfk,state=FAILED,trackingUrl=http://spark2:8088/cluster/app/application_1521214648009_0003,appMasterHost=N/A,startTime=1521215923660,finishTime=1521215968592,finalStatus=FAILED
2018-03-16 11:59:30,164 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2018-03-16 12:00:15,892 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6667ms for sessionid 0x3622d0b65080001, closing socket connection and attempting reconnect
2018-03-16 12:00:15,996 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2018-03-16 12:00:15,996 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected
2018-03-16 12:00:16,123 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server spark1/192.168.208.151:2181. Will not attempt to authenticate using SASL (unknown error)
2018-03-16 12:00:17,199 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6670ms for sessionid 0x1622882ae9c0001, closing socket connection and attempting reconnect
2018-03-16 12:00:17,301 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2018-03-16 12:00:17,838 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server spark3/192.168.208.153:2181. Will not attempt to authenticate using SASL (unknown error)
2018-03-16 12:00:18,838 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.208.152:35089, server: spark3/192.168.208.153:2181
2018-03-16 12:00:18,843 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server spark3/192.168.208.153:2181, sessionid = 0x1622882ae9c0001, negotiated timeout = 10000
2018-03-16 12:00:18,844 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2018-03-16 12:00:18,858 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0272731203726d32
2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.
2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/rs/ActiveBreadCrumb to indicate that the local node is the most recent active...
2018-03-16 12:00:19,127 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.208.152:50168, server: spark1/192.168.208.151:2181
2018-03-16 12:00:21,384 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server spark1/192.168.208.151:2181, sessionid = 0x3622d0b65080001, negotiated timeout = 10000
2018-03-16 12:00:21,386 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/yarn-site.xml
2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
2018-03-16 12:00:21,406 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,407 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in active state
2018-03-16 12:00:21,407 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshQueues TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,408 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/yarn-site.xml
2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to
2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2018-03-16 12:00:21,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,432 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/core-site.xml
2018-03-16 12:00:21,432 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/yarn-site.xml
2018-03-16 12:00:21,450 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshSuperUserGroupsConfiguration TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,450 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/core-site.xml
2018-03-16 12:00:21,451 INFO org.apache.hadoop.security.Groups: clearing userToGroupsMap cache
2018-03-16 12:00:21,451 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshUserToGroupsMappings TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,451 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=SUCCESS
Problems like this usually cannot be diagnosed from the surface error. The detailed cause can be found in the YARN logs.
Problem 5:
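One way to get at those logs is the `yarn logs -applicationId <id>` command, which prints the aggregated container logs for an application. The sketch below pulls the application ID out of a log line and builds that command. The helper name is hypothetical, and it assumes log aggregation is enabled on the cluster, since otherwise the logs stay on the NodeManager hosts.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def yarn_logs_command(log_text):
    """Build the `yarn logs` command for the first application ID found in log_text."""
    match = APP_ID_PATTERN.search(log_text)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(0)]


if __name__ == "__main__":
    line = ("RMAppImpl: application_1521214648009_0003 "
            "State change from FINAL_SAVING to FAILED")
    print(" ".join(yarn_logs_command(line)))
```

The returned list can be passed to `subprocess.run` on a machine where the `yarn` client is installed and configured.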
Exception in thread "main" org.apache.spark.SparkException: Application application_1521293577934_0006 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
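This is only the surface error. To find the real error, locate the failed job in the resource scheduler's application list and click into it: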
Diagnostics: User class threw exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ns/opt/datas/stu2.txt
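A missing input path like this can be caught before submitting the job. The sketch below extracts the path from the diagnostics text and builds an `hdfs dfs -test -e <path>` command, which exits with status 0 when the path exists. Both helper names are hypothetical, and actually running the command assumes the `hdfs` client is on the PATH.

```python
import re


def missing_input_path(diagnostics):
    """Extract the path from an 'Input path does not exist' message, or None."""
    match = re.search(r"Input path does not exist: (\S+)", diagnostics)
    return match.group(1) if match else None


def hdfs_exists_command(path):
    """Build the command that checks whether a path exists in HDFS."""
    return ["hdfs", "dfs", "-test", "-e", path]


if __name__ == "__main__":
    diagnostics = ("User class threw exception: "
                   "org.apache.hadoop.mapred.InvalidInputException: "
                   "Input path does not exist: hdfs://ns/opt/datas/stu2.txt")
    path = missing_input_path(diagnostics)
    print(" ".join(hdfs_exists_command(path)))
```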