OpLogMergeMessage-OutOfMemoryError-JavaHeapSpace
direct memory size
netty or oplog
5.5kw * 20
60G worker/ 26G MaxDirectMemorySize
1/2 tasks per worker both error
some tasks can work well
because of memory and multithreads pattern caused by resource scrambling
gc-log:
2018-11-09T14:10:47.973+0800: 7393.560: [CMS-concurrent-sweep: 0.241/0.241 secs] [Times: user=0.48 sys=0.00, real=0.24 secs]
2018-11-09T14:10:47.973+0800: 7393.560: [CMS-concurrent-reset-start]
2018-11-09T14:10:48.038+0800: 7393.625: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.13 sys=0.00, real=0.07 secs]
2018-11-09T14:10:50.038+0800: 7395.625: [GC (CMS Initial Mark) [1 CMS-initial-mark: 25626226K(26204160K)] 39382689K(40762048K), 0.0139416 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2018-11-09T14:10:50.052+0800: 7395.639: [CMS-concurrent-mark-start]
2018-11-09T14:10:50.427+0800: 7396.014: [CMS-concurrent-mark: 0.375/0.375 secs] [Times: user=2.59 sys=0.02, real=0.37 secs]
2018-11-09T14:10:50.427+0800: 7396.014: [CMS-concurrent-preclean-start]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-preclean: 0.030/0.030 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-abortable-preclean-start]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-11-09T14:10:50.458+0800: 7396.045: [GC (CMS Final Remark) [YG occupancy: 13756466 K (14557888 K)]2018-11-09T14:10:50.458+0800: 7396.045: [GC (CMS Final Remark) 2018-11-09T14:10:50.458+0800: 7396.045: [ParNew: 13756466K->13756466K(14557888K), 0.0000233 secs] 39382693K->39382693K(40762048K), 0.0000914 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-11-09T14:10:50.458+0800: 7396.045: [Rescan (parallel) , 0.0138796 secs]2018-11-09T14:10:50.472+0800: 7396.059: [weak refs processing, 0.0000406 secs]2018-11-09T14:10:50.472+0800: 7396.059: [class unloading, 0.0087389 secs]2018-11-09T14:10:50.481+0800: 7396.068: [scrub symbol table, 0.0055956 secs]2018-11-09T14:10:50.487+0800: 7396.074: [scrub string table, 0.0005615 secs][1 CMS-remark: 25626226K(26204160K)] 39382693K(40762048K), 0.0290641 secs] [Times: user=0.30 sys=0.00, real=0.02 secs]
2018-11-09T14:10:50.488+0800: 7396.075: [CMS-concurrent-sweep-start]
2018-11-09T14:10:50.729+0800: 7396.316: [CMS-concurrent-sweep: 0.241/0.241 secs] [Times: user=0.48 sys=0.00, real=0.24 secs]
2018-11-09T14:10:50.729+0800: 7396.316: [CMS-concurrent-reset-start]
2018-11-09T14:10:50.794+0800: 7396.381: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.13 sys=0.00, real=0.06 secs]
2018-11-09T14:10:51.734+0800: 7397.321: [GC (Allocation Failure) 2018-11-09T14:10:51.734+0800: 7397.321: [ParNew: 14280769K->14280769K(14557888K), 0.0000297 secs]2018-11-09T14:10:51.734+0800: 7397.321: [CMS: 25626226K->25626226K(26204160K), 8.7144181 secs] 39906995K->39782608K(40762048K), [Metaspace: 37753K->37753K(38912K)], 8.7146944 secs] [Times: user=8.72 sys=0.00, real=8.72 secs]
2018-11-09T14:11:00.449+0800: 7406.036: [Full GC (Allocation Failure) 2018-11-09T14:11:00.449+0800: 7406.036: [CMS: 25626226K->25626196K(26204160K), 6.1291271 secs] 39782608K->39782578K(40762048K), [Metaspace: 37753K->37753K(38912K)], 6.1292957 secs] [Times: user=6.13 sys=0.00, real=6.13 secs]
2018-11-09T14:11:06.579+0800: 7412.166: [GC (CMS Initial Mark) [1 CMS-initial-mark: 25626196K(26204160K)] 39782578K(40762048K), 0.0017634 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2018-11-09T14:11:06.581+0800: 7412.168: [CMS-concurrent-mark-start]
2018-11-09T14:11:06.840+0800: 7412.427: [Full GC (Allocation Failure) 2018-11-09T14:11:06.840+0800: 7412.427: [CMS2018-11-09T14:11:07.867+0800: 7413.454: [CMS-concurrent-mark: 1.033/1.286 secs] [Times: user=5.11 sys=0.61, real=1.28 secs]
(concurrent mode failure): 26150484K->26150474K(26204160K), 7.8489326 secs] 40314100K->39782414K(40762048K), [Metaspace: 37784K->37784K(38912K)], 7.8491778 secs] [Times: user=11.81 sys=0.39, real=7.85 secs]
2018-11-09T14:11:14.690+0800: 7420.277: [Full GC (Allocation Failure) 2018-11-09T14:11:14.690+0800: 7420.277: [CMS: 26150474K->26150474K(26204160K), 1.2736921 secs] 39782414K->39782404K(40762048K), [Metaspace: 37784K->37784K(38912K)], 1.2738487 secs] [Times: user=1.28 sys=0.00, real=1.27 secs]
- stdout
2018-11-09 14:09:01,703 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: for one batch with 400036 in 67002 ms..
2018-11-09 14:09:01,703 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: for one batch with stepRead is 0 ..
2018-11-09 14:09:05,694 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: dataBlock read finished with 41660237 ..
2018-11-09 14:09:07,408 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: Calculate Delta for update ...
2018-11-09 14:09:11,398 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: Begin to update parameter in PS ...
2018-11-09 14:09:11,406 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMModel: Start to push w0 from PS ...
2018-11-09 14:11:06,588 FATAL [pool-5-thread-1] com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache: merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@77d861dc, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=17]] falied,
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:158)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:169)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.resize(SparseDoubleVector.java:495)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:564)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:555)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:35)
at com.tencent.angel.ml.math.matrix.RowbaseMatrix.plusBy(RowbaseMatrix.java:126)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLog.merge(MatrixOpLog.java:160)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache$Merger.run(MatrixOpLogCache.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-11-09 14:11:15,964 FATAL [pool-5-thread-2] com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache: merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@316dd3c4, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=18]] falied,
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:158)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:169)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.resize(SparseDoubleVector.java:495)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:564)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:555)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:35)
at com.tencent.angel.ml.math.matrix.RowbaseMatrix.plusBy(RowbaseMatrix.java:126)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLog.merge(MatrixOpLog.java:160)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache$Merger.run(MatrixOpLogCache.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-11-09 14:11:15,969 INFO [pool-5-thread-1] com.tencent.angel.psagent.PSAgent: psagent falied
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: worker failed message : merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@77d861dc, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=17]] falied, Java heap space, send it to appmaster success
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: start to close all modules in worker
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: stop workerService
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.WorkerService: stop rpc server
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.ipc.NettyServer: Stopping server on 20586
2018-11-09 14:11:15,985 ERROR [Worker Heartbeat] com.tencent.angel.worker.Worker: report to appmaster failed, err:
java.lang.NullPointerException
at com.tencent.angel.worker.Worker.heartbeat(Worker.java:341)
at com.tencent.angel.worker.Worker.access$200(Worker.java:65)
at com.tencent.angel.worker.Worker$1.run(Worker.java:303)
at java.lang.Thread.run(Thread.java:745)
OpLogMergeMessage-OutOfMemoryError-JavaHeapSpace的更多相关文章
- OutOfMemoryError系列(1): Java heap space
每个Java程序都只能使用一定量的内存, 这种限制是由JVM的启动参数决定的.而更复杂的情况在于, Java程序的内存分为两部分: 堆内存(Heap space)和 永久代(Permanent Gen ...
- Java常见的几种内存溢出及解决方法
Java常见的几种内存溢出及解决方法[情况一]:java.lang.OutOfMemoryError:Javaheapspace:这种是java堆内存不够,一个原因是真不够(如递归的层数太多等),另一 ...
- 【转】JVM 堆内存设置原理
堆内存设置 原理 JVM堆内存分为2块:Permanent Space 和 Heap Space. Permanent 即 持久代(Permanent Generation),主要存放的是Java类定 ...
- tomcat 启动时参数设置说明
使用Intellij idea 其发动tomcat时会配置启动vm options :-Xms128m -Xmx768m -XX:PermSize=64M -XX:MaxPermSize=512m. ...
- 巧解Tomcat中JVM内存溢出问题
你对Tomcat 的JVM内存溢出问题的解决方法是否了解,这里和大家分享一下,相信本文介绍一定会让你有所收获. tomcat 的JVM内存溢出问题的解决 最近在熟悉一个开发了有几年的项目,需要把数据库 ...
- JVM(Java虚拟机)优化大全和案例实战
堆内存设置 原理 JVM堆内存分为2块:Permanent Space 和 Heap Space. Permanent 即 持久代(Permanent Generation),主要存放的是Java类定 ...
- Tomcat性能优化及JVM内存工作原理
Java性能优化原则:代码运算性能.内存回收.应用配置(影响Java程序主要原因是垃圾回收,下面会重点介绍这方面) 代码层优化:避免过多循环嵌套.调用和复杂逻辑. Tomcat调优主要内容如下: 1. ...
- 关于JVM的垃圾回收(GC) 这可能是你想了解的
目录 1 JVM中Java对象的分类 2 JVM的GC类型及触发条件 2.1 Young GC 2.2 Full GC 3 Java对象生成时的内存申请过程 3 Oracle JDK中的垃圾收集器 3 ...
- Permanent Space 和 Heap Space
JVM堆内存 JVM堆内存分为2块:Permanent Space 和 Heap Space. Permanent 即 持久代(Permanent Generation),主要存放的是Java类定 ...
- Java------------JVM(Java虚拟机)优化大全和案例实战
JVM(Java虚拟机)优化大全和案例实战 堆内存设置 原理 JVM堆内存分为2块:Permanent Space 和 Heap Space. Permanent 即 持久代(Permanent Ge ...
随机推荐
- django上下文处理器的基本使用
1.定义一个方法 2.在django里面的settings.py里面修改配置文件 3.最后在模板里面调用 操做步骤如下: 这是在settings.py里面配置的文件 在模板里面调用上下文处理器
- laravel5.4 导出 Excel 表格
1.执行 composer require maatwebsite/excel 2. composer.json 文件出现(或者手动添加) 3.在config目录下 app.php 添加参数 4.导出 ...
- Windows2016的 IIS中配置PHP7运行环境
Windows2016的 IIS中配置PHP7运行环境 在Windows 的IIS(8.0)中搭建PHP运行环境: 一:安装IIS服务器 .进入控制面板>>程序和功能>>打开或 ...
- kafka知识点详解
第一部分:kafka概述 一.定义(消息引擎系统) 一句话概括kafka的核心功能就是:高性能的消息发送与高性能的消息消费. kafka刚推出的时候是以消息引擎的身份出现的,它具有强大的消息传输效率和 ...
- lua 调用参数报错(a userdata value)
本来想用这种方式统一安全删除node,后来发现参数变成(a userdata value),所以不能用下面这个方式做 方法如下: function RemoveNodeSafe(node) if no ...
- poj3126 Prime Path 广搜bfs
题目: The ministers of the cabinet were quite upset by the message from the Chief of Security stating ...
- python学习记录20181207
1.python中函数指针的用法 直接把函数名赋值给变量,不需要加上()和形参 如: def add(num1,num2): return num1+num2 fun = add 2.打印输出列表需要 ...
- vue2.0 事件处理常用修饰符-----------------记录,加强记忆。
1,<!-- 阻止单击事件继续传播 --> <a v-on:click.stop="doThis"></a> stop修饰符对应的是阻止冒泡的e ...
- Mac OS X系统下,svn: Can't remove file Operation not permitted.解决方案
当你的svn出现类似以下错误时,提示Operation not permitted之类的问题,说明项目下 .svn文件夹内的文件权限有问题. 一般是由于windows和mac操作系统同时操作同个svn ...
- Windows Internals 笔记——终止进程
1.进程可以通过以下四种方式终止: 主线程的入口点函数返回(强烈推荐的方式) 进程中的一个线程调用ExitProcess函数(避免这种方式) 另一个进程中的线程调用TerminateProcess函数 ...