• direct memory size

  • suspected source: Netty or the oplog

  • 5.5kw (55 million) × 20

  • 60G per worker, of which 26G is MaxDirectMemorySize

  • fails with either 1 or 2 tasks per worker

  • some tasks still run fine

  • likely cause: memory pressure combined with multiple threads competing for the same resources

  • gc-log:

2018-11-09T14:10:47.973+0800: 7393.560: [CMS-concurrent-sweep: 0.241/0.241 secs] [Times: user=0.48 sys=0.00, real=0.24 secs]
2018-11-09T14:10:47.973+0800: 7393.560: [CMS-concurrent-reset-start]
2018-11-09T14:10:48.038+0800: 7393.625: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.13 sys=0.00, real=0.07 secs]
2018-11-09T14:10:50.038+0800: 7395.625: [GC (CMS Initial Mark) [1 CMS-initial-mark: 25626226K(26204160K)] 39382689K(40762048K), 0.0139416 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2018-11-09T14:10:50.052+0800: 7395.639: [CMS-concurrent-mark-start]
2018-11-09T14:10:50.427+0800: 7396.014: [CMS-concurrent-mark: 0.375/0.375 secs] [Times: user=2.59 sys=0.02, real=0.37 secs]
2018-11-09T14:10:50.427+0800: 7396.014: [CMS-concurrent-preclean-start]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-preclean: 0.030/0.030 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-abortable-preclean-start]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-11-09T14:10:50.458+0800: 7396.045: [GC (CMS Final Remark) [YG occupancy: 13756466 K (14557888 K)]2018-11-09T14:10:50.458+0800: 7396.045: [GC (CMS Final Remark) 2018-11-09T14:10:50.458+0800: 7396.045: [ParNew: 13756466K->13756466K(14557888K), 0.0000233 secs] 39382693K->39382693K(40762048K), 0.0000914 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-11-09T14:10:50.458+0800: 7396.045: [Rescan (parallel) , 0.0138796 secs]2018-11-09T14:10:50.472+0800: 7396.059: [weak refs processing, 0.0000406 secs]2018-11-09T14:10:50.472+0800: 7396.059: [class unloading, 0.0087389 secs]2018-11-09T14:10:50.481+0800: 7396.068: [scrub symbol table, 0.0055956 secs]2018-11-09T14:10:50.487+0800: 7396.074: [scrub string table, 0.0005615 secs][1 CMS-remark: 25626226K(26204160K)] 39382693K(40762048K), 0.0290641 secs] [Times: user=0.30 sys=0.00, real=0.02 secs]
2018-11-09T14:10:50.488+0800: 7396.075: [CMS-concurrent-sweep-start]
2018-11-09T14:10:50.729+0800: 7396.316: [CMS-concurrent-sweep: 0.241/0.241 secs] [Times: user=0.48 sys=0.00, real=0.24 secs]
2018-11-09T14:10:50.729+0800: 7396.316: [CMS-concurrent-reset-start]
2018-11-09T14:10:50.794+0800: 7396.381: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.13 sys=0.00, real=0.06 secs]
2018-11-09T14:10:51.734+0800: 7397.321: [GC (Allocation Failure) 2018-11-09T14:10:51.734+0800: 7397.321: [ParNew: 14280769K->14280769K(14557888K), 0.0000297 secs]2018-11-09T14:10:51.734+0800: 7397.321: [CMS: 25626226K->25626226K(26204160K), 8.7144181 secs] 39906995K->39782608K(40762048K), [Metaspace: 37753K->37753K(38912K)], 8.7146944 secs] [Times: user=8.72 sys=0.00, real=8.72 secs]
2018-11-09T14:11:00.449+0800: 7406.036: [Full GC (Allocation Failure) 2018-11-09T14:11:00.449+0800: 7406.036: [CMS: 25626226K->25626196K(26204160K), 6.1291271 secs] 39782608K->39782578K(40762048K), [Metaspace: 37753K->37753K(38912K)], 6.1292957 secs] [Times: user=6.13 sys=0.00, real=6.13 secs]
2018-11-09T14:11:06.579+0800: 7412.166: [GC (CMS Initial Mark) [1 CMS-initial-mark: 25626196K(26204160K)] 39782578K(40762048K), 0.0017634 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2018-11-09T14:11:06.581+0800: 7412.168: [CMS-concurrent-mark-start]
2018-11-09T14:11:06.840+0800: 7412.427: [Full GC (Allocation Failure) 2018-11-09T14:11:06.840+0800: 7412.427: [CMS2018-11-09T14:11:07.867+0800: 7413.454: [CMS-concurrent-mark: 1.033/1.286 secs] [Times: user=5.11 sys=0.61, real=1.28 secs]
(concurrent mode failure): 26150484K->26150474K(26204160K), 7.8489326 secs] 40314100K->39782414K(40762048K), [Metaspace: 37784K->37784K(38912K)], 7.8491778 secs] [Times: user=11.81 sys=0.39, real=7.85 secs]
2018-11-09T14:11:14.690+0800: 7420.277: [Full GC (Allocation Failure) 2018-11-09T14:11:14.690+0800: 7420.277: [CMS: 26150474K->26150474K(26204160K), 1.2736921 secs] 39782414K->39782404K(40762048K), [Metaspace: 37784K->37784K(38912K)], 1.2738487 secs] [Times: user=1.28 sys=0.00, real=1.27 secs]
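The Full GC lines above show the CMS old generation staying almost completely full after each collection. A minimal sketch for pulling those numbers out of a log line follows. The class and method names are made up for illustration, and the regular expression only handles the plain `[CMS: before->after(capacity)` fragment, not the interleaved concurrent-mode-failure line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Angel or the JDK.
public class GcLogOccupancy {
    // Matches "[CMS: <before>K-><after>K(<capacity>K)" in a CMS full-GC line.
    private static final Pattern CMS =
        Pattern.compile("\\[CMS: (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {beforeK, afterK, capacityK}, or null if the fragment is absent. */
    static long[] parseCms(String line) {
        Matcher m = CMS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Fraction of the old generation still in use after the collection. */
    static double occupancyAfter(long[] cms) {
        return (double) cms[1] / cms[2];
    }

    /** Kilobytes freed in the old generation by the collection. */
    static long reclaimedK(long[] cms) {
        return cms[0] - cms[1];
    }

    public static void main(String[] args) {
        // Fragment copied from the 14:11:00 Full GC entry in the log above.
        String line = "[Full GC (Allocation Failure) 2018-11-09T14:11:00.449+0800: 7406.036: "
            + "[CMS: 25626226K->25626196K(26204160K), 6.1291271 secs]";
        long[] cms = parseCms(line);
        System.out.printf("old gen after GC: %.1f%% full, reclaimed %dK%n",
            occupancyAfter(cms) * 100, reclaimedK(cms));
    }
}
```

A 6-second full collection that frees only tens of kilobytes suggests the old generation holds live data rather than garbage, so the heap is simply too small for the working set.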
  • stdout:
2018-11-09 14:09:01,703 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: for one batch with 400036 in 67002 ms..
2018-11-09 14:09:01,703 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: for one batch with stepRead is 0 ..
2018-11-09 14:09:05,694 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: dataBlock read finished with 41660237 ..
2018-11-09 14:09:07,408 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: Calculate Delta for update ...
2018-11-09 14:09:11,398 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: Begin to update parameter in PS ...
2018-11-09 14:09:11,406 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMModel: Start to push w0 from PS ...
2018-11-09 14:11:06,588 FATAL [pool-5-thread-1] com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache: merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@77d861dc, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=17]] falied,
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:158)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:169)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.resize(SparseDoubleVector.java:495)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:564)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:555)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:35)
at com.tencent.angel.ml.math.matrix.RowbaseMatrix.plusBy(RowbaseMatrix.java:126)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLog.merge(MatrixOpLog.java:160)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache$Merger.run(MatrixOpLogCache.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-11-09 14:11:15,964 FATAL [pool-5-thread-2] com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache: merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@316dd3c4, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=18]] falied,
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:158)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:169)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.resize(SparseDoubleVector.java:495)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:564)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:555)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:35)
at com.tencent.angel.ml.math.matrix.RowbaseMatrix.plusBy(RowbaseMatrix.java:126)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLog.merge(MatrixOpLog.java:160)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache$Merger.run(MatrixOpLogCache.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-11-09 14:11:15,969 INFO [pool-5-thread-1] com.tencent.angel.psagent.PSAgent: psagent falied
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: worker failed message : merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@77d861dc, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=17]] falied, Java heap space, send it to appmaster success
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: start to close all modules in worker
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: stop workerService
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.WorkerService: stop rpc server
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.ipc.NettyServer: Stopping server on 20586
2018-11-09 14:11:15,985 ERROR [Worker Heartbeat] com.tencent.angel.worker.Worker: report to appmaster failed, err:
java.lang.NullPointerException
at com.tencent.angel.worker.Worker.heartbeat(Worker.java:341)
at com.tencent.angel.worker.Worker.access$200(Worker.java:65)
at com.tencent.angel.worker.Worker$1.run(Worker.java:303)
at java.lang.Thread.run(Thread.java:745)
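Both OutOfMemoryError traces fail inside the `Int2DoubleOpenHashMap` constructor called from `SparseDoubleVector.resize`. A rough estimate of what such a resize costs is sketched below. It assumes the map stores one `int[]` and one `double[]` of power-of-two length with a load factor of 0.75 (the fastutil default), and that the old map stays reachable while the new one is being filled. The class name and the entry count of 55 million are illustrative only, and object headers are ignored.

```java
// Hypothetical back-of-the-envelope estimate; not code from Angel or fastutil.
public class OpenHashResizeEstimate {
    /** Smallest power-of-two slot count that holds `expected` entries at `loadFactor`. */
    static long tableSize(long expected, double loadFactor) {
        long needed = (long) Math.ceil(expected / loadFactor);
        long n = Math.max(2, Long.highestOneBit(needed));
        if (n < needed) {
            n <<= 1; // round up to the next power of two
        }
        return n;
    }

    /** Bytes for the key and value arrays: 4 per int key plus 8 per double value. */
    static long tableBytes(long slots) {
        return slots * (Integer.BYTES + Double.BYTES);
    }

    public static void main(String[] args) {
        long entries = 55_000_000L;            // assumed entry count, for illustration
        long slots = tableSize(entries, 0.75);
        long current = tableBytes(slots);
        long doubled = tableBytes(slots * 2);  // table after one more doubling
        double gib = 1024.0 * 1024.0 * 1024.0;
        System.out.printf("slots=%d current=%.2f GiB peak during resize=%.2f GiB%n",
            slots, current / gib, (current + doubled) / gib);
    }
}
```

Under these assumptions a single doubling briefly needs about three times the memory of the current table. With several merger threads (`pool-5-thread-1`, `pool-5-thread-2`) resizing at once and the old generation already full, that allocation is a plausible trigger for the failure.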
