开发某个项目过程中,就需求,搭建了一套测试环境。很快完成!

  后来代码中加入了许多新功能,会涉及到反复重启,然后就发现了启动特别慢。这给原本功能就不多的应用增添了许多的负担。

  我决定改变这一切!找到启动缓慢的根源,加快启动速度!

思路展开:

  1. 启动慢,先看日志,有没有什么异常?
  2. 看下是不是gc 有问题,是否占用很长时间?
  3. 日志卡住,是否是存在网络请求不通情况?
  4. 抓包查看卡住的时候,应用都做什么?
  5. 把线程堆栈打印出来,线程都在做什么?把内存dump 出来瞅瞅吧?
  6. 实在不行,本地debug不行,换一台测试机试试?
  7. 实在不行,远程debug吧?
  8. 定位到点后,再来思考解决好的解决方案?
  9. 除了解决这个问题,我们还能思考点什么?

让我按思路一个个来看解决过程!

1. 启动慢,先看日志,有没有什么异常?

  这个基本上很快就有答案了,看日志,等待期间没有问题!

  但是,我们往往会停留在这个阶段较长一段时间,因为我们总想通过其他方式,打印点什么出来!

  所以,我们可以会改日志级别,try... catch... 等等一顿操作,然后,才说算了吧,我们想想别的招!我个人还是觉得这种思路是ok的!

2. 看下是不是gc 有问题,是否占用很长时间?

  这里可能就会涉及,之前并没有打印gc日志,然后还得回去在启动参数里加上gc输出!姿势如下:

-Xloggc:/opt/logs/a_gc.log -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logs/ -XX:+UseG1GC -XX:MaxGCPauseMillis= -XX:+ParallelRefProcEnabled -XX:ErrorFile=/run/dump/hs_err_pid%p.lo3g -XX:HeapDumpPath=/run/dump

  打开后,你看到的gc 日志可能是这样的。 你想从里面发现点什么,好像频繁没有 FullGC 之类的操作,然后又悻悻而归!

10.224: [GC concurrent-root-region-scan-start]
10.249: [GC concurrent-root-region-scan-end, 0.0252691 secs]
10.249: [GC concurrent-mark-start]
10.250: [GC concurrent-mark-end, 0.0005479 secs]
10.250: [GC remark 10.251: [Finalize Marking, 0.0001862 secs] 10.251: [GC ref-proc, 0.0026452 secs] 10.253: [Unloading, 0.0125200 secs], 0.0156381 secs]
[Times: user=0.06 sys=0.00, real=0.02 secs]
10.266: [GC cleanup 33M->33M(1024M), 0.0012152 secs]
[Times: user=0.00 sys=0.00, real=0.00 secs]
13.768: [GC pause (G1 Evacuation Pause) (young), 0.0411037 secs]
[Parallel Time: 34.5 ms, GC Workers: ]
[GC Worker Start (ms): Min: 13768.7, Avg: 13769.8, Max: 13773.1, Diff: 4.4]
[Ext Root Scanning (ms): Min: 0.0, Avg: 1.8, Max: 4.1, Diff: 4.1, Sum: 7.3]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: , Avg: 0.0, Max: , Diff: , Sum: ]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 3.1, Max: 8.8, Diff: 8.8, Sum: 12.5]
[Object Copy (ms): Min: 21.1, Avg: 28.0, Max: 30.5, Diff: 9.4, Sum: 111.9]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
[Termination Attempts: Min: , Avg: 12.2, Max: , Diff: , Sum: ]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 29.9, Avg: 33.2, Max: 34.3, Diff: 4.4, Sum: 132.6]
[GC Worker End (ms): Min: 13803.0, Avg: 13803.0, Max: 13803.0, Diff: 0.0]
[Code Root Fixup: 0.4 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 5.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.6 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.9 ms]
[Eden: 544.0M(544.0M)->.0B(573.0M) Survivors: 31.0M->41.0M Heap: 575.0M(1024.0M)->41.0M(1024.0M)]
[Times: user=0.13 sys=0.02, real=0.04 secs]
17.178: [GC pause (G1 Evacuation Pause) (young), 0.0648366 secs]
[Parallel Time: 59.7 ms, GC Workers: ]
[GC Worker Start (ms): Min: 17177.7, Avg: 17181.1, Max: 17184.4, Diff: 6.6]
[Ext Root Scanning (ms): Min: 0.0, Avg: 3.9, Max: 12.7, Diff: 12.7, Sum: 15.6]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: , Avg: 0.0, Max: , Diff: , Sum: ]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum: 0.6]
[Code Root Scanning (ms): Min: 0.0, Avg: 6.5, Max: 19.4, Diff: 19.4, Sum: 26.0]
[Object Copy (ms): Min: 34.9, Avg: 45.5, Max: 51.2, Diff: 16.3, Sum: 182.1]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Termination Attempts: Min: , Avg: 16.5, Max: , Diff: , Sum: ]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
[GC Worker Total (ms): Min: 52.9, Avg: 56.2, Max: 59.5, Diff: 6.6, Sum: 224.8]
[GC Worker End (ms): Min: 17237.3, Avg: 17237.3, Max: 17237.3, Diff: 0.0]
[Code Root Fixup: 0.5 ms]
20.648: [GC pause (G1 Evacuation Pause) (young), 0.0653569 secs]
[Parallel Time: 55.9 ms, GC Workers: ]
[GC Worker Start (ms): Min: 20648.5, Avg: 20650.4, Max: 20652.3, Diff: 3.7]
[Ext Root Scanning (ms): Min: 0.1, Avg: 2.9, Max: 9.4, Diff: 9.4, Sum: 11.7]
[Update RS (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.5, Sum: 0.7]
[Processed Buffers: Min: , Avg: 2.5, Max: , Diff: , Sum: ]
[Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.6, Sum: 1.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 3.2, Max: 10.8, Diff: 10.8, Sum: 12.9]
[Object Copy (ms): Min: 39.4, Avg: 46.4, Max: 52.1, Diff: 12.8, Sum: 185.5]
[Termination (ms): Min: 0.0, Avg: 0.9, Max: 1.5, Diff: 1.5, Sum: 3.7]
[Termination Attempts: Min: , Avg: 1.0, Max: , Diff: , Sum: ]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 52.1, Avg: 54.0, Max: 55.8, Diff: 3.7, Sum: 215.8]
[GC Worker End (ms): Min: 20704.4, Avg: 20704.4, Max: 20704.4, Diff: 0.0]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 9.2 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 8.5 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.2 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 38.0M(.0K)->.0B(610.0M) Survivors: 43.0M->.0K Heap: 83.3M(1024.0M)->54.4M(1024.0M)]
[Times: user=0.24 sys=0.02, real=0.06 secs]
2826.037: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0101504 secs]
[Parallel Time: 7.1 ms, GC Workers: ]
[GC Worker Start (ms): Min: 2826037.4, Avg: 2826037.5, Max: 2826037.5, Diff: 0.1]
[Ext Root Scanning (ms): Min: 1.7, Avg: 2.0, Max: 2.2, Diff: 0.5, Sum: 7.8]
[Update RS (ms): Min: 0.9, Avg: 1.0, Max: 1.0, Diff: 0.1, Sum: 3.8]
[Processed Buffers: Min: , Avg: 14.2, Max: , Diff: , Sum: ]
[Scan RS (ms): Min: 0.3, Avg: 0.7, Max: 0.9, Diff: 0.6, Sum: 2.8]
[Code Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.8, Diff: 0.6, Sum: 1.5]
[Object Copy (ms): Min: 2.7, Avg: 2.9, Max: 3.2, Diff: 0.5, Sum: 11.7]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: , Avg: 1.0, Max: , Diff: , Sum: ]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 6.9, Avg: 7.0, Max: 7.0, Diff: 0.1, Sum: 27.8]
[GC Worker End (ms): Min: 2826044.4, Avg: 2826044.4, Max: 2826044.4, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 2.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.6 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.0 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.5 ms]
[Eden: 354.0M(438.0M)->.0B(603.0M) Survivors: .0K->11.0M Heap: 408.3M(1024.0M)->61.4M(1024.0M)]
[Times: user=0.03 sys=0.00, real=0.01 secs]
2826.047: [GC concurrent-root-region-scan-start]
2826.054: [GC concurrent-root-region-scan-end, 0.0066742 secs]
2826.054: [GC concurrent-mark-start]
2826.116: [GC concurrent-mark-end, 0.0616457 secs]
2826.116: [GC remark 2826.116: [Finalize Marking, 0.0002202 secs] 2826.116: [GC ref-proc, 0.0031923 secs] 2826.120: [Unloading, 0.0131458 secs], 0.0169584 secs]
[Times: user=0.05 sys=0.00, real=0.01 secs]
2826.134: [GC cleanup 62M->62M(1024M), 0.0010641 secs]
[Times: user=0.00 sys=0.00, real=0.00 secs]

  不管怎么样,你知道内存不是问题了。不过一般地,在启动时就存在内存问题的应用可能确实不多!

3. 日志卡住,是否是存在网络请求不通情况?

  一般来说,我们的应用,每做一些关键操作时,都会有相应的日志输出。所以,当你日志卡住的时候,应该就是哪里出现了问题了!

  而在排除了中间出现full gc 卡顿的问题后,我们可以认为可能是网络出现问题了。

  查询网络问题必备的工具: tcpdump, lsof ...

  用法也很简单,我们可以在完全没有任何信息的前提下,进行网络抓包,从而给排查问题一点提示:

tcpdump -iany -XX     # 这样就可以看到应用的网络io情况了, 如果量太大可以先把 -XX 选项去除,只看来往情况:

  大概结果如下:

::25.770463 IP 10.5.3.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.770478 IP zt-d-xx.ssh > 10.5.3.215.: Flags [P.], seq :, ack , win , length
::25.770482 IP 10.1.2.215. > zt-d-xx.ssh: Flags [.], ack , win , options [nop,nop,sack {:}], length
::25.770487 IP 10.1.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.770554 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.770579 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.770612 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.771813 IP 10.1.2.215. > zt-d-xx.ssh: Flags [.], ack , win , options [nop,nop,sack {:}], length
::25.771822 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.771837 IP 10.1.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.771840 IP 10.1.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.771868 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.771884 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.771891 IP 10.1.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.771934 IP zt-d-xx.ssh > 10.1.2.215.: Flags [P.], seq :, ack , win , length
::25.771959 IP zt-d-xx.ssh > 10.5.2.215.: Flags [P.], seq :, ack , win , length
::25.771982 IP zt-d-xx.ssh > 10.5.2.215.: Flags [P.], seq :, ack , win , length
::25.779454 IP 10.5.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.779469 IP zt-d-xx.ssh > 10.5.2.215.: Flags [P.], seq :, ack , win , length
::25.779474 IP 10.5.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.779476 IP 10.5.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.779478 IP 10.5.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.779480 IP 10.5.2.215. > zt-d-xx.ssh: Flags [.], ack , win , length
::25.779529 IP zt-d-xx.ssh > 10.5.3.215.: Flags [P.], seq :, ack , win , length
::25.779567 IP zt-d-xx.ssh > 10.5.3.215.: Flags [P.], seq :, ack , win , length

  其中,我们主要关注 Flags 参数. man tcpdump 后可以看到解释:

  Tcpflags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), U (URG), W (ECN CWR), E (ECN-Echo) or '.' (ACK)

  详细说明就是:

    # SYN 表示建立连接;
    # FIN 表示关闭连接;
    # ACK 表示响应;
    # PSH 表示有 DATA数据传输;
    # RST 表示连接重置;
    # URG 表示紧急停止位;

  所以,我们在这里需要关注这么多吗? 其实不需要的,我觉得咱们可以先看下是否有 RST 连接重置的情况,如果存在,则说明这个网络是不通的,基本定位网络问题!~ 如下抓包:

::41.635937 IP zt-d-xx. > 172.1.0.17.http: Flags [S], seq , win , options [mss ,sackOK,TS val  ecr ,nop,wscale ], length
::41.636084 IP 172.1.0.17.http > zt-d-xx.: Flags [R.], seq , ack , win , length

  另外,要查看下是不是某个网络请求 只有 [S] 信号,没有 [P] 的信号, 则说明只有一端在建立连接,另一边则无响应,网络问题定位!如下请求一个不存在的地址:

::57.744349 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val  ecr ,nop,wscale ], length
::58.745584 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val ecr ,nop,wscale ], length
::00.749570 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val ecr ,nop,wscale ], length
::04.757571 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val ecr ,nop,wscale ], length
::12.765568 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val ecr ,nop,wscale ], length
::28.797584 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val ecr ,nop,wscale ], length
::00.861591 IP zt-d-xx. > ---.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq , win , options [mss ,sackOK,TS val ecr ,nop,wscale ], length

  发了n个 [S] 包,都没有响应; 说明整个网络都是不通的;

  通过这个排查,我们一般可以排除网络问题!

4. 抓包查看卡住的时候,应用都做什么?

  这个点其实可以早点做,使用 top 查看内存,cpu 占用情况!

  使用 vmstat 查看系统虚拟内存状态,也可以报告关于进程、内存、I/O等系统整体运行状态!

  使用 lsof -p <pid> 查看打开的文件情况!应该可以看到一些不一样的东西,比如你会看到一些你不知道被打开的文件,你会看到一些你不知道的连接请求!

lsof -i   # 列出所有的网络连接
lsof -p <pid> # 列出进程对应的文件信息
lsof -i4: # 列表8080 端口对应的 ipv4 的连接

5. 把线程堆栈打印出来,线程都在做什么?把内存dump 出来瞅瞅吧?

  如果还是没有发现什么,那我们还得回来。看看线程卡在了哪里!

  使用 jstack 把线程堆栈打印出来,使用 jmap 把内存dump 出来,分析!

jstack <pid> | less        # 查看线程信息,是否存在死锁
jmap -dump:format=b,file=/tmp/dumpid.dump <pid> # 查看堆信息,是否需要特别的对象

6. 实在不行,本地debug不行,换一台测试机试试?

  对于本地debug, 我只能说,一般环境都不通的,而且本地一般也很复现场景!

  换台机器测试的目的,其实是想确认问题是否在所有机器上复现,因为如果是和机器本身相关的问题,往往是很难排查的。(我说的没有相关经验的同学)

  所以,找一台另外机器,代码跑起来,如果其他地方正常,则该问题是机器相关的。

  那么机器相关的bug又该怎么办呢?其实,还得具体问题,具体分析!

  不过幸好,我们还有 remote debug 功能,直接连接上去就可以调试了。问题自然也就清晰起来!

7. 实在不行,远程debug吧?

  remote debug 请查看上一篇文章: 排障利器之远程调试与监控 --jmx & remote debug

  连接上之后,先大概猜测可能会出问题的地方,进行断点。判断依据,应该是之前的排查结果,线程情况等等!

  进行大概估计后,在可能的地方进行单步,卡住的地方,其中必然出现了问题。只需再进入内部重复刚才的做法即可!

  优点是这样调试下来,必定能找到问题出现的地方。缺点是:可能需要反复重现问题(可能就是反复重启)。但是这样显得没有技术含量,还有就是在问题难以复现的场景,很难抓住机会解决问题。

  所以,经验真的很重要!

  卡住时的堆栈如下:

"main@1" prio= tid=0x1 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(FileInputStream.java:-)
at java.io.FileInputStream.read(FileInputStream.java:)
at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:)
at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:)
at sun.security.provider.SecureRandom$SeederHolder.<clinit>(SecureRandom.java:)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:)
- locked <0x1af3> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:)
at java.security.SecureRandom.next(SecureRandom.java:)
at java.util.Random.nextLong(Random.java:)
at com.taobao.notify.utils.StrongRandom.<init>(StrongRandom.java:)
at com.taobao.notify.utils.UniqId.<init>(UniqId.java:)
at com.taobao.notify.utils.UniqId.<clinit>(UniqId.java:)
at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:)
at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:)
at com.taobao.notify.client.NotifyClientFactory.<clinit>(NotifyClientFactory.java:)
at com.taobao.notify.remotingclient.DefaultNotifyManager.<init>(DefaultNotifyManager.java:)
at com.taobao.notify.remotingclient.NotifyManagerBean.init(NotifyManagerBean.java:)
- locked <0x1a78> (a com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter)
at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:)
at com.alipay.common.event.tbnotify.adapter.EventSendFacadeTBNotifyImpl.init(EventSendFacadeTBNotifyImpl.java:)
at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:)
at com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter.init(UniformEventPublisherAdapter.java:)
at com.alipay.boot.dms.client.util.InitializeUtil.initializePublisher(InitializeUtil.java:)
at com.alipay.boot.dms.spring.factory.DmsPublisherFactoryBean.afterPropertiesSet(DmsPublisherFactoryBean.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory$.getObject(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:)
- locked <0x1af4> (a java.util.concurrent.ConcurrentHashMap)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory$.getObject(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:)
at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory$.getObject(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:)
at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:)
at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory$.getObject(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:)
- locked <0x1af5> (a java.lang.Object)
at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:)
at com.xx.SOFABootSpringApplication.main(SOFABootSpringApplication.java:)
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:)

  而根据堆栈进行的大概猜想,开始排查,过程如下:

    // 初始代码位置:
// com.taobao.notify.remotingclient.NotifyManagerBean
public synchronized void init() {
if (inited.get()) {
return;
} if (StringUtil.isEmpty(this.name) || StringUtil.isEmpty(this.description)) {
throw new RuntimeException("The name and description property is required.");
}
if (null == notifyManager) { AllNotifyClientProperties properties = new AllNotifyClientProperties();
properties.setCheckMessageTPConfig(checkMessageTPConfig);
properties.setMessageTPConfig(messageTPConfig);
properties.setMessageProperties(messageProperties);
properties.setAntVipConfig(antVipConfig); // 问题出在这里,new 这个对象时,会触发很多的其他类加载,而这里面就包含了有问题的类
notifyManager = new DefaultNotifyManager(groupId, appName, name, description,
messageListener, checkMessageListener, 1, enableCrc, properties); notifyManager.subscribeAllMessages(this.subscribeMessages, this.bindingList); /** ������ظ���groupId������ɾ��֮ǰ��publish topic*/
//notifyManager.resetPublishTopics(publishTopics);
for (String publishTopic : publishTopics) {
notifyManager.addPublishTopic(publishTopic);
} // ��ʼ��ʱ����servertag��url�Ķ�Ӧ�б�
notifyManager.resetServerTagUrls();
} inited.set(true);
} // com.taobao.notify.remotingclient.DefaultNotifyManager 这个构造方法是 ok 的
public DefaultNotifyManager(String groupId, String appName, String name, String description,
MessageListener msgListener, CheckMessageListener checkMsgListener,
int connCount, boolean enableCrc,
AllNotifyClientProperties properties) {
initNotifyClient(properties); if (StringUtil.isBlank(groupId)) {
throw new IllegalArgumentException("GroupID cannot be blank.");
}
if (StringTools.containsWhitespace(groupId)) {
throw new IllegalArgumentException("GroupID cannot contain the blank character. ["
+ groupId + "]");
}
this.groupId = groupId.trim();
this.appName = StringUtil.trim(appName); if (null == properties) {
properties = new AllNotifyClientProperties();
}
this.notifyclient.addGroup(this.groupId, this.appName, name, description, msgListener,
checkMsgListener, properties.getMessageProperties(), properties.getWaitForConnTime(),
enableCrc, properties.getAntVipConfig());
this.notifyclient.setConnectionCount(connCount);
}
// 但是问题在静态字段的初始化问题
// com.taobao.notify.remotingclient.DefaultNotifyManager
public class DefaultNotifyManager implements NotifyManager, DefaultNotifyManagerMBean {
//��־
private static final Logger logger = MsgBrokerClientLoggerFactory
.getLogger(DefaultNotifyManager.class); // 这看起来像是个单例的 client
private NotifyClient notifyclient = NotifyClientFactory
.getSingletonNotifyClient();
}
// com.taobao.notify.client.NotifyClientFactory
public class NotifyClientFactory implements FactoryBean { // 初始化类时,会先初始化一个单例
static NotifyClient notifyClient = new DefaultNotifyClient(); /**
*
* @return
*/
public static NotifyClient getSingletonNotifyClient() {
return notifyClient;
}
}
// 到此,一切看起来都很美好,让我继续往下看
// com.taobao.notify.client.impl.DefaultNotifyClient
public class DefaultNotifyClient implements NotifyClient {
private static final String LogPrefix = LoggerPrefix
.makeLogPrefix(DefaultNotifyClient.class);
static final Logger logger = MsgBrokerClientLoggerFactory
.getLogger(DefaultNotifyClient.class); private volatile NotifyClientConfig notifyClientConfig = null; private static final int MAX_TIMES = 3; protected final NotifyGroupManager notifyGroupManager = new NotifyGroupManager(); private final RemotingService remotingService; protected final ClientSubscriptionManager clientSubscriptionManager = new ClientSubscriptionManager(); protected final PublishTopicsManager publishTopicsManager = new PublishTopicsManager(); private final ThreadPoolExecutor asynSendMessageWorkTP; private final LoggingService loggingService = LoggingService
.getInstance(); private final ReliableAsynSendManager reliableAsynSendManager; private final MessageProperties reliableMessageProperties = new MessageProperties(); // 问题在于 UniqId 的生成
private final UniqId uniqIdInstance = UniqId.getInstance(); /**
*
*/
public DefaultNotifyClient() {
this(new NotifyClientConfig());
} /**
* @param notifyClientConfig
*/
public DefaultNotifyClient(final NotifyClientConfig notifyClientConfig) {
if (null == this.notifyClientConfig) {
this.notifyClientConfig = notifyClientConfig;
}
if (notifyClientConfig.isDebug()) {
this.remotingService = new IntegratedMockRemotingService(
this.clientSubscriptionManager, notifyClientConfig.getDebugRemoteSubscribers(),
this.notifyGroupManager, notifyClientConfig.getDebugLocalPort());
logger.warn(LogPrefix + "Note that, this is a MsgbrokerClient in the DEBUG mode.");
logger.warn(LogPrefix + "In the DEBUG mode, local port ["
+ notifyClientConfig.getDebugLocalPort() + "]");
logger.warn(LogPrefix + "In the DEBUG mode, connections ["
+ notifyClientConfig.getDebugRemoteSubscribers() + "]");
} else {
this.remotingService = new DefaultRemotingService(
this.notifyClientConfig.getRemotingType(), this.notifyGroupManager,
notifyClientConfig);
} // ��ֹѹ��
this.reliableMessageProperties.setCompressSize(Integer.MAX_VALUE);
this.reliableMessageProperties.setMaxStringMessageSize(Integer.MAX_VALUE); this.asynSendMessageWorkTP = new ManagedThreadPoolExecutor(notifyClientConfig
.getAsynSendMessageTPConfig().getCorePoolSize(), notifyClientConfig
.getAsynSendMessageTPConfig().getMaxPoolSize(), notifyClientConfig
.getAsynSendMessageTPConfig().getKeepAliveTime(), notifyClientConfig
.getAsynSendMessageTPConfig().getMaxQueueSize(), "asynSendMsgTP-" + this.hashCode(),
new ThreadPoolExecutor.AbortPolicy()); this.reliableAsynSendManager = new ReliableAsynSendManager(this, notifyGroupManager,
notifyClientConfig);
if (notifyClientConfig.isPreInitializeReliableAsynSendManager()) {
this.reliableAsynSendManager.init();
} // DefaultNotifyClientʵ�������ж������Ҫ���䲻ͬ��ID
try {
ThreadPoolConfig messageTPConfig = notifyClientConfig.getMessageTPConfig();
Lookout.registry().info(
Lookout.registry().createId("msgbroker.client.group.infos")
.withTag("hashcode", String.valueOf(this.hashCode()))
.withTag("corePoolSize", String.valueOf(messageTPConfig.getCorePoolSize()))
.withTag("maxPoolSize", String.valueOf(messageTPConfig.getMaxPoolSize()))
.withTag("queueSize", String.valueOf(messageTPConfig.getMaxQueueSize()))
.withTag("channelSize", String.valueOf(messageTPConfig.getChannelSize())),
new Info<Set<String>>() {
public Set<String> value() {
return notifyGroupManager.getGroupIds();
}
});
} catch (Exception e) {
logger.warn("register msgbroker.client.group.infos failed, " + e.getMessage());
}
}
}
// UniqId 生成如下
// com.taobao.notify.utils.UniqId
public class UniqId {
private static final Logger log = MsgBrokerClientLoggerFactory.getLogger(UniqId.class);
private static final String LogPrefix = LoggerPrefix.makeLogPrefix(UniqId.class); private static char[] digits = { '0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', 'a', 'b', 'c', 'd', 'e', 'f' }; private static Map<Character, Integer> rDigits = new HashMap<Character, Integer>(16);
static {
for (int i = 0; i < digits.length; ++i) {
rDigits.put(digits[i], i);
}
} private static UniqId me = new UniqId();
private String hostAddr;
// 实际问题在于这里了, StrongRandom 的问题
private Random random = new StrongRandom();
private MessageDigest mHasher;
private UniqTimer timer = new UniqTimer(); private ReentrantLock opLock = new ReentrantLock(); private UniqId() {
try {
InetAddress addr = InetAddress.getLocalHost(); hostAddr = addr.getHostAddress();
} catch (IOException e) {
log.error(LogPrefix + "Get HostAddr Error", e);
hostAddr = String.valueOf(System.currentTimeMillis());
} if (Util.isBlank(hostAddr) || "127.0.0.1".equals(hostAddr)) {
hostAddr = String.valueOf(System.currentTimeMillis());
} if (log.isDebugEnabled()) {
log.debug(LogPrefix + "hostAddr is:" + hostAddr);
} try {
mHasher = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException nex) {
mHasher = null;
log.error(LogPrefix + "new MD5 Hasher error", nex);
}
} /**
* ��ȡUniqIDʵ��
*
* @return UniqId
*/
public static UniqId getInstance() {
return me;
}
} // StrongRandom 生成缓慢
// com.taobao.notify.utils.StrongRandom
public StrongRandom() {
try {
// This operation may block on some systems with low entropy. See
// this page
// for workaround suggestions:
// http://docs.codehaus.org.display.JETTY.Connectors+slow+to+startup
random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM);
} catch (NoSuchAlgorithmException e) {
try {
random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM_ALT);
weakRandom = false;
} catch (NoSuchAlgorithmException e_alt) {
random = new Random();
weakRandom = true;
}
}
// 就是这里缓慢了,卡住大约一分钟,且无法调试
random.setSeed(random.nextLong() ^ System.currentTimeMillis() ^ hashCode()
^ Runtime.getRuntime().freeMemory());
}

  最后,定位到是 随机数生成有问题了!

8. 定位到点后,再来思考解决好的解决方案?

  如上,我们知道了是因为 SecureRandom 生成随机数时出现了问题!

  那么,如何解决?我能想到的,就是网上搜索答案了。因为这时候已经没有什么可以参考的了。

  搜索内容就是, SecureRandom 生成随机缓慢解决方案? 然后就有答案了!

  当然,你可以咨询遇到过这些问题的前辈们,那样更快速!

  原因如下:

java生成随机码时,会使用两个文件:
1. /dev/random , 随机性高,和真实的物理环境有关,阻塞模式。(本文出现的启动缓慢问题,就出在这个上面)
2. /dev/urandom ,伪随机模式,非阻塞,随机性不如 random。
所以,既然是生成random模式有问题,那么想办法换掉这个模式就ok了。

  jdk中,提供相应的替换方法,我们先看 $JAVA_HOME/jre/lib/security/java.security 这个里面有个随机数的配置,其说明如下:

#
# Sun Provider SecureRandom seed source.
#
# Select the primary source of seed data for the "SHA1PRNG" and
# "NativePRNG" SecureRandom implementations in the "Sun" provider.
# (Other SecureRandom implementations might also use this property.)
#
# On Unix-like systems (for example, Solaris/Linux/MacOS), the
# "NativePRNG" and "SHA1PRNG" implementations obtains seed data from
# special device files such as file:/dev/random.
#
# On Windows systems, specifying the URLs "file:/dev/random" or
# "file:/dev/urandom" will enable the native Microsoft CryptoAPI seeding
# mechanism for SHA1PRNG.
#
# By default, an attempt is made to use the entropy gathering device
# specified by the "securerandom.source" Security property. If an
# exception occurs while accessing the specified URL:
#
# SHA1PRNG:
# the traditional system/thread activity algorithm will be used.
#
# NativePRNG:
# a default value of /dev/random will be used. If neither
# are available, the implementation will be disabled.
# "file" is the only currently supported protocol type.
#
# The entropy gathering device can also be specified with the System
# property "java.security.egd". For example:
#
# % java -Djava.security.egd=file:/dev/random MainClass
#
# Specifying this System property will override the
# "securerandom.source" Security property.
#
# In addition, if "file:/dev/random" or "file:/dev/urandom" is
# specified, the "NativePRNG" implementation will be more preferred than
# SHA1PRNG in the Sun provider.
#
securerandom.source=file:/dev/random

  即 可以直接改这个文件,也可以通过命令行加参数修改,且命令行优先级更高,所以我们使用命令行修改即可:

    -Djava.security.egd=file:/dev/./urandom # 在项目中添加java_opts,里面添加启动参数 urandom 

  如此,再次验证后,启动正常了。问题done。

9. 除了解决这个问题,我们还能思考点什么?

  关于 SecureRandom 为什么慢的原因,可以看下这篇文章:  https://blog.csdn.net.zengdeqing2012/article.details/78133370

老话:说了不一定有机会,但不说一定没机会。

一个JAVA应用启动缓慢问题排查 --来自jdk securerandom 的问候的更多相关文章

  1. Java诊断利器Arthas优雅排查生产环境

    前言 Arthas 是Alibaba开源的Java诊断工具.在线排查问题,无需重启:动态跟踪Java代码:实时监控JVM状态.对分秒必争的线上异常,Arthas可帮助我们快速诊断相关问题. 下载安装 ...

  2. webmagic的设计机制及原理-如何开发一个Java爬虫

    之前就有网友在博客里留言,觉得webmagic的实现比较有意思,想要借此研究一下爬虫.最近终于集中精力,花了三天时间,终于写完了这篇文章.之前垂直爬虫写了一年多,webmagic框架写了一个多月,这方 ...

  3. Java面向对象编程 第二章 第一个Java应用

    2.1创建Java源文件 Java应用由一个或多个扩展名为".java"的文件构成,这些文件被称为Java源文件,从编译的角度,则被称为编译单元. 本章包含两个Java源文件:Do ...

  4. Gradle入门(3):构建第一个Java项目

    Gradle插件通过引入特定领域的约定和任务来构建你的项目.Java插件是Gradle自身装载的一个插件.Java插件提供的基本功能远比源代码编译和打包多.它为你的项目建立了一个标准的项目布局,并确保 ...

  5. webmagic的设计机制及原理-如何开发一个Java爬虫 转

    此文章是webmagic 0.1.0版的设计手册,后续版本的入门及用户手册请看这里:https://github.com/code4craft/webmagic/blob/master/user-ma ...

  6. 作为一个java高级工程师的进阶之路

    本文可能可能更偏向于是内心的独白篇和面试技巧总结 一.独白 之前也面试别人,现在轮到自己找工作,怎么说呢,每个面试官的看法不一样,面试的方式就不一样,比如我面试别人我喜欢问项目中他用到了那些,然后针对 ...

  7. java:线上问题排查常用手段(转)

    出处:java:线上问题排查常用手段 一.jmap找出占用内存较大的实例 先给个示例代码: import java.util.ArrayList; import java.util.List; imp ...

  8. Jetty启动报错排查org.eclipse.jetty.util.MultiException: Multiple exceptions

    最近自己搭建了一个spring的项目,使用Maven做项目构建,使用JDK8,为了方便启动就使用jetty作为启动容器,但是却无意间步入了一个坑 [WARNING] Failed startup of ...

  9. 搭建java开发环境、使用eclipse编写第一个java程序

    搭建java开发环境.使用eclipse编写第一个java程序 一.Java 开发环境的搭建 1.首先安装java SDK(简称JDK). 点击可执行文件 jdk-6u24-windows-i586. ...

随机推荐

  1. Java生鲜电商平台-电商会员体系系统的架构设计与源码解析

    Java生鲜电商平台-电商会员体系系统的架构设计与源码解析 说明:Java生鲜电商平台中会员体系作为电商平台的基础设施,重要性不容忽视.我去年整理过生鲜电商中的会员系统,但是比较粗,现在做一个最好的整 ...

  2. Java日期时间API系列1-----Jdk7及以前的日期时间类

    先看一个简单的图: 主要的类有: Date类负责时间的表示,在计算机中,时间的表示是一个较大的概念,现有的系统基本都是利用从1970.1.1 00:00:00 到当前时间的毫秒数进行计时,这个时间称为 ...

  3. 完美解决linux下vim在终端不能用鼠标复制的问题

    在vim 中输入 :set mouse=r 就行了,还不行的话 :set mouse=v

  4. Kali linux-信息收集-dmitry

    信息收集-dmitry DMitry(Deepmagic Information Gathering Tools 深度信息收集工具)是一个kali linux下用C语言写的工具.主要功能为端口扫描,w ...

  5. Python3使用线程

    Python2标准库中提供了两个模块thread和threading支持多线程.thread有一些缺陷在Python3中弃用,为了兼容性,python3 将 thread 重命名为 "_th ...

  6. MySQL基础之数据管理【2】

    where条件筛选记录 select id,username,age from uesr where id=5; alter table user add userDesc varchar(100); ...

  7. Node.js实现热加载

    不管是node.js原生开发,还是借助express,kora等框架开发node.js的情况下,在对代码做出更新后,都是需要重启已生效我们的文件的. 本文记录一次在原生node.js开发的时候,为项目 ...

  8. logstash的安装,启动与输出

    在相应的目录下 wget https://artifacts.elastic.co/downloads/logstash/logstash-6.6.2.tar.gz 解压 tar -zxvf  log ...

  9. centOS7安装mysql8.0完美教程!!只要按照步骤,无脑操作,一次成功!

    查看防火墙systemctl status firewalld重启防火墙systemctl start firewalld 1.mysql 首先关闭防火墙 systemctl stop firewal ...

  10. BayaiM__SQLLDR_linux_shell高级版

    BayaiM__SQLLDR_linux_shell高级版   备注:1.因公司在职,商业机密,顾IP地方加了"*"号,你可以任意写一个数字做IP做就好.2.不要瞎BB,哥自己写的 ...