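On CloudStack 4.5.2, the libvirtd service occasionally stops responding. When this happens, virsh commands hang and the CloudStack master loses its connection to the affected slave host. I suspect a problem in libvirtd itself or in this particular version, but I could not find a similar bug report on the official site. The problem may also exist in later versions, and finding the root cause will take more time. The steps I took to handle it are recorded below for anyone who follows or uses CloudStack.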

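It is well known that the CloudStack community is far less active than OpenStack's, so why choose CloudStack at all? I will save that discussion for another time. Back to the topic.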

Environment

Host operating system: CentOS 6.5 x64 (2.6.32-431.el6.x86_64)
CloudStack version: 4.5.2
libvirt version: libvirt-0.10.2-54.el6_7.2.x86_64

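Problem description

The alert from the CloudStack API listHosts call shows:
node5.cloud.rtmap:192.168.14.20 state is Down at 2016-05-13T07:19:04+0800
# How to use the CloudStack API is covered in other articles and is not explained here.

Log in to the affected host and check:
[root@node5 log]# virsh list --all
No response; exit with Ctrl+C.
At this point the VMs keep working normally, but they can no longer be managed.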
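A hang like this can be detected without tying up a terminal by running the command under a time limit. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the host; the function name `probe_responsive` is hypothetical and not part of libvirt or CloudStack.

```shell
#!/bin/bash
# Run a command with a time limit and report whether it answered in time.
# Assumption: GNU coreutils `timeout` is installed.
probe_responsive() {
    local limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"          # timeout had to terminate the command
    elif [ "$rc" -eq 0 ]; then
        echo "ok"            # command answered successfully
    else
        echo "failed($rc)"   # command answered, but with an error
    fi
}
```

For the case above it could be called as `probe_responsive 10 virsh list --all`; a result of `hung` would match the symptom described here.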

Try to restart the libvirtd service:
[root@node5 log]# service libvirtd stop
Stopping libvirtd daemon:                               [FAILED]  # libvirtd could not be stopped

Try to restart the cloudstack-agent service:
[root@node5 libvirt]# service cloudstack-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
The libvirtd fault persists.

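Basic maintenance

[root@node5 ping]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.

[root@node5 log]# libvirtd -d
This succeeds, and virsh list --all can now list and operate the VMs.

[root@node5 log]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     i-4-185-VM                     running

The VMs are running normally and can be managed from the command line again. From the CloudStack platform's point of view, however, the host is still Down and the VMs are out of its control.

The temporary fix is to reboot the server during some other major upgrade or maintenance window. A real fix requires analyzing the specific problem.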

Analysis and troubleshooting

Check the process

[root@node5 log]# ps ax |grep libvirtd
? R : libvirtd --daemon -l  # the service stays in the R (running) state

[root@node5 log]# top -p
top - :: up days, :, user, load average: 3.05, 5.07, 6.64
Tasks: total, running, sleeping, stopped, zombie
Cpu(s): 4.8%us, 1.4%sy, 0.0%ni, 93.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264420148k total, 182040780k used, 82379368k free, 834232k buffers
Swap: 8388600k total, 92k used, 8388508k free, 100453708k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
root 984m 12m R 100.2 0.0 :22.68 libvirtd       # 100% CPU that is never released, which affects system stability
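The same check (state R plus very high CPU) can be scripted. The sketch below filters `ps`-style lines read from standard input; the function name `spinning_pids` and the threshold of 90 are assumptions for illustration. Note that `ps` reports CPU usage averaged over the life of the process, so it is a coarser signal than the live figure shown by top.

```shell
#!/bin/bash
# Print the PIDs of processes that are in state R and at or above a CPU threshold.
# Input: lines of "PID STAT %CPU COMMAND" on stdin.
# $1 is the threshold in percent (default 90, an arbitrary choice).
spinning_pids() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '$2 ~ /^R/ && $3 + 0 >= t { print $1 }'
}
```

One possible invocation is `ps -C libvirtd -o pid=,stat=,pcpu=,comm= | spinning_pids 90`.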

Kill the process

[root@node5 log]# kill -
[root@node5 log]# kill -
[root@master log]# ps ax |grep libvirtd  # the process is still there
? R : libvirtd --daemon -l
[root@node5 ~]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
[root@node5 ~]# netstat -antp |grep
tcp 0.0.0.0: 0.0.0.0:* LISTEN /libvirtd
tcp 192.168.14.25: 192.168.14.22: CLOSE_WAIT -
tcp 192.168.14.25: 192.168.14.20: CLOSE_WAIT -
tcp 192.168.14.25: 192.168.14.10: CLOSE_WAIT -
tcp ::: :::* LISTEN /libvirtd
tcp ::: ::: CLOSE_WAIT -

After these steps, my initial conclusion was that libvirtd had hung.
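Sockets stuck in CLOSE_WAIT on the daemon's listening port are a useful hang indicator, because they mean the peer has closed the connection and the local process never did. The sketch below counts socket states for one local port from `netstat -ant` output; the function name `socket_states` is hypothetical, and the port is a parameter because it depends on the host's configuration.

```shell
#!/bin/bash
# Count TCP sockets per state for one local port.
# Input: `netstat -ant` output on stdin; $1 is the local port number.
socket_states() {
    local port="$1"
    awk -v p="$port" '
        $1 ~ /^tcp/ {
            n = split($4, a, ":")        # field 4 is the local address; the port is the last ":" part
            if (a[n] == p) count[$6]++   # field 6 is the connection state
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

For example, `netstat -ant | socket_states 16509` would summarize libvirt's default TCP port; whether this host uses the default port is an assumption.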

Trace the process

[root@node5 log]# strace -f libvirtd      # I later realized this is the wrong way to use strace here; the right way is strace -f -p against the PID of the hung process
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
[pid ] close() = - EBADF (Bad file descriptor)
^C[pid ] close( <unfinished ...>
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached
Process detached

Parent process 6485 keeps spawning and closing child processes, each returning an error. What causes the Bad file descriptor errors (how are they triggered, and by whom)? Why does the loop never exit? How can the problem be reproduced? Because this trace came from a newly started libvirtd rather than from the hung process, these observations are only tentative.
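When a trace is dominated by one repeated failure, a per-syscall error count is easier to read than the raw output. The sketch below summarizes an strace log read from standard input; the function name `strace_error_summary` is hypothetical, and it only recognizes complete lines of the usual form `syscall(args) = -1 ERRNO (...)`.

```shell
#!/bin/bash
# Summarize an strace log: count failing calls by syscall name and errno.
# Input: strace output on stdin, with or without "[pid N]" or "N" prefixes.
# Output: "syscall ERRNO count", most frequent first.
strace_error_summary() {
    grep -o '[a-z_][a-z_0-9]*(.*= -1 E[A-Z]*' \
        | sed 's/(.*= -1 / /' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $3, $1 }'
}
```

To trace the hung process itself rather than a new instance, the log could be captured with something like `timeout 10 strace -f -p "$pid" -o /tmp/libvirtd.strace` (the PID variable and output path are placeholders) and then fed to the function with `strace_error_summary < /tmp/libvirtd.strace`.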

Getting more clues
Official documentation (its records of libvirtd failures and their fixes are very detailed):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start

Enable system logging
Enable libvirt logging in /etc/libvirt/libvirtd.conf: open the file in a text editor, remove the hash (#) symbol from the beginning of the following line, and save the change:
log_outputs="3:syslog:libvirtd"

Apply this setting, reboot the server, and wait for the next occurrence to inspect the logs.
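The edit can also be made non-interactively, which helps when several hosts need the same change. This is a minimal sketch assuming GNU sed; the function name `enable_libvirtd_syslog` is hypothetical, and it only handles the exact `log_outputs="3:syslog:libvirtd"` line quoted above.

```shell
#!/bin/bash
# Uncomment the log_outputs="3:syslog:libvirtd" line in a libvirtd.conf-style file.
# A backup copy with a .bak suffix is kept next to the file.
# Returns 0 only if the uncommented line is present afterwards.
enable_libvirtd_syslog() {
    local conf="$1"
    sed -i.bak 's/^#[[:space:]]*\(log_outputs="3:syslog:libvirtd"\)/\1/' "$conf"
    grep -q '^log_outputs="3:syslog:libvirtd"' "$conf"
}
```

It would be called as `enable_libvirtd_syslog /etc/libvirt/libvirtd.conf`. The setting only takes effect once libvirtd is started again.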

......

Jun :: node5 abrtd: New client connected
Jun :: node5 abrtd: Directory 'pyhook-2016-06-01-12:42:26-70065' creation detected
Jun :: node5 abrt-server[]: Saved Python crash dump of pid to /var/spool/abrt/pyhook----::-
Jun :: node5 abrtd: Package 'cloudstack-common' isn't signed with proper key
Jun :: node5 abrtd: 'post-create' on '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065' exited with
Jun :: node5 abrtd: Deleting problem directory '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065'
Jun :: node5 abrt: detected unhandled Python exception in '/usr/share/cloudstack-common/scripts/vm/network/security_group.py'
......

Jun :: node5 libvirtd: : warning : qemuDomainObjBeginJobInternal: : Cannot start job (modify, none) for domain i---VM; current job is (modify, none) owned by (, )
Jun :: node5 libvirtd: : error : qemuDomainObjBeginJobInternal: : Timed out during operation: cannot acquire state change lock
Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
......

Jun :: node5 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun :: node5 libvirtd: : error : virPidFileAcquirePath: : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable

No fatal error and no further clues turned up. (This logging option is still well worth enabling; many problems can be pinpointed with it.)
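Because the same libvirtd messages repeat many times, collapsing them into counts makes the distinct errors stand out. The sketch below does this for syslog lines read from standard input; the function name `libvirtd_message_counts` is hypothetical, and it assumes the usual `host libvirtd: <thread id>: <level> : <message>` layout seen in the excerpt above.

```shell
#!/bin/bash
# Collapse repeated libvirtd syslog lines into "count message" pairs.
# The timestamp, host name, and thread id are stripped so identical messages group together.
libvirtd_message_counts() {
    grep ' libvirtd: ' \
        | sed 's/^.* libvirtd: [^:]*: *//' \
        | sort | uniq -c | sort -rn \
        | sed 's/^ *//'
}
```

A typical call would be `libvirtd_message_counts < /var/log/messages`.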

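Resolution process

Approach

  • Try to find a way to terminate the process and restart the service
  • File a bug and wait for a patch
  • Analyze the source code, reproduce the problem, and fix it (requires development effort and time)

Since the problem cannot be reproduced, I started with the simplest option. Who is triggering these child processes? cloudstack-agent is still the prime suspect, yet restarting it earlier did not fix anything. So what is going on with the agent service?

The init script gives a basic picture:

[root@node5 libvirt]# cat /etc/rc.d/init.d/cloudstack-agent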

#!/bin/bash

# chkconfig:
# description: Cloud Agent

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# WARNING: if this script is changed, then all other initscripts MUST BE changed to match it as well

. /etc/rc.d/init.d/functions

# set environment variables

SHORTNAME=$(basename $0 | sed -e 's/^[SK][0-9][0-9]//')
PIDFILE=/var/run/"$SHORTNAME".pid
LOCKFILE=/var/lock/subsys/"$SHORTNAME"
LOGDIR=/var/log/cloudstack/agent
LOGFILE=${LOGDIR}/agent.log
PROGNAME="Cloud Agent"
CLASS="com.cloud.agent.AgentShell"
JSVC=`which jsvc 2>/dev/null`;

# exit if we don't find jsvc
if [ -z "$JSVC" ]; then
    echo no jsvc found in path;
    exit 1;
fi

unset OPTIONS
[ -r /etc/sysconfig/"$SHORTNAME" ] && source /etc/sysconfig/"$SHORTNAME"

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not defined in $DEFAULT)
JDK_DIRS="/usr/lib/jvm/jre /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-i386 /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java-6-openjdk /usr/lib/jvm/java-6-openjdk-i386 /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-sun"

for jdir in $JDK_DIRS; do
    if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
        JAVA_HOME="$jdir"
    fi
done
export JAVA_HOME

ACP=`ls /usr/share/cloudstack-agent/lib/*.jar | tr '\n' ':' | sed s'/.$//'`
PCP=`ls /usr/share/cloudstack-agent/plugins/*.jar 2>/dev/null | tr '\n' ':' | sed s'/.$//'`

# We need to append the JSVC daemon JAR to the classpath
# AgentShell implements the JSVC daemon methods
export CLASSPATH="/usr/share/java/commons-daemon.jar:$ACP:$PCP:/etc/cloudstack/agent:/usr/share/cloudstack-common/scripts"

start() {
    echo -n $"Starting $PROGNAME: "
    if hostname --fqdn >/dev/null 2>&1 ; then
        $JSVC -Xms256m -Xmx2048m -cp "$CLASSPATH" -pidfile "$PIDFILE" \
            -errfile $LOGDIR/cloudstack-agent.err -outfile $LOGDIR/cloudstack-agent.out $CLASS
        RETVAL=$?
        echo
    else
        failure
        echo
        echo The host name does not resolve properly to an IP address. Cannot start "$PROGNAME". > /dev/stderr
        RETVAL=9
    fi
    [ $RETVAL = 0 ] && touch ${LOCKFILE}
    return $RETVAL
}

stop() {
    echo -n $"Stopping $PROGNAME: "
    $JSVC -pidfile "$PIDFILE" -stop $CLASS
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f ${LOCKFILE} ${PIDFILE}
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status -p ${PIDFILE} $SHORTNAME
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        if status -p ${PIDFILE} $SHORTNAME >&/dev/null; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
        echo $"Usage: $SHORTNAME {start|stop|restart|condrestart|status|help}"
        RETVAL=3
esac

exit $RETVAL

[root@node5 libvirt]# ps ax |grep jsvc.exec
? Ss : jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4..jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7..jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3..jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2..jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5..jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7..jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2..jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0..jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1..jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7..jar:/usr/share/cloudstack-agent/lib/dom4j-1.6..jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6..jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0..jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7..jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7..jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3..jar:/usr/share/cloudstack-agent/lib/httpcore-4.3..jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8..jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8..jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1..jar:/usr/share/cloudstack-agent/lib/jasypt-1.9..jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12..GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0..jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
? Sl : jsvc.exec -Xms256m -Xmx2048m -cp (classpath identical to the process above)

Restart the service

[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid ) is running...
[root@node5 bin]# service cloudstack-agent stop
Stopping Cloud Agent:

[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid  6657) is running...

ps ax |grep jsvc.exec also confirms that the process is still there.

That was the eye-opener: the earlier restart had masked the fact that stop was failing. Annoying, but there was no time to dwell on it, because what came next was far from simple...

[root@node5 bin]# kill -
[root@node5 bin]# kill -
-bash: kill: () - No such process
-bash: kill: () - No such process
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent dead but pid file exists
[root@node5 bin]# rm /var/run/cloudstack-agent.pid
rm: remove regular file "/var/run/cloudstack-agent.pid"? y
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent dead but subsys locked
[root@node5 bin]# service cloudstack-agent start
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid ) is running...
[root@node5 bin]# netstat -antp |grep
tcp 192.168.14.20: 192.168.14.10: ESTABLISHED /jsvc.exec

After this the agent status was back to normal, but libvirtd still could not be killed. Soon the connection shown by netstat -antp |grep 8250 disappeared again, and the host's status on the CloudStack master changed from Up to Disconnected. Still, that is not Down, so it counts as progress.
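The manual cleanup of the pid file and the subsys lock can be wrapped in a guard, so the files are removed only when the recorded process is really gone. This is a sketch under that assumption; the function name `clear_stale_pidfile` is hypothetical, while the two paths in the usage note come from the PIDFILE and LOCKFILE variables of the init script.

```shell
#!/bin/bash
# Remove a pid file (and an optional subsys lock file) only when the recorded
# process no longer exists. Returns 1 and removes nothing if it is still alive.
clear_stale_pidfile() {
    local pidfile="$1" lockfile="$2" pid
    [ -f "$pidfile" ] || return 0          # nothing to clean up
    pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 1                           # process is alive: leave the files alone
    fi
    rm -f "$pidfile"
    [ -n "$lockfile" ] && rm -f "$lockfile"
    return 0
}
```

For the agent this would be `clear_stale_pidfile /var/run/cloudstack-agent.pid /var/lock/subsys/cloudstack-agent`, followed by `service cloudstack-agent start`.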

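Start another instance with libvirtd -d and take a look:

[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
? R : libvirtd --daemon -l
? Sl : libvirtd -d
pts/ S+ : grep libvirtd

Then I manually clicked force reconnect for this host on the CloudStack master, and it succeeded: the host status went from Disconnected to Up. Killing the old process still failed at this point, so I tried issuing a pause command for a VM from the CloudStack management UI, and the VM was paused successfully. Back on the server, the previously hung libvirtd process was gone.

[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
? Sl : libvirtd -d
pts/ S+ : grep libvirtd

With that, the platform regained control of the host and the abnormal libvirtd process was terminated. My preliminary conclusion is that cloudstack-agent has some minor problem in how it handles the signals it sends to libvirtd. I will analyze the jsvc process separately later, to reproduce the problem and fix it for good.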

Lessons learned

When troubleshooting a misbehaving service, do not use restart; debug with stop and kill instead. A painful lesson!

Related links

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start

- End -

2016-06-13 15:20:22
