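libvirtd service unresponsive under CloudStack
On CloudStack 4.5.2, the libvirtd service occasionally stops responding. virsh commands become unusable, and the CloudStack master loses its connection to the affected slave host. I suspect a problem with the libvirtd service or its version, but I could not find a similar bug report on the official site. The problem may also exist in later versions, and a root-cause analysis will take more time. Below is how I handled it, recorded here for anyone who follows or uses CloudStack.
As everyone knows, the CloudStack community is far less active than OpenStack's, so why choose CloudStack anyway? I will talk about that another time. Back to the topic.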
Environment
Host operating system: CentOS 6.5 x64 (2.6.32-431.el6.x86_64)
CloudStack version: 4.5.2
libvirt version: libvirt-0.10.2-54.el6_7.2.x86_64
Problem description
The alert reported through the CloudStack API listHosts call:
node5.cloud.rtmap:192.168.14.20 state is Down at 2016-05-13T07:19:04+0800
# How to use the CloudStack API is covered in other articles and is not explained here.
Log in to the affected host and check:
[root@node5 log]#virsh list --all
No response; exit with Ctrl+C.
At this point the VMs still work normally, but they can no longer be managed.
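A hung virsh call like the one above can be detected without blocking the terminal. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the host; the function name `probe` and the 10-second limit are illustrative choices, not part of CloudStack or libvirt.

```shell
#!/bin/bash
# Run a command with a time limit and classify the result.
# Prints "ok", "hung" (time limit reached) or "failed" (other non-zero exit).
probe() {
    local limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -eq 0 ]; then
        echo ok
    elif [ "$rc" -eq 124 ]; then
        echo hung      # timeout(1) exits with 124 when the limit is reached
    else
        echo failed
    fi
}

# Example (hypothetical limit of 10 seconds):
# probe 10 virsh list --all
```

A result of `hung` from a periodic check would flag the condition before the management server marks the host as Down.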
Try to restart the libvirtd service:
[root@node5 log]# service libvirtd stop
Stopping libvirtd daemon: [FAILED] # libvirtd could not be stopped
Try to restart the cloudstack-agent service:
[root@node5 libvirt]# service cloudstack-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
The libvirtd fault persists.
Quick workaround
[root@node5 ping]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
[root@node5 log]# libvirtd -d
This succeeds, and virsh list --all can now list and operate on the VMs:
[root@node5 log]#virsh list --all
Id Name State
----------------------------------------------------
2 i-4-185-VM running
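Although the VM is running normally and can now be managed from the command line, from the CloudStack platform's point of view the host is still Down and the VM is outside its control.
A temporary fix is to reboot the server during some other major upgrade or maintenance window; a real fix requires analysing the specific problem.
Analysis and troubleshooting
Check the process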
- [root@node5 log]# ps ax |grep libvirtd
- ? R : libvirtd --daemon -l # the process stays in the R (running) state the whole time
- [root@node5 log]# top -p
- top -p
- top - :: up days, :, user, load average: 3.05, 5.07, 6.64
- Tasks: total, running, sleeping, stopped, zombie
- Cpu(s): 4.8%us, 1.4%sy, 0.0%ni, 93.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
- Mem: 264420148k total, 182040780k used, 82379368k free, 834232k buffers
- Swap: 8388600k total, 92k used, 8388508k free, 100453708k cached
- PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
- root 984m 12m R 100.2 0.0 :22.68 libvirtd # CPU usage is stuck at 100% and never released, which affects system stability
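The check above (a process stuck in state R at about 100% CPU) can be automated by filtering `ps` output. This is a sketch only: `busy_pids` is a hypothetical helper name and the threshold is arbitrary. Note that the `pcpu` column of `ps` is an average over the process lifetime, so it reacts more slowly than `top`.

```shell
#!/bin/bash
# Read "PID STAT %CPU COMMAND" rows on stdin and print the PIDs whose state
# starts with R (running) and whose CPU usage is at or above the threshold.
busy_pids() {
    local threshold="$1"
    awk -v t="$threshold" '$2 ~ /^R/ && $3 + 0 >= t + 0 { print $1 }'
}

# Example: feed it live data for libvirtd, without a header line.
# ps -C libvirtd -o pid=,stat=,pcpu=,comm= | busy_pids 90
```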
Kill the process
- [root@node5 log]# kill -
- [root@node5 log]# kill -
- [root@master log]# ps ax |grep libvirtd # the process is still there
- ? R : libvirtd --daemon -l
- [root@node5 ~]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
- libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
- [root@node5 ~]# netstat -antp |grep
- tcp 0.0.0.0: 0.0.0.0:* LISTEN /libvirtd
- tcp 192.168.14.25: 192.168.14.22: CLOSE_WAIT -
- tcp 192.168.14.25: 192.168.14.20: CLOSE_WAIT -
- tcp 192.168.14.25: 192.168.14.10: CLOSE_WAIT -
- tcp ::: :::* LISTEN /libvirtd
- tcp ::: ::: CLOSE_WAIT -
From the checks above, my initial judgement is that libvirtd is hung.
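The netstat listing shows several sockets in CLOSE_WAIT, which means the peer has closed the connection and the local process has not yet closed its own end. That is consistent with a daemon that has stopped servicing its sockets. A small sketch for counting sockets per state follows; `count_states` is a hypothetical helper, and it assumes the usual `netstat -ant` column layout where the state is the sixth field.

```shell
#!/bin/bash
# Summarise `netstat -ant` style rows read from stdin: count sockets per state.
# Column 6 holds the state for TCP rows; header lines are skipped.
count_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Example:
# netstat -ant | count_states
```

A CLOSE_WAIT count that keeps growing on the libvirtd listening port is a hint that the daemon has stopped accepting or closing connections.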
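Trace the process
- [root@node5 log]#strace -f libvirtd # I later realised this is the wrong way to use strace: it starts a new libvirtd instead of tracing the hung one. The correct way is strace -f -p <pid>, which attaches to the existing process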
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- [pid ] close() = - EBADF (Bad file descriptor)
- ^C[pid ] close( <unfinished ...>
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
- Process detached
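The parent process 6485 keeps spawning and closing child processes, each returning error messages. What causes the Bad file descriptor errors (how are they triggered, and by whom)? Why does the loop never exit? How can the problem be reproduced? Since this trace came from a newly started libvirtd rather than from the hung process, these observations should be treated with caution.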
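To attach to the process that is actually hung, strace needs the `-p` option with its PID. The sketch below summarises failed system calls from strace output so that a long run of identical lines collapses into a count. `summarise_errors` is a hypothetical helper name and `<pid>` is a placeholder. Note also that a run of close() calls returning EBADF is often what a freshly forked child produces when it closes every possible descriptor before exec, so by itself it may not indicate a fault.

```shell
#!/bin/bash
# Count failed system calls per "syscall errno" pair in strace output (stdin).
# Expects lines such as: [pid 123] close(5) = -1 EBADF (Bad file descriptor)
summarise_errors() {
    awk '/= -1 E/ {
        call = $0
        sub(/^\[pid +[0-9]+\] +/, "", call)   # drop the "[pid N]" prefix added by -f
        sub(/\(.*/, "", call)                 # keep only the syscall name
        err = $0
        sub(/.*= -1 /, "", err)               # keep the text after the return value
        sub(/ .*/, "", err)                   # keep only the errno name
        n[call " " err]++
    }
    END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

# Example: trace the existing process for 10 seconds (strace writes to stderr).
# timeout 10 strace -f -p <pid> 2>&1 | summarise_errors
```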
Getting more clues
Official documentation (its troubleshooting notes and fixes for various libvirtd failures are very detailed):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start
The guide recommends changing libvirt's logging in /etc/libvirt/libvirtd.conf by enabling the line below: open the /etc/libvirt/libvirtd.conf file in a text editor, remove the hash (#) symbol from the beginning of the following line, and save the change:
- log_outputs="3:syslog:libvirtd"
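The same change can be scripted instead of made in an editor. This is a minimal sketch assuming GNU sed; `enable_syslog_output` is a hypothetical function name, and it only handles the exact commented-out line quoted above.

```shell
#!/bin/bash
# Uncomment the log_outputs="3:syslog:libvirtd" line in a libvirtd.conf-style file.
# A backup copy with the suffix .bak is left next to the file.
# Returns 0 only if the active line is present afterwards.
enable_syslog_output() {
    local conf="$1"
    sed -i.bak 's/^#[[:space:]]*\(log_outputs="3:syslog:libvirtd"\)/\1/' "$conf"
    grep -q '^log_outputs="3:syslog:libvirtd"' "$conf"
}

# Example:
# enable_syslog_output /etc/libvirt/libvirtd.conf
```

The setting takes effect the next time libvirtd starts.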
Apply this setting, restart the server, and wait for the next failure to examine the logs:
- ......
- Jun :: node5 abrtd: New client connected
- Jun :: node5 abrtd: Directory 'pyhook-2016-06-01-12:42:26-70065' creation detected
- Jun :: node5 abrt-server[]: Saved Python crash dump of pid to /var/spool/abrt/pyhook----::-
- Jun :: node5 abrtd: Package 'cloudstack-common' isn't signed with proper key
- Jun :: node5 abrtd: 'post-create' on '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065' exited with
- Jun :: node5 abrtd: Deleting problem directory '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065'
- Jun :: node5 abrt: detected unhandled Python exception in '/usr/share/cloudstack-common/scripts/vm/network/security_group.py'
- ......
- Jun :: node5 libvirtd: : warning : qemuDomainObjBeginJobInternal: : Cannot start job (modify, none) for domain i---VM; current job is (modify, none) owned by (, )
- Jun :: node5 libvirtd: : error : qemuDomainObjBeginJobInternal: : Timed out during operation: cannot acquire state change lock
- Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
- Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
- Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
- Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
- Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
- Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
- Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
- Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
- Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
- Jun :: node5 libvirtd: : error : virNetSocketNewListenTCP: : Unable to bind to port: Address already in use
- ......
- Jun :: node5 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
- Jun :: node5 libvirtd: : info : libvirt version: 0.10., package: .el6_7. (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
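- Jun :: node5 libvirtd: : error : virPidFileAcquirePath: : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
No fatal error or further clue turned up. (This logging option is still well worth enabling; many problems can be located through it.)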
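With syslog output enabled, the repeated libvirtd error lines can be condensed by the function that reported them. The sketch below assumes the usual libvirtd syslog layout, in which a function name follows the "error :" marker. `count_libvirt_errors` is a hypothetical helper name.

```shell
#!/bin/bash
# Count libvirtd "error" log lines per reporting function (input on stdin).
# Prints "function count" pairs sorted by function name.
count_libvirt_errors() {
    sed -n 's/.* libvirtd: .*error : \([A-Za-z_0-9]*\):.*/\1/p' \
        | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Example:
# count_libvirt_errors < /var/log/messages
```

In the log above this would group the bind failures under virNetSocketNewListenTCP, which are the later start attempts colliding with the hung instance that still holds the port.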
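Resolution process
Approach
- Try to find a way to terminate the process and restart the service
- Submit a bug report and wait for a patched release
- Analyse the source code, reproduce the problem, and fix it (requires R&D investment and time)
Since the problem cannot be reproduced, start with the simplest option. Who is the culprit triggering these child processes? cloudstack-agent is still the prime suspect, but restarting that service earlier did not solve the problem, so what is going on with the agent service?
The startup script gives a basic picture: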
[root@node5 libvirt]# cat /etc/rc.d/init.d/cloudstack-agent
- #!/bin/bash
- # chkconfig:
- # description: Cloud Agent
- # Licensed to the Apache Software Foundation (ASF) under one
- # or more contributor license agreements. See the NOTICE file
- # distributed with this work for additional information
- # regarding copyright ownership. The ASF licenses this file
- # to you under the Apache License, Version 2.0 (the
- # "License"); you may not use this file except in compliance
- # with the License. You may obtain a copy of the License at
- #
- # http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing,
- # software distributed under the License is distributed on an
- # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- # KIND, either express or implied. See the License for the
- # specific language governing permissions and limitations
- # under the License.
- # WARNING: if this script is changed, then all other initscripts MUST BE changed to match it as well
- . /etc/rc.d/init.d/functions
- # set environment variables
- SHORTNAME=$(basename $ | sed -e 's/^[SK][0-9][0-9]//')
- PIDFILE=/var/run/"$SHORTNAME".pid
- LOCKFILE=/var/lock/subsys/"$SHORTNAME"
- LOGDIR=/var/log/cloudstack/agent
- LOGFILE=${LOGDIR}/agent.log
- PROGNAME="Cloud Agent"
- CLASS="com.cloud.agent.AgentShell"
- JSVC=`which jsvc >/dev/null`;
- # exit if we don't find jsvc
- if [ -z "$JSVC" ]; then
- echo no jsvc found in path;
- exit ;
- fi
- unset OPTIONS
- [ -r /etc/sysconfig/"$SHORTNAME" ] && source /etc/sysconfig/"$SHORTNAME"
- # The first existing directory is used for JAVA_HOME (if JAVA_HOME is not defined in $DEFAULT)
- JDK_DIRS="/usr/lib/jvm/jre /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-i386 /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java-6-openjdk /usr/lib/jvm/java-6-openjdk-i386 /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-sun"
- for jdir in $JDK_DIRS; do
- if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
- JAVA_HOME="$jdir"
- fi
- done
- export JAVA_HOME
- ACP=`ls /usr/share/cloudstack-agent/lib/*.jar | tr '\n' ':' | sed s'/.$//'`
- PCP=`ls /usr/share/cloudstack-agent/plugins/*.jar 2>/dev/null | tr '\n' ':' | sed s'/.$//'`
- # We need to append the JSVC daemon JAR to the classpath
- # AgentShell implements the JSVC daemon methods
- export CLASSPATH="/usr/share/java/commons-daemon.jar:$ACP:$PCP:/etc/cloudstack/agent:/usr/share/cloudstack-common/scripts"
- start() {
- echo -n $"Starting $PROGNAME: "
- if hostname --fqdn >/dev/null 2>&1 ; then
- $JSVC -Xms256m -Xmx2048m -cp "$CLASSPATH" -pidfile "$PIDFILE" \
- -errfile $LOGDIR/cloudstack-agent.err -outfile $LOGDIR/cloudstack-agent.out $CLASS
- RETVAL=$?
- echo
- else
- failure
- echo
- echo The host name does not resolve properly to an IP address. Cannot start "$PROGNAME". > /dev/stderr
- RETVAL=9
- fi
- [ $RETVAL = 0 ] && touch ${LOCKFILE}
- return $RETVAL
- }
- stop() {
- echo -n $"Stopping $PROGNAME: "
- $JSVC -pidfile "$PIDFILE" -stop $CLASS
- RETVAL=$?
- echo
- [ $RETVAL = 0 ] && rm -f ${LOCKFILE} ${PIDFILE}
- }
- case "$1" in
- start)
- start
- ;;
- stop)
- stop
- ;;
- status)
- status -p ${PIDFILE} $SHORTNAME
- RETVAL=$?
- ;;
- restart)
- stop
- sleep 3
- start
- ;;
- condrestart)
- if status -p ${PIDFILE} $SHORTNAME >&/dev/null; then
- stop
- sleep 3
- start
- fi
- ;;
- *)
- echo $"Usage: $SHORTNAME {start|stop|restart|condrestart|status|help}"
- RETVAL=3
- esac
- exit $RETVAL
- [root@node5 libvirt]# ps ax |grep jsvc.exec
- ? Ss : jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4..jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7..jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3..jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2..jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5..jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7..jar:/usr/share/c
loudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2..jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0..jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1..jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7..jar:/usr/share/cloudstack-agent/lib/dom4j-1.6..jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6..jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0..jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7..jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7..jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3..jar:/usr/share/cloudstack-agent/lib/httpcore-4.3..jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8..jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8..jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1..jar:/usr/share/cloudstack-agent/lib/jasypt-1.9..jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12..GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0..jar:/usr
/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
- ? Sl : jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4..jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7..jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3..jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2..jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5..jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5..jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7..jar:/usr/share/c
loudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2..jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0..jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1..jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7..jar:/usr/share/cloudstack-agent/lib/dom4j-1.6..jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6..jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0..jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7..jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7..jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3..jar:/usr/share/cloudstack-agent/lib/httpcore-4.3..jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8..jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1..jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8..jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1..jar:/usr/share/cloudstack-agent/lib/jasypt-1.9..jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12..GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0..jar:/usr
/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
Restart the service
- [root@node5 bin]# service cloudstack-agent status
- cloudstack-agent (pid ) is running...
- [root@node5 bin]# service cloudstack-agent stop
- Stopping Cloud Agent:
- [root@node5 bin]# service cloudstack-agent status
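- cloudstack-agent (pid 6657) is running...
ps ax |grep jsvc.exec also confirmed that the process was still there.
That was the eye-opener. It also exposed the problem with the earlier use of restart: the failed stop had been masked. Annoying, but there was no time to dwell on it, because what came next was far from that simple.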
- [root@node5 bin]# kill -
- [root@node5 bin]# kill -
- -bash: kill: () - No such process
- -bash: kill: () - No such process
- [root@node5 bin]# service cloudstack-agent status
- cloudstack-agent dead but pid file exists
- [root@node5 bin]# rm /var/run/cloudstack-agent.pid
- rm: remove regular file `/var/run/cloudstack-agent.pid'? y
- [root@node5 bin]# service cloudstack-agent status
- cloudstack-agent dead but subsys locked
- [root@node5 bin]# service cloudstack-agent start
- [root@node5 bin]# service cloudstack-agent status
- cloudstack-agent (pid ) is running...
- [root@node5 bin]# netstat -antp |grep
- tcp 192.168.14.20: 192.168.14.10: ESTABLISHED /jsvc.exec
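After this the agent status returned to normal, but libvirtd still could not be killed. Soon the connection shown by netstat -antp |grep 8250 disappeared again, and the host record in the CloudStack master's monitoring changed from Up to Disconnected. Still, that is not Down, so it was progress.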
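The manual cleanup in the listing above (remove the pid file, clear the subsys lock) can be made safer by first checking that the recorded PID is really gone. The following is a sketch under stated assumptions: `clear_stale_files` is a hypothetical helper, the example paths are the PIDFILE and LOCKFILE values from the init script, and it is meant to be run as root so that the `kill -0` existence test is reliable.

```shell
#!/bin/bash
# Remove a pid file and a subsys lock file, but only when the recorded PID
# no longer exists. Prints what it did; returns 1 if the process is alive.
clear_stale_files() {
    local pidfile="$1" lockfile="$2" pid=""
    [ -r "$pidfile" ] && pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; nothing removed"
        return 1
    fi
    rm -f "$pidfile" "$lockfile"
    echo "stale files removed"
}

# Example, using the paths from the init script:
# clear_stale_files /var/run/cloudstack-agent.pid /var/lock/subsys/cloudstack-agent
```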
Start another libvirtd -d and take a look:
- [root@node5 bin]# libvirtd -d
- [root@node5 bin]# ps ax |grep libvirtd
- ? R : libvirtd --daemon -l
- ? Sl : libvirtd -d
- pts/ S+ : grep libvirtd
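Then I manually clicked Force Reconnect for this host on the CloudStack master, and it succeeded: the host's monitored state went from Disconnected to Up. Another attempt to kill the old libvirtd still failed. I then tried a pause-VM operation from the CloudStack master management UI, and the VM paused successfully. Back on the server, the originally hung libvirtd process had disappeared.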
- [root@node5 bin]# libvirtd -d
- [root@node5 bin]# ps ax |grep libvirtd
- ? Sl : libvirtd -d
- pts/ S+ : grep libvirtd
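At this point the platform had regained control of the host, and the abnormal libvirtd process was gone. My preliminary conclusion is that cloudstack-agent has some minor problem in how it handles the signals it sends to libvirtd. I will analyse the jsvc process separately later, to reproduce the problem and fix it at the root.
Reflection
When dealing with a misbehaving service, do not use restart on the command line; debug with stop and kill instead. A lesson learned the hard way!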
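The lesson can be turned into a small habit: run the stop step on its own and confirm that the PID has actually gone before starting anything. This is a minimal sketch, not the init script's behaviour; `stop_and_verify` is a hypothetical helper and the wait time is arbitrary.

```shell
#!/bin/bash
# Run a stop command, then check whether the PID is really gone.
# Returns 0 if the process exited, 1 if it is still alive after the wait.
stop_and_verify() {
    local pid="$1" wait_secs="$2"; shift 2
    "$@" >/dev/null 2>&1                         # the stop command itself
    local i=0
    while [ "$i" -lt "$wait_secs" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only tests existence
        sleep 1
        i=$((i + 1))
    done
    kill -0 "$pid" 2>/dev/null && return 1
    return 0
}

# Example (the pid file path is the one used by the init script):
# pid=$(cat /var/run/cloudstack-agent.pid)
# stop_and_verify "$pid" 10 service cloudstack-agent stop || echo "stop failed, pid $pid still alive"
```

Unlike restart, this surfaces a failed stop immediately instead of letting the subsequent start hide it.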
Related links
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start
- End -
2016-06-13 15:20:22