1: Code

Below is a simple program that can deadlock:

#include <unistd.h>
#include <pthread.h>
#include <string.h>

typedef struct
{
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;
    int sequence1;
    int sequence2;
} Counter;

/* Locks mutex1 first, then mutex2. */
void* thread1(void* arg)
{
    Counter *cc = (Counter *)arg;

    while (1)
    {
        pthread_mutex_lock(&cc->mutex1);
        ++cc->sequence1;
        sleep(1);  /* hold mutex1 while thread2 takes mutex2 */

        pthread_mutex_lock(&cc->mutex2);
        ++cc->sequence2;

        pthread_mutex_unlock(&cc->mutex2);
        pthread_mutex_unlock(&cc->mutex1);
    }
    return NULL;
}

/* Locks in the opposite order: mutex2 first, then mutex1. */
void* thread2(void* arg)
{
    Counter *cc = (Counter *)arg;

    while (1)
    {
        pthread_mutex_lock(&cc->mutex2);
        ++cc->sequence2;
        sleep(1);  /* hold mutex2 while thread1 takes mutex1 */

        pthread_mutex_lock(&cc->mutex1);
        ++cc->sequence1;

        pthread_mutex_unlock(&cc->mutex1);
        pthread_mutex_unlock(&cc->mutex2);
    }
    return NULL;
}

int main()
{
    Counter pub_counter = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, 0, 0};
    pthread_t tid[2];

    if (pthread_create(&tid[0], NULL, &thread1, &pub_counter) != 0)
    {
        _exit(1);
    }
    if (pthread_create(&tid[1], NULL, &thread2, &pub_counter) != 0)
    {
        _exit(1);
    }

    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return 0;
}

2: Compile and run

Compile with the -g option so that symbols can be mapped back to the source code:

gcc -o deadlock -g deadlock.c -pthread
./deadlock

3: Inspect the call stacks with pstack

The pstack command prints the call stacks of a running process:

# ps -ef|grep deadlock
root : pts/ :: ./deadlock
root : pts/ :: grep --color=auto deadlock
# pstack <pid>
Thread 3 (Thread 0x7f6093bf6700 (LWP 9868)):
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004007d8 in thread1 (arg=0x7fffad4cbeb0) at deadlock.c:
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f60933f5700 (LWP 9869)):
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400852 in thread2 (arg=0x7fffad4cbeb0) at deadlock.c:
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f60943e1740 (LWP )):
#0  0x00007f6093fc0ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400908 in main () at deadlock.c:

Run pstack several times. If every run shows threads 2 and 3 stuck in __lll_lock_wait, that is a clear sign that a deadlock has occurred.

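4: gdb

4.1 Attach to the process

Use gdb to attach to the process and inspect the state of the locks. Invoked as shown here, gdb first tries to open a file named attach, which produces the harmless "No such file or directory" message in the output, and then attaches to the PID; gdb -p <pid> avoids that message.

# gdb attach <pid>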
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.-.el7
Copyright (C) Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
attach: No such file or directory.
Attaching to process
Reading symbols from /root/devel/mycode/deadlock...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP ]
[New LWP ]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f6093fc0ef7 in pthread_join () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-.el7_3..x86_64

4.2 List the threads in the process:

(gdb) info thread
  Id   Target Id                                   Frame
  3    Thread 0x7f6093bf6700 (LWP 9868) "deadlock" 0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
  2    Thread 0x7f60933f5700 (LWP 9869) "deadlock" 0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x7f60943e1740 (LWP ) "deadlock" 0x00007f6093fc0ef7 in pthread_join () from /lib64/libpthread.so.0

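The * marks the current thread, which is thread 1. To inspect the state of the locks, switch to threads 2 and 3.

First switch to thread 2 and print its call stack: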

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f60933f5700 (LWP 9869))]
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400852 in thread2 (arg=0x7fffad4cbeb0) at deadlock.c:
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6

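The "PID" of thread 2 (strictly speaking its LWP ID, the kernel thread ID) is 9869. The call stack shows that the thread is blocked in pthread_mutex_lock. Now try to look at the state of the locks: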

(gdb) p cc
No symbol "cc" in current context.
(gdb) frame 3
#3  0x0000000000400852 in thread2 (arg=0x7fffad4cbeb0) at deadlock.c:
pthread_mutex_lock(&cc->mutex1);
(gdb) p cc
$ = (Counter *) 0x7fffad4cbeb0
(gdb) p cc->mutex1
$ = {__data = {__lock = 2, __count = 0, __owner = 9868, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\214&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->mutex2
$ = {__data = {__lock = 2, __count = 0, __owner = 9869, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\215&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->sequence1
$ =
(gdb) p cc->sequence2
$ =

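After switching threads, gdb is in frame 0, inside __lll_lock_wait, so the first attempt to print cc fails with: No symbol "cc" in current context. The fix is to use frame 3 to move to the frame of thread2, the function that called pthread_mutex_lock, and then print cc and its members.

The output shows that cc->mutex1 is currently held by the thread whose "PID" is 9868, while cc->mutex2 is held by the thread whose "PID" is 9869, that is, the current thread.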
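
The __owner value can be matched against LWP IDs because glibc records the kernel thread ID of the locking thread inside the mutex. A minimal sketch of that relationship, assuming Linux with glibc: __data.__owner is a glibc-internal field rather than a public API, and owner_is_self is a hypothetical helper name used only for illustration.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: locks a fresh mutex and checks whether the owner
 * recorded by glibc equals the calling thread's kernel thread ID (LWP).
 * Returns 1 if they match, 0 otherwise. */
static int owner_is_self(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int matches;

    pthread_mutex_lock(&m);

    /* Kernel thread ID of the caller: the number gdb shows as "LWP". */
    pid_t tid = (pid_t)syscall(SYS_gettid);

    /* Assumption: glibc-internal layout, the same field gdb printed above. */
    matches = (m.__data.__owner == (int)tid);

    pthread_mutex_unlock(&m);
    return matches;
}
```

Reading the field from program code like this is only useful for understanding the debugger output; production code should not depend on it.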

Next, switch to thread 3 and look at its call stack and the state of the locks:

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f6093bf6700 (LWP 9868))]
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f6093fc61bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6093fc1d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f6093fc1c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004007d8 in thread1 (arg=0x7fffad4cbeb0) at deadlock.c:
#4  0x00007f6093fbfdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f6093cee76d in clone () from /lib64/libc.so.6
(gdb) f 3
#3  0x00000000004007d8 in thread1 (arg=0x7fffad4cbeb0) at deadlock.c:
pthread_mutex_lock(&cc->mutex2);
(gdb) p cc->mutex1
$ = {__data = {__lock = 2, __count = 0, __owner = 9868, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\214&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->mutex2
$ = {__data = {__lock = 2, __count = 0, __owner = 9869, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\215&\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) p cc->sequence1
$ =
(gdb) p cc->sequence2
$ =

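So the "PID" of thread 3 is 9868, which makes it the thread holding cc->mutex1, while the cc->mutex2 it is trying to lock is currently held by the thread whose "PID" is 9869, that is, thread 2. Each thread holds the lock the other one is waiting for, so neither can ever proceed.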
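
Once the cycle is confirmed, one common remedy is to make every thread take the locks in the same order. The sketch below is not part of the original program: ordered_worker and run_ordered are hypothetical names, and the loop is bounded (1000 iterations, no sleep) only so that the example terminates.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct
{
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;
    int sequence1;
    int sequence2;
} Counter;

/* Every worker takes mutex1 first and mutex2 second, so a thread can
 * never hold mutex2 while waiting for mutex1 and no wait cycle can form. */
static void *ordered_worker(void *arg)
{
    Counter *cc = (Counter *)arg;

    for (int i = 0; i < 1000; ++i)
    {
        pthread_mutex_lock(&cc->mutex1);
        pthread_mutex_lock(&cc->mutex2);
        ++cc->sequence1;
        ++cc->sequence2;
        pthread_mutex_unlock(&cc->mutex2);
        pthread_mutex_unlock(&cc->mutex1);
    }
    return NULL;
}

/* Hypothetical driver: runs two workers to completion.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_ordered(Counter *cc)
{
    pthread_t tid[2];

    if (pthread_create(&tid[0], NULL, ordered_worker, cc) != 0)
    {
        return -1;
    }
    if (pthread_create(&tid[1], NULL, ordered_worker, cc) != 0)
    {
        pthread_join(tid[0], NULL);
        return -1;
    }
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return 0;
}
```

With a Counter initialized as in the original main, run_ordered returns after both workers finish, and each sequence field has been incremented once per iteration by each worker.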

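5: Notes

Once gdb attaches to a process, the process stops running (it is not killed, only stopped), so you can run GDB commands to inspect call stacks, internal variables, and so on: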

The first thing gdb does after arranging to debug the specified process is to stop it. You can examine and modify an attached process with all the gdb commands that are ordinarily available when you start processes with run. You can insert breakpoints; you can step and continue; you can modify storage. If you would rather the process continue running, you may use the continue command after attaching gdb to the process.

http://sourceware.org/gdb/onlinedocs/gdb/Attach.html

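While a process is being debugged with GDB, GDB reacts differently depending on which signal the process receives. For some signals GDB stops the process; for others it passes the signal straight through to the process. Use info signals or info handle to see what GDB does for each signal: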

(gdb) info signals
Signal        Stop      Print   Pass to program Description
SIGHUP        Yes       Yes     Yes             Hangup
SIGINT        Yes       Yes     No              Interrupt
SIGQUIT       Yes       Yes     Yes             Quit
SIGILL        Yes       Yes     Yes             Illegal instruction
SIGTRAP       Yes       Yes     No              Trace/breakpoint trap
SIGABRT       Yes       Yes     Yes             Aborted
SIGEMT        Yes       Yes     Yes             Emulation trap
SIGFPE        Yes       Yes     Yes             Arithmetic exception
SIGKILL       Yes       Yes     Yes             Killed

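So for SIGINT and SIGTRAP, GDB by default stops the process and does not pass the signal on to it. These two signals can therefore be used to pause the process, print debugging information, and then resume it with the continue command.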
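
One way to use this from inside a program is to raise SIGTRAP at a point of interest. The sketch below is an illustration under stated assumptions, not something from the original article: debug_break, on_trap and trap_seen are hypothetical names. A handler is installed so that the program survives the signal when no debugger is attached; with GDB attached and the default settings from the table above, GDB stops the process and the handler never sees the signal.

```c
#include <signal.h>

/* Set by the handler; stays 0 if a debugger swallowed the signal. */
static volatile sig_atomic_t trap_seen = 0;

static void on_trap(int signo)
{
    (void)signo;
    trap_seen = 1;
}

/* Hypothetical helper: sends SIGTRAP to the calling thread.
 * Without a debugger the handler runs and execution simply continues.
 * Under GDB (SIGTRAP: stop, print, do not pass) the process stops here
 * and resumes after "continue".
 * Returns 0 on success, nonzero on failure. */
static int debug_break(void)
{
    if (signal(SIGTRAP, on_trap) == SIG_ERR)
    {
        return -1;
    }
    return raise(SIGTRAP);
}
```

Without the handler, an unhandled SIGTRAP would terminate the process when no debugger is attached, which is why the sketch installs one first.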

GDB has the ability to detect any occurrence of a signal in your program. You can tell GDB in advance what to do for each kind of signal.

Normally, GDB is set up to let the non-erroneous signals like SIGALRM be silently passed to your program (so as not to interfere with their role in the program’s functioning) but to stop your program immediately whenever an error signal happens. You can change these settings with the handle command.

info signals

info handle

Print a table of all the kinds of signals and how GDB has been told to handle each one. You can use this to see the signal numbers of all the defined types of signals.

info signals sig

Similar, but print information only about the specified signal number.

info handle is an alias for info signals.

catch signal [signal… | ‘all’]

Set a catchpoint for the indicated signals. See Set Catchpoints, for details about this command.

handle signal [keywords…]

Change the way GDB handles signal signal. The signal can be the number of a signal or its name (with or without the ‘SIG’ at the beginning); a list of signal numbers of the form ‘low-high’; or the word ‘all’, meaning all the known signals. Optional arguments keywords, described below, say what change to make.

The keywords allowed by the handle command can be abbreviated. Their full names are:

nostop

GDB should not stop your program when this signal happens. It may still print a message telling you that the signal has come in.

stop

GDB should stop your program when this signal happens. This implies the print keyword as well.

print

GDB should print a message when this signal happens.

noprint

GDB should not mention the occurrence of the signal at all. This implies the nostop keyword as well.

pass

noignore

GDB should allow your program to see this signal; your program can handle the signal, or else it may terminate if the signal is fatal and not handled. pass and noignore are synonyms.

nopass

ignore

GDB should not allow your program to see this signal. nopass and ignore are synonyms.

When a signal stops your program, the signal is not visible to the program until you continue. Your program sees the signal then, if pass is in effect for the signal in question at that time. In other words, after GDB reports a signal, you can use the handle command with pass or nopass to control whether your program sees that signal when you continue.

The default is set to nostop, noprint, pass for non-erroneous signals such as SIGALRM, SIGWINCH and SIGCHLD, and to stop, print, pass for the erroneous signals.

https://sourceware.org/gdb/current/onlinedocs/gdb/Signals.html#Signals
