已经运行多年的jbd2,它还是死锁了

背景:这个是在centos7的环境上复现的,内核版本为3.10.0-957.27.2.el7

下面列一下我们是怎么排查并解这个问题的。

一、故障现象

oppo云内核团队接到运维兄弟收集的测试环境一例crash,

现象是load很高,卡得没法操作:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 40
DATE: Mon Apr 19 15:46:07 2021----收集crash的日期
UPTIME: 312 days, 00:49:56
LOAD AVERAGE: 72886.67, 72873.82, 72735.93

我们授权后登陆oppo云宿主,查看最早报hung的日志如下:

Apr 16 19:46:22 [ERR] :  - [26652536.550311] INFO: task jbd2/dm-12-8:547025 blocked for more than 120 seconds.
Apr 16 19:46:22 [ERR] : - [26652536.550342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 16 19:46:22 [WARNING] : - [26652536.550372] Call Trace:
Apr 16 19:46:22 [WARNING] : - [26652536.550380] [<ffffffff8a76ae39>] schedule_preempt_disabled+0x29/0x70
Apr 16 19:46:22 [WARNING] : - [26652536.550383] [<ffffffff8a768db7>] __mutex_lock_slowpath+0xc7/0x1d0
Apr 16 19:46:22 [WARNING] : - [26652536.550385] [<ffffffff8a76819f>] mutex_lock+0x1f/0x2f
Apr 16 19:46:22 [WARNING] : - [26652536.550405] [<ffffffffc03f17b8>] jbd2_update_log_tail+0x28/0x60 [jbd2]
Apr 16 19:46:22 [WARNING] : - [26652536.550410] [<ffffffffc03ea5dd>] jbd2_journal_commit_transaction+0x155d/0x19b0 [jbd2]
Apr 16 19:46:22 [WARNING] : - [26652536.550417] [<ffffffffc03efe89>] kjournald2+0xc9/0x260 [jbd2]---主函数
Apr 16 19:46:22 [WARNING] : - [26652536.550423] [<ffffffff8a0c3f50>] ? wake_up_atomic_t+0x30/0x30
Apr 16 19:46:22 [WARNING] : - [26652536.550428] [<ffffffffc03efdc0>] ? commit_timeout+0x10/0x10 [jbd2]
Apr 16 19:46:22 [WARNING] : - [26652536.550430] [<ffffffff8a0c2e81>] kthread+0xd1/0xe0
Apr 16 19:46:22 [WARNING] : - [26652536.550433] [<ffffffff8a0c2db0>] ? insert_kthread_work+0x40/0x40
Apr 16 19:46:22 [WARNING] : - [26652536.550437] [<ffffffff8a776c1d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 16 19:46:22 [WARNING] : - [26652536.550439] [<ffffffff8a0c2db0>] ? insert_kthread_work+0x40/0x40
Apr 16 19:46:22 [ERR] : - [26652536.550467] INFO: task java:12610 blocked for more than 120 seconds.

二、故障现象分析

既然最早报hung的进程为jbd2/dm-12-8:547025,那就需要先看下该进程的堆栈,

crash> bt 547025
PID: 547025 TASK: ffff91792aff1040 CPU: 16 COMMAND: "jbd2/dm-12-8"
#0 [ffff91a907247b68] __schedule at ffffffff8a769a72
#1 [ffff91a907247bf8] schedule_preempt_disabled at ffffffff8a76ae39
#2 [ffff91a907247c08] __mutex_lock_slowpath at ffffffff8a768db7
#3 [ffff91a907247c60] mutex_lock at ffffffff8a76819f
#4 [ffff91a907247c78] jbd2_update_log_tail at ffffffffc03f17b8 [jbd2]
#5 [ffff91a907247ca8] jbd2_journal_commit_transaction at ffffffffc03ea5dd [jbd2]
#6 [ffff91a907247e48] kjournald2 at ffffffffc03efe89 [jbd2]
#7 [ffff91a907247ec8] kthread at ffffffff8a0c2e81
#8 [ffff91a907247f50] ret_from_fork_nospec_begin at ffffffff8a776c1d
crash> ps -m 547025
[ 2 19:52:41.384] [UN] PID: 547025 TASK: ffff91792aff1040 CPU: 16 COMMAND: "jbd2/dm-12-8"

从上面ps -m的打印看,该进程最后一次调度的时间是67时52分之前,也就是[ 2 19:52:41.384]:

按照当前的时间点减去报hung的时间点,说明这个进程报hung之后,没有再经历调度。

下面就需要看,这个进程卡在什么地方,代码如下

void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
{
mutex_lock(&journal->j_checkpoint_mutex);//caq:这里需要拿锁
if (tid_gt(tid, journal->j_tail_sequence))
__jbd2_update_log_tail(journal, tid, block);
mutex_unlock(&journal->j_checkpoint_mutex);
}

从堆栈和代码可以确定,是在等待 journal->j_checkpoint_mutex。

然后看mutex放在栈里的位置:

crash> dis -l jbd2_update_log_tail |grep mutex_lock
0xffffffffc03f17b3 <jbd2_update_log_tail+35>: callq 0xffffffff8a768180 <mutex_lock>
crash> dis -l jbd2_update_log_tail |grep mutex_lock -B 10
0xffffffffc03f1796 <jbd2_update_log_tail+6>: mov %rsp,%rbp
0xffffffffc03f1799 <jbd2_update_log_tail+9>: push %r14
0xffffffffc03f179b <jbd2_update_log_tail+11>: mov %rdx,%r14
0xffffffffc03f179e <jbd2_update_log_tail+14>: push %r13
0xffffffffc03f17a0 <jbd2_update_log_tail+16>: mov %esi,%r13d
0xffffffffc03f17a3 <jbd2_update_log_tail+19>: push %r12
0xffffffffc03f17a5 <jbd2_update_log_tail+21>: lea 0xf0(%rdi),%r12
0xffffffffc03f17ac <jbd2_update_log_tail+28>: push %rbx
0xffffffffc03f17ad <jbd2_update_log_tail+29>: mov %rdi,%rbx
0xffffffffc03f17b0 <jbd2_update_log_tail+32>: mov %r12,%rdi---x86的arch,说明是r12是第一个入参
0xffffffffc03f17b3 <jbd2_update_log_tail+35>: callq 0xffffffff8a768180 <mutex_lock>

然后看mutex_lock里面,并没有人动r12寄存器,

dis -l mutex_lock|grep r12 ---没有输出,说明这个函数里面没有人动r12

再往下面调用链看:

crash> dis -l __mutex_lock_slowpath |head -12
/usr/src/debug/kernel-3.10.0-957.27.2.el7/linux-3.10.0-957.27.2.el7.x86_64/kernel/mutex.c: 771
0xffffffff8a768cf0 <__mutex_lock_slowpath>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8a768cf5 <__mutex_lock_slowpath+5>: push %rbp
0xffffffff8a768cf6 <__mutex_lock_slowpath+6>: mov %rsp,%rbp
0xffffffff8a768cf9 <__mutex_lock_slowpath+9>: push %r15
0xffffffff8a768cfb <__mutex_lock_slowpath+11>: push %r14
0xffffffff8a768cfd <__mutex_lock_slowpath+13>: push %r13
/usr/src/debug/kernel-3.10.0-957.27.2.el7/linux-3.10.0-957.27.2.el7.x86_64/arch/x86/include/asm/current.h: 14
0xffffffff8a768cff <__mutex_lock_slowpath+15>: mov %gs:0x10e80,%r13
/usr/src/debug/kernel-3.10.0-957.27.2.el7/linux-3.10.0-957.27.2.el7.x86_64/kernel/mutex.c: 771
0xffffffff8a768d08 <__mutex_lock_slowpath+24>: push %r12----r12在此被压栈,从这取mutex就行
0xffffffff8a768d0a <__mutex_lock_slowpath+26>: push %rbx

根据上面分析,从栈中取得mutex如下:

crash> struct mutex ffff91793cbbe0f0----从栈中取的mutex
struct mutex {----从源码的上下文看,这个其实就是journal_t.j_checkpoint_mutex
count = {
counter = -9
},
wait_lock = {
{
rlock = {
raw_lock = {
val = {
counter = 0
}
}
}
}
},
wait_list = {
next = 0xffff91a907247c10,
prev = 0xffff91612cb139c0
},
owner = 0xffff9178e0ad1040, ----对应的owner
{
osq = {
tail = {
counter = 0
}
},
__UNIQUE_ID_rh_kabi_hide1 = {
spin_mlock = 0x0
},
{<No data fields>}
}
}

下面,就需要根据mutex的owner,看一下对应的进程为什么持有锁之后,长时间不放锁了。

crash> bt 0xffff9178e0ad1040----对应的进程为13027
PID: 13027 TASK: ffff9178e0ad1040 CPU: 10 COMMAND: "java"
#0 [ffff91791321f828] __schedule at ffffffff8a769a72
#1 [ffff91791321f8b8] schedule at ffffffff8a769f19
#2 [ffff91791321f8c8] jbd2_log_wait_commit at ffffffffc03ef7c5 [jbd2]
#3 [ffff91791321f940] jbd2_log_do_checkpoint at ffffffffc03ec0bf [jbd2]
#4 [ffff91791321f9a8] __jbd2_log_wait_for_space at ffffffffc03ec4bf [jbd2]---持有j_checkpoint_mutex
#5 [ffff91791321f9f0] add_transaction_credits at ffffffffc03e63d3 [jbd2]
#6 [ffff91791321fa50] start_this_handle at ffffffffc03e65e1 [jbd2]
#7 [ffff91791321fae8] jbd2__journal_start at ffffffffc03e6a93 [jbd2]
#8 [ffff91791321fb30] __ext4_journal_start_sb at ffffffffc05abc69 [ext4]
#9 [ffff91791321fb70] ext4_da_write_begin at ffffffffc057e099 [ext4]
#10 [ffff91791321fbf8] generic_file_buffered_write at ffffffff8a1b73a4
#11 [ffff91791321fcc0] __generic_file_aio_write at ffffffff8a1b9ce2
#12 [ffff91791321fd40] generic_file_aio_write at ffffffff8a1b9f59
#13 [ffff91791321fd80] ext4_file_write at ffffffffc0573322 [ext4]
#14 [ffff91791321fdf0] do_sync_write at ffffffff8a241c13
#15 [ffff91791321fec8] vfs_write at ffffffff8a242700
#16 [ffff91791321ff08] sys_write at ffffffff8a24351f
#17 [ffff91791321ff50] system_call_fastpath at ffffffff8a776ddb
RIP: 00007fd9b3b0569d RSP: 00007fd8b1f6d680 RFLAGS: 00000293
RAX: 0000000000000001 RBX: 00007fd8b1f6d820 RCX: 0000000000000010
RDX: 0000000000000046 RSI: 00007fd8b1f6b7c0 RDI: 00000000000000ee
RBP: 00007fd8b1f6b790 R8: 00007fd8b1f6b7c0 R9: 00000005c5741848
R10: 0000000000216546 R11: 0000000000000293 R12: 0000000000000046
R13: 00007fd8b1f6b7c0 R14: 00000000000000ee R15: 0000000000000046
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

13027也在等待,处于UN状态,它的最后一次调度时间为:

crash> ps -m 13027
[ 2 19:52:46.519] [UN] PID: 13027 TASK: ffff9178e0ad1040 CPU: 10 COMMAND: "java"
对比一下报hung的jbd2进程:
crash> ps -m 547025
[ 2 19:52:41.384] [UN] PID: 547025 TASK: ffff91792aff1040 CPU: 16 COMMAND: "jbd2/dm-12-8"

我们发现,其实13027处于UN的时间应该比547025还要多5秒,为啥不是它触发了hung的检测呢?这个问题留在后面

解释。

下面,我们就需要继续分析 13027 为什么持有journal_t.j_checkpoint_mutex锁却没有及时释放。

从代码分析 __jbd2_log_wait_for_space

void __jbd2_log_wait_for_space(journal_t *journal)
{
......
while (jbd2_log_space_left(journal) < nblocks) {//caq:空间不够
write_unlock(&journal->j_state_lock);
mutex_lock(&journal->j_checkpoint_mutex); /*
* Test again, another process may have checkpointed while we
* were waiting for the checkpoint lock. If there are no
* transactions ready to be checkpointed, try to recover
* journal space by calling cleanup_journal_tail(), and if
* that doesn't work, by waiting for the currently committing
* transaction to complete. If there is absolutely no way
* to make progress, this is either a BUG or corrupted
* filesystem, so abort the journal and leave a stack
* trace for forensic evidence.
*/
write_lock(&journal->j_state_lock);
if (journal->j_flags & JBD2_ABORT) {
mutex_unlock(&journal->j_checkpoint_mutex);
return;
}
spin_lock(&journal->j_list_lock);
nblocks = jbd2_space_needed(journal);
space_left = jbd2_log_space_left(journal);//caq:再次获取一次数据
if (space_left < nblocks) {//caq:空间还是不够
int chkpt = journal->j_checkpoint_transactions != NULL;
tid_t tid = 0; if (journal->j_committing_transaction)
tid = journal->j_committing_transaction->t_tid;
spin_unlock(&journal->j_list_lock);
write_unlock(&journal->j_state_lock);
if (chkpt) {
jbd2_log_do_checkpoint(journal);//caq:此时持有 j_checkpoint_mutex
......
}

__jbd2_log_wait_for_space在持有 j_checkpoint_mutex的情况下,进入

jbd2_log_do_checkpoint并在等待log提交,等待完成的条件是当前提交的

tid必须要不大于 j_commit_sequence。

int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
{
int err = 0; read_lock(&journal->j_state_lock);
#ifdef CONFIG_JBD2_DEBUG
if (!tid_geq(journal->j_commit_request, tid)) {
printk(KERN_ERR
"%s: error: j_commit_request=%d, tid=%d\n",
__func__, journal->j_commit_request, tid);
}
#endif
while (tid_gt(tid, journal->j_commit_sequence)) {---只要当前tid大于j_commit_sequence
jbd_debug(1, "JBD2: want %d, j_commit_sequence=%d\n",
tid, journal->j_commit_sequence);
read_unlock(&journal->j_state_lock);
wake_up(&journal->j_wait_commit);
wait_event(journal->j_wait_done_commit,
!tid_gt(tid, journal->j_commit_sequence));--就会等到tid不大于j_commit_sequence为止
read_lock(&journal->j_state_lock);
}
read_unlock(&journal->j_state_lock); if (unlikely(is_journal_aborted(journal)))
err = -EIO;
return err;
}

根据栈中的详细数据:

crash> bt -f 13027
PID: 13027 TASK: ffff9178e0ad1040 CPU: 10 COMMAND: "java"
#0 [ffff91791321f828] __schedule at ffffffff8a769a72
ffff91791321f830: 0000000000000082 ffff91791321ffd8
ffff91791321f840: ffff91791321ffd8 ffff91791321ffd8
ffff91791321f850: 000000000001ab80 ffff917a3761e180
ffff91791321f860: ffff91791321f870 ffffffff8a75e2cb
ffff91791321f870: ffff91791321f8f8 0000000065353fc1
ffff91791321f880: 0000000000000246 ffff91793cbbe000
ffff91791321f890: 000000006f19bbe9 ffff91793cbbe090
ffff91791321f8a0: ffff91793cbbe028 ffff91791321f8f8
ffff91791321f8b0: ffff91791321f8c0 ffffffff8a769f19
#1 [ffff91791321f8b8] schedule at ffffffff8a769f19
ffff91791321f8c0: ffff91791321f938 ffffffffc03ef7c5
#2 [ffff91791321f8c8] jbd2_log_wait_commit at ffffffffc03ef7c5 [jbd2]
ffff91791321f8d0: ffff9178e0ad1040 ffff91793cbbe0a8
ffff91791321f8e0: 0000000000000000 ffff9178e0ad1040
ffff91791321f8f0: ffffffff8a0c3f50 ffff91793cbbe098
ffff91791321f900: ffff91793cbbe098 0000000065353fc1
ffff91791321f910: ffff91a925027700 ffff91793cbbe000 ---r12就是journal_t
ffff91791321f920: ffff91793cbbe3a0 ffff9186a0a3a1a0
ffff91791321f930: 000000006f19bbe9 ffff91791321f9a0 ---r15就是000000006f19bbe9
ffff91791321f940: ffffffffc03ec0bf
#3 [ffff91791321f940] jbd2_log_do_checkpoint at ffffffffc03ec0bf [jbd2]

当前的tid值,可以从栈中取出为:

查看当前tid为:
crash> p 0x6f19bbe9
$2 = 1863957481
crash> journal_t.j_commit_sequence ffff91793cbbe000
j_commit_sequence = 1863957480

熟悉jbd2代码的同学到此肯定能看出问题来了,因为要想j_commit_sequence 不停地增长并完成,得交给

对应设备的jbd2进程,而刚好,它又在等待我们的13027 持有的 j_checkpoint_mutex,两者明显死锁了。

三、故障复现&修复

1.__jbd2_log_wait_for_space调用 jbd2_log_wait_commit 的两个分支,

一个是jbd2_log_do_checkpoint-->__process_buffer-->jbd2_log_wait_commit,

在这个分支是持有j_checkpoint_mutex的,

而在另外一个分支:

jbd2_log_do_checkpoint-->jbd2_log_wait_commit,是先释放了j_checkpoint_mutex再进入的

jbd2_log_wait_commit。

void __jbd2_log_wait_for_space(journal_t *journal)
{
int nblocks, space_left;
/* assert_spin_locked(&journal->j_state_lock); */ nblocks = jbd2_space_needed(journal);
while (jbd2_log_space_left(journal) < nblocks) {//caq:空间不够
write_unlock(&journal->j_state_lock);
mutex_lock(&journal->j_checkpoint_mutex);---持有锁 /*
* Test again, another process may have checkpointed while we
* were waiting for the checkpoint lock. If there are no
* transactions ready to be checkpointed, try to recover
* journal space by calling cleanup_journal_tail(), and if
* that doesn't work, by waiting for the currently committing
* transaction to complete. If there is absolutely no way
* to make progress, this is either a BUG or corrupted
* filesystem, so abort the journal and leave a stack
* trace for forensic evidence.
*/
write_lock(&journal->j_state_lock);
if (journal->j_flags & JBD2_ABORT) {
mutex_unlock(&journal->j_checkpoint_mutex);
return;
}
spin_lock(&journal->j_list_lock);
nblocks = jbd2_space_needed(journal);
space_left = jbd2_log_space_left(journal);//caq:再次获取一次数据
if (space_left < nblocks) {//caq:空间还是不够
int chkpt = journal->j_checkpoint_transactions != NULL;
tid_t tid = 0; if (journal->j_committing_transaction)
tid = journal->j_committing_transaction->t_tid;
spin_unlock(&journal->j_list_lock);
write_unlock(&journal->j_state_lock);
if (chkpt) {----这个分支,持有j_checkpoint_mutex
jbd2_log_do_checkpoint(journal);//caq:注意此时持有 j_checkpoint_mutex
} else if (jbd2_cleanup_journal_tail(journal) == 0) {
/* We were able to recover space; yay! */
;
} else if (tid) {---而这个分支,同样是可能会调用 jbd2_log_wait_commit,正常释放了锁
/*
* jbd2_journal_commit_transaction() may want
* to take the checkpoint_mutex if JBD2_FLUSHED
* is set. So we need to temporarily drop it.
*/
mutex_unlock(&journal->j_checkpoint_mutex);//caq:这个分支在此释放了锁
jbd2_log_wait_commit(journal, tid);
write_lock(&journal->j_state_lock);
continue;
} else {
printk(KERN_ERR "%s: needed %d blocks and "
"only had %d space available\n",
__func__, nblocks, space_left);
printk(KERN_ERR "%s: no way to get more "
"journal space in %s\n", __func__,
journal->j_devname);
WARN_ON(1);
jbd2_journal_abort(journal, 0);
}
write_lock(&journal->j_state_lock);
} else {
spin_unlock(&journal->j_list_lock);
}
mutex_unlock(&journal->j_checkpoint_mutex);
}
}

而在linux主线中,有一个14年的补丁其实是去掉了__process_buffer中复杂的持锁流程的。

[root@custom-16-126 jbd2]# git log -p be1158cc615fd723552f0d9912087423c7cadda5
commit be1158cc615fd723552f0d9912087423c7cadda5
Author: Theodore Ts'o <tytso@mit.edu>
Date: Mon Sep 1 21:19:01 2014 -0400 jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() __process_buffer() is only called by jbd2_log_do_checkpoint(), and it
had a very complex locking protocol where it would be called with the
j_list_lock, and sometimes exit with the lock held (if the return code
was 0), or release the lock. This was confusing both to humans and to smatch (which erronously
complained that the lock was taken twice). Folding __process_buffer() to the caller allows us to simplify the
control flow, making the resulting function easier to read and reason
about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
bytes (over 4% of the text size). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

那是不是说,把__process_buffer重构成上面补丁这样就可以了呢?

其实不是的,重构并没有改变拿锁的逻辑,真正修改的话,需要参照下面

这个补丁。

commit 53cf978457325d8fb2cdecd7981b31a8229e446e
Author: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Date: Thu Jan 31 23:42:11 2019 -0500 jbd2: fix deadlock while checkpoint thread waits commit thread to finish This issue was found when I tried to put checkpoint work in a separate thread,
the deadlock below happened:
Thread1 | Thread2
__jbd2_log_wait_for_space |
jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
if (jh->b_transaction != NULL) |
... |
jbd2_log_start_commit(journal, tid); |jbd2_update_log_tail
| will lock j_checkpoint_mutex,
| but will be blocked here.
|
jbd2_log_wait_commit(journal, tid); |
wait_event(journal->j_wait_done_commit, |
!tid_gt(tid, journal->j_commit_sequence)); |
... |wake_up(j_wait_done_commit)
} | then deadlock occurs, Thread1 will never be waken up. To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
when we are going to wait for transaction commit.

回到上面遗留的小问题,为什么jbd2/dm-12-8:547025的最后调度时间比 13027 要晚,

但是它却先报出hung呢?这个需要看 khungtaskd的检测机制有关,可以查看

check_hung_uninterruptible_tasks的源码,具体跟task的串接位置相关,

同时跟当时处于UN状态的进程数也有些关系。

四、故障规避或解决

我们的解决方案是:

1.在红帽的当前3.10.0-957.27.2内核中合入对应的补丁,或者偷懒直接升级到kernel-3.10.0-1127.el7及以上。

jbd2的死锁分析的更多相关文章

  1. 一次 MySQL 线上死锁分析实战

    关键词:MySQL Index Merge 前言 MySQL 的锁机制相信大家在学习 MySQL 的时候都有简单的了解过,那既然有锁就必定绕不开死锁这个问题.其实 MySQL 在大部分场景下是不会存在 ...

  2. 线上BUG:MySQL死锁分析实战

    原文链接:线上BUG:MySQL死锁分析实战 1 线上告警 我们不需要关注截图中得其他信息,只要能看到打印得org.springframework.dao.DeadlockLoserDataAcces ...

  3. golang 中 channel 的详细使用、使用注意事项及死锁分析

    目录 1.什么是 channel,介绍管道 2.channel 的基本使用 3.channel 的使用场景 4.使用 channel的注意事项及死锁分析 什么是 channel 管道 它是一个数据管道 ...

  4. 查看w3wp进程占用的内存及.NET内存泄露,死锁分析

    一 基础知识 在分析之前,先上一张图: 从上面可以看到,这个w3wp进程占用了376M内存,启动了54个线程. 在使用windbg查看之前,看到的进程含有 *32 字样,意思是在64位机器上已32位方 ...

  5. Mysql查询语句使用select.. for update导致的数据库死锁分析

    近期有一个业务需求,多台机器需要同时从Mysql一个表里查询数据并做后续业务逻辑,为了防止多台机器同时拿到一样的数据,每台机器需要在获取时锁住获取数据的数据段,保证多台机器不拿到相同的数据. 我们My ...

  6. 【转】Deadlock的一些总结(死锁分析及处理)

    1.1.1 摘要 在系统设计过程中,系统的稳定性.响应速度和读写速度至关重要,就像12306.cn那样,当然我们可以通过提高系统并发能力来提高系统性能总体性能,但在并发作用下也会出现一些问题,例如死锁 ...

  7. db2死锁分析与处理

    在数据库中,锁的主要功能是为了控制并发数据的完整性而引入的机制,在并发应用中出现锁现象并不可怕,锁现象通常分为死锁和锁等待两种情形. 死锁是因为两个并发的进程或者线程同时各自占有一个资源,又需要占有对 ...

  8. 查看w3wp进程占用的内存及.NET内存泄露,死锁分析--转载

    一 基础知识 在分析之前,先上一张图: 从上面可以看到,这个w3wp进程占用了376M内存,启动了54个线程. 在使用windbg查看之前,看到的进程含有 *32 字样,意思是在64位机器上已32位方 ...

  9. MySQL · BUG分析 · Rename table 死锁分析

    http://mysql.taobao.org/monthly/2016/03/06/ 背景 InnoDB buffer pool中的page管理牵涉到两个链表,一个是lru链表,一个是flush 脏 ...

随机推荐

  1. 学学dotnet core中的身份验证和授权-1-概念

    前言 身份验证: Authentication 授权: Authorization net core 中的身份验证和授权这两个部分,是相辅相成的.当初我在学在部分的时候,是看的 net core 官网 ...

  2. CabloyJS一站式助力微信、企业微信、钉钉开发 - 企业微信篇

    前言 现在软件开发不仅要面对前端碎片化,还要面对后端碎片化.针对前端碎片化,CabloyJS提供了pc=mobile+pad的跨端自适应方案,参见:自适应布局:pc = mobile + pad 在这 ...

  3. LVS简单搭建(一)

    先说一个坑: 192.168.65.110为虚拟ip,在主节点设置(ifconfig  eth0:8 192.168.65.110/24),其他子节点要与主节点保持一致(ifconfig  lo:3  ...

  4. JS:逗号运算符

    逗号运算符: 会把逗号隔开的表达式全部执行 最后一个运行的表达式的结果就是逗号运算符的结果   例: var a = (1, 2, 3, 4, 5, 6); console.log(a); //6 隐 ...

  5. netty系列之:HashedWheelTimer一种定时器的高效实现

    目录 简介 java.util.Timer java.util.concurrent.ScheduledThreadPoolExecutor HashedWheelTimer 总结 简介 定时器是一种 ...

  6. Django快速入门之项目配置

    开始 环境 python:3.6.2 django:2.0.5 跑起来 用pycharm导入或新建一个Django项目,在目录中存在manage.py的文件,通过下列指令运行Django后台. pyt ...

  7. Spring框架系列(10) - Spring AOP实现原理详解之AOP代理的创建

    上文我们介绍了Spring AOP原理解析的切面实现过程(将切面类的所有切面方法根据使用的注解生成对应Advice,并将Advice连同切入点匹配器和切面类等信息一并封装到Advisor).本文在此基 ...

  8. 【.NET+MQTT】.NET6 环境下实现MQTT通信,以及服务端、客户端的双边消息订阅与发布的代码演示

    前言: MQTT广泛应用于工业物联网.智能家居.各类智能制造或各类自动化场景等.MQTT是一个基于客户端-服务器的消息发布/订阅传输协议,在很多受限的环境下,比如说机器与机器通信.机器与物联网通信等. ...

  9. Intel的CPU系列说明

    至强可扩展系列是英特尔推出的新一代至强处理器系列,如今距离该系列推出几乎过去一年了.新的CPU并没有延续E系列的命名,英特尔将至强可扩展系列以金属命名,将该系列分为"铂金Platinum&q ...

  10. Ubuntu安装python各版本

    编译安装的话,之前遇到过很多小问题,感觉还是通过添加这个ppa方式装的比较稳,缺点是可能安装的比较慢,可配合proxychain4 sudo apt install software-properti ...