http://paulmck.livejournal.com/7314.html

RCU的作者,paul在他的blog中有提到这个问题,也明确提到需要在module exit的地方使用rcu_barrier来等待保证call_rcu的回调函数callback能够执行完成,然后再正式卸载模块,方式快速卸载之后call_back回调发现空指针的问题,从而导致kernel panic的问题。

RCU and unloadable modules

  • Jun. 8th, 2009 at 1:38 PM

The rcu_barrier() function was described some time back in an article on Linux Weekly News. This rcu_barrier() function solves the problem where a given module invokes call_rcu() using a function in that module, but the module is removed before the corresponding grace period elapses, or at least before the callback can be invoked. This results in an attempt to call a function whose code has been removed from the Linux kernel. Oops!!!

Since the above article was written, rcu_barrier_bh() and rcu_barrier_sched() have been accepted into the Linux kernel, for use with call_rcu_bh() and call_rcu_sched(), respectively. These functions have seen relatively little use, which is no surprise, given that they are quite specialized. However, Jesper Dangaard recently discovered that they need to be used a bit more heavily. This lead to the question of exactly when they needed to be used, to which I responded as follows:

Unless there is some other mechanism to ensure that all the RCU callbacks have been invoked before the module exit, there needs to be code in the module-exit function that does the following:

  1. Prevents any new RCU callbacks from being posted. In other words, make sure that no future call_rcu()invocations happen from this module unless those call_rcu() invocations touch only functions and data that outlive this module.
  2. Invokes rcu_barrier().
  3. Of course, if the module uses call_rcu_sched() instead of call_rcu(), then it should invoke rcu_barrier_sched() instead of rcu_barrier(). Similarly, if it uses call_rcu_bh() instead of call_rcu(), then it should invoke rcu_barrier_bh() instead of rcu_barrier(). If the module uses more than one of call_rcu()call_rcu_sched(), and call_rcu_bh(), then it must invoke more than one of rcu_barrier()rcu_barrier_sched(), and rcu_barrier_bh().

What other mechanism could be used? I cannot think of one that it safe. For example, a module that tried to count the number of RCU callbacks in flight would be vulnerable to races as follows:

  1. CPU 0: RCU callback decrements the counter.
  2. CPU 1: module-exit function notices that the counter is zero, so removes the module.
  3. CPU 0: attempts to execute the code returning from the RCU callback, and dies horribly due to that code no longer being in memory.

If there was an easy solution (or even a hard solution) to this problem, then I do not believe that Nikita Danilov would have asked Dipankar Sarma for rcu_barrier(). Therefore, I do not expect anyone to be able to come up with an alternative to rcu_barrier() and friends. Always happy to learn something by being proven wrong, of course!!!

So unless someone can show me some other safe mechanism, every unloadable module that uses call_rcu()call_rcu_sched(), or call_rcu_bh() must use rcu_barrier()rcu_barrier_sched(), and/or rcu_barrier_bh() in its module-exit function.

So if you have a module that uses one of the call_rcu() functions, please use the corresponding rcu_barrier()function in the module-exit code!

Update: Peter Zijlstra rightly points out that the issue is not whether your module invokes call_rcu(), but rather whether the corresponding RCU callback invokes a function that is in a module. So, if there is a call_rcu()call_rcu_sched(), or call_rcu_bh() anywhere in the kernel whose RCU callback either directly or indirectly invokes a function in your module, then your module's exit function needs to invoke rcu_barrier()rcu_barrier_sched(), and/or rcu_barrier_bh(). Thanks to Peter for pointing this out!

关于call_rcu在内核模块退出时可能引起kernel panic的问题的更多相关文章

  1. Android退出时关闭所有Activity的方法

    Android退出时,有的Activity可能没有被关闭.为了在Android退出时关闭所有的Activity,设计了以下的类: //关闭Activity的类 public class CloseAc ...

  2. Qt 程序退出时断言错误——_BLOCK_TYPE_IS_VALID(pHead->nBlockUse),由setAttribute(Qt::WA_DeleteOnClose)引起

    最近在学习QT,自己仿写了一个简单的QT绘图程序,但是在退出时总是报错,断言错误: 报错主要问题在_BLOCK_TYPE_IS_VALID(pHead->nBlockUse),是在关闭窗口时报的 ...

  3. Android设置Activity启动和退出时的动画

    业务开发时遇到的一个小特技,要求实现Activity启动时自下向上弹出,退出时自上向下退出. 此处不关注启动和退出时其他Activity的动画效果,实现方法有两种: 1.代码方式,通过Activity ...

  4. HP平台由于变量声明冲突导致程序退出时的core

    最近遇到一个莫名的问题,在HP-UX B.11.31 U ia64平台下,程序PetriService在接收到产品化退出或Ctrl-C时,程序在main函数返回后析构全局的CTQueue<SMs ...

  5. Android 编程下 Activity 的创建和应用退出时的销毁

    为了确保对应用中 Activity 的创建和销毁状态进行控制,所以就需要一个全局的变量来记录和销毁这些 Activity.这里的大概思路是写一个类继承 Application,并使获取该 Applic ...

  6. 解决log4cxx退出时的异常

    解决log4cxx退出时的异常(金庆的专栏)如果使用log4cxx的FileWatchdog线程来监视日志配置文件进行动态配置,就可能碰到程序退出时产生的异常.程序退出时清理工作耗时很长时,该异常很容 ...

  7. 神奇的bug,退出时自动更新时间

    遇到一个神奇的bug,用户退出时,上次登录时间会变成退出时的时间. 于是开始跟踪,发现Laravel在退出时,会做一次脏检查,这时会更新rember_token,这时就会有update操作如下. 而粗 ...

  8. java实现创建临时文件然后在程序退出时自动删除文件(转)

    这篇文章主要介绍了java实现创建临时文件然后在程序退出时自动删除文件,从个人项目中提取出来的,小伙伴们可以直接拿走使用. 通过java的File类创建临时文件,然后在程序退出时自动删除临时文件.下面 ...

  9. os.waitpid()无法获取sys.exit()退出时的status code

    [目的] 父进程使用os.waitpid()等待子进程退出,并检测子进程的exit code,以决定是否重启子进程. (常见的应用场景是:子进程接收外部命令,收到"stop"时退出 ...

随机推荐

  1. 学习QT——GUI的基础用法(2)

    1.listWidget列表 在构造函数里面添加: ; i<; i++) { ui->listWidget->addItem(QString::number(i)+"ite ...

  2. @Component单例与并发(未解决)

    今天用websocket记录连接的个数: 模拟少量请求到服务器端的websocket,@Component默认是单例的,让其注解到MyWebSocket类上: 每次请求过来都是相同的MyWebSock ...

  3. ssh问题:ssh_exchange_identification: Connection closed by remote host

    ssh问题:ssh_exchange_identification: Connection closed by remote host... 刚刚一个朋友告诉我SSH连接不上服务器了,重启电脑也不管用 ...

  4. python-ceilometerclient命令行(终结)

    ceilometerclient入口 工程ceilometerclient shell.py中的main方法 ceilometerclient目录 --ceilometerclient --commo ...

  5. date命令转换日期命令提示date: illegal time format

    问题:运行date命令抛错 date -j -f "%a %b %d %T %Z %Y" "Sat Sep 29 11:33:00 CST 2018" &quo ...

  6. sql去除重复语句

    转自芙蓉清秀的BLOG http://blog.sina.com.cn/liurongxiu1211  sql去除重复语句 (2012-06-15 15:00:01) sql 单表/多表查询去除重复记 ...

  7. vue 定时器的问题

    在项目中,我们经常会使用到定时器setInterval(),可是很多时候我们会发现,即使我退出当前页面,定时器依然在工作,非常消耗内存,所以我们要进行手动清理: 将定时器保存在变量中,退出页面时清除变 ...

  8. poj3104(二分)

    题目链接:http://poj.org/problem?id=3104 题意:有n件衣服,每一件含有a[i]单位的水,每分钟衣服可以自然蒸发1单位的水,也可以在烘干器上每分钟烘干k单位的水,问将所有衣 ...

  9. RedHat 更新CentOS Yum源(转)

    经测试,可用.转自:https://www.cnblogs.com/tangsen/p/5151994.html 一.随笔引言 1.1随笔内容: 1.RedHat 配置Centos yum源 2.yu ...

  10. UVa 536 Tree Recovery(二叉树后序遍历)

    Little Valentine liked playing with binary trees very much. Her favorite game was constructing rando ...