Linux RCU Mechanism Explained

A few notes on RCU:

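1: RCU is meant for workloads with many readers and few writers. It resembles a reader-writer lock, but readers take no lock and pay almost no overhead. Writers must still synchronize with one another, and a writer must wait until every reader that started before its update has finished before it frees the old data.
2: RCU protects pointers. This is the key point. An aligned pointer store is a single atomic operation on the architectures Linux supports, so a reader sees either the old pointer or the new one, never a mixture. Switching the pointer therefore needs no lock; what still needs care is memory ordering (compiler and CPU reordering, cache visibility), which rcu_assign_pointer() and rcu_dereference() take care of.
3: Read-side critical sections can nest; that is, rcu_read_lock() may be called again inside an existing critical section.
4: A reader must not block or sleep between rcu_read_lock() and rcu_read_unlock(). In classic (non-preemptible) RCU this means no context switch may occur inside the critical section. Otherwise the writer, which has to wait for the reader to finish, stays blocked as well.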

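5: A spinlock is exclusive: only one thread (reader or writer) is inside the critical section at any time. A reader-writer spinlock is better, since multiple readers can run concurrently, which improves performance. Readers and an updater still cannot run at the same time, though. RCU lifts that restriction: one updater can run concurrently with any number of readers (multiple updaters must still be kept out of the critical section, for example with a spinlock).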

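Core API:

For a reader, the RCU operations are:

(1) rcu_read_lock marks the beginning of an RCU read-side critical section.

(2) rcu_dereference fetches an RCU-protected pointer. To access shared data protected by RCU, a reader first obtains the RCU-protected pointer through this interface and then dereferences it.

(3) rcu_read_unlock marks the point where the reader leaves the RCU read-side critical section.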

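For a writer, the RCU operations are:

(1) rcu_assign_pointer. The writer uses it for the removal (replacement) step: after allocating and filling in the new version of the data, the writer calls it to make the RCU-protected pointer point at the new data.

(2) synchronize_rcu. The writer side can be synchronous: after the update, the writer calls this function to wait until all readers that may still be using the old version have left their critical sections. Once it returns, nothing references the old shared data any more and reclamation can proceed directly.

(3) call_rcu. In some contexts (for example softirq context) the writer cannot block. It can then use call_rcu, which only registers a callback and returns immediately; the callback runs later, after a grace period, and performs the reclamation. In this scenario removal and reclamation are split across two execution contexts: the updater and the reclaimer.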
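
The reader and writer calls above fit together in a fixed sequence: readers bracket their access with lock/unlock, while the writer copies, publishes, waits, and only then reclaims. The following is a minimal userspace sketch of that sequence in C11, not kernel code. Everything prefixed toy_ is a hypothetical stand-in: a single global reader counter replaces the kernel's grace-period machinery, and the sketch assumes there is only one writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a; };

static _Atomic(struct foo *) gbl_foo;   /* the "RCU-protected" pointer */
static atomic_int active_readers;       /* toy stand-in for grace-period tracking */

static void toy_read_lock(void)   { atomic_fetch_add(&active_readers, 1); }
static void toy_read_unlock(void) { atomic_fetch_sub(&active_readers, 1); }

/* Toy synchronize_rcu(): spin until there is an instant with no readers. */
static void toy_synchronize(void)
{
    while (atomic_load(&active_readers) != 0)
        ;
}

/* Reader: subscribe to the current version and read one field. */
static int foo_get_a(void)
{
    int a = -1;                       /* returned when nothing is published */

    toy_read_lock();
    struct foo *p = atomic_load(&gbl_foo);
    if (p)
        a = p->a;
    toy_read_unlock();
    return a;
}

/* Writer (assumed to be the only writer): copy, modify, publish, wait, free. */
static void foo_update_a(int new_a)
{
    struct foo *new_fp = malloc(sizeof(*new_fp));
    assert(new_fp != NULL);

    struct foo *old_fp = atomic_load(&gbl_foo);
    if (old_fp)
        *new_fp = *old_fp;            /* copy the current version */
    new_fp->a = new_a;                /* modify the copy */
    atomic_store(&gbl_foo, new_fp);   /* publish the new version */
    toy_synchronize();                /* wait for pre-existing readers */
    free(old_fp);                     /* reclaim the old version */
}
```

With this sketch, calling foo_update_a(5) and then foo_get_a() returns 5. A shared counter like this is written on every read, which is exactly the cost real RCU is designed to avoid; it is used here only to make the waiting step visible.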

Example 1:

struct foo {
    int a;
    char b;
    long c;
};
DEFINE_SPINLOCK(foo_mutex);   /* serializes writers; readers never take it */

struct foo *gbl_foo;          /* the RCU-protected pointer */

void foo_update_a(int new_a)
{
    struct foo *new_fp;
    struct foo *old_fp;

    new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
    spin_lock(&foo_mutex);
    old_fp = gbl_foo;
    *new_fp = *old_fp;                     /* copy the current version */
    new_fp->a = new_a;                     /* modify the copy */
    rcu_assign_pointer(gbl_foo, new_fp);   /* publish the new version */
    spin_unlock(&foo_mutex);
    synchronize_rcu();                     /* wait for pre-existing readers */
    kfree(old_fp);                         /* the old version is now unused */
}

int foo_get_a(void)
{
    int retval;

    rcu_read_lock();
    retval = rcu_dereference(gbl_foo)->a;
    rcu_read_unlock();
    return retval;
}
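As the code shows, RCU protects the global pointer struct foo *gbl_foo. foo_get_a() reads field a from the structure that gbl_foo currently points to, and foo_update_a() replaces that structure with an updated copy.
Now consider: why does foo_update_a() need the spinlock foo_mutex?
Suppose the spinlock were left out. foo_update_a() would then look like this: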

void foo_update_a(int new_a)
{
    struct foo *new_fp;
    struct foo *old_fp;

    new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);

    old_fp = gbl_foo;
    /* 1: ------------------------- */
    *new_fp = *old_fp;
    new_fp->a = new_a;
    rcu_assign_pointer(gbl_foo, new_fp);

    synchronize_rcu();
    kfree(old_fp);
}
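Suppose process A is preempted by process B at the line marked 1 (the dashed line) in the code above, and B also runs foo_update_a(). When B has finished, execution switches back to A. By then the old_fp that A holds has already been freed by B, so everything A does with old_fp afterwards is invalid: it copies from freed memory and later frees the same object a second time.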

The code above also uses several of the core RCU APIs. They are:
rcu_read_lock()
rcu_read_unlock()
synchronize_rcu()
rcu_assign_pointer()
rcu_dereference()
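rcu_read_lock() and rcu_read_unlock() delimit a reader's RCU critical section. The reader must not block inside it (in classic RCU, no context switch is allowed there).
rcu_dereference(): the reader calls it to fetch an RCU-protected pointer.
rcu_assign_pointer(): the writer uses it to assign a new value to an RCU-protected pointer, so that the change is communicated safely from the writer to the readers. Older kernel documentation describes it as returning the new value; in current kernels it is a macro that does not evaluate to a value.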

Example 2: the same linked list protected by a reader-writer lock (left) and by RCU (right).

struct el {                                   struct el {
  struct list_head list;                        struct list_head list;
  long key;                                     long key;
  spinlock_t mutex;                             spinlock_t mutex;
  int data;                                     int data;
  /* Other data fields */                       /* Other data fields */
};                                            };
rwlock_t listmutex;                           spinlock_t listmutex;
struct el head;                               struct el head;

int search(long key, int *result)             int search(long key, int *result)
{                                             {
  struct el *p;                                 struct el *p;

  read_lock(&listmutex);                        rcu_read_lock();
  list_for_each_entry(p, &head.list, list) {    list_for_each_entry_rcu(p, &head.list, list) {
    if (p->key == key) {                          if (p->key == key) {
      *result = p->data;                            *result = p->data;
      read_unlock(&listmutex);                      rcu_read_unlock();
      return 1;                                     return 1;
    }                                             }
  }                                             }
  read_unlock(&listmutex);                      rcu_read_unlock();
  return 0;                                     return 0;
}                                             }

int delete(long key)                          int delete(long key)
{                                             {
  struct el *p;                                 struct el *p;

  write_lock(&listmutex);                       spin_lock(&listmutex);
  list_for_each_entry(p, &head.list, list) {    list_for_each_entry(p, &head.list, list) {
    if (p->key == key) {                          if (p->key == key) {
      list_del(&p->list);                           list_del_rcu(&p->list);
      write_unlock(&listmutex);                     spin_unlock(&listmutex);
                                                    synchronize_rcu();
      kfree(p);                                     kfree(p);
      return 1;                                     return 1;
    }                                             }
  }                                             }
  write_unlock(&listmutex);                     spin_unlock(&listmutex);
  return 0;                                     return 0;
}                                             }

Example 3:

rcu_assign_pointer() is normally used by the writer to publish, and rcu_dereference() by the reader to subscribe.

Writer:

p->a = 1;
p->b = 2;
p->c = 3;
rcu_assign_pointer(gp, p);

Reader:

rcu_read_lock();
p = rcu_dereference(gp);
if (p != NULL) {
    do_something_with(p->a, p->b, p->c);
}
rcu_read_unlock();

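rcu_assign_pointer() means: finish writing the memory first, then point the pointer at it. The write memory barrier it contains orders the stores to p->a, p->b and p->c before the store to gp, so that concurrent readers get a consistent view. A reader that still sees the old pointer sees the old data; a reader that sees the new pointer is guaranteed to see the fully initialized new data. Without this ordering, a reader could see the new pointer together with stale field values. Note that the barrier only orders the writes. It does not flush anything, and it says nothing about how soon a given reader will observe the new value of gp.
rcu_dereference() is the matching primitive on the read side. It relies on data-dependency ordering: reading p->a, p->b and p->c depends on the value of p, and p comes from the load of gp, so the dependent reads must follow the pointer load. Historically this was implemented with the data-dependency barrier smp_read_barrier_depends(), which is a no-op on every architecture except DEC Alpha; recent kernels fold that ordering into READ_ONCE(). Without it, a reader on Alpha could load the new gp and still read stale contents through it, and on any architecture the compiler would be free to reload or guess the pointer. The read-side ordering thus pairs with the writer's barrier: the writer guarantees the data is ready before the pointer is published, and the reader guarantees it does not look at the data before it has the pointer.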
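
The publish/subscribe pairing described above has a close userspace analogue in C11 atomics. The sketch below is an illustration under stated assumptions, not the kernel implementation: a release store stands in for rcu_assign_pointer() and an acquire load stands in for rcu_dereference(). Acquire is stronger than the dependency ordering the kernel relies on, and is used here because it is the portable choice in C11. The names storage, publish and subscribe_sum are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo { int a, b, c; };

static struct foo storage;              /* hypothetical backing object */
static _Atomic(struct foo *) gp;        /* shared pointer, initially NULL */

/* Writer: initialize the fields, then publish with release ordering. */
static void publish(void)
{
    struct foo *p = &storage;

    assert(p != NULL);
    p->a = 1;
    p->b = 2;
    p->c = 3;
    /* All stores above are ordered before this store to gp. */
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Reader: the acquire load pairs with the release store in publish(). */
static int subscribe_sum(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

    if (p == NULL)
        return -1;                      /* nothing published yet */
    return p->a + p->b + p->c;          /* sees the initialized fields */
}
```

Before publish() runs, subscribe_sum() returns -1; afterwards it returns 6. A reader can never observe the pointer without also observing the three field stores that preceded it.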

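Example 4:

/* Shared data structure */
/* The embedded rcu_head is the handle call_rcu() uses to queue the callback */
struct shared_data {
    char a;
    int b;
    struct rcu_head rcu;
};

/* Reader: code inside the critical section must not sleep */
static void reader(struct shared_data __rcu **ptr)
{
    struct shared_data *p = NULL;

    rcu_read_lock();
    /* Fetch the RCU-protected pointer with rcu_dereference() */
    p = rcu_dereference(*ptr);
    if (p)
        do_something_with(p);
    rcu_read_unlock();
}

/* Writer */
/* Callback: container_of() recovers the old shared data from its rcu_head */
static void del_old_ptr(struct rcu_head *rh)
{
    struct shared_data *p = container_of(rh, struct shared_data, rcu);

    kfree(p);
}

static void writer(struct shared_data __rcu **ptr)
{
    struct shared_data *new_ptr = kmalloc(...);
    struct shared_data *old_ptr;
    ...
    new_ptr->a = 'a';
    new_ptr->b = 1;
    /* Writers are assumed to be serialized by the caller */
    old_ptr = rcu_dereference_protected(*ptr, 1);
    /* Publish the new pointer */
    rcu_assign_pointer(*ptr, new_ptr);
    /* Register the callback that frees the old data after a grace period */
    if (old_ptr)
        call_rcu(&old_ptr->rcu, del_old_ptr);
}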
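
The callback pattern in Example 4 depends on two ideas: an embedded head that lets a generic queue hold arbitrary objects, and container_of() to get back from the head to the enclosing object. The userspace sketch below illustrates both. It is a toy under stated assumptions, not the kernel's call_rcu(): toy_head, toy_call_rcu, toy_run_callbacks, toy_container_of and freed_count are hypothetical names, and "the grace period has ended" is simulated by calling toy_run_callbacks() explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy analogue of struct rcu_head: a link plus the callback to run. */
struct toy_head {
    struct toy_head *next;
    void (*func)(struct toy_head *);
};

struct shared_data {
    char a;
    int b;
    struct toy_head rcu;
};

static struct toy_head *pending;   /* callbacks waiting for the "grace period" */
static int freed_count;            /* bookkeeping for the demonstration only */

/* Toy call_rcu(): record the callback and return immediately. */
static void toy_call_rcu(struct toy_head *head, void (*func)(struct toy_head *))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Run every queued callback; stands in for the end of a grace period. */
static void toy_run_callbacks(void)
{
    while (pending) {
        struct toy_head *head = pending;

        pending = head->next;      /* unlink first: the callback frees head */
        head->func(head);
    }
}

/* Callback: recover the enclosing object from its embedded head and free it. */
static void del_old_ptr(struct toy_head *rh)
{
    struct shared_data *p = toy_container_of(rh, struct shared_data, rcu);

    free(p);
    freed_count++;
}
```

Embedding the head inside the object means registering a callback needs no extra allocation, which matters in contexts where the writer cannot block or afford an allocation failure.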

Example 5:

struct foo {
    int a;
    int b;
    int c;
};
struct foo *gp1;
struct foo *gp2;

void updater(void)
{
    struct foo *p;

    p = kmalloc(...);
    if (p == NULL)
        deal_with_it();
    p->a = 42;  /* Each field in its own cache line. */
    p->b = 43;
    p->c = 44;
    rcu_assign_pointer(gp1, p);
    p->b = 143;
    p->c = 144;
    rcu_assign_pointer(gp2, p);
}

void reader(void)
{
    struct foo *p;
    struct foo *q;
    int r1, r2;

    p = rcu_dereference(gp2);
    if (p == NULL)
        return;
    r1 = p->b;  /* Guaranteed to get 143. */
    q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */
    if (p == q) {
        /* The compiler decides that q->c is same as p->c. */
        r2 = p->c; /* Could get 44 on weakly ordered system. */
    }
    do_something_with(r1, r2);
}

You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible, but you should not be. After all, the updater might have been invoked a second time between the time reader() loaded into "r1" and the time that it loaded into "r2". The fact that this same result can occur due to some reordering from the compiler and CPUs is beside the point.

But suppose that the reader needs a consistent view?

Then one approach is to use locking, for example, as follows:

struct foo {
    int a;
    int b;
    int c;
    spinlock_t lock;
};
struct foo *gp1;
struct foo *gp2;

void updater(void)
{
    struct foo *p;

    p = kmalloc(...);
    if (p == NULL)
        deal_with_it();
    spin_lock(&p->lock);
    p->a = 42;  /* Each field in its own cache line. */
    p->b = 43;
    p->c = 44;
    spin_unlock(&p->lock);
    rcu_assign_pointer(gp1, p);
    spin_lock(&p->lock);
    p->b = 143;
    p->c = 144;
    spin_unlock(&p->lock);
    rcu_assign_pointer(gp2, p);
}

void reader(void)
{
    struct foo *p;
    struct foo *q;
    int r1, r2;

    p = rcu_dereference(gp2);
    if (p == NULL)
        return;
    spin_lock(&p->lock);
    r1 = p->b;  /* Guaranteed to get 143. */
    q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */
    if (p == q) {
        /* The compiler decides that q->c is same as p->c. */
        r2 = p->c; /* Locking guarantees r2 == 144. */
    }
    spin_unlock(&p->lock);
    do_something_with(r1, r2);
}

As always, use the right tool for the job!

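Example 6:

If a writer needs to modify a list entry, it must first copy the entry, modify the copy, and then replace the original entry with the copy. The original entry is freed safely once a grace period has elapsed.

The system-call audit code has no such case. Assuming it did need to modify an entry, the rwlock version of the update code would look like this: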

static inline int audit_upd_rule(struct audit_rule *rule,
                                 struct list_head *list,
                                 __u32 newaction,
                                 __u32 newfield_count)
{
        struct audit_entry *e;

        write_lock(&auditsc_lock);
        /* Note: audit_netlink_sem held by caller. */
        list_for_each_entry(e, list, list) {
                if (!audit_compare_rule(rule, &e->rule)) {
                        e->rule.action = newaction;
                        e->rule.field_count = newfield_count;
                        write_unlock(&auditsc_lock);
                        return 0;
                }
        }
        write_unlock(&auditsc_lock);
        return -EFAULT;         /* No matching rule */
}

With RCU, the update code would be:

static inline int audit_upd_rule(struct audit_rule *rule,
                                 struct list_head *list,
                                 __u32 newaction,
                                 __u32 newfield_count)
{
        struct audit_entry *e;
        struct audit_entry *ne;

        list_for_each_entry(e, list, list) {
                if (!audit_compare_rule(rule, &e->rule)) {
                        ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
                        if (ne == NULL)
                                return -ENOMEM;
                        audit_copy_rule(&ne->rule, &e->rule);
                        ne->rule.action = newaction;
                        ne->rule.field_count = newfield_count;
                        /* Swap the modified copy in for the original entry */
                        list_replace_rcu(&e->list, &ne->list);
                        /* Free the original after a grace period. Current
                         * kernels use the two-argument call_rcu(), so
                         * audit_free_rule() is assumed to take the
                         * struct rcu_head pointer. */
                        call_rcu(&e->rcu, audit_free_rule);
                        return 0;
                }
        }
        return -EFAULT;         /* No matching rule */
}

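Making updates immediately visible

In the two cases above, readers can tolerate seeing an update only some time later; that is, for a while after the update a reader may still see the old data. In many situations readers cannot tolerate stale data, and extra measures are needed. System V IPC, for example, adds a deleted field to every list entry to mark whether the entry has been removed: it is set to true on deletion and is false otherwise. Code that walks the list checks each entry's deleted field and treats the entry as nonexistent if the field is true.

Taking the system-call audit code as the example again, if it could not tolerate stale data, the read-side code would become: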

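static enum audit_state audit_filter_task(struct task_struct *tsk)
{
        struct audit_entry *e;
        enum audit_state   state;

        rcu_read_lock();
        list_for_each_entry_rcu(e, &audit_tsklist, list) {
                if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                        /* Check the deleted flag under the entry's lock */
                        spin_lock(&e->lock);
                        if (e->deleted) {
                                spin_unlock(&e->lock);
                                rcu_read_unlock();
                                return AUDIT_BUILD_CONTEXT;
                        }
                        /* Only state is returned, so drop the lock here */
                        spin_unlock(&e->lock);
                        rcu_read_unlock();
                        return state;
                }
        }
        rcu_read_unlock();
        return AUDIT_BUILD_CONTEXT;
}

Note that in this scheme every list entry needs its own spinlock, because the delete operation modifies the entry's deleted flag and the reader must check the flag under that lock. A search function that returns the entry itself should return with the entry's lock still held, since only then is the caller guaranteed to see the latest data rather than stale data. The function above returns only the state value and gives the caller no way to release the lock, so it drops the lock before returning.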

The write-side delete operation becomes:

static inline int audit_del_rule(struct audit_rule *rule,
                                 struct list_head *list)
{
        struct audit_entry *e;

        /* Do not use the _rcu iterator here, since this is the only
         * deletion routine. */
        list_for_each_entry(e, list, list) {
                if (!audit_compare_rule(rule, &e->rule)) {
                        spin_lock(&e->lock);
                        list_del_rcu(&e->list);
                        e->deleted = 1;
                        spin_unlock(&e->lock);
                        /* Two-argument call_rcu(); audit_free_rule() is
                         * assumed to take the struct rcu_head pointer. */
                        call_rcu(&e->rcu, audit_free_rule);
                        return 0;
                }
        }
        return -EFAULT;         /* No matching rule */
}

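When deleting an entry, the writer must mark it as deleted. Readers can then tell immediately from the flag whether the entry has been removed.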
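
The deleted-flag check can be isolated from the audit code and shown on its own. The following userspace sketch is a simplified illustration, assuming a singly linked list and a toy per-entry spinlock built from atomic_flag; struct entry, entry_lock, entry_unlock, lookup and mark_deleted are hypothetical names. It shows only the flag handling: the list traversal itself is not made safe against concurrent unlinking, which in the kernel is the job of list_for_each_entry_rcu() and the grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
    struct entry *next;
    int key;
    int data;
    atomic_flag lock;    /* toy per-entry spinlock */
    bool deleted;        /* protected by lock */
};

static void entry_lock(struct entry *e)
{
    assert(e != NULL);
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void entry_unlock(struct entry *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Lookup: returns true and fills *result only if the entry is still live. */
static bool lookup(struct entry *head, int key, int *result)
{
    for (struct entry *e = head; e != NULL; e = e->next) {
        if (e->key != key)
            continue;
        entry_lock(e);
        bool live = !e->deleted;
        if (live)
            *result = e->data;
        entry_unlock(e);
        return live;
    }
    return false;
}

/* Logical delete: mark the entry; unlinking and freeing are omitted here. */
static void mark_deleted(struct entry *e)
{
    entry_lock(e);
    e->deleted = true;
    entry_unlock(e);
}
```

Because the flag is read and written under the same per-entry lock, a lookup that starts after mark_deleted() has returned never reports the entry as present, even though the entry may still be reachable in the list until the grace period ends.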
