RCU (Read-Copy Update) is a synchronization mechanism. What, at its core, does it actually synchronize?
1. It has only a reader-side lock, and that lock never suffers lock contention.
2. It synchronizes reader-side critical sections against reclaim-side critical sections, not writer-side critical sections.
3. RCU readers concurrently access different versions (copies) of the data reached through a shared pointer, whereas a (reader/writer) spinlock synchronizes all accesses to one single copy of the data.
4. Synchronization of RCU writer-side critical sections must be provided by the user.
5. RCU synchronizes read access to an old copy of the data with the reclamation of that copy; what it protects is the old copy.

The main idea of RCU is to split an update into two steps, removal and reclamation, and to defer the reclamation (destruction) of the old copy.

RCU turns reads and writes that would otherwise have to exclude each other on a single piece of data into reads and updates on different versions (copies) of that data, relying on switching what a shared pointer points to. Because the data now exists in multiple copies, the old copy eventually has to be reclaimed, so RCU splits the update into a removal operation and a reclamation operation. As a result, the write to each individual copy is taken out of the race altogether: it needs no write lock and is contained in the update-side removal step. Reads, meanwhile, run concurrently on the different copies, and the only race between a read and the update-side removal step is on the access to the shared pointer. The switch between copies is performed by the update-side removal operation, which uses only memory barriers to synchronize with reader-side accesses to the shared pointer.

The update-side reclamation operation frees the old copy of the data. It must synchronize with the reader-side critical sections that are still accessing that old copy, but it does not contend with any reader-side lock; in other words, reader-side critical sections never block on the reader-side lock because of lock contention. The implementation bears this out: although RCU provides the primitives rcu_read_lock() and rcu_read_unlock(), the counter inside them is not updated atomically, and it does not correspond to a lock but to a thread, recording only how deeply rcu_read_lock() is nested in the current thread. So even though threads on different CPUs all enter RCU reader-side critical sections, each uses its own count, and reader-side critical sections never block on rcu_read_lock(). The reclaim-side critical section must be ordered after the reader-side critical sections that access the old copy, but it is not synchronized with reader-side critical sections that access the new copy. Seen this way, what RCU's synchronization protects is not access to one shared datum, but the access to, and reclamation of, an old copy of the data.
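
To make the description above concrete, here is a minimal kernel-style sketch in the spirit of the example given later in whatisRCU (struct foo, gbl_foo and foo_mutex are illustrative names; it assumes gbl_foo has been initialized and omits error handling):

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {
    int a;
};

static DEFINE_SPINLOCK(foo_mutex);     /* writer-vs-writer sync: the user's job */
static struct foo __rcu *gbl_foo;      /* the shared pointer readers follow */

int foo_read_a(void)
{
    int a;

    rcu_read_lock();                           /* no shared lock word, no contention */
    a = rcu_dereference(gbl_foo)->a;           /* sees either the old or the new copy */
    rcu_read_unlock();
    return a;
}

void foo_update_a(int new_a)
{
    struct foo *new_fp, *old_fp;

    new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
    spin_lock(&foo_mutex);
    old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
    *new_fp = *old_fp;                         /* copy ...                        */
    new_fp->a = new_a;                         /* ... update the copy ...         */
    rcu_assign_pointer(gbl_foo, new_fp);       /* ... removal: switch the pointer */
    spin_unlock(&foo_mutex);
    synchronize_rcu();                         /* wait for readers of the old copy */
    kfree(old_fp);                             /* reclamation of the old copy      */
}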

Below is the official design document:

What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
situations. Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge. Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU. Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU. This document provides several
different paths, as follows:

1. RCU OVERVIEW
2. WHAT IS RCU'S CORE API?
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
6. ANALOGY WITH READER-WRITER LOCKING
7. FULL LIST OF RCU APIs
8. ANSWERS TO QUICK QUIZZES

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point. People who prefer to start with an API that they can then
experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
focus on Section 6. Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway. ;-)

1. RCU OVERVIEW

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases. The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is the semantics of modern CPUs guarantee that readers will see
either the old or the new version of the data structure rather than a
partially updated reference. The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase. Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish. Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a. Remove pointers to a data structure, so that subsequent
   readers cannot gain a reference to it.

b. Wait for all previous readers to complete their RCU read-side
   critical sections.

c. At this point, there cannot be any readers who hold references
   to the data structure, so it now may safely be reclaimed
   (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all. In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers. In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers. Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache). Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

To summarize whatisRCU:

RCU splits updates into two phases, removal and reclamation.

Removal:
Replace references to old versions of data items with references to new versions. This can run safely and concurrently with readers, because modern CPUs guarantee that a reader sees either the old or the new version, never a partially updated reference.

Reclamation:
Reclaim the data items that were removed during the removal phase. Because reclamation would disrupt any reader still holding a reference to those items, reclamation must not start until no readers hold such references.

The benefit of splitting the update is that the removal phase can run immediately, while the reclamation phase is deferred until all readers that were active during removal have completed; the deferred reclamation either blocks synchronously or registers an asynchronous callback. Only readers active during the removal phase need to be considered, because a reader that starts after the removal phase cannot obtain a reference to the removed data items and therefore cannot be disrupted by reclamation.

So the typical RCU update sequence is the following three steps (sketched in code right after this list):
1. Remove the pointer(s) to a data structure, so that readers arriving later cannot obtain a reference to it.
2. Wait for all pre-existing readers to complete their RCU read-side critical sections.
3. At that point no reader can still hold a reference to the (old) data structure, so it can now be reclaimed safely.
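
A minimal sketch of these three steps for a list element, assuming an illustrative list protected by a spinlock named mylist_lock (the reader side would traverse the list with list_for_each_entry_rcu() under rcu_read_lock()):

    spin_lock(&mylist_lock);       /* writers still serialize among themselves  */
    list_del_rcu(&p->list);        /* step 1: removal -- later readers miss it  */
    spin_unlock(&mylist_lock);

    synchronize_rcu();             /* step 2: wait for all pre-existing readers */

    kfree(p);                      /* step 3: nobody can reference p; reclaim   */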

Step 2 is the key idea behind RCU's deferred destruction. Being able to wait until all readers are done lets RCU readers use very lightweight synchronization, and in some cases no synchronization at all. In more conventional lock-based schemes, by contrast, readers must use heavyweight synchronization to keep an updater from deleting the data structure out from under them, because lock-based updaters typically update data items in place and must therefore exclude readers. RCU-based updaters instead exploit the fact that a write to a single aligned pointer is atomic on modern CPUs, which allows data items to be inserted into, removed from, and replaced in a linked structure atomically without disrupting readers. Concurrent RCU readers then simply keep accessing the old versions, and can dispense with the atomic operations, memory barriers, and communication cache misses that are so expensive on present-day SMP systems, even in the absence of lock contention.
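
The contrast with reader-writer locking is easiest to see by putting the two read paths side by side (a sketch; gbl_foo and foo_rwlock are illustrative):

    /* reader-writer locking: readers take a real, shared, contended lock */
    read_lock(&foo_rwlock);
    a = gbl_foo->a;
    read_unlock(&foo_rwlock);

    /* RCU: no shared lock word is written, only ordering is enforced */
    rcu_read_lock();
    a = rcu_dereference(gbl_foo)->a;
    rcu_read_unlock();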

In the three-step procedure above, the updater performs both the removal and the reclamation, but it is often helpful to have an entirely different thread do the reclamation, as is in fact the case for the kernel's dentry cache (dcache). Even when the same thread performs both step 1 and step 3, it is still useful to think of them separately. For example, RCU readers and updaters need not communicate at all, yet RCU provides implicit, low-overhead communication between readers and reclaimers, namely in step 2.
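
When the updater should not block, the reclamation can be handed off to a callback with call_rcu(), which then runs in another context once all pre-existing readers are done. A sketch along the lines of the whatisRCU example, assuming struct foo now embeds a struct rcu_head field named rcu and reusing the illustrative gbl_foo and foo_mutex from above:

struct foo {
    int a;
    struct rcu_head rcu;                        /* handle for the deferred callback */
};

static void foo_reclaim(struct rcu_head *head)
{
    kfree(container_of(head, struct foo, rcu)); /* runs after all pre-existing readers */
}

void foo_update_a(int new_a)
{
    struct foo *new_fp, *old_fp;

    new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
    spin_lock(&foo_mutex);
    old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
    *new_fp = *old_fp;
    new_fp->a = new_a;
    rcu_assign_pointer(gbl_foo, new_fp);        /* removal */
    spin_unlock(&foo_mutex);
    call_rcu(&old_fp->rcu, foo_reclaim);        /* reclamation deferred, no blocking */
}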

In the source file rcu/rcupdate.h, rcu_read_lock() is documented with the following comment:

/**
* rcu_read_lock() - mark the beginning of an RCU read-side critical section
*
* When synchronize_rcu() is invoked on one CPU while other CPUs
* are within RCU read-side critical sections, then the
* synchronize_rcu() is guaranteed to block until after all the other
* CPUs exit their critical sections. Similarly, if call_rcu() is invoked
* on one CPU while other CPUs are within RCU read-side critical
* sections, invocation of the corresponding RCU callback is deferred
* until after the all the other CPUs exit their critical sections.
*
* Note, however, that RCU callbacks are permitted to run concurrently
* with new RCU read-side critical sections. One way that this can happen
* is via the following sequence of events: (1) CPU 0 enters an RCU
* read-side critical section, (2) CPU 1 invokes call_rcu() to register
* an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
* (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
* callback is invoked. This is legal, because the RCU read-side critical
* section that was running concurrently with the call_rcu() (and which
* therefore might be referencing something that the corresponding RCU
* callback would free up) has completed before the corresponding
* RCU callback is invoked.
*
* RCU read-side critical sections may be nested. Any deferred actions
* will be deferred until the outermost RCU read-side critical section
* completes.
*
* You can avoid reading and understanding the next paragraph by
* following this rule: don't put anything in an rcu_read_lock() RCU
* read-side critical section that would block in a !PREEMPT kernel.
* But if you want the full story, read on!
*
* In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
* it is illegal to block while in an RCU read-side critical section.
* In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPT
* kernel builds, RCU read-side critical sections may be preempted,
* but explicit blocking is illegal. Finally, in preemptible RCU
* implementations in real-time (with -rt patchset) kernel builds, RCU
* read-side critical sections may be preempted and they may also block, but
* only when acquiring spinlocks that are subject to priority inheritance.
*/

The following comment explains that there is no write_lock at all to contend with the read lock; synchronization between write operations must be provided by the users of RCU themselves.

/*
* So where is rcu_write_lock()? It does not exist, as there is no
* way for writers to lock out RCU readers. This is a feature, not
* a bug -- this property is what provides RCU's performance benefits.
* Of course, writers must coordinate with each other. The normal
* spinlock primitives work well for this, but any other technique may be
* used as well. RCU does not care how the writers keep out of each
* others' way, as long as they do so.
*/
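
Concretely, "writers must coordinate with each other" usually means a writer-side lock that readers never touch, while RCU itself supplies only the reader-vs-reclaimer ordering. A sketch, reusing the illustrative gbl_foo and foo_mutex from above:

void foo_delete(void)
{
    struct foo *old_fp;

    spin_lock(&foo_mutex);                  /* writer vs. writer: user-provided     */
    old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
    rcu_assign_pointer(gbl_foo, NULL);      /* removal; readers were never excluded */
    spin_unlock(&foo_mutex);

    synchronize_rcu();                      /* reader vs. reclaimer: provided by RCU */
    kfree(old_fp);                          /* reclamation                           */
}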
