folly/ThreadCachedInt.h

High-performance atomic increment using thread caching.

folly/ThreadCachedInt.h introduces a integer class designed for high performance increments from multiple threads simultaneously without loss of precision. It has two read modes, readFast gives a potentially stale value with one load, and readFull gives the exact value, but is much slower, as discussed below.

Performance


Increment performance is up to 10x greater than std::atomic_fetch_add in high contention environments. See folly/test/ThreadCachedIntTest.h for more comprehensive benchmarks.

readFast is as fast as a single load.

readFull, on the other hand, requires acquiring a mutex and iterating through a list to accumulate the values of all the thread local counters, so is significantly slower than readFast.

Usage


Create an instance and increment it with increment or the operator overloads. Read the value with readFast for quick, potentially stale data, or readFull for a more expensive but precise result. There are additional convenience functions as well, such as set.

    ThreadCachedInt<int64_t> val;
EXPECT_EQ(, val.readFast());
++val; // increment in thread local counter only
EXPECT_EQ(, val.readFast()); // increment has not been flushed
EXPECT_EQ(, val.readFull()); // accumulates all thread local counters
val.set();
EXPECT_EQ(, val.readFast());
EXPECT_EQ(, val.readFull());

Implementation


folly::ThreadCachedInt uses folly::ThreadLocal to store thread specific objects that each have a local counter. When incrementing, the thread local instance is incremented. If the local counter passes the cache size, the value is flushed to the global counter with an atomic increment. It is this global counter that is read with readFast via a simple load, but will not count any of the updates that haven't been flushed.

In order to read the exact value, ThreadCachedInt uses the extended readAllThreads() API of folly::ThreadLocal to iterate through all the references to all the associated thread local object instances. This currently requires acquiring a global mutex and iterating through the references, accumulating the counters along with the global counter. This also means that the first use of the object from a new thread will acquire the mutex in order to insert the thread local reference into the list. By default, there is one global mutex per integer type used in ThreadCachedInt. If you plan on using a lot of ThreadCachedInts in your application, considering breaking up the global mutex by introducing additional Tag template parameters.

set simply sets the global counter value, and marks all the thread local instances as needing to be reset. When iterating with readFull, thread local counters that have been marked as reset are skipped. When incrementing, thread local counters marked for reset are set to zero and unmarked for reset.

Upon destruction, thread local counters are flushed to the parent so that counts are not lost after increments in temporary threads. This requires grabbing the global mutex to make sure the parent itself wasn't destroyed in another thread already.

Alternate Implementations


There are of course many ways to skin a cat, and you may notice there is a partial alternate implementation in folly/test/ThreadCachedIntTest.cpp that provides similar performance. ShardedAtomicInt simply uses an array ofstd::atomic<int64_t>'s and hashes threads across them to do low-contention atomic increments, and readFull just sums up all the ints.

This sounds great, but in order to get the contention low enough to get similar performance as ThreadCachedInt with 24 threads, ShardedAtomicInt needs about 2000 ints to hash across. This uses about 20x more memory, and the lock-freereadFull has to sum up all 2048 ints, which ends up being a about 50x slower than ThreadCachedInt in low contention situations, which is hopefully the common case since it's designed for high-write, low read access patterns. Performance of readFull is about the same speed as ThreadCachedInt in high contention environments.

Depending on the operating conditions, it may make more sense to use one implementation over the other. For example, a lower contention environment will probably be able to use a ShardedAtomicInt with a much smaller array without hurting performance, while improving memory consumption and perf of readFull.

ThreadCachedInt的更多相关文章

  1. folly学习心得(转)

    原文地址:  https://www.cnblogs.com/Leo_wl/archive/2012/06/27/2566346.html   阅读目录 学习代码库的一般步骤 folly库的学习心得 ...

  2. Folly: Facebook Open-source Library Readme.md 和 Overview.md(感觉包含的东西并不多,还是Boost更有用)

    folly/ For a high level overview see the README Components Below is a list of (some) Folly component ...

随机推荐

  1. mapply

    相比 lapply( )和 sapply( )在一个向量上迭代,mapply( )可以在多个向量上进行迭代.换句话,mapply 是 sapply 的多元版本:mapply(function(a, b ...

  2. android------引导页两种实现方式(原生和WebView网页实现)

    有的App当你第一次打开的是和常常会有引导页来描述一些App信息(功能,特点),当然也要做验证,验证第二次进入不进入引导页,直接进入App,此博客借助ViewPager来实现引导页, ViewPage ...

  3. c#在winform中用DataGridView实现分页效果

    public partial class Form11 : Form { public Form11() { InitializeComponent(); } private int Inum = 1 ...

  4. 在laravel视图中直接使用{{ csrf_token() }}被翻译成英文显示的处理方法

    在表单中加一个input框在放入{{ csrf_token() }}就可以了: 方法如下: <input type="hidden" name="_token&qu ...

  5. avalon 搭配 百度的UI移动框架 gmu 可以很好干活

    使用过的人评价, 这个UI稳定, bug少, 组件丰富, 触屏好; 小公司, 可以用用 链接

  6. Linux升级nodejs及多版本管理

    最近要用到开发要用到nodejs,于是跑到开发机运行了下node,已经安装了,深感欣慰,是啥版本呢?再次运行了下node -v,原来是0.6.x的.估计是早先什么时候谁弄的.那么来升级下node吧. ...

  7. I.MX6 CAAM

    /********************************************************************************* * I.MX6 CAAM * 说明 ...

  8. 使用Maven简单配置Mybatis

    1.新建一个Maven项目 2. 在pom.xml中进行配置,在pom.xml中配置的时候,需要网速好,当网速不是很好的时候,是加载不出Jar包的. 代码如下所示. <project xmlns ...

  9. Codeforces 148B: Escape

    题目链接:http://codeforces.com/problemset/problem/148/B 题意:公主从龙的洞穴中逃跑,公主的速度为vp,龙的速度为vd,在公主逃跑时间t时,龙发现公主逃跑 ...

  10. 2018-2019-2 《网络对抗技术》Exp2 后门原理与实践 20165222

    Exp2 后门原理与实践 实验环境 win7ip地址为: 192.168.136.130 kali.ip地址为: 192.168.136.129 两台虚拟机可以ping通 实验步骤 1,使用netca ...