ThreadCachedInt
folly/ThreadCachedInt.h
High-performance atomic increment using thread caching.
folly/ThreadCachedInt.h introduces an integer class designed for high-performance increments from multiple threads simultaneously without loss of precision. It has two read modes: readFast gives a potentially stale value with one load, and readFull gives the exact value but is much slower, as discussed below.
Performance
Increment performance is up to 10x greater than std::atomic_fetch_add in high-contention environments. See folly/test/ThreadCachedIntTest.cpp for more comprehensive benchmarks.
readFast is as fast as a single load.
readFull, on the other hand, requires acquiring a mutex and iterating through a list to accumulate the values of all the thread local counters, so it is significantly slower than readFast.
Usage
Create an instance and increment it with increment or the operator overloads. Read the value with readFast for quick, potentially stale data, or readFull for a more expensive but precise result. There are additional convenience functions as well, such as set.
ThreadCachedInt<int64_t> val;
EXPECT_EQ(0, val.readFast());
++val;                        // increment in thread local counter only
EXPECT_EQ(0, val.readFast()); // increment has not been flushed
EXPECT_EQ(1, val.readFull()); // accumulates all thread local counters
val.set(2);
EXPECT_EQ(2, val.readFast());
EXPECT_EQ(2, val.readFull());
Implementation
folly::ThreadCachedInt uses folly::ThreadLocal to store thread specific objects that each have a local counter. When incrementing, the thread local instance is incremented. If the local counter passes the cache size, the value is flushed to the global counter with an atomic increment. readFast reads this global counter with a simple load, so it does not count any updates that have not been flushed yet.
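The flush-on-threshold idea described above can be sketched with the standard library alone. This is a minimal illustration, not folly's code: FlushCacheSketch, CachedCounterSketch, and the rule of comparing the cached amount against the cache size are assumptions made for the example, and the caller is assumed to give each thread its own cache object (for instance as a thread_local variable).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-thread cache; each thread is assumed to own exactly one.
struct FlushCacheSketch {
  int64_t pending = 0;  // increments not yet published to the shared counter
};

// Hypothetical sketch of a counter that batches increments per thread.
class CachedCounterSketch {
 public:
  explicit CachedCounterSketch(int64_t cacheSize) : cacheSize_(cacheSize) {
    assert(cacheSize > 0);
  }

  // Bump the caller's private cache; touch the shared atomic only when the
  // cached amount passes cacheSize_.
  void increment(FlushCacheSketch& cache, int64_t n = 1) {
    cache.pending += n;
    if (cache.pending > cacheSize_) {
      global_.fetch_add(cache.pending, std::memory_order_relaxed);
      cache.pending = 0;
    }
  }

  // One load of the shared counter; unflushed increments are not visible.
  int64_t readFast() const { return global_.load(std::memory_order_relaxed); }

 private:
  const int64_t cacheSize_;
  std::atomic<int64_t> global_{0};
};
```

With a cache size of 10 in this sketch, the first ten increments from one thread leave readFast at 0, and the eleventh publishes all 11 at once.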
In order to read the exact value, ThreadCachedInt uses the extended readAllThreads() API of folly::ThreadLocal to iterate through all the references to all the associated thread local object instances. This currently requires acquiring a global mutex and iterating through the references, accumulating the counters along with the global counter. This also means that the first use of the object from a new thread will acquire the mutex in order to insert the thread local reference into the list. By default, there is one global mutex per integer type used in ThreadCachedInt. If you plan on using a lot of ThreadCachedInts in your application, consider breaking up the global mutex by introducing additional Tag template parameters.
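The exact-read path can be sketched in the same spirit. Everything here is hypothetical and uses only the standard library: ReadCacheSketch and ExactReadSketch are illustrative names, attach() stands in for the registration that happens on a thread's first use, and the mutex is a member of the counter rather than the per-type global mutex described above. The sketch also takes the mutex in flush() so that a concurrent readFull never sees a value in neither place; that is a simplification chosen for the example, not a statement about folly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical per-thread cache; only its owning thread writes to it.
struct ReadCacheSketch {
  std::atomic<int64_t> pending{0};
};

// Hypothetical sketch of a counter with a cheap stale read and a locked
// exact read. Registered caches must outlive the counter in this sketch.
class ExactReadSketch {
 public:
  // Called once per cache; takes the mutex to add it to the list.
  void attach(ReadCacheSketch* cache) {
    assert(cache != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    caches_.push_back(cache);
  }

  void increment(ReadCacheSketch& cache, int64_t n = 1) {
    cache.pending.fetch_add(n, std::memory_order_relaxed);
  }

  // Move the cached amount into the shared counter.
  void flush(ReadCacheSketch& cache) {
    std::lock_guard<std::mutex> lock(mutex_);
    global_.fetch_add(cache.pending.exchange(0, std::memory_order_relaxed),
                      std::memory_order_relaxed);
  }

  int64_t readFast() const { return global_.load(std::memory_order_relaxed); }

  // Lock, then add every registered cache to the shared counter.
  int64_t readFull() const {
    std::lock_guard<std::mutex> lock(mutex_);
    int64_t sum = global_.load(std::memory_order_relaxed);
    for (const ReadCacheSketch* cache : caches_) {
      sum += cache->pending.load(std::memory_order_relaxed);
    }
    return sum;
  }

 private:
  mutable std::mutex mutex_;
  std::atomic<int64_t> global_{0};
  std::vector<ReadCacheSketch*> caches_;
};
```

The cost profile matches the description: increment touches only the caller's cache, while readFull pays for a lock plus one load per registered cache.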
set simply sets the global counter value, and marks all the thread local instances as needing to be reset. When iterating with readFull, thread local counters that have been marked as reset are skipped. When incrementing, thread local counters marked for reset are set to zero and unmarked for reset.
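The reset-marking scheme can be modeled in a few lines. This is a sketch under stated assumptions rather than folly's implementation: ResetCacheSketch and ResettableCounterSketch are hypothetical names, caches are registered explicitly through attach(), and flushing is left out to keep the focus on the reset flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical per-thread cache carrying a "needs reset" mark.
struct ResetCacheSketch {
  std::atomic<int64_t> value{0};
  std::atomic<bool> reset{false};
};

// Hypothetical sketch of set() via lazy per-cache resets.
class ResettableCounterSketch {
 public:
  void attach(ResetCacheSketch* cache) {
    assert(cache != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    caches_.push_back(cache);
  }

  // Overwrite the shared value and mark every cache as stale.
  void set(int64_t newValue) {
    std::lock_guard<std::mutex> lock(mutex_);
    global_.store(newValue);
    for (ResetCacheSketch* cache : caches_) {
      cache->reset.store(true);
    }
  }

  // A marked cache is zeroed and unmarked before the increment is applied.
  void increment(ResetCacheSketch& cache, int64_t n = 1) {
    if (cache.reset.load()) {
      cache.value.store(0);
      cache.reset.store(false);
    }
    cache.value.fetch_add(n);
  }

  int64_t readFast() const { return global_.load(); }

  // Caches still marked for reset hold pre-set() counts and are skipped.
  int64_t readFull() const {
    std::lock_guard<std::mutex> lock(mutex_);
    int64_t sum = global_.load();
    for (const ResetCacheSketch* cache : caches_) {
      if (!cache->reset.load()) {
        sum += cache->value.load();
      }
    }
    return sum;
  }

 private:
  mutable std::mutex mutex_;
  std::atomic<int64_t> global_{0};
  std::vector<ResetCacheSketch*> caches_;
};
```

The point of the mark is that set never has to write into another thread's counter value; each thread clears its own cache the next time it increments.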
Upon destruction, thread local counters are flushed to the parent so that counts are not lost after increments in temporary threads. This requires grabbing the global mutex to make sure the parent itself wasn't destroyed in another thread already.
Alternate Implementations
There are of course many ways to skin a cat, and you may notice there is a partial alternate implementation in folly/test/ThreadCachedIntTest.cpp that provides similar performance. ShardedAtomicInt simply uses an array of std::atomic<int64_t>s and hashes threads across them to do low-contention atomic increments, and readFull just sums up all the ints.
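For comparison, the sharded approach might look roughly like the following. This is not the ShardedAtomicInt from the test file: ShardedCounterSketch is a hypothetical name, the shard count is a runtime parameter, and hashing std::this_thread::get_id() is just one plausible way to spread threads across shards.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch of a counter sharded across several atomics.
class ShardedCounterSketch {
 public:
  // Each shard starts at zero (the vector value-initializes its elements).
  explicit ShardedCounterSketch(std::size_t numShards) : shards_(numShards) {
    assert(numShards > 0);
  }

  // Pick a shard from the calling thread's id, then do an atomic add on it.
  void increment(int64_t n = 1) {
    const std::size_t index =
        std::hash<std::thread::id>{}(std::this_thread::get_id()) %
        shards_.size();
    shards_[index].fetch_add(n, std::memory_order_relaxed);
  }

  // Lock-free exact read: sum every shard.
  int64_t readFull() const {
    int64_t sum = 0;
    for (const std::atomic<int64_t>& shard : shards_) {
      sum += shard.load(std::memory_order_relaxed);
    }
    return sum;
  }

  std::size_t shardCount() const { return shards_.size(); }

 private:
  std::vector<std::atomic<int64_t>> shards_;
};
```

The trade-off is visible in the sketch: more shards mean fewer threads colliding on the same atomic, but readFull walks every shard and memory grows with the shard count.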
This sounds great, but in order to get the contention low enough to match the performance of ThreadCachedInt with 24 threads, ShardedAtomicInt needs about 2000 ints to hash across. This uses about 20x more memory, and the lock-free readFull has to sum up all 2048 ints, which ends up being about 50x slower than ThreadCachedInt in low-contention situations. That is hopefully the common case, since it's designed for high-write, low-read access patterns. In high-contention environments, readFull runs at about the same speed as it does in ThreadCachedInt.
Depending on the operating conditions, it may make more sense to use one implementation over the other. For example, a lower contention environment will probably be able to use a ShardedAtomicInt with a much smaller array without hurting performance, while improving memory consumption and perf of readFull.