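The __sync_fetch_and_add Functions (Studying the Redis Source)

While studying the sds code in the redis-3.0 source, I came across the following C macro (it is defined in zmalloc.c, the allocator wrapper that sds relies on). I had never seen the builtin it uses before, so to understand the Redis source thoroughly I decided to trace it back and learn the __sync_fetch_and_add family of functions: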

#define update_zmalloc_stat_add(__n) __sync_add_and_fetch(&used_memory, (__n))

Most of what can be found online about __sync_add_and_fetch repeats the same material, so I have summarized it below.

1. Background

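The motivating use case is a counter in a multithreaded program, for example one that counts how often some event occurs. As is well known, count++ is not an atomic operation. A single increment actually consists of three steps:

 1 Load the value from memory (or cache) into a register.
 2 Add 1 in the register.
 3 Store the result back to memory (or cache).

Because of timing, several threads that update the same global variable can interleave these steps and overwrite each other's results. This is one of the hard parts of concurrent programming, and it shows up more and more as multi-core machines become the norm.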

The simplest fix is to protect the variable with a lock, which was also my first solution. See the code below:

    pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

    pthread_mutex_lock(&count_lock);
    global_int++;
    pthread_mutex_unlock(&count_lock);

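Later, while searching online, I found the __sync_fetch_and_add family of builtins, described in the English article "Multithreaded simple data type access and atomic variables" (see the references).

2. The function family

The __sync_fetch_and_add family has twelve functions in total, providing atomic add, subtract, AND, OR, XOR, and NAND operations. As the name suggests, __sync_fetch_and_add first fetches the value, then adds to it, and returns the value from before the addition. Taking count = 4 as an example, __sync_fetch_and_add(&count, 1) returns 4 and leaves count equal to 5.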

A simple verification program, sync_fetch_add.c:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv){
    int count = 4;
    printf("111 count:%d\n",count);
    int retval = __sync_fetch_and_add(&count,10);
    printf("222 retval:%d\n",retval);
    printf("222 count:%d\n",count);
    return 0;
}

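On Linux, compile it from the command line with: gcc -g -o sync_fetch_add sync_fetch_add.c

This produces the executable. Running it prints:

./sync_fetch_add
111 count:4
222 retval:4
222 count:14

The other functions can be verified in the same way.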

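Where there is __sync_fetch_and_add there is naturally also __sync_add_and_fetch, and its meaning is just as clear: add first, then return the new value. The two relate to each other exactly as i++ relates to ++i. With these functions, incrementing a global counter from several threads no longer needs a mutex. The call __sync_fetch_and_add(&global_int, 1) has the same effect as the line protected by pthread_mutex above, and it is also thread-safe.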

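Some sources say that the option -march=i686 must be passed to gcc when compiling. As far as I can tell, this is needed only for 32-bit x86 builds whose default target (i386) lacks the required atomic instructions. I did not pass the option when building the code above with gcc version 4.4.7 20120313, and it compiled and ran correctly.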

Here is the whole family. Each function either fetches first and then operates, or operates first and then fetches.

type __sync_fetch_and_add (type *ptr, type value);
type __sync_fetch_and_sub (type *ptr, type value);
type __sync_fetch_and_or (type *ptr, type value);
type __sync_fetch_and_and (type *ptr, type value);
type __sync_fetch_and_xor (type *ptr, type value);
type __sync_fetch_and_nand (type *ptr, type value);

type __sync_add_and_fetch (type *ptr, type value);
type __sync_sub_and_fetch (type *ptr, type value);
type __sync_or_and_fetch (type *ptr, type value);
type __sync_and_and_fetch (type *ptr, type value);
type __sync_xor_and_fetch (type *ptr, type value);
type __sync_nand_and_fetch (type *ptr, type value);

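Atomic operations provided by GCC

Since the 4.1 series (the version usually quoted is 4.1.2), GCC has provided the __sync_* family of built-in functions, which perform atomic arithmetic and logical operations.

They are declared as follows:

type __sync_fetch_and_add (type *ptr, type value, ...)
type __sync_fetch_and_sub (type *ptr, type value, ...)
type __sync_fetch_and_or (type *ptr, type value, ...)
type __sync_fetch_and_and (type *ptr, type value, ...)
type __sync_fetch_and_xor (type *ptr, type value, ...)
type __sync_fetch_and_nand (type *ptr, type value, ...)

type __sync_add_and_fetch (type *ptr, type value, ...)
type __sync_sub_and_fetch (type *ptr, type value, ...)
type __sync_or_and_fetch (type *ptr, type value, ...)
type __sync_and_and_fetch (type *ptr, type value, ...)
type __sync_xor_and_fetch (type *ptr, type value, ...)
type __sync_nand_and_fetch (type *ptr, type value, ...)

The trailing ... stands for an optional list of variables protected by the memory barrier. According to the GCC manual, GCC currently ignores this list, so it can be omitted.

The difference between the two groups is that the first six return the value from before the update, while the last six return the value from after the update.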

Alexander Sandler published a test program for this online. I copied it into sync_fetch2.c, shown below, and verified the results:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <linux/unistd.h>
#include <sys/syscall.h>
#include <errno.h>

#define INC_TO 1000000 // one million...

int global_int = 0;

pid_t gettid( void )
{
	return syscall( __NR_gettid );
}

void *thread_routine( void *arg )
{
	int i;
	int proc_num = (int)(long)arg;
	cpu_set_t set;

	CPU_ZERO( &set );
	CPU_SET( proc_num, &set );

	if (sched_setaffinity( gettid(), sizeof( cpu_set_t ), &set ))
	{
		perror( "sched_setaffinity" );
		return NULL;
	}

	for (i = 0; i < INC_TO; i++)
	{
		// global_int++;
		__sync_fetch_and_add( &global_int, 1 );
	}

	return NULL;
}

int main()
{
	int procs = 0;
	int i;
	pthread_t *thrs;

	// Getting number of CPUs
	procs = (int)sysconf( _SC_NPROCESSORS_ONLN );
	if (procs < 0)
	{
		perror( "sysconf" );
		return -1;
	}

	thrs = (pthread_t *)malloc( (sizeof( pthread_t )) * procs );
	if (thrs == NULL)
	{
		perror( "malloc" );
		return -1;
	}

	printf( "Starting %d threads...\n", procs );

	for (i = 0; i < procs; i++)
	{
		if (pthread_create( &thrs[i], NULL, thread_routine,
			(void *)(long)i ))
		{
			perror( "pthread_create" );
			procs = i;
			break;
		}
	}

	for (i = 0; i < procs; i++)
		pthread_join( thrs[i], NULL );

	free( thrs );

	printf( "After doing all the math, global_int value is: %d\n",global_int );
	printf( "Expected value is: %d\n", INC_TO * procs );

	return 0;
}

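I compiled the code above on RHEL 6.9 with: g++ -g -o sync_fetch2 sync_fetch2.c -lpthread

(g++ defines _GNU_SOURCE by default, which CPU_ZERO, CPU_SET, and sched_setaffinity require. When compiling as C with gcc, add -D_GNU_SOURCE.)

Running it gives:

./sync_fetch2
Starting 4 threads...
After doing all the math, global_int value is: 4000000
Expected value is: 4000000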

If you swap the two lines inside the loop of thread_routine so that the variable is incremented directly, each run produces a different value that falls short of the expected one:

	global_int++;
	// __sync_fetch_and_add( &global_int, 1 );

After this change the results look like this:

$ ./sync_fetch2
Starting 4 threads...
After doing all the math, global_int value is: 1428371
Expected value is: 4000000

$ ./sync_fetch2
Starting 4 threads...
After doing all the math, global_int value is: 2479197
Expected value is: 4000000

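3. Summary

The tests above show what __sync_fetch_and_add does: in a multithreaded program it keeps simple arithmetic on a shared variable correct. The other functions can be verified in the same way using the code above.

In addition, others have extended the example above to measure elapsed time and compare __sync_fetch_and_add against a mutex. They report that __sync_fetch_and_add is 6 to 7 times faster than locking and unlocking. Understanding why involves the generated assembly, which I hope to study and verify when I get the chance.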

My knowledge is limited, so please point out any mistakes or inaccuracies.


References:

http://www.alexonlinux.com/multithreaded-simple-data-type-access-and-atomic-variables

https://blog.csdn.net/i_am_jojo/article/details/7591743

https://www.zhihu.com/question/280022939

https://blog.csdn.net/long2324066440/article/details/72784084