转自：

深入剖析 linux GCC 4.4 的 STL string

本文通过研究STL源码来剖析C++中标准模板块库std::string运行机理，重点研究了其中的引用计数和Copy-On-Write技术。

平台：x86_64-redhat-linux
gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)

1. 问题提出

最近在我们的项目当中，出现了两次与使用string相关的问题。

1.1. 问题1：新代码引入的Bug

前一段时间有一个老项目来一个新需求，我们新增了一些代码逻辑来处理这个新需求。测试阶段没有问题，但上线之后，偶尔会引起错误的逻辑输出甚至崩溃。这个问题困扰着我们很久。我们对新增代码做周详单元测试和集成测试都没有发现问题，最后只能逼迫我们去看那一大段未修改过原始代码逻辑。该项目中经常会碰到使用string，原始代码中有这样一段逻辑引起了我们的怀疑：

`1`	`string string_info;`

`2`	`//... 对string_info的赋值操作`

`3`	`char* p = (char*)string_info.data();`

在严格的检查下和逻辑判断后，某些逻辑分支会对p指向的内容进行一些修改。这样虽然危险，但一直工作正常。联想到我们最近的修改：将string_info这个string对象拷贝了一份，然后进行一些处理。我们意识到string的Copy-On-Write和引用计数技术可能会导致我们拷贝的这个string并没有真正的实现数据拷贝。在做了一些测试和研究之后，我们确信了这一点。如是对上述代码进行了修正处理如下：

`1`	`char* p = &(string_info[0]);`

然后对项目类似的地方都做了这样的处理之后，测试，上线，一切OK，太完美了。

1.2. 问题2：性能优化

最近做一个项目的重构，对相关代码进行性能分析profile时发现memcpy的CPU占比比较高，达到8.7%，仔细检查代码中，发现现有代码大量的map查找操作。map定义如下：

`1`	`typedef` `std::map ssmap;`

`2`	`ssmap info_map;`

查找的操作如下：

`1`	`info_map["some_key"] = some_value;`

我们不经意间就会写出上述代码，如果改为下述代码，性能会好很多：

`1`	`static` `const` `std::string __s_some_key =` `"some_key";`

`2`	`info_map[__s_some_key] = some_value;`

这是因为第一种代码，每次查找都构造一个临时的string对象，同时会将“some_key”这个字符串拷贝一份。修改之后的代码，只需要在第一次初始化时候构造一次，以后每次调用都不会进行拷贝，因此效率上要好很多。类似代码都经过这样优化之后，memcpy的CPU占比下来了，降到4.3%。

下面我们通过深入string的源码内部来解释上述两个问题的解决过程和思路。

2. std::string定义

STL中的字符串类string的定义如下：

`1`	`template<typename` `_CharT,` `typename` `_Traits ,` `typename` `_Alloc>` `class` `basic_string;`

`2`	`typedef` `basic_string <char, char_traits<char` `>, allocator<` `char> > string;`

不难发现string在栈内存空间上只占用一个指针(_CharT* _M_p)的大小空间，因此sizeof(string)==8。其他信息都存储在堆内存空间上。

问题1：
我们有下面这一条C++语句：

`1`	`string name;`

请问，name这个变量总共带来多大的内存开销？这个问题我们稍后解答。

3. std::string内存空间布局

下面我们通过常见的用法来剖析一下string对象内部内存空间布局情况。
最常见的string用法是通过c风格字符串构造一个string对象，例如：
string name(“zieckey”);

其调用的构造函数定义如下：

`1`	`basic_string(const` `_CharT* __s,` `const` `_Alloc& __a)`

`2`	`: _M_dataplus( _S_construct(__s , __s ? __s + traits_type ::length( __s) :`

`3`	`__s + npos , __a), __a)`

4 {}

该构造函数直接调用 _S_construct 来构造这个对象，定义如下：

`01`	`template<typename` `_CharT,` `typename` `_Traits ,` `typename` `_Alloc>`

`02`	`template<typename` `_InIterator>`

03 _CharT*

`04`	`basic_string<_CharT , _Traits, _Alloc>::`

`05`	`_S_construct(_InIterator __beg, _InIterator __end ,` `const` `_Alloc& __a ,`

`06`	`input_iterator_tag)`

07 {

`08`	`// Avoid reallocation for common case.`

`09`	`_CharT __buf[128];`

`10`	`size_type __len = 0;`

`11`	`while` `( __beg != __end && __len <` `sizeof(__buf ) /` `sizeof( _CharT))`

12 {

`13`	`__buf[__len ++] = *__beg;`

`14`	`++ __beg;`

15 }

16

`17`	`//构造一个 _Rep 结构体，同时分配足够的空间，具体见下面内存映像图示`

`18`	`_Rep* __r = _Rep ::_S_create( __len, size_type (0), __a);`

19

`20`	`//拷贝数据到 string对象内部`

`21`	`_M_copy( __r->_M_refdata (), __buf, __len);`

22 __try

23 {

`24`	`while` `(__beg != __end)`

25 {

`26`	`if` `(__len == __r-> _M_capacity)`

27 {

`28`	`// Allocate more space.`

`29`	`_Rep* __another = _Rep:: _S_create(__len + 1, __len, __a);`

`30`	`_M_copy(__another ->_M_refdata(), __r->_M_refdata (), __len);`

`31`	`__r->_M_destroy (__a);`

`32`	`__r = __another ;`

33 }

`34`	`__r->_M_refdata ()[__len++] = * __beg;`

`35`	`++ __beg;`

36 }

37 }

`38`	`__catch(...)`

39 {

`40`	`__r->_M_destroy (__a);`

`41`	`__throw_exception_again;`

42 }

`43`	`//设置字符串长度、引用计数以及赋值最后一个字节为结尾符 char_type()`

`44`	`__r-> _M_set_length_and_sharable(__len );`

45

`46`	`//最后，返回字符串第一个字符的地址`

`47`	`return` `__r->_M_refdata ();`

48 }

49

`50`	`template<typename` `_CharT,` `typename` `_Traits ,` `typename` `_Alloc>`

`51`	`typename` `basic_string <_CharT, _Traits, _Alloc >::_Rep*`

`52`	`basic_string<_CharT , _Traits, _Alloc>::_Rep ::`

`53`	`_S_create(size_type __capacity, size_type __old_capacity ,`

`54`	`const` `_Alloc & __alloc)`

55 {

`56`	`// 需要分配的空间包括：`

`57`	`// 一个数组 char_type[__capacity]`

`58`	`// 一个额外的结尾符 char_type()`

`59`	`// 一个足以容纳 struct _Rep 空间`

`60`	`// Whew. Seemingly so needy, yet so elemental.`

`61`	`size_type __size = (__capacity + 1) *` `sizeof( _CharT) +` `sizeof` `(_Rep);`

62

`63`	`void* __place = _Raw_bytes_alloc (__alloc). allocate(__size );` `//申请空间`

64

`65`	`_Rep * __p =` `new` `(__place) _Rep;// 在地址__place 空间上直接 new对象( 称为placement new)`

`66`	`__p-> _M_capacity = __capacity ;`

`67`	`__p-> _M_set_sharable();// 设置引用计数为0，标明该对象只为自己所有`

`68`	`return` `__p;`

69 }

_Rep定义如下：

`1`	`struct` `_Rep_base`

2 {

`3`	`size_type _M_length;`

`4`	`size_type _M_capacity;`

`5`	`_Atomic_word _M_refcount;`

6 };

至此，我们可以回答上面“问题1”中提出的问题：
上文中”string name;”这个name对象所占用的总空间为33个字节，具体如下：

`1`	`sizeof(std::string) + 0 +` `sizeof('') +` `sizeof(std::string::_Rep)`

其中：sizeof(std::string)为栈空间

上文中的提到的另一条C++语句 string name(“zieckey”); 定义了一个string变量name，其内存空间布局如下：

4. 深入string内部源码

4.1. string copy与strncpy

长期以来，经常看到有人对std::string赋值拷贝与strncpy之间的效率进行比较和讨论。下面我们通过测试用例来进行一个基本的测试：

`01`	`#include<iostream>`

`02`	`#include<cstdlib>`

`03`	`#include<string>`

`04`	`#include<ctime>`

`05`	`#include<cstring>`

06

`07`	`using` `namespace` `std;`

08

`09`	`const` `int` `array_size = 200;`

`10`	`const` `int` `loop_count = 1000000;`

11

`12`	`void` `test_strncpy ()`

13 {

`14`	`char` `s1[array_size ];`

`15`	`char* s2=` `new` `char[ array_size];`

`16`	`memset( s2,` `'c'` `, array_size);`

`17`	`size_t` `start=clock` `();`

`18`	`for(` `int` `i =0;i!= loop_count;++i )` `strncpy( s1,s2 , array_size);`

`19`	`cout<< __func__ <<` `" : "` `<<` `clock()- start<<endl ;`

`20`	`delete` `s2;`

`21`	`s2 = NULL;`

22 }

23

`24`	`void` `test_string_copy ()`

25 {

`26`	`string s1;`

`27`	`string s2;`

`28`	`s2. append(array_size ,` `'c');`

`29`	`size_t` `start=clock` `();`

`30`	`for(` `int` `i =0;i!= loop_count;++i ) s1= s2;`

`31`	`cout<< __func__ <<` `" : "` `<<` `clock()- start<<endl ;`

32 }

33

`34`	`int` `main ()`

35 {

`36`	`test_strncpy();`

`37`	`test_string_copy();`

`38`	`return` `0;`

39 }

使用g++ -O3编译，运行时间如下：

test_strncpy : 40000
test_string_copy : 10000

字符串strncpy的运行时间居然是string copy的4倍。究其原因就是因为，string copy是基于引用计数技术，每次copy的代价非常小。
测试中我们还发现，如果array_size在10个字节以内的话，两者相差不大，随着array_size的变大，两者的差距也越来越大。例如，在array_size=1000的时候，strncpy就要慢13倍。

4.2. 通过GDB调试查看引用计数变化

上面的测试结论非常好，打消了大家对string性能问题的担忧。下面我们通过一段程序来验证引用计数在这一过程中的变化和作用。
请先看一段测试代码：

`01`	`#include <assert.h>`

`02`	`#include <iostream>`

`03`	`#include <string>`

04

`05`	`using` `namespace` `std;`

06

`07`	`int` `main ()`

08 {

`09`	`string a =` `"0123456789abcdef"` `;`

`10`	`string b = a ;`

`11`	`cout <<` `"a.data() ="` `<< (void` `*)a. data() << endl ;`

`12`	`cout <<` `"b.data() ="` `<< (void` `*)b. data() << endl ;`

`13`	`assert( a.data () == b. data());`

`14`	`cout << endl;`

15

`16`	`string c = a ;`

`17`	`cout <<` `"a.data() ="` `<< (void` `*)a. data() << endl ;`

`18`	`cout <<` `"b.data() ="` `<< (void` `*)b. data() << endl ;`

`19`	`cout <<` `"c.data() ="` `<< (void` `*)c. data() << endl ;`

`20`	`assert( a.data () == c. data());`

21

`22`	`cout << endl;`

`23`	`c[0] =` `'1';`

`24`	`cout <<` `"after write:\n";`

`25`	`cout <<` `"a.data() ="` `<< (void` `*)a. data() << endl ;`

`26`	`cout <<` `"b.data() ="` `<< (void` `*)b. data() << endl ;`

`27`	`cout <<` `"c.data() ="` `<< (void` `*)c. data() << endl ;`

`28`	`assert( a.data () != c. data() && a .data() == b.data ());`

`29`	`return` `0;`

30 }

运行之后，输出：

a.data() =0xc22028
b.data() =0xc22028

a.data() =0xc22028
b.data() =0xc22028
c.data() =0xc22028

after write:
a.data() =0xc22028
b.data() =0xc22028
c.data() =0xc22068

上述代码运行的结果输出反应出，在我们对b、c赋值之后，a、b、c三个string对象的内部数据的内存地址都是一样的。只有当我们对c对象进行修改之后，c对象的内部数据的内存地址才不一样，这一点是是如何做到的呢？

我们通过gdb调试来验证引用计数在上述代码执行过程中的变化：

`01`	`(gdb) b 10`

`02`	`Breakpoint 1 at 0x400c35:` `file` `string_copy1.cc, line 10.`

`03`	`(gdb) b 16`

`04`	`Breakpoint 2 at 0x400d24:` `file` `string_copy1.cc, line 16.`

`05`	`(gdb) b 23`

`06`	`Breakpoint 3 at 0x400e55:` `file` `string_copy1.cc, line 23.`

07 (gdb) r

`08`	`Starting program: [...]/unixstudycode/string_copy/string_copy1`

`09`	`[Thread debugging using libthread_db enabled]`

10

`11`	`Breakpoint 1, main () at string_copy1.cc:10`

`12`	`10 string b = a;`

13

`14`	`(gdb) x/16uba._M_dataplus._M_p-8`

`15`	`0x602020: 0 0 0 0 0 0 0 0`

`16`	`0x602028: 48 49 50 51 52 53 54 55`

此时对象a的引用计数是0

1 (gdb) n

`2`	`11 cout <<` `"a.data() ="` `<< (void*)a.data() << endl;`

b=a 将a赋值给b，string copy

`1`	`(gdb) x/16ub` `a._M_dataplus._M_p-8`

`2`	`0x602020: 1 0 0 0 0 0 0 0`

`3`	`0x602028: 48 49 50 51 52 53 54 55`

此时对象a的引用计数变为1，表明有另一个对象共享该对象a

01 (gdb) c

`02`	`Continuing.`

`03`	`a.data() =0x602028`

`04`	`b.data() =0x602028`

05

`06`	`Breakpoint 2, main () at string_copy1.cc:16`

`07`	`16 string c = a;`

`08`	`(gdb) x/16ub` `a._M_dataplus._M_p-8`

`09`	`0x602020: 1 0 0 0 0 0 0 0`

`10`	`0x602028: 48 49 50 51 52 53 54 55`

11 (gdb) n

`12`	`17 cout <<` `"a.data() ="` `<< (void*)a.data() << endl;`

c=a 将a赋值给c，string copy

`1`	`(gdb) x/16ub` `a._M_dataplus._M_p-8`

`2`	`0x602020: 2 0 0 0 0 0 0 0`

`3`	`0x602028: 48 49 50 51 52 53 54 55`

此时对象a的引用计数变为2，表明有另外2个对象共享该对象a

01 (gdb) c

`02`	`Continuing.`

`03`	`a.data() =0x602028`

`04`	`b.data() =0x602028`

`05`	`c.data() =0x602028`

06

`07`	`Breakpoint 3, main () at string_copy1.cc:23`

`08`	`23 c[0] =` `'1';`

09 (gdb) n

`10`	`24 cout <<` `"after write:\n";`

对c的值进行修改

`1`	`(gdb) x/16ub` `a._M_dataplus._M_p-8`

`2`	`0x602020: 1 0 0 0 0 0 0 0`

`3`	`0x602028: 48 49 50 51 52 53 54 55`

此时对象a的引用计数变为1

`1`	`(gdb) p a._M_dataplus._M_p`

`2`	`$3 = 0x602028` `"0123456789abcdef"`

`3`	`(gdb) p b._M_dataplus._M_p`

`4`	`$4 = 0x602028` `"0123456789abcdef"`

`5`	`(gdb) p c._M_dataplus._M_p`

`6`	`$5 = 0x602068` `"1123456789abcdef"`

此时对象c的内部数据内存地址已经与a、b不同了，即Copy-On-Write

上述GDB调试过程，清晰的验证了3个string对象a b c的通过引用计数技术联系在一起。

4.3. 源码分析string copy

下面我们阅读源码来分析。上述过程。
先看string copy过程的源码：

`01`	`//拷贝构造函数`

`02`	`basic_string(const` `basic_string& __str)`

`03`	`: _M_dataplus( __str._M_rep ()->_M_grab( _Alloc(__str .get_allocator()),`

`04`	`__str.get_allocator ()),`

`05`	`__str.get_allocator ())`

06 {}

07

`08`	`_CharT* _M_grab(const` `_Alloc& __alloc1,` `const` `_Alloc& __alloc2)`

09 {

`10`	`return` `(! _M_is_leaked() && __alloc1 == __alloc2)`

`11`	`? _M_refcopy() : _M_clone (__alloc1);`

12 }

13

`14`	`_CharT*_M_refcopy()` `throw` `()`

15 {

`16`	`#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING`

`17`	`if` `( __builtin_expect(this` `!= &_S_empty_rep(),` `false))`

18 #endif

`19`	`__gnu_cxx::__atomic_add_dispatch (&this-> _M_refcount, 1);`

`20`	`return` `_M_refdata();`

21 }

上面几段源代码比较好理解，先后调用了basic_string (const basic_string& __str )拷贝构造函数、_M_grab、_M_refcopy，
_M_refcopy实际上就是调用原子操作__atomic_add_dispatch (确保线程安全)将引用计数+1，然后返回原对象的数据地址。
由此可以看到，string对象之间的拷贝/赋值代价非常非常小。

几个赋值语句之后，a、b、c对象的内存空间布局如下图所示：

4.4. Copy-On-Write

下面再来看”c[0] = ‘1’; “做了些什么：

`01`	`reference operator []( size_type __pos )`

02 {

`03`	`_M_leak();`

`04`	`return` `_M_data ()[__pos ];`

05 }

06

`07`	`void` `_M_leak ()` `// for use in begin() & non-const op[]`

08 {

`09`	`//前面看到 c 对象在此时实际上与a对象的数据实际上指向同一块内存区域`

`10`	`//因此会调用 _M_leak_hard()`

`11`	`if` `(! _M_rep ()->_M_is_leaked ())`

`12`	`_M_leak_hard ();`

13 }

14

`15`	`void` `_M_leak_hard ()`

16 {

`17`	`if` `( _M_rep ()->_M_is_shared ())`

`18`	`_M_mutate (0, 0, 0);`

`19`	`_M_rep()-> _M_set_leaked ();`

20 }

21

`22`	`void` `_M_mutate ( size_type __pos , size_type __len1, size_type __len2 )`

23 {

`24`	`const` `size_type __old_size =` `this-> size ();//16`

`25`	`const` `size_type __new_size = __old_size + __len2 - __len1 ;` `//16`

`26`	`const` `size_type __how_much = __old_size - __pos - __len1 ;` `//16`

27

`28`	`if` `( __new_size >` `this` `-> capacity() \|\| _M_rep ()->_M_is_shared ())`

29 {

`30`	`// 重新构造一个对象`

`31`	`const` `allocator_type __a = get_allocator ();`

`32`	`_Rep * __r = _Rep:: _S_create (__new_size ,` `this-> capacity (), __a );`

33

`34`	`// 然后拷贝数据`

`35`	`if` `(__pos )`

`36`	`_M_copy (__r -> _M_refdata(), _M_data (), __pos );`

`37`	`if` `(__how_much )`

`38`	`_M_copy (__r -> _M_refdata() + __pos + __len2 ,`

`39`	`_M_data () + __pos + __len1, __how_much );`

40

`41`	`//将原对象上的引用计数减`

`42`	`_M_rep ()->_M_dispose ( __a);`

43

`44`	`//绑定到新的对象上`

`45`	`_M_data (__r -> _M_refdata());`

46 }

`47`	`else` `if` `(__how_much && __len1 != __len2 )`

48 {

`49`	`// Work in-place.`

`50`	`_M_move (_M_data () + __pos + __len2 ,`

`51`	`_M_data () + __pos + __len1, __how_much );`

52 }

53

`54`	`//最后设置新对象的长度和引用计数值`

`55`	`_M_rep()-> _M_set_length_and_sharable (__new_size );`

56 }

上面源码稍微复杂点，对c进行修改的过程分为以下两步：

第一步是判断是否为共享对象，(引用计数大于0)，如果是共享对象，就拷贝一份新的数据，同时将老数据的引用计数值减1。
第二步：在新的地址空间上进行修改，从而避免了对其他对象的数据污染

由此可以看出，如果不是通过string提供的接口对string对象强制修改的话，会带来潜在的不安全性和破坏性。例如：

`1`	`char* p =` `const_cast<char*>(s1.data());`

`2`	`p[0] =` `'a';`

上述代码对c修改(“c[0] = ‘1’; “)之后，a b c对象的内存空间布局如下：

Copy-On-Write的好处通过上文的解析是显而易见是，但也带来一些副作用。例如上述代码片段”c[0] = ‘1’; “如果是通过外部的强制操作可能会带来意想不到的结果。请看下面代码：

`1`	`char* pc =` `const_cast(c.c_str());`

`2`	`pc[0] =` `'1';`

这段代码通过强制修改c对象内部数据的值，看似效率上比operator[] 高，但同时也修改a、b对象的值，而这可能不是我们所希望看到的。这是我们需要提高警惕的地方。

5. 不宜使用string的例子

我们项目组内部有一个分布式的内存kv系统，一般是md5做key，value是任意二进制数。当初设计的时候，考虑到内存容量始终有限，没有选择使用string，而是单独开发的key结构和value结构。下面是我们设计的key结构定义：

`1`	`struct` `Key`

2 {

`3`	`uint64_t low;`

`4`	`uint64_t high;`

5 };

该结构所需内存大小为16字节，保持二进制的16字节MD5。相对于string做key来说，要节省33(参考上文string内存空间布局)个字节。例如，现在我们某个项目正在使用该系统的搭建的一个分布式集群，总共有100亿条记录，每条记录都节省33字节，总共节省内存空间：33*100亿=330G。由此可见，仅仅对key的一个小小改进，就能节省如此大的内存，还是非常值得。

6. 对比微软Visual Studio提供的STL版本

vc6.0的string实现是基于引用计数的，但不是线程安全的。但在后续版本的vc中去掉了引用计数技术，string copy 都直接进行深度内存拷贝。
由于string实现上的细节不一致，导致跨平台程序的移植带来潜在的风险。这种场合下，我们需要额外注意。

7. 总结

即使是一个空string对象，其所占内存空间也达到33字节，因此在内存使用要求比较严格的应用场景，例如memcached等，请慎重考虑使用string。
string由于使用引用计数和Copy-On-Write技术，相对于strcpy，string copy的性能提升非常显著。
使用引用计数后，多个string指向同一块内存区域，因此，如果强制修改一个string的内容，会影响其他string。