Why server programs need to manage their own memory, starting from a rewrite of std::string concatenation

Why should a server program manage its own memory? To get a glimpse of the answer through one small window, let's start with string operations.

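In C++, dynamic memory is managed with the new/delete operators, while malloc/free can be used in both C and C++. In most implementations new/delete are built on top of malloc/free. Whichever of these you use, two problems remain: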

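1. Efficiency: allocating and releasing heap memory takes time, and doing it frequently slows the program down. For a program that allocates and frees constantly, a server program in particular, the many calls to new/malloc and delete/free all cost CPU time, which lowers throughput.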

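2. Memory fragmentation: frequently allocating small blocks chops the heap into small pieces. Blocks are rarely freed in the order they were allocated, so the free space ends up scattered between live blocks, and the process can reach a state where plenty of memory is free in total but no single contiguous block is large enough for a big request.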

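For client software, memory management matters less; at worst you can restart the machine. For a server program that has to run around the clock without interruption, it matters a great deal. Take the ubiquitous web server: it speaks HTTP, a request-response hypertext transfer protocol. This question-and-answer protocol is very simple, and the request and response headers are plain text rather than binary. When the server receives a GET or POST request from a client, it first builds a response header and then appends the response body, like this: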

	// Build the response header
	string strHttpResponse;
	strHttpResponse += "HTTP/1.1 200 OK\r\n";
	strHttpResponse += "Server: HttpServer\r\n";
	strHttpResponse += "Content-Type: text/html; charset=utf-8\r\n";
	strHttpResponse += "Content-Length: 9527\r\n";	// placeholder value for this example
	strHttpResponse += "Last-Modified: Sat, 13 Apr 2019 14:27:06 GMT\r\n";
	strHttpResponse += "\r\n";				// blank line; the actual response body follows it
	// Build the response body
	strHttpResponse += "<html><head><title>Hello, I am 9527!</title>"
						"</head><body>Hello, I am the body of 9527; pretend I am 9527 bytes long!</body></html>";

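For dynamic pages or back-end applications, the server usually has to query a database and run various business logic, then concatenate the results into semi-structured data such as JSON or XML to return to the client.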
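When the final size is known in advance, the header and body can be appended into a buffer that is sized once. The sketch below is one way to do that with std::string::reserve. BuildResponse is a hypothetical helper name, not part of the article's HttpServer, and here the Content-Length is computed from the body instead of being hard-coded.

```cpp
#include <string>

// Hypothetical helper (not from the article): builds a complete HTTP
// response for an HTML body, reserving the buffer once up front.
std::string BuildResponse(const std::string& body)
{
    static const char kStatusAndHeaders[] =
        "HTTP/1.1 200 OK\r\n"
        "Server: HttpServer\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "Content-Length: ";

    const std::string length = std::to_string(body.size());

    std::string response;
    // Fixed part + length digits + "\r\n\r\n" + body
    response.reserve(sizeof(kStatusAndHeaders) - 1 + length.size() + 4 + body.size());

    response += kStatusAndHeaders;
    response += length;
    response += "\r\n\r\n";   // ends the Content-Length line, then the blank line
    response += body;
    return response;
}
```

Whether the single reserve is a measurable win depends on the standard library's growth policy and on the response size, so it is something to measure rather than assume.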

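This article is not an introduction to HTTP; there are plenty of those already. The point is to use an ordinary HTTP exchange to look at how string operations are used, and whether there is room to optimize them.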

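How much trouble can string operations really cause?

For a client, not much. For a web server that runs 24 hours a day without being shut down, they can become a performance problem. Repeated appending can force the string to allocate and release buffers over and over, which is exactly what produces the memory fragmentation described above.

And how much trouble can memory fragmentation cause?

Under high concurrency it can hurt performance. Frequent malloc/free calls consume a lot of CPU time on their own, and too many fragments leave the heap so chopped up that a larger contiguous block can no longer be allocated.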

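Both std::string in the standard library and CString in Microsoft's MFC maintain an internal buffer. When the concatenated string fits in the buffer, the new characters are simply copied onto the end; when it does not fit, a new, larger buffer has to be allocated and the string is rebuilt there. To make the comparison concrete, we write our own string wrapper class, CFastString (the full implementation is at the end of the article), and overload the += operator.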



const CFastString& CFastString::operator+=(const char *pszSrc)
{
	assert(pszSrc);
	int iLenSrc = (int)strlen(pszSrc);
	int iNewSize = iLenSrc + length() + 1;	// +1 for the terminating 0
	// If the internal buffer is large enough, append in place; otherwise allocate a new one
	if(m_iBuffSize >= iNewSize)
	{
		memcpy(m_pszStr+m_iStrLen, pszSrc, iLenSrc);
		*(m_pszStr+iNewSize-1) = 0;
	}
	else
	{
		// Allocate a new block of exactly the required size
		char* pszNew = AllocBuffer(iNewSize);
		// Copy both strings into the new block
		// Method 1: strcpy + strcat
		strcpy(pszNew, m_pszStr);
		strcat(pszNew, pszSrc);
		// Method 2: copy the bytes directly (+1 also copies the terminating 0)
//		memcpy(pszNew, m_pszStr, m_iStrLen);
//		memcpy(pszNew+m_iStrLen, pszSrc, iLenSrc+1);
		free(m_pszStr);
		m_pszStr = pszNew;
	}
	m_iStrLen = iNewSize-1;
	return *this;
}

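As the operator+= code shows, when the internal buffer is too small a new one is allocated. As a string keeps growing by appends, memory may be allocated and released again and again. So how do we improve performance?

Let's write a test function that compares the append operator (+=) of CFastString and std::string. The test code is: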
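As a reference point, it can help to see how often std::string itself regrows its buffer under the same kind of workload. The sketch below counts changes in capacity() while appending; CountCapacityChanges is a hypothetical helper name, and the count it returns depends on the standard library in use.

```cpp
#include <cstddef>
#include <string>

// Appends `piece` to an empty std::string `appends` times and returns how
// many times capacity() changed, i.e. how many times the buffer was regrown.
std::size_t CountCapacityChanges(std::size_t appends, const char* piece)
{
    std::string s;
    std::size_t changes = 0;
    std::size_t lastCapacity = s.capacity();
    for (std::size_t i = 0; i < appends; ++i) {
        s += piece;
        if (s.capacity() != lastCapacity) {
            ++changes;
            lastCapacity = s.capacity();
        }
    }
    return changes;
}
```

On common implementations the buffer grows geometrically, so a call such as CountCapacityChanges(20000, "10000000000000000000000000000000") should report far fewer regrowths than appends.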

#include <windows.h>	// GetTickCount, DWORD
#include <stdio.h>
#include <string>
#include "FastString.h"

void TestFastString()
{
	int i = 0;
	int iTimes = 5000;
	// Test CFastString
	printf("CFastString test:\r\n");
	CFastString fstr = "Hello";
	DWORD dwStart = ::GetTickCount();
	for(i = 0; i < iTimes; i++)
	{
		fstr += "10000000000000000000000000000000";
		fstr += "20000000000000000000000000000000";
		fstr += "30000000000000000000000000000000";
		fstr += "40000000000000000000000000000000";
	}
	DWORD dwSpan1 = ::GetTickCount()-dwStart;
	printf("CFastString Span = %lu\n", dwSpan1);	// DWORD is an unsigned long
	// Test std::string
	printf("std::string test:\r\n");
	std::string str = "Hello";
	dwStart = ::GetTickCount();
	for(i = 0; i < iTimes; i++)
	{
		str += "10000000000000000000000000000000";
		str += "20000000000000000000000000000000";
		str += "30000000000000000000000000000000";
		str += "40000000000000000000000000000000";
	}
	DWORD dwSpan2 = ::GetTickCount()-dwStart;
	printf("std::string Span = %lu\n", dwSpan2);
	printf("Test finished!\r\n");
}
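GetTickCount is Windows-only and its resolution is typically in the 10 to 16 millisecond range, which is coarse for a loop this short. A portable alternative, sketched here under the assumption that a C++11 compiler is available, is to time the appends with std::chrono::steady_clock. TimeAppendsMs is a hypothetical helper name; it works with any type that supports += with a C string, so it can be pointed at std::string or at CFastString.

```cpp
#include <chrono>
#include <string>

// Appends `piece` to `target` `times` times and returns the elapsed time in
// milliseconds, measured with a monotonic clock.
template <typename String>
long long TimeAppendsMs(String& target, const char* piece, int times)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < times; ++i) {
        target += piece;
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

For example, with std::string str = "Hello"; the call TimeAppendsMs(str, "10000000000000000000000000000000", 20000) measures roughly the same amount of work as the std::string half of the test function.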

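Run TestFastString and look at the result.

It turns out CFastString is not fast at all; it is considerably slower. A rewritten string class that performs worse than the original is hardly worth having. Could strcpy and strcat be the slow part?

Improvement 1:

We modify CFastString::operator+=(const char *pszSrc) and change these concatenation statements: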

// Method 1: strcpy + strcat
strcpy(pszNew, m_pszStr);
strcat(pszNew, pszSrc);

to:

// Method 2: copy the bytes directly
memcpy(pszNew, m_pszStr, m_iStrLen);
memcpy(pszNew+m_iStrLen, pszSrc, iLenSrc+1);	// +1 also copies the terminating 0

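Run the test again and look at the result.

Not bad: in this run it is a little faster than std::string, but not dramatically so. In the overloaded += function, each allocation is exactly the size of the existing string plus the appended string. As a result, once the internal buffer is full, every subsequent append triggers another allocation and release. As an extreme example, suppose the internal buffer of str is already full before these appends: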

str += "0";
str += "1";
str += "2";
str += "3";
str += "4";
str += "5";
str += "6";
str += "7";
str += "8";
str += "9";

and

str += "0123456789";

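Both produce the same result, but with exact-fit allocation the first form triggers 10 allocations and releases, while the second triggers only one.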

What happens if we allocate a little extra each time?

Improvement 2:

We change:

char* pszNew = AllocBuffer(iNewSize);

to:

// Allocate a new block, scaled to 1.5 times the required size instead of the exact size
char* pszNew = AllocBuffer(iNewSize, 1.5);

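When appending, we no longer allocate exactly what is needed; we allocate 50% more. Run the test again and look at the result.

CFastString now practically flies. If iTimes in the test function were a concurrency level rather than a loop count, that is, if the server were handling 5000 HTTP requests at once, the CPU would get through the work far faster because it avoids the frequent malloc and free calls. Along with the speedup, memory fragmentation is reduced as well.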

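You might object that we are allocating 50% more memory. But that 50% buys a large performance gain, and trading space for time is entirely normal in server programming: idle memory is doing nothing anyway, and it is given back in the end. Coming back to AllocBuffer(int iAllocSize, double dScaleOut), all we did was add one control parameter, dScaleOut.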
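The effect of the scale factor can be estimated without timing anything, by counting how often a buffer would have to be replaced. The sketch below is a simplified model that loosely mirrors AllocBuffer(iNewSize, dScaleOut); GrowthStats and SimulateAppends are hypothetical names, and no real memory is allocated.

```cpp
#include <cstddef>

struct GrowthStats {
    unsigned long long reallocations;  // times a new buffer was needed
    unsigned long long bytesCopied;    // bytes moved from old buffers to new ones
};

// Simulates appending `pieceLen` characters `appends` times to a buffer that
// starts with `initialCapacity` bytes and, when too small, is replaced by one
// of size (required size * scale).
GrowthStats SimulateAppends(std::size_t appends, std::size_t pieceLen,
                            std::size_t initialCapacity, double scale)
{
    GrowthStats stats = {0, 0};
    std::size_t capacity = initialCapacity;
    std::size_t length = 0;
    for (std::size_t i = 0; i < appends; ++i) {
        const std::size_t needed = length + pieceLen + 1;  // +1 for the terminator
        if (needed > capacity) {
            capacity = static_cast<std::size_t>(needed * scale);
            ++stats.reallocations;
            stats.bytesCopied += length;  // old contents move to the new buffer
        }
        length += pieceLen;
    }
    return stats;
}
```

With a full one-byte buffer, SimulateAppends(10, 1, 1, 1.0) reports 10 reallocations and SimulateAppends(1, 10, 1, 1.0) reports 1, matching the ten-appends example earlier. Comparing SimulateAppends(20000, 32, 256, 1.0) with SimulateAppends(20000, 32, 256, 1.5) shows how much copying the extra 50% avoids for the workload in the test function.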

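Strictly speaking, over-allocating like this is not memory management, only an allocation technique. Real memory management pre-allocates a large number of contiguous memory blocks (a memory pool); when a string needs memory it takes a block from the pool, and when it is done it returns the block to the pool. There are many ways to implement a memory pool, and this article is already long, so that is for next time.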
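To make the idea concrete, here is a minimal sketch of a fixed-size block pool: one up-front allocation carved into equal blocks, handed out and taken back through a free list. FixedBlockPool and its member names are hypothetical and are not the pool used in the author's HttpServer; the sketch is single-threaded and does no bounds or double-release checking.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical minimal pool: one malloc up front, carved into equal blocks.
class FixedBlockPool
{
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
        : m_storage(static_cast<char*>(std::malloc(blockSize * blockCount)))
    {
        if (m_storage == NULL)
            return;   // pool stays empty; Acquire() will return NULL
        m_free.reserve(blockCount);
        for (std::size_t i = 0; i < blockCount; ++i)
            m_free.push_back(m_storage + i * blockSize);
    }

    ~FixedBlockPool() { std::free(m_storage); }

    // Takes a block from the pool, or returns NULL when the pool is exhausted.
    char* Acquire()
    {
        if (m_free.empty())
            return NULL;
        char* block = m_free.back();
        m_free.pop_back();
        return block;
    }

    // Returns a block previously obtained from Acquire().
    void Release(char* block) { m_free.push_back(block); }

    std::size_t FreeBlocks() const { return m_free.size(); }

private:
    FixedBlockPool(const FixedBlockPool&);             // non-copyable
    FixedBlockPool& operator=(const FixedBlockPool&);

    char*              m_storage;  // the single contiguous allocation
    std::vector<char*> m_free;     // blocks currently available
};
```

A string class could call Acquire() where it currently calls malloc and Release() where it calls free, as long as the requested size fits in one block. Handling larger requests and multiple threads is where real pool designs differ.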

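Back to the main point: to write a high-performance server program you have to think about many details, even something as unremarkable as string handling, even something as unremarkable as appending to a string.

My HttpServer uses the custom CFastString together with real memory management. IOCP is only the precondition for high concurrency; it is managing memory properly that lets the server reach its best performance.

Below is the simple source code for the CFastString example. Help yourself!

Header file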



#pragma once

#include <TCHAR.h>
#include <string.h>

#define DEFAULT_BUFFER_SIZE		256

class CFastString
{
public:
	CFastString();
	CFastString(const CFastString& cstrSrc);
	CFastString(const char* pszSrc);
	virtual ~CFastString();

public:
	int length() const{
		return m_iStrLen;
	}
	// Slower than length(): this rescans the buffer with strlen
	int GetLength() {
		return m_pszStr ? (int)strlen(m_pszStr) : -1;
	}
	char* c_str() const{
		return m_pszStr;
	}

	// =============Operator overloads=============
	const CFastString& operator=(const CFastString& cstrSrc);
	const CFastString& operator=(const char* pszSrc);
	const CFastString& operator+=(const CFastString& cstrSrc);
	const CFastString& operator+=(const char *pszSrc);

	// =============Friend functions=============
	friend CFastString operator+(const CFastString& cstr1, const CFastString& cstr2);
	friend CFastString operator+(const CFastString& cstr, const char* psz);
	friend CFastString operator+(const char* psz, const CFastString& cstr);

	// Conversion operators
	operator char*() const{
		return m_pszStr;
	}
	operator const char*() const{
		return m_pszStr;
	}

protected:
	// =============Concatenate two strings=============
	void Concat(const char* psz1, const char* psz2);

protected:
	char* AllocBuffer(int iAllocSize, double dScaleOut = 1.0);
	void  ReAllocBuff(int iNewSize);

protected:
	char*	m_pszStr;		// String buffer
	int		m_iStrLen;		// String length, not counting the terminating 0
	int		m_iBuffSize;	// Size of the buffer that holds the string
};

Implementation file



#include "stdafx.h"
#include "FastString.h"
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <TCHAR.h>

//////////////////////////////////////////////////////////////////////
// Construction/Destruction
//////////////////////////////////////////////////////////////////////

CFastString::CFastString()
{
	m_iBuffSize = DEFAULT_BUFFER_SIZE;
	m_pszStr = (char*)malloc(m_iBuffSize);
	memset(m_pszStr, 0, m_iBuffSize);
	m_iStrLen = 0;
}

CFastString::CFastString(const CFastString& cstrSrc)
{
	int iSrcSize = cstrSrc.length()+1;
	m_iBuffSize = 0;	// AllocBuffer reads m_iBuffSize, so initialize it first
	m_pszStr = AllocBuffer(iSrcSize);
	memcpy(m_pszStr, cstrSrc.c_str(), iSrcSize);	// includes the terminating 0
	m_iStrLen = iSrcSize-1;
}

CFastString::CFastString(const char* pszSrc)
{
	assert(pszSrc);
	int iSrcSize = (int)strlen(pszSrc) + 1;
	m_iBuffSize = 0;	// AllocBuffer reads m_iBuffSize, so initialize it first
	m_pszStr = AllocBuffer(iSrcSize);
	memcpy(m_pszStr, pszSrc, iSrcSize);	// includes the terminating 0
	m_iStrLen = iSrcSize-1;
}

CFastString::~CFastString()
{
	free(m_pszStr);
	m_pszStr = NULL;
	m_iStrLen = 0;
	m_iBuffSize = 0;
}

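// Allocates a buffer of iAllocSize*dScaleOut bytes (never smaller than the
// current buffer) and records the new size. The caller copies the data and
// frees the old buffer.
char* CFastString::AllocBuffer(int iAllocSize, double dScaleOut)
{
	if(dScaleOut < 1.0)
		dScaleOut = 1.0;
	int iNewBuffSize = int(iAllocSize*dScaleOut);
	if(iNewBuffSize > m_iBuffSize)
		m_iBuffSize = iNewBuffSize;
	char* pszNew = (char*)malloc(m_iBuffSize);
	return pszNew;
}

// Grows the buffer to iNewSize bytes. The current contents are discarded.
void CFastString::ReAllocBuff(int iNewSize)
{
	if(iNewSize <= 0)
	{
		assert(0);
		return ;
	}
	if(iNewSize <= m_iBuffSize)
		return ;
	m_iStrLen = 0;
	// Allocate a new block
	free(m_pszStr);
	m_pszStr = (char*)malloc(iNewSize);
	m_iBuffSize = iNewSize;
}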

void CFastString::Concat(const char* psz1, const char* psz2)
{
	assert(psz1);
	assert(psz2);
	if(NULL == psz1 || NULL == psz2)
		return;
	int iLen1 = (int)strlen(psz1);
	int iLen2 = (int)strlen(psz2);
	int iNewSize = iLen1 + iLen2 + 1;
	if(m_iBuffSize < iNewSize)
		ReAllocBuff(iNewSize);
	// Copy string 1
	memcpy(m_pszStr, psz1, iLen1);
	// Copy string 2
	memcpy(m_pszStr+iLen1, psz2, iLen2);
	m_iStrLen = iNewSize-1;
	*(m_pszStr+m_iStrLen) = 0;
}

// Declared in the header; defined here so that copy assignment links
const CFastString& CFastString::operator=(const CFastString& cstrSrc)
{
	if(this != &cstrSrc)
		*this = (const char*)cstrSrc.c_str();
	return *this;
}

const CFastString& CFastString::operator=(const char* pszSrc)
{
	assert(pszSrc);
	int iSrcSize = (int)strlen(pszSrc)+1;
	if(m_iBuffSize < iSrcSize)
		ReAllocBuff(iSrcSize);
	memcpy(m_pszStr, pszSrc, iSrcSize);	// includes the terminating 0
	m_iStrLen = iSrcSize - 1;
	return *this;
}

const CFastString& CFastString::operator+=(const CFastString& cstrSrc)
{
	int iNewSize = cstrSrc.length() + length() + 1;
	if(m_iBuffSize >= iNewSize)
	{
		memcpy(m_pszStr+m_iStrLen, cstrSrc.c_str(), cstrSrc.length());
	}
	else
	{
		char* pszNew = AllocBuffer(iNewSize, 1.5);
		memcpy(pszNew, m_pszStr, m_iStrLen);
		memcpy(pszNew+m_iStrLen, cstrSrc.c_str(), cstrSrc.length());
		free(m_pszStr);
		m_pszStr = pszNew;
	}
	m_iStrLen = iNewSize-1;
	*(m_pszStr+m_iStrLen) = 0;	// terminate in both branches; malloc does not zero the new block
	return *this;
}

const CFastString& CFastString::operator+=(const char *pszSrc)
{
	assert(pszSrc);
	int iLenSrc = (int)strlen(pszSrc);
	int iNewSize = iLenSrc + length() + 1;
	// If the internal buffer is large enough, append in place; otherwise allocate a new one
	if(m_iBuffSize >= iNewSize)
	{
		memcpy(m_pszStr+m_iStrLen, pszSrc, iLenSrc);
	}
	else
	{
		// Allocate a new block, scaled to 1.5 times the required size instead of the exact size
//		char* pszNew = AllocBuffer(iNewSize);
		char* pszNew = AllocBuffer(iNewSize, 1.5);
		// Copy both strings into the new block
		// Method 1: strcpy + strcat
// 		strcpy(pszNew, m_pszStr);
// 		strcat(pszNew, pszSrc);
		// Method 2: copy the bytes directly
		memcpy(pszNew, m_pszStr, m_iStrLen);
		memcpy(pszNew+m_iStrLen, pszSrc, iLenSrc);
		free(m_pszStr);
		m_pszStr = pszNew;
	}
	m_iStrLen = iNewSize-1;
	*(m_pszStr+m_iStrLen) = 0;	// terminate in both branches; malloc does not zero the new block
	return *this;
}

// ===============friend functions===================

CFastString operator+(const CFastString& cstr1, const CFastString& cstr2)
{
	CFastString cstrNew;
	cstrNew.Concat(cstr1, cstr2);
	return cstrNew;
}

CFastString operator+(const CFastString& cstr, const char* psz)
{
	CFastString cstrNew;
	cstrNew.Concat(cstr, psz);
	return cstrNew;
}

CFastString operator+(const char* psz, const CFastString& cstr)
{
	CFastString cstrNew;
	cstrNew.Concat(psz, cstr);
	return cstrNew;
}