Performance test (3): a benchmark comparing the lock-free queues boost::lockfree::queue and moodycamel::ConcurrentQueue

English version : The performance benchmark of queue with std::mutex against boost::lockfree::queue and moodycamel::ConcurrentQueue

Brief

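We use the EventLoop::QueueInLoop(...) function from the https://github.com/Qihoo360/evpp project for this benchmark. This function schedules a functor from one thread to run in another thread, which is a typical producer-consumer problem.

We store these functors in a queue. Multiple producer threads write functors into the queue, and a single consumer thread takes them out and executes them. To keep the queue thread-safe, we can either protect it with a lock or use a lock-free queue. EventLoop::QueueInLoop(...) uses macro definitions to select among three different implementations of the queue that exchanges data across threads.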
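
As a rough illustration of the lock-based variant described above, here is a minimal sketch that uses only the standard library. The names TaskQueue, Push, RunPending and PostFromThreads are hypothetical and are not part of evpp; the sketch only shows the general pattern of many producers pushing functors under a mutex while one consumer swaps the whole batch out and runs it.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names: TaskQueue, Push, RunPending and PostFromThreads are
// illustrative only and do not come from evpp.
using Functor = std::function<void()>;

class TaskQueue {
public:
    // May be called from any producer thread.
    void Push(Functor f) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace_back(std::move(f));
    }

    // Called from the single consumer thread.
    // Returns the number of functors that were run.
    std::size_t RunPending() {
        std::vector<Functor> batch;
        {
            // Hold the lock only long enough to swap the vectors.
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& f : batch) {
            f();
        }
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<Functor> pending_;  // guarded by mutex_
};

// Starts `producers` threads that each push `per_producer` functors, waits
// for them to finish, then drains the queue on the calling thread.
// Returns how many functors were executed.
inline std::size_t PostFromThreads(int producers, int per_producer) {
    TaskQueue queue;
    std::size_t executed = 0;  // only modified on the calling thread
    std::vector<std::thread> threads;
    for (int i = 0; i < producers; ++i) {
        threads.emplace_back([&queue, &executed, per_producer] {
            for (int j = 0; j < per_producer; ++j) {
                queue.Push([&executed] { ++executed; });
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    queue.RunPending();
    return executed;
}
```

Swapping the vector out under the lock keeps the critical section short, so producers are blocked only for the swap itself rather than while the functors run.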

Test subjects

evpp-v0.3.2
Three implementations of the queue inside EventLoop::QueueInLoop(...):
- A locked queue implemented with std::vector and std::mutex, built with gcc 4.8.2
- boost::lockfree::queue from boost-1.53
- moodycamel::ConcurrentQueue with commit c54341183f8674c575913a65ef7c651ecce47243

Test environment

Linux CentOS 6.2, 2.6.32-220.7.1.el6.x86_64
Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)

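Test method

For the test code, see https://github.com/Qihoo360/evpp/blob/master/benchmark/post_task/post_task6.cc. An EventLoop object, loop_, runs in a single consumer thread, while multiple producer threads continuously call loop_->QueueInLoop(...) to put functors into the consumer's queue for it to consume (execute). Each producer thread stops after posting a fixed number of functors (set by a command-line argument). Once the consumer thread has consumed every functor, the program exits and records the start and end times.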
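
The procedure above can be sketched with the standard library alone. This is not the code in post_task6.cc: BenchResult and RunPostTaskBenchmark are hypothetical names, and a std::condition_variable stands in for the wakeup mechanism that evpp implements with its own watcher. The sketch only illustrates the shape of the measurement: start the clock, let N producers post M functors each, and stop the clock once the consumer has run all of them.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names: BenchResult and RunPostTaskBenchmark are illustrative.
struct BenchResult {
    std::uint64_t executed;  // functors run by the consumer
    double elapsed_seconds;  // wall time from start until all functors ran
};

inline BenchResult RunPostTaskBenchmark(int producers,
                                        std::uint64_t per_producer) {
    using Functor = std::function<void()>;
    std::mutex mutex;
    std::condition_variable cv;    // assumption: stands in for evpp's wakeup
    std::vector<Functor> pending;  // guarded by mutex
    const std::uint64_t total =
        static_cast<std::uint64_t>(producers) * per_producer;
    std::uint64_t executed = 0;    // written only by the consumer thread

    const auto start = std::chrono::steady_clock::now();

    // Single consumer: sleep until work arrives, swap the batch out, run it.
    std::thread consumer([&] {
        std::vector<Functor> batch;
        while (executed < total) {
            {
                std::unique_lock<std::mutex> lock(mutex);
                cv.wait(lock, [&] { return !pending.empty(); });
                batch.swap(pending);
            }
            for (auto& f : batch) {
                f();
            }
            batch.clear();
        }
    });

    // Producers: each posts `per_producer` functors, then stops.
    std::vector<std::thread> threads;
    for (int i = 0; i < producers; ++i) {
        threads.emplace_back([&] {
            for (std::uint64_t j = 0; j < per_producer; ++j) {
                {
                    std::lock_guard<std::mutex> lock(mutex);
                    pending.emplace_back([&executed] { ++executed; });
                }
                cv.notify_one();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    consumer.join();

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return BenchResult{executed, elapsed.count()};
}
```

A call such as RunPostTaskBenchmark(4, 1000000) would correspond to one data point with four producer threads. The absolute timings from this sketch are not comparable to the results reported here.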

For ease of reading, the core parts of the relevant code are excerpted below.

The queue is defined in event_loop.h:

    std::shared_ptr<PipeEventWatcher> watcher_;
#ifdef H_HAVE_BOOST
    boost::lockfree::queue<Functor*>* pending_functors_;
#elif defined(H_HAVE_CAMERON314_CONCURRENTQUEUE)
    moodycamel::ConcurrentQueue<Functor>* pending_functors_;
#else
    std::mutex mutex_;
    std::vector<Functor>* pending_functors_; // @Guarded By mutex_
#endif

QueueInLoop(...) is implemented in event_loop.cc:

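void EventLoop::Init() {
    // Run DoPendingFunctors in the loop thread whenever the watcher is notified.
    watcher_->Watch(std::bind(&EventLoop::DoPendingFunctors, this));
}

void EventLoop::QueueInLoop(const Functor& cb) {
    {
#ifdef H_HAVE_BOOST
        // The queue stores raw pointers, so copy the functor to the heap.
        auto f = new Functor(cb);
        // Retry until the push succeeds.
        while (!pending_functors_->push(f)) {
        }
#elif defined(H_HAVE_CAMERON314_CONCURRENTQUEUE)
        // Retry until the enqueue succeeds.
        while (!pending_functors_->enqueue(cb)) {
        }
#else
        std::lock_guard<std::mutex> lock(mutex_);
        pending_functors_->emplace_back(cb);
#endif
    }
    // Wake up the consumer (loop) thread.
    watcher_->Notify();
}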

void EventLoop::DoPendingFunctors() {
#ifdef H_HAVE_BOOST
    Functor* f = nullptr;
    // Pop until the queue is empty; each functor was allocated in QueueInLoop.
    while (pending_functors_->pop(f)) {
        (*f)();
        delete f;
    }
#elif defined(H_HAVE_CAMERON314_CONCURRENTQUEUE)
    Functor f;
    // Dequeue until the queue is empty.
    while (pending_functors_->try_dequeue(f)) {
        f();
        --pending_functor_count_;
    }
#else
    std::vector<Functor> functors;
    {
        // Hold the lock only while swapping the pending functors out.
        std::lock_guard<std::mutex> lock(mutex_);
        notified_.store(false);
        pending_functors_->swap(functors);
    }
    // Run the functors outside the lock.
    for (size_t i = 0; i < functors.size(); ++i) {
        functors[i]();
    }
#endif
}
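
One detail in the excerpt is that the Boost variant stores Functor* and allocates each functor on the heap, while the other two variants store Functor by value. The Boost documentation states that the element type of boost::lockfree::queue must have a trivial assignment operator and a trivial destructor, which a std::function does not satisfy. The following sketch checks those two properties with standard type traits only; it assumes Functor is std::function<void()>, and FitsTrivialElementRequirements is a hypothetical helper name.

```cpp
#include <functional>
#include <type_traits>

// Assumption: Functor is a std::function, as the excerpt suggests.
using Functor = std::function<void()>;

// Hypothetical helper: true if T has a trivial copy assignment operator and
// a trivial destructor, the two triviality requirements that
// boost::lockfree::queue documents for its element type.
template <typename T>
constexpr bool FitsTrivialElementRequirements() {
    return std::is_trivially_copy_assignable<T>::value &&
           std::is_trivially_destructible<T>::value;
}

// A std::function manages its target, so it is not trivially destructible.
static_assert(!FitsTrivialElementRequirements<Functor>(),
              "std::function cannot be stored by value");

// A raw pointer satisfies both requirements, hence the new/delete pair.
static_assert(FitsTrivialElementRequirements<Functor*>(),
              "a raw pointer can be stored");
```

This is why the Boost path pays for one allocation and one deallocation per task, a cost the other two variants do not have.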

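We ran two kinds of tests:

- One producer thread posts 1000000 functors to the consumer thread for execution, and we measure the total elapsed time. We repeat this test 10 times in the same way.
- With 2/4/6/8/12/16/20 producer threads, each thread posts 1000000 functors to the consumer thread for execution, and we measure the total elapsed time.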

Conclusions

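- With one producer and one consumer, most runs show that moodycamel::ConcurrentQueue performs best, about 10%~50% better than the queue with std::mutex. boost::lockfree::queue is only slightly better than the queue with std::mutex. Because our implementation must support writes from multiple producers, we did not test boost::lockfree::spsc_queue, Boost's dedicated single-producer/single-consumer lock-free queue. Boost is at a slight disadvantage in this scenario, but this does not affect the overall results or conclusions.
- With multiple producer threads and one consumer thread, boost::lockfree::queue performs about 75%~150% better than the queue with std::mutex. moodycamel::ConcurrentQueue performs best, about 25%~100% better than boost::lockfree::queue and about 100%~500% better than the queue with std::mutex. The more producer threads there are, that is, the higher the probability of lock contention, the more pronounced the advantage of moodycamel::ConcurrentQueue becomes.

Therefore, based on the comparison above, for the EventLoop implementation in our evpp project we recommend using moodycamel::ConcurrentQueue to exchange data across threads.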
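
The measurements are elapsed times, while the conclusions are stated as percentages. The report does not spell out the conversion. One plausible reading, offered here only as an assumption, is a throughput ratio: for the same number of tasks, a queue that finishes in half the time is "100% higher". PercentFaster is a hypothetical helper that follows this reading.

```cpp
#include <cassert>

// Hypothetical helper, assuming "X% higher performance" means a throughput
// ratio over the same workload: (baseline_time / candidate_time - 1) * 100.
inline double PercentFaster(double baseline_seconds,
                            double candidate_seconds) {
    assert(baseline_seconds > 0.0 && candidate_seconds > 0.0);
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0;
}
```

With made-up inputs, PercentFaster(2.0, 1.0) gives 100.0, and equal times give 0.0. These inputs are illustrative and are not measurements from this report.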

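For more detailed test data, see the two charts below.

The vertical axis is elapsed time; lower is better.

Figure 1: one producer and one consumer; the horizontal axis is the test run:

Figure 2: multiple producer threads; the horizontal axis is the number of producer threads (2/4/6/8/12/16/20):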

Other benchmark reports

The IO Event performance benchmark against Boost.Asio : evpp's performance is about 20%~50% higher than asio's in this case

The ping-pong benchmark against Boost.Asio : evpp's performance is about 5%~20% higher than asio's in this case

The throughput benchmark against libevent2 : evpp's performance is about 17%~130% higher than libevent's in this case

The performance benchmark of queue with std::mutex against boost::lockfree::queue and moodycamel::ConcurrentQueue : moodycamel::ConcurrentQueue is the best; on average it is about 25%~100% higher than boost::lockfree::queue and about 100%~500% higher than queue with std::mutex

The throughput benchmark against Boost.Asio : evpp and asio have similar performance in this case

The throughput benchmark against Boost.Asio (Chinese) : evpp and asio have similar performance in this case

The throughput benchmark against muduo (Chinese) : evpp and muduo have similar performance in this case

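Finally

The charts in this report were drawn with gochart.

Thank you very much for reading. If you have any questions, feel free to discuss them with us at https://github.com/Qihoo360/evpp/issues. Thank you.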
