c++比例-libcurl多线程并发时的core【转载】
转自: https://www.cnblogs.com/edgeyang/articles/3722035.html
浅析libcurl多线程安全问题
背景:使用多线程libcurl发送请求,在未设置超时或长超时的情况下程序运行良好。但只要设置了较短超时(小于180s),程序就会出现随机的coredump。并且栈里面找不到任何有用的信息。
问题:1.为什么未设置超时,或者长超时时间(比如601s)的情况下多线程libcurl不会core?
问题:2.进程coredump并不是必现,是否在libcurl内多线程同时修改了全局变量导致?
先来看下官方libcurl的说明:
libcurl is free, thread-safe, IPv6 compatible, feature rich, well supported, fast, thoroughly documented and is already used by many known, big and successful companies and numerous applications.
可以看到官方自称licurl是线程安全的,是否真的如此?再来看看代码中用到的超时选项的说明:
CURLOPT_TIMEOUT
Pass a long as parameter containing the maximum time in seconds that you allow the libcurl transfer operation to take. Normally, name lookups can take a considerable time and limiting operations to less than a few minutes risk aborting perfectly normal operations. This option will cause curl to use the SIGALRM to enable time-outing system calls.
In unix-like systems, this might cause signals to be used unless CURLOPT_NOSIGNAL is set.
Default timeout is 0 (zero) which means it never times out.
选项提到了超时机制是使用SIGALRM信号量来实现的,并且在unix-like操作系统中又提到了另外一个选项CURLOPT_NOSIGNAL:
CURLOPT_NOSIGNAL
Pass a long. If it is 1, libcurl will not use any functions that install signal handlers or any functions that cause signals to be sent to the process. This option is mainly here to allow multi-threaded unix applications to still set/use all timeout options etc, without risking getting signals. The default value for this parameter is 0. (Added in 7.10)
If this option is set and libcurl has been built with the standard name resolver, timeouts will not occur while the name resolve takes place. Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals.
Setting CURLOPT_NOSIGNAL to 1 makes libcurl NOT ask the system to ignore SIGPIPE signals, which otherwise are sent by the system when trying to send data to a socket which is closed in the other end. libcurl makes an effort to never cause such SIGPIPEs to trigger, but some operating systems have no way to avoid them and even on those that have there are some corner cases when they may still happen, contrary to our desire. In addition, usingCURLAUTH_NTLM_WB authentication could cause a SIGCHLD signal to be raised.
该选项说明提到,为了在多线程中允许程序去设置timeout选项,但不是使用signals,需要设置CURLOPT_NOSIGNAL为1 。
于是在代码中加上了这句,测试再没有发现有coredump的情况。
easy_setopt(curl, CURLOPT_NOSIGNAL, (long));
问题:3.timeout机制实现机制是什么,为什么设置了选项CURLOPT_NOSIGNAL线程就安全了?
为了解答上面的问题,需要查看libcurl的相关源代码,以下是DNS解析的函数:
int Curl_resolv_timeout(struct connectdata *conn,
const char *hostname,
int port,
struct Curl_dns_entry **entry,
long timeoutms)
{
#ifdef USE_ALARM_TIMEOUT
#ifdef HAVE_SIGACTION
struct sigaction keep_sigact; /* store the old struct here */
volatile bool keep_copysig = FALSE; /* wether old sigact has been saved */
struct sigaction sigact;
#else
#ifdef HAVE_SIGNAL
void (*keep_sigact)(int); /* store the old handler here */
#endif /* HAVE_SIGNAL */
#endif /* HAVE_SIGACTION */
volatile long timeout;
volatile unsigned int prev_alarm = ;
struct SessionHandle *data = conn->data;
#endif /* USE_ALARM_TIMEOUT */
int rc; *entry = NULL; if(timeoutms < )
/* got an already expired timeout */
return CURLRESOLV_TIMEDOUT; #ifdef USE_ALARM_TIMEOUT
if(data->set.no_signal)
/* Ignore the timeout when signals are disabled */
timeout = ;
else
timeout = timeoutms; if(!timeout)
/* USE_ALARM_TIMEOUT defined, but no timeout actually requested */
return Curl_resolv(conn, hostname, port, entry); if(timeout < )
/* The alarm() function only provides integer second resolution, so if
we want to wait less than one second we must bail out already now. */
return CURLRESOLV_TIMEDOUT; /*************************************************************
* Set signal handler to catch SIGALRM
* Store the old value to be able to set it back later!
*************************************************************/
#ifdef HAVE_SIGACTION
sigaction(SIGALRM, NULL, &sigact);
keep_sigact = sigact;
keep_copysig = TRUE; /* yes, we have a copy */
sigact.sa_handler = alarmfunc;
#ifdef SA_RESTART
/* HPUX doesn't have SA_RESTART but defaults to that behaviour! */
sigact.sa_flags &= ~SA_RESTART;
#endif
/* now set the new struct */
sigaction(SIGALRM, &sigact, NULL);
#else /* HAVE_SIGACTION */
/* no sigaction(), revert to the much lamer signal() */
#ifdef HAVE_SIGNAL
keep_sigact = signal(SIGALRM, alarmfunc);
#endif
#endif /* HAVE_SIGACTION */ /* alarm() makes a signal get sent when the timeout fires off, and that
will abort system calls */
prev_alarm = alarm(curlx_sltoui(timeout/1000L)); /* This allows us to time-out from the name resolver, as the timeout
will generate a signal and we will siglongjmp() from that here.
This technique has problems (see alarmfunc).
This should be the last thing we do before calling Curl_resolv(),
as otherwise we'd have to worry about variables that get modified
before we invoke Curl_resolv() (and thus use "volatile"). */
if(sigsetjmp(curl_jmpenv, )) {
/* this is coming from a siglongjmp() after an alarm signal */
failf(data, "name lookup timed out");
rc = CURLRESOLV_ERROR;
goto clean_up;
} #else
#ifndef CURLRES_ASYNCH
if(timeoutms)
infof(conn->data, "timeout on name lookup is not supported\n");
#else
(void)timeoutms; /* timeoutms not used with an async resolver */
#endif
#endif /* USE_ALARM_TIMEOUT */ /* Perform the actual name resolution. This might be interrupted by an
* alarm if it takes too long.
*/
rc = Curl_resolv(conn, hostname, port, entry); #ifdef USE_ALARM_TIMEOUT
clean_up: if(!prev_alarm)
/* deactivate a possibly active alarm before uninstalling the handler */
alarm(); #ifdef HAVE_SIGACTION
if(keep_copysig) {
/* we got a struct as it looked before, now put that one back nice
and clean */
sigaction(SIGALRM, &keep_sigact, NULL); /* put it back */
}
#else
#ifdef HAVE_SIGNAL
/* restore the previous SIGALRM handler */
signal(SIGALRM, keep_sigact);
#endif
#endif /* HAVE_SIGACTION */ /* switch back the alarm() to either zero or to what it was before minus
the time we spent until now! */
if(prev_alarm) {
/* there was an alarm() set before us, now put it back */
unsigned long elapsed_ms = Curl_tvdiff(Curl_tvnow(), conn->created); /* the alarm period is counted in even number of seconds */
unsigned long alarm_set = prev_alarm - elapsed_ms/; if(!alarm_set ||
((alarm_set >= 0x80000000) && (prev_alarm < 0x80000000)) ) {
/* if the alarm time-left reached zero or turned "negative" (counted
with unsigned values), we should fire off a SIGALRM here, but we
won't, and zero would be to switch it off so we never set it to
less than 1! */
alarm();
rc = CURLRESOLV_TIMEDOUT;
failf(data, "Previous alarm fired off!");
}
else
alarm((unsigned int)alarm_set);
}
#endif /* USE_ALARM_TIMEOUT */ return rc;
}
由此可见,DNS解析阶段timeout的实现机制是通过SIGALRM+sigsetjmp/siglongjmp来实现的。
解析前,通过alarm设定超时时间,并设置跳转的标记:
/* alarm() makes a signal get sent when the timeout fires off, and that
will abort system calls */
prev_alarm = alarm(curlx_sltoui(timeout/1000L)); /* This allows us to time-out from the name resolver, as the timeout
will generate a signal and we will siglongjmp() from that here.
This technique has problems (see alarmfunc).
This should be the last thing we do before calling Curl_resolv(),
as otherwise we'd have to worry about variables that get modified
before we invoke Curl_resolv() (and thus use "volatile"). */
if(sigsetjmp(curl_jmpenv, )) {
/* this is coming from a siglongjmp() after an alarm signal */
failf(data, "name lookup timed out");
rc = CURLRESOLV_ERROR;
goto clean_up;
}
在等到超时后,进入alarmfunc函数实现跳转:
#ifdef USE_ALARM_TIMEOUT
/*
* This signal handler jumps back into the main libcurl code and continues
* execution. This effectively causes the remainder of the application to run
* within a signal handler which is nonportable and could lead to problems.
*/
static
RETSIGTYPE alarmfunc(int sig)
{
/* this is for "-ansi -Wall -pedantic" to stop complaining! (rabe) */
(void)sig;
siglongjmp(curl_jmpenv, );
return;
}
#endif /* USE_ALARM_TIMEOUT */
而CURLOPT_NOSIGNAL选项的作用是什么呢?

case CURLOPT_NOSIGNAL:
/*
* The application asks not to set any signal() or alarm() handlers,
* even when using a timeout.
*/
data->set.no_signal = ( != va_arg(param, long))?TRUE:FALSE;
break;

再回过头看看DNS解析的那段代码,你会发现在超时设定前有如下代码:

#ifdef USE_ALARM_TIMEOUT
if(data->set.no_signal)
/* Ignore the timeout when signals are disabled */
timeout = ;
else
timeout = timeoutms; if(!timeout)
/* USE_ALARM_TIMEOUT defined, but no timeout actually requested */
return Curl_resolv(conn, hostname, port, entry);

没错!设置了CURLOPT_NOSIGNAL选项,会把超时时间设置为0,也就是DNS解析不设置超时时间!以此来绕过SIGALRM+sigsetjmp/siglongjmp的超时机制。以此引来新的问题,DNS解析没有超时限制,不过这个官方有推荐的解决方法了。
了解了这些选项的原理之后,回到问题3,为什么使用了CURLOPT_NOSIGNAL选项后就保证了线程安全?继续看sigsetjmp/siglongjmp实现就会发现:

#ifdef HAVE_SIGSETJMP
/* Beware this is a global and unique instance. This is used to store the
return address that we can jump back to from inside a signal handler. This
is not thread-safe stuff. */
sigjmp_buf curl_jmpenv;
#endif

sigsetjmp/siglongjmp使用的curl_jmpenv是个全局唯一的变量!多个线程都会去修改该变量,破坏了栈的内容并导致coredump。看来这还是libcurl的实现问题,如果每个线程都有一个sigjmp_buf变量,是否就可以解决上面的问题呢?
看到这里,问题2也有了答案:当多个线程同时修改sigjmp_buf会出现问题,但线程间是串行的sigsetjmp/siglongjmp并不会出现问题,这有一定的随机性。
问题1,未设置超时不会有问题这很好理解,但是为什么设置长超时也不会出现问题?
原因就是在libcurl超时前,apache服务器端先超时返回了。apache超时时间一般是180s。只要libcurl超时大于180s,libcurl客户端永远都不会触发超时。而是直接返回504的错误。
是否设置了CURLOPT_NOSIGNAL就可以保证线程安全了呢?官方文档还提到了另外两个函数:
CURL *curl_easy_init( );
This function must be the first function to call, and it returns a CURL easy handle that you must use as input to other easy-functions. curl_easy_init initializes curl and this call MUST have a corresponding call to curl_easy_cleanup(3) when the operation is complete.
If you did not already call curl_global_init(3), curl_easy_init(3) does it automatically. This may be lethal in multi-threaded cases, since curl_global_init(3) is not thread-safe, and it may result in resource problems because there is no corresponding cleanup.
You are strongly advised to not allow this automatic behaviour, by calling curl_global_init(3) yourself properly. See the description in libcurl(3) of global environment requirements for details of how to use this function.
其中curl_easy_init函数体内会调用curl_global_init,而后者是非线程安全的。
在curl_easy_init函数体内,有且仅调用一次curl_global_init:
/*
* curl_easy_init() is the external interface to alloc, setup and init an
* easy handle that is returned. If anything goes wrong, NULL is returned.
*/
CURL *curl_easy_init(void)
{
CURLcode res;
struct SessionHandle *data; /* Make sure we inited the global SSL stuff */
if(!initialized) {
res = curl_global_init(CURL_GLOBAL_DEFAULT);
if(res) {
/* something in the global init failed, return nothing */
DEBUGF(fprintf(stderr, "Error: curl_global_init failed\n"));
return NULL;
}
} /* We use curl_open() with undefined URL so far */
res = Curl_open(&data);
if(res != CURLE_OK) {
DEBUGF(fprintf(stderr, "Error: Curl_open failed\n"));
return NULL;
} return data;
}
但是在curl_global_init函数体内,是非线程安全的。initialized++并非原子操作,有可能出现多个线程重复执行curl_global_init。
CURLcode curl_global_init(long flags)
{
if(initialized++)
return CURLE_OK; /* Setup the default memory functions here (again) */
Curl_cmalloc = (curl_malloc_callback)malloc;
Curl_cfree = (curl_free_callback)free;
Curl_crealloc = (curl_realloc_callback)realloc;
Curl_cstrdup = (curl_strdup_callback)system_strdup;
Curl_ccalloc = (curl_calloc_callback)calloc;
#if defined(WIN32) && defined(UNICODE)
Curl_cwcsdup = (curl_wcsdup_callback)_wcsdup;
#endif if(flags & CURL_GLOBAL_SSL)
if(!Curl_ssl_init()) {
DEBUGF(fprintf(stderr, "Error: Curl_ssl_init failed\n"));
return CURLE_FAILED_INIT;
} if(flags & CURL_GLOBAL_WIN32)
if(win32_init() != CURLE_OK) {
DEBUGF(fprintf(stderr, "Error: win32_init failed\n"));
return CURLE_FAILED_INIT;
} #ifdef __AMIGA__
if(!Curl_amiga_init()) {
DEBUGF(fprintf(stderr, "Error: Curl_amiga_init failed\n"));
return CURLE_FAILED_INIT;
}
#endif #ifdef NETWARE
if(netware_init()) {
DEBUGF(fprintf(stderr, "Warning: LONG namespace not available\n"));
}
#endif #ifdef USE_LIBIDN
idna_init();
#endif if(Curl_resolver_global_init() != CURLE_OK) {
DEBUGF(fprintf(stderr, "Error: resolver_global_init failed\n"));
return CURLE_FAILED_INIT;
} #if defined(USE_LIBSSH2) && defined(HAVE_LIBSSH2_INIT)
if(libssh2_init()) {
DEBUGF(fprintf(stderr, "Error: libssh2_init failed\n"));
return CURLE_FAILED_INIT;
}
#endif if(flags & CURL_GLOBAL_ACK_EINTR)
Curl_ack_eintr = ; init_flags = flags; return CURLE_OK;
}
所以curl_global_init需要在单线程中执行,例如在程序的开头。
最后,贴一个官方给出的多线程例子,稍作修改(docs/examples/multithreads.c):

/***************************************************************************
* _ _ ____ _
* Project ___| | | | _ \| |
* / __| | | | |_) | |
* | (__| |_| | _ <| |___
* \___|\___/|_| \_\_____|
*
* Copyright (C) 1998 - 2011, Daniel Stenberg, <daniel@haxx.se>, et al.
*
* This software is licensed as described in the file COPYING, which
* you should have received as part of this distribution. The terms
* are also available at http://curl.haxx.se/docs/copyright.html.
*
* You may opt to use, copy, modify, merge, publish, distribute and/or sell
* copies of the Software, and permit persons to whom the Software is
* furnished to do so, under the terms of the COPYING file.
*
* This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY
* KIND, either express or implied.
*
***************************************************************************/
/* A multi-threaded example that uses pthreads extensively to fetch
* X remote files at once */ #include <stdio.h>
#include <pthread.h>
#include <curl/curl.h> #define NUMT 4 /*
List of URLs to fetch. If you intend to use a SSL-based protocol here you MUST setup the OpenSSL
callback functions as described here: http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION */
const char * const urls[NUMT]= {
"http://curl.haxx.se/",
"ftp://cool.haxx.se/",
"http://www.contactor.se/",
"www.haxx.se"
}; static void *pull_one_url(void *url)
{
CURL *curl; curl = curl_easy_init();
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30L); /*timeout 30s,add by edgeyang*/
curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L); /*no signal,add by edgeyang*/
curl_easy_perform(curl); /* ignores error */
curl_easy_cleanup(curl); return NULL;
} /*
int pthread_create(pthread_t *new_thread_ID,
const pthread_attr_t *attr,
void * (*start_func)(void *), void *arg);
*/ int main(int argc, char **argv)
{
pthread_t tid[NUMT];
int i;
int error; /* Must initialize libcurl before any threads are started */
curl_global_init(CURL_GLOBAL_ALL); for(i=; i< NUMT; i++) {
error = pthread_create(&tid[i],
NULL, /* default attributes please */
pull_one_url,
(void *)urls[i]);
if( != error)
fprintf(stderr, "Couldn't run thread number %d, errno %d\n", i, error);
else
fprintf(stderr, "Thread %d, gets %s\n", i, urls[i]);
} /* now wait for all threads to terminate */
for(i=; i< NUMT; i++) {
error = pthread_join(tid[i], NULL);
fprintf(stderr, "Thread %d terminated\n", i);
} curl_global_cleanup(); /*add by edgeyang*/
return ;
}

c++比例-libcurl多线程并发时的core【转载】的更多相关文章
- j2ee高并发时使用全局变量需要注意的问题
原文:https://blog.csdn.net/jston_learn/article/details/21617311 开发中,全局变量的使用很频繁,但对于多线程的访问,使用全局变量需要注意的地方 ...
- libcurl 多线程使用注意事项 - Balder~专栏 - 博客频道 - CSDN.NET
libcurl 多线程使用注意事项 - Balder~专栏 - 博客频道 - CSDN.NET libcurl 多线程使用注意事项 分类: C/C++学习 2012-05-24 18:48 2843人 ...
- 多线程并发 synchronized对象锁的控制与优化
本文针对用户取款时多线程并发情境,进行相关多线程控制与优化的描述. 首先建立用户类UserTest.业务操作类SynchronizedTest.数据存取类DataStore,多线程测试类MultiTh ...
- 多线程并发同一个表问题(li)
现有数据库开发过程中对事务的控制.事务锁.行锁.表锁的发现缺乏必要的方法和手段,通过以下手段可以丰富我们处理开发过程中处理锁问题的方法.For Update和For Update of使用户能够锁定指 ...
- Java面试题整理一(侧重多线程并发)
1..是否可以在static环境中访问非static变量? 答:static变量在Java中是属于类的,它在所有的实例中的值是一样的.当类被Java虚拟机载入的时候,会对static变量进行初始化.如 ...
- HashMap多线程并发问题分析
转载: HashMap多线程并发问题分析 并发问题的症状 多线程put后可能导致get死循环 从前我们的Java代码因为一些原因使用了HashMap这个东西,但是当时的程序是单线程的,一切都没有问题. ...
- 用读写锁三句代码解决多线程并发写入文件 z
C#使用读写锁三句代码简单解决多线程并发写入文件时提示“文件正在由另一进程使用,因此该进程无法访问此文件”的问题 在开发程序的过程中,难免少不了写入错误日志这个关键功能.实现这个功能,可以选择使用第三 ...
- 由获取微信access_token引出的Java多线程并发问题
背景: access_token是公众号的全局唯一票据,公众号调用各接口时都需使用access_token.开发者需要进行妥善保存.access_token的存储至少要保留512个字符空间.acces ...
- Java 多线程 并发编程
一.多线程 1.操作系统有两个容易混淆的概念,进程和线程. 进程:一个计算机程序的运行实例,包含了需要执行的指令:有自己的独立地址空间,包含程序内容和数据:不同进程的地址空间是互相隔离的:进程拥有各种 ...
随机推荐
- Fortify漏洞之Cross-Site Scripting(XSS 跨站脚本攻击)
书接上文,继续对Fortify漏洞进行总结,本篇主要针对XSS跨站脚步攻击漏洞进行总结,如下: 1.Cross-Site Scripting(XSS 跨站脚本攻击) 1.1.产生原因: 1. 数据通过 ...
- 3.JUC之volatile
原文链接:http://blog.csdn.net/zteny/article/details/54888629 一.简介 volatile是Java语言的关键字,用来修饰可变变量(即该变量不能被fi ...
- (二)react-native开发系列之windows开发环境配置
之前写了react-native在mac上得环境搭建,但是如果只开发android的话,只要用windows系统就可以了,下面就来说下react-native的windows开发环境配置. 1.下载配 ...
- js调用正则表达式
//验证是否为正整数 function isPositiveInteger(s) { var re = /^[0-9]+$/; return re.test(s); } if (exchangeCou ...
- JAVA笔记整理(九),JAVA中的集合
在工作中,我们经常需要将多个对象集中存放,可以使用数组,但是数组的长度一旦固定之后是不可变的,为了保存数量确定的数据,我们可以使用JAVA中的集合. 在我看来,JAVA中的集合可以看作是一个特殊的数据 ...
- 使用框架时,在web.xml中配置servlet时,拦截请求/和/*的区别。
关于servlet的拦截设置,之前看了好多,说的都不太清除,明白. 最近明白了一些,总的来说就是保证拦截所有用户请求的同时,放行静态资源. 现整理如下: 一.我们都知道在基于Spring的Applic ...
- html的基础知识点
小编为大家带来html的基础知识Web 标准的好处让Web的发展前景更广阔 内容能被更广泛的设备访问更容易被搜寻引擎搜索降低网站流量费用使网站更易于维护提高页面浏览速度Web 标准构成Web标准不是某 ...
- Java学习笔记——第3篇
面向对象 结构化程序的任何一个结构都具有唯一的入口和唯一的出口,并且程序不会出现死循环. 虽然Java是面向对象的,但Java的方法里则是一种结构化的程序流. 面向对象的基本思想:类.对象.继承.封装 ...
- linux网络编程之socket编程(七)
今天继续学习socket编程,北京在持续几天的雾霾天之后久违的太阳终于出来了,心情也特别特别的好,于是乎,在这美好的夜晚,该干点啥事吧,那当然就是继续坚持我的程序学习喽,闲话不多说,进入正题: 通过这 ...
- MySQL服务器2
1.sql的基本语法 对数据库 create database db1; 创建数据库 对表: create database t1(id int,name char(10)); 创建表 show cr ...