使用httpclient抓取时,netstat 发现很多time_wait连接
http://wiki.apache.org/HttpComponents/FrequentlyAskedConnectionManagementQuestions
1. Connections in TIME_WAIT State
After running your HTTP application, you use the netstat command and detect a lot of connections in stateTIME_WAIT. Now you wonder why these connections are not cleaned up.
1.1. What is the TIME_WAIT State?
The TIME_WAIT state is a protection mechanism in TCP. The side thatcloses a socket connection orderly will keep the connection in stateTIME_WAIT for some time, typically between 1 and 4 minutes. Thishappens afterthe connection is closed. It does notindicate a cleanup problem. The TIME_WAIT state protects against lossof data and data corruption. It is there to help you. For technicaldetails, have a look at the Unix Socket FAQ, section 2.7.
1.2. Some Connections Go To TIME_WAIT, Others Not
If a connection is orderly closed by your application, it will go tothe TIME_WAIT state. If a connection is orderly closed by the server,the server keeps it in TIME_WAIT and your client doesn't. If aconnection is reset or otherwise dropped by your application in anon-orderly fashion, it will not go to TIME_WAIT.
Unfortunately, it will not always be obvious to you whether aconnection is closed orderly or not. This is because connections arepooled and kept open for re-use by default. HttpClient 3.x, HttpClient4, and also the standard Java HttpURLConnection do that for you. Mostapplications will simply execute requests, then read from the responsestream, and finally close that stream.
Closing the response stream is notthe same thing as closing the connection! Closing the response streamreturns the connection to the pool, but it will be kept open ifpossible. This saves a lot of time if you send another request to thesame host within a few seconds, or even minutes.
Connection pools have a limited number of connections. A pool mayhave 5 connections, or 100, or maybe only 1. When you send a request toa host, and there is no open connection to that host in the pool, a newconnection needs to be opened. But if the pool is already full, an openconnection has to be closed before a new one can be opened. In thiscase, the old connection will be closed orderly and go to the TIME_WAITstate.
When your application exits and the JVM terminates, the open connections in the pools will not be closed orderly. They are reset or cancelled, without going to TIME_WAIT. To avoid this, you should call theshutdownmethod of the connection pools your application is using beforeexiting. The standard Java HttpURLConnection has no public method toshutdown it's connection pool.
1.3. Running Out Of Ports
Some applications open and orderly close a lot of connections withina short time, for example when load-testing a server. A connection instate TIME_WAIT will prevent that port number from being re-used foranother connection. That is not an error, it is the purpose ofTIME_WAIT.
TCP is configured at the operating system level, not through Java.Your first action should be to increase the number of ephemeral portson the machine. Windows in particular has a rather low default for theephemeral ports. The PerformanceWiki has tuning tips for the common operating systems, have a look at the respective Network section.
Only if increasing the number of ephemeral ports does not solve yourproblem, you should consider decreasing the duration of the TIME_WAITstate. You probably have to reduce the maximum lifetime of IP packets,as the duration of TIME_WAIT is typically twice that timespan to allowfor a round-trip delay. Be aware that this will affect all applications running on the machine. Don't ask us how to do it, we're not the experts for network tuning.
There are some ways to deal with the problem at the applicationlevel. One way is to send a "Connection: close" header with eachrequest. That will tell the server to close the connection, so it goesto TIME_WAIT on the other side. Of course this also disables thekeep-alive feature of connection pooling and thereby degradesperformance. If you are running load tests against a server, theuntypical behavior of your application may distort the test results.[[BR] Another way is to not orderly close connections. There is a trickto set SO_LINGER to a special value, which will cause the connection tobe reset instead of orderly closed. Note that the HttpClient API willnot support that directly, you'll have to extend or modify some classesto implement this hack.
Yet another way is to re-use ports thatare still blocked by a connection in TIME_WAIT. You can do that byspecifying the SO_REUSEADDR option when opening a socket. Java 1.4introduced the methodSocket.setReuseAddress for this purpose. You will have to extend or modify some classes of HttpClient for this too, but at least it's not a hack.
1.4. Further Reading
java.net.Socket.setReuseAddress
Discussion on the HttpClient mailing list in December 2007
netstat command line tool
http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html#ss2.7
2.7 Please explain the TIME_WAIT state.
Remember that TCP guarantees all data transmitted will be delivered, if atall possible. When you close a socket, the server goes into a TIME_WAITstate, just to be really really sure that all the data has gone through. When a socket is closed, both sides agree by sending messages to eachother that they will send no more data. This, it seemed to me was goodenough, and after the handshaking is done, the socket should be closed. The problem is two-fold. First, there is no way to be sure that the last ack was communicated successfully. Second, there may be "wandering duplicates" left on the net that must be dealt with if they are delivered.
Andrew Gierth (andrewg@microlise.co.uk) helped to explain the closing sequence in the following usenet posting:
Assume that a connection is in ESTABLISHED state, and the client is aboutto do an orderly release. The client's sequence no. is Sc, and the server'sis Ss. The pipe is empty in both directions.
Client Server
====== ======
ESTABLISHED ESTABLISHED
(client closes)
ESTABLISHED ESTABLISHED
<CTL=FIN+ACK><SEQ=Sc><ACK=Ss> ------->>
FIN_WAIT_1
<<-------- <CTL=ACK><SEQ=Ss><ACK=Sc+1>
FIN_WAIT_2 CLOSE_WAIT
<<-------- <CTL=FIN+ACK><SEQ=Ss><ACK=Sc+1> (server closes)
LAST_ACK
<CTL=ACK>,<SEQ=Sc+1><ACK=Ss+1> ------->>
TIME_WAIT CLOSED
(2*msl elapses...)
CLOSED
Note: the +1 on the sequence numbers is because the FIN counts as one byte of data. (The above diagram is equivalent to fig. 13 from RFC 793).
Now consider what happens if the last of those packets is dropped in thenetwork. The client has done with the connection; it has no more data orcontrol info to send, and never will have. But the server does not knowwhether the client received all the data correctly; that's what the lastACK segment is for. Now the server may or may not care whether theclient got the data, but that is not an issue for TCP; TCP is a reliablerotocol, and must distinguish between an orderly connection closewhere all data is transferred, and a connection abortwhere data mayor may not have been lost.
So, if that last packet is dropped, the server will retransmit it (it is,after all, an unacknowledged segment) and will expect to see a suitableACK segment in reply. If the client went straight to CLOSED, the onlypossible response to that retransmit would be a RST, which would indicateto the server that data had been lost, when in fact it had not been.
(Bear in mind that the server's FIN segment may, additionally, containdata.)
DISCLAIMER: This is my interpretation of the RFCs (I have read all theTCP-related ones I could find), but I have not attempted to examineimplementation source code or trace actual connections in order toverify it. I am satisfied that the logic is correct, though.
More commentarty from Vic:
The second issue was addressed by Richard Stevens (rstevens@noao.edu,author of "Unix Network Programming", see1.5 Where can I get source code for the book [book title]?).I have put together quotes from someof his postings and email which explain this. I have brought togetherparagraphs from different postings, and have made as few changes as possible.
From Richard Stevens (rstevens@noao.edu):
If the duration of the TIME_WAIT state were just to handle TCP's full-duplex close, then the time would be much smaller, and it would be some function of the current RTO (retransmission timeout), not the MSL (the packet lifetime).
A couple of points about the TIME_WAIT state.
- The end that sends the first FIN goes into the TIME_WAIT state, because thatis the end that sends the final ACK. If the other end's FIN is lost, orif the final ACK is lost, having the end that sends the first FIN maintain state about the connection guarantees that it has enough information to retransmit the final ACK.
- Realize that TCP sequence numbers wrap around after 2**32 bytes have been transferred. Assume a connection between A.1500 (host A, port 1500) and B.2000. During the connection one segment is lost and retransmitted. But the segment is not really lost, it is held by some intermediate router and then re-injected into the network. (This is called a "wandering duplicate".) But in the time between the packet being lost & retransmitted, and then reappearing, the connection is closed (without any problems) and then another connection is established between the same host, same port (that is, A.1500 and B.2000; this is called another "incarnation" of the connection). But the sequence numbers chosen for the new incarnation just happen to overlap with the sequence number of the wandering duplicate that is about to reappear. (This is indeed possible, given the way sequence numbers are chosen for TCP connections.) Bingo, you are about to deliver the data from the wandering duplicate (the previous incarnation of the connection) to the new incarnation of the connection. To avoid this, you do not allow the same incarnation of the connection to be reestablished until the TIME_WAIT state terminates.Even the TIME_WAIT state doesn't complete solve the second problem, given what is called TIME_WAIT assassination. RFC 1337 has more details.
- The reason that the duration of the TIME_WAIT state is 2*MSL is that the maximum amount of time a packet can wander around a network is assumed to be MSL seconds. The factor of 2 is for the round-trip. The recommended value for MSL is 120 seconds, but Berkeley-derived implementations normally use 30 seconds instead. This means a TIME_WAIT delay between 1 and 4 minutes. Solaris 2.x does indeed use the recommended MSL of 120 seconds.
A wandering duplicate is a packet that appeared to be lost and wasretransmitted. But it wasn't really lost ... some router had problems,held on to the packet for a while (order of seconds, could be a minuteif the TTL is large enough) and then re-injects the packet back intothe network. But by the time it reappears, the application that sentit originally has already retransmitted the data contained in that packet.
Because of these potential problems with TIME_WAIT assassinations, one should not avoid the TIME_WAIT state by setting the SO_LINGER
option to send an RST instead of the normal TCP connection termination (FIN/ACK/FIN/ACK). The TIME_WAIT state is there for a reason; it's your friend and it's there to help you :-)
I have a long discussion of just this topic in my just-released "TCP/IPIllustrated, Volume 3". The TIME_WAIT state is indeed, one of the mostmisunderstood features of TCP.
I'm currently rewriting "Unix Network Programming" (see1.5 Where can I get source code for the book [book title]?). and will include lots more on this topic, as it is often confusing and misunderstood.
An additional note from Andrew:
Closing a socket: if SO_LINGER
has not been called on a socket, thenclose()
is not supposed to discard data. This is true on SVR4.2 (and,apparently, on all non-SVR4 systems) but apparently not on SVR4; theuse of eithershutdown()
or SO_LINGER
seems to be required toguarantee delivery of all data.
http://blog.csdn.net/liuxuejin/article/details/8552677
使用httpclient抓取时,netstat 发现很多time_wait连接的更多相关文章
- HTTPCLIENT抓取网页内容
通过httpclient抓取网页信息. public class SnippetHtml{ /** * 通过url获取网站html * @param url 网站url */ public Strin ...
- [Python] 抓取时光网的电影列表并生成网页
抓取时光网的电影列表并生成网页 源码 https://github.com/YouXianMing/BeautifulSoup4-WebCralwer 分析 利用BeautifulSoup进行分析网页 ...
- HttpClient(一)HttpClient抓取网页基本信息
一.HttpClient简介 HttpClient 是 Apache Jakarta Common 下的子项目,可以用来提供高效的.最新的.功能丰富的支持 HTTP 协议的客户端编程工具包, 并且它支 ...
- HttpClient抓取网页内容简单介绍
版本HttpClient3.1 1.GET方式 第一步.创建一个客户端,类似于你用浏览器打开一个网页 HttpClient httpClient = new HttpClient(); 第二步.创建一 ...
- Java爬虫系列二:使用HttpClient抓取页面HTML
爬虫要想爬取需要的信息,首先第一步就要抓取到页面html内容,然后对html进行分析,获取想要的内容.上一篇随笔<Java爬虫系列一:写在开始前>中提到了HttpClient可以抓取页面内 ...
- 利用HttpClient抓取话费详单等信息
由于项目需要,需要获取授权用户的在运营商(中国移动.中国联通.中国电信)那里的个人信息.话费详单.月汇总账单信息(需要指出的是电信用户的个人信息无法从网上营业厅获取).抓取用户信息肯定是要模仿用户登录 ...
- zabbix proxy 服务器 netstat 出现大量Time_Wait连接问题
问题描述: 监控系统云网关监控几万个TCP port的存活情况, 最近发现有几个端口出现告警闪断情况,怀疑因为运行TCP检查的 zabbix proxy 服务器 tcp参数配置不合理. netstat ...
- C#用HttpClient抓取jd.com搜索框下拉数据
添加System.Web.dll引用 添加System.Net.Http引用 using System.Net.Http; using System.Web; string key = "电 ...
- 使用HttpClient抓取网站首页
HttpClient是Apache开发的第三方Java库,可以用来进行网络爬虫的开发,相关API的可以在http://hc.apache.org/httpcomponents-client-ga/ht ...
随机推荐
- 【转】Linux內核驅動之GPIO子系統(一)GPIO的使用 _蝸牛
原文网址:http://tc.chinawin.net/it/os/article-2512b.html 一 概述 Linux內核中gpio是最簡單,最常用的資源(和interrupt ,dma,ti ...
- 【转】Android将Activity打成jar包供第三方调用(解决资源文件不能打包的问题)
Android中引入第三方Jar包的方法(java.lang.NoClassDefFoundError解决办法) 鼠标右键项目,然后属性,然后java buildpath 然后order and ex ...
- 【转】GitHub问题之恢复本地被删除的文件
原文网址:http://blog.csdn.net/iaiti/article/details/39557951 折腾了真久,GitHub commit之后,我手痒把本地的一个文件给删了,然后一直gi ...
- bzoj2965
http://www.lydsy.com/JudgeOnline/problem.php?id=2965 http://www.tsinsen.com/A1385 平面图网络流. 首先我们要将平面图转 ...
- C++链接器工具错误:LNK2001, LNK2019(转载)
这是归属于链接器工具错误 这一类. 无法解析的外部符号“symbol” 代码引用了链接器无法在库和对象文件中找到的内容(如函数.变量或标签). 可能的原因 代码请求的内容不存在(例如,符号拼写错误或使 ...
- ORTP库API使用入门
一.简介 ORTP是一个支持RTP以及RFC3550协议的库,有如下的特性: (1)使用C语言编写,可以工作于windows, Linux, 以及 Unix平台 (2)实现了RFC3550协议,提供简 ...
- 什么是空间复杂度(What is actually Space Complexity ?)
属于空间复杂度(Space Complexity)在很多情况下被错认为是附属空间(Auxiliary Space),下面是附属空间和空间复杂度的定义. 附属空间(Auxiliary Space)是算法 ...
- Spark Yarn-cluster与Yarn-client
摘要 在Spark中,有Yarn-Client和Yarn-Cluster两种模式可以运行在Yarn上,通常Yarn-cluster适用于生产环境,而Yarn-Cluster更适用于交互,调试模式,以下 ...
- Android 编程下的计时器
在安卓 APP 的手机号注册逻辑中,经常会有将激活码发送到手机的环节,这个环节中绝大多数的应用考虑到网络延迟或服务器压力以及短信服务商的延迟等原因,会给用户提供一个重新获取激活码的按钮.如下图所示: ...
- UIApplication的作用
1.设置app图标右上角的数字2.设置状态栏的属性(样式.是否要显示)3.打开某个链接\发短信\打电话4.keyWindow : 访问程序的主窗口(一个程序只能有一个主窗口)5.windows : 访 ...