使用httpclient抓取时,netstat 发现很多time_wait连接
http://wiki.apache.org/HttpComponents/FrequentlyAskedConnectionManagementQuestions
1. Connections in TIME_WAIT State
After running your HTTP application, you use the netstat command and detect a lot of connections in stateTIME_WAIT. Now you wonder why these connections are not cleaned up.
1.1. What is the TIME_WAIT State?
The TIME_WAIT state is a protection mechanism in TCP. The side thatcloses a socket connection orderly will keep the connection in stateTIME_WAIT for some time, typically between 1 and 4 minutes. Thishappens afterthe connection is closed. It does notindicate a cleanup problem. The TIME_WAIT state protects against lossof data and data corruption. It is there to help you. For technicaldetails, have a look at the Unix Socket FAQ, section 2.7.
1.2. Some Connections Go To TIME_WAIT, Others Not
If a connection is orderly closed by your application, it will go tothe TIME_WAIT state. If a connection is orderly closed by the server,the server keeps it in TIME_WAIT and your client doesn't. If aconnection is reset or otherwise dropped by your application in anon-orderly fashion, it will not go to TIME_WAIT.
Unfortunately, it will not always be obvious to you whether aconnection is closed orderly or not. This is because connections arepooled and kept open for re-use by default. HttpClient 3.x, HttpClient4, and also the standard Java HttpURLConnection do that for you. Mostapplications will simply execute requests, then read from the responsestream, and finally close that stream.
Closing the response stream is notthe same thing as closing the connection! Closing the response streamreturns the connection to the pool, but it will be kept open ifpossible. This saves a lot of time if you send another request to thesame host within a few seconds, or even minutes.
Connection pools have a limited number of connections. A pool mayhave 5 connections, or 100, or maybe only 1. When you send a request toa host, and there is no open connection to that host in the pool, a newconnection needs to be opened. But if the pool is already full, an openconnection has to be closed before a new one can be opened. In thiscase, the old connection will be closed orderly and go to the TIME_WAITstate.
When your application exits and the JVM terminates, the open connections in the pools will not be closed orderly. They are reset or cancelled, without going to TIME_WAIT. To avoid this, you should call theshutdownmethod of the connection pools your application is using beforeexiting. The standard Java HttpURLConnection has no public method toshutdown it's connection pool.
1.3. Running Out Of Ports
Some applications open and orderly close a lot of connections withina short time, for example when load-testing a server. A connection instate TIME_WAIT will prevent that port number from being re-used foranother connection. That is not an error, it is the purpose ofTIME_WAIT.
TCP is configured at the operating system level, not through Java.Your first action should be to increase the number of ephemeral portson the machine. Windows in particular has a rather low default for theephemeral ports. The PerformanceWiki has tuning tips for the common operating systems, have a look at the respective Network section.
Only if increasing the number of ephemeral ports does not solve yourproblem, you should consider decreasing the duration of the TIME_WAITstate. You probably have to reduce the maximum lifetime of IP packets,as the duration of TIME_WAIT is typically twice that timespan to allowfor a round-trip delay. Be aware that this will affect all applications running on the machine. Don't ask us how to do it, we're not the experts for network tuning.
There are some ways to deal with the problem at the applicationlevel. One way is to send a "Connection: close" header with eachrequest. That will tell the server to close the connection, so it goesto TIME_WAIT on the other side. Of course this also disables thekeep-alive feature of connection pooling and thereby degradesperformance. If you are running load tests against a server, theuntypical behavior of your application may distort the test results.[[BR] Another way is to not orderly close connections. There is a trickto set SO_LINGER to a special value, which will cause the connection tobe reset instead of orderly closed. Note that the HttpClient API willnot support that directly, you'll have to extend or modify some classesto implement this hack.
Yet another way is to re-use ports thatare still blocked by a connection in TIME_WAIT. You can do that byspecifying the SO_REUSEADDR option when opening a socket. Java 1.4introduced the methodSocket.setReuseAddress for this purpose. You will have to extend or modify some classes of HttpClient for this too, but at least it's not a hack.
1.4. Further Reading
java.net.Socket.setReuseAddress
Discussion on the HttpClient mailing list in December 2007
netstat command line tool
http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html#ss2.7
2.7 Please explain the TIME_WAIT state.
Remember that TCP guarantees all data transmitted will be delivered, if atall possible. When you close a socket, the server goes into a TIME_WAITstate, just to be really really sure that all the data has gone through. When a socket is closed, both sides agree by sending messages to eachother that they will send no more data. This, it seemed to me was goodenough, and after the handshaking is done, the socket should be closed. The problem is two-fold. First, there is no way to be sure that the last ack was communicated successfully. Second, there may be "wandering duplicates" left on the net that must be dealt with if they are delivered.
Andrew Gierth (andrewg@microlise.co.uk) helped to explain the closing sequence in the following usenet posting:
Assume that a connection is in ESTABLISHED state, and the client is aboutto do an orderly release. The client's sequence no. is Sc, and the server'sis Ss. The pipe is empty in both directions.
Client Server
====== ======
ESTABLISHED ESTABLISHED
(client closes)
ESTABLISHED ESTABLISHED
<CTL=FIN+ACK><SEQ=Sc><ACK=Ss> ------->>
FIN_WAIT_1
<<-------- <CTL=ACK><SEQ=Ss><ACK=Sc+1>
FIN_WAIT_2 CLOSE_WAIT
<<-------- <CTL=FIN+ACK><SEQ=Ss><ACK=Sc+1> (server closes)
LAST_ACK
<CTL=ACK>,<SEQ=Sc+1><ACK=Ss+1> ------->>
TIME_WAIT CLOSED
(2*msl elapses...)
CLOSED
Note: the +1 on the sequence numbers is because the FIN counts as one byte of data. (The above diagram is equivalent to fig. 13 from RFC 793).
Now consider what happens if the last of those packets is dropped in thenetwork. The client has done with the connection; it has no more data orcontrol info to send, and never will have. But the server does not knowwhether the client received all the data correctly; that's what the lastACK segment is for. Now the server may or may not care whether theclient got the data, but that is not an issue for TCP; TCP is a reliablerotocol, and must distinguish between an orderly connection closewhere all data is transferred, and a connection abortwhere data mayor may not have been lost.
So, if that last packet is dropped, the server will retransmit it (it is,after all, an unacknowledged segment) and will expect to see a suitableACK segment in reply. If the client went straight to CLOSED, the onlypossible response to that retransmit would be a RST, which would indicateto the server that data had been lost, when in fact it had not been.
(Bear in mind that the server's FIN segment may, additionally, containdata.)
DISCLAIMER: This is my interpretation of the RFCs (I have read all theTCP-related ones I could find), but I have not attempted to examineimplementation source code or trace actual connections in order toverify it. I am satisfied that the logic is correct, though.
More commentarty from Vic:
The second issue was addressed by Richard Stevens (rstevens@noao.edu,author of "Unix Network Programming", see1.5 Where can I get source code for the book [book title]?).I have put together quotes from someof his postings and email which explain this. I have brought togetherparagraphs from different postings, and have made as few changes as possible.
From Richard Stevens (rstevens@noao.edu):
If the duration of the TIME_WAIT state were just to handle TCP's full-duplex close, then the time would be much smaller, and it would be some function of the current RTO (retransmission timeout), not the MSL (the packet lifetime).
A couple of points about the TIME_WAIT state.
- The end that sends the first FIN goes into the TIME_WAIT state, because thatis the end that sends the final ACK. If the other end's FIN is lost, orif the final ACK is lost, having the end that sends the first FIN maintain state about the connection guarantees that it has enough information to retransmit the final ACK.
- Realize that TCP sequence numbers wrap around after 2**32 bytes have been transferred. Assume a connection between A.1500 (host A, port 1500) and B.2000. During the connection one segment is lost and retransmitted. But the segment is not really lost, it is held by some intermediate router and then re-injected into the network. (This is called a "wandering duplicate".) But in the time between the packet being lost & retransmitted, and then reappearing, the connection is closed (without any problems) and then another connection is established between the same host, same port (that is, A.1500 and B.2000; this is called another "incarnation" of the connection). But the sequence numbers chosen for the new incarnation just happen to overlap with the sequence number of the wandering duplicate that is about to reappear. (This is indeed possible, given the way sequence numbers are chosen for TCP connections.) Bingo, you are about to deliver the data from the wandering duplicate (the previous incarnation of the connection) to the new incarnation of the connection. To avoid this, you do not allow the same incarnation of the connection to be reestablished until the TIME_WAIT state terminates.Even the TIME_WAIT state doesn't complete solve the second problem, given what is called TIME_WAIT assassination. RFC 1337 has more details.
- The reason that the duration of the TIME_WAIT state is 2*MSL is that the maximum amount of time a packet can wander around a network is assumed to be MSL seconds. The factor of 2 is for the round-trip. The recommended value for MSL is 120 seconds, but Berkeley-derived implementations normally use 30 seconds instead. This means a TIME_WAIT delay between 1 and 4 minutes. Solaris 2.x does indeed use the recommended MSL of 120 seconds.
A wandering duplicate is a packet that appeared to be lost and wasretransmitted. But it wasn't really lost ... some router had problems,held on to the packet for a while (order of seconds, could be a minuteif the TTL is large enough) and then re-injects the packet back intothe network. But by the time it reappears, the application that sentit originally has already retransmitted the data contained in that packet.
Because of these potential problems with TIME_WAIT assassinations, one should not avoid the TIME_WAIT state by setting the SO_LINGER
option to send an RST instead of the normal TCP connection termination (FIN/ACK/FIN/ACK). The TIME_WAIT state is there for a reason; it's your friend and it's there to help you :-)
I have a long discussion of just this topic in my just-released "TCP/IPIllustrated, Volume 3". The TIME_WAIT state is indeed, one of the mostmisunderstood features of TCP.
I'm currently rewriting "Unix Network Programming" (see1.5 Where can I get source code for the book [book title]?). and will include lots more on this topic, as it is often confusing and misunderstood.
An additional note from Andrew:
Closing a socket: if SO_LINGER
has not been called on a socket, thenclose()
is not supposed to discard data. This is true on SVR4.2 (and,apparently, on all non-SVR4 systems) but apparently not on SVR4; theuse of eithershutdown()
or SO_LINGER
seems to be required toguarantee delivery of all data.
http://blog.csdn.net/liuxuejin/article/details/8552677
使用httpclient抓取时,netstat 发现很多time_wait连接的更多相关文章
- HTTPCLIENT抓取网页内容
通过httpclient抓取网页信息. public class SnippetHtml{ /** * 通过url获取网站html * @param url 网站url */ public Strin ...
- [Python] 抓取时光网的电影列表并生成网页
抓取时光网的电影列表并生成网页 源码 https://github.com/YouXianMing/BeautifulSoup4-WebCralwer 分析 利用BeautifulSoup进行分析网页 ...
- HttpClient(一)HttpClient抓取网页基本信息
一.HttpClient简介 HttpClient 是 Apache Jakarta Common 下的子项目,可以用来提供高效的.最新的.功能丰富的支持 HTTP 协议的客户端编程工具包, 并且它支 ...
- HttpClient抓取网页内容简单介绍
版本HttpClient3.1 1.GET方式 第一步.创建一个客户端,类似于你用浏览器打开一个网页 HttpClient httpClient = new HttpClient(); 第二步.创建一 ...
- Java爬虫系列二:使用HttpClient抓取页面HTML
爬虫要想爬取需要的信息,首先第一步就要抓取到页面html内容,然后对html进行分析,获取想要的内容.上一篇随笔<Java爬虫系列一:写在开始前>中提到了HttpClient可以抓取页面内 ...
- 利用HttpClient抓取话费详单等信息
由于项目需要,需要获取授权用户的在运营商(中国移动.中国联通.中国电信)那里的个人信息.话费详单.月汇总账单信息(需要指出的是电信用户的个人信息无法从网上营业厅获取).抓取用户信息肯定是要模仿用户登录 ...
- zabbix proxy 服务器 netstat 出现大量Time_Wait连接问题
问题描述: 监控系统云网关监控几万个TCP port的存活情况, 最近发现有几个端口出现告警闪断情况,怀疑因为运行TCP检查的 zabbix proxy 服务器 tcp参数配置不合理. netstat ...
- C#用HttpClient抓取jd.com搜索框下拉数据
添加System.Web.dll引用 添加System.Net.Http引用 using System.Net.Http; using System.Web; string key = "电 ...
- 使用HttpClient抓取网站首页
HttpClient是Apache开发的第三方Java库,可以用来进行网络爬虫的开发,相关API的可以在http://hc.apache.org/httpcomponents-client-ga/ht ...
随机推荐
- cf444A DZY Loves Physics
A. DZY Loves Physics time limit per test 1 second memory limit per test 256 megabytes input standard ...
- 【剑指offer】面试题41:和为 s 的两个数字 VS 和为 s 的连续正数序列
题目: 输出所有和为S的连续正数序列.序列内按照从小至大的顺序,序列间按照开始数字从小到大的顺序 思路: small代表序列最小数字,large代表序列最大数字.初始化small为1,large为2. ...
- android开发学习 几个有用的学习资料~
<Android高级应用开发-基础篇> 针对Android基础入门课程,包含了Android四大件基础.UI,Launcher等等.这个课程学习之后也是进一步深入的基础. <Andr ...
- testlink 下载地址
testlink 下载地址 https://sourceforge.net/projects/testlink/files/TestLink%201.9/
- Linux下PHP安装配置MongoDB数据库连接扩展
Web服务器: IP地址:192.168.21.127 PHP安装路径:/usr/local/php 实现目的: 安装PHP的MongoDB数据库扩展,通过PHP程序连接MongoDB数据库 具体操作 ...
- php 实现购物车
<?php class Cart{ public function Cart() { if(!isset($_SESSION['cart'])){ ...
- 编写高质量代码改善python程序91个建议学习01
编写高质量代码改善python程序91个建议学习 第一章 建议1:理解pythonic的相关概念 狭隘的理解:它是高级动态的脚本编程语言,拥有很多强大的库,是解释从上往下执行的 特点: 美胜丑,显胜隐 ...
- AIX下RAC搭建 Oracle10G(一)检測系统环境
AIX下RAC搭建系列 环境 节点 节点1 节点2 小机型号 IBM P-series 630 IBM P-series 630 主机名 AIX203 AIX204 交换机 SAN光纤交换机 存储 S ...
- Android开发小问题——java使用
2013-09-25 导语:离上次写博客有点久了,这次写两个开发中解决的问题吧. 正文: 1.ArrayList<E>使用remove问题: 2.字符串映射到函数运行方法: ==== 1. ...
- leetcode:Minimum Path Sum(路线上元素和的最小值)【面试算法题】
题目: Given a m x n grid filled with non-negative numbers, find a path from top left to bottom right w ...