Overview

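This article analyzes several MSS-related fields in the Linux TCP stack by walking through the source code paths that set and use them.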

Field definitions

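user_mss (tcp_options_received): the MSS configured by the user; it has the highest priority.

mss_clamp (tcp_options_received): the maximum MSS for the connection. It comes from the MSS advertised by the peer (the largest segment the peer can accept) and is the smaller of that advertised value and user_mss.

advmss (tcp_sock): the MSS this end advertises to the peer, i.e. the largest segment this end can accept.

mss_cache (tcp_sock): the sender's current effective MSS; it follows changes in the PMTU and never exceeds mss_clamp.

rcv_mss (inet_connection_sock, stored as icsk_ack.rcv_mss): an estimate of the peer's MSS based on recently received segments, used mainly to decide whether to delay ACKs.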

Configuring user_mss

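user_mss is the MSS configured by the user and has the highest priority: if it is set, none of the other MSS values may exceed it. It is set through setsockopt() with the TCP_MAXSEG option, handled by the code below. A non-zero value must be no smaller than the minimum MSS (TCP_MIN_MSS) and no larger than the maximum window (MAX_TCP_WINDOW); a value of 0 clears the setting.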

static int do_tcp_setsockopt(struct sock *sk, int level,
                             int optname, char __user *optval, unsigned int optlen)
{
    /* ... */
    switch (optname) {
    case TCP_MAXSEG:
        /* Values greater than interface MTU won't take effect. However
         * at the point when this call is done we typically don't yet
         * know which interface is going to be used */
        if (val && (val < TCP_MIN_MSS || val > MAX_TCP_WINDOW)) {
            err = -EINVAL;
            break;
        }
        tp->rx_opt.user_mss = val;
        break;
    /* ... */
    }
    /* ... */
}
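As a rough illustration, the range check above can be modeled in userspace. This is a sketch, not kernel code: the constants 88 (TCP_MIN_MSS) and 32767 (MAX_TCP_WINDOW) are the values I believe recent include/net/tcp.h defines and should be checked against your tree, and model_check_user_mss and set_user_mss are hypothetical helper names. set_user_mss shows the standard setsockopt(IPPROTO_TCP, TCP_MAXSEG) call an application would make.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Assumed values of TCP_MIN_MSS and MAX_TCP_WINDOW (include/net/tcp.h);
 * verify them against the kernel version you are reading. */
#define MODEL_TCP_MIN_MSS    88
#define MODEL_MAX_TCP_WINDOW 32767

/* Userspace model of the TCP_MAXSEG range check in do_tcp_setsockopt():
 * 0 clears user_mss; any other value must lie in
 * [TCP_MIN_MSS, MAX_TCP_WINDOW]. Returns 0 or -EINVAL. */
int model_check_user_mss(int val)
{
    if (val && (val < MODEL_TCP_MIN_MSS || val > MODEL_MAX_TCP_WINDOW))
        return -EINVAL;
    return 0;
}

/* Hypothetical helper: set user_mss on a TCP socket through the standard
 * setsockopt() interface. Returns 0 on success or -errno on failure. */
int set_user_mss(int fd, int mss)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0)
        return -errno;
    return 0;
}
```

Because tcp_connect_init() copies user_mss into mss_clamp, the option has to be set before connect() (or on the listening socket before connections arrive) to influence the handshake.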

Walking through the connection flow in code

Handshake step 1
The client sends a SYN

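During the initialization performed by connect(), the MSS values are set up as follows:

(1) If the user configured user_mss, mss_clamp (this end's maximum MSS) is set to user_mss;

(2) tcp_sync_mss is called to synchronize the MSS. It computes the current effective MSS from the device MTU, the maximum window and so on, and records it in tp->mss_cache. Because this function takes some space to explain, it is analyzed at the end of this article;

(3) advmss, the value advertised to the peer, is set: the MSS is looked up in the routing table (this uses the path MTU), compared with user_mss, and the smaller of the two is used;

(4) The peer's MSS is estimated: rcv_mss is derived from advmss, mss_cache, rcv_wnd, TCP_MSS_DEFAULT and TCP_MIN_MSS.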

static void tcp_connect_init(struct sock *sk)
{
    const struct dst_entry *dst = __sk_dst_get(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    /* ... */
    /* If user gave his TCP_MAXSEG, record it to clamp */
    /* (1) If user_mss is configured, use it as the maximum MSS */
    if (tp->rx_opt.user_mss)
        tp->rx_opt.mss_clamp = tp->rx_opt.user_mss;
    tp->max_window = 0;
    tcp_mtup_init(sk);
    /* (2) Synchronize the MSS with the device MTU */
    tcp_sync_mss(sk, dst_mtu(dst));

    tcp_ca_dst_init(sk, dst);

    if (!tp->window_clamp)
        tp->window_clamp = dst_metric(dst, RTAX_WINDOW);
    /*
     * (3) Set the MSS advertised to the peer:
     * dst_metric_advmss - look up the MSS in the routing table
     * tcp_mss_clamp     - take the smaller of user_mss and that value
     */
    tp->advmss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
    /* (4) Estimate the peer's MSS */
    tcp_initialize_rcv_mss(sk);
    /* ... */
}
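Steps (3) and (4) can be approximated with a few lines of arithmetic. The sketch assumes an IPv4 route with no explicit advmss metric, in which case dst_metric_advmss() falls back to a default that, in the kernels I have read, is the MTU minus 40 bytes of IP and TCP headers, floored at net.ipv4.route.min_adv_mss (256 by default) and capped at 65535 - 40. The rcv_mss estimate follows tcp_initialize_rcv_mss(). All model_* names and the constants are illustrative assumptions.

```c
/* Assumed constants; check include/net/tcp.h and your sysctls. */
#define MODEL_TCP_MSS_DEFAULT 536U  /* TCP_MSS_DEFAULT */
#define MODEL_TCP_MIN_MSS     88U   /* TCP_MIN_MSS */
#define MODEL_MIN_ADV_MSS     256U  /* net.ipv4.route.min_adv_mss default */
#define MODEL_IPV4_TCP_HDRS   40U   /* 20-byte IPv4 header + 20-byte TCP header */

static unsigned umin(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned umax(unsigned a, unsigned b) { return a > b ? a : b; }

/* Model of the IPv4 fallback behind dst_metric_advmss() when the route
 * has no explicit advmss metric. Assumes mtu >= 68. */
unsigned model_ipv4_default_advmss(unsigned mtu)
{
    unsigned advmss = umax(mtu - MODEL_IPV4_TCP_HDRS, MODEL_MIN_ADV_MSS);
    return umin(advmss, 65535U - MODEL_IPV4_TCP_HDRS);
}

/* Model of tcp_mss_clamp(): user_mss wins only when set and smaller. */
unsigned model_mss_clamp(unsigned user_mss, unsigned mss)
{
    return (user_mss && user_mss < mss) ? user_mss : mss;
}

/* Model of tcp_initialize_rcv_mss(): a conservative guess at the peer's
 * MSS, bounded to [TCP_MIN_MSS, TCP_MSS_DEFAULT]. */
unsigned model_initial_rcv_mss(unsigned advmss, unsigned mss_cache,
                               unsigned rcv_wnd)
{
    unsigned hint = umin(advmss, mss_cache);
    hint = umin(hint, rcv_wnd / 2);
    hint = umin(hint, MODEL_TCP_MSS_DEFAULT);
    return umax(hint, MODEL_TCP_MIN_MSS);
}
```

For a 1500-byte Ethernet MTU with no TCP_MAXSEG this gives advmss = 1460, and user_mss = 1000 lowers it to 1000. The rcv_mss estimate never exceeds 536: underestimating the peer's MSS only makes ACKs more frequent, whereas overestimating it would delay them.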

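While the SYN is being sent, advmss is placed in the options of the TCP header; the call chain is tcp_transmit_skb -> tcp_syn_options -> tcp_advertise_mss. Note that the advmss computed above is not used directly: tcp_advertise_mss recomputes the value.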

/* Compute TCP options for SYN packets. This is not the final
 * network wire format yet.
 */
static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
                                    struct tcp_out_options *opts,
                                    struct tcp_md5sig_key **md5)
{
    unsigned int remaining = MAX_TCP_OPTION_SPACE;
    /* ... */
    /* We always get an MSS option. The option bytes which will be seen in
     * normal data packets should timestamps be used, must be in the MSS
     * advertised. But we subtract them from tp->mss_cache so that
     * calculations in tcp_sendmsg are simpler etc. So account for this
     * fact here if necessary. If we don't do this correctly, as a
     * receiver we won't recognize data packets as being full sized when we
     * should, and thus we won't abide by the delayed ACK rules correctly.
     * SACKs don't matter, we never delay an ACK when we have any of those
     * going out. */
    opts->mss = tcp_advertise_mss(sk);
    remaining -= TCPOLEN_MSS_ALIGNED;
    /* ... */
    return MAX_TCP_OPTION_SPACE - remaining;
}
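On the wire the MSS option is 4 bytes: kind 2, length 4, then the 16-bit MSS in network byte order (RFC 9293). That is why tcp_syn_options() subtracts TCPOLEN_MSS_ALIGNED (4) from the 40-byte option budget. A minimal encoder sketch; encode_mss_option is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_KIND_MSS 2   /* TCPOPT_MSS */
#define OPT_LEN_MSS  4   /* TCPOLEN_MSS; already a multiple of 4 */

/* Write the MSS option (kind, length, 16-bit MSS, big-endian) into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_mss_option(uint8_t *buf, size_t cap, uint16_t mss)
{
    if (cap < OPT_LEN_MSS)
        return 0;
    buf[0] = OPT_KIND_MSS;
    buf[1] = OPT_LEN_MSS;
    buf[2] = (uint8_t)(mss >> 8);     /* high byte first: network order */
    buf[3] = (uint8_t)(mss & 0xff);
    return OPT_LEN_MSS;
}
```

An MSS of 1460 (0x05B4) is therefore sent as the bytes 02 04 05 B4.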

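tcp_advertise_mss looks up the route's advmss metric again and takes the smaller of it and the previously computed tp->advmss, updating tp->advmss when the route value is smaller: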

/* Calculate mss to advertise in SYN segment.
 * RFC1122, RFC1063, draft-ietf-tcpimpl-pmtud-01 state that:
 *
 * 1. It is independent of path mtu.
 * 2. Ideally, it is maximal possible segment size i.e. 65535-40.
 * 3. For IPv4 it is reasonable to calculate it from maximal MTU of
 *    attached devices, because some buggy hosts are confused by
 *    large MSS.
 * 4. We do not make 3, we advertise MSS, calculated from first
 *    hop device mtu, but allow to raise it to ip_rt_min_advmss.
 *    This may be overridden via information stored in routing table.
 * 5. Value 65535 for MSS is valid in IPv6 and means "as large as possible,
 *    probably even Jumbo".
 */
static __u16 tcp_advertise_mss(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    const struct dst_entry *dst = __sk_dst_get(sk);
    int mss = tp->advmss;

    if (dst) {
        unsigned int metric = dst_metric_advmss(dst);

        if (metric < mss) {
            mss = metric;
            tp->advmss = mss;
        }
    }

    return (__u16)mss;
}
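The net effect of tcp_advertise_mss() is that the advertised value can only move down, never up. A small model of that logic, where have_dst and route_advmss stand in for __sk_dst_get() and dst_metric_advmss(); both substitutions, and the function name, are assumptions of this sketch.

```c
#include <stdbool.h>

/* Model of tcp_advertise_mss(): start from the cached advmss and lower
 * it (and the cache) when the attached route reports a smaller metric. */
unsigned short model_advertise_mss(unsigned *advmss, bool have_dst,
                                   unsigned route_advmss)
{
    unsigned mss = *advmss;

    if (have_dst && route_advmss < mss) {
        mss = route_advmss;
        *advmss = mss;   /* mirrors tp->advmss = mss */
    }
    return (unsigned short)mss;
}
```

A route metric larger than the cached advmss leaves both the returned value and the cache unchanged.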

The server receives the SYN

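The server is in the LISTEN state when the client's SYN arrives. While processing it, the server parses the options in the TCP header (call chain tcp_conn_request -> tcp_parse_options). The MSS part of the option parsing is shown below: the MSS option is read, compared with user_mss, and the smaller value is stored in mss_clamp (the maximum MSS).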

/* Look for tcp options. Normally only called on SYN and SYNACK packets.
 * But, this can also be called on packets in the established flow when
 * the fast version below fails.
 */
void tcp_parse_options(const struct sk_buff *skb,
                       struct tcp_options_received *opt_rx, int estab,
                       struct tcp_fastopen_cookie *foc)
{
    /* ... */
    switch (opcode) {
    case TCPOPT_MSS:
        if (opsize == TCPOLEN_MSS && th->syn && !estab) {
            u16 in_mss = get_unaligned_be16(ptr);
            if (in_mss) {
                if (opt_rx->user_mss && opt_rx->user_mss < in_mss)
                    in_mss = opt_rx->user_mss;
                opt_rx->mss_clamp = in_mss;
            }
        }
        break;
    /* ... */
    }
    /* ... */
}
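The MSS branch can be exercised in isolation with a small option walker. This sketch handles only EOL, NOP and the MSS option and skips everything else by its length byte; it is a simplified model of tcp_parse_options(), not a copy of it, and model_parse_mss is a hypothetical name. When the SYN carries no MSS option, mss_clamp keeps whatever value it was initialized with (TCP_MSS_DEFAULT, 536, for IPv4 listeners in the kernels I have looked at).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the TCPOPT_MSS branch of tcp_parse_options().
 * On a SYN outside the established state, a well-formed non-zero MSS
 * option is clamped by user_mss and stored in *mss_clamp.
 * Returns 1 if *mss_clamp was updated, else 0. */
int model_parse_mss(const uint8_t *opt, size_t len, int syn, int estab,
                    uint16_t user_mss, uint16_t *mss_clamp)
{
    int updated = 0;
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opt[i];

        if (kind == 0)                   /* TCPOPT_EOL: end of list */
            break;
        if (kind == 1) {                 /* TCPOPT_NOP: one-byte padding */
            i++;
            continue;
        }
        if (i + 1 >= len)                /* no room for a length byte */
            break;
        uint8_t size = opt[i + 1];
        if (size < 2 || size > len - i)  /* silly or truncated option */
            break;
        if (kind == 2 && size == 4 && syn && !estab) {   /* TCPOPT_MSS */
            uint16_t in_mss = (uint16_t)((opt[i + 2] << 8) | opt[i + 3]);
            if (in_mss) {
                if (user_mss && user_mss < in_mss)
                    in_mss = user_mss;
                *mss_clamp = in_mss;
                updated = 1;
            }
        }
        i += size;
    }
    return updated;
}
```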

When the request sock is allocated and initialized, its mss field is set to the maximum MSS obtained from the options:

static void tcp_openreq_init(struct request_sock *req,
                             const struct tcp_options_received *rx_opt,
                             struct sk_buff *skb, const struct sock *sk)
{
    struct inet_request_sock *ireq = inet_rsk(req);
    /* ... */
    req->mss = rx_opt->mss_clamp;
    /* ... */
}

Handshake step 2
The server sends a SYN+ACK

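After the request sock has been added to the table of pending connections, the server sends a SYN+ACK to the client. The SYN+ACK must carry this end's MSS in its options; the call chain is tcp_v4_send_synack -> tcp_make_synack -> tcp_synack_options. The MSS is obtained the same way as on the client: it is looked up in the routing table, compared with the user-configured user_mss, and the smaller value is used. It is then written into the options.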

struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
                                struct request_sock *req,
                                struct tcp_fastopen_cookie *foc,
                                enum tcp_synack_type synack_type)
{
    /* ... */
    /* mss is the smaller of the route's MSS and user_mss */
    mss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
    /* ... */
    /* Build the TCP options */
    tcp_header_size = tcp_synack_options(req, mss, skb, &opts, md5, foc) +
                      sizeof(*th);
    /* ... */
}

/* Set up TCP options for SYN-ACKs. */
static unsigned int tcp_synack_options(struct request_sock *req,
                                       unsigned int mss, struct sk_buff *skb,
                                       struct tcp_out_options *opts,
                                       const struct tcp_md5sig_key *md5,
                                       struct tcp_fastopen_cookie *foc)
{
    struct inet_request_sock *ireq = inet_rsk(req);
    unsigned int remaining = MAX_TCP_OPTION_SPACE;

    /* We always send an MSS option. */
    opts->mss = mss;
    remaining -= TCPOLEN_MSS_ALIGNED;
    /* ... */
    return MAX_TCP_OPTION_SPACE - remaining;
}
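Since each end clamps the peer's advertised value with its own user_mss, the two directions of a connection can end up with different mss_clamp values; nothing forces them to agree. A sketch of the handshake-time arithmetic, assuming each end advertises tcp_mss_clamp(user_mss, route advmss) as shown above and ignoring the later PMTU adjustments made by tcp_sync_mss(). struct end_cfg and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-end configuration for this sketch. */
struct end_cfg {
    uint16_t user_mss;     /* TCP_MAXSEG value, 0 if not set */
    uint16_t route_advmss; /* dst_metric_advmss(), e.g. MTU - 40 */
};

/* Same rule as tcp_mss_clamp() and the MSS branch of tcp_parse_options(). */
static uint16_t clamp_by_user(uint16_t user_mss, uint16_t mss)
{
    return (user_mss && user_mss < mss) ? user_mss : mss;
}

/* MSS this end puts in its SYN or SYN-ACK. */
uint16_t advertised_mss(struct end_cfg e)
{
    return clamp_by_user(e.user_mss, e.route_advmss);
}

/* mss_clamp this end derives from the peer's MSS option. */
uint16_t negotiated_clamp(struct end_cfg self, struct end_cfg peer)
{
    return clamp_by_user(self.user_mss, advertised_mss(peer));
}
```

With the client at user_mss 0 on a 1460 route and the server at user_mss 1200, both ends end up with mss_clamp 1200. With no user_mss on either side but a server route advmss of 1452 (for example a 1492-byte PPPoE MTU), the client's mss_clamp is 1452 while the server's is 1460.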

The client receives the SYN+ACK

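The client is in the SYN_SENT state when the server's SYN+ACK arrives. It then does the following:

(1) It parses the MSS in the packet's TCP options and stores it (clamped by user_mss) in opt_rx->mss_clamp;

(2) It recomputes the MSS from the latest PMTU;

(3) It estimates the peer's MSS (rcv_mss);

(4) If it needs to enter quick-ACK mode, it computes the quick-ACK quota from rcv_mss.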

static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
                                         const struct tcphdr *th)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    struct tcp_fastopen_cookie foc = { .len = -1 };
    int saved_clamp = tp->rx_opt.mss_clamp;
    bool fastopen_fail;
    /* ... */
    /* (1) Parse the TCP options */
    tcp_parse_options(skb, &tp->rx_opt, 0, &foc);
    /* ... */
    /* (2) Recompute the MSS */
    tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
    /* (3) Initialize rcv_mss */
    tcp_initialize_rcv_mss(sk);
    /* ... */
    /* (4) Enter quick-ACK mode (only under certain conditions) */
    tcp_enter_quickack_mode(sk);
    /* ... */
}
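Step (4) sizes the quick-ACK budget from rcv_mss. The sketch below models tcp_incr_quickack(), which tcp_enter_quickack_mode() calls; newer kernels pass the cap in as a max_quickacks argument (TCP_MAX_QUICKACKS, assumed here to be 16), while older ones applied it internally. Treat the exact formula and the constant as assumptions to verify against your kernel version; model_quickack_quota is a hypothetical name.

```c
/* Assumed value of TCP_MAX_QUICKACKS from include/net/tcp.h. */
#define MODEL_TCP_MAX_QUICKACKS 16U

/* Model of tcp_incr_quickack(): one quick ACK per two rcv_mss-sized
 * segments that fit in the receive window, at least 2, at most
 * max_quickacks. The existing quota (cur_quick) is never reduced.
 * rcv_mss is assumed non-zero (the kernel keeps it >= TCP_MIN_MSS). */
unsigned model_quickack_quota(unsigned cur_quick, unsigned rcv_wnd,
                              unsigned rcv_mss, unsigned max_quickacks)
{
    unsigned quickacks = rcv_wnd / (2 * rcv_mss);

    if (quickacks == 0)
        quickacks = 2;
    if (quickacks > max_quickacks)
        quickacks = max_quickacks;
    return quickacks > cur_quick ? quickacks : cur_quick;
}
```

With rcv_wnd = 64240 and rcv_mss = 536 the raw quota is 59, capped at 16; a window too small to hold two segments still yields 2 quick ACKs.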

Sending data in the established state

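The system calls that send TCP data eventually reach tcp_sendmsg. Before sending, this function obtains the send MSS, which limits the size of the segments it builds afterwards: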

int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
    /* ... */
    mss_now = tcp_send_mss(sk, &size_goal, flags);
    /* ... */
}

static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
{
    int mss_now;

    mss_now = tcp_current_mss(sk);
    *size_goal = tcp_xmit_size_goal(sk, mss_now, !(flags & MSG_OOB));

    return mss_now;
}

tcp_current_mss updates the MSS based on the current MTU and the actual length of the header options:

/* Compute the current effective MSS, taking SACKs and IP options,
 * and even PMTU discovery events into account.
 */
unsigned int tcp_current_mss(struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    const struct dst_entry *dst = __sk_dst_get(sk);
    u32 mss_now;
    unsigned int header_len;
    struct tcp_out_options opts;
    struct tcp_md5sig_key *md5;

    /* Start from the cached effective MSS */
    mss_now = tp->mss_cache;

    /* A cached route exists */
    if (dst) {
        /* Get the path MTU */
        u32 mtu = dst_mtu(dst);
        /* The MTU differs from the last one seen: resync the MSS to it */
        if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
            mss_now = tcp_sync_mss(sk, mtu);
    }

    /* Header length for the options this segment will actually carry */
    header_len = tcp_established_options(sk, NULL, &opts, &md5) +
                 sizeof(struct tcphdr);
    /* The mss_cache is sized based on tp->tcp_header_len, which assumes
     * some common options. If this is an odd packet (because we have SACK
     * blocks etc) then our calculated header_len will be different, and
     * we have to adjust mss_now correspondingly */
    /* Header length differs from the assumed one: adjust the MSS */
    if (header_len != tp->tcp_header_len) {
        int delta = (int) header_len - tp->tcp_header_len;
        mss_now -= delta;
    }

    return mss_now;
}
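A worked example of the header-length correction: with timestamps negotiated, tcp_header_len is 20 + 12 = 32 and mss_cache might be 1448. A segment that also carries one SACK block has 12 more option bytes (4 bytes of NOP, NOP, kind, length plus 8 per block), so the function returns 1436. The sketch models only this final step; the option sizes are my understanding of TCPOLEN_TSTAMP_ALIGNED, TCPOLEN_SACK_BASE_ALIGNED and TCPOLEN_SACK_PERBLOCK, and the names are hypothetical.

```c
/* Assumed aligned option sizes (include/net/tcp.h). */
#define MODEL_TCPHDR_LEN      20
#define MODEL_TSTAMP_ALIGNED  12             /* NOP NOP kind len TSval TSecr */
#define MODEL_SACK_ALIGNED(n) (4 + 8 * (n))  /* NOP NOP kind len + n blocks */

/* Model of the last step of tcp_current_mss(): mss_cache was sized for a
 * tcp_header_len-byte header, so a segment whose real header is
 * header_len bytes has its payload adjusted by the difference. */
unsigned model_current_mss(unsigned mss_cache, unsigned tcp_header_len,
                           unsigned header_len)
{
    int delta = (int)header_len - (int)tcp_header_len;
    return (unsigned)((int)mss_cache - delta);
}
```

With timestamps enabled, at most three SACK blocks fit in the 40-byte option space (12 + 4 + 3 * 8 = 40), which brings the MSS down to 1420.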

The tcp_sync_mss function

Many of the flows above call this function, so it is analyzed here in one place:

/* This function synchronize snd mss to current pmtu/exthdr set.

   tp->rx_opt.user_mss is mss set by user by TCP_MAXSEG. It does NOT counts
   for TCP options, but includes only bare TCP header.

   tp->rx_opt.mss_clamp is mss negotiated at connection setup.
   It is minimum of user_mss and mss received with SYN.
   It also does not include TCP options.

   inet_csk(sk)->icsk_pmtu_cookie is last pmtu, seen by this function.

   tp->mss_cache is current effective sending mss, including
   all tcp options except for SACKs. It is evaluated,
   taking into account current pmtu, but never exceeds
   tp->rx_opt.mss_clamp.

   NOTE1. rfc1122 clearly states that advertised MSS
   DOES NOT include either tcp or ip options.

   NOTE2. inet_csk(sk)->icsk_pmtu_cookie and tp->mss_cache
   are READ ONLY outside this function.		--ANK (980731)
 */
/* Update the MSS */
unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct inet_connection_sock *icsk = inet_csk(sk);
    int mss_now;

    /* MTU probing upper bound exceeds the path MTU: lower it to the path MTU */
    if (icsk->icsk_mtup.search_high > pmtu)
        icsk->icsk_mtup.search_high = pmtu;

    /* Compute the current MSS */
    mss_now = tcp_mtu_to_mss(sk, pmtu);
    /* Bound the MSS by the largest window the peer has advertised */
    mss_now = tcp_bound_to_half_wnd(tp, mss_now);

    /* And store cached results */
    /* Record the latest path MTU */
    icsk->icsk_pmtu_cookie = pmtu;
    /* Packetization-layer MTU probing is enabled */
    if (icsk->icsk_mtup.enabled)
        /* Use the smaller of the current MSS and the MSS for the probing lower bound */
        mss_now = min(mss_now, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low));
    /* Cache the current MSS */
    tp->mss_cache = mss_now;

    return mss_now;
}

The following two functions compute the MSS from the MTU:

/* Compute the MSS without taking TCP options into account */
static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    const struct inet_connection_sock *icsk = inet_csk(sk);
    int mss_now;

    /* Calculate base mss without TCP options:
       It is MMS_S - sizeof(tcphdr) of rfc1122
     */
    /* MSS = path MTU - network header - TCP header */
    mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr);

    /* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
    if (icsk->icsk_af_ops->net_frag_header_len) {
        const struct dst_entry *dst = __sk_dst_get(sk);

        if (dst && dst_allfrag(dst))
            mss_now -= icsk->icsk_af_ops->net_frag_header_len;
    }

    /* Clamp it (mss_clamp does not include tcp options) */
    /* MSS above the maximum: clamp it to mss_clamp */
    if (mss_now > tp->rx_opt.mss_clamp)
        mss_now = tp->rx_opt.mss_clamp;

    /* Now subtract optional transport overhead */
    /* Subtract the length of IP options / extension headers */
    mss_now -= icsk->icsk_ext_hdr_len;

    /* Then reserve room for full set of TCP options and 8 bytes of data */
    /* Below 48: raise to 48 = 40 bytes of TCP options + 8 bytes of data */
    if (mss_now < 48)
        mss_now = 48;

    return mss_now;
}

/* Compute the MSS, not counting SACK options */
int tcp_mtu_to_mss(struct sock *sk, int pmtu)
{
    /* Subtract TCP options size, not including SACKs */
    return __tcp_mtu_to_mss(sk, pmtu) -
           (tcp_sk(sk)->tcp_header_len - sizeof(struct tcphdr));
}
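Putting the two functions together for a typical IPv4 case: a 1500-byte PMTU gives 1500 - 20 - 20 = 1460, which the timestamp options (tcp_header_len = 32) then reduce to 1448. The userspace model below ignores the IPv6 ALLFRAG branch; struct mss_params and the function names are hypothetical, and the 48-byte floor follows the code quoted above (newer kernels take the floor from the net.ipv4.tcp_min_snd_mss sysctl, default 48).

```c
#define MODEL_TCPHDR_LEN 20

/* Hypothetical bundle of the socket fields the kernel code reads. */
struct mss_params {
    int net_header_len;  /* icsk_af_ops->net_header_len: 20 for IPv4 */
    int mss_clamp;       /* tp->rx_opt.mss_clamp */
    int ext_hdr_len;     /* icsk_ext_hdr_len: IP options / IPv6 ext headers */
    int tcp_header_len;  /* tp->tcp_header_len: 20, or 32 with timestamps */
};

/* Model of __tcp_mtu_to_mss() without the IPv6 ALLFRAG adjustment. */
int model_mtu_to_mss_base(const struct mss_params *p, int pmtu)
{
    int mss = pmtu - p->net_header_len - MODEL_TCPHDR_LEN;

    if (mss > p->mss_clamp)
        mss = p->mss_clamp;
    mss -= p->ext_hdr_len;
    if (mss < 48)            /* 40 bytes of options + 8 bytes of data */
        mss = 48;
    return mss;
}

/* Model of tcp_mtu_to_mss(): also subtract the options already counted
 * in tcp_header_len (SACK blocks are not included). */
int model_mtu_to_mss(const struct mss_params *p, int pmtu)
{
    return model_mtu_to_mss_base(p, pmtu) -
           (p->tcp_header_len - MODEL_TCPHDR_LEN);
}
```

Note that the 48-byte floor is applied before the option bytes are subtracted, so with timestamps a tiny 68-byte PMTU yields 48 - 12 = 36.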

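tcp_bound_to_half_wnd adjusts the MSS according to the largest window the peer has advertised. If that maximum window is larger than the default MSS (536), the MSS may not exceed half of it; otherwise it may not exceed the whole window. Either way the result is never reduced below 68 - tcp_header_len, and if no window has been seen yet (max_window is 0) the MSS is left unchanged.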

static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
{
    int cutoff;

    /* When peer uses tiny windows, there is no use in packetizing
     * to sub-MSS pieces for the sake of SWS or making sure there
     * are enough packets in the pipe for fast recovery.
     *
     * On the other hand, for extremely large MSS devices, handling
     * smaller than MSS windows in this way does make sense.
     */
    /* Largest window advertised by the peer > default MSS:
     * cutoff is half of that window */
    if (tp->max_window > TCP_MSS_DEFAULT)
        cutoff = (tp->max_window >> 1);
    /* Otherwise (<= default MSS): cutoff is the whole window */
    else
        cutoff = tp->max_window;

    /* Keep the packet size within 68 - tcp_header_len <= x <= cutoff */
    /* Packet larger than cutoff: use the larger of cutoff and the minimum */
    if (cutoff && pktsize > cutoff)
        return max_t(int, cutoff, 68U - tp->tcp_header_len);
    /* Window large enough (or not yet known): keep the packet size */
    else
        return pktsize;
}
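A model of tcp_bound_to_half_wnd() that can be run with a few sample windows. max_window and tcp_header_len are passed in directly instead of being read from tp, and 536 stands in for TCP_MSS_DEFAULT; both are assumptions of this sketch, and the function name is hypothetical. It also shows why tcp_connect_init() resets max_window to 0 before calling tcp_sync_mss(): with no window seen yet, the MSS is not bounded.

```c
#define MODEL_TCP_MSS_DEFAULT 536U

/* Model of tcp_bound_to_half_wnd(): limit a packet to half of the largest
 * peer window (or the whole window if it is <= 536), but never below
 * 68 - tcp_header_len. max_window == 0 means "no window seen yet". */
int model_bound_to_half_wnd(unsigned max_window, int tcp_header_len,
                            int pktsize)
{
    int cutoff;

    if (max_window > MODEL_TCP_MSS_DEFAULT)
        cutoff = (int)(max_window >> 1);
    else
        cutoff = (int)max_window;

    if (cutoff && pktsize > cutoff) {
        int floor_size = 68 - tcp_header_len;
        return cutoff > floor_size ? cutoff : floor_size;
    }
    return pktsize;
}
```

For example, a peer whose largest advertised window is 2000 bytes limits a 1448-byte MSS to 1000, while a 65535-byte window leaves it untouched.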
