TCP keepalive explained (clearing up the confusion)

Published July 20, 2023

TCP is connection-oriented. Normally the applications at the two ends can tell that the peer is alive because they send and receive data. But when neither application has anything to send or receive, how do we know whether the connection is still healthy? That is what SO_KEEPALIVE is for.

1. What SO_KEEPALIVE does

1.1 Definition of SO_KEEPALIVE

SO_KEEPALIVE turns keepalive probing on or off for a socket; it is off by default. With SO_KEEPALIVE enabled, TCP keeps checking whether the peer host has crashed, so that a process (typically a server) does not block forever waiting for input on a dead TCP connection. The behavior is controlled by three kernel parameters, tcp_keepalive_time, tcp_keepalive_probes and tcp_keepalive_intvl, which man 7 tcp describes as follows:

tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
    The number of seconds between TCP keep-alive probes.

tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
    The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.

tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
    The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are sent only when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
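The "approximately an additional 11 minutes" figure follows from the three parameters. A rough model, assuming an otherwise idle connection: the first probe goes out after tcp_keepalive_time seconds, later probes follow every tcp_keepalive_intvl seconds, and the connection is dropped about one interval after the last unanswered probe, giving time + probes × intvl in total. The helper name keepalive_dead_after is hypothetical, and the model ignores timer slack and any unacknowledged data in flight:

```c
#include <assert.h>

/* Hypothetical helper: approximate seconds from the start of idleness until
 * an unresponsive peer is declared dead, under the rough model above:
 * first probe after `idle` seconds, then one probe every `intvl` seconds,
 * connection dropped one interval after the last of `probes` probes. */
long keepalive_dead_after(long idle, long intvl, long probes)
{
    assert(idle >= 0 && intvl >= 0 && probes >= 0);
    /* idle period + time spent waiting on each unanswered probe */
    return idle + probes * intvl;
}
```

For the Linux defaults, keepalive_dead_after(7200, 75, 9) returns 7875 seconds: 2 hours of idleness plus 675 seconds (11 minutes 15 seconds) of probing.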
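The man page also notes that underlying connection tracking mechanisms and application timeouts may be much shorter.

The current values can be read from /proc/sys/net/ipv4/:

cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75

or from the command line:

sudo sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75

1.2 How the connection is probed

With SO_KEEPALIVE enabled, if no data has been exchanged in either direction on the socket for 2 hours, TCP automatically sends the peer a keepalive probe, a TCP segment that the peer must respond to. There are three possible outcomes:

- The peer is up and working: it responds with the expected ACK. After another 2 hours of inactivity, TCP sends another probe.
- The peer has crashed and rebooted: it responds with an RST. The socket's pending error is set to ECONNRESET and the socket is closed.
- The peer does not respond at all: Berkeley-derived TCPs send 8 more probes, 75 seconds apart, trying to elicit a response, for 9 attempts in total. If there is still no response 11 minutes 15 seconds after the first probe was sent, TCP gives up; the socket's pending error is set to ETIMEDOUT and the socket is closed. If a probe instead draws an ICMP error such as "host unreachable", the peer host has not crashed but cannot be reached, and the pending error is set to EHOSTUNREACH.

In other words, when the peer goes away without a graceful close, enabling SO_KEEPALIVE lets us find out, after roughly 2 hours, whether the peer's TCP connection still exists:

int keepAlive = 1; // nonzero enables keepalive probing
setsockopt(listenfd, SOL_SOCKET, SO_KEEPALIVE, (void*)&keepAlive, sizeof(keepAlive));

What if we cannot accept such a long wait?

2. Tuning TCP keepalive

As noted above, the default SO_KEEPALIVE timing is too long for an application that needs to notice a broken connection promptly. There are two ways to change it:

- global (system-wide) settings
- per-connection settings

2.1 Global settings

On Linux, the global configuration can be changed in /etc/:

net.ipv4.tcp_keepalive_time=
net.ipv4.tcp_keepalive_intvl=
net.ipv4.tcp_keepalive_probes=9

After adding the configuration above, run

sysctl -p

to apply it. The current defaults can be checked with:

sysctl -a | grep keepalive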
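A program can read the same values straight from /proc/sys/net/ipv4/, for example to log the system-wide settings that apply to sockets without per-socket overrides. A minimal sketch, assuming Linux; read_proc_long and print_keepalive_defaults are hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a /proc/sys file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time. Returns -1 on any error. */
long read_proc_long(const char *path)
{
    long value = -1;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;             /* file exists but holds no integer */
    fclose(fp);
    return value;
}

/* Hypothetical helper: print the three keepalive settings in the same
 * "net.ipv4.* = value" form that `sysctl -a | grep keepalive` uses. */
void print_keepalive_defaults(void)
{
    static const char *const names[] = {
        "tcp_keepalive_time", "tcp_keepalive_probes", "tcp_keepalive_intvl"
    };
    char path[128];
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        snprintf(path, sizeof path, "/proc/sys/net/ipv4/%s", names[i]);
        printf("net.ipv4.%s = %ld\n", names[i], read_proc_long(path));
    }
}
```

A printed value of -1 means the corresponding file could not be opened or parsed.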

If the application has already enabled SO_KEEPALIVE, it does not need to be restarted: the kernel applies the new values directly. However, these parameters are global and affect the whole system, which is awkward when only some sockets need different settings.

2.2 Per-connection settings

Linux provides three TCP-level options: TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL. They apply to a single connection, so each socket can set its own values. Their definitions are in man 7 tcp, under Socket options:

TCP_KEEPCNT (since Linux 2.4)
    The maximum number of keepalive probes TCP should send before dropping the connection. This option should not be used in code intended to be portable.

TCP_KEEPIDLE (since Linux 2.4)
    The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the socket option SO_KEEPALIVE has been set on this socket. This option should not be used in code intended to be portable.

TCP_KEEPINTVL (since Linux 2.4)
    The time (in seconds) between individual keepalive probes. This option should not be used in code intended to be portable.

In code, the steps are:

int keepAlive = 1;    // nonzero: enable keepalive
int keepIdle = 60;    // start probing if the connection carries no data for 60 seconds
int keepInterval = 5; // send probes 5 seconds apart
int keepCount = 3;    // maximum number of probes
// enable keepalive, then override the timing for this socket
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepAlive, sizeof(keepAlive));
setsockopt(sockfd, SOL_TCP, TCP_KEEPIDLE, (void *)&keepIdle, sizeof(keepIdle));
setsockopt(sockfd, SOL_TCP, TCP_KEEPINTVL, (void *)&keepInterval, sizeof(keepInterval));
setsockopt(sockfd, SOL_TCP, TCP_KEEPCNT, (void *)&keepCount, sizeof(keepCount));

With these values, a peer that stops responding is detected after roughly 60 + 3 × 5 = 75 seconds of silence.
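The fragment above ignores the return value of setsockopt. A slightly more defensive sketch, assuming Linux, wraps the four calls in one helper that stops at the first failure and uses IPPROTO_TCP, the portable name for the TCP option level (the same value as SOL_TCP on Linux). enable_keepalive and get_tcp_opt are hypothetical names:

```c
#include <assert.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT */
#include <sys/socket.h>   /* setsockopt, getsockopt, SOL_SOCKET, SO_KEEPALIVE */

/* Hypothetical helper: enable keepalive on fd and override the system-wide
 * timing for this socket only (Linux-specific options).
 * Returns 0 on success, -1 on the first failure with errno set. */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read back one integer TCP-level option,
 * e.g. to verify what the kernel actually stored. Returns -1 on error. */
int get_tcp_opt(int fd, int opt)
{
    int value = -1;
    socklen_t len = sizeof value;

    if (getsockopt(fd, IPPROTO_TCP, opt, &value, &len) < 0)
        return -1;
    return value;
}
```

Call it right after socket() or accept(), e.g. enable_keepalive(sockfd, 60, 5, 3) for the values used above; get_tcp_opt(sockfd, TCP_KEEPIDLE) should then return 60. Returning -1 instead of silently continuing lets the caller report errno if the kernel rejects a value.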

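3. Why does the application layer still need a heartbeat?

Given everything above, TCP keepalive looks quite capable, so why do people still talk about application-level heartbeats? The reasons I know of are:

- TCP keepalive works at the transport layer and is handled by the operating system. It can show that the peer process exists and the network is working, but it cannot detect problems such as the peer process being blocked or deadlocked.
- There may be a layer-4 proxy or load balancer between client and server, that is, a proxy that terminates TCP and forwards only the data above the transport layer (SOCKS5, for example). Keepalive probes then only check the connection to the proxy, not the path to the real peer.

For these reasons, an application sometimes has to design its own heartbeat scheme. The server can send heartbeats periodically to check on clients, the client can send them, or both sides can; choose according to the application scenario.

References: UNIX Network Programming, Volume 1; 《Linux多线程服务端编程》 (Linux Multithreaded Server Programming).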
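As one illustration of such a scheme, here is a minimal sketch assuming a hypothetical protocol in which one side sends a PING after interval seconds without receiving anything and declares the peer dead after max_missed consecutive unanswered PINGs. The struct, enum and function names are illustrative, not from any library; the logic has no socket code, so it can be driven from whatever event loop (poll, epoll, a timer) the program already uses:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-connection heartbeat state. */
struct heartbeat {
    time_t last_recv;   /* when anything was last received from the peer */
    time_t last_ping;   /* when the most recent PING was sent */
    int    interval;    /* seconds of silence before sending a PING */
    int    max_missed;  /* unanswered PINGs tolerated before giving up */
    int    missed;      /* PINGs sent since data was last received */
};

enum hb_action { HB_NONE, HB_SEND_PING, HB_CLOSE };

void hb_init(struct heartbeat *hb, time_t now, int interval, int max_missed)
{
    assert(hb != NULL && interval > 0 && max_missed > 0);
    hb->last_recv = now;
    hb->last_ping = 0;
    hb->interval = interval;
    hb->max_missed = max_missed;
    hb->missed = 0;
}

/* Call whenever data (including a PONG) arrives: an application-level
 * reply shows the peer process itself is still responsive. */
void hb_on_receive(struct heartbeat *hb, time_t now)
{
    hb->last_recv = now;
    hb->missed = 0;
}

/* Call periodically; returns what the caller should do next. */
enum hb_action hb_tick(struct heartbeat *hb, time_t now)
{
    /* measure silence from the last receive, or from the last PING */
    time_t since = (hb->missed == 0) ? hb->last_recv : hb->last_ping;

    if (now - since < hb->interval)
        return HB_NONE;
    if (hb->missed >= hb->max_missed)
        return HB_CLOSE;
    hb->missed++;
    hb->last_ping = now;
    return HB_SEND_PING;
}
```

With hb_init(&hb, now, 10, 3), a silent peer receives PINGs at 10, 20 and 30 seconds and hb_tick returns HB_CLOSE at 40 seconds: the same "idle time plus probes × interval" shape as TCP keepalive, but answered by the peer application end to end, through any proxy in between.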

Posted by: admin. Please credit the source when reposting: http://www.yc00.com/xiaochengxu/1689812963a288269.html