Analysis of a Lingering Network Connection

This was originally posted on Zhishi Xingqiu (知识星球); I trimmed and adjusted it for the blog.

Problem description

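An LVS TCP health check normally completes the three-way handshake (to verify that the service node is still alive) and then immediately sends a RST packet to tear the connection down (efficient, because it skips the four-way close). However, on the RS nodes behind our LVS we found a large number of lingering health-check connections, and we needed to find out why.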

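The analysis showed that the RST packet and the ACK of the third handshake step reached the peer out of order, so the RST was dropped. What still needs investigating is whether the drop is related to the timestamp carried in the RST packet.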

The following 4 scenarios can be verified experimentally with Scapy:

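  1. Normal three-way handshake, then send a RST and check whether it is dropped. Expected: the RST is not dropped and the connection is released normally. This is the baseline.
  2. First two handshake steps only, then immediately send a RST (with a normal timestamp), then send the ACK (to create the reordering). Check whether the RST is dropped and, if it is, whether the handshake can still complete and leave a lingering connection.
  3. First two handshake steps only, then immediately send a RST (without timestamp), then send the ACK (to create the reordering). Check whether the RST is dropped.
  4. First two handshake steps only, then immediately send a RST (with a timestamp option, but the timestamp is 0), then send the ACK (to create the reordering). Check whether the RST is dropped.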

The reproduction setup uses a client and a server: the client uses scapy to craft arbitrary packets, and the server runs a web service started with python.

Client

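Because the latest scapy requires python3.7, use a Linux machine with a reasonably recent kernel for the test (the 99-yuan lab ECS offered to the Xingqiu group meets the requirement). The install command is roughly: yum install python3-scapy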

A scapy script builds the packets for the scenarios above. The code and usage notes are here: https://github.com/plantegg/programmer_case/commit/e71ade38050c48170c7d6fb5922f78188a96435b#diff-3d18b8aa76586e6c59227e020ba22ef1ef8c5416764d0a923b198ad824996eda

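To build a RST that carries a timestamp, use a snippet like the one below. The reordering is produced by swapping the send order of the ACK and the RST.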

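# Fragment of the full script: assumes `import time` and
# `from scapy.all import IP, TCP, send`, and that ip, syn_ack,
# source_port and target_port are already defined.

# Build the ACK packet
ack = TCP(sport=source_port,
          dport=target_port,
          flags='A',
          seq=syn_ack.ack,
          ack=syn_ack.seq + 1,
          options=[('NOP', None), ('NOP', None),
                   ('Timestamp', (int(time.time()), 0))])  # key point: adjust this timestamp, and the order of the RST and ACK packets

# Send the ACK
send(ip/ack)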
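
The RST variants used in the scenarios (normal timestamp, no timestamp, timestamp 0) could be built along the same lines. The following is only a minimal sketch: the helper names `rst_options` and `build_rst` and the `mode` labels are hypothetical and are not taken from the linked script. scapy is imported inside `build_rst` so that the option helper can be tried without it.

```python
import time


def rst_options(mode, tsecr=0, now=None):
    """Return the Scapy-style TCP option list for the RST.

    mode is a hypothetical label:
      "normal" -> Timestamp option with the current time as TSval
      "none"   -> no TCP options at all
      "zero"   -> Timestamp option present, but TSval forced to 0
    """
    if mode == "none":
        return []
    if mode == "zero":
        tsval = 0
    elif mode == "normal":
        tsval = int(time.time()) if now is None else int(now)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [("NOP", None), ("NOP", None), ("Timestamp", (tsval, tsecr))]


def build_rst(dst_ip, source_port, target_port, seq, mode="normal", tsecr=0):
    """Build (but do not send) an IP/TCP RST packet with the chosen options."""
    # Imported here so rst_options can be used on a machine without scapy.
    from scapy.all import IP, TCP

    return IP(dst=dst_ip) / TCP(sport=source_port, dport=target_port,
                                flags="R", seq=seq,
                                options=rst_options(mode, tsecr))
```

To create the reordering, send the packet returned by `build_rst` before the ACK instead of after it.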

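On the scapy machine, drop the RST that the OS sends automatically (the connection is forged by scapy, so when the OS receives the SYN+ACK it sends a RST of its own, and that RST carries no timestamp):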

iptables -A OUTPUT -p tcp --dport 8000 --tcp-flags RST RST  ! --tcp-option 8 -j DROP

# Clean up
iptables -D OUTPUT -p tcp --dport 8000 --tcp-flags RST RST ! --tcp-option 8 -j DROP

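The path taken by scapy-crafted packets is shown below. They bypass the kernel TCP stack and nf_hook (the firewall), so the iptables rule above does not apply to them and they reach the server: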

***************** c7d8ea00 ***************
[100167.011693] [__dev_queue_xmit ] TCP: 172.26.137.131:8146 -> 172.26.137.130:8000 seq:12346, ack:0, flags:R
[100167.011702] [dev_hard_start_xmit ] TCP: 172.26.137.131:8146 -> 172.26.137.130:8000 seq:12346, ack:0, flags:R *skb is successfully sent to the NIC driver*
[100167.011714] [consume_skb ] TCP: 172.26.137.131:8146 -> 172.26.137.130:8000 seq:12346, ack:0, flags:R *packet is freed (normally)*

***************** c700e300 ***************
[100167.024811] [__dev_queue_xmit ] TCP: 172.26.137.131:8146 -> 172.26.137.130:8000 seq:12346, ack:2680597246, flags:A
[100167.024821] [dev_hard_start_xmit ] TCP: 172.26.137.131:8146 -> 172.26.137.130:8000 seq:12346, ack:2680597246, flags:A *skb is successfully sent to the NIC driver*
[100167.024891] [consume_skb ] TCP: 172.26.137.131:8146 -> 172.26.137.130:8000 seq:12346, ack:2680597246, flags:A *packet is freed (normally)*

Server

First, keep one fact in mind. It is used later, when reading the kernel call stacks, to confirm whether a packet was dropped.

A packet that is processed normally is finally released with consume_skb; a packet that has to be dropped is released with kfree_skb.
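
This rule can be applied mechanically to the trace output. The sketch below assumes the line format of the nettrace output shown in this post; `TRACE_RE` and `dropped_packets` are illustrative names, not part of nettrace.

```python
import re

# Matches one trace line such as:
# [4912183.252789] [kfree_skb ] TCP: a:1 -> b:2 seq:1, ack:2, flags:R *...*
TRACE_RE = re.compile(
    r"^\[(?P<time>[\d.]+)\] \[(?P<func>\w+)\s*\] TCP: "
    r"(?P<src>\S+) -> (?P<dst>\S+) seq:(?P<seq>\d+), ack:(?P<ack>\d+), "
    r"flags:(?P<flags>\w+)"
)


def dropped_packets(lines):
    """Return (src, flags, seq) for every packet released through kfree_skb."""
    drops = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        # In the traces in this post __kfree_skb also shows up on the normal
        # path, so only the exact name kfree_skb is treated as a drop.
        if m and m.group("func") == "kfree_skb":
            drops.append((m.group("src"), m.group("flags"), int(m.group("seq"))))
    return drops
```

Feeding it the lines of one trace returns the dropped packets, so a RST entry in the result means the RST went through kfree_skb.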

On the server, install nettrace to monitor whether packets are dropped, and start a listener on a port with python:

python -m http.server 8000

Confirm with tcpdump which packets port 8000 receives

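Run a capture on the machine that owns port 8000 to verify the order of the received packets and the timestamps they carry. The capture contains the packets of 3 scenarios: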

#tcpdump -i eth0 port 8000 -nn
// Scenario 2: first two handshake steps, then immediately send a RST (with timestamp)
13:56:26.614701 IP 172.26.137.131.54321 > 172.26.137.130.8000: Flags [S], seq 2754757912, win 8192, options [mss 1460,nop,nop,TS val 1732514186 ecr 0], length 0
13:56:26.614815 IP 172.26.137.130.8000 > 172.26.137.131.54321: Flags [S.], seq 1579697129, ack 2754757913, win 65160, options [mss 1460,nop,nop,TS val 2888180099 ecr 1732514186], length 0
13:56:26.633997 IP 172.26.137.131.54321 > 172.26.137.130.8000: Flags [R], seq 2754757913, win 8192, options [mss 1460,nop,nop,TS val 1732514186 ecr 0], length 0 // note port 54321 and seq 2754757913, which match the nettrace output
13:56:26.654954 IP 172.26.137.131.54321 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [nop,nop,TS val 1732514186 ecr 0], length 0
13:56:26.655042 IP 172.26.137.130.8000 > 172.26.137.131.54321: Flags [R], seq 1579697130, win 0, length 0

// Scenario 3: first two handshake steps, then immediately send a RST (without timestamp); note that the RST carries no TCP options here
13:56:28.993723 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [S], seq 54243194, win 8192, options [mss 1460,nop,nop,TS val 1732514188 ecr 0], length 0
13:56:28.993809 IP 172.26.137.130.8000 > 172.26.137.131.12345: Flags [S.], seq 1983242893, ack 54243195, win 65160, options [mss 1460,nop,nop,TS val 2888182478 ecr 1732514188], length 0
13:56:29.012982 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [R], seq 54243195, win 8192, length 0 // note port 12345 and seq 54243195, which match the nettrace output
13:56:29.029886 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [nop,nop,TS val 1732514189 ecr 0], length 0
13:56:29.029983 IP 172.26.137.130.8000 > 172.26.137.131.12345: Flags [R], seq 1983242894, win 0, length 0 // sent by the OS
13:56:29.050888 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [R], seq 54243195, win 8192, length 0

// Scenario 1: normal handshake, then RST
13:56:30.399672 IP 172.26.137.131.22345 > 172.26.137.130.8000: Flags [S], seq 1038081714, win 8192, options [mss 1460,nop,nop,TS val 1732514190 ecr 0], length 0
13:56:30.399770 IP 172.26.137.130.8000 > 172.26.137.131.22345: Flags [S.], seq 3263478059, ack 1038081715, win 65160, options [mss 1460,nop,nop,TS val 2888183884 ecr 1732514190], length 0
13:56:30.426005 IP 172.26.137.131.22345 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [nop,nop,TS val 1732514190 ecr 0], length 0
13:56:30.448876 IP 172.26.137.131.22345 > 172.26.137.130.8000: Flags [R], seq 1038081715, win 8192, length 0
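
The packet order and the timestamps can also be checked programmatically. The sketch below assumes tcpdump text output in the format shown above; `DUMP_RE`, `parse_line` and `rst_before_ack` are illustrative names.

```python
import re

# time, "IP", src.port > dst.port:, Flags [...], and an optional TS option.
DUMP_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
    r"(?:.*?TS val (?P<tsval>\d+) ecr (?P<tsecr>\d+))?"
)


def parse_line(line):
    """Extract source port, flags and TSval (None if absent) from one line."""
    m = DUMP_RE.match(line.strip())
    if not m:
        return None  # comment lines and blank lines are skipped
    tsval = m.group("tsval")
    return {
        "sport": int(m.group("src").rsplit(".", 1)[1]),
        "flags": m.group("flags"),
        "tsval": int(tsval) if tsval is not None else None,
    }


def rst_before_ack(lines, client_port):
    """True if the client's first RST was captured before its first bare ACK."""
    flags_seen = [p["flags"] for p in map(parse_line, lines)
                  if p and p["sport"] == client_port]
    if "R" not in flags_seen or "." not in flags_seen:
        return False
    return flags_seen.index("R") < flags_seen.index(".")
```

For the capture above, calling `rst_before_ack` with client port 54321 or 12345 reports the reordered case, while port 22345 (the baseline) does not.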

Scenario 1: normal three-way handshake followed by RST, as the baseline

nettrace command and output

#nettrace -P 8000
***************** c22c8c00,c22c8000 ***************
[4912187.018483] [__ip_local_out ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018485] [nf_hook_slow ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *ipv4 in chain: OUTPUT*
[4912187.018487] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *iptables table:, chain:OUTPUT*
[4912187.018489] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *iptables table:, chain:OUTPUT*
[4912187.018493] [ip_output ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018495] [nf_hook_slow ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *ipv4 in chain: POST_ROUTING*
[4912187.018496] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *iptables table:, chain:POSTROU*
[4912187.018499] [ip_finish_output ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018502] [ip_finish_output2 ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018506] [__dev_queue_xmit ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018510] [dev_hard_start_xmit ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *skb is successfully sent to the NIC driver*
[4912187.018512] [skb_clone ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018516] [tpacket_rcv ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA
[4912187.018519] [consume_skb ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *packet is freed (normally)*
[4912187.018533] [consume_skb ] TCP: 172.26.137.130:8000 -> 172.26.137.131:22345 seq:3263478059, ack:1038081715, flags:SA *packet is freed (normally)*

***************** c22c8a00,c22c8f00 ***************
[4912187.044742] [napi_gro_receive_entry] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044749] [dev_gro_receive ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044751] [__netif_receive_skb_core] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044753] [tpacket_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044758] [ip_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044760] [ip_rcv_core ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044762] [skb_clone ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044766] [nf_hook_slow ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *ipv4 in chain: PRE_ROUTING*
[4912187.044769] [nft_do_chain ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *iptables table:, chain:PREROUT*
[4912187.044772] [ip_rcv_finish ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044776] [ip_route_input_slow ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044781] [fib_validate_source ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044785] [ip_local_deliver ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044786] [nf_hook_slow ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *ipv4 in chain: INPUT*
[4912187.044787] [nft_do_chain ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *iptables table:, chain:INPUT*
[4912187.044789] [nft_do_chain ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *iptables table:, chain:INPUT*
[4912187.044791] [ip_local_deliver_finish] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044794] [tcp_v4_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044806] [tcp_child_process ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044810] [tcp_rcv_state_process] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *TCP socket state has changed*
[4912187.044813] [tcp_ack ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044818] [__kfree_skb ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044825] [packet_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A
[4912187.044827] [consume_skb ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:A *packet is freed (normally)*

***************** c22c8900,c22c8a00 ***************
[4912187.067611] [napi_gro_receive_entry] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067617] [dev_gro_receive ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067622] [__netif_receive_skb_core] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067624] [tpacket_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067628] [ip_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067630] [ip_rcv_core ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067631] [skb_clone ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067634] [nf_hook_slow ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R *ipv4 in chain: PRE_ROUTING*
[4912187.067636] [nft_do_chain ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R *iptables table:, chain:PREROUT*
[4912187.067639] [ip_rcv_finish ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067640] [ip_local_deliver ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067642] [nf_hook_slow ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R *ipv4 in chain: INPUT*
[4912187.067643] [nft_do_chain ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R *iptables table:, chain:INPUT*
[4912187.067644] [nft_do_chain ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R *iptables table:, chain:INPUT*
[4912187.067646] [ip_local_deliver_finish] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067648] [tcp_v4_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067650] [tcp_filter ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067651] [tcp_v4_do_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067653] [tcp_rcv_established ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067659] [__kfree_skb ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067685] [packet_rcv ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R
[4912187.067687] [consume_skb ] TCP: 172.26.137.131:22345 -> 172.26.137.130:8000 seq:1038081715, ack:3263478060, flags:R *packet is freed (normally)* // the RST packet is processed normally, no drop

Scenario 2: first two handshake steps, then immediately send a RST (with timestamp)

#nettrace -P 8000
// Scenario 2: first two handshake steps, then immediately send a RST (with timestamp): the RST is dropped
***************** c22c8900,c22c8300 *************** // SYN+ACK sent by port 8000
[4912183.233533] [__ip_local_out ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233535] [nf_hook_slow ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *ipv4 in chain: OUTPUT*
[4912183.233537] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *iptables table:, chain:OUTPUT*
[4912183.233538] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *iptables table:, chain:OUTPUT*
[4912183.233541] [ip_output ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233542] [nf_hook_slow ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *ipv4 in chain: POST_ROUTING*
[4912183.233543] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *iptables table:, chain:POSTROU*
[4912183.233546] [ip_finish_output ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233549] [ip_finish_output2 ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233552] [__dev_queue_xmit ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233555] [dev_hard_start_xmit ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *skb is successfully sent to the NIC driver*
[4912183.233557] [skb_clone ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233561] [tpacket_rcv ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA
[4912183.233565] [consume_skb ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *packet is freed (normally)*
[4912183.233581] [consume_skb ] TCP: 172.26.137.130:8000 -> 172.26.137.131:54321 seq:1579697129, ack:2754757913, flags:SA *packet is freed (normally)*

***************** c22c8000,c22c8c00 *************** // the client's RST arrives before the ACK
[4912183.252733] [napi_gro_receive_entry] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252741] [dev_gro_receive ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252743] [__netif_receive_skb_core] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252745] [tpacket_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252749] [ip_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252750] [ip_rcv_core ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252752] [skb_clone ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252757] [nf_hook_slow ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *ipv4 in chain: PRE_ROUTING*
[4912183.252759] [nft_do_chain ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *iptables table:, chain:PREROUT*
[4912183.252761] [ip_rcv_finish ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252765] [ip_route_input_slow ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252771] [fib_validate_source ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252773] [ip_local_deliver ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252775] [nf_hook_slow ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *ipv4 in chain: INPUT*
[4912183.252777] [nft_do_chain ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *iptables table:, chain:INPUT*
[4912183.252779] [nft_do_chain ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *iptables table:, chain:INPUT*
[4912183.252782] [ip_local_deliver_finish] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252783] [tcp_v4_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252789] [kfree_skb ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *tcp_v4_rcv+0x65* *packet is dropped by kernel* // dropped
[4912183.252792] [packet_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R
[4912183.252794] [consume_skb ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:R *packet is freed (normally)*

***************** c22c8900,c22c8200 ***************
[4912183.273690] [napi_gro_receive_entry] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273697] [dev_gro_receive ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273700] [__netif_receive_skb_core] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273701] [tpacket_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273705] [ip_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273707] [ip_rcv_core ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273708] [skb_clone ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273711] [nf_hook_slow ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *ipv4 in chain: PRE_ROUTING*
[4912183.273714] [nft_do_chain ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *iptables table:, chain:PREROUT*
[4912183.273716] [ip_rcv_finish ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273719] [ip_route_input_slow ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273724] [fib_validate_source ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273726] [ip_local_deliver ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273728] [nf_hook_slow ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *ipv4 in chain: INPUT*
[4912183.273733] [nft_do_chain ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *iptables table:, chain:INPUT*
[4912183.273735] [nft_do_chain ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *iptables table:, chain:INPUT*
[4912183.273737] [ip_local_deliver_finish] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273738] [tcp_v4_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273742] [__inet_lookup_listener] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273744] [tcp_filter ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273746] [tcp_v4_do_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273750] [tcp_rcv_state_process] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *TCP socket state has changed*
[4912183.273754] [tcp_v4_send_reset ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273798] [kfree_skb ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *tcp_v4_do_rcv+0x6c* *packet is dropped by kernel*
[4912183.273801] [packet_rcv ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A
[4912183.273803] [consume_skb ] TCP: 172.26.137.131:54321 -> 172.26.137.130:8000 seq:2754757913, ack:1579697130, flags:A *packet is freed (normally)*

Scenario 3: first two handshake steps, then immediately send a RST (without timestamp)

The trace shows that the RST is dropped and the handshake then fails.

***************** c22c8900,c22c8f00 ***************
[4912185.612533] [__ip_local_out ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612535] [nf_hook_slow ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *ipv4 in chain: OUTPUT*
[4912185.612536] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *iptables table:, chain:OUTPUT*
[4912185.612538] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *iptables table:, chain:OUTPUT*
[4912185.612539] [ip_output ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612541] [nf_hook_slow ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *ipv4 in chain: POST_ROUTING*
[4912185.612542] [nft_do_chain ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *iptables table:, chain:POSTROU*
[4912185.612544] [ip_finish_output ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612546] [ip_finish_output2 ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612547] [__dev_queue_xmit ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612550] [dev_hard_start_xmit ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *skb is successfully sent to the NIC driver*
[4912185.612552] [skb_clone ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612555] [tpacket_rcv ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA
[4912185.612558] [consume_skb ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *packet is freed (normally)*
[4912185.612573] [consume_skb ] TCP: 172.26.137.130:8000 -> 172.26.137.131:12345 seq:1983242893, ack:54243195, flags:SA *packet is freed (normally)*

***************** c22c8f00,c22c8800 ***************
[4912185.631719] [napi_gro_receive_entry] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631726] [dev_gro_receive ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631728] [__netif_receive_skb_core] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631730] [tpacket_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631734] [ip_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631736] [ip_rcv_core ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631737] [skb_clone ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631744] [nf_hook_slow ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *ipv4 in chain: PRE_ROUTING*
[4912185.631746] [nft_do_chain ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *iptables table:, chain:PREROUT*
[4912185.631748] [ip_rcv_finish ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631754] [ip_route_input_slow ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631759] [fib_validate_source ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631762] [ip_local_deliver ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631763] [nf_hook_slow ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *ipv4 in chain: INPUT*
[4912185.631765] [nft_do_chain ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *iptables table:, chain:INPUT*
[4912185.631767] [nft_do_chain ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *iptables table:, chain:INPUT*
[4912185.631770] [ip_local_deliver_finish] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631772] [tcp_v4_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631777] [kfree_skb ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *tcp_v4_rcv+0x65* *packet is dropped by kernel*
[4912185.631780] [packet_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R
[4912185.631783] [consume_skb ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:R *packet is freed (normally)*

***************** c22c8600,c22c8100 ***************
[4912185.648623] [napi_gro_receive_entry] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648630] [dev_gro_receive ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648632] [__netif_receive_skb_core] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648633] [tpacket_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648637] [ip_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648639] [ip_rcv_core ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648640] [skb_clone ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648643] [nf_hook_slow ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *ipv4 in chain: PRE_ROUTING*
[4912185.648645] [nft_do_chain ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *iptables table:, chain:PREROUT*
[4912185.648647] [ip_rcv_finish ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648650] [ip_route_input_slow ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648656] [fib_validate_source ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648659] [ip_local_deliver ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648660] [nf_hook_slow ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *ipv4 in chain: INPUT*
[4912185.648662] [nft_do_chain ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *iptables table:, chain:INPUT*
[4912185.648664] [nft_do_chain ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *iptables table:, chain:INPUT*
[4912185.648667] [ip_local_deliver_finish] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648672] [tcp_v4_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648677] [__inet_lookup_listener] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648679] [tcp_filter ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648681] [tcp_v4_do_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648685] [tcp_rcv_state_process] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *TCP socket state has changed*
[4912185.648689] [tcp_v4_send_reset ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648739] [kfree_skb ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *tcp_v4_do_rcv+0x6c* *packet is dropped by kernel*
[4912185.648741] [packet_rcv ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A
[4912185.648743] [consume_skb ] TCP: 172.26.137.131:12345 -> 172.26.137.130:8000 seq:54243195, ack:1983242894, flags:A *packet is freed (normally)*

None of the three scenarios above reproduced the problem, so we go on to construct scenario 4.

Scenario 4: timestamp does not increase

Make sure the TCP options contain a timestamp and that it does not increase. This finally reproduces the lingering connection:

// A TCP connection is left lingering on port 8000, although it was expected to be released by the RST
#netstat -ant |grep 8000
tcp 4 0 0.0.0.0:8000 0.0.0.0:* LISTEN
tcp 0 0 172.26.137.130:8000 172.26.137.131:19723 ESTABLISHED


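// The matching capture. Note that the server does not reply with a RST here; in the previous 3 scenarios port 8000 on the server always replied with a RST, so nothing lingered
// Lingering connection: ts is 0, the RST is ignored, and the connection lingers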
16:09:55.669693 IP 172.26.137.131.19723 > 172.26.137.130.8000: Flags [S], seq 12345, win 8192, options [TS val 1732608595 ecr 0,eol], length 0
16:09:55.669708 IP 172.26.137.130.8000 > 172.26.137.131.19723: Flags [S.], seq 3736478060, ack 12346, win 65160, options [mss 1460,nop,nop,TS val 2982589154 ecr 1732608595], length 0
16:09:55.687943 IP 172.26.137.131.19723 > 172.26.137.130.8000: Flags [R], seq 12346, win 8192, options [TS val 0 ecr 2982589154,eol], length 0
16:09:55.703896 IP 172.26.137.131.19723 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [TS val 1732608595 ecr 0,eol], length 0

//Lingering connection: ts does not increase, the RST is ignored, and the connection is left behind
17:18:26.739344 IP 172.26.137.131.59541 > 172.26.137.130.8000: Flags [S], seq 12345, win 8192, options [TS val 1732612706 ecr 0,eol], length 0
17:18:26.739358 IP 172.26.137.130.8000 > 172.26.137.131.59541: Flags [S.], seq 3510510105, ack 12346, win 65160, options [mss 1460,nop,nop,TS val 2986700224 ecr 1732612706], length 0
17:18:26.756574 IP 172.26.137.131.59541 > 172.26.137.130.8000: Flags [R], seq 12346, win 8192, options [mss 1460,TS val 1732611916 ecr 0,eol], length 0
17:18:26.870569 IP 172.26.137.131.59541 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [TS val 1732612706 ecr 0,eol], length 0
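
The RST variants used in this post differ only in their TCP options, so they can be generated from one small helper. The following is a minimal sketch, not the script from the repository linked earlier: `rst_options` and its mode names are hypothetical, and the scapy call in the trailing comment assumes the `source_port`, `target_port`, `syn_ack` and `ip` variables from the client snippet shown earlier.

```python
def rst_options(mode, syn_tsval):
    """Return a scapy-style TCP options list for the RST (hypothetical helper).

    mode selects one of the variants exercised in this post:
      "none"         -> no options at all
      "nop_only"     -> options present, but no Timestamp
      "ts_same"      -> Timestamp equal to the SYN's TSval
      "ts_zero"      -> Timestamp present with TSval 0
      "ts_backwards" -> Timestamp lower than the SYN's TSval
    """
    variants = {
        "none": [],
        "nop_only": [("NOP", None), ("NOP", None)],
        "ts_same": [("Timestamp", (syn_tsval, 0))],
        "ts_zero": [("Timestamp", (0, 0))],
        "ts_backwards": [("Timestamp", (syn_tsval - 1, 0))],
    }
    if mode not in variants:
        raise ValueError(f"unknown mode: {mode}")
    return variants[mode]


# Usage with scapy (not executed here; needs root privileges):
#   rst = TCP(sport=source_port, dport=target_port, flags="R",
#             seq=syn_ack.ack, options=rst_options("ts_backwards", syn_tsval))
#   send(ip/rst)   # send the RST first, then the ACK, to create the reordering
```

Only the "ts_zero" and "ts_backwards" variants correspond to the captures above in which the connection was left behind.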

Cases that do not leave a lingering connection:

//No lingering connection: ts does not go backwards (the RST carries the same TS val as the SYN)
16:24:33.516952 IP 172.26.137.131.19544 > 172.26.137.130.8000: Flags [S], seq 12345, win 8192, options [TS val 1732609473 ecr 0,eol], length 0
16:24:33.516967 IP 172.26.137.130.8000 > 172.26.137.131.19544: Flags [S.], seq 1834771950, ack 12346, win 65160, options [mss 1460,nop,nop,TS val 2983467001 ecr 1732609473], length 0
16:24:33.539178 IP 172.26.137.131.19544 > 172.26.137.130.8000: Flags [R], seq 12346, win 8192, options [TS val 1732609473 ecr 0,eol], length 0
16:24:33.556153 IP 172.26.137.131.19544 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [TS val 1732609473 ecr 0,eol], length 0
16:24:33.556164 IP 172.26.137.130.8000 > 172.26.137.131.19544: Flags [R], seq 1834771951, win 0, length 0

//No lingering connection: options are present but there is no ts
17:05:16.217333 IP 172.26.137.131.22567 > 172.26.137.130.8000: Flags [S], seq 12345, win 8192, options [TS val 1732611916 ecr 0,eol], length 0
17:05:16.217351 IP 172.26.137.130.8000 > 172.26.137.131.22567: Flags [S.], seq 3503286934, ack 12346, win 65160, options [mss 1460,nop,nop,TS val 2985909702 ecr 1732611916], length 0
17:05:16.229589 IP 172.26.137.131.22567 > 172.26.137.130.8000: Flags [R], seq 12346, win 8192, options [mss 1460], length 0
17:05:16.346564 IP 172.26.137.131.22567 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [TS val 1732611916 ecr 0,eol], length 0
17:05:16.346578 IP 172.26.137.130.8000 > 172.26.137.131.22567: Flags [R], seq 3503286935, win 0, length 0

//No lingering connection: options are empty
16:29:38.618811 IP 172.26.137.131.33190 > 172.26.137.130.8000: Flags [S], seq 12345, win 8192, options [TS val 1732609778 ecr 0,eol], length 0
16:29:38.618824 IP 172.26.137.130.8000 > 172.26.137.131.33190: Flags [S.], seq 1867663284, ack 12346, win 65160, options [mss 1460,nop,nop,TS val 2983772103 ecr 1732609778], length 0
16:29:38.647039 IP 172.26.137.131.33190 > 172.26.137.130.8000: Flags [R], seq 12346, win 8192, length 0
16:29:38.670061 IP 172.26.137.131.33190 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [TS val 1732609778 ecr 0,eol], length 0
16:29:38.670073 IP 172.26.137.130.8000 > 172.26.137.131.33190: Flags [R], seq 1867663285, win 0, length 0

//No lingering connection: options are present, but only nop in place of ts
17:37:37.476343 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [S], seq 2331525453, win 8192, options [mss 1460,nop,nop,TS val 1732613857 ecr 0], length 0
17:37:37.476460 IP 172.26.137.130.8000 > 172.26.137.131.12345: Flags [S.], seq 230155727, ack 2331525454, win 65160, options [mss 1460,nop,nop,TS val 2987850961 ecr 1732613857], length 0
17:37:37.494579 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [R], seq 2331525454, win 8192, options [nop,nop,eol], length 0
17:37:37.511431 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [nop,nop,TS val 1732613857 ecr 0], length 0
17:37:37.511546 IP 172.26.137.130.8000 > 172.26.137.131.12345: Flags [R], seq 230155728, win 0, length 0
17:37:37.526369 IP 172.26.137.131.12345 > 172.26.137.130.8000: Flags [R], seq 2331525454, win 8192, length 0
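
Comparing TS val by eye across many captures is error-prone, so the check can be scripted. The sketch below parses tcpdump text lines in the format shown above and reports whether a RST carries a TS val lower than the one on the client SYN. The function names are made up for illustration, and the plain `<` comparison deliberately ignores 32-bit wraparound.

```python
import re

# Match the flags field and, when present, the TS val inside options [...]
_FLAGS = re.compile(r"Flags \[([^\]]*)\]")
_TSVAL = re.compile(r"TS val (\d+)")


def parse_line(line):
    """Return (flags, tsval) for one tcpdump line; tsval is None without a TS option."""
    flags = _FLAGS.search(line)
    if flags is None:
        return None
    ts = _TSVAL.search(line)
    return flags.group(1), int(ts.group(1)) if ts else None


def rst_ts_went_backwards(lines):
    """True if a RST carries a TS val lower than the TS val seen on the client SYN."""
    syn_ts = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        flags, tsval = parsed
        if flags == "S":
            syn_ts = tsval
        elif flags == "R" and tsval is not None and syn_ts is not None:
            if tsval < syn_ts:
                return True
    return False
```

Fed with the captures in this post, it returns True only for the two lingering cases (ts 0 and ts lower than the SYN's) and False for the cases where the RST has no timestamp or repeats the SYN's value.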

Conclusion

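The necessary condition for reproducing the problem: while the connection is still in the three-way handshake (TCP_NEW_SYN_RECV), the kernel drops any RST that carries a timestamp whose value does not increase.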

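Notes:

  • If the RST's seq does not increase, the connection is also left behind; this is treated as seq wraparound // I could not find a counter in /proc/net/netstat that tracks this
  • Distinguish "no timestamp" from "timestamp is 0": a value of 0 means the option is present, and it is most likely treated as having wrapped around // scenarios 1-3 all overlooked this
  • options=[('NOP', None), ('NOP', None)] means there is no timestamp, and it does not reproduce the problem either
  • In scenarios 2/3/4 above, nettrace shows the RST being dropped every time, yet the connection is still released // still to be analyzed: why the RST takes effect on the connection while the RST packet itself is dropped
  • If connections are left behind, the accept queue also grows until it overflows
  • Once the handshake has completed (ESTABLISHED), only the RST's seq is checked for wraparound; the timestamp is not checked, so the connection is released correctly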
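//The handshake succeeded. Even if the RST's timestamp does not increase, the connection is still released correctly, i.e. in ESTABLISHED only the RST's seq is checked for wraparound and the timestamp is not checked
//The connection in the capture below was released correctly, which is why LVS relies on this logic to tear connections down; but as soon as packets are reordered it breaks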
12:19:58.588218 IP 172.26.137.131.1406 > 172.26.137.130.8000: Flags [S], seq 2800159571, win 8192, options [mss 1460,nop,nop,TS val 1732681198 ecr 0], length 0
12:19:58.588233 IP 172.26.137.130.8000 > 172.26.137.131.1406: Flags [S.], seq 3011503126, ack 2800159572, win 65160, options [mss 1460,nop,nop,TS val 3055192072 ecr 1732681198], length 0
12:19:58.606594 IP 172.26.137.131.1406 > 172.26.137.130.8000: Flags [.], ack 1, win 8192, options [nop,nop,TS val 1732681198 ecr 0], length 0
12:19:58.624392 IP 172.26.137.131.1406 > 172.26.137.130.8000: Flags [R], seq 2800159572, win 8192, options [nop,nop,TS val 0 ecr 0], length 0

The corresponding kernel commits

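While the server is in the third stage of the handshake (TCP_NEW_SYN_RECV), waiting for the peer's final ACK, the kernel runs a PAWS check on any RST it receives. If the timestamp (TSval) carried by the RST does not increase, the RST fails the PAWS check and is discarded.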
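
The comparison at the heart of PAWS can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the kernel code: `paws_rejects` is a hypothetical name, `ts_recent` stands for the TSval remembered from the client's SYN, and every other condition the kernel applies (such as how old `ts_recent` is) is left out.

```python
def s32(x):
    """Interpret x as a signed 32-bit integer, like a C (s32) cast."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def paws_rejects(ts_recent, rcv_tsval, saw_tstamp=True):
    """Simplified PAWS test: reject when the received TSval is older than ts_recent.

    Assumption: only the timestamp comparison is modelled; the kernel's
    additional conditions are ignored.
    """
    if not saw_tstamp:
        return False  # no Timestamp option, so PAWS is not applied
    # The signed difference makes the comparison robust to 32-bit wraparound.
    return s32(ts_recent - rcv_tsval) > 0
```

With the values from the captures, `paws_rejects(1732612706, 1732611916)` and `paws_rejects(1732608595, 0)` are both true, while a RST that repeats the SYN's TSval, or carries no timestamp at all, passes. This also shows why a TSval of 0 behaves like a timestamp that went backwards rather than like a missing one.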

How the problem was introduced: https://github.com/torvalds/linux/commit/7faee5c0d514162853a343d93e4a0b6bb8bfec21 removed TCP_SKB_CB(skb)->when = tcp_time_stamp. As a result, on 3.18 kernels the RST sent actively by a linger close carries ts_val 0.

How the problem was fixed: the fix is commit 675ee231d960af2af3606b4480324e26797eb010, which was not merged into the kernel until 4.10.

Monitoring

How can RSTs dropped during the handshake stage be monitored?

In the kernel source, the tcp_check_req function in net/ipv4/tcp_minisocks.c calls tcp_paws_reject on the packet to perform the PAWS check. If tcp_paws_reject returns true, tcp_check_req returns NULL and increments the LINUX_MIB_PAWSESTABREJECTED counter.

The counter to watch in /proc/net/netstat is PAWSEstab:

//The counter as defined in the kernel
SNMP_MIB_ITEM("PAWSEstab", LINUX_MIB_PAWSESTABREJECTED)

//Reproduced the lingering connection 5 times with a non-increasing RST timestamp; the counter went up by 1 on each attempt
TcpExt:PAWSEstab 1 -> 1 -> 1 -> 1 -> 1

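Although the three-way handshake has not completed, the connection is already ESTABLISHED on the server side, so the counter that is incremented is still PAWSEstab. It can be viewed with netstat -s: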

#netstat -s |grep -E -i "timestamp|paws"
71 packets rejected in established connections because of timestamp //whether it is a RST during the handshake or a request after the handshake succeeded, it is dropped whenever its timestamp does not increase

How the netstat source (net-tools) describes these counters:

{"PAWSEstab", N_("%llu packets rejected in established connections because of timestamp"), opt_number},
{"PAWSPassive", N_("%llu passive connections rejected because of time stamp"), opt_number},
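
For continuous monitoring it can be more convenient to read the counter straight from /proc/net/netstat than to grep netstat -s. Below is a minimal sketch assuming the usual layout of that file, where each protocol has a line of counter names followed by a line of values; `parse_netstat` and `read_paws_estab` are illustrative names, not part of any existing tool.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"TcpExt": {"PAWSEstab": 71, ...}, ...}."""
    result = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The file comes in pairs: a header line of names, then a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        result[prefix] = dict(zip(names.split(), map(int, nums.split())))
    return result


def read_paws_estab(path="/proc/net/netstat"):
    """Return the current TcpExt PAWSEstab counter (Linux only)."""
    with open(path) as f:
        return parse_netstat(f.read())["TcpExt"]["PAWSEstab"]
```

Sampling `read_paws_estab()` before and after a round of health checks and alerting on the difference would flag RSTs dropped by the PAWS check.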

Summary

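I have previously written a scapy introduction with worked cases in the Knowledge Planet group: "Reproducing network problems with scapy is great"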

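Just as learning English calls for close reading, analyzing a case calls for digging deep. Spend a week or two on one if needed, rather than fake-studying every day (seemingly reading everything and understanding it all at the time, then understanding none of it a few months later).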

Mastering the skill matters more than knowing the facts or the root cause of one particular problem.

nettrace is also really useful and fun, and it can teach you a lot about the kernel.

References

https://cloud.tencent.com/developer/article/2210423

Why your SYN packets get dropped: net.ipv4.tcp_tw_recycle

From a FIN stall problem to using scapy