mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 2
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 1
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 2
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 2
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 1
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 1
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 1
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 2
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 2
mysql: [Warning] Using a password on the command line interface can be insecure. Variable_name Value innodb_buffer_pool_size 17179869184 innodb_flush_log_at_trx_commit 2
// 将 50,000 个参数拆分为 50 批,每批 1,000 个 intbatchSize=1000; for (inti=0; i < ids.size(); i += batchSize) { List<Long> batch = ids.subList(i, Math.min(i + batchSize, ids.size())); // 执行查询 }
🗃️ 方案 4: 临时表方案
1 2 3 4
CREATE TEMPORARY TABLE temp_ids (id BIGINT); INSERT INTO temp_ids VALUES (?), (?), ...; SELECT*FROM large_table WHERE id IN (SELECT id FROM temp_ids); DROP TEMPORARY TABLE temp_ids;
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Application was streaming results when the connection failed. Consider raising value of 'net_write_timeout' on the server. at sun.reflect.GeneratedConstructorAccessor150.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3749) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3649) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4090) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:972) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:2123) at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:374) at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:354) at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:155) at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:6726) at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:865) at com.alibaba.druid.pool.DruidPooledResultSet.close(DruidPooledResultSet.java:86)
The number of seconds to wait for a block to be written to a connection before aborting the write. 只针对执行查询中的等待超时,网络不好,tcp buffer满了(应用迟迟不读走数据)等容易导致mysql server端报net_write_timeout错误,指的是mysql server hang在那里长时间无法发送查询结果。
如下图红框所示的地方可以看到MySQL Server 传着传着居然带了个 fin 包在里面,表示MySQL Server要断开连接了,无奈Client只能也发送quit 断开连接。红框告诉我们一个无比有力的证据MySQL Server 在不应该断开的地方断开了连接,问题在 MySQL Server 端
Application was streaming results when the connection failed. Consider raising value of ‘net_write_timeout’ on the server. - com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Application was streaming results when the connection failed. Consider raising value of ‘net_write_timeout’ on the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Application was streaming results when the connection failed. Consider raising value of 'net_write_timeout' on the server. at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3559) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3459) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3900) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:873) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1996) at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:374) at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:354) at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:6312) at Test.main(Test.java:38) Caused by: java.io.EOFException: Can not read response from server. Expected to read 8 bytes, read 3 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3011) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3519) ... 8 more
JDBC 驱动对这个错误有如下提示(坑人):
Application was streaming results when the connection failed. Consider raising value of ‘net_write_timeout’ on the server. - com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Application was streaming results when the connection failed. Consider raising value of ‘net_write_timeout’ on the server.
实验中的一些说明:
netTimeoutForStreamingResults=1 表示设置 net_write_timeout 为 1 秒,客户端会发送 set net_write_timeout=1 给数据库
2024-11-28T14:33:03.447397Z 12 [Note] Aborted connection 12 to db: 'test' user: 'root' host: '172.26.137.130' (Got timeout writing communication packets)
此时客户端还慢悠悠地读,RDS 没有回任何错误信息给客户端,客户端读完所有 Response 然后直接读到连接断开就报 Consider raising value of ‘net_write_timeout’ on the server 了,如果客户端读的慢,比如要 10 分钟实际连接在 RDS 上 10 分钟前就进入 fin 了,但是 10 分钟后客户端才报错
long start = System.currentTimeMillis(); ResultSet rs = stmt.executeQuery(sql); int count=0; while (rs.next()) { System.out.println("id:"+rs.getInt("id")+" count:"+count); count++; if(count<3) //1 秒后数据库端连接就已经关闭了,但是因为客户端读得慢,需要不 sleep 后才能读到 fin 然后报错,所以报错可以比实际晚很久 Thread.sleep(1500); } rs.close(); stmt.close(); Thread.sleep(Long.valueOf(interval)); break; } conn.close(); } catch (SQLException e) { e.printStackTrace(); } } }
Consider raising value of ‘net_write_timeout’ 这个报错数据库端不会返回任何错误码给客户端,只是发 fin 断开连接,对客户端来说这条连接是 net_write_timeout 超时了 还是 被kill(或者其他原因) 是没法区分的,所以不管什么原因,只要连接异常 MySQL JDBC Driver 就抛 net_write_timeout 错误
可以将 netTimeoutForStreamingResults 设为 0 或者 100,然后在中途 kill 掉 MySQL 上的 SQL,你也会在客户端看到同样的错误, kill SQL 是在 MySQL 的报错日志中都是同样的:
1
2024-11-28T07:33:12.967012Z 23 [Note] Aborted connection 23 to db: 'test' user: 'root' host: '172.26.137.130' (Got an error writing communication packets)
所以你看一旦客户端出现这个异常堆栈,除了抓包似乎没什么好办法,其实抓包也只能抓到数据库主动发了 fin 什么原因还是不知道,我恨这个没有错误码一统江湖的报错
net_write_timeout 后 RDS 直接发 fin(有时 fin 前面还有一堆 response 包也在排队),然后 rds 日志先报错:2024-11-28T06:33:03.447397Z 12 [Note] Aborted connection 12 to db: ‘test’ user: ‘root’ host: ‘172.26.137.130’ (Got timeout writing communication packets)
客户端慢悠悠地读,RDS 没有传任何错误信息给客户端,客户端读完所有 response 然后直接读到连接断开就报 Consider raising value of ‘net_write_timeout’ on the server 了,如果客户端读的慢,比如要 10 分钟实际连接在 RDS 上 10 分钟前就进入 fin 了,但是 10 分钟后客户端才报错
#netstat -anto |grep -E "Recv|33864|3001|33077" Proto Recv-Q Send-Q Local Address Foreign Address State Timer tcp 0 248 127.0.0.1:33864 127.0.0.1:3001 ESTABLISHED probe (33.48/0/8) tcp6 0 11 127.0.0.1:3307 127.0.0.1:33864 ESTABLISHED on (49.03/13/0)
直到 900多秒后 OS 重试了15次发现都失败,于是向业务/Sysbench 返回连接异常,触发业务/Sysbench 释放异常连接重建新连接,新连接指向了新的 Master 3306,业务恢复正常
1 2 3 4 5 6 7
[ 957s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s 0.00 reconn/s: 0.00 DEBUG: Ignoring error 2013 Lost connection to MySQL server during query, DEBUG: Reconnecting DEBUG: Reconnected [ 958s ] thds: 1 tps: 53.00 qps: 950.97 (r/w/o: 741.98/208.99/0.00) lat (ms,95%): 30.26 err/s 0.00 reconn/s: 1.00 [ 959s ] thds: 1 tps: 64.00 qps: 1154.03 (r/w/o: 896.02/258.01/0.00) lat (ms,95%): 22.69 err/s 0.00 reconn/s: 0.00 [ 960s ] thds: 1 tps: 66.00 qps: 1184.93 (r/w/o: 923.94/260.98/0.00) lat (ms,95%): 25.28 err/s 0.00 reconn/s: 0.00
A 长连接 访问B 服务,B服务到A网络不通,假如B发生HA,一般会先Reset/断开B上所有连接(比如 MySQL 会去kill 所有processlist;比如重启MySQL——假如这里的B是MySQL),但是因为网络不通这里的reset、fin网络包都无法到达A,所以B是无法兜底这个异常场景, A无法感知B不可用了,会使用旧连接大约15分钟
最可怕的是 B 服务不响应,B所在的OS 还在响应,那么在A的视角 网络是正常的,这时只能A自己来通过超时兜底
tcp_retries1 - INTEGER This value influences the time, after which TCP decides, that something is wrong due to unacknowledged RTO retransmissions, and reports this suspicion to the network layer. See tcp_retries2 for more details.
RFC 1122 recommends at least 3 retransmissions, which is the default.
tcp_retries2 - INTEGER This value influences the timeout of an alive TCP connection, when RTO retransmissions remain unacknowledged. Given a value of N, a hypothetical TCP connection following exponential backoff with an initial RTO of TCP_RTO_MIN would retransmit N times before killing the connection at the (N+1)th RTO.
The default value of 15 yields a hypothetical timeout of 924.6 seconds and is a lower bound for the effective timeout. TCP will effectively time out at the first RTO which exceeds the hypothetical timeout.
RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8.
#./mlc Intel(R) Memory Latency Checker - v3.9 Measuring idle latencies (in ns)... Numa node Numa node 0 0 145.8 //多次测试稳定都是145纳秒
Measuring Peak Injection Memory Bandwidths for the system Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) Using all the threads from each core if Hyper-threading is enabled Using traffic with the following read-write ratios ALL Reads : 110598.7 3:1 Reads-Writes : 93408.5 2:1 Reads-Writes : 89249.5 1:1 Reads-Writes : 64137.3 Stream-triad like: 77310.4
Measuring Memory Bandwidths between nodes within system Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) Using all the threads from each core if Hyper-threading is enabled Using Read-only traffic type Numa node Numa node 0 0 110598.4
Measuring Loaded Latencies for the system Using all the threads from each core if Hyper-threading is enabled Using Read-only traffic type Inject Latency Bandwidth Delay (ns) MB/sec ========================== 00000 506.00 111483.5 00002 505.74 112576.9 00008 505.87 112644.3 00015 508.96 112643.6 00050 574.36 112701.5
当两个参数都为 on 时的mlc 测试结果:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#./mlc Intel(R) Memory Latency Checker - v3.9 Measuring idle latencies (in ns)... Numa node Numa node 0 1 0 81.6 145.9 1 144.9 81.2
Measuring Peak Injection Memory Bandwidths for the system Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec) Using all the threads from each core if Hyper-threading is enabled Using traffic with the following read-write ratios ALL Reads : 227204.2 3:1 Reads-Writes : 212432.5 2:1 Reads-Writes : 210423.3 1:1 Reads-Writes : 196677.2 Stream-triad like: 189691.4