[Bug 511] New: Premature ip_conntrack timer expiry on 3+ window size advertisements

bugzilla-daemon at bugzilla.netfilter.org bugzilla-daemon at bugzilla.netfilter.org
Fri Sep 15 05:57:31 CEST 2006


https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=511

           Summary: Premature ip_conntrack timer expiry on 3+ window size
                    advertisements
           Product: netfilter/iptables
           Version: linux-2.6.x
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: ip_conntrack
        AssignedTo: laforge at netfilter.org
        ReportedBy: georgeh at anstat.com.au


The Linux firewall loses track of valid TCP connections because the
ip_conntrack timer expires prematurely.

This can leave one end of the TCP connection open indefinitely, waiting
for more data. The firewall drops that data because the ip_conntrack
timer has already expired.

Example: A Tomcat server which supplies data to a front-end Apache
server typically limits the number of "threads" it allows.
These threads correspond to open TCP/IP connections.
TCP/IP connections affected by this problem stay open and accumulate
over time. Eventually the Tomcat server reaches its thread limit and
stops serving requests from the Apache server.

This problem is caused by insufficient criteria for "dead-peer" detection.
The current algorithm is: if 3 of the "same" packets are seen in one
direction, with no traffic in the other direction, the receiver is treated
as a "dead-peer" and the ip_conntrack timer is reset to 5 minutes (down from
the 5 days normally used for ESTABLISHED connections).

Packets count as the "same" when their TCP sequence number and
acknowledgement number match; the TCP window size advertisement is not
compared. A peer that sends several pure ACKs which differ only in the
advertised window is therefore mistaken for a sender retransmitting to a
dead peer.
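
The check described above can be sketched in plain C as follows. This is only
an illustration, not the kernel source: the struct name `tcp_track`, the
function `track_packet()` and the three constants are hypothetical, and the
threshold of 3 and the two timeout values are assumptions taken from the
description and the extract in this report.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-connection TCP state. */
struct tcp_track {
    int      last_dir;  /* direction of the last packet; -1 = none yet */
    uint32_t last_seq;  /* last sequence number seen in that direction */
    uint32_t last_ack;  /* last acknowledgement number seen */
    uint32_t last_end;  /* last seq + payload length */
    unsigned retrans;   /* consecutive "same" packets seen so far */
};

#define MAX_RETRANS          3        /* assumed threshold */
#define TIMEOUT_ESTABLISHED  432000u  /* 5 days, in seconds (assumed) */
#define TIMEOUT_MAX_RETRANS  300u     /* 5 minutes, in seconds */

/*
 * Record one packet and return the timeout (in seconds) that the
 * conntrack entry would be given afterwards.
 */
static unsigned track_packet(struct tcp_track *st, int dir,
                             uint32_t seq, uint32_t ack,
                             uint32_t len, uint16_t win)
{
    uint32_t end = seq + len;

    (void)win;  /* the advertised window plays no part in the comparison */

    if (st->last_dir == dir && st->last_seq == seq &&
        st->last_ack == ack && st->last_end == end) {
        st->retrans++;          /* looks like a retransmission */
    } else {
        st->last_dir = dir;
        st->last_seq = seq;
        st->last_ack = ack;
        st->last_end = end;
        st->retrans = 0;
    }

    return st->retrans >= MAX_RETRANS ? TIMEOUT_MAX_RETRANS
                                      : TIMEOUT_ESTABLISHED;
}
```

With a state initialised to `last_dir = -1`, feeding this function four pure
ACKs that share seq and ack but advertise four different windows returns the
5 minute timeout on the fourth call, which is the behaviour this report
describes.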

Sample occurrence:

  Extract from /proc/net/ip_conntrack (first column: time of the sample;
  second column: remaining ip_conntrack timer in seconds). The timer drops
  from about 5 days to about 5 minutes at 08:47:01.920.

    08:46:51.811 431999 ESTABLISHED src=192.168.81.245 dst=192.168.49.30 sport=48610 dport=8009
    08:47:01.532 431998 ESTABLISHED src=192.168.81.245 dst=192.168.49.30 sport=48610 dport=8009
    08:47:01.920 000299 ESTABLISHED src=192.168.81.245 dst=192.168.49.30 sport=48610 dport=8009
    08:47:02.925 000298 ESTABLISHED src=192.168.81.245 dst=192.168.49.30 sport=48610 dport=8009
    08:47:03.919 000297 ESTABLISHED src=192.168.81.245 dst=192.168.49.30 sport=48610 dport=8009
    ...
    08:47:56.922 431999 ESTABLISHED src=192.168.81.245 dst=192.168.49.30 sport=48610 dport=8009

  Extract from tcpdump
  
    08:46:59.997669 IP tomcat-server.8009 > apache-server.48610: . 347295:348743(1448) ack 889 win 5896 <nop,nop,timestamp 3345910565 508393589>
    08:46:59.997681 IP tomcat-server.8009 > apache-server.48610: . 348743:350191(1448) ack 889 win 5896 <nop,nop,timestamp 3345910565 508393589>
    08:46:59.997693 IP tomcat-server.8009 > apache-server.48610: . 350191:351639(1448) ack 889 win 5896 <nop,nop,timestamp 3345910565 508393589>
    08:46:59.997702 IP tomcat-server.8009 > apache-server.48610: P 351639:352129(490) ack 889 win 5896 <nop,nop,timestamp 3345910565 508393589>
    08:47:00.035237 IP apache-server.48610 > tomcat-server.8009: . ack 352129 win 2406 <nop,nop,timestamp 508393593 3345910565>
    08:47:00.162163 IP apache-server.48610 > tomcat-server.8009: . ack 352129 win 10136 <nop,nop,timestamp 508393605 3345910565>
    08:47:00.515958 IP apache-server.48610 > tomcat-server.8009: . ack 352129 win 24616 <nop,nop,timestamp 508393641 3345910565>
    08:47:01.915395 IP apache-server.48610 > tomcat-server.8009: . ack 352129 win 56472 <nop,nop,timestamp 508393780 3345910565>
    08:47:56.910459 IP apache-server.48610 > tomcat-server.8009: P 889:1636(747) ack 352129 win 63712 <nop,nop,timestamp 508399279 3345910565>
    08:47:56.910484 IP tomcat-server.8009 > apache-server.48610: . ack 1636 win 5896 <nop,nop,timestamp 3345967478 508399279>
    08:47:57.464866 IP tomcat-server.8009 > apache-server.48610: P 352129:352216(87) ack 1636 win 5896 <nop,nop,timestamp 3345968032 508399279>



This problem occurs frequently in real life, but it is difficult to reproduce
"at will".

I suspect that another symptom of this problem is dropped "FIN" packets on a
firewall. Once the ip_conntrack timer has expired and the client or server
tries to close the connection (without more data being transferred from the
client), the FIN packet is classified as "INVALID" and dropped.

To show any "lost" TCP connections, use:

netstat -na | gawk '/ESTAB/ {
    local = $4;  gsub(".*:", "", local)     # keep only the local port
    remote = $5; gsub(".*:", "", remote)    # keep only the remote port
    # grep exits with 0 when a matching conntrack entry exists
    result = system("grep -q port=" local ".\\*port=" remote " /proc/net/ip_conntrack")
    print local "/" remote " = " (result ? "not found" : "found")
}'

Note that the busier the Apache and Tomcat servers get, the less likely they are
to suffer from this problem.

A possible patch would be as follows, but it has not been tested because the
problem is difficult to reproduce under controlled conditions
(any suggestions are welcome).

--- /usr/src/linux-2.6.17.13/include/linux/netfilter/nf_conntrack_tcp.h.orig   2006-09-09 13:23:25.000000000 +1000
+++ /usr/src/linux-2.6.17.13/include/linux/netfilter/nf_conntrack_tcp.h   2006-09-15 11:18:28.000000000 +1000
@@ -49,6 +49,7 @@
        u_int32_t       last_seq;       /* Last sequence number seen in dir */
        u_int32_t       last_ack;       /* Last sequence number seen in opposite dir */
        u_int32_t       last_end;       /* Last seq + len */
+       u_int16_t       last_win;       /* Last window advertisement seen in dir */
 };

 #endif /* __KERNEL__ */
--- /usr/src/linux-2.6.17.13/net/ipv4/netfilter/ip_conntrack_proto_tcp.c.orig   2006-09-09 13:23:25.000000000 +1000
+++ /usr/src/linux-2.6.17.13/net/ipv4/netfilter/ip_conntrack_proto_tcp.c   2006-09-15 11:16:07.000000000 +1000
@@ -732,12 +732,14 @@
                        if (state->last_dir == dir
                            && state->last_seq == seq
                            && state->last_ack == ack
+                           && state->last_win == win
                            && state->last_end == end)
                                state->retrans++;
                        else {
                                state->last_dir = dir;
                                state->last_seq = seq;
                                state->last_ack = ack;
+                               state->last_win = win;
                                state->last_end = end;
                                state->retrans = 0;
                        }
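
To illustrate what the extra `last_win` comparison is meant to change, here is
a small userspace sketch that replays the four pure ACKs from the tcpdump
extract through a comparison that includes the window. Like the patch, it is
not tested against a real kernel. The names `seen_pkt`, `same_packet()`,
`max_retrans()` and `trace_acks` are hypothetical, and the sequence numbers
are the relative values printed by tcpdump, not the values on the wire.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the fields the patched check compares. */
struct seen_pkt {
    int      dir;  /* 0 = tomcat -> apache, 1 = apache -> tomcat */
    uint32_t seq;
    uint32_t ack;
    uint32_t end;  /* seq + payload length */
    uint16_t win;  /* advertised window */
};

/* Patched comparison: the advertised window has to match as well. */
static int same_packet(const struct seen_pkt *a, const struct seen_pkt *b)
{
    return a->dir == b->dir && a->seq == b->seq && a->ack == b->ack
        && a->win == b->win && a->end == b->end;
}

/* Replay pkts[0..n) and return the highest retrans count reached. */
static unsigned max_retrans(const struct seen_pkt *pkts, size_t n)
{
    unsigned retrans = 0, peak = 0;

    for (size_t i = 1; i < n; i++) {
        /* A differing packet resets the counter, as in the else branch. */
        retrans = same_packet(&pkts[i - 1], &pkts[i]) ? retrans + 1 : 0;
        if (retrans > peak)
            peak = retrans;
    }
    return peak;
}

/* The four ACKs from 08:47:00.035237 to 08:47:01.915395 in the extract. */
static const struct seen_pkt trace_acks[] = {
    { 1, 889, 352129, 889,  2406 },
    { 1, 889, 352129, 889, 10136 },
    { 1, 889, 352129, 889, 24616 },
    { 1, 889, 352129, 889, 56472 },
};
```

Calling `max_retrans(trace_acks, 4)` yields 0, so under this sketch the window
updates would no longer be counted as retransmissions, while four packets that
are identical in every field, window included, still reach a count of 3.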
