<html>
<head>
<base href="https://bugzilla.netfilter.org/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged"
href="https://bugzilla.netfilter.org/show_bug.cgi?id=1743">1743</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
</td>
</tr>
<tr>
<th>Product</th>
<td>nftables
</td>
</tr>
<tr>
<th>Version</th>
<td>1.0.x
</td>
</tr>
<tr>
<th>Hardware</th>
<td>x86_64
</td>
</tr>
<tr>
<th>OS</th>
<td>other
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P5
</td>
</tr>
<tr>
<th>Component</th>
<td>kernel
</td>
</tr>
<tr>
<th>Assignee</th>
<td>pablo@netfilter.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>tim@muppetz.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=739" name="attach_739" title="Session where Conntrack Changed to 300">attachment 739</a> <a href="attachment.cgi?id=739&action=edit" title="Session where Conntrack Changed to 300">[details]</a></span>
Session where Conntrack Changed to 300
Kernel: 6.6.21
I have a TCP flow between an Android phone and Google's Firebase Cloud
Messaging (FCM). FCM uses TCP port 5228 and is a very low-traffic connection:
it can be up to 28 minutes before a keepalive packet passes over it. It
is used for push messaging (and probably a lot of other things too).
First, with Flowtable disabled, I watch the FCM flow in conntrack like this:
watch -n 1 "sudo conntrack -L -p TCP -s 192.168.0.128 -d 142.251.12.188 --dport 5228"
I will quite often see the flow's conntrack timeout change from ~432000 down to
300. To determine whether this was nf_conntrack_tcp_timeout_unacknowledged or
nf_conntrack_tcp_timeout_max_retrans, I altered both sysctl entries and found
that with nf_conntrack_tcp_timeout_unacknowledged set to 400, the timeout
changes to 400 seconds instead of 300.
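
The reasoning behind that test can be sketched as follows. Both sysctls
default to 300, so an observed timeout of 300 matches either one; changing one
of them to 400 makes the match unambiguous. This is only an illustration: the
helper name matching_sysctls is hypothetical, and the values are typed in by
hand rather than read from the system.

```python
def matching_sysctls(observed, sysctls):
    """Return the sorted names of sysctls whose value equals the observed timeout."""
    return sorted(name for name, value in sysctls.items() if value == observed)


# Kernel defaults for the two candidate sysctls (both 300 seconds).
defaults = {
    "nf_conntrack_tcp_timeout_unacknowledged": 300,
    "nf_conntrack_tcp_timeout_max_retrans": 300,
}

# With the defaults, an observed 300 matches both names, so it is ambiguous.
print(matching_sysctls(300, defaults))

# After setting only the unacknowledged timeout to 400, an observed 400
# can match just that one sysctl.
changed = dict(defaults, nf_conntrack_tcp_timeout_unacknowledged=400)
print(matching_sysctls(400, changed))
```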
So my first question is: why is a flow in the Established state changing to
the unacknowledged timeout? It only changes for a second, though; then I
assume another packet comes in and the timeout jumps back to 5 days.
This appears odd to me, but it is probably normal behaviour that I just don't
understand.
[I tested with OpenWrt on kernel 5.15.150 (also using nftables) and it does
the same thing: the conntrack timeout drops to 300 for a second before
bouncing back to $nf_conntrack_tcp_timeout_established, so this must be
expected behaviour.]
My real issue comes about when I enable Flow Offload. With the same sort of
packet flow, I see the following:
The flow enters the OFFLOAD state in conntrack. When it comes out of OFFLOAD
it has one of 3 timeouts:
A timeout of ~432000 (seems odd, I expect ~86400)
A timeout of ~86400 (this is what I expect)
A timeout of 300 ($nf_conntrack_tcp_timeout_unacknowledged) minus anywhere up
to 30 seconds. Values like 260, 274 and 283 are all ones I've seen.
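
A minimal sketch of how logged conntrack lines could be sorted into these
three cases, for anyone wanting to count how often each one occurs. Everything
here is an assumption for illustration: the helper names, the bucket labels,
the 60-second slack, and the assumed field layout (protocol name, protocol
number, timeout, with the timeout missing while an entry is offloaded). The
sample line is shortened and made up.

```python
# Illustrative only: bucket labels and the 60-second slack are assumptions.
BUCKETS = [
    # (label, nominal timeout in seconds)
    ("established (5 days)", 432000),
    ("1 day", 86400),
    ("unacknowledged", 300),
]
SLACK = 60  # how far below the nominal value still counts as a match


def parse_timeout(line):
    """Return the timeout from one `conntrack -L` line, or None if absent."""
    fields = line.split()
    # Assumed layout: protocol name, protocol number, timeout, then the rest.
    third = fields[2:3]
    if third and third[0].isdigit():
        return int(third[0])
    return None


def classify_timeout(seconds):
    """Map a timeout in seconds to one of the bucket labels, else 'other'."""
    for label, nominal in BUCKETS:
        if seconds in range(nominal - SLACK, nominal + 1):
            return label
    return "other"


# Shortened, made-up sample line in the assumed layout.
sample = "tcp 6 274 ESTABLISHED src=192.168.0.128 dst=142.251.12.188 dport=5228"
print(classify_timeout(parse_timeout(sample)))
```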
A major problem comes about when the flow comes out of OFFLOAD with the
nf_conntrack_tcp_timeout_unacknowledged timeout of ~300. Because there is so
little traffic on this session, the entry will often age out and leave the
conntrack table. When this happens, the FCM session dies and Android devices
on the network no longer receive push messages until they are woken up,
realise the session is dead and establish a new one.
Attached is a tcpdump of a Google FCM session where I saw the timeout drop to
$nf_conntrack_tcp_timeout_unacknowledged at approx packet 23.
I have tried watching conntrack with -E, but I see no events generated for
this session when the timeouts are changing.
Other details:
This is happening on a VyOS 1.4.0-epa2 release router.
My WAN interface is a PPPoE interface and my LAN interface is an Ethernet
interface (virtio; the router is virtualised).
There are two patches in the VyOS kernel that are "non-standard". I have
looked at them and I can't see how they could interfere with Offload. Here is
the link to them:
<a href="https://github.com/vyos/vyos-build/tree/sagitta/packages/linux-kernel/patches/kernel">https://github.com/vyos/vyos-build/tree/sagitta/packages/linux-kernel/patches/kernel</a>
Please let me know what other details I can provide that might help locate the
issue.
Thank you very much.
Tim</pre>
</div>
</p>
</body>
</html>