[Bug 1766] New: nfqueue randomly drops packets with same tuple

bugzilla-daemon at netfilter.org
Mon Aug 26 11:08:49 CEST 2024


https://bugzilla.netfilter.org/show_bug.cgi?id=1766

            Bug ID: 1766
           Summary: nfqueue randomly drops packets with same tuple
           Product: netfilter/iptables
           Version: unspecified
          Hardware: x86_64
                OS: All
            Status: NEW
          Severity: major
          Priority: P5
         Component: netfilter hooks
          Assignee: netfilter-buglog at lists.netfilter.org
          Reporter: antonio.ojea.garcia at gmail.com

I have been puzzled by this problem for a long time. It was first reported in
https://github.com/kubernetes-sigs/kube-network-policies/issues/12 and is
now also reported in https://github.com/kubernetes-sigs/kind/issues/3713

The symptom looks the same as the one described in
https://www.spinics.net/lists/netfilter/msg58296.html but that issue seems to
have been fixed back in the day.

I was able to narrow down the scenario. I will translate the Kubernetes
constructs into namespaces and nodes to describe it better.

2 nodes: N1 and N2

N1 contains two containers:
  - client C1 (10.244.1.3)
  - DNS server D1 (10.244.1.5)
N2 contains the second DNS server D2 (10.244.2.4)

There is one rule that sends packets to nfqueue in the postrouting hook,
although the problem also happened with other hooks before. We can assume
the set matches the packet and that the nfqueue userspace program always
accepts the packet.

>       chain postrouting {
>                type filter hook postrouting priority srcnat - 5; policy accept;
>                icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert } accept
>                meta skuid 0 accept
>                ct state established,related accept
>                ip saddr @podips-v4 queue flags bypass to 100 comment "process IPv4 traffic with network policy enforcement"
>                ip daddr @podips-v4 queue flags bypass to 100 comment "process IPv4 traffic with network policy enforcement"
>                ip6 saddr @podips-v6 queue flags bypass to 100 comment "process IPv6 traffic with network policy enforcement"
>                ip6 daddr @podips-v6 queue flags bypass to 100 comment "process IPv6 traffic with network policy enforcement"
>        }


The containerized DNS servers are abstracted via DNAT behind the virtual IP 10.96.0.10


>  meta l4proto udp ip daddr 10.96.0.10  udp dport 53 counter packets 0 bytes 0 jump KUBE-SVC-TCOU7JCQXEZGVUNU

>       chain KUBE-SVC-TCOU7JCQXEZGVUNU {
>                meta l4proto udp ip saddr != 10.244.0.0/16 ip daddr 10.96.0.10  udp dport 53 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
>                 meta random & 2147483647 < 1073741824 counter packets 38 bytes 2280 jump KUBE-SEP-CEYPGFB7VCORONY3
>                 counter packets 32 bytes 1920 jump KUBE-SEP-RJHMR3QLYGJVBWVL
>        }

>        chain KUBE-SEP-CEYPGFB7VCORONY3 {
>                ip saddr 10.244.1.5  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
>                meta l4proto udp   counter packets 38 bytes 2280 dnat to 10.244.1.5:53
>        }

C1 sends a DNS request to the virtual IP 10.96.0.10. Because of the
Happy Eyeballs behavior, it sends two packets with the same tuple, one
for each record type, A and AAAA.
The symptom is that one of the replies does not come back. In the
tcpdump trace, both queries go out at 22:49:07 but only the A answer
comes back; the client retries the AAAA query at 22:49:10 and this time
the answer comes back.

22:49:07.632846 vetha5c90841 In  IP (tos 0x0, ttl 64, id 52468, offset
0, flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.96.0.10.53: 60169+ A? www.google.com. (32)
22:49:07.632909 vetha5c90841 In  IP (tos 0x0, ttl 64, id 52469, offset
0, flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.96.0.10.53: 60459+ AAAA? www.google.com. (32)
22:49:07.633080 veth271ea3e0 Out IP (tos 0x0, ttl 63, id 52468, offset
0, flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.244.1.5.53: 60169+ A? www.google.com. (32)
22:49:07.633210 eth0  Out IP (tos 0x0, ttl 63, id 52469, offset 0,
flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.244.1.5.53: 60459+ AAAA? www.google.com. (32)
22:49:07.633352 eth0  In  IP (tos 0x0, ttl 62, id 52469, offset 0,
flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.244.1.5.53: 60459+ AAAA? www.google.com. (32)
22:49:07.653981 veth271ea3e0 In  IP (tos 0x0, ttl 64, id 28750, offset
0, flags [DF], proto UDP (17), length 240)
    10.244.1.5.53 > 10.244.1.3.48199: 60169 6/0/0 www.google.com. A
172.217.218.104, www.google.com. A 172.217.218.99, www.google.com. A
172.217.218.106, www.google.com. A 172.217.218.147, www.google.com. A
172.217.218.105, www.google.com. A 172.217.218.103 (212)
22:49:07.654012 vetha5c90841 Out IP (tos 0x0, ttl 63, id 28750, offset
0, flags [DF], proto UDP (17), length 240)
    10.96.0.10.53 > 10.244.1.3.48199: 60169 6/0/0 www.google.com. A
172.217.218.104, www.google.com. A 172.217.218.99, www.google.com. A
172.217.218.106, www.google.com. A 172.217.218.147, www.google.com. A
172.217.218.105, www.google.com. A 172.217.218.103 (212)
22:49:10.135710 vetha5c90841 In  IP (tos 0x0, ttl 64, id 52470, offset
0, flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.96.0.10.53: 60459+ AAAA? www.google.com. (32)
22:49:10.135740 veth271ea3e0 Out IP (tos 0x0, ttl 63, id 52470, offset
0, flags [DF], proto UDP (17), length 60)
    10.244.1.3.48199 > 10.244.1.5.53: 60459+ AAAA? www.google.com. (32)
22:49:10.136635 veth271ea3e0 In  IP (tos 0x0, ttl 64, id 28842, offset
0, flags [DF], proto UDP (17), length 228)
    10.244.1.5.53 > 10.244.1.3.48199: 60459 4/0/0 www.google.com. AAAA
2a00:1450:4013:c08::6a, www.google.com. AAAA 2a00:1450:4013:c08::67,
www.google.com. AAAA 2a00:1450:4013:c08::63, www.google.com. AAAA
2a00:1450:4013:c08::68 (200)
22:49:10.136669 vetha5c90841 Out IP (tos 0x0, ttl 63, id 28842, offset
0, flags [DF], proto UDP (17), length 228)
    10.96.0.10.53 > 10.244.1.3.48199: 60459 4/0/0 www.google.com. AAAA
2a00:1450:4013:c08::6a, www.google.com. AAAA 2a00:1450:4013:c08::67,
www.google.com. AAAA 2a00:1450:4013:c08::63, www.google.com. AAAA
2a00:1450:4013:c08::68 (200)
^C
23 packets captured
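
To reproduce the pair of queries seen at 22:49:07 without a full
resolver, a minimal Python sketch could send an A and an AAAA query
back-to-back from one UDP socket, so that both share the same 5-tuple.
The function names, transaction IDs and timeout are hypothetical choices
for illustration and are not part of the original setup; the server
address would be the virtual IP 10.96.0.10.

```python
import socket
import struct


def build_query(txid: int, name: str, qtype: int) -> bytes:
    """Build a minimal DNS query: one question, recursion desired, class IN."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def send_pair(server: str, name: str, port: int = 53, timeout: float = 2.0) -> int:
    """Send A and AAAA queries from the same socket; return how many replies arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((server, port))  # fixes the source port for both queries
        sock.send(build_query(0x1001, name, 1))   # qtype 1  = A
        sock.send(build_query(0x1002, name, 28))  # qtype 28 = AAAA
        replies = 0
        try:
            while replies < 2:
                sock.recv(4096)
                replies += 1
        except socket.timeout:
            pass  # fewer than two replies within the timeout
        return replies
```

Calling send_pair("10.96.0.10", "www.google.com") in a loop from C1 and
counting results below 2 should give a rough measure of how often a
reply is lost, assuming both DNS replicas are otherwise healthy.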


When tracing the packets I could observe two different drop reasons,
depending on the destination of the DNAT rule: if the destination is
local, the packet is dropped with SKB_DROP_REASON_IP_RPFILTER; if it is
on the other node, it is dropped with SKB_DROP_REASON_NEIGH_FAILED.

0xffff9527290acb00      3 [<empty>(3178406)]
kfree_skb_reason(SKB_DROP_REASON_IP_RPFILTER)             1289
netns=4026533244 mark=0x0 iface=52(eth0) proto=0x0800 mtu=1500 len=60
10.244.1.3:48199->10.244.1.5:53(udp)

and

3:24:37.411 0xffff9534c19a3d00      7     [<empty>(0)]
kfree_skb_reason(SKB_DROP_REASON_NEIGH_FAILED) 1583194220087332
netns=4026533244 mark=0x0 iface=5(veth271ea3e0) proto=0x0800 mtu=1500
len=60 10.244.1.3:58611->10.244.2.4:53(udp)


If I enable martian logging (net.ipv4.conf.all.log_martians=1), the
kernel also reports these packets as martians when the destination is
on the same node:

[1581593.716839] IPv4: martian source 10.244.1.5 from 10.244.1.3, on dev eth0
[1581593.723848] ll header: 00000000: 02 42 c0 a8 08 05 02 42 c0 a8 08 03 08 00
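
The "ll header" line is the 14-byte Ethernet header of the offending
frame. A small Python sketch to decode it follows; the helper names are
hypothetical. The second helper relies on the assumption that these are
Docker-style MAC addresses (02:42 followed by the interface's IPv4
address), which the report does not state.

```python
def parse_ll_header(hexdump: str) -> dict:
    """Split a 14-byte Ethernet header into destination, source and EtherType."""
    raw = bytes.fromhex(hexdump.replace(" ", ""))
    if len(raw) < 14:
        raise ValueError("need at least 14 bytes of Ethernet header")

    def mac(b: bytes) -> str:
        return ":".join(f"{x:02x}" for x in b)

    return {
        "dst_mac": mac(raw[0:6]),
        "src_mac": mac(raw[6:12]),
        "ethertype": int.from_bytes(raw[12:14], "big"),
    }


def docker_style_ipv4(mac_addr: str):
    """ASSUMPTION: MAC is 02:42 + IPv4 octets; return None if the prefix differs."""
    octets = [int(part, 16) for part in mac_addr.split(":")]
    if octets[:2] != [0x02, 0x42]:
        return None
    return ".".join(str(o) for o in octets[2:])


hdr = parse_ll_header("02 42 c0 a8 08 05 02 42 c0 a8 08 03 08 00")
```

For the header above this gives destination 02:42:c0:a8:08:05, source
02:42:c0:a8:08:03 and EtherType 0x0800 (IPv4). If the Docker-style
assumption holds, the frame arrived on eth0 from 192.168.8.3 addressed
to 192.168.8.5, which would be consistent with the packet having left
the node through eth0 and been sent back, as the tcpdump trace suggests.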

An interesting detail is that this only seems to happen with DNS (2
packets with the same tuple) and when there is more than 1 replica
behind the virtual IP (> 1 DNAT rules). When there is only 1 DNAT rule
it does not happen; I am certain of this.

Since the behavior is not deterministic but is reproducible, I suspect
some kind of race in which the nfqueue system does not correctly handle
the two packets with the same tuple on the return path, so one of them
gets dropped.
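
One possible reading of the observation that the problem needs more
than one DNAT rule, offered here as an assumption and not something the
report confirms: if both packets are evaluated by the KUBE-SVC chain
independently (for example because neither has a confirmed conntrack
entry while the first one is still queued), the "meta random" rule can
send them to different endpoints. The Python sketch below only models
that random selection; the second endpoint address is assumed to be D2
(10.244.2.4), and all names are hypothetical.

```python
import random

# ASSUMPTION: the second endpoint chain DNATs to D2; only the first is shown in the report.
ENDPOINTS = ["10.244.1.5:53", "10.244.2.4:53"]


def pick_endpoint(rng: random.Random, endpoints: list) -> str:
    """Mimic the KUBE-SVC chain: every rule but the last matches with
    probability 1/(remaining endpoints); the last rule is unconditional."""
    for i, endpoint in enumerate(endpoints[:-1]):
        remaining = len(endpoints) - i
        # With 2 endpoints this is "meta random & 2147483647 < 1073741824".
        if (rng.getrandbits(32) & 0x7FFFFFFF) * remaining < 0x80000000:
            return endpoint
    return endpoints[-1]


def divergence_rate(endpoints: list, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of same-tuple packet pairs whose two packets pick different endpoints."""
    rng = random.Random(seed)
    diverged = sum(
        pick_endpoint(rng, endpoints) != pick_endpoint(rng, endpoints)
        for _ in range(trials)
    )
    return diverged / trials
```

In this model, with two endpoints the two packets of a pair pick
different backends about half of the time, and with a single endpoint
never. That matches the shape of the observation but does not prove the
cause.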

I would like some help on two fronts:
- advice on the next steps for debugging further, or on how I can
provide more information that would help the maintainers
- advice on how to work around this problem temporarily
