IP fragments being dropped
Brian Kuschak
brian.kuschak@skystream.com
Sun, 25 Nov 2001 16:56:01 -0800
Ok, so I'm replying to my own post :-)
After many hours of pain, I found the cause of this problem and verified
that it's fixed in the latest kernel. However, as I mentioned before, I'm
forced to use the 2.4.3 kernel. I intend to fix the problem with the patch
below, and I would appreciate comments from the netfilter authors on any
possible problems with this approach:
Fragmented multicast UDP packets that should be forwarded are dropped by
netfilter. Unicast packets and unfragmented mcast packets are forwarded
correctly. The source of the problem is ip_mc_output(). It is called twice
by ipmr_forward_finish(), once for each fragment. It calls the NF_HOOK to
loop back packets for local users. This hook then calls ip_nat_out(), which
defragments by calling ip_defrag(). The defrag functions hash the packets
into a queue based on saddr, daddr, proto, and id. This works fine for the
looped back packet - it is defragmented and sent. The defrag queue is then
cleared for this packet. The problem is that ip_mc_output() is now called
with the second fragment (to be forwarded). The NF_HOOK once again attempts
to defrag the packet, but the first fragment has already been deleted from
the fragment queue! So ip_nat_out() steals the packet and shoves it in the
frag queue by calling ip_defrag() again. This was really the 2nd fragment
and should have completed the packet, but it can't, so ip_finish_output2()
is never called.
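
To make the sequence above easier to follow, here is a rough user-space
sketch of it. This is a toy model, not kernel code: toy_defrag(),
simulate_double_hook() and the structs are hypothetical names made up for
illustration, and the only thing taken from the description above is that
the queue is keyed on saddr, daddr, proto and id and is freed once the
datagram completes. It assumes a datagram of exactly two fragments, each
of which passes through the defragmenting hook twice (once as the loopback
clone, once as the copy to be forwarded).

```c
#include <assert.h>
#include <string.h>

/* Toy model only: lookup key as described in the post. */
struct frag_key {
    unsigned int saddr, daddr;
    unsigned char proto;
    unsigned short id;
};

/* One reassembly queue for a datagram made of exactly two fragments. */
struct frag_queue {
    int in_use;
    struct frag_key key;
    int have_first, have_last;
};

#define MAX_QUEUES 4
static struct frag_queue queues[MAX_QUEUES];

static int key_eq(const struct frag_key *a, const struct frag_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->proto == b->proto && a->id == b->id;
}

/* Returns 1 if this fragment completed a datagram (the queue is then
 * freed), 0 if the fragment was absorbed and is still waiting. */
static int toy_defrag(const struct frag_key *k, int is_first)
{
    struct frag_queue *q = NULL, *free_q = NULL;
    int i;

    for (i = 0; i < MAX_QUEUES; i++) {
        if (queues[i].in_use && key_eq(&queues[i].key, k)) {
            q = &queues[i];
            break;
        }
        if (!queues[i].in_use && free_q == NULL)
            free_q = &queues[i];
    }
    if (q == NULL) {                 /* no queue yet: start a new one */
        assert(free_q != NULL);
        q = free_q;
        memset(q, 0, sizeof(*q));
        q->in_use = 1;
        q->key = *k;
    }
    if (is_first)
        q->have_first = 1;
    else
        q->have_last = 1;
    if (q->have_first && q->have_last) {
        q->in_use = 0;               /* datagram complete, queue cleared */
        return 1;
    }
    return 0;
}

struct result { int looped_back, forwarded, stranded; };

/* Each fragment hits the hook twice: loopback clone, then forward copy. */
static struct result simulate_double_hook(void)
{
    struct frag_key k = { 0x0a000001u, 0xe0000001u, 17, 42 };
    struct result r = { 0, 0, 0 };
    int frag, i;

    memset(queues, 0, sizeof(queues));
    for (frag = 0; frag < 2; frag++) {
        int is_first = (frag == 0);
        r.looped_back += toy_defrag(&k, is_first); /* loopback clone */
        r.forwarded   += toy_defrag(&k, is_first); /* copy to forward */
    }
    for (i = 0; i < MAX_QUEUES; i++)
        r.stranded += queues[i].in_use;
    return r;
}
```

In this model simulate_double_hook() reports one datagram completed on the
loopback path, none on the forward path, and one queue left holding only
the second fragment, which matches the symptom described above.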
My fix is below. Basically ip_mc_output() is split into two parts. The 1st
part calls NF_HOOK once, and the second part then calls
ip_dev_loopback_xmit() and ip_finish_output2() directly. The packet is
therefore only defrag'ed (and re-fragmented) once and the 2nd function is
called twice to both forward and loop back the fragments. It seems to work
fine for me, but I want to make sure I didn't break anything else - we need
to ship this code soon!
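
The shape of the split can be sketched in the same toy style. Again this is
not kernel code: toy_hook(), toy_mc_output() and toy_mc_output2() are
hypothetical stand-ins, the model holds a single two-fragment datagram, and
the re-fragmentation step is assumed to hand the same two fragments to the
second function one at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: one reassembly slot for a two-fragment datagram. */
struct toy_state {
    int have_first, have_last;   /* fragments currently held by the hook */
    int looped_back, forwarded;  /* fragments delivered on each path */
};

typedef void (*okfn_t)(struct toy_state *st);

/* Stand-in for the second part: loop back a copy, then transmit.
 * No hook is traversed here, so nothing is defragmented again. */
static void toy_mc_output2(struct toy_state *st)
{
    st->looped_back++;
    st->forwarded++;
}

/* Stand-in for the hook: hold fragments until the datagram is complete,
 * then re-fragment and call okfn once per fragment. */
static void toy_hook(struct toy_state *st, int is_first, okfn_t okfn)
{
    int i;

    assert(okfn != NULL);
    if (is_first)
        st->have_first = 1;
    else
        st->have_last = 1;
    if (!(st->have_first && st->have_last))
        return;                       /* still waiting for the other half */
    st->have_first = st->have_last = 0;
    for (i = 0; i < 2; i++)           /* assumed: two fragments come back out */
        okfn(st);
}

/* Stand-in for the first part: traverse the hook exactly once per packet. */
static void toy_mc_output(struct toy_state *st, int is_first)
{
    toy_hook(st, is_first, toy_mc_output2);
}

static struct toy_state simulate_split_output(void)
{
    struct toy_state st = { 0, 0, 0, 0 };

    toy_mc_output(&st, 1);  /* first fragment */
    toy_mc_output(&st, 0);  /* second fragment */
    return st;
}
```

With the hook traversed only once per fragment, simulate_split_output()
ends with both fragments counted on the loopback path and on the forward
path, and nothing left waiting in the hook.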
Thanks for looking,
Brian
diff -c -r1.1.1.2 ip_output.c
*** ip_output.c 2001/02/26 19:05:22 1.1.1.2
--- ip_output.c 2001/11/26 00:08:33
***************
*** 193,212 ****
{
struct sock *sk = skb->sk;
struct rtable *rt = (struct rtable*)skb->dst;
- struct net_device *dev = rt->u.dst.dev;
-
- /*
- * If the indicated interface is up and running, send the packet.
- */
- IP_INC_STATS(IpOutRequests);
- #ifdef CONFIG_IP_ROUTE_NAT
- if (rt->rt_flags & RTCF_NAT)
- ip_do_nat(skb);
- #endif
- skb->dev = dev;
- skb->protocol = __constant_htons(ETH_P_IP);
-
/*
* Multicasts are looped back for other local users
*/
--- 193,199 ----
***************
*** 226,234 ****
{
struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
if (newskb)
! NF_HOOK(PF_INET, NF_IP_POST_ROUTING, newskb, NULL,
! newskb->dev,
! ip_dev_loopback_xmit);
}
/* Multicasts with ttl 0 must not go beyond the host */
--- 213,219 ----
{
struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
if (newskb)
! ip_dev_loopback_xmit(newskb);
}
/* Multicasts with ttl 0 must not go beyond the host */
***************
*** 242,252 ****
if (rt->rt_flags&RTCF_BROADCAST) {
struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
if (newskb)
! NF_HOOK(PF_INET, NF_IP_POST_ROUTING, newskb, NULL,
! newskb->dev, ip_dev_loopback_xmit);
}
! return ip_finish_output(skb);
}
int ip_output(struct sk_buff *skb)
--- 227,258 ----
if (rt->rt_flags&RTCF_BROADCAST) {
struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
if (newskb)
! ip_dev_loopback_xmit(newskb);
}
+
+ return ip_finish_output2(skb);
+ }
+
+ int ip_mc_output(struct sk_buff *skb)
+ {
+ struct rtable *rt = (struct rtable*)skb->dst;
+ struct net_device *dev = rt->u.dst.dev;
+
+ /*
+ * If the indicated interface is up and running, send the packet.
+ */
+ IP_INC_STATS(IpOutRequests);
+ #ifdef CONFIG_IP_ROUTE_NAT
+ if (rt->rt_flags & RTCF_NAT)
+ ip_do_nat(skb);
+ #endif
+
+ skb->dev = dev;
+ skb->protocol = __constant_htons(ETH_P_IP);
! /* this will defrag if necessary */
! return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, dev,
! ip_mc_output2);
}
int ip_output(struct sk_buff *skb)
-----Original Message-----
From: Brian Kuschak
Sent: Wednesday, November 21, 2001 1:15 PM
To: 'netfilter-devel@lists.samba.org'
Subject: IP fragments being dropped
My netfilter is dropping all fragmented UDP packets (larger than 1472
bytes). Packets smaller than this are routed fine. I have a single
masquerading rule in iptables for all packets exiting on ppp0. The packets
in question are arriving on eth2 and exiting on eth1. When netfilter is
removed, fragments are routed properly.
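
For reference, the 1472-byte threshold is what one would expect if the
link MTU is 1500 bytes and the IPv4 header carries no options. Both are
assumptions here, since the message does not state the MTU. A small sketch
of the arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

enum {
    IPV4_HDR_MIN = 20,  /* IPv4 header without options */
    UDP_HDR      = 8    /* UDP header */
};

/* Largest UDP payload that fits in a single unfragmented IPv4 packet
 * for the given MTU, assuming an option-less IPv4 header. */
static int max_udp_payload(int mtu)
{
    assert(mtu >= IPV4_HDR_MIN + UDP_HDR);
    return mtu - IPV4_HDR_MIN - UDP_HDR;
}
```

Under those assumptions max_udp_payload(1500) is 1472, so any larger UDP
payload has to be split into fragments.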
For several reasons, I am forced to use an older kernel - 2.4.3. I see that
the fragment-handling has changed somewhat in the newer kernels. I suspect
the conntrack code is not defragmenting properly.
Can anyone provide any insight on what might need to be tweaked to get this
working on this older kernel?
Thanks,
Brian