ipqueue and conntrack confirm problem

Mattias Nissler nissler@fdns-services.de
Fri, 23 Nov 2001 13:25:38 +0100


Hello,

I have a question about the netfilter code, prompted by a problem I ran
into. I'm quite inexperienced with kernel and netfilter code, so I
apologize if this has already been covered on the list.

I'm currently trying to write a libipq application, a small replacement
for diald that receives packets via the queue and sets NF_ACCEPT once it
has dialed the interface. While testing I found that not all of my pings
reached the destination; they weren't even transmitted out of my box (I
checked that with tcpdump). So I dug into the kernel code to track down
the problem. The funny thing I soon discovered was that the first
waiting packet my app released after dialing was actually sent out, but
the others were dropped.

After some hours of investigation I found that those packets were
dropped in ip_refrag, which seems to be the last place where netfilter
handles a packet. The reason for the drop is that the packet cannot be
confirmed, i.e. netfilter isn't able to add its conntrack to the hash
tables. There's nothing surprising about that: the first released packet
makes its way into the hash, so the others can't, since their tuple is
already in use.

In normal operation there should be no problem, since the later packets
find and use the conntrack that was created for the first packet. And
here, finally, is the problem: the first conntrack is only confirmed
when its packet actually leaves netfilter in ip_refrag. While that
packet is held by my app, its conntrack stays unconfirmed. Because it is
unconfirmed, each following packet is allocated its own conntrack, and
none of those can pass ip_refrag, because by then the first packet's
conntrack is in the hash and blocks the others.

To sum it up: many packets arrive. They don't go straight out but are
held back by my ip_queue app. When they are released, the first packet
is confirmed and the others are dropped. The reason is that new packets
arrive while the first packet hasn't left netfilter yet.

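The sequence summarized above can be reproduced with a small toy model.
This is only a sketch of the behaviour as described in this mail, not
the kernel's code: the class, the function names and the example tuple
are all made up for illustration, and a plain dict stands in for the
conntrack hash tables.

```python
class ToyConntrackTable:
    """Toy stand-in for the conntrack hash: confirmed entries keyed by tuple."""

    def __init__(self):
        self.confirmed = {}

    def track(self, tup):
        """Entry side: reuse a confirmed entry, else allocate an unconfirmed one."""
        entry = self.confirmed.get(tup)
        if entry is None:
            entry = {"tuple": tup, "confirmed": False}
        return entry

    def confirm(self, entry):
        """Exit side: insert into the hash; False models the drop on a tuple clash."""
        if entry["confirmed"]:
            return True
        if entry["tuple"] in self.confirmed:
            return False
        entry["confirmed"] = True
        self.confirmed[entry["tuple"]] = entry
        return True


def send_packets(n, queued):
    """Return one verdict (True = sent, False = dropped) per packet."""
    table = ToyConntrackTable()
    tup = ("10.0.0.1", "10.0.0.2", "icmp", 1)  # hypothetical tuple
    if queued:
        # All packets enter before any of them reaches the exit side.
        entries = [table.track(tup) for _ in range(n)]
        return [table.confirm(e) for e in entries]
    # Each packet leaves before the next one enters.
    return [table.confirm(table.track(tup)) for _ in range(n)]


if __name__ == "__main__":
    print(send_packets(3, queued=True))
    print(send_packets(3, queued=False))
```

With queued=True only the first packet is confirmed; with queued=False
every later packet finds the entry created for the first one.
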
After some thought it occurred to me that this could perhaps also happen
with packets arriving very frequently on the same connection (flood
pings, for example?). But that depends on the Linux networking code,
i.e. on whether a packet can enter netfilter before the previous one on
the same connection has left it, which is what ip_queue makes possible
in my description above. I'd like to hear what more experienced people
think about that.

Having found the problem, I thought about solutions, and several
approaches came to mind. One would be to confirm a conntrack as soon as
its packet arrives in netfilter. I think this is against current
netfilter policy, since packets that are later dropped and never reach
the other end of netfilter would then be tracked as well. The second
solution I thought about was not to drop the packet in ip_refrag when
its conntrack cannot be confirmed. This could resolve the problem, but
it is quite a radical change; it works for me (I tried it), but it may
break things I don't know about yet. The best idea I have come up with
(I think) would be to confirm the conntrack when the packet is queued to
userspace. The following packets could then find their conntrack in the
hashes and would be happy (I haven't tried this yet, but I will).

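To illustrate the last idea (confirming at queue time), here is another
toy sketch. It models only the behaviour described in this mail and says
nothing about how the kernel would actually have to be changed; the
function name, the confirm_on_queue flag and the tuple are hypothetical.

```python
def release_queued(n, confirm_on_queue):
    """Queue n packets with the same tuple, then release them all.

    Returns one verdict per packet: True = sent, False = dropped.
    """
    confirmed = {}  # tuple -> conntrack entry, stands in for the hash
    tup = ("10.0.0.1", "10.0.0.2", "icmp", 1)  # hypothetical tuple
    queue = []

    # Packets arrive and are handed to the userspace queue.
    for _ in range(n):
        entry = confirmed.get(tup)
        if entry is None:
            entry = {"tuple": tup}  # fresh, unconfirmed conntrack
            if confirm_on_queue:
                # Proposed change: put it into the hash right away.
                confirmed[tup] = entry
        queue.append(entry)

    # Userspace accepts every packet; each one reaches the exit side.
    verdicts = []
    for entry in queue:
        owner = confirmed.setdefault(tup, entry)
        # A packet passes only if its own conntrack owns the tuple.
        verdicts.append(owner is entry)
    return verdicts


if __name__ == "__main__":
    print(release_queued(3, confirm_on_queue=False))
    print(release_queued(3, confirm_on_queue=True))
```

In this model, confirming at queue time lets the later packets share the
first conntrack, so none of them clashes on release.
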
One last thing about my ping tests. I know that ICMP conntracks time out
very quickly (I've read ip_conntrack_proto_icmp.c). So I could also make
my app wait out that timeout between releases; the next released packet
would then be confirmed successfully, since the old entry would already
have timed out. But I don't think that would fix the problem at its
root.

Looking forward to your opinions, suggestions and corrections (!),

Mattias Nissler

