Deadlock in netfilter code (ftp-conntrack)
Max Kellermann
max@duempel.org
Wed Aug 11 14:28:02 CEST 2004
Hi,
I am currently hunting a deadlock bug in the netfilter code on several
of our servers. I will provide more information when I can analyze the
next crash.
Two servers have been crashing once a week since we upgraded to 2.6.7
(from 2.4.22; 2.4.23+ seemed to have a similar problem, though I
never debugged it). All servers are dual Xeon 2.6 GHz with 2 GB
memory, CCISS controller. Hyperthreading is enabled, making 4 virtual
CPUs. I used KDB remotely to debug (the Compaq boxes have a web
interface with a really ugly applet for remote console access - I have
no physical access to the servers).
Today, all CPUs except one hung in ip_ct_refresh(), trying to get a
write lock. The last CPU waited for a spinlock in ip_nat_ftp.c,
function help(). Unfortunately, KDB crashed before I could find out
more. After the previous crash, I was able to revive the server manually
by resetting the spinlock directly in kernel memory with KDB (I had to
do this twice).
Is this a known bug in netfilter?
I hope KDB will stay up a bit longer on the next crash, so I can
locate the bug. You will hear from me again, hopefully with a patch
file attached.
Max