Detailed report on SMB-build lockups [seems that it is locking problem in networking code] (2.4.0-test2-ac2 and later) [PATCH]
Alexander Demenshin
aldem@aldem.net
Tue, 11 Jul 2000 23:39:53 +0200
On Wed, Jul 12, 2000 at 02:54:47AM +1000, James Morris wrote:
> Please see the patch below which ensures that the brlock stays held if
> feeding packets back into the stack, and disables local bhs during the
> call.
Even worse (sorry, but)... _complete_ lockup, not even SysRq is available...
However, test3 has some fixes related to the modules I mentioned in my
report, so I tried it - far better (at least it takes longer till the crash :).
So... (I was out for a while :)). In test3 there are no more lockups with
fragmented ICMP (so far it has passed over 18M packets - against 500K before).
Now... I am going to kdb with test3 :)
Over 3M packets (I am soooooooo patient)... oops!
So... The second CPU is idle...
stack trace from kdb on active CPU:
tcp_keepalive_timer
timer_bh
bh_action
...blablabla...
ip_local_deliver_finish
nf_hook_slow
ip_local_deliver
ip_rcv_finish
nf_reinject !!! YES !!! (well, we knew already that it was here :)))
Going inside... Hmm.. Nothing interesting. A mix of locks during recursive invocations...
Brrr... I'll try to apply the patch to test3 - maybe the lockups in pre8 were caused
by other problems...
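
To illustrate what "locks during recursive invocations" can mean in general,
here is a minimal user-space sketch. It is not kernel code and not a diagnosis
of this lockup. Everything in it is an assumption made up for illustration:
stack_lock is a hypothetical stand-in for a lock that must not be taken twice
on the same path, and receive_packet() is a hypothetical receive routine that
re-enters itself once, the way nf_reinject re-enters ip_rcv_finish in the trace
above. A POSIX error-checking mutex is used so that the nested acquisition is
reported as an error instead of hanging the program.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK under strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a non-recursive lock; not a kernel API. */
static pthread_mutex_t stack_lock;

/* Error code returned by the nested lock attempt, 0 if there was none. */
static int reentry_error;

/* Hypothetical receive path: takes the lock and optionally re-enters once. */
static void receive_packet(int reinject)
{
    int rc = pthread_mutex_lock(&stack_lock);

    if (rc != 0) {
        /* Nested acquisition detected. A plain spinlock would spin here. */
        reentry_error = rc;
        return;
    }
    if (reinject)
        receive_packet(0);  /* re-enter the same path with the lock held */
    pthread_mutex_unlock(&stack_lock);
}

/* Runs one reinjected receive and returns what the nested attempt saw. */
int run_reinject_demo(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&stack_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    reentry_error = 0;
    receive_packet(1);

    pthread_mutex_destroy(&stack_lock);
    return reentry_error;
}
```

Calling run_reinject_demo() from a small main() and comparing the result
against EDEADLK shows whether the nested attempt was caught. With a lock that
does no such checking, the same call pattern would simply never return, which
is one way a box can end up with no response at all.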
oops... Over 4M packets with _no_ problems... Hmmm... Looks strange :))
But no... Now real fun begins....
ping -f + my test + real bench on real server (with disk io) + conntrack...
Oops... Real "OOPS"... After the first test it was OK, but on the first attempt
to start the second one (on a real web server) - I am now in KDB again.
The first EIP was meaningless, after "go" I got "Kernel BUG in sched.c:683"...
Uff... I am tired. Too many oopses today... At least I hope that there is no
more ipfrag bug...
That's all. Would someone like to hunt instead of me (today)? ;))
/Al