Detailed report on SMB-build lockups [seems that it is locking problem in networking code] (2.4.0-test2-ac2 and later) [PATCH]

Alexander Demenshin aldem@aldem.net
Tue, 11 Jul 2000 23:39:53 +0200


On Wed, Jul 12, 2000 at 02:54:47AM +1000, James Morris wrote:

> Please see the patch below which ensures that the brlock stays held if
> feeding packets back into the stack, and disables local bhs during the
> call.

  Even worse (sorry, but)... _complete_ lockup, not even SysRq is available...
  
  However, test3 has some fixes related to the modules I mentioned in my
  report, so I tried it - far better (at least it takes longer to crash :).
  
  So... (I was out for a while :)). In test3 there are no more lockups with
  fragmented ICMP (so far it has passed over 18M packets - against 500K before).

  Now... I am going to run test3 under kdb :)
  
  Over 3M packets (I am soooooooo patient)... oops!

  So... The second CPU is idle...
  
  Stack trace from kdb on the active CPU:
  
  	tcp_keepalive_timer
  	timer_bh
  	bh_action
  	...blablabla...
  	ip_local_deliver_finish
  	nf_hook_slow
  	ip_local_deliver
  	ip_rcv_finish
  	nf_reinject		!!! YES !!! (well, we knew already that it was here :)))
  
  Going inside... Hmm.. Nothing interesting. A mix of locks during recursive invocations...
  Brrr... I'll try to apply the patch to test3 - maybe the lockups in pre8 were caused by
  other problems...
  
  oops... Over 4M packets with _no_ problems... Hmmm... Looks strange :))
  
  But no... Now the real fun begins....
  
  ping -f + my test + real bench on real server (with disk I/O) + conntrack...
  
  Oops... A real "OOPS"... After the first test it was OK, but on the first attempt
  to start the second one (on the real web server) - I am now in kdb again.
  
  The first EIP was meaningless; after "go" I got "Kernel BUG in sched.c:683"...
  
  Uff... I am tired. Too many oopses today... At least I hope that the ipfrag
  bug is gone...
  
  That's all. Would someone like to hunt instead of me (today)? ;))

/Al