ip_queue problem (and ip_conntrack DoS)

Alexander Demenshin aldem-nf@aldem.net
Sun, 9 Jul 2000 23:07:49 +0200


Hello folks,

	So, after a day of experiments with the QUEUE target and my demo program,
	I found that the kernel hangs when a lot of connections are made in a short time...
	
	My config:

		- kernel 2.4.0-test2-ac2
		- iptables 1.1.0
		
	(gcc/libc etc. are irrelevant - see below for why I think so)
	
	No rules are in iptables, except:
	
	iptables -t mangle -A PREROUTING -d 127.1.1.1 -j QUEUE
	
	(I use 127.1.1.1 as my test address; anyway, everything in 127.0.0.0/8
	maps to the local host).
	
	Then:
	
		ping -f -s 1024 127.1.1.1	# OK, even when run for a long time
		
	But:
	
		/usr/apache/bin/ab -n 10000 -c 100 http://127.1.1.1:8080/	# b00m...
	
	ab is "apache bench"; it only connects to a web server and accepts the data, nothing more.
	I use it to benchmark the performance of networking code (and to stress-test it too).
	
	Actually, I have no web server running, only an emulation of one (it accepts any request
	and replies with some valid data; on localhost I can reach a rate of up to 500 conns/sec).
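	
	A minimal sketch of such an emulation, in Python, might look like the code below.
	It is an illustration only, not the program used in this test; the names (RESPONSE,
	handle, serve, start) and the canned reply are made up for the example.

```python
import socket
import threading

# Hypothetical canned reply; any syntactically valid HTTP response would do.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 3\r\n"
    b"\r\n"
    b"ok\n"
)


def handle(conn):
    """Read the first chunk the client sends, send the canned reply, close."""
    with conn:
        conn.recv(4096)          # the request content is ignored
        conn.sendall(RESPONSE)


def serve(listener):
    """Accept connections in a loop; meant to run in a daemon thread."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:          # stop quietly if the listener has been closed
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def start(host="127.1.1.1", port=8080, backlog=128):
    """Bind, listen, and serve in a background thread; return the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(backlog)
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

	Calling start() and keeping the process alive should be enough to point the ab
	command above at it.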
	
	So, in the command above, I request 10000 connections with at most 100 requests in
	parallel. In a very short time, the kernel hangs. Just hangs. Or... an even better word - freezes.
	
	No "kernel panic", no Oops, _nothing_ - just no reaction to _any_ action from my side.
	SysRq does not work either, and Caps/Num Lock give no result. That is the reason (BTW) why
	I cannot provide more detailed information :)
	
	But the problem can be reproduced easily - as shown above - just make _a lot_ of connections
	in a short time (at a rate > 100/sec).
	
	I think the problem is related to the ip_queue module, because I can reproduce it
	_only_ when I use the QUEUE target (and when packets are processed and accepted in userspace).
	In fact it may be somewhere else - maybe in ip_conntrack*? (The problem does not occur
	with any target other than QUEUE.)
	
	BTW, what happens if a set_verdict is not accepted by the kernel and is never retried
	in userspace (as in my case - sometimes it returns an error like "Not enough buffer space")?
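	
	For comparison, a userspace program that does retry could wrap the call roughly as
	in the Python sketch below. Everything here is an assumption made for illustration:
	send_verdict is a hypothetical callable standing in for the real set_verdict call,
	the buffer-space error is assumed to surface as ENOBUFS, and the retry count and
	delays are arbitrary.

```python
import errno
import time


def set_verdict_with_retry(send_verdict, packet_id, verdict, attempts=5, delay=0.01):
    """Call send_verdict(packet_id, verdict), retrying only on ENOBUFS.

    send_verdict is a hypothetical stand-in for the real verdict call.
    Returns True once a call succeeds, False if every attempt hit ENOBUFS.
    """
    for attempt in range(attempts):
        try:
            send_verdict(packet_id, verdict)
            return True
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise                    # unrelated failure: do not hide it
            if attempt < attempts - 1:
                # Back off a little longer before each further attempt.
                time.sleep(delay * (2 ** attempt))
    return False
```

	A False result means the packet still has no verdict, so the caller at least knows
	about it instead of silently moving on.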
	
	I'll try to track down where the problem with QUEUE is, but with no Oops and so on it is
	a little bit difficult :)

	Additionally, I've found a very nice opportunity to DoS any system with ip_conntrack
	enabled. As we know, the number of entries in the conntrack table is 16K (or so) by default,
	so it is easy to overflow it - in my case, if I use the same "ab" with a connection
	count of 20000, everything stops at some point (due to packet dropping).
	
	Why? Hmm... The timeout for finished connections is 120 sec, so if we make
	connections at a rate of 200/sec, we reach the table limit within 100 sec, before any
	entry has expired, and all following (and some existing) connections will have problems.
	On a very busy router with conntracking enabled, the situation is even worse - some
	connections will be dropped randomly, some not accepted, etc...
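	
	The arithmetic can be checked with a few lines of Python. This is a rough model only:
	it assumes that "16K" means 16384 entries, that every connection leaves exactly one
	entry, and that each entry lives for the full timeout. The function name
	seconds_to_fill is made up for the example.

```python
def seconds_to_fill(table_size, conn_rate, timeout):
    """Seconds until the table is full, or None if it never fills.

    Entries only start to expire after `timeout` seconds, so in steady state
    the table holds at most conn_rate * timeout entries.
    """
    if conn_rate * timeout < table_size:
        return None                      # entries expire faster than the table fills
    return table_size / conn_rate


# Figures from the text: about 16K entries, 200 conns/sec, 120 sec timeout.
fill_time = seconds_to_fill(16384, 200, 120)
print(fill_time)
```

	Under these assumptions the table fills in roughly 82 sec, and any rate above about
	137 conns/sec (16384 / 120) is enough to fill it eventually.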
	
	Again and again - maybe it is worth making conntrack selective? For instance, if I use
	Linux+Netfilter as one of the firewalls in front of a _huge_ network (say, 1000 hosts), and I
	know for sure that some hosts are protected by other firewall(s), why do I have to track
	all of them? Or if I explicitly allow _all_ traffic between specific addresses, is
	it worth keeping state for it?
	
	That's all... Any comments/ideas?
	
/Al