[Q] connection tracking scaling
Mon, 18 Mar 2002 20:45:52 +0200
One correction: the data was for 2000 sessions, i.e. 4000 connections (sorry)
More info on the setup:
Client a (100Mbit) ----> switch ---> client b (100Mbit)
With 2000 connections I get ~90 Mbit/s in + ~90 Mbit/s out
With 4000 connections I get ~60 Mbit/s in + ~60 Mbit/s out
With 8000 connections and no connection tracking in the kernel (test setup):
~85 Mbit/s in + ~85 Mbit/s out
90 Mbit/s * 2 ==> ~15000 packets/second
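For reference, the packet rate above can be reproduced with a short calculation. This is only a sketch: the 1500-byte packet size is an assumption (roughly a full Ethernet MTU) that is not stated in the measurements, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the packet rate quoted above.
# Assumption (not in the original measurements): every packet is
# about 1500 bytes, i.e. close to a full Ethernet MTU.
MBIT = 1_000_000


def packets_per_second(mbit_in, mbit_out, packet_bytes=1500):
    """Return the combined packet rate for the given in/out throughput."""
    total_bits = (mbit_in + mbit_out) * MBIT
    return total_bits / (packet_bytes * 8)


print(packets_per_second(90, 90))  # 15000.0
```

Smaller packets would give a proportionally higher rate, so ~15000 packets/second is a lower bound if the traffic is not all full-size frames.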
grep conntrack /proc/slabinfo (about 4200 connections at the time)
ip_conntrack 4527 4532 352 412 412 1
8192 buckets, 65536 max
I guess the hash function fails for my setup (note: this is a test setup,
with all connections originating from a single IP and going to a single IP)
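One way to test this guess offline, before instrumenting the kernel, is to feed the same kind of traffic pattern through a candidate hash and look at the resulting chain lengths. The sketch below is only an illustration: both hash functions are toy examples invented here, not the kernel's actual conntrack hash, and the addresses and port range are hypothetical.

```python
from collections import Counter


def chain_stats(keys, hash_fn, nbuckets):
    """Return (min, average, max) chain length over the non-empty buckets."""
    counts = Counter(hash_fn(k) % nbuckets for k in keys)
    lengths = list(counts.values())
    return min(lengths), sum(lengths) / len(lengths), max(lengths)


# Hypothetical test traffic: one source IP, one destination IP, a fixed
# destination port, and 4000 consecutive source ports.
SRC, DST, DPORT = 0x0A000001, 0x0A000002, 80
keys = [(SRC, sport, DST, DPORT) for sport in range(1024, 1024 + 4000)]


def ip_only_hash(key):
    """Toy hash that ignores the ports (deliberately poor)."""
    return key[0] + key[2]


def additive_hash(key):
    """Toy hash that adds up all four fields, ports included."""
    return sum(key)


print(chain_stats(keys, ip_only_hash, 8192))   # (4000, 4000.0, 4000)
print(chain_stats(keys, additive_hash, 8192))  # (1, 1.0, 1)
```

With a single IP pair, any hash that does not mix in the ports well puts every connection into one chain, no matter how many buckets there are. Whether the real hash behaves like that here can only be settled by measuring it.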
From: Patrick Schaaf [mailto:email@example.com]
Sent: Monday, March 18, 2002 8:10 PM
To: Aviv Bergman
Subject: Re: [Q] connection tracking scaling
> I'm working on a proxy type program, using REDIRECT to catch (tcp)
> traffic, and I'm seeing severe network degradation above ~2000
> (computer: 1Gb p3, 2Gb memory, kernel 2.4.18 + aa1 patch)
Two questions first: how many packets/second are these 2000 connections? And,
do you have a feeling about how far the system would scale if it weren't for
Also, what is the total number of conntrack entries you have? Do this:
grep conntrack /proc/slabinfo
and show us the output line, please.
> I've profiled the kernel and found that > 50% of the cpu time is in
> - is there a patch to make connection tracking use a
> more scalable data structure (as I understand it uses a list), or to
> improve its performance?
Please look a bit more carefully at the code. While __ip_conntrack_find()
itself operates on a list, this is only the "inner loop" of a hash table
implementation. The setup is the usual "array of pointers to hash lists, and
hopefully they don't collide" setup. Nothing really wrong with that, in
principle, as far as data structures go.
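To make the "array of pointers to hash lists" layout concrete, here is a minimal sketch of a chained hash table. It is not the kernel code: the class and method names are made up for illustration, and Python's built-in hash() stands in for the real hash function.

```python
class ChainedTable:
    """Toy chained hash table: an array of buckets, each holding a list."""

    def __init__(self, nbuckets):
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, key):
        # Stand-in hash; the kernel uses its own function on the tuple.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        self.buckets[self._index(key)].append((key, value))

    def find(self, key):
        """Return (value, steps); steps counts the list entries visited."""
        steps = 0
        for k, v in self.buckets[self._index(key)]:
            steps += 1
            if k == key:
                return v, steps
        return None, steps


# Degenerate case: one bucket, so every lookup walks a single long list.
table = ChainedTable(1)
for port in (1024, 1025, 1026):
    table.insert(("10.0.0.1", port, "10.0.0.2", 80), "conn-%d" % port)

print(table.find(("10.0.0.1", 1026, "10.0.0.2", 80)))  # ('conn-1026', 3)
```

The list walk in find() plays the role of the inner loop; its cost depends entirely on how many entries share a bucket.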
The most likely cause for that 50% CPU usage is that the hash table is too
small for your application. Unlike ip_conntrack_max (the maximum total number
of conntrack entries), the hash table size is not modifiable at
runtime. In ip_conntrack_core.c, that size is taken from the variable
ip_conntrack_htable_size. This variable is computed at boot / module load
time, depending on the amount of RAM in your system.
If you run ip_conntrack as a module, you can override the computed hash
table size by specifying "hashsize=XXX" as a module load parameter.
You can see the active value chosen for the hash table size in syslog:
ip_conntrack (XXX buckets, YYY max)
The XXX is the number I'm talking about. Given ZZZ active conntrack entries
(as seen in /proc/slabinfo) you'll have, on average, a list size within
__ip_conntrack_find() of ZZZ/XXX.
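As a worked example of ZZZ/XXX, the sketch below plugs in the numbers reported at the top of this thread: 8192 buckets and 65536 max from syslog, and 4527 taken from the first numeric column of the slabinfo line (assumed here to be the active entry count). The function name is made up for illustration.

```python
def average_chain_length(entries, buckets):
    """Expected list length per bucket if entries spread out evenly."""
    return entries / buckets


# Values reported in this thread: slabinfo shows 4527 entries,
# syslog shows 8192 buckets and a maximum of 65536 entries.
avg = average_chain_length(4527, 8192)
print(f"{avg:.2f}")                        # 0.55
print(average_chain_length(65536, 8192))   # 8.0
```

An average well below one entry per bucket would suggest the table size itself is not the bottleneck here, provided the hash spreads the entries evenly.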
The secondary reason for overly long lists would be a bad hash function. If
you want to find out whether that could be the case, you could instrument
__ip_conntrack_find() to count the length of each list during traversal,
remember that somewhere, and occasionally printk() an average, minimum, and
Hope this helps...