[RFC] ct_sync 0.15 (corrected)

Harald Welte laforge@netfilter.org
Thu Aug 19 12:06:46 CEST 2004


On Fri, Aug 13, 2004 at 04:26:30PM +0200, KOVACS Krisztian wrote:

>      1. There should be some facility by which one can select which
>         connections have to be replicated. This way it would be possible
>         to limit replication traffic to the bare minimum. For example,
>         there is no point in replicating conntrack entries for
>         connections whose endpoint is one of the nodes (administrative
>         SSH traffic, for example). A per-conntrack flag would be needed,
>         just like CONNMARK, which could be set for conntracks needing
>         replication with a simple iptables rule. Actually, CONNMARK is
>         enough, if we choose a given bit of the mark as the SYNC bit.
>         Besides this, we should decide whether we need a SYNC or a
>         NOSYNC bit, that is, whether the default mode of operation
>         should be "sync" or "not to sync".

I would just use connmark for now.  Let's make it a CONFIG option
though, so people can just use connmark without any interference and
replicate all connections.
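
A rough sketch of the mark test, in plain C. CT_SYNC_MARK_FILTER,
CT_SYNC_MARK_BIT and ct_sync_wanted() are made-up names, and using the
top bit of the mark is only an assumption; none of this is in ct_sync
0.15 as posted.

```c
#include <stdint.h>

/* Hypothetical compile-time switch standing in for the proposed CONFIG
 * option: 1 = replicate only marked connections, 0 = replicate all. */
#ifndef CT_SYNC_MARK_FILTER
#define CT_SYNC_MARK_FILTER 1
#endif

/* Assumed choice: the top bit of the 32-bit connmark is the SYNC bit. */
#define CT_SYNC_MARK_BIT 0x80000000u

/* Returns 1 if a conntrack carrying this mark should be replicated. */
static int ct_sync_wanted(uint32_t mark)
{
#if CT_SYNC_MARK_FILTER
	return (mark & CT_SYNC_MARK_BIT) != 0;
#else
	(void)mark;	/* the mark stays fully available to the user */
	return 1;
#endif
}
```

With the filter compiled out, every connection is replicated and no bit
of the mark is reserved.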

>      2. The error recovery functions in the protocol layer should be
>         revamped.
>         However, to iterate over the entries of the ring, it should
>         hold the spinlock of the ring, which is not possible, since
>         the send() operation may sleep... (This is done from the
>         receiver thread, and the ring is accessed from the sender
>         thread and from softirq context as well.) What would be the
>         most elegant solution?

Given that this is an event expected to happen very rarely, I would
propose to just:
- grab the lock
- copy the whole ring (or the needed parts)
- release the lock
- send packets from the local copy (may sleep)
- free local copy
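
The steps above could look roughly like this in userspace C. A pthread
mutex stands in for the ring spinlock, and struct ring, struct pkt,
ring_snapshot(), ring_resend() and fake_send() are made-up names, not
the ct_sync code. Sequence wrap-around is ignored to keep it short.

```c
#include <pthread.h>
#include <stdlib.h>

#define RING_SLOTS 8
#define PKT_MAX    64

struct pkt {
	unsigned int seq;
	size_t len;
	unsigned char data[PKT_MAX];
};

struct ring {
	pthread_mutex_t lock;		/* stand-in for the ring spinlock */
	struct pkt slot[RING_SLOTS];
	unsigned int count;		/* valid entries, oldest first */
};

/* Copy all entries with seq >= from_seq into a freshly allocated array.
 * Returns the number of entries copied, or -1 if allocation failed. */
static int ring_snapshot(struct ring *r, unsigned int from_seq,
			 struct pkt **out)
{
	struct pkt *copy;
	unsigned int i;
	int n = 0;

	/* Allocate before taking the lock, so nothing sleeps under it. */
	copy = malloc(sizeof(*copy) * RING_SLOTS);
	if (!copy)
		return -1;

	pthread_mutex_lock(&r->lock);
	for (i = 0; i < r->count; i++)
		if (r->slot[i].seq >= from_seq)
			copy[n++] = r->slot[i];
	pthread_mutex_unlock(&r->lock);

	*out = copy;
	return n;
}

/* Resend from the private copy; send_fn may sleep, no lock is held. */
static int ring_resend(struct ring *r, unsigned int from_seq,
		       int (*send_fn)(const struct pkt *))
{
	struct pkt *copy = NULL;
	int i, n, sent = 0;

	n = ring_snapshot(r, from_seq, &copy);
	if (n < 0)
		return -1;
	for (i = 0; i < n; i++)
		if (send_fn(&copy[i]) == 0)
			sent++;
	free(copy);
	return sent;
}

/* Stand-in for the real send(): only counts the bytes handed to it. */
static size_t bytes_sent;

static int fake_send(const struct pkt *p)
{
	bytes_sent += p->len;
	return 0;
}
```

The cost is one ring-sized allocation and copy per recovery, which
should be acceptable for something that happens this rarely.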

>         On the other hand, it may be possible that the master is not
>         able to re-send the packet, for example this may be the case if
>         it is "too old", and is not present in the backlog anymore. In
>         this case, the slave should be notified that recovery is not
>         possible this way, and it needs to do a full re-sync.

Within the current protocol, the master can just make that decision and
do a full resync without telling the slave.

>         This is why I thought that we should include some extra
>         information in every packet: the minimal sequence number of
>         the oldest packet in the master's backlog.

Agreed.  We should also add a read-only sysctl that tells userspace
whether a slave is already fully-synced.
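
As a sketch of how the slave could use that extra field: the header
layout, enum recover_action, seq_before() and slave_check() below are
all assumptions for illustration, not the ct_sync wire format.

```c
#include <stdint.h>

/* Hypothetical header fields; the real ct_sync packet layout differs. */
struct sync_hdr {
	uint32_t seq;		/* sequence number of this packet */
	uint32_t oldest_seq;	/* oldest packet still in the master's backlog */
};

enum recover_action {
	RECOVER_NONE,		/* nothing missing */
	RECOVER_RESEND,		/* ask the master to re-send the gap */
	RECOVER_FULL_RESYNC	/* gap no longer in the backlog */
};

/* Wrap-safe "a comes before b" for 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Slave side: expected is the next sequence number we are waiting for. */
static enum recover_action slave_check(const struct sync_hdr *h,
				       uint32_t expected)
{
	if (!seq_before(expected, h->seq))
		return RECOVER_NONE;		/* in order, or an old duplicate */
	if (seq_before(expected, h->oldest_seq))
		return RECOVER_FULL_RESYNC;	/* first missing packet is gone */
	return RECOVER_RESEND;			/* backlog still covers the gap */
}
```

The slave can then skip a pointless retransmit request and go straight
to a full resync when the gap is too old.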

>         So, does anyone know of anything which could be used by ct_sync?
>         (It has to be a semi-reliable, connectionless multicast protocol
>         with a _very_ low overhead.)

Everything I've seen so far about reliable multicast is inherently
complex.

>      3. There are a few things in the connection tracking code which
>         are incompatible with replication "by design". For example,
>         the expectfn() function in the expectation structure is such:
>         simply, there is no way to replicate a stand-alone function
>         pointer which could point to any arbitrary function.

Yes, indeed.  We could look up the symbol name in the symbol table and
replicate that ;)   Crude hack, but it would work.
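
To make the idea concrete, a userspace sketch where a small static
table stands in for the kernel symbol table. The expectfn_t signature is
simplified, and nat_follow_master(), helper_expected() and both lookup
functions are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef void (*expectfn_t)(void *ct);	/* simplified signature */

/* Two hypothetical expectfn implementations. */
static void nat_follow_master(void *ct) { (void)ct; }
static void helper_expected(void *ct)   { (void)ct; }

/* Hypothetical registry standing in for the kernel symbol table. */
static const struct {
	const char *name;
	expectfn_t fn;
} expectfn_tbl[] = {
	{ "nat_follow_master", nat_follow_master },
	{ "helper_expected",   helper_expected },
};
#define EXPECTFN_TBL_SIZE (sizeof(expectfn_tbl) / sizeof(expectfn_tbl[0]))

/* Master side: pointer -> name to put on the wire (NULL if unknown). */
static const char *expectfn_to_name(expectfn_t fn)
{
	size_t i;

	for (i = 0; i < EXPECTFN_TBL_SIZE; i++)
		if (expectfn_tbl[i].fn == fn)
			return expectfn_tbl[i].name;
	return NULL;
}

/* Slave side: name -> local pointer (NULL if this node lacks it). */
static expectfn_t expectfn_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < EXPECTFN_TBL_SIZE; i++)
		if (strcmp(expectfn_tbl[i].name, name) == 0)
			return expectfn_tbl[i].fn;
	return NULL;
}
```

The slave has to cope with a NULL result, since a helper module loaded
on the master may be missing on the slave.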

>         One more example could be TCP window tracking, I don't think we
>         have the necessary bandwidth and CPU time to send an update
>         message after each and every received TCP packet... Any idea
>         how we could solve these problems?

We already do this, since the timeout is updated with every packet.  So
at this point, I don't see much difference.  Jozsef and I agreed some
time ago that if we don't replicate all the window information, then in
the event of a slave being promoted to master, the new master should
disable window tracking or switch into a lazy mode.

>      4. The current version is 2.4-only, it is for the good old
>         ip_conntrack, and supports IPv4 only. I don't really think
>         this is the way to go, but there is commercial interest in
>         having this kind of failover functionality as fast as possible.

Ack.

>         However, I think that after reaching some state which is
>         acceptable for the users needing the basic features fast, this
>         whole thing should be re-designed and ported to 2.6 and
>         nf_conntrack. This would depend on a few other things, such as
>         porting ctnetlink for nf_conntrack, but I think those would
>         be important to have as well. Again, this would be quite a
>         lot of work to do, thus deferring the 'stable' (production ready)
>         release of the code.

I would first make the 2.4.x version stable and almost feature-complete
(as far as possible).  We have then learned our lessons and can clean it
up while porting on top of nf_conntrack.

>  Regards,
>    Krisztian KOVACS

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie
