transparent proxy implementation stuff

Balazs Scheidler bazsi@balabit.hu
Fri, 21 Dec 2001 20:06:27 +0100


Hi,

I had a couple of thoughts about the transparent proxy implementation, and I
am including them here. Comments are welcome.

The biggest issue will be to decide when to remove entries from the
translation table. A smaller issue is that sockets to be used for
transparent proxying must be bound explicitly, even when they are used for
connecting to remote hosts. 

(This is a comment in iptable_tproxy.c, hence the C comment syntax.)

/* Transparent proxying for netfilter
 *
 * Requirements:
 * -------------
 * 
 * There are 4 features needed for real transparent proxying:
 *   1. the proxy must be able to initiate connections from a foreign
 *      IP/port pair
 *   2. the proxy must be able to intercept connections destined to a
 *      foreign IP/port pair _without_ adding a firewall rule
 *   3. the administrator must be able to redirect connections destined
 *      to a foreign IP/port pair (the so-called redirection)
 *   4. it must be possible to defer sending the SYN-ACK for an incoming
 *      SYN until the proxy decides whether the connection is allowed or
 *      not. We must be very careful here, because enabling this feature
 *      may open up a DoS condition that is _very_ easy to exploit. Rigid
 *      limits, and using it only for trusted networks, should help though.
 *
 * Earlier kernel versions (v2.2) supported the first three with ugly hacks
 * in both the routing code and the TCP/UDP implementation. This time we
 * try to implement a cleaner solution. Requirements:
 *
 *   1. do not use sockets bound to non-local addresses, because that may
 *      easily break the routing code
 *   2. add as little code as possible to the UDP/TCP implementation
 *   3. plug nicely into netfilter
 *
 * Implementation
 * --------------
 *
 * The implementation uses a simple NAT-like functionality to redirect packets
 * to local sockets. For this we need the following data structures:
 *
 *   - TCP/UDP sockets bound to local addresses; these sockets must
 *     be explicitly bound to the correct interface (we should provide a
 *     function to bind sockets by destination, just like the autobind
 *     function in connect() & sendmsg())
 *   - a translation table containing address/port tuples and the address of 
 *     the local socket
 *   - a few fields in the IPCB part of the skb (origdstaddr)
 *   - a few fields in the sock->af_inet (origdstaddr)
 *   - a new iptables table called tproxy specifying local redirections
 *
 * The first three required features are implemented this way:
 *
 *  1. UDP
 *
 *     Application part:
 *
 *       sending messages with arbitrary source address:
 *
 *         the application opens a socket and calls setsockopt(SOL_UDP,
 *         UDP_TPROXY_DSTADDR, 1), which in turn enables the
 *         CMSG_UDP_TPROXY_DSTADDR control message. This control message
 *         allows the application to specify the source it wants when using
 *         sendmsg().
 *
 *       receiving messages originally not destined to the firewall:
 *
 *         there are two ways of receiving datagrams originally not destined
 *         to the firewall: 1) using a REDIRECT-like target (we'll call this
 *         TPROXY from now on) in the tproxy table, 2) using a bind-like
 *         operation (but not bind()) which catches all traffic destined to
 *         that specific IP/port pair. In either case the original
 *         destination address is lost. The application may get this address
 *         by calling setsockopt(SOL_UDP, UDP_TPROXY_DSTADDR, 1), which
 *         enables the CMSG_UDP_TPROXY_DSTADDR control message, containing
 *         this lost address when using recvmsg().
 *
 *       receiving messages originally not destined to the firewall _and_ from a specific host:
 *
 *         the BSD socket library allows a UDP socket to be connected. A
 *         connected socket receives messages only from the given host,
 *         everything else is dropped.  A similar technique should work with
 *         transparent proxying: bind-like setsockopt and connect() (or a
 *         connect-like setsockopt) should result in the same behaviour.
 *
 *     Kernel part:
 *
 *       sending messages with arbitrary source address:
 *
 *         if the application uses sendmsg() and specifies a
 *         CMSG_UDP_TPROXY_DSTADDR control message, the supplied address is
 *         stored in the IPCB; the local OUTPUT hook picks up this value and
 *         rewrites the source address accordingly.
 *
 *       receiving messages originally not destined to the firewall using a TPROXY rule:
 *
 *         the original destination address is saved in the IPCB, and the
 *         destination is rewritten so that the destination IP is the
 *         primary IP address of the interface the packet was received on
 *         (default, could be specified as an argument to the TPROXY target),
 *         and the port is the port number specified to TPROXY.
 *
 *       receiving messages originally not destined to the firewall using a bind-like operation:
 *
 *         The application calls setsockopt(SOL_UDP, UDP_TPROXY_SRCADDR) with a
 *         sockaddr specifying the address it wants to catch messages on. 
 *         This call adds an entry to the translation table: a tuple
 *         describing the packets to be caught (wildcard source, specified
 *         address as destination), and the socket address as the address to
 *         translate to (if the socket is bound to a specific interface
 *         address; otherwise the PREROUTING hook will automatically
 *         substitute the address of the incoming interface).
 *
 *       receiving messages originally not destined to the firewall and from a specific host:
 *
 *         this is similar to the previous case, but the application also calls
 *         connect() after UDP_TPROXY_SRCADDR.
 *
 *  2. TCP
 *
 *     Application part:
 *
 *       initiating a connection from a foreign IP address:
 *
 *         the application creates a socket, calls setsockopt(SOL_TCP,
 *         TCP_TPROXY_SRCADDR) with an IP/port pair as the outgoing source
 *         address. It then calls connect() as it normally would to connect to
 *         its destination.
 *
 *       intercepting a connection with TPROXY target:
 *
 *         the application can get the original destination address/port pair
 *         using getsockopt(SOL_TCP, TCP_TPROXY_DSTADDR)
 *
 *       intercepting a connection with a bind-like function:
 *
 *         the application calls setsockopt(SOL_TCP, TCP_TPROXY_SRCADDR)
 *         specifying an address which should be captured. The original
 *         destination address can again be queried using
 *         getsockopt(SOL_TCP, TCP_TPROXY_DSTADDR).
 *
 *     Kernel part:
 *
 *       initiating a connection from a foreign IP address:
 *
 *         when the application creates a socket and sets TCP_TPROXY_SRCADDR,
 *         the setsockopt code adds an entry to the translation table.
 *
 *       intercepting connections with a TPROXY target:
 *
 *         the original destination address is saved in the IPCB, and the
 *         destination is rewritten so that the destination IP is the
 *         primary IP address of the interface the packet was received on
 *         (default, could be specified as an argument to the TPROXY
 *         target), and the port is the port number specified to TPROXY. 
 *         This redirects the packet to the local IP stack, where tcp_rcv()
 *         checks for incoming connections.  When a new connection is
 *         accepted, the original destination address is saved in the socket
 *         so getsockopt(SOL_TCP, TCP_TPROXY_DSTADDR) can query it. The
 *         TPROXY target also adds a new, conditional entry to the
 *         translation table. Conditional means that if no process listens on
 *         the redirected port, and the kernel returns an RST in response,
 *         the entry should be removed. If the connection is established
 *         successfully (i.e. a matching socket was found in tcp_rcv), the entry
 *         should be associated with the socket, so it can be removed when the
 *         socket is destroyed.
 *
 *       intercepting connections with a bind-like operation:
 *
 *         the application calls setsockopt(SOL_TCP, TCP_TPROXY_SRCADDR), which
 *         adds an entry to the translation table.
 *
 *  3. Netfilter hooks
 *
 *     The translation table is processed by netfilter hooks, registered in
 *     PREROUTING and OUTPUT.
 *
 *       PREROUTING hook
 *
 *         This hook processes incoming packets as they enter on the
 *         incoming interface. It first checks whether the
 *         destination/source address matches any entry in the
 *         translation table. If it does, it translates the packet
 *         accordingly (DNAT), and sends it on. If the packet doesn't match
 *         anything in the translation table, it consults the tproxy
 *         iptables table, which may contain TPROXY targets. The TPROXY
 *         target may translate the packet as it needs to (based on its
 *         parameters), and either add a new entry to the translation table
 *         (for TCP), or not (for UDP).
 *
 *       OUTPUT hook
 *
 *         This hook processes packets generated by the local host. It
 *         first checks whether the destination/source address matches any
 *         entry in the translation table. If it does, it translates the
 *         packet accordingly (SNAT), and sends it on. If the packet doesn't
 *         match anything in the translation table, it similarly consults the
 *         tproxy iptables table, to make it possible to TPROXY locally
 *         generated connections. The OUTPUT hook should also check for TCP
 *         reset packets/ICMP port unreachable packets, which indicate that
 *         there was no listening socket, so that the corresponding
 *         conditional entry can be removed.
 *
 * Some issues:
 *  - deleting entries from the translation table (we should hook into
 *    socket destruction if possible)
 *  - what happens if connect() fails without knowing about the translation
 *    (should be solved by requiring an explicit bind to the correct
 *    interface)
 *  - binding sockets by destination to the correct outgoing interface
 *    (since the kernel knows which is the correct network interface we
 *    should provide some setsockopt or something, which does the job, so
 *    the application doesn't have to mess with interface and routing information)
 *  - do SNAT in local OUTPUT or POSTROUTING?
 *    Since the source address may affect routing, I think it should be done
 *    in OUTPUT; real NAT does it in POSTROUTING, and reroutes the packet if
 *    anything changes.
 *  - interoperability with filter/NAT/conntrack?
 *  - ICMP handling
 *
 */
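
To make the translation table idea a bit more concrete, here is a minimal
user-space sketch of it. This is only an illustration under assumptions, not
the code from iptable_tproxy.c: all names (struct tproxy_entry, tproxy_add(),
tproxy_lookup() and so on) are hypothetical, the table is a fixed-size array,
an integer socket id stands in for the address of the local socket, and
addresses are kept in host byte order. It shows wildcard matching, conditional
entries, and removal when the owning socket is destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TPROXY_TABLE_SIZE 32   /* hypothetical fixed capacity */
#define TPROXY_ANY        0    /* wildcard address or port */
#define TPROXY_NO_SOCK    (-1) /* entry not yet owned by a socket */

/* Address/port tuple describing the packets to catch (host byte order). */
struct tproxy_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct tproxy_entry {
    int in_use;
    int conditional;            /* added by a TPROXY rule, not yet confirmed */
    int sock_id;                /* stand-in for the address of the local socket */
    struct tproxy_tuple match;  /* fields set to TPROXY_ANY match anything */
    uint32_t local_addr;        /* address/port to translate to */
    uint16_t local_port;
};

struct tproxy_table {
    struct tproxy_entry slot[TPROXY_TABLE_SIZE];
};

void tproxy_table_init(struct tproxy_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Add an entry in the first free slot; returns NULL when the table is full. */
struct tproxy_entry *tproxy_add(struct tproxy_table *t,
                                const struct tproxy_tuple *match,
                                uint32_t local_addr, uint16_t local_port,
                                int sock_id, int conditional)
{
    for (size_t i = 0; i < TPROXY_TABLE_SIZE; i++) {
        struct tproxy_entry *e = &t->slot[i];
        if (e->in_use)
            continue;
        e->in_use = 1;
        e->conditional = conditional;
        e->sock_id = sock_id;
        e->match = *match;
        e->local_addr = local_addr;
        e->local_port = local_port;
        return e;
    }
    return NULL;
}

static int field_matches(uint32_t want, uint32_t got)
{
    return want == TPROXY_ANY || want == got;
}

/* Find the first entry matching a packet's addresses, or NULL if none does. */
struct tproxy_entry *tproxy_lookup(struct tproxy_table *t,
                                   const struct tproxy_tuple *pkt)
{
    for (size_t i = 0; i < TPROXY_TABLE_SIZE; i++) {
        struct tproxy_entry *e = &t->slot[i];
        if (e->in_use &&
            field_matches(e->match.saddr, pkt->saddr) &&
            field_matches(e->match.sport, pkt->sport) &&
            field_matches(e->match.daddr, pkt->daddr) &&
            field_matches(e->match.dport, pkt->dport))
            return e;
    }
    return NULL;
}

/* A matching socket was found: tie the conditional entry to that socket. */
void tproxy_confirm(struct tproxy_entry *e, int sock_id)
{
    assert(e->in_use);
    e->conditional = 0;
    e->sock_id = sock_id;
}

/* An RST was sent in response: drop the entry if it is still conditional. */
int tproxy_drop_if_conditional(struct tproxy_entry *e)
{
    if (!e->in_use || !e->conditional)
        return 0;
    e->in_use = 0;
    return 1;
}

/* Socket destruction: remove every entry owned by the given socket. */
int tproxy_remove_by_sock(struct tproxy_table *t, int sock_id)
{
    int removed = 0;
    for (size_t i = 0; i < TPROXY_TABLE_SIZE; i++) {
        struct tproxy_entry *e = &t->slot[i];
        if (e->in_use && e->sock_id == sock_id) {
            e->in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

In this sketch a bind-like setsockopt would call tproxy_add() with a wildcard
source and a real socket id, while the TPROXY target would add a conditional
entry with TPROXY_NO_SOCK and later either confirm it or drop it. A real
implementation would need locking and a faster lookup than a linear scan.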