Port based mangling trouble

Henrik Nordström hno@marasystems.com
Sun, 18 Nov 2001 22:12:10 +0100


I don't have an answer to your question, but:

What is wrong with using the Netfilter-provided NAT framework? Why reinvent it 
all again?

There is a lot more to NAT than only changing IP addresses of the packets. 
You also need to deal with at least
a) Fragmented packets
b) Related ICMP packets

Plain IP-based NAT may work fine without these if you stick to the IP level, 
but by inspecting port numbers you are up at the TCP level, which is more 
than IP. The Netfilter NAT framework has most of what one needs for any kind 
of NAT, but is lacking a little in selection mechanisms.
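
To illustrate why port inspection drags fragments into the picture, here is a minimal user-space sketch (not Netfilter code). It assumes a raw IPv4 packet in a byte buffer; the helper name `tcp_dest_port` and the `port_lookup` enum are invented for this example. Only the fragment with offset 0 carries the TCP header, so any later fragment has no port numbers to match on.

```c
#include <stddef.h>
#include <stdint.h>

/* Result codes for the hypothetical helper below. */
enum port_lookup { PORT_OK = 0, PORT_NOT_TCP, PORT_FRAGMENT, PORT_TRUNCATED };

/*
 * Read the TCP destination port (host byte order) from a raw IPv4 packet.
 * Returns PORT_FRAGMENT for non-first fragments, which do not contain
 * a TCP header and therefore cannot be classified by port.
 */
enum port_lookup tcp_dest_port(const uint8_t *pkt, size_t len, uint16_t *port)
{
    size_t ihl;
    uint16_t frag;

    if (len < 20 || (pkt[0] >> 4) != 4)
        return PORT_TRUNCATED;              /* too short or not IPv4 */

    ihl = (size_t)(pkt[0] & 0x0f) * 4;      /* header length in bytes */
    if (ihl < 20 || len < ihl)
        return PORT_TRUNCATED;

    if (pkt[9] != 6)                        /* protocol 6 = TCP */
        return PORT_NOT_TCP;

    frag = (uint16_t)((pkt[6] << 8) | pkt[7]);
    if (frag & 0x1fff)                      /* non-zero fragment offset */
        return PORT_FRAGMENT;

    if (len < ihl + 4)                      /* need source + dest port */
        return PORT_TRUNCATED;

    *port = (uint16_t)((pkt[ihl + 2] << 8) | pkt[ihl + 3]);
    return PORT_OK;
}
```

A port-based NAT has to decide what to do when it gets `PORT_FRAGMENT`: either reassemble first or remember the decision made for the first fragment.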

There is also another NAT implementation on top of the Netfilter-provided 
packet hooks: IPVS <http://www.linuxvirtualserver.org/>, which duplicates much 
of the complexity of NAT, but in a different manner than the "native" 
Netfilter implementation.

Regards
Henrik Nordstrom



On Sunday 18 November 2001 21.07, Chad Clark wrote:
> Hi All,
>
> I'm trying to put together a transparent source IP based NAT for load
> balancing. The idea is that a virtual server visible to our LAN as
> 10.10.1.252 (on eth0) will route all port 80 requests to one of two
> machines (10.10.7.2 and 10.10.7.3) (through eth1 as 10.10.7.1).
> Naturally the source IP will be changed (to 10.10.1.252) as the replies
> go back through the virtual server.
>
> So far the following function hooks to NF_IP_PRE_ROUTING and seems to
> redirect to the selected IP okay.  The problem is when I telnet from
> 10.10.1.8 to 10.10.1.252:80.  Looking at tcpdump on 10.10.7.2 I see the
> requests coming in (to 10.10.7.2) but don't see any ACKs going back.
> When I telnet from the virtual server (i.e. requests come from 10.10.7.1)
> to 10.10.7.2:80, tcpdump shows the connection going through (in fact
> 'GET /' works).
>
> It looks like the real server (10.10.7.2) is dropping the packets.  I'm
> guessing that the checksums don't match up.  Can anyone confirm this or
> fill me in on what I'm missing?
>
> Thanks, Chad Clark
>
>
> /* file: cnf_hack.c
>    details: should rewrite incoming port 80 tcp traffic to one of 2 IPs
>    based on the least significant bit of the source IP ADDR. This is
>    based on the netfiltering in the 2.3+ kernels and not ip_tables.
>    (ip_tables in turn relies on the same netfilter hooks as we have
>    access to.)
>
>    Based strongly on code from Rusty Russell's (June 2000) example module
>    for Linux Mag.
>    Make SURE to compile with the line:
>      gcc -O2 -c -D__KERNEL__ -DMODULE cnf_hack.c -I/usr/src/linux/include
> */
>
>
> static unsigned int cnf_hack_hook(unsigned int hook,
> 				  struct sk_buff **pskb,
> 				  const struct net_device *indev,
> 				  const struct net_device *outdev,
> 				  int (*okfn) (struct sk_buff *))
> {
>     /* We know this field (nh.iph) is valid, because we registered this
>      * as a PF_INET hook, so we will only ever be passed IP packets. */
>
>     struct iphdr *iphead = (void *) (*pskb)->nh.iph;
>     __u32 src_ip = iphead->saddr;
>     __u32 dst_ip = iphead->daddr;
>
>     /* struct tcphdr *tcphead = (void *) (*pskb)->h.th; <--- why does
>      * this line not work? */
>
>     struct tcphdr *tcphead = (struct tcphdr *) ((u_int32_t *) (*pskb)->nh.iph +
>                              (*pskb)->nh.iph->ihl);
>     __u16 dst_port = tcphead->dest;
>     __u16 src_port = tcphead->source;
>
>     /* what to use here?
>      * (check in /usr/src/linux/include/netfilter_ipv4.h?) - chad */
>     (*pskb)->nfcache |= NFC_UNKNOWN;
>
>     if((dst_port == 0x5000)                /* 0x5000 is port 80 */
>         && (dst_ip == 0xfc010a0a)          /* 0xfc010a0a = 10.10.1.252 */
>         && (src_ip != 0xfc010a0a)) {       /* never happen due to hook placement? */
>
>             printk("cnf_hack: srcIP:%x", src_ip);
>             if (src_ip & 0x01000000) {     /* 0x0100 0000 = least sig bit */
>                 printk(" odd");
>                 printk(" so route to:%x",0x03070a0a);
>                 iphead->daddr = 0x03070a0a;    /* odd go to 10.10.7.3 */
>             }
>             else {
>                 printk(" even");
>                 printk(" so route to:%x",0x02070a0a);
>                 iphead->daddr = 0x02070a0a;    /* even go to 10.10.7.2 */
>             }
>             printk(" replaces: %x orig_dest_port:%x\n", dst_ip, dst_port);
>
>             /* redo checksum */
>             tcphead->check = 0;
>             tcphead->check = tcp_v4_check(tcphead,
>                                  sizeof(struct tcphdr),
>                                  (*pskb)->nh.iph->saddr,
>                                  (*pskb)->nh.iph->daddr,
>                                  csum_partial((char *) tcphead,
>                                  sizeof(struct tcphdr), 0));
>             (*pskb)->nh.iph->check = 0;
>             (*pskb)->nh.iph->check = ip_fast_csum((unsigned char *)
>                                      (*pskb)->nh.iph, (*pskb)->nh.iph->ihl);
>
>             return NF_ACCEPT;
>     }
> }
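
The checksum guess in the quoted message can be explored outside the kernel. The TCP checksum covers the whole segment (header, options and payload) plus a pseudo-header containing the source and destination addresses, whereas the quoted code sums only `sizeof(struct tcphdr)` bytes, so that may be worth checking. When only the destination address changes, an incremental update in the style of RFC 1624 avoids re-summing the data at all. The following is a user-space sketch under those assumptions; `inet_checksum` and `csum_update32` are names made up for this example, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Internet checksum (RFC 1071) over a buffer, treating the bytes as
 * big-endian 16-bit words. Intended for packet-sized buffers only.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)                            /* pad odd trailing byte */
        sum += (uint32_t)(data[len - 1] << 8);
    return (uint16_t)~csum_fold(sum);
}

/*
 * Incremental update after a 32-bit field changes from old_val to
 * new_val (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m').
 * All values are in host order, as read big-endian from the packet.
 */
uint16_t csum_update32(uint16_t old_csum, uint32_t old_val, uint32_t new_val)
{
    uint32_t sum = (uint16_t)~old_csum;

    sum += (uint16_t)~(old_val >> 16);
    sum += (uint16_t)~(old_val & 0xffff);
    sum += new_val >> 16;
    sum += new_val & 0xffff;
    return (uint16_t)~csum_fold(sum);
}
```

Because the destination address is part of both the IP header and the TCP pseudo-header, the same `csum_update32` call would apply to each of the two checksums, using that checksum's old value.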