From zhaojingmin at hotmail.com Wed Mar 1 03:57:47 2006
From: zhaojingmin at hotmail.com (Jing Min Zhao)
Date: Wed Mar 1 04:10:01 2006
Subject: New H.323 conntrack & NAT helper module
References: <925A849792280C4E80C5461017A4B8A2032119@mail733.InfraSupportEtc.com><44001CDD.3030305@trash.net> <4400A541.9080901@trash.net>
Message-ID:

----- Original Message -----
From: "Patrick McHardy"
To: "Jing Min Zhao"
Cc: ; "Greg Scott"
Sent: Saturday, February 25, 2006 1:43 PM
Subject: Re: New H.323 conntrack & NAT helper module

Hi, Patrick,

Thank you for your patches, they are very helpful. I've already applied the patch for excessive stack usage. As for the patch adding support for non-linear SKBs, I admit that lacking such support is a bug, but I have a different idea about the checksum method.

> Add support for non-linear SKBs. I know switching to
> ip_nat_mangle_{tcp,udp}_packet is less efficient because it checksums
> the packet on each call, but that can be fixed separately by switching
> it to incremental checksumming.

Imagine a Setup signal with 30 fast-start entries (this is not unusual for Gnomemeeting and OpenPhone): if you use ip_nat_mangle_tcp_packet, you will have to call it 45 times. For an RRQ message, you will possibly call ip_nat_mangle_udp_packet more than 10 times if it contains many signal addresses. You can use incremental checksumming, but I'm still worried about the efficiency. This is why I prefer to use a counter (please see the last paragraph) to track modifications and do the checksum only once.

> Change the H.323 helper to support non-linear skbs similar to the other
> helpers. This has two additional positive side-effects:
> - skb_writable was broken as it always tried to reload the data pointer with
>   an assumed TPKT payload, even for H.225 RAS packets.

This can be fixed by looking at the protocol.
> - we can use the regular NAT packet mangling functions and get rid of the
>   manual checksumming
>
> Well, thank you, This is really great work. I forgot one small issue:
> there seems to be some debugging-leftover, the functions registering
> expectations add up the number of registered expectations and return
> that value, but nobody uses it. If there are no plans for using it,
> I would prefer to have it removed.
>

The return value is actually a modification tracking counter. If a function successfully changes a packet, the counter is increased. At the end, if the counter is positive, the packet is checksummed; if it is 0, nothing was changed; if it is negative, an error happened.

Best regards,
Jing Min Zhao

From thomas at gelf.net Wed Mar 1 13:42:13 2006
From: thomas at gelf.net (Thomas Gelf)
Date: Wed Mar 1 14:03:55 2006
Subject: NAT behave - hairpinning
Message-ID:

Hi all,

I googled around for some time to find information regarding "hairpin" support for netfilter. Reason: I don't want my public SIP Registrar / Proxy to redirect calls between my local STUN-enabled and NATed SIP clients to an external RTP Proxy (no, sip-conntrack-nat is not what I'm looking for).

The most recent hint relating to this topic that I've managed to find is on Rusty's bleeding edge page:

http://ozlabs.org/~rusty/index.cgi/2005#2005-10-10

It would be great if someone could give me some additional information:

- is hairpinning meant to be supported in near-future official linux kernels?
- is it planned to provide such a patch on patch-o-matic?

Thanks a lot!
Kind regards,
Thomas Gelf

--
Thomas Gelf

From pablo at netfilter.org Wed Mar 1 16:23:17 2006
From: pablo at netfilter.org (Pablo Neira Ayuso)
Date: Wed Mar 1 16:34:30 2006
Subject: [PATCH 6/5] [CTNETLINK] Fix expectation mask dumping
In-Reply-To: <44025F8A.6070302@netfilter.org>
References: <44025F8A.6070302@netfilter.org>
Message-ID: <4405BC65.1030705@netfilter.org>

This applies to 2.6.16.

This patch introduces the function ctnetlink_exp_dump_mask, which correctly dumps the expectation mask. This function uses the l3num value from the expectation tuple, which is a valid layer 3 protocol number.

The value of the l3num mask isn't dumped since it is meaningless from the userspace side.

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

-------------- next part --------------
[CTNETLINK] Fix expectation mask dumping

The expectation mask has some particularities that require different handling. The protocol number fields can be set to non-valid protocols, i.e. l3num is set to 0xFFFF. Since that protocol does not exist, the mask tuple will not be dumped. Moreover, this results in a kernel panic when nf_conntrack accesses the array of protocol handlers, which is PF_MAX (0x1F) long.

This patch introduces the function ctnetlink_exp_dump_mask, which correctly dumps the expectation mask. This function uses the l3num value from the expectation tuple, which is a valid layer 3 protocol number. The value of the l3num mask isn't dumped since it is meaningless from the userspace side.

Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback.
Signed-off-by: Pablo Neira Ayuso Index: net-2.6.16.git/net/netfilter/nf_conntrack_netlink.c =================================================================== --- net-2.6.16.git.orig/net/netfilter/nf_conntrack_netlink.c 2006-02-28 16:52:20.000000000 +0100 +++ net-2.6.16.git/net/netfilter/nf_conntrack_netlink.c 2006-02-28 16:52:22.000000000 +0100 @@ -4,7 +4,7 @@ * (C) 2001 by Jay Schulist * (C) 2002-2005 by Harald Welte * (C) 2003 by Patrick Mchardy - * (C) 2005 by Pablo Neira Ayuso + * (C) 2005-2006 by Pablo Neira Ayuso * * I've reworked this stuff to use attributes instead of conntrack * structures. 5.44 am. I need more tea. --pablo 05/07/11. @@ -55,20 +55,18 @@ static char __initdata version[] = "0.92 static inline int ctnetlink_dump_tuples_proto(struct sk_buff *skb, - const struct nf_conntrack_tuple *tuple) + const struct nf_conntrack_tuple *tuple, + struct nf_conntrack_protocol *proto) { - struct nf_conntrack_protocol *proto; int ret = 0; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum); - /* If no protocol helper is found, this function will return the - * generic protocol helper, so proto won't *ever* be NULL */ - proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum); if (likely(proto->tuple_to_nfattr)) ret = proto->tuple_to_nfattr(skb, tuple); - nf_ct_proto_put(proto); + NFA_NEST_END(skb, nest_parms); return ret; @@ -77,33 +75,44 @@ nfattr_failure: } static inline int -ctnetlink_dump_tuples(struct sk_buff *skb, - const struct nf_conntrack_tuple *tuple) +ctnetlink_dump_tuples_ip(struct sk_buff *skb, + const struct nf_conntrack_tuple *tuple, + struct nf_conntrack_l3proto *l3proto) { - struct nfattr *nest_parms; - struct nf_conntrack_l3proto *l3proto; int ret = 0; - - l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); - - nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); + if (likely(l3proto->tuple_to_nfattr)) ret = 
l3proto->tuple_to_nfattr(skb, tuple); + NFA_NEST_END(skb, nest_parms); + return ret; + +nfattr_failure: + return -1; +} + +static inline int +ctnetlink_dump_tuples(struct sk_buff *skb, + const struct nf_conntrack_tuple *tuple) +{ + int ret; + struct nf_conntrack_l3proto *l3proto; + struct nf_conntrack_protocol *proto; + + l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto); nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) return ret; - nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); - ret = ctnetlink_dump_tuples_proto(skb, tuple); - NFA_NEST_END(skb, nest_parms); + proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, tuple, proto); + nf_ct_proto_put(proto); return ret; - -nfattr_failure: - return -1; } static inline int @@ -1150,6 +1159,37 @@ nfattr_failure: } static inline int +ctnetlink_exp_dump_mask(struct sk_buff *skb, + const struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_tuple *mask) +{ + int ret; + struct nf_conntrack_l3proto *l3proto; + struct nf_conntrack_protocol *proto; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK); + + l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + ret = ctnetlink_dump_tuples_ip(skb, mask, l3proto); + nf_ct_l3proto_put(l3proto); + + if (unlikely(ret < 0)) + goto nfattr_failure; + + proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, mask, proto); + nf_ct_proto_put(proto); + if (unlikely(ret < 0)) + goto nfattr_failure; + + NFA_NEST_END(skb, nest_parms); + + return 0; + +nfattr_failure: + return -1; +} + +static inline int ctnetlink_exp_dump_expect(struct sk_buff *skb, const struct nf_conntrack_expect *exp) { @@ -1159,7 +1199,7 @@ ctnetlink_exp_dump_expect(struct sk_buff if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0) goto nfattr_failure; - if (ctnetlink_exp_dump_tuple(skb, &exp->mask, CTA_EXPECT_MASK) < 0) + if 
(ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0) goto nfattr_failure; if (ctnetlink_exp_dump_tuple(skb, &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple, Index: net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c =================================================================== --- net-2.6.16.git.orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-02-28 16:52:20.000000000 +0100 +++ net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-02-28 16:52:22.000000000 +0100 @@ -4,7 +4,7 @@ * (C) 2001 by Jay Schulist * (C) 2002-2005 by Harald Welte * (C) 2003 by Patrick Mchardy - * (C) 2005 by Pablo Neira Ayuso + * (C) 2005-2006 by Pablo Neira Ayuso * * I've reworked this stuff to use attributes instead of conntrack * structures. 5.44 am. I need more tea. --pablo 05/07/11. @@ -53,20 +53,18 @@ static char __initdata version[] = "0.90 static inline int ctnetlink_dump_tuples_proto(struct sk_buff *skb, - const struct ip_conntrack_tuple *tuple) + const struct ip_conntrack_tuple *tuple, + struct ip_conntrack_protocol *proto) { - struct ip_conntrack_protocol *proto; int ret = 0; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum); - /* If no protocol helper is found, this function will return the - * generic protocol helper, so proto won't *ever* be NULL */ - proto = ip_conntrack_proto_find_get(tuple->dst.protonum); if (likely(proto->tuple_to_nfattr)) ret = proto->tuple_to_nfattr(skb, tuple); - ip_conntrack_proto_put(proto); + NFA_NEST_END(skb, nest_parms); return ret; @@ -75,28 +73,41 @@ nfattr_failure: } static inline int -ctnetlink_dump_tuples(struct sk_buff *skb, - const struct ip_conntrack_tuple *tuple) +ctnetlink_dump_tuples_ip(struct sk_buff *skb, + const struct ip_conntrack_tuple *tuple) { - struct nfattr *nest_parms; - int ret; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); - nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); NFA_PUT(skb, CTA_IP_V4_SRC, sizeof(u_int32_t), 
&tuple->src.ip); NFA_PUT(skb, CTA_IP_V4_DST, sizeof(u_int32_t), &tuple->dst.ip); - NFA_NEST_END(skb, nest_parms); - nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); - ret = ctnetlink_dump_tuples_proto(skb, tuple); NFA_NEST_END(skb, nest_parms); - return ret; + return 0; nfattr_failure: return -1; } static inline int +ctnetlink_dump_tuples(struct sk_buff *skb, + const struct ip_conntrack_tuple *tuple) +{ + int ret; + struct ip_conntrack_protocol *proto; + + ret = ctnetlink_dump_tuples_ip(skb, tuple); + if (unlikely(ret < 0)) + return ret; + + proto = ip_conntrack_proto_find_get(tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, tuple, proto); + ip_conntrack_proto_put(proto); + + return ret; +} + +static inline int ctnetlink_dump_status(struct sk_buff *skb, const struct ip_conntrack *ct) { u_int32_t status = htonl((u_int32_t) ct->status); @@ -1134,6 +1145,32 @@ nfattr_failure: } static inline int +ctnetlink_exp_dump_mask(struct sk_buff *skb, + const struct ip_conntrack_tuple *tuple, + const struct ip_conntrack_tuple *mask) +{ + int ret; + struct ip_conntrack_protocol *proto; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK); + + ret = ctnetlink_dump_tuples_ip(skb, mask); + if (unlikely(ret < 0)) + goto nfattr_failure; + + proto = ip_conntrack_proto_find_get(tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, mask, proto); + if (unlikely(ret < 0)) + goto nfattr_failure; + + NFA_NEST_END(skb, nest_parms); + + return 0; + +nfattr_failure: + return -1; +} + +static inline int ctnetlink_exp_dump_expect(struct sk_buff *skb, const struct ip_conntrack_expect *exp) { @@ -1143,7 +1180,7 @@ ctnetlink_exp_dump_expect(struct sk_buff if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0) goto nfattr_failure; - if (ctnetlink_exp_dump_tuple(skb, &exp->mask, CTA_EXPECT_MASK) < 0) + if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0) goto nfattr_failure; if (ctnetlink_exp_dump_tuple(skb, 
&master->tuplehash[IP_CT_DIR_ORIGINAL].tuple,

From markoflinux at gmail.com Thu Mar 2 12:12:05 2006
From: markoflinux at gmail.com (Huy Vu Pham)
Date: Thu Mar 2 12:24:27 2006
Subject: SIP NAT CONTRACK Module with Netfilter in kernel 2.4.x
In-Reply-To:
References:
Message-ID:

Dear Netfilter Devel list,

I have a very strange problem with Netfilter in Linux kernel 2.4.x. I applied the conntrack/NAT SIP patch (http://openwrt.alphacore.net/patches/buildroot/317-netfilter-nat-sip ) and use it with the HELPER module to capture all RTP packets.

(
#Out from WAN side: eth1
iptables -t mangle -A POSTROUTING -o eth1 -p UDP -m helper --helper sipd00 -j MARK --set-mark 0x20
#Out from LAN side: eth0
iptables -t mangle -A POSTROUTING -o eth0 -p UDP -m helper --helper sipd00 -j MARK --set-mark 0x21
)

My test case looks like this:

SIP PHONE A (outside NAT) ----- NAT BOX (has SIP ALG) ------- SIP PHONE B (inside NAT)

1. Reboot the NAT box, A calls B. The SIP module can capture all RTP packets. Before the RTP timeout, a call from B to A also works.
2. Reboot the NAT box, B calls A. The SIP module CANNOT capture any RTP packets. Before the RTP timeout, a call from A to B has the same problem.

What is the difference between case (1) and case (2)? I have already tested kernels from 2.4.20 to 2.4.32. The problem is the same.

Thanks,

From kadlec at blackhole.kfki.hu Thu Mar 2 12:31:56 2006
From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik)
Date: Thu Mar 2 12:43:26 2006
Subject: IPSET patches from pom-ng don't apply to 2.6.16-rc5
In-Reply-To: <44043F49.7050304@gmx.net>
References: <44043F49.7050304@gmx.net>
Message-ID:

On Tue, 28 Feb 2006, Carl-Daniel Hailfinger wrote:

> applying the ipset patch from patch-o-matic-ng doesn't work anymore:
>
> Welcome to Patch-o-matic ($Revision: 4088 $)!
>
> Kernel: 2.6.16, /storage/linux-2.6.16-rc5
> Iptables: 1.3.5, /storage/iptables-1.3.5
> Each patch is a new feature: many have minimal impact, some do not.
> Almost every one has bugs, so don't apply what you don't need!
> -------------------------------------------------------
> Already applied:
>
> Testing set... not applied
> The set patch:
> Author: Jozsef Kadlecsik
> Status: Beta
> [...]
> -----------------------------------------------------------------
> Do you want to apply this patch [N/y/t/f/a/r/b/w/q/?] t
> unable to find ladd slot in src /tmp/pom-26985/net/ipv4/netfilter/Makefile (./patchlets/set/linux-2.6/./net/ipv4/netfilter/Makefile.ladd)

The marker points from net/ipv4/netfilter/Makefile were moved into net/netfilter/Makefile, so they can no longer be found. I'm working on an updated ipset patch in which I'll deal with this problem.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

From yasuyuki.kozakai at toshiba.co.jp Thu Mar 2 18:13:15 2006
From: yasuyuki.kozakai at toshiba.co.jp (Yasuyuki KOZAKAI)
Date: Thu Mar 2 18:25:36 2006
Subject: [PATCH 6/5] [CTNETLINK] Fix expectation mask dumping
In-Reply-To: <4405BC65.1030705@netfilter.org>
References: <44025F8A.6070302@netfilter.org> <4405BC65.1030705@netfilter.org>
Message-ID: <200603021713.k22HDGn9004928@toshiba.co.jp>

Hi, Pablo,

Sorry, I forgot to check the ip_conntrack_netlink.c part of the previous patch.
From: Pablo Neira Ayuso
Date: Wed, 01 Mar 2006 16:23:17 +0100

> Index: net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c
> ===================================================================

(snip)

>  static inline int
> +ctnetlink_exp_dump_mask(struct sk_buff *skb,
> +			const struct ip_conntrack_tuple *tuple,
> +			const struct ip_conntrack_tuple *mask)
> +{
> +	int ret;
> +	struct ip_conntrack_protocol *proto;
> +	struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK);
> +
> +	ret = ctnetlink_dump_tuples_ip(skb, mask);
> +	if (unlikely(ret < 0))
> +		goto nfattr_failure;
> +
> +	proto = ip_conntrack_proto_find_get(tuple->dst.protonum);
> +	ret = ctnetlink_dump_tuples_proto(skb, mask, proto);
> +	if (unlikely(ret < 0))
> +		goto nfattr_failure;
> +
> +	NFA_NEST_END(skb, nest_parms);
> +
> +	return 0;
> +
> +nfattr_failure:
> +	return -1;
> +}

ip_conntrack_proto_put() is missing here.

I hope I haven't missed anything else...

-- Yasuyuki Kozakai

From yasuyuki.kozakai at toshiba.co.jp Thu Mar 2 18:41:08 2006
From: yasuyuki.kozakai at toshiba.co.jp (Yasuyuki KOZAKAI)
Date: Thu Mar 2 18:53:27 2006
Subject: ipv6 conntrack status?
In-Reply-To: <20060225111316.GA13280@mx.ytti.net>
References: <20060225111316.GA13280@mx.ytti.net>
Message-ID: <200603021741.k22Hf8RP011900@toshiba.co.jp>

From: Saku Ytti
Date: Sat, 25 Feb 2006 13:13:16 +0200

> Hey,
>
> I'm running 2.6.16-rc4, I've rolled and installed:
> conntrack_1.00beta1_amd64.deb
> libconntrack-extensions_1.00beta1_amd64.deb
> libnetfilter-conntrack1_0.0.30_amd64.deb
> libnetfilter-conntrack-dev_0.0.30_amd64.deb
> libnfnetlink0_0.0.14_amd64.deb
> libnfnetlink-dev_0.0.14_amd64.deb
>
> I checked SVN for iptables 1.4 branch, but couldn't see ip6 conntrack.
>
> [root@ip.fi ~]# conntrack -L -f ipv6
> Operation failed: Unknown error 18446744073709551519
>
> ipv4 works perfectly. What am I missing? I couldn't really find any
> documentation how the new architecture should be configured, if there
> is such, please point me to it.
>
> I've previously used ipv6 conntrack from usagi patched without problems.

You can use nf_conntrack_ipv6 in the mainline kernel, not ip6_conntrack. To build it, please disable IP_NF_CONNTRACK and enable NF_CONNTRACK and NF_CONNTRACK_IPV6. NF_CONNTRACK is in the menu "Core Netfilter Configuration", and NF_CONNTRACK_IPV6 is in the menu "IPv6: Netfilter Configuration (EXPERIMENTAL)".

Regards,

-- Yasuyuki Kozakai

From pablo at eurodev.net Thu Mar 2 20:16:50 2006
From: pablo at eurodev.net (Pablo Neira Ayuso)
Date: Thu Mar 2 20:28:02 2006
Subject: [PATCH 7/5] [CTNETLINK] Fix expectation mask dumping
In-Reply-To: <44025F8A.6070302@netfilter.org>
References: <44025F8A.6070302@netfilter.org>
Message-ID: <440744A2.3000307@eurodev.net>

This patch applies to 2.6.16.

This patch introduces the function ctnetlink_exp_dump_mask, which correctly dumps the expectation mask. This function uses the l3num value from the expectation tuple, which is a valid layer 3 protocol number.

The value of the l3num mask isn't dumped since it is meaningless from the userspace side.

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

-------------- next part --------------
[CTNETLINK] Fix expectation mask dumping

The expectation mask has some particularities that require different handling. The protocol number fields can be set to non-valid protocols, i.e. l3num is set to 0xFFFF. Since that protocol does not exist, the mask tuple will not be dumped. Moreover, this results in a kernel panic when nf_conntrack accesses the array of protocol handlers, which is PF_MAX (0x1F) long.

This patch introduces the function ctnetlink_exp_dump_mask, which correctly dumps the expectation mask. This function uses the l3num value from the expectation tuple, which is a valid layer 3 protocol number.
The value of the l3num mask isn't dumped since it is meaningless from the userspace side. Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback. Signed-off-by: Pablo Neira Ayuso Index: net-2.6.16.git/net/netfilter/nf_conntrack_netlink.c =================================================================== --- net-2.6.16.git.orig/net/netfilter/nf_conntrack_netlink.c 2006-02-28 16:52:20.000000000 +0100 +++ net-2.6.16.git/net/netfilter/nf_conntrack_netlink.c 2006-02-28 16:52:22.000000000 +0100 @@ -4,7 +4,7 @@ * (C) 2001 by Jay Schulist * (C) 2002-2005 by Harald Welte * (C) 2003 by Patrick Mchardy - * (C) 2005 by Pablo Neira Ayuso + * (C) 2005-2006 by Pablo Neira Ayuso * * I've reworked this stuff to use attributes instead of conntrack * structures. 5.44 am. I need more tea. --pablo 05/07/11. @@ -55,20 +55,18 @@ static char __initdata version[] = "0.92 static inline int ctnetlink_dump_tuples_proto(struct sk_buff *skb, - const struct nf_conntrack_tuple *tuple) + const struct nf_conntrack_tuple *tuple, + struct nf_conntrack_protocol *proto) { - struct nf_conntrack_protocol *proto; int ret = 0; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum); - /* If no protocol helper is found, this function will return the - * generic protocol helper, so proto won't *ever* be NULL */ - proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum); if (likely(proto->tuple_to_nfattr)) ret = proto->tuple_to_nfattr(skb, tuple); - nf_ct_proto_put(proto); + NFA_NEST_END(skb, nest_parms); return ret; @@ -77,33 +75,44 @@ nfattr_failure: } static inline int -ctnetlink_dump_tuples(struct sk_buff *skb, - const struct nf_conntrack_tuple *tuple) +ctnetlink_dump_tuples_ip(struct sk_buff *skb, + const struct nf_conntrack_tuple *tuple, + struct nf_conntrack_l3proto *l3proto) { - struct nfattr *nest_parms; - struct nf_conntrack_l3proto *l3proto; int ret = 0; - - l3proto = 
nf_ct_l3proto_find_get(tuple->src.l3num); - - nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); + if (likely(l3proto->tuple_to_nfattr)) ret = l3proto->tuple_to_nfattr(skb, tuple); + NFA_NEST_END(skb, nest_parms); + return ret; + +nfattr_failure: + return -1; +} + +static inline int +ctnetlink_dump_tuples(struct sk_buff *skb, + const struct nf_conntrack_tuple *tuple) +{ + int ret; + struct nf_conntrack_l3proto *l3proto; + struct nf_conntrack_protocol *proto; + + l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto); nf_ct_l3proto_put(l3proto); if (unlikely(ret < 0)) return ret; - nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); - ret = ctnetlink_dump_tuples_proto(skb, tuple); - NFA_NEST_END(skb, nest_parms); + proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, tuple, proto); + nf_ct_proto_put(proto); return ret; - -nfattr_failure: - return -1; } static inline int @@ -1150,6 +1159,37 @@ nfattr_failure: } static inline int +ctnetlink_exp_dump_mask(struct sk_buff *skb, + const struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_tuple *mask) +{ + int ret; + struct nf_conntrack_l3proto *l3proto; + struct nf_conntrack_protocol *proto; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK); + + l3proto = nf_ct_l3proto_find_get(tuple->src.l3num); + ret = ctnetlink_dump_tuples_ip(skb, mask, l3proto); + nf_ct_l3proto_put(l3proto); + + if (unlikely(ret < 0)) + goto nfattr_failure; + + proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, mask, proto); + nf_ct_proto_put(proto); + if (unlikely(ret < 0)) + goto nfattr_failure; + + NFA_NEST_END(skb, nest_parms); + + return 0; + +nfattr_failure: + return -1; +} + +static inline int ctnetlink_exp_dump_expect(struct sk_buff *skb, const struct nf_conntrack_expect *exp) { @@ -1159,7 +1199,7 @@ 
ctnetlink_exp_dump_expect(struct sk_buff if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0) goto nfattr_failure; - if (ctnetlink_exp_dump_tuple(skb, &exp->mask, CTA_EXPECT_MASK) < 0) + if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0) goto nfattr_failure; if (ctnetlink_exp_dump_tuple(skb, &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple, Index: net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c =================================================================== --- net-2.6.16.git.orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-02-28 16:52:20.000000000 +0100 +++ net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-03-02 20:15:17.000000000 +0100 @@ -4,7 +4,7 @@ * (C) 2001 by Jay Schulist * (C) 2002-2005 by Harald Welte * (C) 2003 by Patrick Mchardy - * (C) 2005 by Pablo Neira Ayuso + * (C) 2005-2006 by Pablo Neira Ayuso * * I've reworked this stuff to use attributes instead of conntrack * structures. 5.44 am. I need more tea. --pablo 05/07/11. 
@@ -53,20 +53,18 @@ static char __initdata version[] = "0.90 static inline int ctnetlink_dump_tuples_proto(struct sk_buff *skb, - const struct ip_conntrack_tuple *tuple) + const struct ip_conntrack_tuple *tuple, + struct ip_conntrack_protocol *proto) { - struct ip_conntrack_protocol *proto; int ret = 0; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum); - /* If no protocol helper is found, this function will return the - * generic protocol helper, so proto won't *ever* be NULL */ - proto = ip_conntrack_proto_find_get(tuple->dst.protonum); if (likely(proto->tuple_to_nfattr)) ret = proto->tuple_to_nfattr(skb, tuple); - ip_conntrack_proto_put(proto); + NFA_NEST_END(skb, nest_parms); return ret; @@ -75,28 +73,41 @@ nfattr_failure: } static inline int -ctnetlink_dump_tuples(struct sk_buff *skb, - const struct ip_conntrack_tuple *tuple) +ctnetlink_dump_tuples_ip(struct sk_buff *skb, + const struct ip_conntrack_tuple *tuple) { - struct nfattr *nest_parms; - int ret; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); - nest_parms = NFA_NEST(skb, CTA_TUPLE_IP); NFA_PUT(skb, CTA_IP_V4_SRC, sizeof(u_int32_t), &tuple->src.ip); NFA_PUT(skb, CTA_IP_V4_DST, sizeof(u_int32_t), &tuple->dst.ip); - NFA_NEST_END(skb, nest_parms); - nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO); - ret = ctnetlink_dump_tuples_proto(skb, tuple); NFA_NEST_END(skb, nest_parms); - return ret; + return 0; nfattr_failure: return -1; } static inline int +ctnetlink_dump_tuples(struct sk_buff *skb, + const struct ip_conntrack_tuple *tuple) +{ + int ret; + struct ip_conntrack_protocol *proto; + + ret = ctnetlink_dump_tuples_ip(skb, tuple); + if (unlikely(ret < 0)) + return ret; + + proto = ip_conntrack_proto_find_get(tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, tuple, proto); + ip_conntrack_proto_put(proto); + + return ret; +} + +static inline int ctnetlink_dump_status(struct sk_buff *skb, const struct 
ip_conntrack *ct) { u_int32_t status = htonl((u_int32_t) ct->status); @@ -1134,6 +1145,33 @@ nfattr_failure: } static inline int +ctnetlink_exp_dump_mask(struct sk_buff *skb, + const struct ip_conntrack_tuple *tuple, + const struct ip_conntrack_tuple *mask) +{ + int ret; + struct ip_conntrack_protocol *proto; + struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK); + + ret = ctnetlink_dump_tuples_ip(skb, mask); + if (unlikely(ret < 0)) + goto nfattr_failure; + + proto = ip_conntrack_proto_find_get(tuple->dst.protonum); + ret = ctnetlink_dump_tuples_proto(skb, mask, proto); + if (unlikely(ret < 0)) + goto nfattr_failure; + + ip_conntrack_proto_put(proto); + NFA_NEST_END(skb, nest_parms); + + return 0; + +nfattr_failure: + return -1; +} + +static inline int ctnetlink_exp_dump_expect(struct sk_buff *skb, const struct ip_conntrack_expect *exp) { @@ -1143,7 +1181,7 @@ ctnetlink_exp_dump_expect(struct sk_buff if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0) goto nfattr_failure; - if (ctnetlink_exp_dump_tuple(skb, &exp->mask, CTA_EXPECT_MASK) < 0) + if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0) goto nfattr_failure; if (ctnetlink_exp_dump_tuple(skb, &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple, From pablo at eurodev.net Thu Mar 2 20:23:58 2006 From: pablo at eurodev.net (Pablo Neira Ayuso) Date: Thu Mar 2 20:35:04 2006 Subject: [PATCH 7/5] [CTNETLINK] Fix expectation mask dumping In-Reply-To: <440744A2.3000307@eurodev.net> References: <44025F8A.6070302@netfilter.org> <440744A2.3000307@eurodev.net> Message-ID: <4407464E.70003@eurodev.net> Pablo Neira Ayuso wrote: > This patch applies to 2.6.16. > > This patch introduces the function ctnetlink_exp_dump_mask, that > correctly dumps the expectation mask. Such function uses the l3num value > from the expectation tuple that is a valid layer 3 protocol number. > > The value of the l3num mask isn't dumped since it is meaningless from > the userspace side. 
This patch is crap, I'll resend later... sometimes I'm stupid.

-- Pablo

From pablo at eurodev.net Thu Mar 2 20:36:59 2006
From: pablo at eurodev.net (Pablo Neira Ayuso)
Date: Thu Mar 2 20:48:08 2006
Subject: [PATCH] [CTNETLINK] Fix expectation mask dumping, take n+1
Message-ID: <4407495B.5010407@eurodev.net>

Does this patch require an explanation again? Well, here it goes:

This patch applies to 2.6.16.

This patch introduces the function ctnetlink_exp_dump_mask, which correctly dumps the expectation mask. This function uses the l3num value from the expectation tuple, which is a valid layer 3 protocol number.

The value of the l3num mask isn't dumped since it is meaningless from the userspace side.

-- Pablo

-------------- next part --------------
[CTNETLINK] Fix expectation mask dumping

The expectation mask has some particularities that require different handling. The protocol number fields can be set to non-valid protocols, i.e. l3num is set to 0xFFFF. Since that protocol does not exist, the mask tuple will not be dumped. Moreover, this results in a kernel panic when nf_conntrack accesses the array of protocol handlers, which is PF_MAX (0x1F) long.

This patch introduces the function ctnetlink_exp_dump_mask, which correctly dumps the expectation mask. This function uses the l3num value from the expectation tuple, which is a valid layer 3 protocol number. The value of the l3num mask isn't dumped since it is meaningless from the userspace side.

Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback.
Signed-off-by: Pablo Neira Ayuso

Index: net-2.6.16.git/net/netfilter/nf_conntrack_netlink.c
===================================================================
--- net-2.6.16.git.orig/net/netfilter/nf_conntrack_netlink.c	2006-02-28 16:52:20.000000000 +0100
+++ net-2.6.16.git/net/netfilter/nf_conntrack_netlink.c	2006-02-28 16:52:22.000000000 +0100
@@ -4,7 +4,7 @@
  * (C) 2001 by Jay Schulist
  * (C) 2002-2005 by Harald Welte
  * (C) 2003 by Patrick Mchardy
- * (C) 2005 by Pablo Neira Ayuso
+ * (C) 2005-2006 by Pablo Neira Ayuso
  *
  * I've reworked this stuff to use attributes instead of conntrack
  * structures. 5.44 am. I need more tea. --pablo 05/07/11.
@@ -55,20 +55,18 @@ static char __initdata version[] = "0.92
 
 static inline int
 ctnetlink_dump_tuples_proto(struct sk_buff *skb,
-			    const struct nf_conntrack_tuple *tuple)
+			    const struct nf_conntrack_tuple *tuple,
+			    struct nf_conntrack_protocol *proto)
 {
-	struct nf_conntrack_protocol *proto;
 	int ret = 0;
+	struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO);
 
 	NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum);
 
-	/* If no protocol helper is found, this function will return the
-	 * generic protocol helper, so proto won't *ever* be NULL */
-	proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum);
 	if (likely(proto->tuple_to_nfattr))
 		ret = proto->tuple_to_nfattr(skb, tuple);
 
-	nf_ct_proto_put(proto);
+	NFA_NEST_END(skb, nest_parms);
 
 	return ret;
 
@@ -77,33 +75,44 @@ nfattr_failure:
 }
 
 static inline int
-ctnetlink_dump_tuples(struct sk_buff *skb,
-		      const struct nf_conntrack_tuple *tuple)
+ctnetlink_dump_tuples_ip(struct sk_buff *skb,
+			 const struct nf_conntrack_tuple *tuple,
+			 struct nf_conntrack_l3proto *l3proto)
 {
-	struct nfattr *nest_parms;
-	struct nf_conntrack_l3proto *l3proto;
 	int ret = 0;
-
-	l3proto = nf_ct_l3proto_find_get(tuple->src.l3num);
-
-	nest_parms = NFA_NEST(skb, CTA_TUPLE_IP);
+	struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_IP);
+
 	if (likely(l3proto->tuple_to_nfattr))
 		ret = l3proto->tuple_to_nfattr(skb, tuple);
+
 	NFA_NEST_END(skb, nest_parms);
 
+	return ret;
+
+nfattr_failure:
+	return -1;
+}
+
+static inline int
+ctnetlink_dump_tuples(struct sk_buff *skb,
+		      const struct nf_conntrack_tuple *tuple)
+{
+	int ret;
+	struct nf_conntrack_l3proto *l3proto;
+	struct nf_conntrack_protocol *proto;
+
+	l3proto = nf_ct_l3proto_find_get(tuple->src.l3num);
+	ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto);
 	nf_ct_l3proto_put(l3proto);
 
 	if (unlikely(ret < 0))
 		return ret;
 
-	nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO);
-	ret = ctnetlink_dump_tuples_proto(skb, tuple);
-	NFA_NEST_END(skb, nest_parms);
+	proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum);
+	ret = ctnetlink_dump_tuples_proto(skb, tuple, proto);
+	nf_ct_proto_put(proto);
 
 	return ret;
-
-nfattr_failure:
-	return -1;
 }
 
 static inline int
@@ -1150,6 +1159,37 @@ nfattr_failure:
 }
 
 static inline int
+ctnetlink_exp_dump_mask(struct sk_buff *skb,
+			const struct nf_conntrack_tuple *tuple,
+			const struct nf_conntrack_tuple *mask)
+{
+	int ret;
+	struct nf_conntrack_l3proto *l3proto;
+	struct nf_conntrack_protocol *proto;
+	struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK);
+
+	l3proto = nf_ct_l3proto_find_get(tuple->src.l3num);
+	ret = ctnetlink_dump_tuples_ip(skb, mask, l3proto);
+	nf_ct_l3proto_put(l3proto);
+
+	if (unlikely(ret < 0))
+		goto nfattr_failure;
+
+	proto = nf_ct_proto_find_get(tuple->src.l3num, tuple->dst.protonum);
+	ret = ctnetlink_dump_tuples_proto(skb, mask, proto);
+	nf_ct_proto_put(proto);
+	if (unlikely(ret < 0))
+		goto nfattr_failure;
+
+	NFA_NEST_END(skb, nest_parms);
+
+	return 0;
+
+nfattr_failure:
+	return -1;
+}
+
+static inline int
 ctnetlink_exp_dump_expect(struct sk_buff *skb,
 			  const struct nf_conntrack_expect *exp)
 {
@@ -1159,7 +1199,7 @@ ctnetlink_exp_dump_expect(struct sk_buff
 
 	if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0)
 		goto nfattr_failure;
-	if (ctnetlink_exp_dump_tuple(skb, &exp->mask, CTA_EXPECT_MASK) < 0)
+	if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0)
 		goto nfattr_failure;
 	if (ctnetlink_exp_dump_tuple(skb,
 				 &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
Index: net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c
===================================================================
--- net-2.6.16.git.orig/net/ipv4/netfilter/ip_conntrack_netlink.c	2006-02-28 16:52:20.000000000 +0100
+++ net-2.6.16.git/net/ipv4/netfilter/ip_conntrack_netlink.c	2006-03-02 20:25:06.000000000 +0100
@@ -4,7 +4,7 @@
  * (C) 2001 by Jay Schulist
  * (C) 2002-2005 by Harald Welte
  * (C) 2003 by Patrick Mchardy
- * (C) 2005 by Pablo Neira Ayuso
+ * (C) 2005-2006 by Pablo Neira Ayuso
  *
  * I've reworked this stuff to use attributes instead of conntrack
  * structures. 5.44 am. I need more tea. --pablo 05/07/11.
@@ -53,20 +53,18 @@ static char __initdata version[] = "0.90
 
 static inline int
 ctnetlink_dump_tuples_proto(struct sk_buff *skb,
-			    const struct ip_conntrack_tuple *tuple)
+			    const struct ip_conntrack_tuple *tuple,
+			    struct ip_conntrack_protocol *proto)
 {
-	struct ip_conntrack_protocol *proto;
 	int ret = 0;
+	struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO);
 
 	NFA_PUT(skb, CTA_PROTO_NUM, sizeof(u_int8_t), &tuple->dst.protonum);
 
-	/* If no protocol helper is found, this function will return the
-	 * generic protocol helper, so proto won't *ever* be NULL */
-	proto = ip_conntrack_proto_find_get(tuple->dst.protonum);
 	if (likely(proto->tuple_to_nfattr))
 		ret = proto->tuple_to_nfattr(skb, tuple);
 
-	ip_conntrack_proto_put(proto);
+	NFA_NEST_END(skb, nest_parms);
 
 	return ret;
 
@@ -75,28 +73,41 @@ nfattr_failure:
 }
 
 static inline int
-ctnetlink_dump_tuples(struct sk_buff *skb,
-		      const struct ip_conntrack_tuple *tuple)
+ctnetlink_dump_tuples_ip(struct sk_buff *skb,
+			 const struct ip_conntrack_tuple *tuple)
 {
-	struct nfattr *nest_parms;
-	int ret;
+	struct nfattr *nest_parms = NFA_NEST(skb, CTA_TUPLE_IP);
 
-	nest_parms = NFA_NEST(skb, CTA_TUPLE_IP);
 	NFA_PUT(skb, CTA_IP_V4_SRC, sizeof(u_int32_t), &tuple->src.ip);
 	NFA_PUT(skb, CTA_IP_V4_DST, sizeof(u_int32_t), &tuple->dst.ip);
-	NFA_NEST_END(skb, nest_parms);
 
-	nest_parms = NFA_NEST(skb, CTA_TUPLE_PROTO);
-	ret = ctnetlink_dump_tuples_proto(skb, tuple);
 	NFA_NEST_END(skb, nest_parms);
 
-	return ret;
+	return 0;
 
 nfattr_failure:
 	return -1;
 }
 
 static inline int
+ctnetlink_dump_tuples(struct sk_buff *skb,
+		      const struct ip_conntrack_tuple *tuple)
+{
+	int ret;
+	struct ip_conntrack_protocol *proto;
+
+	ret = ctnetlink_dump_tuples_ip(skb, tuple);
+	if (unlikely(ret < 0))
+		return ret;
+
+	proto = ip_conntrack_proto_find_get(tuple->dst.protonum);
+	ret = ctnetlink_dump_tuples_proto(skb, tuple, proto);
+	ip_conntrack_proto_put(proto);
+
+	return ret;
+}
+
+static inline int
 ctnetlink_dump_status(struct sk_buff *skb, const struct ip_conntrack *ct)
 {
 	u_int32_t status = htonl((u_int32_t) ct->status);
@@ -1134,6 +1145,33 @@ nfattr_failure:
 }
 
 static inline int
+ctnetlink_exp_dump_mask(struct sk_buff *skb,
+			const struct ip_conntrack_tuple *tuple,
+			const struct ip_conntrack_tuple *mask)
+{
+	int ret;
+	struct ip_conntrack_protocol *proto;
+	struct nfattr *nest_parms = NFA_NEST(skb, CTA_EXPECT_MASK);
+
+	ret = ctnetlink_dump_tuples_ip(skb, mask);
+	if (unlikely(ret < 0))
+		goto nfattr_failure;
+
+	proto = ip_conntrack_proto_find_get(tuple->dst.protonum);
+	ret = ctnetlink_dump_tuples_proto(skb, mask, proto);
+	ip_conntrack_proto_put(proto);
+	if (unlikely(ret < 0))
+		goto nfattr_failure;
+
+	NFA_NEST_END(skb, nest_parms);
+
+	return 0;
+
+nfattr_failure:
+	return -1;
+}
+
+static inline int
 ctnetlink_exp_dump_expect(struct sk_buff *skb,
 			  const struct ip_conntrack_expect *exp)
 {
@@ -1143,7 +1181,7 @@ ctnetlink_exp_dump_expect(struct sk_buff
 
 	if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0)
 		goto nfattr_failure;
-	if (ctnetlink_exp_dump_tuple(skb, &exp->mask, CTA_EXPECT_MASK) < 0)
+	if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0)
 		goto nfattr_failure;
 	if (ctnetlink_exp_dump_tuple(skb,
 				 &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple,

From gervasiobernal at speedy.com.ar Thu Mar 2 21:29:08 2006
From: gervasiobernal at speedy.com.ar (Gervasio Bernal)
Date: Thu Mar 2 21:41:34 2006
Subject: Table NAT and MANGLE
Message-ID: <44075594.3000608@speedy.com.ar>

Hi all!!

Suppose I have these 2 rules, one in the mangle table and the other one in the NAT table:

#iptables -t mangle -A POSTROUTING -o eth0 -j TTL --ttl-set 64
#iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

So, every time a packet goes out through eth0, the TTL is first set to 64 and then the packet is masqueraded. Is there any way to do this in the opposite order? First masquerade and then the TTL.
Maybe this example is not very clear, but I'm developing a module that needs to match a packet after masquerade.

Thanks.

From suparnamisri at yahoo.com Fri Mar 3 06:08:20 2006
From: suparnamisri at yahoo.com (suparna misri)
Date: Fri Mar 3 06:20:40 2006
Subject: PLZ HELP
Message-ID: <20060303050820.31966.qmail@web32807.mail.mud.yahoo.com>

Hello,

I am doing a small networking project in which the sender compresses packets and the receiver decompresses them. I am using the "iptables ip_queue module" for capturing and changing the contents of packets, which I know how to do.

My problem is that if a packet is lost and does not reach the receiver, the receiver cannot decompress it. How can the receiver notify the sender (e.g. send a NACK) that it does not have the required packet, so that the sender can send it again? Should I build and send a new packet, or can I use the "iptables ip_queue module" for this as well?

I hope I could explain my point well. I will really appreciate any help.

Thanks.
Sincerely,
Suparna.

From kadlec at blackhole.kfki.hu Fri Mar 3 10:20:05 2006
From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik)
Date: Fri Mar 3 10:31:40 2006
Subject: IPSET patches from pom-ng don't apply to 2.6.16-rc5
In-Reply-To:
References: <44043F49.7050304@gmx.net>
Message-ID:

On Thu, 2 Mar 2006, Jozsef Kadlecsik wrote:

> On Tue, 28 Feb 2006, Carl-Daniel Hailfinger wrote:
>
> > applying the ipset patch from patch-o-matic-ng doesn't work anymore:
> >
> > Welcome to Patch-o-matic ($Revision: 4088 $)!
> >
> > Kernel: 2.6.16, /storage/linux-2.6.16-rc5
> > Iptables: 1.3.5, /storage/iptables-1.3.5
> > Each patch is a new feature: many have minimal impact, some do not.
> > Almost every one has bugs, so don't apply what you don't need!
> > -------------------------------------------------------
> > Already applied:
> >
> > Testing set... not applied
> > The set patch:
> > Author: Jozsef Kadlecsik
> > Status: Beta
> > [...]
> > -----------------------------------------------------------------
> > Do you want to apply this patch [N/y/t/f/a/r/b/w/q/?] t
> > unable to find ladd slot in src /tmp/pom-26985/net/ipv4/netfilter/Makefile (./patchlets/set/linux-2.6/./net/ipv4/netfilter/Makefile.ladd)
>
> The marker points from net/ipv4/netfilter/Makefile were moved into
> net/netfilter/Makefile thus those cannot be found anymore.

It is fixed in svn.

Best regards,
Jozsef
-
E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

From saku at ytti.fi Thu Mar 2 18:58:54 2006
From: saku at ytti.fi (Saku Ytti)
Date: Fri Mar 3 14:11:23 2006
Subject: ipv6 conntrack status?
In-Reply-To: <200603021741.k22Hf8Hr013260@toshiba.co.jp>
References: <20060225111316.GA13280@mx.ytti.net> <200603021741.k22Hf8Hr013260@toshiba.co.jp>
Message-ID: <20060302175854.GA741@mx.ytti.net>

On (2006-03-03 02:41 +0900), Yasuyuki KOZAKAI wrote:

> You can use nf_conntrack_ipv6 in mainline kernel, not ip6_conntrack.
> To build it, please disable IP_NF_CONNTRACK, enable NF_CONNTRACK and
> NF_CONNTRACK_IPV6. NF_CONNTRACK is in the menu "Core Netfilter Configuration",
> and NF_CONNTRACK_IPV6 in the menu "IPv6: Netfilter Configuration
> (EXPERIMENTAL)".

Indeed, menuconfig hides the new conntrack if the old one is chosen, as was my case.

Thanks!
--
++ytti

From kgy at deverto.com Fri Mar 3 09:44:11 2006
From: kgy at deverto.com (Kovesdi Gyorgy)
Date: Fri Mar 3 14:11:24 2006
Subject: output device
Message-ID: <200603030944.11858.kgy@deverto.com>

Hi,

I would like to set the output device in a rule (it is needed due to overlapping addresses). AFAIK, it cannot be done directly in a rule (if I am wrong, please tell me immediately). I tried using iptables2 for this: it works, but it is significantly slower (3...5 times) than a single routing rule.
I think this should be added to the netfilter and conntrack system (keeping its speed), but I don't know how much work such an extension would take, and I am not familiar enough with development in this field.
Do you have any idea?
Thanks in advance
Gyorgy Kovesdi

From azez at ufomechanic.net Fri Mar 3 14:33:40 2006
From: azez at ufomechanic.net (Amin Azez)
Date: Fri Mar 3 14:46:41 2006
Subject: output device
In-Reply-To: <200603030944.11858.kgy@deverto.com>
References: <200603030944.11858.kgy@deverto.com>
Message-ID: <440845B4.3000208@ufomechanic.net>

Kovesdi Gyorgy wrote:
> Hi,
>
> I would like to set the output device in a rule (it is needed due to
> overlapping addresses). AFAIK, it cannot be done directly in a rule (if I am
> wrong, please tell me immediately). I tried using iptables2 for this: it is
> working, but significantly slower (3...5 times) than a single routing rule.
> I think it should be added to the netfilter and conntrack system (keeping its
> speed), but I don't know how much work such an expansion costs, and I am not
> familiar enough in developing this field.
> Do you have any idea?

Why can't you use ipt_route?

Sam

From cnguyen at certicom.com Fri Mar 3 17:04:11 2006
From: cnguyen at certicom.com (Chinh Nguyen)
Date: Fri Mar 3 17:26:41 2006
Subject: Table NAT and MANGLE
In-Reply-To: <44075594.3000608@speedy.com.ar>
References: <44075594.3000608@speedy.com.ar>
Message-ID: <440868FB.2080703@certicom.com>

Gervasio Bernal wrote:
> Hi all!!
>
> Suppose I have this 2 rules, one in mangle and the other one in NAT table:
>
> #iptables -t mangle -A POSTROUTING -o eth0 -j TTL --ttl-set 64
> #iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
>
> So, every time a packet goes out through eth0 first it sets the TTL to
> 64 and then do the masquerade. Is there any way to do this but in
> opposite order? First masquerade and the TTL.
> Maybe this example is not very clear, but I'm developing a module that
> needs to match a packet after masquerade.
>
> Thanks.
>

From the manual for SNAT: "... and rules should cease being examined". You can't do anything after SNAT. Since MASQUERADE is kind of a special case of SNAT, IMO, you can't use another rule after MASQUERADE either.

It should be noted that there seems to be at least 1 exception, although I don't know if there are others. For example, with the latest iptables & kernel (2.6.16-rc4 as of this writing), you can SNAT (probably MASQUERADE) or DNAT before encrypting a packet with IPsec. This is accomplished by using the -m policy module:

iptables -A POSTROUTING -t nat -m policy ... -j SNAT

In other words, any packet that matches an IPsec policy can then be SNAT/DNAT'ed before encryption/decryption.

Practically, there is an action (encryption) that is applied after the NAT. Functionally though, it's still only 1 SNAT/DNAT rule, after which no other will apply. But perhaps there are other "2 rules in 1" exceptions. I don't know myself.

Regards.

From gregor at net.in.tum.de Fri Mar 3 22:01:00 2006
From: gregor at net.in.tum.de (Gregor Maier)
Date: Fri Mar 3 22:13:10 2006
Subject: [PATCH] Fix wrong option in Makefile for CONFIG_BRIDGE_EBT_ULOG
Message-ID: <20060303210100.GA11792@net.in.tum.de>

[PATCH] Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG

Signed-off-by: Gregor Maier

diff --git a/net/bridge/netfilter/Makefile b/net/bridge/netfilter/Makefile
index 8bf6d9f..905087e 100644
--- a/net/bridge/netfilter/Makefile
+++ b/net/bridge/netfilter/Makefile
@@ -29,4 +29,4 @@ obj-$(CONFIG_BRIDGE_EBT_SNAT) += ebt_sna
 
 # watchers
 obj-$(CONFIG_BRIDGE_EBT_LOG) += ebt_log.o
-obj-$(CONFIG_BRIDGE_EBT_LOG) += ebt_ulog.o
+obj-$(CONFIG_BRIDGE_EBT_ULOG) += ebt_ulog.o

From tzuhsi.peng at msa.hinet.net Sat Mar 4 05:02:44 2006
From: tzuhsi.peng at msa.hinet.net (Jesse Peng)
Date: Sat Mar 4 05:15:30 2006
Subject: NAT behave - hairpinning
References:
Message-ID: <002701c63f40$81b38300$3478fea9@acer21ce70712f>

Dear Thomas,

Here is the link to the patch that handles P2P (Peer-to-Peer) traffic, taking hole punching and hairpinning into account, in line with Linux netfilter NAT behave. Please refer to the links below.

Patch:
http://lists.netfilter.org/pipermail/netfilter-devel/2006-January/023084.html
Coding hint:
http://lists.netfilter.org/pipermail/netfilter-devel/2005-December/022584.html
Theory:
http://lists.netfilter.org/pipermail/netfilter-devel/2004-November/017479.html

Yours sincerely,
Jesse

----- Original Message -----
From: "Thomas Gelf"
To:
Sent: Wednesday, March 01, 2006 8:42 PM
Subject: NAT behave - hairpinning

> Hi all,
>
> I googled around for some time to find information regarding "hairpin"
> support for netfilter.
>
> Reason: I don't like my public SIP Registrar / Proxy to redirect calls
> between my local STUN-enabled and NATed SIP clients to an external RTP
> Proxy (no, sip-conntrack-nat is not what I'm looking for).
>
> The most recent hint relating this topic I've managed to find on Rusty's
> bleeding edge page: http://ozlabs.org/~rusty/index.cgi/2005#2005-10-10
>
> It would be great if someone could give me some additional information:
> - is hairpinning meant to be supported in near-future official linux
> kernels?
> - is it planned to provide such a patch on patch-o-matic?
>
> Thanks a lot!
>
> Kind regards,
> Thomas Gelf
>
>
> --
> Thomas Gelf
>
>
>

From kaber at trash.net Sat Mar 4 10:00:14 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:14:28 2006
Subject: [PATCH] Fix wrong option in Makefile for CONFIG_BRIDGE_EBT_ULOG
In-Reply-To: <20060303210100.GA11792@net.in.tum.de>
References: <20060303210100.GA11792@net.in.tum.de>
Message-ID: <4409571E.2030309@trash.net>

Gregor Maier wrote:
> [PATCH] Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG

Applied, thanks. I'll try to get it in 2.6.16.

From kaber at trash.net Sat Mar 4 10:03:56 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:18:04 2006
Subject: ipv6 conntrack status?
In-Reply-To: <20060302175854.GA741@mx.ytti.net>
References: <20060225111316.GA13280@mx.ytti.net> <200603021741.k22Hf8Hr013260@toshiba.co.jp> <20060302175854.GA741@mx.ytti.net>
Message-ID: <440957FC.6050506@trash.net>

Saku Ytti wrote:
> On (2006-03-03 02:41 +0900), Yasuyuki KOZAKAI wrote:
>
>
>>You can use nf_conntrack_ipv6 in mainline kernel, not ip6_conntrack.
>>To build it, please disable IP_NF_CONNTRACK, enable NF_CONNTRACK and
>>NF_CONNTRACK_IPV6. NF_CONNTRACK is in the menu "Core Netfilter Configuration",
>>and NF_CONNTRACK_IPV6 in the menu "IPv6: Netfilter Configuration
>>(EXPERIMENTAL)".
>
>
> Indeed menuconfig hides the new conntrack if old one is chosen,
> as was my case.

Maybe we can find a better way to choose between the two conntrack implementations. Right now it's really confusing: IP_NF_CONNTRACK hides NF_CONNTRACK, but if NF_CONNTRACK is selected I can still choose IP_NF_CONNTRACK even though it won't work. Maybe a "choice" option?

From kaber at trash.net Sat Mar 4 10:04:53 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:19:00 2006
Subject: SIP NAT CONTRACK Module with Netfilter in kernel 2.4.x
In-Reply-To:
References:
Message-ID: <44095835.7010703@trash.net>

Huy Vu Pham wrote:
> Dear Netfilter Devel list,
> I got problem very strange with Netfilter in linux kernel 2.4.x.
>
> I apply contrack/nat SIP protocol
> (http://openwrt.alphacore.net/patches/buildroot/317-netfilter-nat-sip
> ) with HELPER module to capture all RTP packets.
> (
> #Out from WAN site: eth1
> iptables -t mangle -A POSTROUTING -o eth1 -p UDP -m helper --helper
> sipd00 -j MARK --set-mark 0x20
> #Out from LAN site: eth0
> iptables -t mangle -A POSTROUTING -o eth0 -p UDP -m helper --helper
> sipd00 -j MARK --set-mark 0x21
> )
>
> My test case like this:
> SIP PHONE A (Outside NAT) ----- NAT BOX (Have SIP ALG) ------- SIP
> PHONE B (Inside NAT).
>
> 1. Reboot NAT BOX, A call B. SIP MODULE can capture all RTP packets,
> Before RTP timeout, I make the call from B to A also OK.
>
> 2. Reboot NAT BOX, B call A. SIP MODULE "CAN NOT" capture any RTP packets.
> Before RTP timeout, I make the call from A to B also got the same problem.
>
> What is difference between case (1) and case(2)?

The SIP helper currently only tracks one direction. I have wanted to fix it for some time, but haven't gotten to it yet.

From kaber at trash.net Sat Mar 4 10:26:15 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:40:23 2006
Subject: [PATCH 1/5] [CTNETLINK] Fix expectation mask dumping
In-Reply-To: <44025F8A.6070302@netfilter.org>
References: <44025F8A.6070302@netfilter.org>
Message-ID: <44095D37.1030109@trash.net>

Pablo Neira Ayuso wrote:
> This patch introduces the function ctnetlink_exp_dump_mask, that
> correctly dumps the expectation mask. Such function uses the l3num value
> from the expectation tuple that is a valid layer 3 protocol number.
>
> The value of the l3num mask isn't dumped since it is meaningless from
> the userspace side.

It's too late for 2.6.16; I've applied incarnation n+1 to my 2.6.17 tree. Thanks.

From kaber at trash.net Sat Mar 4 10:35:03 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:49:11 2006
Subject: [PATCH 5/5] [NF_CONNTRACK] load on demand layer 3 protocol handlers
In-Reply-To: <44025FAE.1000400@netfilter.org>
References: <44025FAE.1000400@netfilter.org>
Message-ID: <44095F47.3070804@trash.net>

Pablo Neira Ayuso wrote:
> x_tables matches and targets that require nf_conntrack_ipv[4|6] to work
> don't have enough information to load on demand these modules. This
> patch introduces the following changes to solve this issue:

All other patches also applied to 2.6.17.
Thanks especially for 4/5, I was too lazy to do it myself :)

From kaber at trash.net Sat Mar 4 10:37:51 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:51:59 2006
Subject: [PATCH] Multiple matches of the same type
In-Reply-To:
References:
Message-ID: <44095FEF.7040901@trash.net>

Jozsef Kadlecsik wrote:
> Hi,
>
> The attached patch adds the ability to specify multiple matches of the
> same type by ip[6]tables. Besides removing a limitation, a few matches
> (recent, u32, set :-) can benefit from the feature.
>
> If two or more matches of the same type are detected then the options are
> assumed to be grouped in order to tell which option belongs to which
> match:
>
> ... -m foo ... ... -m foo ... ...
>

I think that's a reasonable assumption. It's good to finally get rid of this limitation.

From kaber at trash.net Sat Mar 4 10:41:56 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 10:56:08 2006
Subject: New H.323 conntrack & NAT helper module
In-Reply-To:
References: <925A849792280C4E80C5461017A4B8A2032119@mail733.InfraSupportEtc.com><44001CDD.3030305@trash.net> <4400A541.9080901@trash.net>
Message-ID: <440960E4.80601@trash.net>

Jing Min Zhao wrote:
> For the patch of adding support for non-linear SKBs, I admit
> it is even a bug if there is no support for non-linear SKBs, but I have
> some different idea for the checksum method.
>
>> Add support for non-linear SKBs. I know switching to
>> ip_nat_mangle_{tcp,udp}_packet is less efficient because it checksums
>> the packet on each call, but that can be fixed separately by switching
>> it to incremental checksumming.
>
>
> Imagine a Setup signal with 30 fast-start entries (this is not unusual
> for Gnomemeeting
> and OpenPhone), if you use ip_nat_mangle_tcp_packet, you will have to
> call it 45
> times. For a RRQ message, you will possibly call
> ip_nat_mangle_udp_packet more
> than 10 times if it contains many signal addresses. You can use incremental
> checksumming, but I'm still worrying about the efficiency. This is why I
> prefer to use
> a counter (please see the last paragraph) to track modifications and do the
> checksum only once.

I would expect incremental checksumming to be less expensive than redoing the entire checksum. I'll try to get a patch ready for testing this weekend, then we can compare the two approaches.

>
>> Change the H.323 helper to support non-linear skbs similar to the other
>> helpers. This has two additional positive side-effects:
>
>
>> - skb_writable was broken as it always tried to reload the data
>> pointer with
>> an assumed TPKT payload, even for H.225 RAS packets.
>
>
> This can be fixed by seeing the protocol.

Yes, but it complicates parsing multiple TPKTs.

>> there seems to be some debugging-leftover, the functions registering
>> expectations add up the number of registered expectations and return
>> that value, but nobody uses it. If there are no plans for using it,
>> I would prefer to have it removed.
>>
>
> The return is actually a modification track counter. If a function
> successfully changed a
> packet, the counter will be increased. Finally, if the counter is
> positive, the packet will be
> checksumed; if it's 0, no changes; if negative, error happened.

Ahh, of course. I only looked for users after removing the csum hook.

From kaber at trash.net Sat Mar 4 11:00:18 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 11:12:45 2006
Subject: [patch] ipt_recent
In-Reply-To: <43F9EA77.4060208@ufomechanic.net>
References: <43F9EA77.4060208@ufomechanic.net>
Message-ID: <44096532.2070000@trash.net>

Amin Azez wrote:
> This patch fixes the previously mentioned bug in ipt_recent and adds:
>
> --lt n # check less than n items in list
> --gt n # checks more than n items in list
> --eq n # check exactly n items in list
>
> Which can be prefixed with ! to invert.
>
> --- include/linux/netfilter_ipv4/ipt_recent.h.nolimit	2006-02-20 10:12:06.000000000 +0000
> +++ include/linux/netfilter_ipv4/ipt_recent.h	2006-02-20 11:30:58.000000000 +0000
> @@ -10,6 +10,11 @@
>  #define IPT_RECENT_REMOVE 8
>  #define IPT_RECENT_TTL 16
>  
> +#define IPT_RECENT_INVERT 1
> +#define IPT_RECENT_LT 2
> +#define IPT_RECENT_GT 4
> +#define IPT_RECENT_EQ (IPT_RECENT_LT | IPT_RECENT_GT)
> +
>  #define IPT_RECENT_SOURCE 0
>  #define IPT_RECENT_DEST 1
>  
> @@ -20,6 +25,8 @@
>  	u_int32_t hit_count;
>  	u_int8_t check_set;
>  	u_int8_t invert;
> +	u_int8_t check_count;
> +	u_int32_t entry_count;
>  	char name[IPT_RECENT_NAME_LEN];
>  	u_int8_t side;
>  };

Sorry, we can't do that since it breaks userspace compatibility. But I'm really glad someone finally has the stomach to touch ipt_recent, I'll review your other patches now.

From kaber at trash.net Sat Mar 4 11:10:31 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 11:22:58 2006
Subject: ipt_recent fix
In-Reply-To:
References:
Message-ID: <44096797.7060203@trash.net>

Amin Azez wrote:
> Here is a fix for ipt_recent.
>
> The problem is that curr_table[x].hash_entry=0 (for all x)
>
> So all pristine (never used) curr_table entries claim to have hash pos 0.
>
> So when we add new items to the hash and do:
> hash_table[r_list[location].hash_entry] = -1;
> to use the oldest (used or unused) location - if it is unused we trash
> anything that really has hash pos 0 already allocated.
>
> This can be reproduced with this:
>
> # iptables -A FORWARD -d 1.1.1.1 -m recent --name TEST1 --set
> # G=1 ; while test $G -lt 17 ; do echo $G ; echo 1.1.1.$G> \
> /proc/net/ipt_recent/TEST1 ; G=$(($G+1)) ; dmesg -c ; cat \
> /proc/net/ipt_recent/TEST1 | head ; done
> x
>
> the 13th item has hash of 0 and gets trashed when the 14th item is added
> without this patch.
>
> We should only trash it then if we _know_ it has been used before and
> therefore is REALLY at the position it claims...
>
> This patch makes sure that items claiming to have a hash entry really do
> have that hash entry before destroying that hash entry.
>
> This patch also adds counting of the number of hash entries in
> preparation for conditions based on the number of entries in the recent
> table rule; patch for that to follow next week.
>
> Any comments on this?
>
> Sam
>
>
> ------------------------------------------------------------------------
>
> --- ./net/ipv4/netfilter/ipt_recent.c.nolimit	2006-02-15 16:34:20.000000000 +0000
> +++ ./net/ipv4/netfilter/ipt_recent.c	2006-02-17 17:23:00.000000000 +0000
> @@ -70,7 +70,11 @@
>  /* Structure of our linked list of tables of recent lists. */
>  struct recent_ip_tables {
>  	char name[IPT_RECENT_NAME_LEN];
> +	/* number of entries in list *table */
> +	int entry_count;
> +	/* number of reference to this structure from iptables rules */
>  	int count;
> +	/* an index increased with each operation which maps time_info[x].position to a position in table */
>  	int time_pos;
>  	struct recent_ip_list *table;
>  	struct recent_ip_tables *next;
> @@ -139,6 +143,7 @@
>  	curr_table = (struct recent_ip_tables*) data;
>  
>  	spin_lock_bh(&curr_table->list_lock);
> +	len += sprintf(buffer+len,"count=%d\n",curr_table->entry_count);
>  	for(count = 0; count < ip_list_tot; count++) {
>  		if(!curr_table->table[count].addr) continue;
>  		last_len = len;

This changes the proc output format, which we shouldn't do without a good reason, to avoid breaking scripts that parse it. Can you separate the fix from the patch please and send that? If you think there is a good reason to change the output format, feel free to send a separate patch for that.
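
The guard described in the quoted message (only clear a hash position when the slot really owns it) can be illustrated with a minimal sketch. It assumes simplified stand-ins for the ipt_recent tables; `struct entry`, `evict_slot`, and the table sizes are hypothetical names for illustration and are not taken from the kernel source:

```c
#include <assert.h>

#define TABLE_SIZE 8	/* hypothetical number of slots in the recent list */
#define HASH_SIZE  16	/* hypothetical number of hash positions */

/* Simplified stand-in for one slot of the recent list. */
struct entry {
	unsigned int addr;	/* 0 means the slot has never been used */
	int hash_entry;		/* hash position this slot claims to own */
};

/*
 * Release the hash position owned by slot `location` before the slot is
 * reused. A pristine slot is zero-initialised and therefore claims hash
 * position 0 without owning it, so the position is only cleared when it
 * really points back at this slot.
 */
static void evict_slot(int *hash_table, const struct entry *list, int location)
{
	int pos;

	assert(location >= 0 && location < TABLE_SIZE);
	pos = list[location].hash_entry;
	assert(pos >= 0 && pos < HASH_SIZE);

	if (hash_table[pos] == location)
		hash_table[pos] = -1;
}
```

With this check, evicting a never-used slot leaves an unrelated entry at hash position 0 untouched, which is the failure case reported for the 13th and 14th items.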

From kaber at trash.net Sat Mar 4 11:13:22 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sat Mar 4 11:25:49 2006
Subject: BUG: More ipt_recent queries
In-Reply-To: <43FAF692.8030804@ufomechanic.net>
References: <43FAF692.8030804@ufomechanic.net>
Message-ID: <44096842.3010507@trash.net>

Stephen, can you have a look at this please?

Amin Azez wrote:
> I'm concerned about ipt_recent where it removes entries from the list.
>
> Surely the move-up-and-close-the-gap while loop will never enter because
> time_info[time_loc].time has just been set to 0 so that this clause of
> the while loop:
>
> time_info[(time_loc+1) % ip_list_tot].time < time_info[time_loc].time)
>
> will always be false.
>
> Fuller code segment:
>
> location = hash_table[hash_result];
> hash_table[r_list[location].hash_entry] = -1;
> time_loc = r_list[location].time_pos;
> time_info[time_loc].time = 0;
> time_info[time_loc].position = location;
>
> while((time_info[(time_loc+1) % ip_list_tot].time <
> time_info[time_loc].time) && ((time_loc+1) % ip_list_tot) !=
> curr_table->time_pos) {
> time_temp = time_info[time_loc].time;
> time_info[time_loc].time = time_info[(time_loc+1)%ip_list_tot].time;
> time_info[(time_loc+1)%ip_list_tot].time = time_temp;
> time_temp = time_info[time_loc].position;
> time_info[time_loc].position =
> time_info[(time_loc+1)%ip_list_tot].position;
> time_info[(time_loc+1)%ip_list_tot].position = time_temp;
> r_list[time_info[time_loc].position].time_pos = time_loc;
> r_list[time_info[(time_loc+1)%ip_list_tot].position].time_pos =
> (time_loc+1)%ip_list_tot;
> time_loc = (time_loc+1) % ip_list_tot;
> }
>
>
> I think we should set time_info[time_loc].time = 0; at the end of the
> while loop?
>
> Sam
>
>

From gandalf at wlug.westbo.se Sat Mar 4 17:23:26 2006
From: gandalf at wlug.westbo.se (Martin Josefsson)
Date: Sat Mar 4 17:51:40 2006
Subject: Hashtrie testing (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2)
In-Reply-To:
References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net>
Message-ID: <1141489406.3881.23.camel@localhost.localdomain>

On Fri, 2006-02-17 at 19:41 +0100, Jozsef Kadlecsik wrote:

(Changing subject and removing some cc's)

> Hi Martin,

Hi Jozsef

> On Fri, 17 Feb 2006, Jozsef Kadlecsik wrote:
>
> > I'm collecting the ideas, so can't submit patches yet ;-).

:) I'll move my svn tree to the netfilter svn sometime soon.

> Two more things:
>
> - the Jenkins hash internally produces 96 bits, which we could
> use in the hashtrie

That's a great idea, will look into it.

> - there should be some (whatever artifical) tests to study the
> dynamical behaviour of the hashtrie: say all the entries are
> added, then half of them deleted and added back again, but
> in reverse order. Will the maxdepth grow?

I've performed some simple tests with this and here are the results:

First I tried removing the upper half of the conntrack entries (it's the conntrack entries' index in the test array that's printed out in the "Removing" line below):

Number of entries in hashtrie: 819200
Number of children in hashtrie: 27345
Maxdepth of hashtrie: 3 (0 == root)
Removing entries between 204800 and 409599.
Adding entries between 409599 and 204800.
insert (half reverse): time: 1484 cyc, 929 ns (1076184/s)
Number of entries in hashtrie: 819200
Number of children in hashtrie: 27345
Maxdepth of hashtrie: 3 (0 == root)

No difference at all, which is a bit weird. I had to double-check the code to see that I really re-added the entries in reverse order, and I did.

Then I tried it with the lower half of the conntrack entries, those lower in the tree, and got this:

Number of entries in hashtrie: 819200
Number of children in hashtrie: 27273
Maxdepth of hashtrie: 3 (0 == root)
Removing entries between 0 and 204799.
Adding entries between 204799 and 0.
insert (half reverse): time: 1627 cyc, 1018 ns (981869/s)
Number of entries in hashtrie: 819200
Number of children in hashtrie: 29189
Maxdepth of hashtrie: 3 (0 == root)

Here we see that the number of child-nodes has increased from 27273 to 29189, that's a 7.0% increase, but the max depth hasn't increased in this particular test. Given the right situation the maxdepth will probably increase; it's likely to occur sometimes since the number of child-nodes has increased.

Then I tried to remove every other entry and then re-add them in reverse order (this test is performed after the previous re-add of the lower half, not in the same test run as above, so the absolute numbers don't match those above):

Number of entries in hashtrie: 819200
Number of children in hashtrie: 27329
Maxdepth of hashtrie: 3 (0 == root)
Removing entries between 0 and 204799.
Adding entries between 204799 and 0.
insert (half reverse): time: 1590 cyc, 996 ns (1004079/s)
Number of entries in hashtrie: 819200
Number of children in hashtrie: 29253
Maxdepth of hashtrie: 3 (0 == root)
Removing every other entry from 0 to 409599
Adding every other entry from 409599 to 0
insert (other reverse): time: 1637 cyc, 1025 ns (975239/s)
Number of entries in hashtrie: 819200
Number of children in hashtrie: 29376
Maxdepth of hashtrie: 3 (0 == root)

Here we see that the number of child-nodes increased in the remove_lower_half_and_readd_in_reverse_order test just as it did before, and then the number of child-nodes increased again after removing every other entry and adding them back in reverse order. The total increase after both tests is 7.5%.

More testing is clearly needed.
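
The percentages quoted above follow directly from the child-node counts in the test output. A small sketch of the arithmetic, where the helper name `growth_percent` is hypothetical and only serves this illustration:

```c
#include <assert.h>

/* Relative growth from `before` to `after`, expressed in percent. */
static double growth_percent(unsigned long before, unsigned long after)
{
	assert(before > 0);	/* growth relative to zero is undefined */
	return ((double)after - (double)before) * 100.0 / (double)before;
}
```

Calling `growth_percent(27273, 29189)` gives roughly 7.0 for the lower-half test, and `growth_percent(27329, 29376)` gives roughly 7.5 for the combined run.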

--
/Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : /pipermail/netfilter-devel/attachments/20060304/f15d7159/attachment.pgp

From gregor at net.in.tum.de Sat Mar 4 18:38:16 2006
From: gregor at net.in.tum.de (Gregor Maier)
Date: Sat Mar 4 18:50:29 2006
Subject: [PATCH][RFC] Unifiy logging in netfilter using nf_log, take #1
Message-ID: <20060304173815.GA12715@net.in.tum.de>

[PATCH][RFC] Unify logging in netfilter using nf_log, take #1

Although nf_log is meant as a general logging API for netfilter, not
every module uses it. Furthermore, modules can interfere with the
logging of other modules. This patch (against net-2.6.17) tries to
eliminate these problems.

* Everything uses nf_log.c as logging API (except the obsolete ULOG
  targets)
* Loggers in nf_log.c are stackable. More than one logger can be
  registered per PF
* The nf_loginfo struct has been changed. Instead of the type field and
  the union there is now a "backends" field that contains a bitmask
  specifying which backends should process/log this packet.
* ipt_LOG.c and ip6t_LOG.c have been split:
  * xt_LOG.c now contains the targets. The targets always use nf_log
    for logging. These should be the only logging targets. They take
    care of setting up nf_loginfo for logging to syslog and/or to
    other backends
  * ip_log_syslog.c and ip6_log_syslog.c are new and contain the
    syslog log backends. When these modules are loaded, the syslog
    backend is registered with nf_log
* The backends (nfnetlink_log, ip_log_syslog, ip6_log_syslog, ebt_LOG)
  check the backends field of nf_loginfo to see if they should handle
  the packet
* The ULOG targets have been changed to be self-contained and
  independent from nf_log API.
  They are the _only_ modules that do not use nf_log

What the LOG targets for v4 and v6 can do from a userspace / iptables
point of view:
* Log to syslog (allows you to specify the flags)
* Log to nfnetlink_log (allows you to specify the loggroup)
* Log to syslog and nfnetlink_log
===> One LOG target fits all.

Things TODO:
* ebt_log is always using the syslog backend. Furthermore the code
  there isn't split into the syslog logger and the target
* change userspace iptables, so it can utilize the new stuff. Patch
  will follow soon
* thoroughly test the patch

POTENTIAL PROBLEMS / ALTERNATIVE SOLUTION
* ipt_log_info and ip6t_log_info have been replaced by xt_log_info
  (which has changed in size). This means iptables must be recompiled
  and older iptables versions won't work with these changes. If this
  is considered a problem, the only solution I can think of is not to
  change the old LOG targets and to introduce a new, general-purpose
  target that can do what the LOG target in this patch can do.

Signed-off-by: Gregor Maier
=====================================================================
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 4688969..edccaeb 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -123,22 +123,18 @@ extern struct list_head nf_hooks[NPROTO] #define NF_LOG_UID 0x08 /* Log UID owning local socket */ #define NF_LOG_MASK 0x0f -#define NF_LOG_TYPE_LOG 0x01 -#define NF_LOG_TYPE_ULOG 0x02 +#define NF_LOG_BACKEND_SYSLOG 0x01 +#define NF_LOG_BACKEND_NFLOG 0x02 +#define NF_LOG_BACKEND_MASK (NF_LOG_BACKEND_SYSLOG|NF_LOG_BACKEND_NFLOG) struct nf_loginfo { - u_int8_t type; - union { - struct { - u_int32_t copy_len; - u_int16_t group; - u_int16_t qthreshold; - } ulog; - struct { - u_int8_t level; - u_int8_t logflags; - } log; - } u; + u_int8_t backends; /* ORed from NF_LOG_BACKEND* */ + /* nflog backend */ + u_int16_t group; + u_int16_t qthreshold; + /* SYSLOG backend */ + u_int8_t level; + u_int8_t logflags; }; typedef
void nf_logfn(unsigned int pf, @@ -157,7 +153,7 @@ struct nf_logger { /* Function to register/unregister log function. */ int nf_log_register(int pf, struct nf_logger *logger); -int nf_log_unregister_pf(int pf); +int nf_log_unregister_pf(int pf, struct nf_logger *logger); void nf_log_unregister_logger(struct nf_logger *logger); /* Calls the registered backend logging function */ diff --git a/include/linux/netfilter/xt_LOG.h b/include/linux/netfilter/xt_LOG.h new file mode 100644 index 0000000..cbb186f --- /dev/null +++ b/include/linux/netfilter/xt_LOG.h @@ -0,0 +1,26 @@ +/* iptables module for logging / LOG target + * + * (C) 2006 Gregor Maier + * + * This software is distributed under GNU GPL v2, 1991 + * +*/ +#ifndef _XT_LOG_TARGET_H +#define _XT_LOG_TARGET_H + +/* make sure not to change this without changing netfilter.h:NF_LOG_* (!) */ +#define XT_LOG_TCPSEQ 0x01 /* Log TCP sequence numbers */ +#define XT_LOG_TCPOPT 0x02 /* Log TCP options */ +#define XT_LOG_IPOPT 0x04 /* Log IP options */ +#define XT_LOG_UID 0x08 /* Log UID owning local socket */ +#define XT_LOG_MASK 0x0f + +struct xt_log_info { + u_int16_t group; + unsigned char backends; + unsigned char level; + unsigned char logflags; + char prefix[30]; +}; + +#endif /* _XT_LOG_TARGET_H */ diff --git a/include/linux/netfilter_bridge/ebt_log.h b/include/linux/netfilter_bridge/ebt_log.h index 96e231a..936e5d9 100644 --- a/include/linux/netfilter_bridge/ebt_log.h +++ b/include/linux/netfilter_bridge/ebt_log.h @@ -4,7 +4,7 @@ #define EBT_LOG_IP 0x01 /* if the frame is made by ip, log the ip information */ #define EBT_LOG_ARP 0x02 #define EBT_LOG_NFLOG 0x04 -#define EBT_LOG_MASK (EBT_LOG_IP | EBT_LOG_ARP) +#define EBT_LOG_MASK (EBT_LOG_IP | EBT_LOG_ARP | EBT_LOG_NFLOG) #define EBT_LOG_PREFIX_SIZE 30 #define EBT_LOG_WATCHER "log" diff --git a/net/bridge/netfilter/Makefile b/net/bridge/netfilter/Makefile index 8bf6d9f..905087e 100644 --- a/net/bridge/netfilter/Makefile +++ b/net/bridge/netfilter/Makefile @@ -29,4 
+29,4 @@ obj-$(CONFIG_BRIDGE_EBT_SNAT) += ebt_sna # watchers obj-$(CONFIG_BRIDGE_EBT_LOG) += ebt_log.o -obj-$(CONFIG_BRIDGE_EBT_LOG) += ebt_ulog.o +obj-$(CONFIG_BRIDGE_EBT_ULOG) += ebt_ulog.o diff --git a/net/bridge/netfilter/ebt_log.c b/net/bridge/netfilter/ebt_log.c index 288ff1d..a1cb46a 100644 --- a/net/bridge/netfilter/ebt_log.c +++ b/net/bridge/netfilter/ebt_log.c @@ -67,8 +67,13 @@ ebt_log_packet(unsigned int pf, unsigned { unsigned int bitmask; + if (!loginfo) + return; /* FIXME: we should add a default_loginfo here */ + if (!(loginfo->backends & NF_LOG_BACKEND_SYSLOG)) + return; + spin_lock_bh(&ebt_log_lock); - printk("<%c>%s IN=%s OUT=%s MAC source = ", '0' + loginfo->u.log.level, + printk("<%c>%s IN=%s OUT=%s MAC source = ", '0' + loginfo->level, prefix, in ? in->name : "", out ? out->name : ""); print_MAC(eth_hdr(skb)->h_source); @@ -77,10 +82,7 @@ ebt_log_packet(unsigned int pf, unsigned printk("proto = 0x%04x", ntohs(eth_hdr(skb)->h_proto)); - if (loginfo->type == NF_LOG_TYPE_LOG) - bitmask = loginfo->u.log.logflags; - else - bitmask = NF_LOG_MASK; + bitmask = loginfo->logflags; if ((bitmask & EBT_LOG_IP) && eth_hdr(skb)->h_proto == htons(ETH_P_IP)){ @@ -162,16 +164,14 @@ static void ebt_log(const struct sk_buff struct ebt_log_info *info = (struct ebt_log_info *)data; struct nf_loginfo li; - li.type = NF_LOG_TYPE_LOG; - li.u.log.level = info->loglevel; - li.u.log.logflags = info->bitmask; + li.backends = NF_LOG_BACKEND_SYSLOG; /* currently only syslog backend supported */ + li.level = info->loglevel; + /* XXX: info->bitmask is 32 bit, logflas only 8 + * Currently it's not a problem, since only 3 falgs are defined, but nevertheless */ + li.logflags = info->bitmask; - if (info->bitmask & EBT_LOG_NFLOG) - nf_log_packet(PF_BRIDGE, hooknr, skb, in, out, &li, + nf_log_packet(PF_BRIDGE, hooknr, skb, in, out, &li, info->prefix); - else - ebt_log_packet(PF_BRIDGE, hooknr, skb, in, out, &li, - info->prefix); } static struct ebt_watcher log = diff --git 
a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c index 802baf7..d06c67d 100644 --- a/net/bridge/netfilter/ebt_ulog.c +++ b/net/bridge/netfilter/ebt_ulog.c @@ -218,28 +218,6 @@ alloc_failure: goto unlock; } -/* this function is registered with the netfilter core */ -static void ebt_log_packet(unsigned int pf, unsigned int hooknum, - const struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, const struct nf_loginfo *li, - const char *prefix) -{ - struct ebt_ulog_info loginfo; - - if (!li || li->type != NF_LOG_TYPE_ULOG) { - loginfo.nlgroup = EBT_ULOG_DEFAULT_NLGROUP; - loginfo.cprange = 0; - loginfo.qthreshold = EBT_ULOG_DEFAULT_QTHRESHOLD; - loginfo.prefix[0] = '\0'; - } else { - loginfo.nlgroup = li->u.ulog.group; - loginfo.cprange = li->u.ulog.copy_len; - loginfo.qthreshold = li->u.ulog.qthreshold; - strlcpy(loginfo.prefix, prefix, sizeof(loginfo.prefix)); - } - - ebt_ulog_packet(hooknum, skb, in, out, &loginfo, prefix); -} static void ebt_ulog(const struct sk_buff *skb, unsigned int hooknr, const struct net_device *in, const struct net_device *out, @@ -275,12 +253,6 @@ static struct ebt_watcher ulog = { .me = THIS_MODULE, }; -static struct nf_logger ebt_ulog_logger = { - .name = EBT_ULOG_WATCHER, - .logfn = &ebt_log_packet, - .me = THIS_MODULE, -}; - static int __init init(void) { int i, ret = 0; @@ -306,13 +278,6 @@ static int __init init(void) else if ((ret = ebt_register_watcher(&ulog))) sock_release(ebtulognl->sk_socket); - if (nf_log_register(PF_BRIDGE, &ebt_ulog_logger) < 0) { - printk(KERN_WARNING "ebt_ulog: not logging via ulog " - "since somebody else already registered for PF_BRIDGE\n"); - /* we cannot make module load fail here, since otherwise - * ebtables userspace would abort */ - } - return ret; } @@ -321,7 +286,6 @@ static void __exit fini(void) ebt_ulog_buff_t *ub; int i; - nf_log_unregister_logger(&ebt_ulog_logger); ebt_unregister_watcher(&ulog); for (i = 0; i < EBT_ULOG_MAXNLGROUPS; i++) { ub = 
&ulog_buffers[i]; diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index 933ee7a..a875596 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -313,6 +313,17 @@ config IP_NF_FILTER local output. See the man page for iptables(8). To compile it as a module, choose M here. If unsure, say N. + +config IP_NF_LOG_SYSLOG + tristate "Syslog backend for LOG target" + depends on IP_NF_IPTABLES + help + This option adds a log backend, which allows you to create rules in + any iptables table which records the packet header to the syslog. + See NF_XTABLES_TARGET_LOG + + To compile it as a module, choose M here. If unsure, say N. + config IP_NF_TARGET_REJECT tristate "REJECT target support" @@ -324,15 +335,6 @@ config IP_NF_TARGET_REJECT To compile it as a module, choose M here. If unsure, say N. -config IP_NF_TARGET_LOG - tristate "LOG target support" - depends on IP_NF_IPTABLES - help - This option adds a `LOG' target, which allows you to create rules in - any iptables table which records the packet header to the syslog. - - To compile it as a module, choose M here. If unsure, say N. 
- config IP_NF_TARGET_ULOG tristate "ULOG target support (OBSOLETE)" depends on IP_NF_IPTABLES diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index 3fe8092..8032d65 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -45,6 +45,9 @@ obj-$(CONFIG_IP_NF_MANGLE) += iptable_ma obj-$(CONFIG_IP_NF_NAT) += iptable_nat.o obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o +# Syslog backend support +obj-$(CONFIG_IP_NF_LOG_SYSLOG) += ip_log_syslog.o + # matches obj-$(CONFIG_IP_NF_MATCH_HASHLIMIT) += ipt_hashlimit.o obj-$(CONFIG_IP_NF_MATCH_IPRANGE) += ipt_iprange.o @@ -68,7 +71,6 @@ obj-$(CONFIG_IP_NF_TARGET_REDIRECT) += i obj-$(CONFIG_IP_NF_TARGET_NETMAP) += ipt_NETMAP.o obj-$(CONFIG_IP_NF_TARGET_SAME) += ipt_SAME.o obj-$(CONFIG_IP_NF_NAT_SNMP_BASIC) += ip_nat_snmp_basic.o -obj-$(CONFIG_IP_NF_TARGET_LOG) += ipt_LOG.o obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_ULOG.o obj-$(CONFIG_IP_NF_TARGET_TCPMSS) += ipt_TCPMSS.o obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o diff --git a/net/ipv4/netfilter/ip_log_syslog.c b/net/ipv4/netfilter/ip_log_syslog.c new file mode 100644 index 0000000..8463ba9 --- /dev/null +++ b/net/ipv4/netfilter/ip_log_syslog.c @@ -0,0 +1,437 @@ +/* + * This is a module which is used for logging packets. + */ + +/* (C) 1999-2001 Paul `Rusty' Russell + * (C) 2002-2004 Netfilter Core Team + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * 2006-03-04 Gregor Maier + * Unified logging. Use nf_log for everything. Create xt_LOG target + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Netfilter Core Team "); +MODULE_DESCRIPTION("iptables syslog logging module"); + +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) 
+#endif + +/* Use lock to serialize, so printks don't overlap */ +static DEFINE_SPINLOCK(log_lock); + +/* One level of recursion won't kill us */ +static void dump_packet(const struct nf_loginfo *info, + const struct sk_buff *skb, + unsigned int iphoff) +{ + struct iphdr _iph, *ih; + unsigned int logflags; + + logflags = info->logflags; + + ih = skb_header_pointer(skb, iphoff, sizeof(_iph), &_iph); + if (ih == NULL) { + printk("TRUNCATED"); + return; + } + + /* Important fields: + * TOS, len, DF/MF, fragment offset, TTL, src, dst, options. */ + /* Max length: 40 "SRC=255.255.255.255 DST=255.255.255.255 " */ + printk("SRC=%u.%u.%u.%u DST=%u.%u.%u.%u ", + NIPQUAD(ih->saddr), NIPQUAD(ih->daddr)); + + /* Max length: 46 "LEN=65535 TOS=0xFF PREC=0xFF TTL=255 ID=65535 " */ + printk("LEN=%u TOS=0x%02X PREC=0x%02X TTL=%u ID=%u ", + ntohs(ih->tot_len), ih->tos & IPTOS_TOS_MASK, + ih->tos & IPTOS_PREC_MASK, ih->ttl, ntohs(ih->id)); + + /* Max length: 6 "CE DF MF " */ + if (ntohs(ih->frag_off) & IP_CE) + printk("CE "); + if (ntohs(ih->frag_off) & IP_DF) + printk("DF "); + if (ntohs(ih->frag_off) & IP_MF) + printk("MF "); + + /* Max length: 11 "FRAG:65535 " */ + if (ntohs(ih->frag_off) & IP_OFFSET) + printk("FRAG:%u ", ntohs(ih->frag_off) & IP_OFFSET); + + if ((logflags & XT_LOG_IPOPT) + && ih->ihl * 4 > sizeof(struct iphdr)) { + unsigned char _opt[4 * 15 - sizeof(struct iphdr)], *op; + unsigned int i, optsize; + + optsize = ih->ihl * 4 - sizeof(struct iphdr); + op = skb_header_pointer(skb, iphoff+sizeof(_iph), + optsize, _opt); + if (op == NULL) { + printk("TRUNCATED"); + return; + } + + /* Max length: 127 "OPT (" 15*4*2chars ") " */ + printk("OPT ("); + for (i = 0; i < optsize; i++) + printk("%02X", op[i]); + printk(") "); + } + + switch (ih->protocol) { + case IPPROTO_TCP: { + struct tcphdr _tcph, *th; + + /* Max length: 10 "PROTO=TCP " */ + printk("PROTO=TCP "); + + if (ntohs(ih->frag_off) & IP_OFFSET) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + th = 
skb_header_pointer(skb, iphoff + ih->ihl * 4, + sizeof(_tcph), &_tcph); + if (th == NULL) { + printk("INCOMPLETE [%u bytes] ", + skb->len - iphoff - ih->ihl*4); + break; + } + + /* Max length: 20 "SPT=65535 DPT=65535 " */ + printk("SPT=%u DPT=%u ", + ntohs(th->source), ntohs(th->dest)); + /* Max length: 30 "SEQ=4294967295 ACK=4294967295 " */ + if (logflags & XT_LOG_TCPSEQ) + printk("SEQ=%u ACK=%u ", + ntohl(th->seq), ntohl(th->ack_seq)); + /* Max length: 13 "WINDOW=65535 " */ + printk("WINDOW=%u ", ntohs(th->window)); + /* Max length: 9 "RES=0x3F " */ + printk("RES=0x%02x ", (u8)(ntohl(tcp_flag_word(th) & TCP_RESERVED_BITS) >> 22)); + /* Max length: 32 "CWR ECE URG ACK PSH RST SYN FIN " */ + if (th->cwr) + printk("CWR "); + if (th->ece) + printk("ECE "); + if (th->urg) + printk("URG "); + if (th->ack) + printk("ACK "); + if (th->psh) + printk("PSH "); + if (th->rst) + printk("RST "); + if (th->syn) + printk("SYN "); + if (th->fin) + printk("FIN "); + /* Max length: 11 "URGP=65535 " */ + printk("URGP=%u ", ntohs(th->urg_ptr)); + + if ((logflags & XT_LOG_TCPOPT) + && th->doff * 4 > sizeof(struct tcphdr)) { + unsigned char _opt[4 * 15 - sizeof(struct tcphdr)]; + unsigned char *op; + unsigned int i, optsize; + + optsize = th->doff * 4 - sizeof(struct tcphdr); + op = skb_header_pointer(skb, + iphoff+ih->ihl*4+sizeof(_tcph), + optsize, _opt); + if (op == NULL) { + printk("TRUNCATED"); + return; + } + + /* Max length: 127 "OPT (" 15*4*2chars ") " */ + printk("OPT ("); + for (i = 0; i < optsize; i++) + printk("%02X", op[i]); + printk(") "); + } + break; + } + case IPPROTO_UDP: { + struct udphdr _udph, *uh; + + /* Max length: 10 "PROTO=UDP " */ + printk("PROTO=UDP "); + + if (ntohs(ih->frag_off) & IP_OFFSET) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + uh = skb_header_pointer(skb, iphoff+ih->ihl*4, + sizeof(_udph), &_udph); + if (uh == NULL) { + printk("INCOMPLETE [%u bytes] ", + skb->len - iphoff - ih->ihl*4); + break; + } + + /* Max length: 20 
"SPT=65535 DPT=65535 " */ + printk("SPT=%u DPT=%u LEN=%u ", + ntohs(uh->source), ntohs(uh->dest), + ntohs(uh->len)); + break; + } + case IPPROTO_ICMP: { + struct icmphdr _icmph, *ich; + static const size_t required_len[NR_ICMP_TYPES+1] + = { [ICMP_ECHOREPLY] = 4, + [ICMP_DEST_UNREACH] + = 8 + sizeof(struct iphdr), + [ICMP_SOURCE_QUENCH] + = 8 + sizeof(struct iphdr), + [ICMP_REDIRECT] + = 8 + sizeof(struct iphdr), + [ICMP_ECHO] = 4, + [ICMP_TIME_EXCEEDED] + = 8 + sizeof(struct iphdr), + [ICMP_PARAMETERPROB] + = 8 + sizeof(struct iphdr), + [ICMP_TIMESTAMP] = 20, + [ICMP_TIMESTAMPREPLY] = 20, + [ICMP_ADDRESS] = 12, + [ICMP_ADDRESSREPLY] = 12 }; + + /* Max length: 11 "PROTO=ICMP " */ + printk("PROTO=ICMP "); + + if (ntohs(ih->frag_off) & IP_OFFSET) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + ich = skb_header_pointer(skb, iphoff + ih->ihl * 4, + sizeof(_icmph), &_icmph); + if (ich == NULL) { + printk("INCOMPLETE [%u bytes] ", + skb->len - iphoff - ih->ihl*4); + break; + } + + /* Max length: 18 "TYPE=255 CODE=255 " */ + printk("TYPE=%u CODE=%u ", ich->type, ich->code); + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + if (ich->type <= NR_ICMP_TYPES + && required_len[ich->type] + && skb->len-iphoff-ih->ihl*4 < required_len[ich->type]) { + printk("INCOMPLETE [%u bytes] ", + skb->len - iphoff - ih->ihl*4); + break; + } + + switch (ich->type) { + case ICMP_ECHOREPLY: + case ICMP_ECHO: + /* Max length: 19 "ID=65535 SEQ=65535 " */ + printk("ID=%u SEQ=%u ", + ntohs(ich->un.echo.id), + ntohs(ich->un.echo.sequence)); + break; + + case ICMP_PARAMETERPROB: + /* Max length: 14 "PARAMETER=255 " */ + printk("PARAMETER=%u ", + ntohl(ich->un.gateway) >> 24); + break; + case ICMP_REDIRECT: + /* Max length: 24 "GATEWAY=255.255.255.255 " */ + printk("GATEWAY=%u.%u.%u.%u ", + NIPQUAD(ich->un.gateway)); + /* Fall through */ + case ICMP_DEST_UNREACH: + case ICMP_SOURCE_QUENCH: + case ICMP_TIME_EXCEEDED: + /* Max length: 3+maxlen */ + if (!iphoff) { /* Only recurse 
once. */ + printk("["); + dump_packet(info, skb, + iphoff + ih->ihl*4+sizeof(_icmph)); + printk("] "); + } + + /* Max length: 10 "MTU=65535 " */ + if (ich->type == ICMP_DEST_UNREACH + && ich->code == ICMP_FRAG_NEEDED) + printk("MTU=%u ", ntohs(ich->un.frag.mtu)); + } + break; + } + /* Max Length */ + case IPPROTO_AH: { + struct ip_auth_hdr _ahdr, *ah; + + if (ntohs(ih->frag_off) & IP_OFFSET) + break; + + /* Max length: 9 "PROTO=AH " */ + printk("PROTO=AH "); + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + ah = skb_header_pointer(skb, iphoff+ih->ihl*4, + sizeof(_ahdr), &_ahdr); + if (ah == NULL) { + printk("INCOMPLETE [%u bytes] ", + skb->len - iphoff - ih->ihl*4); + break; + } + + /* Length: 15 "SPI=0xF1234567 " */ + printk("SPI=0x%x ", ntohl(ah->spi)); + break; + } + case IPPROTO_ESP: { + struct ip_esp_hdr _esph, *eh; + + /* Max length: 10 "PROTO=ESP " */ + printk("PROTO=ESP "); + + if (ntohs(ih->frag_off) & IP_OFFSET) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + eh = skb_header_pointer(skb, iphoff+ih->ihl*4, + sizeof(_esph), &_esph); + if (eh == NULL) { + printk("INCOMPLETE [%u bytes] ", + skb->len - iphoff - ih->ihl*4); + break; + } + + /* Length: 15 "SPI=0xF1234567 " */ + printk("SPI=0x%x ", ntohl(eh->spi)); + break; + } + /* Max length: 10 "PROTO 255 " */ + default: + printk("PROTO=%u ", ih->protocol); + } + + /* Max length: 15 "UID=4294967295 " */ + if ((logflags & XT_LOG_UID) && !iphoff && skb->sk) { + read_lock_bh(&skb->sk->sk_callback_lock); + if (skb->sk->sk_socket && skb->sk->sk_socket->file) + printk("UID=%u ", skb->sk->sk_socket->file->f_uid); + read_unlock_bh(&skb->sk->sk_callback_lock); + } + + /* Proto Max log string length */ + /* IP: 40+46+6+11+127 = 230 */ + /* TCP: 10+max(25,20+30+13+9+32+11+127) = 252 */ + /* UDP: 10+max(25,20) = 35 */ + /* ICMP: 11+max(25, 18+25+max(19,14,24+3+n+10,3+n+10)) = 91+n */ + /* ESP: 10+max(25)+15 = 50 */ + /* AH: 9+max(25)+15 = 49 */ + /* unknown: 10 */ + + /* (ICMP allows recursion one 
level deep) */ + /* maxlen = IP + ICMP + IP + max(TCP,UDP,ICMP,unknown) */ + /* maxlen = 230+ 91 + 230 + 252 = 803 */ +} + +static struct nf_loginfo default_loginfo = { + .backends = NF_LOG_BACKEND_SYSLOG, + .level = 0, + .logflags = NF_LOG_MASK, + /* other fields ignores, since only using SYSLOG backend */ +}; + +static void +ip_log_syslog_packet(unsigned int pf, + unsigned int hooknum, + const struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + const struct nf_loginfo *loginfo, + const char *prefix) +{ + /* Syslog backend is responsible if no loginfo has be specified */ + if (!loginfo) + loginfo = &default_loginfo; + /* Are we responsible for this packet ? */ + if (!(loginfo->backends & NF_LOG_BACKEND_SYSLOG)) + return; + + spin_lock_bh(&log_lock); + printk("<%d>%sIN=%s OUT=%s ", loginfo->level, + prefix, + in ? in->name : "", + out ? out->name : ""); +#ifdef CONFIG_BRIDGE_NETFILTER + if (skb->nf_bridge) { + struct net_device *physindev = skb->nf_bridge->physindev; + struct net_device *physoutdev = skb->nf_bridge->physoutdev; + + if (physindev && in != physindev) + printk("PHYSIN=%s ", physindev->name); + if (physoutdev && out != physoutdev) + printk("PHYSOUT=%s ", physoutdev->name); + } +#endif + + if (in && !out) { + /* MAC logging for input chain only. */ + printk("MAC="); + if (skb->dev && skb->dev->hard_header_len + && skb->mac.raw != (void*)skb->nh.iph) { + int i; + unsigned char *p = skb->mac.raw; + for (i = 0; i < skb->dev->hard_header_len; i++,p++) + printk("%02x%c", *p, + i==skb->dev->hard_header_len - 1 + ? 
' ':':'); + } else + printk(" "); + } + + dump_packet(loginfo, skb, 0); + printk("\n"); + spin_unlock_bh(&log_lock); +} + +static struct nf_logger ip_syslog_logger ={ + .name = "ip_log_syslog", + .logfn = &ip_log_syslog_packet, + .me = THIS_MODULE, +}; + +static int __init init(void) +{ + if (nf_log_register(PF_INET, &ip_syslog_logger) < 0) { + printk(KERN_WARNING "ip_log_syslog: not logging via system console " + "since somebody else already registered for PF_INET\n"); + /* we cannot make module load fail here, since otherwise + * iptables userspace would abort */ + } + + return 0; +} + +static void __exit fini(void) +{ + nf_log_unregister_logger(&ip_syslog_logger); +} + +module_init(init); +module_exit(fini); diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c index a82a32e..1c184c7 100644 --- a/net/ipv4/netfilter/ipt_ULOG.c +++ b/net/ipv4/netfilter/ipt_ULOG.c @@ -15,6 +15,7 @@ * 2002/10/30 fix uninitialized mac_len field - * 2004/10/25 fix erroneous calculation of 'len' parameter to NLMSG_PUT * resulting in bogus 'error during NLMSG_PUT' messages. 
+ * 2006/03/04 make ULOG self-contained without interaction with nf_log * * (C) 1999-2001 Paul `Rusty' Russell * (C) 2002-2004 Netfilter Core Team @@ -313,31 +314,6 @@ static unsigned int ipt_ulog_target(stru return IPT_CONTINUE; } -static void ipt_logfn(unsigned int pf, - unsigned int hooknum, - const struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - const struct nf_loginfo *li, - const char *prefix) -{ - struct ipt_ulog_info loginfo; - - if (!li || li->type != NF_LOG_TYPE_ULOG) { - loginfo.nl_group = ULOG_DEFAULT_NLGROUP; - loginfo.copy_range = 0; - loginfo.qthreshold = ULOG_DEFAULT_QTHRESHOLD; - loginfo.prefix[0] = '\0'; - } else { - loginfo.nl_group = li->u.ulog.group; - loginfo.copy_range = li->u.ulog.copy_len; - loginfo.qthreshold = li->u.ulog.qthreshold; - strlcpy(loginfo.prefix, prefix, sizeof(loginfo.prefix)); - } - - ipt_ulog_packet(hooknum, skb, in, out, &loginfo, prefix); -} - static int ipt_ulog_checkentry(const char *tablename, const void *e, const struct xt_target *target, @@ -368,12 +344,6 @@ static struct ipt_target ipt_ulog_reg = .me = THIS_MODULE, }; -static struct nf_logger ipt_ulog_logger = { - .name = "ipt_ULOG", - .logfn = ipt_logfn, - .me = THIS_MODULE, -}; - static int __init init(void) { int i; @@ -401,8 +371,6 @@ static int __init init(void) sock_release(nflognl->sk_socket); return -EINVAL; } - if (nflog) - nf_log_register(PF_INET, &ipt_ulog_logger); return 0; } @@ -414,8 +382,6 @@ static void __exit fini(void) DEBUGP("ipt_ULOG: cleanup_module\n"); - if (nflog) - nf_log_unregister_logger(&ipt_ulog_logger); ipt_unregister_target(&ipt_ulog_reg); sock_release(nflognl->sk_socket); diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig index 98f7875..db49c5c 100644 --- a/net/ipv6/netfilter/Kconfig +++ b/net/ipv6/netfilter/Kconfig @@ -144,15 +144,17 @@ config IP6_NF_FILTER To compile it as a module, choose M here. If unsure, say N. 
-config IP6_NF_TARGET_LOG - tristate "LOG target support" - depends on IP6_NF_FILTER +config IP6_NF_LOG_SYSLOG + tristate "Syslog backend for LOG target" + depends on IP_NF_IPTABLES help - This option adds a `LOG' target, which allows you to create rules in + This option adds a log backend, which allows you to create rules in any iptables table which records the packet header to the syslog. + See NF_XTABLES_TARGET_LOG To compile it as a module, choose M here. If unsure, say N. + config IP6_NF_TARGET_REJECT tristate "REJECT target support" depends on IP6_NF_FILTER diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile index 8436a1a..418dfa4 100644 --- a/net/ipv6/netfilter/Makefile +++ b/net/ipv6/netfilter/Makefile @@ -4,6 +4,7 @@ # Link order matters here. obj-$(CONFIG_IP6_NF_IPTABLES) += ip6_tables.o +obj-$(CONFIG_IP6_NF_LOG_SYSLOG) += ip6_log_syslog.o obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o ip6t_dst.o obj-$(CONFIG_IP6_NF_MATCH_IPV6HEADER) += ip6t_ipv6header.o @@ -16,7 +17,6 @@ obj-$(CONFIG_IP6_NF_FILTER) += ip6table_ obj-$(CONFIG_IP6_NF_MANGLE) += ip6table_mangle.o obj-$(CONFIG_IP6_NF_TARGET_HL) += ip6t_HL.o obj-$(CONFIG_IP6_NF_QUEUE) += ip6_queue.o -obj-$(CONFIG_IP6_NF_TARGET_LOG) += ip6t_LOG.o obj-$(CONFIG_IP6_NF_RAW) += ip6table_raw.o obj-$(CONFIG_IP6_NF_MATCH_HL) += ip6t_hl.o obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o diff --git a/net/ipv6/netfilter/ip6_log_syslog.c b/net/ipv6/netfilter/ip6_log_syslog.c new file mode 100644 index 0000000..0a1d62c --- /dev/null +++ b/net/ipv6/netfilter/ip6_log_syslog.c @@ -0,0 +1,449 @@ +/* + * This is a module which is used for logging packets. + */ + +/* (C) 2001 Jan Rekorajski + * (C) 2002-2004 Netfilter Core Team + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ * + * 2006-03-04 Gregor Maier + * Unified logging. Use nf_log for everything. Create xt_LOG target + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +MODULE_AUTHOR("Jan Rekorajski "); +MODULE_DESCRIPTION("IP6 tables syslog logging module"); +MODULE_LICENSE("GPL"); + +struct in_device; +#include +#include + +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) +#endif + +/* Use lock to serialize, so printks don't overlap */ +static DEFINE_SPINLOCK(log_lock); + +/* One level of recursion won't kill us */ +static void dump_packet(const struct nf_loginfo *info, + const struct sk_buff *skb, unsigned int ip6hoff, + int recurse) +{ + u_int8_t currenthdr; + int fragment; + struct ipv6hdr _ip6h, *ih; + unsigned int ptr; + unsigned int hdrlen = 0; + unsigned int logflags; + + logflags = info->logflags; + + ih = skb_header_pointer(skb, ip6hoff, sizeof(_ip6h), &_ip6h); + if (ih == NULL) { + printk("TRUNCATED"); + return; + } + + /* Max length: 88 "SRC=0000.0000.0000.0000.0000.0000.0000.0000 DST=0000.0000.0000.0000.0000.0000.0000.0000 " */ + printk("SRC=" NIP6_FMT " DST=" NIP6_FMT " ", NIP6(ih->saddr), NIP6(ih->daddr)); + + /* Max length: 44 "LEN=65535 TC=255 HOPLIMIT=255 FLOWLBL=FFFFF " */ + printk("LEN=%Zu TC=%u HOPLIMIT=%u FLOWLBL=%u ", + ntohs(ih->payload_len) + sizeof(struct ipv6hdr), + (ntohl(*(u_int32_t *)ih) & 0x0ff00000) >> 20, + ih->hop_limit, + (ntohl(*(u_int32_t *)ih) & 0x000fffff)); + + fragment = 0; + ptr = ip6hoff + sizeof(struct ipv6hdr); + currenthdr = ih->nexthdr; + while (currenthdr != NEXTHDR_NONE && ip6t_ext_hdr(currenthdr)) { + struct ipv6_opt_hdr _hdr, *hp; + + hp = skb_header_pointer(skb, ptr, sizeof(_hdr), &_hdr); + if (hp == NULL) { + printk("TRUNCATED"); + return; + } + + /* Max length: 48 "OPT (...) 
" */ + if (logflags & XT_LOG_IPOPT) + printk("OPT ( "); + + switch (currenthdr) { + case IPPROTO_FRAGMENT: { + struct frag_hdr _fhdr, *fh; + + printk("FRAG:"); + fh = skb_header_pointer(skb, ptr, sizeof(_fhdr), + &_fhdr); + if (fh == NULL) { + printk("TRUNCATED "); + return; + } + + /* Max length: 6 "65535 " */ + printk("%u ", ntohs(fh->frag_off) & 0xFFF8); + + /* Max length: 11 "INCOMPLETE " */ + if (fh->frag_off & htons(0x0001)) + printk("INCOMPLETE "); + + printk("ID:%08x ", ntohl(fh->identification)); + + if (ntohs(fh->frag_off) & 0xFFF8) + fragment = 1; + + hdrlen = 8; + + break; + } + case IPPROTO_DSTOPTS: + case IPPROTO_ROUTING: + case IPPROTO_HOPOPTS: + if (fragment) { + if (logflags & XT_LOG_IPOPT) + printk(")"); + return; + } + hdrlen = ipv6_optlen(hp); + break; + /* Max Length */ + case IPPROTO_AH: + if (logflags & XT_LOG_IPOPT) { + struct ip_auth_hdr _ahdr, *ah; + + /* Max length: 3 "AH " */ + printk("AH "); + + if (fragment) { + printk(")"); + return; + } + + ah = skb_header_pointer(skb, ptr, sizeof(_ahdr), + &_ahdr); + if (ah == NULL) { + /* + * Max length: 26 "INCOMPLETE [65535 + * bytes] )" + */ + printk("INCOMPLETE [%u bytes] )", + skb->len - ptr); + return; + } + + /* Length: 15 "SPI=0xF1234567 */ + printk("SPI=0x%x ", ntohl(ah->spi)); + + } + + hdrlen = (hp->hdrlen+2)<<2; + break; + case IPPROTO_ESP: + if (logflags & XT_LOG_IPOPT) { + struct ip_esp_hdr _esph, *eh; + + /* Max length: 4 "ESP " */ + printk("ESP "); + + if (fragment) { + printk(")"); + return; + } + + /* + * Max length: 26 "INCOMPLETE [65535 bytes] )" + */ + eh = skb_header_pointer(skb, ptr, sizeof(_esph), + &_esph); + if (eh == NULL) { + printk("INCOMPLETE [%u bytes] )", + skb->len - ptr); + return; + } + + /* Length: 16 "SPI=0xF1234567 )" */ + printk("SPI=0x%x )", ntohl(eh->spi) ); + + } + return; + default: + /* Max length: 20 "Unknown Ext Hdr 255" */ + printk("Unknown Ext Hdr %u", currenthdr); + return; + } + if (logflags & XT_LOG_IPOPT) + printk(") "); + + currenthdr = 
hp->nexthdr; + ptr += hdrlen; + } + + switch (currenthdr) { + case IPPROTO_TCP: { + struct tcphdr _tcph, *th; + + /* Max length: 10 "PROTO=TCP " */ + printk("PROTO=TCP "); + + if (fragment) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + th = skb_header_pointer(skb, ptr, sizeof(_tcph), &_tcph); + if (th == NULL) { + printk("INCOMPLETE [%u bytes] ", skb->len - ptr); + return; + } + + /* Max length: 20 "SPT=65535 DPT=65535 " */ + printk("SPT=%u DPT=%u ", + ntohs(th->source), ntohs(th->dest)); + /* Max length: 30 "SEQ=4294967295 ACK=4294967295 " */ + if (logflags & XT_LOG_TCPSEQ) + printk("SEQ=%u ACK=%u ", + ntohl(th->seq), ntohl(th->ack_seq)); + /* Max length: 13 "WINDOW=65535 " */ + printk("WINDOW=%u ", ntohs(th->window)); + /* Max length: 9 "RES=0x3C " */ + printk("RES=0x%02x ", (u_int8_t)(ntohl(tcp_flag_word(th) & TCP_RESERVED_BITS) >> 22)); + /* Max length: 32 "CWR ECE URG ACK PSH RST SYN FIN " */ + if (th->cwr) + printk("CWR "); + if (th->ece) + printk("ECE "); + if (th->urg) + printk("URG "); + if (th->ack) + printk("ACK "); + if (th->psh) + printk("PSH "); + if (th->rst) + printk("RST "); + if (th->syn) + printk("SYN "); + if (th->fin) + printk("FIN "); + /* Max length: 11 "URGP=65535 " */ + printk("URGP=%u ", ntohs(th->urg_ptr)); + + if ((logflags & XT_LOG_TCPOPT) + && th->doff * 4 > sizeof(struct tcphdr)) { + u_int8_t _opt[60 - sizeof(struct tcphdr)], *op; + unsigned int i; + unsigned int optsize = th->doff * 4 + - sizeof(struct tcphdr); + + op = skb_header_pointer(skb, + ptr + sizeof(struct tcphdr), + optsize, _opt); + if (op == NULL) { + printk("OPT (TRUNCATED)"); + return; + } + + /* Max length: 127 "OPT (" 15*4*2chars ") " */ + printk("OPT ("); + for (i =0; i < optsize; i++) + printk("%02X", op[i]); + printk(") "); + } + break; + } + case IPPROTO_UDP: { + struct udphdr _udph, *uh; + + /* Max length: 10 "PROTO=UDP " */ + printk("PROTO=UDP "); + + if (fragment) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + uh = 
skb_header_pointer(skb, ptr, sizeof(_udph), &_udph); + if (uh == NULL) { + printk("INCOMPLETE [%u bytes] ", skb->len - ptr); + return; + } + + /* Max length: 20 "SPT=65535 DPT=65535 " */ + printk("SPT=%u DPT=%u LEN=%u ", + ntohs(uh->source), ntohs(uh->dest), + ntohs(uh->len)); + break; + } + case IPPROTO_ICMPV6: { + struct icmp6hdr _icmp6h, *ic; + + /* Max length: 13 "PROTO=ICMPv6 " */ + printk("PROTO=ICMPv6 "); + + if (fragment) + break; + + /* Max length: 25 "INCOMPLETE [65535 bytes] " */ + ic = skb_header_pointer(skb, ptr, sizeof(_icmp6h), &_icmp6h); + if (ic == NULL) { + printk("INCOMPLETE [%u bytes] ", skb->len - ptr); + return; + } + + /* Max length: 18 "TYPE=255 CODE=255 " */ + printk("TYPE=%u CODE=%u ", ic->icmp6_type, ic->icmp6_code); + + switch (ic->icmp6_type) { + case ICMPV6_ECHO_REQUEST: + case ICMPV6_ECHO_REPLY: + /* Max length: 19 "ID=65535 SEQ=65535 " */ + printk("ID=%u SEQ=%u ", + ntohs(ic->icmp6_identifier), + ntohs(ic->icmp6_sequence)); + break; + case ICMPV6_MGM_QUERY: + case ICMPV6_MGM_REPORT: + case ICMPV6_MGM_REDUCTION: + break; + + case ICMPV6_PARAMPROB: + /* Max length: 17 "POINTER=ffffffff " */ + printk("POINTER=%08x ", ntohl(ic->icmp6_pointer)); + /* Fall through */ + case ICMPV6_DEST_UNREACH: + case ICMPV6_PKT_TOOBIG: + case ICMPV6_TIME_EXCEED: + /* Max length: 3+maxlen */ + if (recurse) { + printk("["); + dump_packet(info, skb, ptr + sizeof(_icmp6h), + 0); + printk("] "); + } + + /* Max length: 10 "MTU=65535 " */ + if (ic->icmp6_type == ICMPV6_PKT_TOOBIG) + printk("MTU=%u ", ntohl(ic->icmp6_mtu)); + } + break; + } + /* Max length: 10 "PROTO=255 " */ + default: + printk("PROTO=%u ", currenthdr); + } + + /* Max length: 15 "UID=4294967295 " */ + if ((logflags & XT_LOG_UID) && recurse && skb->sk) { + read_lock_bh(&skb->sk->sk_callback_lock); + if (skb->sk->sk_socket && skb->sk->sk_socket->file) + printk("UID=%u ", skb->sk->sk_socket->file->f_uid); + read_unlock_bh(&skb->sk->sk_callback_lock); + } +} + +static struct nf_loginfo 
default_loginfo = { + .backends = NF_LOG_BACKEND_SYSLOG, + .level = 0, + .logflags = NF_LOG_MASK, + /* other fields ignores, since only using SYSLOG backend */ +}; + +static void +ip6_log_syslog_packet(unsigned int pf, + unsigned int hooknum, + const struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + const struct nf_loginfo *loginfo, + const char *prefix) +{ + /* Syslog backend is responsible if no loginfo has be specified */ + if (!loginfo) + loginfo = &default_loginfo; + /* Are we responsible for this packet ? */ + if (!(loginfo->backends & NF_LOG_BACKEND_SYSLOG)) + return; + + spin_lock_bh(&log_lock); + printk("<%d>%sIN=%s OUT=%s ", loginfo->level, + prefix, + in ? in->name : "", + out ? out->name : ""); + if (in && !out) { + unsigned int len; + /* MAC logging for input chain only. */ + printk("MAC="); + if (skb->dev && (len = skb->dev->hard_header_len) && + skb->mac.raw != skb->nh.raw) { + unsigned char *p = skb->mac.raw; + int i; + + if (skb->dev->type == ARPHRD_SIT && + (p -= ETH_HLEN) < skb->head) + p = NULL; + + if (p != NULL) { + for (i = 0; i < len; i++) + printk("%02x%s", p[i], + i == len - 1 ? 
"" : ":"); + } + printk(" "); + + if (skb->dev->type == ARPHRD_SIT) { + struct iphdr *iph = (struct iphdr *)skb->mac.raw; + printk("TUNNEL=%u.%u.%u.%u->%u.%u.%u.%u ", + NIPQUAD(iph->saddr), + NIPQUAD(iph->daddr)); + } + } else + printk(" "); + } + + dump_packet(loginfo, skb, (u8*)skb->nh.ipv6h - skb->data, 1); + printk("\n"); + spin_unlock_bh(&log_lock); +} + +static struct nf_logger ip6_syslog_logger = { + .name = "ip6_log_syslog", + .logfn = &ip6_log_syslog_packet, + .me = THIS_MODULE, +}; + +static int __init init(void) +{ + if (nf_log_register(PF_INET6, &ip6_syslog_logger) < 0) { + printk(KERN_WARNING "ip6_log_syslog: not logging via system console " + "since somebody else already registered for PF_INET6\n"); + /* we cannot make module load fail here, since otherwise + * ip6tables userspace would abort */ + } + + return 0; +} + +static void __exit fini(void) +{ + nf_log_unregister_logger(&ip6_syslog_logger); +} + +module_init(init); +module_exit(fini); diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 1e6e311..e66fb3a 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -136,6 +136,17 @@ config NETFILTER_XT_TARGET_CONNMARK . The module will be called ipt_CONNMARK.o. If unsure, say `N'. +config NETFILTER_XT_TARGET_LOG + tristate '"LOG" target Support' + depends on NETFILTER_XTABLES + help + This option adds a `LOG' target, which allows you to create rules in + any iptables table which records the packet registered loggers like + NETFILTER_NETLINK_LOG, IP_NF_LOG_SYSLOG or IP6_NF_LOG_SYSLOG. + + To compile it as a module, choose M here. If unsure, say `N'. 
+ + config NETFILTER_XT_TARGET_MARK tristate '"MARK" target support' depends on NETFILTER_XTABLES diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 9558727..f84601f 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_NETFILTER_XTABLES) += x_tab # targets obj-$(CONFIG_NETFILTER_XT_TARGET_CLASSIFY) += xt_CLASSIFY.o obj-$(CONFIG_NETFILTER_XT_TARGET_CONNMARK) += xt_CONNMARK.o +obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o obj-$(CONFIG_NETFILTER_XT_TARGET_MARK) += xt_MARK.o obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o obj-$(CONFIG_NETFILTER_XT_TARGET_NOTRACK) += xt_NOTRACK.o diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c index 3e76bd0..7a7ddb4 100644 --- a/net/netfilter/nf_log.c +++ b/net/netfilter/nf_log.c @@ -6,6 +6,8 @@ #include #include #include +#include +#include /* for kmalloc */ #include #include "nf_internals.h" @@ -15,14 +17,19 @@ #define NF_LOG_PREFIXLEN 128 -static struct nf_logger *nf_logging[NPROTO]; /* = NULL */ +struct nf_logging_node { + struct list_head list; + struct nf_logger *logger; +}; + +static struct list_head nf_logging[NPROTO]; /* = NULL */ static DEFINE_SPINLOCK(nf_log_lock); -/* return EBUSY if somebody else is registered, EEXIST if the same logger - * is registred, 0 on success. */ +/* EEXIST if the same logger is registred, 0 on success. */ int nf_log_register(int pf, struct nf_logger *logger) { - int ret = -EBUSY; + struct nf_logging_node *node; + int ret = 0; if (pf >= NPROTO) return -EINVAL; @@ -30,28 +37,50 @@ int nf_log_register(int pf, struct nf_lo /* Any setup of logging members must be done before * substituting pointer. 
*/ spin_lock(&nf_log_lock); - if (!nf_logging[pf]) { - rcu_assign_pointer(nf_logging[pf], logger); - ret = 0; - } else if (nf_logging[pf] == logger) - ret = -EEXIST; - + /* Check if we are already registered */ + list_for_each_entry(node, &nf_logging[pf], list) { + if (node->logger == logger) { + ret = -EEXIST; + goto reg_out; + } + } + node = kmalloc(sizeof(struct nf_logging_node), GFP_KERNEL); + if (!node) { + ret = -ENOMEM; + goto reg_out; + } + node->logger = logger; + list_add_rcu(&node->list, &nf_logging[pf]); + +reg_out: spin_unlock(&nf_log_lock); + synchronize_net(); return ret; } EXPORT_SYMBOL(nf_log_register); -int nf_log_unregister_pf(int pf) +int nf_log_unregister_pf(int pf, struct nf_logger *logger) { + struct nf_logging_node *node; + int do_kfree = 0; + if (pf >= NPROTO) return -EINVAL; spin_lock(&nf_log_lock); - nf_logging[pf] = NULL; + list_for_each_entry(node, &nf_logging[pf], list) { + if (node->logger == logger) { + list_del_rcu(&node->list); + do_kfree=1; + break; + } + } spin_unlock(&nf_log_lock); /* Give time to concurrent readers. 
*/ synchronize_net(); + if (do_kfree) + kfree(node); return 0; } @@ -61,14 +90,10 @@ void nf_log_unregister_logger(struct nf_ { int i; - spin_lock(&nf_log_lock); for (i = 0; i < NPROTO; i++) { - if (nf_logging[i] == logger) - nf_logging[i] = NULL; + nf_log_unregister_pf(i, logger); } - spin_unlock(&nf_log_lock); - synchronize_net(); } EXPORT_SYMBOL(nf_log_unregister_logger); @@ -82,20 +107,25 @@ void nf_log_packet(int pf, { va_list args; char prefix[NF_LOG_PREFIXLEN]; - struct nf_logger *logger; + struct nf_logging_node *node; + + if (pf >= NPROTO) + return; + va_start(args, fmt); + vsnprintf(prefix, sizeof(prefix), fmt, args); + va_end(args); + rcu_read_lock(); - logger = rcu_dereference(nf_logging[pf]); - if (logger) { - va_start(args, fmt); - vsnprintf(prefix, sizeof(prefix), fmt, args); - va_end(args); - /* We must read logging before nf_logfn[pf] */ - logger->logfn(pf, hooknum, skb, in, out, loginfo, prefix); - } else if (net_ratelimit()) { + if (list_empty(&nf_logging[pf]) && net_ratelimit()) { printk(KERN_WARNING "nf_log_packet: can\'t log since " "no backend logging module loaded in! 
Please either " "load one, or disable logging explicitly\n"); + rcu_read_unlock(); + return; + } + list_for_each_entry_rcu(node, &nf_logging[pf], list) { + node->logger->logfn(pf, hooknum, skb, in, out, loginfo, prefix); } rcu_read_unlock(); } @@ -130,14 +160,20 @@ static void seq_stop(struct seq_file *s, static int seq_show(struct seq_file *s, void *v) { loff_t *pos = v; - const struct nf_logger *logger; + struct nf_logging_node *node; - logger = rcu_dereference(nf_logging[*pos]); - if (!logger) + if (list_empty(&nf_logging[*pos])) return seq_printf(s, "%2lld NONE\n", *pos); + if (seq_printf(s, "%2lld", *pos) < 0) + return -1; + + list_for_each_entry_rcu(node, &nf_logging[*pos], list) { + if (seq_printf(s, " %s", node->logger->name) < 0) + return -1; + } - return seq_printf(s, "%2lld %s\n", *pos, logger->name); + return seq_putc(s, '\n'); } static struct seq_operations nflog_seq_ops = { @@ -165,6 +201,8 @@ static struct file_operations nflog_file int __init netfilter_log_init(void) { + int i; + #ifdef CONFIG_PROC_FS struct proc_dir_entry *pde; @@ -174,5 +212,9 @@ int __init netfilter_log_init(void) pde->proc_fops = &nflog_file_ops; #endif + + for (i = 0; i < NPROTO; i++) { + INIT_LIST_HEAD(&nf_logging[i]); + } return 0; } diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index dfb25fd..e24a058 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -574,6 +574,7 @@ nfattr_failure: #define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0) +/* static struct nf_loginfo default_loginfo = { .type = NF_LOG_TYPE_ULOG, .u = { @@ -584,6 +585,7 @@ static struct nf_loginfo default_loginfo }, }, }; +*/ /* log handler for internal netfilter logging api */ static void @@ -601,17 +603,17 @@ nfulnl_log_packet(unsigned int pf, unsigned int qthreshold; unsigned int nlbufsiz; - if (li_user && li_user->type == NF_LOG_TYPE_ULOG) + if (li_user && (li_user->backends & NF_LOG_BACKEND_NFLOG)) li = li_user; - else - 
li = &default_loginfo; + else /* This packet is none of our buisness */ + return; - inst = instance_lookup_get(li->u.ulog.group); + inst = instance_lookup_get(li->group); if (!inst) inst = instance_lookup_get(0); if (!inst) { PRINTR("nfnetlink_log: trying to log packet, " - "but no instance for group %u\n", li->u.ulog.group); + "but no instance for group %u\n", li->group); return; } @@ -644,8 +646,8 @@ nfulnl_log_packet(unsigned int pf, qthreshold = inst->qthreshold; /* per-rule qthreshold overrides per-instance */ - if (qthreshold > li->u.ulog.qthreshold) - qthreshold = li->u.ulog.qthreshold; + if (qthreshold > li->qthreshold) + qthreshold = li->qthreshold; switch (inst->copy_mode) { case NFULNL_COPY_META: @@ -848,7 +850,7 @@ nfulnl_recv_config(struct sock *ctnl, st UDEBUG("unregistering log handler for pf=%u\n", pf); /* This is a bug and a feature. We cannot unregister * other handlers, like nfnetlink_inst can */ - nf_log_unregister_pf(pf); + nf_log_unregister_pf(pf, &nfulnl_logger); break; default: ret = -EINVAL; diff --git a/net/netfilter/xt_LOG.c b/net/netfilter/xt_LOG.c new file mode 100644 index 0000000..c33ae0f --- /dev/null +++ b/net/netfilter/xt_LOG.c @@ -0,0 +1,136 @@ +/* + * This is a module which is used for logging packets. + */ + +/* (C) 1999-2001 Paul `Rusty' Russell + * (C) 2002-2004 Netfilter Core Team + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * 2006-03-04 Gregor Maier + * Unified logging. Use nf_log for everything. 
Create xt_LOG target + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Netfilter Core Team "); +MODULE_DESCRIPTION("[ip,ip6]_tables LOG module"); +MODULE_ALIAS("ipt_LOG"); +MODULE_ALIAS("ip6t_LOG"); + +static struct xt_target ipt_LOG_reg; +static struct xt_target ip6t_LOG_reg; + +static unsigned int +xt_log_target(struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + unsigned int hooknum, + const struct xt_target *target, + const void *targinfo, + void *userinfo) +{ + const struct xt_log_info *loginfo = targinfo; + + struct nf_loginfo li; + li.backends = loginfo->backends; + li.group = loginfo->group; + li.level = loginfo->level; + li.logflags = loginfo->logflags; + li.qthreshold = 1; + + if (target == &ipt_LOG_reg) + nf_log_packet(PF_INET, hooknum, *pskb, in, out, &li, loginfo->prefix); + else if (target == &ip6t_LOG_reg) + nf_log_packet(PF_INET6, hooknum, *pskb, in, out, &li, loginfo->prefix); + + return XT_CONTINUE; +} + + +static int xt_log_checkentry(const char *tablename, + const void *e, + const struct xt_target *target, + void *targinfo, + unsigned int targinfosize, + unsigned int hook_mask) +{ + const struct xt_log_info *loginfo = targinfo; + + + if (loginfo->level >= 8) { + printk(KERN_WARNING "LOG: level %u >= 8\n", loginfo->level); + return 0; + } + if (loginfo->prefix[sizeof(loginfo->prefix) - 1] != '\0') { + printk(KERN_WARNING "LOG: prefix term %i\n", + loginfo->prefix[sizeof(loginfo->prefix) - 1]); + return 0; + } + printk(KERN_WARNING "LOG: group setting is: %d\n", loginfo->group); + + + + return 1; +} + +static struct xt_target ipt_LOG_reg = { + .name = "LOG", + .target = xt_log_target, + .checkentry = xt_log_checkentry, + .targetsize = sizeof(struct xt_log_info), + .me = THIS_MODULE, +}; + +static struct xt_target ip6t_LOG_reg = { + .name = "LOG", + .target = xt_log_target, + .checkentry = 
xt_log_checkentry, + .targetsize = sizeof(struct xt_log_info), + .me = THIS_MODULE, +}; + + +static int __init init(void) +{ + int ret; + + printk(KERN_WARNING "LOG init called\n"); + ret = xt_register_target(AF_INET, &ipt_LOG_reg); + if (ret) + return ret; + ret = xt_register_target(AF_INET6, &ip6t_LOG_reg); + if (ret) + goto out_ip; + + printk(KERN_WARNING "LOG loaded\n"); + return ret; + +out_ip: + xt_unregister_target(AF_INET, &ipt_LOG_reg); + + return ret; +} + +static void __exit fini(void) +{ + xt_unregister_target(AF_INET, &ipt_LOG_reg); + xt_unregister_target(AF_INET6, &ip6t_LOG_reg); +} + +module_init(init); +module_exit(fini); From gandalf at wlug.westbo.se Sat Mar 4 21:11:51 2006 From: gandalf at wlug.westbo.se (Martin Josefsson) Date: Sat Mar 4 21:40:06 2006 Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> Message-ID: <1141503111.3881.61.camel@localhost.localdomain> On Fri, 2006-02-17 at 10:30 +0100, Jozsef Kadlecsik wrote: > Here are the tests I'd like to see against hashtrie: > > - larger hashentry, i.e. when HASHSIFT is equal to 6, 7 or 8 I have seen good results when sizeof(struct hashentry) = 64 on my laptop. But on the other hand my laptop likes the padding of struct hashentry at the beginning with gives unaligned pointers. It's an Pentium M processor in my laptop. > - other HASHALIGN, hashbit_t values: (32, u8), (64, u16) and (64, u32). Yes this needs more testing as well. > The current (32, u8) doesn't look optimal on 64bit CPUs, (64, u16) seems > to be the best, but without testing it's hard to choose. Currently I don't have any 64bit cpus to test things on. > - DoS against hashtrie by non-random tuples: single fixed destination IP, > port and successive source IP, port numbers. 
(I don't think the current
> max 7 levels (childs) can survive such an attack.)

I just ran some tests on this with this code:

	dstip = 192 << 24 | 168 << 16 | 1 << 8 | 1;
	srcip = 200 << 24 | 100 << 16 | 50 << 8 | 0;

	conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].src.ip = srcip + rand() % 1024;
	conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].src.u.tcp.port = (u16)rand();
	conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].dst.ip = dstip;
	conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].dst.u.tcp.port = 80;

And this is the result I got:

sizeof struct hashentry: 32
sizeof struct ip_conntrack: 48
number of conntrack entries: 1228800
number of slots per hashtrie bucket: 5
number of pad bytes: 3
number of bytes per child: 2048
insert: time: 11299 cyc, 18854 ns (53040/s)
Number of failed inserts: 22232
Number of entries in hashtrie: 2435368
Number of children in hashtrie: 238756
Maxdepth of hashtrie: 3 (0 == root)
Maximum memory usage: 488972288

2.4 million entries in the hashtrie with fixed dstip, port and 1024
srcips with random ports and a maxdepth of 3. I'd say the jenkins hash
is doing its job quite nicely, wasn't expecting such good results.

In the results above you see my main worry regarding hashtrie: the
memory usage, 200 bytes per entry in hashtrie (400 bytes per conntrack)
in this scenario, which means that the child-nodes aren't very populated
in the leaf nodes.

2435368 entries / 238756 child-nodes = 10.2 entries/child-node

And there's 2048 / 32 = 64 struct hashentry per child-node.
That's 64 * 5 = 320 entries/child-node

10.2 / 320 = 3.2% usage of child-nodes on average, which is simply
horrible.

I'm almost starting to think there's a major bug in there somewhere.
I need to write some code to walk the tree and calculate the usage of
each level to see that it's actually 100% for all buckets that have a
child.

> Somehow I don't really like the eviction algorithm.
What about some lazy > auto-eviction instead: say, if there are more than 90% of the max > elements, then drop a (any) unassured connection which can be found on the > path when inserting a new one. Thus the current fixed stack could be > eliminated and there were no builtin limit in the depth. Couldn't this lead to the situation where we evict an entry early on the path, and then that slot gets reused for another entry that's also unassured, and it repeats... The problem with eviction in the hashtrie is that the depth of the entry has no correlation to the age of the entry like in the current hashtable (that's only true if you have long linked lists which leads to poor performance, so no real-world installation has properly working evication anyway today so maybe it doesn't matter too much) > > I have an old untested patch against nf_conntrack as well but it needs > > some rewriting of the conntrack locking in order to avoid an SMP deadlock. > > Rusty's lock-ordering is a sure solution, but penalizes the process. > I have been thinking on wether we could use simply, separated, unordered > locking: > > 1. lock bucket according to tuple1 > add element > unlock bucket > 2. lock bucket according to tuple2, > add element > unlock bucket > if (add element failed) > undo 1. Yes this is the way I've been thinking about. If we get into the case where we have conntrack A and B that have the exact same tuples but in reverse it doesn't really matter if one gets dropped or both, this is strictly best effort, no need to bend over backwards in order to minimize racewindows if it isn't anything serious. The only case I can think about where it might matter is the case of simultaneous open from both sides with the same source/destination ports, dns (some clients issue requests from port 53), games and ipsec isakmp come to mind. -- /Martin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : /pipermail/netfilter-devel/attachments/20060304/c62553ff/attachment.pgp From kadlec at blackhole.kfki.hu Sun Mar 5 10:49:43 2006 From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik) Date: Sun Mar 5 11:01:35 2006 Subject: Hashtrie testing (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: <1141489406.3881.23.camel@localhost.localdomain> References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141489406.3881.23.camel@localhost.localdomain> Message-ID: Hi Martin, On Sat, 4 Mar 2006, Martin Josefsson wrote: > I'll move my svn tree to the netfilter svn sometime soon. That'll be great! > > - there should be some (whatever artifical) tests to study the > > dynamical behaviour of the hashtrie: say all the entries are > > added, then half of them deleted and added back again, but > > in reverse order. Will the maxdepth grow? > > I've performed some simple tests with this and here are the results: > > First I tried removing the upper half of the conntrack entries (it's the > conntrack entries index in the test array that's printed out in the > "Removing" line below) > > Number of entries in hashtrie: 819200 > Number of children in hashtrie: 27345 > Maxdepth of hashtrie: 3 (0 == root) > Removing entries between 204800 and 409599. > Adding entries between 409599 and 204800. > insert (half reverse): time: 1484 cyc, 929 ns (1076184/s) > Number of entries in hashtrie: 819200 > Number of children in hashtrie: 27345 > Maxdepth of hashtrie: 3 (0 == root) > > No diffrence at all which is a bit weird, I had to doublecheck the code > to see that I really readded the entries in reverse order and I did. That's really weird: same maxdepth and exactly the same number of children! 
> Then I tried it with the lower half of the conntrack entries, those > lower in the tree and got this: > > Number of entries in hashtrie: 819200 > Number of children in hashtrie: 27273 > Maxdepth of hashtrie: 3 (0 == root) > Removing entries between 0 and 204799. > Adding entries between 204799 and 0. > insert (half reverse): time: 1627 cyc, 1018 ns (981869/s) > Number of entries in hashtrie: 819200 > Number of children in hashtrie: 29189 > Maxdepth of hashtrie: 3 (0 == root) > > Here wee see that the number of child-nodes has increased from 27273 to > 29189, that's a 7.0% increase, but the max depth hasn't increased in > this particular test. Given the right sitation the maxdepth will > probably increase, it's likely to occur sometimes since the number of > child-nodes has increased. Half of the entries were removed then re-added and it produced 7% increase in the number of the child nodes. That might be good or bad as well. [...] > More testing is clearly needed. Yes, one is somehow uneasy. Hm. What about filling up the hashtrie and then some long loop of deleting 10% at random and adding *new* random entries? We could measure the change in maxdepth/childnodes after every delete/add cycle. What were the peak numbers? At what numbers would it stabilize? Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 
49, Hungary From kadlec at blackhole.kfki.hu Sun Mar 5 12:24:55 2006 From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik) Date: Sun Mar 5 12:36:46 2006 Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: <1141503111.3881.61.camel@localhost.localdomain> References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> Message-ID: Hi Martin, On Sat, 4 Mar 2006, Martin Josefsson wrote: > On Fri, 2006-02-17 at 10:30 +0100, Jozsef Kadlecsik wrote: > > > Here are the tests I'd like to see against hashtrie: > > > > - larger hashentry, i.e. when HASHSIFT is equal to 6, 7 or 8 > > I have seen good results when sizeof(struct hashentry) = 64 on my > laptop. But on the other hand my laptop likes the padding of struct > hashentry at the beginning with gives unaligned pointers. It's an > Pentium M processor in my laptop. HASHSIFT (i.e. HASHNUM) defines the hash size of the hashtrie. But the hashtrie - according to your tests - are much "wider" than "deep". So there might be no point in widening the hash. > > - other HASHALIGN, hashbit_t values: (32, u8), (64, u16) and (64, u32). > > Yes this needs more testing as well. > > > The current (32, u8) doesn't look optimal on 64bit CPUs, (64, u16) seems > > to be the best, but without testing it's hard to choose. > > Currently I don't have any 64bit cpus to test things on. 
With the default (32, u8) settings NUMENTRY looks too small on 64bit CPUs:

HASHALIGN = 32
hashbits_t = u8
	32bit CPU: NUMENTRY = (32 - 4)/(1+4) = 5    PADNUM = 32 - 4 - 5*(1+4) = 3
	64bit CPU: NUMENTRY = (32 - 8)/(1+8) = 2    PADNUM = 32 - 8 - 2*(1+8) = 6

HASHALIGN = 64
hashbits_t = u16
	32bit CPU: NUMENTRY = (64 - 4)/(2+4) = 10   PADNUM = 64 - 4 - 10*(2+4) = 0
	64bit CPU: NUMENTRY = (64 - 8)/(2+8) = 5    PADNUM = 64 - 8 - 5*(2+8) = 6

HASHALIGN = 64
hashbits_t = u32
	32bit CPU: NUMENTRY = (64 - 4)/(4+4) = 7    PADNUM = 64 - 4 - 7*(4+4) = 4
	64bit CPU: NUMENTRY = (64 - 8)/(4+8) = 4    PADNUM = 64 - 8 - 4*(4+8) = 8

> > - DoS against hashtrie by non-random tuples: single fixed destination IP,
> >   port and successive source IP, port numbers. (I don't think the current
> >   max 7 levels (childs) can survive such an attack.)
>
> I just ran some tests on this with this code:
[...]
> And this is the result I got:
>
> sizeof struct hashentry: 32
> sizeof struct ip_conntrack: 48
> number of conntrack entries: 1228800
> number of slots per hashtrie bucket: 5
> number of pad bytes: 3
> number of bytes per child: 2048
> insert: time: 11299 cyc, 18854 ns (53040/s)
> Number of failed inserts: 22232
> Number of entries in hashtrie: 2435368
> Number of children in hashtrie: 238756
> Maxdepth of hashtrie: 3 (0 == root)
> Maximum memory usage: 488972288
>
> 2.4 million entries in the hashtrie with fixed dstip, port and 1024
> srcip's with random ports and a maxdepth of 3. I'd say the jenkins hash
> is doing its job quite nicely, wasn't expecting such good results.

The hashtrie tends to grow wider and not deeper:

	level:	childs (max)	entries (max)
	0	64		320
	1	4160		20800
	2	266304		1331520
	3	17043520	85217600

> In the results above you see my main worry regarding hashtrie, the
> memory usage, 200 bytes per entry in hashtrie (400 bytes per conntrack)
> in this scenario, which means that the child-nodes aren't very populated
> in the leaf nodes.
> > 2435368 entries / 238756 child-nodes = 10.2 entries/child-node > > And there's 2048 / 32 = 64 struct hashentry per child-node. > That's 64 * 5 = 320 entries/child-node > > 10.2 / 320 = 3.2% usage of child-nodes in average which is simply > horrible. > > I'm almost starting to think there's a major bug in there somewhere. > I need to write some code to walk the tree and calculate the usage of > each level to see that it's actually 100% for all buckets that has a > child. Yes, that could help to spot the problem. But if there are clashes (due to using to small parts of the hash key, i.e hashbits_t = u8), then the node can be fairly low utilized and still there is a need to expand by new childs. > > Somehow I don't really like the eviction algorithm. What about some lazy > > auto-eviction instead: say, if there are more than 90% of the max > > elements, then drop a (any) unassured connection which can be found on the > > path when inserting a new one. Thus the current fixed stack could be > > eliminated and there were no builtin limit in the depth. > > Couldn't this lead to the situation where we evict an entry early on the > path, and then that slot gets reused for another entry that's also > unassured, and it repeats... You mean, we could always evict the newest unassured entries instead of the oldest ones? The algorithm could be extended so that we'd walk the whole tree and register the oldest unassured entry in the path. I'm more worried that we'd check too litle number of entries as the tree is shorter than I thought. > > I have been thinking on wether we could use simply, separated, unordered > > locking: [...] > The only case I can think about where it might matter is the case of > simultaneous open from both sides with the same source/destination > ports, dns (some clients issue requests from port 53), games and ipsec > isakmp come to mind. 
That can be fixed by checking that we undo the insertion for the same conntrack entry we were unable to add at the second step. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From c-d.hailfinger.devel.2006 at gmx.net Sun Mar 5 12:43:41 2006 From: c-d.hailfinger.devel.2006 at gmx.net (Carl-Daniel Hailfinger) Date: Sun Mar 5 12:56:16 2006 Subject: IPSET patches from pom-ng don't apply to 2.6.16-rc5 In-Reply-To: References: <44043F49.7050304@gmx.net> Message-ID: <440ACEED.9050109@gmx.net> Jozsef Kadlecsik schrieb: > On Thu, 2 Mar 2006, Jozsef Kadlecsik wrote: > >>On Tue, 28 Feb 2006, Carl-Daniel Hailfinger wrote: >> >>>applying the ipset patch from patch-o-matic-ng doesn't work anymore: >>> >>>unable to find ladd slot in src /tmp/pom-26985/net/ipv4/netfilter/Makefile (./patchlets/set/linux-2.6/./net/ipv4/netfilter/Makefile.ladd) >> >>The marker points from net/ipv4/netfilter/Makefile were moved into >>net/netfilter/Makefile thus those cannot be found anymore. > > It is fixed in svn. Thanks! Regards, Carl-Daniel -- http://www.hailfinger.org/ From gandalf at wlug.westbo.se Sun Mar 5 14:24:16 2006 From: gandalf at wlug.westbo.se (Martin Josefsson) Date: Sun Mar 5 14:36:53 2006 Subject: Hashtrie testing (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141489406.3881.23.camel@localhost.localdomain> Message-ID: <1141565056.3881.74.camel@localhost.localdomain> On Sun, 2006-03-05 at 10:49 +0100, Jozsef Kadlecsik wrote: > Hi Martin, Hi Jozsef > > I'll move my svn tree to the netfilter svn sometime soon. > > That'll be great! 
After a small struggle with 'svnadmin dump', 'svndumpfilter' and 'sed' it's finally there with all history. trunk/hashtrie Enjoy and don't laugh too much at all the early experimentation :) > > First I tried removing the upper half of the conntrack entries (it's the > > conntrack entries index in the test array that's printed out in the > > "Removing" line below) > > > > Number of entries in hashtrie: 819200 > > Number of children in hashtrie: 27345 > > Maxdepth of hashtrie: 3 (0 == root) > > Removing entries between 204800 and 409599. > > Adding entries between 409599 and 204800. > > insert (half reverse): time: 1484 cyc, 929 ns (1076184/s) > > Number of entries in hashtrie: 819200 > > Number of children in hashtrie: 27345 > > Maxdepth of hashtrie: 3 (0 == root) > > > > No diffrence at all which is a bit weird, I had to doublecheck the code > > to see that I really readded the entries in reverse order and I did. > > That's really weird: same maxdepth and exactly the same number of > children! Yes I thought so too, I even added a bunch of debug printf's to the code to verify that it actually readded them in reverse order, and it did. > > Then I tried it with the lower half of the conntrack entries, those > > lower in the tree and got this: > > > > Number of entries in hashtrie: 819200 > > Number of children in hashtrie: 27273 > > Maxdepth of hashtrie: 3 (0 == root) > > Removing entries between 0 and 204799. > > Adding entries between 204799 and 0. > > insert (half reverse): time: 1627 cyc, 1018 ns (981869/s) > > Number of entries in hashtrie: 819200 > > Number of children in hashtrie: 29189 > > Maxdepth of hashtrie: 3 (0 == root) > > > > Here wee see that the number of child-nodes has increased from 27273 to > > 29189, that's a 7.0% increase, but the max depth hasn't increased in > > this particular test. Given the right sitation the maxdepth will > > probably increase, it's likely to occur sometimes since the number of > > child-nodes has increased. 
> > Half of the entries were removed then re-added and it produced 7% increase > in the number of the child nodes. That might be good or bad as well. Yes I don't know either :) In these tests both source and destination ipaddresses are random, that's usually not the case in most installations. With fixed dst ip, port and 1024 sourceipaddresses and random src ports it increases by 7.2% > [...] > > More testing is clearly needed. > > Yes, one is somehow uneasy. Hm. What about filling up the hashtrie and > then some long loop of deleting 10% at random and adding *new* random > entries? We could measure the change in maxdepth/childnodes after every > delete/add cycle. What were the peak numbers? At what numbers would it > stabilize? I'll test this as well during the evening. Removing 10% of the entries at random, and then give those entries new random values and readd them. -- /Martin -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : /pipermail/netfilter-devel/attachments/20060305/27970509/attachment.pgp From codeslinger at gmail.com Sun Mar 5 18:07:53 2006 From: codeslinger at gmail.com (Toby DiPasquale) Date: Sun Mar 5 18:20:29 2006 Subject: svn.netfilter.org having problems? Message-ID: <876ef97a0603050907l23f4015dx97e0b349fca19ece@mail.gmail.com> Here's what I get when I go to http://svn.netfilter.org/cgi-bin/viewcvs.cgi/trunk An Exception Has Occurred Python Traceback Traceback (most recent call last): File "/var/www/localhost/viewcvs/lib/viewcvs.py", line 3283, in main request.run_viewcvs() File "/var/www/localhost/viewcvs/lib/viewcvs.py", line 262, in run_viewcvs import vclib.svn File "/var/www/localhost/viewcvs/lib/vclib/svn/__init__.py", line 28, in ? from svn import fs, repos, core, delta File "/usr/lib/python2.4/site-packages/svn/fs.py", line 19, in ? 
from libsvn.fs import * File "/usr/lib/python2.4/site-packages/libsvn/fs.py", line 5, in ? import _fs ImportError: libswigpy.so.0: cannot open shared object file: No such file or directory Is there maintenance ongoing right now? -- Toby DiPasquale 0x636f6465736c696e67657240676d61696c2e636f6d From gandalf at wlug.westbo.se Sun Mar 5 18:48:58 2006 From: gandalf at wlug.westbo.se (Martin Josefsson) Date: Sun Mar 5 19:01:37 2006 Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> Message-ID: <1141580938.3881.129.camel@localhost.localdomain> On Sun, 2006-03-05 at 12:24 +0100, Jozsef Kadlecsik wrote: > Hi Martin, Hi Jozsef > HASHSIFT (i.e. HASHNUM) defines the hash size of the hashtrie. > But the hashtrie - according to your tests - are much "wider" than > "deep". So there might be no point in widening the hash. HASHSHIFT and HASHNUM controls the number of struct hashentry per child. The original idea was to allocate 1 page (4kB) for each child but I got better results with 2kB, which means it isn't as wide as it was from the start. > > > - other HASHALIGN, hashbit_t values: (32, u8), (64, u16) and (64, u32). > > > > Yes this needs more testing as well. > > > > > The current (32, u8) doesn't look optimal on 64bit CPUs, (64, u16) seems > > > to be the best, but without testing it's hard to choose. > > > > Currently I don't have any 64bit cpus to test things on. HASHALIGN is the size of struct hashentry, the idea is that HASHALIGN should be equal to the cacheline size of the cpu it's running on. That way you only get one cachemiss per level in the tree. HASHALIGN is set to 32 bytes just because my laptop has 32byte cachelines, other cpus has 64bytes or even 128bytes. 
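The cacheline constraint above can be turned into a small calculation. This is only a sketch under stated assumptions: it supposes that a struct hashentry of HASHALIGN bytes holds one child pointer plus a number of slots, each made of one hashbits value and one entry pointer, with whatever is left over as padding. The helper names numentry() and padnum() are hypothetical and are not taken from the hashtrie source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many slots fit into one node of `hashalign`
 * bytes, assuming one child pointer per node and, per slot, one hashbits
 * value of `hashbits_size` bytes plus one pointer of `ptr_size` bytes. */
static size_t numentry(size_t hashalign, size_t ptr_size, size_t hashbits_size)
{
    assert(hashalign > ptr_size);
    return (hashalign - ptr_size) / (hashbits_size + ptr_size);
}

/* Hypothetical helper: bytes of padding left in the node after the child
 * pointer and all slots have been placed. */
static size_t padnum(size_t hashalign, size_t ptr_size, size_t hashbits_size)
{
    size_t n = numentry(hashalign, ptr_size, hashbits_size);
    return hashalign - ptr_size - n * (hashbits_size + ptr_size);
}
```

Under these assumptions a 32-byte node with 4-byte pointers and a one-byte hashbits value has room for 5 slots with 3 bytes of padding, while the same node with 8-byte pointers has room for only 2 slots.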
hashbit_t is a u8 because in my tests I haven't seen any gain from going to
more bits.
The only purpose of hashbits is to "cheat" a bit during lookup: store a
part of the hash value in the data structure so you can easily determine
whether a member can even be a possible match by comparing the hashbits to
the hash value you are searching for. This eliminates a lot of
cache misses (without this the data structure will behave more like a
linked list).
I've tested with different numbers of bits for hashbits but I haven't seen
any real gain from going over 7 bits, which is what is currently used; the
last bit is used as the status bit for ASSURED.

> With the default (32, u8) settings NUMENTRY looks too small on 64bit
> CPUs:

Yes it's too small, luckily I don't know of any 64bit cpus that have
32byte cachelines :)

Let's say we keep hashbits as a u8, then we get the following results for
different cacheline sizes (I believe the intel p4 has 128byte cachelines)

HASHALIGN = 32
hashbits_t = u8
32bit CPU:
NUMENTRY = (32 - 4)/(1 + 4) = 5
PADNUM = 32 - 4 - 5*(1 + 4) = 3
64bit CPU:
NUMENTRY = (32 - 8)/(1 + 8) = 2
PADNUM = 32 - 8 - 2*(1 + 8) = 6

HASHALIGN = 64
hashbits_t = u8
32bit CPU:
NUMENTRY = (64 - 4)/(1 + 4) = 12
PADNUM = 64 - 4 - 12*(1 + 4) = 0
64bit CPU:
NUMENTRY = (64 - 8)/(1 + 8) = 6
PADNUM = 64 - 8 - 6*(1 + 8) = 2

HASHALIGN = 128
hashbits_t = u8
32bit CPU:
NUMENTRY = (128 - 4)/(1 + 4) = 24
PADNUM = 128 - 4 - 24*(1 + 4) = 4
64bit CPU:
NUMENTRY = (128 - 8)/(1 + 8) = 13
PADNUM = 128 - 8 - 13*(1 + 8) = 3

When HASHALIGN goes up we need to reduce HASHSHIFT in order to make sure
we don't try to allocate more than 1 page (usually 4kB) of memory.

> > > - DoS against hashtrie by non-random tuples: single fixed destination IP,
> > > port and successive source IP, port numbers. (I don't think the current
> > > max 7 levels (childs) can survive such an attack.)

[snip]

> > 2.4 million entries in the hashtrie with fixed dstip, port and 1024
> > srcip's with random ports and a maxdepth of 3.
I'd say the jenkins hash > > is doing its job quite nicely, wasn't expecting such good results. > > The hashtrie tends to grow wider and not deeper: > > level: childs (max) entries (max) > 0 64 320 > 1 4160 20800 > 2 266304 1331520 > 3 17043520 85217600 It grows wider if and only if we get hashvalues that aren't too similar in the high bits. If that happens we grow deeper instead of wider. But it seems the jenkins hash is really good at distributing the changes over all of the bits in the hashvalue. (it's just a little bit on the heavy side, it uses plenty of cpu) > > I'm almost starting to think there's a major bug in there somewhere. > > I need to write some code to walk the tree and calculate the usage of > > each level to see that it's actually 100% for all buckets that has a > > child. > > Yes, that could help to spot the problem. But if there are clashes > (due to using to small parts of the hash key, i.e hashbits_t = u8), then > the node can be fairly low utilized and still there is a need to expand by > new childs. Small hashbits_t doesn't have anything to do with it, hashbits_t is just for accelerating lookups. If we have a low number of struct hashentry per child-node we'll probably fill them up faster since there's fewer of them to be filled. And if we have a low NUMENTRY they will also be filled faster. > > Couldn't this lead to the situation where we evict an entry early on the > > path, and then that slot gets reused for another entry that's also > > unassured, and it repeats... > > You mean, we could always evict the newest unassured entries instead of > the oldest ones? The algorithm could be extended so that we'd walk the > whole tree and register the oldest unassured entry in the path. I'm more > worried that we'd check too litle number of entries as the tree is shorter > than I thought. The problem is how do you know which entry along a path is the oldest? 
Entries aren't always added to the front like with the current linked
lists in the hashtable.

> > > I have been thinking on wether we could use simply, separated, unordered
> > > locking:
> [...]
> > The only case I can think about where it might matter is the case of
> > simultaneous open from both sides with the same source/destination
> > ports, dns (some clients issue requests from port 53), games and ipsec
> > isakmp come to mind.
>
> That can be fixed by checking that we undo the insertion for the same
> conntrack entry we were unable to add at the second step.

What I meant is the following situation: clients A & B are both located
behind NAT and are trying to connect to each other. Both bind to the same
port, let's say X, and try to connect to each other's external ip.

Currently, if the packets from both clients arrive at the same time at an
SMP firewall, we add a conntrack entry for one of the clients but drop the
second one because of the global lock.
With Rusty's lock ordering we keep this behaviour since both buckets are
locked in the same order for both packets.

With the "unordered independent locking" of buckets we might end up with:

these happen at the same time:
packet from client A hashes into bucket C, lock and add entry.
packet from client B hashes into bucket D, lock and add entry.

and after that these happen at the same time:
packet from client A hashes into bucket D, lock and find entry, remove
entry in bucket C
packet from client B hashes into bucket C, lock and find entry, remove
entry in bucket D

This leads to no conntrack entry being created for either of the clients'
connection attempts.

The solution would be "ordered independent locking", which locks and adds
entries based on the bucket number that the tuples hash into.
That way we still keep the current behaviour while simplifying the
locking, with no chance of deadlocks.

--
/Martin
-------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : /pipermail/netfilter-devel/attachments/20060305/7bc29f5e/attachment.pgp From sfrost at snowman.net Mon Mar 6 04:04:11 2006 From: sfrost at snowman.net (Stephen Frost) Date: Mon Mar 6 04:15:46 2006 Subject: BUG: More ipt_recent queries In-Reply-To: <44096842.3010507@trash.net> References: <43FAF692.8030804@ufomechanic.net> <44096842.3010507@trash.net> Message-ID: <20060306030411.GZ4474@ns.snowman.net> * Patrick McHardy (kaber@trash.net) wrote: > Stephen, can you have a look at this please? Alright. I'll try to look over the current ipt_recent state and the comments I've seen regarding it. I don't expect to have time to rewrite it anytime soon though. Stephen > Amin Azez wrote: > > I'm concerned about ipt_recent where it removes entries from the list. > > > > Surly the move-up-and-close-the-gap while loop will never enter because > > time_info[time_loc].time has just been set to 0 so that this clause of > > the while loop: > > > > time_info[(time_loc+1) % ip_list_tot].time < time_info[time_loc].time) > > > > will always be false. 
> > > > Fuller code segment: > > > > location = hash_table[hash_result]; > > hash_table[r_list[location].hash_entry] = -1; > > time_loc = r_list[location].time_pos; > > time_info[time_loc].time = 0; > > time_info[time_loc].position = location; > > > > while((time_info[(time_loc+1) % ip_list_tot].time < > > time_info[time_loc].time) && ((time_loc+1) % ip_list_tot) != > > curr_table->time_pos) { > > time_temp = time_info[time_loc].time; > > time_info[time_loc].time = time_info[(time_loc+1)%ip_list_tot].time; > > time_info[(time_loc+1)%ip_list_tot].time = time_temp; > > time_temp = time_info[time_loc].position; > > time_info[time_loc].position = > > time_info[(time_loc+1)%ip_list_tot].position; > > time_info[(time_loc+1)%ip_list_tot].position = time_temp; > > r_list[time_info[time_loc].position].time_pos = time_loc; > > r_list[time_info[(time_loc+1)%ip_list_tot].position].time_pos = > > (time_loc+1)%ip_list_tot; > > time_loc = (time_loc+1) % ip_list_tot; > > } > > > > > > I think we should set time_info[time_loc].time = 0; at the end of the > > while loop? > > > > Sam > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : /pipermail/netfilter-devel/attachments/20060306/41c92554/attachment.pgp From kadlec at blackhole.kfki.hu Mon Mar 6 10:21:40 2006 From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik) Date: Mon Mar 6 10:33:41 2006 Subject: svn.netfilter.org having problems? In-Reply-To: <876ef97a0603050907l23f4015dx97e0b349fca19ece@mail.gmail.com> References: <876ef97a0603050907l23f4015dx97e0b349fca19ece@mail.gmail.com> Message-ID: On Sun, 5 Mar 2006, Toby DiPasquale wrote: > Here's what I get when I go to > http://svn.netfilter.org/cgi-bin/viewcvs.cgi/trunk > > > An Exception Has Occurred > Python Traceback According to Harald, a bugzilla fix blowed away part (dependecy) of cvsweb. It'll take a while to get fixed, unfortunately. 
Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From michael at webwork-solutions.com Sat Mar 4 13:18:35 2006 From: michael at webwork-solutions.com (Michael Leow) Date: Mon Mar 6 14:08:25 2006 Subject: [PATCH] nfsim configure Message-ID: <4409859B.2070207@webwork-solutions.com> Greetings from the 7th Asia OSS Code Fest! Hit the following problem while compiling using the following environment:- Linux 2.6.15.5 source and nfsim-20060303. Due to a missing x_tables.c; the Makefile is not created. Here is a simple patch for your consideration. leow@xen:~/broken-nfsim/nfsim-fix$ diff -rN -u ../nfsim-20060303/configure configure --- ../nfsim-20060303/configure 2006-01-09 04:55:34.000000000 +0800 +++ configure 2006-03-04 15:55:47.000000000 +0800 @@ -107,6 +107,9 @@ echo "import: netfilter/gen/$f" >> Makefile.import done cat $KERNELDIR/net/netfilter/Makefile | grep -v '^obj-$(CONFIG_NETFILTER)' | grep -v '^netfilter-objs' > netfilter/gen/Makefile + else + # Empty ... 
:( + touch netfilter/gen/Makefile fi fi Regards, Michael Leow HighTraffic.com (A Division of WebWork-Solutions Sdn Bhd) From kadlec at blackhole.kfki.hu Mon Mar 6 14:15:10 2006 From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik) Date: Mon Mar 6 14:27:07 2006 Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: <1141580938.3881.129.camel@localhost.localdomain> References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> <1141580938.3881.129.camel@localhost.localdomain> Message-ID: Hi Martin On Sun, 5 Mar 2006, Martin Josefsson wrote: > HASHSHIFT and HASHNUM controls the number of struct hashentry per child. > The original idea was to allocate 1 page (4kB) for each child but I got > better results with 2kB, which means it isn't as wide as it was from the > start. Did you get faster lookups/insertions/deletions with half of the page size? > HASHALIGN is the size of struct hashentry, the idea is that HASHALIGN > should be equal to the cacheline size of the cpu it's running on. That > way you only get one cachemiss per level in the tree. > HASHALIGN is set to 32 bytes just because my laptop has 32byte > cachelines, other cpus has 64bytes or even 128bytes. As far as I see, the structure should satisfy "contradicting" requirements. We want an as low tree as possible, thus minimising the cost of the operations. That includes the number of levels of the childs and the number of slots in a bucket and the latter depends on the HASHALIGN, hashbits_t parameters and the members of the hashentry structure. So in theory we should maximize HASHSHIFT and minimize NUMENTRY. However we do not want a preallocated, huge, sparse hash either, and therefore it is hard to tell the best values. 
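The width-versus-depth tradeoff can be made concrete with a short calculation of how much a fully populated hashtrie could hold at a given depth. This is a sketch under assumptions: every bucket at every level is taken to have a child node, and the values 64 buckets per node and 5 slots per bucket are the ones behind the childs/entries table quoted earlier in this thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative number of buckets in a trie of the given depth (0 == root
 * only), assuming each bucket owns a child node of `hashnum` buckets. */
static uint64_t max_buckets(uint64_t hashnum, unsigned depth)
{
    uint64_t total = 0;
    uint64_t level = 1;

    assert(hashnum > 0);
    for (unsigned d = 0; d <= depth; d++) {
        level *= hashnum;   /* hashnum^(d+1) buckets on level d */
        total += level;
    }
    return total;
}

/* Upper bound on stored entries when each bucket has `numentry` slots. */
static uint64_t max_entries(uint64_t hashnum, uint64_t numentry, unsigned depth)
{
    return max_buckets(hashnum, depth) * numentry;
}
```

With 64 buckets per node and 5 slots per bucket, depth 0 gives 64 buckets and 320 entries, and depth 3 gives 17043520 buckets and 85217600 entries. The upper bound grows by roughly a factor of 64 per level, which is why a tree holding a few million entries can stay at a maxdepth of 3.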
> hashbit_t is an u8 because in my tests I havn't see any gain by going to > more bits. > The only purpose of hashbits is to "cheat" a bit during lookup, store a > part of the hash-value in the datastructure so you can easily determine > if a member even can be a possible match by comparing the hashbits to > the hashvalue you are searching for, this eliminates a lot of > cachemisses (without this the datastructure will behave more like a > linked list). This structure is quite tricky :-). > > Yes, that could help to spot the problem. But if there are clashes > > (due to using to small parts of the hash key, i.e hashbits_t = u8), then > > the node can be fairly low utilized and still there is a need to expand by > > new childs. > > Small hashbits_t doesn't have anything to do with it, hashbits_t is just > for accelerating lookups. If we have a low number of struct hashentry > per child-node we'll probably fill them up faster since there's fewer of > them to be filled. And if we have a low NUMENTRY they will also be > filled faster. But how can we explain the so few entries in the buckets? Your numbers suggest that most of the buckets are pretty empty, as if there were two slots full from the 64 in the DoS case. Or we can say that the DoS has got an effect: the jenkins hash were unable to produce good enough keys. Or using the 32 bits from the 96 internal ones is not sufficient. But in the non-DoS random pattern case, there was 819200 entries / 27345 child-nodes =~ 29 entries/child-nodes, that's still around 10% utilization. So it looks it's not the jenkins hash which produces the sparse tree. > The problem is how do you know which entry along a path is the oldest? > Entries aren't always added to the front like with the current linked > lists in the hashtable. We could check the timer of the entry: the oldest one is that which would time out nearest in the future. [...] > these happen at the same time: > packet from client A hashes into bucket C, lock and add entry. 
> packet from client B hashes into bucket D, lock and add entry. > > and after that these happen at the same time: > packet from client A hashes into bucket D, lock and find entry, remove > entry in bucket C > packet from client B hashes into bucket C, lock and find entry, remove > entry in bucket D > > This leads to no conntrack entry created for any of the clients > connection attempts. > > The solution would be "ordered independent locking" which locks and adds > entries based on bucket number that the tuplex hash into. > That way we still keep the current behaviour while simplifying the > locking, no chance of deadlocks. That looks perfect! Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From kaber at trash.net Mon Mar 6 22:12:31 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Mar 6 22:25:22 2006 Subject: [NETFILTER 2.6.16]: Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG Message-ID: <440CA5BF.4010701@trash.net> Hi Dave, this patch fixes an incorrectly named option in the bridge-netfilter Makefile. Please apply to 2.6.16. 
-------------- next part -------------- [NETFILTER]: Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG Signed-off-by: Gregor Maier Signed-off-by: Patrick McHardy --- commit 35af439134b790b63c1708c3f2c2f54b88e3018f tree d4b211ea63c234edc3a23fdeee70ae7d30ce29e1 parent c499ec24c31edf270e777a868ffd0daddcfe7ebd author Gregor Maier Sat, 04 Mar 2006 09:58:39 +0100 committer Patrick McHardy Sat, 04 Mar 2006 09:58:39 +0100 net/bridge/netfilter/Makefile | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/bridge/netfilter/Makefile b/net/bridge/netfilter/Makefile index 8bf6d9f..905087e 100644 --- a/net/bridge/netfilter/Makefile +++ b/net/bridge/netfilter/Makefile @@ -29,4 +29,4 @@ obj-$(CONFIG_BRIDGE_EBT_SNAT) += ebt_sna # watchers obj-$(CONFIG_BRIDGE_EBT_LOG) += ebt_log.o -obj-$(CONFIG_BRIDGE_EBT_LOG) += ebt_ulog.o +obj-$(CONFIG_BRIDGE_EBT_ULOG) += ebt_ulog.o From hedeland at nortel.com Mon Mar 6 23:03:51 2006 From: hedeland at nortel.com (Per Hedeland) Date: Mon Mar 6 23:17:33 2006 Subject: [PATCH] oops in ipt_recent.c Message-ID: <200603062203.k26M3pI5024778@tordmule.bluetail.com> Hi, I got an oops due to ipt_recent stepping outside its hash table - this was on kernel 2.4, but it seems the bug is still there in latest 2.6. Patch against 2.6.15.6 below. --Per Hedeland --- a/net/ipv4/netfilter/ipt_recent.c Sun Mar 5 20:07:54 2006 +++ b/net/ipv4/netfilter/ipt_recent.c Mon Mar 6 22:25:53 2006 @@ -588,7 +588,7 @@ #endif /* Check if this is part of a collision chain */ while(hash_table[(orig_hash_result+1) % ip_list_hash_size] != -1) { - orig_hash_result++; + orig_hash_result = (orig_hash_result+1) % ip_list_hash_size; if(hash_func(r_list[hash_table[orig_hash_result]].addr,ip_list_hash_size) == hash_result) { /* Found collision chain, how deep does this rabbit hole go? 
*/ #ifdef DEBUG From azez at ufomechanic.net Tue Mar 7 09:58:52 2006 From: azez at ufomechanic.net (Amin Azez) Date: Tue Mar 7 10:12:10 2006 Subject: [PATCH] oops in ipt_recent.c In-Reply-To: <200603062203.k26M3pI5024778@tordmule.bluetail.com> References: <200603062203.k26M3pI5024778@tordmule.bluetail.com> Message-ID: <440D4B4C.8090102@ufomechanic.net> Per, I'll add this to my rework of ipt_recent which adds time and limit querying of the list as a whole as well as of per-ip entries in the list. I'm just finishing testing this week, and will post a full patch next week. I'm curious how many entries you had in the list and what module parameters you used to trigger this bug. Sam Per Hedeland wrote: > Hi, > > I got an oops due to ipt_recent stepping outside its hash table > - this was on kernel 2.4, but it seems the bug is still there > in latest 2.6. Patch against 2.6.15.6 below. > > --Per Hedeland > > > --- a/net/ipv4/netfilter/ipt_recent.c Sun Mar 5 20:07:54 2006 > +++ b/net/ipv4/netfilter/ipt_recent.c Mon Mar 6 22:25:53 2006 > @@ -588,7 +588,7 @@ > #endif > /* Check if this is part of a collision chain */ > while(hash_table[(orig_hash_result+1) % ip_list_hash_size] != -1) { > - orig_hash_result++; > + orig_hash_result = (orig_hash_result+1) % ip_list_hash_size; > if(hash_func(r_list[hash_table[orig_hash_result]].addr,ip_list_hash_size) == hash_result) { > /* Found collision chain, how deep does this rabbit hole go? */ > #ifdef DEBUG > > From kgy at deverto.com Tue Mar 7 10:19:47 2006 From: kgy at deverto.com (Kovesdi Gyorgy) Date: Tue Mar 7 10:46:39 2006 Subject: output device In-Reply-To: <440845B4.3000208@ufomechanic.net> References: <200603030944.11858.kgy@deverto.com> <440845B4.3000208@ufomechanic.net> Message-ID: <200603071019.47411.kgy@deverto.com> > > I would like to set the output device in a rule (it is needed due to > > overlapping addresses). AFAIK, it cannot be done directly in a rule... > why can't you use ipt_route ? 
If I use such a rule:

iptables -t mangle -A PREROUTING -p udp -d xxxx --dport yyyy -j ROUTE \
    --oif eth0.2000 --continue

then nothing happens. If the "--continue" is missing, the given packets are
lost. What can be the problem here? I use kernel 2.6.15.5, iptables 1.2.11
and the patch for ipt_route.

Another question: AFAIK conntrack does not deal with the target device. To
be as fast as possible, the target device would be handled in conntrack, and
it would be given together with the address in the DNAT rule, because the
device is part of the address in a schizophrenic network.

Regards
Gyorgy Kovesdi

From tgraf at suug.ch Tue Mar 7 13:31:43 2006
From: tgraf at suug.ch (Thomas Graf)
Date: Tue Mar 7 13:44:12 2006
Subject: [PATCH] [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
Message-ID: <20060307123143.GO9559@postel.suug.ch>

The size of the skb carrying the netlink message is not
equivalent to the length of the actual netlink message
due to padding. ip_queue matches the length of the payload
against the original packet size to determine if packet
mangling is desired. Due to the above wrong assumption,
arbitrary packets may not be mangled depending on their
original size.
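The effect of the padding can be illustrated from userspace with the standard netlink length macros. This is a sketch, not the kernel code path: it assumes the buffer carrying the message is padded up to the netlink alignment (modelled here with NLMSG_SPACE), and the two helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Payload length as it would be computed from the padded buffer size,
 * i.e. the assumption that buffer length equals message length. */
static size_t payload_from_buffer(size_t payload_len)
{
    size_t buf_len = NLMSG_SPACE(payload_len);    /* header + payload, padded */
    return buf_len - NLMSG_LENGTH(0);
}

/* Payload length as computed from nlmsg_len, which is not padded. */
static size_t payload_from_nlmsg_len(size_t payload_len)
{
    size_t nlmsg_len = NLMSG_LENGTH(payload_len); /* header + payload */
    return nlmsg_len - NLMSG_LENGTH(0);
}
```

For a payload whose size is already a multiple of the alignment the two agree, but for a 5-byte payload the buffer-based value is 8 while the nlmsg_len-based value is 5, so a comparison against the original packet size can only succeed with the latter.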
Signed-off-by: Thomas Graf Index: net-2.6/net/ipv4/netfilter/ip_queue.c =================================================================== --- net-2.6.orig/net/ipv4/netfilter/ip_queue.c +++ net-2.6/net/ipv4/netfilter/ip_queue.c @@ -524,7 +524,7 @@ ipq_rcv_skb(struct sk_buff *skb) write_unlock_bh(&queue_lock); status = ipq_receive_peer(NLMSG_DATA(nlh), type, - skblen - NLMSG_LENGTH(0)); + nlmsglen - NLMSG_LENGTH(0)); if (status < 0) RCV_SKB_FAIL(status); Index: net-2.6/net/ipv6/netfilter/ip6_queue.c =================================================================== --- net-2.6.orig/net/ipv6/netfilter/ip6_queue.c +++ net-2.6/net/ipv6/netfilter/ip6_queue.c @@ -522,7 +522,7 @@ ipq_rcv_skb(struct sk_buff *skb) write_unlock_bh(&queue_lock); status = ipq_receive_peer(NLMSG_DATA(nlh), type, - skblen - NLMSG_LENGTH(0)); + nlmsglen - NLMSG_LENGTH(0)); if (status < 0) RCV_SKB_FAIL(status); From kaber at trash.net Tue Mar 7 13:58:53 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Mar 7 14:11:52 2006 Subject: [PATCH] [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption In-Reply-To: <20060307123143.GO9559@postel.suug.ch> References: <20060307123143.GO9559@postel.suug.ch> Message-ID: <440D838D.8000302@trash.net> Thomas Graf wrote: > The size of the skb carrying the netlink message is not > equivalent to the length of the actual netlink message > due to padding. ip_queue matches the length of the payload > against the original packet size to determine if packet > mangling is desired, due to the above wrong assumption > arbitary packets may not be mangled depening on their > original size. Looks good, thanks Thomas. I think this should also go in 2.4. From omarquina at neotechgw.com Tue Mar 7 14:48:53 2006 From: omarquina at neotechgw.com (orlandox) Date: Tue Mar 7 14:59:39 2006 Subject: TWO PROTOCOL in the same RULE? 
Message-ID: <440D8F45.7050909@neotechgw.com>

Hi, I'm a newbie and I have a question: is there any patch that lets me
refer to different protocols at the same time with one rule, e.g. writing
only one rule that works with both -p tcp and -p udp, with the same target
and criteria?

If not, does anybody know whether any development is in progress, or why
developing that feature isn't necessary?

From azez at ufomechanic.net Tue Mar 7 16:46:52 2006
From: azez at ufomechanic.net (Amin Azez)
Date: Tue Mar 7 17:00:00 2006
Subject: BUG: More ipt_recent queries
In-Reply-To: <20060306030411.GZ4474@ns.snowman.net>
References: <43FAF692.8030804@ufomechanic.net> <44096842.3010507@trash.net> <20060306030411.GZ4474@ns.snowman.net>
Message-ID: <440DAAEC.7060208@ufomechanic.net>

As for the /proc output change, that was unintentional, and for
debugging purposes. I will remove that change, thanks for spotting it.

I'm just finishing testing on new options for ipt_recent, I hope to
submit a full patch next week including Per's fix.

New options, from the man page:

[!] --listcount-lt count
    Requires as a precondition that the number of IP entries in the
    list (subject to the optional --listtime-* specifier) is less
    than count (or not !). No other options are considered if this
    is not true.

[!] --listcount-le count
    Requires as a precondition that the number of IP entries in the
    list (subject to the optional --listtime-* specifier) is less
    than or equal to count (or not !). No other options are
    considered if this is not true.

[!] --listcount-eq count
    Requires as a precondition that the number of IP entries in the
    list (subject to the optional --listtime-* specifier) is equal
    to (or not !) count. No other options are considered if this is
    not true.

[!] --listcount-ge count
    Requires as a precondition that the number of IP entries in the
    list (subject to the optional --listtime-* specifier) is greater
    than or equal to count (or not !). No other options are
    considered if this is not true.

[!]
--listcount-gt count
    Requires as a precondition that the number of IP entries in the
    list (subject to the optional --listtime-* specifier) is greater
    than count (or not !). No other options are considered if this
    is not true.

Only one --listcount-* option can be specified.

[!] --listtime-lt seconds
    Affects the --listcount-* so that instead of counting the number
    of items in the list, it counts the number of items that were
    last updated less than seconds seconds ago.

[!] --listtime-le seconds
    Affects the --listcount-* so that instead of counting the number
    of items in the list, it counts the number of items that were
    last updated less than or equal to seconds seconds ago.

[!] --listtime-ge seconds
    Affects the --listcount-* so that instead of counting the number
    of items in the list, it counts the number of items that were
    last updated more than or equal to seconds seconds ago.

[!] --listtime-gt seconds
    Affects the --listcount-* so that instead of counting the number
    of items in the list, it counts the number of items that were
    last updated more than seconds seconds ago.

Only one --listtime-* option can be specified. --listtime-* options
act as select clauses for what to count. The ! negation for
--listtime-* options merely inverts the comparison, so
! --listtime-le is the same as --listtime-gt

...

The next example accepts the packet if fewer than 5 IP addresses in the
list have been updated in the last 60 seconds

# iptables -A FORWARD -m recent --listcount-lt 5 --listtime-lt 60 --set -j ACCEPT

Sam

Stephen Frost wrote:

>* Patrick McHardy (kaber@trash.net) wrote:
>
>>Stephen, can you have a look at this please?
>>
>
>Alright. I'll try to look over the current ipt_recent state and the
>comments I've seen regarding it. I don't expect to have time to
>rewrite it anytime soon though.
>
> Stephen
>
>>Amin Azez wrote:
>>
>>>I'm concerned about ipt_recent where it removes entries from the list.
>>> >>>Surly the move-up-and-close-the-gap while loop will never enter because >>>time_info[time_loc].time has just been set to 0 so that this clause of >>>the while loop: >>> >>> time_info[(time_loc+1) % ip_list_tot].time < time_info[time_loc].time) >>> >>>will always be false. >>> >>>Fuller code segment: >>> >>>location = hash_table[hash_result]; >>>hash_table[r_list[location].hash_entry] = -1; >>>time_loc = r_list[location].time_pos; >>>time_info[time_loc].time = 0; >>>time_info[time_loc].position = location; >>> >>>while((time_info[(time_loc+1) % ip_list_tot].time < >>>time_info[time_loc].time) && ((time_loc+1) % ip_list_tot) != >>>curr_table->time_pos) { >>> time_temp = time_info[time_loc].time; >>> time_info[time_loc].time = time_info[(time_loc+1)%ip_list_tot].time; >>> time_info[(time_loc+1)%ip_list_tot].time = time_temp; >>> time_temp = time_info[time_loc].position; >>> time_info[time_loc].position = >>> time_info[(time_loc+1)%ip_list_tot].position; >>> time_info[(time_loc+1)%ip_list_tot].position = time_temp; >>> r_list[time_info[time_loc].position].time_pos = time_loc; >>> r_list[time_info[(time_loc+1)%ip_list_tot].position].time_pos = >>> (time_loc+1)%ip_list_tot; >>> time_loc = (time_loc+1) % ip_list_tot; >>>} >>> >>> >>>I think we should set time_info[time_loc].time = 0; at the end of the >>>while loop? >>> >>>Sam >>> >>> >>> >>> From azez at ufomechanic.net Tue Mar 7 16:48:59 2006 From: azez at ufomechanic.net (Amin Azez) Date: Tue Mar 7 17:01:55 2006 Subject: [patch] ipt_recent In-Reply-To: <44096532.2070000@trash.net> References: <43F9EA77.4060208@ufomechanic.net> <44096532.2070000@trash.net> Message-ID: <440DAB6B.4020208@ufomechanic.net> Patrick McHardy wrote: > Amin Azez wrote: > >>This patch fixes the previously mentioned bug in ipt_recent and adds: >> >>--lt n # check less than n items in list >>--gt n # checks more than n items in list >>--eq n # check exactly n items in list >> >>Which can be prefixed with ! to invert. 
>>
>>--- include/linux/netfilter_ipv4/ipt_recent.h.nolimit 2006-02-20 10:12:06.000000000 +0000
>>+++ include/linux/netfilter_ipv4/ipt_recent.h 2006-02-20 11:30:58.000000000 +0000
>>@@ -10,6 +10,11 @@
>> #define IPT_RECENT_REMOVE 8
>> #define IPT_RECENT_TTL 16
>>
>>+#define IPT_RECENT_INVERT 1
>>+#define IPT_RECENT_LT 2
>>+#define IPT_RECENT_GT 4
>>+#define IPT_RECENT_EQ (IPT_RECENT_LT | IPT_RECENT_GT)
>>+
>> #define IPT_RECENT_SOURCE 0
>> #define IPT_RECENT_DEST 1
>>
>>@@ -20,6 +25,8 @@
>> u_int32_t hit_count;
>> u_int8_t check_set;
>> u_int8_t invert;
>>+ u_int8_t check_count;
>>+ u_int32_t entry_count;
>> char name[IPT_RECENT_NAME_LEN];
>> u_int8_t side;
>> };
>
>
> Sorry, we can't do that since it breaks userspace compatibility. But I'm
> really glad someone finally has the stomach to touch ipt_recent, I'll
> review your other patches now.

I've reworked that functionality significantly in a new patch to send next week. I will see if I can find a way to make use of existing structures to add the functionality.

I heard tell that ipt_recent needed a maintainer?

Sam

From gandalf at wlug.westbo.se Tue Mar 7 19:33:58 2006
From: gandalf at wlug.westbo.se (Martin Josefsson)
Date: Tue Mar 7 19:46:52 2006
Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2)
In-Reply-To:
References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> <1141580938.3881.129.camel@localhost.localdomain>
Message-ID: <1141756438.3881.158.camel@localhost.localdomain>

On Mon, 2006-03-06 at 14:15 +0100, Jozsef Kadlecsik wrote:
> Hi Martin

Hi Jozsef

> > HASHSHIFT and HASHNUM controls the number of struct hashentry per child.
> > The original idea was to allocate 1 page (4kB) for each child but I got > > better results with 2kB, which means it isn't as wide as it was from the > > start. > > Did you get faster lookups/insertions/deletions with half of the page > size? Yes I did, but I don't trust my testresults 100% (run on my laptop) because of the fact that my laptop gets better results if I put the padding at the start of struct hashentry than if I put the padding at the end. > As far as I see, the structure should satisfy "contradicting" > requirements. We want an as low tree as possible, thus minimising the cost > of the operations. That includes the number of levels of the childs and > the number of slots in a bucket and the latter depends on the HASHALIGN, > hashbits_t parameters and the members of the hashentry structure. So in > theory we should maximize HASHSHIFT and minimize NUMENTRY. However we do > not want a preallocated, huge, sparse hash either, and therefore it is > hard to tell the best values. Maximizing HASHSHIFT makes us allocate a lot for each node. And minimizing HASHNUM makes struct hashentry smaller, thus giving a _lot_ of buckets in each node. Will having say 200 buckets with 3 entries in each bucket be better than 50 buckets with 12 entries in each bucket? Same number of entries in the node but one child-pointer per bucket, which might lead to even more child-nodes in the case where we have 200 buckets that fill up rather quick since they only hold a small number of entries. But on the other hand we hash to 200 buckets instead of 50 buckets so the total fill-rate of the node is the same for both cases. And having lots of entries in a bucket doesn't really cause a performance degradation as long as you keep the bucket not larger than the cacheline size and it has to be cacheline aligned. Having lots of buckets will give an even flatter tree. I think I've tested this as well about a year ago but I can't remember the results. > This structure is quite tricky :-). 

Yes it is :) Imagine how I felt when I got it described to me in conversations on irc. There's been a lot of experimentation and failures along the path and there's still a lot of things to test and rewrite.

> > Small hashbits_t doesn't have anything to do with it, hashbits_t is just
> > for accelerating lookups. If we have a low number of struct hashentry
> > per child-node we'll probably fill them up faster since there's fewer of
> > them to be filled. And if we have a low NUMENTRY they will also be
> > filled faster.
>
> But how can we explain the so few entries in the buckets? Your numbers
> suggest that most of the buckets are pretty empty, as if there were two
> slots full from the 64 in the DoS case.

Yes, it's weird. I haven't had time to write the code that traverses the tree and produces stats from it yet, who knows, tonight may be the night when that happens :)

> Or we can say that the DoS has got an effect: the jenkins hash were
> unable to produce good enough keys. Or using the 32 bits from the 96
> internal ones is not sufficient.

That might be the case.

> But in the non-DoS random pattern case, there was 819200 entries / 27345
> child-nodes =~ 29 entries/child-nodes, that's still around 10%
> utilization. So it looks it's not the jenkins hash which produces the
> sparse tree.

I'll see what I can dig out from the entries in the nodes, it might be very unbalanced.

> > The problem is how do you know which entry along a path is the oldest?
> > Entries aren't always added to the front like with the current linked
> > lists in the hashtable.
>
> We could check the timer of the entry: the oldest one is that which would
> time out nearest in the future.

Ouch, this is exactly what we don't want to do. If we are under attack we don't want to start looking at entries to determine which entries to evict, that would result in lots of cache misses, which slows things down a lot.
The current eviction algorithm tries to evict entries along the path that the new entry will use, and respects the level of the entries so you don't end up evicting entries from the root-node all the time; those free slots will get filled up with new entries rather quickly and we don't want to evict things we probably just added. But that doesn't say the current eviction algorithm does the right thing, it was just something Rusty and I came up with during the workshop as an approximation. Fast batch-killing of entries, but try to not evict from the often walked parts of the path too often. It is best-effort, no need to bend over backwards just to do the right thing in all situations, it's not worth it in the end.

> > The solution would be "ordered independent locking" which locks and adds
> > entries based on bucket number that the tuples hash into.
> > That way we still keep the current behaviour while simplifying the
> > locking, no chance of deadlocks.
>
> That looks perfect!

It came to me after writing the description of the "unordered independent locking", it hit me like a punch in the face, it was so simple.

--
/Martin

From davem at davemloft.net Wed Mar 8 00:01:37 2006
From: davem at davemloft.net (David S.
Miller)
Date: Wed Mar 8 00:15:48 2006
Subject: [PATCH] [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
In-Reply-To: <440D838D.8000302@trash.net>
References: <20060307123143.GO9559@postel.suug.ch> <440D838D.8000302@trash.net>
Message-ID: <20060307.150137.63762785.davem@davemloft.net>

From: Patrick McHardy
Date: Tue, 07 Mar 2006 13:58:53 +0100

> Thomas Graf wrote:
> > The size of the skb carrying the netlink message is not
> > equivalent to the length of the actual netlink message
> > due to padding. ip_queue matches the length of the payload
> > against the original packet size to determine if packet
> > mangling is desired, due to the above wrong assumption
> > arbitrary packets may not be mangled depending on their
> > original size.
>
> Looks good, thanks Thomas. I think this should also go in 2.4.

Pushed to 2.6.16, 2.6.x stable, and 2.4.x. Phew!

Thanks Thomas.

From bof at bof.de Wed Mar 8 07:34:17 2006
From: bof at bof.de (Patrick Schaaf)
Date: Wed Mar 8 07:47:11 2006
Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2)
In-Reply-To: <1141756438.3881.158.camel@localhost.localdomain>
References: <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> <1141580938.3881.129.camel@localhost.localdomain> <1141756438.3881.158.camel@localhost.localdomain>
Message-ID: <20060308063416.GB22702@oknodo.bof.de>

> > This structure is quite tricky :-).
>
> Yes it is :) Image how I felt when I got it described to me in
> conversations on irc.

Hmm. Evoke this feeling again. :)

Now I'll describe an easy way (haha) to get rid of half of the tuples, for the case of no-NAT. I did that before, but if you are now revisiting conntracking basics, maybe it's time to retell:

Notice that for the non-NAT case, the two tuples of a connection are fully symmetric in src(ip|port) vs. dst(ip|port). For ease of exposition, imagine each (ip|port) to be a 48-bit number.
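
The normalisation this message builds on that observation (called symsort in the next paragraph) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: endpoint_t, make_endpoint() and struct sorted_tuple are hypothetical names that do not come from the conntrack code, and a real tuple carries more fields than an address and a port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed endpoint: IPv4 address in the upper 32 bits,
 * port in the lower 16 bits, i.e. the "48-bit number" of the text. */
typedef uint64_t endpoint_t;

/* Hypothetical result type: the ordered pair plus the flip bit. */
struct sorted_tuple {
    endpoint_t a;      /* smaller endpoint */
    endpoint_t b;      /* larger endpoint */
    unsigned int flip; /* 0: (src, dst) was already ordered, 1: swapped */
};

static inline endpoint_t make_endpoint(uint32_t ip, uint16_t port)
{
    return ((endpoint_t)ip << 16) | port;
}

/* Order the two endpoints so that both directions of one connection
 * produce the same (a, b) pair, differing only in the flip bit. */
static inline struct sorted_tuple symsort(endpoint_t src, endpoint_t dst)
{
    struct sorted_tuple t;

    if (src <= dst) {
        t.a = src;
        t.b = dst;
        t.flip = 0;
    } else {
        t.a = dst;
        t.b = src;
        t.flip = 1;
    }
    assert(t.a <= t.b); /* postcondition: the key is ordered */
    return t;
}
```

A lookup would hash (a, b) only and use flip afterwards to decide which direction the packet belongs to.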

Given two numbers A and B, a function can easily be implemented:

symsort: N x N -> N x N x Z2
    given (a, b), if a <= b, result is (a, b, 0)
    given (a, b), if a > b, result is (b, a, 1)

Given the symmetric tuples from both halves of one connection, this function returns the same A/B results in both cases, with a different flip bit (third result).

So, when doing the hash lookup, run symsort first, and look for the resulting "sorted" tuple. The flip bit output of symsort needs to be taken into account when classifying original / other destination.

Details... One would have a single tuple, instead of two, in struct ip_conntrack. For NATted connections, an additional tuple would be allocated on demand, referenced from ip_conntrack by a pointer, and also put into the hashes.

When a non-NAT conntrack gets deleted, only one hash entry needs to be removed, avoiding the need for complex lock ordering schemes in that place. You can always tell you have this case, by looking whether the NAT-tuple-pointer is != 0.

Downsides:
- complexity
- second tuple in separate memory, in the NAT case. Maybe more cache misses in some places

Upsides in the non-NAT case:
- half the memory for tuples
- half the number of hash insertions and deletions with their assorted cache influences

best regards
Patrick

From dim at openvz.org Tue Mar 7 15:07:18 2006
From: dim at openvz.org (Dmitry Mishin)
Date: Wed Mar 8 11:58:50 2006
Subject: {get|set}sockopt compat layer
In-Reply-To: <200602211256.00846.arnd@arndb.de>
References: <200602201110.39092.dim@openvz.org> <200602211204.50118.dim@sw.ru> <200602211256.00846.arnd@arndb.de>
Message-ID: <200603071707.19138.dim@openvz.org>

Hello, Arnd!

Sorry for the delay, I was on vacation. Here is a patch, introducing compat_(get|set)sockopt handlers, as you proposed.
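
The dispatch rule that the patch further down applies in compat_sys_setsockopt() (use the protocol's compat handler when it is set, otherwise fall back to the native one) can be sketched in isolation like this. It is a minimal user-space illustration, not kernel code: struct sketch_ops, native_set(), compat_set() and dispatch_compat_setsockopt() are hypothetical names, and the handlers take a single int instead of the real argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct proto_ops, reduced to the two hooks
 * that matter for the dispatch decision. */
struct sketch_ops {
    int (*setsockopt)(int optname);        /* always provided */
    int (*compat_setsockopt)(int optname); /* optional, may be NULL */
};

/* Dummy handlers that only make the chosen path observable. */
static inline int native_set(int optname) { return optname; }
static inline int compat_set(int optname) { return optname + 1000; }

/* Prefer the compat hook when the protocol provides one, otherwise
 * fall back to the native handler. */
static inline int dispatch_compat_setsockopt(const struct sketch_ops *ops,
                                             int optname)
{
    assert(ops != NULL && ops->setsockopt != NULL);
    if (ops->compat_setsockopt)
        return ops->compat_setsockopt(optname);
    return ops->setsockopt(optname);
}
```

Protocols whose option layout is the same for 32-bit and 64-bit callers simply leave the compat hook unset and keep their existing handler.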

On Tuesday 21 February 2006 14:56, Arnd Bergmann wrote:
> On Tuesday 21 February 2006 10:04, Dmitry Mishin wrote:
> > On Monday 20 February 2006 18:55, Arnd Bergmann wrote:
> > > Is CONFIG_COMPAT the right conditional here? If the code is only used
> > > for architectures that have different alignments, it should not need be
> > > compiled in for the other architectures.
> >
> > So, I'll define ARCH_HAS_FUNNY_64_ALIGNMENT in x86_64 and ia64 code and
> > will check it, as Andi suggested.
>
> I think nowadays, unconditionally setting CONFIG_FUNNY_64_ALIGNMENT from
> arch/{ia64,x86_64}/Kconfig would be the preferred way to a #define in
> include/asm.
>
> > > >  #define IPT_ALIGN(s) XT_ALIGN(s)
> > > > +
> > > > +#ifdef CONFIG_COMPAT
> > > > +#include
> > > > +
> > > > +struct compat_ipt_getinfo
> > > > +{
> > > > +       char name[IPT_TABLE_MAXNAMELEN];
> > > > +       compat_uint_t valid_hooks;
> > > > +       compat_uint_t hook_entry[NF_IP_NUMHOOKS];
> > > > +       compat_uint_t underflow[NF_IP_NUMHOOKS];
> > > > +       compat_uint_t num_entries;
> > > > +       compat_uint_t size;
> > > > +};
> > >
> > > This structure looks like it does not need any
> > > conversions. You should probably just use
> > > struct ipt_getinfo then.
> >
> > I just saw compat_uint_t use in net/compat.c and thought, that it is a
> > good style to use it. Does anybody know arch, where sizeof(compat_uint_t)
> > != 4?
>
> No, the compat layer already heavily depends on the fact that compat_uint_t
> is always the same as unsigned int.
>
> > > Ditto
> >
> > Disagree, ipt_entry_match and ipt_entry_target contain pointers which
> > make their alignment equal 8 byte on 64bits architectures.
>
> Ah, I see.
>
> > > I would much rather have either an extra 'compat' argument to
> > > sock_setsockopt and proto_ops->setsockopt than to spread the use
> > > of is_compat_task further.
> >
> > Another weak place in my code.
is_compat_task() approach has one > > advantage - it doesn't require a lot of current code modifications. > > > > > Is the FIXME above the only reason that the code needs to be changed? > > > What is the reason that you did not just address this in the > > > compat_sys_setsockopt implementation? > > > > Code above doesn't work. iptables with version >= 1.3 does alignment > > checks as well as kernel does. So, we can't simply put entries with 8 > > bytes alignment to userspace or with 4 bytes alignment to kernel - we > > need translate them entry by entry. So, I tried to do this the most > > correct way - that userspace will hide its alignment from kernel and vice > > versa, with not only SET_REPLACE, but also GET_INFO, GET_ENTRIES and > > SET_COUNTERS translation. First implementation was exactly in > > compat_sys_setsockopt, but David asked me to do this in netfilter code > > itself. > > Ok, I see the point there. It's probably best to push down all the > conversions from compat_sys_setsockopt down to the protocol specific parts, > similar to what we do for the ioctl handlers. 
>
> I'm thinking of something like
>
> int compat_sock_setsockopt(struct socket *sock, int level, int optname,
> 				char __user *optval, int optlen)
> {
> 	switch (optname) {
> 	case SO_ATTACH_FILTER:
> 		return do_set_attach_filter(fd, level, optname,
> 					optval, optlen);
> 	case SO_SNDTIMEO:
> 		return do_set_sock_timeout(fd, level, optname,
> 					optval, optlen);
> 	default:
> 		break;
> 	}
> 	return sock_setsockopt(sock, level, optname, optval, optlen);
> }
>
> asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
> 				char __user *optval, int optlen)
> {
> 	int err;
> 	struct socket *sock;
>
> 	if (optlen < 0)
> 		return -EINVAL;
>
> 	if ((sock = sockfd_lookup(fd, &err))!=NULL)
> 	{
> 		err = security_socket_setsockopt(sock,level,optname);
> 		if (err) {
> 			sockfd_put(sock);
> 			return err;
> 		}
>
> 		if (level == SOL_SOCKET)
> 			err = compat_sock_setsockopt(sock, level,
> 					optname, optval, optlen);
> 		else if (sock->ops->compat_setsockopt)
> 			err = sock->ops->compat_setsockopt(sock, level,
> 					optname, optval, optlen);
> 		else
> 			err = sock->ops->setsockopt(sock, level,
> 					optname, optval, optlen);
> 		sockfd_put(sock);
> 	}
> 	return err;
> }
>
> int tcp_setsockopt(struct sock *sk, int level, int optname, char __user
> *optval, int optlen) {
> 	int err = 0;
>
> 	err = ip_setsockopt(sk, level, optname, optval, optlen);
>
> #ifdef CONFIG_NETFILTER
> 	if (err == -ENOPROTOOPT) {
> 		lock_sock(sk);
> 		err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
> 		release_sock(sk);
> 	}
> #endif
> 	return err;
> }
>
> int compat_tcp_setsockopt(struct sock *sk, int level, int optname, char
> __user *optval, int optlen) {
> 	int err = 0;
>
> 	err = ip_setsockopt(sk, level, optname, optval, optlen);
>
> #ifdef CONFIG_NETFILTER
> 	if (err == -ENOPROTOOPT) {
> 		lock_sock(sk);
> 		err = compat_nf_setsockopt(sk, PF_INET, optname, optval, optlen);
> 		release_sock(sk);
> 	}
> #endif
> 	return err;
> }
>
> And the same for udp, raw, ipv6, decnet and each of those with getsockopt.
> It is a bigger change, but it puts all the handlers where they belong > and it is more extensible to other sockopt handlers if we find more > fsckup in some of them. > > Arnd <>< -- Thanks, Dmitry. -------------- next part -------------- --- ./include/linux/net.h.compat 2006-03-07 11:22:27.000000000 +0300 +++ ./include/linux/net.h 2006-03-07 11:20:07.000000000 +0300 @@ -149,6 +149,12 @@ struct proto_ops { int optname, char __user *optval, int optlen); int (*getsockopt)(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); +#ifdef CONFIG_COMPAT + int (*compat_setsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int optlen); + int (*compat_getsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen); +#endif int (*sendmsg) (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t total_len); int (*recvmsg) (struct kiocb *iocb, struct socket *sock, --- ./include/linux/netfilter.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./include/linux/netfilter.h 2006-03-07 15:00:14.000000000 +0300 @@ -2,6 +2,7 @@ #define __LINUX_NETFILTER_H #ifdef __KERNEL__ +#include #include #include #include @@ -80,10 +81,18 @@ struct nf_sockopt_ops int set_optmin; int set_optmax; int (*set)(struct sock *sk, int optval, void __user *user, unsigned int len); +#ifdef CONFIG_COMPAT + int (*compat_set)(struct sock *sk, int optval, + void __user *user, unsigned int len); +#endif int get_optmin; int get_optmax; int (*get)(struct sock *sk, int optval, void __user *user, int *len); +#ifdef CONFIG_COMPAT + int (*compat_get)(struct sock *sk, int optval, + void __user *user, int *len); +#endif /* Number of users inside set() or get(). 
*/ unsigned int use; @@ -246,6 +255,13 @@ int nf_setsockopt(struct sock *sk, int p int nf_getsockopt(struct sock *sk, int pf, int optval, char __user *opt, int *len); +#ifdef CONFIG_COMPAT +int compat_nf_setsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int len); +int compat_nf_getsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int *len); +#endif + /* Packet queuing */ struct nf_queue_handler { int (*outfn)(struct sk_buff *skb, struct nf_info *info, --- ./include/net/inet_connection_sock.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./include/net/inet_connection_sock.h 2006-03-07 15:46:20.000000000 +0300 @@ -15,6 +15,7 @@ #ifndef _INET_CONNECTION_SOCK_H #define _INET_CONNECTION_SOCK_H +#include #include #include #include @@ -50,6 +51,14 @@ struct inet_connection_sock_af_ops { char __user *optval, int optlen); int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); +#ifdef CONFIG_COMPAT + int (*compat_setsockopt)(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); + int (*compat_getsockopt)(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +#endif void (*addr2sockaddr)(struct sock *sk, struct sockaddr *); int sockaddr_len; }; --- ./include/net/ip.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./include/net/ip.h 2006-03-07 14:38:54.000000000 +0300 @@ -356,6 +356,12 @@ extern void ip_cmsg_recv(struct msghdr * extern int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc); extern int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); extern int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); +#ifdef CONFIG_COMPAT +extern int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen); +extern int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user 
*optlen); +#endif extern int ip_ra_control(struct sock *sk, unsigned char on, void (*destructor)(struct sock *)); extern int ip_recv_error(struct sock *sk, struct msghdr *msg, int len); --- ./include/net/sctp/structs.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./include/net/sctp/structs.h 2006-03-07 15:47:50.000000000 +0300 @@ -54,6 +54,7 @@ #ifndef __sctp_structs_h__ #define __sctp_structs_h__ +#include #include /* We get struct timespec. */ #include /* linux/in.h needs this!! */ #include /* We get struct sockaddr_in. */ @@ -514,6 +515,18 @@ struct sctp_af { int optname, char __user *optval, int __user *optlen); +#ifdef CONFIG_COMPAT + int (*compat_setsockopt) (struct sock *sk, + int level, + int optname, + char __user *optval, + int optlen); + int (*compat_getsockopt) (struct sock *sk, + int level, + int optname, + char __user *optval, + int __user *optlen); +#endif struct dst_entry *(*get_dst) (struct sctp_association *asoc, union sctp_addr *daddr, union sctp_addr *saddr); --- ./include/net/sock.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./include/net/sock.h 2006-03-07 15:56:13.000000000 +0300 @@ -520,6 +520,16 @@ struct proto { int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *option); +#ifdef CONFIG_COMPAT + int (*compat_setsockopt)(struct sock *sk, + int level, + int optname, char __user *optval, + int optlen); + int (*compat_getsockopt)(struct sock *sk, + int level, + int optname, char __user *optval, + int __user *option); +#endif int (*sendmsg)(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len); int (*recvmsg)(struct kiocb *iocb, struct sock *sk, @@ -816,6 +826,12 @@ extern int sock_common_recvmsg(struct ki struct msghdr *msg, size_t size, int flags); extern int sock_common_setsockopt(struct socket *sock, int level, int optname, char __user *optval, int optlen); +#ifdef CONFIG_COMPAT +extern int compat_sock_common_getsockopt(struct socket *sock, int level, + int optname, char __user 
*optval, int __user *optlen); +extern int compat_sock_common_setsockopt(struct socket *sock, int level, + int optname, char __user *optval, int optlen); +#endif extern void sk_common_release(struct sock *sk); --- ./include/net/tcp.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./include/net/tcp.h 2006-03-07 15:43:43.000000000 +0300 @@ -347,6 +347,14 @@ extern int tcp_getsockopt(struct sock extern int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); +#ifdef CONFIG_COMPAT +extern int compat_tcp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_tcp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); +#endif extern void tcp_set_keepalive(struct sock *sk, int val); extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, --- ./net/compat.c.compat 2006-03-07 11:21:00.000000000 +0300 +++ ./net/compat.c 2006-03-07 15:04:49.000000000 +0300 @@ -416,7 +416,7 @@ struct compat_sock_fprog { compat_uptr_t filter; /* struct sock_filter * */ }; -static int do_set_attach_filter(int fd, int level, int optname, +static int do_set_attach_filter(struct socket *sock, int level, int optname, char __user *optval, int optlen) { struct compat_sock_fprog __user *fprog32 = (struct compat_sock_fprog __user *)optval; @@ -432,11 +432,12 @@ static int do_set_attach_filter(int fd, __put_user(compat_ptr(ptr), &kfprog->filter)) return -EFAULT; - return sys_setsockopt(fd, level, optname, (char __user *)kfprog, + return sock_setsockopt(sock, level, optname, (char __user *)kfprog, sizeof(struct sock_fprog)); } -static int do_set_sock_timeout(int fd, int level, int optname, char __user *optval, int optlen) +static int do_set_sock_timeout(struct socket *sock, int level, + int optname, char __user *optval, int optlen) { struct compat_timeval __user *up = (struct compat_timeval __user *) optval; struct timeval ktime; @@ -451,30 +452,61 @@ 
static int do_set_sock_timeout(int fd, i return -EFAULT; old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_setsockopt(fd, level, optname, (char *) &ktime, sizeof(ktime)); + err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime)); set_fs(old_fs); return err; } +static int compat_sock_setsockopt(struct socket *sock, int level, int optname, + char __user *optval, int optlen) +{ + if (optname == SO_ATTACH_FILTER) + return do_set_attach_filter(sock, level, optname, + optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_set_sock_timeout(sock, level, optname, optval, optlen); + + return sock_setsockopt(sock, level, optname, optval, optlen); +} + asmlinkage long compat_sys_setsockopt(int fd, int level, int optname, char __user *optval, int optlen) { + int err; + struct socket *sock; + /* SO_SET_REPLACE seems to be the same in all levels */ if (optname == IPT_SO_SET_REPLACE) return do_netfilter_replace(fd, level, optname, optval, optlen); - if (level == SOL_SOCKET && optname == SO_ATTACH_FILTER) - return do_set_attach_filter(fd, level, optname, - optval, optlen); - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_set_sock_timeout(fd, level, optname, optval, optlen); - return sys_setsockopt(fd, level, optname, optval, optlen); + if (optlen < 0) + return -EINVAL; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_setsockopt(sock,level,optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_setsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_setsockopt) + err = sock->ops->compat_setsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->setsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; } -static int do_get_sock_timeout(int fd, int level, int optname, +static int do_get_sock_timeout(struct socket *sock, int level, int 
optname, char __user *optval, int __user *optlen) { struct compat_timeval __user *up; @@ -490,7 +522,7 @@ static int do_get_sock_timeout(int fd, i len = sizeof(ktime); old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_getsockopt(fd, level, optname, (char *) &ktime, &len); + err = sock_getsockopt(sock, level, optname, (char *) &ktime, &len); set_fs(old_fs); if (!err) { @@ -503,15 +535,42 @@ static int do_get_sock_timeout(int fd, i return err; } -asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, +static int compat_sock_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_get_sock_timeout(fd, level, optname, optval, optlen); - return sys_getsockopt(fd, level, optname, optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_get_sock_timeout(sock, level, optname, optval, optlen); + return sock_getsockopt(sock, level, optname, optval, optlen); } +asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + struct socket *sock; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_getsockopt(sock, level, + optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_getsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_getsockopt) + err = sock->ops->compat_getsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->getsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; +} /* Argument list sizes for compat_sys_socketcall */ #define AL(x) ((x) * sizeof(u32)) static unsigned char nas[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), --- ./net/core/sock.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/core/sock.c 2006-03-07 16:24:57.000000000 +0300 @@ -1385,6 +1385,20 @@ int 
sock_common_getsockopt(struct socket EXPORT_SYMBOL(sock_common_getsockopt); +#ifdef CONFIG_COMPAT +int compat_sock_common_getsockopt(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen) +{ + struct sock *sk = sock->sk; + + if (sk->sk_prot->compat_getsockopt) + return sk->sk_prot->compat_getsockopt(sk, level, + optname, optval, optlen); + return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL(compat_sock_common_getsockopt); +#endif + int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t size, int flags) { @@ -1414,6 +1428,20 @@ int sock_common_setsockopt(struct socket EXPORT_SYMBOL(sock_common_setsockopt); +#ifdef CONFIG_COMPAT +int compat_sock_common_setsockopt(struct socket *sock, + int level, int optname, char __user *optval, int optlen) +{ + struct sock *sk = sock->sk; + + if (sk->sk_prot->compat_setsockopt) + return sk->sk_prot->compat_setsockopt(sk, level, + optname, optval, optlen); + return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL(compat_sock_common_setsockopt); +#endif + void sk_common_release(struct sock *sk) { if (sk->sk_prot->destroy) --- ./net/dccp/dccp.h.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/dccp/dccp.h 2006-03-07 15:24:36.000000000 +0300 @@ -246,6 +246,14 @@ extern int dccp_getsockopt(struct soc char __user *optval, int __user *optlen); extern int dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); +#ifdef CONFIG_COMPAT +extern int compat_dccp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_dccp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); +#endif extern int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size); --- ./net/dccp/ipv4.c.compat 2006-03-06
12:06:34.000000000 +0300 +++ ./net/dccp/ipv4.c 2006-03-07 15:24:42.000000000 +0300 @@ -1028,6 +1028,10 @@ struct inet_connection_sock_af_ops dccp_ .net_header_len = sizeof(struct iphdr), .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), }; @@ -1152,6 +1156,10 @@ struct proto dccp_prot = { .init = dccp_v4_init_sock, .setsockopt = dccp_setsockopt, .getsockopt = dccp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v4_do_rcv, --- ./net/dccp/ipv6.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/dccp/ipv6.c 2006-03-07 15:57:23.000000000 +0300 @@ -1170,6 +1170,10 @@ static struct proto dccp_v6_prot = { .init = dccp_v6_init_sock, .setsockopt = dccp_setsockopt, .getsockopt = dccp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v6_do_rcv, @@ -1207,6 +1211,10 @@ static struct proto_ops inet6_dccp_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/dccp/proto.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/dccp/proto.c 2006-03-07 16:49:11.000000000 +0300 @@ -255,18 +255,13 @@ static int dccp_setsockopt_service(struc return 0; } -int dccp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, int optlen) +static int do_dccp_setsockopt(struct sock 
*sk, int level, int optname, + char __user *optval, int optlen) { struct dccp_sock *dp; int err; int val; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, - optname, optval, - optlen); - if (optlen < sizeof(int)) return -EINVAL; @@ -293,8 +288,34 @@ int dccp_setsockopt(struct sock *sk, int return err; } +int dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} EXPORT_SYMBOL_GPL(dccp_setsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) { + if (inet_csk(sk)->icsk_af_ops->compat_setsockopt) + return inet_csk(sk)->icsk_af_ops->compat_setsockopt(sk, + level, optname, optval, optlen); + else + return inet_csk(sk)->icsk_af_ops->setsockopt(sk, + level, optname, optval, optlen); + } + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL_GPL(compat_dccp_setsockopt); +#endif + static int dccp_getsockopt_service(struct sock *sk, int len, u32 __user *optval, int __user *optlen) @@ -326,16 +347,12 @@ out: return err; } -int dccp_getsockopt(struct sock *sk, int level, int optname, +static int do_dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { struct dccp_sock *dp; int val, len; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, - optname, optval, - optlen); if (get_user(len, optlen)) return -EFAULT; @@ -368,8 +385,34 @@ int dccp_getsockopt(struct sock *sk, int return 0; } +int dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, + optname, optval, + optlen); + return 
do_dccp_getsockopt(sk, level, optname, optval, optlen); +} EXPORT_SYMBOL_GPL(dccp_getsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) { + if (inet_csk(sk)->icsk_af_ops->compat_getsockopt) + return inet_csk(sk)->icsk_af_ops->compat_getsockopt(sk, + level, optname, optval, optlen); + else + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, + level, optname, optval, optlen); + } + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL_GPL(compat_dccp_getsockopt); +#endif + int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len) { @@ -696,6 +739,10 @@ static const struct proto_ops inet_dccp_ .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/ipv4/af_inet.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/ipv4/af_inet.c 2006-03-07 16:25:58.000000000 +0300 @@ -802,6 +802,10 @@ const struct proto_ops inet_stream_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -823,6 +827,10 @@ const struct proto_ops inet_dgram_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -848,6
+856,10 @@ static const struct proto_ops inet_sockr .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/ipv4/ip_sockglue.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/ipv4/ip_sockglue.c 2006-03-07 14:41:47.000000000 +0300 @@ -380,14 +380,12 @@ out: * an IP socket. */ -int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) +static int do_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct inet_sock *inet = inet_sk(sk); int val=0,err; - if (level != SOL_IP) - return -ENOPROTOOPT; - if (((1< (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = nf_setsockopt(sk, PF_INET, optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + int err; + + if (level != SOL_IP) + return -ENOPROTOOPT; + + err = do_ip_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_HDRINCL && + optname != IP_IPSEC_POLICY && optname != IP_XFRM_POLICY +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = compat_nf_setsockopt(sk, PF_INET, + optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} +#endif + /* * Get the options. Note for future reference. The GET of IP options gets the * _received_ ones. The set sets the _sent_ ones. 
*/ -int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) +static int do_ip_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) { struct inet_sock *inet = inet_sk(sk); int val; @@ -1051,17 +1098,8 @@ int ip_getsockopt(struct sock *sk, int l val = inet->freebind; break; default: -#ifdef CONFIG_NETFILTER - val = nf_getsockopt(sk, PF_INET, optname, optval, - &len); - release_sock(sk); - if (val >= 0) - val = put_user(len, optlen); - return val; -#else release_sock(sk); return -ENOPROTOOPT; -#endif } release_sock(sk); @@ -1082,7 +1120,73 @@ int ip_getsockopt(struct sock *sk, int l return 0; } +int ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = nf_getsockopt(sk, PF_INET, optname, optval, + &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); + return err; + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = compat_nf_getsockopt(sk, PF_INET, + optname, optval, &len); + release_sock(sk); + if (err >= 0) + err = 
put_user(len, optlen); + return err; + } +#endif + return err; +} +#endif + EXPORT_SYMBOL(ip_cmsg_recv); EXPORT_SYMBOL(ip_getsockopt); EXPORT_SYMBOL(ip_setsockopt); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_ip_getsockopt); +EXPORT_SYMBOL(compat_ip_setsockopt); +#endif --- ./net/ipv4/raw.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/ipv4/raw.c 2006-03-07 15:34:44.000000000 +0300 @@ -660,12 +660,9 @@ static int raw_geticmpfilter(struct sock out: return ret; } -static int raw_setsockopt(struct sock *sk, int level, int optname, +static int do_raw_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { - if (level != SOL_RAW) - return ip_setsockopt(sk, level, optname, optval, optlen); - if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -675,12 +672,28 @@ static int raw_setsockopt(struct sock *s return -ENOPROTOOPT; } -static int raw_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen) +static int raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { if (level != SOL_RAW) - return ip_getsockopt(sk, level, optname, optval, optlen); + return ip_setsockopt(sk, level, optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#ifdef CONFIG_COMPAT +static int compat_raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_RAW) + return compat_ip_setsockopt(sk, level, + optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#endif + +static int do_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -690,6 +703,25 @@ static int raw_getsockopt(struct sock *s return -ENOPROTOOPT; } +static int raw_getsockopt(struct sock *sk, int level, int optname, + 
char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return ip_getsockopt(sk, level, optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return compat_ip_getsockopt(sk, level, + optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} +#endif + static int raw_ioctl(struct sock *sk, int cmd, unsigned long arg) { switch (cmd) { @@ -728,6 +760,10 @@ struct proto raw_prot = { .init = raw_init, .setsockopt = raw_setsockopt, .getsockopt = raw_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_raw_setsockopt, + .compat_getsockopt = compat_raw_getsockopt, +#endif .sendmsg = raw_sendmsg, .recvmsg = raw_recvmsg, .bind = raw_bind, --- ./net/ipv4/tcp.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/ipv4/tcp.c 2006-03-07 16:36:21.000000000 +0300 @@ -1687,18 +1687,14 @@ int tcp_disconnect(struct sock *sk, int /* * Socket option code for TCP. 
*/ -int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, - int optlen) +static int do_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); int val; int err = 0; - if (level != SOL_TCP) - return icsk->icsk_af_ops->setsockopt(sk, level, optname, - optval, optlen); - /* This is a string value all the others are int's */ if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -1871,6 +1867,35 @@ int tcp_setsockopt(struct sock *sk, int return err; } +int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, + int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->setsockopt(sk, level, optname, + optval, optlen); + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) { + if (icsk->icsk_af_ops->compat_setsockopt) + return icsk->icsk_af_ops->compat_setsockopt(sk, + level, optname, optval, optlen); + else + return icsk->icsk_af_ops->setsockopt(sk, + level, optname, optval, optlen); + } + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} +#endif + /* Return information about state of tcp endpoint in API format. 
*/ void tcp_get_info(struct sock *sk, struct tcp_info *info) { @@ -1931,17 +1956,13 @@ void tcp_get_info(struct sock *sk, struc EXPORT_SYMBOL_GPL(tcp_get_info); -int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, - int __user *optlen) +static int do_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); int val, len; - if (level != SOL_TCP) - return icsk->icsk_af_ops->getsockopt(sk, level, optname, - optval, optlen); - if (get_user(len, optlen)) return -EFAULT; @@ -2025,6 +2046,34 @@ int tcp_getsockopt(struct sock *sk, int return 0; } +int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, + int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->getsockopt(sk, level, optname, + optval, optlen); + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) { + if (icsk->icsk_af_ops->compat_getsockopt) + return icsk->icsk_af_ops->compat_getsockopt(sk, + level, optname, optval, optlen); + else + return icsk->icsk_af_ops->getsockopt(sk, + level, optname, optval, optlen); + } + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} +#endif extern void __skb_cb_too_small_for_tcp(int, int); extern struct tcp_congestion_ops tcp_reno; @@ -2142,3 +2191,7 @@ EXPORT_SYMBOL(tcp_sendpage); EXPORT_SYMBOL(tcp_setsockopt); EXPORT_SYMBOL(tcp_shutdown); EXPORT_SYMBOL(tcp_statistics); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_tcp_setsockopt); +EXPORT_SYMBOL(compat_tcp_getsockopt); +#endif --- ./net/ipv4/tcp_ipv4.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/ipv4/tcp_ipv4.c 2006-03-07 
15:46:22.000000000 +0300 @@ -1225,6 +1225,10 @@ struct inet_connection_sock_af_ops ipv4_ .net_header_len = sizeof(struct iphdr), .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), }; @@ -1807,6 +1811,10 @@ struct proto tcp_prot = { .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_tcp_setsockopt, + .compat_getsockopt = compat_tcp_getsockopt, +#endif .sendmsg = tcp_sendmsg, .recvmsg = tcp_recvmsg, .backlog_rcv = tcp_v4_do_rcv, --- ./net/ipv4/udp.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/ipv4/udp.c 2006-03-07 12:38:44.000000000 +0300 @@ -1207,16 +1207,13 @@ static int udp_destroy_sock(struct sock /* * Socket option code for UDP */ -static int udp_setsockopt(struct sock *sk, int level, int optname, +static int do_udp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct udp_sock *up = udp_sk(sk); int val; int err = 0; - if (level != SOL_UDP) - return ip_setsockopt(sk, level, optname, optval, optlen); - if(optlenpf == pf) { + if (get) { + if (val >= ops->get_optmin + && val < ops->get_optmax) { + ops->use++; + up(&nf_sockopt_mutex); + if (ops->compat_get) + ret = ops->compat_get(sk, + val, opt, len); + else + ret = ops->get(sk, + val, opt, len); + goto out; + } + } else { + if (val >= ops->set_optmin + && val < ops->set_optmax) { + ops->use++; + up(&nf_sockopt_mutex); + if (ops->compat_set) + ret = ops->compat_set(sk, + val, opt, *len); + else + ret = ops->set(sk, + val, opt, *len); + goto out; + } + } + } + } + up(&nf_sockopt_mutex); + return -ENOPROTOOPT; + + out: + down(&nf_sockopt_mutex); + ops->use--; + if (ops->cleanup_task) + wake_up_process(ops->cleanup_task); + up(&nf_sockopt_mutex); + return ret; +} + +int 
compat_nf_setsockopt(struct sock *sk, int pf, + int val, char __user *opt, int len) +{ + return compat_nf_sockopt(sk, pf, val, opt, &len, 0); +} +EXPORT_SYMBOL(compat_nf_setsockopt); + +int compat_nf_getsockopt(struct sock *sk, int pf, + int val, char __user *opt, int *len) +{ + return compat_nf_sockopt(sk, pf, val, opt, len, 1); +} +EXPORT_SYMBOL(compat_nf_getsockopt); +#endif --- ./net/sctp/ipv6.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/sctp/ipv6.c 2006-03-07 16:26:56.000000000 +0300 @@ -875,6 +875,10 @@ static const struct proto_ops inet6_seqp .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/sctp/protocol.c.compat 2006-03-06 12:06:34.000000000 +0300 +++ ./net/sctp/protocol.c 2006-03-07 15:51:18.000000000 +0300 @@ -845,6 +845,10 @@ static const struct proto_ops inet_seqpa .shutdown = inet_shutdown, /* Looks harmless. */ .setsockopt = sock_common_setsockopt, /* IP_SOL IP_OPTION is a problem. 
*/ .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -883,6 +887,10 @@ static struct sctp_af sctp_ipv4_specific .sctp_xmit = sctp_v4_xmit, .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .get_dst = sctp_v4_get_dst, .get_saddr = sctp_v4_get_saddr, .copy_addrlist = sctp_v4_copy_addrlist, From kgy at deverto.com Tue Mar 7 15:40:04 2006 From: kgy at deverto.com (Kovesdi Gyorgy) Date: Wed Mar 8 11:58:51 2006 Subject: output device In-Reply-To: <440845B4.3000208@ufomechanic.net> References: <200603030944.11858.kgy@deverto.com> <440845B4.3000208@ufomechanic.net> Message-ID: <200603071540.04739.kgy@deverto.com> > > I would like to set the output device in a rule (it is needed due to > > overlapping addresses). AFAIK, it cannot be done directly in a rule... > why can't you use ipt_route ? Not only the output device, but the input device handling is also missing from the conntrack. So, if a packet arrives from a device, a conntrack rule is generated. If a packet arrives from another device with the same address and port, it is handled with the conntrack entry of the previous packet. Regards Gyorgy Kovesdi From arnd at arndb.de Tue Mar 7 16:05:38 2006 From: arnd at arndb.de (Arnd Bergmann) Date: Wed Mar 8 11:58:52 2006 Subject: {get|set}sockopt compat layer In-Reply-To: <200603071707.19138.dim@openvz.org> References: <200602201110.39092.dim@openvz.org> <200602211256.00846.arnd@arndb.de> <200603071707.19138.dim@openvz.org> Message-ID: <200603071605.39177.arnd@arndb.de> On Tuesday 07 March 2006 15:07, Dmitry Mishin wrote: > Sorry for such delay, was on vacancy. Here is a patch, introducing > compat_(get|set)sockopt handlers, as you proposed. 
Looks pretty good to me, just a few nits I like to pick: > --- ./include/linux/net.h.compat 2006-03-07 11:22:27.000000000 +0300 > +++ ./include/linux/net.h 2006-03-07 11:20:07.000000000 +0300 > @@ -149,6 +149,12 @@ struct proto_ops { > int optname, char __user *optval, int optlen); > int (*getsockopt)(struct socket *sock, int level, > int optname, char __user *optval, int __user *optlen); > +#ifdef CONFIG_COMPAT > + int (*compat_setsockopt)(struct socket *sock, int level, > + int optname, char __user *optval, int optlen); > + int (*compat_getsockopt)(struct socket *sock, int level, > + int optname, char __user *optval, int __user *optlen); > +#endif > int (*sendmsg) (struct kiocb *iocb, struct socket *sock, > struct msghdr *m, size_t total_len); > int (*recvmsg) (struct kiocb *iocb, struct socket *sock, For the compat_ioctl stuff, we don't have the function pointer inside an #ifdef, the overhead is relatively small since there is only one of these structures per module implementing a protocol, but it avoids having to rebuild everything when changing CONFIG_COMPAT. It's probably not a big issue either way, maybe davem has a stronger opinion on it either way. > --- ./include/linux/netfilter.h.compat 2006-03-06 12:06:34.000000000 +0300 > +++ ./include/linux/netfilter.h 2006-03-07 15:00:14.000000000 +0300 > @@ -2,6 +2,7 @@ > #define __LINUX_NETFILTER_H > > #ifdef __KERNEL__ > +#include > #include > #include > #include You don't need to add new includes any more, these are automatic now.
> @@ -80,10 +81,18 @@ struct nf_sockopt_ops > int set_optmin; > int set_optmax; > int (*set)(struct sock *sk, int optval, void __user *user, unsigned int len); > +#ifdef CONFIG_COMPAT > + int (*compat_set)(struct sock *sk, int optval, > + void __user *user, unsigned int len); > +#endif > > int get_optmin; > int get_optmax; > int (*get)(struct sock *sk, int optval, void __user *user, int *len); > +#ifdef CONFIG_COMPAT > + int (*compat_get)(struct sock *sk, int optval, > + void __user *user, int *len); > +#endif > > /* Number of users inside set() or get(). */ > unsigned int use; see above, same for some more of these. > @@ -816,6 +826,12 @@ extern int sock_common_recvmsg(struct ki > struct msghdr *msg, size_t size, int flags); > extern int sock_common_setsockopt(struct socket *sock, int level, int optname, > char __user *optval, int optlen); > +#ifdef CONFIG_COMPAT > +extern int compat_sock_common_getsockopt(struct socket *sock, int level, > + int optname, char __user *optval, int __user *optlen); > +extern int compat_sock_common_setsockopt(struct socket *sock, int level, > + int optname, char __user *optval, int optlen); > +#endif > > extern void sk_common_release(struct sock *sk); > Declarations don't belong inside #ifdef.
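To make the two points above concrete, here is a minimal user-space sketch, not taken from the patch. All names in it (`demo_ops`, `demo_compat_dispatch`, `demo_native_set`, `demo_compat_set`) are hypothetical stand-ins, and it only assumes the fallback rule the patch itself uses: call the compat handler when one is registered, otherwise the regular one. The compat pointer is declared unconditionally, so no #ifdef is needed around either the member or the prototypes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a protocol's option handlers (not a kernel type). */
struct demo_ops {
	int (*setsockopt)(int level, int optname, const void *optval, int optlen);
	/* Always declared, no #ifdef; NULL means "no compat variant needed". */
	int (*compat_setsockopt)(int level, int optname, const void *optval, int optlen);
};

/* Hypothetical handlers; the return values only identify which one ran. */
int demo_native_set(int level, int optname, const void *optval, int optlen)
{
	(void)level; (void)optname; (void)optval; (void)optlen;
	return 1;
}

int demo_compat_set(int level, int optname, const void *optval, int optlen)
{
	(void)level; (void)optname; (void)optval; (void)optlen;
	return 2;
}

/*
 * Dispatch for a 32-bit caller: reject a negative length, prefer the
 * compat handler when the protocol provides one, else fall back to the
 * regular handler.
 */
int demo_compat_dispatch(const struct demo_ops *ops, int level, int optname,
			 const void *optval, int optlen)
{
	if (optlen < 0)
		return -EINVAL;
	if (ops->compat_setsockopt)
		return ops->compat_setsockopt(level, optname, optval, optlen);
	return ops->setsockopt(level, optname, optval, optlen);
}
```

A protocol that needs no translation simply leaves the compat member NULL and gets the regular handler; the cost is one extra pointer per ops structure.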
Arnd <>< From kaber at trash.net Wed Mar 8 13:16:27 2006 From: kaber at trash.net (Patrick McHardy) Date: Wed Mar 8 13:29:34 2006 Subject: [patch] ipt_recent In-Reply-To: <440DAB6B.4020208@ufomechanic.net> References: <43F9EA77.4060208@ufomechanic.net> <44096532.2070000@trash.net> <440DAB6B.4020208@ufomechanic.net> Message-ID: <440ECB1B.4070507@trash.net> Amin Azez wrote: > Patrick McHardy wrote: > >>> @@ -20,6 +25,8 @@ >>> u_int32_t hit_count; >>> u_int8_t check_set; >>> u_int8_t invert; >>> + u_int8_t check_count; >>> + u_int32_t entry_count; >>> char name[IPT_RECENT_NAME_LEN]; >>> u_int8_t side; >>> }; >> >> >> >> Sorry, we can't do that since it breaks userspace compatibility. But I'm >> really glad someone finally has the stomach to touch ipt_recent, I'll >> review your other patches now. > > > I've reworked that functionality significantly in a new patch to send > next week. I will see if I can find a way to make use of existing > structures to add the functionality. Otherwise you can use versioning as in ipt_MARK and a couple of other targets. > > I heard tell that ipt_recent needed a maintainer? Yes, we need someone familiar with the code to review patches, fix bugs and clean it up. From dim at openvz.org Thu Mar 9 11:23:59 2006 From: dim at openvz.org (Dmitry Mishin) Date: Thu Mar 9 12:36:08 2006 Subject: {get|set}sockopt compat layer In-Reply-To: <200603071605.39177.arnd@arndb.de> References: <200602201110.39092.dim@openvz.org> <200603071707.19138.dim@openvz.org> <200603071605.39177.arnd@arndb.de> Message-ID: <200603091324.00362.dim@openvz.org> Hello, Arnd! > For the compat_ioctl stuff, we don't have the function pointer inside an > #ifdef, the overhead is relatively small since there is only one of these > structures per module implementing a protocol, but it avoids having to > rebuild everything when changing CONFIG_COMPAT. > > It's probably not a big issue either way, maybe davem has a stronger > opinion on it either way. > Done. -- Thanks, Dmitry.
-------------- next part -------------- --- ./include/linux/net.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/linux/net.h 2006-03-09 12:58:53.000000000 +0300 @@ -149,6 +149,10 @@ struct proto_ops { int optname, char __user *optval, int optlen); int (*getsockopt)(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int optlen); + int (*compat_getsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen); int (*sendmsg) (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t total_len); int (*recvmsg) (struct kiocb *iocb, struct socket *sock, --- ./include/linux/netfilter.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/linux/netfilter.h 2006-03-09 12:59:44.000000000 +0300 @@ -80,10 +80,14 @@ struct nf_sockopt_ops int set_optmin; int set_optmax; int (*set)(struct sock *sk, int optval, void __user *user, unsigned int len); + int (*compat_set)(struct sock *sk, int optval, + void __user *user, unsigned int len); int get_optmin; int get_optmax; int (*get)(struct sock *sk, int optval, void __user *user, int *len); + int (*compat_get)(struct sock *sk, int optval, + void __user *user, int *len); /* Number of users inside set() or get(). 
*/ unsigned int use; @@ -246,6 +250,11 @@ int nf_setsockopt(struct sock *sk, int p int nf_getsockopt(struct sock *sk, int pf, int optval, char __user *opt, int *len); +int compat_nf_setsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int len); +int compat_nf_getsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int *len); + /* Packet queuing */ struct nf_queue_handler { int (*outfn)(struct sk_buff *skb, struct nf_info *info, --- ./include/net/inet_connection_sock.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/net/inet_connection_sock.h 2006-03-09 12:59:58.000000000 +0300 @@ -50,6 +50,12 @@ struct inet_connection_sock_af_ops { char __user *optval, int optlen); int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt)(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); + int (*compat_getsockopt)(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); void (*addr2sockaddr)(struct sock *sk, struct sockaddr *); int sockaddr_len; }; --- ./include/net/ip.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/net/ip.h 2006-03-09 13:00:15.000000000 +0300 @@ -356,6 +356,10 @@ extern void ip_cmsg_recv(struct msghdr * extern int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc); extern int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); extern int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); +extern int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen); +extern int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen); extern int ip_ra_control(struct sock *sk, unsigned char on, void (*destructor)(struct sock *)); extern int ip_recv_error(struct sock *sk, struct msghdr *msg, int len); --- 
./include/net/sctp/structs.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/net/sctp/structs.h 2006-03-09 13:00:36.000000000 +0300 @@ -514,6 +514,16 @@ struct sctp_af { int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt) (struct sock *sk, + int level, + int optname, + char __user *optval, + int optlen); + int (*compat_getsockopt) (struct sock *sk, + int level, + int optname, + char __user *optval, + int __user *optlen); struct dst_entry *(*get_dst) (struct sctp_association *asoc, union sctp_addr *daddr, union sctp_addr *saddr); --- ./include/net/sock.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/net/sock.h 2006-03-09 13:01:10.000000000 +0300 @@ -520,6 +520,14 @@ struct proto { int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *option); + int (*compat_setsockopt)(struct sock *sk, + int level, + int optname, char __user *optval, + int optlen); + int (*compat_getsockopt)(struct sock *sk, + int level, + int optname, char __user *optval, + int __user *option); int (*sendmsg)(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len); int (*recvmsg)(struct kiocb *iocb, struct sock *sk, @@ -816,6 +824,10 @@ extern int sock_common_recvmsg(struct ki struct msghdr *msg, size_t size, int flags); extern int sock_common_setsockopt(struct socket *sock, int level, int optname, char __user *optval, int optlen); +extern int compat_sock_common_getsockopt(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen); +extern int compat_sock_common_setsockopt(struct socket *sock, int level, + int optname, char __user *optval, int optlen); extern void sk_common_release(struct sock *sk); --- ./include/net/tcp.h.compat 2006-03-09 12:57:53.000000000 +0300 +++ ./include/net/tcp.h 2006-03-09 13:02:50.000000000 +0300 @@ -347,6 +347,12 @@ extern int tcp_getsockopt(struct sock extern int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, 
int optlen); +extern int compat_tcp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_tcp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); extern void tcp_set_keepalive(struct sock *sk, int val); extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, --- ./net/compat.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/compat.c 2006-03-09 12:58:23.000000000 +0300 @@ -416,7 +416,7 @@ struct compat_sock_fprog { compat_uptr_t filter; /* struct sock_filter * */ }; -static int do_set_attach_filter(int fd, int level, int optname, +static int do_set_attach_filter(struct socket *sock, int level, int optname, char __user *optval, int optlen) { struct compat_sock_fprog __user *fprog32 = (struct compat_sock_fprog __user *)optval; @@ -432,11 +432,12 @@ static int do_set_attach_filter(int fd, __put_user(compat_ptr(ptr), &kfprog->filter)) return -EFAULT; - return sys_setsockopt(fd, level, optname, (char __user *)kfprog, + return sock_setsockopt(sock, level, optname, (char __user *)kfprog, sizeof(struct sock_fprog)); } -static int do_set_sock_timeout(int fd, int level, int optname, char __user *optval, int optlen) +static int do_set_sock_timeout(struct socket *sock, int level, + int optname, char __user *optval, int optlen) { struct compat_timeval __user *up = (struct compat_timeval __user *) optval; struct timeval ktime; @@ -451,30 +452,61 @@ static int do_set_sock_timeout(int fd, i return -EFAULT; old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_setsockopt(fd, level, optname, (char *) &ktime, sizeof(ktime)); + err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime)); set_fs(old_fs); return err; } +static int compat_sock_setsockopt(struct socket *sock, int level, int optname, + char __user *optval, int optlen) +{ + if (optname == SO_ATTACH_FILTER) + return do_set_attach_filter(sock, level, optname, + optval, optlen); + if 
(optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_set_sock_timeout(sock, level, optname, optval, optlen); + + return sock_setsockopt(sock, level, optname, optval, optlen); +} + asmlinkage long compat_sys_setsockopt(int fd, int level, int optname, char __user *optval, int optlen) { + int err; + struct socket *sock; + /* SO_SET_REPLACE seems to be the same in all levels */ if (optname == IPT_SO_SET_REPLACE) return do_netfilter_replace(fd, level, optname, optval, optlen); - if (level == SOL_SOCKET && optname == SO_ATTACH_FILTER) - return do_set_attach_filter(fd, level, optname, - optval, optlen); - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_set_sock_timeout(fd, level, optname, optval, optlen); - return sys_setsockopt(fd, level, optname, optval, optlen); + if (optlen < 0) + return -EINVAL; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_setsockopt(sock,level,optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_setsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_setsockopt) + err = sock->ops->compat_setsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->setsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; } -static int do_get_sock_timeout(int fd, int level, int optname, +static int do_get_sock_timeout(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { struct compat_timeval __user *up; @@ -490,7 +522,7 @@ static int do_get_sock_timeout(int fd, i len = sizeof(ktime); old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_getsockopt(fd, level, optname, (char *) &ktime, &len); + err = sock_getsockopt(sock, level, optname, (char *) &ktime, &len); set_fs(old_fs); if (!err) { @@ -503,15 +535,42 @@ static int do_get_sock_timeout(int fd, i return err; } -asmlinkage long compat_sys_getsockopt(int fd, int level, int 
optname, +static int compat_sock_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_get_sock_timeout(fd, level, optname, optval, optlen); - return sys_getsockopt(fd, level, optname, optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_get_sock_timeout(sock, level, optname, optval, optlen); + return sock_getsockopt(sock, level, optname, optval, optlen); } +asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + struct socket *sock; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_getsockopt(sock, level, + optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_getsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_getsockopt) + err = sock->ops->compat_getsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->getsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; +} /* Argument list sizes for compat_sys_socketcall */ #define AL(x) ((x) * sizeof(u32)) static unsigned char nas[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), --- ./net/core/sock.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/core/sock.c 2006-03-09 12:58:23.000000000 +0300 @@ -1385,6 +1385,20 @@ int sock_common_getsockopt(struct socket EXPORT_SYMBOL(sock_common_getsockopt); +#ifdef CONFIG_COMPAT +int compat_sock_common_getsockopt(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen) +{ + struct sock *sk = sock->sk; + + if (sk->sk_prot->compat_setsockopt) + return sk->sk_prot->compat_getsockopt(sk, level, + optname, optval, optlen); + return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL(compat_sock_common_getsockopt); +#endif + int 
sock_common_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t size, int flags) { @@ -1414,6 +1428,20 @@ int sock_common_setsockopt(struct socket EXPORT_SYMBOL(sock_common_setsockopt); +#ifdef CONFIG_COMPAT +int compat_sock_common_setsockopt(struct socket *sock, + int level, int optname, char __user *optval, int optlen) +{ + struct sock *sk = sock->sk; + + if (sk->sk_prot->compat_setsockopt) + return sk->sk_prot->compat_setsockopt(sk, level, + optname, optval, optlen); + return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL(compat_sock_common_setsockopt); +#endif + void sk_common_release(struct sock *sk) { if (sk->sk_prot->destroy) --- ./net/dccp/dccp.h.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/dccp/dccp.h 2006-03-09 12:58:23.000000000 +0300 @@ -246,6 +246,14 @@ extern int dccp_getsockopt(struct soc char __user *optval, int __user *optlen); extern int dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); +#ifdef CONFIG_COMPAT +extern int compat_dccp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_dccp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); +#endif extern int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size); --- ./net/dccp/ipv4.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/dccp/ipv4.c 2006-03-09 12:58:23.000000000 +0300 @@ -1028,6 +1028,10 @@ struct inet_connection_sock_af_ops dccp_ .net_header_len = sizeof(struct iphdr), .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), }; @@ -1152,6 +1156,10 @@ struct proto dccp_prot = { 
.init = dccp_v4_init_sock, .setsockopt = dccp_setsockopt, .getsockopt = dccp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v4_do_rcv, --- ./net/dccp/ipv6.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/dccp/ipv6.c 2006-03-09 12:58:23.000000000 +0300 @@ -1170,6 +1170,10 @@ static struct proto dccp_v6_prot = { .init = dccp_v6_init_sock, .setsockopt = dccp_setsockopt, .getsockopt = dccp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v6_do_rcv, @@ -1207,6 +1211,10 @@ static struct proto_ops inet6_dccp_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/dccp/proto.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/dccp/proto.c 2006-03-09 12:58:23.000000000 +0300 @@ -255,18 +255,13 @@ static int dccp_setsockopt_service(struc return 0; } -int dccp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, int optlen) +static int do_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { struct dccp_sock *dp; int err; int val; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, - optname, optval, - optlen); - if (optlen < sizeof(int)) return -EINVAL; @@ -293,8 +288,34 @@ int dccp_setsockopt(struct sock *sk, int return err; } +int dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) + return 
inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} EXPORT_SYMBOL_GPL(dccp_setsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) { + if (inet_csk(sk)->icsk_af_ops->compat_setsockopt) + return inet_csk(sk)->icsk_af_ops->compat_setsockopt(sk, + level, optname, optval, optlen); + else + return inet_csk(sk)->icsk_af_ops->setsockopt(sk, + level, optname, optval, optlen); + } + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL_GPL(compat_dccp_setsockopt); +#endif + static int dccp_getsockopt_service(struct sock *sk, int len, u32 __user *optval, int __user *optlen) @@ -326,16 +347,12 @@ out: return err; } -int dccp_getsockopt(struct sock *sk, int level, int optname, +static int do_dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { struct dccp_sock *dp; int val, len; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, - optname, optval, - optlen); if (get_user(len, optlen)) return -EFAULT; @@ -368,8 +385,34 @@ int dccp_getsockopt(struct sock *sk, int return 0; } +int dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} EXPORT_SYMBOL_GPL(dccp_getsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) { + if (inet_csk(sk)->icsk_af_ops->compat_setsockopt) + return inet_csk(sk)->icsk_af_ops->compat_getsockopt(sk, + level, optname, optval, optlen); + else + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, + level, optname, 
optval, optlen); + } + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL_GPL(compat_dccp_getsockopt); +#endif + int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len) { @@ -696,6 +739,10 @@ static const struct proto_ops inet_dccp_ .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/ipv4/af_inet.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/ipv4/af_inet.c 2006-03-09 12:58:23.000000000 +0300 @@ -802,6 +802,10 @@ const struct proto_ops inet_stream_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -823,6 +827,10 @@ const struct proto_ops inet_dgram_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -848,6 +856,10 @@ static const struct proto_ops inet_sockr .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/ipv4/ip_sockglue.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/ipv4/ip_sockglue.c 2006-03-09 
12:58:23.000000000 +0300 @@ -380,14 +380,12 @@ out: * an IP socket. */ -int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) +static int do_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct inet_sock *inet = inet_sk(sk); int val=0,err; - if (level != SOL_IP) - return -ENOPROTOOPT; - if (((1< (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = nf_setsockopt(sk, PF_INET, optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + int err; + + if (level != SOL_IP) + return -ENOPROTOOPT; + + err = do_ip_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_HDRINCL && + optname != IP_IPSEC_POLICY && optname != IP_XFRM_POLICY +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = compat_nf_setsockopt(sk, PF_INET, + optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} +#endif + /* * Get the options. Note for future reference. The GET of IP options gets the * _received_ ones. The set sets the _sent_ ones. 
*/ -int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) +static int do_ip_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) { struct inet_sock *inet = inet_sk(sk); int val; @@ -1051,17 +1098,8 @@ int ip_getsockopt(struct sock *sk, int l val = inet->freebind; break; default: -#ifdef CONFIG_NETFILTER - val = nf_getsockopt(sk, PF_INET, optname, optval, - &len); - release_sock(sk); - if (val >= 0) - val = put_user(len, optlen); - return val; -#else release_sock(sk); return -ENOPROTOOPT; -#endif } release_sock(sk); @@ -1082,7 +1120,73 @@ int ip_getsockopt(struct sock *sk, int l return 0; } +int ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = nf_getsockopt(sk, PF_INET, optname, optval, + &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); + return err; + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = compat_nf_getsockopt(sk, PF_INET, + optname, optval, &len); + release_sock(sk); + if (err >= 0) + err = 
put_user(len, optlen); + return err; + } +#endif + return err; +} +#endif + EXPORT_SYMBOL(ip_cmsg_recv); EXPORT_SYMBOL(ip_getsockopt); EXPORT_SYMBOL(ip_setsockopt); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_ip_getsockopt); +EXPORT_SYMBOL(compat_ip_setsockopt); +#endif --- ./net/ipv4/raw.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/ipv4/raw.c 2006-03-09 12:58:23.000000000 +0300 @@ -660,12 +660,9 @@ static int raw_geticmpfilter(struct sock out: return ret; } -static int raw_setsockopt(struct sock *sk, int level, int optname, +static int do_raw_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { - if (level != SOL_RAW) - return ip_setsockopt(sk, level, optname, optval, optlen); - if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -675,12 +672,28 @@ static int raw_setsockopt(struct sock *s return -ENOPROTOOPT; } -static int raw_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen) +static int raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { if (level != SOL_RAW) - return ip_getsockopt(sk, level, optname, optval, optlen); + return ip_setsockopt(sk, level, optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#ifdef CONFIG_COMPAT +static int compat_raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_RAW) + return compat_ip_setsockopt(sk, level, + optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#endif + +static int do_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -690,6 +703,25 @@ static int raw_getsockopt(struct sock *s return -ENOPROTOOPT; } +static int raw_getsockopt(struct sock *sk, int level, int optname, + 
char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return ip_getsockopt(sk, level, optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return compat_ip_getsockopt(sk, level, + optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} +#endif + static int raw_ioctl(struct sock *sk, int cmd, unsigned long arg) { switch (cmd) { @@ -728,6 +760,10 @@ struct proto raw_prot = { .init = raw_init, .setsockopt = raw_setsockopt, .getsockopt = raw_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_raw_setsockopt, + .compat_getsockopt = compat_raw_getsockopt, +#endif .sendmsg = raw_sendmsg, .recvmsg = raw_recvmsg, .bind = raw_bind, --- ./net/ipv4/tcp.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/ipv4/tcp.c 2006-03-09 12:58:23.000000000 +0300 @@ -1687,18 +1687,14 @@ int tcp_disconnect(struct sock *sk, int /* * Socket option code for TCP. 
*/ -int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, - int optlen) +static int do_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); int val; int err = 0; - if (level != SOL_TCP) - return icsk->icsk_af_ops->setsockopt(sk, level, optname, - optval, optlen); - /* This is a string value all the others are int's */ if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -1871,6 +1867,35 @@ int tcp_setsockopt(struct sock *sk, int return err; } +int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, + int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->setsockopt(sk, level, optname, + optval, optlen); + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) { + if (icsk->icsk_af_ops->compat_setsockopt) + return icsk->icsk_af_ops->compat_setsockopt(sk, + level, optname, optval, optlen); + else + return icsk->icsk_af_ops->setsockopt(sk, + level, optname, optval, optlen); + } + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} +#endif + /* Return information about state of tcp endpoint in API format. 
*/ void tcp_get_info(struct sock *sk, struct tcp_info *info) { @@ -1931,17 +1956,13 @@ void tcp_get_info(struct sock *sk, struc EXPORT_SYMBOL_GPL(tcp_get_info); -int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, - int __user *optlen) +static int do_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); int val, len; - if (level != SOL_TCP) - return icsk->icsk_af_ops->getsockopt(sk, level, optname, - optval, optlen); - if (get_user(len, optlen)) return -EFAULT; @@ -2025,6 +2046,34 @@ int tcp_getsockopt(struct sock *sk, int return 0; } +int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, + int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->getsockopt(sk, level, optname, + optval, optlen); + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) { + if (icsk->icsk_af_ops->compat_getsockopt) + return icsk->icsk_af_ops->compat_getsockopt(sk, + level, optname, optval, optlen); + else + return icsk->icsk_af_ops->getsockopt(sk, + level, optname, optval, optlen); + } + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} +#endif extern void __skb_cb_too_small_for_tcp(int, int); extern struct tcp_congestion_ops tcp_reno; @@ -2142,3 +2191,7 @@ EXPORT_SYMBOL(tcp_sendpage); EXPORT_SYMBOL(tcp_setsockopt); EXPORT_SYMBOL(tcp_shutdown); EXPORT_SYMBOL(tcp_statistics); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_tcp_setsockopt); +EXPORT_SYMBOL(compat_tcp_getsockopt); +#endif --- ./net/ipv4/tcp_ipv4.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/ipv4/tcp_ipv4.c 2006-03-09 
12:58:23.000000000 +0300 @@ -1225,6 +1225,10 @@ struct inet_connection_sock_af_ops ipv4_ .net_header_len = sizeof(struct iphdr), .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), }; @@ -1807,6 +1811,10 @@ struct proto tcp_prot = { .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_tcp_setsockopt, + .compat_getsockopt = compat_tcp_getsockopt, +#endif .sendmsg = tcp_sendmsg, .recvmsg = tcp_recvmsg, .backlog_rcv = tcp_v4_do_rcv, --- ./net/ipv4/udp.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/ipv4/udp.c 2006-03-09 12:58:23.000000000 +0300 @@ -1207,16 +1207,13 @@ static int udp_destroy_sock(struct sock /* * Socket option code for UDP */ -static int udp_setsockopt(struct sock *sk, int level, int optname, +static int do_udp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct udp_sock *up = udp_sk(sk); int val; int err = 0; - if (level != SOL_UDP) - return ip_setsockopt(sk, level, optname, optval, optlen); - if(optlenpf == pf) { + if (get) { + if (val >= ops->get_optmin + && val < ops->get_optmax) { + ops->use++; + up(&nf_sockopt_mutex); + if (ops->compat_get) + ret = ops->compat_get(sk, + val, opt, len); + else + ret = ops->get(sk, + val, opt, len); + goto out; + } + } else { + if (val >= ops->set_optmin + && val < ops->set_optmax) { + ops->use++; + up(&nf_sockopt_mutex); + if (ops->compat_set) + ret = ops->compat_set(sk, + val, opt, *len); + else + ret = ops->set(sk, + val, opt, *len); + goto out; + } + } + } + } + up(&nf_sockopt_mutex); + return -ENOPROTOOPT; + + out: + down(&nf_sockopt_mutex); + ops->use--; + if (ops->cleanup_task) + wake_up_process(ops->cleanup_task); + up(&nf_sockopt_mutex); + return ret; +} + +int 
compat_nf_setsockopt(struct sock *sk, int pf, + int val, char __user *opt, int len) +{ + return compat_nf_sockopt(sk, pf, val, opt, &len, 0); +} +EXPORT_SYMBOL(compat_nf_setsockopt); + +int compat_nf_getsockopt(struct sock *sk, int pf, + int val, char __user *opt, int *len) +{ + return compat_nf_sockopt(sk, pf, val, opt, len, 1); +} +EXPORT_SYMBOL(compat_nf_getsockopt); +#endif --- ./net/sctp/ipv6.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/sctp/ipv6.c 2006-03-09 12:58:23.000000000 +0300 @@ -875,6 +875,10 @@ static const struct proto_ops inet6_seqp .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/sctp/protocol.c.compat 2006-03-09 12:57:54.000000000 +0300 +++ ./net/sctp/protocol.c 2006-03-09 12:58:23.000000000 +0300 @@ -845,6 +845,10 @@ static const struct proto_ops inet_seqpa .shutdown = inet_shutdown, /* Looks harmless. */ .setsockopt = sock_common_setsockopt, /* IP_SOL IP_OPTION is a problem. 
*/ .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -883,6 +887,10 @@ static struct sctp_af sctp_ipv4_specific .sctp_xmit = sctp_v4_xmit, .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .get_dst = sctp_v4_get_dst, .get_saddr = sctp_v4_get_saddr, .copy_addrlist = sctp_v4_copy_addrlist,

From pch at coolsystems.dk Thu Mar 9 13:16:37 2006
From: pch at coolsystems.dk (Peter Christensen)
Date: Thu Mar 9 13:29:40 2006
Subject: Statefull SOCKS filter
Message-ID: <44101CA5.3070004@coolsystems.dk>

Hi,

I'm currently developing a transparent firewall bridge whose sole purpose is to filter out everything but LAN traffic and traffic to a list of privileged servers on the Internet. Since it is meant to work out of the box on a range of different network configurations, it must be able to detect and filter proxy traffic as well.

My problem is specifically with making a SOCKS filter. I've done it in user space with great success (basically a state machine), but I naturally want this to be done in iptables.

And here is the real question: is there a preferred "smart" way of writing this kind of stateful filter, where some upper software layer handles the actual connection for me, if you follow me? At first I thought connection tracking was the way to go, but apparently it is primarily for temporarily accepting a given connection based on the content of another connection.
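For illustration only, the kind of per-connection state machine mentioned above might look roughly like the following user-space sketch. It is not the poster's code and not a netfilter API: the names `socks_conn`, `socks_feed()` and the state and verdict enums are made up for this example. It assumes the SOCKS5 framing from RFC 1928, follows only the no-authentication path, and assumes each payload carries exactly one handshake message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; nothing here is a netfilter interface. */
enum socks_state {
    SOCKS_NEW,         /* nothing seen yet */
    SOCKS_GREETED,     /* client sent a plausible SOCKS5 greeting */
    SOCKS_NEGOTIATED,  /* server selected "no authentication" */
    SOCKS_DETECTED,    /* client sent a SOCKS5 request: proxy traffic */
    SOCKS_CLEAN        /* traffic did not match the handshake */
};

enum socks_dir { FROM_CLIENT, FROM_SERVER };
enum socks_verdict { VERDICT_ACCEPT, VERDICT_DROP };

struct socks_conn {
    enum socks_state state;
};

static void socks_conn_init(struct socks_conn *c)
{
    assert(c != NULL);
    c->state = SOCKS_NEW;
}

/*
 * Feed one TCP payload into the state machine and return a verdict.
 * Assumption: one payload holds exactly one handshake message.
 * Empty payloads (e.g. pure ACKs) leave the state unchanged.
 */
static enum socks_verdict socks_feed(struct socks_conn *c, enum socks_dir dir,
                                     const uint8_t *p, size_t len)
{
    assert(c != NULL);

    if (len > 0) {
        switch (c->state) {
        case SOCKS_NEW:
            /* Greeting: VER=5, NMETHODS, then NMETHODS method bytes. */
            if (dir == FROM_CLIENT && len >= 3 && p[0] == 0x05 &&
                p[1] != 0 && len == (size_t)p[1] + 2)
                c->state = SOCKS_GREETED;
            else
                c->state = SOCKS_CLEAN;
            break;
        case SOCKS_GREETED:
            /* Method selection: VER=5, METHOD=0 (no authentication). */
            if (dir == FROM_SERVER && len == 2 && p[0] == 0x05 &&
                p[1] == 0x00)
                c->state = SOCKS_NEGOTIATED;
            else
                c->state = SOCKS_CLEAN;
            break;
        case SOCKS_NEGOTIATED:
            /* Request: VER=5, CMD in 1..3, RSV=0, ATYP, address, port. */
            if (dir == FROM_CLIENT && len >= 7 && p[0] == 0x05 &&
                p[1] >= 1 && p[1] <= 3 && p[2] == 0x00)
                c->state = SOCKS_DETECTED;
            else
                c->state = SOCKS_CLEAN;
            break;
        default:
            /* SOCKS_DETECTED and SOCKS_CLEAN are terminal states. */
            break;
        }
    }
    return c->state == SOCKS_DETECTED ? VERDICT_DROP : VERDICT_ACCEPT;
}
```

A caller would keep one `struct socks_conn` per tracked TCP connection, call `socks_conn_init()` when the connection is first seen, and pass every payload to `socks_feed()` together with its direction. Where that per-connection storage lives is exactly the open question of this thread, so the sketch deliberately leaves it out.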
I CAN solve the whole thing just by writing a basic match filter that keeps its own array of current connections with their SOCKS state (this is basically what my user-space equivalent does), but I think that is quite a lot of work, especially if something similar is already done elsewhere in the kernel. After all, the bridge does not have a tremendous amount of processing power!

I apologize if I'm just uninformed, but so far I have failed to find any documentation on how to write an actual stateful filter whose purpose is NOT to help out NAT etc.

--
Best regards

Peter Christensen
Developer
------------------
Cool Systems ApS

Tel: +45 2888 1600
@ : pch@coolsystems.dk
www: www.coolsystems.dk

From alexeyt at freeshell.org Thu Mar 9 14:08:24 2006
From: alexeyt at freeshell.org (Alexey Toptygin)
Date: Thu Mar 9 14:21:43 2006
Subject: Statefull SOCKS filter
In-Reply-To: <44101CA5.3070004@coolsystems.dk>
References: <44101CA5.3070004@coolsystems.dk>
Message-ID:

Perhaps libipq and the QUEUE target will do what you want?

Alexey

From beunlovable at gmail.com Thu Mar 9 14:30:06 2006
From: beunlovable at gmail.com (David Vogt)
Date: Thu Mar 9 14:43:09 2006
Subject: libipq does not shorten package
Message-ID: <859616420603090530p24d0cb9cg@mail.gmail.com>

Dear all,

I already posted this question on the netfilter mailing list but got no response, so maybe someone here can help me.

I am using iptables to queue packets to user space. Packets are either augmented (with a signature) or shortened (removal of the signature). The augmentation works. However, when removing the signature, the resulting packet contains the correct data, but its overall size is equal to that of the original (signed) packet (i.e. original packet size: 83, signed packet size: 163). I checked this using ethereal on the receiving machine, which lists packets with the additional bytes (163-83=79) as trailer for the ethernet frame.
(So the packet is actually processed correctly on the other side; the additional bytes just seem odd to me.)

I am using iptables 1.2.9, which is admittedly not the newest available; maybe this has already been fixed? I haven't updated iptables yet, since the target platform is somewhat limited and not fully under my control.

Any help would be great.

From pch at coolsystems.dk Thu Mar 9 14:45:30 2006
From: pch at coolsystems.dk (Peter Christensen)
Date: Thu Mar 9 14:58:34 2006
Subject: Statefull SOCKS filter
In-Reply-To:
References: <44101CA5.3070004@coolsystems.dk>
Message-ID: <4410317A.3070502@coolsystems.dk>

AFAIK libipq is about having filters in user space, which really isn't my issue. I have no problem writing kernel modules, and moving stuff into user space will only add overhead without adding anything useful.

What I am searching for is a solution that spares me from keeping track of all ongoing connections manually. I imagine that this is already done somewhere in iptables, and if so I find it a waste of time to do it again and spend time creating hash tables etc. As I pointed out, there isn't really much CPU power. Actually, the perfect solution would be to write my own OS for the bridge, since I could then minimize useless overhead all around, but this would without doubt take significantly more time than just writing modules for iptables in Linux.

The claim is that netfilter has stateful packet filtering, which I interpret as an interface that makes it easy to create state machines on IPv4 TCP connections, but I have probably misinterpreted the idea of "stateful packet filtering". I imagine a callback such as this (simplified):

int stateful_callback (netfilter_conn_t *conn) {
    switch (conn->state) {
    case STATE_1:
        if (foo)
            conn->state = STATE_2;
        else
            conn->state = STATE_2;
        break;
    case STATE_2:
        // Stuff ...
    }
    return (conn->state == STATE_n ?
        NF_DROP : NF_ACCEPT);
}

--
Best regards

Peter Christensen
Developer
------------------
Cool Systems ApS

Tel: +45 2888 1600
@ : pch@coolsystems.dk
www: www.coolsystems.dk

Alexey Toptygin wrote:
>
> Perhaps libipq and the QUEUE target will do what you want?
>
> Alexey

From kaber at trash.net Thu Mar 9 16:55:35 2006
From: kaber at trash.net (Patrick McHardy)
Date: Thu Mar 9 17:10:27 2006
Subject: libipq does not shorten package
In-Reply-To: <859616420603090530p24d0cb9cg@mail.gmail.com>
References: <859616420603090530p24d0cb9cg@mail.gmail.com>
Message-ID: <44104FF7.1030104@trash.net>

David Vogt wrote:
> Dear all,
>
> I already posted this question on netfilter mailing list, however, I
> got no response, so maybe someone around here can help me.
>
> I am using iptables and queue packets to user space. Packets are
> either augmented (with a signature) or shortened (removal of
> signature). The augmentation works.
> However, when removing the signature, the resulting packet contains
> the correct data, but its overall size is equal to the original
> (signed) packet. (i.e. original packet size: 83, signed packet size:
> 163)

Thomas Graf posted a patch that should fix this a couple of days ago, check the list archives.

From aef at prismnet.com Thu Mar 9 17:49:27 2006
From: aef at prismnet.com (Allen Francom)
Date: Thu Mar 9 18:02:40 2006
Subject: Statefull SOCKS filter
In-Reply-To: <44101CA5.3070004@coolsystems.dk>
References: <44101CA5.3070004@coolsystems.dk>
Message-ID: <20060309104524.B75323@tempest.prismnet.com>

Once upon a time I interacted with a project called "Hogwash". This was all layer 2 and seemed to be off to a great start. Sounds more like what you need, "transparent".

The maintainer resigned, however the code ran, based on Snort and associated libraries.

With a lot of help from others, I made a binding for these rules into IPTables via the QUEUE target... but that wasn't all that clean.

Maybe skip the IPTables entirely, and "do like hogwash did".

2 cents...
On Thu, 9 Mar 2006, Peter Christensen wrote: > I'm currently in the development of a transparent firewall bridge, whose sole > purpose is to filter out everything but LAN traffic and traffic for a list of > privileged servers on the Internet. Since it is meant to work on a bunch of > different network configurations out-of-box, it must be able to detect and > filter proxy traffic as well. From gregor at net.in.tum.de Thu Mar 9 20:33:49 2006 From: gregor at net.in.tum.de (Gregor Maier) Date: Thu Mar 9 20:46:23 2006 Subject: [PATCH][RFC] Unifiy logging in netfilter using nf_log, take #1.1 In-Reply-To: <20060304173815.GA12715@net.in.tum.de> References: <20060304173815.GA12715@net.in.tum.de> Message-ID: <20060309193349.GA20613@net.in.tum.de> A small change to be applied on top of the take #1 patch. cu Gregor [PATCH] Add defines in xt_LOG.h that are needed by the userspace iptables program. Signed-off-by: Gregor Maier ===================================================================== diff -ur blubber/include/linux/netfilter/xt_LOG.h net-2.6.17/include/linux/netfilter/xt_LOG.h --- blubber/include/linux/netfilter/xt_LOG.h 2006-03-09 19:57:08.000000000 +0100 +++ net-2.6.17/include/linux/netfilter/xt_LOG.h 2006-03-05 18:50:03.000000000 +0100 @@ -8,13 +8,30 @@ #ifndef _XT_LOG_TARGET_H #define _XT_LOG_TARGET_H -/* make sure not to change this without changing netfilter.h:NF_LOG_* (!) */ +/* make sure not to change this without changing netfilter.h:XT_LOG_* (!) */ +#define XT_LOG_BACKEND_SYSLOG 0x01 +#define XT_LOG_BACKEND_NFLOG 0x02 +#define XT_LOG_BACKEND_MASK (XT_LOG_BACKEND_SYSLOG|XT_LOG_BACKEND_NFLOG) + +/* make sure not to change this without changing netfilter.h:XT_LOG_* (!) 
*/ #define XT_LOG_TCPSEQ 0x01 /* Log TCP sequence numbers */ #define XT_LOG_TCPOPT 0x02 /* Log TCP options */ #define XT_LOG_IPOPT 0x04 /* Log IP options */ #define XT_LOG_UID 0x08 /* Log UID owning local socket */ #define XT_LOG_MASK 0x0f +#define IPT_LOG_TCPSEQ XT_LOG_TCPSEQ /* Log TCP sequence numbers */ +#define IPT_LOG_TCPOPT XT_LOG_TCPOPT /* Log TCP options */ +#define IPT_LOG_IPOPT XT_LOG_IPOPT /* Log IP options */ +#define IPT_LOG_UID XT_LOG_UID /* Log UID owning local socket */ +#define IPT_LOG_MASK XT_LOG_MASK + +#define IP6T_LOG_TCPSEQ XT_LOG_TCPSEQ /* Log TCP sequence numbers */ +#define IP6T_LOG_TCPOPT XT_LOG_TCPOPT /* Log TCP options */ +#define IP6T_LOG_IPOPT XT_LOG_IPOPT /* Log IP options */ +#define IP6T_LOG_UID XT_LOG_UID /* Log UID owning local socket */ +#define IP6T_LOG_MASK XT_LOG_MASK + struct xt_log_info { u_int16_t group; unsigned char backends; From gregor at net.in.tum.de Thu Mar 9 20:36:37 2006 From: gregor at net.in.tum.de (Gregor Maier) Date: Thu Mar 9 20:49:10 2006 Subject: [PATCH][RFC] iptables patch for: Unifiy logging in netfilter using nf_log, take #1 In-Reply-To: <20060304173815.GA12715@net.in.tum.de> References: <20060304173815.GA12715@net.in.tum.de> Message-ID: <20060309193637.GB20613@net.in.tum.de> [PATCH][RFC] This patch adds support for the modified LOG targets from the nf-log-unification patch. 
* Support for syslog and libnetfilter_log backend * The nf-log backends can be combined (log to one backend, log to both, log to none) * semantics of "old" LOG target are preserved Signed-off-by: Gregor Maier =================================================================== Index: include/linux/netfilter/xt_LOG.h =================================================================== --- include/linux/netfilter/xt_LOG.h (revision 0) +++ include/linux/netfilter/xt_LOG.h (revision 0) @@ -0,0 +1,23 @@ +/* iptables module for logging / LOG target + * + * (C) 2006 Gregor Maier + * + * This software is distributed under GNU GPL v2, 1991 + * +*/ +#ifndef _XT_LOG_TARGET_H +#define _XT_LOG_TARGET_H + +/* This file intentionally left empty + * + * When the kernel doesn't provive xt_LOG.h this file will + * be included, so the preprocessor won't complain + * + * In the "real" xt_LOG.h file XT_LOG_BACKEND_SYSLOG is defined + * and we use this to determine if the xt_LOG target is used + * or the ipt_LOG, ip6t_LOG targets + * + * XXX: There must be a way to do this more nicely + */ + +#endif /* _XT_LOG_TARGET_H */ Index: extensions/libip6t_LOG.c =================================================================== --- extensions/libip6t_LOG.c (revision 6554) +++ extensions/libip6t_LOG.c (working copy) @@ -1,4 +1,4 @@ -/* Shared library add-on to iptables to add LOG support. */ +/* Shared library add-on to ip6tables to add LOG support. */ #include #include #include @@ -6,35 +6,55 @@ #include #include #include +/* xt_LOG.h is either from the kernel (then xt_LOG target is + * available) and XT_LOG_BACKEND_* are defined). 
Otherwise + * xt_LOG.h from the iptables include dir is used and XT_LOG_BACKEND_ + * is not defined + */ +#include #include + +#ifndef XT_LOG_BACKEND_SYSLOG #include +#define xt_log_info ip6t_log_info +#endif + -#ifndef IP6T_LOG_UID /* Old kernel */ -#define IP6T_LOG_UID 0x08 +#define LOG_DEFAULT_LEVEL LOG_WARNING + +#ifndef IP6T_LOG_UID /* Old kernel */ +#define IP6T_LOG_UID 0x08 /* Log UID owning local socket */ #undef IP6T_LOG_MASK #define IP6T_LOG_MASK 0x0f #endif -#define LOG_DEFAULT_LEVEL LOG_WARNING - /* Function which prints out usage message. */ static void help(void) { printf( "LOG v%s options:\n" +" --log-prefix prefix Prefix log messages with this prefix.\n" +#ifdef XT_LOG_BACKEND_SYSLOG +" --log-no-syslog Do not use syslog backend for logging\n" +" --log-group group Log to nf_netlink_log group group (0 - 65535)\n" +" Options when using syslog backend:\n" +#endif " --log-level level Level of logging (numeric or see syslog.conf)\n" -" --log-prefix prefix Prefix log messages with this prefix.\n\n" -" --log-tcp-sequence Log TCP sequence numbers.\n\n" -" --log-tcp-options Log TCP options.\n\n" -" --log-ip-options Log IP options.\n\n" +" --log-tcp-sequence Log TCP sequence numbers.\n" +" --log-tcp-options Log TCP options.\n" +" --log-ip-options Log IP options.\n" " --log-uid Log UID owning the local socket.\n\n", IPTABLES_VERSION); } static struct option opts[] = { + { .name = "log-prefix", .has_arg = 1, .flag = 0, .val = '#' }, +#ifdef XT_LOG_BACKEND_SYSLOG + { .name = "log-no-syslog", .has_arg = 0, .flag = 0, .val = 's' }, + { .name = "log-group", .has_arg = 1, .flag = 0, .val = 'g' }, +#endif { .name = "log-level", .has_arg = 1, .flag = 0, .val = '!' 
}, - { .name = "log-prefix", .has_arg = 1, .flag = 0, .val = '#' }, { .name = "log-tcp-sequence", .has_arg = 0, .flag = 0, .val = '1' }, { .name = "log-tcp-options", .has_arg = 0, .flag = 0, .val = '2' }, { .name = "log-ip-options", .has_arg = 0, .flag = 0, .val = '3' }, @@ -46,8 +66,11 @@ static void init(struct ip6t_entry_target *t, unsigned int *nfcache) { - struct ip6t_log_info *loginfo = (struct ip6t_log_info *)t->data; + struct xt_log_info *loginfo = (struct xt_log_info *)t->data; +#ifdef XT_LOG_BACKEND_SYSLOG + loginfo->backends = XT_LOG_BACKEND_SYSLOG; +#endif loginfo->level = LOG_DEFAULT_LEVEL; } @@ -105,6 +128,11 @@ #define IP6T_LOG_OPT_TCPOPT 0x08 #define IP6T_LOG_OPT_IPOPT 0x10 #define IP6T_LOG_OPT_UID 0x20 +/* We don't ifdef those, since they don't hurt. And we can use them in parse to + * save tons of ifdefs */ +#define IP6T_LOG_OPT_SYSLOG_MASK (IP6T_LOG_OPT_LEVEL | IP6T_LOG_OPT_TCPSEQ | IP6T_LOG_OPT_TCPOPT | IP6T_LOG_OPT_IPOPT | IP6T_LOG_OPT_UID) +#define IP6T_LOG_OPT_NO_SYSLOG 0x40 +#define IP6T_LOG_OPT_GROUP 0x80 /* Function which parses command options; returns true if it ate an option */ @@ -113,10 +141,45 @@ const struct ip6t_entry *entry, struct ip6t_entry_target **target) { - struct ip6t_log_info *loginfo = (struct ip6t_log_info *)(*target)->data; + struct xt_log_info *loginfo = (struct xt_log_info *)(*target)->data; switch (c) { +#ifdef XT_LOG_BACKEND_SYSLOG /* kernel with stackable loggers */ + unsigned group; + case 's': + if (*flags & IP6T_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-no-syslog twice"); + + /* check if any of the syslog related opts has been specified */ + if (*flags & IP6T_LOG_OPT_SYSLOG_MASK) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-no-syslog when using syslog only options"); + loginfo->backends &= (~XT_LOG_BACKEND_SYSLOG); + *flags |= IP6T_LOG_OPT_NO_SYSLOG; + break; + + case 'g': + if (*flags & IP6T_LOG_OPT_GROUP) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-group 
twice"); + if (check_inverse(optarg, &invert, NULL, 0)) + exit_error(PARAMETER_PROBLEM, + "Unexpected `!' after --log-group"); + if (string_to_number(optarg, 0, 65535, &group) < 0) + exit_error(PARAMETER_PROBLEM, + "--log-group has to be between 0 and 65535"); + loginfo->group = group; + loginfo->backends |= XT_LOG_BACKEND_NFLOG; + *flags |= IP6T_LOG_OPT_GROUP; + break; +#endif +/* We can savely check for IP6T_LOG_OPT_NO_SYSLOG without the need for ifdefs */ case '!': + if (*flags & IP6T_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-level when not logging to syslog"); + if (*flags & IP6T_LOG_OPT_LEVEL) exit_error(PARAMETER_PROBLEM, "Can't specify --log-level twice"); @@ -152,6 +215,10 @@ break; case '1': + if (*flags & IP6T_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-tcp-sequence when not logging to syslog"); + if (*flags & IP6T_LOG_OPT_TCPSEQ) exit_error(PARAMETER_PROBLEM, "Can't specify --log-tcp-sequence " @@ -162,6 +229,10 @@ break; case '2': + if (*flags & IP6T_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-tcp-options when not logging to syslog"); + if (*flags & IP6T_LOG_OPT_TCPOPT) exit_error(PARAMETER_PROBLEM, "Can't specify --log-tcp-options twice"); @@ -171,6 +242,10 @@ break; case '3': + if (*flags & IP6T_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-ip-options when not logging to syslog"); + if (*flags & IP6T_LOG_OPT_IPOPT) exit_error(PARAMETER_PROBLEM, "Can't specify --log-ip-options twice"); @@ -180,10 +255,14 @@ break; case '4': + if (*flags & IP6T_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-uid when not logging to syslog"); + if (*flags & IP6T_LOG_OPT_UID) exit_error(PARAMETER_PROBLEM, "Can't specify --log-uid twice"); - + loginfo->logflags |= IP6T_LOG_UID; *flags |= IP6T_LOG_OPT_UID; break; @@ -206,35 +285,63 @@ const struct ip6t_entry_target *target, int numeric) { - const struct ip6t_log_info *loginfo - = 
(const struct ip6t_log_info *)target->data; + const struct xt_log_info *loginfo + = (const struct xt_log_info *)target->data; unsigned int i = 0; + /* These flag saves some ifdefs and thus make the code more readable */ + int use_syslog = 1; /* is uselog used as backend ? */ +#ifdef XT_LOG_BACKEND_SYSLOG + if (!(loginfo->backends & XT_LOG_BACKEND_SYSLOG)) + use_syslog = 0; +#endif + printf("LOG "); - if (numeric) - printf("flags %u level %u ", - loginfo->logflags, loginfo->level); + if (numeric) { +#ifdef XT_LOG_BACKEND_SYSLOG + if (loginfo->backends & XT_LOG_BACKEND_NFLOG) + printf("group %u ", loginfo->group); +#endif + + if (!use_syslog) + printf("no-syslog "); + else + printf("flags %u level %u ", + loginfo->logflags, loginfo->level); + } else { - for (i = 0; - i < sizeof(ip6t_log_names) / sizeof(struct ip6t_log_names); - i++) { - if (loginfo->level == ip6t_log_names[i].level) { - printf("level %s ", ip6t_log_names[i].name); - break; + if (use_syslog) { + for (i = 0; + i < sizeof(ip6t_log_names) / sizeof(struct ip6t_log_names); + i++) { + if (loginfo->level == ip6t_log_names[i].level) { + printf("level %s ", ip6t_log_names[i].name); + break; + } } } - if (i == sizeof(ip6t_log_names) / sizeof(struct ip6t_log_names)) - printf("UNKNOWN level %u ", loginfo->level); - if (loginfo->logflags & IP6T_LOG_TCPSEQ) - printf("tcp-sequence "); - if (loginfo->logflags & IP6T_LOG_TCPOPT) - printf("tcp-options "); - if (loginfo->logflags & IP6T_LOG_IPOPT) - printf("ip-options "); - if (loginfo->logflags & IP6T_LOG_UID) - printf("uid "); - if (loginfo->logflags & ~(IP6T_LOG_MASK)) - printf("unknown-flags "); + +#ifdef XT_LOG_BACKEND_SYSLOG + if (loginfo->backends & XT_LOG_BACKEND_NFLOG) + printf("group %u ", loginfo->group); +#endif + if (!use_syslog) { + printf("no-syslog "); + } + else { + if (i == sizeof(ip6t_log_names) / sizeof(struct ip6t_log_names)) + printf("UNKNOWN level %u ", loginfo->level); + if (loginfo->logflags & IP6T_LOG_TCPSEQ) + printf("tcp-sequence "); + if 
(loginfo->logflags & IP6T_LOG_TCPOPT) + printf("tcp-options "); + if (loginfo->logflags & IP6T_LOG_IPOPT) + printf("ip-options "); + if (loginfo->logflags & IP6T_LOG_UID) + printf("uid "); + if (loginfo->logflags & ~(IP6T_LOG_MASK)) + printf("unknown-flags "); + } } if (strcmp(loginfo->prefix, "") != 0) @@ -245,14 +352,21 @@ static void save(const struct ip6t_ip6 *ip, const struct ip6t_entry_target *target) { - const struct ip6t_log_info *loginfo - = (const struct ip6t_log_info *)target->data; - + const struct xt_log_info *loginfo + = (const struct xt_log_info *)target->data; +#ifdef XT_LOG_BACKEND_SYSLOG + if (loginfo->backends & XT_LOG_BACKEND_NFLOG) + printf("--log-group %u ", loginfo->group); + if (!(loginfo->backends & XT_LOG_BACKEND_SYSLOG)) + printf("--log-no-syslog "); +#endif + /* Don't need to check for no-syslog, since the options won't be set if + * not logging to syslog */ if (strcmp(loginfo->prefix, "") != 0) printf("--log-prefix \"%s\" ", loginfo->prefix); if (loginfo->level != LOG_DEFAULT_LEVEL) - printf("--log-level %d ", loginfo->level); + printf("--log-level %u ", loginfo->level); if (loginfo->logflags & IP6T_LOG_TCPSEQ) printf("--log-tcp-sequence "); @@ -269,8 +383,9 @@ = { .name = "LOG", .version = IPTABLES_VERSION, - .size = IP6T_ALIGN(sizeof(struct ip6t_log_info)), - .userspacesize = IP6T_ALIGN(sizeof(struct ip6t_log_info)), +/* IP6T_ALIGN is defined as XT_ALIGN if available */ + .size = IP6T_ALIGN(sizeof(struct xt_log_info)), + .userspacesize = IP6T_ALIGN(sizeof(struct xt_log_info)), .help = &help, .init = &init, .parse = &parse, Index: extensions/libip6t_LOG.man =================================================================== --- extensions/libip6t_LOG.man (revision 6554) +++ extensions/libip6t_LOG.man (working copy) @@ -1,22 +1,39 @@ Turn on kernel logging of matching packets. 
When this option is set -for a rule, the Linux kernel will print some information on all -matching packets (like most IPv6 IPv6-header fields) via the kernel log -(where it can be read with +for a rule, the Linux kernel logs this packet to all specified logging +backends. There is a syslog backend, which prints some +information on matching packets (like most IPv6 header fields) via the +kernel log (where it can be read with .I dmesg or .IR syslogd (8)). -This is a "non-terminating target", i.e. rule traversal continues at -the next rule. So if you want to LOG the packets you refuse, use two -separate rules with the same matching criteria, first using target LOG -then DROP (or REJECT). +The other backend is nfnetlink_log, which passes +packets to userspace to be handled there. Userspace processes my +subsribe to various log groups and receive the packets. A packet can be +logged to any number of backends. Default is to log only to the syslog +backend. +This is a "non-terminating target", i.e. rule traversal +continues at the next rule. So if you want to LOG the packets you +refuse, use two separate rules with the same matching criteria, first +using target LOG then DROP (or REJECT). .TP -.BI "--log-level " "level" -Level of logging (numeric or see \fIsyslog.conf\fP(5)). -.TP .BI "--log-prefix " "prefix" Prefix log messages with the specified prefix; up to 29 letters long, and useful for distinguishing messages in the logs. .TP +.B --log-no-syslog +Do not log to syslog backend. +.TP +.B Options for nfnetlink_log backend +.TP +.BI "--log-group " "group" +Log this packet to nfnetlink_log backend. Group specifies the log group for +this target. +.TP +.B Options for syslog backend +.TP +.BI "--log-level " "level" +Level of logging (numeric or see \fIsyslog.conf\fP(5)). +.TP .B --log-tcp-sequence Log TCP sequence numbers. This is a security risk if the log is readable by users. 
Index: extensions/libipt_LOG.c =================================================================== --- extensions/libipt_LOG.c (revision 6554) +++ extensions/libipt_LOG.c (working copy) @@ -6,8 +6,19 @@ #include #include #include +/* xt_LOG.h is either from the kernel (then xt_LOG target is + * available) and XT_LOG_BACKEND_* are defined). Otherwise + * xt_LOG.h from the iptables include dir is used and XT_LOG_BACKEND_ + * is not defined + */ +#include #include + +#ifndef XT_LOG_BACKEND_SYSLOG #include +#define xt_log_info ipt_log_info +#endif + #define LOG_DEFAULT_LEVEL LOG_WARNING @@ -23,18 +34,27 @@ { printf( "LOG v%s options:\n" +" --log-prefix prefix Prefix log messages with this prefix.\n" +#ifdef XT_LOG_BACKEND_SYSLOG +" --log-no-syslog Do not use syslog backend for logging\n" +" --log-group group Log to nf_netlink_log group group (0 - 65535)\n" +" Options when using syslog backend:\n" +#endif " --log-level level Level of logging (numeric or see syslog.conf)\n" -" --log-prefix prefix Prefix log messages with this prefix.\n\n" -" --log-tcp-sequence Log TCP sequence numbers.\n\n" -" --log-tcp-options Log TCP options.\n\n" -" --log-ip-options Log IP options.\n\n" +" --log-tcp-sequence Log TCP sequence numbers.\n" +" --log-tcp-options Log TCP options.\n" +" --log-ip-options Log IP options.\n" " --log-uid Log UID owning the local socket.\n\n", IPTABLES_VERSION); } static struct option opts[] = { + { .name = "log-prefix", .has_arg = 1, .flag = 0, .val = '#' }, +#ifdef XT_LOG_BACKEND_SYSLOG + { .name = "log-no-syslog", .has_arg = 0, .flag = 0, .val = 's' }, + { .name = "log-group", .has_arg = 1, .flag = 0, .val = 'g' }, +#endif { .name = "log-level", .has_arg = 1, .flag = 0, .val = '!' 
}, - { .name = "log-prefix", .has_arg = 1, .flag = 0, .val = '#' }, { .name = "log-tcp-sequence", .has_arg = 0, .flag = 0, .val = '1' }, { .name = "log-tcp-options", .has_arg = 0, .flag = 0, .val = '2' }, { .name = "log-ip-options", .has_arg = 0, .flag = 0, .val = '3' }, @@ -46,8 +66,11 @@ static void init(struct ipt_entry_target *t, unsigned int *nfcache) { - struct ipt_log_info *loginfo = (struct ipt_log_info *)t->data; + struct xt_log_info *loginfo = (struct xt_log_info *)t->data; +#ifdef XT_LOG_BACKEND_SYSLOG + loginfo->backends = XT_LOG_BACKEND_SYSLOG; +#endif loginfo->level = LOG_DEFAULT_LEVEL; } @@ -105,6 +128,11 @@ #define IPT_LOG_OPT_TCPOPT 0x08 #define IPT_LOG_OPT_IPOPT 0x10 #define IPT_LOG_OPT_UID 0x20 +/* We don't ifdef those, since they don't hurt. And we can use them in parse to + * save tons of ifdefs */ +#define IPT_LOG_OPT_SYSLOG_MASK (IPT_LOG_OPT_LEVEL | IPT_LOG_OPT_TCPSEQ | IPT_LOG_OPT_TCPOPT | IPT_LOG_OPT_IPOPT | IPT_LOG_OPT_UID) +#define IPT_LOG_OPT_NO_SYSLOG 0x40 +#define IPT_LOG_OPT_GROUP 0x80 /* Function which parses command options; returns true if it ate an option */ @@ -113,10 +141,45 @@ const struct ipt_entry *entry, struct ipt_entry_target **target) { - struct ipt_log_info *loginfo = (struct ipt_log_info *)(*target)->data; + struct xt_log_info *loginfo = (struct xt_log_info *)(*target)->data; switch (c) { +#ifdef XT_LOG_BACKEND_SYSLOG /* kernel with stackable loggers */ + unsigned group; + case 's': + if (*flags & IPT_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-no-syslog twice"); + + /* check if any of the syslog related opts has been specified */ + if (*flags & IPT_LOG_OPT_SYSLOG_MASK) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-no-syslog when using syslog only options"); + loginfo->backends &= (~XT_LOG_BACKEND_SYSLOG); + *flags |= IPT_LOG_OPT_NO_SYSLOG; + break; + + case 'g': + if (*flags & IPT_LOG_OPT_GROUP) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-group twice"); + if 
(check_inverse(optarg, &invert, NULL, 0)) + exit_error(PARAMETER_PROBLEM, + "Unexpected `!' after --log-group"); + if (string_to_number(optarg, 0, 65535, &group) < 0) + exit_error(PARAMETER_PROBLEM, + "--log-group has to be between 0 and 65535"); + loginfo->group = group; + loginfo->backends |= XT_LOG_BACKEND_NFLOG; + *flags |= IPT_LOG_OPT_GROUP; + break; +#endif +/* We can savely check for IPT_LOG_OPT_NO_SYSLOG without the need for ifdefs */ case '!': + if (*flags & IPT_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-level when not logging to syslog"); + if (*flags & IPT_LOG_OPT_LEVEL) exit_error(PARAMETER_PROBLEM, "Can't specify --log-level twice"); @@ -152,6 +215,10 @@ break; case '1': + if (*flags & IPT_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-tcp-sequence when not logging to syslog"); + if (*flags & IPT_LOG_OPT_TCPSEQ) exit_error(PARAMETER_PROBLEM, "Can't specify --log-tcp-sequence " @@ -162,6 +229,10 @@ break; case '2': + if (*flags & IPT_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-tcp-options when not logging to syslog"); + if (*flags & IPT_LOG_OPT_TCPOPT) exit_error(PARAMETER_PROBLEM, "Can't specify --log-tcp-options twice"); @@ -171,6 +242,10 @@ break; case '3': + if (*flags & IPT_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-ip-options when not logging to syslog"); + if (*flags & IPT_LOG_OPT_IPOPT) exit_error(PARAMETER_PROBLEM, "Can't specify --log-ip-options twice"); @@ -180,10 +255,14 @@ break; case '4': + if (*flags & IPT_LOG_OPT_NO_SYSLOG) + exit_error(PARAMETER_PROBLEM, + "Can't specify --log-uid when not logging to syslog"); + if (*flags & IPT_LOG_OPT_UID) exit_error(PARAMETER_PROBLEM, "Can't specify --log-uid twice"); - + loginfo->logflags |= IPT_LOG_UID; *flags |= IPT_LOG_OPT_UID; break; @@ -206,35 +285,63 @@ const struct ipt_entry_target *target, int numeric) { - const struct ipt_log_info *loginfo - = (const struct ipt_log_info 
*)target->data; + const struct xt_log_info *loginfo + = (const struct xt_log_info *)target->data; unsigned int i = 0; + /* These flag saves some ifdefs and thus make the code more readable */ + int use_syslog = 1; /* is uselog used as backend ? */ +#ifdef XT_LOG_BACKEND_SYSLOG + if (!(loginfo->backends & XT_LOG_BACKEND_SYSLOG)) + use_syslog = 0; +#endif + printf("LOG "); - if (numeric) - printf("flags %u level %u ", - loginfo->logflags, loginfo->level); + if (numeric) { +#ifdef XT_LOG_BACKEND_SYSLOG + if (loginfo->backends & XT_LOG_BACKEND_NFLOG) + printf("group %u ", loginfo->group); +#endif + + if (!use_syslog) + printf("no-syslog "); + else + printf("flags %u level %u ", + loginfo->logflags, loginfo->level); + } else { - for (i = 0; - i < sizeof(ipt_log_names) / sizeof(struct ipt_log_names); - i++) { - if (loginfo->level == ipt_log_names[i].level) { - printf("level %s ", ipt_log_names[i].name); - break; + if (use_syslog) { + for (i = 0; + i < sizeof(ipt_log_names) / sizeof(struct ipt_log_names); + i++) { + if (loginfo->level == ipt_log_names[i].level) { + printf("level %s ", ipt_log_names[i].name); + break; + } } } - if (i == sizeof(ipt_log_names) / sizeof(struct ipt_log_names)) - printf("UNKNOWN level %u ", loginfo->level); - if (loginfo->logflags & IPT_LOG_TCPSEQ) - printf("tcp-sequence "); - if (loginfo->logflags & IPT_LOG_TCPOPT) - printf("tcp-options "); - if (loginfo->logflags & IPT_LOG_IPOPT) - printf("ip-options "); - if (loginfo->logflags & IPT_LOG_UID) - printf("uid "); - if (loginfo->logflags & ~(IPT_LOG_MASK)) - printf("unknown-flags "); + +#ifdef XT_LOG_BACKEND_SYSLOG + if (loginfo->backends & XT_LOG_BACKEND_NFLOG) + printf("group %u ", loginfo->group); +#endif + if (!use_syslog) { + printf("no-syslog "); + } + else { + if (i == sizeof(ipt_log_names) / sizeof(struct ipt_log_names)) + printf("UNKNOWN level %u ", loginfo->level); + if (loginfo->logflags & IPT_LOG_TCPSEQ) + printf("tcp-sequence "); + if (loginfo->logflags & IPT_LOG_TCPOPT) + 
printf("tcp-options "); + if (loginfo->logflags & IPT_LOG_IPOPT) + printf("ip-options "); + if (loginfo->logflags & IPT_LOG_UID) + printf("uid "); + if (loginfo->logflags & ~(IPT_LOG_MASK)) + printf("unknown-flags "); + } } if (strcmp(loginfo->prefix, "") != 0) @@ -245,14 +352,21 @@ static void save(const struct ipt_ip *ip, const struct ipt_entry_target *target) { - const struct ipt_log_info *loginfo - = (const struct ipt_log_info *)target->data; - + const struct xt_log_info *loginfo + = (const struct xt_log_info *)target->data; +#ifdef XT_LOG_BACKEND_SYSLOG + if (loginfo->backends & XT_LOG_BACKEND_NFLOG) + printf("--log-group %u ", loginfo->group); + if (!(loginfo->backends & XT_LOG_BACKEND_SYSLOG)) + printf("--log-no-syslog "); +#endif + /* Don't need to check for no-syslog, since the options won't be set if + * not logging to syslog */ if (strcmp(loginfo->prefix, "") != 0) printf("--log-prefix \"%s\" ", loginfo->prefix); if (loginfo->level != LOG_DEFAULT_LEVEL) - printf("--log-level %d ", loginfo->level); + printf("--log-level %u ", loginfo->level); if (loginfo->logflags & IPT_LOG_TCPSEQ) printf("--log-tcp-sequence "); @@ -269,8 +383,9 @@ = { .name = "LOG", .version = IPTABLES_VERSION, - .size = IPT_ALIGN(sizeof(struct ipt_log_info)), - .userspacesize = IPT_ALIGN(sizeof(struct ipt_log_info)), +/* IPT_ALIGN is defined as XT_ALIGN if available */ + .size = IPT_ALIGN(sizeof(struct xt_log_info)), + .userspacesize = IPT_ALIGN(sizeof(struct xt_log_info)), .help = &help, .init = &init, .parse = &parse, Index: extensions/libipt_LOG.man =================================================================== --- extensions/libipt_LOG.man (revision 6554) +++ extensions/libipt_LOG.man (working copy) @@ -1,22 +1,39 @@ Turn on kernel logging of matching packets. 
When this option is set -for a rule, the Linux kernel will print some information on all -matching packets (like most IP header fields) via the kernel log -(where it can be read with +for a rule, the Linux kernel logs this packet to all specified logging +backends. There is a syslog backend, which prints some +information on matching packets (like most IP header fields) via the +kernel log (where it can be read with .I dmesg or .IR syslogd (8)). -This is a "non-terminating target", i.e. rule traversal continues at -the next rule. So if you want to LOG the packets you refuse, use two -separate rules with the same matching criteria, first using target LOG -then DROP (or REJECT). +The other backend is nfnetlink_log, which passes +packets to userspace to be handled there. Userspace processes my +subsribe to various log groups and receive the packets. A packet can be +logged to any number of backends. Default is to log only to the syslog +backend. +This is a "non-terminating target", i.e. rule traversal +continues at the next rule. So if you want to LOG the packets you +refuse, use two separate rules with the same matching criteria, first +using target LOG then DROP (or REJECT). .TP -.BI "--log-level " "level" -Level of logging (numeric or see \fIsyslog.conf\fP(5)). -.TP .BI "--log-prefix " "prefix" Prefix log messages with the specified prefix; up to 29 letters long, and useful for distinguishing messages in the logs. .TP +.B --log-no-syslog +Do not log to syslog backend. +.TP +.B Options for nfnetlink_log backend +.TP +.BI "--log-group " "group" +Log this packet to nfnetlink_log backend. Group specifies the log group for +this target. +.TP +.B Options for syslog backend +.TP +.BI "--log-level " "level" +Level of logging (numeric or see \fIsyslog.conf\fP(5)). +.TP .B --log-tcp-sequence Log TCP sequence numbers. This is a security risk if the log is readable by users. 
Index: Makefile =================================================================== --- Makefile (revision 6554) +++ Makefile (working copy) @@ -12,7 +12,7 @@ TOPLEVEL_INCLUDED=YES ifndef KERNEL_DIR -KERNEL_DIR=/usr/src/linux +KERNEL_DIR=/usr/src/linux-2.6.15.2 endif IPTABLES_VERSION:=1.3.5 OLD_IPTABLES_VERSION:=1.3.4 From davem at davemloft.net Fri Mar 10 00:29:34 2006 From: davem at davemloft.net (David S. Miller) Date: Fri Mar 10 00:42:46 2006 Subject: {get|set}sockopt compat layer In-Reply-To: <200603091324.00362.dim@openvz.org> References: <200603071707.19138.dim@openvz.org> <200603071605.39177.arnd@arndb.de> <200603091324.00362.dim@openvz.org> Message-ID: <20060309.152934.99760924.davem@davemloft.net> From: Dmitry Mishin Date: Thu, 9 Mar 2006 13:23:59 +0300 > Hello, Arnd! > > > For the compat_ioctl stuff, we don't have the function pointer inside an > > #ifdef, the overhead is relatively small since there is only one of these > > structures per module implementing a protocol, but it avoids having to > > rebuild everything when changing CONFIG_COMPAT. > > > > It's probably not a big issue either way, maybe davem has a stronger > > opinion on it either way. > > > Done. I think this looks fine but it doesn't apply cleanly to the current net-2.6.17 tree. Could you cook up a fresh patch, and send it with a complete changelog entry and appropriate Signed-off-by: lines? Thanks a lot Dmitry. From gyhuang at mail.ustc.edu.cn Fri Mar 10 02:41:44 2006 From: gyhuang at mail.ustc.edu.cn (GuanYao Huang) Date: Fri Mar 10 02:55:01 2006 Subject: ip6tables: Unknown error 4294967295 Message-ID: <341954904.20764@ustc.edu.cn> Hi: I am experimenting with iptables-1.3.5 and trying to use the ROUTE target, which is an extension to the current iptables. I added libip6t_ROUTE.h, which lets libip6t_ROUTE.c compile.
When using the following command: [root@localhost iptables]# /root/CNGI/iptables-1.3.5/ip6tables -A POSTROUTING -t mangle -o eth0 -p tcp --dport 22 -j ROUTE --oif iptun ip6tables: Unknown error 4294967295 I don't know why. Can you help me? Thanks. From beunlovable at gmail.com Fri Mar 10 07:28:59 2006 From: beunlovable at gmail.com (David Vogt) Date: Fri Mar 10 07:42:16 2006 Subject: libipq does not shorten package In-Reply-To: <44104FF7.1030104@trash.net> References: <859616420603090530p24d0cb9cg@mail.gmail.com> <44104FF7.1030104@trash.net> Message-ID: <859616420603092228r23e174dam@mail.gmail.com> 2006/3/9, Patrick McHardy : > David Vogt wrote: > > I am using iptables and queue packets to user space. Packets are > > either augmented (with a signature) or shortened (removal of > > signature). The augmentation works. > > However, when removing the signature, the resulting packet contains > > the correct data, but its overall size is equal to the original > > (signed) packet. (i.e. original packet size: 83, signed packet size: > > 163) > > Thomas Graf posted a patch that should fix this a couple of days > ago, check the list archives. > I haven't seen that patch, although I have been reading netfilter-devel for quite some time now. Sorry for the inconvenience. D. From davem at davemloft.net Fri Mar 10 12:13:28 2006 From: davem at davemloft.net (David S. Miller) Date: Fri Mar 10 12:26:43 2006 Subject: [NETFILTER 2.6.16]: Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG In-Reply-To: <440CA5BF.4010701@trash.net> References: <440CA5BF.4010701@trash.net> Message-ID: <20060310.031328.18995259.davem@davemloft.net> From: Patrick McHardy Date: Mon, 06 Mar 2006 22:12:31 +0100 > this patch fixes an incorrectly named option in the bridge-netfilter > Makefile. Please apply to 2.6.16. Applied, thanks Patrick. From davem at davemloft.net Fri Mar 10 12:34:53 2006 From: davem at davemloft.net (David S. 
Miller) Date: Fri Mar 10 12:48:03 2006 Subject: [PATCH] {get|set}sockopt compatibility layer In-Reply-To: <200603101421.10920.dim@openvz.org> References: <200603091324.00362.dim@openvz.org> <20060309.152934.99760924.davem@davemloft.net> <200603101421.10920.dim@openvz.org> Message-ID: <20060310.033453.53342192.davem@davemloft.net> From: Dmitry Mishin Date: Fri, 10 Mar 2006 14:21:10 +0300 > This patch extends {get|set}sockopt compatibility layer in order to move > protocol specific parts to their place and avoid huge universal net/compat.c > file in the future. > > Signed-off-by: Dmitry Mishin Applied, thanks Dmitry. Please send "-p1" format patches in the future; I fixed yours up by hand so I could feed it to git. Thanks again. From pch at coolsystems.dk Fri Mar 10 13:02:11 2006 From: pch at coolsystems.dk (Peter Christensen) Date: Fri Mar 10 13:15:22 2006 Subject: Statefull SOCKS filter In-Reply-To: <20060309104524.B75323@tempest.prismnet.com> References: <44101CA5.3070004@coolsystems.dk> <20060309104524.B75323@tempest.prismnet.com> Message-ID: <44116AC3.1050506@coolsystems.dk> Hmm, I think I'll just do the connection state maintenance manually... Hopefully I will be able to do it reasonably fast. -- Best regards Peter Christensen Developer ------------------ Cool Systems ApS Tel: +45 2888 1600 @ : pch@coolsystems.dk www: www.coolsystems.dk Allen Francom wrote: > > > Once upon a time I interacted with a project called "Hogwash". > > This was all layer 2 and seemed to be off to a great start. > > Sounds more like what you need, "transparent". > > The maintainer resigned, however the code ran, based on > Snort and associated libraries. > > With a lot of help from others, I made a binding > for these rules into IPTables via the QUEUE target... but > that wasn't all that clean. Maybe skip the IPTables > entirely, and "do like hogwash did". > > 2 cents... 
> > On Thu, 9 Mar 2006, Peter Christensen wrote: >> I'm currently in the development of a transparent firewall bridge, >> whose sole purpose is to filter out everything but LAN traffic and >> traffic for a list of privileged servers on the Internet. Since it is >> meant to work on a bunch of different network configurations >> out-of-box, it must be able to detect and filter proxy traffic as well. From dim at openvz.org Fri Mar 10 12:21:10 2006 From: dim at openvz.org (Dmitry Mishin) Date: Fri Mar 10 14:08:52 2006 Subject: [PATCH] {get|set}sockopt compatibility layer In-Reply-To: <20060309.152934.99760924.davem@davemloft.net> References: <200603071707.19138.dim@openvz.org> <200603091324.00362.dim@openvz.org> <20060309.152934.99760924.davem@davemloft.net> Message-ID: <200603101421.10920.dim@openvz.org> This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin -- Thanks, Dmitry. 
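The dispatch order the patch description refers to can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: struct demo_proto_ops, DEMO_SOL_SOCKET and all demo_* functions are hypothetical stand-ins, the handlers return marker values instead of doing real work, and the socket lookup and security hook are left out. It shows the order of checks: reject a negative length, handle the socket level generically, prefer a protocol's compat handler when it provides one, otherwise fall back to the native handler.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative value only; the real SOL_SOCKET comes from kernel headers. */
#define DEMO_SOL_SOCKET 1

/* Hypothetical, stripped-down stand-in for struct proto_ops. */
struct demo_proto_ops {
    int (*setsockopt)(int level, int optname, const void *optval, int optlen);
    /* Optional: NULL when the protocol needs no 32-bit translation. */
    int (*compat_setsockopt)(int level, int optname, const void *optval, int optlen);
};

/* Marker return values make it visible which handler was chosen. */
static int demo_generic_setsockopt(int level, int optname, const void *optval, int optlen)
{
    (void)level; (void)optname; (void)optval; (void)optlen;
    return 100; /* socket-level path */
}

static int demo_native_setsockopt(int level, int optname, const void *optval, int optlen)
{
    (void)level; (void)optname; (void)optval; (void)optlen;
    return 200; /* protocol's native handler */
}

static int demo_compat_setsockopt(int level, int optname, const void *optval, int optlen)
{
    (void)level; (void)optname; (void)optval; (void)optlen;
    return 300; /* protocol's compat handler */
}

/* Sketch of the dispatch order, minus socket lookup and security checks. */
static int demo_compat_sys_setsockopt(const struct demo_proto_ops *ops, int level,
                                      int optname, const void *optval, int optlen)
{
    if (optlen < 0)
        return -EINVAL;
    if (level == DEMO_SOL_SOCKET)
        return demo_generic_setsockopt(level, optname, optval, optlen);
    if (ops->compat_setsockopt)
        return ops->compat_setsockopt(level, optname, optval, optlen);
    return ops->setsockopt(level, optname, optval, optlen);
}
```

With an ops table that fills in both pointers, a non-socket level reaches the compat handler; with compat_setsockopt left NULL the same call reaches the native handler, which is why protocols that need no translation do not have to be touched.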
-------------- next part -------------- --- ./include/linux/net.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/linux/net.h 2006-03-10 12:24:11.000000000 +0300 @@ -149,6 +149,10 @@ struct proto_ops { int optname, char __user *optval, int optlen); int (*getsockopt)(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int optlen); + int (*compat_getsockopt)(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen); int (*sendmsg) (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t total_len); int (*recvmsg) (struct kiocb *iocb, struct socket *sock, --- ./include/linux/netfilter.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/linux/netfilter.h 2006-03-10 12:24:11.000000000 +0300 @@ -80,10 +80,14 @@ struct nf_sockopt_ops int set_optmin; int set_optmax; int (*set)(struct sock *sk, int optval, void __user *user, unsigned int len); + int (*compat_set)(struct sock *sk, int optval, + void __user *user, unsigned int len); int get_optmin; int get_optmax; int (*get)(struct sock *sk, int optval, void __user *user, int *len); + int (*compat_get)(struct sock *sk, int optval, + void __user *user, int *len); /* Number of users inside set() or get(). 
*/ unsigned int use; @@ -246,6 +250,11 @@ int nf_setsockopt(struct sock *sk, int p int nf_getsockopt(struct sock *sk, int pf, int optval, char __user *opt, int *len); +int compat_nf_setsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int len); +int compat_nf_getsockopt(struct sock *sk, int pf, int optval, + char __user *opt, int *len); + /* Packet queuing */ struct nf_queue_handler { int (*outfn)(struct sk_buff *skb, struct nf_info *info, --- ./include/net/inet_connection_sock.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/net/inet_connection_sock.h 2006-03-10 12:24:11.000000000 +0300 @@ -50,6 +50,12 @@ struct inet_connection_sock_af_ops { char __user *optval, int optlen); int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt)(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); + int (*compat_getsockopt)(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); void (*addr2sockaddr)(struct sock *sk, struct sockaddr *); int sockaddr_len; }; --- ./include/net/ip.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/net/ip.h 2006-03-10 12:24:11.000000000 +0300 @@ -356,6 +356,10 @@ extern void ip_cmsg_recv(struct msghdr * extern int ip_cmsg_send(struct msghdr *msg, struct ipcm_cookie *ipc); extern int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); extern int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); +extern int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen); +extern int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen); extern int ip_ra_control(struct sock *sk, unsigned char on, void (*destructor)(struct sock *)); extern int ip_recv_error(struct sock *sk, struct msghdr *msg, int len); --- 
./include/net/ipv6.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/net/ipv6.h 2006-03-10 13:16:18.000000000 +0300 @@ -520,6 +520,16 @@ extern int ipv6_getsockopt(struct sock int optname, char __user *optval, int __user *optlen); +extern int compat_ipv6_setsockopt(struct sock *sk, + int level, + int optname, + char __user *optval, + int optlen); +extern int compat_ipv6_getsockopt(struct sock *sk, + int level, + int optname, + char __user *optval, + int __user *optlen); extern void ipv6_packet_init(void); --- ./include/net/sctp/structs.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/net/sctp/structs.h 2006-03-10 12:24:11.000000000 +0300 @@ -514,6 +514,16 @@ struct sctp_af { int optname, char __user *optval, int __user *optlen); + int (*compat_setsockopt) (struct sock *sk, + int level, + int optname, + char __user *optval, + int optlen); + int (*compat_getsockopt) (struct sock *sk, + int level, + int optname, + char __user *optval, + int __user *optlen); struct dst_entry *(*get_dst) (struct sctp_association *asoc, union sctp_addr *daddr, union sctp_addr *saddr); --- ./include/net/sock.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/net/sock.h 2006-03-10 12:24:11.000000000 +0300 @@ -520,6 +520,14 @@ struct proto { int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *option); + int (*compat_setsockopt)(struct sock *sk, + int level, + int optname, char __user *optval, + int optlen); + int (*compat_getsockopt)(struct sock *sk, + int level, + int optname, char __user *optval, + int __user *option); int (*sendmsg)(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len); int (*recvmsg)(struct kiocb *iocb, struct sock *sk, @@ -816,6 +824,10 @@ extern int sock_common_recvmsg(struct ki struct msghdr *msg, size_t size, int flags); extern int sock_common_setsockopt(struct socket *sock, int level, int optname, char __user *optval, int optlen); +extern int compat_sock_common_getsockopt(struct 
socket *sock, int level, + int optname, char __user *optval, int __user *optlen); +extern int compat_sock_common_setsockopt(struct socket *sock, int level, + int optname, char __user *optval, int optlen); extern void sk_common_release(struct sock *sk); --- ./include/net/tcp.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./include/net/tcp.h 2006-03-10 12:24:11.000000000 +0300 @@ -352,6 +352,12 @@ extern int tcp_getsockopt(struct sock extern int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); +extern int compat_tcp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_tcp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); extern void tcp_set_keepalive(struct sock *sk, int val); extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, --- ./net/compat.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/compat.c 2006-03-10 12:24:11.000000000 +0300 @@ -416,7 +416,7 @@ struct compat_sock_fprog { compat_uptr_t filter; /* struct sock_filter * */ }; -static int do_set_attach_filter(int fd, int level, int optname, +static int do_set_attach_filter(struct socket *sock, int level, int optname, char __user *optval, int optlen) { struct compat_sock_fprog __user *fprog32 = (struct compat_sock_fprog __user *)optval; @@ -432,11 +432,12 @@ static int do_set_attach_filter(int fd, __put_user(compat_ptr(ptr), &kfprog->filter)) return -EFAULT; - return sys_setsockopt(fd, level, optname, (char __user *)kfprog, + return sock_setsockopt(sock, level, optname, (char __user *)kfprog, sizeof(struct sock_fprog)); } -static int do_set_sock_timeout(int fd, int level, int optname, char __user *optval, int optlen) +static int do_set_sock_timeout(struct socket *sock, int level, + int optname, char __user *optval, int optlen) { struct compat_timeval __user *up = (struct compat_timeval __user *) optval; struct timeval ktime; @@ 
-451,30 +452,61 @@ static int do_set_sock_timeout(int fd, i return -EFAULT; old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_setsockopt(fd, level, optname, (char *) &ktime, sizeof(ktime)); + err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime)); set_fs(old_fs); return err; } +static int compat_sock_setsockopt(struct socket *sock, int level, int optname, + char __user *optval, int optlen) +{ + if (optname == SO_ATTACH_FILTER) + return do_set_attach_filter(sock, level, optname, + optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_set_sock_timeout(sock, level, optname, optval, optlen); + + return sock_setsockopt(sock, level, optname, optval, optlen); +} + asmlinkage long compat_sys_setsockopt(int fd, int level, int optname, char __user *optval, int optlen) { + int err; + struct socket *sock; + /* SO_SET_REPLACE seems to be the same in all levels */ if (optname == IPT_SO_SET_REPLACE) return do_netfilter_replace(fd, level, optname, optval, optlen); - if (level == SOL_SOCKET && optname == SO_ATTACH_FILTER) - return do_set_attach_filter(fd, level, optname, - optval, optlen); - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_set_sock_timeout(fd, level, optname, optval, optlen); - return sys_setsockopt(fd, level, optname, optval, optlen); + if (optlen < 0) + return -EINVAL; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_setsockopt(sock,level,optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_setsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_setsockopt) + err = sock->ops->compat_setsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->setsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; } -static int do_get_sock_timeout(int fd, int level, int optname, +static int do_get_sock_timeout(struct socket 
*sock, int level, int optname, char __user *optval, int __user *optlen) { struct compat_timeval __user *up; @@ -490,7 +522,7 @@ static int do_get_sock_timeout(int fd, i len = sizeof(ktime); old_fs = get_fs(); set_fs(KERNEL_DS); - err = sys_getsockopt(fd, level, optname, (char *) &ktime, &len); + err = sock_getsockopt(sock, level, optname, (char *) &ktime, &len); set_fs(old_fs); if (!err) { @@ -503,15 +535,42 @@ static int do_get_sock_timeout(int fd, i return err; } -asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, +static int compat_sock_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { - if (level == SOL_SOCKET && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_get_sock_timeout(fd, level, optname, optval, optlen); - return sys_getsockopt(fd, level, optname, optval, optlen); + if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO) + return do_get_sock_timeout(sock, level, optname, optval, optlen); + return sock_getsockopt(sock, level, optname, optval, optlen); } +asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + struct socket *sock; + + if ((sock = sockfd_lookup(fd, &err))!=NULL) + { + err = security_socket_getsockopt(sock, level, + optname); + if (err) { + sockfd_put(sock); + return err; + } + + if (level == SOL_SOCKET) + err = compat_sock_getsockopt(sock, level, + optname, optval, optlen); + else if (sock->ops->compat_getsockopt) + err = sock->ops->compat_getsockopt(sock, level, + optname, optval, optlen); + else + err = sock->ops->getsockopt(sock, level, + optname, optval, optlen); + sockfd_put(sock); + } + return err; +} /* Argument list sizes for compat_sys_socketcall */ #define AL(x) ((x) * sizeof(u32)) static unsigned char nas[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), --- ./net/core/sock.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/core/sock.c 2006-03-10 12:24:11.000000000 +0300 @@ 
-1385,6 +1385,20 @@ int sock_common_getsockopt(struct socket EXPORT_SYMBOL(sock_common_getsockopt); +#ifdef CONFIG_COMPAT +int compat_sock_common_getsockopt(struct socket *sock, int level, + int optname, char __user *optval, int __user *optlen) +{ + struct sock *sk = sock->sk; + + if (sk->sk_prot->compat_getsockopt) + return sk->sk_prot->compat_getsockopt(sk, level, + optname, optval, optlen); + return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL(compat_sock_common_getsockopt); +#endif + int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t size, int flags) { @@ -1414,6 +1428,20 @@ int sock_common_setsockopt(struct socket EXPORT_SYMBOL(sock_common_setsockopt); +#ifdef CONFIG_COMPAT +int compat_sock_common_setsockopt(struct socket *sock, + int level, int optname, char __user *optval, int optlen) +{ + struct sock *sk = sock->sk; + + if (sk->sk_prot->compat_setsockopt) + return sk->sk_prot->compat_setsockopt(sk, level, + optname, optval, optlen); + return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL(compat_sock_common_setsockopt); +#endif + void sk_common_release(struct sock *sk) { if (sk->sk_prot->destroy) --- ./net/dccp/dccp.h.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/dccp/dccp.h 2006-03-10 12:24:11.000000000 +0300 @@ -192,6 +192,14 @@ extern int dccp_getsockopt(struct soc char __user *optval, int __user *optlen); extern int dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen); +#ifdef CONFIG_COMPAT +extern int compat_dccp_getsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int __user *optlen); +extern int compat_dccp_setsockopt(struct sock *sk, + int level, int optname, + char __user *optval, int optlen); +#endif extern int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size); --- 
./net/dccp/ipv4.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/dccp/ipv4.c 2006-03-10 12:30:33.000000000 +0300 @@ -994,6 +994,10 @@ static struct inet_connection_sock_af_op .net_header_len = sizeof(struct iphdr), .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), }; @@ -1040,6 +1044,10 @@ static struct proto dccp_v4_prot = { .init = dccp_v4_init_sock, .setsockopt = dccp_setsockopt, .getsockopt = dccp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v4_do_rcv, @@ -1079,6 +1087,10 @@ static const struct proto_ops inet_dccp_ .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/dccp/ipv6.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/dccp/ipv6.c 2006-03-10 13:18:04.000000000 +0300 @@ -1114,6 +1114,10 @@ static struct inet_connection_sock_af_op .net_header_len = sizeof(struct ipv6hdr), .setsockopt = ipv6_setsockopt, .getsockopt = ipv6_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6) }; @@ -1130,6 +1134,10 @@ static struct inet_connection_sock_af_op .net_header_len = sizeof(struct iphdr), .setsockopt = ipv6_setsockopt, .getsockopt = ipv6_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = 
compat_ipv6_getsockopt, +#endif .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6) }; @@ -1167,6 +1175,10 @@ static struct proto dccp_v6_prot = { .init = dccp_v6_init_sock, .setsockopt = dccp_setsockopt, .getsockopt = dccp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_dccp_setsockopt, + .compat_getsockopt = compat_dccp_getsockopt, +#endif .sendmsg = dccp_sendmsg, .recvmsg = dccp_recvmsg, .backlog_rcv = dccp_v6_do_rcv, @@ -1204,6 +1216,10 @@ static struct proto_ops inet6_dccp_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/dccp/proto.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/dccp/proto.c 2006-03-10 12:24:11.000000000 +0300 @@ -455,18 +455,13 @@ out_free_val: goto out; } -int dccp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, int optlen) +static int do_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { struct dccp_sock *dp; int err; int val; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, - optname, optval, - optlen); - if (optlen < sizeof(int)) return -EINVAL; @@ -512,8 +507,34 @@ int dccp_setsockopt(struct sock *sk, int return err; } +int dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->setsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} EXPORT_SYMBOL_GPL(dccp_setsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_DCCP) { + if 
(inet_csk(sk)->icsk_af_ops->compat_setsockopt) + return inet_csk(sk)->icsk_af_ops->compat_setsockopt(sk, + level, optname, optval, optlen); + else + return inet_csk(sk)->icsk_af_ops->setsockopt(sk, + level, optname, optval, optlen); + } + return do_dccp_setsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL_GPL(compat_dccp_setsockopt); +#endif + static int dccp_getsockopt_service(struct sock *sk, int len, __be32 __user *optval, int __user *optlen) @@ -545,16 +566,12 @@ out: return err; } -int dccp_getsockopt(struct sock *sk, int level, int optname, +static int do_dccp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { struct dccp_sock *dp; int val, len; - if (level != SOL_DCCP) - return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, - optname, optval, - optlen); if (get_user(len, optlen)) return -EFAULT; @@ -587,8 +604,34 @@ int dccp_getsockopt(struct sock *sk, int return 0; } +int dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, level, + optname, optval, + optlen); + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} EXPORT_SYMBOL_GPL(dccp_getsockopt); +#ifdef CONFIG_COMPAT +int compat_dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_DCCP) { + if (inet_csk(sk)->icsk_af_ops->compat_getsockopt) + return inet_csk(sk)->icsk_af_ops->compat_getsockopt(sk, + level, optname, optval, optlen); + else + return inet_csk(sk)->icsk_af_ops->getsockopt(sk, + level, optname, optval, optlen); + } + return do_dccp_getsockopt(sk, level, optname, optval, optlen); +} +EXPORT_SYMBOL_GPL(compat_dccp_getsockopt); +#endif + int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len) { --- ./net/ipv4/af_inet.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv4/af_inet.c 2006-03-10 
12:24:11.000000000 +0300 @@ -802,6 +802,10 @@ const struct proto_ops inet_stream_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -823,6 +827,10 @@ const struct proto_ops inet_dgram_ops = .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, @@ -848,6 +856,10 @@ static const struct proto_ops inet_sockr .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, --- ./net/ipv4/ip_sockglue.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv4/ip_sockglue.c 2006-03-10 12:24:11.000000000 +0300 @@ -380,14 +380,12 @@ out: * an IP socket. 
*/ -int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) +static int do_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct inet_sock *inet = inet_sk(sk); int val=0,err; - if (level != SOL_IP) - return -ENOPROTOOPT; - if (((1< (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = nf_setsockopt(sk, PF_INET, optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + int err; + + if (level != SOL_IP) + return -ENOPROTOOPT; + + err = do_ip_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_HDRINCL && + optname != IP_IPSEC_POLICY && optname != IP_XFRM_POLICY +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > (MRT_BASE + 10)) +#endif + ) { + lock_sock(sk); + err = compat_nf_setsockopt(sk, PF_INET, + optname, optval, optlen); + release_sock(sk); + } +#endif + return err; +} +#endif + /* * Get the options. Note for future reference. The GET of IP options gets the * _received_ ones. The set sets the _sent_ ones. 
*/ -int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) +static int do_ip_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) { struct inet_sock *inet = inet_sk(sk); int val; @@ -1051,17 +1098,8 @@ int ip_getsockopt(struct sock *sk, int l val = inet->freebind; break; default: -#ifdef CONFIG_NETFILTER - val = nf_getsockopt(sk, PF_INET, optname, optval, - &len); - release_sock(sk); - if (val >= 0) - val = put_user(len, optlen); - return val; -#else release_sock(sk); return -ENOPROTOOPT; -#endif } release_sock(sk); @@ -1082,7 +1120,73 @@ int ip_getsockopt(struct sock *sk, int l return 0; } +int ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = nf_getsockopt(sk, PF_INET, optname, optval, + &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); + return err; + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ip_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + int err; + + err = do_ip_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS +#ifdef CONFIG_IP_MROUTE + && (optname < MRT_BASE || optname > MRT_BASE+10) +#endif + ) { + int len; + + if(get_user(len,optlen)) + return -EFAULT; + + lock_sock(sk); + err = compat_nf_getsockopt(sk, PF_INET, + optname, optval, &len); + release_sock(sk); + if (err >= 0) + err = 
put_user(len, optlen); + return err; + } +#endif + return err; +} +#endif + EXPORT_SYMBOL(ip_cmsg_recv); EXPORT_SYMBOL(ip_getsockopt); EXPORT_SYMBOL(ip_setsockopt); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_ip_getsockopt); +EXPORT_SYMBOL(compat_ip_setsockopt); +#endif --- ./net/ipv4/raw.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv4/raw.c 2006-03-10 12:24:11.000000000 +0300 @@ -660,12 +660,9 @@ static int raw_geticmpfilter(struct sock out: return ret; } -static int raw_setsockopt(struct sock *sk, int level, int optname, +static int do_raw_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { - if (level != SOL_RAW) - return ip_setsockopt(sk, level, optname, optval, optlen); - if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -675,12 +672,28 @@ static int raw_setsockopt(struct sock *s return -ENOPROTOOPT; } -static int raw_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen) +static int raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { if (level != SOL_RAW) - return ip_getsockopt(sk, level, optname, optval, optlen); + return ip_setsockopt(sk, level, optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#ifdef CONFIG_COMPAT +static int compat_raw_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + if (level != SOL_RAW) + return compat_ip_setsockopt(sk, level, + optname, optval, optlen); + return do_raw_setsockopt(sk, level, optname, optval, optlen); +} +#endif + +static int do_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ if (optname == ICMP_FILTER) { if (inet_sk(sk)->num != IPPROTO_ICMP) return -EOPNOTSUPP; @@ -690,6 +703,25 @@ static int raw_getsockopt(struct sock *s return -ENOPROTOOPT; } +static int raw_getsockopt(struct sock *sk, int level, int optname, + 
char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return ip_getsockopt(sk, level, optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_raw_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + if (level != SOL_RAW) + return compat_ip_getsockopt(sk, level, + optname, optval, optlen); + return do_raw_getsockopt(sk, level, optname, optval, optlen); +} +#endif + static int raw_ioctl(struct sock *sk, int cmd, unsigned long arg) { switch (cmd) { @@ -728,6 +760,10 @@ struct proto raw_prot = { .init = raw_init, .setsockopt = raw_setsockopt, .getsockopt = raw_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_raw_setsockopt, + .compat_getsockopt = compat_raw_getsockopt, +#endif .sendmsg = raw_sendmsg, .recvmsg = raw_recvmsg, .bind = raw_bind, --- ./net/ipv4/tcp.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv4/tcp.c 2006-03-10 12:24:11.000000000 +0300 @@ -1687,18 +1687,14 @@ int tcp_disconnect(struct sock *sk, int /* * Socket option code for TCP. 
*/ -int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, - int optlen) +static int do_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); int val; int err = 0; - if (level != SOL_TCP) - return icsk->icsk_af_ops->setsockopt(sk, level, optname, - optval, optlen); - /* This is a string value all the others are int's */ if (optname == TCP_CONGESTION) { char name[TCP_CA_NAME_MAX]; @@ -1871,6 +1867,35 @@ int tcp_setsockopt(struct sock *sk, int return err; } +int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, + int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->setsockopt(sk, level, optname, + optval, optlen); + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_setsockopt(struct sock *sk, int level, + int optname, char __user *optval, int optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) { + if (icsk->icsk_af_ops->compat_setsockopt) + return icsk->icsk_af_ops->compat_setsockopt(sk, + level, optname, optval, optlen); + else + return icsk->icsk_af_ops->setsockopt(sk, + level, optname, optval, optlen); + } + return do_tcp_setsockopt(sk, level, optname, optval, optlen); +} +#endif + /* Return information about state of tcp endpoint in API format. 
*/ void tcp_get_info(struct sock *sk, struct tcp_info *info) { @@ -1931,17 +1956,13 @@ void tcp_get_info(struct sock *sk, struc EXPORT_SYMBOL_GPL(tcp_get_info); -int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, - int __user *optlen) +static int do_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); int val, len; - if (level != SOL_TCP) - return icsk->icsk_af_ops->getsockopt(sk, level, optname, - optval, optlen); - if (get_user(len, optlen)) return -EFAULT; @@ -2025,6 +2046,34 @@ int tcp_getsockopt(struct sock *sk, int return 0; } +int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, + int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) + return icsk->icsk_af_ops->getsockopt(sk, level, optname, + optval, optlen); + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +int compat_tcp_getsockopt(struct sock *sk, int level, + int optname, char __user *optval, int __user *optlen) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + if (level != SOL_TCP) { + if (icsk->icsk_af_ops->compat_getsockopt) + return icsk->icsk_af_ops->compat_getsockopt(sk, + level, optname, optval, optlen); + else + return icsk->icsk_af_ops->getsockopt(sk, + level, optname, optval, optlen); + } + return do_tcp_getsockopt(sk, level, optname, optval, optlen); +} +#endif extern void __skb_cb_too_small_for_tcp(int, int); extern struct tcp_congestion_ops tcp_reno; @@ -2142,3 +2191,7 @@ EXPORT_SYMBOL(tcp_sendpage); EXPORT_SYMBOL(tcp_setsockopt); EXPORT_SYMBOL(tcp_shutdown); EXPORT_SYMBOL(tcp_statistics); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_tcp_setsockopt); +EXPORT_SYMBOL(compat_tcp_getsockopt); +#endif --- ./net/ipv4/tcp_ipv4.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv4/tcp_ipv4.c 2006-03-10 
12:24:11.000000000 +0300 @@ -1226,6 +1226,10 @@ struct inet_connection_sock_af_ops ipv4_ .net_header_len = sizeof(struct iphdr), .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ip_setsockopt, + .compat_getsockopt = compat_ip_getsockopt, +#endif .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), }; @@ -1808,6 +1812,10 @@ struct proto tcp_prot = { .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_tcp_setsockopt, + .compat_getsockopt = compat_tcp_getsockopt, +#endif .sendmsg = tcp_sendmsg, .recvmsg = tcp_recvmsg, .backlog_rcv = tcp_v4_do_rcv, --- ./net/ipv4/udp.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv4/udp.c 2006-03-10 12:24:11.000000000 +0300 @@ -1207,16 +1207,13 @@ static int udp_destroy_sock(struct sock /* * Socket option code for UDP */ -static int udp_setsockopt(struct sock *sk, int level, int optname, +static int do_udp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct udp_sock *up = udp_sk(sk); int val; int err = 0; - if (level != SOL_UDP) - return ip_setsockopt(sk, level, optname, optval, optlen); - if(optlensk_type != SOCK_RAW) - return udp_prot.setsockopt(sk, level, optname, optval, optlen); - - if(level!=SOL_IPV6) - goto out; - if (optval == NULL) val=0; else if (get_user(val, (int __user *) optval)) @@ -613,17 +607,9 @@ done: retv = xfrm_user_policy(sk, optname, optval, optlen); break; -#ifdef CONFIG_NETFILTER - default: - retv = nf_setsockopt(sk, PF_INET6, optname, optval, - optlen); - break; -#endif - } release_sock(sk); -out: return retv; e_inval: @@ -631,6 +617,65 @@ e_inval: return -EINVAL; } +int ipv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) + return udp_prot.setsockopt(sk, level, optname, optval, 
optlen); + + if (level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_IPSEC_POLICY && + optname != IPV6_XFRM_POLICY) { + lock_sock(sk); + err = nf_setsockopt(sk, PF_INET6, optname, optval, + optlen); + release_sock(sk); + } +#endif + return err; +} + + +#ifdef CONFIG_COMPAT +int compat_ipv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) { + if (udp_prot.compat_setsockopt) + return udp_prot.compat_setsockopt(sk, level, + optname, optval, optlen); + else + return udp_prot.setsockopt(sk, level, + optname, optval, optlen); + } + + if (level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_setsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible ENOPROTOOPTs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_IPSEC_POLICY && + optname != IPV6_XFRM_POLICY) { + lock_sock(sk); + err = compat_nf_setsockopt(sk, PF_INET6, optname, optval, + optlen); + release_sock(sk); + } +#endif + return err; +} +#endif + static int ipv6_getsockopt_sticky(struct sock *sk, struct ipv6_opt_hdr *hdr, char __user *optval, int len) { @@ -642,17 +687,13 @@ static int ipv6_getsockopt_sticky(struct return len; } -int ipv6_getsockopt(struct sock *sk, int level, int optname, +static int do_ipv6_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen) { struct ipv6_pinfo *np = inet6_sk(sk); int len; int val; - if (level == SOL_IP && sk->sk_type != SOCK_RAW) - return udp_prot.getsockopt(sk, level, optname, optval, optlen); - if(level!=SOL_IPV6) - return -ENOPROTOOPT; if (get_user(len, optlen)) return -EFAULT; switch (optname) { @@ -842,17 +883,7 @@ int ipv6_getsockopt(struct sock *sk, int break; 
default: -#ifdef CONFIG_NETFILTER - lock_sock(sk); - val = nf_getsockopt(sk, PF_INET6, optname, optval, - &len); - release_sock(sk); - if (val >= 0) - val = put_user(len, optlen); - return val; -#else return -EINVAL; -#endif } len = min_t(unsigned int, sizeof(int), len); if(put_user(len, optlen)) @@ -862,6 +893,78 @@ int ipv6_getsockopt(struct sock *sk, int return 0; } +int ipv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) + return udp_prot.getsockopt(sk, level, optname, optval, optlen); + + if(level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible EINVALs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_ADDRFORM && + optname != MCAST_MSFILTER) { + int len; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = nf_getsockopt(sk, PF_INET6, optname, optval, + &len); + release_sock(sk); + if (err >= 0) + err = put_user(len, optlen); + } +#endif + return err; +} + +#ifdef CONFIG_COMPAT +int compat_ipv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + int err; + + if (level == SOL_IP && sk->sk_type != SOCK_RAW) { + if (udp_prot.compat_getsockopt) + return udp_prot.compat_getsockopt(sk, level, + optname, optval, optlen); + else + return udp_prot.getsockopt(sk, level, + optname, optval, optlen); + } + + if(level != SOL_IPV6) + return -ENOPROTOOPT; + + err = do_ipv6_getsockopt(sk, level, optname, optval, optlen); +#ifdef CONFIG_NETFILTER + /* we need to exclude all possible EINVALs except default case */ + if (err == -ENOPROTOOPT && optname != IPV6_ADDRFORM && + optname != MCAST_MSFILTER) { + int len; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = compat_nf_getsockopt(sk, PF_INET6, optname, optval, + &len); + release_sock(sk); + 
if (err >= 0) + err = put_user(len, optlen); + } +#endif + return err; +} +#endif + void __init ipv6_packet_init(void) { dev_add_pack(&ipv6_packet_type); --- ./net/ipv6/ipv6_syms.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv6/ipv6_syms.c 2006-03-10 13:16:56.000000000 +0300 @@ -18,6 +18,10 @@ EXPORT_SYMBOL(ip6_route_output); EXPORT_SYMBOL(addrconf_lock); EXPORT_SYMBOL(ipv6_setsockopt); EXPORT_SYMBOL(ipv6_getsockopt); +#ifdef CONFIG_COMPAT +EXPORT_SYMBOL(compat_ipv6_setsockopt); +EXPORT_SYMBOL(compat_ipv6_getsockopt); +#endif EXPORT_SYMBOL(inet6_register_protosw); EXPORT_SYMBOL(inet6_unregister_protosw); EXPORT_SYMBOL(inet6_add_protocol); --- ./net/ipv6/raw.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv6/raw.c 2006-03-10 13:34:28.000000000 +0300 @@ -859,29 +859,12 @@ static int rawv6_geticmpfilter(struct so } -static int rawv6_setsockopt(struct sock *sk, int level, int optname, +static int do_rawv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct raw6_sock *rp = raw6_sk(sk); int val; - switch(level) { - case SOL_RAW: - break; - - case SOL_ICMPV6: - if (inet_sk(sk)->num != IPPROTO_ICMPV6) - return -EOPNOTSUPP; - return rawv6_seticmpfilter(sk, level, optname, optval, - optlen); - case SOL_IPV6: - if (optname == IPV6_CHECKSUM) - break; - default: - return ipv6_setsockopt(sk, level, optname, optval, - optlen); - }; - if (get_user(val, (int __user *)optval)) return -EFAULT; @@ -906,12 +889,9 @@ static int rawv6_setsockopt(struct sock } } -static int rawv6_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen) +static int rawv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) { - struct raw6_sock *rp = raw6_sk(sk); - int val, len; - switch(level) { case SOL_RAW: break; @@ -919,15 +899,47 @@ static int rawv6_getsockopt(struct sock case SOL_ICMPV6: if (inet_sk(sk)->num != IPPROTO_ICMPV6) return -EOPNOTSUPP; - return 
rawv6_geticmpfilter(sk, level, optname, optval, + return rawv6_seticmpfilter(sk, level, optname, optval, optlen); case SOL_IPV6: if (optname == IPV6_CHECKSUM) break; default: - return ipv6_getsockopt(sk, level, optname, optval, + return ipv6_setsockopt(sk, level, optname, optval, optlen); }; + return do_rawv6_setsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_rawv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, int optlen) +{ + switch(level) { + case SOL_RAW: + break; + + case SOL_ICMPV6: + if (inet_sk(sk)->num != IPPROTO_ICMPV6) + return -EOPNOTSUPP; + return rawv6_seticmpfilter(sk, level, optname, optval, + optlen); + case SOL_IPV6: + if (optname == IPV6_CHECKSUM) + break; + default: + return compat_ipv6_setsockopt(sk, level, + optname, optval, optlen); + }; + return do_rawv6_setsockopt(sk, level, optname, optval, optlen); +} +#endif + +static int do_rawv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + struct raw6_sock *rp = raw6_sk(sk); + int val, len; if (get_user(len,optlen)) return -EFAULT; @@ -953,6 +965,52 @@ static int rawv6_getsockopt(struct sock return 0; } +static int rawv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + switch(level) { + case SOL_RAW: + break; + + case SOL_ICMPV6: + if (inet_sk(sk)->num != IPPROTO_ICMPV6) + return -EOPNOTSUPP; + return rawv6_geticmpfilter(sk, level, optname, optval, + optlen); + case SOL_IPV6: + if (optname == IPV6_CHECKSUM) + break; + default: + return ipv6_getsockopt(sk, level, optname, optval, + optlen); + }; + return do_rawv6_getsockopt(sk, level, optname, optval, optlen); +} + +#ifdef CONFIG_COMPAT +static int compat_rawv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen) +{ + switch(level) { + case SOL_RAW: + break; + + case SOL_ICMPV6: + if (inet_sk(sk)->num != IPPROTO_ICMPV6) + return 
-EOPNOTSUPP; + return rawv6_geticmpfilter(sk, level, optname, optval, + optlen); + case SOL_IPV6: + if (optname == IPV6_CHECKSUM) + break; + default: + return compat_ipv6_getsockopt(sk, level, + optname, optval, optlen); + }; + return do_rawv6_getsockopt(sk, level, optname, optval, optlen); +} +#endif + static int rawv6_ioctl(struct sock *sk, int cmd, unsigned long arg) { switch(cmd) { @@ -1008,6 +1066,10 @@ struct proto rawv6_prot = { .destroy = inet6_destroy_sock, .setsockopt = rawv6_setsockopt, .getsockopt = rawv6_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_rawv6_setsockopt, + .compat_getsockopt = compat_rawv6_getsockopt, +#endif .sendmsg = rawv6_sendmsg, .recvmsg = rawv6_recvmsg, .bind = rawv6_bind, --- ./net/ipv6/tcp_ipv6.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv6/tcp_ipv6.c 2006-03-10 13:19:49.000000000 +0300 @@ -1308,6 +1308,10 @@ static struct inet_connection_sock_af_op .setsockopt = ipv6_setsockopt, .getsockopt = ipv6_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6) }; @@ -1327,6 +1331,10 @@ static struct inet_connection_sock_af_op .setsockopt = ipv6_setsockopt, .getsockopt = ipv6_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_ipv6_setsockopt, + .compat_getsockopt = compat_ipv6_getsockopt, +#endif .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6) }; @@ -1566,6 +1574,10 @@ struct proto tcpv6_prot = { .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_tcp_setsockopt, + .compat_getsockopt = compat_tcp_getsockopt, +#endif .sendmsg = tcp_sendmsg, .recvmsg = tcp_recvmsg, .backlog_rcv = tcp_v6_do_rcv, --- ./net/ipv6/udp.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/ipv6/udp.c 2006-03-10 13:26:47.000000000 +0300 
@@ -880,16 +880,13 @@ static int udpv6_destroy_sock(struct soc /* * Socket option code for UDP */ -static int udpv6_setsockopt(struct sock *sk, int level, int optname, +static int do_udpv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { struct udp_sock *up = udp_sk(sk); int val; int err = 0; - if (level != SOL_UDP) - return ipv6_setsockopt(sk, level, optname, optval, optlen); - if(optlenpf == pf) { + if (get) { + if (val >= ops->get_optmin + && val < ops->get_optmax) { + ops->use++; + mutex_unlock(&nf_sockopt_mutex); + if (ops->compat_get) + ret = ops->compat_get(sk, + val, opt, len); + else + ret = ops->get(sk, + val, opt, len); + goto out; + } + } else { + if (val >= ops->set_optmin + && val < ops->set_optmax) { + ops->use++; + mutex_unlock(&nf_sockopt_mutex); + if (ops->compat_set) + ret = ops->compat_set(sk, + val, opt, *len); + else + ret = ops->set(sk, + val, opt, *len); + goto out; + } + } + } + } + mutex_unlock(&nf_sockopt_mutex); + return -ENOPROTOOPT; + + out: + mutex_lock(&nf_sockopt_mutex); + ops->use--; + if (ops->cleanup_task) + wake_up_process(ops->cleanup_task); + mutex_unlock(&nf_sockopt_mutex); + return ret; +} + +int compat_nf_setsockopt(struct sock *sk, int pf, + int val, char __user *opt, int len) +{ + return compat_nf_sockopt(sk, pf, val, opt, &len, 0); +} +EXPORT_SYMBOL(compat_nf_setsockopt); + +int compat_nf_getsockopt(struct sock *sk, int pf, + int val, char __user *opt, int *len) +{ + return compat_nf_sockopt(sk, pf, val, opt, len, 1); +} +EXPORT_SYMBOL(compat_nf_getsockopt); +#endif --- ./net/sctp/ipv6.c.compat 2006-03-10 11:58:11.000000000 +0300 +++ ./net/sctp/ipv6.c 2006-03-10 13:21:06.000000000 +0300 @@ -875,6 +875,10 @@ static const struct proto_ops inet6_seqp .shutdown = inet_shutdown, .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, +#ifdef CONFIG_COMPAT + .compat_setsockopt = compat_sock_common_setsockopt, + .compat_getsockopt = compat_sock_common_getsockopt, +#endif 
 	.sendmsg = inet_sendmsg,
 	.recvmsg = sock_common_recvmsg,
 	.mmap = sock_no_mmap,
@@ -914,6 +918,10 @@ static struct sctp_af sctp_ipv6_specific
 	.sctp_xmit = sctp_v6_xmit,
 	.setsockopt = ipv6_setsockopt,
 	.getsockopt = ipv6_getsockopt,
+#ifdef CONFIG_COMPAT
+	.compat_setsockopt = compat_ipv6_setsockopt,
+	.compat_getsockopt = compat_ipv6_getsockopt,
+#endif
 	.get_dst = sctp_v6_get_dst,
 	.get_saddr = sctp_v6_get_saddr,
 	.copy_addrlist = sctp_v6_copy_addrlist,
--- ./net/sctp/protocol.c.compat 2006-03-10 11:58:11.000000000 +0300
+++ ./net/sctp/protocol.c 2006-03-10 12:24:11.000000000 +0300
@@ -845,6 +845,10 @@ static const struct proto_ops inet_seqpa
 	.shutdown = inet_shutdown, /* Looks harmless. */
 	.setsockopt = sock_common_setsockopt, /* IP_SOL IP_OPTION is a problem. */
 	.getsockopt = sock_common_getsockopt,
+#ifdef CONFIG_COMPAT
+	.compat_setsockopt = compat_sock_common_setsockopt,
+	.compat_getsockopt = compat_sock_common_getsockopt,
+#endif
 	.sendmsg = inet_sendmsg,
 	.recvmsg = sock_common_recvmsg,
 	.mmap = sock_no_mmap,
@@ -883,6 +887,10 @@ static struct sctp_af sctp_ipv4_specific
 	.sctp_xmit = sctp_v4_xmit,
 	.setsockopt = ip_setsockopt,
 	.getsockopt = ip_getsockopt,
+#ifdef CONFIG_COMPAT
+	.compat_setsockopt = compat_ip_setsockopt,
+	.compat_getsockopt = compat_ip_getsockopt,
+#endif
 	.get_dst = sctp_v4_get_dst,
 	.get_saddr = sctp_v4_get_saddr,
 	.copy_addrlist = sctp_v4_copy_addrlist,

From bjo at nefkom.net Fri Mar 10 13:22:46 2006
From: bjo at nefkom.net (Bjo Breiskoll)
Date: Fri Mar 10 14:08:53 2006
Subject: Compiling a netfilter shared library
Message-ID: <000701c6443d$5700c580$03b2a8c0@bjoserver>

Hello.

I'm currently developing a new netfilter target for the current netfilter framework under kernel 2.6.x. I had only a few problems with the kernel module registering a new target. Having decided to implement a few user-mode commands via iptables, I need to compile a shared library for my module.
And here is the problem: is there a "special" way of compiling iptables shared libraries? For testing purposes I tried the TCPLAG target: the kernel module compiled fine, but building the library with "gcc -fPIC -c libipt_TCPLAG.c" produces only compile errors. Maybe I need special compilation arguments?

Regards
BJO

From cnguyen at certicom.com Fri Mar 10 15:57:11 2006
From: cnguyen at certicom.com (Chinh Nguyen)
Date: Fri Mar 10 16:20:42 2006
Subject: ip6tables: Unknown error 4294967295
In-Reply-To: <341954904.20764@ustc.edu.cn>
References: <341954904.20764@ustc.edu.cn>
Message-ID: <441193C7.2070003@certicom.com>

GuanYao Huang wrote:
> Hi:
> I am doing research into iptables-1.3.5, in which I am trying to use ROUTE target
> which is an extension to the current iptables.
> I added libip6t_ROUTE.h which makes libip6t_ROUTE.c complied.
> When using the following command:
> [root@localhost iptables]# /root/CNGI/iptables-1.3.5/ip6tables -A POSTROUTING -t
> mangle -o eth0 -p tcp --dport 22 -j ROUTE --oif iptun
> ip6tables: Unknown error 4294967295
>
> I don't know why. Can you help me? Thanks.
>
>

There are two parts to netfilter: the modules that iptables uses to parse arguments and communicate them to the kernel, and the kernel modules that are loaded (or compiled in) with the kernel. One problem could be that your current kernel does not have support for the netfilter module you are trying to use.

I have often seen this error associated with an 'invalid argument' returned by the netfilter kernel module. Previous versions of iptables said 'invalid argument' instead of 'Unknown error 4294967295'. This is typically caused by an invalid or missing condition that makes the netfilter kernel module reject the rule in its checkentry function. Unfortunately, the necessary conditions are sometimes neither enumerated in any iptables manual nor checked by the iptables module.

For example, consider this:

/opt/iptables-1.3.5/bin/iptables -A OUTPUT -m esp --espspi ! 0 -j LOG
iptables: Unknown error 4294967295

What is not obvious is that you have to specify '-p esp' if you want to use module 'esp', which becomes apparent if you look at the kernel source code:

net/ipv4/netfilter/ipt_esp.c:

static int
checkentry(const char *tablename,
           const void *ip_void,
           void *matchinfo,
           unsigned int matchinfosize,
           unsigned int hook_mask)
{
        const struct ipt_esp *espinfo = matchinfo;
        const struct ipt_ip *ip = ip_void;

        /* Must specify proto == ESP, and no unknown invflags */
        if (ip->proto != IPPROTO_ESP || (ip->invflags & IPT_INV_PROTO)) {
                duprintf("ipt_esp: Protocol %u != %u\n", ip->proto,
                        IPPROTO_ESP);
                return 0;
        }

If this is your problem, you might have to do some source code reading :)

From max at rfc2324.org Sat Mar 11 04:31:23 2006
From: max at rfc2324.org (Maximilian Wilhelm)
Date: Sat Mar 11 04:44:44 2006
Subject: Made ct_sync running with 2.6.15.4...
Message-ID: <20060311033122.GA23805@galois.math.uni-paderborn.de>

Hi!

I'm building a firewall solution for my department and found ct_sync at [42] while searching for a stateful failover solution. I saw that the patches and the module code were written for kernel version 2.6.10, and I had rather big trouble patching kernel version 2.6.15.4 :-/

So I began to update the files and got a working version of ct_sync, which has now been running for some days on my test firewalls and still works well after some failovers. The only thing that worries me is the many ct entries I produced with nmap -sP, which did not vanish after 10++ hours. I had to reboot to get rid of the connections.

What I did:

1. branches/netfilter-ha/linux-2.6/patches
   * made the patches fit 2.6.15.4 via the "Index" lines
   * removed hunks/patches which are already in the vanilla kernel
   * made the rest of the patches fit the newest vanilla kernel by diff-by-eyes
   * added ip_conntrack_hash_insert_nolock.patch (copied from linux-2.6-multigroup)
   * added ip_conntrack_hash_insert_lock.patch to add a non-locking function.

2. branches/netfilter-ha/linux-2.6/ct_sync
   * Exchanged some things to fit the newest kernel (all these changes are more or less guessed by looking at what has changed in the upstream netfilter code between 2.6.10 and 2.6.15.4):
     - WRITE_LOCK -> write_lock_bh
     - WRITE_UNLOCK -> write_unlock_bh
     - READ_LOCK -> read_lock_bh
     - READ_UNLOCK -> read_unlock_bh
     - __ip_ct_find_helper_by_name -> __ip_conntrack_helper_find_byname
     - ct->nat.info.initialized -> ct->status & IPS_NAT_DONE_MASK
     - h->ctrack -> tuplehash_to_ctrack(h)

Let's come to some more interesting changes... :)

As 'ct->nat.num_manips' and 'ct->nat.manips' have been removed, I had to fix the occurrences in ct_sync, but how? I had a look at the linux-2.6-multigroup version of ct_sync and tried to "adjust" my version of ct_sync accordingly. So I just removed the parts with 'ct->nat.manips' and so on. I have the impression that it works :)

Again cheating from linux-2.6-multigroup, I replaced 'place_in_hashes' with 'ip_nat_hash_insert'.

As 'ip_ct_selective_cleanup' does not exist anymore and Google told me that I should not use it for runtime reasons, I switched to 'ip_conntrack_cleanup' at init time, which should have the same effect. As it was not needed anymore, I removed 'kill_all'.

Because __ip_conntrack_hash_insert(ct) is static, gave me trouble when EXPORTing it, and requires hash_conntrack, which I didn't want to EXPORT, I created ip_conntrack_hash_insert_nolock(ct) in ip_conntrack_core.c and EXPORTed it.

I would like someone who knows this code better than me (Harald?) to have a look at my changes and comment on them. As an absolute newbie in C and netfilter code, I'm hoping I did not do too many bad things :)

Attached you can find my patches against the patches and the ct_sync code, one patch per file to be patched.

Ciao
Max

[42] http://svn.netfilter.org/netfilter/branches/netfilter-ha/linux-2.6/

-- | | Follow the white penguin. | |\/| | |-----------------------------------------------------------.
| | |/\| | Rechnerbetrieb Mathematik | Meine Baustellen: TSM | | | Universitaet Paderborn | Hostmaster, Linux, LDAP | -------------- next part -------------- Index: ct_sync.h =================================================================== --- ct_sync.h (revision 6554) +++ ct_sync.h (working copy) @@ -159,7 +159,9 @@ #ifdef CONFIG_IP_NF_NAT_NEEDED int nat_initialized; __u8 nat_num_manips; +#if 0 struct ip_nat_info_manip nat_manips[IP_NAT_MAX_MANIPS]; +#endif char nat_helper[CT_SYNC_NATHELPERSIZE]; union ip_conntrack_nat_help nat_help; struct ip_nat_seq nat_seq[IP_CT_DIR_MAX]; @@ -183,7 +185,6 @@ struct ip_conntrack_tuple tuple, mask; /* expectation tuple and mask */ __u32 seq; /* sequence number */ union ip_conntrack_expect_proto proto; /* protocol specific info */ - union ip_conntrack_expect_help help; /* expectation helper specific info */ }; #ifdef __KERNEL__ -------------- next part -------------- Index: ct_sync_main.c =================================================================== --- ct_sync_main.c (revision 6554) +++ ct_sync_main.c (working copy) @@ -58,8 +58,9 @@ #define CT_SYNC_DUMP_TUPLE(x) #endif -#define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_conntrack_lock) -#define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_conntrack_lock) +#define ASSERT_READ_LOCK(x) +#define ASSERT_WRITE_LOCK(x) + #include #define CT_SYNC_VERSION "0.20" @@ -104,49 +105,7 @@ * FILLING CTSYNC MESSAGES WITH DATA ***********************************************************************/ -#if 0 static int -fill_expectmsg(void *buff, __u8 event, - struct ip_conntrack *master, - struct ip_conntrack_expect *expect) -{ - struct ct_sync_msghdr *hdr = buff; - struct ct_sync_expect *sexp = buff + sizeof(*hdr); - - CT_SYNC_ENTER(); - - /* fill event header */ - hdr->type = event; - hdr->resource = CT_SYNC_RES_EXPECT; - hdr->len = __constant_htons(sizeof(*sexp)); - - /* copy data from expectation */ - memcpy(&sexp->tuple, &expect->tuple, sizeof(sexp->tuple)); - memcpy(&sexp->mask, 
&expect->mask, sizeof(sexp->mask)); - sexp->seq = expect->seq; - memcpy(&sexp->proto, &expect->proto, sizeof(sexp->proto)); - sexp->help = expect->help; - if (expect->expectant) - memcpy(&sexp->expectant, - &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple, - sizeof(sexp->expectant)); - else - memset(&sexp->expectant, 0, sizeof(sexp->expectant)); - - if (expect->sibling) - memcpy(&sexp->sibling, - &expect->sibling->tuplehash[IP_CT_DIR_ORIGINAL].tuple, - sizeof(sexp->sibling)); - else - memset(&sexp->sibling, 0, sizeof(sexp->sibling)); - - CT_SYNC_LEAVE(); - - return 0; -} -#endif - -static int fill_ctmsg(void *buff, __u8 event, struct ip_conntrack *ct, __u8 flags) { struct ct_sync_msghdr *hdr = buff; @@ -198,21 +157,9 @@ memcpy(&sct->proto, &ct->proto, sizeof(sct->proto)); #ifdef CONFIG_IP_NF_NAT_NEEDED - if (likely(ct->nat.info.initialized)) { + if (likely(ct->status & IPS_NAT_DONE_MASK)) { const struct ip_nat_info *nat = &ct->nat.info; - sct->nat_initialized = nat->initialized; - sct->nat_num_manips = nat->num_manips; - memcpy(sct->nat_manips, &nat->manips, - (nat->num_manips * sizeof(struct ip_nat_info_manip))); - if (unlikely(nat->helper != NULL)) { - strncpy(sct->nat_helper, nat->helper->name, - sizeof(sct->nat_helper)); - memcpy(&sct->nat_help, &ct->nat.help, - sizeof(sct->nat_help)); - } else - sct->nat_helper[0] = '\0'; - memcpy(&sct->nat_seq, &nat->seq, sizeof(sct->nat_seq)); #if defined(CONFIG_IP_NF_TARGET_MASQUERADE) \ || defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE) @@ -286,33 +233,6 @@ #ifdef CONFIG_IP_NF_CONNTRACK_MARK ct->mark = (unsigned long) sct->mark; #endif - /* if conntrack has a helper, update helper info */ - if (ct->helper) - memcpy(&ct->help, &sct->help, sizeof(ct->help)); -#ifdef CONFIG_IP_NF_NAT_NEEDED - /* if there is a nat helper present, update helper info */ - if (sct->nat_initialized && ct->nat.info.initialized && - ct->nat.info.helper) - memcpy(&ct->nat.help, &sct->nat_help, sizeof(ct->nat.help)); - - /* if there are more manips 
initialized in sct than in ct, update ct */ - if (unlikely(sct->nat_num_manips > ct->nat.info.num_manips)) { - int m; - printk(KERN_DEBUG "more manips than first sync !!!\n"); - for (m = ct->nat.info.num_manips; m < sct->nat_num_manips; m++) { - memcpy(&ct->nat.info.manips[m], &sct->nat_manips[m], - sizeof(struct ip_nat_info_manip)); - } - ct->nat.info.num_manips = sct->nat_num_manips; - - WRITE_LOCK(&ip_nat_lock); - if (ct->nat.info.initialized) - replace_in_hashes(ct, &ct->nat.info); - else - place_in_hashes(ct, &ct->nat.info); - WRITE_UNLOCK(&ip_nat_lock); - } -#endif } else { #ifdef CONFIG_IP_NF_NAT_NEEDED struct ip_nat_info *nat = &ct->nat.info; @@ -337,57 +257,28 @@ struct ip_conntrack_helper *helper; sct->helper[CT_SYNC_CTHELPERSIZE - 1] = '\0'; - READ_LOCK(&ip_conntrack_lock); - helper = __ip_ct_find_helper_by_name(sct->helper); + read_lock_bh(&ip_conntrack_lock); + helper = __ip_conntrack_helper_find_byname(sct->helper); if (unlikely(!helper)) { CT_SYNC_ERR("Unknown conntrack helper `%s', " "ignoring.\n", sct->helper); ct->helper = NULL; } else { ct->helper = helper; - memcpy(&ct->help, &sct->help, sizeof(ct->help)); } - READ_UNLOCK(&ip_conntrack_lock); + read_unlock_bh(&ip_conntrack_lock); } #ifdef CONFIG_IP_NF_NAT_NEEDED /* NAT */ INIT_LIST_HEAD(&nat->bysource); - INIT_LIST_HEAD(&nat->byipsproto); - if (likely(sct->nat_initialized && - sct->nat_num_manips <= IP_NAT_MAX_MANIPS)) { + if (likely(sct->status & IPS_NAT_DONE_MASK)) { #if defined(CONFIG_IP_NF_TARGET_MASQUERADE) \ || defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE) struct net_device *masq_dev; #endif - nat->initialized = sct->nat_initialized; - /* do not set .conntrack, place_in_hashes will do */ - nat->num_manips = sct->nat_num_manips; - memcpy(&nat->manips, sct->nat_manips, - (sct->nat_num_manips * sizeof(struct ip_nat_info_manip))); - - /* NAT helper, if present */ - if (unlikely(sct->nat_helper[0] != '\0')) { - struct ip_nat_helper *helper; - /* look up nat helper */ - 
sct->nat_helper[CT_SYNC_NATHELPERSIZE - 1] = '\0'; - READ_LOCK(&ip_nat_lock); - helper = __ip_nat_find_helper_by_name(sct->nat_helper); - if (unlikely(!helper)) { - CT_SYNC_ERR("Unknown NAT helper `%s', ignoring\n", sct->nat_helper); - nat->helper = NULL; - memset(&ct->nat.help, 0, sizeof(ct->nat.help)); - } else { - nat->helper = helper; - memcpy(&ct->nat.help, &sct->nat_help, - sizeof(ct->nat.help)); - } - READ_UNLOCK(&ip_nat_lock); - memcpy(&nat->seq, &sct->nat_seq, sizeof(nat->seq)); - } - #if defined(CONFIG_IP_NF_TARGET_MASQUERADE) \ || defined(CONFIG_IP_NF_TARGET_MASQUERADE_MODULE) if (sct->nat_masq_iface[0] != '\0') { @@ -409,26 +300,24 @@ #endif /* CONFIG_IP_NF_NAT_NEEDED */ /* add to hash tables */ - WRITE_LOCK(&ip_conntrack_lock); + write_lock_bh(&ip_conntrack_lock); if (!__ip_conntrack_find(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, NULL) && !__ip_conntrack_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple, NULL)) { /* put in conntrack hash */ - __ip_conntrack_hash_insert(ct); + ip_conntrack_hash_insert_nolock(ct); atomic_inc(&ct->ct_general.use); #ifdef CONFIG_IP_NF_NAT_NEEDED /* put in NAT hashes if necessary */ - if (ct->nat.info.initialized) { - WRITE_LOCK(&ip_nat_lock); - place_in_hashes(ct, &ct->nat.info); - WRITE_UNLOCK(&ip_nat_lock); + if (ct->status & IPS_NAT_DONE_MASK) { + ip_nat_hash_insert(ct); } #endif } else { CT_SYNC_ERR("want to put conntrack in hash but is already there\n"); } - WRITE_UNLOCK(&ip_conntrack_lock); + write_unlock_bh(&ip_conntrack_lock); } /* if (new) */ CT_SYNC_LEAVE(); @@ -443,9 +332,9 @@ CT_SYNC_ENTER(); del_timer(&ct->timeout); - WRITE_LOCK(&ip_conntrack_lock); + write_lock_bh(&ip_conntrack_lock); ip_conntrack_clean_from_lists(ct); - WRITE_UNLOCK(&ip_conntrack_lock); + write_unlock_bh(&ip_conntrack_lock); ip_conntrack_put(ct); CT_SYNC_LEAVE(); @@ -457,7 +346,7 @@ _start_ct_timer(const struct ip_conntrack_tuple_hash *h, unsigned int *num, unsigned int *started) { - struct ip_conntrack *ct = h->ctrack; + struct ip_conntrack *ct = 
tuplehash_to_ctrack(h); if (DIRECTION(h)) return 0; @@ -485,7 +374,7 @@ CT_SYNC_ENTER(); - WRITE_LOCK(&ip_conntrack_lock); + write_lock_bh(&ip_conntrack_lock); for (i = 0; i < ip_conntrack_htable_size; i++) { if (LIST_FIND(&ip_conntrack_hash[i], _start_ct_timer, @@ -494,7 +383,7 @@ break; } - WRITE_UNLOCK(&ip_conntrack_lock); + write_unlock_bh(&ip_conntrack_lock); CT_SYNC_INFO("started timer of %u (total %u) conntrack entries\n", num_started, num_entries); @@ -506,7 +395,7 @@ _stop_ct_timer(const struct ip_conntrack_tuple_hash *h, unsigned int *num, unsigned int *stopped) { - struct ip_conntrack *ct = h->ctrack; + struct ip_conntrack *ct = tuplehash_to_ctrack(h); if (DIRECTION(h)) return 0; @@ -534,7 +423,7 @@ CT_SYNC_ENTER(); - WRITE_LOCK(&ip_conntrack_lock); + write_lock_bh(&ip_conntrack_lock); for (i = 0; i < ip_conntrack_htable_size; i++) { if (LIST_FIND(&ip_conntrack_hash[i], _stop_ct_timer, @@ -543,7 +432,7 @@ break; } - WRITE_UNLOCK(&ip_conntrack_lock); + write_unlock_bh(&ip_conntrack_lock); CT_SYNC_INFO("stopped timer of %u (total %u) conntrack entries\n", num_stopped, num_entries); @@ -580,7 +469,7 @@ h = ip_conntrack_find_get(&sct->orig, NULL); if (h) { - ct = h->ctrack; + ct = tuplehash_to_ctrack(h); } else { CT_SYNC_DEBUG("Conntrack entry not found, creating.\n"); ct = ip_conntrack_alloc(&dummy_tuple, &dummy_tuple); @@ -632,10 +521,12 @@ h = ip_conntrack_find_get(t, NULL); if (h) { - _ct_sync_remove_conntrack(h->ctrack); + struct ip_conntrack *ct = tuplehash_to_ctrack(h); + + _ct_sync_remove_conntrack(ct); CT_SYNC_DEBUG("Deleting conntrack: "); CT_SYNC_DUMP_TUPLE(t); - ip_conntrack_put(h->ctrack); + ip_conntrack_put(ct); } else { CTS_STAT_INC(rx.del_nothere); CT_SYNC_DEBUG("Cannot delete nonexistent conntrack:"); @@ -650,8 +541,6 @@ static int ct_sync_msg_process_updateexpect(void *data, u16 len) { - struct ct_sync_expect *exp = (struct ct_sync_expect *)data; - CT_SYNC_ENTER(); if (unlikely(len < sizeof(struct ct_sync_expect))) { @@ -669,9 +558,6 @@ 
static int ct_sync_msg_process_delexpect(void *data, u16 len) { - struct ct_sync_expect *sexp = (struct ct_sync_expect *)data; - struct ip_conntrack_expect *exp; - CT_SYNC_ENTER(); if (unlikely(len < sizeof(struct ct_sync_expect))) { @@ -680,22 +566,7 @@ CT_SYNC_LEAVE(); return -1; } -#if 0 - READ_LOCK(&ip_conntrack_lock); - WRITE_LOCK(&ip_conntrack_expect_tuple_lock); - exp = LIST_FIND(&ip_conntrack_expect_list, expect_cmp, - struct ip_conntrack_expect *, exp->tuple); - if (!exp || !exp->expectant) - goto unlock_out; - if (conntrack_tuple_cmp(&exp->expectant, - &exp->expectant->tuplehash[IP_CT_DIR_ORIGINAL].tuple)) - __unexpect_related(exp); - -unlock_out: - WRITE_INLOCK(&ip_conntrack_expect_tuple_lock); - READ_UNLOCK(&ip_donntrack_lock); -#endif CT_SYNC_LEAVE(); return 0; } @@ -1181,7 +1052,7 @@ static inline int _send_initsync(const struct ip_conntrack_tuple_hash *h, unsigned int *num) { - struct ip_conntrack *ct = h->ctrack; + struct ip_conntrack *ct = tuplehash_to_ctrack(h); if (DIRECTION(h)) return 0; @@ -1280,13 +1151,13 @@ if (kthread_should_stop()) break; - READ_LOCK(&ip_conntrack_lock); + read_lock_bh(&ip_conntrack_lock); dump_bucket_locked: if (LIST_FIND(&ip_conntrack_hash[i], _send_initsync, struct ip_conntrack_tuple_hash *, &num_sent)) { - READ_UNLOCK(&ip_conntrack_lock); + read_unlock_bh(&ip_conntrack_lock); break; } if (num_sent < 40 && @@ -1294,7 +1165,7 @@ i++; goto dump_bucket_locked; } - READ_UNLOCK(&ip_conntrack_lock); + read_unlock_bh(&ip_conntrack_lock); num_sent_total += num_sent; @@ -1411,66 +1282,7 @@ CT_SYNC_LEAVE(); } -#if 0 -/* conntrack expectation created notification */ -static void -ct_sync_expect_create(struct ip_conntrack_expect *exp) -{ - struct cts_buff *csb; - struct ip_conntrack *master_ct = exp->expectant; - CT_SYNC_ENTER(); - - if (likely(cts_proto_is_master(cts_cfg.protoh) && - is_confirmed(master_ct))) { - void *buff; - - buff = cts_proto_want_enqueue(cts_cfg.protoh, &csb, - CTMSG_SIZEOF(struct ct_sync_expect)); - if 
(unlikely(!buff)) { - CT_SYNC_ERR("unable to enqueue event\n"); - CT_SYNC_LEAVE(); - return; - } - fill_expectmsg(buff, CT_SYNC_MSG_UPDATE, master_ct, exp); - csb_use_dec(cts_cfg.protoh, csb); - } - - CT_SYNC_LEAVE(); - - return; -} - -/* conntrack expectation destroyed notification */ -static void -ct_sync_expect_destroy(struct ip_conntrack_expect *exp) -{ - struct cts_buff *csb; - struct ip_conntrack *master_ct = exp->expectant; - - CT_SYNC_ENTER(); - - if (likely(cts_proto_is_master(cts_cfg.protoh) && - is_confirmed(master_ct))) { - void *buff; - - buff = cts_proto_want_enqueue(cts_cfg.protoh, &csb, - CTMSG_SIZEOF(struct ct_sync_expect)); - if (unlikely(!buff)) { - CT_SYNC_ERR("unable to enqueue event\n"); - CT_SYNC_LEAVE(); - return; - } - // FIXME: implementation - csb_use_dec(cts_cfg.protoh, csb); - } - - CT_SYNC_LEAVE(); - - return; -} -#endif - static int ct_sync_notify(struct notifier_block *this, unsigned long events, void *conntrack) @@ -1746,12 +1558,6 @@ * MODULE INITIALIZATION ***********************************************************************/ -static int -kill_all(const struct ip_conntrack *i, void *data) -{ - return 1; -} - static struct task_struct *rcv_thread, *send_thread, *initsync_thread; /* DO NOT declare this as __init!! 
*/ @@ -1817,7 +1623,7 @@ goto error_hook1; if (nf_register_hook(&cts_hook_ops[3]) < 0) goto error_hook2; - ip_ct_selective_cleanup(kill_all, NULL); + ip_conntrack_cleanup(); } /* init protocol layer */ -------------- next part -------------- Index: ct_notifier_pkt.patch =================================================================== --- ct_notifier_pkt.patch (revision 6474) +++ ct_notifier_pkt.patch (working copy) @@ -1,9 +1,9 @@ ===== include/linux/netfilter.h 1.13 vs edited ===== -Index: linux-2.6.10/include/linux/netfilter.h +Index: linux-2.6.15.4/include/linux/netfilter.h =================================================================== ---- linux-2.6.10.orig/include/linux/netfilter.h 2005-01-10 20:23:19.000000000 +0100 -+++ linux-2.6.10/include/linux/netfilter.h 2005-01-10 20:41:44.015934800 +0100 -@@ -21,7 +21,7 @@ +--- linux-2.6.15.4.orig/include/linux/netfilter.h 2005-01-10 20:23:19.000000000 +0100 ++++ linux-2.6.15.4/include/linux/netfilter.h 2005-01-10 20:41:44.015934800 +0100 +@@ -34,7 +34,7 @@ #define NF_MAX_VERDICT NF_REPEAT /* Generic cache responses from hook functions. @@ -12,214 +12,55 @@ #define NFC_UNKNOWN 0x4000 #define NFC_ALTERED 0x8000 -Index: linux-2.6.10/include/linux/netfilter_ipv4.h +Index: linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack.h =================================================================== ---- linux-2.6.10.orig/include/linux/netfilter_ipv4.h 2004-08-14 07:37:39.000000000 +0200 -+++ linux-2.6.10/include/linux/netfilter_ipv4.h 2005-01-10 20:41:44.017934496 +0100 -@@ -8,34 +8,6 @@ - #include - #include +--- linux-2.6.15.4.orig/include/linux/netfilter_ipv4/ip_conntrack.h 2005-01-10 20:23:19.000000000 +0100 ++++ linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack.h 2005-01-10 20:41:44.020934040 +0100 +@@ -207,7 +207,7 @@ --/* IP Cache bits. */ --/* Src IP address. */ --#define NFC_IP_SRC 0x0001 --/* Dest IP address. */ --#define NFC_IP_DST 0x0002 --/* Input device. 
*/ --#define NFC_IP_IF_IN 0x0004 --/* Output device. */ --#define NFC_IP_IF_OUT 0x0008 --/* TOS. */ --#define NFC_IP_TOS 0x0010 --/* Protocol. */ --#define NFC_IP_PROTO 0x0020 --/* IP options. */ --#define NFC_IP_OPTIONS 0x0040 --/* Frag & flags. */ --#define NFC_IP_FRAG 0x0080 -- --/* Per-protocol information: only matters if proto match. */ --/* TCP flags. */ --#define NFC_IP_TCPFLAGS 0x0100 --/* Source port. */ --#define NFC_IP_SRC_PT 0x0200 --/* Dest port. */ --#define NFC_IP_DST_PT 0x0400 --/* Something else about the proto */ --#define NFC_IP_PROTO_UNKNOWN 0x2000 -- - /* IP Hooks */ - /* After promisc drops, checksum checks. */ - #define NF_IP_PRE_ROUTING 0 -Index: linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack.h -=================================================================== ---- linux-2.6.10.orig/include/linux/netfilter_ipv4/ip_conntrack.h 2005-01-10 20:23:19.000000000 +0100 -+++ linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack.h 2005-01-10 20:41:44.020934040 +0100 -@@ -47,6 +47,58 @@ - /* Connection is confirmed: originating packet has left box */ - IPS_CONFIRMED_BIT = 3, - IPS_CONFIRMED = (1 << IPS_CONFIRMED_BIT), -+ -+ /* Connection is destroyed (removed from lists), can not be unset. 
*/ -+ IPS_DESTROYED_BIT = 4, -+ IPS_DESTROYED = (1 << IPS_DESTROYED_BIT), -+}; -+ -+/* Connection tracking event bits */ -+enum ip_conntrack_events -+{ -+ /* New conntrack */ -+ IPCT_NEW_BIT = 0, -+ IPCT_NEW = (1 << IPCT_NEW_BIT), -+ -+ /* Expected connection */ -+ IPCT_RELATED_BIT = 1, -+ IPCT_RELATED = (1 << IPCT_RELATED_BIT), -+ -+ /* Destroyed conntrack */ -+ IPCT_DESTROY_BIT = 2, -+ IPCT_DESTROY = (1 << IPCT_DESTROY_BIT), -+ -+ /* Timer has been refreshed */ -+ IPCT_REFRESH_BIT = 3, -+ IPCT_REFRESH = (1 << IPCT_REFRESH_BIT), -+ -+ /* Status has changed */ -+ IPCT_STATUS_BIT = 4, -+ IPCT_STATUS = (1 << IPCT_STATUS_BIT), -+ -+ /* Update of protocol info */ -+ IPCT_PROTOINFO_BIT = 5, -+ IPCT_PROTOINFO = (1 << IPCT_PROTOINFO_BIT), -+ -+ /* Volatile protocol info */ -+ IPCT_PROTOINFO_VOLATILE_BIT = 6, -+ IPCT_PROTOINFO_VOLATILE = (1 << IPCT_PROTOINFO_VOLATILE_BIT), -+ -+ /* New helper for conntrack */ -+ IPCT_HELPER_BIT = 7, -+ IPCT_HELPER = (1 << IPCT_HELPER_BIT), -+ -+ /* Update of helper info */ -+ IPCT_HELPINFO_BIT = 8, -+ IPCT_HELPINFO = (1 << IPCT_HELPINFO_BIT), -+ -+ /* Volatile helper info */ -+ IPCT_HELPINFO_VOLATILE_BIT = 9, -+ IPCT_HELPINFO_VOLATILE = (1 << IPCT_HELPINFO_VOLATILE_BIT), -+ -+ /* NAT info */ -+ IPCT_NATINFO_BIT = 10, -+ IPCT_NATINFO = (1 << IPCT_NATINFO_BIT), - }; + extern void __ip_ct_refresh_acct(struct ip_conntrack *ct, + enum ip_conntrack_info ctinfo, +- const struct sk_buff *skb, ++ struct sk_buff *skb, + unsigned long extra_jiffies, + int do_acct); + +@@ -214,7 +214,7 @@ + /* Refresh conntrack for this many jiffies and do accounting */ + static inline void ip_ct_refresh_acct(struct ip_conntrack *ct, + enum ip_conntrack_info ctinfo, +- const struct sk_buff *skb, ++ struct sk_buff *skb, + unsigned long extra_jiffies) + { + __ip_ct_refresh_acct(ct, ctinfo, skb, extra_jiffies, 1); +@@ -222,7 +222,7 @@ - #include -@@ -263,7 +315,7 @@ /* Refresh conntrack for this many jiffies */ - extern void ip_ct_refresh_acct(struct ip_conntrack *ct, - 
enum ip_conntrack_info ctinfo, -- const struct sk_buff *skb, -+ struct sk_buff *skb, - unsigned long extra_jiffies); - - /* These are for NAT. Icky. */ -@@ -294,6 +346,11 @@ + static inline void ip_ct_refresh(struct ip_conntrack *ct, +- const struct sk_buff *skb, ++ struct sk_buff *skb, + unsigned long extra_jiffies) + { + __ip_ct_refresh_acct(ct, 0, skb, extra_jiffies, 0); +@@ -294,6 +345,11 @@ return test_bit(IPS_CONFIRMED_BIT, &ct->status); } +static inline int is_destroyed(struct ip_conntrack *ct) +{ -+ return test_bit(IPS_DESTROYED_BIT, &ct->status); ++ return test_bit(IPCT_DESTROY_BIT, &ct->status); +} + extern unsigned int ip_conntrack_htable_size; struct ip_conntrack_stat -@@ -317,6 +374,57 @@ - - #define CONNTRACK_STAT_INC(count) (__get_cpu_var(ip_conntrack_stat).count++) - -+#ifdef CONFIG_IP_NF_CONNTRACK_EVENTS -+#include -+ -+extern struct notifier_block *ip_conntrack_chain; -+ -+static inline int ip_conntrack_register_notifier(struct notifier_block *nb) -+{ -+ return notifier_chain_register(&ip_conntrack_chain, nb); -+} -+ -+static inline int ip_conntrack_unregister_notifier(struct notifier_block *nb) -+{ -+ return notifier_chain_unregister(&ip_conntrack_chain, nb); -+} -+ -+static inline void ip_conntrack_event_cache_init(struct sk_buff *skb) -+{ -+ /* Set to zero first 14 bits, see netfilter.h */ -+ skb->nfcache &= 0xc000; -+} -+ -+static inline void -+ip_conntrack_event_cache(enum ip_conntrack_events event, struct sk_buff *skb) -+{ -+ skb->nfcache |= event; -+} -+ -+static inline void -+ip_conntrack_deliver_cached_events(struct sk_buff *skb) -+{ -+ struct ip_conntrack *ct = (struct ip_conntrack *) skb->nfct; -+ -+ if (ct != NULL && is_confirmed(ct) && !is_destroyed(ct) && skb->nfcache) -+ notifier_call_chain(&ip_conntrack_chain, skb->nfcache, ct); -+} -+ -+static inline void ip_conntrack_event(enum ip_conntrack_events event, -+ struct ip_conntrack *ct) -+{ -+ if (is_confirmed(ct) && !is_destroyed(ct)) -+ notifier_call_chain(&ip_conntrack_chain, 
event, ct); -+} -+#else /* CONFIG_IP_NF_CONNTRACK_EVENTS */ -+static inline void ip_conntrack_event_cache_init(struct sk_buff *skb) {} -+static inline void ip_conntrack_event_cache(enum ip_conntrack_events event, -+ struct sk_buff *skb) {} -+static inline void ip_conntrack_event(enum ip_conntrack_events event, -+ struct ip_conntrack *ct) {} -+static inline void ip_conntrack_deliver_cached_events(struct sk_buff *skb) {} -+#endif /* CONFIG_IP_NF_CONNTRACK_EVENTS */ -+ - /* eg. PROVIDES_CONNTRACK(ftp); */ - #define PROVIDES_CONNTRACK(name) \ - int needs_ip_conntrack_##name; \ -Index: linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_core.h +Index: linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack_protocol.h =================================================================== ---- linux-2.6.10.orig/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:15:47.000000000 +0100 -+++ linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:41:44.022933736 +0100 -@@ -39,10 +39,14 @@ - /* Confirm a connection: returns NF_DROP if packet must be dropped. 
*/ - static inline int ip_conntrack_confirm(struct sk_buff *skb) - { -+ int ret = NF_ACCEPT; -+ - if (skb->nfct - && !is_confirmed((struct ip_conntrack *)skb->nfct)) -- return __ip_conntrack_confirm(skb); -- return NF_ACCEPT; -+ ret = __ip_conntrack_confirm(skb); -+ ip_conntrack_deliver_cached_events(skb); -+ -+ return ret; - } +--- linux-2.6.15.4.orig/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-01-10 20:15:47.000000000 +0100 ++++ linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-01-10 20:41:44.023933584 +0100 +@@ -35,7 +35,7 @@ - extern struct list_head *ip_conntrack_hash; -Index: linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_protocol.h -=================================================================== ---- linux-2.6.10.orig/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-01-10 20:15:47.000000000 +0100 -+++ linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-01-10 20:41:44.023933584 +0100 -@@ -34,7 +34,7 @@ - /* Returns verdict for packet, or -1 for invalid. */ int (*packet)(struct ip_conntrack *conntrack, - const struct sk_buff *skb, @@ -227,11 +68,11 @@ enum ip_conntrack_info ctinfo); /* Called when a new connection for this protocol found; -Index: linux-2.6.10/net/ipv4/netfilter/Kconfig +Index: linux-2.6.15.4/net/ipv4/netfilter/Kconfig =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/Kconfig 2005-01-10 20:23:29.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/Kconfig 2005-01-10 20:41:44.027932976 +0100 -@@ -732,5 +732,15 @@ +--- linux-2.6.15.4.orig/net/ipv4/netfilter/Kconfig 2005-01-10 20:23:29.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/Kconfig 2005-01-10 20:41:44.027932976 +0100 +@@ -842,5 +842,15 @@ To compile it as a module, choose M here. If unsure, say N. 
@@ -247,11 +88,11 @@ + endmenu -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_icmp.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_icmp.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-01-10 20:15:51.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-01-10 20:41:44.029932672 +0100 -@@ -89,7 +89,7 @@ +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-01-10 20:15:51.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-01-10 20:41:44.029932672 +0100 +@@ -90,7 +90,7 @@ /* Returns verdict for packet, or -1 for invalid. */ static int icmp_packet(struct ip_conntrack *ct, @@ -260,18 +101,11 @@ enum ip_conntrack_info ctinfo) { /* Try to delete connection immediately after all replies: -@@ -102,6 +102,7 @@ - ct->timeout.function((unsigned long)ct); - } else { - atomic_inc(&ct->proto.icmp.count); -+ ip_conntrack_event_cache(IPCT_PROTOINFO_VOLATILE, skb); - ip_ct_refresh_acct(ct, ctinfo, skb, ip_ct_icmp_timeout); - } -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_generic.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_generic.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_proto_generic.c 2005-01-10 20:15:51.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_generic.c 2005-01-10 20:41:44.031932368 +0100 +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_proto_generic.c 2005-01-10 20:15:51.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_generic.c 2005-01-10 20:41:44.031932368 +0100 @@ -49,7 +49,7 @@ /* Returns verdict for packet, or -1 for invalid. 
*/ @@ -281,11 +115,11 @@ enum ip_conntrack_info ctinfo) { ip_ct_refresh_acct(conntrack, ctinfo, skb, ip_ct_generic_timeout); -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_sctp.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_sctp.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-01-10 20:15:51.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-01-10 20:41:44.034931912 +0100 -@@ -310,7 +310,7 @@ +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-01-10 20:15:51.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-01-10 20:41:44.034931912 +0100 +@@ -309,7 +309,7 @@ /* Returns verdict for packet, or -1 for invalid. */ static int sctp_packet(struct ip_conntrack *conntrack, @@ -294,148 +128,31 @@ enum ip_conntrack_info ctinfo) { enum sctp_conntrack newconntrack, oldsctpstate; -@@ -405,6 +405,8 @@ - } - - conntrack->proto.sctp.state = newconntrack; -+ if (oldsctpstate != newconntrack) -+ ip_conntrack_event_cache(IPCT_PROTOINFO, skb); - WRITE_UNLOCK(&sctp_lock); - } - -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_core.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:23:29.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:44:56.354694864 +0100 -@@ -37,6 +37,7 @@ - #include - #include - #include -+#include - - /* This rwlock protects the main hash table, protocol/helper/expected - registrations, conntrack timers*/ -@@ -75,6 +76,10 @@ - struct ip_conntrack ip_conntrack_untracked; - unsigned int ip_ct_log_invalid; - -+#ifdef CONFIG_IP_NF_CONNTRACK_EVENTS -+struct notifier_block *ip_conntrack_chain; -+#endif /* CONFIG_IP_NF_CONNTRACK_EVENTS */ -+ - DEFINE_PER_CPU(struct 
ip_conntrack_stat, ip_conntrack_stat); - - inline void -@@ -287,6 +292,8 @@ - IP_NF_ASSERT(atomic_read(&nfct->use) == 0); - IP_NF_ASSERT(!timer_pending(&ct->timeout)); - -+ set_bit(IPS_DESTROYED_BIT, &ct->status); -+ - /* To make sure we don't get any weird locking issues here: - * destroy_conntrack() MUST NOT be called with a write lock - * to ip_conntrack_lock!!! -HW */ +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:23:29.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:44:56.354694864 +0100 @@ -327,6 +334,7 @@ { struct ip_conntrack *ct = (void *)ul_conntrack; + ip_conntrack_event(IPCT_DESTROY, ct); - WRITE_LOCK(&ip_conntrack_lock); + write_lock_bh(&ip_conntrack_lock); /* Inside lock so preempt is disabled on module removal path. * Otherwise we can get spurious warnings. */ -@@ -436,6 +444,14 @@ - set_bit(IPS_CONFIRMED_BIT, &ct->status); - CONNTRACK_STAT_INC(insert); - WRITE_UNLOCK(&ip_conntrack_lock); -+ if (ct->helper) -+ ip_conntrack_event_cache(IPCT_HELPER, skb); -+#ifdef CONFIG_IP_NF_NAT_NEEDED -+ if (ct->nat.info.initialized) -+ ip_conntrack_event_cache(IPCT_NATINFO, skb); -+#endif -+ ip_conntrack_event_cache(master_ct(ct) ? -+ IPCT_RELATED : IPCT_NEW, skb); - return NF_ACCEPT; - } - -@@ -708,6 +724,8 @@ - /* FIXME: Do this right please. --RR */ - (*pskb)->nfcache |= NFC_UNKNOWN; - -+ ip_conntrack_event_cache_init(*pskb); -+ - /* Doesn't cover locally-generated broadcast, so not worth it. */ - #if 0 - /* Ignore broadcast: no `connection'. */ -@@ -769,8 +787,10 @@ - return NF_ACCEPT; - } - } -- if (set_reply) -+ if (set_reply && !test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { - set_bit(IPS_SEEN_REPLY_BIT, &ct->status); -+ ip_conntrack_event_cache(IPCT_STATUS, *pskb); -+ } - - return ret; - } -@@ -1052,6 +1072,7 @@ - if (i->ctrack->helper == me) { - /* Get rid of any expected. 
*/ - remove_expectations(i->ctrack, 0); -+ ip_conntrack_event(IPCT_HELPER, i->ctrack); - /* And *then* set helper to NULL */ - i->ctrack->helper = NULL; - } -@@ -1092,7 +1113,7 @@ - /* Refresh conntrack for this many jiffies and do accounting (if skb != NULL) */ - void ip_ct_refresh_acct(struct ip_conntrack *ct, +@@ -1119,7 +1130,7 @@ + /* Refresh conntrack for this many jiffies and do accounting if do_acct is 1 */ + void __ip_ct_refresh_acct(struct ip_conntrack *ct, enum ip_conntrack_info ctinfo, - const struct sk_buff *skb, + struct sk_buff *skb, - unsigned long extra_jiffies) + unsigned long extra_jiffies, + int do_acct) { - IP_NF_ASSERT(ct->timeout.data == (unsigned long)ct); -@@ -1107,6 +1128,7 @@ - if (del_timer(&ct->timeout)) { - ct->timeout.expires = jiffies + extra_jiffies; - add_timer(&ct->timeout); -+ ip_conntrack_event_cache(IPCT_REFRESH, skb); - } - ct_add_counters(ct, ctinfo, skb); - WRITE_UNLOCK(&ip_conntrack_lock); -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_ftp.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_tcp.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_ftp.c 2005-01-10 20:23:29.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_ftp.c 2005-01-10 20:41:44.044930392 +0100 -@@ -300,6 +300,7 @@ - ct_ftp_info->seq_aft_nl[dir] = - ntohl(th->seq) + datalen; - ct_ftp_info->seq_aft_nl_set[dir] = 1; -+ ip_conntrack_event_cache(IPCT_HELPINFO_VOLATILE, skb); - } - } - -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_standalone.c -=================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:23:29.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:41:44.048929784 +0100 -@@ -881,6 +881,11 @@ - { - } - -+#ifdef CONFIG_IP_NF_CONNTRACK_EVENTS -+EXPORT_SYMBOL(ip_conntrack_chain); 
-+EXPORT_SYMBOL(ip_conntrack_register_notifier); -+EXPORT_SYMBOL(ip_conntrack_unregister_notifier); -+#endif - EXPORT_SYMBOL(ip_conntrack_protocol_register); - EXPORT_SYMBOL(ip_conntrack_protocol_unregister); - EXPORT_SYMBOL(invert_tuplepr); -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_tcp.c -=================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-01-10 20:23:29.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-01-10 20:41:44.052929176 +0100 +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-01-10 20:23:29.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-01-10 20:41:44.052929176 +0100 @@ -826,7 +826,7 @@ /* Returns verdict for packet, or -1 for invalid. */ @@ -456,10 +173,10 @@ if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) { /* If only reply is a RST, we can consider ourselves not to have an established connection: this is a fairly common -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_udp.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_udp.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-01-10 20:15:51.000000000 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-01-10 20:41:44.055928720 +0100 +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-01-10 20:15:51.000000000 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-01-10 20:41:44.055928720 +0100 @@ -64,7 +64,7 @@ /* Returns verdict for packet, and may modify conntracktype */ @@ -469,15 +186,3 @@ enum ip_conntrack_info ctinfo) { /* If we've seen traffic both ways, this is some kind of UDP -@@ -73,7 +73,10 @@ - ip_ct_refresh_acct(conntrack, ctinfo, skb, - ip_ct_udp_timeout_stream); - /* Also, more likely to be important, and not a 
probe */ -- set_bit(IPS_ASSURED_BIT, &conntrack->status); -+ if (!test_bit(IPS_ASSURED_BIT, &conntrack->status)) { -+ set_bit(IPS_ASSURED_BIT, &conntrack->status); -+ ip_conntrack_event_cache(IPCT_STATUS, skb); -+ } - } else - ip_ct_refresh_acct(conntrack, ctinfo, skb, ip_ct_udp_timeout); - -------------- next part -------------- Index: ct_sync_config_and_makefile.patch =================================================================== --- ct_sync_config_and_makefile.patch (revision 6474) +++ ct_sync_config_and_makefile.patch (working copy) @@ -1,10 +1,10 @@ -Index: linux-2.6.10-ctsync/net/ipv4/netfilter/Makefile +Index: netfilter-2.6.14/net/ipv4/netfilter/Makefile =================================================================== ---- linux-2.6.10-ctsync.orig/net/ipv4/netfilter/Makefile 2005-07-19 11:26:32.516195604 +0200 -+++ linux-2.6.10-ctsync/net/ipv4/netfilter/Makefile 2005-07-19 11:27:42.791714055 +0200 -@@ -16,6 +16,9 @@ - ipfwadm-objs := $(ip_nf_compat-objs) ipfwadm_core.o - ipchains-objs := $(ip_nf_compat-objs) ipchains_core.o +--- netfilter-2.6.14.orig/net/ipv4/netfilter/Makefile 2005-09-18 12:58:03.203433272 +0200 ++++ netfilter-2.6.14/net/ipv4/netfilter/Makefile 2005-10-04 16:23:02.759253880 +0200 +@@ -10,6 +10,9 @@ + ip_conntrack-objs := ip_conntrack_standalone.o ip_conntrack_core.o ip_conntrack_proto_generic.o ip_conntrack_proto_tcp.o ip_conntrack_proto_udp.o ip_conntrack_proto_icmp.o + iptable_nat-objs := ip_nat_standalone.o ip_nat_rule.o ip_nat_core.o ip_nat_helper.o ip_nat_proto_unknown.o ip_nat_proto_tcp.o ip_nat_proto_udp.o ip_nat_proto_icmp.o +# conntrack state synchronization +ct_sync-objs := ct_sync_main.o ct_sync_proto.o ct_sync_sock.o @@ -12,20 +12,20 @@ # connection tracking obj-$(CONFIG_IP_NF_CONNTRACK) += ip_conntrack.o -@@ -101,3 +104,5 @@ - obj-$(CONFIG_IP_NF_COMPAT_IPFWADM) += ipfwadm.o +@@ -103,3 +113,5 @@ - obj-$(CONFIG_IP_NF_QUEUE) += ip_queue.o + # l3 independent conntrack + obj-$(CONFIG_NF_CONNTRACK_IPV4) += nf_conntrack_ipv4.o 
+ +obj-$(CONFIG_IP_NF_CT_SYNC) += ct_sync.o -Index: linux-2.6.10-ctsync/net/ipv4/netfilter/Kconfig +Index: netfilter-2.6.14/net/ipv4/netfilter/Kconfig =================================================================== ---- linux-2.6.10-ctsync.orig/net/ipv4/netfilter/Kconfig 2005-07-19 11:26:32.516195604 +0200 -+++ linux-2.6.10-ctsync/net/ipv4/netfilter/Kconfig 2005-07-19 11:30:00.179497725 +0200 -@@ -742,5 +742,23 @@ - - IF unsure, say `N'. +--- netfilter-2.6.14.orig/net/ipv4/netfilter/Kconfig 2005-09-18 12:58:02.826490576 +0200 ++++ netfilter-2.6.14/net/ipv4/netfilter/Kconfig 2005-10-04 16:22:22.487376136 +0200 +@@ -852,5 +852,23 @@ + To compile it as a module, choose M here. If unsure, say N. + +config IP_NF_CT_SYNC + tristate "Connection tracking state synchronization" + depends on IP_NF_CONNTRACK_EVENTS -------------- next part -------------- Index: export_ip_conntrack_clean_from_lists.patch =================================================================== --- export_ip_conntrack_clean_from_lists.patch (revision 6474) +++ export_ip_conntrack_clean_from_lists.patch (working copy) @@ -1,19 +1,19 @@ -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_standalone.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_standalone.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:54:00.490973576 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:54:18.557227088 +0100 +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:54:00.490973576 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:54:18.557227088 +0100 @@ -915,6 +915,7 @@ - EXPORT_SYMBOL(ip_conntrack_untracked); - EXPORT_SYMBOL_GPL(ip_conntrack_find_get); + + EXPORT_SYMBOL_GPL(ip_conntrack_flush); EXPORT_SYMBOL_GPL(__ip_conntrack_find); +EXPORT_SYMBOL_GPL(ip_conntrack_clean_from_lists); - 
EXPORT_SYMBOL_GPL(ip_conntrack_put); - #ifdef CONFIG_IP_NF_NAT_NEEDED - EXPORT_SYMBOL(ip_conntrack_tcp_update); -Index: linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_core.h + + EXPORT_SYMBOL_GPL(ip_conntrack_alloc); + EXPORT_SYMBOL_GPL(ip_conntrack_free); +Index: linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack_core.h =================================================================== ---- linux-2.6.10.orig/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:54:00.492973272 +0100 -+++ linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:54:18.559226784 +0100 +--- linux-2.6.15.4.orig/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:54:00.492973272 +0100 ++++ linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:54:18.559226784 +0100 @@ -54,6 +54,8 @@ return ret; } @@ -23,10 +23,10 @@ extern struct list_head *ip_conntrack_hash; extern struct list_head ip_conntrack_expect_list; DECLARE_RWLOCK_EXTERN(ip_conntrack_lock); -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_core.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.c =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:54:00.497972512 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:56:56.733180688 +0100 +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:54:00.497972512 +0100 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:56:56.733180688 +0100 @@ -265,12 +265,12 @@ } } @@ -40,15 +40,24 @@ - DEBUGP("clean_from_lists(%p)\n", ct); + DEBUGP("ip_conntrack_clean_from_lists(%p)\n", ct); - MUST_BE_WRITE_LOCKED(&ip_conntrack_lock); + ASSERT_WRITE_LOCK(&ip_conntrack_lock); ho = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); -@@ -339,7 +339,7 @@ +@@ -333,7 +333,7 @@ + ip_conntrack_destroyed(ct); + + write_lock_bh(&ip_conntrack_lock); +- /* 
Expectations will have been removed in clean_from_lists, ++ /* Expectations will have been removed in ip_conntrack_clean_from_lists, + * except TFTP can create an expectation on the first packet, + * before connection is in the list, so we need to clean here, + * too. */ +@@ -363,7 +363,7 @@ /* Inside lock so preempt is disabled on module removal path. * Otherwise we can get spurious warnings. */ CONNTRACK_STAT_INC(delete_list); - clean_from_lists(ct); + ip_conntrack_clean_from_lists(ct); - WRITE_UNLOCK(&ip_conntrack_lock); + write_unlock_bh(&ip_conntrack_lock); ip_conntrack_put(ct); } -------------- next part -------------- --- linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.c 2006-03-07 05:42:07.000000000 +0100 +++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.max.c 2006-03-07 06:00:34.000000000 +0100 @@ -1388,6 +1388,7 @@ ip_conntrack_htable_size); nf_unregister_sockopt(&so_getorigdst); } +EXPORT_SYMBOL_GPL(ip_conntrack_cleanup); static struct list_head *alloc_hashtable(int size, int *vmalloced) { -------------- next part -------------- Index: export_ip_conntrack_find.patch =================================================================== --- export_ip_conntrack_find.patch (revision 6474) +++ export_ip_conntrack_find.patch (working copy) @@ -1,20 +1,8 @@ -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_standalone.c +Index: linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack_core.h =================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:53:51.796295368 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-01-10 20:54:00.490973576 +0100 -@@ -914,6 +914,7 @@ - EXPORT_SYMBOL(ip_conntrack_hash); - EXPORT_SYMBOL(ip_conntrack_untracked); - EXPORT_SYMBOL_GPL(ip_conntrack_find_get); -+EXPORT_SYMBOL_GPL(__ip_conntrack_find); - EXPORT_SYMBOL_GPL(ip_conntrack_put); - #ifdef CONFIG_IP_NF_NAT_NEEDED - EXPORT_SYMBOL(ip_conntrack_tcp_update); 
-Index: linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_core.h -=================================================================== ---- linux-2.6.10.orig/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:41:44.022933736 +0100 -+++ linux-2.6.10/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:54:00.492973272 +0100 -@@ -34,6 +34,11 @@ +--- linux-2.6.15.4.orig/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:41:44.022933736 +0100 ++++ linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack_core.h 2005-01-10 20:54:00.492973272 +0100 +@@ -36,6 +36,11 @@ ip_conntrack_find_get(const struct ip_conntrack_tuple *tuple, const struct ip_conntrack *ignored_conntrack); @@ -23,19 +11,6 @@ +__ip_conntrack_find(const struct ip_conntrack_tuple *tuple, + const struct ip_conntrack *ignored_conntrack); + - extern int __ip_conntrack_confirm(struct sk_buff *skb); + extern int __ip_conntrack_confirm(struct sk_buff **pskb); /* Confirm a connection: returns NF_DROP if packet must be dropped. 
*/ -Index: linux-2.6.10/net/ipv4/netfilter/ip_conntrack_core.c -=================================================================== ---- linux-2.6.10.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:53:51.801294608 +0100 -+++ linux-2.6.10/net/ipv4/netfilter/ip_conntrack_core.c 2005-01-10 20:54:00.497972512 +0100 -@@ -354,7 +354,7 @@ - && ip_ct_tuple_equal(tuple, &i->tuple); - } - --static struct ip_conntrack_tuple_hash * -+struct ip_conntrack_tuple_hash * - __ip_conntrack_find(const struct ip_conntrack_tuple *tuple, - const struct ip_conntrack *ignored_conntrack) - { -------------- next part -------------- Index: export_ip_nat_lock_and_hash.patch =================================================================== --- export_ip_nat_lock_and_hash.patch (revision 6474) +++ export_ip_nat_lock_and_hash.patch (working copy) @@ -1,12 +1,61 @@ -Index: linux-2.6.10-ctsync/net/ipv4/netfilter/ip_nat_standalone.c +Index: linux-2.6.15.4/net/ipv4/netfilter/ip_nat_core.c =================================================================== ---- linux-2.6.10-ctsync.orig/net/ipv4/netfilter/ip_nat_standalone.c 2005-07-19 11:24:12.057142833 +0200 -+++ linux-2.6.10-ctsync/net/ipv4/netfilter/ip_nat_standalone.c 2005-07-19 11:27:16.136965119 +0200 -@@ -392,4 +392,7 @@ - EXPORT_SYMBOL(ip_nat_find_helper); - EXPORT_SYMBOL(__ip_nat_find_helper); - EXPORT_SYMBOL_GPL(__ip_nat_find_helper_by_name); -+EXPORT_SYMBOL_GPL(ip_nat_lock); -+EXPORT_SYMBOL_GPL(place_in_hashes); -+EXPORT_SYMBOL_GPL(replace_in_hashes); - MODULE_LICENSE("GPL"); +--- linux-2.6.15.4.orig/net/ipv4/netfilter/ip_nat_core.c 2005-10-04 17:39:04.944696272 +0200 ++++ linux-2.6.15.4/net/ipv4/netfilter/ip_nat_core.c 2005-10-04 17:56:57.845590544 +0200 +@@ -101,6 +101,19 @@ + write_unlock_bh(&ip_nat_lock); + } + ++/* Place the conntrack entry in the nat hashtable. 
*/ ++void ip_nat_hash_insert(struct ip_conntrack *ct) ++{ ++ unsigned int srchash ++ = hash_by_src(&ct->tuplehash[IP_CT_DIR_ORIGINAL] ++ .tuple); ++ ++ write_lock_bh(&ip_nat_lock); ++ list_add(&ct->nat.info.bysource, &bysource[srchash]); ++ write_unlock_bh(&ip_nat_lock); ++} ++EXPORT_SYMBOL_GPL(ip_nat_hash_insert); ++ + /* We do checksum mangling, so if they were wrong before they're still + * wrong. Also works for incomplete packets (eg. ICMP dest + * unreachables.) */ +@@ -295,7 +309,6 @@ + unsigned int hooknum) + { + struct ip_conntrack_tuple curr_tuple, new_tuple; +- struct ip_nat_info *info = &conntrack->nat.info; + int have_to_hash = !(conntrack->status & IPS_NAT_DONE_MASK); + enum ip_nat_manip_type maniptype = HOOK2MANIP(hooknum); + +@@ -330,14 +343,8 @@ + } + + /* Place in source hash if this is the first time. */ +- if (have_to_hash) { +- unsigned int srchash +- = hash_by_src(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL] +- .tuple); +- write_lock_bh(&ip_nat_lock); +- list_add(&info->bysource, &bysource[srchash]); +- write_unlock_bh(&ip_nat_lock); +- } ++ if (have_to_hash) ++ ip_nat_hash_insert(conntrack); + + /* It's done. 
*/ + if (maniptype == IP_NAT_MANIP_DST) +Index: linux-2.6.15.4/include/linux/netfilter_ipv4/ip_nat_core.h +=================================================================== +--- linux-2.6.15.4.orig/include/linux/netfilter_ipv4/ip_nat_core.h 2005-10-04 17:39:04.946695968 +0200 ++++ linux-2.6.15.4/include/linux/netfilter_ipv4/ip_nat_core.h 2005-10-04 17:39:15.282124744 +0200 +@@ -15,4 +15,7 @@ + struct ip_conntrack *ct, + enum ip_nat_manip_type manip, + enum ip_conntrack_dir dir); ++ ++extern void ip_nat_hash_insert(struct ip_conntrack *ct); ++ + #endif /* _IP_NAT_CORE_H */ -------------- next part -------------- --- linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack.h 2006-03-08 01:03:37.079785975 +0100 +++ linux-2.6.15.4/include/linux/netfilter_ipv4/ip_conntrack.max.h 2006-03-08 01:07:52.091014039 +0100 @@ -270,6 +270,9 @@ extern void ip_conntrack_hash_insert(struct ip_conntrack *ct); +/* Non-Locking ip_conntrack_hash_insert for ct_sync */ +extern void ip_conntrack_hash_insert_nolock(struct ip_conntrack *ct); + extern struct ip_conntrack_expect * __ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple); --- linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.c 2006-03-08 01:03:37.139789604 +0100 +++ linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_core.max.c 2006-03-08 01:08:44.278068578 +0100 @@ -437,6 +437,17 @@ write_unlock_bh(&ip_conntrack_lock); } +/* Non-Locking ip_conntrack_hash_insert for ct_sync */ +void ip_conntrack_hash_insert_nolock(struct ip_conntrack *ct) +{ + unsigned int hash, repl_hash; + + hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); + repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple); + + __ip_conntrack_hash_insert(ct, hash, repl_hash); +} + /* Confirm a connection given skb; places it in hash table */ int __ip_conntrack_confirm(struct sk_buff **pskb) --- linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_standalone.c 2006-03-08 01:03:37.123788636 +0100 +++ 
linux-2.6.15.4/net/ipv4/netfilter/ip_conntrack_standalone.max.c 2006-03-08 01:06:48.643301648 +0100 @@ -1018,6 +1018,7 @@ EXPORT_SYMBOL_GPL(ip_conntrack_alloc); EXPORT_SYMBOL_GPL(ip_conntrack_free); EXPORT_SYMBOL_GPL(ip_conntrack_hash_insert); +EXPORT_SYMBOL_GPL(ip_conntrack_hash_insert_nolock); EXPORT_SYMBOL_GPL(ip_ct_remove_expectations); -------------- next part -------------- Index: pf_packet.patch =================================================================== --- pf_packet.patch (revision 6474) +++ pf_packet.patch (working copy) @@ -1,8 +1,8 @@ %patch -Index: linux-2.6.10/include/linux/netfilter_packet.h +Index: linux-2.6.15.4/include/linux/netfilter_packet.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 -+++ linux-2.6.10/include/linux/netfilter_packet.h 2005-01-10 20:45:51.363332280 +0100 ++++ linux-2.6.15.4/include/linux/netfilter_packet.h 2005-01-10 20:45:51.363332280 +0100 @@ -0,0 +1,17 @@ +#ifndef __LINUX_NETFILTER_PACKET_H +#define __LINUX_NETFILTER_PACKET_H @@ -21,165 +21,3 @@ +#define NF_PACKET_OUTPUT 1 + +#endif /* __LINUX_NETFILTER_PACKET_H */ -Index: linux-2.6.10/net/core/dev.c -=================================================================== ---- linux-2.6.10.orig/net/core/dev.c 2005-01-10 20:23:28.000000000 +0100 -+++ linux-2.6.10/net/core/dev.c 2005-01-10 21:24:31.645595760 +0100 -@@ -112,6 +112,7 @@ - #include /* Note : will define WIRELESS_EXT */ - #include - #endif /* CONFIG_NET_RADIO */ -+#include - #include - - /* This define, if set, will randomly drop a packet when congestion -@@ -1215,35 +1216,12 @@ - * to congestion or traffic shaping. 
- */ - --int dev_queue_xmit(struct sk_buff *skb) -+static int dev_queue_xmit_finish(struct sk_buff *skb) - { - struct net_device *dev = skb->dev; - struct Qdisc *q; - int rc = -ENOMEM; - -- if (skb_shinfo(skb)->frag_list && -- !(dev->features & NETIF_F_FRAGLIST) && -- __skb_linearize(skb, GFP_ATOMIC)) -- goto out_kfree_skb; -- -- /* Fragmented skb is linearized if device does not support SG, -- * or if at least one of fragments is in highmem and device -- * does not support DMA from it. -- */ -- if (skb_shinfo(skb)->nr_frags && -- (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && -- __skb_linearize(skb, GFP_ATOMIC)) -- goto out_kfree_skb; -- -- /* If packet is not checksummed and device does not support -- * checksumming for this protocol, complete checksumming here. -- */ -- if (skb->ip_summed == CHECKSUM_HW && -- (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && -- (!(dev->features & NETIF_F_IP_CSUM) || -- skb->protocol != htons(ETH_P_IP)))) -- if (skb_checksum_help(skb, 0)) -- goto out_kfree_skb; - - /* Disable soft irqs for various locks below. Also - * stops preemption for RCU. -@@ -1324,7 +1302,6 @@ - rc = -ENETDOWN; - local_bh_enable(); - --out_kfree_skb: - kfree_skb(skb); - return rc; - out: -@@ -1332,6 +1309,41 @@ - return rc; - } - -+int dev_queue_xmit(struct sk_buff *skb) -+{ -+ struct net_device *dev = skb->dev; -+ -+ if (skb_shinfo(skb)->frag_list && -+ !(dev->features & NETIF_F_FRAGLIST) && -+ __skb_linearize(skb, GFP_ATOMIC)) -+ goto out_kfree_skb; -+ -+ /* Fragmented skb is linearized if device does not support SG, -+ * or if at least one of fragments is in highmem and device -+ * does not support DMA from it. -+ */ -+ if (skb_shinfo(skb)->nr_frags && -+ (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && -+ __skb_linearize(skb, GFP_ATOMIC)) -+ goto out_kfree_skb; -+ -+ /* If packet is not checksummed and device does not support -+ * checksumming for this protocol, complete checksumming here. 
-+ */ -+ if (skb->ip_summed == CHECKSUM_HW && -+ (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && -+ (!(dev->features & NETIF_F_IP_CSUM) || -+ skb->protocol != htons(ETH_P_IP)))) -+ if (skb_checksum_help(skb, 0)) -+ goto out_kfree_skb; -+ -+ return NF_HOOK(PF_PACKET, NF_PACKET_OUTPUT, skb, NULL, dev, -+ dev_queue_xmit_finish); -+ -+out_kfree_skb: -+ kfree_skb(skb); -+ return -ENOMEM; -+} - - /*======================================================================= - Receiver routines -@@ -1622,29 +1634,12 @@ - } - #endif - --int netif_receive_skb(struct sk_buff *skb) -+static int netif_receive_skb_finish(struct sk_buff *skb) - { - struct packet_type *ptype, *pt_prev; - int ret = NET_RX_DROP; - unsigned short type; - --#ifdef CONFIG_NETPOLL -- if (skb->dev->netpoll_rx && skb->dev->poll && netpoll_rx(skb)) { -- kfree_skb(skb); -- return NET_RX_DROP; -- } --#endif -- -- if (!skb->stamp.tv_sec) -- net_timestamp(&skb->stamp); -- -- skb_bond(skb); -- -- __get_cpu_var(netdev_rx_stat).total++; -- -- skb->h.raw = skb->nh.raw = skb->data; -- skb->mac_len = skb->nh.raw - skb->mac.raw; -- - pt_prev = NULL; - - rcu_read_lock(); -@@ -1713,7 +1708,30 @@ - return ret; - } - --static int process_backlog(struct net_device *backlog_dev, int *budget) -+int netif_receive_skb(struct sk_buff *skb) -+{ -+#ifdef CONFIG_NETPOLL -+ if (skb->dev->netpoll_rx && skb->dev->poll && netpoll_rx(skb)) { -+ kfree_skb(skb); -+ return NET_RX_DROP; -+ } -+#endif -+ -+ if (!skb->stamp.tv_sec) -+ net_timestamp(&skb->stamp); -+ -+ skb_bond(skb); -+ -+ __get_cpu_var(netdev_rx_stat).total++; -+ -+ skb->h.raw = skb->nh.raw = skb->data; -+ skb->mac_len = skb->nh.raw - skb->mac.raw; -+ -+ return NF_HOOK(PF_PACKET, NF_PACKET_INPUT, skb, skb->dev, NULL, -+ netif_receive_skb_finish); -+} -+ -+ static int process_backlog(struct net_device *backlog_dev, int *budget) - { - int work = 0; - int quota = min(backlog_dev->quota, *budget); -------------- next part -------------- Index: series 
=================================================================== --- series (revision 6474) +++ series (working copy) @@ -1,11 +1,8 @@ ct_notifier_pkt.patch pf_packet.patch -pf_packet_remove_warning.patch -export_ip_conntrack_helpers.patch -export_ip_nat_helpers.patch export_ip_conntrack_find.patch -export_ip_nat_lock_and_hash.patch +export_ip_nat_lock_and_hash.patch export_ip_conntrack_clean_from_lists.patch -conntrack_hash_manip.patch -conntrack_alloc.patch +export_ip_conntrack_cleanup.patch +ip_conntrack_hash_insert_nolock.patch ct_sync_config_and_makefile.patch From syrius.ml at no-log.org Sat Mar 11 13:14:22 2006 From: syrius.ml at no-log.org (syrius.ml@no-log.org) Date: Sat Mar 11 13:43:40 2006 Subject: Fwd: [Fwd: Re: trying to revive rtsp] In-Reply-To: <1a49df600602270144k6b94281eqfacc157b0d98027e@mail.gmail.com> (punkytse@gmail.com's message of "Mon, 27 Feb 2006 17:44:19 +0800") References: <43FAB99F.4000507@kde.org> <1a49df600602270144k6b94281eqfacc157b0d98027e@mail.gmail.com> Message-ID: <87zmjx9ol8.87y7zh9ol8@87wtf19ol8.message.id> Hi Guys, Punky wrote: > I am helping Mickael to resend his rtsp patch as he found his mail > can't get to the list promptly. As far, I am able to use his patch on > my own 2.6.15 kernel. He already has another cleaner version of the > patch and will post to the list later. I'm looking forward to see this cleaner version of the patch. If it could apply to 2.6.16 it would be even better. Also I think It would be nice to send a proper message to netfilter-devel with a proper subjet: '[PATCH] message' if you want it to be integrated. Thanks in advance. -- From jsullivan at opensourcedevel.com Sun Mar 12 00:24:49 2006 From: jsullivan at opensourcedevel.com (John A. 
Sullivan III) Date: Sun Mar 12 00:38:14 2006 Subject: volunteer tcl script writer needed for iptables application Message-ID: <1142119489.2987.61.camel@localhost> The ISCS open source network security management project (http://iscs.sourceforge.net) could use some volunteer assistance from someone who can adapt bash scripts to tcl for the creation of iptables configuration files and implementing dynamic iptables changes on production devices. If you are interested and able to assist, please contact me using the details in my signature below. For more details, please continue reading. We have added support for the Secure Computing / CyberGuard / SnapGear SG series of devices so that they can be managed using ISCS with no change to firmware. The SG580 devices are working fine in production but the SG570 devices use sash instead of bash. We can get around the limitations of bash by using the tcl interpreter. However, we have no one on the team with tcl experience. ISCS could be described as an open source alternative to very expensive products for managing large, enterprise network security deployments such as Solsoft or Provider1. Actually, it does much more and has no commercial equivalent. It has allowed us to implement complex, perimeter style security within the perimeter to affordably create truly segmented and multi-layered networks with a minimum of labor. To give an idea of what it does, a recent production deployment of internal network security for a global manufacturer would have required well in excess of 100,000 iptables rules. ISCS reduced that rule set to roughly 13,000 rules, only requires traversal of a small subset of those rules for any new packet, generated those rules in a couple of hours and distributed them to all devices automatically at the click of a button within a couple of minutes. ipset could probably reduce the rule set tenfold again. Any ipset experts out there interested in helping? 
In comparison, if one had to write 13,000 rules at 20 seconds per rule, that would be 72 hours -- at one minute per rule, 217 hours. 150,000 rules would take 833 hours at 20 seconds and 2,500 hours at one minute per rule. All this with a dramatic reduction in exposure to human error (one can imagine the danger of a typo or out-of-order rule in a 150,000-line rule set). That's just the beginning.

If you are interested and can help, we would greatly appreciate your assistance. Thanks - John

--
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
jsullivan@opensourcedevel.com

Financially sustainable open source development
http://www.opensourcedevel.com

From aton at packetdropped.org Sun Mar 12 15:10:54 2006
From: aton at packetdropped.org (aton)
Date: Sun Mar 12 15:24:31 2006
Subject: netfilter_queue reinjecting packets
In-Reply-To: <1142119489.2987.61.camel@localhost>
References: <1142119489.2987.61.camel@localhost>
Message-ID: <20060312151054.5a2020ad.aton@packetdropped.org>

has anyone used netfilter_queue and successfully re-injected packets into the net?

i want to write sort of a userspace routing application.

host A is my workstation, it has host B as default gateway.

on host B my routing application runs. it receives packets from netfilter_queue using libnetfilter_queue. this works very well and i can display the whole packets.

now i just want to send them back into the net, so that they reach their destination.

i modified the main loop in the source of nfqnl_test.c in the libnetfilter_queue package:

while ((rv=recv(fd, buf, sizeof(buf), 0)) >= 0)
{
    printf("pkt received:\n");

    printf("sending packet back\n");

    if ((sv=send(fd, buf, sizeof(buf), 0))==-1)
    {
        perror("send");
        exit(EXIT_FAILURE);
    }
    printf("done\n");
}

this should send every packet back to... where?
it seems the packets are just sent into nirvana, i cannot sniff them, and i don't get an error from send().

is this the way to go, or should i make two raw sockets, one for tcp and one for udp packets and send the incoming packets on these?

greetings, aton

From nsyilmaz at metu.edu.tr Sun Mar 12 15:40:29 2006
From: nsyilmaz at metu.edu.tr (nebi senol yilmaz)
Date: Sun Mar 12 15:53:59 2006
Subject: nfhook, sk_buff and change data
Message-ID:

hello;

i've a problem about sk_buff checksum (tcp checksum).

I've coded a kernel module that makes some changes to some packets, which are read at the NF_IP_FORWARD hook. My module changes the data of the packet and returns NF_ACCEPT...

but the packet with changed data is not accepted by the target machine.

i'm calculating both ip header checksum and tcp checksum, but i still have the problem.

does anybody know about my problem? can you give me some advice if you have any solutions for it?
related piece of code is below:

===================
unsigned int hook_pool(unsigned int hooknum,
                       struct sk_buff **skb,
                       const struct net_device *in,
                       const struct net_device *out,
                       int (*okfn)(struct sk_buff*)){

    struct tcphdr *tcp;
    char *data;
    struct sk_buff *sock_buff;
    int i=0;
    char *findGzip;

    sock_buff=*skb;

    tcp = (struct tcphdr *)(sock_buff->data + (sock_buff->nh.iph->ihl * 4));
    data = (char *)((int)tcp + (int)(tcp->doff * 4));

    if (findGzip=(strstr(data,"senol1763"))){
        printk("%d Data CheckSum 1 = %u\n",hooknum,sock_buff->csum);
        printk("%d TCP CheckSum 1 = %u\n",hooknum,sock_buff->h.th->check);
        printk("%d IP CheckSum 1 = %u\n",hooknum,sock_buff->nh.iph->check);
        printk("\n");

        findGzip[1]='n'; /* when i find a related str, i change a char*/
        /* here, i'm calculating ne csum.*/
        sock_buff->h.th->check=0;
        sock_buff->h.th->check=in_cksum_tcp(sock_buff->nh.iph->saddr,sock_buff->nh.iph->daddr,(unsigned short *)sock_buff->h.th , sizeof(*sock_buff->h.th) );
    }
    return NF_ACCEPT;
}
===================

thanks

From kaber at trash.net Sun Mar 12 15:44:18 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sun Mar 12 15:59:48 2006
Subject: netfilter_queue reinjecting packets
In-Reply-To: <20060312151054.5a2020ad.aton@packetdropped.org>
References: <1142119489.2987.61.camel@localhost> <20060312151054.5a2020ad.aton@packetdropped.org>
Message-ID: <441433C2.6010901@trash.net>

aton wrote:
> has anyone used netfilter_queue and successfully re-injected packets into the net?
>
> i want to write sort of a userspace routing application.
>
> host A is my workstation, it has host B as default gateway.
>
> on host B my routing application runs.
> it receives packets from netfilter_queue using libnetfilter_queue.
> this works very well and i can display the whole packets.
>
> now i just want to send them back into the net, so that they reach their destination.
> > i modified the main loop in the source of nfqnl_test.c in the libnetfilter_queue package: > > while ((rv=recv(fd, buf, sizeof(buf), 0)) >= 0) > { > printf("pkt received:\n"); > > printf("sending packet back\n"); > > if ((sv=send(fd, buf, sizeof(buf), 0))==-1) > { > perror("send"); > exit(EXIT_FAILURE); > } > printf("done\n"); > } > > this should send every packet back to... where? > it seems the packets are just sent into nirvana, i cannot sniff them, and i dont get an error from send(). nfnql_test already reinjects packets by the call to nfq_issue_verdict. It seems you need to read the documentation .. From kaber at trash.net Sun Mar 12 15:46:03 2006 From: kaber at trash.net (Patrick McHardy) Date: Sun Mar 12 16:01:22 2006 Subject: nfhook, sk_buff and change data In-Reply-To: References: Message-ID: <4414342B.2060709@trash.net> nebi senol yilmaz wrote: > hello; > > i've a problem about sk_buff checksum (tcp checksum). > > I've coded a kernel module that makes some changes at some packets , > which are read at NF_IP_FORWARD hook. > My module changes data of the packetand returns NF_ACCEPT... > > but the packet with changed data can not be accepted by the target > machine. > > i'm calculating both ip header checksum and tcp checksum, but still > i've problem.. > > does anybody know about my problem? can you give me some advice > whether you have any solutions about it. 
> > > related piece of code is below: > > > =================== > unsigned int hook_pool(unsigned int hooknum, > struct sk_buff **skb, > const struct net_device *in, > const struct net_device *out, > int (*okfn)(struct sk_buff*)){ > > > > struct tcphdr *tcp; > char *data; > struct sk_buff *sock_buff; > int i=0; > char *findGzip; > > sock_buff=*skb; > > tcp = (struct tcphdr *)(sock_buff->data + (sock_buff->nh.iph->ihl * 4)); > data = (char *)((int)tcp + (int)(tcp->doff * 4)); > > > if (findGzip=(strstr(data,"senol1763"))){ > printk("%d Data CheckSum 1 = %u\n",hooknum,sock_buff->csum); > printk("%d TCP CheckSum 1 = %u\n",hooknum,sock_buff->h.th->check); > printk("%d IP CheckSum 1 = %u\n",hooknum,sock_buff->nh.iph->check); > printk("\n"); > > findGzip[1]='n'; /* when i find a related str, i change a char*/ > /* here, i'm calculating ne csum.*/ > sock_buff->h.th->check=0; > sock_buff->h.th->check=in_cksum_tcp(sock_buff->nh.iph->saddr,sock_buff->nh.iph->daddr,(unsigned short *)sock_buff->h.th , sizeof(*sock_buff->h.th) ); In the FORWARD hook skb->h.th doesn't point to the TCP header. 
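
The reply above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel's own code and not the poster's in_cksum_tcp(): it assumes a linear buffer holding one complete IPv4 packet, finds the TCP header from the IP header's ihl field instead of relying on a cached transport-header pointer, and sums the pseudo-header plus the whole TCP segment (header and payload) rather than only sizeof(struct tcphdr) bytes. The names tcp_offset, tcp_fix_checksum and tcp_checksum_ok are hypothetical helpers invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a running sum. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t acc)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                       /* odd trailing byte is padded with 0 */
        acc += (uint32_t)buf[len - 1] << 8;
    return acc;
}

/* Fold the carries back in and return the one's complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}

/* Hypothetical helper: offset of the TCP header inside an IPv4 packet,
 * derived from the IP header itself. Returns 0 if the buffer does not
 * hold a well-formed IPv4/TCP packet. */
static size_t tcp_offset(const uint8_t *pkt, size_t len, size_t *tcp_len)
{
    size_t ihl, tot_len;

    if (len < 20 || (pkt[0] >> 4) != 4)
        return 0;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;
    tot_len = ((size_t)pkt[2] << 8) | pkt[3];
    if (ihl < 20 || tot_len > len || tot_len < ihl + 20 || pkt[9] != 6)
        return 0;
    *tcp_len = tot_len - ihl;          /* TCP header plus payload */
    return ihl;
}

/* Pseudo-header (addresses, protocol, TCP length) plus the full segment. */
static uint32_t tcp_sum(const uint8_t *pkt, size_t off, size_t tcp_len)
{
    uint32_t acc = sum16(pkt + 12, 8, 0);   /* source and destination address */

    acc += 6;                               /* protocol number for TCP */
    acc += (uint32_t)tcp_len;
    return sum16(pkt + off, tcp_len, acc);
}

/* Recompute the TCP checksum after the payload was modified.
 * Returns 0 on success, -1 if the packet is not IPv4/TCP. */
static int tcp_fix_checksum(uint8_t *pkt, size_t len)
{
    size_t tcp_len, off = tcp_offset(pkt, len, &tcp_len);
    uint16_t csum;

    if (off == 0)
        return -1;
    pkt[off + 16] = 0;                 /* checksum field must be zero while summing */
    pkt[off + 17] = 0;
    csum = fold(tcp_sum(pkt, off, tcp_len));
    pkt[off + 16] = (uint8_t)(csum >> 8);
    pkt[off + 17] = (uint8_t)(csum & 0xff);
    return 0;
}

/* Returns 1 if the stored TCP checksum matches the packet contents. */
static int tcp_checksum_ok(const uint8_t *pkt, size_t len)
{
    size_t tcp_len, off = tcp_offset(pkt, len, &tcp_len);

    if (off == 0)
        return 0;
    return fold(tcp_sum(pkt, off, tcp_len)) == 0;
}
```

In a real netfilter module the same two ideas apply, but the packet may be non-linear and must be made writable first, so this sketch should be read as an explanation of the arithmetic only.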
From nsyilmaz at metu.edu.tr Sun Mar 12 15:59:26 2006 From: nsyilmaz at metu.edu.tr (nebi senol yilmaz) Date: Sun Mar 12 16:12:52 2006 Subject: nfhook, sk_buff and change data In-Reply-To: <4414342B.2060709@trash.net> References: <4414342B.2060709@trash.net> Message-ID: > > > > tcp = (struct tcphdr *)(sock_buff->data + (sock_buff->nh.iph->ihl * 4)); > > data = (char *)((int)tcp + (int)(tcp->doff * 4)); > > > > > > if (findGzip=(strstr(data,"senol1763"))){ > > printk("%d Data CheckSum 1 = %u\n",hooknum,sock_buff->csum); > > printk("%d TCP CheckSum 1 = %u\n",hooknum,sock_buff->h.th->check); > > printk("%d IP CheckSum 1 = %u\n",hooknum,sock_buff->nh.iph->check); > > printk("\n"); > > > > findGzip[1]='n'; /* when i find a related str, i change a char*/ > > /* here, i'm calculating ne csum.*/ > > sock_buff->h.th->check=0; > > sock_buff->h.th->check=in_cksum_tcp(sock_buff->nh.iph->saddr,sock_buff->nh.iph->daddr,(unsigned short *)sock_buff->h.th , sizeof(*sock_buff->h.th) ); > > In the FORWARD hook skb->h.th doesn't point to the TCP header. > really? now i wonder, how can i address the TCP header, and set the calculated csum to header for each hook (if the pointer changes for each hook)? But i've never come up with this result when i'm googling and coding. 
thanks for interest From gandalf at wlug.westbo.se Sun Mar 12 19:49:05 2006 From: gandalf at wlug.westbo.se (Martin Josefsson) Date: Sun Mar 12 20:02:33 2006 Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: <1141756438.3881.158.camel@localhost.localdomain> References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> <1141580938.3881.129.camel@localhost.localdomain> <1141756438.3881.158.camel@localhost.localdomain> Message-ID: <1142189345.3881.181.camel@localhost.localdomain> On Tue, 2006-03-07 at 19:33 +0100, Martin Josefsson wrote: Hi Jozsef > > But in the non-DoS random pattern case, there was 819200 entries / 27345 > > child-nodes =~ 29 entries/child-nodes, that's still around 10% > > utilization. So it looks it's not the jenkins hash which produces the > > sparse tree. > > I'll see what I can dig out from the entries in the nodes, it might be > very unbalanced. I've added some very simple debug-code and here's what I've found so far when testing with random src/dst ip/port. number of slots per hashtrie bucket: 5 ... 
Depth 4 - Number of children 0
Depth 4 - Number of buckets 0
Depth 4 - Number of not full buckets 0
Depth 3 - Number of children 0
Depth 3 - Number of buckets 7801856
Depth 3 - Number of entries 291689 (0%)
Depth 3 - Number of not full buckets 7801856
Depth 2 - Number of children 121904
Depth 2 - Number of buckets 262144
Depth 2 - Number of entries 1121175 (85%)
Depth 2 - Number of not full buckets 99698
Depth 1 - Number of children 4096
Depth 1 - Number of buckets 4096
Depth 1 - Number of entries 20480 (100%)
Depth 1 - Number of not full buckets 0
Depth 0 - Number of children 64
Depth 0 - Number of buckets 64
Depth 0 - Number of entries 256 (100%)
Depth 0 - Number of not full buckets 0

If we look at depth 2 we can see the following:

We have 121904 buckets that are "overfull", thus leading to an expansion.
(121904 * 5 + 291689) / (121904 * 5) = 1.48 = 148%

And we have 262144 - 121904 - 99698 = 40542 buckets that are 100% full.

And we have 99698 not full buckets with a number of entries:
1121175 - (121904 + 40542) * 5 = 308945 entries
308945 / (99698 * 5) = 0.62 = 62% usage

Summary:
121904 buckets that are 148% used.
40542 buckets that are 100% used.
99698 buckets that are 62% used.

So the jenkins hash doesn't seem to be able to distribute the entries very well. This leads to the allocation of 121904 children for just 291689 entries, that's a huge waste of memory.

But even if we are able to come up with a better hash algorithm for this we'll still have this problem when we start expanding deeper, the size of the tree simply explodes.

One idea I've played with earlier is to change the number of buckets per child depending on the depth, so we have larger children at the top of the tree and then they get smaller further down. I'll have to revive that code (should be somewhere in svn) in order to say if it helped the memory usage or not, I know it made things a bit slower.
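
The depth-2 breakdown above can be re-derived mechanically from the four counters in the debug output. The following is a small sketch of that arithmetic only. The struct and function names (level_stats, level_usage, bucket_usage) are made up for this illustration and are not taken from the hashtrie code.

```c
/* Counters for one depth of the tree, as printed by the debug output. */
struct level_stats {
    long buckets;        /* buckets at this depth */
    long children;       /* buckets that overflowed and got a child node */
    long not_full;       /* buckets with at least one free slot */
    long entries;        /* entries stored at this depth */
    long entries_below;  /* entries stored one level deeper */
    long slots;          /* slots per bucket */
};

struct level_usage {
    long full;               /* exactly full buckets without a child */
    long partial_entries;    /* entries sitting in the not-full buckets */
    double overfull_usage;   /* load of the overflowed buckets, 1.0 = 100% */
    double partial_usage;    /* load of the not-full buckets */
};

/* Assumes children, not_full and slots are all non-zero. */
static struct level_usage bucket_usage(const struct level_stats *s)
{
    struct level_usage u;

    /* Every bucket is overflowed, exactly full, or not full. */
    u.full = s->buckets - s->children - s->not_full;

    /* Overflowed and full buckets each hold `slots` entries here. */
    u.partial_entries = s->entries - (s->children + u.full) * s->slots;

    /* Overflowed buckets: their own slots plus what spilled one level down. */
    u.overfull_usage = (double)(s->children * s->slots + s->entries_below)
                     / (double)(s->children * s->slots);

    u.partial_usage = (double)u.partial_entries
                    / (double)(s->not_full * s->slots);
    return u;
}
```

Feeding in the depth-2 numbers (262144 buckets, 121904 children, 99698 not full, 1121175 entries, 291689 entries at depth 3, 5 slots) gives 40542 full buckets and 308945 entries in the not-full buckets, matching the figures in the message.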
I've also tried making the buckets larger and decreasing HASHSHIFT in order to not exceed 4kB allocations, this allows the buckets to absorb the imperfection of the hashing and memory usage goes down a _lot_. But... this will simply destroy performance on machines with small cachelines as it will lead to lots of cache misses per bucket.

Any ideas?

--
/Martin

From aton at packetdropped.org Sun Mar 12 20:21:33 2006
From: aton at packetdropped.org (aton)
Date: Sun Mar 12 20:35:29 2006
Subject: netfilter_queue reinjecting packets
In-Reply-To: <441433C2.6010901@trash.net>
References: <1142119489.2987.61.camel@localhost> <20060312151054.5a2020ad.aton@packetdropped.org> <441433C2.6010901@trash.net>
Message-ID: <20060312202133.08f8d8ee.aton@packetdropped.org>

On Sun, 12 Mar 2006 15:44:18 +0100 Patrick McHardy wrote:
> aton wrote:
> > has anyone used netfilter_queue and successfully re-injected packets into the net?
> >
> > i want to write sort of a userspace routing application.
> >
> > host A is my workstation, it has host B as default gateway.
> >
> > on host B my routing application runs.
> > it receives packets from netfilter_queue using libnetfilter_queue.
> > this works very well and i can display the whole packets.
> >
> > now i just want to send them back into the net, so that they reach their destination.
> >
> > i modified the main loop in the source of nfqnl_test.c in the libnetfilter_queue package:
> >
> > while ((rv=recv(fd, buf, sizeof(buf), 0)) >= 0)
> > {
> > printf("pkt received:\n");
> >
> > printf("sending packet back\n");
> >
> > if ((sv=send(fd, buf, sizeof(buf), 0))==-1)
> > {
> > perror("send");
> > exit(EXIT_FAILURE);
> > }
> > printf("done\n");
> > }
> >
> > this should send every packet back to... where?
> > it seems the packets are just sent into nirvana, i cannot sniff them, and i dont get an error from send().
>
> nfnql_test already reinjects packets by the call to nfq_issue_verdict.
> It seems you need to read the documentation ..
>
>

sorry, but i cannot find any call to nfq_issue_verdict in this file. perhaps you mean nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL); ?

i thought nfq_set_verdict was used to specify a handling routine for the packets... in the case of nfq_test.c set the handling routine for packets to the print_pkt() function. am i wrong?

what documentation? i would _love_ to read some documentation about libnetfilter_queue. i have looked through http://netfilter.org/documentation/index.html#documentation-howto but i cannot find anything specific about libnetfilter_queue...

greetings, aton

From kaber at trash.net Sun Mar 12 20:35:40 2006
From: kaber at trash.net (Patrick McHardy)
Date: Sun Mar 12 20:51:02 2006
Subject: netfilter_queue reinjecting packets
In-Reply-To: <20060312202133.08f8d8ee.aton@packetdropped.org>
References: <1142119489.2987.61.camel@localhost> <20060312151054.5a2020ad.aton@packetdropped.org> <441433C2.6010901@trash.net> <20060312202133.08f8d8ee.aton@packetdropped.org>
Message-ID: <4414780C.60907@trash.net>

aton wrote:
>>nfnql_test already reinjects packets by the call to nfq_issue_verdict.
>>It seems you need to read the documentation ..
>>
>
> sorry, but i cannot find any call to nfq_issue_verdict in this file.
> perhaps you mean nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL); ?

Yes, that's what I meant.

> i thought nfq_set_verdict was used to specify a handling routine for the packets... in the case of nfq_test.c set the handling routine for packets to the print_pkt() function.
> am i wrong?

Yes. nfq_set_verdict is used to tell the kernel to pass the packet on and possibly exchange it. Both print_pkt and nfq_set_verdict are called from the packet callback in the example code.

> what documentation? i would _love_ to read some documentation about libnetfilter_queue.
> i have looked through http://netfilter.org/documentation/index.html#documentation-howto but i cannot find anything specific about libnetfilter_queue...

I don't think there is specific libnetfilter_queue documentation yet (but it's very simple and exports only a few functions, look at the code). But we have ip_queue documentation, which should at least help you understand it better conceptually.
From zhaojingmin at hotmail.com Mon Mar 13 03:22:23 2006
From: zhaojingmin at hotmail.com (Jing Min Zhao)
Date: Mon Mar 13 03:36:06 2006
Subject: New H.323 conntrack & NAT helper module
References: <925A849792280C4E80C5461017A4B8A2032119@mail733.InfraSupportEtc.com><44001CDD.3030305@trash.net><4400A541.9080901@trash.net> <440960E4.80601@trash.net>
Message-ID:

----- Original Message -----
From: "Patrick McHardy"
To: "Jing Min Zhao"
Cc:
Sent: Saturday, March 04, 2006 4:41 AM
Subject: Re: New H.323 conntrack & NAT helper module

>
> I would expect incremental checksumming to be less expensive than redoing the entire checksum. I'll try to get a patch ready for testing this weekend, then we can compare the two approaches.
>

OK, I think you know much more about the kernel than I do anyway, so I have changed the code based on your patches. The new module (version 0.3) is ready for download at http://sourceforge.net/project/showfiles.php?group_id=158936. The document at http://nath323.sourceforge.net is updated.

Following are the changes:

1. Added support for multiple TPKTs in one packet (suggested by Patrick McHardy)
2. Avoid excessive stack usage (based on Patrick McHardy's patch)
3. Added support for non-linear skb (based on Patrick McHardy's patch)
4. Fixed missing H.245 module owner (Patrick McHardy)
5. Avoid long RAS expectation chains (Patrick McHardy)
6. Fixed incorrect __exit attribute (Patrick McHardy)
7. Eliminated unnecessary return code
8. Fixed incorrect use of NAT data from conntrack code (found by Patrick McHardy)
9. Fixed TTL calculation error in RCF
10. Added TTL support in RRQ
11. Better support for separate TPKT header and data

Next week I'll release a newer version for these issues:

1. Separate ASN.1 code and data.
2. Sort H.323 data to avoid forward declarations
3. Add a parameter to make Q.931 signal expect for any address optional.
4. Add support for T.120.
Best regards, Jing Min Zhao

From menno at netboxblue.com Mon Mar 13 10:06:29 2006
From: menno at netboxblue.com (Menno Smits)
Date: Mon Mar 13 10:22:10 2006
Subject: "Late REDIRECT"
Message-ID: <44153615.50408@netboxblue.com>

Hi,

Just wanted to ask for your opinions on an idea. Please let me know if you think this is too difficult or crazy.

We currently use the REDIRECT target in nat PREROUTING to send specific traffic to proxies running on our gateway (http, pop3, dns and smtp).

This works ok but we have the following problems:

1) nat PREROUTING happens before filter FORWARD. If we want to apply consistent filter rules to outbound traffic regardless of whether it goes via a transparent proxy or directly out then we can't because the transproxied traffic never goes thru filter FORWARD. Currently we use a horrible system of marks set in mangle PREROUTING to work around this. We reject packets in FORWARD or skip the REDIRECTs in nat based on the marks set. This is ugly and hard to debug (esp because we also use marks for traffic shaping).

2) Return traffic from the transparent proxy REDIRECTs has the source IP and source port of the transparent proxy listener, not the true remote site and port. This means that when we do accounting for return traffic (using ULOG in mangle POSTROUTING) the remote host and port are incorrect.

A possible solution to the above problems is to allow REDIRECTs to occur in nat POSTROUTING (a "late redirect" for want of a better term). That way all outbound traffic can pass through filter FORWARD before being REDIRECTed. The reply NAT for the late REDIRECT would work in a similar way, being performed before filter FORWARD so that the true source IP and port is seen there.

Is something like this feasible? How difficult would it be to implement? Am I barking up the wrong tree?
Regards, Menno Scanned by the NetBox from NetBox Blue (http://netboxblue.com/) From pablo at eurodev.net Mon Mar 13 12:01:10 2006 From: pablo at eurodev.net (Pablo Neira Ayuso) Date: Mon Mar 13 12:13:39 2006 Subject: "Late REDIRECT" In-Reply-To: <44153615.50408@netboxblue.com> References: <44153615.50408@netboxblue.com> Message-ID: <441550F6.7060609@eurodev.net> Menno Smits wrote: > Hi, > > Just wanted to ask for your opinions on an idea. Please let me know if > you think this is too difficult or crazy. > > We use currently use the REDIRECT target in nat PREROUTING to send > specific traffic to proxies running on our gateway (http, pop3, dns and > smtp). > > This works ok but we have the following problems: > > 1) nat PREROUTING happens before filter FORWARD. If we want to apply > consistent filter rules to outbound traffic regardless of whether it > goes via a transparent proxy or directly out then we can't because the > transproxied traffic never goes thru filter FORWARD. Currently we use a > horrible system of marks set in mangle PREROUTING to work around this. > We reject packets in FORWARD or skip the REDIRECTs in nat based on the > marks set. This is ugly and hard to debug (esp because we also use marks > for traffic shaping). > > 2) Return traffic from the transparent proxy REDIRECTs has the source IP > and source port of the transparent proxy listener, not the true remote > site and port. This means that when we do accounting for return traffic > (using ULOG in mangle POSTROUTING) the remote host and port are incorrect. > > A possible solution to the above problems is to allow REDIRECTs to occur > in nat POSTROUTING (a "late redirect" for want of a better term). That > way all outbound traffic can pass through filter FORWARD before being > REDIRECTed. The reply NAT for the late REDIRECT would work in a similar > way, being performed before filter FORWARD so that the true source IP > and port is seen there. > > Is something like this feasible? 
How difficult would it be implement? Am
> I barking up the wrong tree?

Ick, this seems frightening. Why don't you filter in the raw PREROUTING?

--
Pablo

From sebastien.tricaud at wengo.fr Mon Mar 13 12:47:48 2006
From: sebastien.tricaud at wengo.fr (Sebastien Tricaud)
Date: Mon Mar 13 13:01:48 2006
Subject: Knowing tables change
Message-ID: <44155BE4.80001@wengo.fr>

Hi folks,

I would like to know if there is a way to watch for table alterations.

I am sure there is a better way than running "iptables -t table -L" in a loop and comparing with previously stored data.

When I search the Internet for possible answers, I can find something that would do the job. It seems libpkttnetlink is for this purpose. However, there has been no development later than 2002. Is it working and does nothing need to be improved anymore?

At a lower level, I can see libnfnetlink is the low level library I can also use for it: there is the following quote -> "provides open/close/receive functions only to be used by other libraries libctnetlink/libpkttnetlink".

Do you know which lib I should use?

Thanks, Sebastien Tricaud.

From kaber at trash.net Mon Mar 13 15:55:11 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Mar 13 16:10:46 2006
Subject: Knowing tables change
In-Reply-To: <44155BE4.80001@wengo.fr>
References: <44155BE4.80001@wengo.fr>
Message-ID: <441587CF.4050203@trash.net>

Sebastien Tricaud wrote:
> Hi folks,
>
> I would like to know if there is a way to watch for tables alteration.
>
> I am sure there is a better way than doing "iptables -t table -L" loop
> and compare with previously stored data.

watch -n 1 -d iptables -vxnL :)

> When I look over Internet for possible answers, I can find something
> that would do the job. It seems libpkttnetlink is for this purpose.
> However no developments are latter than 2002. Is it a working stuff and
> nothing has to be improved anymore ?
> > At a lower level, I can see libnfnetlink is the low level library I can > also use for it: there is the following quote -> "provides > open/close/receive functions only to be used by other libraries > libctnetlink/libpkttnetlink". There are no notifications for ruleset updates currently, since ruleset exchange between kernel and userspace isn't built on netlink and happens as one atomic operation, so the kernel doesn't know which rules are new. From kaber at trash.net Mon Mar 13 16:00:48 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Mar 13 16:16:14 2006 Subject: New H.323 conntrack & NAT helper module In-Reply-To: References: <925A849792280C4E80C5461017A4B8A2032119@mail733.InfraSupportEtc.com><44001CDD.3030305@trash.net><4400A541.9080901@trash.net> <440960E4.80601@trash.net> Message-ID: <44158920.30700@trash.net> Jing Min Zhao wrote: > OK, I think you know much more of kernel than I do anyway, so I have > changed the code based on your patches. The new module (version 0.3) > is ready for download at > http://sourceforge.net/project/showfiles.php?group_id=158936. > The document at http://nath323.sourceforge.net is updated. > > Following are the changes: > > 1. Added support for multiple TPKTs in one packet (suggested by Patrick > McHardy) > 2. Avoid excessive stack usage (based on Patrick McHardy's patch) > 3. Added support for non-linear skb (based on Patrick McHardy's patch) > 4. Fixed missing H.245 module owner (Patrick McHardy) > 5. Avoid long RAS expectation chains (Patrick McHardy) > 6. Fixed incorrect __exit attribute (Patrick McHardy) > 7. Eliminated unnecessary return code > 8. Fixed incorrect use of NAT data from conntrack code (found by Patrick > McHardy) > 9. Fixed TTL calculation error in RCF > 10. Added TTL support in RRQ > 11. Better support for separate TPKT header and data > > Next week I'll release a newer version for these issues: > > 1. Separate ASN.1 code and data. > 2. Sort H.323 data to avoid forwarding declarations > 3. 
Add a parameter to make Q.931 signal expect for any address optional. > 4. Add support for T.120. Great, thanks. When you release the new version, please also post the patch to netfilter-devel and add a Signed-off-by: line (see Documentation/SubmittingPatches in the kernel source). From azez at ufomechanic.net Mon Mar 13 18:47:05 2006 From: azez at ufomechanic.net (Amin Azez) Date: Mon Mar 13 19:00:52 2006 Subject: ipt_recent patch In-Reply-To: <440ECB1B.4070507@trash.net> References: <43F9EA77.4060208@ufomechanic.net> <44096532.2070000@trash.net> <440DAB6B.4020208@ufomechanic.net> <440ECB1B.4070507@trash.net> Message-ID: <4415B019.6050409@ufomechanic.net> This patch has some sort of versioning built in so that the new userland features are not enabled unless built against a new header file. If a new kernel module is called with rules by old userland, then it refuses to accept the rule. I don't know if you will like my CPP macros, I could expand them out? Changes: * /proc/net/ipt_recent/* is sorted by age, most recent first * include Per Hedelands patch in message <200603062203.k26M3pI5024778@tordmule.bluetail.com> * fix problem of not moving up list entries after deleting entries which meant that sometimes you could never fill the list * fix problems of items with hash of 0 being erased whenever an empty slot is used * Add these new features: [!] --listcount-lt count Requires as a precondition that the number of IP entries in the list (subject to the optional --listtime-* specifier) is less than count (or not !). No other options are considered if this is not true. [!] --listcount-le count Requires as a precondition that the number of IP entries in the list (subject to the optional --listtime-* specifier) is less than or equal to count (or not !). No other options are consid- ered if this is not true. [!] --listcount-eq count Requires as a precondition that the number of IP entries in the list (subject to the optional --listtime-* specifier) is equal to (or not !) 
count. No other options are considered if this is not true. [!] --listcount-ge count Requires as a precondition that the number of IP entries in the list (subject to the optional --listtime-* specifier) is greater than or equal to count (or not !). No other options are consid- ered if this is not true. [!] --listcount-gt count Requires as a precondition that the number of IP entries in the list (subject to the optional --listtime-* specifier) is greater than count (or not !). No other options are considered if this is not true. Only one --listcount-* option can be specified. [!] --listtime-lt seconds Affects the --listcount-* so that instead of counting the number of items in the list, it counts the number of items that were last updated less than seconds seconds ago. [!] --listtime-le seconds Affects the --listcount-* so that instead of counting the number of items in the list, it counts the number of items that were last updated less than or equal to seconds seconds ago. [!] --listtime-ge seconds Affects the --listcount-* so that instead of counting the number of items in the list, it counts the number of items that were last updated more than or equal to seconds seconds ago. [!] --listtime-gt seconds Affects the --listcount-* so that instead of counting the number of items in the list, it counts the number of items that were last updated more than seconds seconds ago. Only one --listtime-* option can be specified. --listtime-* options act as select clauses for what to count. The ! negation for --list- time-* options merely inverts the comparison, so ! --listime-le is the same as --listtime-gt ... The next example accepts the packet if less than 5 ip addresses in the list have been updated in the last 60 seconds # iptables -A FORWARD -m recent --listcount-lt 5 --listtime-lt 60 --set -j ACCEPT -------------- next part -------------- A non-text attachment was scrubbed... 
Name: iptables.recent.patch Type: text/x-patch Size: 15440 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060313/50227f14/iptables.recent-0001.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: kernel_ipt_recent.patch Type: text/x-patch Size: 19620 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060313/50227f14/kernel_ipt_recent-0001.bin From menno at netboxblue.com Tue Mar 14 03:50:32 2006 From: menno at netboxblue.com (Menno Smits) Date: Tue Mar 14 04:06:19 2006 Subject: "Late REDIRECT" In-Reply-To: <441550F6.7060609@eurodev.net> References: <44153615.50408@netboxblue.com> <441550F6.7060609@eurodev.net> Message-ID: <44162F78.50203@netboxblue.com> Hi Pablo, Pablo Neira Ayuso wrote: >> >> Is something like this feasible? How difficult would it be implement? Am >> I barking up the wrong tree? > > Ick, this seems frigthening. Why don't you filter in the raw PREROUTING? Two reasons: 1) You can't do REJECT in raw, only DROP. 2) You still need to use convoluted rules to figure out where packets are going to go. In the filter table you know what's being forwarded and what is local and in filter FORWARD you know both the source and destination interface. Currently we use an intricate arrangement of chains and rules in mangle PREROUTING to determine the marks to set based on the known IPs, networks and routes for each interface. Packets then get handled according to their marks in the filter and nat tables. 
Regards, Menno Scanned by the NetBox from NetBox Blue (http://netboxblue.com/) From slaveze at gmail.com Tue Mar 14 10:15:27 2006 From: slaveze at gmail.com (=?ISO-8859-1?Q?S=E9bastien_LAVEZE?=) Date: Tue Mar 14 10:29:04 2006 Subject: Libnetfilter_conntrack, CTNL_TEST In-Reply-To: References: Message-ID: Hi I am developping an application using libnetfilter_conntrack I first tried to run the example program ctnl_test, it seems to work for events and table dumping but i still get errors and i would like to know if it's normal Here is the output : Test for libnetfilter_conntrack NFNETLINK answers: Invalid argument TEST 1: create conntrack (-22) NFNETLINK answers: -EINVAL, make sure ip_conntrack_netlink is loaded and you have NET_CAPABILITIES TEST 2: dump conntrack table and reset (-524) tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 sport=44582 dport=3128 src= 172.16.16.16 dst=192.168.31.42 sport=3128 dport=44582 [ASSURED] use=1 tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 sport=44583 dport=3128 src=172.16.16.16 dst=192.168.31.42 sport=3128 dport=44583 [ASSURED] use=1 tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 sport=44581 dport=3128 src=172.16.16.16 dst= 192.168.31.42 sport=3128 dport=44581 [ASSURED] use=1 tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 sport=44584 dport=3128 src=172.16.16.16 dst=192.168.31.42 sport=3128 dport=44584 [ASSURED] use=1 TEST 3: dump conntrack table (0) TEST 4: get conntrack (-22) TEST 5: update conntrack (-22) NFNETLINK answers: Invalid argument TEST 6: delete conntrack (-22) TEST 7: Waiting for 10 conntrack events Event number 1 Event number 2 Event number 3 Event number 4 Event number 5 Event number 6 Event number 7 Event number 8 Event number 9 Event number 10 TEST 7: Received 10 conntrack events (-1) Test failed with error -1. Errors=5 I'm using a 2.6.15 kernel and i have all the needed modules installed(ip_conntrack_netlink, ip_conntrack, nfnetlink, nfnetlink_log...) 
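The numbers in parentheses in the ctnl_test output above look like negated errno values. A minimal sketch for decoding them, assuming that reading is right: 22 is EINVAL, which matches the "Invalid argument" text in the output, while 524 is not in the userspace errno table. The `KERNEL_ONLY` entry and the `describe` helper are illustrative names, and the 524 mapping is an assumption taken from the kernel's include/linux/errno.h, where it is listed as ENOTSUPP.

```python
import errno
import os

# Codes that exist only inside the kernel and are normally not exported
# to userspace. Assumption: 524 is ENOTSUPP, as in include/linux/errno.h.
KERNEL_ONLY = {524: "ENOTSUPP"}


def describe(ret):
    """Turn a negative errno-style return value into a readable string."""
    if ret >= 0:
        return "OK"
    code = -ret
    # Prefer the kernel-only table, then fall back to the platform's errno names.
    name = KERNEL_ONLY.get(code) or errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # Return values taken from the test output in this message.
    for value in (-22, -524, 0):
        print(value, "->", describe(value))
```

If the second test really returned ENOTSUPP, that would suggest the kernel side does not implement the "dump and reset" request, which is a different failure from the EINVAL seen in the create, get, update and delete tests.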
Thanks Sebastien From kadlec at blackhole.kfki.hu Tue Mar 14 12:35:31 2006 From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik) Date: Tue Mar 14 12:48:21 2006 Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) In-Reply-To: <1142189345.3881.181.camel@localhost.localdomain> References: <43EFF1F0.1090701@netfilter.org> <20060213112028.GU4601@sunbeam.de.gnumonks.org> <43F438F5.8070607@trash.net> <43F43FA9.4000906@trash.net> <43F4426D.9060807@trash.net> <43F4DBDF.9010008@trash.net> <1141503111.3881.61.camel@localhost.localdomain> <1141580938.3881.129.camel@localhost.localdomain> <1141756438.3881.158.camel@localhost.localdomain> <1142189345.3881.181.camel@localhost.localdomain> Message-ID: Hi Martin, On Sun, 12 Mar 2006, Martin Josefsson wrote: > I've added some very simple debug-code and here's what I've found so far > when testing with random src/dst ip/port. [...] > Summary: > 121904 buckets that are 148% used. > 40542 buckets that are 100% used. > 99698 buckets that are 62% used. > > So the jenkins hash doesn't seem to be able to distribute the entries > very well. I wrote first such words out of frustation, but actually I think the jenkins hash is fine. It was selected after a lot of testing, checking, comparison. There might of course be flaws in it, but > This leads to the allocation of 121904 children for just 291689 entries, > that's a huge waste of memory. But even if we are able to come up with a > better hash algorithm for this we'll still have this problem when we > start expanding deeper, the size of the tree simply explodes. we may simply see the consequences of the inherent behaviour of hashtree *and* the goodness of the jenkins hash: as part of the hash keys clash (they do because HASHSHIFT is relatively small) and as the tree grows, there will be too many empty branches (the whole hash key is pretty unique, there are to *few* near-clashes at the higher levels). But that's an assumption only. 
> One idea I've played with earlier is to change the number of buckets per > child depending on the depth, so we have larger children at the top of > the tree and then they get smaller further down. I'll have to revive > that code (should be somewhere in svn) in order to say if it helped the > memory usage or not, I know it made things a bit slower. Usually we trade speed for memory. Double hashing does not help really either, because it slows down the lookups too :-( - I learnt it at ipset. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From pablo at eurodev.net Tue Mar 14 13:25:27 2006 From: pablo at eurodev.net (Pablo Neira Ayuso) Date: Tue Mar 14 13:37:58 2006 Subject: Libnetfilter_conntrack, CTNL_TEST In-Reply-To: References: Message-ID: <4416B637.4070409@eurodev.net> S?bastien LAVEZE wrote: > I am developping an application using libnetfilter_conntrack > I first tried to run the example program ctnl_test, it seems to work > for events and table dumping but i still get errors and i would like > to know if it's normal > Here is the output : > > Test for libnetfilter_conntrack > NFNETLINK answers: Invalid argument > TEST 1: create conntrack (-22) > NFNETLINK answers: -EINVAL, make sure ip_conntrack_netlink is loaded > and you have NET_CAPABILITIES > TEST 2: dump conntrack table and reset (-524) > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > sport=44582 dport=3128 src= 172.16.16.16 dst=192.168.31.42 > sport=3128 dport=44582 [ASSURED] use=1 > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > sport=44583 dport=3128 src=172.16.16.16 dst=192.168.31.42 > sport=3128 dport=44583 [ASSURED] use=1 > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > sport=44581 dport=3128 src=172.16.16.16 dst= 192.168.31.42 > sport=3128 dport=44581 [ASSURED] 
use=1 > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > sport=44584 dport=3128 src=172.16.16.16 dst=192.168.31.42 > sport=3128 dport=44584 [ASSURED] use=1 > TEST 3: dump conntrack table (0) > TEST 4: get conntrack (-22) > TEST 5: update conntrack (-22) > NFNETLINK answers: Invalid argument > TEST 6: delete conntrack (-22) > TEST 7: Waiting for 10 conntrack events > Event number 1 > Event number 2 > Event number 3 > Event number 4 > Event number 5 > Event number 6 > Event number 7 > Event number 8 > Event number 9 > Event number 10 > TEST 7: Received 10 conntrack events (-1) > Test failed with error -1. Errors=5 > > I'm using a 2.6.15 kernel and i have all the needed modules > installed(ip_conntrack_netlink, ip_conntrack, nfnetlink, > nfnetlink_log...) No, it is not a normal output. What version of libnetfilter_conntrack are you using? -- Pablo From slaveze at gmail.com Tue Mar 14 13:50:29 2006 From: slaveze at gmail.com (=?ISO-8859-1?Q?S=E9bastien_LAVEZE?=) Date: Tue Mar 14 14:04:08 2006 Subject: Libnetfilter_conntrack, CTNL_TEST In-Reply-To: <4416B637.4070409@eurodev.net> References: <4416B637.4070409@eurodev.net> Message-ID: I'm using the 0.0.30 version On 3/14/06, Pablo Neira Ayuso wrote: > S?bastien LAVEZE wrote: > > I am developping an application using libnetfilter_conntrack > > I first tried to run the example program ctnl_test, it seems to work > > for events and table dumping but i still get errors and i would like > > to know if it's normal > > Here is the output : > > > > Test for libnetfilter_conntrack > > NFNETLINK answers: Invalid argument > > TEST 1: create conntrack (-22) > > NFNETLINK answers: -EINVAL, make sure ip_conntrack_netlink is loaded > > and you have NET_CAPABILITIES > > TEST 2: dump conntrack table and reset (-524) > > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > > sport=44582 dport=3128 src= 172.16.16.16 dst=192.168.31.42 > > sport=3128 dport=44582 [ASSURED] use=1 > > tcp 6 431960 ESTABLISHED 
src=192.168.31.42 dst=172.16.16.16 > > sport=44583 dport=3128 src=172.16.16.16 dst=192.168.31.42 > > sport=3128 dport=44583 [ASSURED] use=1 > > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > > sport=44581 dport=3128 src=172.16.16.16 dst= 192.168.31.42 > > sport=3128 dport=44581 [ASSURED] use=1 > > tcp 6 431960 ESTABLISHED src=192.168.31.42 dst=172.16.16.16 > > sport=44584 dport=3128 src=172.16.16.16 dst=192.168.31.42 > > sport=3128 dport=44584 [ASSURED] use=1 > > TEST 3: dump conntrack table (0) > > TEST 4: get conntrack (-22) > > TEST 5: update conntrack (-22) > > NFNETLINK answers: Invalid argument > > TEST 6: delete conntrack (-22) > > TEST 7: Waiting for 10 conntrack events > > Event number 1 > > > Event number 2 > > Event number 3 > > Event number 4 > > Event number 5 > > Event number 6 > > Event number 7 > > Event number 8 > > Event number 9 > > Event number 10 > > TEST 7: Received 10 conntrack events (-1) > > Test failed with error -1. Errors=5 > > > > I'm using a 2.6.15 kernel and i have all the needed modules > > installed(ip_conntrack_netlink, ip_conntrack, nfnetlink, > > nfnetlink_log...) > > No, it is not a normal output. What version of libnetfilter_conntrack > are you using? > > -- > Pablo > From aton at packetdropped.org Tue Mar 14 13:54:52 2006 From: aton at packetdropped.org (aton) Date: Tue Mar 14 14:08:47 2006 Subject: netfilter_queue reinjecting packets In-Reply-To: <4414780C.60907@trash.net> References: <1142119489.2987.61.camel@localhost> <20060312151054.5a2020ad.aton@packetdropped.org> <441433C2.6010901@trash.net> <20060312202133.08f8d8ee.aton@packetdropped.org> <4414780C.60907@trash.net> Message-ID: <20060314135452.19a2b219.aton@packetdropped.org> On Sun, 12 Mar 2006 20:35:40 +0100 Patrick McHardy wrote: > aton wrote: > >>nfnql_test already reinjects packets by the call to nfq_issue_verdict. > >>It seems you need to read the documentation .. > >> > > > > sorry, but i cannot find any call to nfq_issue_verdict in this file. 
> > perhaps you mean nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL); ? > > Yes, thats what I meant. > > > i thought nfq_set_verdict was used to specify a handling routine for the packets... in the case of nfq_test.c set the handling routine for packets to the print_pkt() function. > > am i wrong? > > Yes. nfq_set_verdict is used to tell the kernel to pass the packet > on and possibly exchange it. Both print_pkt and nfq_set_verdict > are called from the packet callback in the example code. > > > what documentation? i would _love_ to read some documentation about libnetfilter_queue. > > i have looked through http://netfilter.org/documentation/index.html#documentation-howto but i cannot find anything specific about libnetfilter_queue... > > I don't think there is specific libnetfilter_queue documentation yet > (but its very simple and exports only a few functions, look at the > code). But we have ip_queue documentation, which should at least > help you understand it better conceptually. > okay i seem to understand how it works a bit better now. what i dont understand is how to get the ethernet header from the library. i tried nfq_get_packet_hw(), but it always returns NULL, is that correct? here is the modified source and the output: http://rafb.net/paste/results/pb8tD850.html greetings -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060314/5283ee6d/attachment.pgp From cnguyen at certicom.com Tue Mar 14 15:54:39 2006 From: cnguyen at certicom.com (Chinh Nguyen) Date: Tue Mar 14 16:18:46 2006 Subject: ip6tables: Unknown error 4294967295 In-Reply-To: <342126766.19325@ustc.edu.cn> References: <342126766.19325@ustc.edu.cn> Message-ID: <4416D92F.40103@certicom.com> net/ipv4/netfilter is a directory in the kernel source code. I took a quick peek at the latest kernel 2.6.16-rc6. 
I don't think there's any support for the "ROUTE" target in the kernel. Can any netfilter developer confirm? GuanYao Huang wrote: > Hi, I have no net/ipv4/netfilter/ directory. > I am using FC4, iptables-1.3.5. Initially, iptables-1.3.5 does not support ROUTE > module, which is an extension. > There is libip6t_ROUTE.c in PWD/extension directory, but it is not compiled. So I > changed the makefile to include it and add some source code to libip6t_ROUTE.c > which should be the header file for some definitions. That's all I have done. > I don't know if there is something else I should do. > Thanks. > > ????????????????????: > >>From: Chinh Nguyen >>Reply-To: >>To: netfilter-devel@lists.netfilter.org >>Subject: Re: ip6tables: Unknown error 4294967295 >>Date:Fri, 10 Mar 2006 09:57:11 -0500 >> >>GuanYao Huang wrote: >> >>>Hi: >>>I am doing research into iptables-1.3.5, in which I am trying to use ROUTE > > target > >>>which is an extension to the current iptables. >>>I added libip6t_ROUTE.h which makes libip6t_ROUTE.c complied. >>>When using the following command: >>>[root@localhost iptables]# /root/CNGI/iptables-1.3.5/ip6tables -A POSTROUTING > > -t > >>>mangle -o eth0 -p tcp --dport 22 -j ROUTE --oif iptun >>>ip6tables: Unknown error 4294967295 >>> >>>I don't know why. Can you help me? Thanks. >>> >>> >>> >> >>There are 2 parts to netfilter. The modules that are used by iptables to parse >>arguments and communicate them to the kernel and the kernel modules that are >>loaded (or compiled in) with the kernel. >> >>One problem could be that your current kernel does not have support for the >>netfilter module you are trying to used. >> >>I have often seen this error associated with an 'invalid argument' returned by >>the netfilter kernel module. In previous versions of iptables, it will say >>'invalid argument' instead of 'Unknown error 4294967295'. 
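A side note on the number itself: 4294967295 is 2^32 - 1, which is what -1 looks like when a 32-bit value is read as unsigned. The sketch below only demonstrates that arithmetic; whether iptables 1.3.5 arrives at the message this way is an assumption, and the helper names are illustrative.

```python
import ctypes


def as_unsigned32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret an integer as a signed 32-bit (two's complement) value."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    print(as_unsigned32(-1))         # the number from the error message
    print(as_signed32(4294967295))   # and back again
```

Read this way, "Unknown error 4294967295" is a generic failure code of -1 printed as unsigned, rather than a specific errno.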
>> >>This is typically caused by an invalid or missing condition causing the >>netfilter kernel to reject the rule in its checkentry function. >> >>Unfortunately, sometimes all the necessary valid conditions are not enumerated >>in any iptables manual or checked by the iptables module. >> >>For example, consider this >> /opt/iptables-1.3.5/bin/iptables -A OUTPUT -m esp --espspi ! 0 -j LOG >>iptables: Unknown error 4294967295 >> >>What is not known is that you have to specify '-p esp' if you will to use > > module > >>'esp', which becomes apparent if you look at the kernel source code: >> >>net/ipv4/netfilter/ipt_esp.c: >>static int >>checkentry(const char *tablename, >> const void *ip_void, >> void *matchinfo, >> unsigned int matchinfosize, >> unsigned int hook_mask) >>{ >> const struct ipt_esp *espinfo = matchinfo; >> const struct ipt_ip *ip = ip_void; >> >> /* Must specify proto == ESP, and no unknown invflags */ >> if (ip->proto != IPPROTO_ESP || (ip->invflags & IPT_INV_PROTO)) { >> duprintf("ipt_esp: Protocol %u != %u\n", ip->proto, >> IPPROTO_ESP); >> return 0; >> } >> >>If this is your problem, you might have to do some source code reading :) >> >> > > > From kaber at trash.net Tue Mar 14 17:46:28 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Mar 14 18:02:12 2006 Subject: netfilter_queue reinjecting packets In-Reply-To: <20060314135452.19a2b219.aton@packetdropped.org> References: <1142119489.2987.61.camel@localhost> <20060312151054.5a2020ad.aton@packetdropped.org> <441433C2.6010901@trash.net> <20060312202133.08f8d8ee.aton@packetdropped.org> <4414780C.60907@trash.net> <20060314135452.19a2b219.aton@packetdropped.org> Message-ID: <4416F364.8090105@trash.net> aton wrote: > okay i seem to understand how it works a bit better now. > what i dont understand is how to get the ethernet header from the library. > i tried nfq_get_packet_hw(), but it always returns NULL, is that correct? 
The hardware header is only included for packets queued on the input path. From kaber at trash.net Tue Mar 14 17:54:26 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Mar 14 18:10:02 2006 Subject: ip6tables: Unknown error 4294967295 In-Reply-To: <4416D92F.40103@certicom.com> References: <342126766.19325@ustc.edu.cn> <4416D92F.40103@certicom.com> Message-ID: <4416F542.6000404@trash.net> Chinh Nguyen wrote: > net/ipv4/netfilter is a directory in the kernel source code. I took a quick peek > at the latest kernel 2.6.16-rc6. I don't think there's any support for the > "ROUTE" target in the kernel. > > Can any netfilter developer confirm? No, there isn't. But you still should not get this error. Can you send an strace of the failing command please? From kaber at trash.net Tue Mar 14 17:56:33 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Mar 14 18:12:09 2006 Subject: Libnetfilter_conntrack, CTNL_TEST In-Reply-To: References: <4416B637.4070409@eurodev.net> Message-ID: <4416F5C1.4080200@trash.net> Sébastien LAVEZE wrote: > I'm using the 0.0.30 version Try whether the libnfnetlink/libnetfilter_conntrack versions from SVN do better; there were some parsing bugs. From kaber at trash.net Wed Mar 15 18:12:26 2006 From: kaber at trash.net (Patrick McHardy) Date: Wed Mar 15 18:28:16 2006 Subject: [LARTC] Possible bug with multiport? In-Reply-To: References: Message-ID: <44184AFA.5020109@trash.net> CCed netfilter-devel. Kirk Reiser wrote: > Hi Folks: I am either using the multiport of the -m or --match option > of iptables in correctly or there is a bug with it. Is anyone else > using it with no problem? This is the way I am trying to use it: > > my_ports=21,25,80 > iptables -t nat -A PREROUTING -i $wan_addr -p tcp -m multiport > --dports $my_ports -j DNAT --to $my_internal_address > > I have used this in the past successfully but that was a few years > ago. I get no errors or warnings it just ignors the ports. 
The > multiport invokation shows up in an iptables -t nat -L -v however. > The packet and byte counts never get incremented either from zero. > > Any pointers would sure be helpful, having to include a line for every > port check seems wasteful. Please post your kernel version, your iptables version and the output of iptables -vxnL. From shemminger at osdl.org Wed Mar 15 20:23:15 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Wed Mar 15 20:37:03 2006 Subject: ip_conntrack build warning Message-ID: <20060315112315.6158ad7c@localhost.localdomain> Recent kernels in -mm warn about init section usage in ip_conntrack. The problem is the init_or_cleanup() style makes it impossible to compile time checking hard. WARNING: net/ipv4/netfilter/ip_conntrack.o - Section mismatch: reference to .init.text:ip_conntrack_init from .text.init_or_cleanup after 'init_or_cleanup' (at offset 0x12) From kaber at trash.net Thu Mar 16 00:41:21 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Mar 16 00:55:21 2006 Subject: [LARTC] Possible bug with multiport? In-Reply-To: References: <44184AFA.5020109@trash.net> Message-ID: <4418A621.2040909@trash.net> Kirk Reiser wrote: > Patrick McHardy writes: > > >>Please post your kernel version, your iptables version and the >>output of iptables -vxnL. > > > Woops! The kernel is linux 2.6.15.6 and the iptables is 1.3.3. I > will have to reconstruct the script using multiport so that will take > some time to get the iptables -vxnL. IIRC we had a bug in iptables with revision matching (which affects multiport), could you try the latest version? 
From kaber at trash.net Thu Mar 16 00:50:38 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Mar 16 01:04:25 2006 Subject: ip_conntrack build warning In-Reply-To: <20060315112315.6158ad7c@localhost.localdomain> References: <20060315112315.6158ad7c@localhost.localdomain> Message-ID: <4418A84E.5080406@trash.net> Stephen Hemminger wrote: > Recent kernels in -mm warn about init section usage in ip_conntrack. > The problem is the init_or_cleanup() style makes it impossible to compile > time checking hard. > > WARNING: net/ipv4/netfilter/ip_conntrack.o - Section mismatch: > reference to .init.text:ip_conntrack_init from .text.init_or_cleanup > after 'init_or_cleanup' (at offset 0x12) The code should be fine, although I'm not a big fan of these functions either. ip_conntrack_init is marked __init and is called from init_or_cleanup, which is not marked, but only on the init path. I'd happily take a patch which cleans up these functions (we have lots of them in the netfilter code). From kaber at trash.net Thu Mar 16 00:52:44 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Mar 16 01:06:41 2006 Subject: ip_conntrack build warning In-Reply-To: <4418A84E.5080406@trash.net> References: <20060315112315.6158ad7c@localhost.localdomain> <4418A84E.5080406@trash.net> Message-ID: <4418A8CC.3040406@trash.net> Patrick McHardy wrote: > The code should be fine, although I'm not a big fan of these > functions either. ip_conntrack_init is marked __init and is > called from init_or_cleanup, which is not marked, but only > on the init path. That should read "only called". 
From zhaojingmin at hotmail.com Thu Mar 16 03:24:04 2006 From: zhaojingmin at hotmail.com (Jing Min Zhao) Date: Thu Mar 16 03:38:02 2006 Subject: New H.323 conntrack & NAT helper module References: <925A849792280C4E80C5461017A4B8A2032119@mail733.InfraSupportEtc.com><44001CDD.3030305@trash.net><4400A541.9080901@trash.net><440960E4.80601@trash.net> <44158920.30700@trash.net> Message-ID: ----- Original Message ----- From: "Patrick McHardy" To: "Jing Min Zhao" Cc: Sent: Monday, March 13, 2006 10:00 AM Subject: Re: New H.323 conntrack & NAT helper module > > Great, thanks. When you release the new version, please also post > the patch to netfilter-devel and add a Signed-off-by: line (see > Documentation/SubmittingPatches in the kernel source). > > I have finished the new release. The new parameter and T.120 support have been added. But the uncompressed patch is pretty big, about 228KB. Do you think it's ok to post it here? Thanks a lot! Jing Min Zhao From kaber at trash.net Thu Mar 16 09:55:57 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Mar 16 10:11:51 2006 Subject: New H.323 conntrack & NAT helper module In-Reply-To: References: <925A849792280C4E80C5461017A4B8A2032119@mail733.InfraSupportEtc.com><44001CDD.3030305@trash.net><4400A541.9080901@trash.net><440960E4.80601@trash.net> <44158920.30700@trash.net> Message-ID: <4419281D.6050702@trash.net> Jing Min Zhao wrote: >> Great, thanks. When you release the new version, please also post >> the patch to netfilter-devel and add a Signed-off-by: line (see >> Documentation/SubmittingPatches in the kernel source). >> >> > > I have finished the new release. The new parameter and T.120 support > have been added. But the uncompressed patch is pretty big, about 228KB. > Do you think it's ok to post it here? We used to have a pretty small size limit, but I think it has been lifted. Just try if it works. 
From muddogxp at gmail.com Thu Mar 16 10:14:55 2006 From: muddogxp at gmail.com (mud dog) Date: Thu Mar 16 10:28:57 2006 Subject: "Late REDIRECT" In-Reply-To: <44153615.50408@netboxblue.com> References: <44153615.50408@netboxblue.com> Message-ID: <24dc5ae90603160114k536c351fq@mail.gmail.com> 2006/3/13, Menno Smits : > Hi, > > Just wanted to ask for your opinions on an idea. Please let me know if > you think this is too difficult or crazy. > > We use currently use the REDIRECT target in nat PREROUTING to send > specific traffic to proxies running on our gateway (http, pop3, dns and > smtp). > > This works ok but we have the following problems: > > 1) nat PREROUTING happens before filter FORWARD. If we want to apply > consistent filter rules to outbound traffic regardless of whether it > goes via a transparent proxy or directly out then we can't because the > transproxied traffic never goes thru filter FORWARD. Currently we use a > horrible system of marks set in mangle PREROUTING to work around this. > We reject packets in FORWARD or skip the REDIRECTs in nat based on the > marks set. This is ugly and hard to debug (esp because we also use marks > for traffic shaping). You can REJECT packets in PREROUTING , why set marks, let it to pass FORWARD to see if it must be rejected or not? > > 2) Return traffic from the transparent proxy REDIRECTs has the source IP > and source port of the transparent proxy listener, not the true remote > site and port. This means that when we do accounting for return traffic > (using ULOG in mangle POSTROUTING) the remote host and port are incorrect. > > A possible solution to the above problems is to allow REDIRECTs to occur > in nat POSTROUTING (a "late redirect" for want of a better term). That > way all outbound traffic can pass through filter FORWARD before being > REDIRECTed. The reply NAT for the late REDIRECT would work in a similar > way, being performed before filter FORWARD so that the true source IP > and port is seen there. 

Hack the REDIRECT target if possible. Malloc a pool in kernel to save
the source ip/port, when in POSTROUTING, fetch them.

> Is something like this feasible? How difficult would it be implement? Am
> I barking up the wrong tree?
>
> Regards,
> Menno
>
> Scanned by the NetBox from NetBox Blue
> (http://netboxblue.com/)

From rv at wallfire.org Thu Mar 16 11:12:26 2006
From: rv at wallfire.org (Herve Eychenne)
Date: Thu Mar 16 11:25:22 2006
Subject: Knowing tables change
In-Reply-To: <441587CF.4050203@trash.net>
References: <44155BE4.80001@wengo.fr> <441587CF.4050203@trash.net>
Message-ID: <20060316101226.GO25252@eychenne.org>

On Mon, Mar 13, 2006 at 03:55:11PM +0100, Patrick McHardy wrote:
> Sebastien Tricaud wrote:
> > Hi folks,
> >
> > I would like to know if there is a way to watch for tables alteration.
> >
> > I am sure there is a better way than doing "iptables -t table -L" loop
> > and compare with previously stored data.

watch -n 1 -d iptables -vxnL :)

> > When I look over Internet for possible answers, I can find something
> > that would do the job. It seems libpkttnetlink is for this purpose.
> > However no developments are latter than 2002. Is it a working stuff and
> > nothing has to be improved anymore ?
> >
> > At a lower level, I can see libnfnetlink is the low level library I can
> > also use for it: there is the following quote -> "provides
> > open/close/receive functions only to be used by other libraries
> > libctnetlink/libpkttnetlink".

> There are no notifications for ruleset updates currently, since
> ruleset exchange between kernel and userspace isn't built on
> netlink and happens as one atomic operation, so the kernel
> doesn't know which rules are new.

Does listing the rules imply some locking? I guess it can be a costly
operation if the ruleset is big...
It would at least be nice to send a "signal" (via netlink) when the
ruleset is changed, so that third party applications can figure out the
changes themselves only when needed (without having to do regular active
polls).

 Herve

--
 _
(°=  Hervé Eychenne
//)
v_/_ WallFire project: http://www.wallfire.org/

From lishen.cn at gmail.com Wed Mar 15 13:18:35 2006
From: lishen.cn at gmail.com (李申)
Date: Thu Mar 16 14:07:59 2006
Subject: Some Question about Multicast packets go through Netfilter
Message-ID: <4cdf02ef0603150418x199f0791i@mail.gmail.com>

Dear all and Netfilter Core Team members,

As this is my first time asking a question here, I describe my case in
detail, so the mail is quite long. If it causes any inconvenience, I am
sorry; please forgive me.

I'm a beginner with Netfilter. Recently our lab decided to build a
teleconference system that uses multicast to transfer data. The system
uses 2 "communication channels": a "control channel" using TCP unicast,
and a "traffic channel" using UDP multicast. I want to write a patch for
Netfilter so that we only need to open the control channel port to the
public; once a connection is established, Netfilter would automatically
let the multicast traffic in. It's much like how Netfilter handles the
ftp case. But after digging deeper, I found some problems, so I am
writing this mail to ask for help.

Step 1

As this scenario is much like the ftp case, I first wanted to handle it
like ftp. But from the mailing-list archive, I found that the conntrack
module does not handle multicast packets. So I think it may not work if
I simply call ip_conntrack_expect_alloc() and make a "conntrack
expectation" for multicast. Am I right?

For an IPv4 unicast packet, the path through the Netfilter modules is
"Conntrack -> DNAT -> Filter -> SNAT -> Conntrack". What is the path for
a multicast packet?
I searched the web, but there seems to be little discussion of this
issue :( Could you give me some hints?

Step 2

As conntrack won't work for multicast, I am considering writing a TARGET
module for Netfilter and adding a rule that uses my target. When a
packet triggers the target module, it adds a rule to the filter table
that allows all multicast traffic in from the source IP address. At
first, I wanted to do this in kernel space, and looked for an existing
function (to insert an ipt_entry in the filter table) that I could use.
After checking all the "EXPORT_SYMBOL"s in the Netfilter files, it seems
that there is no such function. It is iptables that does this task
(insert, delete, modify the rule/ipt_entry) and copies the rules from
user space to kernel space. My question is: is what I said correct? Is
there no function in Netfilter to insert or delete a rule?

Step 3

Finally, I thought of a solution: I write a TARGET module that runs in
kernel space and an application that runs in user space. When the target
module is triggered, it notifies the user application of the source IP
via a netlink socket. When the user application receives the message, it
calls iptables from a command shell to add the rule. I feel that this is
very likely to work. I want to ask: do you have any better suggestions?
I would like to hear some advice from the experts :)

It's really a long e-mail and I hope it did not bore you.

With many thanks!

With BRs
Shen

From reiner.siberg at bredband.net Thu Mar 16 09:49:25 2006
From: reiner.siberg at bredband.net (reiner.siberg@bredband.net)
Date: Thu Mar 16 14:08:00 2006
Subject: rtsp-conntrack & kernel 2.6.10
Message-ID: <20060316084925.QBME16061.mxfep01.bredband.com@mxfep01>

Hi,

I am new to netfiltering but still trying to use the rtsp-conntrack with
the 2.6.10 kernel. I have found two versions of rtsp-conntrack, one
old(?) and one for 2.6.11.
As I understand it, there have been some changes in the netfilter APIs
between 2.6.10 and 2.6.11. The question is whether I should backport the
2.6.11 version or try to add the possible bugfixes to the old one. What
is recommended? (I think I will be stuck with the 2.6.10 kernel for a
while.)

Regards

From alex at samad.com.au Fri Mar 17 00:55:11 2006
From: alex at samad.com.au (Alexander Samad)
Date: Fri Mar 17 01:09:16 2006
Subject: Interesting problem with conntrack and ftp
Message-ID: <20060316235511.GA18440@hufpuf.lan1.hme1.samad.com.au>

Hi

I was recently setting up my new firewall using openwrt on a linksys.

I got around to setting up my adsl connection and added these commands
to my iptables:

$IPT -t filter -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
$IPT -t filter -A FORWARD -o $WANADSL -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
$IPT -t nat -A POSTROUTING -o $WANADSL -j MASQUERADE

which is what I have normally done.

http traffic worked well, but ftp of large files timed out, a sign of an
mtu problem. It worked when I ftp'ed from the firewall, but not when I
did it from behind the firewall.

When I did some tcpdumps, I noticed that the second connection created
by the client wasn't being clamped. The way I figure it, the second
connection was related to the first one, and was thus being consumed by
the first line in iptables (above).

Once I swapped lines 1 and 2, everything worked fine.

Now openwrt uses 2.4.30, and my previous firewall used 2.6; I believe it
was set up as shown above and it worked fine. The other difference is
that conntrack_ftp is compiled into the kernel.

Is this a known feature/bug? Why has it worked in 2.6 and not in 2.4, or
is the problem compiled-in versus module?

Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : /pipermail/netfilter-devel/attachments/20060317/622b0222/attachment.pgp

From beunlovable at gmail.com Fri Mar 17 10:41:39 2006
From: beunlovable at gmail.com (David Vogt)
Date: Fri Mar 17 10:55:39 2006
Subject: Modification of MTU/MSS
Message-ID: <859616420603170141p6bff0e33j@mail.gmail.com>

Dear all,

I modify packets with libipq. Specifically, they grow larger. Now it
seems I ran into problems concerning the maximum packet size (as
determined by MTU). My question is how to solve this problem, or whether
I got anything wrong.

1) As far as I understand, packet fragmentation is done by the kernel.
   Question: Is it done before libipq packet mangling or afterwards?
2) Is there something like the TCPMSS target (that modifies the MSS) for
   modifying the MTU?

Thanks.

David.

From dim at openvz.org Fri Mar 17 12:46:19 2006
From: dim at openvz.org (Dmitry Mishin)
Date: Fri Mar 17 13:57:59 2006
Subject: [PATCH] futher {ip,ip6,arp}_tables unification
Message-ID: <200603171446.19856.dim@openvz.org>

This patch moves {ip,ip6,arp}t_entry_{match,target} definitions to
x_tables.h. This move simplifies code and future compatibility fixes.

Signed-off-by: Dmitry Mishin
Acked-off-by: Kirill Korotaev

--
Thanks,
Dmitry.
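
A minimal userspace sketch of what the unification described in this message amounts to, under stated assumptions: the struct layout and the two arpt_ aliases are copied from the attached diff, with <stdint.h> types standing in for the kernel's u_int*_t. The helpers xt_target_init() and arpt_target_size() are hypothetical, added only for illustration, and are not part of the patch. The zero-length data[0] member is a GNU C extension, as in the kernel headers.

```c
#include <stdint.h>
#include <string.h>

#define XT_FUNCTION_MAXNAMELEN 30

struct xt_target;	/* opaque stand-in; only a pointer to it is used here */

/* Shared definition, laid out as in the x_tables.h hunk of the diff. */
struct xt_entry_target {
	union {
		struct {
			uint16_t target_size;
			/* Used by userspace */
			char name[XT_FUNCTION_MAXNAMELEN - 1];
			uint8_t revision;
		} user;
		struct {
			uint16_t target_size;
			/* Used inside the kernel */
			struct xt_target *target;
		} kernel;
		/* Total length */
		uint16_t target_size;
	} u;
	unsigned char data[0];
};

struct xt_standard_target {
	struct xt_entry_target target;
	int verdict;
};

/* Aliases as in the arp_tables.h hunk: old names map onto the shared types. */
#define arpt_entry_target xt_entry_target
#define arpt_standard_target xt_standard_target

/* Hypothetical helper: fill in the userspace view of a target header. */
void xt_target_init(struct xt_entry_target *t, const char *name,
		    uint8_t revision)
{
	memset(t, 0, sizeof(*t));
	t->u.user.target_size = sizeof(*t);
	/* memset above guarantees the name stays NUL-terminated */
	strncpy(t->u.user.name, name, sizeof(t->u.user.name) - 1);
	t->u.user.revision = revision;
}

/* Hypothetical legacy-style caller, still written against the arpt_ name. */
uint16_t arpt_target_size(const struct arpt_entry_target *t)
{
	return t->u.target_size;
}
```

Because the aliases are plain preprocessor defines, code written against struct arpt_entry_target compiles unchanged and operates on the very same type as code using struct xt_entry_target, so a later layout or compatibility fix only has to be made once.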
-------------- next part --------------
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 46a0f97..ad72a4f 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -4,6 +4,62 @@
 #define XT_FUNCTION_MAXNAMELEN 30
 #define XT_TABLE_MAXNAMELEN 32
 
+struct xt_entry_match
+{
+	union {
+		struct {
+			u_int16_t match_size;
+
+			/* Used by userspace */
+			char name[XT_FUNCTION_MAXNAMELEN-1];
+
+			u_int8_t revision;
+		} user;
+		struct {
+			u_int16_t match_size;
+
+			/* Used inside the kernel */
+			struct xt_match *match;
+		} kernel;
+
+		/* Total length */
+		u_int16_t match_size;
+	} u;
+
+	unsigned char data[0];
+};
+
+struct xt_entry_target
+{
+	union {
+		struct {
+			u_int16_t target_size;
+
+			/* Used by userspace */
+			char name[XT_FUNCTION_MAXNAMELEN-1];
+
+			u_int8_t revision;
+		} user;
+		struct {
+			u_int16_t target_size;
+
+			/* Used inside the kernel */
+			struct xt_target *target;
+		} kernel;
+
+		/* Total length */
+		u_int16_t target_size;
+	} u;
+
+	unsigned char data[0];
+};
+
+struct xt_standard_target
+{
+	struct xt_entry_target target;
+	int verdict;
+};
+
 /* The argument to IPT_SO_GET_REVISION_*.  Returns highest revision
  * kernel supports, if >= revision.
*/ struct xt_get_revision diff --git a/include/linux/netfilter_arp/arp_tables.h b/include/linux/netfilter_arp/arp_tables.h index fd21796..3f89d2f 100644 --- a/include/linux/netfilter_arp/arp_tables.h +++ b/include/linux/netfilter_arp/arp_tables.h @@ -65,35 +65,8 @@ struct arpt_arp { u_int16_t invflags; }; -struct arpt_entry_target -{ - union { - struct { - u_int16_t target_size; - - /* Used by userspace */ - char name[ARPT_FUNCTION_MAXNAMELEN-1]; - u_int8_t revision; - } user; - struct { - u_int16_t target_size; - - /* Used inside the kernel */ - struct arpt_target *target; - } kernel; - - /* Total length */ - u_int16_t target_size; - } u; - - unsigned char data[0]; -}; - -struct arpt_standard_target -{ - struct arpt_entry_target target; - int verdict; -}; +#define arpt_entry_target xt_entry_target +#define arpt_standard_target xt_standard_target /* Values for "flag" field in struct arpt_ip (general arp structure). * No flags defined yet. diff --git a/include/linux/netfilter_ipv4/ip_tables.h b/include/linux/netfilter_ipv4/ip_tables.h index 76ba24b..56eebc6 100644 --- a/include/linux/netfilter_ipv4/ip_tables.h +++ b/include/linux/netfilter_ipv4/ip_tables.h @@ -52,61 +52,9 @@ struct ipt_ip { u_int8_t invflags; }; -struct ipt_entry_match -{ - union { - struct { - u_int16_t match_size; - - /* Used by userspace */ - char name[IPT_FUNCTION_MAXNAMELEN-1]; - - u_int8_t revision; - } user; - struct { - u_int16_t match_size; - - /* Used inside the kernel */ - struct ipt_match *match; - } kernel; - - /* Total length */ - u_int16_t match_size; - } u; - - unsigned char data[0]; -}; - -struct ipt_entry_target -{ - union { - struct { - u_int16_t target_size; - - /* Used by userspace */ - char name[IPT_FUNCTION_MAXNAMELEN-1]; - - u_int8_t revision; - } user; - struct { - u_int16_t target_size; - - /* Used insid