From kaber at trash.net Tue Aug 1 08:39:50 2006
From: kaber at trash.net (Patrick McHardy)
Date: Tue Aug 1 09:09:20 2006
Subject: [NETFILTER]: SIP helper: expect RTP streams in both directions
Message-ID: <44CEF736.4020105@trash.net>

Hi Dave,

following are two small fixes for 2.6.18. The second patch fixes
missing string validation in two netfilter modules. James sent a
similar patch for SECMARK to -stable, in my opinion this is not
necessary since CAP_NET_ADMIN in practice always means root and
mainline doesn't support virtualization yet. But if you feel
otherwise please pass it on. Thanks.

-------------- next part --------------
[NETFILTER]: SIP helper: expect RTP streams in both directions

Since we don't know in which direction the first packet will arrive, we
need to create one expectation for each direction, which is currently
prevented by max_expected being set to 1.

Signed-off-by: Patrick McHardy

---
commit e8b121382d0690c0d92b6134bb60e7626cd49284
tree 2a85a79242cb160e35d207d504886e770db2ed6f
parent 49b1e3ea19b1c95c2f012b8331ffb3b169e4c042
author Patrick McHardy Tue, 01 Aug 2006 07:26:21 +0200
committer Patrick McHardy Tue, 01 Aug 2006 07:26:21 +0200

 net/ipv4/netfilter/ip_conntrack_sip.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/netfilter/ip_conntrack_sip.c b/net/ipv4/netfilter/ip_conntrack_sip.c
index fc87ce0..4f222d6 100644
--- a/net/ipv4/netfilter/ip_conntrack_sip.c
+++ b/net/ipv4/netfilter/ip_conntrack_sip.c
@@ -442,7 +442,7 @@ static int __init init(void)
 		sip[i].tuple.src.u.udp.port = htons(ports[i]);
 		sip[i].mask.src.u.udp.port = 0xFFFF;
 		sip[i].mask.dst.protonum = 0xFF;
-		sip[i].max_expected = 1;
+		sip[i].max_expected = 2;
 		sip[i].timeout = 3 * 60; /* 3 minutes */
 		sip[i].me = THIS_MODULE;
 		sip[i].help = sip_help;

From kaber at trash.net Tue Aug 1 08:39:55 2006
From: kaber at trash.net (Patrick McHardy)
Date: Tue Aug 1 09:09:21 2006
Subject: [NETFILTER]: xt_hashlimit/xt_string: missing string validation
Message-ID: <44CEF73B.8030601@trash.net>

-------------- next part --------------
[NETFILTER]: xt_hashlimit/xt_string: missing string validation

The hashlimit table name and the textsearch algorithm need to be
terminated, the textsearch pattern length must not exceed the
maximum size.

Signed-off-by: Patrick McHardy

---
commit 873fed085b72ad38b565906676ec8fd44e27bb25
tree 0c7cf87c9a95838d3360f11194a960eb915085b3
parent e8b121382d0690c0d92b6134bb60e7626cd49284
author Patrick McHardy Tue, 01 Aug 2006 08:24:03 +0200
committer Patrick McHardy Tue, 01 Aug 2006 08:24:03 +0200

 net/ipv4/netfilter/ipt_hashlimit.c | 3 +++
 net/netfilter/xt_string.c          | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_hashlimit.c b/net/ipv4/netfilter/ipt_hashlimit.c
index 92980ab..6b66244 100644
--- a/net/ipv4/netfilter/ipt_hashlimit.c
+++ b/net/ipv4/netfilter/ipt_hashlimit.c
@@ -508,6 +508,9 @@ hashlimit_checkentry(const char *tablena
 	if (!r->cfg.expire)
 		return 0;
 
+	if (r->name[sizeof(r->name) - 1] != '\0')
+		return 0;
+
 	/* This is the best we've got: We cannot release and re-grab lock,
 	 * since checkentry() is called before ip_tables.c grabs ipt_mutex.
 	 * We also cannot grab the hashtable spinlock, since htable_create will
diff --git a/net/netfilter/xt_string.c b/net/netfilter/xt_string.c
index 0ebb6ac..d8e3891 100644
--- a/net/netfilter/xt_string.c
+++ b/net/netfilter/xt_string.c
@@ -55,7 +55,10 @@ static int checkentry(const char *tablen
 	/* Damn, can't handle this case properly with iptables... */
 	if (conf->from_offset > conf->to_offset)
 		return 0;
-
+	if (conf->algo[XT_STRING_MAX_ALGO_NAME_SIZE - 1] != '\0')
+		return 0;
+	if (conf->patlen > XT_STRING_MAX_PATTERN_SIZE)
+		return 0;
 	ts_conf = textsearch_prepare(conf->algo, conf->pattern, conf->patlen,
 				     GFP_KERNEL, TS_AUTOLOAD);
 	if (IS_ERR(ts_conf))

From davem at davemloft.net Tue Aug 1 08:47:47 2006
From: davem at davemloft.net (David Miller)
Date: Tue Aug 1 09:17:20 2006
Subject: [NETFILTER]: SIP helper: expect RTP streams in both directions
In-Reply-To: <44CEF736.4020105@trash.net>
References: <44CEF736.4020105@trash.net>
Message-ID: <20060731.234747.88341516.davem@davemloft.net>

From: Patrick McHardy
Date: Tue, 01 Aug 2006 08:39:50 +0200

> following are two small fixes for 2.6.18. The second patch fixes
> missing string validation in two netfilter modules. James sent a
> similar patch for SECMARK to -stable, in my opinion this is not
> necessary since CAP_NET_ADMIN in practice always means root and
> mainline doesn't support virtualization yet. But if you feel
> otherwise please pass it on. Thanks.

I applied James's patch because we should not make life
overly difficult for the openvz folks.

I've applied both of your patches too, thanks a lot Patrick.

From kaber at trash.net Tue Aug 1 08:53:31 2006
From: kaber at trash.net (Patrick McHardy)
Date: Tue Aug 1 09:22:51 2006
Subject: [NETFILTER]: SIP helper: expect RTP streams in both directions
In-Reply-To: <20060731.234747.88341516.davem@davemloft.net>
References: <44CEF736.4020105@trash.net> <20060731.234747.88341516.davem@davemloft.net>
Message-ID: <44CEFA6B.5020406@trash.net>

David Miller wrote:
> From: Patrick McHardy
> Date: Tue, 01 Aug 2006 08:39:50 +0200
>
>
>>following are two small fixes for 2.6.18. The second patch fixes
>>missing string validation in two netfilter modules. James sent a
>>similar patch for SECMARK to -stable, in my opinion this is not
>>necessary since CAP_NET_ADMIN in practice always means root and
>>mainline doesn't support virtualization yet. But if you feel
>>otherwise please pass it on. Thanks.
>
>
> I applied James's patch because we should not make life
> overly difficult for the openvz folks.

I only meant -stable with "unnecessary", it should be fixed of course.

> I've applied both of your patches too, thanks a lot Patrick.

Thanks.

From jmorris at namei.org Tue Aug 1 17:50:57 2006
From: jmorris at namei.org (James Morris)
Date: Tue Aug 1 18:20:35 2006
Subject: [NETFILTER]: xt_hashlimit/xt_string: missing string validation
In-Reply-To: <44CEF73B.8030601@trash.net>
References: <44CEF73B.8030601@trash.net>
Message-ID:

> Signed-off-by: Patrick McHardy

Acked-by: James Morris

-- 
James Morris

From bazsi at balabit.hu Tue Aug 1 19:10:09 2006
From: bazsi at balabit.hu (Balazs Scheidler)
Date: Tue Aug 1 20:08:16 2006
Subject: [patch] RFC: matching interface groups
Message-ID: <1154452209.6395.77.camel@bzorp.balabit>

Hi,

I would like to easily match a set of dynamically created interfaces
from my packet filter rules. The attached patch forms the basis of my
implementation and I would like to know whether something like this is
mergeable to mainline.

The use-case is as follows:

* I have two different subsystems creating interfaces dynamically (for
example pptpd and serial pppd lines, each creating dynamic pppX
interfaces),
* I would like to assign a different set of iptables rules for these
clients,
* I would like to react to a new interface being added to a specific set
in a userspace application,

The reasons I see this needs new kernel functionality:

* iptables supports wildcard interface matching (for example "iptables
-i ppp+"), but as the names of the interfaces used by PPTPD and PPPD
cannot be distinguished this way, this is not enough,
* Reloading the iptables ruleset every time a new interface comes up is
not really feasible, as it interrupts packet processing, and validating the
ruleset in the kernel can take significant amount of time,
* the kernel change is very simple, adapting userspace to this change is
also very simple, and in userspace various software packages can easily
interoperate with each-other once this is merged.

The implementation:

Each interface can belong to a single "group" at a time, an interface
comes up without being a member in any of the groups.

Userspace can assign interfaces to groups after being created, this
would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.

In spirit "interface group" is somewhat similar to the "routing
protocol" field for routing entries, which contains information on which
routing daemon was responsible for adding the given route entry.

Things to be done if you like this approach:

* interface group match in iptables,
* support for naming interface groups in userspace, a'la routing
protocols,
* emitting a netlink notification when the group of an interface
changes,
* possibly converting the "ip link" command to use NETLINK messages,
instead of using ioctl()

What do you think?

kernel patch:
-------------

* add a numeric ID to each interface in the system, denoting its
"interface group",

index df0cdd4..19a103a 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -736,6 +736,8 @@ enum
 #define IFLA_WEIGHT IFLA_WEIGHT
 	IFLA_OPERSTATE,
 	IFLA_LINKMODE,
+#define IFLA_IFGROUP IFLA_IFGROUP
+	IFLA_IFGROUP,
 	__IFLA_MAX
 };
 
diff --git a/include/linux/sockios.h b/include/linux/sockios.h
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3fcfa9c..26849af 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -279,6 +279,11 @@ static int rtnetlink_fill_ifinfo(struct
 		u32 iflink = dev->iflink;
 		RTA_PUT(skb, IFLA_LINK, sizeof(iflink), &iflink);
 	}
+
+	if (dev->ifgroup) {
+		u32 ifgroup = dev->ifgroup;
+		RTA_PUT(skb, IFLA_IFGROUP, sizeof(ifgroup), &ifgroup);
+	}
 
 	if (dev->qdisc_sleeping)
 		RTA_PUT(skb, IFLA_QDISC,
@@ -459,6 +464,12 @@ static int do_setlink(struct sk_buff *sk
 		dev->link_mode = *((u8 *) RTA_DATA(ida[IFLA_LINKMODE - 1]));
 		write_unlock_bh(&dev_base_lock);
 	}
+
+	if (ida[IFLA_IFGROUP - 1]) {
+		if (ida[IFLA_IFGROUP - 1]->rta_len != RTA_LENGTH(sizeof(u32)))
+			goto out;
+		dev->ifgroup = *((u32 *) RTA_DATA(ida[IFLA_IFGROUP - 1]));
+	}
 
 	if (ifm->ifi_index >= 0 && ida[IFLA_IFNAME - 1]) {
 		char ifname[IFNAMSIZ];

ip route patch:
---------------

* added a "group" option to ip link set to make it possible to set this
id, and a way to print this option

diff --git a/ip/iplink.c b/ip/iplink.c
index ffc9f06..e694475 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include "rt_names.h"
 #include "utils.h"
@@ -44,6 +45,7 @@ void iplink_usage(void)
 	fprintf(stderr, " promisc { on | off } |\n");
 	fprintf(stderr, " trailers { on | off } |\n");
 	fprintf(stderr, " txqueuelen PACKETS |\n");
+	fprintf(stderr, " group GROUP |\n");
 	fprintf(stderr, " name NEWNAME |\n");
 	fprintf(stderr, " address LLADDR | broadcast LLADDR |\n");
 	fprintf(stderr, " mtu MTU }\n");
@@ -174,6 +176,28 @@ static int set_mtu(const char *dev, int
 	return 0;
 }
 
+static int set_group(const char *dev, int ifgroup)
+{
+	struct {
+		struct nlmsghdr n;
+		struct ifinfomsg ifi;
+		char buf[256];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifi));
+	req.n.nlmsg_flags = NLM_F_REQUEST;
+	req.n.nlmsg_type = RTM_SETLINK;
+
+	req.ifi.ifi_index = -1;
+
+	addattr_l(&req.n, sizeof(req), IFLA_IFNAME, dev, strlen(dev)+1);
+	addattr_l(&req.n, sizeof(req), IFLA_IFGROUP, &ifgroup, sizeof(ifgroup));
+	if (rtnl_talk(&rth, &req.n, 0, 0, NULL, NULL, NULL) < 0)
+		return -1;
+	return 0;
+}
+
 static int get_address(const char *dev, int *htype)
 {
 	struct ifreq ifr;
@@ -257,6 +281,7 @@ static int do_set(int argc, char **argv)
 	__u32 mask = 0;
 	__u32 flags = 0;
 	int qlen = -1;
+	int group = 0;
 	int mtu = -1;
 	char *newaddr = NULL;
 	char *newbrd = NULL;
@@ -289,6 +314,12 @@ static int do_set(int argc, char **argv)
 				duparg("txqueuelen", *argv);
 			if (get_integer(&qlen, *argv, 0))
 				invarg("Invalid \"txqueuelen\" value\n", *argv);
+		} else if (matches(*argv, "group") == 0) {
+			NEXT_ARG();
+			if (group != 0)
+				duparg("group", *argv);
+			if (get_integer(&group, *argv, 0) || group == 0)
+				invarg("Invalid \"group\" value\n", *argv);
 		} else if (strcmp(*argv, "mtu") == 0) {
 			NEXT_ARG();
 			if (mtu != -1)
@@ -406,6 +437,10 @@ static int do_set(int argc, char **argv)
 			return -1;
 		}
 	}
+	if (group) {
+		if (set_group(dev, group) < 0)
+			return -1;
+	}
 	if (mask)
 		return do_chflags(dev, flags, mask);
 	return 0;

-- 
Bazsi

From shemminger at osdl.org Tue Aug 1 20:29:19 2006
From: shemminger at osdl.org (Stephen Hemminger)
Date: Tue Aug 1 20:58:52 2006
Subject: [patch] RFC: matching interface groups
In-Reply-To: <1154452209.6395.77.camel@bzorp.balabit>
References: <1154452209.6395.77.camel@bzorp.balabit>
Message-ID: <20060801112919.765eb831@localhost.localdomain>

On Tue, 01 Aug 2006 19:10:09 +0200
Balazs Scheidler wrote:

> Hi,
>
> I would like to easily match a set of dynamically created interfaces
> from my packet filter rules. The attached patch forms the basis of my
> implementation and I would like to know whether something like this is
> mergeable to mainline.
>
> The use-case is as follows:
>
> * I have two different subsystems creating interfaces dynamically (for
> example pptpd and serial pppd lines, each creating dynamic pppX
> interfaces),
> * I would like to assign a different set of iptables rules for these
> clients,
> * I would like to react to a new interface being added to a specific set
> in a userspace application,
>
> The reasons I see this needs new kernel functionality:
>
> * iptables supports wildcard interface matching (for example "iptables
> -i ppp+"), but as the names of the interfaces used by PPTPD and PPPD
> cannot be distinguished this way, this is not enough,
> * Reloading the iptables ruleset every time a new interface comes up is
> not really feasible, as it interrupts packet processing, and validating the
> ruleset in the kernel can take significant amount of time,
> * the kernel change is very simple, adapting userspace to this change is
> also very simple, and in userspace various software packages can easily
> interoperate with each-other once this is merged.
>
> The implementation:
>
> Each interface can belong to a single "group" at a time, an interface
> comes up without being a member in any of the groups.
>
> Userspace can assign interfaces to groups after being created, this
> would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.
>
> In spirit "interface group" is somewhat similar to the "routing
> protocol" field for routing entries, which contains information on which
> routing daemon was responsible for adding the given route entry.
>
> Things to be done if you like this approach:
>
> * interface group match in iptables,
> * support for naming interface groups in userspace, a'la routing
> protocols,
> * emitting a netlink notification when the group of an interface
> changes,
> * possibly converting the "ip link" command to use NETLINK messages,
> instead of using ioctl()
>
> What do you think?

I like the concept, but it probably needs more review.

There is a bigger issue, which is how should the network device namespace
exist? There are virtualization efforts, that want to virtualize it,
and network device names have always lived in a parallel universe.
I don't expect your patch to solve this...

From kernel at linuxace.com Tue Aug 1 20:46:55 2006
From: kernel at linuxace.com (Phil Oester)
Date: Tue Aug 1 21:16:23 2006
Subject: [patch] RFC: matching interface groups
In-Reply-To: <1154452209.6395.77.camel@bzorp.balabit>
References: <1154452209.6395.77.camel@bzorp.balabit>
Message-ID: <20060801184655.GA7452@linuxace.com>

On Tue, Aug 01, 2006 at 07:10:09PM +0200, Balazs Scheidler wrote:
> Each interface can belong to a single "group" at a time, an interface
> comes up without being a member in any of the groups.
>
> Userspace can assign interfaces to groups after being created, this
> would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.

Since in this scenario userspace is able to determine ppp vs pptp,
could you not also do something like have an inbound_ppp and inbound_pptp
chain, then jump to the appropriate chain depending on type? If you
need per-interface rules, then create an inbound_pppX chain, populate
it with rules, then jump to that chain if -i pppX. In ip-down, just
delete the chain as well as the jump.

Phil

From schuster.sven at gmx.de Tue Aug 1 21:18:05 2006
From: schuster.sven at gmx.de (Sven Schuster)
Date: Tue Aug 1 21:47:44 2006
Subject: [patch] RFC: matching interface groups
In-Reply-To: <20060801184655.GA7452@linuxace.com>
References: <1154452209.6395.77.camel@bzorp.balabit> <20060801184655.GA7452@linuxace.com>
Message-ID: <20060801191805.GA28649@zion.homelinux.com>

Hi Phil,

On Tue, Aug 01, 2006 at 11:46:55AM -0700, Phil Oester told us:
> Since in this scenario userspace is able to determine ppp vs pptp,
> could you not also do something like have an inbound_ppp and inbound_pptp
> chain, then jump to the appropriate chain depending on type? If you
> need per-interface rules, then create an inbound_pppX chain, populate
> it with rules, then jump to that chain if -i pppX. In ip-down, just
> delete the chain as well as the jump.

if I understood Balazs correctly, one of the things he wanted to
avoid is addition/deletion of iptables rules on every pppX interface
up/down as this would require the complete chain (say, INPUT or
OUTPUT) to be "downloaded" to userspace, modified and then again
"uploaded" to the kernel. At least until iptables redesign to
allow replacement/insertion/deletion of single rules is completed
which if started at all will take quite some more time :-)

Sven

> Phil
>

-- 
Linux zion.homelinux.com 2.6.17-rc5-mm1_35 #35 Tue May 30 14:11:06 CEST 2006 i686 athlon i386 GNU/Linux
21:13:05 up 19:46, 2 users, load average: 0.22, 0.28, 0.27

From kernel at linuxace.com Wed Aug 2 04:43:22 2006
From: kernel at linuxace.com (Phil Oester)
Date: Wed Aug 2 05:12:49 2006
Subject: [PATCH] update quota match for xtables + fix -D bug
Message-ID: <20060802024322.GA15127@linuxace.com>

The iptables quota match has not been updated to reflect the new
xtables location/structures in 2.6.18-rc. In addition, it has a
bug which makes it impossible to delete a rule once added. E.g.:

# iptables -A foo -m quota --quota 1111 -j RETURN
# iptables -D foo -m quota --quota 1111 -j RETURN
iptables: Bad rule (does a matching rule exist in that chain?)

Below patch fixes both issues and resolves bugzilla #496.

Phil

-------------- next part --------------
diff -ru ipt-orig/extensions/libipt_quota.c ipt-new/extensions/libipt_quota.c
--- ipt-orig/extensions/libipt_quota.c 2005-02-14 05:13:04.000000000 -0800
+++ ipt-new/extensions/libipt_quota.c 2006-08-01 19:05:28.000000000 -0700
@@ -3,12 +3,13 @@
  *
  * Sam Johnston
  */
+#include
 #include
 #include
 #include
 #include
 
-#include
+#include
 #include
 
 static struct option opts[] = {
@@ -28,7 +29,7 @@
 static void
 print(const struct ipt_ip *ip, const struct ipt_entry_match *match, int numeric)
 {
-	struct ipt_quota_info *q = (struct ipt_quota_info *) match->data;
+	struct xt_quota_info *q = (struct xt_quota_info *) match->data;
 	printf("quota: %llu bytes", (unsigned long long) q->quota);
 }
 
@@ -36,7 +37,7 @@
 static void
 save(const struct ipt_ip *ip, const struct ipt_entry_match *match)
 {
-	struct ipt_quota_info *q = (struct ipt_quota_info *) match->data;
+	struct xt_quota_info *q = (struct xt_quota_info *) match->data;
 	printf("--quota %llu ", (unsigned long long) q->quota);
 }
 
@@ -62,7 +63,7 @@
       const struct ipt_entry *entry,
       unsigned int *nfcache, struct ipt_entry_match **match)
 {
-	struct ipt_quota_info *info = (struct ipt_quota_info *) (*match)->data;
+	struct xt_quota_info *info = (struct xt_quota_info *) (*match)->data;
 
 	switch (c) {
 	case '1':
@@ -89,8 +90,8 @@
 	.next = NULL,
 	.name = "quota",
 	.version = IPTABLES_VERSION,
-	.size = IPT_ALIGN(sizeof (struct ipt_quota_info)),
-	.userspacesize = IPT_ALIGN(sizeof (struct ipt_quota_info)),
+	.size = IPT_ALIGN(sizeof (struct xt_quota_info)),
+	.userspacesize = offsetof(struct xt_quota_info, quota),
 	.help = &help,
 	.parse = &parse,
 	.final_check = &final_check,
diff -ru ipt-orig/extensions/.quota-test ipt-new/extensions/.quota-test
--- ipt-orig/extensions/.quota-test 2001-12-03 14:22:55.000000000 -0800
+++ ipt-new/extensions/.quota-test 2006-08-01 17:48:12.000000000 -0700
@@ -1,3 +1,3 @@
 #!/bin/sh
-[ -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_quota.h ] && echo quota
+[ -f $KERNEL_DIR/include/linux/netfilter/xt_quota.h ] && echo quota
 

From bazsi at balabit.hu Wed Aug 2 09:04:29 2006
From: bazsi at balabit.hu (Balazs Scheidler)
Date: Wed Aug 2 09:33:58 2006
Subject: [patch] RFC: matching interface groups
In-Reply-To: <20060801191805.GA28649@zion.homelinux.com>
References: <1154452209.6395.77.camel@bzorp.balabit> <20060801184655.GA7452@linuxace.com> <20060801191805.GA28649@zion.homelinux.com>
Message-ID: <1154502269.6241.11.camel@bzorp.balabit>

On Tue, 2006-08-01 at 21:18 +0200, Sven Schuster wrote:
> Hi Phil,
>
> On Tue, Aug 01, 2006 at 11:46:55AM -0700, Phil Oester told us:
> > Since in this scenario userspace is able to determine ppp vs pptp,
> > could you not also do something like have an inbound_ppp and inbound_pptp
> > chain, then jump to the appropriate chain depending on type? If you
> > need per-interface rules, then create an inbound_pppX chain, populate
> > it with rules, then jump to that chain if -i pppX. In ip-down, just
> > delete the chain as well as the jump.
>
> if I understood Balazs correctly, one of the things he wanted to
> avoid is addition/deletion of iptables rules on every pppX interface
> up/down

Exactly.

> as this would require the complete chain (say, INPUT or
> OUTPUT) to be "downloaded" to userspace, modified and then again
> "uploaded" to the kernel. At least until iptables redesign to
> allow replacement/insertion/deletion of single rules is completed
> which if started at all will take quite some more time :-)

Iptables operates on a per-table basis, so it is not only the INPUT or
OUTPUT chain that needs to be down and uploaded, but the whole filter
table.

And in addition, in my humble opinion the iptables ruleset should be up
to the user to maintain, once some kind of automatism starts to
add/remove rules on the fly, it becomes more difficult to do other
changes to add independent rules to the table. For example the user
needs to save the current ruleset using iptables-save, then modify the
resulting file, and then load it again. If the ruleset is generated as
it happens with a lot of tools, this might not be so easy.

-- 
Bazsi

From bazsi at balabit.hu Wed Aug 2 09:18:44 2006
From: bazsi at balabit.hu (Balazs Scheidler)
Date: Wed Aug 2 09:48:54 2006
Subject: [patch] RFC: matching interface groups
In-Reply-To: <20060801112919.765eb831@localhost.localdomain>
References: <1154452209.6395.77.camel@bzorp.balabit> <20060801112919.765eb831@localhost.localdomain>
Message-ID: <1154503124.6241.21.camel@bzorp.balabit>

On Tue, 2006-08-01 at 11:29 -0700, Stephen Hemminger wrote:
> On Tue, 01 Aug 2006 19:10:09 +0200
> Balazs Scheidler wrote:
>
> > Hi,
> >
> > I would like to easily match a set of dynamically created interfaces
> > from my packet filter rules. The attached patch forms the basis of my
> > implementation and I would like to know whether something like this is
> > mergeable to mainline.
> >
> > The use-case is as follows:
> >
> > * I have two different subsystems creating interfaces dynamically (for
> > example pptpd and serial pppd lines, each creating dynamic pppX
> > interfaces),
> > * I would like to assign a different set of iptables rules for these
> > clients,
> > * I would like to react to a new interface being added to a specific set
> > in a userspace application,
> >
> > The reasons I see this needs new kernel functionality:
> >
> > * iptables supports wildcard interface matching (for example "iptables
> > -i ppp+"), but as the names of the interfaces used by PPTPD and PPPD
> > cannot be distinguished this way, this is not enough,
> > * Reloading the iptables ruleset every time a new interface comes up is
> > not really feasible, as it interrupts packet processing, and validating the
> > ruleset in the kernel can take significant amount of time,
> > * the kernel change is very simple, adapting userspace to this change is
> > also very simple, and in userspace various software packages can easily
> > interoperate with each-other once this is merged.
> >
> > The implementation:
> >
> > Each interface can belong to a single "group" at a time, an interface
> > comes up without being a member in any of the groups.
> >
> > Userspace can assign interfaces to groups after being created, this
> > would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.
> >
> > In spirit "interface group" is somewhat similar to the "routing
> > protocol" field for routing entries, which contains information on which
> > routing daemon was responsible for adding the given route entry.
> >
> > [snip]

> I like the concept, but it probably needs more review.
>
> There is a bigger issue, which is how should the network device namespace
> exist? There are virtualization efforts, that want to virtualize it,
> and network device names have always lived in a parallel universe.
> I don't expect your patch to solve this...

I have read the OLS paper on virtualization, it states that the current
state of affairs is that struct net_device will be assigned to one
specific namespace. As my change changes struct net_device itself, I
expect it to work without problems when virtualization comes, the interface
group can be interpreted on a per-namespace basis.

There probably will be several iptables rulesets when the time comes,
one for each namespace, but again, struct net_device will be assigned to
a namespace, and the proper iptables tables will be iterated based on
the net_device assignment.

Am I missing something?

-- 
Bazsi

From azez at ufomechanic.net Wed Aug 2 11:01:05 2006
From: azez at ufomechanic.net (Amin Azez)
Date: Wed Aug 2 11:30:57 2006
Subject: [patch] RFC: matching interface groups
In-Reply-To: <1154502269.6241.11.camel@bzorp.balabit>
References: <1154452209.6395.77.camel@bzorp.balabit> <20060801184655.GA7452@linuxace.com> <20060801191805.GA28649@zion.homelinux.com> <1154502269.6241.11.camel@bzorp.balabit>
Message-ID: <44D069D1.4040702@ufomechanic.net>

* Balazs Scheidler wrote, On 02/08/06 08:04:
> On Tue, 2006-08-01 at 21:18 +0200, Sven Schuster wrote:
>> as this would require the complete chain (say, INPUT or
>> OUTPUT) to be "downloaded" to userspace, modified and then again
>> "uploaded" to the kernel. At least until iptables redesign to
>> allow replacement/insertion/deletion of single rules is completed
>> which if started at all will take quite some more time :-)
>
> Iptables operates on a per-table basis, so it is not only the INPUT or
> OUTPUT chain that needs to be down and uploaded, but the whole filter
> table.
>
> And in addition, in my humble opinion the iptables ruleset should be up
> to the user to maintain, once some kind of automatism starts to
> add/remove rules on the fly, it becomes more difficult to do other
> changes to add independent rules to the table. For example the user
> needs to save the current ruleset using iptables-save, then modify the
> resulting file, and then load it again. If the ruleset is generated as
> it happens with a lot of tools, this might not be so easy.
>

Even without this scenario it is not easily safe; if two interfaces
changed at the same time, two copies of iptables would be downloaded to
user space, both modified differently and the last one to be uploaded
would win, the other one losing its changes.

This has bitten me and is one of my reasons for liking ipt_condition

Sam

From varun at rocsys.com Thu Aug 3 13:44:59 2006
From: varun at rocsys.com (varun)
Date: Wed Aug 2 14:07:42 2006
Subject: doc for iptables
Message-ID: <44D1E1BB.40504@rocsys.com>

Hi all,

Can anyone tell me where I can find the low level design documents
of the iptables project? I have gone through the code and I wanted to know
why some features have been designed in a particular way. So I want the
low level and high level docs for code understanding. Please tell me
where to find them.

Varun

From netfilter at mm-double.de Wed Aug 2 20:12:12 2006
From: netfilter at mm-double.de (Maik Hentsche)
Date: Wed Aug 2 20:41:47 2006
Subject: bugreport ipt_CLUSTERIP
In-Reply-To: <44CABEBC.9050101@trash.net>
References: <20060725082618.8zdzyrc36s0ksgsk@mail.tu-chemnitz.de> <44C8CB58.60709@trash.net> <1154078751.44c9d81ff2c50@www.domainfactory-webmail.de> <44CABEBC.9050101@trash.net>
Message-ID: <20060802201212.3089b23b@zeus.subnet.mm-double.de>

Patrick McHardy wrote:
> I tried to reproduce this (by first adding your rule, then deleting
> it, adding it again, rebooting), but got no crash.

It did not happen again for me either, even though I tried a lot to
reproduce it.

> Did you add/delete/flush/... any CLUSTERIP rules before the crashes?

I added the rule, then tried to ping the virtual IP, which did not
work (no arp response). Then I deleted the rule and did nothing else
with CLUSTERIP. The machine was used by another developer a week ago
and I do not know, if he did anything with iptables, but he hardly used
CLUSTERIP. I'm very sorry to not be able to help more.

so long
Maik

-- 
Reason is like a ticket. It is only of use if you use it.
(Ernst R. Hauschka (*1926), German essayist, aphorist and librarian)

From pch at packetconsulting.pl Wed Aug 2 23:22:32 2006
From: pch at packetconsulting.pl (Piotr Chytla)
Date: Wed Aug 2 23:56:39 2006
Subject: u32 patch
Message-ID: <20060802212232.GA29168@packetconsulting.pl>

Hi

Here is a small patch for the u32 match, to make it work on 2.6.17 kernels.
Matchsize in ipt_match struct was missing.

/pch

-- 
Dyslexia bug unpatched since 1977 ...
exploit has been leaked to the underground.

-------------- next part --------------
--- ipt_u32.c 2006-08-02 22:34:29.000000000 +0200
+++ /usr/src/linux-2.6.17.6/net/ipv4/netfilter/ipt_u32.c 2006-08-02 22:45:43.000000000 +0200
@@ -217,6 +217,7 @@
 static struct ipt_match u32_match = {
 	.name = "u32",
 	.match = &match,
+	.matchsize = sizeof(struct ipt_u32),
 	.checkentry = &checkentry,
 	.me = THIS_MODULE
 };

From kernel at linuxace.com Thu Aug 3 01:49:33 2006
From: kernel at linuxace.com (Phil Oester)
Date: Thu Aug 3 02:19:05 2006
Subject: [PATCH] ipv6 ROUTE target api changes
Message-ID: <20060802234933.GA1183@linuxace.com>

My recent .targetsize patch missed the other API changes in the
IPv6 ROUTE target. This resolves bugzilla #490 (again).
Phil -------------- next part -------------- diff -ru pom-orig/patchlets/ROUTE/linux-2.6/net/ipv6/netfilter/ip6t_ROUTE.c pom-new/patchlets/ROUTE/linux-2.6/net/ipv6/netfilter/ip6t_ROUTE.c --- pom-orig/patchlets/ROUTE/linux-2.6/net/ipv6/netfilter/ip6t_ROUTE.c 2006-07-22 10:07:52.000000000 -0400 +++ pom-new/patchlets/ROUTE/linux-2.6/net/ipv6/netfilter/ip6t_ROUTE.c 2006-08-02 19:46:04.000000000 -0400 @@ -192,6 +192,7 @@ const struct net_device *in, const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo, void *userinfo) { @@ -259,7 +260,8 @@ static int ip6t_route_checkentry(const char *tablename, - const struct ip6t_entry *e, + const void *entry, + const struct xt_target *target, void *targinfo, unsigned int targinfosize, unsigned int hook_mask) From kaber at trash.net Thu Aug 3 11:48:30 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Aug 3 12:18:22 2006 Subject: bugreport ipt_CLUSTERIP In-Reply-To: <20060802201212.3089b23b@zeus.subnet.mm-double.de> References: <20060725082618.8zdzyrc36s0ksgsk@mail.tu-chemnitz.de> <44C8CB58.60709@trash.net> <1154078751.44c9d81ff2c50@www.domainfactory-webmail.de> <44CABEBC.9050101@trash.net> <20060802201212.3089b23b@zeus.subnet.mm-double.de> Message-ID: <44D1C66E.1080307@trash.net> Maik Hentsche wrote: > I added the rule, then tried to ping the virtual IP, which did not > work (no arp response). Then I deleted the rule and did nothing else > with CLUSTERIP. The machine was used by another developer a week ago > and I do not know, if he did anything with iptables, but he hardly used > CLUSTERIP. I'm very sorry to not be able to help more. Thanks anyway. I'll try some more to reproduce it .. 
From kaber at trash.net Thu Aug 3 12:09:04 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Aug 3 12:38:39 2006 Subject: [PATCH 6/6] ulogd2 changes In-Reply-To: <200607311023.11145@nienna> References: <200607311023.11145@nienna> Message-ID: <44D1CB40.8050504@trash.net> KOVACS Krisztian wrote: > Attached are the changes to ulogd2. I'll leave it to Harald to apply this one, I don't know if the ulog branch in SVN matches his current code. From kaber at trash.net Thu Aug 3 12:09:37 2006 From: kaber at trash.net (Patrick McHardy) Date: Thu Aug 3 12:39:12 2006 Subject: [PATCH 0/6] pkg-config patches for libnetfilter_* userspace libraries and their users In-Reply-To: <200607311022.46466@nienna> References: <200607311022.46466@nienna> Message-ID: <44D1CB61.3050804@trash.net> KOVACS Krisztian wrote: > As there were no objections raised on the mailing list, here come the > pkg-config patches again. > > These patches modify autoconf scripts so that they use pkg-config to > detect the presence of libnfnetlink and other dependencies and use the > correct include and library names and paths. All but the ulog2 patch applied, thanks Krisztian. From stephen at dino.dnsalias.com Thu Aug 3 06:08:07 2006 From: stephen at dino.dnsalias.com (Stephen J. Bevan) Date: Thu Aug 3 13:46:18 2006 Subject: [patch] RFC: matching interface groups In-Reply-To: <1154452209.6395.77.camel@bzorp.balabit> References: <1154452209.6395.77.camel@bzorp.balabit> Message-ID: <17617.30375.336813.199864@localhost.localdomain> Balazs Scheidler writes: > I would like to easily match a set of dynamically created interfaces > from my packet filter rules. The attached patch forms the basis of my > implementation and I would like to know whether something like this is > mergeable to mainline. [snip] > The implementation: > > Each interface can belong to a single "group" at a time, an interface > comes up without being a member in any of the groups. 
You can get a similar effect by (ab)using the iflink field i.e. set the iflink to the parent interface and modify ip_tables.c:ip_packet_match to check the ifindex (or iflink if defined) for a match. An advantage of this is that it doesn't require adding any new fields and the only kernel change is to ip_tables.c:ip_packet_match (and its caller). That said, an explicit group (or zone as various firewall vendors call it) is cleaner. From lists at egidy.de Thu Aug 3 14:57:25 2006 From: lists at egidy.de (Gerd v. Egidy) Date: Thu Aug 3 15:27:04 2006 Subject: [patch] RFC: matching interface groups In-Reply-To: <1154502269.6241.11.camel@bzorp.balabit> References: <1154452209.6395.77.camel@bzorp.balabit> <20060801191805.GA28649@zion.homelinux.com> <1154502269.6241.11.camel@bzorp.balabit> Message-ID: <200608031457.26120.lists@egidy.de> Hi, > > > Since in this scenario userspace is able to determine ppp vs pptp, > > > could you not also do something like have an inbound_ppp and > > > inbound_pptp chain, then jump to the appropriate chain depending on > > > type? If you need per-interface rules, then create an inbound_pppX > > > chain, populate it with rules, then jump to that chain if -i pppX. In > > > ip-down, just delete the chain as well as the jump. > > > > if I understood Balazs correctly, one of the things he wanted to > > avoid is addition/deletion of iptables rules on every pppX interface > > up/down > > Exactly. We faced a similar problem: we wanted to not only differentiate between ppp and pptp interfaces but even between different providers connected via ppp and pptp. We ended up predefining all possible providers in the rules and differentiate between them using the ipt_condition module. This module is not very well-loved in the netfilter community but there are usecases like this where it comes in handy. Maybe this is a solution for you too. 
> > as this would require the complete chain (say, INPUT or > > OUTPUT) to be "downloaded" to userspace, modified and then again > > "uploaded" to the kernel. At least until iptables redesign to > > allow replacement/insertion/deletion of single rules is completed > > which if started at all will take quite some more time :-) > > Iptables operates on a per-table basis, so it is not only the INPUT or > OUTPUT chain that needs to be down and uploaded, but the whole filter > table. The problem with the current solution is not only speed and maintainability but also locking: if two device up/down events happen at the same time in this scenario, the tabels will become wrong until you develop some kind of userspace locking. Kind regards, Gerd From netfilter at mm-double.de Thu Aug 3 15:59:35 2006 From: netfilter at mm-double.de (Maik Hentsche) Date: Thu Aug 3 16:29:11 2006 Subject: bugreport ipt_CLUSTERIP In-Reply-To: <44D1C66E.1080307@trash.net> References: <20060725082618.8zdzyrc36s0ksgsk@mail.tu-chemnitz.de> <44C8CB58.60709@trash.net> <1154078751.44c9d81ff2c50@www.domainfactory-webmail.de> <44CABEBC.9050101@trash.net> <20060802201212.3089b23b@zeus.subnet.mm-double.de> <44D1C66E.1080307@trash.net> Message-ID: <1154613575.44d201478b3d9@www.domainfactory-webmail.de> Zitat von Patrick McHardy : > Thanks anyway. I'll try some more to reproduce it .. It happened again and now I have a clue, what _might_ trigger it. I did a few (around 15) inserting/deleting/inserting/flushing-cycles. All the time, a ping to the virtual IP was running. Most important, another programm held the virtual IP when it crashed (twice), once keepalived and once heartbeatd. Without them, the error did not occur. I appended a console log. Note, that, even though, there is a "hash=1 ct_hash=1 responsible", which seems to come from CLUSTERIP, I do not get an arp response (and as far as I understood, CLUSTERIP is supposed to send one). 
HTH & so long Maik hash=1 ct_hash=1 responsible hash=1 ct_hash=1 responsible IPVS: Registered protocols (TCP, UDP, AH, ESP) IPVS: Connection hash table configured (size=4096, memory=64Kbytes) IPVS: ipvs loaded. Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: {:ipt_CLUSTERIP:__clusterip_config_find+34} PGD 798d2067 PUD 7c090067 PMD 0 Oops: 0000 [1] PREEMPT SMP CPU 1 Modules linked in: ip_vs ipt_CLUSTERIP ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables ipt_ULOG x_tables ipv3Pid: 0, comm: swapper Not tainted 2.6.17.3 #1 RIP: 0010:[] {:ipt_CLUSTERIP:__clusterip_config_find+34} RSP: 0018:ffff810001377d60 EFLAGS: 00010a16 RAX: 0000000000000000 RBX: 0000000016fb14ac RCX: ffff81007cb50000 RDX: 0000000000000000 RSI: ffff810001377e48 RDI: 0000000016fb14ac RBP: ffff81007d2e3410 R08: ffffffff803aadc0 R09: 0000000000000205 R10: ffff81007cb50168 R11: 0000000000000001 R12: ffff81007cb50000 R13: ffff81007d2e3418 R14: 0000000000000000 R15: 0000000000000001 FS: 00002b8f4febc6d0(0000) GS:ffff81007e3afa40(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000007d3d4000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffff810001370000, task ffff810001344080) Stack: ffffffff881357a4 ffff81000132bac0 ffff810001377df8 ffffffff8052b6f0 0000000080000000 ffff81007cb50000 ffffffff803bce1e ffff810001377e48 0000000000000001 ffff810001377e48 Call Trace: {:ipt_CLUSTERIP:arp_mangle+116} {nf_iterate+94} {dev_queue_xmit+0} {nf_hook_slow+135} {dev_queue_xmit+0} {arp_xmit+62} {arp_solicit+396} {neigh_timer_handler+654} {hrtimer_run_queues+214} {neigh_timer_handler+0} {run_timer_softirq+375} {__do_softirq+91} {default_idle+0} {call_softirq+30} {do_softirq+49} {irq_exit+63} {default_idle+0} {apic_timer_interrupt+98} {_spin_unlock_irq+21} {thread_return+186} {default_idle+47} {cpu_idle+104} Code: 48 8b 00 0f 18 08 48 81 fa 00 75 13 88 75 e5 31 c0 c3 66 66 RIP 
{:ipt_CLUSTERIP:__clusterip_config_find+34} RSP CR2: 0000000000000000 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! From azez at ufomechanic.net Thu Aug 3 18:10:33 2006 From: azez at ufomechanic.net (Amin Azez) Date: Thu Aug 3 18:40:32 2006 Subject: [PATCH] ipset matches either,both Message-ID: These patches to ipset allow "either" and "both" to be specified as well as "src" and "dst". I considered doing this as a hack in ip_set.c, by calling the match twice, faking flags of SRC and DST, but it was a hack and wouldn't easily cover the porthash which consumes two flags. Instead I've altered the test of all of the set types, to accept BOTH and EITHER as well. The encoding of BOTH and EITHER was troublesome as flags is a mix of an enum and bitmap-set; what I did isn't clean, but it will do, and doesn't break user-space compatability. I've tested the behaviour of src dst either and both for iphash. I'm not sure about ipportmap because the man pages aren't clear how to write rules for this. I'm not sure of the effect on this with bindings; I suspect that with either,both we may want a preferred test order, and the one that matches first is used to follow bindings. I guess that res = set->type->testip_kernel(set, skb, &ip, flags, i++); sets ip for use by bindings following, and I haven't fiddled with this, so src will be set if that matched for either, otherwise "dst" will be set for "either" or "both" Comments welcome. -------------- next part -------------- A non-text attachment was scrubbed... Name: ipset-both-either.patch Type: text/x-patch Size: 14686 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060803/70bf61ad/ipset-both-either.bin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: iptables-ipset-both.patch Type: text/x-patch Size: 1355 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060803/70bf61ad/iptables-ipset-both.bin From bazsi at balabit.hu Thu Aug 3 21:08:59 2006 From: bazsi at balabit.hu (Balazs Scheidler) Date: Thu Aug 3 21:38:38 2006 Subject: [patch] RFC: matching interface groups In-Reply-To: <17617.30375.336813.199864@localhost.localdomain> References: <1154452209.6395.77.camel@bzorp.balabit> <17617.30375.336813.199864@localhost.localdomain> Message-ID: <1154632139.6333.5.camel@bzorp.balabit> On Wed, 2006-08-02 at 21:08 -0700, Stephen J. Bevan wrote: > Balazs Scheidler writes: > > I would like to easily match a set of dynamically created interfaces > > from my packet filter rules. The attached patch forms the basis of my > > implementation and I would like to know whether something like this is > > mergeable to mainline. > [snip] > > The implementation: > > > > Each interface can belong to a single "group" at a time, an interface > > comes up without being a member in any of the groups. > > You can get a similar effect by (ab)using the iflink field i.e. set > the iflink to the parent interface and modify > ip_tables.c:ip_packet_match to check the ifindex (or iflink if > defined) for a match. An advantage of this is that it doesn't require > adding any new fields and the only kernel change is to > ip_tables.c:ip_packet_match (and its caller). That said, an explicit > group (or zone as various firewall vendors call it) is cleaner. I could hack a solution together, but I'd prefer to do this cleanly, preferably as a patch in mainline. I would like to incorporate this functionality in our product. -- Bazsi From pablo at netfilter.org Thu Aug 3 21:15:25 2006 From: pablo at netfilter.org (Pablo Neira Ayuso) Date: Thu Aug 3 21:41:24 2006 Subject: conntrack utility.. 
In-Reply-To: <416697d80608020725i1f7bcf15nfa487a702de662@mail.gmail.com> References: <416697d80608020725i1f7bcf15nfa487a702de662@mail.gmail.com> Message-ID: <44D24B4D.5080402@netfilter.org> Hi, Devrim Seral wrote > For a few days i tried to fix some problem to handle events int > conntrack utility.. However i haven't success.. > i learned conntrack notification feature from Harald Welte paper. And > also i find conntrack utility from netfilter.org .. > > I am using libnetfilter_conntrack-0.0.31, libnfnetlink-0.0.16 and > conntrack-1.00beta2 on ubuntu server 6 with kernel 2.6.15 64 bit kernel. > Everything seem ok except show events feature.. > I am tried to solve but still i am not get any result.. > i tried to watch events with command like : > ./conntrack -E > ./conntrack -E conntrack > ./conntrack -E conntrack -e DESTROY > > Neither command give me any result.. Only when i pressed Ctrl+C > conntrack utility give output "Now closing conntrack event dumping..." Please, make sure that ip_conntrack_netlink module is loaded (modprobe ip_conntrack_netlink). Let me know if that fixes your problem. -- The dawn of the fourth age of Linux firewalling is coming; a time of great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris From kernel at linuxace.com Fri Aug 4 03:17:40 2006 From: kernel at linuxace.com (Phil Oester) Date: Fri Aug 4 03:47:19 2006 Subject: [PATCH] string match negation fix Message-ID: <20060804011740.GA1177@linuxace.com> The xt_string match is broken with ! negation. This resolves a portion of netfilter bugzilla #497. 
Phil

Signed-off-by: Phil Oester
-------------- next part --------------
--- linux-dellfw/net/netfilter/xt_string.c	2006-07-15 15:00:43.000000000 -0400
+++ linux-po/net/netfilter/xt_string.c	2006-08-03 21:06:13.000000000 -0400
@@ -37,7 +37,7 @@
 
 	return (skb_find_text((struct sk_buff *)skb, conf->from_offset,
 			     conf->to_offset, conf->config, &state)
-			     != UINT_MAX) && !conf->invert;
+			     != UINT_MAX) ^ conf->invert;
 }
 
 #define STRING_TEXT_PRIV(m) ((struct xt_string_info *) m)

From thomasheinz at gmx.net Fri Aug 4 08:15:53 2006
From: thomasheinz at gmx.net (Thomas Heinz)
Date: Fri Aug 4 08:46:04 2006
Subject: Ipsec, policy match and PROTO=4
In-Reply-To: <200607270131.18655.thomasheinz@gmx.net>
References: <200607270131.18655.thomasheinz@gmx.net>
Message-ID: <200608040815.54561.thomasheinz@gmx.net>

Hi

My questions regarding the "ipsec PROTO=4" problem (see below) have not
been answered yet. I hate to be so impatient but I can imagine that this
issue concerns quite a number of users.

I would greatly appreciate it if someone could briefly comment on it.
Thanks a lot.

Best regards,

Thomas

On Thursday 27 July 2006 01:31, I wrote:
> Hello guys
>
> I have the following standard ipsec tunnel:
>
> [tun_A] host_A [pub_A]-------------[pub_B] host_B [tun_B]
>
> pub_A/B: public IP of host_A/B
> tun_A/B: tunnel IP of host_A/B
>
> After establishing the ipsec connection, putting two simple log-all
> rules in INPUT and OUTPUT like this:
> # iptables -I INPUT -j LOG
> # iptables -I OUTPUT -j LOG
> and pinging tun_B from host_A, I get the following log entries:
>
> IN= OUT=eth0 SRC=tun_A DST=tun_B LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0
> DF PROTO=ICMP TYPE=8 CODE=0 ID=16181 SEQ=1
> IN= OUT=eth0 SRC=pub_A DST=pub_B LEN=136 TOS=0x00 PREC=0x00 TTL=64 ID=0
> DF PROTO=ESP SPI=0xxxxxxxxx
>
> Very nice so far: the packet is seen twice, once clear and once
> encrypted. Now, let's look at the ICMP reply.
> > IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=pub_B > DST=pub_A LEN=136 TOS=0x00 PREC=0x00 TTL=56 ID=57843 PROTO=ESP > SPI=0xxxxxxxxx > IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=pub_B > DST=pub_A LEN=104 TOS=0x00 PREC=0x00 TTL=56 ID=57843 PROTO=4 > > Here, we see the packet also twice but the cleartext one has PROTO=4 > (ipencap, ipip tunnel). > > This has been observed in e.g. this thread: > http://marc.theaimsgroup.com/?l=netfilter-devel&m=114010374229806&w=2 > > Moreover, I have previously sent this posting to the netfilter user > mailing list. > > Could you please tell me about the current state regarding this bug? Is > it already addressed? Is it hard to fix? > > Is it correct that there is currently no way to filter incoming clear > text packets? Accepting PROTO=4 packets of course works but it is rather > a workaround. > > Thanks for your support. > > > Best regards, > > Thomas From kaber at trash.net Fri Aug 4 12:06:39 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 4 12:38:12 2006 Subject: [patch] RFC: matching interface groups In-Reply-To: <1154452209.6395.77.camel@bzorp.balabit> References: <1154452209.6395.77.camel@bzorp.balabit> Message-ID: <44D31C2F.2050702@trash.net> Balazs Scheidler wrote: > The use-case is as follows: > > * I have two different subsystems creating interfaces dynamically (for > example pptpd and serial pppd lines, each creating dynamic pppX > interfaces), > * I would like to assign a different set of iptables rules for these > clients, > * I would like to react to a new interface being added to a specific set > in a userspace application, > > The reasons I see this needs new kernel functionality: > > * iptables supports wildcard interface matching (for example "iptables > -i ppp+"), but as the names of the interfaces used by PPTPD and PPPD > cannot be distinguished this way, this is not enough, > * Reloading the iptables ruleset everytime a new interface comes up is > not 
really feasible, as it abrupts packet processing, and validating the > ruleset in the kernel can take significant amount of time, > * the kernel change is very simple, adapting userspace to this change is > also very simple, and in userspace various software packages can easily > interoperate with each-other once this is merged. > > The implementation: > > Each interface can belong to a single "group" at a time, an interface > comes up without being a member in any of the groups. > > Userspace can assign interfaces to groups after being created, this > would typically be performed in /etc/ppp/ip-up.d (and similar) scripts. > > In spirit "interface group" is somewhat similar to the "routing > protocol" field for routing entries, which contains information on which > routing daemon was responsible for adding the given route entry. > > Things to be done if you like this approach: > > * interface group match in iptables, > * support for naming interface groups in userspace, a'la routing > protocols, > * emitting a netlink notification when the group of an interface > changes, > * possibly converting the "ip link" command to use NETLINK messages, > instead of using ioctl() > > What do you think? I like it .. kind of like routing realms. For your specific case there is a possible solution already supported by the kernel, you can pre-allocate ppp devices using PPPIOCNEWUNIT, rename them and later attach to individual units in the ppp daemon using PPPIOCATTACH (I have a patch for this somewhere if you're interested). But that only works for PPP devices and the group idea looks more flexible. 
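
The group semantics described in this thread (an interface is in at most one group, it comes up in none, and userspace assigns the group afterwards) can be illustrated with a small userspace sketch. This is only a toy model under stated assumptions: `toy_iface`, `IFGROUP_NONE`, `toy_iface_init`, `toy_set_group` and `toy_group_match` are hypothetical names made up for illustration, and none of this is taken from Balazs' actual patch or from the kernel's `struct net_device`.

```c
#include <string.h>

/* Hypothetical value meaning "interface is in no group". */
#define IFGROUP_NONE 0u

/* Toy stand-in for a network device; NOT the kernel's struct net_device. */
struct toy_iface {
	char name[16];
	unsigned int group;	/* at most one group at a time */
};

/* Create an interface; it starts without any group membership. */
static void toy_iface_init(struct toy_iface *dev, const char *name)
{
	strncpy(dev->name, name, sizeof(dev->name) - 1);
	dev->name[sizeof(dev->name) - 1] = '\0';
	dev->group = IFGROUP_NONE;
}

/* What an ip-up style script would do once the interface exists. */
static void toy_set_group(struct toy_iface *dev, unsigned int group)
{
	dev->group = group;
}

/* Rule-side test: compare the group only, never the interface name. */
static int toy_group_match(const struct toy_iface *dev, unsigned int group)
{
	return dev->group == group;
}
```

The point of the sketch is that two interfaces with indistinguishable names (say ppp0 from pptpd and ppp1 from a serial pppd) can still be told apart by a rule once each has been assigned a different group, and the ruleset itself never has to be reloaded when an interface appears.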
From azez at ufomechanic.net Fri Aug 4 16:43:01 2006 From: azez at ufomechanic.net (Amin Azez) Date: Fri Aug 4 17:13:03 2006 Subject: [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation In-Reply-To: <44CDE635.9060101@netfilter.org> References: <44C61A2E.3060102@netfilter.org> <200607281316.k6SDGBVP001926@toshiba.co.jp> <44CDE635.9060101@netfilter.org> Message-ID: <44D35CF5.1090502@ufomechanic.net> * Pablo Neira Ayuso wrote, On 31/07/06 12:15: > Hi Yasuyuki, > > Yasuyuki KOZAKAI wrote: >> From: Pablo Neira Ayuso >> Date: Tue, 25 Jul 2006 15:18:38 +0200 >> >> >>> - rework get_features facility to avoid a softlockup >> >> >> This looks nice cleanup, but __nf_conntrack_alloc() cannot >> be called while holding nf_conntrack_lock, because it may call >> early_drop(), which holds nf_contrack_lock and also may call >> nf_ct_put(). >> >> >>> static struct nf_conn * >>> __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig, >>> const struct nf_conntrack_tuple *repl, >>> - const struct nf_conntrack_l3proto *l3proto) >>> + const struct nf_conntrack_l3proto *l3proto, >>> + u_int32_t features) >>> { >> >> >> You've moved "features = l3proto->get_features(orig);" out of >> this function, then the argument 'l3proto' isn't necessary. > > Indeed, I also detected another problem related with the NAT code in > ip_conntrack_netlink, so this patch needs to be dropped. > > I'm questioning the usefulness of this patch since nfnetlink > serializes the creation of two new conntracks. I'm finding it hard to drop this patch from the series, having trouble applying patch 7 from the series without this patch. I find it difficult to be comfortable with dropping some features of this patch. In this fragment, the second chunk does what looks like an important re-ordering of locking and conntrack creation; i.e. of course the lock is retained till after conntrack creation. So I don't think it is safe to entirely drop this patch; Pablo? 
Index: linux-2.6.17.1/net/ipv4/netfilter/ip_conntrack_netlink.c =================================================================== --- linux-2.6.17.1.orig/net/ipv4/netfilter/ip_conntrack_netlink.c +++ linux-2.6.17.1/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -1212,13 +1212,9 @@ ctnetlink_create_conntrack(struct nfattr ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1])); #endif - ct->helper = ip_conntrack_helper_find_get(rtuple); - - add_timer(&ct->timeout); + ct->helper = ip_conntrack_helper_find(rtuple); ip_conntrack_hash_insert(ct); - - if (ct->helper) - ip_conntrack_helper_put(ct->helper); + add_timer(&ct->timeout); DEBUGP("conntrack with id %u inserted\n", ct->id); return 0; @@ -1260,11 +1256,11 @@ ctnetlink_new_conntrack(struct sock *ctn h = __ip_conntrack_find(&rtuple, NULL); if (h == NULL) { - write_unlock_bh(&ip_conntrack_lock); DEBUGP("no such conntrack, create new\n"); err = -ENOENT; if (nlh->nlmsg_flags & NLM_F_CREATE) err = ctnetlink_create_conntrack(cda, &otuple, &rtuple); + write_unlock_bh(&ip_conntrack_lock); return err; } /* implicit 'else' */ From kaber at trash.net Fri Aug 4 21:46:36 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 4 22:42:49 2006 Subject: Ipsec, policy match and PROTO=4 In-Reply-To: <200607270131.18655.thomasheinz@gmx.net> References: <200607270131.18655.thomasheinz@gmx.net> Message-ID: <44D3A41C.9080606@trash.net> Thomas Heinz wrote: > Hello guys > > I have the following standard ipsec tunnel: > > [tun_A] host_A [pub_A]-------------[pub_B] host_B [tun_B] > > pub_A/B: public IP of host_A/B > tun_A/B: tunnel IP of host_A/B > > After establishing the ipsec connection, putting two simple log-all rules > in INPUT and OUTPUT like this: > # iptables -I INPUT -j LOG > # iptables -I OUTPUT -j LOG > and pinging tun_B from host_A, I get the following log entries: > > IN= OUT=eth0 SRC=tun_A DST=tun_B LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF > PROTO=ICMP TYPE=8 CODE=0 ID=16181 SEQ=1 > IN= OUT=eth0 SRC=pub_A DST=pub_B 
LEN=136 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF > PROTO=ESP SPI=0xxxxxxxxx > > Very nice so far: the packet is seen twice, once clear and once encrypted. > Now, let's look at the ICMP reply. > > IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=pub_B > DST=pub_A LEN=136 TOS=0x00 PREC=0x00 TTL=56 ID=57843 PROTO=ESP > SPI=0xxxxxxxxx > IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=pub_B > DST=pub_A LEN=104 TOS=0x00 PREC=0x00 TTL=56 ID=57843 PROTO=4 > > Here, we see the packet also twice but the cleartext one has PROTO=4 > (ipencap, ipip tunnel). The cleartext packet should also be seen with the real protocol after that. What do your policies look li From thomasheinz at gmx.net Fri Aug 4 22:57:24 2006 From: thomasheinz at gmx.net (Thomas Heinz) Date: Fri Aug 4 23:27:24 2006 Subject: Ipsec, policy match and PROTO=4 In-Reply-To: <44D3A41C.9080606@trash.net> References: <200607270131.18655.thomasheinz@gmx.net> <44D3A41C.9080606@trash.net> Message-ID: <200608042257.25461.thomasheinz@gmx.net> Hi Patrick Thanks for your reply. You wrote: > The cleartext packet should also be seen with the real protocol after > that. Oh, you're right. Sorry for being so unobservant. Is the fact that the packet with PROTO=4 is seen in between the encrypted and cleartext packet considered a bug? 
Best regards, Thomas From heavytull at hotmail.com Sat Aug 5 20:38:37 2006 From: heavytull at hotmail.com (Jethro Tull) Date: Sat Aug 5 21:08:28 2006 Subject: iptable compiled but errors returned In-Reply-To: Message-ID: >From: "Jethro Tull" >To: netfilter-devel@lists.netfilter.org >Subject: iptable compiled but errors returned >Date: Fri, 23 Jun 2006 23:15:23 +0000 > >I just compiled iptables by doing so: >make KERNEL_DIR=/kernel/source > >but the returned message for all files compiled is as follows: > > >cc -O2 -Wall -Wunused -I/usr/src/kernel/linux-2.6.17//include -Iinclude/ >-DIPTABLES_VERSION=\"1.3.5-20060622\" -D_UNKNOWN_KERNEL_POINTER_SIZE >-DIP6T_LIB_DIR=\"/usr/local/lib/iptables\" -c -o ip6tables.o ip6tables.c > > >the "-D_UNKNOWN_KERNEL_POINTER_SIZE..." is it normal?? From kernel at linuxace.com Sat Aug 5 23:57:25 2006 From: kernel at linuxace.com (Phil Oester) Date: Sun Aug 6 00:27:17 2006 Subject: iptable compiled but errors returned In-Reply-To: References: Message-ID: <20060805215725.GA12084@linuxace.com> On Sat, Aug 05, 2006 at 06:38:37PM +0000, Jethro Tull wrote: > the "-D_UNKNOWN_KERNEL_POINTER_SIZE..." is it normal?? Yes, normal. Phil From kadlec at blackhole.kfki.hu Mon Aug 7 09:41:47 2006 From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik) Date: Mon Aug 7 10:11:29 2006 Subject: [PATCH] ipset matches either,both In-Reply-To: References: Message-ID: Hi, On Thu, 3 Aug 2006, Amin Azez wrote: > These patches to ipset allow "either" and "both" to be specified as well > as "src" and "dst".
I'm a bit surprised because "either" can naturally be expressed by two iptables rules and "both" is equivalent with two set matches: iptables ... -m set --set foo src -m set --set foo dst Why do you need "both" and "either" encoded internally in ipset? > I considered doing this as a hack in ip_set.c, by calling the match > twice, faking flags of SRC and DST, but it was a hack and wouldn't > easily cover the porthash which consumes two flags. > > Instead I've altered the test of all of the set types, to accept BOTH > and EITHER as well. But thus all the extensions (set types) have to know what "both" and "either" means, which'd belong to the ipset core as I believe, if introduced. > The encoding of BOTH and EITHER was troublesome as flags is a mix of an > enum and bitmap-set; what I did isn't clean, but it will do, and doesn't > break user-space compatability. That is nice! I tend to break backward-compatibility too lightheartedly :-(. > I've tested the behaviour of src dst either and both for iphash. > I'm not sure about ipportmap because the man pages aren't clear how to > write rules for this. In order to match both dst IP and dst port (for example all publicly available servers behind a firewall) one can use an ipportmap type of set and a single rule like this: iptables ... -m set --set foo dst,dst > I'm not sure of the effect on this with bindings; I suspect that with > either,both we may want a preferred test order, and the one that matches > first is used to follow bindings. (Unless someone convinces me that bindings are really great and should deserve to outlast) I think bindings are a dead-end: internally not efficient enough and hard to understand by the users. Pablo and I started to rewrite ipset using netlink as kernel/userspace communication interface in the spring and thrown out bindings as unnecessary baggage. Unfortunately I have been overburdened since then and will only be able to pick up and continue the work in October :-(. 
But that gives a wide window for new ideas in ipset(/nfset) :-).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

From kadlec at blackhole.kfki.hu Mon Aug 7 12:49:31 2006
From: kadlec at blackhole.kfki.hu (Jozsef Kadlecsik)
Date: Mon Aug 7 13:19:15 2006
Subject: [PATCH] ipset matches either,both
In-Reply-To: <44D6FF0A.8050009@ufomechanic.net>
References: <44D6FF0A.8050009@ufomechanic.net>
Message-ID: 

On Mon, 7 Aug 2006, Amin Azez wrote:

> * Jozsef Kadlecsik wrote, On 07/08/06 08:41:
> > Hi,
> >
> > On Thu, 3 Aug 2006, Amin Azez wrote:
> >
> >
> >> These patches to ipset allow "either" and "both" to be specified as well
> >> as "src" and "dst".
> >
> > I'm a bit surprised because "either" can naturally be expressed by two
> > iptables rules and "both" is equivalent with two set matches:
> >
> > iptables ... -m set --set foo src -m set --set foo dst
> >
> > Why do you need "both" and "either" encoded internally in ipset?
> >
> Since when did iptables allow the same module to be used twice in a rule
> like that?

From the svn changelog:

iptables.c, rev 6474, modified Fri Mar 3 09:36:50 2006 UTC (5 months ago):

Multiple matches of the same type can be specified on the commandline.
If two or more matches of the same type are detected then the options
are assumed to be grouped in order to tell which option belongs to which
match:

... -m foo ... ... -m foo ... ...

Otherwise the commandline parsing is unmodified.

> Although "either" can be expressed as two iptables rules, it gets hard
> if you want to use ! for neither, and your action is drop or return or
> something drastic. You have to start fiddling with subchains and it all
> gets very awkward.

You can express "neither A nor B" as "not A and not B".
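
The equivalence used here is De Morgan's law, and it is easy to check exhaustively. The following is a minimal sketch, not ipset code: the two integer flags are assumed stand-ins for "the source address is in set foo" and "the destination address is in set foo", and all function names are hypothetical.

```c
/* "neither src nor dst is in the set", written as one negated OR. */
static int neither(int src_in_set, int dst_in_set)
{
	return !(src_in_set || dst_in_set);
}

/* The same condition as two negated matches ANDed together, which is
 * what two "-m set ! --set foo ..." matches in a single rule amount to. */
static int not_src_and_not_dst(int src_in_set, int dst_in_set)
{
	return !src_in_set && !dst_in_set;
}

/* Returns 1 if both formulations agree on all four input combinations. */
static int formulations_agree(void)
{
	int s, d;

	for (s = 0; s <= 1; s++)
		for (d = 0; d <= 1; d++)
			if (neither(s, d) != not_src_and_not_dst(s, d))
				return 0;
	return 1;
}
```

Since a single rule ANDs its matches, the second form needs no subchains; only "either" (an OR of two positive matches) requires two rules.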
> >> I considered doing this as a hack in ip_set.c, by calling the match > >> twice, faking flags of SRC and DST, but it was a hack and wouldn't > >> easily cover the porthash which consumes two flags. > >> > >> Instead I've altered the test of all of the set types, to accept BOTH > >> and EITHER as well. > > > > But thus all the extensions (set types) have to know what "both" and > > "either" means, which'd belong to the ipset core as I believe, if > > introduced. > > > You are probably right, but the current design where ipporthash consumes > two flags doesn't permit this. Sorry, I don't see why. > >> I've tested the behaviour of src dst either and both for iphash. > >> I'm not sure about ipportmap because the man pages aren't clear how to > >> write rules for this. > > > > In order to match both dst IP and dst port (for example all publicly > > available servers behind a firewall) one can use an ipportmap type of set > > and a single rule like this: > > > > iptables ... -m set --set foo dst,dst > > > with bindings, yeah; we we going to do this, it would allow "both" if a > set was bound to itself, but wouldn't allow "either." No, without bindings. It *is* confusing :-(. In an ipporthash type of set (sorry for the typo above, ipportmap does not exist) you store an IP address and a port together. And therefore you can match any of the four possibilities (src,src,...., dst,dst) in one go, but the most useful is the example I cited: # our public servers: ipset -N foo ipporthash --network 192.168.0.0/16 ipset -A foo 192.168.0.1%22 ipset -A foo 192.168.0.2%80 ipset -A foo 192.168.0.3%25 .... # Allow access to our public servers iptables -A FORWARD -p tcp -m set --set foo dst,dst -j ACCEPT > > Pablo and I started to rewrite ipset using netlink as kernel/userspace > > communication interface in the spring and thrown out bindings as > > unnecessary baggage. 
Unfortunately I have been overburdened since then and > > will only be able to pick up and continue the work in October :-(. But > > that gives a wide window for new ideas in ipset(/nfset) :-). > > > Are you happy to "bless" this patch as embodying an ipset feature that > will be retained in future revisions? I'm open to feature requests and suggestions for the next revisions but still not convinced about "both" and "either", sorry:
"both" is equal to -m set --set foo src -m set --set foo dst
"neither" is equal to -m set ! --set foo src -m set ! --set foo dst
"either" can be expressed by two iptables rules.
So what can we gain? Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From bazsi at balabit.hu Mon Aug 7 13:44:10 2006 From: bazsi at balabit.hu (Balazs Scheidler) Date: Mon Aug 7 14:14:15 2006 Subject: [patch] RFC: matching interface groups In-Reply-To: <44D31C2F.2050702@trash.net> References: <1154452209.6395.77.camel@bzorp.balabit> <44D31C2F.2050702@trash.net> Message-ID: <1154951050.18676.12.camel@bzorp.balabit> On Fri, 2006-08-04 at 12:06 +0200, Patrick McHardy wrote: > Balazs Scheidler wrote: > > The use-case is as follows: > > > > * I have two different subsystems creating interfaces dynamically (for > > example pptpd and serial pppd lines, each creating dynamic pppX > > interfaces), > > * I would like to assign a different set of iptables rules for these > > clients, > > * I would like to react to a new interface being added to a specific set > > in a userspace application, > > > > The reasons I see this needs new kernel functionality: > > > > * iptables supports wildcard interface matching (for example "iptables > > -i ppp+"), but as the names of the interfaces used by PPTPD and PPPD > > cannot be distinguished this way, this is not enough, > > * Reloading the
iptables ruleset everytime a new interface comes up is > > not really feasible, as it abrupts packet processing, and validating the > > ruleset in the kernel can take significant amount of time, > > * the kernel change is very simple, adapting userspace to this change is > > also very simple, and in userspace various software packages can easily > > interoperate with each-other once this is merged. > > > > The implementation: > > > > Each interface can belong to a single "group" at a time, an interface > > comes up without being a member in any of the groups. > > > > Userspace can assign interfaces to groups after being created, this > > would typically be performed in /etc/ppp/ip-up.d (and similar) scripts. > > > > In spirit "interface group" is somewhat similar to the "routing > > protocol" field for routing entries, which contains information on which > > routing daemon was responsible for adding the given route entry. > > > > Things to be done if you like this approach: > > > > * interface group match in iptables, > > * support for naming interface groups in userspace, a'la routing > > protocols, > > * emitting a netlink notification when the group of an interface > > changes, > > * possibly converting the "ip link" command to use NETLINK messages, > > instead of using ioctl() > > > > What do you think? > > > I like it .. kind of like routing realms. For your specific case there > is a possible solution already supported by the kernel, you can > pre-allocate ppp devices using PPPIOCNEWUNIT, rename them and later > attach to individual units in the ppp daemon using PPPIOCATTACH > (I have a patch for this somewhere if you're interested). But that > only works for PPP devices and the group idea looks more flexible. Thanks for liking it :) I'm going to implement a complete patch with iptables match and support for naming interface groups like routing realms and post it when I'm ready. 
I'd go for the more general solution as I have other interfaces not just ppp, it was just a trivial example. -- Bazsi From azez at ufomechanic.net Mon Aug 7 15:21:26 2006 From: azez at ufomechanic.net (Amin Azez) Date: Mon Aug 7 15:51:29 2006 Subject: [PATCH] ipset matches either,both In-Reply-To: References: <44D6FF0A.8050009@ufomechanic.net> Message-ID: <44D73E56.9070102@ufomechanic.net> * Jozsef Kadlecsik wrote, On 07/08/06 11:49: > From the svn changelog: > > iptables.c, rev 6474, modified Fri Mar 3 09:36:50 2006 UTC (5 months ago): > > Multiple matches of the same type can be specified on the commandline. > > If two or more matches of the same type are detected then the options > are assumed to be grouped in order to tell which option belongs > to which match: > > ... -m foo ... ... -m foo ... ... > > Otherwise the commandline parsing is unmodified. > dang, I missed that. ... > I'm open to feature requests and suggestions for the next revisions but > still not convinced about "both" and "either", sorry: > > "both" is equal to -m set --set foo src -m set --set foo dst > "neither" is equal to -m set ! --set foo src -m set ! --set foo dst > "either" can be expressed by two iptables rules. > > So what can we gain? > indeed. *sob* Sam From kaber at trash.net Tue Aug 8 09:50:15 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 10:20:31 2006 Subject: Ipsec, policy match and PROTO=4 In-Reply-To: <200608042257.25461.thomasheinz@gmx.net> References: <200607270131.18655.thomasheinz@gmx.net> <44D3A41C.9080606@trash.net> <200608042257.25461.thomasheinz@gmx.net> Message-ID: <44D84237.7000708@trash.net> Thomas Heinz wrote: > Is the fact that the packet with PROTO=4 is seen in between the encrypted > and cleartext packet considered a bug? Not sure right now, I need to look into this again.
From kaber at trash.net Tue Aug 8 11:09:20 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 11:39:28 2006 Subject: [PATCH] string match negation fix In-Reply-To: <20060804011740.GA1177@linuxace.com> References: <20060804011740.GA1177@linuxace.com> Message-ID: <44D854C0.2020909@trash.net> Phil Oester wrote: > The xt_string match is broken with ! negation. > > This resolves a portion of netfilter bugzilla #497. Applied, thanks Phil. From kaber at trash.net Tue Aug 8 11:27:04 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 11:57:12 2006 Subject: [PATCH] ipv6 ROUTE target api changes In-Reply-To: <20060802234933.GA1183@linuxace.com> References: <20060802234933.GA1183@linuxace.com> Message-ID: <44D858E8.7000206@trash.net> Phil Oester wrote: > My recent .targetsize patch missed the other API changes in the IPv6 > ROUTE target. > > This resolves bugzilla #490 (again). Thanks, applied. From kaber at trash.net Tue Aug 8 11:28:41 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 11:58:48 2006 Subject: u32 patch In-Reply-To: <20060802212232.GA29168@packetconsulting.pl> References: <20060802212232.GA29168@packetconsulting.pl> Message-ID: <44D85949.3050804@trash.net> Piotr Chytla wrote: > Here are some small patch for u32 match, to work on 2.6.17 kernels . > Matchsize in ipt_match struct was missing. Thanks, applied. From kaber at trash.net Tue Aug 8 11:33:32 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 12:03:39 2006 Subject: [PATCH] update quota match for xtables + fix -D bug In-Reply-To: <20060802024322.GA15127@linuxace.com> References: <20060802024322.GA15127@linuxace.com> Message-ID: <44D85A6C.9060402@trash.net> Phil Oester wrote: > The iptables quota match has not been updated to reflect the new xtables > location/structures in 2.6.18-rc. In addition, it has a bug which makes > it impossible to delete a rule once added. 
E.g.: > > # iptables -A foo -m quota --quota 1111 -j RETURN > # iptables -D foo -m quota --quota 1111 -j RETURN > iptables: Bad rule (does a matching rule exist in that chain?) > > Below patch fixes both issues and resolve bugzilla #496. Also applied, thanks. From kaber at trash.net Tue Aug 8 12:14:58 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 12:45:06 2006 Subject: [PATCH 1/8][CONNTRACK] mark conntrack event In-Reply-To: <44C619E4.90403@netfilter.org> References: <44C619E4.90403@netfilter.org> Message-ID: <44D86422.50404@trash.net> Pablo Neira Ayuso wrote: > This patch introduces the mark event. ctnetlink can use this to know if > the mark needs to be dumped. Applied, thanks. From kaber at trash.net Tue Aug 8 12:16:17 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 12:46:24 2006 Subject: [PATCH 2/8][CTNETLINK] dump conntrack mark In-Reply-To: <44C619F8.40201@netfilter.org> References: <44C619F8.40201@netfilter.org> Message-ID: <44D86471.9050701@trash.net> Pablo Neira Ayuso wrote: > ctnetlink dumps the mark iif the event mark happened Also applied. And thanks for your patience btw :) From kaber at trash.net Tue Aug 8 12:18:05 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 12:48:13 2006 Subject: [PATCH 3/8][CTNETLINK] send expectation events iif there listeners In-Reply-To: <44C61A0A.6040000@netfilter.org> References: <44C61A0A.6040000@netfilter.org> Message-ID: <44D864DD.1050109@trash.net> Pablo Neira Ayuso wrote: > This patch uses nfnetlink_has_listeners to check for listeners in > userspace. Applied, thanks. 
From kaber at trash.net Tue Aug 8 12:19:04 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 12:49:12 2006 Subject: [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation In-Reply-To: <44D35CF5.1090502@ufomechanic.net> References: <44C61A2E.3060102@netfilter.org> <200607281316.k6SDGBVP001926@toshiba.co.jp> <44CDE635.9060101@netfilter.org> <44D35CF5.1090502@ufomechanic.net> Message-ID: <44D86518.408@trash.net> Amin Azez wrote: > * Pablo Neira Ayuso wrote, On 31/07/06 12:15: > > I find it difficult to be comfortable with dropping some features of > this patch. In this fragment, the second chunk does what looks like an > important re-ordering of locking and conntrack creation; i.e. of course > the lock is retained till after conntrack creation. > > So I don't think it is safe to entirely drop this patch; Pablo? I'll drop it and see which of the other ones still make sense. From kaber at trash.net Tue Aug 8 12:33:51 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 13:04:00 2006 Subject: [PATCH 5/8][CONNTRACK] Introduce the pickup facilities to take over TCP connections In-Reply-To: <44C61BDA.7050505@netfilter.org> References: <44C61BDA.7050505@netfilter.org> Message-ID: <44D8688F.6030207@trash.net> Pablo Neira Ayuso wrote: > This patch introduces a new flag called IPS_PICKUP that forces the > protocol handler to pick up the required information in order to ensure > that the connection will reach a successful state. > > Two new ctnetlink attributes are also introduced to inject the window > scale factor since TCP window tracking could need it to take over the > connection properly.
> > Signed-off-by: Pablo Neira Ayuso > > > ------------------------------------------------------------------------ > > [CONNTRACK] Introduce the pickup facilities to take over TCP connections > > This patch introduces a new flag called IPS_PICKUP that forces the protocol > handler to pick up the required information in order to ensure that the > connection will reach a successful state. > > Two new ctnetlink attributes are also introduced to inject the window scale > factor since TCP window tracking could need it to take over the connection > properly. > > Signed-off-by: Pablo Neira Ayuso > > Index: net-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c > =================================================================== > --- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2006-07-14 17:01:02.000000000 +0200 > +++ net-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2006-07-14 17:46:59.000000000 +0200 > @@ -346,6 +346,12 @@ static int tcp_to_nfattr(struct sk_buff > nest_parms = NFA_NEST(skb, CTA_PROTOINFO_TCP); > NFA_PUT(skb, CTA_PROTOINFO_TCP_STATE, sizeof(u_int8_t), > &ct->proto.tcp.state); > + /* window scale factor: original direction (SYN) */ > + NFA_PUT(skb, CTA_PROTOINFO_TCP_WSCALE_ORIGINAL, sizeof(u_int8_t), > + &ct->proto.tcp.seen[0].td_scale); > + /* window scale factor: reply direction (SYN+ACK) */ > + NFA_PUT(skb, CTA_PROTOINFO_TCP_WSCALE_REPLY, sizeof(u_int8_t), > + &ct->proto.tcp.seen[1].td_scale); > read_unlock_bh(&tcp_lock); > > NFA_NEST_END(skb, nest_parms); > @@ -358,7 +364,9 @@ nfattr_failure: > } > > static const size_t cta_min_tcp[CTA_PROTOINFO_TCP_MAX] = { > - [CTA_PROTOINFO_TCP_STATE-1] = sizeof(u_int8_t), > + [CTA_PROTOINFO_TCP_STATE-1] = sizeof(u_int8_t), > + [CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1] = sizeof(u_int8_t), > + [CTA_PROTOINFO_TCP_WSCALE_REPLY-1] = sizeof(u_int8_t), > }; > > static int nfattr_to_tcp(struct nfattr *cda[], struct ip_conntrack *ct) > @@ -382,6 +390,24 @@ static int nfattr_to_tcp(struct nfattr * > 
write_lock_bh(&tcp_lock); > ct->proto.tcp.state = > *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_STATE-1]); > + /* window scale factor: original direction (SYN) */ > + if (tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1]) > + ct->proto.tcp.seen[0].td_scale = > + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1]); > + /* window scale factor: reply direction (SYN+ACK) */ > + if (tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]) > + ct->proto.tcp.seen[1].td_scale = > + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]); > + /* set WINDOW_SCALE flag */ > + if (tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1] || > + tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]) { > + /* > + * we have to assume that both sides have > + * sent Window Scale options (RFC 1323) > + */ > + ct->proto.tcp.seen[0].flags |= > + ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_WINDOW_SCALE; This will also set all other flags from seen[1] in seen[0]. RFC 1323 also says quite the opposite, both sides must send a wscale option to enable it. For unreplied connections this will do the wrong thing. And we should have an option to unset it, also for the case where we're synchronizing the state of an unreplied connection. I also think it would be better to add netlink attributes for the entire flags mask, for example IP_CT_TCP_FLAG_SACK_PERM also wants to be synchronized I guess. From kaber at trash.net Tue Aug 8 12:45:00 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 13:15:08 2006 Subject: [PATCH 6/8][CTNETLINK] Rework conntrack fields dumping logic on events In-Reply-To: <44C61C17.5040800@netfilter.org> References: <44C61C17.5040800@netfilter.org> Message-ID: <44D86B2C.80006@trash.net> Pablo Neira Ayuso wrote: > + /* > + * What do we dump on conntrack events? 
Good question, > + * the following table should clarify 8) > + * > + * | NEW | UPDATE | DESTROY | > + * ----------------------------------------| > + * tuples | Y | Y | Y | > + * status | Y | Y | N | > + * timeout | Y | Y | N | > + * protoinfo | Y | Y | N | > + * helper | S | S | N | > + * counters | N | N | Y | > + * mark | S | S | N | > + * > + * Leyend: > + * Y: yes > + * N: no > + * S: iif the field is set > + */ > + I think this is a bit excessive, the code pretty much speaks for itself. From kaber at trash.net Tue Aug 8 12:50:47 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 13:20:54 2006 Subject: [PATCH 7/8][CTNETLINK] send conntrack events on ctnetlink actions In-Reply-To: <44C61C29.4050202@netfilter.org> References: <44C61C29.4050202@netfilter.org> Message-ID: <44D86C87.20506@trash.net> Pablo Neira Ayuso wrote: > Currently only conntrack events generated by the conntrack core are > delivered to userspace via ctnetlink. This patch force the generation of > event notifications on ctnetlink actions. > > Example scenario: you have two process listening to ctnetlink and one of > them creates/changes/destroy a conntrack upon certain events This doesn't apply without 4/8. I also think sending incremental netlink updates is questionable here, if no conntrack events are configured other listeners can only get the entire new state by doing an additional dump. From kaber at trash.net Tue Aug 8 12:57:01 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 13:27:09 2006 Subject: [PATCH 8/8][CTNETLINK] Remove useless event-bit-is-set checkings on updates In-Reply-To: <44C61C3B.8070504@netfilter.org> References: <44C61C3B.8070504@netfilter.org> Message-ID: <44D86DFD.2020709@trash.net> Pablo Neira Ayuso wrote: > IPCT_HELPER and IPCT_NATINFO bits are never set on updates. Applied, thanks. 
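Patrick's first objection to patch 5/8 above (the chained `|=` copying every flag from seen[1] into seen[0]) can be reproduced in a few lines of plain C. This is a minimal userspace sketch, not the conntrack code: `struct dir_state`, the function names, and the flag values are assumptions chosen for illustration; only the shape of the assignment is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions for this sketch. */
#define FLAG_WINDOW_SCALE 0x01
#define FLAG_SACK_PERM    0x02

/* Hypothetical stand-in for the per-direction TCP tracking state. */
struct dir_state {
	uint8_t flags;
};

/* Same shape as the patch: a |= b |= FLAG parses as a |= (b |= FLAG),
 * so every bit already set in seen[1] is also ORed into seen[0]. */
static void set_wscale_chained(struct dir_state seen[2])
{
	seen[0].flags |= seen[1].flags |= FLAG_WINDOW_SCALE;
}

/* Two separate statements touch only the intended bit. */
static void set_wscale_separately(struct dir_state seen[2])
{
	seen[0].flags |= FLAG_WINDOW_SCALE;
	seen[1].flags |= FLAG_WINDOW_SCALE;
}

/* Start with SACK_PERM set only in the reply direction and compare. */
static void check_flag_leak(void)
{
	struct dir_state a[2] = { { 0 }, { FLAG_SACK_PERM } };
	struct dir_state b[2] = { { 0 }, { FLAG_SACK_PERM } };

	set_wscale_chained(a);
	set_wscale_separately(b);

	/* Chained form: SACK_PERM leaked into the original direction. */
	assert(a[0].flags == (FLAG_WINDOW_SCALE | FLAG_SACK_PERM));
	/* Separate form: the original direction has only WINDOW_SCALE. */
	assert(b[0].flags == FLAG_WINDOW_SCALE);
}
```

Splitting the statement only addresses the flag leak. It does not address the other points raised in the review, such as unreplied connections or the ability to unset the flag.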
From kaber at trash.net Tue Aug 8 13:08:37 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 8 13:56:57 2006 Subject: [RFC][PATCH] libnfnetlink new API #2 In-Reply-To: <44C63B3F.2090509@netfilter.org> References: <44C63B3F.2090509@netfilter.org> Message-ID: <44D870B5.8040707@trash.net> Pablo Neira Ayuso wrote: > Hi, > > Since I'll be leaving for two weeks, I'd like to put a patch for > libnfnetlink on the table that I'm currently distributing with > conntrackd for further discussion. I'd like to see this patch or > something similar in mainline someday. > > This patch: > > - Fixes error handling that is currently broken, errors are now reported > via errno so everyone could use perror(...) to get a more detailed > description to know what is going wrong. Basically the new functions > return -1 and set errno appropiately. > > - Adds Documentation that comes handy for developers. > > - Introduces replacement for nfnl_listen (nfnl_receive_process) and for > nfnl_talk (nfnl_send_received_process), both are integrated with the > nfnl_subsys_handle logic introduced by Harald, that IMHO must be the > right direction, and set errno appropiately in case of error. These new > functions obsolete nfnl_listen and nfnl_talk, we can add a clause > __deprecated to warn programmers without removing them. > > - Iterator API: to loop over a multipart netlink message and process it. > This gives more control in the message processing. It is similar to > Harald's nfnl_get_first_msg, nfnl_get_msg_next and nfnl_handle_packet > set of functions but sets errno and move iterator private information > out of nfnl_handle. I must confess that in this case I don't like too > much the idea of providing too many function to do the same but my API > looks friendlier I think, programmers are familiar with the concept of > iterators. 
> > - Introduce assertions to check input data: This can catch up wrong use > of the API and errors and the application can break "nicer" (if > breakages would ever be nice...) that segfaulting. I have seen these in > others libraries. > > In short: I think that we can deprecate old functions (just adding a > warning in compilation time) and remove them in version 2, I have seen > this in other libraries: we maintaining an old version 1 for those that > don't want to move forward some time but provide a clean version 2 > and drop early design errors. BTW, probably the name of some functions > are ugly, I accept suggestions ;) > > @Patrick: I think that Harald has more in-deep knowledge about the > libraries but, since he's really busy these days, your impressions on > this issue can be also worth as well. Without commenting on deprecating functions (I don't know about that), the patch looks good to me. > + * nfnl_send_receive_process - request/response challenge > + * @h: nfnetlink handler > + * @nlh: nfnetlink message to be sent > + * > + * This function is sends a nfnetlink message to a certain subsystem > + * and receives the response that is processed by the callback registered > + * via register_callback(). Note that this function is a replacement for > + * nfnl_talk, its use is recommended. > + * > + * On success, 0 is returned. On error, a negative is returned. If your > + * does not want to listen to events anymore, then it must return a value > + * lesser or equal to 0. > + * > + * Note that ENOBUFS is returned in case that nfnetlink is exhausted. In > + * that case is possible that the information requested is incomplete. > + */ > +int nfnl_send_receive_process(struct nfnl_handle *h, struct nlmsghdr *nlh) > +{ > + assert(h); > + assert(nlh); > + > + if (nfnl_send(h, nlh) == -1) > + return -1; > + > + return nfnl_receive_process(h); > +} This doesn't really do what it promises, it will call the callback for any message it receives, not only a response. 
We need to start using sequence numbers before we associate responses with queries.

From alex at milivojevic.org Tue Aug 8 17:32:45 2006
From: alex at milivojevic.org (Aleksandar Milivojevic)
Date: Tue Aug 8 17:57:05 2006
Subject: Filtering PPPoE
Message-ID: <20060808103245.fen2mnkuyask004c@www.milivojevic.org>

Hi,

I'm attempting to place transparent firewall (on a Linux host configured as bridge) between ADSL modem and some servers. The servers and ADSL modem "speak" PPPoE.

Simplified diagram looks like this:

+------+     +--------+     +---------+
|ADSL  |     |bridge/ |     |server(s)|
|modem |-----|firewall|-----|         |
+------+     +--------+     +---------+

The problem I have is that Netfilter does not see PPPoE packets on the bridge. Similar thing on the server host(s), Netfilter (if server is Linux host) doesn't see PPPoE, it just sees IP packets as they come out of virtual PPP interface. I can't place the firewall onto the servers, and the servers must have external IP addresses assigned to them. I can't change the topology and move PPPoE endpoint in front of the servers (and/or use NAT).

Is there any way of filtering IP traffic that is encapsulated inside PPPoE on the bridge host? I found some HOWTOs describing how to do it with OpenBSD based bridge. However, I'm unfamiliar with OpenBSD, and would like to set this up using Linux solution if possible.

--
See Ya' later, alligator!
http://www.8-P.ca/

From gcoady.lk at gmail.com Wed Aug 9 04:34:18 2006
From: gcoady.lk at gmail.com (Grant Coady)
Date: Wed Aug 9 05:04:11 2006
Subject: Filtering PPPoE
In-Reply-To: <20060808103245.fen2mnkuyask004c@www.milivojevic.org>
References: <20060808103245.fen2mnkuyask004c@www.milivojevic.org>
Message-ID: <47iid2lhchbnrha9gi5js28b11vh8hp927@4ax.com>

On Tue, 08 Aug 2006 10:32:45 -0500, Aleksandar Milivojevic wrote: >Hi, > >I'm attempting to place transparent firewall (on a Linux host >configured as bridge) between ADSL modem and some servers. The >servers and ADSL modem "speak" PPPoE.
> >Simplified diagram looks like this: > > +------+ +--------+ +---------+ > |ADSL | |bridge/ | |server(s)| > |modem |-----|firewall|-----| | > +------+ +--------+ +---------+ > >The problem I have is that Netfilter does not see PPPoE packets on the >bridge. Did you switch ADSL modem to bridge mode? network topology ````````````````` ---------------- ------------ LAN ( ) Phone | | Machines ( Big Bad Internet )--------| ADSL Modem | ( ) Line | | 100-Base-T ---------------- ------------ Switch ----- | -------| | Public IP | X_WORLD | ----- | | ----- ------------- | --| | | ppp0/eth2 | --- | ----- | | | \ |-- ----- X_LOCAL2 <-----|eth1 eth0|-----|/ /|-----| | 192.168.2.0/24 | | | \ |-- ----- 100-Base-T | Firewall | --- | ----- (spare localnet) ------------- | --| | | ----- | ----- -------| | X_LOCAL ----- 192.168.1.0/24 Though I don't port-forward to DMZ servers -- that's the spare localnet that I've not got around to using. Grant. From kaber at trash.net Wed Aug 9 11:41:24 2006 From: kaber at trash.net (Patrick McHardy) Date: Wed Aug 9 12:11:44 2006 Subject: [NETFILTER 01/02]: xt_string: fix negation Message-ID: <44D9ADC4.1000909@trash.net> Hi Dave, following are two small netfilter fixes for 2.6.18. Please apply, thanks. -------------- next part -------------- [NETFILTER]: xt_string: fix negation The xt_string match is broken with ! negation. This resolves a portion of netfilter bugzilla #497. 
Signed-off-by: Phil Oester
Signed-off-by: Patrick McHardy

---
commit 71c55528be7cf1199376a1b1c5489f60bf2b2617
tree 9a3262f5694c8eb859852981536baa874d3b0fab
parent 9f737633e6ee54fc174282d49b2559bd2208391d
author Phil Oester Tue, 08 Aug 2006 11:09:06 +0200
committer Patrick McHardy Tue, 08 Aug 2006 11:09:06 +0200

 net/netfilter/xt_string.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/xt_string.c b/net/netfilter/xt_string.c
index d8e3891..275330f 100644
--- a/net/netfilter/xt_string.c
+++ b/net/netfilter/xt_string.c
@@ -37,7 +37,7 @@ static int match(const struct sk_buff *s
 
 	return (skb_find_text((struct sk_buff *)skb, conf->from_offset,
 			     conf->to_offset, conf->config, &state)
-			     != UINT_MAX) && !conf->invert;
+			     != UINT_MAX) ^ conf->invert;
 }
 
 #define STRING_TEXT_PRIV(m) ((struct xt_string_info *) m)

From kaber at trash.net Wed Aug 9 11:41:27 2006
From: kaber at trash.net (Patrick McHardy)
Date: Wed Aug 9 12:11:47 2006
Subject: [NETFILTER 02/02]: xt_hashlimit: fix limit off-by-one
Message-ID: <44D9ADC7.30006@trash.net>

[NETFILTER]: xt_hashlimit: fix limit off-by-one

Hashlimit doesn't account for the first packet, which is inconsistent with the limit match. Reported by ryan.castellucci@gmail.com, netfilter bugzilla #500.
Signed-off-by: Patrick McHardy

---
commit afe7e5033e79c86de718cb7fce5961a50b1352d3
tree 3c02c7e82f9471ccf72712dc7d8d2f030cbda4fc
parent 71c55528be7cf1199376a1b1c5489f60bf2b2617
author Patrick McHardy Wed, 09 Aug 2006 11:08:26 +0200
committer Patrick McHardy Wed, 09 Aug 2006 11:08:26 +0200

 net/ipv4/netfilter/ipt_hashlimit.c | 11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_hashlimit.c b/net/ipv4/netfilter/ipt_hashlimit.c
index 6b66244..3bd2368 100644
--- a/net/ipv4/netfilter/ipt_hashlimit.c
+++ b/net/ipv4/netfilter/ipt_hashlimit.c
@@ -454,15 +454,12 @@ hashlimit_match(const struct sk_buff *sk
 		dh->rateinfo.credit_cap = user2credits(hinfo->cfg.avg *
 							hinfo->cfg.burst);
 		dh->rateinfo.cost = user2credits(hinfo->cfg.avg);
-
-		spin_unlock_bh(&hinfo->lock);
-		return 1;
+	} else {
+		/* update expiration timeout */
+		dh->expires = now + msecs_to_jiffies(hinfo->cfg.expire);
+		rateinfo_recalc(dh, now);
 	}
 
-	/* update expiration timeout */
-	dh->expires = now + msecs_to_jiffies(hinfo->cfg.expire);
-
-	rateinfo_recalc(dh, now);
 	if (dh->rateinfo.credit >= dh->rateinfo.cost) {
 		/* We're underlimit. */
 		dh->rateinfo.credit -= dh->rateinfo.cost;

From samueldg at arcoscom.com Thu Aug 10 01:41:12 2006
From: samueldg at arcoscom.com (Samuel Díaz García)
Date: Thu Aug 10 02:12:08 2006
Subject: [Fwd: [Bug 499] ipp2p error. ip_tables: ipp2p match: invalid size 0 != 8]
Message-ID: <44DA7298.9070805@arcoscom.com>

Sorry for the inconvenience, but I don't know where to look for help with this problem.

I have this problem, which appears to be "x tables" related. Somebody put the patch below into bugzilla, and it doesn't work for me.

I'm using a 2.6.17.8 kernel and iptables 1.3.5. /var/log/messages repeats the error in the subject every time I try to load "ipp2p" rules.

Can anybody help with the patch? Perhaps I need to restart the box for the changes to take effect?
Thanks

-------- Original Message --------

https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=499

x12345@email.ro changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Additional Comments From x12345@email.ro 2006-08-06 14:36 MET -------
diff -c ipp2p-0.8.1_rc1/Makefile ipp2p-0.8.1_rc1-ok/Makefile
*** ipp2p-0.8.1_rc1/Makefile 2006-01-04 18:34:14.000000000 +0200
--- ipp2p-0.8.1_rc1-ok/Makefile 2006-08-05 04:02:17.000000000 +0300
***************
*** 33,39 ****
  endif

  ifeq ($(IPTABLES_SRC),)
! IPTABLES_SRC = /usr/src/iptables-1.2.9
  endif

  IPTABLES_INCLUDE = -I$(IPTABLES_SRC)/include
--- 33,39 ----
  endif

  ifeq ($(IPTABLES_SRC),)
! IPTABLES_SRC = /usr/src/iptables-1.3.5
  endif

  IPTABLES_INCLUDE = -I$(IPTABLES_SRC)/include
Only in ipp2p-0.8.1_rc1-ok/: Modules.symvers
diff -c ipp2p-0.8.1_rc1/ipt_ipp2p.c ipp2p-0.8.1_rc1-ok/ipt_ipp2p.c
*** ipp2p-0.8.1_rc1/ipt_ipp2p.c 2006-01-04 18:00:49.000000000 +0200
--- ipp2p-0.8.1_rc1-ok/ipt_ipp2p.c 2006-08-06 15:29:13.000000000 +0300
***************
*** 729,734 ****
--- 729,735 ----
  match(const struct sk_buff *skb,
        const struct net_device *in,
        const struct net_device *out,
+       const struct xt_match *match,
        const void *matchinfo,
        int offset,

***************
*** 818,823 ****
--- 819,825 ----
  static int
  checkentry(const char *tablename,
             const struct ipt_ip *ip,
+            const struct xt_match *match,
             void *matchinfo,
             unsigned int matchsize,
             unsigned int hook_mask)
***************
*** 846,851 ****
--- 848,854 ----
      .name = "ipp2p",
      .match = &match,
      .checkentry = &checkentry,
+     .matchsize = sizeof(struct ipt_p2p_info),
      .me = THIS_MODULE,
  #endif
  };

--
Configure bugmail: https://bugzilla.netfilter.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

--
Samuel Díaz García
Director Gerente
ArcosCom Wireless, S.L.L.
CIF: B11828068 c/ Romero Gago, 19 Arcos de la Frontera 11630 - Cadiz http://www.arcoscom.com mailto:samueldg@arcoscom.com msn: samueldg@arcoscom.com Móvil: 651 93 72 48 Tlfn.: 956 70 13 15 Fax: 956 70 34 83 From olenf at ans.pl Thu Aug 10 11:04:53 2006 From: olenf at ans.pl (Krzysztof Oledzki) Date: Thu Aug 10 11:35:31 2006 Subject: [Fwd: [Bug 499] ipp2p error. ip_tables: ipp2p match: invalid size 0 != 8] In-Reply-To: <44DA7298.9070805@arcoscom.com> References: <44DA7298.9070805@arcoscom.com> Message-ID: On Thu, 10 Aug 2006, Samuel Díaz García wrote: > Sorry for the inconvenence, but I don't know where to look for help about > this problem. The ipp2p extension was moved into external repository and removed from pom-ng. For a 2.6.17.x kernel you need a recent pom-ng extended with "./runme --download". Please verify that you have a properly patched kernel - the ipt_ipp2p.c file should be 25144 bytes long with md5: a674ded594abbf43893ae32630208d08 Best regards, Krzysztof Olędzki From eric at inl.fr Thu Aug 10 15:55:32 2006 From: eric at inl.fr (Eric Leblond) Date: Thu Aug 10 16:36:17 2006 Subject: PATCH] nfnetlink_queue Fix typo on debug message Message-ID: <1155218132.10883.3.camel@localhost.localdomain> This patch fixes a trivial typo in nfnetlink_queue. Signed-off-by: Eric Leblond --- net/netfilter/nfnetlink_queue.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 86a4ac3..d9a2ad8 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -584,7 +584,7 @@ nfqnl_enqueue_packet(struct sk_buff *skb queue->queue_dropped++; status = -ENOSPC; if (net_ratelimit()) - printk(KERN_WARNING "ip_queue: full at %d entries, " + printk(KERN_WARNING "nf_queue: full at %d entries, " "dropping packets(s).
Dropped: %d\n", queue->queue_total, queue->queue_dropped); goto err_out_free_nskb; -- 1.4.1 From heavytull at hotmail.com Thu Aug 10 21:38:03 2006 From: heavytull at hotmail.com (Jethro Tull) Date: Thu Aug 10 22:08:30 2006 Subject: iptables installed while old version alredy was In-Reply-To: <20060805215725.GA12084@linuxace.com> Message-ID: I mistakenly installed iptables while it was already installed on my pc; do i need to make clean and then use my distro package tool to remove the old version and then re install the latest iptables? From kaber at trash.net Fri Aug 11 00:23:00 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 11 00:53:24 2006 Subject: PATCH] nfnetlink_queue Fix typo on debug message In-Reply-To: <1155218132.10883.3.camel@localhost.localdomain> References: <1155218132.10883.3.camel@localhost.localdomain> Message-ID: <44DBB1C4.5080901@trash.net> Eric Leblond wrote: > This patch fixes a trivial typo in nfnetlink_queue.
> > Signed-off-by: Eric Leblond > --- > net/netfilter/nfnetlink_queue.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c > index 86a4ac3..d9a2ad8 100644 > --- a/net/netfilter/nfnetlink_queue.c > +++ b/net/netfilter/nfnetlink_queue.c > @@ -584,7 +584,7 @@ nfqnl_enqueue_packet(struct sk_buff *skb > queue->queue_dropped++; > status = -ENOSPC; > if (net_ratelimit()) > - printk(KERN_WARNING "ip_queue: full at %d entries, " > + printk(KERN_WARNING "nf_queue: full at %d entries, " > "dropping packets(s). Dropped: %d\n", > queue->queue_total, queue->queue_dropped); > goto err_out_free_nskb; You missed one spot. It should actually say nfnetlink_queue I guess, but I'm too lazy to fix up all those printks :) From davem at davemloft.net Fri Aug 11 01:57:45 2006 From: davem at davemloft.net (David Miller) Date: Fri Aug 11 02:28:11 2006 Subject: [NETFILTER 01/02]: xt_string: fix negation In-Reply-To: <44D9ADC4.1000909@trash.net> References: <44D9ADC4.1000909@trash.net> Message-ID: <20060810.165745.51701724.davem@davemloft.net> From: Patrick McHardy Date: Wed, 09 Aug 2006 11:41:24 +0200 > following are two small netfilter fixes for 2.6.18. > Please apply, thanks. Both applied, thanks Patrick. From degraaf at cpsc.ucalgary.ca Fri Aug 11 03:54:23 2006 From: degraaf at cpsc.ucalgary.ca (Rennie deGraaf) Date: Fri Aug 11 04:28:01 2006 Subject: Bug (minor) in ip_tables.c? Message-ID: <44DBE34F.8080902@cpsc.ucalgary.ca> In init() in ip_tables.c, if nf_register_sockopt() fails, then the function returns failure without unregistering the targets and matches that it provides. 
To correct this, init() should be changed to something like this: static int __init ip_tables_init(void) { int ret; xt_proto_init(AF_INET); /* Noone else will be downing sem now, so we won't sleep */ xt_register_target(&ipt_standard_target); xt_register_target(&ipt_error_target); xt_register_match(&icmp_matchstruct); /* Register setsockopt */ ret = nf_register_sockopt(&ipt_sockopts); if (ret < 0) { duprintf("Unable to register sockopts.\n"); goto failure_sockopt; } printk("ip_tables: (C) 2000-2006 Netfilter Core Team\n"); return 0; failure_sockopt: xt_unregister_match(AF_INET, &icmp_matchstruct); xt_unregister_target(AF_INET, &ipt_error_target); xt_unregister_target(AF_INET, &ipt_standard_target); xt_proto_fini(AF_INET); return ret; } Rennie deGraaf -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: OpenPGP digital signature Url : /pipermail/netfilter-devel/attachments/20060811/aa6adeb2/signature.pgp From pasik at iki.fi Fri Aug 11 10:30:20 2006 From: pasik at iki.fi (Pasi =?iso-8859-1?Q?K=E4rkk=E4inen?=) Date: Fri Aug 11 11:00:57 2006 Subject: [RFC,ANNOUNCE] conntrack daemon (stateful replication) In-Reply-To: <20060530065804.GA24166@kruemel.my-eitzenberger.de> References: <447A1FB7.5080709@netfilter.org> <20060530065804.GA24166@kruemel.my-eitzenberger.de> Message-ID: <20060811083020.GW29423@edu.joroinen.fi> Hello! Any updates to these projects? Files available somewhere? I think many people from this list would like to test and help with these daemons.. - Pasi On Tue, May 30, 2006 at 08:58:05AM +0200, Holger Eitzenberger wrote: > On Mon, May 29, 2006 at 12:09:59AM +0200, Pablo Neira Ayuso wrote: > > > I've been working on a pet project during the last months. Part of this > > stuff is related with my works in the university. 
> > Hi Pablo, > > I am working on a daemon called 'ctsyncd', which for me started as a > proof-of-concept and is now almost in a state where I can release it > to the public. My current objective is simple master/slave scenario > without active/active, and I am almost done. > > Hopefully I am able to look at your sources. With your great knowledge > of libnetfilter_conntrack and my programming skills we should consider > joining our efforts. But first I will release my code for public review > within a few days. > > > Stay tuned. > > /holger > > > > - Stateful replication: the daemon keeps a cache of internal events via > > libnetfilter_conntrack and a cache of external event received from the > > other node. > > - Support for classical Primary/Backup settings > > - Support for Active/Active settings (two machines max. per VRRP instance) > > - Support for NAT: It recognizes NAT'ed connections and handles them > > properly. > > - UDP traffic ignore facility > > - ICMP traffic ignore facility > > - Ignore loopback traffic (not customizable at the moment) > > - Ignore traffic for certain set of machines: Useful to ignore traffic > > for the firewall since we just want to replicate conntracks that > > represent forwarded connections. > > - Dump internal and external caches via UNIX sockets > > - Flush internal, external caches and conntrack table > > - The communication between daemons is done in NETLINK format, so the > > protocol used is based on NETLINK over IP, to ensure backward compatibility. 
> > - Configuration via file > > > From mlhuang at CS.Princeton.EDU Thu Aug 10 18:31:10 2006 From: mlhuang at CS.Princeton.EDU (Mark Huang) Date: Fri Aug 11 14:19:31 2006 Subject: [PATCH] Fix ipt_ULOG panics on SMP kernels Message-ID: <44DB5F4E.2080608@cs.princeton.edu> I've run into the same kernel panic as these reports: https://lists.gnumonks.org/pipermail/ulogd/2005-August/000776.html http://lists.netfilter.org/pipermail/netfilter/2006-January/064509.html https://lists.gnumonks.org/pipermail/ulogd/2006-April/000853.html On various SMP machines. The culprit is a null ub->skb in ulog_send(). I believe that this can occur for the following reason. If ulog_timer() has already been scheduled on one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the queue on another CPU by calling ulog_send() right before it exits (because the threshold is reached), there will be no skbuff when ulog_timer() acquires the lock and calls ulog_send(). Cancelling the timer in ulog_send() doesn't help because it has already been scheduled and is running on the first CPU. There are two solutions that I can see: re-allocate ub->skb at the end of ipt_ulog_packet(), just like it does toward the beginning of the function. But the problem will still happen if the allocation fails. The second solution, implemented by the attached patch, is to just return from ulog_send() if ub->skb is null. Regards, --Mark -------------- next part -------------- A non-text attachment was scrubbed... Name: ipt_ULOG.patch Type: text/x-patch Size: 684 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060810/8ca0856a/ipt_ULOG.bin From kaber at trash.net Fri Aug 11 18:09:12 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 11 18:42:01 2006 Subject: Bug (minor) in ip_tables.c? 
In-Reply-To: <44DBE34F.8080902@cpsc.ucalgary.ca> References: <44DBE34F.8080902@cpsc.ucalgary.ca> Message-ID: <44DCABA8.10103@trash.net> Rennie deGraaf wrote: > In init() in ip_tables.c, if nf_register_sockopt() fails, then the > function returns failure without unregistering the targets and matches > that it provides. Good spotting. As usual this is duplicated in ip6_tables and arp_tables, I've queued up this patch to fix it. -------------- next part -------------- [NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in initialization path Neither of {arp,ip,ip6}_tables cleans up behind itself when something goes wrong during initialization. Noticed by Rennie deGraaf Signed-off-by: Patrick McHardy --- commit 85b125c30937bf0ef9fad5f4c3b4eab4588d4580 tree fc1796384ca7e973256f16095339c86b2a808c02 parent afe7e5033e79c86de718cb7fce5961a50b1352d3 author Patrick McHardy Fri, 11 Aug 2006 18:10:00 +0200 committer Patrick McHardy Fri, 11 Aug 2006 18:10:00 +0200 net/ipv4/netfilter/arp_tables.c | 27 ++++++++++++++++++++------- net/ipv4/netfilter/ip_tables.c | 33 +++++++++++++++++++++++++-------- net/ipv6/netfilter/ip6_tables.c | 34 +++++++++++++++++++++++++--------- 3 files changed, 70 insertions(+), 24 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 80c73ca..df4854c 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -1170,21 +1170,34 @@ static int __init arp_tables_init(void) { int ret; - xt_proto_init(NF_ARP); + ret = xt_proto_init(NF_ARP); + if (ret < 0) + goto err1; /* Noone else will be downing sem now, so we won't sleep */ - xt_register_target(&arpt_standard_target); - xt_register_target(&arpt_error_target); + ret = xt_register_target(&arpt_standard_target); + if (ret < 0) + goto err2; + ret = xt_register_target(&arpt_error_target); + if (ret < 0) + goto err3; /* Register setsockopt */ ret = nf_register_sockopt(&arpt_sockopts); - if (ret < 0) { - duprintf("Unable to register 
sockopts.\n"); - return ret; - } + if (ret < 0) + goto err4; printk("arp_tables: (C) 2002 David S. Miller\n"); return 0; + +err4: + xt_unregister_target(&arpt_error_target); +err3: + xt_unregister_target(&arpt_standard_target); +err2: + xt_proto_fini(NF_ARP); +err1: + return ret; } static void __exit arp_tables_fini(void) diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index fc5bdd5..f316ff5 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -2239,22 +2239,39 @@ static int __init ip_tables_init(void) { int ret; - xt_proto_init(AF_INET); + ret = xt_proto_init(AF_INET); + if (ret < 0) + goto err1; /* Noone else will be downing sem now, so we won't sleep */ - xt_register_target(&ipt_standard_target); - xt_register_target(&ipt_error_target); - xt_register_match(&icmp_matchstruct); + ret = xt_register_target(&ipt_standard_target); + if (ret < 0) + goto err2; + ret = xt_register_target(&ipt_error_target); + if (ret < 0) + goto err3; + ret = xt_register_match(&icmp_matchstruct); + if (ret < 0) + goto err4; /* Register setsockopt */ ret = nf_register_sockopt(&ipt_sockopts); - if (ret < 0) { - duprintf("Unable to register sockopts.\n"); - return ret; - } + if (ret < 0) + goto err5; printk("ip_tables: (C) 2000-2006 Netfilter Core Team\n"); return 0; + +err5: + xt_unregister_match(&icmp_matchstruct); +err4: + xt_unregister_target(&ipt_error_target); +err3: + xt_unregister_target(&ipt_standard_target); +err2: + xt_proto_fini(AF_INET); +err1: + return ret; } static void __exit ip_tables_fini(void) diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index f26898b..c9d6b23 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -1398,23 +1398,39 @@ static int __init ip6_tables_init(void) { int ret; - xt_proto_init(AF_INET6); + ret = xt_proto_init(AF_INET6); + if (ret < 0) + goto err1; /* Noone else will be downing sem now, so we won't sleep */ - 
xt_register_target(&ip6t_standard_target); - xt_register_target(&ip6t_error_target); - xt_register_match(&icmp6_matchstruct); + ret = xt_register_target(&ip6t_standard_target); + if (ret < 0) + goto err2; + ret = xt_register_target(&ip6t_error_target); + if (ret < 0) + goto err3; + ret = xt_register_match(&icmp6_matchstruct); + if (ret < 0) + goto err4; /* Register setsockopt */ ret = nf_register_sockopt(&ip6t_sockopts); - if (ret < 0) { - duprintf("Unable to register sockopts.\n"); - xt_proto_fini(AF_INET6); - return ret; - } + if (ret < 0) + goto err5; printk("ip6_tables: (C) 2000-2006 Netfilter Core Team\n"); return 0; + +err5: + xt_unregister_match(&icmp6_matchstruct); +err4: + xt_unregister_target(&ip6t_error_target); +err3: + xt_unregister_target(&ip6t_standard_target); +err2: + xt_proto_fini(AF_INET6); +err1: + return ret; } static void __exit ip6_tables_fini(void) From kaber at trash.net Fri Aug 11 18:19:58 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 11 18:52:25 2006 Subject: [PATCH] Fix ipt_ULOG panics on SMP kernels In-Reply-To: <44DB5F4E.2080608@cs.princeton.edu> References: <44DB5F4E.2080608@cs.princeton.edu> Message-ID: <44DCAE2E.8040006@trash.net> Mark Huang wrote: > I've run into the same kernel panic as these reports: > > https://lists.gnumonks.org/pipermail/ulogd/2005-August/000776.html > http://lists.netfilter.org/pipermail/netfilter/2006-January/064509.html > https://lists.gnumonks.org/pipermail/ulogd/2006-April/000853.html > > On various SMP machines. The culprit is a null ub->skb in ulog_send(). I > believe > that this can occur for the following reason. If ulog_timer() has > already been > scheduled on one CPU and is spinning on the lock, and ipt_ulog_packet() > flushes > the queue on another CPU by calling ulog_send() right before it exits > (because > the threshold is reached), there will be no skbuff when ulog_timer() > acquires > the lock and calls ulog_send(). 
Cancelling the timer in ulog_send() > doesn't help > because it has already been scheduled and is running on the first CPU. > > There are two solutions that I can see: re-allocate ub->skb at the end of > ipt_ulog_packet(), just like it does toward the beginning of the > function. But > the problem will still happen if the allocation fails. The second solution, > implemented by the attached patch, is to just return from ulog_send() if > ub->skb > is null. Very nice catch, thank you. The second solution is perfectly fine I think, if the skb has already been sent there is no need to do anything, a new allocation could be useless if no further traffic arrives. If you could add a similar fix to net/bridge/netfilter/ebt_ulog.c and net/netfilter/nfnetlink_log.c and send me a Signed-off-by: line I'll push it in 2.6.18. Thanks. From kaber at trash.net Fri Aug 11 19:13:30 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 11 19:43:59 2006 Subject: [PATCH] Fix ipt_ULOG panics on SMP kernels In-Reply-To: <44DCB42B.7010009@cs.princeton.edu> References: <44DB5F4E.2080608@cs.princeton.edu> <44DCAE2E.8040006@trash.net> <44DCB42B.7010009@cs.princeton.edu> Message-ID: <44DCBABA.4090207@trash.net> Mark Huang wrote: > Fix kernel panic on various SMP machines. The culprit is a null > ub->skb in ulog_send(). If ulog_timer() has already been scheduled on > one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the > queue on another CPU by calling ulog_send() right before it exits, > there will be no skbuff when ulog_timer() acquires the lock and calls > ulog_send(). Cancelling the timer in ulog_send() doesn't help because > it has already been scheduled and is running on the first CPU. > > Similar problem exists in ebt_ulog.c and nfnetlink_log.c. Applied, thanks Mark. 
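The guard discussed in this thread can be illustrated outside the kernel. What follows is only a minimal userspace sketch in C, not the applied patch: the names `model_buf`, `model_send` and `model_replay` are hypothetical stand-ins for the per-group buffer, ulog_send() and the two callers described above. It replays the two calls one after the other instead of on two CPUs, so it shows why returning early on an empty buffer is safe, not the race itself.

```c
#include <assert.h>   /* only needed if you add the checks from the note below */
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_buf {
    const char *skb;    /* stands in for ub->skb; NULL means "already flushed" */
    unsigned int qlen;  /* number of queued messages */
    unsigned int sent;  /* how many batches were handed off */
};

/*
 * Models ulog_send() with the guard from the patch: if another caller
 * has already flushed the buffer, there is nothing to send, so return
 * instead of dereferencing a NULL skb. Returns 1 if a batch was sent.
 */
static int model_send(struct model_buf *ub)
{
    if (ub->skb == NULL)
        return 0;       /* nothing to send */

    ub->sent++;         /* "send" the batch */
    ub->skb = NULL;     /* buffer is now empty */
    ub->qlen = 0;
    return 1;
}

/*
 * Sequential replay of the scenario from the report: the packet path
 * flushes because the threshold is reached, then the timer handler,
 * which was already waiting for the lock, calls the send routine too.
 * Returns the number of batches actually sent.
 */
static unsigned int model_replay(void)
{
    struct model_buf ub = { "batch", 3, 0 };

    model_send(&ub);    /* packet path: flushes the queue */
    model_send(&ub);    /* timer path: finds nothing left to send */
    return ub.sent;
}
```

In this model the second call is a no-op, so `model_replay()` reports a single batch; without the NULL check the second call would be operating on a buffer that is no longer there, which is the situation the reports above describe.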
From m at rtij.nl Fri Aug 11 20:44:13 2006 From: m at rtij.nl (Martijn Lievaart) Date: Fri Aug 11 21:14:45 2006 Subject: Searching a home for psd Message-ID: <44DCCFFD.2080807@rtij.nl> Hi, Psd has been dropped from patch-o-matic and no one has taken it up. If someone can send me the latest pom entries for psd, I'll set up an external repository for it. Besides trying to bring it up to date with the latest kernel, who knows, maybe I'll really maintain it. Seriously, it's a great patch and I would hate to see it go. M4 From kaber at trash.net Sat Aug 12 02:25:38 2006 From: kaber at trash.net (Patrick McHardy) Date: Sat Aug 12 02:56:20 2006 Subject: [NETFILTER 02/02]: ulog: fix panic on SMP kernels In-Reply-To: <20060812002535.30253.73682.sendpatchset@localhost.localdomain> References: <20060812002535.30253.73682.sendpatchset@localhost.localdomain> Message-ID: <20060812002538.30253.39211.sendpatchset@localhost.localdomain> [NETFILTER]: ulog: fix panic on SMP kernels Fix kernel panic on various SMP machines. The culprit is a null ub->skb in ulog_send(). If ulog_timer() has already been scheduled on one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the queue on another CPU by calling ulog_send() right before it exits, there will be no skbuff when ulog_timer() acquires the lock and calls ulog_send(). Cancelling the timer in ulog_send() doesn't help because it has already been scheduled and is running on the first CPU. Similar problem exists in ebt_ulog.c and nfnetlink_log.c.
Signed-off-by: Mark Huang Signed-off-by: Patrick McHardy --- commit 005dbeb54700681d8770c3c76ac452387cabe1e1 tree 1d452a2166403710ed576640b6a4d92456a4b69a parent 85b125c30937bf0ef9fad5f4c3b4eab4588d4580 author Mark Huang Fri, 11 Aug 2006 19:39:00 +0200 committer Patrick McHardy Fri, 11 Aug 2006 19:39:00 +0200 net/bridge/netfilter/ebt_ulog.c | 3 +++ net/ipv4/netfilter/ipt_ULOG.c | 5 +++++ net/netfilter/nfnetlink_log.c | 3 +++ 3 files changed, 11 insertions(+), 0 deletions(-) diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c index 02693a2..9f950db 100644 --- a/net/bridge/netfilter/ebt_ulog.c +++ b/net/bridge/netfilter/ebt_ulog.c @@ -74,6 +74,9 @@ static void ulog_send(unsigned int nlgro if (timer_pending(&ub->timer)) del_timer(&ub->timer); + if (!ub->skb) + return; + /* last nlmsg needs NLMSG_DONE */ if (ub->qlen > 1) ub->lastnlh->nlmsg_type = NLMSG_DONE; diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c index d7dd7fe..d46fd67 100644 --- a/net/ipv4/netfilter/ipt_ULOG.c +++ b/net/ipv4/netfilter/ipt_ULOG.c @@ -115,6 +115,11 @@ static void ulog_send(unsigned int nlgro del_timer(&ub->timer); } + if (!ub->skb) { + DEBUGP("ipt_ULOG: ulog_send: nothing to send\n"); + return; + } + /* last nlmsg needs NLMSG_DONE */ if (ub->qlen > 1) ub->lastnlh->nlmsg_type = NLMSG_DONE; diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index 61cdda4..b59d3b2 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -366,6 +366,9 @@ __nfulnl_send(struct nfulnl_instance *in if (timer_pending(&inst->timer)) del_timer(&inst->timer); + if (!inst->skb) + return 0; + if (inst->qlen > 1) inst->lastnlh->nlmsg_type = NLMSG_DONE; From kaber at trash.net Sat Aug 12 02:25:36 2006 From: kaber at trash.net (Patrick McHardy) Date: Sat Aug 12 02:56:21 2006 Subject: [NETFILTER 01/02]: {arp, ip, ip6}_tables: proper error recovery in initialization path In-Reply-To: 
<20060812002535.30253.73682.sendpatchset@localhost.localdomain> References: <20060812002535.30253.73682.sendpatchset@localhost.localdomain> Message-ID: <20060812002536.30253.19487.sendpatchset@localhost.localdomain> [NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in init path Neither of {arp,ip,ip6}_tables cleans up behind itself when something goes wrong during initialization. Noticed by Rennie deGraaf Signed-off-by: Patrick McHardy --- commit 85b125c30937bf0ef9fad5f4c3b4eab4588d4580 tree fc1796384ca7e973256f16095339c86b2a808c02 parent afe7e5033e79c86de718cb7fce5961a50b1352d3 author Patrick McHardy Fri, 11 Aug 2006 18:10:00 +0200 committer Patrick McHardy Fri, 11 Aug 2006 18:10:00 +0200 net/ipv4/netfilter/arp_tables.c | 27 ++++++++++++++++++++------- net/ipv4/netfilter/ip_tables.c | 33 +++++++++++++++++++++++++-------- net/ipv6/netfilter/ip6_tables.c | 34 +++++++++++++++++++++++++--------- 3 files changed, 70 insertions(+), 24 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 80c73ca..df4854c 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -1170,21 +1170,34 @@ static int __init arp_tables_init(void) { int ret; - xt_proto_init(NF_ARP); + ret = xt_proto_init(NF_ARP); + if (ret < 0) + goto err1; /* Noone else will be downing sem now, so we won't sleep */ - xt_register_target(&arpt_standard_target); - xt_register_target(&arpt_error_target); + ret = xt_register_target(&arpt_standard_target); + if (ret < 0) + goto err2; + ret = xt_register_target(&arpt_error_target); + if (ret < 0) + goto err3; /* Register setsockopt */ ret = nf_register_sockopt(&arpt_sockopts); - if (ret < 0) { - duprintf("Unable to register sockopts.\n"); - return ret; - } + if (ret < 0) + goto err4; printk("arp_tables: (C) 2002 David S. 
Miller\n"); return 0; + +err4: + xt_unregister_target(&arpt_error_target); +err3: + xt_unregister_target(&arpt_standard_target); +err2: + xt_proto_fini(NF_ARP); +err1: + return ret; } static void __exit arp_tables_fini(void) diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index fc5bdd5..f316ff5 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -2239,22 +2239,39 @@ static int __init ip_tables_init(void) { int ret; - xt_proto_init(AF_INET); + ret = xt_proto_init(AF_INET); + if (ret < 0) + goto err1; /* Noone else will be downing sem now, so we won't sleep */ - xt_register_target(&ipt_standard_target); - xt_register_target(&ipt_error_target); - xt_register_match(&icmp_matchstruct); + ret = xt_register_target(&ipt_standard_target); + if (ret < 0) + goto err2; + ret = xt_register_target(&ipt_error_target); + if (ret < 0) + goto err3; + ret = xt_register_match(&icmp_matchstruct); + if (ret < 0) + goto err4; /* Register setsockopt */ ret = nf_register_sockopt(&ipt_sockopts); - if (ret < 0) { - duprintf("Unable to register sockopts.\n"); - return ret; - } + if (ret < 0) + goto err5; printk("ip_tables: (C) 2000-2006 Netfilter Core Team\n"); return 0; + +err5: + xt_unregister_match(&icmp_matchstruct); +err4: + xt_unregister_target(&ipt_error_target); +err3: + xt_unregister_target(&ipt_standard_target); +err2: + xt_proto_fini(AF_INET); +err1: + return ret; } static void __exit ip_tables_fini(void) diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index f26898b..c9d6b23 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -1398,23 +1398,39 @@ static int __init ip6_tables_init(void) { int ret; - xt_proto_init(AF_INET6); + ret = xt_proto_init(AF_INET6); + if (ret < 0) + goto err1; /* Noone else will be downing sem now, so we won't sleep */ - xt_register_target(&ip6t_standard_target); - xt_register_target(&ip6t_error_target); - 
xt_register_match(&icmp6_matchstruct); + ret = xt_register_target(&ip6t_standard_target); + if (ret < 0) + goto err2; + ret = xt_register_target(&ip6t_error_target); + if (ret < 0) + goto err3; + ret = xt_register_match(&icmp6_matchstruct); + if (ret < 0) + goto err4; /* Register setsockopt */ ret = nf_register_sockopt(&ip6t_sockopts); - if (ret < 0) { - duprintf("Unable to register sockopts.\n"); - xt_proto_fini(AF_INET6); - return ret; - } + if (ret < 0) + goto err5; printk("ip6_tables: (C) 2000-2006 Netfilter Core Team\n"); return 0; + +err5: + xt_unregister_match(&icmp6_matchstruct); +err4: + xt_unregister_target(&ip6t_error_target); +err3: + xt_unregister_target(&ip6t_standard_target); +err2: + xt_proto_fini(AF_INET6); +err1: + return ret; } static void __exit ip6_tables_fini(void) From kaber at trash.net Sat Aug 12 02:25:35 2006 From: kaber at trash.net (Patrick McHardy) Date: Sat Aug 12 02:56:23 2006 Subject: [NETFILTER 00/02]: Netfilter fixes Message-ID: <20060812002535.30253.73682.sendpatchset@localhost.localdomain> Hi Dave, following are two more fixes for 2.6.18. The ulog patch fixes an old crash in ulog that has hit quite a few people so far. I'm going to push it to -stable as well. Please apply, thanks. 
net/bridge/netfilter/ebt_ulog.c | 6 +++ net/ipv4/netfilter/arp_tables.c | 54 +++++++++++++++++++++++-------- net/ipv4/netfilter/ip_tables.c | 66 +++++++++++++++++++++++++++++--------- net/ipv4/netfilter/ipt_ULOG.c | 10 +++++ net/ipv6/netfilter/ip6_tables.c | 68 +++++++++++++++++++++++++++++----------- net/netfilter/nfnetlink_log.c | 6 +++ 6 files changed, 162 insertions(+), 48 deletions(-) Mark Huang: [NETFILTER]: ulog: fix panic on SMP kernels Patrick McHardy: [NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in init path From davem at davemloft.net Sat Aug 12 02:30:55 2006 From: davem at davemloft.net (David Miller) Date: Sat Aug 12 03:01:15 2006 Subject: [NETFILTER 00/02]: Netfilter fixes In-Reply-To: <20060812002535.30253.73682.sendpatchset@localhost.localdomain> References: <20060812002535.30253.73682.sendpatchset@localhost.localdomain> Message-ID: <20060811.173055.122622791.davem@davemloft.net> From: Patrick McHardy Date: Sat, 12 Aug 2006 02:25:35 +0200 (MEST) > following are two more fixes for 2.6.18. The ulog patch fixes an old > crash in ulog that has hit quite a few people so far. I'm going to push > it to -stable as well. > > Please apply, thanks. Both applied, thanks Patrick. From kaber at trash.net Sat Aug 12 02:33:19 2006 From: kaber at trash.net (Patrick McHardy) Date: Sat Aug 12 03:03:53 2006 Subject: Searching a home for psd In-Reply-To: <44DCCFFD.2080807@rtij.nl> References: <44DCCFFD.2080807@rtij.nl> Message-ID: <44DD21CF.3090404@trash.net> Martijn Lievaart wrote: > Psd has been dropped from patch-o-matic and no one has taken it up. If > someone can send me the latest pom entries for psd, I'll set up an > external repository for it. Besides trying to bring it up to date with the > latest kernel, who knows, maybe I'll really maintain it. Seriously, it's > a great patch and I would hate to see it go.
Trying to employ all my SVN skills to get a copy didn't help, so please just grab it from the latest release, there haven't been any changes since then anyway. From eric at inl.fr Sat Aug 12 16:50:40 2006 From: eric at inl.fr (Eric Leblond) Date: Sat Aug 12 17:20:19 2006 Subject: [PATCH] Trivial replace of ip_queue by nfnetlink_queue in nfnetlink_queue code Message-ID: <1155394240.10172.29.camel@localhost> Hi, This patch replace "ip_queue" by "nfnetlink_queue" in nfnetlink_queue.c. BR, Signed-off-by: Eric Leblond --- net/netfilter/nfnetlink_queue.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) 6c270ef9d8727b2b44807b411fb3dd60c87ccf11 diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 49ef41e..134219c 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -584,7 +584,7 @@ nfqnl_enqueue_packet(struct sk_buff *skb queue->queue_dropped++; status = -ENOSPC; if (net_ratelimit()) - printk(KERN_WARNING "ip_queue: full at %d entries, " + printk(KERN_WARNING "nfnetlink_queue: full at %d entries, " "dropping packets(s). Dropped: %d\n", queue->queue_total, queue->queue_dropped); goto err_out_free_nskb; @@ -635,7 +635,7 @@ nfqnl_mangle(void *data, int data_len, s diff, GFP_ATOMIC); if (newskb == NULL) { - printk(KERN_WARNING "ip_queue: OOM " + printk(KERN_WARNING "nfnetlink_queue: OOM " "in mangle, dropping packet\n"); return -ENOMEM; } -- 1.1.3 -- Eric Leblond INL -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 191 bytes Desc: This is a digitally signed message part Url : /pipermail/netfilter-devel/attachments/20060812/2237c913/attachment.pgp From kaber at trash.net Sat Aug 12 17:12:50 2006 From: kaber at trash.net (Patrick McHardy) Date: Sat Aug 12 17:43:22 2006 Subject: [PATCH] Trivial replace of ip_queue by nfnetlink_queue in nfnetlink_queue code In-Reply-To: <1155394240.10172.29.camel@localhost> References: <1155394240.10172.29.camel@localhost> Message-ID: <44DDEFF2.9010403@trash.net> Eric Leblond wrote: > This patch replace "ip_queue" by "nfnetlink_queue" in nfnetlink_queue.c. I already changed it to nf_queue now. If you want to change it to nfnetlink_queue, it should be done in all places. But I think it would make more sense to throw most of these printks out. From eric at inl.fr Sat Aug 12 17:21:41 2006 From: eric at inl.fr (Eric Leblond) Date: Sat Aug 12 17:51:19 2006 Subject: [PATCH] Trivial replace of ip_queue by nfnetlink_queue in nfnetlink_queue code In-Reply-To: <44DDEFF2.9010403@trash.net> References: <1155394240.10172.29.camel@localhost> <44DDEFF2.9010403@trash.net> Message-ID: <1155396101.10172.36.camel@localhost> On Saturday 12 August 2006 at 17:12 +0200, Patrick McHardy wrote: > Eric Leblond wrote: > > This patch replace "ip_queue" by "nfnetlink_queue" in nfnetlink_queue.c. > > I already changed it to nf_queue now. Oops, sorry :-/ I thought you wanted me to replace nf_queue by nfnetlink_queue in my previous patch. > If you want to change it to > nfnetlink_queue, it should be done in all places. No, I prefer nf_queue. > But I think it > would make more sense to throw most of these printks out. The one about queue full is really useful to notice performance issue or problems. By the way, do you have a public git tree where we could fetch the latest version of the code ?
Best regards, -- Eric Leblond INL -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 191 bytes Desc: This is a digitally signed message part Url : /pipermail/netfilter-devel/attachments/20060812/fc1c695a/attachment-0001.pgp From kaber at trash.net Sat Aug 12 17:31:05 2006 From: kaber at trash.net (Patrick McHardy) Date: Sat Aug 12 18:01:36 2006 Subject: [PATCH] Trivial replace of ip_queue by nfnetlink_queue in nfnetlink_queue code In-Reply-To: <1155396101.10172.36.camel@localhost> References: <1155394240.10172.29.camel@localhost> <44DDEFF2.9010403@trash.net> <1155396101.10172.36.camel@localhost> Message-ID: <44DDF439.30104@trash.net> Eric Leblond wrote: > On Saturday 12 August 2006 at 17:12 +0200, Patrick McHardy wrote: > >>But I think it >>would make more sense to throw most of these printks out. > > > The one about queue full is really useful to notice performance issue or > problems. Yes, but we already have counters for that and dropped packets are not too uncommon, we shouldn't flood the ringbuffer with this. > By the way, do you have a public git tree where we could fetch the > latest version of the code ? No, but it only includes very few patches currently anyway. I plan to merge the priv_data stuff this weekend and push my queued patches for 2.6.19 to Dave, so people can work off his tree. From t.luettgert at pressestimmen.de Sat Aug 12 21:07:54 2006 From: t.luettgert at pressestimmen.de (Torsten Luettgert) Date: Sat Aug 12 21:37:39 2006 Subject: Searching a home for psd In-Reply-To: <44DD21CF.3090404@trash.net> References: <44DCCFFD.2080807@rtij.nl> <44DD21CF.3090404@trash.net> Message-ID: <1155409675.2447.4.camel@murdegern.cbxnet.de> On Sat, 2006-08-12 at 02:33 +0200, Patrick McHardy wrote: > Martijn Lievaart wrote: > > Psd has been dropped from patch-o-matic and no one has taken it up.
If > > someone can send me the latest pom entries for psd, I'll set up an > > external repository for it. Besides trying to bring it up to date with the > > latest kernel, who knows, maybe I'll really maintain it. Seriously, it's > > a great patch and I would hate to see it go. > > Trying to employ all my SVN skills to get a copy didn't help, so please > just grab it from the latest release, there haven't been any changes > since then anyway. > "The great removal" occurred at 6597 and 6598, so this should work: svn co -r 6596 http://svn.netfilter.org/netfilter/trunk/patch-o-matic-ng - Torsten From m at rtij.nl Sat Aug 12 22:06:29 2006 From: m at rtij.nl (Martijn Lievaart) Date: Sat Aug 12 22:37:07 2006 Subject: Searching a home for psd In-Reply-To: <1155409675.2447.4.camel@murdegern.cbxnet.de> References: <44DCCFFD.2080807@rtij.nl> <44DD21CF.3090404@trash.net> <1155409675.2447.4.camel@murdegern.cbxnet.de> Message-ID: <44DE34C5.9020604@rtij.nl> Torsten Luettgert wrote: >On Sat, 2006-08-12 at 02:33 +0200, Patrick McHardy wrote: > > >>Martijn Lievaart wrote: >> >> >>>Psd has been dropped from patch-o-matic and no one has taken it up. If >>>someone can send me the latest pom entries for psd, I'll set up an >>>external repository for it. Besides trying to bring it up to date with the >>>latest kernel, who knows, maybe I'll really maintain it. Seriously, it's >>>a great patch and I would hate to see it go. >>> >>> >>Trying to employ all my SVN skills to get a copy didn't help, so please >>just grab it from the latest release, there haven't been any changes >>since then anyway. >> >> >> > >"The great removal" occurred at 6597 and 6598, so this should work: > >svn co -r 6596 http://svn.netfilter.org/netfilter/trunk/patch-o-matic-ng > > > Snarfed. Thx.
M4 From php0t at zorro.hu Mon Aug 14 01:14:20 2006 From: php0t at zorro.hu (php0t) Date: Mon Aug 14 01:45:01 2006 Subject: connlimit Message-ID: <002d01c6bf2e$351f1c70$650ba8c0@DORKA> Dear developers, I've had a nice time trying to limit connections. The kernel is 2.6.17.8. Apart from the first couple of annoyances (such as the patch being renamed from iplimit to connlimit, patch-o-matic not being able to apply it to the current kernel etc), I've managed to patch manually, compile it as a module and load it. However, when I try to add an according test rule, I get the 'Invalid argument' error, and dmesg says: ip_tables: connlimit match: invalid size 0 != 16 I also tried going to the site mentioned in the latest pom-ng's source.list: # ipp2p, time, IPMARK and connlimit maintained by Krzysztof Oledzki http://people.netfilter.org/ole/pom/ But all I get is a smiley :) When I google for my current problem, most suggest that connlimit is out-of-date, nobody cares about it any more, etc. As I'm no C coder, my two questions are, 1) what could I do to make this work ? Are there any similar modules available that are stable? 2) could it be possible to stabilize this patch and have it added to the kernel source? There are so many iptables extensions and modules by default that are probably rarely used, why is this (IMHO very basic) feature excluded? Thanks for reading and any replies P. From kernel at linuxace.com Mon Aug 14 01:40:53 2006 From: kernel at linuxace.com (Phil Oester) Date: Mon Aug 14 02:11:45 2006 Subject: connlimit In-Reply-To: <002d01c6bf2e$351f1c70$650ba8c0@DORKA> References: <002d01c6bf2e$351f1c70$650ba8c0@DORKA> Message-ID: <20060813234053.GA22015@linuxace.com> On Mon, Aug 14, 2006 at 01:14:20AM +0200, php0t wrote: > > Dear developers, > > I've had a nice time trying to limit connections. The kernel is > 2.6.17.8. 
> Apart from the first couple of annoyances (such as the patch being > renamed from iplimit to connlimit, patch-o-matic not being able to apply > it to the current kernel etc), I've managed to patch manually, compile > it as a module and load it. > > However, when I try to add an according test rule, I get the 'Invalid > argument' error, and dmesg says: > ip_tables: connlimit match: invalid size 0 != 16 See this thread: http://marc.theaimsgroup.com/?l=netfilter-devel&m=115334461228009&w=2 > When I google for my current problem, most suggest that connlimit is > out-of-date, nobody cares about it any more, etc. Next time try searching the netfilter archives directly. Phil From richard at mail3.edong.com Mon Aug 14 05:21:15 2006 From: richard at mail3.edong.com (richard@mail3.edong.com) Date: Mon Aug 14 05:40:07 2006 Subject: port netfilter on linux 2.6 back to linux 2.4 Message-ID: <20060814032115.3C0FC3683C8@mail3.edong.com> hello everyone: I'm developing an embedded firewall router. For some reasons I can only use uClinux 2.4 now. But the 2.6 kernel is more advanced and there are more resources about the linux 2.6 netfilter than 2.4. My product will upgrade to 2.6 next year. Do you think I SHOULD port the linux 2.6 netfilter frame work back to linux 2.4 to have more resources and support? From netfilter at mm-double.de Mon Aug 14 10:58:05 2006 From: netfilter at mm-double.de (Maik Hentsche) Date: Mon Aug 14 11:28:58 2006 Subject: possible Bug in ip_conntrack Message-ID: <1155545885.44e03b1dcec01@www.domainfactory-webmail.de> Hi! While debugging a new version of conntrackd, I (nearly) every time get athe following error "BUG: soft lockup detected on CPU#1!" which completely hangs the system. Here is, what I do to reproduce the error: I ue keepalived to manage a virtual IP, that is default gw for two other machines, which communicate over this IP. I use a client-server-program to count connections, which simply opens a tcp-socket and immediately closes it. 
The gateway uses keepalived and conntrackd-0.8.3 for hot standby. The
HS-master (that produces the error) has kernel 2.6.18-rc4 running (with
no other patches), the slave is a 2.6.17.3 (also vanilla). I use
valgrind --leak-check=yes for debugging. After some time, usually
between 1 and 5 minutes, the system hangs and writes something like
this out on the serial console.

BUG: soft lockup detected on CPU#1!

Call Trace:
[] softlockup_tick+0xf9/0x140
[] update_process_times+0x57/0x90
[] smp_local_timer_interrupt+0x23/0x50
[] smp_apic_timer_interrupt+0x38/0x40
[] apic_timer_interrupt+0x66/0x6c
[] _write_lock_irqsave+0x6d/0x90
[] _write_lock_irqsave+0x4a/0x90
[] _write_lock_bh+0x6/0x20
[] :ip_conntrack:destroy_conntrack+0x69/0x100
[] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160
[] netlink_dump+0x82/0x1c0
[] netlink_recvmsg+0x197/0x2a0
[] sock_recvmsg+0xde/0x100
[] task_rq_lock+0x4c/0x90
[] autoremove_wake_function+0x0/0x30
[] current_fs_time+0x3b/0x40
[] __wake_up+0x43/0x70
[] fget_light+0xae/0xe0
[] sys_recvfrom+0xfa/0x190
[] system_call+0x7e/0x83

The call trace differs every time, but ip_conntrack is always included.
Because of that, and because it has so far only occurred while I was
debugging conntrackd, I assume it's an ip_conntrack problem. More log
messages and the buggy version of conntrackd can be found here:
http://www-user.tu-chemnitz.de/~hmai/ip_conntrack/
If you need more information, please let me know.

so long
Maik

From cy at microembed.cn Mon Aug 14 12:40:17 2006
From: cy at microembed.cn (Richard Cai)
Date: Mon Aug 14 13:11:16 2006
Subject: port netfilter on linux 2.6 back to linux 2.4
In-Reply-To: <20060814032115.3C0FC3683C8@mail3.edong.com>
References: <20060814032115.3C0FC3683C8@mail3.edong.com>
Message-ID: <44E05311.2010701@microembed.cn>

hello everyone:
I'm developing an embedded firewall router. For some reasons I can only
use uClinux 2.4 now. But the 2.6 kernel is more advanced and there are
more resources about the linux 2.6 netfilter than 2.4.
My product will upgrade to 2.6 next year. Do you think I SHOULD port the linux 2.6 netfilter frame work back to linux 2.4 to have more resources and support? From kaber at trash.net Mon Aug 14 14:52:08 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 15:22:58 2006 Subject: port netfilter on linux 2.6 back to linux 2.4 In-Reply-To: <44E05311.2010701@microembed.cn> References: <20060814032115.3C0FC3683C8@mail3.edong.com> <44E05311.2010701@microembed.cn> Message-ID: <44E071F8.9090804@trash.net> Richard Cai wrote: > hello everyone: > I'm developing an embedded firewall router. For some reasons I > can only use uClinux 2.4 now. But the 2.6 kernel is more advanced and > there are more resources about the linux 2.6 netfilter than 2.4. My > product will upgrade to 2.6 next year. Do you think I SHOULD port the > linux 2.6 netfilter frame work back to linux 2.4 to have more resources > and support? I wouldn't advise that, the changes between 2.4 and 2.6 are huge and partially depend on changes in the remaining networking stack. Even backporting features between minor versions is not always easy .. From kaber at trash.net Mon Aug 14 14:54:05 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 15:24:53 2006 Subject: connlimit In-Reply-To: <002d01c6bf2e$351f1c70$650ba8c0@DORKA> References: <002d01c6bf2e$351f1c70$650ba8c0@DORKA> Message-ID: <44E0726D.7040504@trash.net> php0t wrote: > Dear developers, > > I've had a nice time trying to limit connections. The kernel is > 2.6.17.8. > Apart from the first couple of annoyances (such as the patch being > renamed from iplimit to connlimit, patch-o-matic not being able to apply > it to the current kernel etc), I've managed to patch manually, compile > it as a module and load it. 
> > However, when I try to add an according test rule, I get the 'Invalid > argument' error, and dmesg says: > ip_tables: connlimit match: invalid size 0 != 16 > > I also tried going to the site mentioned in the latest pom-ng's > source.list: > > # ipp2p, time, IPMARK and connlimit maintained by Krzysztof Oledzki > > http://people.netfilter.org/ole/pom/ > > But all I get is a smiley :) Just do what it says: "Please use "./runme --download" from a recent pom-ng." :) That will download the patches for you. From kaber at trash.net Mon Aug 14 14:50:36 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 15:51:05 2006 Subject: possible Bug in ip_conntrack In-Reply-To: <1155545885.44e03b1dcec01@www.domainfactory-webmail.de> References: <1155545885.44e03b1dcec01@www.domainfactory-webmail.de> Message-ID: <44E0719C.9050307@trash.net> Maik Hentsche wrote: > Hi! > While debugging a new version of conntrackd, I (nearly) every time get > athe following error "BUG: soft lockup detected on CPU#1!" which > completely hangs the system. Here is, what I do to reproduce the error: > I ue keepalived to manage a virtual IP, that is default gw for two other > machines, which communicate over this IP. I use a client-server-program > to count connections, which simply opens a tcp-socket and immediately > closes it. The gateway uses keepalived and conntrackd-0.8.3 for hot > standby. The HS-master (thta produces the error) has kernel 2.6.18-rc4 > running (with no other patches), the slave is a 2.6.17.3 (also > vanilla). I use valgrind --leak-check=yes for debugging. After some > time, usually between 1 and 5 minutes, the system hangs and writes > something like this out on the serial console. > > BUG: soft lockup detected on CPU#1! 
> Call Trace:
> [] softlockup_tick+0xf9/0x140
> [] update_process_times+0x57/0x90
> [] smp_local_timer_interrupt+0x23/0x50
> [] smp_apic_timer_interrupt+0x38/0x40
> [] apic_timer_interrupt+0x66/0x6c
> [] _write_lock_irqsave+0x6d/0x90
> [] _write_lock_irqsave+0x4a/0x90
> [] _write_lock_bh+0x6/0x20
> [] :ip_conntrack:destroy_conntrack+0x69/0x100
> [] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160

Can you test current -git please? We had some changes in that area ..
I don't recall any explicit bugfixes, but who knows ..

From kaber at trash.net Mon Aug 14 15:34:05 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 16:04:54 2006
Subject: priv_data patch
Message-ID: <44E07BCD.8030206@trash.net>

I'm afraid I have some bad news ..

While merging the priv_data patch I noticed an oversight. Currently,
when modifying the ruleset, all modules dump their entire state
(user configuration + internal state kept in the same structure)
to userspace, which will return it to the kernel. That means for
example that the limit match will not lose its current state when
modifying other rules. When we move the state out of the data shared
with userspace this can't be done anymore, so each modification to
the table will cause all modules to lose their current state, even
if they were not directly affected by the change. We can't break
this behaviour, so this limits potential users of the priv_data stuff
to things like hashlimit or recent, which do a lookup of state stored
completely external from the ruleset (and could use it to cache the
lookup result). I don't think that this is worth it, we probably need
to wait until we have a better userspace interface before we can do
something like this ..
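
The round trip described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the x_tables code: every name here (demo_limit_info, demo_priv, demo_rule, demo_replace and so on) is hypothetical, and the "dump to userspace and reload" step is reduced to a struct copy. It shows why state kept inside the shared blob survives a table replacement, while state held in a separate kernel-private area starts from scratch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical match data: configuration and runtime state share one
 * blob, which is what gets dumped to userspace and sent back. */
struct demo_limit_info {
    uint32_t rate;    /* user configuration */
    uint32_t credit;  /* internal state, updated while matching */
};

/* Hypothetical kernel-private area, never shown to userspace. */
struct demo_priv {
    uint32_t credit;
};

/* Stand-in for one rule as held by the kernel. */
struct demo_rule {
    struct demo_limit_info info;  /* shared with userspace */
    struct demo_priv *priv;       /* allocated at init, freed at destroy */
};

/* Stand-in for checkentry()/init(): always gets a fresh, zeroed priv. */
static int demo_rule_init(struct demo_rule *r,
                          const struct demo_limit_info *blob)
{
    r->info = *blob;
    r->priv = calloc(1, sizeof(*r->priv));
    return r->priv != NULL;
}

/* Stand-in for destroy(): the private area dies with the old table. */
static void demo_rule_destroy(struct demo_rule *r)
{
    free(r->priv);
    r->priv = NULL;
}

/* Table replacement: dump the shared blob, build a new rule from it,
 * then release the old rule. Returns 1 on success, 0 on failure. */
static int demo_replace(struct demo_rule *old, struct demo_rule *fresh)
{
    struct demo_limit_info dumped;

    assert(old != NULL && fresh != NULL);
    dumped = old->info;                  /* what userspace receives */
    if (!demo_rule_init(fresh, &dumped)) /* what userspace sends back */
        return 0;
    demo_rule_destroy(old);
    return 1;
}
```

In this model, a credit value stored in info is carried over by demo_replace(), while a credit value stored in priv is zero again afterwards, which is the loss of state the mail is concerned about.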
From gozem at gozem.se Mon Aug 14 16:17:51 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 16:48:50 2006 Subject: [PATCH] priv_data 0/2 In-Reply-To: <200606261641.47294.max@nucleus.it> References: <200606261641.47294.max@nucleus.it> Message-ID: <20060814141751.GR7194@kriss.csbnet.se> 2006-06-26 16:41:46+0200, Massimiliano Hofer -> > Hi, > this is a version of my priv_data patch that updates targets and renames > functions as suggested by Patrick. > > Since xt_init_match() and xt_init_target() (formerly xt_check_match() and > xt_check_target()) no longer just check and they needed some argument changes > anyway, I included some more common code previously replicated in > ip_tables.c, ip6_tables.c and arp_tables.c. > Similarly I introduced xt_destroy_match() and xt_destroy_taget(). > > The resulting patches are larger than I anticipated, but most of the space is > taken by function ranames and argument adjustments. > > My previous example with xt_condition still applies (just rename checkentry to > init in struct xt_match). > > I tested several combination with iptables, ip6tables and arptables. It can't > make it fail, but I didn't try it with a real world network load. Right now I > don't have a 64 bit machine available for testing (I should be able to use > one in a few days), so I didn't test compat at all. > > Testing and comments, as always, are appreciated. > > -- > Saluti, > Massimiliano Hofer I'm currently porting some of my modules to use this API with priv_data. However, I ran into some troubles. For example i'm writing a revision 1 for the quota module which allows the count to go negative, and in the same time I'm porting it to use this API (as it's using the same stupid thing as -m limit with q->master = q;). However, having the quota counter in the priv_data struct makes it impossible to report the counter to userspace when the user issues iptables -L. 
"Hidden", kernel data only is good with priv_data, but sometimes it will be too hidden. Do we have any good solution for this? -- Joakim Axelsson From kaber at trash.net Mon Aug 14 16:22:31 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 16:53:29 2006 Subject: [PATCH] priv_data 0/2 In-Reply-To: <20060814141751.GR7194@kriss.csbnet.se> References: <200606261641.47294.max@nucleus.it> <20060814141751.GR7194@kriss.csbnet.se> Message-ID: <44E08727.6070403@trash.net> Joakim Axelsson wrote: > > I'm currently porting some of my modules to use this API with priv_data. > However, I ran into some troubles. For example i'm writing a revision 1 for > the quota module which allows the count to go negative, and in the same time > I'm porting it to use this API (as it's using the same stupid thing as -m > limit with q->master = q;). However, having the quota counter in the > priv_data struct makes it impossible to report the counter to userspace when > the user issues iptables -L. That is actually a bug in the current module IMO (I looked into fixing this yesterday as well). iptables-save/restore won't work properly since they will save the _remaining_ quota, not the one the rule was created with. This might be useful as well, but it diverges from what other modules do. > "Hidden", kernel data only is good with priv_data, but sometimes it will be > too hidden. > > Do we have any good solution for this? Not really, see the mail I just wrote. From gozem at gozem.se Mon Aug 14 16:25:59 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 16:56:49 2006 Subject: priv_data patch In-Reply-To: <44E07BCD.8030206@trash.net> References: <44E07BCD.8030206@trash.net> Message-ID: <20060814142559.GS7194@kriss.csbnet.se> 2006-08-14 15:34:05+0200, Patrick McHardy -> > I'm afraid I have some bad news .. > > While merging the priv_data patch I noticed an oversight. 
Currently, > when modifying the ruleset, all modules dump their entire state > (user configuration + internal state kept in the same structure) > to userspace, which will return it to the kernel. That means for > example that the limit match will not loose its current state when > modifying other rules. When we move the state out of the data shared > with userspace this can't be done anymore, so each modification to > the table will cause all modules to loose their current state, even > if they we're not directly affected by the change. We can't break > this behaviour, so this limits potential users of the priv_data stuff > to things like hashlimit or recent, which do a lookup of state stored > completely external from the ruleset (and could use it to cache the > lookup result). I don't think that this is worth it, we probably need > to wait until we have a better userspace interface before we can do > something like this .. > I do not completly understand you. Today a modification of ONE rule will or will not trigger the checkentry()/init() of ALL rules? I know they did before (in 2.4) since modules i have written has code to workaround this. Having a low limiter like say a few packets each 5min can't just be reset each time we modify another unrelated rule. Latly howver it seams as it doesn't? What do you mean we are breaking with this patch? A match/target doesn't have to use this new data area. Just let don't alter them and they will continue to act aas they always done? We will however provide better tools for new modules (not yet in pom-ng). However, a problem that i just mailed about in another thread still exists. The problem with limit acting up can be solved with my new module "lim". Its so complety different that a new name is needed rather than a revision. Solving the problem. Im currently porting it now. Or did i missunderstand you completly? 
--
Joakim Axelsson

From kaber at trash.net Mon Aug 14 16:31:34 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 17:02:23 2006
Subject: priv_data patch
In-Reply-To: <20060814142559.GS7194@kriss.csbnet.se>
References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se>
Message-ID: <44E08946.1040105@trash.net>

Joakim Axelsson wrote:
> 2006-08-14 15:34:05+0200, Patrick McHardy ->
>
>>I'm afraid I have some bad news ..
>>
>>[...]
>
> I do not completely understand you. Today a modification of ONE rule will or
> will not trigger the checkentry()/init() of ALL rules?

Yes it will. Modification happens like this:

- dump entire table to userspace
- modify table
- send new table to kernel

_All_ matches and targets are reinstantiated, since the kernel doesn't
know which rule in the currently active table corresponds to which
in the new table. When moving state out of the data shared with
userspace it will get lost during this.

> I know they did before (in 2.4) since modules I have written have code to
> work around this. Having a low limiter like say a few packets each 5min can't
> just be reset each time we modify another unrelated rule.

Exactly.

> Lately however it seems as if it doesn't? What do you mean we are breaking with
> this patch? A match/target doesn't have to use this new data area. Just
> don't alter them and they will continue to act as they always have? We will
> however provide better tools for new modules (not yet in pom-ng).

Well, if nobody can use it reasonably there is no reason to introduce
it.

From max at nucleus.it Mon Aug 14 16:40:41 2006
From: max at nucleus.it (Massimiliano Hofer)
Date: Mon Aug 14 17:11:29 2006
Subject: priv_data patch
In-Reply-To: <44E07BCD.8030206@trash.net>
References: <44E07BCD.8030206@trash.net>
Message-ID: <200608141640.41759.max@nucleus.it>

On Monday 14 August 2006 3:34 pm, you wrote:

> While merging the priv_data patch I noticed an oversight.
Currently, > when modifying the ruleset, all modules dump their entire state > (user configuration + internal state kept in the same structure) > to userspace, which will return it to the kernel. That means for > example that the limit match will not loose its current state when > modifying other rules. When we move the state out of the data shared > with userspace this can't be done anymore, so each modification to > the table will cause all modules to loose their current state, even > if they we're not directly affected by the change. We can't break > this behaviour, so this limits potential users of the priv_data stuff > to things like hashlimit or recent, which do a lookup of state stored > completely external from the ruleset (and could use it to cache the > lookup result). I don't think that this is worth it, we probably need > to wait until we have a better userspace interface before we can do > something like this .. I'm afraid you are right. This limits the usefulness to volatile data that can safely be discarded. This can be some sort of disposable statistic or performance enhancing cache of data that can be retrieved in other ways. Without this patch we are condemned to ugly tricks for data needed by condition, hashlimit and recent. I think this is useful, but I'll leave it to you to decide if it's worth. Any idea for a better userspace interface? It's not the first time you tell me that we could have a better "next generation" userspace interface. Maybe it's time to start planning. Does anyone have wishes for new or different ways to do things? Just an example without proper thought or planning: we could set an optional way rules could use to tag themselves and have their data back if they want it. As with priv_data this won't benefit everyone. I'll keep thinking of better ways. 
IMHO a portion of data outside the one passed by userspace (persistent or volatile) is a must in the long run and will free us from an arbitrary constraint between userspace and kernelspace. I see other people are writing matches that rely on separate user input to complete its data (interface groups?) and they will need somewhere to store it. -- Saluti, Massimiliano Hofer Nucleus From kaber at trash.net Mon Aug 14 16:48:27 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 17:19:16 2006 Subject: priv_data patch In-Reply-To: <200608141640.41759.max@nucleus.it> References: <44E07BCD.8030206@trash.net> <200608141640.41759.max@nucleus.it> Message-ID: <44E08D3B.7040505@trash.net> Massimiliano Hofer wrote: > On Monday 14 August 2006 3:34 pm, you wrote: > >>[...] > > I'm afraid you are right. This limits the usefulness to volatile data that can > safely be discarded. This can be some sort of disposable statistic or > performance enhancing cache of data that can be retrieved in other ways. Agreed. > Without this patch we are condemned to ugly tricks for data needed by > condition, hashlimit and recent. I think this is useful, but I'll leave it to > you to decide if it's worth. Hmm .. recent does a table lookup during runtime and the table could be cached. That would improve things a bit, but in my opinion not enough to justify this patch. Same for hashlimit. What data would condition store exactly? > Any idea for a better userspace interface? > It's not the first time you tell me that we could have a better "next > generation" userspace interface. Maybe it's time to start planning. > Does anyone have wishes for new or different ways to do things? Its actually quite clear what is needed. We want a userspace interface built on netlink, that acts on individual rules, not entire rulesets. 
There are a few more ideas, like handling negation centrally, allowing userspace to specify whether a target is terminal or not, allow multiple non-terminal targets in a row, etc, but nothing really fundamental. > Just an example without proper thought or planning: we could set an optional > way rules could use to tag themselves and have their data back if they want > it. As with priv_data this won't benefit everyone. I'll keep thinking of > better ways. We want to get rid of the atomic table replacements entirely. > IMHO a portion of data outside the one passed by userspace (persistent or > volatile) is a must in the long run and will free us from an arbitrary > constraint between userspace and kernelspace. I see other people are writing > matches that rely on separate user input to complete its data (interface > groups?) and they will need somewhere to store it. Once we stop replacing entire rulesets and move to a finer grained level this problem will be gone, state will be kept for all rules except the ones affected. Using netlink attributes will also allow us to flexibly enhance the interface as needed. From gozem at gozem.se Mon Aug 14 16:58:01 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 17:28:49 2006 Subject: priv_data patch In-Reply-To: <44E08D3B.7040505@trash.net> References: <44E07BCD.8030206@trash.net> <200608141640.41759.max@nucleus.it> <44E08D3B.7040505@trash.net> Message-ID: <20060814145801.GT7194@kriss.csbnet.se> 2006-08-14 16:48:27+0200, Patrick McHardy -> > > Any idea for a better userspace interface? > > It's not the first time you tell me that we could have a better "next > > generation" userspace interface. Maybe it's time to start planning. > > Does anyone have wishes for new or different ways to do things? > > > Its actually quite clear what is needed. We want a userspace interface > built on netlink, that acts on individual rules, not entire rulesets. 
> There are a few more ideas, like handling negation centrally, allowing
> userspace to specify whether a target is terminal or not, allow multiple
> non-terminal targets in a row, etc, but nothing really fundamental.
>

I suggested this some years ago. A new module type "action" could be
used, along with "match" and "target". Meaning:
1. After zero, one or more matches
2. You run zero, one or more actions
3. And finally end up in zero or one target.

Example:
iptables -m condition -m limit -a LOG -j DROP

This means that the only targets we have today (as far as I can
remember now) are:
-j ACCEPT
-j DROP
-j REJECT
-j other_chain

As of now I have a few modules cheating this, being a match when they
should be a target: a match that always matches. This is to lower the
number of rules needed to perform some things.

--
Joakim Axelsson

From kaber at trash.net Mon Aug 14 17:05:07 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 17:35:56 2006
Subject: priv_data patch
In-Reply-To: <20060814145801.GT7194@kriss.csbnet.se>
References: <44E07BCD.8030206@trash.net> <200608141640.41759.max@nucleus.it> <44E08D3B.7040505@trash.net> <20060814145801.GT7194@kriss.csbnet.se>
Message-ID: <44E09123.9070508@trash.net>

Joakim Axelsson wrote:
> 2006-08-14 16:48:27+0200, Patrick McHardy ->
>
>>Its actually quite clear what is needed. We want a userspace interface
>>built on netlink, that acts on individual rules, not entire rulesets.
>>There are a few more ideas, like handling negation centrally, allowing
>>userspace to specify whether a target is terminal or not, allow multiple
>>non-terminal targets in a row, etc, but nothing really fundamental.
>>
>
>
> I have suggested this some years ago. But a new module type "action" could be
> used, along with "match" and "target". Meaning:
> 1. After zero, one or more matches
> 2. You run a zero, one or more actions
> 3. And finally end up in zero or one target.
> > Example: > iptables -m condition -m limit -a LOG -j DROP > > This means that the only targets we have today (as i can remember now) are: > -j ACCEPT > -j DROP > -j REJECT > -j other_chain Yes, we clearly want something like that. The exact details need to be worked out when actually implementing it. From kaber at trash.net Mon Aug 14 17:14:06 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 17:44:57 2006 Subject: priv_data patch In-Reply-To: <200608141702.50753.max@nucleus.it> References: <44E07BCD.8030206@trash.net> <200608141557.35918.max@nucleus.it> <44E08AC7.2050204@trash.net> <200608141702.50753.max@nucleus.it> Message-ID: <44E0933E.5060905@trash.net> Massimiliano Hofer wrote: > On Monday 14 August 2006 4:37 pm, you wrote: > >>Hmm .. recent does a table lookup during runtime and the table could be >>cached. That would improve things a bit, but in my opinion not enough >>to justify this patch. Same for hashlimit. What data would condition >>store exactly? > > > I need a pointer to per condition data, so that multiple rules with the same > name refer to the same flag. > I can break userspace compatibility and store a pointer in the userspace > structure. I just thought this could be useful to everyone (and let me > maintain userspace compatibility along the way). That looks like the only valid type of usage. Which means your initial implementation, which just provided space for a pointer to the individual instances, might have been the better way. I need to think about this some more and look at the modules that could make use of this again. >>Its actually quite clear what is needed. We want a userspace interface >>built on netlink, that acts on individual rules, not entire rulesets. >>There are a few more ideas, like handling negation centrally, allowing >>userspace to specify whether a target is terminal or not, allow multiple >>non-terminal targets in a row, etc, but nothing really fundamental. 
> > > I thought the current way of doing things was specifically designed to > minimize softirq locking (especially with arbitarily long chains and > arbitrary initialization code). We could switch to RCU lists, though... Yes, it should be possible to do lockless ruleset evaluation (at least on the ruleset level, some modules will still need locking). From gozem at gozem.se Mon Aug 14 17:20:26 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 17:51:15 2006 Subject: priv_data patch In-Reply-To: <44E08946.1040105@trash.net> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> Message-ID: <20060814152026.GU7194@kriss.csbnet.se> 2006-08-14 16:31:34+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 15:34:05+0200, Patrick McHardy -> > > > >>I'm afraid I have some bad news .. > >> > >>[...] > > > > I do not completly understand you. Today a modification of ONE rule will or > > will not trigger the checkentry()/init() of ALL rules? > > Yes it will. Modification happens like this: > > - dump entire table to userspace > - modify table > - send new table to kernel > > _All_ matches and target and reinstantiated, since the kernel doesn't > know which rule in the currently active table corresponds to which > in the new table. When moving state out of the data shared with > userspace it will get lost during this. > Lost? Like the memory will be reallocated and we have a memory leak from the old priv_data? Can't we just figure out if thie pointer is null and don't allocate new memory? Or am i lost here? > > I know they did before (in 2.4) since modules i have written has code to > > workaround this. Having a low limiter like say a few packets each 5min can't > > just be reset each time we modify another unrelated rule. > > Exactly. > > > Latly howver it seams as it doesn't? What do you mean we are breaking with > > this patch? A match/target doesn't have to use this new data area. 
> > Just don't alter them and they will continue to act as they always have? We will
> > however provide better tools for new modules (not yet in pom-ng).
>
> Well, if nobody can use it reasonably there is no reason to introduce
> it.

A lot of my patches can use it, without having to resort to an ugly
solution that tries to sneak away from being reset when another rule is
altered. I sure would like to have it added. Simply do not change, for
example, -m limit into using it if it breaks the "feature" of resetting
its state when altering another unrelated rule.

Please have a look here for 4 modules "needing" this patch:
http://www.gozem.se/~gozem/netfilter/

I'm copying here the code they are using today to work around this
reset-"feature":

struct info {
    ... data here ...
    atomic_t refcount;
};

init() {
    /* Already initiated? Since this is run each time ANY rule is changed */
    if (lim->state != NULL) {
        /* Increase the reference counter so we won't delete this match */
        atomic_inc( &lim->state->refcount );
        DEBUGPRINT("already initiated, abort ref=%u", atomic_read( &lim->state->refcount) );
        return 1;
    }

    /* init state data, set refcount to 1 */
    lim->state = kmalloc( sizeof(struct ipt_lim_state), GFP_ATOMIC );
    if (lim->state == NULL)
        return -ENOMEM;
    atomic_set(&lim->state->refcount, 1);
}

destroy() {
    /* Decrease our reference counter and test if it's zero */
    if ( atomic_dec_and_test(&lim->state->refcount) ) {
        /* Really delete this match */
        DEBUGPRINTP("really delete");
        /* free state */
        kfree(lim->state);
    }
}

--
Joakim Axelsson

From kaber at trash.net Mon Aug 14 17:28:19 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 18:01:13 2006
Subject: priv_data patch
In-Reply-To: <20060814152026.GU7194@kriss.csbnet.se>
References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se>
Message-ID: <44E09693.9000606@trash.net>

Joakim Axelsson wrote:
> 2006-08-14 16:31:34+0200, Patrick McHardy ->
>
>>>I
do not completly understand you. Today a modification of ONE rule will or >>>will not trigger the checkentry()/init() of ALL rules? >> >>Yes it will. Modification happens like this: >> >>- dump entire table to userspace >>- modify table >>- send new table to kernel >> >>_All_ matches and target and reinstantiated, since the kernel doesn't >>know which rule in the currently active table corresponds to which >>in the new table. When moving state out of the data shared with >>userspace it will get lost during this. >> > > Lost? Like the memory will be reallocated and we have a memory leak from the > old priv_data? No, the contents will be lost since the allocated memory belonging to the old table will get freed and new memory is allocated for the new table. > Can't we just figure out if thie pointer is null and don't allocate new > memory? > > Or am i lost here? It won't be non-NULL since we're always initializing a new table from the kernels POV. >>Well, if nobody can use it reasonable there is no reason to introduce >>it. > > > Alot of my patches can use it. Not having todo an ugly solution trying to > sneak away from being reseted when another rule is altered. I sure would > like to have it added. Simpyl do not change for example -m limit into using > it if it breaks the "feature" of reseting its state then altering another > unrelated rule. > > Please have a look here for 4 modules "needing" this patch: > http://www.gozem.se/~gozem/netfilter/ Please post your examples to the list. > I'm copying here the code they are using today to workaround this > reset-"feature": > > struct info { > ... data here ... > atomic_t refcount; > }; > > init() { > /* Already initiated? Since this is runned each time ANY rule is changed */ > if (lim->state != NULL) { I'm not sure I understand what you're trying to show here, but I assume its some kind of shared state between multiple instances of your "lim" match. 
the first question would be: where does the state pointer get its value from here? You can't rely on userspace passing back a valid pointer, this is questionable today (CAP_NET_ADMIN might crash the box), but its a huge bug once you consider things like OpenVZ. From kaber at trash.net Mon Aug 14 17:31:18 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 18:04:11 2006 Subject: priv_data patch In-Reply-To: <20060814152026.GU7194@kriss.csbnet.se> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> Message-ID: <44E09746.60302@trash.net> Joakim Axelsson wrote: > Alot of my patches can use it. Not having todo an ugly solution trying to > sneak away from being reseted when another rule is altered. I sure would > like to have it added. Simpyl do not change for example -m limit into using > it if it breaks the "feature" of reseting its state then altering another > unrelated rule. I forgot to reply to this. You seem to misunderstand, limit doesn't reset its state today. It will when moving private data out of the structures shared with userspace. Same for all other users of this, they will "forget" their state on each ruleset change. From gozem at gozem.se Mon Aug 14 17:35:21 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 18:06:09 2006 Subject: [PATCH] priv_data 0/2 In-Reply-To: <44E08727.6070403@trash.net> References: <200606261641.47294.max@nucleus.it> <20060814141751.GR7194@kriss.csbnet.se> <44E08727.6070403@trash.net> Message-ID: <20060814153521.GV7194@kriss.csbnet.se> 2006-08-14 16:22:31+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > > > I'm currently porting some of my modules to use this API with priv_data. > > However, I ran into some troubles. 
For example i'm writing a revision 1 for > > the quota module which allows the count to go negative, and in the same time > > I'm porting it to use this API (as it's using the same stupid thing as -m > > limit with q->master = q;). However, having the quota counter in the > > priv_data struct makes it impossible to report the counter to userspace when > > the user issues iptables -L. > > That is actually a bug in the current module IMO (I looked into fixing > this yesterday as well). iptables-save/restore won't work properly since > they will save the _remaining_ quota, not the one the rule was created > with. This might be useful as well, but it diverges from what other > modules do. > Well, the priv_data patch will then solve this problem. As the userspace struct will always have the initial quota unaltered. The priv_data quota will decrease. However i guess you want to be able to see both. And in my case, saving the remaining quota with iptables-save is what i want. (I case of the router for some reason craches, the last state must be saved and as accurate as possible.) Solution for this might be using two parameters for the userspace struct. One for initial quota and one for remaining, saving both. Possible even an option saying with figure should be set when iptables-restore is used. Still, we can't nicly access the remaining quota with priv_data unless the modules each time it "matches" writes in both priv_data and userspace info, which is kinda ugly. There is also the way of using a /proc-entry to list the current remaining quota. Also restoring it by echoing a new figure to the entry. Keep the iptables-save/restore only with the initial quota figure. I can write this if this is what we want? 
-- Joakim Axelsson From gozem at gozem.se Mon Aug 14 17:40:05 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 18:10:54 2006 Subject: priv_data patch In-Reply-To: <44E09746.60302@trash.net> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> Message-ID: <20060814154005.GW7194@kriss.csbnet.se> 2006-08-14 17:31:18+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > Alot of my patches can use it. Not having todo an ugly solution trying to > > sneak away from being reseted when another rule is altered. I sure would > > like to have it added. Simpyl do not change for example -m limit into using > > it if it breaks the "feature" of reseting its state then altering another > > unrelated rule. > > I forgot to reply to this. You seem to misunderstand, limit doesn't > reset its state today. It will when moving private data out of the > structures shared with userspace. Same for all other users of this, > they will "forget" their state on each ruleset change. Okie, now I get it. This seams to have changed from 2.4 then. As altering one unrelated rule will trigger the checkentry for _all_ rules. The code i posted was a (somewhat ugly) workaround for this, and yes relying on userspace not altering a kernel-space pointer for us. However, the case is the same for xt_limit with r->master = r; (and quota). Alter master in userspace after the limit rule has been initiated and you will get some really nasty result. 
-- Joakim Axelsson From kaber at trash.net Mon Aug 14 17:43:29 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 18:16:25 2006 Subject: [PATCH] priv_data 0/2 In-Reply-To: <20060814153521.GV7194@kriss.csbnet.se> References: <200606261641.47294.max@nucleus.it> <20060814141751.GR7194@kriss.csbnet.se> <44E08727.6070403@trash.net> <20060814153521.GV7194@kriss.csbnet.se> Message-ID: <44E09A21.6000503@trash.net> Joakim Axelsson wrote: > 2006-08-14 16:22:31+0200, Patrick McHardy -> > >>Joakim Axelsson wrote: >> >>>I'm currently porting some of my modules to use this API with priv_data. >>>However, I ran into some troubles. For example i'm writing a revision 1 for >>>the quota module which allows the count to go negative, and in the same time >>>I'm porting it to use this API (as it's using the same stupid thing as -m >>>limit with q->master = q;). However, having the quota counter in the >>>priv_data struct makes it impossible to report the counter to userspace when >>>the user issues iptables -L. >> >>That is actually a bug in the current module IMO (I looked into fixing >>this yesterday as well). iptables-save/restore won't work properly since >>they will save the _remaining_ quota, not the one the rule was created >>with. This might be useful as well, but it diverges from what other >>modules do. > > Well, the priv_data patch will then solve this problem. As the userspace > struct will always have the initial quota unaltered. The priv_data quota will > decrease. That was the initial idea, forgetting about the atomic table exchange. > However i guess you want to be able to see both. Not really "want", its necessary, otherwise the quota will be refilled on each ruleset update. > And in my case, > saving the remaining quota with iptables-save is what i want. (I case of the > router for some reason craches, the last state must be saved and as accurate > as possible.) 
I can see that this would also be useful, however none of the other stateful matches does this, so the more important point IMO is to get quota to do the expected thing. If we can do both, also fine, but it would have to be optional and I can't see how to cleanly do this. > Solution for this might be using two parameters for the userspace struct. > One for initial quota and one for remaining, saving both. Possible even an > option saying with figure should be set when iptables-restore is used. Yes. Without breaking compatibility, we could make quota fixed, use the master pointer to store the current quota and put the new master pointer somewhere outside of the struct. But thats really not very pretty. > Still, we can't nicly access the remaining quota with priv_data unless the > modules each time it "matches" writes in both priv_data and userspace info, > which is kinda ugly. As I said, I don't see saving current state as necessary (we don't do it anywhere and it solves an entirely different problem). The unfortunate fact though is that we need to pass it to userspace anyway, because of the limitations of the userspace interface. > There is also the way of using a /proc-entry to list the current remaining > quota. Also restoring it by echoing a new figure to the entry. Keep the > iptables-save/restore only with the initial quota figure. > > I can write this if this is what we want? Definitely not :) We want a better userspace interface that doesn't require us to put in hacks like this. 
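
The refill problem mentioned in this message ("otherwise the quota will be refilled on each ruleset update") can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions: `quota_blob`, `quota_rule`, `quota_init()`, `quota_account()` and `table_replace()` are hypothetical names, not the real xt_quota layout or any kernel API. The blob stands for the data dumped to userspace and echoed back; the private counter stands for state moved out of that blob.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rule data userspace can see; this is
 * not the real xt_quota layout. */
struct quota_blob {
    uint64_t initial;    /* quota the rule was created with, never modified */
    uint64_t remaining;  /* decremented in place, so it is dumped and echoed back */
};

struct quota_rule {
    struct quota_blob blob;
    uint64_t *priv_remaining;  /* kernel-private counter, invisible to userspace */
};

/* checkentry()-like hook: runs for every rule of every newly loaded table. */
int quota_init(struct quota_rule *r)
{
    r->priv_remaining = malloc(sizeof(*r->priv_remaining));
    if (r->priv_remaining == NULL)
        return -1;
    /* The private counter has nothing to be restored from except the blob. */
    *r->priv_remaining = r->blob.initial;
    return 0;
}

void quota_destroy(struct quota_rule *r)
{
    free(r->priv_remaining);
    r->priv_remaining = NULL;
}

/* Charge one packet of len bytes against both counters, clamping at zero. */
void quota_account(struct quota_rule *r, uint64_t len)
{
    r->blob.remaining -= (len < r->blob.remaining) ? len : r->blob.remaining;
    *r->priv_remaining -= (len < *r->priv_remaining) ? len : *r->priv_remaining;
}

/* Dump -> modify -> load: only the blob makes the round trip through
 * userspace; the new rule gets freshly initialised private data. */
int table_replace(const struct quota_rule *old_rule, struct quota_rule *new_rule)
{
    new_rule->blob = old_rule->blob;
    return quota_init(new_rule);
}
```

After charging 400 bytes against a 1000-byte quota and replacing the table, the counter kept in the blob still reads 600, while the private counter is back at 1000. That is the trade-off under discussion: the blob counter keeps its state but makes iptables-save record the remaining quota, the private counter keeps the initial value visible but forgets its state.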
From kaber at trash.net Mon Aug 14 17:46:59 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 18:19:52 2006 Subject: priv_data patch In-Reply-To: <20060814154005.GW7194@kriss.csbnet.se> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> Message-ID: <44E09AF3.2080406@trash.net> Joakim Axelsson wrote: > 2006-08-14 17:31:18+0200, Patrick McHardy -> > >>Joakim Axelsson wrote: >> >>>Alot of my patches can use it. Not having todo an ugly solution trying to >>>sneak away from being reseted when another rule is altered. I sure would >>>like to have it added. Simpyl do not change for example -m limit into using >>>it if it breaks the "feature" of reseting its state then altering another >>>unrelated rule. >> >>I forgot to reply to this. You seem to misunderstand, limit doesn't >>reset its state today. It will when moving private data out of the >>structures shared with userspace. Same for all other users of this, >>they will "forget" their state on each ruleset change. > > > Okie, now I get it. This seams to have changed from 2.4 then. No, this behaviour has been there since the beginning. > As altering > one unrelated rule will trigger the checkentry for _all_ rules. The code i > posted was a (somewhat ugly) workaround for this, and yes relying on > userspace not altering a kernel-space pointer for us. However, the case is > the same for xt_limit with r->master = r; (and quota). Alter master in > userspace after the limit rule has been initiated and you will get some > really nasty result. Thats not true, the master pointer is reinitialized on every change by the checkentry function (which, as you note, is called on all rules for every change). The simple reason why it keeps its current state is because it is dumped to userspace and echoed back. 
If you move it out of the structure shared with userspace, this can not happen anymore. From max at nucleus.it Mon Aug 14 17:53:52 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Mon Aug 14 18:24:41 2006 Subject: priv_data patch In-Reply-To: <20060814152026.GU7194@kriss.csbnet.se> References: <44E07BCD.8030206@trash.net> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> Message-ID: <200608141753.53072.max@nucleus.it> On Monday 14 August 2006 5:20 pm, Joakim Axelsson wrote: > Lost? Like the memory will be reallocated and we have a memory leak from > the old priv_data? No. Every time a modification is done a complete new chain is created and initialized, swapped for the current one and then the old one is removed. This means that any memory needed by priv_data is reallocated for the new chain end than freed while destroying the old one. My patch doesn't care since I keep a global list where I can fetch my data back using the condition name and use priv_data only as a way to keep it at hand and achieve O(1) performance. If you don't have an alternative means to retrieve your data I'm afraid you'r bust. :( I agree with Patrick and I think it's time to think about other ways to manage changes. -- Saluti, Massimiliano Hofer Nucleus From gozem at gozem.se Mon Aug 14 17:56:42 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 18:27:35 2006 Subject: priv_data patch In-Reply-To: <44E09AF3.2080406@trash.net> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> Message-ID: <20060814155642.GA15328@kriss.csbnet.se> 2006-08-14 17:46:59+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 17:31:18+0200, Patrick McHardy -> > > > >>Joakim Axelsson wrote: > >> > >>>Alot of my patches can use it. 
Not having todo an ugly solution trying to > >>>sneak away from being reseted when another rule is altered. I sure would > >>>like to have it added. Simpyl do not change for example -m limit into using > >>>it if it breaks the "feature" of reseting its state then altering another > >>>unrelated rule. > >> > >>I forgot to reply to this. You seem to misunderstand, limit doesn't > >>reset its state today. It will when moving private data out of the > >>structures shared with userspace. Same for all other users of this, > >>they will "forget" their state on each ruleset change. > > > > > > Okie, now I get it. This seams to have changed from 2.4 then. > > No, this behaviour has been there since the beginning. > > > As altering > > one unrelated rule will trigger the checkentry for _all_ rules. The code i > > posted was a (somewhat ugly) workaround for this, and yes relying on > > userspace not altering a kernel-space pointer for us. However, the case is > > the same for xt_limit with r->master = r; (and quota). Alter master in > > userspace after the limit rule has been initiated and you will get some > > really nasty result. > > > Thats not true, the master pointer is reinitialized on every change by > the checkentry function (which, as you note, is called on all rules for > every change). The simple reason why it keeps its current state is > because it is dumped to userspace and echoed back. If you move it out of > the structure shared with userspace, this can not happen anymore. I think we define reinitiated differently then. I define reinitated by checkentry() being runned even tho the match/rule isn't a new (or altered) match/rule. If checkentry() is runned for _all_ matches/targers everytime an unrelated rule is altered then limit will lose its state because the code resets the bucket and refills it. Meaning that if I have a limit of 1 packet each hour and I change an (unrelated) rule every 30mins the limit will be a limit of 1 packet every 30min instead. 
The (ugly) workaround i posted will solve this issue by keeping track if it has been running the checkentry() before. This is where the priv_data is needed. But if you say that the priv_data pointer will be lost for all rules on any alter of rules its kinda void to have it. However a solution keeping that priv_data pointer intact could be very useful. One reason for not posting my patches is this uglyness where i have to rely on kernel pointers not being altered by userspace. -- Joakim Axelsson From kaber at trash.net Mon Aug 14 18:01:19 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 18:34:12 2006 Subject: priv_data patch In-Reply-To: <20060814155642.GA15328@kriss.csbnet.se> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> Message-ID: <44E09E4F.3040506@trash.net> Joakim Axelsson wrote: > 2006-08-14 17:46:59+0200, Patrick McHardy -> > >>Thats not true, the master pointer is reinitialized on every change by >>the checkentry function (which, as you note, is called on all rules for >>every change). The simple reason why it keeps its current state is >>because it is dumped to userspace and echoed back. If you move it out of >>the structure shared with userspace, this can not happen anymore. > > > I think we define reinitiated differently then. I define reinitated by > checkentry() being runned even tho the match/rule isn't a new (or altered) > match/rule. > > If checkentry() is runned for _all_ matches/targers everytime an unrelated > rule is altered then limit will lose its state because the code resets the > bucket and refills it. You're right. That is actually a bug in my opinion. 
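
The refill behaviour acknowledged here, and one possible guard (skipping the refill when the echoed-back credit is already non-zero), can be sketched in userspace C. This is an assumption-laden illustration: `limit_info` and the three functions are hypothetical and much simpler than the real limit match, which also tracks time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified token bucket; not the real limit match layout. */
struct limit_info {
    uint32_t burst_credit;  /* configured bucket size */
    uint32_t credit;        /* current fill level, echoed back by userspace */
};

/* Behaviour described in the thread: every checkentry() refills the
 * bucket, even for rules that were not touched by the change. */
void limit_checkentry_refill(struct limit_info *r)
{
    r->credit = r->burst_credit;
}

/* Guarded variant: a non-zero credit is taken as "this rule already has
 * state", so only a fresh (or fully drained) bucket is filled. */
void limit_checkentry_guarded(struct limit_info *r)
{
    if (r->credit == 0)
        r->credit = r->burst_credit;
}

/* Worker: let the packet through if enough credit is left. */
int limit_match(struct limit_info *r, uint32_t cost)
{
    if (r->credit < cost)
        return 0;
    r->credit -= cost;
    return 1;
}
```

One limitation of this guard is visible in the code: a bucket that has been drained to exactly zero cannot be told apart from a newly created rule, so it would still be refilled on the next ruleset change.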
> Meaning that if I have a limit of 1 packet each hour > and I change an (unrelated) rule every 30mins the limit will be a limit of 1 > packet every 30min instead. The (ugly) workaround i posted will solve this > issue by keeping track if it has been running the checkentry() before. This > is where the priv_data is needed. No, we can simply check if f.e. credit is non-zero. > But if you say that the priv_data pointer will be lost for all rules on any > alter of rules its kinda void to have it. However a solution keeping that > priv_data pointer intact could be very useful. That would mean the kernel needs to associate new rules with rules in the old table (or the individual modules itself need to do it somehow, like hashlimit or recent). This is really getting too ugly to even consider in my opinion, especially since the obvious solution of not doing an atomic table exchange also solves a lot of other problems. From gozem at gozem.se Mon Aug 14 18:04:43 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 18:35:32 2006 Subject: priv_data patch In-Reply-To: <44E09693.9000606@trash.net> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09693.9000606@trash.net> Message-ID: <20060814160443.GX7194@kriss.csbnet.se> 2006-08-14 17:28:19+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 16:31:34+0200, Patrick McHardy -> > > > > > >>>I do not completly understand you. Today a modification of ONE rule will or > >>>will not trigger the checkentry()/init() of ALL rules? > >> > >>Yes it will. Modification happens like this: > >> > >>- dump entire table to userspace > >>- modify table > >>- send new table to kernel > >> > >>_All_ matches and target and reinstantiated, since the kernel doesn't > >>know which rule in the currently active table corresponds to which > >>in the new table. 
When moving state out of the data shared with > >>userspace it will get lost during this. > >> > > > > Lost? Like the memory will be reallocated and we have a memory leak from the > > old priv_data? > > No, the contents will be lost since the allocated memory belonging to > the old table will get freed and new memory is allocated for the new > table. > > > Can't we just figure out if thie pointer is null and don't allocate new > > memory? > > > > Or am i lost here? > > It won't be non-NULL since we're always initializing a new table from > the kernels POV. > This was a hard one. Seams then the only solution for this is to keep some global state in a list and for every checkentry find your entry in it and cache it in info (or priv_data if we apply the patch). > >>Well, if nobody can use it reasonable there is no reason to introduce > >>it. > > > > > > Alot of my patches can use it. Not having todo an ugly solution trying to > > sneak away from being reseted when another rule is altered. I sure would > > like to have it added. Simpyl do not change for example -m limit into using > > it if it breaks the "feature" of reseting its state then altering another > > unrelated rule. > > > > Please have a look here for 4 modules "needing" this patch: > > http://www.gozem.se/~gozem/netfilter/ > > Please post your examples to the list. > In which format? As a full patch for the kernel or something that fits in pom-ng? The current code works for 2.4 and early 2.6. I don't want to spend time porting it into "wrong" API now when we are discussing priv_data to be or not to be :-) > > I'm copying here the code they are using today to workaround this > > reset-"feature": > > > > struct info { > > ... data here ... > > atomic_t refcount; > > }; > > > > init() { > > /* Already initiated? 
Since this is runned each time ANY rule is changed */ > > if (lim->state != NULL) { > > > I'm not sure I understand what you're trying to show here, > but I assume its some kind of shared state between multiple > instances of your "lim" match. the first question would be: > where does the state pointer get its value from here? > You can't rely on userspace passing back a valid pointer, > this is questionable today (CAP_NET_ADMIN might crash the > box), but its a huge bug once you consider things like > OpenVZ. > Yes, its ugly but has been working in our routers for atleast a year now. I will however port it to use a global list state instead as some of the modules already does. From gozem at gozem.se Mon Aug 14 18:13:37 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 18:44:26 2006 Subject: priv_data patch In-Reply-To: <44E09E4F.3040506@trash.net> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> Message-ID: <20060814161337.GY7194@kriss.csbnet.se> 2006-08-14 18:01:19+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 17:46:59+0200, Patrick McHardy -> > > > >>Thats not true, the master pointer is reinitialized on every change by > >>the checkentry function (which, as you note, is called on all rules for > >>every change). The simple reason why it keeps its current state is > >>because it is dumped to userspace and echoed back. If you move it out of > >>the structure shared with userspace, this can not happen anymore. > > > > > > I think we define reinitiated differently then. I define reinitated by > > checkentry() being runned even tho the match/rule isn't a new (or altered) > > match/rule. 
> > > > If checkentry() is runned for _all_ matches/targers everytime an unrelated > > rule is altered then limit will lose its state because the code resets the > > bucket and refills it. > > > You're right. That is actually a bug in my opinion. > > > Meaning that if I have a limit of 1 packet each hour > > and I change an (unrelated) rule every 30mins the limit will be a limit of 1 > > packet every 30min instead. The (ugly) workaround i posted will solve this > > issue by keeping track if it has been running the checkentry() before. This > > is where the priv_data is needed. > > > No, we can simply check if f.e. credit is non-zero. > It needs to be coded then :-). It's not the case today. > > But if you say that the priv_data pointer will be lost for all rules on any > > alter of rules its kinda void to have it. However a solution keeping that > > priv_data pointer intact could be very useful. > > > That would mean the kernel needs to associate new rules with rules in > the old table (or the individual modules itself need to do it somehow, > like hashlimit or recent). This is really getting too ugly to even > consider in my opinion, especially since the obvious solution of not > doing an atomic table exchange also solves a lot of other problems. I agree. A better solution is needed or no solution at all. However building a new API that actually only will change the parts it needs to will need a huge amount of work and will most probably break existing userspace. It will be very hard to keep backwards compability. As i see it some new hooks are needed on every module to retrieve the kernel data. This means we need 5 things: 1. start a new rule/match/target (todays checkentry()) 2. remove an old rule/match/target (todays destroy()) 3. the worker (todays match() and target()) 4. A new 'list' that will feed info needed by iptables -L and iptables-save to userspace. 5. Possible a new 'alter' that will alter info in the rules/match/targets private kernel data. 
Point 4 and 5 can be viewed as read and write of the rule. Point 1 and 2 as
create and destroy.

A unique ID will be needed for each rule so userspace can define which rule it
wishes to alter/remove.

Fairly simple interface to build with netlink. However the real challenge is
to try to keep backward compatibility.

--
Joakim Axelsson

From kaber at trash.net Mon Aug 14 18:13:38 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 18:46:30 2006
Subject: priv_data patch
In-Reply-To: <20060814160443.GX7194@kriss.csbnet.se>
References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09693.9000606@trash.net> <20060814160443.GX7194@kriss.csbnet.se>
Message-ID: <44E0A132.6040709@trash.net>

Joakim Axelsson wrote:
> 2006-08-14 17:28:19+0200, Patrick McHardy ->
>
>>>
>>>Please have a look here for 4 modules "needing" this patch:
>>>http://www.gozem.se/~gozem/netfilter/
>>
>>Please post your examples to the list.
>>
>
>
> In which format? As a full patch for the kernel or something that fits in
> pom-ng? The current code works for 2.4 and early 2.6. I don't want to spend
> time porting it into "wrong" API now when we are discussing priv_data to be
> or not to be :-)

I'm mostly interested in the way you wish to use it, so I can better
judge whether this change is worth doing or not.

> Yes, its ugly but has been working in our routers for atleast a year now. I
> will however port it to use a global list state instead as some of the
> modules already does.

I don't say it's not working, but we can't put something like this in
the kernel. So you need some identifier to find your state after
changes. Besides the ugliness, we don't have anywhere to put it in
the blob so userspace can return it, so it would end up being
something local to the individual modules again (like table name
in recent and hashlimit), just what we're doing today.
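
The "identifier to find your state after changes" approach (a name local to the module, as with the table name in recent and hashlimit, or the global list keyed by condition name described elsewhere in the thread) can be sketched as a refcounted, name-keyed registry. This is a minimal userspace sketch, not code from any of those modules: `shared_state`, `state_get()`, `state_put()` and `STATE_NAME_LEN` are hypothetical, and a kernel version would additionally need locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STATE_NAME_LEN 16

/* Hypothetical per-name state shared by all rules using the same name. */
struct shared_state {
    char name[STATE_NAME_LEN];
    unsigned int refcount;
    unsigned long counter;      /* whatever the module wants to remember */
    struct shared_state *next;
};

struct shared_state *state_list;  /* global list of live states */

/* checkentry()-like lookup: find the state by name or create it. */
struct shared_state *state_get(const char *name)
{
    struct shared_state *s;
    size_t len = strlen(name);

    if (len >= STATE_NAME_LEN)  /* refuse names that would not fit */
        return NULL;
    for (s = state_list; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) {
            s->refcount++;
            return s;
        }
    }
    s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    memcpy(s->name, name, len + 1);
    s->refcount = 1;
    s->next = state_list;
    state_list = s;
    return s;
}

/* destroy()-like release: free the state when the last user goes away. */
void state_put(struct shared_state *s)
{
    struct shared_state **p;

    assert(s->refcount > 0);
    if (--s->refcount > 0)
        return;
    for (p = &state_list; *p != NULL; p = &(*p)->next) {
        if (*p == s) {
            *p = s->next;
            break;
        }
    }
    free(s);
}
```

Because the new table is initialised before the old one is destroyed, the lookup for the new rule takes a second reference before the old rule drops its own, so the state survives the exchange. A cached pointer (in the match info, or in priv_data if such a field existed) would only save the lookup at packet time.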
From max at nucleus.it Mon Aug 14 18:19:36 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Mon Aug 14 18:50:25 2006 Subject: priv_data patch In-Reply-To: <44E08D3B.7040505@trash.net> References: <44E07BCD.8030206@trash.net> <200608141640.41759.max@nucleus.it> <44E08D3B.7040505@trash.net> Message-ID: <200608141819.38152.max@nucleus.it> On Monday 14 August 2006 4:48 pm, Patrick McHardy wrote: > Hmm .. recent does a table lookup during runtime and the table could be > cached. That would improve things a bit, but in my opinion not enough > to justify this patch. Same for hashlimit. What data would condition > store exactly? I need a pointer to per condition data, so that multiple rules with the same name refer to the same flag. I can break userspace compatibility and store a pointer in the userspace structure. I just thought this could be useful to everyone (and let me maintain userspace compatibility along the way). > Its actually quite clear what is needed. We want a userspace interface > built on netlink, that acts on individual rules, not entire rulesets. > There are a few more ideas, like handling negation centrally, allowing > userspace to specify whether a target is terminal or not, allow multiple > non-terminal targets in a row, etc, but nothing really fundamental. I thought the current way of doing things was specifically designed to minimize softirq locking (especially with arbitarily long chains and arbitrary initialization code). We could switch to RCU lists, though... 
-- Saluti, Massimiliano Hofer Nucleus From gozem at gozem.se Mon Aug 14 18:24:51 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 18:55:41 2006 Subject: xt_quota (Was: [PATCH] priv_data 0/2) In-Reply-To: <44E09A21.6000503@trash.net> References: <200606261641.47294.max@nucleus.it> <20060814141751.GR7194@kriss.csbnet.se> <44E08727.6070403@trash.net> <20060814153521.GV7194@kriss.csbnet.se> <44E09A21.6000503@trash.net> Message-ID: <20060814162451.GZ7194@kriss.csbnet.se> 2006-08-14 17:43:29+0200, Patrick McHardy -> > > As I said, I don't see saving current state as necessary (we don't do > it anywhere and it solves an entirely different problem). The > unfortunate fact though is that we need to pass it to userspace anyway, > because of the limitations of the userspace interface. > That might be correct. However the other modules doesn't normally suffer that much as they will farily quickly work thier way to the same state again. I guess it's very uncommon that you are using so low limits like a few packets each hour. > > There is also the way of using a /proc-entry to list the current remaining > > quota. Also restoring it by echoing a new figure to the entry. Keep the > > iptables-save/restore only with the initial quota figure. > > > > I can write this if this is what we want? > > > Definitely not :) We want a better userspace interface that doesn't > require us to put in hacks like this. I however really needs some way of figuring out how much of the quota that remains. This is to be able to report this to our users (that receives a certain number of gigabytes each day). So they can see how much they have left (using som scripted interface to iptables). Also saving this holy figure (as it has become :-)) for the user if the router for some reason craches. This is also the reason i need negative quota figures. The users are allowed to "borrow" from their future quota. Doing so only under a byte limiting match (-m lim --limit-bytes 20k/s). 
In my opinion it's more important to save the remaining quota, rather than the
original. And most important is to be able, in some way, to see how much is
left of the quota.

Perhaps this will satisfy both of us:

Something put out by iptables-save:
iptables -m quota --init-quota 1000 --remain-quota 123 --use-quota remain

Something you write with iptables to create a new rule:
iptables -m quota --init-quota 1000 (using --use-quota init implicitly)
iptables -m quota --init-quota 1000 --use-quota remain

But this sure is ugly.

--
Joakim Axelsson

From kaber at trash.net Mon Aug 14 18:26:21 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 18:59:14 2006
Subject: priv_data patch
In-Reply-To: <20060814161337.GY7194@kriss.csbnet.se>
References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> <20060814161337.GY7194@kriss.csbnet.se>
Message-ID: <44E0A42D.604@trash.net>

Joakim Axelsson wrote:
> 2006-08-14 18:01:19+0200, Patrick McHardy ->
>
>>[...]
>
> I agree. A better solution is needed or no solution at all. However building
> a new API that actually only will change the parts it needs to will need a
> huge amount of work and will most probably break existing userspace. It will
> be very hard to keep backwards compability.

We need to keep the current API stable of course, but that doesn't
prevent us from introducing an entirely new one. Matches and targets
probably can be reused to a certain amount, although I wouldn't mind
getting rid of the excessive amount of matches doing basically the
same thing at the same time.

> As i see it some new hooks are needed on every module to retrieve the kernel
> data. This means we need 5 things:
> 1. start a new rule/match/target (todays checkentry())
> 2.
remove an old rule/match/target (todays destroy()) > 3. the worker (todays match() and target()) > 4. A new 'list' that will feed info needed by iptables -L and iptables-save > to userspace. Commonly called dump in other netlink interfaces. > 5. Possible a new 'alter' that will alter info in the rules/match/targets > private kernel data. This is tricky to get right on the rule level, a rule consist of multiple elements that would need to be changed atomically. > Point 4 and 5 can be views as read and write of the rule. Point 1 and 2 as > create and destroy. > > A uniq ID will be needed for each rule so userspace can define which rule it > wishes to alter/remove. Yes. > Fairly simple interface to build with netlink. However the real challenge is > to try to keep backward compability. That is basically impossible. We can keep a compatible command-line interface, but the ABI can't be kept compatible. The interface itself it quite simple, but we also need new ruleset evaluation functions, new loop detection and probably a few other things. From gozem at gozem.se Mon Aug 14 18:32:04 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 19:02:54 2006 Subject: priv_data patch In-Reply-To: <200608141819.38152.max@nucleus.it> References: <44E07BCD.8030206@trash.net> <200608141640.41759.max@nucleus.it> <44E08D3B.7040505@trash.net> <200608141819.38152.max@nucleus.it> Message-ID: <20060814163204.GA7194@kriss.csbnet.se> 2006-08-14 18:19:36+0200, Massimiliano Hofer -> > On Monday 14 August 2006 4:48 pm, Patrick McHardy wrote: > > > Hmm .. recent does a table lookup during runtime and the table could be > > cached. That would improve things a bit, but in my opinion not enough > > to justify this patch. Same for hashlimit. What data would condition > > store exactly? > > I need a pointer to per condition data, so that multiple rules with the same > name refer to the same flag. > I can break userspace compatibility and store a pointer in the userspace > structure. 
I just thought this could be useful to everyone (and let me > maintain userspace compatibility along the way). > Same goes for all but one of my matches. All of them are accessing "their" global variable that several rules might share. All but my lim (new limiter) that has its own data that should be re-checkentry():ed everytime another rule changes. I sure can keep a global list of all limits but i have no good way of making each uniq. There is no naming as in condition (and all other of my modules). > > Its actually quite clear what is needed. We want a userspace interface > > built on netlink, that acts on individual rules, not entire rulesets. > > There are a few more ideas, like handling negation centrally, allowing > > userspace to specify whether a target is terminal or not, allow multiple > > non-terminal targets in a row, etc, but nothing really fundamental. > > I thought the current way of doing things was specifically designed to > minimize softirq locking (especially with arbitarily long chains and > arbitrary initialization code). We could switch to RCU lists, though... > That would solve alot of problems and make the data structure much more flexible in the futhure for alterations. 
Guess we have to change back to the old ipchains name then :-P -- Joakim Axelsson From gozem at gozem.se Mon Aug 14 18:40:48 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 19:11:39 2006 Subject: priv_data patch In-Reply-To: <44E0A42D.604@trash.net> References: <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> <20060814161337.GY7194@kriss.csbnet.se> <44E0A42D.604@trash.net> Message-ID: <20060814164048.GB7194@kriss.csbnet.se> 2006-08-14 18:26:21+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 18:01:19+0200, Patrick McHardy -> > > > >>[...] > > > > I agree. A better solution is needed or no solution at all. However building > > a new API that actually only will change the parts it needs to will need a > > huge amount of work and will most probably break existing userspace. It will > > be very hard to keep backwards compability. > > We need to keep the current API stable of course, but that doesn't > prevent us from introducing an entirely new one. Matches and targets > probably can be reused to a certain amount, although I wouldn't > mind getting rid of the excessive amount of matches doing basically > the same thing at the same time. > > > As i see it some new hooks are needed on every module to retrieve the kernel > > data. This means we need 5 things: > > 1. start a new rule/match/target (todays checkentry()) > > 2. remove an old rule/match/target (todays destroy()) > > 3. the worker (todays match() and target()) > > 4. A new 'list' that will feed info needed by iptables -L and iptables-save > > to userspace. > > Commonly called dump in other netlink interfaces. > > > 5. Possible a new 'alter' that will alter info in the rules/match/targets > > private kernel data. 
> > This is tricky to get right on the rule level, a rule consist of > multiple elements that would need to be changed atomically. > Yes, sorry. I mean per match/target. > > Point 4 and 5 can be views as read and write of the rule. Point 1 and 2 as > > create and destroy. > > > > A uniq ID will be needed for each rule so userspace can define which rule it > > wishes to alter/remove. > > Yes. > > > Fairly simple interface to build with netlink. However the real challenge is > > to try to keep backward compability. > > That is basically impossible. We can keep a compatible command-line > interface, but the ABI can't be kept compatible. The interface itself > it quite simple, but we also need new ruleset evaluation functions, > new loop detection and probably a few other things. This work is huge, but really needed. I don't feel I am skilled enough to write it, only contribute with porting matches and other things. I did however write most of the code that ipset is based on now. So I have the "extreme amount of hook functions needed" in my back. The real question is. Do we really want to force each match/target to implement a fair amount of functions for it to work? We need to think big from the start here, not missing some feature that will be hard to add after. I rather see one too many needed function then one too few. Is this really something we want? It will most probably end up in a new ipfwadmin/ipchains/iptables -version. Its the easiest way todo it. Drop the backward compability completly and possibly only make a new iptables userspace command-line compability tool using the new API. 
--
Joakim Axelsson

From kaber at trash.net  Mon Aug 14 18:39:57 2006
From: kaber at trash.net (Patrick McHardy)
Date: Mon Aug 14 19:12:51 2006
Subject: xt_quota (Was: [PATCH] priv_data 0/2)
In-Reply-To: <20060814162451.GZ7194@kriss.csbnet.se>
References: <200606261641.47294.max@nucleus.it>
	<20060814141751.GR7194@kriss.csbnet.se>
	<44E08727.6070403@trash.net>
	<20060814153521.GV7194@kriss.csbnet.se>
	<44E09A21.6000503@trash.net>
	<20060814162451.GZ7194@kriss.csbnet.se>
Message-ID: <44E0A75D.10205@trash.net>

Joakim Axelsson wrote:
> I however really needs some way of figuring out how much of the quota that
> remains. This is to be able to report this to our users (that receives a
> certain number of gigabytes each day). So they can see how much they have
> left (using som scripted interface to iptables). Also saving this holy
> figure (as it has become :-)) for the user if the router for some reason
> craches. This is also the reason i need negative quota figures. The users
> are allowed to "borrow" from their future quota. Doing so only under a byte
> limiting match (-m lim --limit-bytes 20k/s).
>
> In my opinion its more important to save the remaining quota, rather than
> the original. And most important to in some way be able to see how much
> is left of the quota.
>
> Perhaps this wil satisfy both of us:
>
> Somehting put out by iptables-save
> iptables -m quota --init-quota 1000 --remain-quota 123 --use-quota remain
>
> Somthing you write with iptables to create a new rule:
> iptables -m quota --init-quota 1000 (using --use-quota init explicity)
> iptables -m quota --init-quota 1000 --use-quota remain
>
> But this sure is ugly.

It should be an explicit flag to iptables-save/restore to save the
current state, because we don't do it anywhere else and therefore
it is unexpected. The limit match, for example, neither shows nor
saves the current amount of tokens, last refill time, ...
And I'm not too much of a fan of adding such a flag because it can only be done for a subset of all modules, hashlimit, recent etc. all can't do it. The most extreme case would be the state match :) With a netlink API we could actually dump all internal state (including things like recently seen IP addresses) and accept changes from userspace. This would allow us to get rid of the ugly proc interfaces and covers your need as well. From kaber at trash.net Mon Aug 14 18:50:28 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 19:23:21 2006 Subject: priv_data patch In-Reply-To: <20060814164048.GB7194@kriss.csbnet.se> References: <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> <20060814161337.GY7194@kriss.csbnet.se> <44E0A42D.604@trash.net> <20060814164048.GB7194@kriss.csbnet.se> Message-ID: <44E0A9D4.6060704@trash.net> Joakim Axelsson wrote: > 2006-08-14 18:26:21+0200, Patrick McHardy -> > >>>5. Possible a new 'alter' that will alter info in the rules/match/targets >>>private kernel data. >> >>This is tricky to get right on the rule level, a rule consist of >>multiple elements that would need to be changed atomically. >> > > Yes, sorry. I mean per match/target. Is that really useful without beeing able to change more than one component >>That is basically impossible. We can keep a compatible command-line >>interface, but the ABI can't be kept compatible. The interface itself >>it quite simple, but we also need new ruleset evaluation functions, >>new loop detection and probably a few other things. > > > This work is huge, but really needed. I don't feel I am skilled enough to > write it, only contribute with porting matches and other things. I did > however write most of the code that ipset is based on now. 
So I have the > "extreme amount of hook functions needed" in my back. > > The real question is. Do we really want to force each match/target to > implement a fair amount of functions for it to work? We need to think big > from the start here, not missing some feature that will be hard to add > after. I rather see one too many needed function then one too few. Its not so much. The interface comes down to "init", "destroy", "dump", "do something". If we really want "change" it should be possible to do it in one function with "init". And as I already said, I would like to get rid of the large amount of matches doing the same thing anyway. connbytes, connmark, conntrack, helper, ... basically all do "take data from conntrack, compare". realm, length, pkttype, .. do the same with skb metadata. A lot of matches on real packet data are also quite similar. We could easily get rid of 50%-75% of all matches and still have the same functionality. > Is this really something we want? It will most probably end up in a new > ipfwadmin/ipchains/iptables -version. Its the easiest way todo it. Drop the > backward compability completly and possibly only make a new iptables > userspace command-line compability tool using the new API. We can't do this, people expect to be able to user old versions of the iptables tool. But we can introduce a new interface and new tools without breaking the old ones. 
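
To make the consolidation idea above concrete, here is a minimal userspace sketch of one generic "load a metadata field, compare" match that could stand in for several single-purpose matches such as realm, length and pkttype. It is not code from this thread or from the kernel: struct pkt_meta, struct meta_rule, meta_load() and meta_match() are hypothetical names made up for illustration, and a real implementation would read the fields from the skb instead of a plain struct.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet metadata a match can read. */
struct pkt_meta {
    uint32_t len;      /* what the "length" match looks at */
    uint32_t pkttype;  /* what the "pkttype" match looks at */
    uint32_t realm;    /* what the "realm" match looks at */
};

enum meta_key { META_LEN, META_PKTTYPE, META_REALM };
enum meta_op  { META_EQ, META_LT, META_GT, META_RANGE };

/* One rule = which field to load, how to compare it, and the operands. */
struct meta_rule {
    enum meta_key key;
    enum meta_op  op;
    uint32_t lo;       /* operand; lower bound for META_RANGE */
    uint32_t hi;       /* upper bound, used by META_RANGE only */
    bool invert;       /* negation handled once, in the generic code */
};

/* Step 1: take the data. */
static uint32_t meta_load(const struct pkt_meta *m, enum meta_key key)
{
    switch (key) {
    case META_LEN:     return m->len;
    case META_PKTTYPE: return m->pkttype;
    case META_REALM:   return m->realm;
    }
    return 0;
}

/* Step 2: compare, then apply the optional negation. */
static bool meta_match(const struct pkt_meta *m, const struct meta_rule *r)
{
    uint32_t v = meta_load(m, r->key);
    bool hit = false;

    switch (r->op) {
    case META_EQ:    hit = (v == r->lo); break;
    case META_LT:    hit = (v < r->lo); break;
    case META_GT:    hit = (v > r->lo); break;
    case META_RANGE: hit = (v >= r->lo && v <= r->hi); break;
    }
    return hit != r->invert;
}
```

With this shape, a "length 1000:2000" rule and a "realm 7" rule differ only in the contents of struct meta_rule, and negation lives in one place instead of being repeated in every match.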

From gozem at gozem.se  Mon Aug 14 18:55:49 2006
From: gozem at gozem.se (Joakim Axelsson)
Date: Mon Aug 14 19:26:38 2006
Subject: priv_data patch
In-Reply-To: <44E0A132.6040709@trash.net>
References: <44E07BCD.8030206@trash.net>
	<20060814142559.GS7194@kriss.csbnet.se>
	<44E08946.1040105@trash.net>
	<20060814152026.GU7194@kriss.csbnet.se>
	<44E09693.9000606@trash.net>
	<20060814160443.GX7194@kriss.csbnet.se>
	<44E0A132.6040709@trash.net>
Message-ID: <20060814165549.GC7194@kriss.csbnet.se>

2006-08-14 18:13:38+0200, Patrick McHardy ->
> Joakim Axelsson wrote:
> > 2006-08-14 17:28:19+0200, Patrick McHardy ->
> >
> >>>
> >>>Please have a look here for 4 modules "needing" this patch:
> >>>http://www.gozem.se/~gozem/netfilter/
> >>
> >>Please post your examples to the list.
> >>
> >
> >
> > In which format? As a full patch for the kernel or something that fits in
> > pom-ng? The current code works for 2.4 and early 2.6. I don't want to spend
> > time porting it into "wrong" API now when we are discussing priv_data to be
> > or not to be :-)
>
> I'm mostly interested in the way you wish to use it, so I can better
> judge whether this change is worth doing or not.
>

The change is NOT worth it if we can't keep the priv_data pointer between
changes of unrelated rules. I'd better use a global list instead, as
condition does today.

Still, I want to know in which format the new modules are wanted. Patch,
files fitting pom-ng or my own svn-repository to source.list?

I will port my matches as soon as possible and post them for you to judge.
Probably for tomorrow evening.
-- Joakim Axelsson From kaber at trash.net Mon Aug 14 18:59:21 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 19:32:13 2006 Subject: priv_data patch In-Reply-To: <20060814165549.GC7194@kriss.csbnet.se> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09693.9000606@trash.net> <20060814160443.GX7194@kriss.csbnet.se> <44E0A132.6040709@trash.net> <20060814165549.GC7194@kriss.csbnet.se> Message-ID: <44E0ABE9.8070005@trash.net> Joakim Axelsson wrote: > The change is NOT worth it if we can't keep the priv_data pointer between > changes of unrelated rules. I better use a global list instead as condition > is using today. Thats my feeling as well. > Still, i want to know in whcih format the new modules are wanted. Patch, > files fitting pom-ng or my own svn-repository to source.list? An URL to an external repository is preferred. From gozem at gozem.se Mon Aug 14 19:11:11 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 19:42:02 2006 Subject: priv_data patch In-Reply-To: <44E0A9D4.6060704@trash.net> References: <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> <20060814161337.GY7194@kriss.csbnet.se> <44E0A42D.604@trash.net> <20060814164048.GB7194@kriss.csbnet.se> <44E0A9D4.6060704@trash.net> Message-ID: <20060814171111.GD7194@kriss.csbnet.se> 2006-08-14 18:50:28+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 18:26:21+0200, Patrick McHardy -> > > > >>>5. Possible a new 'alter' that will alter info in the rules/match/targets > >>>private kernel data. > >> > >>This is tricky to get right on the rule level, a rule consist of > >>multiple elements that would need to be changed atomically. > >> > > > > Yes, sorry. I mean per match/target. 
> > > Is that really useful without beeing able to change more than > one component > I don't really understand your consern here? > >>That is basically impossible. We can keep a compatible command-line > >>interface, but the ABI can't be kept compatible. The interface itself > >>it quite simple, but we also need new ruleset evaluation functions, > >>new loop detection and probably a few other things. > > > > > > This work is huge, but really needed. I don't feel I am skilled enough to > > write it, only contribute with porting matches and other things. I did > > however write most of the code that ipset is based on now. So I have the > > "extreme amount of hook functions needed" in my back. > > > > The real question is. Do we really want to force each match/target to > > implement a fair amount of functions for it to work? We need to think big > > from the start here, not missing some feature that will be hard to add > > after. I rather see one too many needed function then one too few. > > Its not so much. The interface comes down to "init", "destroy", "dump", > "do something". If we really want "change" it should be possible to > do it in one function with "init". > So we are down to 4 function: 1. init 2. destroy 3. match/target (use) 4. dump (list) Meaning an alter is really a "dump, destroy, init"? This might/will give you atomic problems. I rather add one too many function not being used by most modules rather than one too few. Remeber its one hook for each modules, not for each rule. Its not a memory waster. Also think large matcher as recent here. If we are to skip any /proc -involement we would like to be able to remove one IP in a recent-match. That's an "alter". Perhaps we should add one or two "userdefined" functions. Something that the modules can define for them self to use. For example in the quota-case this can be "add more quota". It sure will make the new iptables-structure really flexible. 

Example:
iptables -U1 --match-id xxxxx -m module --params
iptables -U1 --match-id 12345 -m quota --add-quota 4000

We might be able to do this in an alter if we construct the alter to be able
to take commands that the module can specify. Easily done if a separate
struct is passed with defined data that the module can interpret itself.

We will probably need to be able to identify a rule and/or a
match/target/module uniquely in some way. A simple triple (chainname,
rulenumber, matchnumber) might do it though.

> And as I already said, I would like to get rid of the large amount of
> matches doing the same thing anyway. connbytes, connmark, conntrack,
> helper, ... basically all do "take data from conntrack, compare".
> realm, length, pkttype, .. do the same with skb metadata. A lot
> of matches on real packet data are also quite similar. We could
> easily get rid of 50%-75% of all matches and still have the same
> functionality.
>

Just like the u32 match. It can replace a lot of matches.

> > Is this really something we want? It will most probably end up in a new
> > ipfwadmin/ipchains/iptables -version. Its the easiest way todo it. Drop the
> > backward compability completly and possibly only make a new iptables
> > userspace command-line compability tool using the new API.
>
> We can't do this, people expect to be able to user old versions of
> the iptables tool. But we can introduce a new interface and new
> tools without breaking the old ones.

Changing to RCU lists instead of tables will probably break old
iptables badly. We can try writing a new iptables using the new API, but I
guess nobody will be really interested in doing this work.
-- Joakim Axelsson From kaber at trash.net Mon Aug 14 19:48:48 2006 From: kaber at trash.net (Patrick McHardy) Date: Mon Aug 14 20:19:47 2006 Subject: priv_data patch In-Reply-To: <20060814171111.GD7194@kriss.csbnet.se> References: <20060814152026.GU7194@kriss.csbnet.se> <44E09746.60302@trash.net> <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> <20060814161337.GY7194@kriss.csbnet.se> <44E0A42D.604@trash.net> <20060814164048.GB7194@kriss.csbnet.se> <44E0A9D4.6060704@trash.net> <20060814171111.GD7194@kriss.csbnet.se> Message-ID: <44E0B780.7050804@trash.net> Joakim Axelsson wrote: > 2006-08-14 18:50:28+0200, Patrick McHardy -> > >>Is that really useful without beeing able to change more than >>one component >> > > > I don't really understand your consern here? Seems I accidentally cut out half of the sentence. I meant to ask if this is really useful if only one component of a rule can be changed at a time atomically. >>>>That is basically impossible. We can keep a compatible command-line >>>>interface, but the ABI can't be kept compatible. The interface itself >>>>it quite simple, but we also need new ruleset evaluation functions, >>>>new loop detection and probably a few other things. >>> >>> >>>This work is huge, but really needed. I don't feel I am skilled enough to >>>write it, only contribute with porting matches and other things. I did >>>however write most of the code that ipset is based on now. So I have the >>>"extreme amount of hook functions needed" in my back. >>> >>>The real question is. Do we really want to force each match/target to >>>implement a fair amount of functions for it to work? We need to think big >>>from the start here, not missing some feature that will be hard to add >>>after. I rather see one too many needed function then one too few. >> >>Its not so much. The interface comes down to "init", "destroy", "dump", >>"do something". 
If we really want "change" it should be possible to >>do it in one function with "init". >> > > > So we are down to 4 function: > 1. init > 2. destroy > 3. match/target (use) > 4. dump (list) > > Meaning an alter is really a "dump, destroy, init"? No, but init and change will be pretty similar most of the time, so they can probably be handled by the same function. > Perhaps we should add one or two "userdefined" functions. Something that the > modules can define for them self to use. For example in the quota-case this > can be "add more quota". It sure will make the new iptables-structure really > flexible. > > Example: > iptables -U1 --match-id xxxxx -m module --params > iptables -U1 --match-id 12345 -m quota --add-quota 4000 > > We might be able to do this in an alter if we construct the alter to be able > to take commands that the module can specify. Easily done if a seperate > struct is passed with defined data that the module can interpetrate it self. This is not needed, netlink attributes can be nested, so module-specific stuff can be encapsulated in the module attributes. >>And as I already said, I would like to get rid of the large amount of >>matches doing the same thing anyway. connbytes, connmark, conntrack, >>helper, ... basically all do "take data from conntrack, compare". >>realm, length, pkttype, .. do the same with skb metadata. A lot >>of matches on real packet data are also quite similar. We could >>easily get rid of 50%-75% of all matches and still have the same >>functionality. >> > > > Just like u32 match. It can replace alot of matches. Something like that, but hopefully without running over the packet data. I had something like the meta ematch in mind. > Chainging to using RCU-list instead of tables will probably break old > iptables badly. We can try writing a new iptables using the new API, but i > guess nobody will be really intressed in doing this work. Both I and Harald are definitely interested in doing this work. 
Again, it must not break existing stuff, it will be something completely new (at least the core stuff, maybe not the targets or matches). From gozem at gozem.se Mon Aug 14 19:59:48 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Mon Aug 14 20:30:40 2006 Subject: priv_data patch In-Reply-To: <44E0B780.7050804@trash.net> References: <20060814154005.GW7194@kriss.csbnet.se> <44E09AF3.2080406@trash.net> <20060814155642.GA15328@kriss.csbnet.se> <44E09E4F.3040506@trash.net> <20060814161337.GY7194@kriss.csbnet.se> <44E0A42D.604@trash.net> <20060814164048.GB7194@kriss.csbnet.se> <44E0A9D4.6060704@trash.net> <20060814171111.GD7194@kriss.csbnet.se> <44E0B780.7050804@trash.net> Message-ID: <20060814175948.GE7194@kriss.csbnet.se> 2006-08-14 19:48:48+0200, Patrick McHardy -> > Joakim Axelsson wrote: > > 2006-08-14 18:50:28+0200, Patrick McHardy -> > > > >>Is that really useful without beeing able to change more than > >>one component > >> > > > > > > I don't really understand your consern here? > > Seems I accidentally cut out half of the sentence. I meant to ask > if this is really useful if only one component of a rule can be > changed at a time atomically. > > > >>>>That is basically impossible. We can keep a compatible command-line > >>>>interface, but the ABI can't be kept compatible. The interface itself > >>>>it quite simple, but we also need new ruleset evaluation functions, > >>>>new loop detection and probably a few other things. > >>> > >>> > >>>This work is huge, but really needed. I don't feel I am skilled enough to > >>>write it, only contribute with porting matches and other things. I did > >>>however write most of the code that ipset is based on now. So I have the > >>>"extreme amount of hook functions needed" in my back. > >>> > >>>The real question is. Do we really want to force each match/target to > >>>implement a fair amount of functions for it to work? 
We need to think big > >>>from the start here, not missing some feature that will be hard to add > >>>after. I rather see one too many needed function then one too few. > >> > >>Its not so much. The interface comes down to "init", "destroy", "dump", > >>"do something". If we really want "change" it should be possible to > >>do it in one function with "init". > >> > > > > > > So we are down to 4 function: > > 1. init > > 2. destroy > > 3. match/target (use) > > 4. dump (list) > > > > Meaning an alter is really a "dump, destroy, init"? > > No, but init and change will be pretty similar most of the time, so they > can probably be handled by the same function. > > > Perhaps we should add one or two "userdefined" functions. Something that the > > modules can define for them self to use. For example in the quota-case this > > can be "add more quota". It sure will make the new iptables-structure really > > flexible. > > > > Example: > > iptables -U1 --match-id xxxxx -m module --params > > iptables -U1 --match-id 12345 -m quota --add-quota 4000 > > > > We might be able to do this in an alter if we construct the alter to be able > > to take commands that the module can specify. Easily done if a seperate > > struct is passed with defined data that the module can interpetrate it self. > > This is not needed, netlink attributes can be nested, so module-specific > stuff can be encapsulated in the module attributes. > I don't really follow here? If you think really big. Include recent, ipset and the old ippool-idea. Where data is added and removed into the modules state as it works. If we want to change this state with a userspace tool we either need: 1. /proc 2. a new userspace-tool hooking, like ipset (ippool). 3. Function added to this new iptables being capable of it. A simply init will not do it. You can't remove one IP in a recent-pool using an init-function. It will probably wipe them all and empty the recent-pool. 
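
A small sketch may help show why a per-entry "alter" differs from re-running init. It is only an illustration in plain userspace C, not code from ipt_recent or ipset: struct addr_pool, pool_init() and pool_alter() are hypothetical names, and the fixed-size array stands in for whatever table a real module keeps. The point is that init can only reset the whole state, while alter takes a command plus an operand and touches one entry.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX 8

/* Hypothetical runtime state of a recent/ipset-like match. */
struct addr_pool {
    uint32_t addr[POOL_MAX];
    size_t n;
};

/* Commands userspace could send to the match without recreating it. */
enum pool_cmd { POOL_ADD, POOL_DEL, POOL_FLUSH };

/* init: all it can do is start from an empty pool. */
static void pool_init(struct addr_pool *p)
{
    p->n = 0;
}

static bool pool_contains(const struct addr_pool *p, uint32_t a)
{
    for (size_t i = 0; i < p->n; i++)
        if (p->addr[i] == a)
            return true;
    return false;
}

/* alter: change one entry and leave the rest alone. 0 on success, -1 on error. */
static int pool_alter(struct addr_pool *p, enum pool_cmd cmd, uint32_t a)
{
    switch (cmd) {
    case POOL_ADD:
        if (pool_contains(p, a))
            return 0;              /* already present, nothing to do */
        if (p->n == POOL_MAX)
            return -1;             /* pool is full */
        p->addr[p->n++] = a;
        return 0;
    case POOL_DEL:
        for (size_t i = 0; i < p->n; i++) {
            if (p->addr[i] == a) {
                p->addr[i] = p->addr[--p->n];  /* move last entry into the gap */
                return 0;
            }
        }
        return -1;                 /* no such address */
    case POOL_FLUSH:
        p->n = 0;
        return 0;
    }
    return -1;
}
```

Deleting 10.0.0.1 with POOL_DEL leaves 10.0.0.2 in the pool; calling pool_init() again would drop both, which is the problem described above.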
Removing the need to extra userspace programs (ipset, ippool, i know accounting has a tool as well) or the need for /proc is a huge win in my opinion. This also calls for two different forms of listing. The first one is the one we have today. List the rules for your firewall. The second in is to list the state of one match/target. Like 'cat /proc/net/ipt_recent/xxxx' today. Try to cover functions needed to operate ipset and/or recent from the begining. > > Chainging to using RCU-list instead of tables will probably break old > > iptables badly. We can try writing a new iptables using the new API, but i > > guess nobody will be really intressed in doing this work. > > Both I and Harald are definitely interested in doing this work. > Again, it must not break existing stuff, it will be something > completely new (at least the core stuff, maybe not the targets > or matches). From max at nucleus.it Mon Aug 14 23:12:41 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Mon Aug 14 23:43:31 2006 Subject: new ABI Message-ID: <200608142312.41851.max@nucleus.it> Hi, I couldn't keep pace during the day with all the mail that has been written, so let me summarize what has been said. Please forgive (and correct) me if I forget anything. First of all several people think that the current ABI has shortcomings and something has to be done. Regarding my proposal for priv_data (I'm obviously biased here, but I'll try to be objective): - it offers the ability to store data out of reach to the userspace utilities (for whatever housekeeping any match/targets needs); - it can't offer persistent data to matches/targets. With this patch we can part with some really ugly tricks involving userspace structure fields and kernel pointers and it would let us have O(1) matches for quota, limit and any other match/target that needs cross match data. People may expect the second feature too, but it's just not possible with the current infrastructure. 
The same infrastructure, making extensive use of
arrays whose size is determined before we call any module hook function,
leaves us in the cold for a really flexible solution for other problems too.
I've not yet read all the code involved, but, if we really need to, we may be
able to use a compat-like interface to maintain ABI compatibility with a new
netfilter core.

What people need from any new infrastructure:
- cleaner interface with clearer separation between kernel and user data;
- ability to dump internal state of matches/targets (this may not be in a
1-to-1 relation, so it may be tricky, do we need module state dumping?);
- ability to change chains/rules/matches without reinitializing everything;
- ability to change matches' state or configuration without reinitializing
everything;
- general infrastructure for common logic that is currently reinvented every
time (negation comes to mind, but I'm sure there are other things).


Regarding user influence over state, especially where the number of states
doesn't match the number of matches involved, I'm not totally opposed to a
file-like way of exposing it. I agree that /proc is in a sorry state, but
configfs is there precisely for this purpose. Of course not everything can be
done this way and I wouldn't like to have complex data passed and parsed this
way.
We may need a new set of commands in iptables (should I call it iptables-ng?)
just for keeping this kind of data (realms, quota groups, conditions, etc.).
If we had a general way to keep collections of configurations, I'll be glad
to conform and use it.


I think the current array oriented data structures won't allow us to add these
features. RCU lists come to mind. It sure is a step back in performance
(sparser access to memory and more memory fragmentation), but it may not be
that noticeable.
I think every match/target should expose:
- init;
- destroy;
- change;
- dump;
- restore.

Depending on the API change, dump and restore may merge into a single function.
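
The five hooks listed above could be sketched roughly as follows. This is a plain userspace illustration under stated assumptions, not a proposed kernel API: struct match_ops and all quota_* names are hypothetical, malloc() stands in for kernel allocation, and quota_consume() is only a stand-in for the per-packet worker so that there is some state to dump. The numbers mirror the --init-quota 1000 / --remain-quota 123 example from earlier in the thread.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-module hook table: init, destroy, change, dump, restore. */
struct match_ops {
    const char *name;
    void *(*init)(const void *cfg);               /* returns private state */
    void  (*destroy)(void *priv);
    int   (*change)(void *priv, const void *cfg); /* reconfigure, keep state */
    int   (*dump)(const void *priv, void *out);   /* export state */
    int   (*restore)(void *priv, const void *in); /* import a checked dump */
};

struct quota_cfg   { int64_t quota; };
struct quota_state { int64_t quota; int64_t remaining; };

static void *quota_init(const void *cfg)
{
    const struct quota_cfg *c = cfg;
    struct quota_state *s = malloc(sizeof(*s));

    if (s) {
        s->quota = c->quota;
        s->remaining = c->quota;
    }
    return s;
}

static void quota_destroy(void *priv)
{
    free(priv);
}

/* Raise or lower the quota while keeping what has already been used. */
static int quota_change(void *priv, const void *cfg)
{
    struct quota_state *s = priv;
    const struct quota_cfg *c = cfg;

    s->remaining += c->quota - s->quota;
    s->quota = c->quota;
    return 0;
}

static int quota_dump(const void *priv, void *out)
{
    *(struct quota_state *)out = *(const struct quota_state *)priv;
    return 0;
}

/* State from userspace is untrusted: validate before accepting it. */
static int quota_restore(void *priv, const void *in)
{
    const struct quota_state *src = in;

    if (src->remaining > src->quota)
        return -1;
    *(struct quota_state *)priv = *src;
    return 0;
}

/* Stand-in for the worker; remaining may go negative ("borrowing"). */
static void quota_consume(void *priv, int64_t bytes)
{
    ((struct quota_state *)priv)->remaining -= bytes;
}

static const struct match_ops quota_ops = {
    "quota", quota_init, quota_destroy, quota_change, quota_dump, quota_restore
};
```

After init with a quota of 1000 and 877 bytes consumed, dump reports 123 remaining; change to 5000 keeps the 877 bytes of usage, so the next dump reports 4123.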
The kernel should let any match: - receive user supplied initialization data (mostly a rule definition) state dumps (this calls for very careful planning and checking); - send state dumps to userspace; - keep private data for every match/target; - keep collections of configurations common to matches in a module (every module may keep it without netfilter core help, but if it becomes part of the infrastructure it may be handled through the userspace ABI). Am I forgetting anything? Do you think any of these features are bugs? Am I overseeing fatal difficulties related to what I wrote? Please reply with your opinions. I'll wear my asbestos suite for the next couple of days. :) -- Saluti, Massimiliano Hofer From gozem at gozem.se Tue Aug 15 02:00:36 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Tue Aug 15 02:31:28 2006 Subject: new ABI In-Reply-To: <200608142312.41851.max@nucleus.it> References: <200608142312.41851.max@nucleus.it> Message-ID: <20060815000036.GF7194@kriss.csbnet.se> 2006-08-14 23:12:41+0200, Massimiliano Hofer -> > Hi, > I couldn't keep pace during the day with all the mail that has been written, > so let me summarize what has been said. > Please forgive (and correct) me if I forget anything. > > First of all several people think that the current ABI has shortcomings and > something has to be done. > > Regarding my proposal for priv_data (I'm obviously biased here, but I'll try > to be objective): > - it offers the ability to store data out of reach to the userspace utilities > (for whatever housekeeping any match/targets needs); > - it can't offer persistent data to matches/targets. > > > With this patch we can part with some really ugly tricks involving userspace > structure fields and kernel pointers and it would let us have O(1) matches > for quota, limit and any other match/target that needs cross match data. > > O(1) can be done today as well. Just figure out the pointer in checkentry() and keep it. However, do this everytime. 
Don't trust userspace to not alter it. So its O(1) in match()/target() but not in checkentry(). Shouldn't be too bad. > People may expect the second feature too, but it's just not possible with the > current infrastructure. The same infrastructure, making extensive use of > arrays whose size is determined before we call any module hook function, > leaves us in the cold for a really flexible solution for other problems too. > I've not yet read all the code involved, but, if we really need to, we may be > able to use a compat-like interface to maintain ABI compatibility with a new > netfilter core. > > What people need from any new infrastructure: > - cleaner interface with clearer separation between kernel and user data; > - ability to dump internal state of matches/targets (this may not be in a > 1-to-1 relation, so it may be tricky, do we need module state dumping?); > - ability to change chains/rules/matches without reinitializing everything; > - ability to change matches' state or configuration without reinitializing > everything; > - general infrastructure for common logic that is currently reinvented every > time (negation comes to mind, but I'm sure there are other things). > > > Regarding user influence over state, especially where the number of states > doesn't match the number of matches involved, I'm not totally opposed to a > file-like way of exposing it. I agree that /proc is in a sorry state, but > configfs is there precisely for this purpose. Of course not everything can be > done this way and I wouldn't like to have complex data passed and parsed this > way. This isn't a bad idea. Can we make an iptablesfs that can be used in a smart way? It will offer an obvious API for any userspace-program/script to use. We wouldn't then need any explicit userspace-program at all to maintain. Only kernel. Perhaps we will push parsing functions into kernel then. Not good. 
But if we could come up with an API for a iptablesfs the parsing would be minimal and just in common fast functions. > We may need a new set of commands in iptables (should I call it iptables-ng?) > just for keeping this kind of data (realms, quota groups, conditions, etc.). > If we had a general way to keep collections of configurations, I'll be glad > to conform and use it. > > > I think the current array oriented data structures won't allow us to add these > features. RCU lists come to mind. It sure is a step back in performance > (sparser access to memory and more memory fragmentation), but it may not be > that noticeable. > I think every match/target should expode: > - init; > - destroy; > - change; > - dump; > - restore. > Don't forget the worker: match()/target(). > Depending on the API change, dump and restore may melt in a single function. > > The kernel should let any match: > - receive user supplied initialization data (mostly a rule definition) state > dumps (this calls for very careful planning and checking); > - send state dumps to userspace; > - keep private data for every match/target; > - keep collections of configurations common to matches in a module (every > module may keep it without netfilter core help, but if it becomes part of the > infrastructure it may be handled through the userspace ABI). > > > Am I forgetting anything? > Do you think any of these features are bugs? > Am I overseeing fatal difficulties related to what I wrote? > Please reply with your opinions. I'll wear my asbestos suite for the next > couple of days. :) > It would probably be nice to introduce more advanced pseudo data types. Like a rate type ( X / time ). IP-type. Netmask-type and so on. Common parser libraries for the userspace tool. Also, don't forget that people tend to think that iptables are way too complicated. I think people like BSD style of writing the rules to a file that is then "executed". The file syntax is more of writing sentences of what you want. 
I myself hate that way of configuring with "words" rather than parameters. But still, it is one thing to consider. A file-only config somewhat solves the iptables vs iptables-save/restore syndrome: they are not always in sync. Compare with switches like Cisco's, where configuration alterations are always saved. You work in a shell that changes the one config file directly. In our case this means that iptables and iptables-save are the same: iptables only alters the "only file" that iptables-save has. Introducing a single file does not necessarily mean a full reload of all rules just to alter one rule. Rules that have a state can get a state id which they can hook back onto when reloading. Also, try to move away from the small thinking of rules we have today. Try to see a bigger picture. What is a rule? What can it do? What is a match or target? What is a module? I could basically write myself a module today that does my entire firewall using only one iptables rule. "iptables -A INPUT -m myhugefirewall". Try to focus on the bigger modules like recent, ipset, accounting, conntrack. Not the small and simple ones such as length, ttl and mport. Another way of doing a firewall is to write your rules in some syntax in a file. Have a userspace program parse it into C code. Have your gcc compiler compile it into a kernel module. Load it. This will optimize the firewall A LOT. Still again, states can be saved between reloads just using some ids and hooks for the rules that need a state. If you look at the work I began with ippool (which was later finished in a much smaller version as ipset), it has no limits at all (well, a lot fewer at least :-). The idea was to make three categories of elements: data structures, data interpreters and algorithms. Meaning we can have as data structures: array, bitmap, hash, rcu-list, priority queue. Data interpreters: ip-addresses, ipv6-addresses, port numbers, times/dates, ranges and so on. Algorithms: timeout, sorting, logging and more.
You can then combine a data structure with a data interpreter, and possibly with an algorithm or two. So to build recent you combine the data structure hash with ip-addresses, and might also add the algorithm timeout. To just match a single IP source address, combine the data structure 'single' with data interpreter 'IPv4 address' and algorithm 'source', or something like that. This would be impossible with the current API of iptables. But if you only add alter/change and dump, it would be possible. Even better, it could perhaps be integrated with a new iptables-ng. Crazy idea and perhaps impossible to make it easy to use. But it would be a really really nice professional tool. Also remember that a lot of firewall setups for routers handle several different destinations. Today there is no good way of grouping rules and/or grouping which destinations belong to which customer. Many crazy ideas, so keep your asbestos suit on :-P -- Joakim Axelsson From azez at ufomechanic.net Tue Aug 15 10:27:40 2006 From: azez at ufomechanic.net (Amin Azez) Date: Tue Aug 15 10:58:46 2006 Subject: priv_data patch In-Reply-To: <20060814165549.GC7194@kriss.csbnet.se> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09693.9000606@trash.net> <20060814160443.GX7194@kriss.csbnet.se> <44E0A132.6040709@trash.net> <20060814165549.GC7194@kriss.csbnet.se> Message-ID: <44E1857C.80008@ufomechanic.net> * Joakim Axelsson wrote, On 14/08/06 17:55: > 2006-08-14 18:13:38+0200, Patrick McHardy -> >> Joakim Axelsson wrote: >>> 2006-08-14 17:28:19+0200, Patrick McHardy -> >>> >>>>> Please have a look here for 4 modules "needing" this patch: >>>>> http://www.gozem.se/~gozem/netfilter/ >>>> Please post your examples to the list. >>>> >>> >>> In which format? As a full patch for the kernel or something that fits in >>> pom-ng? The current code works for 2.4 and early 2.6.
I don't want to spend >>> time porting it into "wrong" API now when we are discussing priv_data to be >>> or not to be :-) >> I'm mostly interested in the way you wish to use it, so I can better >> judge whether this change is worth doing or not. >> > > The change is NOT worth it if we can't keep the priv_data pointer between > changes of unrelated rules. I better use a global list instead as condition > is using today. Or, anticipating the future, why not let the current implementation of priv_data be used to cache the entry from the global list? When priv_data is satisfactory the global list can go. Why not let priv_data manage the "backup" global list while it is needed? Then client modules of priv_data don't even need to know about this, which is better than client modules all implementing their own global list. I'm saying that while a global list may currently be needed we can still encapsulate it in the priv_data api and still get MOST of the benefit now and easily get all of the benefit later. Sam From netfilter at mm-double.de Tue Aug 15 10:29:30 2006 From: netfilter at mm-double.de (Maik Hentsche) Date: Tue Aug 15 11:00:26 2006 Subject: possible Bug in ip_conntrack In-Reply-To: <44E0719C.9050307@trash.net> References: <1155545885.44e03b1dcec01@www.domainfactory-webmail.de> <44E0719C.9050307@trash.net> Message-ID: <1155630570.44e185ea45986@www.domainfactory-webmail.de> Quoting Patrick McHardy: > Can you test current -git please? We had some changes in that area .. > I don't recall any explicit bugfixes, but who knows .. Unfortunately, it still occurs with -git.
so long Maik From azez at ufomechanic.net Tue Aug 15 10:39:55 2006 From: azez at ufomechanic.net (Amin Azez) Date: Tue Aug 15 11:10:57 2006 Subject: new ABI In-Reply-To: <20060815000036.GF7194@kriss.csbnet.se> References: <200608142312.41851.max@nucleus.it> <20060815000036.GF7194@kriss.csbnet.se> Message-ID: <44E1885B.5000300@ufomechanic.net> * Joakim Axelsson wrote, On 15/08/06 01:00: > 2006-08-14 23:12:41+0200, Massimiliano Hofer -> >> Hi, >> I couldn't keep pace during the day with all the mail that has been written, >> so let me summarize what has been said. >> Please forgive (and correct) me if I forget anything. >> >> First of all several people think that the current ABI has shortcomings and >> something has to be done. >> >> Regarding my proposal for priv_data (I'm obviously biased here, but I'll try >> to be objective): >> - it offers the ability to store data out of reach to the userspace utilities >> (for whatever housekeeping any match/targets needs); >> - it can't offer persistent data to matches/targets. >> >> >> With this patch we can part with some really ugly tricks involving userspace >> structure fields and kernel pointers and it would let us have O(1) matches >> for quota, limit and any other match/target that needs cross match data. >> >> > > O(1) can be done today as well. Just figure out the pointer in checkentry() > and keep it. However, do this everytime. Don't trust userspace to not alter > it. So its O(1) in match()/target() but not in checkentry(). Shouldn't be > too bad. I did this for layer7 matching to cache the compiled regex, however it stopped deletion of rules by specification (not by index) because the matchinfo struct no longer matched (the kernel based one had the pointer but the userland based one that was being compared did not). I didn't pin down the code doing the match which would need "teaching" not to match the private bit, as time constraints were too tight. 
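To illustrate the deletion problem described above, here is a minimal sketch in plain userspace C of a comparison that has been "taught" to ignore a kernel-only cached pointer. The struct `demo_matchinfo`, its fields and `demo_info_equal()` are hypothetical names invented for this illustration; this is not the layer7 or iptables code. The sketch also assumes both copies were zero-filled before being populated, so that padding bytes compare equal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical match info: user-visible fields first, kernel-only cache last. */
struct demo_matchinfo {
    char pattern[64];       /* supplied by userspace */
    unsigned char invert;   /* supplied by userspace */
    void *kernel_cache;     /* hypothetical: filled in by the kernel only */
};

/*
 * Compare only the bytes that precede the kernel-private field.
 * Assumes both structs were zero-filled first, so padding is equal.
 * Returns 1 when the user-visible parts are identical, 0 otherwise.
 */
static int demo_info_equal(const struct demo_matchinfo *a,
                           const struct demo_matchinfo *b)
{
    return memcmp(a, b, offsetof(struct demo_matchinfo, kernel_cache)) == 0;
}
```

A whole-struct memcmp() would report a mismatch as soon as the kernel copy carries a non-NULL cache pointer, which is the symptom described above; restricting the comparison to the user-visible prefix avoids that.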
> It would probably be nice to introduce more advanced pseudo data types. Like > a rate type ( X / time ). IP-type. Netmask-type and so on. Common parser > libraries for the userspace tool. > > Also, don't forget that people tend to think that iptables are way too > complicated. I think people like BSD style of writing the rules to a file > that is then "executed". The file syntax is more of writing sentences of > what you want. I my self hate that way of configuring with "words" rather > than parameters. But still, one thing to consider. > > A file only config somewhat solves the iptables vs iptables-save/restore > syndrome. They are not always in sync. Also in comparation with switches > like Cisco, configuration alterations are always saved. You work in a shell > what changes the only config-file directly. This means that in our case, > iptables and iptables-save are the same. iptables only alter the "only file" > that iptables-save has. By introducing a file only will not nessesary make a > full realod of all rules only to alter one rule. The rules, if having a > state can get a state id which they can hook back on when reloading. > > Also, try to move away from the small thinking or rules we have today. Try > to see a bigger picture. What is a rule. What can it do? What is match or > target? What is a module. I could basicly write myself a module today that > does my entire firewall using only one iptables rule. > "iptables -A INPUT -m myhugefirewall". > Try focus on the bigger modules like recent, ipset, accounting, conntrack. > Not the small and simple ones as length, ttl and mport. I have modified iptables-restore (as it is good at parsing iptables-save format) to output an xml specification. 
In support of your comment here; if iptables modules preserved semantics by making use of macro's or function calls instead of printf when saving their rules, then it would be easy to support various more readable representations of rule, which would answer your suggestions mentioned here. > Another way of doing firewall is to write your rules in some syntax in a > file. Have a userspace program parse it into C-code. Have your gcc compiler > compile it into a kernel module. Load it. This will optimize the firewall > ALOT. Still again, states can be saved between reloads just using some ids > and hooks for the rules that needs a state. > > If you look at the work i began with ippool which was later finished in a > much smaller version as ipset has no limits at all (well alot less atleast > :-). The idea was to make three category of elements. Data structes, data > interpreter and algorithms. Meaning we can have as data strcutes: array, > bitmap, hash, rcu-list, priority queue. Data interpreter: ip-addresses, > ipv6-addresses, port numbers, times/dates, ranges and so on. Algoritms: > timeout, sorting, logging and more. You can then combine a data structure > with a data interpreter, and possible with an algoritm or two. So to build > recent you combine data structure hash with ip-addresses, might also add the > algorithm timeout. To just match a single IP source address, combine the > data structure 'single' with data interpreter 'IPv4 address' and algoritm > 'source', or something like that. This would be impossible with the current > API of iptables. But if you only add alter/change and dump this would be. > Even better i can be integrated with new iptables-ng perhaps. Crazy idea and > perhaps impossible to make it easy to use. But would be a really really nice > professional tool. > > Also remember that alot of firewall setups for routers handles several > different destinations. 
Today no good way of grouping rules and/or trying to > group which destinations belongs to which custumer exists. > > Many crazy ideas, so keep your asbestos suit on :-P These ideas are not crazy and are very much relevant to me. I'm moving to an xml representation of rules because the abstraction allows me to implement user requirements in various ways depending on the current capability of iptables. iptables now supports a module appearing more than once in a match, but not multiple targets. With a meaningful xml representation (or any easily manipulated representation) I can render the user's requirements as multiple iptables matches now, possibly with extra chains, and have these reduced to a single rule in the future. Sam From gozem at gozem.se Tue Aug 15 10:40:38 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Tue Aug 15 11:11:50 2006 Subject: priv_data patch In-Reply-To: <44E1857C.80008@ufomechanic.net> References: <44E07BCD.8030206@trash.net> <20060814142559.GS7194@kriss.csbnet.se> <44E08946.1040105@trash.net> <20060814152026.GU7194@kriss.csbnet.se> <44E09693.9000606@trash.net> <20060814160443.GX7194@kriss.csbnet.se> <44E0A132.6040709@trash.net> <20060814165549.GC7194@kriss.csbnet.se> <44E1857C.80008@ufomechanic.net> Message-ID: <20060815084038.GG7194@kriss.csbnet.se> 2006-08-15 09:27:40+0100, Amin Azez -> > * Joakim Axelsson wrote, On 14/08/06 17:55: > > 2006-08-14 18:13:38+0200, Patrick McHardy -> > >> Joakim Axelsson wrote: > >>> 2006-08-14 17:28:19+0200, Patrick McHardy -> > >>> > >>>>> Please have a look here for 4 modules "needing" this patch: > >>>>> http://www.gozem.se/~gozem/netfilter/ > >>>> Please post your examples to the list. > >>>> > >>> > >>> In which format? As a full patch for the kernel or something that fits in > >>> pom-ng? The current code works for 2.4 and early 2.6.
I don't want to spend > >>> time porting it into "wrong" API now when we are discussing priv_data to be > >>> or not to be :-) > >> I'm mostly interested in the way you wish to use it, so I can better > >> judge whether this change is worth doing or not. > >> > > > > The change is NOT worth it if we can't keep the priv_data pointer between > > changes of unrelated rules. I better use a global list instead as condition > > is using today. > > > Or, anticipating the future, why not let the current implementation of > priv_data be used to cache the entry from the global list. This can be done in the info struct that userspace passes. It might be somewhat ugly to have a kernel-only pointer to this internal kernel state there. Of course this pointer will need to be set in every call of checkentry(). > When > priv_data is satsifactory the global list can go. Why not let priv_data > manage the "backup" global list while it is needed, then client modules > of priv_data don't even need to know about this; which is better than > client modules all implementing their own global list. > > I'm saying that while a global list may currently be needed we can still > encapsulate it in the priv_data api and still get MOST of the benefit > now and easily get all of the benefit later. > > Sam From sebastian_hagen at memespace.net Tue Aug 15 19:10:15 2006 From: sebastian_hagen at memespace.net (Sebastian Hagen) Date: Tue Aug 15 19:41:15 2006 Subject: libnetfilter_conntrack checks for (getuid() == 0) Message-ID: <44E1FFF7.2010103@memespace.net> I'm in the process of writing a program that depends on libnetfilter_conntrack (currently using the latest version, that is svn revision 6663), and have run into an annoyance. Obviously interfacing with the ip_conntrack_netlink module requires elevated privileges; I'm not quite certain what the required set of privileges for initializing the socket is, but after that CAP_NET_ADMIN is definitely sufficient for using dump_conntrack_table().
None of these operations, afaict, really require the process to have an uid of 0. Unfortunately libnetfilter_conntrack checks for that anyway, specifically in nfct_event_conntrack() and nfct_event_expectation(). The specific code is in both cases: if (getuid() != 0) return -EPERM; The actual useful code of these functions appears to me to be a strict subset of that of dump_conntrack_table(); so since dump_conntrack_table() continues to work with only CAP_NET_ADMIN, so should nfct_event_conntrack() and nfct_event_expectation(). Additionally, if one does drop CAP_NET_ADMIN from the effective capability set, dump_conntrack_table() will return the error correctly. IMHO, these explicit checks for getuid() == 0... a) are wrong as they prevent the library user from dropping 'privileges' (uid==0 isn't strictly a privilege, but considering the file ownership on many systems, it might as well be) they really should be able to drop b) allow false negatives as uid 0 processes don't necessarily have CAP_NET_ADMIN c) are afaict completely useless in any event, since nfnl_listen() will fail correctly in the absence of CAP_NET_ADMIN ...and should therefore be removed. Please do correct me if I'm mistaken about any of this. If I'm not, should I make a patch for this? Since the fix would simply consist of removing the four mentioned lines from the source, doing that would be trivial. Sebastian Hagen From kaber at trash.net Tue Aug 15 19:12:52 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 15 19:45:59 2006 Subject: libnetfilter_conntrack checks for (getuid() == 0) In-Reply-To: <44E1FFF7.2010103@memespace.net> References: <44E1FFF7.2010103@memespace.net> Message-ID: <44E20094.8080309@trash.net> Sebastian Hagen wrote: > I'm in the process of writing a program that depends on > libnetfilter_conntrack (currently using the current version, that is svn > revision 6663), and have run into an annoyance. 
> Obviously interfacing with the ip_conntrack_netlink module requires elevated > privileges; I'm not quite certain what the required set of required > privileges for initializing the socket is, but after that CAP_NET_ADMIN is > definitely sufficient for using dump_conntrack_table(). > > None of these operations, afaict, really require the process to have an uid > of 0. Unfortunately libnetfilter_conntrack checks for that anyway, > specifically in nfct_event_conntrack() and nfct_event_expectation(). The > specific code is in both cases: > > if (getuid() != 0) > return -EPERM; > > The actual useful code of these functions appears to me to be a strict > subset of that of dump_conntrack_table(); so since dump_conntrack_table() > continues to work with only CAP_NET_ADMIN, so should nfct_event_conntrack() > and nfct_event_expectation(). > Additionally, if one does drop CAP_NET_ADMIN from the effective capability > set, dump_conntrack_table() will return the error correctly. > > IMHO, these explicit checks for getuid() == 0... > a) are wrong as they prevent the library user from dropping 'privileges' > (uid==0 isn't strictly a privilege, but considering the file ownership on > many systems, it might as well be) they really should be able to drop > > b) allow false negatives as uid 0 processes don't necessarily have CAP_NET_ADMIN > > c) are afaict completely useless in any event, since nfnl_listen() will fail > correctly in the absence of CAP_NET_ADMIN > > ...and should therefore be removed. Fully agreed. > Please do correct me if I'm mistaken about any of this. > If I'm not, should I make a patch for this? Since the fix would simply > consist of removing the four mentioned lines from the source, doing that > would be trivial. Please send a patch. 
From sebastian_hagen at memespace.net Tue Aug 15 19:45:20 2006 From: sebastian_hagen at memespace.net (Sebastian Hagen) Date: Tue Aug 15 20:16:21 2006 Subject: libnetfilter_conntrack checks for (getuid() == 0) In-Reply-To: <44E20094.8080309@trash.net> References: <44E1FFF7.2010103@memespace.net> <44E20094.8080309@trash.net> Message-ID: <44E20830.6090402@memespace.net> Patrick McHardy wrote: > Please send a patch. Done, and attached to this mail. Sebastian Hagen -------------- next part -------------- A non-text attachment was scrubbed... Name: libnetfilter_conntrack_getuid.patch Type: text/x-patch Size: 784 bytes Desc: not available Url : /pipermail/netfilter-devel/attachments/20060815/e3723c72/libnetfilter_conntrack_getuid.bin From max at nucleus.it Wed Aug 16 00:08:25 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Wed Aug 16 00:39:26 2006 Subject: new ABI In-Reply-To: <20060815000036.GF7194@kriss.csbnet.se> References: <200608142312.41851.max@nucleus.it> <20060815000036.GF7194@kriss.csbnet.se> Message-ID: <200608160008.26880.max@nucleus.it> On Tuesday 15 August 2006 2:00 am, Joakim Axelsson wrote: > > With this patch we can part with some really ugly tricks involving > > userspace structure fields and kernel pointers and it would let us have > > O(1) matches for quota, limit and any other match/target that needs cross > > match data. > > O(1) can be done today as well. Just figure out the pointer in checkentry() > and keep it. However, do this everytime. Don't trust userspace to not alter > it. So its O(1) in match()/target() but not in checkentry(). Shouldn't be > too bad. I know. I already figure it out at checkentry (it wouldn't work at all if I didn't), the real point was: where do I store it? The answer can be: - I reserve some space in the userspace provided structure in order to do it; - I apply priv_data. While I certainly can do the former, I think the latter is far cleaner. Of course changing the ABI to all other matches isn't nice either. 
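The first option above (reserving space in the userspace-provided structure and recomputing the pointer on every checkentry() call) might look roughly like the following userspace C sketch. All names here (`demo_info`, `demo_counter`, `demo_registry`, `demo_checkentry()`, `demo_match()`) are hypothetical and exist only for illustration; they are not the condition or quota module code. The return convention (1 accepts, 0 rejects) and the name-termination test follow the checkentry style visible in the hashlimit patch earlier in this archive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shared state that several rules may refer to by name. */
struct demo_counter {
    const char *name;
    unsigned long quota;
};

static struct demo_counter demo_registry[] = {
    { "web", 1000 },
    { "mail", 500 },
};

/* Hypothetical info struct: userspace fills in name, the rest is reserved. */
struct demo_info {
    char name[16];
    struct demo_counter *cached; /* reserved: any userspace value is discarded */
};

/* Slow path: validate input and resolve the name once per rule load. */
static int demo_checkentry(struct demo_info *info)
{
    size_t i;

    info->cached = NULL; /* never trust a pointer that came from userspace */
    if (info->name[sizeof(info->name) - 1] != '\0')
        return 0; /* name is not terminated */
    for (i = 0; i < sizeof(demo_registry) / sizeof(demo_registry[0]); i++) {
        if (strcmp(info->name, demo_registry[i].name) == 0) {
            info->cached = &demo_registry[i];
            return 1;
        }
    }
    return 0; /* unknown name */
}

/* Fast path: O(1), no lookup, uses only the pointer set by checkentry. */
static int demo_match(struct demo_info *info)
{
    if (info->cached->quota == 0)
        return 0;
    info->cached->quota--;
    return 1;
}
```

With something like priv_data the `cached` field would move out of the user-visible struct entirely, which is the cleanliness argument made above.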
> This isn't a bad idea. Can we make an iptablesfs that can be used in a > smart way? It will offer an obvious API for any userspace-program/script to > use. We wouldn't then need any explicit userspace-program at all to > maintain. Only kernel. > > Perhaps we will push parsing functions into kernel then. Not good. But if > we could come up with an API for a iptablesfs the parsing would be minimal > and just in common fast functions. As much as I'd like to "ls" and "tar" my rules, I think that pushing the whole userspace parsing done by iptables into the kernel is a Bad Thing (TM). I was referring to modules that needed extra configurations (something more than simple match parameters) and that needed to communicate with userspace. Basically anything that has a "name" or "realm" parameter in it. I currently maintain condition that uses procfs and I'm going to convert it to configfs. I was asking the people in the mailing list if this feature is useful/widespread enough to justify general infrastructure. This could be in the form of a file system hook (/config/netfilter/modulename/entry) or a specific set of command in the next version of iptables. How many people would benefit from this? > > I think every match/target should expode: > > - init; > > - destroy; > > - change; > > - dump; > > - restore. > > Don't forget the worker: match()/target(). Yeah, that too. :) > It would probably be nice to introduce more advanced pseudo data types. > Like a rate type ( X / time ). IP-type. Netmask-type and so on. Common > parser libraries for the userspace tool. I agree. Although, if matches were really modular and the combination logic powerful enough, you would hardly need to parse the same type of data twice. I realize this is a bit idealistic. A library of parser functions would certainly be nice. > Also, don't forget that people tend to think that iptables are way too > complicated. I think people like BSD style of writing the rules to a file > that is then "executed". 
The file syntax is more of writing sentences of > what you want. I my self hate that way of configuring with "words" rather > than parameters. But still, one thing to consider. I like my rules generated by my scripts. Of course I could generate a file, but I don't think there is anything obviously better in this approach. Either way some utility needs to parse something; parameters on the command line or lines in a file make really little difference. I don't know much about BSD. Is there something that makes it really easier? > A file only config somewhat solves the iptables vs iptables-save/restore > syndrome. They are not always in sync. Also in comparation with switches A good library approach to both would solve this. > like Cisco, configuration alterations are always saved. You work in a shell I often work with complex rulesets that are far less complex to generate based on some parameters and a few criteria. In some cases I will always need program generated rules. In many other cases I could do without them if I had: - expression "subchains": it would be really powerful if I could create a chain with a resulting target that just says true or false and then use it as a match in other rules (we could even cache the result for multiple invocations); - effective ways to handle sets (ports, IPs, anything): I know what I need is in pom, but it's not yet in the stable kernel and we could expand the concept further. > what changes the only config-file directly. This means that in our case, > iptables and iptables-save are the same. iptables only alter the "only > file" that iptables-save has. By introducing a file only will not nessesary > make a full realod of all rules only to alter one rule. The rules, if > having a state can get a state id which they can hook back on when > reloading. A system of rule/match ids will be needed. If we use a command interface these could be assigned by the kernel. With a file we will need to rely on the user.
> Also, try to move away from the small thinking or rules we have today. Try That's why I was asking for opinions. Whatever you propose, keep in mind that someone has to implement it. I will gladly help, but I only have enough time to work on a fraction of the final result. So, it has to be better, but reachable. Of course we will need to design with room for improvement. > Another way of doing firewall is to write your rules in some syntax in a > file. Have a userspace program parse it into C-code. Have your gcc compiler > compile it into a kernel module. Load it. This will optimize the firewall > ALOT. Still again, states can be saved between reloads just using some ids > and hooks for the rules that needs a state. This would optimize the simple rules, but the larger ones spend most of their time in code that is already optimized this way (eg: tracking code for state). More than a compiler we would need a rule optimizer. How many people have performance issues with the current system? How much better are other production systems? If you want to think big you could design the new rules like a logic language and let loose any sort of optimization on it. > If you look at the work i began with ippool which was later finished in a I'll try and read it in the next few days. > the algorithm timeout. To just match a single IP source address, combine > the data structure 'single' with data interpreter 'IPv4 address' and This looks like generic programming and good design patterns. The difficult part is exposing this much flexibility to the user in a meaningful way. > Many crazy ideas, so keep your asbestos suit on :-P You too.
:-P -- Saluti, Massimiliano Hofer Nucleus From max at nucleus.it Wed Aug 16 00:57:04 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Wed Aug 16 01:28:03 2006 Subject: new ABI In-Reply-To: <200608151414.24599.simon@parknet.dk> References: <200608142312.41851.max@nucleus.it> <200608151414.24599.simon@parknet.dk> Message-ID: <200608160057.05431.max@nucleus.it> On Tuesday 15 August 2006 2:14 pm, Simon Lodal wrote: > Everybody has a long wishlist and seem to agree that something fundamental > needs to be done. > > The question seems to be when backwards compatibility can be given up. Everyone agrees that we have reached the maximum expressiveness with the current system. Nobody says that we couldn't keep a way to convert old rules to the new system. The real question thus becomes: is it worth restarting from (almost) scratch? > > What people need from any new infrastructure: > > - cleaner interface with clearer separation between kernel and user data; > > - ability to dump internal state of matches/targets (this may not be in a > > 1-to-1 relation, so it may be tricky, do we need module state dumping?); > > Yes, but why should that be hard? Netfilter should already have a list of > registered modules. Yes, but iptables has no way to manipulate per-module data (eg: the collection of names and flags for condition, but there are plenty of other examples). I don't think it would be difficult, even without a total redesign. I was testing the ground for ideas and real needs. > We are going to have "interesting" data that are not 1:1 with rules. But > then they will be 1:1 with modules, or some other "scope" that netfilter Make it n:1. I don't think n:n is desirable. > knows how to traverse. Each "scope" can have their own section in the > iptables-save output. Hence the parsing complexity lies in > iptables-restore. > > Whether it is all going to be exposed in some filesystem or not is a > different matter. I like file interfaces, but not everything readily becomes a file.
It all depends on what people really want to do with this class of data.

> What is the version after it going to be then? No, I never liked the -ng
> suffix :)
>
> What is wrong with iptables2?

OK. We had ipfwadm and ipchains. So we're really more like iptables4. :)

> Flexibility is not free, but perhaps it can be cheap, performance wise.
>
> Let's say we make iptables more shell-like, with the ability to handle
> multiple commands in one invocation (with a final COMMIT command required)?
> Would be lovely in itself.
>
> Then iptables would get a better chance to optimize memory allocation,
> since it is not only looking at one rule at a time.
>
> The case where you load the entire firewall ruleset in one go could be
> optimized to a point where it is no different from today.

This holds if we assume we know the sizes of everything. I think
matches/targets need to have a chance to influence their own data (now they
can't).
We'll have:
- general data structures (fixed);
- match/target descriptor (passed by userspace and of known size);
- match/target runtime data (potentially anything from a single byte to a
  dynamic structure).

Currently matches/targets are fed the descriptor. I'd like them to be fed a
descriptor and their runtime data. We can suppose the latter won't be needed
by every match, so it won't impact performance.
We still have a fixed size data structure that we can move/compact/rewrite
and a descriptor of variable size that we can potentially move (we could
move it if people weren't abusing it for lack of runtime data).
The first one can become a simple allocation in a list node array (with some
mechanism for growing and shrinking). The descriptors are a little more
tricky and we would need stricter specifications in order to do proper
repacking.

Before we continue work on a non-problem: do we have data about kernel
memory fragmentation and performance issues?
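To make the three-part split above a little more concrete, here is a minimal userspace sketch in C. It assumes a fixed core node, a descriptor copied from userspace, and optional module-owned runtime data. Every name in it (`rule_core`, `quota_desc`, `quota_runtime`, `quota_match`) is hypothetical and only illustrates the idea; none of it is existing netfilter code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; none of them exist in the kernel. */

/* Fixed-size core node: holds no pointers into itself, so the core
 * code may move, compact or rewrite it. */
struct rule_core {
	unsigned int id;
	void *desc;       /* descriptor passed by userspace, known size */
	size_t desc_len;
	void *runtime;    /* module-owned state, NULL if not needed */
};

/* Descriptor of a quota-like match: configuration only. */
struct quota_desc {
	unsigned long long limit_bytes;
};

/* Runtime data of the same match: changes as packets are seen. */
struct quota_runtime {
	unsigned long long used_bytes;
};

/* Copy the descriptor and allocate zeroed runtime data if requested.
 * Returns 0 on success, -1 on allocation failure. */
int rule_core_init(struct rule_core *r, unsigned int id,
		   const void *desc, size_t desc_len, size_t runtime_len)
{
	r->id = id;
	r->desc_len = desc_len;
	r->runtime = NULL;
	r->desc = malloc(desc_len);
	if (r->desc == NULL)
		return -1;
	memcpy(r->desc, desc, desc_len);
	if (runtime_len > 0) {
		r->runtime = calloc(1, runtime_len);
		if (r->runtime == NULL) {
			free(r->desc);
			r->desc = NULL;
			return -1;
		}
	}
	return 0;
}

void rule_core_destroy(struct rule_core *r)
{
	free(r->desc);
	free(r->runtime);
	r->desc = NULL;
	r->runtime = NULL;
}

/* The match is fed both its descriptor and its runtime data.
 * Returns 1 while the packet still fits in the quota, 0 otherwise. */
int quota_match(const struct quota_desc *d, struct quota_runtime *rt,
		unsigned long long pkt_len)
{
	if (rt->used_bytes + pkt_len > d->limit_bytes)
		return 0;
	rt->used_bytes += pkt_len;
	return 1;
}
```

Because the core node only carries pointers to the descriptor and the runtime data, copying the node elsewhere (for example while compacting a node array) leaves the match state reachable. A match without state would simply pass 0 as `runtime_len`.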
> * ipt_entry* structs might contain data (like basic src/dst/port/iface
> matches), but they may not keep pointers to anything, not even their own
> fields. They are independent of their own memory location. The memory
> management code can therefore rearrange the tables at will (proper locking
> assumed), without having to reinitialize rules.

Good. I just don't know if this is overdesigned.

> * All other memory is accessed through a struct that is passed to each
> rule/match/target's API functions. It contains at least .instance_data, but
> also .module_data (.priv_data), and perhaps other scopes data,
> like .rule_data, .chain_data and .global_data (all cross-module). Note that
> each of these are bound to a specific entity.

I agree.

> * Each module and instance must call special netfilter API's to allocate
> memory of the required types. The netfilter part handles free'ing through
> refcount (why not).

If we don't have cross-module data (does anyone need it?) each module could
do its own housekeeping. It's difficult to know how to optimize other
people's data.

> * The actual .*_data pointers may change between invocations (packets fed
> to) of the same rule/match/target. This means the netfilter part is allowed
> to rearrange dynamic memory too.

What if people want to keep pointers and other complex data structures?
The instance data should be opaque to the core code. The risk is that
people, not trusting this structure, will use it just to keep a pointer to
the real data.

> * Bonus: Sync of memory regions with other hosts can be handled
> transparently, or at least easily. So that fx. limit rules can work across
> redundant hosts.

Malus: a whole memory management system just for a subsystem of the kernel.

Too much semantics risks limiting what people want to do. Of course anarchy
has drawbacks too. I'd seek a middle ground where we handle the common case
and leave people free to implement exotic new things.
> I have no clear idea how all these individual blobs would be communicated
> between kernel and userspace. Except there are two general options:
>
> 1) The current "pass a large blob" scheme. Since it will contain many
> smaller blobs, some in-kernel parsing is required. Worse yet, the kernel
> must also be able to assemble a large blob in order to dump to userspace.

Either way we'll need some form of rule and match id.
I don't know what level of transactionality is desired. Currently
iptables-restore is atomic and so are single changes with iptables. How much
is needed with the new system?
At least rule level atomicity is certainly desired, so we'll need to create
duplicate data (just the core structure with pointers to the real
descriptors) during modifications.

> > I think every match/target should expode:
> > - init;
> > - destroy;
> > - change;
> > - dump;
> > - restore.
>
> change() would be nice, like in qdisc.

Does it really make sense? How many matches would behave differently when
changed in place instead of going through a full
create-activate_new-destroy cycle?

> Applause for possibly opening the can of worms :)

:)

--
Saluti,
Massimiliano Hofer
Nucleus

From kaber at trash.net Wed Aug 16 13:12:18 2006
From: kaber at trash.net (Patrick McHardy)
Date: Wed Aug 16 13:43:23 2006
Subject: libnetfilter_conntrack checks for (getuid() == 0)
In-Reply-To: <44E20830.6090402@memespace.net>
References: <44E1FFF7.2010103@memespace.net> <44E20094.8080309@trash.net>
	<44E20830.6090402@memespace.net>
Message-ID: <44E2FD92.8080804@trash.net>

Sebastian Hagen wrote:
> Patrick McHardy wrote:
>
>>Please send a patch.
>
> Done, and attached to this mail.

Thanks, applied.
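As an aside on the "new ABI" thread above: the per-match operations discussed there (init, destroy, change, dump, restore) could be sketched as a table of function pointers. The following is only a userspace illustration under that assumption. `struct match_ops`, `struct quota_state` and the `quota_*` functions are hypothetical names, and the quota semantics are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface; not an existing netfilter API. */
struct match_ops {
	const char *name;
	int    (*init)(void **state, const void *cfg);
	void   (*destroy)(void *state);
	int    (*change)(void *state, const void *delta);
	size_t (*dump)(const void *state, void *buf, size_t len);
	int    (*restore)(void **state, const void *buf, size_t len);
};

/* State of a quota-like match: configured limit plus bytes used so far. */
struct quota_state {
	unsigned long long limit;
	unsigned long long used;
};

static int quota_init(void **state, const void *cfg)
{
	struct quota_state *q = calloc(1, sizeof(*q));

	if (q == NULL)
		return -1;
	q->limit = *(const unsigned long long *)cfg;
	*state = q;
	return 0;
}

static void quota_destroy(void *state)
{
	free(state);
}

/* change(): add bytes to the pot without recreating the instance. */
static int quota_change(void *state, const void *delta)
{
	struct quota_state *q = state;

	q->limit += *(const unsigned long long *)delta;
	return 0;
}

/* dump(): copy configuration and current state into buf if it fits;
 * always returns the number of bytes a full dump needs. */
static size_t quota_dump(const void *state, void *buf, size_t len)
{
	if (len >= sizeof(struct quota_state))
		memcpy(buf, state, sizeof(struct quota_state));
	return sizeof(struct quota_state);
}

/* restore(): rebuild an instance from a previous dump. */
static int quota_restore(void **state, const void *buf, size_t len)
{
	struct quota_state *q;

	if (len != sizeof(*q))
		return -1;
	q = malloc(sizeof(*q));
	if (q == NULL)
		return -1;
	memcpy(q, buf, sizeof(*q));
	*state = q;
	return 0;
}

const struct match_ops quota_ops = {
	.name    = "quota",
	.init    = quota_init,
	.destroy = quota_destroy,
	.change  = quota_change,
	.dump    = quota_dump,
	.restore = quota_restore,
};
```

In this sketch a dump carries both the configuration and the current counters, so restore can rebuild an equivalent instance. Whether a separate change() is worth having over a full create-activate-destroy cycle is exactly the open question in the thread.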
From kaber at trash.net Wed Aug 16 13:31:06 2006 From: kaber at trash.net (Patrick McHardy) Date: Wed Aug 16 14:04:15 2006 Subject: possible Bug in ip_conntrack In-Reply-To: <1155630570.44e185ea45986@www.domainfactory-webmail.de> References: <1155545885.44e03b1dcec01@www.domainfactory-webmail.de> <44E0719C.9050307@trash.net> <1155630570.44e185ea45986@www.domainfactory-webmail.de> Message-ID: <44E301FA.9020608@trash.net> Maik Hentsche wrote: > Zitat von Patrick McHardy : > > >>Can you test current -git please? We had some changes in that area .. >>I don't recall any explicit bugfixes, but who knows .. > > > Unfortunatelly, it still occurs with -git. Can you try this patch please? It should fix the problem. -------------- next part -------------- [NETFILTER]: ctnetlink: fix deadlock in table dumping ip_conntrack_put must not be called while holding ip_conntrack_lock since destroy_conntrack takes it again. Signed-off-by: Patrick McHardy --- commit 8ebd6bb0f469f2759f39e73adee6916a3d975393 tree ebd73ee261508d483654416b03610910a2968e21 parent 338fe5c67e8fb799c9e3470331db6f3c60a31b1e author Patrick McHardy Wed, 16 Aug 2006 13:32:27 +0200 committer Patrick McHardy Wed, 16 Aug 2006 13:32:27 +0200 net/ipv4/netfilter/ip_conntrack_netlink.c | 17 +++++++---------- net/netfilter/nf_conntrack_netlink.c | 17 +++++++---------- 2 files changed, 14 insertions(+), 20 deletions(-) diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 33891bb..0d4cc92 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -415,21 +415,18 @@ ctnetlink_dump_table(struct sk_buff *skb cb->args[0], *id); read_lock_bh(&ip_conntrack_lock); + last = (struct ip_conntrack *)cb->args[1]; for (; cb->args[0] < ip_conntrack_htable_size; cb->args[0]++) { restart: - last = (struct ip_conntrack *)cb->args[1]; list_for_each_prev(i, &ip_conntrack_hash[cb->args[0]]) { h = (struct ip_conntrack_tuple_hash *) i; if 
(DIRECTION(h) != IP_CT_DIR_ORIGINAL) continue; ct = tuplehash_to_ctrack(h); - if (last != NULL) { - if (ct == last) { - ip_conntrack_put(last); - cb->args[1] = 0; - last = NULL; - } else + if (cb->args[1]) { + if (ct != last) continue; + cb->args[1] = 0; } if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, @@ -440,17 +437,17 @@ restart: goto out; } } - if (last != NULL) { - ip_conntrack_put(last); + if (cb->args[1]) { cb->args[1] = 0; goto restart; } } out: read_unlock_bh(&ip_conntrack_lock); + if (last) + ip_conntrack_put(last); DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id); - return skb->len; } diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index af48459..6527d4e 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -429,9 +429,9 @@ ctnetlink_dump_table(struct sk_buff *skb cb->args[0], *id); read_lock_bh(&nf_conntrack_lock); + last = (struct nf_conn *)cb->args[1]; for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) { restart: - last = (struct nf_conn *)cb->args[1]; list_for_each_prev(i, &nf_conntrack_hash[cb->args[0]]) { h = (struct nf_conntrack_tuple_hash *) i; if (DIRECTION(h) != IP_CT_DIR_ORIGINAL) @@ -442,13 +442,10 @@ restart: * then dump everything. 
*/ if (l3proto && L3PROTO(ct) != l3proto) continue; - if (last != NULL) { - if (ct == last) { - nf_ct_put(last); - cb->args[1] = 0; - last = NULL; - } else + if (cb->args[1]) { + if (ct != last) continue; + cb->args[1] = 0; } if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, @@ -459,17 +456,17 @@ restart: goto out; } } - if (last != NULL) { - nf_ct_put(last); + if (cb->args[1]) { cb->args[1] = 0; goto restart; } } out: read_unlock_bh(&nf_conntrack_lock); + if (last) + nf_ct_put(last); DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id); - return skb->len; } From gozem at gozem.se Wed Aug 16 14:16:53 2006 From: gozem at gozem.se (Joakim Axelsson) Date: Wed Aug 16 14:48:00 2006 Subject: new ABI In-Reply-To: <200608142312.41851.max@nucleus.it> References: <200608142312.41851.max@nucleus.it> Message-ID: <20060816121653.GA31235@kriss.csbnet.se> 2006-08-14 23:12:41+0200, Massimiliano Hofer -> > I think the current array oriented data structures won't allow us to add these > features. RCU lists come to mind. It sure is a step back in performance > (sparser access to memory and more memory fragmentation), but it may not be > that noticeable. > I think every match/target should expode: > - init; > - destroy; > - change; > - dump; > - restore. > match()/target() I had some more realistic ideas that i'd like to share. We keep the idea of rules (rather than essay, compiled or other form). To make the new implementation as easy and as fast as possible i think using XML to express the firewall is a good way to go. Now before you turn on all your negatives, here's why: - We want the old (todays) iptables to be compliant. - We want an easy library that implements the ABI to kernel. - We want to make the smallest possible effort writing a userspace tool. Leaving the more advacned for others / other projects. XML already has good parsers. We can very easily rewrite todays iptables to output XML. XML has no real limits on how to express things. 
Of course, other future userspace programs can use the new ABI library
directly, without the XML parser. However, for those who don't want to learn
the new library, XML is very easy. Both for humans, scripts and programs.

So we need one userspace library that talks the ABI with the kernel. We also
need a library using this ABI that parses XML files and passes them to the
ABI library. Finally, other userspace tools (that we do not need to write)
dump from the kernel via kernel->ABI->XML and push XML rules.

You can now easily write a pre-parser later that takes firewall config in
"programming" form:
Example:

if (packet.ipv4.source = 1.2.3.4 AND limit(2/s)) then
  LOG(log-prefix, ...)
  jump(other-chain)
  DROP
end if;

The above might be much easier for a newbie to use. The best thing is that
we are pushing the "need" to write these kinds of tools to others as
separate projects. The common format is XML. No need for complex parsers and
tricks using getopt().

iptables is now easily ported like:
iptables <-> XML <-> ABI <-> kernel

Also keep in mind that we can allow several targets. In another mail I
previously talked about actions. They're not needed, I think, but might make
it easier to distinguish between jumps, ending targets and targets that just
change or log. It should be perfectly legal already today to say:

iptables -m match -j other_chain -j LOG -j other_chain2 -j DROP -j TTL

Now, the last TTL will never be executed, but that's a user config choice.
The semantics of the above can't be misunderstood.

For kernel-space:
-------------------

I think the above is good. Perhaps we don't need restore as it can be done
with dump. Only that dump must dump both initial state/config and current
state.

- init()
- destroy()
- dump() / restore()
- change()
- match() / target()

Very important is change(), which for example the recent match can use to
remove or add IPs in any recent list. There are several other matches that
can use this. Quota for example. Add or remove bytes in the pot.
Far more complex matches like ipset can use this as well.

Also important is that each instance of a match/target in the kernel needs
its own memory space. Either allocated by the iptables code or just a
pointer hook where the module can allocate and hook in its own data. Just
like priv_data, but actually saved.

I also think an RCU list will help instead of tables/arrays. It's much more
common to add, change or remove a rule than to add them all. I have a router
with some 1000 rules. It's a pain to change one of them. To save cache
misses we can preallocate memory and use it. Meaning use slabs for each list
element. We can't control the code of the modules, but again, we can't today
either.

Another thing worth adding is a way of grouping rules. "These set of rules
belong to customer 1 and these to customer 2. And i'd like to only list all
rules related to customer 1". Now there are two approaches to implement
this. The first one is to tag the rule with what rule or which group it
belongs to. Another way is to create sub-tables.

Also a good way of "finding your way" to either group of rules is needed.
Today I have a router with 4096 IPs (students' computers) behind it. Each IP
needs its own chain of rules. I won't go into why, but trust me, it's needed
and the only flexible way that I have found. I have created a sort of binary
tree of rules trying to make the access of each customer as painless as
possible. This area needs work as well, I think. I've seen many people
asking if this exists. A simple solution might be to allow custom(?) modules
that implement different forms of jumping. Which group or chain do you want
to jump to? Well:

iptables -m match -J ipmapjump (notice the big -J)

The recent "goto" jump also fits here.

Next, try to design this new iptables2/-ng so we don't need iptables3 in the
future. Rather add one too many unused hooks or void * parameters than one
too few. Design this so we can have pkttables.
Meaning no need for separate tables for iptables, ip6tables, arptables,
ebtables. All in one. It's not that hard really. Just perhaps a few more
tables (nat, mangle, raw, filter, bridge etc.) and move even the basic
matching like source and dest address into modules.

Summary:
+ Use XML to express firewall rules. Because it's easy and backward
  compatibility will be easily ported. It fits both human-written and
  scripted rules. The tool is already there in tons of places.
+ init, destroy, dump/restore, change, match/target are needed as
  implemented functions (or NULLs) for the matches / targets.
+ Use an RCU list in the kernel. Because it's more editable.
+ Have smart ways of allocating memory in the kernel (slabs).
+ Allow several targets for one rule.
+ Perhaps separate ending targets from non-ending ones and jumps.
+ Allow custom jump modules, besides match and target modules.
+ Allow grouping of rules in some way. Really large firewalls need this.
+ We would rather have one too many unused hooks/void * than one too few
  for future use. It won't waste that much memory.
+ Design all this into pkttables rather than focus on IP/IPv6.

Thanks for your time reading this far :-)

--
Joakim Axelsson

From gozem at gozem.se Wed Aug 16 14:29:41 2006
From: gozem at gozem.se (Joakim Axelsson)
Date: Wed Aug 16 15:00:45 2006
Subject: new ABI
In-Reply-To: <20060816121653.GA31235@kriss.csbnet.se>
References: <200608142312.41851.max@nucleus.it>
	<20060816121653.GA31235@kriss.csbnet.se>
Message-ID: <20060816122941.GB31235@kriss.csbnet.se>

2006-08-16 14:16:53+0200, Joakim Axelsson ->
> We keep the idea of rules (rather than essay, compiled or other form). To
> make the new implementation as easy and as fast as possible i think using
> XML to express the firewall is a good way to go. Now before you turn on all
> your negatives, here's why:
>
> - We want the old (todays) iptables to be compliant.
> - We want an easy library that implements the ABI to kernel.
> - We want to make the smallest possible effort writing a userspace tool.
> Leaving the more advacned for others / other projects.
>
> XML already has good parsers. We can very easily rewrite todays iptables to
> output XML. XML has no real limits on how to express things. Ofcourse can
> other future userspace program use the new ABI-library directly, not using
> the XML-parser. However, for those who doesn't want to learn the new library
> XML is very easy. Both for humans, scripts and programs.
>
> So we need one userspace library that talks the ABI with kernel. We also
> need a library using this ABI that parses XML-files and passes them to the
> ABI-library. Finally other userspace tools (that we do not need to write)
> that dumps from kernel passed by the kernel->ABI->XML and pushes XML rules.
>
> iptables is now easily ported like:
> iptables <-> XML <-> ABI <-> kernel
>

To already demonstrate the greatness of XML: I forgot to add in my previous
mail the need/wish for being able to list all state data of one module. For
example I have a module just making counters, represented in /proc. I can
easily list them all by a simple "grep *" (gives both filename and content).
With XML this extra isn't needed. As long as I can get the full kernel
config and state in XML, I can apply my favorite XML parser and figure out
the data I need. :-)

--
Joakim Axelsson

From netfilter at mm-double.de Wed Aug 16 16:29:02 2006
From: netfilter at mm-double.de (Maik Hentsche)
Date: Wed Aug 16 17:00:09 2006
Subject: possible Bug in ip_conntrack
In-Reply-To: <44E301FA.9020608@trash.net>
References: <1155545885.44e03b1dcec01@www.domainfactory-webmail.de>
	<44E0719C.9050307@trash.net>
	<1155630570.44e185ea45986@www.domainfactory-webmail.de>
	<44E301FA.9020608@trash.net>
Message-ID: <1155738542.44e32baec1521@www.domainfactory-webmail.de>

Quoting Patrick McHardy:

> Can you try this patch please? It should fix the problem.

Done. The bug did not reappear. Thanks for your fast reaction.
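For readers following the ctnetlink deadlock thread, the rule behind the patch confirmed here is that the final reference must not be dropped while the table lock is held, because destruction takes the same lock again. Below is a small single-threaded userspace model of that rule. It is a sketch with hypothetical names (`table_lock`, `entry_put`, `dump_table`), not the kernel code, and the refcount models only the reference held by the dumper.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins with hypothetical names; not kernel code. */
static int table_locked;      /* models a non-recursive table lock */
static int destroyed_count;   /* how many entries were destroyed   */

static void table_lock(void)
{
	assert(!table_locked);    /* taking the lock twice would deadlock */
	table_locked = 1;
}

static void table_unlock(void)
{
	assert(table_locked);
	table_locked = 0;
}

struct entry {
	int refcnt;
};

/* Destroying an entry needs the table lock, as destroy_conntrack does. */
static void entry_destroy(struct entry *e)
{
	(void)e;
	table_lock();
	destroyed_count++;
	table_unlock();
}

static void entry_put(struct entry *e)
{
	if (--e->refcnt == 0)
		entry_destroy(e);
}

/* Resume a dump at 'last' (may be NULL). The reference held on 'last'
 * is dropped only after the lock has been released.
 * Returns the number of entries dumped. */
int dump_table(struct entry **tbl, size_t n, struct entry *last)
{
	int skipping = (last != NULL);
	int dumped = 0;
	size_t i;

	table_lock();
	for (i = 0; i < n; i++) {
		if (skipping) {
			if (tbl[i] != last)
				continue;   /* not yet at the resume point */
			skipping = 0;
		}
		dumped++;
	}
	table_unlock();

	if (last != NULL)
		entry_put(last);    /* safe: the lock is no longer held */
	return dumped;
}
```

Moving the `entry_put(last)` call inside the locked region would trip the assertion in `table_lock()`, which is the single-threaded analogue of the deadlock described in the patch.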
so long
Maik

From gozem at gozem.se Wed Aug 16 16:40:19 2006
From: gozem at gozem.se (Joakim Axelsson)
Date: Wed Aug 16 17:11:21 2006
Subject: new ABI
In-Reply-To: <20060816121653.GA31235@kriss.csbnet.se>
References: <200608142312.41851.max@nucleus.it>
	<20060816121653.GA31235@kriss.csbnet.se>
Message-ID: <20060816144019.GC31235@kriss.csbnet.se>

And some more. About logging, debug and hit counters.

I think we can remove the hit counters. Allow a separate module to count if
needed. I'm sure most firewalls do not need the counters on every rule. It's
just an expensive waste of locking in the kernel.

Also, people will want to log here and there. Both for the pure logging
purpose but also for the debugging purpose, "does my firewall work?". I
think it would be easier to allow each rule to have three flags for
debugging purposes: log on entering the rule, log on matching the rule and
log on leaving the rule (after targets). This makes it very easy to trace
your firewall config. It should of course be possible to change this setting
without having to remove the rule and re-add it later without debugging.

For general logging (and debugging) we should remove -j LOG. The parsing of
the packet layout is something for userspace. Debugging only has a
"reserved" netlink channel.

There is one setback in doing this: if the machine gets DoS-attacked, all
the logging will be more or less disabled, as the kernel uses all of the
available CPU, and you get nothing to help figure out the attack vector (for
countering firewall rules).

--
Joakim Axelsson

From bsnyder at idirect.net Wed Aug 16 17:20:15 2006
From: bsnyder at idirect.net (Snyder, Brian)
Date: Wed Aug 16 17:54:22 2006
Subject: Question from a newbie about libraries needed to get started
Message-ID:

Hi all.

Thanks in advance to anyone who takes the time to read this (and hopefully
replies;).

Anyway, I'm a new developer in the netfilter world and have a few basic
setup questions.
I've got a good handle on iptables and how to utilize that to get desired
packets queued into a userspace program. My problem lies with how to go
about developing said userspace program.

I'm running on a Fedora Core 5 box and I have iptables, iproute2 and brctl
installed by default, so I didn't have to go out of my way to get any
userspace applications. My problem is I don't have libipq.a or libipq.h
anywhere in my system.

So I went about searching for it, and here is where I am very confused. Via
tons of online FAQs and howtos I see references to the following, all of
which I've downloaded:

libnetfilter_queue-0.0.12
libnfnetlink-0.0.16
and
iptables-1.3.5-20060815 (yesterday's nightly snapshot)

I guess I am just a bit confused as to what the libnetfilter_queue and
libnfnetlink libraries are for? If libipq is part of iptables as well, what
do we need those other libraries for? I feel like I'm missing something
fundamental here...

So it seems like there are a couple of ways to get libipq on a development
system and I'm not sure which package I actually want to use and install. I
already have iptables, but I can always 'redo' that if I need to in order to
get the development libraries/headers needed.

Anyway, from a sys-admin point of view, is there any advice anyone could
give me as to what packages/rpms/etc I actually want to install to get a
development system going?

Thanks again,
brian

From kaber at trash.net Thu Aug 17 00:10:22 2006
From: kaber at trash.net (Patrick McHardy)
Date: Thu Aug 17 00:41:35 2006
Subject: [NETFILTER]: ctnetlink: fix deadlock in table dumping
Message-ID: <44E397CE.3030808@trash.net>

Fix a deadlock in ctnetlink (introduced early in 2.6.18-rc by myself).
Please apply to 2.6.18.

-------------- next part --------------
[NETFILTER]: ctnetlink: fix deadlock in table dumping

ip_conntrack_put must not be called while holding ip_conntrack_lock
since destroy_conntrack takes it again.
Signed-off-by: Patrick McHardy --- commit 8ebd6bb0f469f2759f39e73adee6916a3d975393 tree ebd73ee261508d483654416b03610910a2968e21 parent 338fe5c67e8fb799c9e3470331db6f3c60a31b1e author Patrick McHardy Wed, 16 Aug 2006 13:32:27 +0200 committer Patrick McHardy Wed, 16 Aug 2006 13:32:27 +0200 net/ipv4/netfilter/ip_conntrack_netlink.c | 17 +++++++---------- net/netfilter/nf_conntrack_netlink.c | 17 +++++++---------- 2 files changed, 14 insertions(+), 20 deletions(-) diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 33891bb..0d4cc92 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -415,21 +415,18 @@ ctnetlink_dump_table(struct sk_buff *skb cb->args[0], *id); read_lock_bh(&ip_conntrack_lock); + last = (struct ip_conntrack *)cb->args[1]; for (; cb->args[0] < ip_conntrack_htable_size; cb->args[0]++) { restart: - last = (struct ip_conntrack *)cb->args[1]; list_for_each_prev(i, &ip_conntrack_hash[cb->args[0]]) { h = (struct ip_conntrack_tuple_hash *) i; if (DIRECTION(h) != IP_CT_DIR_ORIGINAL) continue; ct = tuplehash_to_ctrack(h); - if (last != NULL) { - if (ct == last) { - ip_conntrack_put(last); - cb->args[1] = 0; - last = NULL; - } else + if (cb->args[1]) { + if (ct != last) continue; + cb->args[1] = 0; } if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, @@ -440,17 +437,17 @@ restart: goto out; } } - if (last != NULL) { - ip_conntrack_put(last); + if (cb->args[1]) { cb->args[1] = 0; goto restart; } } out: read_unlock_bh(&ip_conntrack_lock); + if (last) + ip_conntrack_put(last); DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id); - return skb->len; } diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index af48459..6527d4e 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -429,9 +429,9 @@ ctnetlink_dump_table(struct sk_buff *skb cb->args[0], *id); 
read_lock_bh(&nf_conntrack_lock); + last = (struct nf_conn *)cb->args[1]; for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) { restart: - last = (struct nf_conn *)cb->args[1]; list_for_each_prev(i, &nf_conntrack_hash[cb->args[0]]) { h = (struct nf_conntrack_tuple_hash *) i; if (DIRECTION(h) != IP_CT_DIR_ORIGINAL) @@ -442,13 +442,10 @@ restart: * then dump everything. */ if (l3proto && L3PROTO(ct) != l3proto) continue; - if (last != NULL) { - if (ct == last) { - nf_ct_put(last); - cb->args[1] = 0; - last = NULL; - } else + if (cb->args[1]) { + if (ct != last) continue; + cb->args[1] = 0; } if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, @@ -459,17 +456,17 @@ restart: goto out; } } - if (last != NULL) { - nf_ct_put(last); + if (cb->args[1]) { cb->args[1] = 0; goto restart; } } out: read_unlock_bh(&nf_conntrack_lock); + if (last) + nf_ct_put(last); DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id); - return skb->len; } From mbr at cipherdyne.org Thu Aug 17 05:16:07 2006 From: mbr at cipherdyne.org (Michael Rash) Date: Thu Aug 17 05:47:21 2006 Subject: [PATCH] Boyer Moore textsearch bug fix Message-ID: <20060817031607.GA7484@minastirith> Hi - The patch below fixes Bugzilla #501: The compute_prefix_tbl() function in lib/ts_bm.c is called before bm->pattern is initialized, and this results in the following issue. 
If the rule below is put within the OUTPUT chain (note the slightly
repetitive pattern "aaabbbccc" which I think is necessary to expose the fact
that the good_shift array is not getting populated correctly):

iptables -I OUTPUT -p tcp --dport 80 -m string --string "aaabbbccc" \
    --algo bm -j LOG --log-prefix "bm "

...then issuing the following commands fails to match the rule (no log
message is generated):

echo "1aaabbbccc" |nc 80
echo "12aaabbbccc" |nc 80
echo "1234aaabbbccc" |nc 80

...but these do match:

echo "aaabbbccc" |nc 80
echo "123aaabbbccc" |nc 80

--
Michael Rash
http://www.cipherdyne.org/
Key fingerprint = 53EA 13EA 472E 3771 894F AC69 95D8 5D6B A742 839F

--- linux-2.6.17.8/lib/ts_bm.c.orig	2006-08-16 21:17:38.000000000 -0400
+++ linux-2.6.17.8/lib/ts_bm.c	2006-08-16 21:17:56.000000000 -0400
@@ -151,8 +151,8 @@
 	bm = ts_config_priv(conf);
 	bm->patlen = len;
 	bm->pattern = (u8 *) bm->good_shift + prefix_tbl_len;
-	compute_prefix_tbl(bm, pattern, len);
 	memcpy(bm->pattern, pattern, len);
+	compute_prefix_tbl(bm, pattern, len);
 
 	return conf;
 }

From kaber at trash.net Thu Aug 17 15:39:14 2006
From: kaber at trash.net (Patrick McHardy)
Date: Thu Aug 17 16:10:36 2006
Subject: [PATCH] Boyer Moore textsearch bug fix
In-Reply-To: <20060817031607.GA7484@minastirith>
References: <20060817031607.GA7484@minastirith>
Message-ID: <44E47182.7010901@trash.net>

Michael Rash wrote:
> --- linux-2.6.17.8/lib/ts_bm.c.orig	2006-08-16 21:17:38.000000000 -0400
> +++ linux-2.6.17.8/lib/ts_bm.c	2006-08-16 21:17:56.000000000 -0400
> @@ -151,8 +151,8 @@
>  	bm = ts_config_priv(conf);
>  	bm->patlen = len;
>  	bm->pattern = (u8 *) bm->good_shift + prefix_tbl_len;
> -	compute_prefix_tbl(bm, pattern, len);
>  	memcpy(bm->pattern, pattern, len);
> +	compute_prefix_tbl(bm, pattern, len);

Good catch, thanks.
But since both pattern and len are also passed to compute_prefix_tbl as arguments, I think we should either make it use only those arguments, or remove those arguments and use only the values from struct ts_bm. Please send a new patch and add a Signed-off-by line so I can apply it. Thanks. From pablo at netfilter.org Thu Aug 17 16:25:23 2006 From: pablo at netfilter.org (Pablo Neira Ayuso) Date: Thu Aug 17 16:51:13 2006 Subject: [PATCH] Boyer Moore textsearch bug fix In-Reply-To: <44E47182.7010901@trash.net> References: <20060817031607.GA7484@minastirith> <44E47182.7010901@trash.net> Message-ID: <44E47C53.3010803@netfilter.org> Patrick McHardy wrote: > Michael Rash wrote: > >>--- linux-2.6.17.8/lib/ts_bm.c.orig 2006-08-16 21:17:38.000000000 -0400 >>+++ linux-2.6.17.8/lib/ts_bm.c 2006-08-16 21:17:56.000000000 -0400 >>@@ -151,8 +151,8 @@ >> bm = ts_config_priv(conf); >> bm->patlen = len; >> bm->pattern = (u8 *) bm->good_shift + prefix_tbl_len; >>- compute_prefix_tbl(bm, pattern, len); >> memcpy(bm->pattern, pattern, len); >>+ compute_prefix_tbl(bm, pattern, len); > > > > Good catch, thanks. But since both pattern and len are also passed > to compute_prefix_tbl as arguments, I think we should either make > it use only those arguments, or remove those arguments and use only > the values from struct ts_bm. Damn, that is my fault, thanks for the catch Michael. > Please send a new patch and add a Signed-off-by line so I can > apply it. Thanks. Patrick, do you plan to pass this patch to -stable as well? 
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

From pablo at netfilter.org Thu Aug 17 16:28:26 2006
From: pablo at netfilter.org (Pablo Neira Ayuso)
Date: Thu Aug 17 16:54:17 2006
Subject: [RFC,ANNOUNCE] conntrack daemon (stateful replication)
In-Reply-To: <20060811083020.GW29423@edu.joroinen.fi>
References: <447A1FB7.5080709@netfilter.org>
	<20060530065804.GA24166@kruemel.my-eitzenberger.de>
	<20060811083020.GW29423@edu.joroinen.fi>
Message-ID: <44E47D0A.7090900@netfilter.org>

Pasi Kärkkäinen wrote:
> Hello!
>
> Any updates to these projects? Files available somewhere? I think many
> people from this list would like to test and help with these daemons..

The only thing available at the moment:

http://people.netfilter.org/pablo/conntrackd/

I know, this requires appropriate documentation and a webpage, but I'm
working on it. Any help on the webpage (something simple) could come in
handy.

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

From kaber at trash.net Thu Aug 17 16:27:45 2006
From: kaber at trash.net (Patrick McHardy)
Date: Thu Aug 17 17:01:02 2006
Subject: [PATCH] Boyer Moore textsearch bug fix
In-Reply-To: <44E47C53.3010803@netfilter.org>
References: <20060817031607.GA7484@minastirith> <44E47182.7010901@trash.net>
	<44E47C53.3010803@netfilter.org>
Message-ID: <44E47CE1.4040507@trash.net>

Pablo Neira Ayuso wrote:
> Patrick, do you plan to pass this patch to -stable as well?

Not yet, but I can do that.
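The ordering issue in the Boyer-Moore thread above (the shift table being computed before the pattern has been copied into place) can be illustrated with a small userspace sketch. This is not the kernel's lib/ts_bm.c. It is a simplified Horspool variant that uses only a bad-character table, and `struct bm`, `bm_init` and `bm_find` are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <string.h>

#define ASIZE 256
#define MAX_PATLEN 64

/* Simplified stand-in for struct ts_bm; the layout is an assumption. */
struct bm {
	unsigned char pattern[MAX_PATLEN];
	size_t patlen;
	size_t bad_shift[ASIZE];
};

/* Build the bad-character table from the pattern stored in 'bm'.
 * The stored copy must already be in place when this runs. */
static void compute_bad_shift(struct bm *bm)
{
	size_t i;

	for (i = 0; i < ASIZE; i++)
		bm->bad_shift[i] = bm->patlen;
	for (i = 0; i + 1 < bm->patlen; i++)
		bm->bad_shift[bm->pattern[i]] = bm->patlen - 1 - i;
}

/* Copy first, compute second. Returns 0 on success, -1 on bad length. */
int bm_init(struct bm *bm, const void *pattern, size_t len)
{
	if (len == 0 || len > MAX_PATLEN)
		return -1;
	bm->patlen = len;
	memcpy(bm->pattern, pattern, len);
	compute_bad_shift(bm);
	return 0;
}

/* Horspool search: returns the offset of the first match, or -1. */
long bm_find(const struct bm *bm, const void *text, size_t n)
{
	const unsigned char *t = text;
	size_t pos = 0;

	if (n < bm->patlen)
		return -1;
	while (pos <= n - bm->patlen) {
		if (memcmp(t + pos, bm->pattern, bm->patlen) == 0)
			return (long)pos;
		/* shift by the table entry of the last window byte */
		pos += bm->bad_shift[t[pos + bm->patlen - 1]];
	}
	return -1;
}
```

Since `compute_bad_shift()` reads only from the stored copy, swapping the `memcpy()` and `compute_bad_shift()` calls in `bm_init()` would build the table from whatever bytes happened to be in the buffer. That is the same class of mistake the patch in this thread fixes.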
From mbr at cipherdyne.org Thu Aug 17 16:46:54 2006 From: mbr at cipherdyne.org (Michael Rash) Date: Thu Aug 17 17:18:06 2006 Subject: [PATCH] Boyer Moore textsearch bug fix In-Reply-To: <44E47182.7010901@trash.net> References: <20060817031607.GA7484@minastirith> <44E47182.7010901@trash.net> Message-ID: <20060817144654.GA7849@minastirith> On Aug 17, 2006, Patrick McHardy wrote: > Michael Rash wrote: > > --- linux-2.6.17.8/lib/ts_bm.c.orig 2006-08-16 21:17:38.000000000 -0400 > > +++ linux-2.6.17.8/lib/ts_bm.c 2006-08-16 21:17:56.000000000 -0400 > > @@ -151,8 +151,8 @@ > > bm = ts_config_priv(conf); > > bm->patlen = len; > > bm->pattern = (u8 *) bm->good_shift + prefix_tbl_len; > > - compute_prefix_tbl(bm, pattern, len); > > memcpy(bm->pattern, pattern, len); > > + compute_prefix_tbl(bm, pattern, len); > > > Good catch, thanks. But since both pattern and len are also passed > to compute_prefix_tbl as arguments, I think we should either make > it use only those arguments, or remove those arguments and use only > the values from struct ts_bm. > > Please send a new patch and add a Signed-off-by line so I can > apply it. Thanks. Got it. Here is a new patch. Thanks. 
--Mike

Signed-off-by: Michael Rash

--- linux-2.6.17.8/lib/ts_bm.c.orig	2006-08-16 21:17:38.000000000 -0400
+++ linux-2.6.17.8/lib/ts_bm.c	2006-08-17 10:35:25.000000000 -0400
@@ -112,15 +112,14 @@
 	return ret;
 }
 
-static void compute_prefix_tbl(struct ts_bm *bm, const u8 *pattern,
-			       unsigned int len)
+static void compute_prefix_tbl(struct ts_bm *bm)
 {
 	int i, j, g;
 
 	for (i = 0; i < ASIZE; i++)
-		bm->bad_shift[i] = len;
-	for (i = 0; i < len - 1; i++)
-		bm->bad_shift[pattern[i]] = len - 1 - i;
+		bm->bad_shift[i] = bm->patlen;
+	for (i = 0; i < bm->patlen - 1; i++)
+		bm->bad_shift[bm->pattern[i]] = bm->patlen - 1 - i;
 
 	/* Compute the good shift array, used to match reocurrences
 	 * of a subpattern */
@@ -151,8 +150,8 @@
 	bm = ts_config_priv(conf);
 	bm->patlen = len;
 	bm->pattern = (u8 *) bm->good_shift + prefix_tbl_len;
-	compute_prefix_tbl(bm, pattern, len);
 	memcpy(bm->pattern, pattern, len);
+	compute_prefix_tbl(bm);
 
 	return conf;
 }

From kaber at trash.net Thu Aug 17 17:43:37 2006
From: kaber at trash.net (Patrick McHardy)
Date: Thu Aug 17 18:34:48 2006
Subject: [PATCH] Boyer Moore textsearch bug fix
In-Reply-To: <20060817144654.GA7849@minastirith>
References: <20060817031607.GA7484@minastirith> <44E47182.7010901@trash.net> <20060817144654.GA7849@minastirith>
Message-ID: <44E48EA9.8070908@trash.net>

Michael Rash wrote:
> Got it. Here is a new patch. Thanks.

Applied, thanks Michael.

From azez at ufomechanic.net Thu Aug 17 18:28:01 2006
From: azez at ufomechanic.net (Amin Azez)
Date: Thu Aug 17 18:59:24 2006
Subject: lots of oopses
Message-ID: <44E49911.6080007@ufomechanic.net>

Cc: l7-filter-developers and netfilter-devel

I'm getting lots of oopses with 2.6.17.1 SMP, on a 2 cpu (4 core)
machine testing layer7 at 90Mb/s.
(The oops dump worries me; maybe I'm reading it wrong, but it seems to
be a two-column dump of 2 threads, and I have 4 cores.)

I also think these are layer7 related because I can't get them to occur
without a layer7 rule in iptables, although the oops doesn't always
occur in the layer7 module. I also suspect it is related to conntrack
handling.

I have Pablo's recent set of 8 conntrack patches applied, as well as the
recent layer7 regex-smp-safeness and the layer7 concurrency patch. There
are also quite a few other patches for iptables modules not used in this
example ruleset. In the later oops I have reduced the number of loaded
modules significantly and still obtain the oops, although the examples
here do have all the modules loaded.

I generate the oops using packETH and UDP with randomized source
addresses to maximize conntrack creation.

My iptables-save is:

# Generated by iptables-save v1.3.5-20060629 on Tue Aug 15 10:49:39 2006
*raw
:PREROUTING ACCEPT [938:92791]
:OUTPUT ACCEPT [848:491202]
COMMIT
# Completed on Tue Aug 15 10:49:39 2006
# Generated by iptables-save v1.3.5-20060629 on Tue Aug 15 10:49:39 2006
*nat
:PREROUTING ACCEPT [125:17240]
:POSTROUTING ACCEPT [6:372]
:OUTPUT ACCEPT [6:372]
COMMIT
# Completed on Tue Aug 15 10:49:39 2006
# Generated by iptables-save v1.3.5-20060629 on Tue Aug 15 10:49:39 2006
*mangle
:PREROUTING ACCEPT [938:92791]
:INPUT ACCEPT [851:80777]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [848:491202]
:POSTROUTING ACCEPT [848:491202]
-A FORWARD -m layer7 --l7proto edonkey -j RETURN
COMMIT
# Completed on Tue Aug 15 10:49:39 2006
# Generated by iptables-save v1.3.5-20060629 on Tue Aug 15 10:49:39 2006
*filter
:INPUT ACCEPT [851:80777]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [848:491202]
COMMIT
# Completed on Tue Aug 15 10:49:39 2006

The most recent oops where I forgot to unload most of the modules, but
was running my layer7 load monitor crashed at a rule-add just after a
conntrack flush. Layer7 is clearly implicated in this one.
There is another oops further down where this is not the case. + iptables -t mangle -D FORWARD 1 + conntrack -F + iptables -t mangle -A FORWARD [17179991.580000] Oops: 0000 [#1] [17179991.580000] SMP [17179991.580000] Modules linked in: ipt_vlan ipt_connrate ebt_mark ebtable_nat ebtable_filter ebtable_broute ebtables ipt_ULOG ipt_ttl ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_hashlimit ipt_ECN ipt_ecn ipt_DSCP ipt_dscp ipt_ah ipt_addrtype iptable_raw iptable_nat iptable_mangle iptable_filter ip_tables ip_queue cls_fw sch_prio sch_sfq sch_htb crc32 cls_u32 8021q bridge llc ipt_condition ipt_account ipt_recent tg3 e1000 e100 mii ip_conntrack_netlink ip_nat ipt_SET xt_tcpudp xt_tcpmss xt_string xt_state xt_sctp xt_realm xt_policy xt_pkttype xt_physdev xt_multiport xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_dccp xt_conntrack xt_connmark xt_connbytes xt_comment xt_NOTRACK xt_NFQUEUE xt_MARK xt_CONNMARK ip_conntrack nfnetlink xt_CLASSIFY ipt_layer7 ipt_time ipt_set x_tables ip_set_portmap ip_set_nethash ip_set_macipmap ip_set_iptree ip_set_ipporthash ip_set_ipmap ip_set_iphash ip_set ata_piix libata sd_mod scsi_mod [17179991.580000] CPU: 0 [17179991.580000] EIP: 0060:[] Not tainted VLI [17179991.580000] EFLAGS: 00010202 (2.6.17.1-smp-dbamK #34) [17179991.580000] EIP is at match+0x154/0x30a [ipt_layer7] [17179991.580000] eax: 0000021c ebx: e84de9e8 ecx: 0000021c edx: e84de9e8 [17179991.580000] esi: f8b651b8 edi: f88dec96 ebp: f8b651b8 esp: f6053b84 [17179991.580000] ds: 007b es: 007b ss: 0068 [17179991.580000] Process conntrack (pid: 3067, threadinfo=f6052000 task=f7b39540) [17179991.580000] Stack: e84de9e8 ecca0c3c 0000021c 00000000 00000000 f5db2780 e3e912e4 f8b65198 [17179991.580000] f654a000 f8b65128 00000070 f8a2b261 f54fe880 f654a000 f654a000 f88e03a0 [17179991.580000] f8b651b8 00000000 00000014 f6053bf4 00000000 f895e9cc f8b69498 f8b65000 [17179991.580000] Call Trace: [17179991.580000] 
ipt_do_table+0x239/0x4cd [ip_tables] ipt_route_hook+0x37/0x3b [iptable_mangle] [17179991.580000] nf_iterate+0x6f/0xaa br_nf_forward_finish+0x0/0x106 [bridge] [17179991.580000] nf_hook_slow+0x6b/0xf7 br_nf_forward_finish+0x0/0x106 [bridge] [17179991.580000] br_nf_forward_ip+0xe0/0x173 [bridge] br_nf_forward_finish+0x0/0x106 [bridge] [17179991.580000] nf_iterate+0x6f/0xaa br_forward_finish+0x0/0x5b [bridge] [17179991.580000] nf_hook_slow+0x6b/0xf7 br_forward_finish+0x0/0x5b [bridge] [17179991.580000] __br_forward+0x57/0x6e [bridge] br_forward_finish+0x0/0x5b [bridge] [17179991.580000] br_handle_frame_finish+0xe8/0x138 [bridge] br_nf_pre_routing_finish+0x135/0x322 [bridge] [17179991.580000] br_handle_frame_finish+0x0/0x138 [bridge] ip_nat_in+0x43/0xb0 [iptable_nat] [17179991.580000] br_nf_pre_routing_finish+0x0/0x322 [bridge] nf_iterate+0x6f/0xaa [17179991.580000] br_nf_pre_routing_finish+0x0/0x322 [bridge] nf_hook_slow+0x6b/0xf7 [17179991.580000] br_nf_pre_routing_finish+0x0/0x322 [bridge] br_nf_pre_routing+0x264/0x3e4 [bridge] [17179991.580000] br_nf_pre_routing_finish+0x0/0x322 [bridge] nf_iterate+0x6f/0xaa [17179991.580000] br_handle_frame_finish+0x0/0x138 [bridge] nf_hook_slow+0x6b/0xf7 [17179991.580000] br_handle_frame_finish+0x0/0x138 [bridge] br_handle_frame+0x122/0x1d5 [bridge] [17179991.580000] br_handle_frame_finish+0x0/0x138 [bridge] netif_receive_skb+0x1b8/0x3ee [17179991.580000] process_backlog+0x84/0x109 net_rx_action+0x8e/0x15f [17179991.580000] __do_softirq+0xc2/0xd4 do_softirq+0x32/0x34 [17179991.580000] do_IRQ+0x3b/0x66 common_interrupt+0x1a/0x20 [17179991.580000] Code: 98 03 8e f8 00 00 00 01 8b 83 58 01 00 00 85 c0 0f 84 31 01 00 00 8b 54 24 30 80 7a 30 00 0f 84 ee 00 00 00 bf 96 ec 8d f8 89 ee ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 c7 44 24 [17179991.580000] EIP: [] match+0x154/0x30a [ipt_layer7] SS:ESP 0068:f6053b84 [17179991.580000] <0>Kernel panic - not syncing: Fatal exception in interrupt [17179991.580000] Layer7 is not obviously 
implicated in this one, but I think it is related. [17183769.024000] Oops: 0000 [#1] [17183769.024000] SMP [17183769.024000] Modules linked in: ipt_vlan ipt_connrate ebt_mark ebtable_nat ebtable_filter ebtable_broute ebtables ipt_ULOG ipt_ttl ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_hashlimit ipt_ECN ipt_ecn ipt_DSCP ipt_dscp ipt_ah ipt_addrtype iptable_raw iptable_nat iptable_mangle iptable_filter ip_tables ip_queue cls_fw sch_prio sch_sfq sch_htb crc32 cls_u32 8021q bridge llc ipt_condition ipt_account ipt_recent tg3 e1000 e100 mii ip_conntrack_netlink ip_nat ipt_SET xt_tcpudp xt_tcpmss xt_string xt_state xt_sctp xt_realm xt_policy xt_pkttype xt_physdev xt_multiport xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_dccp xt_conntrack xt_connmark xt_connbytes xt_comment xt_NOTRACK xt_NFQUEUE xt_MARK xt_CONNMARK ip_conntrack nfnetlink xt_CLASSIFY ipt_layer7 ipt_time ipt_set x_tables ip_set_portmap ip_set_nethash ip_set_macipmap ip_set_iptree ip_set_ipporthash ip_set_ipmap ip_set_iphash ip_set ata_piix libata sd_mod scsi_mod [17183769.024000] CPU: 0 [17183769.024000] EIP: 0060:[] Not tainted VLI [17183769.024000] EFLAGS: 00010286 (2.6.17.1-smp-dbamK #34) [17183769.024000] EIP is at ipt_do_table+0xd1/0x4cd [ip_tables] [17183769.024000] eax: f8b8e2e8 ebx: f8ba5b48 ecx: 00000000 edx: 00000001 [17183769.024000] esi: f700d000 edi: f8ba7c90 ebp: 00000070 esp: f7e87bb4 [17183769.024000] ds: 007b es: 007b ss: 0068 [17183769.024000] Process ksoftirqd/0 (pid: 3, threadinfo=f7e86000 task=f7e1a070) [17183769.024000] Stack: f4ff9a80 f700d000 f700d000 f88e03a0 f8ba5b68 00000000 00000014 f7e87bf4 [17183769.024000] 00000000 f895e9cc f8b8e2e8 f8b8e000 f700d000 f700d000 00000000 e1e56420 [17183769.024000] 00000000 f895ea18 f7e87c74 80000000 c04ba010 f895e037 f7e87cac 00000002 [17183769.024000] Call Trace: [17183769.024000] ipt_route_hook+0x37/0x3b [iptable_mangle] nf_iterate+0x6f/0xaa [17183769.024000] 
br_nf_forward_finish+0x0/0x106 [bridge] nf_hook_slow+0x6b/0xf7 [17183769.024000] br_nf_forward_finish+0x0/0x106 [bridge] br_nf_forward_ip+0xe0/0x173 [bridge] [17183769.024000] br_nf_forward_finish+0x0/0x106 [bridge] nf_iterate+0x6f/0xaa [17183769.024000] br_forward_finish+0x0/0x5b [bridge] nf_hook_slow+0x6b/0xf7 [17183769.024000] br_forward_finish+0x0/0x5b [bridge] __br_forward+0x57/0x6e [bridge] [17183769.024000] br_forward_finish+0x0/0x5b [bridge] br_handle_frame_finish+0xe8/0x138 [bridge] [17183769.024000] br_nf_pre_routing_finish+0x135/0x322 [bridge] br_handle_frame_finish+0x0/0x138 [bridge] [17183769.024000] ip_nat_in+0x43/0xb0 [iptable_nat] br_nf_pre_routing_finish+0x0/0x322 [bridge] [17183769.024000] nf_iterate+0x6f/0xaa br_nf_pre_routing_finish+0x0/0x322 [bridge] [17183769.024000] nf_hook_slow+0x6b/0xf7 br_nf_pre_routing_finish+0x0/0x322 [bridge] [17183769.024000] br_nf_pre_routing+0x264/0x3e4 [bridge] br_nf_pre_routing_finish+0x0/0x322 [bridge] [17183769.024000] nf_iterate+0x6f/0xaa br_handle_frame_finish+0x0/0x138 [bridge] [17183769.024000] nf_hook_slow+0x6b/0xf7 br_handle_frame_finish+0x0/0x138 [bridge] [17183769.024000] br_handle_frame+0x122/0x1d5 [bridge] br_handle_frame_finish+0x0/0x138 [bridge] [17183769.024000] netif_receive_skb+0x1b8/0x3ee process_backlog+0x84/0x109 [17183769.024000] net_rx_action+0x8e/0x15f __do_softirq+0xc2/0xd4 [17183769.024000] ksoftirqd+0x0/0xbd do_softirq+0x32/0x34 [17183769.024000] ksoftirqd+0x7b/0xbd kthread+0xb7/0xbd [17183769.024000] kthread+0x0/0xbd kernel_thread_helper+0x5/0xb [17183769.024000] Code: 44 24 5c 8b 54 24 2c 03 7c 86 0c 03 54 86 20 89 6c 24 20 89 54 24 28 85 ff 0f 84 b6 03 00 00 8b 44 24 28 85 c0 0f 84 81 03 00 00 <0f> b6 5f 53 89 d8 24 08 84 c0 0f 84 5e 03 00 00 8b 47 08 8b 4c [17183769.024000] EIP: [] ipt_do_table+0xd1/0x4cd [ip_tables] SS:ESP 0068:f7e87bb4 [17183769.024000] <0>Kernel panic - not syncing: Fatal exception in interrupt [17183769.024000] (Is this stackdump one or two threads?) 
I'm doing more tests to locate the cause.

Sam

From kaber at trash.net Thu Aug 17 19:04:43 2006
From: kaber at trash.net (Patrick McHardy)
Date: Thu Aug 17 19:58:08 2006
Subject: lots of oopses
In-Reply-To: <44E49911.6080007@ufomechanic.net>
References: <44E49911.6080007@ufomechanic.net>
Message-ID: <44E4A1AB.3000001@trash.net>

Amin Azez wrote:
> The most recent oops where I forgot to unload most of the modules, but
> was running my layer7 load monitor crashed at a rule-add just after a
> conntrack flush. Layer7 is clearly implicated in this one. There is
> another oops further down where this is not the case.

The second one looks like a race in ip_tables when changing the
ruleset (the attached patch should fix that). The first one looks
like an l7 bug.

BTW, what is ipt_vlan?

-------------- next part --------------
[NETFILTER]: ip_tables: fix table locking in ipt_do_table

table->private might change because of ruleset changes, don't use it without
holding the lock.

Signed-off-by: Patrick McHardy

---
commit 338fe5c67e8fb799c9e3470331db6f3c60a31b1e
tree 2dc15d63244ed18a8035ae483ae2d722e7fbcf62
parent 32ce9bc41528c327b1353713b2108d2213128dee
author Patrick McHardy Tue, 15 Aug 2006 16:06:57 +0200
committer Patrick McHardy Tue, 15 Aug 2006 16:06:57 +0200

 net/ipv4/netfilter/arp_tables.c |    3 ++-
 net/ipv4/netfilter/ip_tables.c  |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index df4854c..8d1d7a6 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -236,7 +236,7 @@ unsigned int arpt_do_table(struct sk_buf
 	struct arpt_entry *e, *back;
 	const char *indev, *outdev;
 	void *table_base;
-	struct xt_table_info *private = table->private;
+	struct xt_table_info *private;
 
 	/* ARP header, plus 2 device addresses, plus 2 IP addresses. */
 	if (!pskb_may_pull((*pskb), (sizeof(struct arphdr) +
@@ -248,6 +248,7 @@ unsigned int arpt_do_table(struct sk_buf
 	outdev = out ? out->name : nulldevname;
 
 	read_lock_bh(&table->lock);
+	private = table->private;
 	table_base = (void *)private->entries[smp_processor_id()];
 	e = get_entry(table_base, private->hook_entry[hook]);
 	back = get_entry(table_base, private->underflow[hook]);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index f316ff5..048514f 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -230,7 +230,7 @@ ipt_do_table(struct sk_buff **pskb,
 	const char *indev, *outdev;
 	void *table_base;
 	struct ipt_entry *e, *back;
-	struct xt_table_info *private = table->private;
+	struct xt_table_info *private;
 
 	/* Initialization */
 	ip = (*pskb)->nh.iph;
@@ -247,6 +247,7 @@ ipt_do_table(struct sk_buff **pskb,
 
 	read_lock_bh(&table->lock);
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
+	private = table->private;
 	table_base = (void *)private->entries[smp_processor_id()];
 	e = get_entry(table_base, private->hook_entry[hook]);
 

From davem at davemloft.net Fri Aug 18 03:13:00 2006
From: davem at davemloft.net (David Miller)
Date: Fri Aug 18 03:44:07 2006
Subject: [NETFILTER]: ctnetlink: fix deadlock in table dumping
In-Reply-To: <44E397CE.3030808@trash.net>
References: <44E397CE.3030808@trash.net>
Message-ID: <20060817.181300.104034404.davem@davemloft.net>

From: Patrick McHardy
Date: Thu, 17 Aug 2006 00:10:22 +0200

> Fix a deadlock in ctnetlink (introduced early in 2.6.18-rc by myself).
> Please apply to 2.6.18.

Applied, thanks Patrick.
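
Note on the "ip_tables: fix table locking in ipt_do_table" patch in the
"lots of oopses" thread: the fix amounts to reading table->private only
after table->lock has been taken, so that a concurrent ruleset replacement
cannot leave the packet path holding a stale pointer. The following is a
minimal userspace sketch of that pattern, not the kernel code. The struct
names, the helper names, and the C11 atomic_flag spinlock standing in for
read_lock_bh()/write_lock_bh() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xt_table_info. */
struct table_info {
	unsigned int hook_entry;
};

/* Hypothetical stand-in for the table; the flag plays the role of table->lock. */
struct table {
	atomic_flag lock;
	struct table_info *private;	/* swapped when the ruleset changes */
};

static void table_lock(struct table *t)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;
}

static void table_unlock(struct table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Packet path: fetch the private pointer only while the lock is held. */
unsigned int table_hook_entry(struct table *t)
{
	struct table_info *private;	/* deliberately not initialised here */
	unsigned int entry;

	table_lock(t);
	private = t->private;		/* safe: a writer cannot swap it now */
	assert(private != NULL);
	entry = private->hook_entry;
	table_unlock(t);
	return entry;
}

/* Ruleset change: swap in the new info under the lock and return the
 * old one so the caller can free it. */
struct table_info *table_replace(struct table *t, struct table_info *newinfo)
{
	struct table_info *old;

	table_lock(t);
	old = t->private;
	t->private = newinfo;
	table_unlock(t);
	return old;
}
```

Initialising the local pointer at its declaration, as the pre-patch code
did, samples t->private before the lock is held, which is the window the
patch closes.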
From azez at ufomechanic.net Fri Aug 18 09:33:13 2006
From: azez at ufomechanic.net (Amin Azez)
Date: Fri Aug 18 10:01:29 2006
Subject: lots of oopses
In-Reply-To: <44E4A1AB.3000001@trash.net>
References: <44E49911.6080007@ufomechanic.net> <44E4A1AB.3000001@trash.net>
Message-ID: <44E56D39.50806@ufomechanic.net>

Patrick McHardy wrote:
> Amin Azez wrote:
>
>> The most recent oops where I forgot to unload most of the modules, but
>> was running my layer7 load monitor crashed at a rule-add just after a
>> conntrack flush. Layer7 is clearly implicated in this one. There is
>> another oops further down where this is not the case.
>>
>
> The second one looks like a race in ip_tables when changing the
> ruleset (the attached patch should fix that). The first one looks
> like an l7 bug.
>

Thanks Patrick, you are top.

> BTW, what is ipt_vlan?
>

I think I posted it here a year ago, but I'll do so again if you want it.
It matches on vlan-id.

It was said that strictly this is a layer 2 thing and not for iptables.
I find it useful though: which iptables rules should be applied may
depend on vlan stuff, and sometimes it seems like there isn't enough mark
to go around...

I like the iptables/ebtables separation, but sometimes it seems like they
should be able to share each other's matches, like one big happy table
with a few extra points of inspection.

Anyway... thanks again.

Sam

> ------------------------------------------------------------------------
>
> [NETFILTER]: ip_tables: fix table locking in ipt_do_table
>
> table->private might change because of ruleset changes, don't use it without
> holding the lock.
>
> Signed-off-by: Patrick McHardy
>
> ---
> commit 338fe5c67e8fb799c9e3470331db6f3c60a31b1e
> tree 2dc15d63244ed18a8035ae483ae2d722e7fbcf62
> parent 32ce9bc41528c327b1353713b2108d2213128dee
> author Patrick McHardy Tue, 15 Aug 2006 16:06:57 +0200
> committer Patrick McHardy Tue, 15 Aug 2006 16:06:57 +0200
>
>  net/ipv4/netfilter/arp_tables.c |    3 ++-
>  net/ipv4/netfilter/ip_tables.c  |    3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
> index df4854c..8d1d7a6 100644
> --- a/net/ipv4/netfilter/arp_tables.c
> +++ b/net/ipv4/netfilter/arp_tables.c
> @@ -236,7 +236,7 @@ unsigned int arpt_do_table(struct sk_buf
>  	struct arpt_entry *e, *back;
>  	const char *indev, *outdev;
>  	void *table_base;
> -	struct xt_table_info *private = table->private;
> +	struct xt_table_info *private;
>
>  	/* ARP header, plus 2 device addresses, plus 2 IP addresses. */
>  	if (!pskb_may_pull((*pskb), (sizeof(struct arphdr) +
> @@ -248,6 +248,7 @@ unsigned int arpt_do_table(struct sk_buf
>  	outdev = out ? out->name : nulldevname;
>
>  	read_lock_bh(&table->lock);
> +	private = table->private;
>  	table_base = (void *)private->entries[smp_processor_id()];
>  	e = get_entry(table_base, private->hook_entry[hook]);
>  	back = get_entry(table_base, private->underflow[hook]);
> diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> index f316ff5..048514f 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -230,7 +230,7 @@ ipt_do_table(struct sk_buff **pskb,
>  	const char *indev, *outdev;
>  	void *table_base;
>  	struct ipt_entry *e, *back;
> -	struct xt_table_info *private = table->private;
> +	struct xt_table_info *private;
>
>  	/* Initialization */
>  	ip = (*pskb)->nh.iph;
> @@ -247,6 +247,7 @@ ipt_do_table(struct sk_buff **pskb,
>
>  	read_lock_bh(&table->lock);
>  	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
> +	private = table->private;
>  	table_base = (void *)private->entries[smp_processor_id()];
>  	e = get_entry(table_base, private->hook_entry[hook]);
>
>

From amieres at eneotecnologia.com Fri Aug 18 14:18:27 2006
From: amieres at eneotecnologia.com (Angel Mieres)
Date: Fri Aug 18 14:50:13 2006
Subject: [RFC,ANNOUNCE] conntrack daemon (stateful replication)
In-Reply-To: <44E47D0A.7090900@netfilter.org>
References: <447A1FB7.5080709@netfilter.org> <20060530065804.GA24166@kruemel.my-eitzenberger.de> <20060811083020.GW29423@edu.joroinen.fi> <44E47D0A.7090900@netfilter.org>
Message-ID: <1155903507.19082.25.camel@supercoco>

Hi Pasi,

I'm pleased to tell you that this project is making good progress, and
I'm sure the netfilter team (with Pablo leading) is doing all it can.

I have had the chance to test conntrackd and I will be glad to help you
test it ;)

Best Regards,
Angel M.

On Thu, 17-08-2006 at 16:28 +0200, Pablo Neira Ayuso wrote:
> Pasi Kärkkäinen wrote:
> > Hello!
> >
> > Any updates to these projects? Files available somewhere? I think many
> > people from this list would like to test and help with these daemons..
>
> The only thing available at the moment:
>
> http://people.netfilter.org/pablo/conntrackd/
>
> I know, this requires appropriate documentation and a webpage, but I'm
> working on it. Any help with the webpage (something simple) would come
> in handy.
>
--
Angel Mieres - amieres@eneotecnologia.com
/////////////////////////////////////////
Gentoo has you...

From amieres at eneotecnologia.com Fri Aug 18 14:33:34 2006
From: amieres at eneotecnologia.com (Angel Mieres)
Date: Fri Aug 18 15:05:20 2006
Subject: iptables patch for tproxy
Message-ID: <1155904414.19081.33.camel@supercoco>

Hi All,

I'm trying to get TPROXY (from balabit) to work with squid. I'm having
problems when it has to spoof clients from squid. I think my patch for
iptables is obsolete, or I'm not applying it correctly.

Can someone supply me with the iptables-1.3.5 patch for tproxy?

Many thanks,
Angel M.

--
Angel Mieres - amieres@eneotecnologia.com
/////////////////////////////////////////
Gentoo has you...

From simonl at parknet.dk Fri Aug 18 15:06:03 2006
From: simonl at parknet.dk (Simon Lodal)
Date: Fri Aug 18 15:37:18 2006
Subject: new ABI
In-Reply-To: <20060816121653.GA31235@kriss.csbnet.se>
References: <200608142312.41851.max@nucleus.it> <20060816121653.GA31235@kriss.csbnet.se>
Message-ID: <1086.83.88.199.217.1155906363.squirrel@mail.parknet.dk>

> Also keep in mind that we can allow several targets. I previously in
> another mail talked about actions. It's not needed I think, but might
> make it easier to distinguish between jumps, ending targets and just
> changing and logging targets. It should be perfectly legal already today
> to say:
>
> iptables -m match -j other_chain -j LOG -j other_chain2 -j DROP -j TTL

LOG + DROP in one rule would be a huge improvement, even though it would
just reintroduce an ipchains feature.

> Now, the last TTL will never be executed, but that's a user config
> choice. The semantics of the above can't be misunderstood.
>
>
> For kernel-space:
> -------------------
> I think the above is good. Perhaps we don't need restore as it can be
> done with dump. Only that dump must dump both initial state/config and
> current state.
> - init()
> - destroy()
> - dump() / restore()
> - change()
> - match() / target()
>
> Very important is the change() that, for example, the recent match can
> use to remove or add IPs in any recent list. There are several other
> matches that can use this. Quota for example. Add or remove bytes in
> the pot. Far more complex matches like ipset can use this as well.

That could provide the basis for some of the dynamic userspace features
that people are often pointed to on this list, even though they do not
yet exist.

> Also a good way of "finding your way" to either group of rules is
> needed. Today I have a router with 4096 IPs (students' computers)
> behind. The IPs all need their own chain of rules. I won't go into why,
> but trust me, it's needed and the only flexible way that I have found.
> I have created a sort of binary tree of rules trying to make the access
> of each customer as painless as possible. This area needs work as well
> I think. I've seen many people asking if this exists. Simple solution
> might be to allow custom(?) modules that implement different forms of
> jumping. To which group or chain do you want to jump to? Well: iptables
> -m match -J ipmapjump (notice the big -J) The recent "goto" jump also
> fits here.

I would like to do it in a generic way:

Introduce a "match index" variable that can be set by matches and used by
targets. A "--dports 1000:1023" match has 24 possible matches, so it would
set the index to between 0 and 23. The same can be done for IPs, sets, and
all other matches that have a finite set of possible matches and can
enumerate them.
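
To make the "match index" arithmetic concrete, here is a minimal sketch
for the port-range case, assuming a hypothetical struct port_range and
hypothetical helpers port_range_count() and port_range_match_index().
None of these names exist in netfilter; they only illustrate how a match
with a finite, enumerable set of hits could report which hit it was.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for a match such as "--dports 1000:1023". */
struct port_range {
	uint16_t lo;	/* first port in the range, inclusive */
	uint16_t hi;	/* last port in the range, inclusive */
};

/* Number of distinct packets-by-port the match can accept. */
unsigned int port_range_count(const struct port_range *r)
{
	assert(r->lo <= r->hi);
	return (unsigned int)r->hi - r->lo + 1;
}

/* Zero-based match index for a port, or -1 if the port does not match.
 * A target could later use this index to pick a subchain or an address. */
int port_range_match_index(const struct port_range *r, uint16_t port)
{
	if (port < r->lo || port > r->hi)
		return -1;
	return port - r->lo;
}
```

For the range 1000:1023 this yields a count of 24, index 0 for port 1000
and index 23 for port 1023, which is the 0 to 23 range described above.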
Now, it should be relatively simple to create a generic jump target that
uses the match index to jump to a specific subchain (I don't know exactly
how the list of subchains would be defined, but it should not be that
difficult).

Other interesting targets might be NAT that could NAT from/to a base
address plus match index.

> Summary:
> + Use XML to express firewall rules. Because it's easy and backward
> compatibility will be easily ported. It fits both human written and
> scripted rules. The tool is already there in tons of places.
>
> + init, destroy, dump/restore, change, match/target is needed as
> implemented functions (or Nulls) for the match / targets.
>
> + Use RCU-list in kernel. Because it's more editable.
>
> + Have smart ways of allocating memory in kernel (slabs).
>
> + Allow several targets for one rule.
>
> + Perhaps separate ending targets from non-ending ones and jumps.
>
> + Allow custom jump modules, beside match and target modules.
>
> + Allow grouping of rules in some way. Really large firewalls need
> this.
>
> + We would rather have one too many hooks/void *, unused, than one too
> few for future use. It won't waste that much memory.
>
> + Design all this into pkttables rather than focus on IP/IPv6.

I agree with all your points, perhaps except the XML part ... I am one of
those non-converts. But you may be right anyway. It would be nice to have
a standard way to define a ruleset, as descriptive data rather than
commands.

Regards,
Simon

From azez at ufomechanic.net Fri Aug 18 15:55:46 2006
From: azez at ufomechanic.net (Amin Azez)
Date: Fri Aug 18 16:27:13 2006
Subject: ipt_vlan
In-Reply-To: <44E4A1AB.3000001@trash.net>
References: <44E49911.6080007@ufomechanic.net> <44E4A1AB.3000001@trash.net>
Message-ID: <44E5C6E2.30809@ufomechanic.net>

Attached is my ipt_vlan patch for 2.6.17 and 2.6.11, and the iptables
patch to go with it.

It's based on the mac match.
I think the iptables patch is wrong, the way I freak the extension
makefile needs reviewing, but it does compile.

It doesn't require any vlan interfaces to be set up on the box, unless
you want to route (I guess); I do bridging and get to match on vlan.

Sam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vlan.2.6.17.patch
Type: text/x-patch
Size: 4415 bytes
Desc: not available
Url : /pipermail/netfilter-devel/attachments/20060818/508fa575/vlan.2.6.17-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vlan.2.6.11.patch
Type: text/x-patch
Size: 4417 bytes
Desc: not available
Url : /pipermail/netfilter-devel/attachments/20060818/508fa575/vlan.2.6.11-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iptables.vlan.patch
Type: text/x-patch
Size: 2707 bytes
Desc: not available
Url : /pipermail/netfilter-devel/attachments/20060818/508fa575/iptables.vlan-0001.bin

From simonl at parknet.dk Fri Aug 18 16:14:35 2006
From: simonl at parknet.dk (Simon Lodal)
Date: Fri Aug 18 16:45:51 2006
Subject: new ABI
In-Reply-To: <200608160057.05431.max@nucleus.it>
References: <200608142312.41851.max@nucleus.it> <200608151414.24599.simon@parknet.dk> <200608160057.05431.max@nucleus.it>
Message-ID: <1108.83.88.199.217.1155910475.squirrel@mail.parknet.dk>

>> Everybody has a long wishlist and seem to agree that something
>> fundamental needs to be done.
>>
>> The question seems to be when backwards compatibility can be given up.
>
> Everyone agrees that we have reached the maximum expressiveness with
> the current system.

You mean we have created the ideal system?! Or that we have created a
mess that is no longer extendable?

> Nobody says that we couldn't keep a way to convert old rules in the new
> system.
> The real question thus becomes: is it worth to restart from (almost)
> scratch?
Sometimes you can have something entirely different in mind and still
make incremental changes.

The iptables syntax/interface as seen by the user is far from stellar, but
perhaps good enough for its purposes. I do not see any urgent need to
change its syntax.

But the APIs suck. Good ideas get nowhere because the APIs cannot support
them. Is that really controversial? My point is that they need to change,
and the change will be incompatible. Too bad, but it has to be done some
day.

>> > What people need from any new infrastructure:
>> > - cleaner interface with clearer separation between kernel and user
>> > data;
>> > - ability to dump internal state of matches/targets (this may
>> > not be in a 1-to-1 relation, so it may be tricky, do we need module
>> > state dumping?);
>>
>> Yes, but why should that be hard? Netfilter should already have a list
>> of registered modules.
>
> Yes, but iptables has no way to manipulate per-module data (eg:
> collection of names and flags for condition, but there are plenty
> other examples). I don't think it would be difficult, even without a
> total redesign. I was testing the ground for ideas and real needs.

I agree.

>> We are going to have "interesting" data that are not 1:1 with rules.
>> But then they will be 1:1 with modules, or some other "scope" that
>> netfilter
>
> Make it n:1. I don't think n:n is desirable.

It is not n:n. It is just more 1:1's.

>> knows how to traverse. Each "scope" can have their own section in the
>> iptables-save output. Hence the parsing complexity lies in
>> iptables-restore.
>>
>> Whether it is all going to be exposed in some filesystem or not is a
>> different matter.
>
> I like file interfaces, but not everything readily becomes a file. It
> all depends on what people really want to do with this class of data.
>
>> What is the version after it going to be then? No, I never liked the
>> -ng suffix :)
>>
>> What is wrong with iptables2?
>
> OK. We had ipfwadm and ipchains. So we're really more like iptables4.
> :)

That might do.

>> Flexibility is not free, but perhaps it can be cheap, performance
>> wise.
>>
>> Let's say we make iptables more shell-like, with the ability to handle
>> multiple commands in one invocation (with a final COMMIT command
>> required)? Would be lovely in itself.
>>
>> Then iptables would get a better chance to optimize memory allocation,
>> since it is not only looking at one rule at a time.
>>
>> The case where you load the entire firewall ruleset in one go could be
>> optimized to a point where it is no different from today.
>
> This if we assume we know the sizes of everything. I think
> matches/targets need to have a chance to influence their own data (now
> they can't).

Correct, and it is a major annoyance.

> We'll have:
> - general data structures (fixed);
> - match/target descriptor (passed by userspace and of known size);
> - match/target runtime data (potentially anything from a single byte to a
> dynamic structure).
>
> Currently matches/targets are fed the descriptor. I'd like them to be
> fed a descriptor and their runtime data. We can suppose the latter
> won't be needed by every match, so it won't impact performance.
> We still got a fixed size data structure that we can
> move/compact/rewrite and a descriptor that we can potentially move (we
> could move it if people weren't abusing it for lack of runtime data)
> but with variable sized.
> The first one can become a simple allocation in list node array (with
> some mechanism for growing and shrinking). The descriptors are a
> little more tricky and we would need stricter specifications in order
> to do proper repacking.

Sounds reasonable.

> Before we continue work on a non-problem: do we have data about kernel
> memory fragmentation and performance issues?

I would love to know that too!

>> * ipt_entry* structs might contain data (like basic
>> src/dst/port/iface
>> matches), but they may not keep pointers to anything, not even their
>> own fields.
>> They are independent of their own memory location. The
>> memory management code can therefore rearrange the tables at will
>> (proper locking assumed), without having to reinitialize rules.
>
> Good. I just don't know if this is overdesigned.

Perhaps yes. It is irrelevant unless there really is a fragmentation
issue.

>> * All other memory is accessed through a struct that is passed to
>> each
>> rule/match/target's API functions. It contains at least
>> .instance_data, but also .module_data (.priv_data), and perhaps other
>> scopes data,
>> like .rule_data, .chain_data and .global_data (all cross-module). Note
>> that each of these are bound to a specific entity.
>
> I agree.
>
>> * Each module and instance must call special netfilter API's to
>> allocate
>> memory of the required types. The netfilter part handles free'ing
>> through refcount (why not).
>
> If we don't have cross-module data (does anyone need it?) each module
> could do its housekeeping. It's difficult to know how to optimize
> other people's data.

The idea is just to make it less error prone to write match/target
modules; the fewer free()'s you need to call, the fewer memory leaks we
get.

Since you can only have one .instance_data pointer, the old one should be
deallocated if you allocate another. Why not let netfilter do that? You
would just tell netfilter how much memory you need, and it would deliver
that, and guarantee against accidental memory leaks in individual modules.

>> * The actual .*_data pointers may change between invocations (packets
>> fed
>> to) of the same rule/match/target. This means the netfilter part is
>> allowed to rearrange dynamic memory too.
>
> What if people want to keep pointers and other complex data structures?
> The instance data should be opaque to the core code. The risk is that
> people, not trusting this structure, will use it just to keep a
> pointer to the real data.

If someone really wants to break the rules they can. Here, the only rule
is: No pointers.
Use local offsets instead if you really need to "point". >> * Bonus: Sync of memory regions with other hosts can be handled >> transparently, or at least easily. So that e.g. limit rules can work >> across redundant hosts. > > Malus: a whole memory management system just for a subsystem of the > kernel. Too much semantics risks limiting what people want to do. Of > course anarchy has drawbacks too. I'd seek a middle ground where we > handle the common case and leave people free to implement exotic new > things. The most important goal is to pass a number of pointers (descriptors) to the modules, each being shared in different ways (and having dynamic size, preferably). It might work if any module could (re)allocate those shared memory areas independently, but would be much simpler if netfilter allocation wrappers handled synchronization between them. We already have a memory management subsystem; we try to manage how things are located. Unfortunately it is very dumb, which is the reason for many problems we have. I am no fan of a new complex memory management subsystem in-kernel. It is only a suggestion for a solution to the fragmentation issue, if it really exists. >> I have no clear idea how all these individual blobs would be >> communicated between kernel and userspace. Except there are two >> general options: >> >> 1) The current "pass a large blob" scheme. Since it will contain many >> smaller blobs, some in-kernel parsing is required. Worse yet, the >> kernel must also be able to assemble a large blob in order to dump to >> userspace. > > Either way we'll need some form of rule and match id. > I don't know what level of transactionality is desired. Currently > iptables-restore is atomic and so are single changes with iptables. How > much is needed with the new system? At least rule level atomicity is > certainly desired, so we'll need to create duplicate data (just the > core structure with pointers to the real descriptors) during > modifications.
I agree with that desire. But it totally rules out a filesystem representation, I guess. Not that I really want an iptablesfs. I guess some hierarchical locking would be necessary. Regards, Simon From azez at ufomechanic.net Fri Aug 18 16:50:43 2006 From: azez at ufomechanic.net (Amin Azez) Date: Fri Aug 18 17:22:18 2006 Subject: new ABI In-Reply-To: <200608160057.05431.max@nucleus.it> References: <200608142312.41851.max@nucleus.it> <200608151414.24599.simon@parknet.dk> <200608160057.05431.max@nucleus.it> Message-ID: <44E5D3C3.30707@ufomechanic.net> * Massimiliano Hofer wrote, On 15/08/06 23:57: > If we don't have cross-module data (does anyone need it?) All my cross-module data is per connection and so can be kept in the conntrack. Sam From hidden at balabit.hu Fri Aug 18 19:50:56 2006 From: hidden at balabit.hu (KOVACS Krisztian) Date: Fri Aug 18 20:21:52 2006 Subject: iptables patch for tproxy In-Reply-To: <1155904414.19081.33.camel@supercoco> References: <1155904414.19081.33.camel@supercoco> Message-ID: <200608181950.57032@nienna> Hi, On Friday 18 August 2006 14:33, Angel Mieres wrote: > I'm trying to implement TPROXY (from balabit) to work with squid. I'm > having problems when it has to spoof clients from squid. I think my > patch for iptables is obsolete or I am not applying it correctly. > Can someone supply me the iptables-1.3.5 patch for tproxy? Which iptables patch were you trying to apply?
You can find an iptables-1.3.x patch in the "official" tproxy tarballs, for example in this one: http://www.balabit.com/downloads/tproxy/linux-2.6/cttproxy-2.6.15-2.0.4.tar.gz Oh, and more up-to-date (not yet officially released) kernel patch snapshots are available also: http://people.balabit.hu/hidden/tproxy2-2.6.16_20060727.tar.bz2 http://people.balabit.hu/hidden/tproxy2-2.6.17_20060727.tar.bz2 http://people.balabit.hu/hidden/tproxy2-2.6.18_20060727.tar.bz2 -- Regards, Krisztian Kovacs From kaber at trash.net Fri Aug 18 20:18:03 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 18 20:49:22 2006 Subject: ipt_vlan In-Reply-To: <44E5C6E2.30809@ufomechanic.net> References: <44E49911.6080007@ufomechanic.net> <44E4A1AB.3000001@trash.net> <44E5C6E2.30809@ufomechanic.net> Message-ID: <44E6045B.6050407@trash.net> Amin Azez wrote: > Attached is my ipt_vlan patch for 2.6.17 and 2.6.11, and the iptables > patch to go with it. It's based on the mac match. This looks useful. If we hadn't already got ebt_vlan (which seems to do the same thing with a few extra features) I would be tempted to ask you to submit it :) > I think the iptables patch is wrong, the way I freak the extension > makefile needs reviewing, but it does compile. > --- extensions/Makefile.orig 2006-06-30 10:41:38.000000000 +0100 > +++ extensions/Makefile 2006-06-30 10:42:00.000000000 +0100 > @@ -14,2 +14,4 @@ > > +PF_EXT_SLIB+=vlan > + It's fine this way, although the usual way for extensions is to either add them to the long list at the top or add a .test script. From kaber at trash.net Fri Aug 18 20:23:29 2006 From: kaber at trash.net (Patrick McHardy) Date: Fri Aug 18 20:54:47 2006 Subject: lots of oopses In-Reply-To: <44E56D39.50806@ufomechanic.net> References: <44E49911.6080007@ufomechanic.net> <44E4A1AB.3000001@trash.net> <44E56D39.50806@ufomechanic.net> Message-ID: <44E605A1.5000007@trash.net> Amin Azez wrote: > Patrick McHardy wrote: > >> BTW, what is ipt_vlan?
>> > > I think I posted it here a year ago, but I'll do so again if you want it. > It matches on vlan-id. > > It was said that strictly this is a layer 2 thing and not for iptables; > I find it useful though;- > which iptables rules should be applied may depend on vlan stuff, and > sometimes it seems like there isn't enough mark to go around... > > I like the iptables/ebtables separation but sometimes it seems like they > should be able to share each other's matches, like one big happy table > with a few extra points of inspection. Anyway... Agreed. It should be possible for ebtables to use all iptables matches looking only at packet data, but not necessarily the other way around. Unfortunately ebtables is in large parts a copy of iptables, with just enough differences to prevent it from using x_tables. From max at nucleus.it Fri Aug 18 23:40:18 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Sat Aug 19 00:11:44 2006 Subject: new ABI In-Reply-To: <1108.83.88.199.217.1155910475.squirrel@mail.parknet.dk> References: <200608142312.41851.max@nucleus.it> <200608160057.05431.max@nucleus.it> <1108.83.88.199.217.1155910475.squirrel@mail.parknet.dk> Message-ID: <200608182340.19213.max@nucleus.it> On Friday 18 August 2006 4:14 pm, Simon Lodal wrote: > > Everyone agrees that we have reached the maximum expressiveness with > > the current system. > > You mean we have created the ideal system?! > > Or that we have created a mess that is no longer extendable? Something in between. :) You can do incremental improvements, but some people are asking for structural changes in order to achieve other goals. I'm not dissatisfied with the current code. The whole purpose of this thread is to understand if there is something worth an extensive change of the API (between core netfilter and its modules), possibly maintaining the ABI. I was just seeking new ideas for the sake of it.
Then we'll see what's worth it, what can be done with what we have today, what is just too troublesome and what is just too idealistic to achieve in the near future (but potentially interesting). The hard part will be when someone has to do the real work. ;) > > The real question thus becomes: is it worth to restart from (almost) > > scratch? > > Sometimes you can have something entirely different in mind and still make > incremental changes. > The iptables syntax/interface as seen by user is far from stellar but > perhaps good enough for its purposes. I do not see any urgent need to > change its syntax. Neither do I, but it's mostly a matter of taste. > But the API's suck. Good ideas get nowhere because the API's can not > support it. Is that really controversial? My point is they need to change, > and it will be incompatible, too bad, but it has to be done some day. The question is: how much can we change the API without affecting the ABI? Most things can be added incrementally, but I mostly started this thread because Patrick complained about the lack of a way to change individual rules or matches. I think this will be the hardest feature yet proposed. > The idea is just to make it less error prone to write match/target > modules; the fewer free()'s you need to call, the fewer memory leaks we get. > Since you can only have one .instance_data pointer, the old one should be > deallocated if you allocate another. Why not let netfilter do that? You > would just tell netfilter how much memory you need, and it will just > deliver that. And guarantee against accidental memory leaks in individual > modules. We'd need a 2 stage initialization. Something like: - a simple init for matches that don't need .(priv|instance)_data or that declare a fixed size in the match registration; - a size determining call and the real init for dynamic ones.
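A minimal userspace sketch of the two stage initialization described above, only to make the idea concrete. Everything in it is hypothetical: struct match_ops, instance_create() and the two example modules are made up for illustration and are not part of netfilter, and calloc()/free() stand in for whatever allocator the core would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-match operations; none of these names exist in netfilter. */
struct match_ops {
	size_t fixed_size;                /* size declared at registration, 0 if none */
	size_t (*size)(const void *cfg);  /* optional stage 1: compute a dynamic size */
	int (*init)(void *data, size_t len, const void *cfg); /* stage 2: fill data */
};

/*
 * Core-side helper: the core owns the allocation, the module only says how
 * much it needs and then initializes the zeroed block it is handed.
 * Returns 0 on success, -1 on failure.
 */
static int instance_create(const struct match_ops *ops, const void *cfg,
			   void **data_out, size_t *len_out)
{
	size_t len;
	void *data = NULL;

	assert(ops != NULL && data_out != NULL && len_out != NULL);

	/* Stage 1: dynamic matches are asked for a size, others use the fixed one. */
	len = ops->size ? ops->size(cfg) : ops->fixed_size;

	if (len > 0) {
		data = calloc(1, len);
		if (data == NULL)
			return -1;
	}

	/* Stage 2: the real init, working on memory the core allocated. */
	if (ops->init && ops->init(data, len, cfg) != 0) {
		free(data);	/* the core cleans up, the module never calls free() */
		return -1;
	}

	*data_out = data;
	*len_out = len;
	return 0;
}

/* Example 1: a match with a fixed-size private area. */
struct counter_data {
	unsigned long hits;
};

static int counter_init(void *data, size_t len, const void *cfg)
{
	struct counter_data *c = data;

	(void)cfg;
	if (len != sizeof(*c))
		return -1;
	c->hits = 0;
	return 0;
}

static const struct match_ops counter_ops = {
	.fixed_size = sizeof(struct counter_data),
	.size = NULL,
	.init = counter_init,
};

/* Example 2: a match whose private area depends on its configuration. */
static size_t name_size(const void *cfg)
{
	return strlen((const char *)cfg) + 1;
}

static int name_init(void *data, size_t len, const void *cfg)
{
	memcpy(data, cfg, len);	/* len includes the terminating NUL */
	return 0;
}

static const struct match_ops name_ops = {
	.fixed_size = 0,
	.size = name_size,
	.init = name_init,
};
```

A caller would do something like instance_create(&name_ops, "web", &data, &len) and later hand data back to the core for freeing. The point of the split is that the size is known before init runs, so the core could also place all instance areas in one contiguous block if it wanted to.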
This could be tricky for complex data and this could be rare enough to justify a fixed base structure and more complex data completely managed by the match module. People might end up like that anyway if what we do isn't enough (after all we could supply arbitrarily sized structures, but only at init time). Maybe the simple solution would be enough. > Here, the only rule is: No pointers. Use local offsets instead if you > really need to "point". No "local" pointers. Pointers to external data will work. With these requirements we could keep the current copy and discard mechanism. We could have a match array (mostly like the current one) and supplement it with a (priv|instance)_data array with size and offsets computed with a quick pass through the first one. With proper locking we could copy the necessary data with no memory fragmentation, and no lists. Of course we have a major disadvantage: the current code can afford to build the new array without locking. The list approach can lock single nodes or the whole list for the time needed to change a single node. This last proposal would need to lock everything while it copies what could be thousands of rules. The main question remains this one: are we really scared by fragmentation? I'll do some investigation, but I don't know if I'll have an answer. > The most important goal is to pass a number of pointers (descriptors) to > the modules, each being shared in different ways (and having dynamic size, > preferably). It might work if any module could (re)allocate those shared > memory areas independently, but would be much simpler if netfilter > allocation wrappers handled synchronization between them. You're describing priv_data. :) > > Either way we'll need some form of rule and match id. > > I don't know what level of transactionality is desired. Currently > > iptables-restore is atomic and so are single changes with iptables. How > > much is needed with the new system?
At least rule level atomicity is > > certainly desired, so we'll need to create duplicate data (just the > > core structure with pointers to the real descriptors) during > > modifications. > > I agree with that desire. But it totally rules out a filesystem > representation, I guess. Not that I really want an iptablesfs. > I guess some hierarchical locking would be necessary. I mentioned the file system approach just for the sake of it. I like the everything-is-a-file approach, but it certainly has its limits. -- Saluti, Massimiliano Hofer Nucleus From max at nucleus.it Fri Aug 18 23:40:25 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Sat Aug 19 00:11:49 2006 Subject: new ABI In-Reply-To: <1086.83.88.199.217.1155906363.squirrel@mail.parknet.dk> References: <200608142312.41851.max@nucleus.it> <20060816121653.GA31235@kriss.csbnet.se> <1086.83.88.199.217.1155906363.squirrel@mail.parknet.dk> Message-ID: <200608182340.26555.max@nucleus.it> On Friday 18 August 2006 3:06 pm, Simon Lodal wrote: > > Also keep in mind that we can allow several targets. I previously in > > another mail talked about actions. It's not needed I think, but might > > make it easier to distinguish between jumps, ending targets and just > > changing and logging targets. It should be perfectly legal already today > > to say: > > > > iptables -m match -j other_chain -j LOG -j other_chain2 -j DROP -j TTL > > LOG + DROP in one rule would be a huge improvement. Even though it would > just reintroduce an ipchains feature. I like the idea of actions. I could perform separate types of mangling and other non "terminal" things with separate rules without worrying about precedence and specific combinations. The current use of "--continue" with some, but not all, targets really should be handled in a more general way and actions look like a good solution to me. > I would like to do it in a generic way: Introduce a "match index" variable > that can be set by matches and used by targets.
A "--dports 1000:1023" > match has 24 possible matches, so it would set the index to between 0 and > 23. Same can be done for IP, sets; all other matches that have a finite > set of possible matches and can enumerate them. What if we just assign a numeric index to every rule (plus an additional index for individual matches)? This would let us identify rules for future changes, but we could go a step further and let people choose a specific label if they want to. This way we could jump to a separate chain or just to a label two rules away. If we combine this with my proposal for "functional chains" we could represent a whole lot of complex rulesets with far fewer rules than today. > I agree with all your points, perhaps except the XML part ... I am one of > those non-converts. But you may be right anyway. It would be nice to have > a standard way to define a ruleset, as descriptive data rather than > commands. I'm a non-convert too, but perhaps it doesn't matter. The final userspace representation is irrelevant to the kernel and might be a matter of a few additional scripts. -- Saluti, Massimiliano Hofer Nucleus From max at nucleus.it Sat Aug 19 00:24:28 2006 From: max at nucleus.it (Massimiliano Hofer) Date: Sat Aug 19 00:55:49 2006 Subject: new ABI In-Reply-To: <20060816121653.GA31235@kriss.csbnet.se> References: <200608142312.41851.max@nucleus.it> <20060816121653.GA31235@kriss.csbnet.se> Message-ID: <200608190024.28883.max@nucleus.it> On Wednesday 16 August 2006 2:16 pm, Joakim Axelsson wrote: > We keep the idea of rules (rather than essay, compiled or other form). To > make the new implementation as easy and as fast as possible I think using > XML to express the firewall is a good way to go. Now before you turn on all > your negatives, here's why: > > - We want the old (todays) iptables to be compliant. > - We want an easy library that implements the ABI to kernel. > - We want to make the smallest possible effort writing a userspace tool.
> Leaving the more advanced for others / other projects. We want the existing ABI to continue working. Having people recompile iptables with a kernel change is not a short term proposition. I'd say it's kernel 3.0 matter. Using it between user utilities/scripts and a new version of iptables is certainly feasible, but it already is. > XML already has good parsers. We can very easily rewrite todays iptables to > output XML. XML has no real limits on how to express things. Of course other > future userspace programs can use the new ABI-library directly, not using > the XML-parser. However, for those who don't want to learn the new > library XML is very easy. Both for humans, scripts and programs. This is good for automatic manipulation. I'm not convinced it will be easier to digest for humans. > So we need one userspace library that talks the ABI with kernel. We also > need a library using this ABI that parses XML-files and passes them to the > ABI-library. Finally other userspace tools (that we do not need to write) > that dumps from kernel passed by the kernel->ABI->XML and pushes XML rules. I'm all for a good library. XML might just be one of its representation plugins. > if (packet.ipv4.source = 1.2.3.4 AND limit(2/s)) then > LOG(log-prefix, ...) > jump(other-chain) > DROP > end if; > > The above might be much easier for a newbie to use. The best thing is that > we are pushing the "need" to write these kind of tools to others as > separate projects. The common format is XML. No need for complex parsers and tricks > using getopt(). I'm not convinced that this will be much easier to parse, but I was expecting more of a tag hell. Maybe you're right. Count me as undecided yet. > iptables is now easily ported like: > iptables <-> XML <-> ABI <-> kernel The old iptables needs to continue working. We can't introduce layers between it and the current kernel ABI. > Also keep in mind that we can allow several targets. I previously in > another mail talked about actions.
It's not needed I think, but might make > it easier to distinguish between jumps, ending targets and just changing > and logging targets. It should be perfectly legal already today to say: > > iptables -m match -j other_chain -j LOG -j other_chain2 -j DROP -j TTL > > Now, the last TTL will never be executed, but that's a user config choice. > The semantics of the above can't be misunderstood. I like this idea. Maybe I would use a different parameter for actions ("-a"?), but it's interesting. > I think the above is good. Perhaps we don't need restore as it can be done > with dump. Only that dump must dump both initial state/config and current > state. > - init() > - destroy() > - dump() / restore() > - change() > - match() / target() > > Very important is the change() that for example recent match can use to > remove or add IPs in any recent list. There are several other matches that > can use this. Quota for example. Add or remove bytes in the pot. Far more > complex matches like ipset can use this as well. How would you represent single match changes with your XML implementation? It sure would be expressive enough for any kind of module data. > I also think RCU-list will help instead of tables/arrays. It's much more > common to add/change or remove a rule than add them all. I have a router > with some 1000 rules. It's a pain to change one of them. To save cache misses I usually wipe everything and write from scratch. All my firewalls are generated (at least partially) and I don't want to risk making mismatched changes in the current firewall and the generating script. I'd need several much more powerful primitives to abandon the approach. Anyway I agree on the usefulness of RCU lists. > Another thing that could use an add is the way of grouping rules. "These > set of rules belong to customer 1 and these to customer 2. And I'd like to > only list all rules related to customer 1". Now there are two approaches to > implement this.
First one is to tag the rule with what rule or which group > it belongs to. Another way is to create sub tables. > > Also a good way of "finding your way" to either group of rules is needed. > Today I have a router with 4096 IPs (students computers) behind. The IPs > all need their own chain of rules. I won't go into why, but trust me, it's > needed and the only flexible way that I have found. I have created a sort > of binary tree of rules trying to make the access of each customer as > painless as possible. This area needs work as well I think. I've seen many > people asking if this exists. Simple solution might be to allow custom(?) > modules that implement different forms of jumping. To which group or chain > do you want to jump to? Well: iptables -m match -J ipmapjump (notice the > big -J) > The recent "goto" jump also fits here. You want some sort of multijump? I had this kind of problem too. ipmapjump seems too specific. We'd really need a rule compiler/optimizer to handle this in an efficient way. In your language it would be something like: switch(packet.ipv4.source & 0xFF) case 1: ... ... end switch; This would be several orders of magnitude more complex than the current system (although magnificent). > Next, try to design this new iptables2/-ng so we don't need iptables3 in > the future. Rather add one too many unused hooks, void * passed parameter > than one too few. Of course this is an illusion. Nothing short of Turing-completeness will prevent iptables3. Even so someone will ask for an OO-iptables, a generic-iptables, etc. We can try to keep people satisfied for the next few years and plan for the unplanned, but there is always something that is really unplannable and unforeseeable. > Design this so we can have pkttables. Meaning no need for separate tables > for iptables, ip6tables, arptables, ebtables. All in one. It's not that hard > really. Just perhaps a few more tables (nat, mangle, raw, filter, bridge > etc.)
and move even the basic matching like source and dest address into > modules. OK. > Summary: > + Use XML to express firewall rules. Because it's easy and backward > compatibility will be easily ported. It fits both human written and scripted > rules. The tool is already there in tons of places. Not convinced yet, but keep insisting. :) Of course we'd need someone to do it. :) > + init, destroy, dump/restore, change, match/target is needed as > implemented functions (or Nulls) for the match / targets. OK. > + Use RCU-list in kernel. Because it's more editable. Mostly OK, for lack of better alternatives. > + Have smart ways of allocate memory in kernel (slabs). OK. > + Allow several targets for one rule. > + Perhaps separate ending targets from non-ending and jumps. OK. I'd separate flow-changing targets from packet-altering (or some other lateral effect) actions. Actions would be like current targets with a "--continue" parameter and we could combine it with a target to make them stop. > + Allow custom jump modules, beside match and target modules. What exactly are you proposing? > + Allow grouping of rules in some way. Really large firewalls need this. Proposal? What about function-chains that we could use as a match? > + We rather have one too many hooks/void *, unused rather than one too few > for future use. It won't waste that much memory. Of course. > + Design all this into pkttables rather than focus on IP/IPv6. OK. > Thanks for your time reading this far :-) :) -- Saluti, Massimiliano Hofer Nucleus From pablo at netfilter.org Mon Aug 21 10:46:02 2006 From: pablo at netfilter.org (Pablo Neira Ayuso) Date: Mon Aug 21 11:12:09 2006 Subject: [PATCH 1/3][CTNETLINK] Rework conntrack fields dumping logic on events Message-ID: <44E972CA.8040004@netfilter.org> What do we dump on conntrack events?
Good question, the following table should clarify 8)

          | NEW | UPDATE | DESTROY |
-----------------------------------|
tuples    |  Y  |   Y    |    Y    |
status    |  Y  |   Y    |    N    |
timeout   |  Y  |   Y    |    N    |
protoinfo |  Y  |   Y    |    N    |
helper    |  S  |   S    |    N    |
counters  |  N  |   N    |    Y    |
mark      |  S  |   S    |    N    |

Legend:
	Y: yes
	N: no
	S: only if the field is set

This patch also replaces IPCT_HELPINFO by IPCT_HELPER since we want to track the helper assignment process, not the changes in the private information held by the helper. Signed-off-by: Pablo Neira Ayuso -- The dawn of the fourth age of Linux firewalling is coming; a time of great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris -------------- next part -------------- [CTNETLINK] Rework conntrack fields dumping logic on events What do we dump on conntrack events? Good question, the following table should clarify 8)

          | NEW | UPDATE | DESTROY |
-----------------------------------|
tuples    |  Y  |   Y    |    Y    |
status    |  Y  |   Y    |    N    |
timeout   |  Y  |   Y    |    N    |
protoinfo |  Y  |   Y    |    N    |
helper    |  S  |   S    |    N    |
counters  |  N  |   N    |    Y    |
mark      |  S  |   S    |    N    |

Legend:
	Y: yes
	N: no
	S: only if the field is set

This patch also replaces IPCT_HELPINFO by IPCT_HELPER since we want to track the helper assignment process, not the changes in the private information held by the helper.
Signed-off-by: Pablo Neira Ayuso Index: net-2.6/net/netfilter/nf_conntrack_netlink.c =================================================================== --- net-2.6.orig/net/netfilter/nf_conntrack_netlink.c 2006-08-17 11:52:27.000000000 +0200 +++ net-2.6/net/netfilter/nf_conntrack_netlink.c 2006-08-17 11:53:01.000000000 +0200 @@ -336,11 +336,15 @@ static int ctnetlink_conntrack_event(str } else if (events & (IPCT_NEW | IPCT_RELATED)) { type = IPCTNL_MSG_CT_NEW; flags = NLM_F_CREATE|NLM_F_EXCL; - /* dump everything */ - events = ~0UL; + events |= IPCT_REFRESH | + IPCT_STATUS | + IPCT_PROTOINFO; group = NFNLGRP_CONNTRACK_NEW; } else if (events & (IPCT_STATUS | IPCT_PROTOINFO)) { type = IPCTNL_MSG_CT_NEW; + events |= IPCT_REFRESH | + IPCT_STATUS | + IPCT_PROTOINFO; group = NFNLGRP_CONNTRACK_UPDATE; } else return NOTIFY_DONE; @@ -383,15 +387,17 @@ static int ctnetlink_conntrack_event(str if (events & IPCT_PROTOINFO && ctnetlink_dump_protoinfo(skb, ct) < 0) goto nfattr_failure; - if (events & IPCT_HELPINFO + if ((events & IPCT_HELPER || nfct_help(ct)) && ctnetlink_dump_helpinfo(skb, ct) < 0) goto nfattr_failure; - if (ctnetlink_dump_counters(skb, ct, IP_CT_DIR_ORIGINAL) < 0 || - ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0) + /* this connection has died or counters wrapped around */ + if ((events & IPCT_DESTROY || events & IPCT_COUNTER_FILLING) + && (ctnetlink_dump_counters(skb, ct, IP_CT_DIR_ORIGINAL) < 0 || + ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0)) goto nfattr_failure; - if (events & IPCT_MARK + if ((events & IPCT_MARK || ct->mark) && ctnetlink_dump_mark(skb, ct) < 0) goto nfattr_failure; Index: net-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c =================================================================== --- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-08-17 11:52:27.000000000 +0200 +++ net-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-08-17 11:53:14.000000000 +0200 @@ -326,11 +326,15 @@ static int 
ctnetlink_conntrack_event(str } else if (events & (IPCT_NEW | IPCT_RELATED)) { type = IPCTNL_MSG_CT_NEW; flags = NLM_F_CREATE|NLM_F_EXCL; - /* dump everything */ - events = ~0UL; + events |= IPCT_REFRESH | + IPCT_STATUS | + IPCT_PROTOINFO; group = NFNLGRP_CONNTRACK_NEW; } else if (events & (IPCT_STATUS | IPCT_PROTOINFO)) { type = IPCTNL_MSG_CT_NEW; + events |= IPCT_REFRESH | + IPCT_STATUS | + IPCT_PROTOINFO; group = NFNLGRP_CONNTRACK_UPDATE; } else return NOTIFY_DONE; @@ -373,15 +377,17 @@ static int ctnetlink_conntrack_event(str if (events & IPCT_PROTOINFO && ctnetlink_dump_protoinfo(skb, ct) < 0) goto nfattr_failure; - if (events & IPCT_HELPINFO + if ((events & IPCT_HELPER || ct->helper) && ctnetlink_dump_helpinfo(skb, ct) < 0) goto nfattr_failure; - if (ctnetlink_dump_counters(skb, ct, IP_CT_DIR_ORIGINAL) < 0 || - ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0) + /* this connection has died or counters wrapped around */ + if ((events & IPCT_DESTROY || events & IPCT_COUNTER_FILLING) + && (ctnetlink_dump_counters(skb, ct, IP_CT_DIR_ORIGINAL) < 0 || + ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0)) goto nfattr_failure; - if (events & IPCT_MARK + if ((events & IPCT_MARK || ct->mark) && ctnetlink_dump_mark(skb, ct) < 0) goto nfattr_failure; From pablo at netfilter.org Mon Aug 21 10:46:25 2006 From: pablo at netfilter.org (Pablo Neira Ayuso) Date: Mon Aug 21 11:12:26 2006 Subject: [PATCH 2/3][CONNTRACK] Introduce the pickup facilities to take over TCP connections Message-ID: <44E972E1.4080500@netfilter.org> This patch introduces a new flag called IPS_PICKUP that forces the protocol handler to pick up the window of valid TCP packets. Moreover, four new attributes to inject the window scale factor and enable SACK are introduced. These new facilities provide the appropiate mechanisms to take over TCP connections in failover settings with TCP tracking enabled. 
Signed-off-by: Pablo Neira Ayuso -- The dawn of the fourth age of Linux firewalling is coming; a time of great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris -------------- next part -------------- [CONNTRACK] Introduce the pickup facilities to take over TCP connections This patch introduces a new flag called IPS_PICKUP that forces the protocol handler to pick up the window of valid TCP packets. Moreover, four new attributes to inject the window scale factor and enable SACK are introduced. These new facilities provide the appropiate mechanisms to take over TCP connections in failover settings with TCP tracking enabled. Signed-off-by: Pablo Neira Ayuso Index: net-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c =================================================================== --- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2006-08-16 22:35:52.000000000 +0200 +++ net-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2006-08-17 13:56:10.000000000 +0200 @@ -341,11 +341,24 @@ static int tcp_to_nfattr(struct sk_buff const struct ip_conntrack *ct) { struct nfattr *nest_parms; - + u_int8_t sack; + read_lock_bh(&tcp_lock); nest_parms = NFA_NEST(skb, CTA_PROTOINFO_TCP); NFA_PUT(skb, CTA_PROTOINFO_TCP_STATE, sizeof(u_int8_t), &ct->proto.tcp.state); + /* window scale factor: original direction (SYN) */ + NFA_PUT(skb, CTA_PROTOINFO_TCP_WSCALE_ORIGINAL, sizeof(u_int8_t), + &ct->proto.tcp.seen[0].td_scale); + /* window scale factor: reply direction (SYN+ACK) */ + NFA_PUT(skb, CTA_PROTOINFO_TCP_WSCALE_REPLY, sizeof(u_int8_t), + &ct->proto.tcp.seen[1].td_scale); + /* SACK: original direction */ + sack = ct->proto.tcp.seen[0].flags & IP_CT_TCP_FLAG_SACK_PERM; + NFA_PUT(skb, CTA_PROTOINFO_TCP_SACK_ORIGINAL, sizeof(u_int8_t), &sack); + /* SACK: reply direction */ + sack = ct->proto.tcp.seen[1].flags & IP_CT_TCP_FLAG_SACK_PERM; + NFA_PUT(skb, CTA_PROTOINFO_TCP_SACK_REPLY, sizeof(u_int8_t), &sack); read_unlock_bh(&tcp_lock); NFA_NEST_END(skb, nest_parms); 
@@ -358,7 +371,11 @@ nfattr_failure: } static const size_t cta_min_tcp[CTA_PROTOINFO_TCP_MAX] = { - [CTA_PROTOINFO_TCP_STATE-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_STATE-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_WSCALE_REPLY-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_SACK_ORIGINAL-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_SACK_REPLY-1] = sizeof(u_int8_t) }; static int nfattr_to_tcp(struct nfattr *cda[], struct ip_conntrack *ct) @@ -382,6 +399,40 @@ static int nfattr_to_tcp(struct nfattr * write_lock_bh(&tcp_lock); ct->proto.tcp.state = *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_STATE-1]); + /* window scale factor: original direction (SYN) */ + if (tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1]) { + ct->proto.tcp.seen[0].td_scale = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1]); + ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_WINDOW_SCALE; + } + /* window scale factor: reply direction (SYN+ACK) */ + if (tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]) { + ct->proto.tcp.seen[1].td_scale = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]); + ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_WINDOW_SCALE; + } + /* enable/disable SACK: original direction */ + if (tb[CTA_PROTOINFO_TCP_SACK_ORIGINAL-1]) { + u_int8_t enable = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_SACK_ORIGINAL-1]); + if (enable) + ct->proto.tcp.seen[0].flags |= + IP_CT_TCP_FLAG_SACK_PERM; + else + ct->proto.tcp.seen[0].flags &= + ~IP_CT_TCP_FLAG_SACK_PERM; + } + /* enable/disable SACK: reply direction */ + if (tb[CTA_PROTOINFO_TCP_SACK_REPLY-1]) { + u_int8_t enable = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_SACK_REPLY-1]); + if (enable) + ct->proto.tcp.seen[1].flags |= + IP_CT_TCP_FLAG_SACK_PERM; + else + ct->proto.tcp.seen[1].flags &= + ~IP_CT_TCP_FLAG_SACK_PERM; + } write_unlock_bh(&tcp_lock); return 0; @@ -425,10 +476,10 @@ static unsigned int get_conntrack_index( we doesn't have to deal with fragments. 
*/ -static inline __u32 segment_seq_plus_len(__u32 seq, - size_t len, - struct iphdr *iph, - struct tcphdr *tcph) +static inline __u32 segment_seq_plus_len(const __u32 seq, + const size_t len, + const struct iphdr *iph, + const struct tcphdr *tcph) { return (seq + len - (iph->ihl + tcph->doff)*4 + (tcph->syn ? 1 : 0) + (tcph->fin ? 1 : 0)); @@ -890,6 +941,22 @@ static int tcp_error(struct sk_buff *skb return NF_ACCEPT; } +static void tcp_pickup_window(struct ip_conntrack *conntrack, + const struct sk_buff *skb, + const struct iphdr *iph, + const struct tcphdr *th) +{ + conntrack->proto.tcp.seen[0].td_end = + segment_seq_plus_len(ntohl(th->seq), skb->len, iph, th); + conntrack->proto.tcp.seen[0].td_maxwin = ntohs(th->window); + if (conntrack->proto.tcp.seen[0].td_maxwin == 0) + conntrack->proto.tcp.seen[0].td_maxwin = 1; + conntrack->proto.tcp.seen[0].td_maxend = + conntrack->proto.tcp.seen[0].td_end + + conntrack->proto.tcp.seen[0].td_maxwin; + conntrack->proto.tcp.seen[0].td_scale = 0; +} + /* Returns verdict for packet, or -1 for invalid. */ static int tcp_packet(struct ip_conntrack *conntrack, const struct sk_buff *skb, @@ -907,6 +974,14 @@ static int tcp_packet(struct ip_conntrac BUG_ON(th == NULL); write_lock_bh(&tcp_lock); + + /* + * This conntrack was added via ctnetlink or ct_sync and needs to + * take over sequence tracking in order to work properly. + */ + if (test_and_clear_bit(IPS_PICKUP, &conntrack->status)) + tcp_pickup_window(conntrack, skb, iph, th); + old_state = conntrack->proto.tcp.state; dir = CTINFO2DIR(ctinfo); index = get_conntrack_index(th); @@ -1116,16 +1191,7 @@ static int tcp_new(struct ip_conntrack * * its history is lost for us. * Let's try to use the data from the packet. 
*/ - conntrack->proto.tcp.seen[0].td_end = - segment_seq_plus_len(ntohl(th->seq), skb->len, - iph, th); - conntrack->proto.tcp.seen[0].td_maxwin = ntohs(th->window); - if (conntrack->proto.tcp.seen[0].td_maxwin == 0) - conntrack->proto.tcp.seen[0].td_maxwin = 1; - conntrack->proto.tcp.seen[0].td_maxend = - conntrack->proto.tcp.seen[0].td_end + - conntrack->proto.tcp.seen[0].td_maxwin; - conntrack->proto.tcp.seen[0].td_scale = 0; + tcp_pickup_window(conntrack, skb, iph, th); /* We assume SACK. Should we assume window scaling too? */ conntrack->proto.tcp.seen[0].flags = Index: net-2.6/include/linux/netfilter/nf_conntrack_common.h =================================================================== --- net-2.6.orig/include/linux/netfilter/nf_conntrack_common.h 2006-08-17 11:51:40.000000000 +0200 +++ net-2.6/include/linux/netfilter/nf_conntrack_common.h 2006-08-17 11:53:57.000000000 +0200 @@ -73,6 +73,10 @@ enum ip_conntrack_status { /* Connection has fixed timeout. */ IPS_FIXED_TIMEOUT_BIT = 10, IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT), + + /* Pick up connection information if required */ + IPS_PICKUP_BIT = 11, + IPS_PICKUP = (1 << IPS_PICKUP_BIT), }; /* Connection tracking event bits */ Index: net-2.6/net/netfilter/nf_conntrack_proto_tcp.c =================================================================== --- net-2.6.orig/net/netfilter/nf_conntrack_proto_tcp.c 2006-08-16 22:35:52.000000000 +0200 +++ net-2.6/net/netfilter/nf_conntrack_proto_tcp.c 2006-08-17 13:55:13.000000000 +0200 @@ -381,10 +381,10 @@ static unsigned int get_conntrack_index( we doesn't have to deal with fragments. */ -static inline __u32 segment_seq_plus_len(__u32 seq, - size_t len, - unsigned int dataoff, - struct tcphdr *tcph) +static inline __u32 segment_seq_plus_len(const __u32 seq, + const size_t len, + const unsigned int dataoff, + const struct tcphdr *tcph) { /* XXX Should I use payload length field in IP/IPv6 header ? 
* - YK */ @@ -850,6 +850,22 @@ static int tcp_error(struct sk_buff *skb return NF_ACCEPT; } +static void tcp_pickup_window(struct nf_conn *conntrack, + const struct sk_buff *skb, + const unsigned int dataoff, + const struct tcphdr *th) +{ + conntrack->proto.tcp.seen[0].td_end = + segment_seq_plus_len(ntohl(th->seq), skb->len, dataoff, th); + conntrack->proto.tcp.seen[0].td_maxwin = ntohs(th->window); + if (conntrack->proto.tcp.seen[0].td_maxwin == 0) + conntrack->proto.tcp.seen[0].td_maxwin = 1; + conntrack->proto.tcp.seen[0].td_maxend = + conntrack->proto.tcp.seen[0].td_end + + conntrack->proto.tcp.seen[0].td_maxwin; + conntrack->proto.tcp.seen[0].td_scale = 0; +} + /* Returns verdict for packet, or -1 for invalid. */ static int tcp_packet(struct nf_conn *conntrack, const struct sk_buff *skb, @@ -868,6 +884,14 @@ static int tcp_packet(struct nf_conn *co BUG_ON(th == NULL); write_lock_bh(&tcp_lock); + + /* + * This conntrack was added via ctnetlink or ct_sync and needs to + * take over sequence tracking in order to work properly. + */ + if (test_and_clear_bit(IPS_PICKUP, &conntrack->status)) + tcp_pickup_window(conntrack, skb, dataoff, th); + old_state = conntrack->proto.tcp.state; dir = CTINFO2DIR(ctinfo); index = get_conntrack_index(th); @@ -1075,16 +1099,7 @@ static int tcp_new(struct nf_conn *connt * its history is lost for us. * Let's try to use the data from the packet. */ - conntrack->proto.tcp.seen[0].td_end = - segment_seq_plus_len(ntohl(th->seq), skb->len, - dataoff, th); - conntrack->proto.tcp.seen[0].td_maxwin = ntohs(th->window); - if (conntrack->proto.tcp.seen[0].td_maxwin == 0) - conntrack->proto.tcp.seen[0].td_maxwin = 1; - conntrack->proto.tcp.seen[0].td_maxend = - conntrack->proto.tcp.seen[0].td_end + - conntrack->proto.tcp.seen[0].td_maxwin; - conntrack->proto.tcp.seen[0].td_scale = 0; + tcp_pickup_window(conntrack, skb, dataoff, th); /* We assume SACK. Should we assume window scaling too? 
*/ conntrack->proto.tcp.seen[0].flags = @@ -1121,11 +1136,24 @@ static int tcp_to_nfattr(struct sk_buff const struct nf_conn *ct) { struct nfattr *nest_parms; - + u_int8_t sack; + read_lock_bh(&tcp_lock); nest_parms = NFA_NEST(skb, CTA_PROTOINFO_TCP); NFA_PUT(skb, CTA_PROTOINFO_TCP_STATE, sizeof(u_int8_t), &ct->proto.tcp.state); + /* window scale factor: original direction (SYN) */ + NFA_PUT(skb, CTA_PROTOINFO_TCP_WSCALE_ORIGINAL, sizeof(u_int8_t), + &ct->proto.tcp.seen[0].td_scale); + /* window scale factor: reply direction (SYN+ACK) */ + NFA_PUT(skb, CTA_PROTOINFO_TCP_WSCALE_REPLY, sizeof(u_int8_t), + &ct->proto.tcp.seen[1].td_scale); + /* SACK: original direction */ + sack = ct->proto.tcp.seen[0].flags & IP_CT_TCP_FLAG_SACK_PERM; + NFA_PUT(skb, CTA_PROTOINFO_TCP_SACK_ORIGINAL, sizeof(u_int8_t), &sack); + /* SACK: reply direction */ + sack = ct->proto.tcp.seen[1].flags & IP_CT_TCP_FLAG_SACK_PERM; + NFA_PUT(skb, CTA_PROTOINFO_TCP_SACK_REPLY, sizeof(u_int8_t), &sack); read_unlock_bh(&tcp_lock); NFA_NEST_END(skb, nest_parms); @@ -1138,7 +1166,11 @@ nfattr_failure: } static const size_t cta_min_tcp[CTA_PROTOINFO_TCP_MAX] = { - [CTA_PROTOINFO_TCP_STATE-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_STATE-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_WSCALE_REPLY-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_SACK_ORIGINAL-1] = sizeof(u_int8_t), + [CTA_PROTOINFO_TCP_SACK_REPLY-1] = sizeof(u_int8_t) }; static int nfattr_to_tcp(struct nfattr *cda[], struct nf_conn *ct) @@ -1162,6 +1194,40 @@ static int nfattr_to_tcp(struct nfattr * write_lock_bh(&tcp_lock); ct->proto.tcp.state = *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_STATE-1]); + /* window scale factor: original direction (SYN) */ + if (tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1]) { + ct->proto.tcp.seen[0].td_scale = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL-1]); + ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_WINDOW_SCALE; + } + /* window scale 
factor: reply direction (SYN+ACK) */ + if (tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]) { + ct->proto.tcp.seen[1].td_scale = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_WSCALE_REPLY-1]); + ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_WINDOW_SCALE; + } + /* enable/disable SACK: original direction */ + if (tb[CTA_PROTOINFO_TCP_SACK_ORIGINAL-1]) { + u_int8_t enable = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_SACK_ORIGINAL-1]); + if (enable) + ct->proto.tcp.seen[0].flags |= + IP_CT_TCP_FLAG_SACK_PERM; + else + ct->proto.tcp.seen[0].flags &= + ~IP_CT_TCP_FLAG_SACK_PERM; + } + /* enable/disable SACK: reply direction */ + if (tb[CTA_PROTOINFO_TCP_SACK_REPLY-1]) { + u_int8_t enable = + *(u_int8_t *)NFA_DATA(tb[CTA_PROTOINFO_TCP_SACK_REPLY-1]); + if (enable) + ct->proto.tcp.seen[1].flags |= + IP_CT_TCP_FLAG_SACK_PERM; + else + ct->proto.tcp.seen[1].flags &= + ~IP_CT_TCP_FLAG_SACK_PERM; + } write_unlock_bh(&tcp_lock); return 0; Index: net-2.6/include/linux/netfilter/nfnetlink_conntrack.h =================================================================== --- net-2.6.orig/include/linux/netfilter/nfnetlink_conntrack.h 2006-08-16 22:35:52.000000000 +0200 +++ net-2.6/include/linux/netfilter/nfnetlink_conntrack.h 2006-08-17 13:24:13.000000000 +0200 @@ -83,6 +83,10 @@ enum ctattr_protoinfo { enum ctattr_protoinfo_tcp { CTA_PROTOINFO_TCP_UNSPEC, CTA_PROTOINFO_TCP_STATE, + CTA_PROTOINFO_TCP_WSCALE_ORIGINAL, + CTA_PROTOINFO_TCP_WSCALE_REPLY, + CTA_PROTOINFO_TCP_SACK_ORIGINAL, + CTA_PROTOINFO_TCP_SACK_REPLY, __CTA_PROTOINFO_TCP_MAX }; #define CTA_PROTOINFO_TCP_MAX (__CTA_PROTOINFO_TCP_MAX - 1) From pablo at netfilter.org Mon Aug 21 10:47:49 2006 From: pablo at netfilter.org (Pablo Neira Ayuso) Date: Mon Aug 21 11:13:50 2006 Subject: [PATCH 3/3][CONNTRACK] Fix race condition in early drop Message-ID: <44E97335.1080105@netfilter.org> [CONNTRACK] Fix race condition in early drop On SMP environments the maximum number of conntracks can be overpassed under heavy stress situations due to an 
existing race condition.

CPU A                   CPU B
atomic_read()           ...
early_drop()            ...
...                     atomic_read()
allocate conntrack      allocate conntrack
atomic_inc()            atomic_inc()

This patch uses an optimistic approach to solve the concurrency problem.

Signed-off-by: Pablo Neira Ayuso

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
-------------- next part --------------
[CONNTRACK] Fix race condition in early drop

On SMP environments the maximum number of conntracks can be overpassed
under heavy stress situations due to an existing race condition.

CPU A                   CPU B
atomic_read()           ...
early_drop()            ...
...                     atomic_read()
allocate conntrack      allocate conntrack
atomic_inc()            atomic_inc()

This patch uses an optimistic approach to solve the concurrency problem.

Signed-off-by: Pablo Neira Ayuso

Index: net-2.6/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c	2006-08-17 15:50:33.000000000 +0200
+++ net-2.6/net/ipv4/netfilter/ip_conntrack_core.c	2006-08-17 17:52:27.000000000 +0200
@@ -642,21 +642,32 @@ struct ip_conntrack *ip_conntrack_alloc(
 	}
 
 	if (ip_conntrack_max
-	    && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) {
+	    && !atomic_add_unless(&ip_conntrack_count, 1, ip_conntrack_max)) {
 		unsigned int hash = hash_conntrack(orig);
 		/* Try dropping from this hash chain.
 		 */
-		if (!early_drop(&ip_conntrack_hash[hash])) {
-			if (net_ratelimit())
-				printk(KERN_WARNING
-				       "ip_conntrack: table full, dropping"
-				       " packet.\n");
-			return ERR_PTR(-ENOMEM);
-		}
+		do {
+			if (!early_drop(&ip_conntrack_hash[hash])) {
+				if (net_ratelimit())
+					printk(KERN_WARNING
+					       "ip_conntrack: table full, "
+					       "dropping packet.\n");
+				return ERR_PTR(-ENOMEM);
+			}
+			/*
+			 * On SMP environments, if the table is full and we
+			 * early drop a conntrack to make some place for this
+			 * new one then we have to ensure that no other
+			 * conntrack slips through.
+			 */
+		} while (!atomic_add_unless(&ip_conntrack_count,
+					    1,
+					    ip_conntrack_max));
 	}
 
 	conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC);
 	if (!conntrack) {
 		DEBUGP("Can't allocate conntrack.\n");
+		atomic_dec(&ip_conntrack_count);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -670,8 +681,6 @@ struct ip_conntrack *ip_conntrack_alloc(
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
-	atomic_inc(&ip_conntrack_count);
-
 	return conntrack;
 }
 
Index: net-2.6/net/netfilter/nf_conntrack_core.c
===================================================================
--- net-2.6.orig/net/netfilter/nf_conntrack_core.c	2006-08-18 19:23:19.000000000 +0200
+++ net-2.6/net/netfilter/nf_conntrack_core.c	2006-08-18 20:20:08.000000000 +0200
@@ -868,16 +868,26 @@ __nf_conntrack_alloc(const struct nf_con
 	}
 
 	if (nf_conntrack_max
-	    && atomic_read(&nf_conntrack_count) >= nf_conntrack_max) {
+	    && !atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
 		unsigned int hash = hash_conntrack(orig);
 		/* Try dropping from this hash chain.
 		 */
-		if (!early_drop(&nf_conntrack_hash[hash])) {
-			if (net_ratelimit())
-				printk(KERN_WARNING
-				       "nf_conntrack: table full, dropping"
-				       " packet.\n");
-			return ERR_PTR(-ENOMEM);
-		}
+		do {
+			if (!early_drop(&nf_conntrack_hash[hash])) {
+				if (net_ratelimit())
+					printk(KERN_WARNING
+					       "nf_conntrack: table full, "
+					       "dropping packet.\n");
+				return ERR_PTR(-ENOMEM);
+			}
+			/*
+			 * On SMP environments, if the table is full and we
+			 * early drop a conntrack to make some place for this
+			 * new one then we have to ensure that no other
+			 * conntrack slips through.
+			 */
+		} while (!atomic_add_unless(&nf_conntrack_count,
+					    1,
+					    nf_conntrack_max));
 	}
 
 	/* find features needed by this conntrack. */
@@ -923,9 +933,12 @@ __nf_conntrack_alloc(const struct nf_con
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
-	atomic_inc(&nf_conntrack_count);
+	read_unlock_bh(&nf_ct_cache_lock);
+	return conntrack;
+
 out:
 	read_unlock_bh(&nf_ct_cache_lock);
+	atomic_dec(&nf_conntrack_count);
 	return conntrack;
 }
 
From pablo at netfilter.org Mon Aug 21 11:00:07 2006
From: pablo at netfilter.org (Pablo Neira Ayuso)
Date: Mon Aug 21 11:26:07 2006
Subject: [RFC][PATCH] libnfnetlink new API #2
In-Reply-To: <44D870B5.8040707@trash.net>
References: <44C63B3F.2090509@netfilter.org> <44D870B5.8040707@trash.net>
Message-ID: <44E97617.7060006@netfilter.org>

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>
>>Hi,
>>
>>Since I'll be leaving for two weeks, I'd like to put a patch for
>>libnfnetlink on the table that I'm currently distributing with
>>conntrackd for further discussion. I'd like to see this patch or
>>something similar in mainline someday.
>>
>>This patch:
>>
>>- Fixes error handling that is currently broken, errors are now reported
>>via errno so everyone could use perror(...) to get a more detailed
>>description to know what is going wrong. Basically the new functions
>>return -1 and set errno appropiately.
>>
>>- Adds Documentation that comes handy for developers.
>>
>>- Introduces replacement for nfnl_listen (nfnl_receive_process) and for
>>nfnl_talk (nfnl_send_received_process), both are integrated with the
>>nfnl_subsys_handle logic introduced by Harald, that IMHO must be the
>>right direction, and set errno appropiately in case of error. These new
>>functions obsolete nfnl_listen and nfnl_talk, we can add a clause
>>__deprecated to warn programmers without removing them.
>>
>>- Iterator API: to loop over a multipart netlink message and process it.
>>This gives more control in the message processing. It is similar to
>>Harald's nfnl_get_first_msg, nfnl_get_msg_next and nfnl_handle_packet
>>set of functions but sets errno and move iterator private information
>>out of nfnl_handle. I must confess that in this case I don't like too
>>much the idea of providing too many function to do the same but my API
>>looks friendlier I think, programmers are familiar with the concept of
>>iterators.
>>
>>- Introduce assertions to check input data: This can catch up wrong use
>>of the API and errors and the application can break "nicer" (if
>>breakages would ever be nice...) that segfaulting. I have seen these in
>>others libraries.
>>
>>In short: I think that we can deprecate old functions (just adding a
>>warning in compilation time) and remove them in version 2, I have seen
>>this in other libraries: we maintaining an old version 1 for those that
>>don't want to move forward some time but provide a clean version 2
>>and drop early design errors. BTW, probably the name of some functions
>>are ugly, I accept suggestions ;)
>>
>>@Patrick: I think that Harald has more in-deep knowledge about the
>>libraries but, since he's really busy these days, your impressions on
>>this issue can be also worth as well.
>
>
>
> Without commenting on deprecating functions (I don't know about that),
> the patch looks good to me.
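
For readers following the thread: the error convention described in the
quoted list (return -1 and set errno, so that callers can use perror(),
plus assertions on input) might look roughly like the sketch below. It is
only an illustration under stated assumptions; demo_registry,
demo_subsys_open and DEMO_MAX_SUBSYS are hypothetical names and are not
part of libnfnetlink.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_SUBSYS 16	/* hypothetical limit, for illustration only */

/* Hypothetical registry; stands in for whatever state a real handle keeps. */
struct demo_registry {
	int busy[DEMO_MAX_SUBSYS];
};

/*
 * Claim slot `id`. Returns 0 on success. On error returns -1 and sets
 * errno (ENOENT: no such slot, EBUSY: slot already claimed), so the
 * caller can simply use perror().
 */
static int demo_subsys_open(struct demo_registry *reg, unsigned int id)
{
	assert(reg);	/* wrong use of the API is caught here, not by a later segfault */

	if (id >= DEMO_MAX_SUBSYS) {
		errno = ENOENT;
		return -1;
	}
	if (reg->busy[id]) {
		errno = EBUSY;
		return -1;
	}
	reg->busy[id] = 1;
	return 0;
}
```

A caller would then check for -1 and call perror("demo_subsys_open")
instead of parsing a message printed by the library itself.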
My idea is to introduce something like this to deprecate functions:

+/* Deprecated API, keep it to ensure backward compatibility */
+
+#if __GNUC_MINOR__ > 0
+# ifndef __deprecated
+# define __deprecated __attribute__((deprecated))
+# endif
+#endif
+
+extern int __deprecated nfnl_handle_packet(struct nfnl_handle *, char *buf,
+					    int len);
+

>>+ * nfnl_send_receive_process - request/response challenge
>>+ * @h: nfnetlink handler
>>+ * @nlh: nfnetlink message to be sent
>>+ *
>>+ * This function is sends a nfnetlink message to a certain subsystem
>>+ * and receives the response that is processed by the callback registered
>>+ * via register_callback(). Note that this function is a replacement for
>>+ * nfnl_talk, its use is recommended.
>>+ *
>>+ * On success, 0 is returned. On error, a negative is returned. If your
>>+ * does not want to listen to events anymore, then it must return a value
>>+ * lesser or equal to 0.
>>+ *
>>+ * Note that ENOBUFS is returned in case that nfnetlink is exhausted. In
>>+ * that case is possible that the information requested is incomplete.
>>+ */
>>+int nfnl_send_receive_process(struct nfnl_handle *h, struct nlmsghdr *nlh)
>>+{
>>+	assert(h);
>>+	assert(nlh);
>>+
>>+	if (nfnl_send(h, nlh) == -1)
>>+		return -1;
>>+
>>+	return nfnl_receive_process(h);
>>+}
>
>
> This doesn't really do what it promises, it will call the callback for
> any message it receives, not only a response. We need to start using
> sequence numbers before we associate responses with queries.
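
To make the point about sequence numbers concrete, here is a minimal
sketch of how a receiver could tell a response to its pending query apart
from unrelated messages. It is not the library's code: demo_nlmsghdr is a
trimmed, hypothetical stand-in for struct nlmsghdr, demo_classify is a
hypothetical helper, and treating sequence number 0 as "event" is an
assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in carrying only the nlmsghdr field used here. */
struct demo_nlmsghdr {
	uint32_t nlmsg_seq;	/* sequence number chosen by the sender */
};

enum demo_kind {
	DEMO_EVENT,	/* unsolicited message (assumed to carry seq 0) */
	DEMO_RESPONSE,	/* answer to the query we are waiting for */
	DEMO_STALE	/* answer to some other, older query */
};

/* Classify a received message against the sequence number of the pending query. */
static enum demo_kind demo_classify(const struct demo_nlmsghdr *nlh,
				    uint32_t pending_seq)
{
	assert(nlh);

	if (nlh->nlmsg_seq == 0)
		return DEMO_EVENT;
	if (nlh->nlmsg_seq == pending_seq)
		return DEMO_RESPONSE;
	return DEMO_STALE;
}
```

With a check of this kind, a query function would hand only
DEMO_RESPONSE messages to the callback for the query and skip or
separately dispatch the rest.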
Hm, then it is my English that is broken: I assumed that "response" means
every message received from the subsystem :(

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

From pablo at netfilter.org Mon Aug 21 11:01:23 2006
From: pablo at netfilter.org (Pablo Neira Ayuso)
Date: Mon Aug 21 11:27:29 2006
Subject: [RFC]libnfnetlink new API #3
Message-ID: <44E97663.8040904@netfilter.org>

Hi,

This patch follows from:

http://patchwork.netfilter.org/netfilter-devel/patch.pl?id=3695

Now we do sequence tracking for queries; I assume that sequence number 0
is only used by events.

Waiting for your comments,
Pablo

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
-------------- next part --------------
Index: include/libnfnetlink/libnfnetlink.h
===================================================================
--- include/libnfnetlink/libnfnetlink.h	(revision 6652)
+++ include/libnfnetlink/libnfnetlink.h	(working copy)
@@ -98,6 +98,35 @@
 			      const unsigned char *buf,
 			      size_t len);
 
+/* join a certain netlink multicast group */
+extern int nfnl_join(const struct nfnl_handle *nfnlh, unsigned int group);
+
+/* process a netlink message */
+extern int nfnl_process(struct nfnl_handle *h,
+			const unsigned char *buf,
+			size_t len);
+
+/* iterator API */
+
+extern struct nfnl_iterator *
+nfnl_iterator_create(const struct nfnl_handle *h,
+		     const char *buf,
+		     size_t len);
+
+extern void nfnl_iterator_destroy(struct nfnl_iterator *it);
+
+extern int nfnl_iterator_process(struct nfnl_handle *h,
+				 struct nfnl_iterator *it);
+
+extern int nfnl_iterator_next(const struct nfnl_handle *h,
+			      struct nfnl_iterator *it);
+
+/* replacement for nfnl_listen */
+extern int nfnl_catch(struct nfnl_handle *h);
+
+/* replacement for nfnl_talk */
+extern int nfnl_query(struct nfnl_handle *h, struct nlmsghdr *nlh);
+
 #define
nfnl_attr_present(tb, attr)			\
 	(tb[attr-1])
 
Index: src/libnfnetlink.c
===================================================================
--- src/libnfnetlink.c	(revision 6652)
+++ src/libnfnetlink.c	(working copy)
@@ -1,6 +1,7 @@
 /* libnfnetlink.c: generic library for communication with netfilter
  *
  * (C) 2002-2006 by Harald Welte
+ * (C) 2006 by Pablo Neira Ayuso
  *
  * Based on some original ideas from Jay Schulist
  *
@@ -24,6 +25,13 @@
  * 2006-01-26 Harald Welte :
  * 	remove bogus nfnlh->local.nl_pid from nfnl_open ;)
  * 	add 16bit attribute functions
+ *
+ * 2006-07-03 Pablo Neira Ayuso :
+ * 	add iterator API
+ * 	add replacements for nfnl_listen and nfnl_talk
+ * 	fix error handling
+ * 	add assertions
+ * 	add documentation
  */
 
 #include
@@ -33,7 +41,7 @@
 #include
 #include
 #include
-
+#include
 #include
 #include
 
@@ -50,7 +58,7 @@
 #define SOL_NETLINK	270
 #endif
 
-
+/* FIXME: this should vanish, but it is used by listen() and talk() */
 #define nfnl_error(format, args...) \
 	fprintf(stderr, "%s: " format "\n", __FUNCTION__, ## args)
 
@@ -104,8 +112,16 @@
 	}
 }
 
+/**
+ * nfnl_fd - returns the descriptor that identifies the socket
+ * @nfnlh: nfnetlink handler
+ *
+ * Use this function if you need to interact with the socket. Common
+ * scenarios are the use of poll()/select() to achieve multiplexation.
+ */ int nfnl_fd(struct nfnl_handle *h) { + assert(h); return h->fd; } @@ -117,14 +133,11 @@ for (i = 0; i < NFNL_MAX_SUBSYS; i++) new_subscriptions |= nfnlh->subsys[i].subscriptions; - nfnlh->local.nl_groups = new_subscriptions; err = bind(nfnlh->fd, (struct sockaddr *)&nfnlh->local, sizeof(nfnlh->local)); - if (err < 0) { - nfnl_error("bind(netlink): %s", strerror(errno)); - return err; - } + if (err == -1) + return -1; nfnlh->subscriptions = new_subscriptions; @@ -132,10 +145,13 @@ } /** - * nfnl_open - open a netlink socket + * nfnl_open - open a nfnetlink handler * - * nfnlh: libnfnetlink handle to be allocated by user + * This function creates a nfnetlink handler, this is required to establish + * a communication between the userspace and the nfnetlink system. * + * On success, a valid address that points to a nfnl_handle structure + * is returned. On error, NULL is returned and errno is set approapiately. */ struct nfnl_handle *nfnl_open(void) { @@ -149,10 +165,8 @@ memset(nfnlh, 0, sizeof(*nfnlh)); nfnlh->fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER); - if (nfnlh->fd < 0) { - nfnl_error("socket(netlink): %s", strerror(errno)); + if (nfnlh->fd == -1) goto err_free; - } nfnlh->local.nl_family = AF_NETLINK; nfnlh->peer.nl_family = AF_NETLINK; @@ -161,12 +175,11 @@ err = getsockname(nfnlh->fd, (struct sockaddr *)&nfnlh->local, &addr_len); if (addr_len != sizeof(nfnlh->local)) { - nfnl_error("Bad address length (%u != %zd)", addr_len, - sizeof(nfnlh->local)); + errno = EINVAL; goto err_close; } if (nfnlh->local.nl_family != AF_NETLINK) { - nfnl_error("Bad address family %d", nfnlh->local.nl_family); + errno = EINVAL; goto err_close; } nfnlh->seq = time(NULL); @@ -183,8 +196,7 @@ err = getsockname(nfnlh->fd, (struct sockaddr *)&nfnlh->local, &addr_len); if (addr_len != sizeof(nfnlh->local)) { - nfnl_error("Bad address length (%u != %zd)", addr_len, - sizeof(nfnlh->local)); + errno = EINVAL; goto err_close; } @@ -199,11 +211,18 @@ /** * nfnl_subsys_open - 
open a netlink subsystem + * @nfnlh: libnfnetlink handle + * @subsys_id: which nfnetlink subsystem we are interested in + * @cb_count: number of callbacks that are used maximum. + * @subscriptions: netlink groups we want to be subscribed to * - * nfnlh: libnfnetlink handle - * subsys_id: which nfnetlink subsystem we are interested in - * cb_count: number of callbacks that are used maximum. - * subscriptions: netlink groups we want to be subscribed to + * This function creates a subsystem handler that contains the set of + * callbacks that handle certain types of messages coming from a netfilter + * subsystem. Initially the callback set is empty, you can register callbacks + * via nfnl_callback_register(). + * + * On error, NULL is returned and errno is set appropiately. On success, + * a valid address that points to a nfnl_subsys_handle structure is returned. */ struct nfnl_subsys_handle * nfnl_subsys_open(struct nfnl_handle *nfnlh, u_int8_t subsys_id, @@ -211,30 +230,30 @@ { struct nfnl_subsys_handle *ssh; + assert(nfnlh); + if (subsys_id > NFNL_MAX_SUBSYS) { - + errno = ENOENT; return NULL; } ssh = &nfnlh->subsys[subsys_id]; if (ssh->cb) { - + errno = EBUSY; return NULL; } ssh->cb = malloc(sizeof(*(ssh->cb)) * cb_count); - if (!ssh->cb) { - + if (!ssh->cb) return NULL; - } ssh->nfnlh = nfnlh; ssh->cb_count = cb_count; ssh->subscriptions = subscriptions; ssh->subsys_id = subsys_id; - /* FIXME: reimplement this based on - * setsockopt(nfnlh->fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,,) */ + /* although now we have nfnl_join to subscribe to certain + * groups, just keep this to ensure compatibility */ if (recalc_rebind_subscriptions(nfnlh) < 0) { free(ssh->cb); ssh->cb = NULL; @@ -244,8 +263,16 @@ return ssh; } +/** + * nfnl_subsys_close - close a nfnetlink subsys handler + * @ssh: nfnetlink subsystem handler + * + * Release all the callbacks registered in a subsystem handler. 
+ */ void nfnl_subsys_close(struct nfnl_subsys_handle *ssh) { + assert(ssh); + ssh->subscriptions = 0; ssh->cb_count = 0; if (ssh->cb) { @@ -255,15 +282,18 @@ } /** - * nfnl_close - close netlink socket + * nfnl_close - close a nfnetlink handler + * @nfnlh: nfnetlink handler * - * nfnlh: libnfnetlink handle - * + * This function closes the nfnetlink handler. On success, 0 is returned. + * On error, -1 is returned and errno is set appropiately. */ int nfnl_close(struct nfnl_handle *nfnlh) { int i, ret; + assert(nfnlh); + for (i = 0; i < NFNL_MAX_SUBSYS; i++) nfnl_subsys_close(&nfnlh->subsys[i]); @@ -277,13 +307,37 @@ } /** + * nfnl_join - join a nfnetlink multicast group + * @nfnlh: nfnetlink handler + * @group: group we want to join + * + * This function is used to join a certain multicast group. It must be + * called once the nfnetlink handler has been created. If any doubt, + * just use it if you have to listen to nfnetlink events. + * + * On success, 0 is returned. On error, -1 is returned and errno is set + * approapiately. + */ +int nfnl_join(const struct nfnl_handle *nfnlh, unsigned int group) +{ + assert(nfnlh); + return setsockopt(nfnlh->fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, + &group, sizeof(group)); +} + +/** * nfnl_send - send a nfnetlink message through netlink socket + * @nfnlh: nfnetlink handler + * @n: netlink message * - * nfnlh: libnfnetlink handle - * n: netlink message + * On success, 0 is returned. On error, -1 is returned and errno is set + * appropiately. 
*/ int nfnl_send(struct nfnl_handle *nfnlh, struct nlmsghdr *n) { + assert(nfnlh); + assert(n); + nfnl_debug_dump_packet(n, n->nlmsg_len+sizeof(*n), "nfnl_send"); return sendto(nfnlh->fd, n, n->nlmsg_len, 0, @@ -293,6 +347,9 @@ int nfnl_sendmsg(const struct nfnl_handle *nfnlh, const struct msghdr *msg, unsigned int flags) { + assert(nfnlh); + assert(msg); + return sendmsg(nfnlh->fd, msg, flags); } @@ -301,6 +358,8 @@ { struct msghdr msg; + assert(nfnlh); + msg.msg_name = (struct sockaddr *) &nfnlh->peer; msg.msg_namelen = sizeof(nfnlh->peer); msg.msg_iov = (struct iovec *) iov; @@ -314,18 +373,17 @@ /** * nfnl_fill_hdr - fill in netlink and nfnetlink header + * @nfnlh: nfnetlink handle + * @nlh: netlink message to be filled in + * @len: length of _payload_ bytes (not including nfgenmsg) + * @family: AF_INET / ... + * @res_id: resource id + * @msg_type: nfnetlink message type (without subsystem) + * @msg_flags: netlink message flags * - * nfnlh: libnfnetlink handle - * nlh: netlink header to be filled in - * len: length of _payload_ bytes (not including nfgenmsg) - * family: AF_INET / ... - * res_id: resource id - * msg_type: nfnetlink message type (without subsystem) - * msg_flags: netlink message flags - * - * NOTE: the nlmsghdr must point to a memory region of at least - * the size of struct nlmsghdr + struct nfgenmsg - * + * This function sets up appropiately the nfnetlink header. See that the + * pointer to the netlink message passed must point to a memory region of + * at least the size of struct nlmsghdr + struct nfgenmsg. 
*/ void nfnl_fill_hdr(struct nfnl_subsys_handle *ssh, struct nlmsghdr *nlh, unsigned int len, @@ -334,6 +392,9 @@ u_int16_t msg_type, u_int16_t msg_flags) { + assert(ssh); + assert(nlh); + struct nfgenmsg *nfg = (struct nfgenmsg *) ((void *)nlh + sizeof(*nlh)); @@ -343,6 +404,10 @@ nlh->nlmsg_pid = 0; nlh->nlmsg_seq = ++ssh->nfnlh->seq; + /* check for wraparounds: assume that seqnum 0 is only used by events */ + if (!ssh->nfnlh->seq) + nlh->nlmsg_seq = ssh->nfnlh->seq = time(NULL); + nfg->nfgen_family = family; nfg->version = NFNETLINK_V0; nfg->res_id = htons(res_id); @@ -368,6 +433,22 @@ return ((void *)nlh + NLMSG_LENGTH(sizeof(struct nfgenmsg))); } +/** + * nfnl_recv - receive data from a nfnetlink subsystem + * @h: nfnetlink handler + * @buf: buffer where the data will be stored + * @len: size of the buffer + * + * This function doesn't perform any sanity checking. So do no expect + * that the data is well-formed. Such checkings are done by the parsing + * functions. + * + * On success, 0 is returned. On error, -1 is returned and errno is set + * appropiately. + * + * Note that ENOBUFS is returned in case that nfnetlink is exhausted. In + * that case is possible that the information requested is incomplete. 
+ */ ssize_t nfnl_recv(const struct nfnl_handle *h, unsigned char *buf, size_t len) { @@ -375,6 +456,10 @@ int status; struct nlmsghdr *nlh; struct sockaddr_nl peer; + + assert(h); + assert(buf); + assert(len > 0); if (len < sizeof(struct nlmsgerr) || len < sizeof(struct nlmsghdr)) @@ -400,9 +485,8 @@ } /** * nfnl_listen: listen for one or more netlink messages - * - * nfnhl: libnfnetlink handle - * handler: callback function to be called for every netlink message + * @nfnhl: libnfnetlink handle + * @handler: callback function to be called for every netlink message * - the callback handler should normally return 0 * - but may return a negative error code which will cause * nfnl_listen to return immediately with the same error code @@ -413,8 +497,14 @@ * without any loss of data, a negative error code will terminate * nfnl_listen "very soon" and throw away data already read from * the netlink socket. - * jarg: opaque argument passed on to callback + * @jarg: opaque argument passed on to callback * + * This function is used to receive and process messages coming from an open + * nfnetlink handler like events or information request via nfnl_send(). + * + * On error, -1 is returned, unfortunately errno is not always set + * appropiately. For that reason, the use of this function is DEPRECATED. + * Please, use nfnl_receive_process() instead. */ int nfnl_listen(struct nfnl_handle *nfnlh, int (*handler)(struct sockaddr_nl *, struct nlmsghdr *n, @@ -511,6 +601,20 @@ return quit; } +/** + * nfnl_talk - send a request and then receive and process messages returned + * @nfnlh: nfnetelink handler + * @n: netlink message that contains the request + * @peer: peer PID + * @groups: netlink groups + * @junk: callback called if out-of-sequence messages were received + * @jarg: data for the junk callback + * + * This function is used to request an action that does not returns any + * information. On error, a negative value is returned, errno could be + * set appropiately. 
For that reason, the use of this function is DEPRECATED. + * Please, use nfnl_send_received_process() instead. + */ int nfnl_talk(struct nfnl_handle *nfnlh, struct nlmsghdr *n, pid_t peer, unsigned groups, struct nlmsghdr *answer, int (*junk)(struct sockaddr_nl *, struct nlmsghdr *n, void *), @@ -629,13 +733,11 @@ /** * nfnl_addattr_l - Add variable length attribute to nlmsghdr - * - * n: netlink message header to which attribute is to be added - * maxlen: maximum length of netlink message header - * type: type of new attribute - * data: content of new attribute - * alen: attribute length - * + * @n: netlink message header to which attribute is to be added + * @maxlen: maximum length of netlink message header + * @type: type of new attribute + * @data: content of new attribute + * @len: attribute length */ int nfnl_addattr_l(struct nlmsghdr *n, int maxlen, int type, void *data, int alen) @@ -643,9 +745,12 @@ int len = NFA_LENGTH(alen); struct nfattr *nfa; + assert(n); + assert(maxlen > 0); + assert(type >= 0); + if ((NLMSG_ALIGN(n->nlmsg_len) + len) > maxlen) { - nfnl_error("%d greater than maxlen (%d)\n", - NLMSG_ALIGN(n->nlmsg_len) + len, maxlen); + errno = ENOSPC; return -1; } @@ -673,8 +778,14 @@ struct nfattr *subnfa; int len = NFA_LENGTH(alen); - if ((NFA_OK(nfa, nfa->nfa_len) + len) > maxlen) + assert(nfa); + assert(maxlen > 0); + assert(type >= 0); + + if ((NFA_OK(nfa, nfa->nfa_len) + len) > maxlen) { + errno = ENOSPC; return -1; + } subnfa = (struct nfattr *)(((char *)nfa) + NFA_OK(nfa, nfa->nfa_len)); subnfa->nfa_type = type; @@ -697,6 +808,9 @@ int nfnl_nfa_addattr16(struct nfattr *nfa, int maxlen, int type, u_int16_t data) { + assert(nfa); + assert(maxlen > 0); + assert(type >= 0); return nfnl_nfa_addattr_l(nfa, maxlen, type, &data, sizeof(data)); } @@ -713,6 +827,10 @@ int nfnl_addattr16(struct nlmsghdr *n, int maxlen, int type, u_int16_t data) { + assert(n); + assert(maxlen > 0); + assert(type >= 0); + return nfnl_addattr_l(n, maxlen, type, &data, 
sizeof(data)); } @@ -728,6 +846,9 @@ int nfnl_nfa_addattr32(struct nfattr *nfa, int maxlen, int type, u_int32_t data) { + assert(nfa); + assert(maxlen > 0); + assert(type >= 0); return nfnl_nfa_addattr_l(nfa, maxlen, type, &data, sizeof(data)); } @@ -744,6 +865,10 @@ int nfnl_addattr32(struct nlmsghdr *n, int maxlen, int type, u_int32_t data) { + assert(n); + assert(maxlen > 0); + assert(type >= 0); + return nfnl_addattr_l(n, maxlen, type, &data, sizeof(data)); } @@ -758,6 +883,10 @@ */ int nfnl_parse_attr(struct nfattr *tb[], int max, struct nfattr *nfa, int len) { + assert(tb); + assert(max > 0); + assert(nfa); + memset(tb, 0, sizeof(struct nfattr *) * max); while (NFA_OK(nfa, len)) { @@ -765,8 +894,7 @@ tb[NFA_TYPE(nfa)-1] = nfa; nfa = NFA_NEXT(nfa,len); } - if (len) - nfnl_error("deficit (%d) len (%d).\n", len, nfa->nfa_len); + assert(len == 0); return 0; } @@ -784,6 +912,9 @@ void nfnl_build_nfa_iovec(struct iovec *iov, struct nfattr *nfa, u_int16_t type, u_int32_t len, unsigned char *val) { + assert(iov); + assert(nfa); + /* Set the attribut values */ nfa->nfa_len = sizeof(struct nfattr) + len; nfa->nfa_type = type; @@ -798,12 +929,25 @@ #define SO_RCVBUFFORCE (33) #endif +/** + * nfnl_rcvbufsiz - set the socket buffer size + * @h: nfnetlink handler + * @size: size of the buffer we want to set + * + * This function sets the new size of the socket buffer. Use this setting + * to increase the socket buffer size if your system is reporting ENOBUFS + * errors. + * + * This function returns the new size of the socket buffer. 
+ */ unsigned int nfnl_rcvbufsiz(struct nfnl_handle *h, unsigned int size) { int status; socklen_t socklen = sizeof(size); unsigned int read_size = 0; + assert(h); + /* first we try the FORCE option, which is introduced in kernel * 2.6.14 to give "root" the ability to override the system wide * maximum */ @@ -818,13 +962,28 @@ return read_size; } - +/** + * nfnl_get_msg_first - get the first message of a multipart netlink message + * @h: nfnetlink handle + * @buf: data received that we want to process + * @len: size of the data received + * + * This function returns a pointer to the first netlink message contained + * in the chunk of data received from certain nfnetlink subsystem. + * + * On success, a valid address that points to the netlink message is returned. + * On error, NULL is returned. + */ struct nlmsghdr *nfnl_get_msg_first(struct nfnl_handle *h, const unsigned char *buf, size_t len) { struct nlmsghdr *nlh; + assert(h); + assert(buf); + assert(len > 0); + /* first message in buffer */ nlh = (struct nlmsghdr *)buf; if (!NLMSG_OK(nlh, len)) @@ -841,6 +1000,10 @@ struct nlmsghdr *nlh; size_t remain_len; + assert(h); + assert(buf); + assert(len > 0); + /* if last header in handle not inside this buffer, * drop reference to last header */ if (!h->last_nlhdr || @@ -872,9 +1035,21 @@ return nlh; } +/** + * nfnl_callback_register - register a callback for a certain message type + * @ssh: nfnetlink subsys handler + * @type: subsys call + * @cb: nfnetlink callback to be registered + * + * On success, 0 is returned. On error, -1 is returned and errno is set + * appropiately. + */ int nfnl_callback_register(struct nfnl_subsys_handle *ssh, u_int8_t type, struct nfnl_callback *cb) { + assert(ssh); + assert(cb); + if (type >= ssh->cb_count) return -EINVAL; @@ -883,8 +1058,18 @@ return 0; } +/** + * nfnl_callback_unregister - unregister a certain callback + * @ssh: nfnetlink subsys handler + * @type: subsys call + * + * On sucess, 0 is returned. 
On error, -1 is returned and errno is + * set appropiately. + */ int nfnl_callback_unregister(struct nfnl_subsys_handle *ssh, u_int8_t type) { + assert(ssh); + if (type >= ssh->cb_count) return -EINVAL; @@ -897,6 +1082,10 @@ const struct nlmsghdr *nlh, struct nfattr *nfa[]) { + assert(h); + assert(nlh); + assert(nfa); + int min_len; u_int8_t type = NFNL_MSG_TYPE(nlh->nlmsg_type); u_int8_t subsys_id = NFNL_SUBSYS_ID(nlh->nlmsg_type); @@ -997,3 +1186,351 @@ } return 0; } + +static int nfnl_is_error(struct nfnl_handle *h, struct nlmsghdr *nlh) +{ + /* This message is an ACK or a DONE */ + if (nlh->nlmsg_type == NLMSG_ERROR || + (nlh->nlmsg_type == NLMSG_DONE && + nlh->nlmsg_flags & NLM_F_MULTI)) { + if (nlh->nlmsg_len < NLMSG_ALIGN(sizeof(struct nlmsgerr))) { + errno = EBADMSG; + return 1; + } + errno = *((int *)NLMSG_DATA(nlh)); + return 1; + } + return 0; +} + +/* On error, -1 is returned and errno is set appropiately. On success, + * 0 is returned if there is no more data to process, >0 if there is + * more data to process */ +static int nfnl_step(struct nfnl_handle *h, struct nlmsghdr *nlh) +{ + struct nfnl_subsys_handle *ssh; + u_int8_t type = NFNL_MSG_TYPE(nlh->nlmsg_type); + u_int8_t subsys_id = NFNL_SUBSYS_ID(nlh->nlmsg_type); + + /* Is this an error message? 
*/ + if (nfnl_is_error(h, nlh)) { + /* This is an ACK */ + if (errno == 0) + return 0; + /* This an error message */ + return -1; + } + + /* nfnetlink sanity checks: check for nfgenmsg size */ + if (nlh->nlmsg_len < NLMSG_SPACE(sizeof(struct nfgenmsg))) { + errno = ENOSPC; + return -1; + } + + if (subsys_id > NFNL_MAX_SUBSYS) { + errno = ENOENT; + return -1; + } + + ssh = &h->subsys[subsys_id]; + if (!ssh) { + errno = ENOENT; + return -1; + } + + if (type >= ssh->cb_count) { + errno = ENOENT; + return -1; + } + + if (ssh->cb[type].attr_count) { + int err; + struct nfattr *tb[ssh->cb[type].attr_count]; + struct nfattr *attr = NFM_NFA(NLMSG_DATA(nlh)); + int min_len = NLMSG_SPACE(sizeof(struct nfgenmsg)); + int len = nlh->nlmsg_len - NLMSG_ALIGN(min_len); + + err = nfnl_parse_attr(tb, ssh->cb[type].attr_count, attr, len); + if (err == -1) + return -1; + + if (ssh->cb[type].call) { + /* + * On error, the callback returns -1 and errno must + * be explicitely set. On success, 0 is returned + * and we're done, otherwise >0 is returned that + * means that we want to continue data processing. + */ + return ssh->cb[type].call(nlh, + tb, + ssh->cb[type].data); + } + } + /* no callback set, continue data processing */ + return 1; +} + +/** + * nfnl_process - process data coming from a nfnetlink system + * @h: nfnetlink handler + * @buf: buffer that contains the netlink message + * @len: size of the data contained in the buffer (not the buffer size) + * + * This function processes all the nfnetlink messages contained inside a + * buffer. It performs the appropiate sanity checks and passes the message + * to a certain handler that is registered via register_callback(). + * + * On success, 0 is returned if the data processing has finished. If a + * value > 0 is returned, then there is more data to process. On error, + * -1 is returned and errno is set to the appropiate value. + * + * Note that the callback must return -1 and set errno in case of error. 
+ * If your callback decides not to process data anymore for any reason, + * then it must return 0. Otherwise, if the callback continues the + * processing 1 is returned. + */ +int nfnl_process(struct nfnl_handle *h, const unsigned char *buf, size_t len) +{ + int ret = 0; + struct nlmsghdr *nlh = (struct nlmsghdr *)buf; + + assert(h); + assert(buf); + assert(len > 0); + + /* check for out of sequence message */ + if (nlh->nlmsg_seq && nlh->nlmsg_seq != h->seq) { + errno = EILSEQ; + return -1; + } + while (len >= NLMSG_SPACE(0) && NLMSG_OK(nlh, len)) { + + ret = nfnl_step(h, nlh); + if (ret <= 0) + break; + + nlh = NLMSG_NEXT(nlh, len); + } + return ret; +} + +/* + * New parsing functions based on iterators + */ + +struct nfnl_iterator { + struct nlmsghdr *nlh; + unsigned int len; +}; + +/** + * nfnl_iterator_create: create an nfnetlink iterator + * @h: nfnetlink handler + * @buf: buffer that contains data received from a nfnetlink system + * @len: size of the data contained in the buffer (not the buffer size) + * + * This function creates an iterator that can be used to parse nfnetlink + * message one by one. The iterator gives more control to the programmer + * in the messages processing. + * + * On success, a valid address is returned. On error, NULL is returned + * and errno is set to the appropiate value. 
+ */ +struct nfnl_iterator * +nfnl_iterator_create(const struct nfnl_handle *h, + const char *buf, + size_t len) +{ + struct nlmsghdr *nlh; + struct nfnl_iterator *it; + + assert(h); + assert(buf); + assert(len > 0); + + it = malloc(sizeof(struct nfnl_iterator)); + if (!it) { + errno = ENOMEM; + return NULL; + } + + /* first message in buffer */ + nlh = (struct nlmsghdr *)buf; + if (len < NLMSG_SPACE(0) || !NLMSG_OK(nlh, len)) { + free(it); + errno = EBADMSG; + return NULL; + } + it->nlh = nlh; + it->len = len; + + return it; +} + +/** + * nfnl_iterator_destroy - destroy a nfnetlink iterator + * @it: nfnetlink iterator + * + * This function destroys a certain iterator. Nothing is returned. + */ +void nfnl_iterator_destroy(struct nfnl_iterator *it) +{ + assert(it); + free(it); +} + +/** + * nfnl_iterator_process - process a nfnetlink message + * @h: nfnetlink handler + * @it: nfnetlink iterator that contains the current message to be proccesed + * + * This function process just the current message selected by the iterator. + * On success, a value greater or equal to zero is returned. On error, + * -1 is returned and errno is appropiately set. + */ +int nfnl_iterator_process(struct nfnl_handle *h, struct nfnl_iterator *it) +{ + assert(h); + assert(it->nlh); + + /* check for out of sequence message */ + if (it->nlh->nlmsg_seq && it->nlh->nlmsg_seq != h->seq) { + errno = EILSEQ; + return -1; + } + if (it->len < NLMSG_SPACE(0) || !NLMSG_OK(it->nlh, it->len)) { + errno = EBADMSG; + return -1; + } + return nfnl_step(h, it->nlh); +} + +/** + * nfnl_iterator_next - get the next message hold by the iterator + * @h: nfnetlink handler + * @it: nfnetlink iterator that contains the current message processed + * + * This function update the current message to be processed pointer. + * It returns 1 if there is still more messages to be processed, otherwise + * 0 is returned. 
+ */ +int nfnl_iterator_next(const struct nfnl_handle *h, struct nfnl_iterator *it) +{ + assert(h); + assert(it); + + it->nlh = NLMSG_NEXT(it->nlh, it->len); + if (!it->nlh) + return 0; + return 1; +} + +/** + * nfnl_catch - get responses from the nfnetlink system and process them + * @h: nfnetlink handler +* + * This function handles the data received from the nfnetlink system. + * For example, events generated by one of the subsystems. The message + * is passed to the callback registered via callback_register(). Note that + * this a replacement of nfnl_listen and its use is recommended. + * + * On success, 0 is returned. On error, a -1 is returned. If your does not + * want to listen to events anymore, then it must return a value equal + * to -1 and set errno to 0 (success). + * + * Note that ENOBUFS is returned in case that nfnetlink is exhausted. In + * that case is possible that the information requested is incomplete. + */ +int nfnl_catch(struct nfnl_handle *h) +{ + int ret; + unsigned int size = NFNL_BUFFSIZE; + + assert(h); + + /* + * Since nfqueue can send big packets, we don't know how big + * must be the buffer that have to store the received data. 
+ */ + { + unsigned char buf[size]; + struct sockaddr_nl peer; + struct iovec iov = { + .iov_len = size, + }; + struct msghdr msg = { + .msg_name = (void *) &peer, + .msg_namelen = sizeof(peer), + .msg_iov = &iov, + .msg_iovlen = 1, + .msg_control = NULL, + .msg_controllen = 0, + .msg_flags = 0 + }; + + memset(&peer, 0, sizeof(peer)); + peer.nl_family = AF_NETLINK; + iov.iov_base = buf; + iov.iov_len = size; + +retry: ret = recvmsg(h->fd, &msg, MSG_PEEK); + if (ret == -1) { + /* interrupted syscall must retry */ + if (errno == EINTR) + goto retry; + /* otherwise give up */ + return -1; + } + + if (msg.msg_flags & MSG_TRUNC) + /* maximum size of data received from netlink */ + size = 65535; + } + + /* now, receive data from netlink */ + while (1) { + unsigned char buf[size]; + + ret = nfnl_recv(h, buf, sizeof(buf)); + if (ret == -1) { + /* interrupted syscall must retry */ + if (errno == EINTR) + continue; + break; + } + + ret = nfnl_process(h, buf, ret); + if (ret <= 0) + break; + } + + return ret; +} + +/** + * nfnl_query - request/response communication challenge + * @h: nfnetlink handler + * @nlh: nfnetlink message to be sent + * + * This function sends a nfnetlink message to a certain subsystem and + * receives the response messages associated, such messages are passed to + * the callback registered via register_callback(). Note that this function + * is a replacement for nfnl_talk, its use is recommended. + * + * On success, 0 is returned. On error, a negative is returned. If your + * does not want to listen to events anymore, then it must return a value + * lesser or equal to 0. + * + * Note that ENOBUFS is returned in case that nfnetlink is exhausted. In + * that case is possible that the information requested is incomplete. 
+ */
+int nfnl_query(struct nfnl_handle *h, struct nlmsghdr *nlh)
+{
+	assert(h);
+	assert(nlh);
+
+	if (nfnl_send(h, nlh) == -1)
+		return -1;
+
+	return nfnl_catch(h);
+}

From olenf at ans.pl Mon Aug 21 12:18:22 2006
From: olenf at ans.pl (Krzysztof Oledzki)
Date: Mon Aug 21 12:49:59 2006
Subject: [PATCH 2/3][CONNTRACK] Introduce the pickup facilities to take over TCP connections
In-Reply-To: <44E972E1.4080500@netfilter.org>
References: <44E972E1.4080500@netfilter.org>
Message-ID:

On Mon, 21 Aug 2006, Pablo Neira Ayuso wrote:

> This patch introduces a new flag called IPS_PICKUP that forces the protocol
> handler to pick up the window of valid TCP packets. Moreover, four new
> attributes to inject the window scale factor and enable SACK are introduced.
>
> These new facilities provide the appropriate mechanisms to take over TCP
> connections in failover settings with TCP tracking enabled.
>
Are there any plans for active-active synchronization? This requires
online TCP SEQ sync or to keep connections in IPS_PICKUP state forever,
doesn't it?

Best regards,

				Krzysztof Olędzki

From deepstar+NRpGDEuW at singularity.be Mon Aug 21 16:57:20 2006
From: deepstar+NRpGDEuW at singularity.be (Steven Van Acker)
Date: Mon Aug 21 17:27:52 2006
Subject: status of nf-HIPAC integration ?
Message-ID: <20060821145720.GA749@ekonomika.be>

Hello,

for some time now we have been using the nf-HIPAC patch in our
firewalls' kernels and I'm glad to say it works nicely.

Our firewalls still run 2.4.x kernels. Ever since the introduction of
x-tables in the 2.6.x branch, the nf-HIPAC patch no longer applies.

I found a patch at
http://www.kernelproject.org/people/jhpark/nf-hipac-0.9.1-to-linux-2.6.16.16.patch
by Jeho-Park, which should allow me to compile 2.6.16.16 with nf-HIPAC.
Has anyone tried this patch?

I'm not sure what the future of nf-HIPAC is. I'd like it very much if
the mainstream kernel came with nf-HIPAC by default, but I see no
indications that anything is moving in that direction.
Is nf-HIPAC still being worked on? Is it still on the TODO-list to
integrate nf-HIPAC into the mainstream kernel?

kind regards,
-- Steven Van Acker
--
My amazon wishlist: http://www.amazon.com/gp/registry/1DB4XNEIEQBPB

From pablo at netfilter.org Mon Aug 21 22:04:17 2006
From: pablo at netfilter.org (Pablo Neira Ayuso)
Date: Mon Aug 21 22:30:30 2006
Subject: [PATCH 2/3][CONNTRACK] Introduce the pickup facilities to take over TCP connections
In-Reply-To:
References: <44E972E1.4080500@netfilter.org>
Message-ID: <44EA11C1.2090705@netfilter.org>

Krzysztof Oledzki wrote:
>
> On Mon, 21 Aug 2006, Pablo Neira Ayuso wrote:
>
>> This patch introduces a new flag called IPS_PICKUP that forces the
>> protocol handler to pick up the window of valid TCP packets. Moreover,
>> four new attributes to inject the window scale factor and enable SACK
>> are introduced.
>>
>> These new facilities provide the appropriate mechanisms to take over
>> TCP connections in failover settings with TCP tracking enabled.
>>
> Are there any plans for active-active synchronization? This requires
> online TCP SEQ sync or to keep connections in IPS_PICKUP state forever,
> doesn't it?

Hm, you mean the active-active setting for conntrackd? The current
architecture already supports it.

You seem to be confused with the IPS_PICKUP flag: this flag must be set
for conntracks created from userspace via ctnetlink, thus the TCP window
tracking knows that it has to take over the valid window of TCP
sequences. Once that happens, the flag is unset.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

From olenf at ans.pl Tue Aug 22 00:15:22 2006
From: olenf at ans.pl (Krzysztof Oledzki)
Date: Tue Aug 22 00:47:03 2006
Subject: [PATCH 2/3][CONNTRACK] Introduce the pickup facilities to take over TCP connections
In-Reply-To: <44EA11C1.2090705@netfilter.org>
References: <44E972E1.4080500@netfilter.org> <44EA11C1.2090705@netfilter.org>
Message-ID:

On Mon, 21 Aug 2006, Pablo Neira Ayuso wrote:

> Krzysztof Oledzki wrote:
>>
>> On Mon, 21 Aug 2006, Pablo Neira Ayuso wrote:
>>
>>> This patch introduces a new flag called IPS_PICKUP that forces the
>>> protocol handler to pick up the window of valid TCP packets. Moreover,
>>> four new attributes to inject the window scale factor and enable SACK are
>>> introduced.
>>>
>>> These new facilities provide the appropriate mechanisms to take over TCP
>>> connections in failover settings with TCP tracking enabled.
>>>
>> Are there any plans for active-active synchronization? This requires online
>> TCP SEQ sync or to keep connections in IPS_PICKUP state forever, doesn't
>> it?
>
> Hm, you mean the active-active setting for conntrackd? The current
> architecture already supports it.

OK.

> You seem to be confused with the IPS_PICKUP flag: this flag must be set for
> conntracks created from userspace via ctnetlink, thus the TCP window tracking
> knows that it has to take over the valid window of TCP sequences. Once that
> happens, the flag is unset.

Hm, let's assume we have two firewalls in an active-active configuration.
How does the second firewall know that it should accept/drop a packet with
a specific seq number from a connection that was previously handled by the
first one?

Please excuse me if this is obvious, but I wasn't able to find any
information about how it was solved.
All I found is the TODO file with:

 o support for TCP window tracking - at the moment you have to disable it:
   echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal

Best regards,

				Krzysztof Olędzki

From kaber at trash.net Tue Aug 22 00:52:22 2006
From: kaber at trash.net (Patrick McHardy)
Date: Tue Aug 22 01:24:28 2006
Subject: [NETFILTER 03/18]: ipt_recent: add module parameter for changing ownership of /proc/net/ipt_recent/*
In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain>
References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain>
Message-ID: <20060821225222.10288.38763.sendpatchset@localhost.localdomain>

[NETFILTER]: ipt_recent: add module parameter for changing ownership of /proc/net/ipt_recent/*

Signed-off-by: Daniel De Graaf
Signed-off-by: Patrick McHardy

---
commit 55eeb35cfb789640cc0d3b179398b196286c5991
tree 7e5c77fd3c21372eeacca1c456a2c9bd49044517
parent aee6e3b681f66196cf3ec43b53b252b61f870f1a
author Daniel De Graaf Fri, 11 Aug 2006 21:01:03 +0200
committer Patrick McHardy Fri, 11 Aug 2006 21:01:03 +0200

 net/ipv4/netfilter/ipt_recent.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_recent.c b/net/ipv4/netfilter/ipt_recent.c
index 61a2139..682c094 100644
--- a/net/ipv4/netfilter/ipt_recent.c
+++ b/net/ipv4/netfilter/ipt_recent.c
@@ -35,14 +35,20 @@ static unsigned int ip_list_tot = 100;
 static unsigned int ip_pkt_list_tot = 20;
 static unsigned int ip_list_hash_size = 0;
 static unsigned int ip_list_perms = 0644;
+static unsigned int ip_list_uid = 0;
+static unsigned int ip_list_gid = 0;
 module_param(ip_list_tot, uint, 0400);
 module_param(ip_pkt_list_tot, uint, 0400);
 module_param(ip_list_hash_size, uint, 0400);
 module_param(ip_list_perms, uint, 0400);
+module_param(ip_list_uid, uint, 0400);
+module_param(ip_list_gid, uint, 0400);
 MODULE_PARM_DESC(ip_list_tot, "number of IPs to remember per list");
 MODULE_PARM_DESC(ip_pkt_list_tot, "number of packets per 
IP to remember (max. 255)");
 MODULE_PARM_DESC(ip_list_hash_size, "size of hash table used to look up IPs");
 MODULE_PARM_DESC(ip_list_perms, "permissions on /proc/net/ipt_recent/* files");
+MODULE_PARM_DESC(ip_list_uid,"owner of /proc/net/ipt_recent/* files");
+MODULE_PARM_DESC(ip_list_gid,"owning group of /proc/net/ipt_recent/* files");

 struct recent_entry {
@@ -274,6 +280,8 @@ #ifdef CONFIG_PROC_FS
 		goto out;
 	}
 	t->proc->proc_fops = &recent_fops;
+	t->proc->uid = ip_list_uid;
+	t->proc->gid = ip_list_gid;
 	t->proc->data = t;
 #endif
 	spin_lock_bh(&recent_lock);

From kaber at trash.net Tue Aug 22 00:52:19 2006
From: kaber at trash.net (Patrick McHardy)
Date: Tue Aug 22 01:24:29 2006
Subject: [NETFILTER 01/18]: x_tables: replace IPv4 dscp match by address family independent version
In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain>
References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain>
Message-ID: <20060821225219.10288.17937.sendpatchset@localhost.localdomain>

[NETFILTER]: x_tables: replace IPv4 dscp match by address family independent version

This replaces IPv4 dscp match by address family independent version.
This also
	- utilizes dsfield.h to get the DS field in IPv4/IPv6 header, and
	- checks for the DSCP value from user space.
	- fixes Kconfig help text.
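
(Archive note, not part of the original posting.) The two behavioural points in the list above can be illustrated with a minimal userspace sketch. It assumes only what the patch itself shows: the DSCP is the upper six bits of the 8-bit DS field (shift of 2, maximum 0x3f), the match result is XORed with a normalised invert flag, and values above the maximum are rejected. The helper names dscp_from_dsfield(), dscp_valid() and dscp_match() are hypothetical and do not exist in the kernel; the real code uses ipv4_get_dsfield()/ipv6_get_dsfield() and the xt_dscp checkentry() hook.

```c
#include <stdint.h>

/* Values mirror XT_DSCP_SHIFT and XT_DSCP_MAX from the patch. */
#define DSCP_SHIFT 2
#define DSCP_MAX   0x3f

/* Hypothetical helper: drop the two low (ECN) bits of the DS field,
 * leaving the 6-bit DSCP. */
static uint8_t dscp_from_dsfield(uint8_t dsfield)
{
	return dsfield >> DSCP_SHIFT;
}

/* Hypothetical helper: the range check the new checkentry() performs
 * on the value supplied from user space. */
static int dscp_valid(uint8_t dscp)
{
	return dscp <= DSCP_MAX;
}

/* Hypothetical helper: same comparison as match()/match6() in the patch,
 * with the invert flag normalised to 0 or 1 before the XOR. */
static int dscp_match(uint8_t dsfield, uint8_t want, uint8_t invert)
{
	return (dscp_from_dsfield(dsfield) == want) ^ !!invert;
}
```

For example, a DS field of 0xb8 carries DSCP 46 (Expedited Forwarding), so dscp_match(0xb8, 46, 0) matches, and it still matches for 0xb9 because the ECN bits are shifted out before the comparison.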
Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy --- commit 8419c86c6871f880cd17a6cc29146d3d2da0477a tree 3a134391c4d162f7675f4374c144f7f5e0ab3725 parent 99c4451081b0ea2107ba4827f7d518e1c739cf1b author Yasuyuki Kozakai Fri, 11 Aug 2006 20:35:46 +0200 committer Patrick McHardy Fri, 11 Aug 2006 20:35:46 +0200 include/linux/netfilter/xt_dscp.h | 23 ++++++ include/linux/netfilter_ipv4/ipt_dscp.h | 14 ++-- net/ipv4/netfilter/Kconfig | 11 --- net/ipv4/netfilter/Makefile | 1 net/ipv4/netfilter/ipt_dscp.c | 54 --------------- net/netfilter/Kconfig | 11 +++ net/netfilter/Makefile | 1 net/netfilter/xt_dscp.c | 113 +++++++++++++++++++++++++++++++ 8 files changed, 154 insertions(+), 74 deletions(-) diff --git a/include/linux/netfilter/xt_dscp.h b/include/linux/netfilter/xt_dscp.h new file mode 100644 index 0000000..1da61e6 --- /dev/null +++ b/include/linux/netfilter/xt_dscp.h @@ -0,0 +1,23 @@ +/* x_tables module for matching the IPv4/IPv6 DSCP field + * + * (C) 2002 Harald Welte + * This software is distributed under GNU GPL v2, 1991 + * + * See RFC2474 for a description of the DSCP field within the IP Header. 
+ * + * xt_dscp.h,v 1.3 2002/08/05 19:00:21 laforge Exp +*/ +#ifndef _XT_DSCP_H +#define _XT_DSCP_H + +#define XT_DSCP_MASK 0xfc /* 11111100 */ +#define XT_DSCP_SHIFT 2 +#define XT_DSCP_MAX 0x3f /* 00111111 */ + +/* match info */ +struct xt_dscp_info { + u_int8_t dscp; + u_int8_t invert; +}; + +#endif /* _XT_DSCP_H */ diff --git a/include/linux/netfilter_ipv4/ipt_dscp.h b/include/linux/netfilter_ipv4/ipt_dscp.h index 2fa6dfe..4b82ca9 100644 --- a/include/linux/netfilter_ipv4/ipt_dscp.h +++ b/include/linux/netfilter_ipv4/ipt_dscp.h @@ -10,14 +10,12 @@ #ifndef _IPT_DSCP_H #define _IPT_DSCP_H -#define IPT_DSCP_MASK 0xfc /* 11111100 */ -#define IPT_DSCP_SHIFT 2 -#define IPT_DSCP_MAX 0x3f /* 00111111 */ +#include -/* match info */ -struct ipt_dscp_info { - u_int8_t dscp; - u_int8_t invert; -}; +#define IPT_DSCP_MASK XT_DSCP_MASK +#define IPT_DSCP_SHIFT XT_DSCP_SHIFT +#define IPT_DSCP_MAX XT_DSCP_MAX + +#define ipt_dscp_info xt_dscp_info #endif /* _IPT_DSCP_H */ diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index ef0b5aa..d88d71d 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -278,17 +278,6 @@ config IP_NF_MATCH_ECN To compile it as a module, choose M here. If unsure, say N. -config IP_NF_MATCH_DSCP - tristate "DSCP match support" - depends on IP_NF_IPTABLES - help - This option adds a `DSCP' match, which allows you to match against - the IPv4 header DSCP field (DSCP codepoint). - - The DSCP codepoint can have any value between 0x0 and 0x4f. - - To compile it as a module, choose M here. If unsure, say N. 
- config IP_NF_MATCH_AH tristate "AH match support" depends on IP_NF_IPTABLES diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index 3ded4a3..b946b0f 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -59,7 +59,6 @@ obj-$(CONFIG_IP_NF_MATCH_OWNER) += ipt_o obj-$(CONFIG_IP_NF_MATCH_TOS) += ipt_tos.o obj-$(CONFIG_IP_NF_MATCH_RECENT) += ipt_recent.o obj-$(CONFIG_IP_NF_MATCH_ECN) += ipt_ecn.o -obj-$(CONFIG_IP_NF_MATCH_DSCP) += ipt_dscp.o obj-$(CONFIG_IP_NF_MATCH_AH) += ipt_ah.o obj-$(CONFIG_IP_NF_MATCH_TTL) += ipt_ttl.o obj-$(CONFIG_IP_NF_MATCH_ADDRTYPE) += ipt_addrtype.o diff --git a/net/ipv4/netfilter/ipt_dscp.c b/net/ipv4/netfilter/ipt_dscp.c deleted file mode 100644 index 4717759..0000000 --- a/net/ipv4/netfilter/ipt_dscp.c +++ /dev/null @@ -1,54 +0,0 @@ -/* IP tables module for matching the value of the IPv4 DSCP field - * - * ipt_dscp.c,v 1.3 2002/08/05 19:00:21 laforge Exp - * - * (C) 2002 by Harald Welte - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. 
- */ - -#include -#include - -#include -#include - -MODULE_AUTHOR("Harald Welte "); -MODULE_DESCRIPTION("iptables DSCP matching module"); -MODULE_LICENSE("GPL"); - -static int match(const struct sk_buff *skb, - const struct net_device *in, const struct net_device *out, - const struct xt_match *match, const void *matchinfo, - int offset, unsigned int protoff, int *hotdrop) -{ - const struct ipt_dscp_info *info = matchinfo; - const struct iphdr *iph = skb->nh.iph; - - u_int8_t sh_dscp = ((info->dscp << IPT_DSCP_SHIFT) & IPT_DSCP_MASK); - - return ((iph->tos&IPT_DSCP_MASK) == sh_dscp) ^ info->invert; -} - -static struct ipt_match dscp_match = { - .name = "dscp", - .match = match, - .matchsize = sizeof(struct ipt_dscp_info), - .me = THIS_MODULE, -}; - -static int __init ipt_dscp_init(void) -{ - return ipt_register_match(&dscp_match); -} - -static void __exit ipt_dscp_fini(void) -{ - ipt_unregister_match(&dscp_match); - -} - -module_init(ipt_dscp_init); -module_exit(ipt_dscp_fini); diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index a9894dd..f781405 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -263,6 +263,17 @@ config NETFILTER_XT_MATCH_DCCP If you want to compile it as a module, say M here and read . If unsure, say `N'. +config NETFILTER_XT_MATCH_DSCP + tristate '"DSCP" match support' + depends on NETFILTER_XTABLES + help + This option adds a `DSCP' match, which allows you to match against + the IPv4/IPv6 header DSCP field (differentiated services codepoint). + + The DSCP field can have any value between 0x0 and 0x3f inclusive. + + To compile it as a module, choose M here. If unsure, say N. 
+ config NETFILTER_XT_MATCH_ESP tristate '"ESP" match support' depends on NETFILTER_XTABLES diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 6fa4b75..0b8a70c 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -37,6 +37,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTE obj-$(CONFIG_NETFILTER_XT_MATCH_CONNMARK) += xt_connmark.o obj-$(CONFIG_NETFILTER_XT_MATCH_CONNTRACK) += xt_conntrack.o obj-$(CONFIG_NETFILTER_XT_MATCH_DCCP) += xt_dccp.o +obj-$(CONFIG_NETFILTER_XT_MATCH_DSCP) += xt_dscp.o obj-$(CONFIG_NETFILTER_XT_MATCH_ESP) += xt_esp.o obj-$(CONFIG_NETFILTER_XT_MATCH_HELPER) += xt_helper.o obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o diff --git a/net/netfilter/xt_dscp.c b/net/netfilter/xt_dscp.c new file mode 100644 index 0000000..82e250d --- /dev/null +++ b/net/netfilter/xt_dscp.c @@ -0,0 +1,113 @@ +/* IP tables module for matching the value of the IPv4/IPv6 DSCP field + * + * xt_dscp.c,v 1.3 2002/08/05 19:00:21 laforge Exp + * + * (C) 2002 by Harald Welte + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#include +#include +#include +#include +#include + +#include +#include + +MODULE_AUTHOR("Harald Welte "); +MODULE_DESCRIPTION("x_tables DSCP matching module"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_dscp"); +MODULE_ALIAS("ip6t_dscp"); + +static int match(const struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + const struct xt_match *match, + const void *matchinfo, + int offset, + unsigned int protoff, + int *hotdrop) +{ + const struct xt_dscp_info *info = matchinfo; + u_int8_t dscp = ipv4_get_dsfield(skb->nh.iph) >> XT_DSCP_SHIFT; + + return (dscp == info->dscp) ^ !!info->invert; +} + +static int match6(const struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + const struct xt_match *match, + const void *matchinfo, + int offset, + unsigned int protoff, + int *hotdrop) +{ + const struct xt_dscp_info *info = matchinfo; + u_int8_t dscp = ipv6_get_dsfield(skb->nh.ipv6h) >> XT_DSCP_SHIFT; + + return (dscp == info->dscp) ^ !!info->invert; +} + +static int checkentry(const char *tablename, + const void *info, + const struct xt_match *match, + void *matchinfo, + unsigned int matchsize, + unsigned int hook_mask) +{ + const u_int8_t dscp = ((struct xt_dscp_info *)matchinfo)->dscp; + + if (dscp > XT_DSCP_MAX) { + printk(KERN_ERR "xt_dscp: dscp %x out of range\n", dscp); + return 0; + } + + return 1; +} + +static struct xt_match dscp_match = { + .name = "dscp", + .match = match, + .checkentry = checkentry, + .matchsize = sizeof(struct xt_dscp_info), + .family = AF_INET, + .me = THIS_MODULE, +}; + +static struct xt_match dscp6_match = { + .name = "dscp", + .match = match6, + .checkentry = checkentry, + .matchsize = sizeof(struct xt_dscp_info), + .family = AF_INET6, + .me = THIS_MODULE, +}; + +static int __init xt_dscp_match_init(void) +{ + int ret; + ret = xt_register_match(&dscp_match); + if (ret) + return ret; + + ret = xt_register_match(&dscp6_match); + if (ret) + 
xt_unregister_match(&dscp_match); + + return ret; +} + +static void __exit xt_dscp_match_fini(void) +{ + xt_unregister_match(&dscp_match); + xt_unregister_match(&dscp6_match); +} + +module_init(xt_dscp_match_init); +module_exit(xt_dscp_match_fini); From kaber at trash.net Tue Aug 22 00:52:18 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:30 2006 Subject: [NETFILTER 00/18]: Netfilter Update for 2.6.19 Message-ID: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Hi Dave, following is my first batch of netfilter patches for 2.6.19. Mostly cleanups and code consolidation and a few ctnetlink updates - it looks a lot larger than it is :) Please apply, thanks. include/linux/netfilter/nf_conntrack_common.h | 4 include/linux/netfilter/nfnetlink.h | 4 include/linux/netfilter/nfnetlink_log.h | 6 include/linux/netfilter/nfnetlink_queue.h | 8 include/linux/netfilter/x_tables.h | 19 +- include/linux/netfilter/xt_DSCP.h | 20 ++ include/linux/netfilter/xt_dscp.h | 23 ++ include/linux/netfilter_arp/arp_tables.h | 3 include/linux/netfilter_ipv4/ip_tables.h | 3 include/linux/netfilter_ipv4/ipt_DSCP.h | 6 include/linux/netfilter_ipv4/ipt_dscp.h | 14 - include/linux/netfilter_ipv6/ip6_tables.h | 3 net/ipv4/netfilter/Kconfig | 22 -- net/ipv4/netfilter/Makefile | 2 net/ipv4/netfilter/arp_tables.c | 14 - net/ipv4/netfilter/arpt_mangle.c | 4 net/ipv4/netfilter/arptable_filter.c | 2 net/ipv4/netfilter/ip_conntrack_netlink.c | 13 - net/ipv4/netfilter/ip_nat_rule.c | 10 - net/ipv4/netfilter/ip_tables.c | 23 -- net/ipv4/netfilter/ipt_CLUSTERIP.c | 7 net/ipv4/netfilter/ipt_DSCP.c | 96 ----------- net/ipv4/netfilter/ipt_ECN.c | 26 +-- net/ipv4/netfilter/ipt_LOG.c | 4 net/ipv4/netfilter/ipt_MASQUERADE.c | 4 net/ipv4/netfilter/ipt_NETMAP.c | 4 net/ipv4/netfilter/ipt_REDIRECT.c | 4 net/ipv4/netfilter/ipt_REJECT.c | 4 net/ipv4/netfilter/ipt_SAME.c | 7 net/ipv4/netfilter/ipt_TCPMSS.c | 4 net/ipv4/netfilter/ipt_TOS.c | 26 +-- net/ipv4/netfilter/ipt_TTL.c | 12 - 
net/ipv4/netfilter/ipt_ULOG.c | 3 net/ipv4/netfilter/ipt_ah.c | 1 net/ipv4/netfilter/ipt_dscp.c | 54 ------ net/ipv4/netfilter/ipt_ecn.c | 3 net/ipv4/netfilter/ipt_hashlimit.c | 4 net/ipv4/netfilter/ipt_owner.c | 1 net/ipv4/netfilter/ipt_recent.c | 13 + net/ipv4/netfilter/iptable_filter.c | 4 net/ipv4/netfilter/iptable_mangle.c | 4 net/ipv4/netfilter/iptable_raw.c | 2 net/ipv6/netfilter/Makefile | 2 net/ipv6/netfilter/ip6_tables.c | 19 -- net/ipv6/netfilter/ip6t_HL.c | 3 net/ipv6/netfilter/ip6t_LOG.c | 4 net/ipv6/netfilter/ip6t_REJECT.c | 8 net/ipv6/netfilter/ip6t_ah.c | 1 net/ipv6/netfilter/ip6t_dst.c | 220 -------------------------- net/ipv6/netfilter/ip6t_frag.c | 1 net/ipv6/netfilter/ip6t_hbh.c | 49 ++--- net/ipv6/netfilter/ip6t_ipv6header.c | 1 net/ipv6/netfilter/ip6t_owner.c | 1 net/ipv6/netfilter/ip6t_rt.c | 1 net/ipv6/netfilter/ip6table_filter.c | 4 net/ipv6/netfilter/ip6table_mangle.c | 4 net/ipv6/netfilter/ip6table_raw.c | 2 net/netfilter/Kconfig | 23 ++ net/netfilter/Makefile | 2 net/netfilter/nf_conntrack_netlink.c | 13 - net/netfilter/nfnetlink_queue.c | 4 net/netfilter/x_tables.c | 60 +++++++ net/netfilter/xt_CLASSIFY.c | 63 +++---- net/netfilter/xt_CONNMARK.c | 128 +++++++-------- net/netfilter/xt_CONNSECMARK.c | 61 ++----- net/netfilter/xt_DSCP.c | 188 ++++++++++++++++++---- net/netfilter/xt_MARK.c | 92 ++++------ net/netfilter/xt_NFQUEUE.c | 71 +++----- net/netfilter/xt_NOTRACK.c | 50 ++--- net/netfilter/xt_SECMARK.c | 59 ++---- net/netfilter/xt_comment.c | 45 ++--- net/netfilter/xt_connbytes.c | 50 ++--- net/netfilter/xt_connmark.c | 56 ++---- net/netfilter/xt_conntrack.c | 8 net/netfilter/xt_dccp.c | 52 ++---- net/netfilter/xt_dscp.c | 161 +++++++++++++++---- net/netfilter/xt_esp.c | 52 ++---- net/netfilter/xt_helper.c | 55 ++---- net/netfilter/xt_length.c | 43 ++--- net/netfilter/xt_limit.c | 48 ++--- net/netfilter/xt_mac.c | 52 ++---- net/netfilter/xt_mark.c | 48 ++--- net/netfilter/xt_multiport.c | 115 ++++--------- net/netfilter/xt_physdev.c 
| 50 ++--- net/netfilter/xt_pkttype.c | 44 ++--- net/netfilter/xt_policy.c | 54 ++---- net/netfilter/xt_quota.c | 53 ++---- net/netfilter/xt_sctp.c | 52 ++---- net/netfilter/xt_state.c | 56 ++---- net/netfilter/xt_statistic.c | 55 ++---- net/netfilter/xt_string.c | 54 ++---- net/netfilter/xt_tcpmss.c | 97 ++++------- net/netfilter/xt_tcpudp.c | 109 ++++-------- net/sched/act_ipt.c | 7 94 files changed, 1292 insertions(+), 1748 deletions(-) Daniel De Graaf: [NETFILTER]: ipt_recent: add module parameter for changing ownership of /proc/net/ipt_recent/* Pablo Neira Ayuso: [NETFILTER]: conntrack: introduce connection mark event [NETFILTER]: ctnetlink: dump connection mark [NETFILTER]: ctnetlink: check for listeners before sending expectation events [NETFILTER]: ctnetlink: remove impossible events tests for updates Patrick McHardy: [NETFILTER]: nfnetlink_queue: fix typo in error message [NETFILTER]: replace open coded checksum updates [NETFILTER]: xt_CONNMARK: use tabs for indentation [NETFILTER]: x_tables: add helpers for mass match/target registration [NETFILTER]: x_tables: make use of mass registation helpers [NETFILTER]: x_tables: remove unused argument to target functions [NETFILTER]: x_tables: remove unused size argument to check/destroy functions [NETFILTER]: nfnetlink: remove unnecessary packed attributes [NETFILTER]: x_tables: add data member to struct xt_match [NETFILTER]: ip6_tables: consolidate dst and hbh matches [NETFILTER]: xt_tcpmss: minor cleanups Yasuyuki Kozakai: [NETFILTER]: x_tables: replace IPv4 dscp match by address family independent version [NETFILTER]: x_tables: replace IPv4 DSCP target by address family independent version From kaber at trash.net Tue Aug 22 00:52:23 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:31 2006 Subject: [NETFILTER 04/18]: conntrack: introduce connection mark event In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: 
<20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225223.10288.810.sendpatchset@localhost.localdomain> [NETFILTER]: conntrack: introduce connection mark event This patch introduces the mark event. ctnetlink can use this to know if the mark needs to be dumped. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Patrick McHardy --- commit 706761e6047b57ee8fa33c2eca912ffc5f36bfa7 tree 450fe3c2d130148e1e88909f1ef6b2b2ae1c0496 parent 55eeb35cfb789640cc0d3b179398b196286c5991 author Pablo Neira Ayuso Fri, 11 Aug 2006 21:01:12 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:01:12 +0200 include/linux/netfilter/nf_conntrack_common.h | 4 ++++ net/netfilter/xt_CONNMARK.c | 16 ++++++++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/linux/netfilter/nf_conntrack_common.h b/include/linux/netfilter/nf_conntrack_common.h index d2e4bd7..9e0dae0 100644 --- a/include/linux/netfilter/nf_conntrack_common.h +++ b/include/linux/netfilter/nf_conntrack_common.h @@ -125,6 +125,10 @@ enum ip_conntrack_events /* Counter highest bit has been set */ IPCT_COUNTER_FILLING_BIT = 11, IPCT_COUNTER_FILLING = (1 << IPCT_COUNTER_FILLING_BIT), + + /* Mark is set */ + IPCT_MARK_BIT = 12, + IPCT_MARK = (1 << IPCT_MARK_BIT), }; enum ip_conntrack_expect_events { diff --git a/net/netfilter/xt_CONNMARK.c b/net/netfilter/xt_CONNMARK.c index 60c375d..784482b 100644 --- a/net/netfilter/xt_CONNMARK.c +++ b/net/netfilter/xt_CONNMARK.c @@ -52,13 +52,25 @@ target(struct sk_buff **pskb, switch(markinfo->mode) { case XT_CONNMARK_SET: newmark = (*ctmark & ~markinfo->mask) | markinfo->mark; - if (newmark != *ctmark) + if (newmark != *ctmark) { *ctmark = newmark; +#ifdef CONFIG_IP_NF_CONNTRACK_EVENTS + ip_conntrack_event_cache(IPCT_MARK, *pskb); +#else + nf_conntrack_event_cache(IPCT_MARK, *pskb); +#endif + } break; case XT_CONNMARK_SAVE: newmark = (*ctmark & ~markinfo->mask) | ((*pskb)->nfmark & markinfo->mask); - if (*ctmark != newmark) + if (*ctmark != 
newmark) { *ctmark = newmark; +#ifdef CONFIG_IP_NF_CONNTRACK_EVENTS + ip_conntrack_event_cache(IPCT_MARK, *pskb); +#else + nf_conntrack_event_cache(IPCT_MARK, *pskb); +#endif + } break; case XT_CONNMARK_RESTORE: nfmark = (*pskb)->nfmark; From kaber at trash.net Tue Aug 22 00:52:24 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:32 2006 Subject: [NETFILTER 05/18]: ctnetlink: dump connection mark In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225224.10288.77960.sendpatchset@localhost.localdomain> [NETFILTER]: ctnetlink: dump connection mark ctnetlink dumps the mark if and only if the mark event occurred. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Patrick McHardy --- commit 430bb812a0703f2faddbe92a097d0ef7289b963b tree e2c09a86971ac58d9b4ed85bf5962a9e360bbf05 parent 706761e6047b57ee8fa33c2eca912ffc5f36bfa7 author Pablo Neira Ayuso Fri, 11 Aug 2006 21:01:17 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:01:17 +0200 net/ipv4/netfilter/ip_conntrack_netlink.c | 4 ++++ net/netfilter/nf_conntrack_netlink.c | 4 ++++ 2 files changed, 8 insertions(+), 0 deletions(-) diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 33891bb..319022e 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -385,6 +385,10 @@ static int ctnetlink_conntrack_event(str ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0) goto nfattr_failure; + if (events & IPCT_MARK + && ctnetlink_dump_mark(skb, ct) < 0) + goto nfattr_failure; + nlh->nlmsg_len = skb->tail - b; nfnetlink_send(skb, 0, group, 0); return NOTIFY_DONE; diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index af48459..ed8268a 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -395,6 +395,10 @@ static int 
ctnetlink_conntrack_event(str ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0) goto nfattr_failure; + if (events & IPCT_MARK + && ctnetlink_dump_mark(skb, ct) < 0) + goto nfattr_failure; + nlh->nlmsg_len = skb->tail - b; nfnetlink_send(skb, 0, group, 0); return NOTIFY_DONE; From kaber at trash.net Tue Aug 22 00:52:27 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:39 2006 Subject: [NETFILTER 07/18]: ctnetlink: remove impossible events tests for updates In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225227.10288.44294.sendpatchset@localhost.localdomain> [NETFILTER]: ctnetlink: remove impossible events tests for updates IPCT_HELPER and IPCT_NATINFO bits are never set on updates. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Patrick McHardy --- commit b27f40cbcae710e0b68589c9943499d8487a1590 tree 95dc47c638f174eec0c5ab68a1ad7cc9d966093f parent 67b49f9ce48d6acb259d20a38bf1d131250a01c6 author Pablo Neira Ayuso Fri, 11 Aug 2006 21:01:28 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:01:28 +0200 net/ipv4/netfilter/ip_conntrack_netlink.c | 6 +----- net/netfilter/nf_conntrack_netlink.c | 6 +----- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 090df76..194158e 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -329,11 +329,7 @@ static int ctnetlink_conntrack_event(str /* dump everything */ events = ~0UL; group = NFNLGRP_CONNTRACK_NEW; - } else if (events & (IPCT_STATUS | - IPCT_PROTOINFO | - IPCT_HELPER | - IPCT_HELPINFO | - IPCT_NATINFO)) { + } else if (events & (IPCT_STATUS | IPCT_PROTOINFO)) { type = IPCTNL_MSG_CT_NEW; group = NFNLGRP_CONNTRACK_UPDATE; } else diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 
81bcbe8..b5ed955 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -339,11 +339,7 @@ static int ctnetlink_conntrack_event(str /* dump everything */ events = ~0UL; group = NFNLGRP_CONNTRACK_NEW; - } else if (events & (IPCT_STATUS | - IPCT_PROTOINFO | - IPCT_HELPER | - IPCT_HELPINFO | - IPCT_NATINFO)) { + } else if (events & (IPCT_STATUS | IPCT_PROTOINFO)) { type = IPCTNL_MSG_CT_NEW; group = NFNLGRP_CONNTRACK_UPDATE; } else From kaber at trash.net Tue Aug 22 00:52:31 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:40 2006 Subject: [NETFILTER 10/18]: xt_CONNMARK: use tabs for indentation In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225231.10288.68694.sendpatchset@localhost.localdomain> [NETFILTER]: xt_CONNMARK: use tabs for indentation Signed-off-by: Patrick McHardy --- commit c3e06d8b091765def127afcc148835736d64fad5 tree cd0512e19c5b2958a06ac0c0bf22a018242787df parent 54ba0f09d3cb3d4ce48e4eb8cb9cae3ac60bade1 author Patrick McHardy Sun, 13 Aug 2006 17:19:50 +0200 committer Patrick McHardy Sun, 13 Aug 2006 17:19:50 +0200 net/netfilter/xt_CONNMARK.c | 57 ++++++++++++++++++++++--------------------- 1 files changed, 29 insertions(+), 28 deletions(-) diff --git a/net/netfilter/xt_CONNMARK.c b/net/netfilter/xt_CONNMARK.c index 784482b..19989a9 100644 --- a/net/netfilter/xt_CONNMARK.c +++ b/net/netfilter/xt_CONNMARK.c @@ -49,36 +49,37 @@ target(struct sk_buff **pskb, u_int32_t *ctmark = nf_ct_get_mark(*pskb, &ctinfo); if (ctmark) { - switch(markinfo->mode) { - case XT_CONNMARK_SET: - newmark = (*ctmark & ~markinfo->mask) | markinfo->mark; - if (newmark != *ctmark) { - *ctmark = newmark; + switch(markinfo->mode) { + case XT_CONNMARK_SET: + newmark = (*ctmark & ~markinfo->mask) | markinfo->mark; + if (newmark != *ctmark) { + *ctmark = newmark; #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS - 
ip_conntrack_event_cache(IPCT_MARK, *pskb); + ip_conntrack_event_cache(IPCT_MARK, *pskb); #else - nf_conntrack_event_cache(IPCT_MARK, *pskb); + nf_conntrack_event_cache(IPCT_MARK, *pskb); #endif } - break; - case XT_CONNMARK_SAVE: - newmark = (*ctmark & ~markinfo->mask) | ((*pskb)->nfmark & markinfo->mask); - if (*ctmark != newmark) { - *ctmark = newmark; + break; + case XT_CONNMARK_SAVE: + newmark = (*ctmark & ~markinfo->mask) | + ((*pskb)->nfmark & markinfo->mask); + if (*ctmark != newmark) { + *ctmark = newmark; #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS - ip_conntrack_event_cache(IPCT_MARK, *pskb); + ip_conntrack_event_cache(IPCT_MARK, *pskb); #else - nf_conntrack_event_cache(IPCT_MARK, *pskb); + nf_conntrack_event_cache(IPCT_MARK, *pskb); #endif + } + break; + case XT_CONNMARK_RESTORE: + nfmark = (*pskb)->nfmark; + diff = (*ctmark ^ nfmark) & markinfo->mask; + if (diff != 0) + (*pskb)->nfmark = nfmark ^ diff; + break; } - break; - case XT_CONNMARK_RESTORE: - nfmark = (*pskb)->nfmark; - diff = (*ctmark ^ nfmark) & markinfo->mask; - if (diff != 0) - (*pskb)->nfmark = nfmark ^ diff; - break; - } } return XT_CONTINUE; @@ -95,17 +96,17 @@ checkentry(const char *tablename, struct xt_connmark_target_info *matchinfo = targinfo; if (matchinfo->mode == XT_CONNMARK_RESTORE) { - if (strcmp(tablename, "mangle") != 0) { - printk(KERN_WARNING "CONNMARK: restore can only be called from \"mangle\" table, not \"%s\"\n", tablename); - return 0; - } + if (strcmp(tablename, "mangle") != 0) { + printk(KERN_WARNING "CONNMARK: restore can only be " + "called from \"mangle\" table, not \"%s\"\n", + tablename); + return 0; + } } - if (matchinfo->mark > 0xffffffff || matchinfo->mask > 0xffffffff) { printk(KERN_WARNING "CONNMARK: Only supports 32bit mark\n"); return 0; } - return 1; } From kaber at trash.net Tue Aug 22 00:52:28 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:52 2006 Subject: [NETFILTER 08/18]: nfnetlink_queue: fix typo in error message In-Reply-To: 
<20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225228.10288.68921.sendpatchset@localhost.localdomain> [NETFILTER]: nfnetlink_queue: fix typo in error message Signed-off-by: Patrick McHardy --- commit 94964e26cff67825112477f3c8bae88539245d72 tree f011d732bad268a2060caa362348665f690f5e66 parent b27f40cbcae710e0b68589c9943499d8487a1590 author Patrick McHardy Fri, 11 Aug 2006 21:01:34 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:01:34 +0200 net/netfilter/nfnetlink_queue.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index eddfbe4..8eb2473 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -584,7 +584,7 @@ nfqnl_enqueue_packet(struct sk_buff *skb queue->queue_dropped++; status = -ENOSPC; if (net_ratelimit()) - printk(KERN_WARNING "ip_queue: full at %d entries, " + printk(KERN_WARNING "nf_queue: full at %d entries, " "dropping packets(s). 
Dropped: %d\n", queue->queue_total, queue->queue_dropped); goto err_out_free_nskb; @@ -635,7 +635,7 @@ nfqnl_mangle(void *data, int data_len, s diff, GFP_ATOMIC); if (newskb == NULL) { - printk(KERN_WARNING "ip_queue: OOM " + printk(KERN_WARNING "nf_queue: OOM " "in mangle, dropping packet\n"); return -ENOMEM; } From kaber at trash.net Tue Aug 22 00:52:20 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:53 2006 Subject: [NETFILTER 02/18]: x_tables: replace IPv4 DSCP target by address family independent version In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225220.10288.33754.sendpatchset@localhost.localdomain> [NETFILTER]: x_tables: replace IPv4 DSCP target by address family independent version This replaces IPv4 DSCP target by address family independent version. This also - utilizes dsfield.h to get/mangle DS field in IPv4/IPv6 header - fixes Kconfig help text. 
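(Annotation, not part of the patch.) The get/mangle step this description refers to can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: the DSCP is taken to be the upper six bits of the DS byte (RFC 2474), so the shift is 2, the mask 0xfc and the maximum 0x3f. The names `get_dscp` and `set_dscp` are hypothetical and are not the dsfield.h helpers; IPv4 checksum maintenance is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (RFC 2474): the DSCP is the upper 6 bits of the DS byte,
 * the lower 2 bits belong to ECN and must be preserved. */
#define DSCP_SHIFT 2
#define DSCP_MASK  0xfc
#define DSCP_MAX   0x3f

/* Hypothetical helper: extract the DSCP from a DS (TOS / traffic class) byte. */
static uint8_t get_dscp(uint8_t dsfield)
{
	return dsfield >> DSCP_SHIFT;
}

/* Hypothetical helper: replace the DSCP while keeping the two ECN bits. */
static uint8_t set_dscp(uint8_t dsfield, uint8_t dscp)
{
	assert(dscp <= DSCP_MAX);	/* same range check as checkentry() */
	return (uint8_t)((dsfield & ~DSCP_MASK) | (dscp << DSCP_SHIFT));
}
```

For instance, `set_dscp(0x03, 0x2e)` leaves the two low bits set and yields 0xbb, and `get_dscp(0xbb)` returns 0x2e again. The same arithmetic applies to the IPv6 traffic class, which is why one target body per family is enough once the field access is hidden behind a helper.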
Signed-off-by: Yasuyuki Kozakai Signed-off-by: Patrick McHardy --- commit aee6e3b681f66196cf3ec43b53b252b61f870f1a tree 4c4d9bc72012d42bc9bc0c8442ae644f2d5125e9 parent 8419c86c6871f880cd17a6cc29146d3d2da0477a author Yasuyuki Kozakai Fri, 11 Aug 2006 21:00:58 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:00:58 +0200 include/linux/netfilter/xt_DSCP.h | 20 +++++ include/linux/netfilter_ipv4/ipt_DSCP.h | 6 - net/ipv4/netfilter/Kconfig | 11 --- net/ipv4/netfilter/Makefile | 1 net/ipv4/netfilter/ipt_DSCP.c | 96 ----------------------- net/netfilter/Kconfig | 12 +++ net/netfilter/Makefile | 1 net/netfilter/xt_DSCP.c | 130 +++++++++++++++++++++++++++++++ 8 files changed, 165 insertions(+), 112 deletions(-) diff --git a/include/linux/netfilter/xt_DSCP.h b/include/linux/netfilter/xt_DSCP.h new file mode 100644 index 0000000..3c7c963 --- /dev/null +++ b/include/linux/netfilter/xt_DSCP.h @@ -0,0 +1,20 @@ +/* x_tables module for setting the IPv4/IPv6 DSCP field + * + * (C) 2002 Harald Welte + * based on ipt_FTOS.c (C) 2000 by Matthew G. Marsh + * This software is distributed under GNU GPL v2, 1991 + * + * See RFC2474 for a description of the DSCP field within the IP Header. 
+ * + * xt_DSCP.h,v 1.7 2002/03/14 12:03:13 laforge Exp +*/ +#ifndef _XT_DSCP_TARGET_H +#define _XT_DSCP_TARGET_H +#include + +/* target info */ +struct xt_DSCP_info { + u_int8_t dscp; +}; + +#endif /* _XT_DSCP_TARGET_H */ diff --git a/include/linux/netfilter_ipv4/ipt_DSCP.h b/include/linux/netfilter_ipv4/ipt_DSCP.h index b30f510..3491e52 100644 --- a/include/linux/netfilter_ipv4/ipt_DSCP.h +++ b/include/linux/netfilter_ipv4/ipt_DSCP.h @@ -11,10 +11,8 @@ #ifndef _IPT_DSCP_TARGET_H #define _IPT_DSCP_TARGET_H #include +#include -/* target info */ -struct ipt_DSCP_info { - u_int8_t dscp; -}; +#define ipt_DSCP_info xt_DSCP_info #endif /* _IPT_DSCP_TARGET_H */ diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index d88d71d..a55b8ff 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -557,17 +557,6 @@ config IP_NF_TARGET_ECN To compile it as a module, choose M here. If unsure, say N. -config IP_NF_TARGET_DSCP - tristate "DSCP target support" - depends on IP_NF_MANGLE - help - This option adds a `DSCP' match, which allows you to match against - the IPv4 header DSCP field (DSCP codepoint). - - The DSCP codepoint can have any value between 0x0 and 0x4f. - - To compile it as a module, choose M here. If unsure, say N. 
- config IP_NF_TARGET_TTL tristate 'TTL target support' depends on IP_NF_MANGLE diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index b946b0f..09aaed1 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -67,7 +67,6 @@ # targets obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o obj-$(CONFIG_IP_NF_TARGET_TOS) += ipt_TOS.o obj-$(CONFIG_IP_NF_TARGET_ECN) += ipt_ECN.o -obj-$(CONFIG_IP_NF_TARGET_DSCP) += ipt_DSCP.o obj-$(CONFIG_IP_NF_TARGET_MASQUERADE) += ipt_MASQUERADE.o obj-$(CONFIG_IP_NF_TARGET_REDIRECT) += ipt_REDIRECT.o obj-$(CONFIG_IP_NF_TARGET_NETMAP) += ipt_NETMAP.o diff --git a/net/ipv4/netfilter/ipt_DSCP.c b/net/ipv4/netfilter/ipt_DSCP.c deleted file mode 100644 index c8e9712..0000000 --- a/net/ipv4/netfilter/ipt_DSCP.c +++ /dev/null @@ -1,96 +0,0 @@ -/* iptables module for setting the IPv4 DSCP field, Version 1.8 - * - * (C) 2002 by Harald Welte - * based on ipt_FTOS.c (C) 2000 by Matthew G. Marsh - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * See RFC2474 for a description of the DSCP field within the IP Header. 
- * - * ipt_DSCP.c,v 1.8 2002/08/06 18:41:57 laforge Exp -*/ - -#include -#include -#include -#include - -#include -#include - -MODULE_AUTHOR("Harald Welte "); -MODULE_DESCRIPTION("iptables DSCP modification module"); -MODULE_LICENSE("GPL"); - -static unsigned int -target(struct sk_buff **pskb, - const struct net_device *in, - const struct net_device *out, - unsigned int hooknum, - const struct xt_target *target, - const void *targinfo, - void *userinfo) -{ - const struct ipt_DSCP_info *dinfo = targinfo; - u_int8_t sh_dscp = ((dinfo->dscp << IPT_DSCP_SHIFT) & IPT_DSCP_MASK); - - - if (((*pskb)->nh.iph->tos & IPT_DSCP_MASK) != sh_dscp) { - u_int16_t diffs[2]; - - if (!skb_make_writable(pskb, sizeof(struct iphdr))) - return NF_DROP; - - diffs[0] = htons((*pskb)->nh.iph->tos) ^ 0xFFFF; - (*pskb)->nh.iph->tos = ((*pskb)->nh.iph->tos & ~IPT_DSCP_MASK) - | sh_dscp; - diffs[1] = htons((*pskb)->nh.iph->tos); - (*pskb)->nh.iph->check - = csum_fold(csum_partial((char *)diffs, - sizeof(diffs), - (*pskb)->nh.iph->check - ^ 0xFFFF)); - } - return IPT_CONTINUE; -} - -static int -checkentry(const char *tablename, - const void *e_void, - const struct xt_target *target, - void *targinfo, - unsigned int targinfosize, - unsigned int hook_mask) -{ - const u_int8_t dscp = ((struct ipt_DSCP_info *)targinfo)->dscp; - - if ((dscp > IPT_DSCP_MAX)) { - printk(KERN_WARNING "DSCP: dscp %x out of range\n", dscp); - return 0; - } - return 1; -} - -static struct ipt_target ipt_dscp_reg = { - .name = "DSCP", - .target = target, - .targetsize = sizeof(struct ipt_DSCP_info), - .table = "mangle", - .checkentry = checkentry, - .me = THIS_MODULE, -}; - -static int __init ipt_dscp_init(void) -{ - return ipt_register_target(&ipt_dscp_reg); -} - -static void __exit ipt_dscp_fini(void) -{ - ipt_unregister_target(&ipt_dscp_reg); -} - -module_init(ipt_dscp_init); -module_exit(ipt_dscp_fini); diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index f781405..0a28d2c 100644 --- a/net/netfilter/Kconfig 
+++ b/net/netfilter/Kconfig @@ -148,6 +148,18 @@ config NETFILTER_XT_TARGET_CONNMARK . The module will be called ipt_CONNMARK.o. If unsure, say `N'. +config NETFILTER_XT_TARGET_DSCP + tristate '"DSCP" target support' + depends on NETFILTER_XTABLES + depends on IP_NF_MANGLE || IP6_NF_MANGLE + help + This option adds a `DSCP' target, which allows you to manipulate + the IPv4/IPv6 header DSCP field (differentiated services codepoint). + + The DSCP field can have any value between 0x0 and 0x3f inclusive. + + To compile it as a module, choose M here. If unsure, say N. + config NETFILTER_XT_TARGET_MARK tristate '"MARK" target support' depends on NETFILTER_XTABLES diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 0b8a70c..a74be49 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_NETFILTER_XTABLES) += x_tab # targets obj-$(CONFIG_NETFILTER_XT_TARGET_CLASSIFY) += xt_CLASSIFY.o obj-$(CONFIG_NETFILTER_XT_TARGET_CONNMARK) += xt_CONNMARK.o +obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o obj-$(CONFIG_NETFILTER_XT_TARGET_MARK) += xt_MARK.o obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o obj-$(CONFIG_NETFILTER_XT_TARGET_NOTRACK) += xt_NOTRACK.o diff --git a/net/netfilter/xt_DSCP.c b/net/netfilter/xt_DSCP.c new file mode 100644 index 0000000..79df816 --- /dev/null +++ b/net/netfilter/xt_DSCP.c @@ -0,0 +1,130 @@ +/* x_tables module for setting the IPv4/IPv6 DSCP field, Version 1.8 + * + * (C) 2002 by Harald Welte + * based on ipt_FTOS.c (C) 2000 by Matthew G. Marsh + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * See RFC2474 for a description of the DSCP field within the IP Header. 
+ * + * xt_DSCP.c,v 1.8 2002/08/06 18:41:57 laforge Exp +*/ + +#include +#include +#include +#include +#include + +#include +#include + +MODULE_AUTHOR("Harald Welte "); +MODULE_DESCRIPTION("x_tables DSCP modification module"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_DSCP"); +MODULE_ALIAS("ip6t_DSCP"); + +static unsigned int target(struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + unsigned int hooknum, + const struct xt_target *target, + const void *targinfo, + void *userinfo) +{ + const struct xt_DSCP_info *dinfo = targinfo; + u_int8_t dscp = ipv4_get_dsfield((*pskb)->nh.iph) >> XT_DSCP_SHIFT; + + if (dscp != dinfo->dscp) { + if (!skb_make_writable(pskb, sizeof(struct iphdr))) + return NF_DROP; + + ipv4_change_dsfield((*pskb)->nh.iph, (__u8)(~XT_DSCP_MASK), + dinfo->dscp << XT_DSCP_SHIFT); + + } + return XT_CONTINUE; +} + +static unsigned int target6(struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + unsigned int hooknum, + const struct xt_target *target, + const void *targinfo, + void *userinfo) +{ + const struct xt_DSCP_info *dinfo = targinfo; + u_int8_t dscp = ipv6_get_dsfield((*pskb)->nh.ipv6h) >> XT_DSCP_SHIFT; + + if (dscp != dinfo->dscp) { + if (!skb_make_writable(pskb, sizeof(struct ipv6hdr))) + return NF_DROP; + + ipv6_change_dsfield((*pskb)->nh.ipv6h, (__u8)(~XT_DSCP_MASK), + dinfo->dscp << XT_DSCP_SHIFT); + } + return XT_CONTINUE; +} + +static int checkentry(const char *tablename, + const void *e_void, + const struct xt_target *target, + void *targinfo, + unsigned int targinfosize, + unsigned int hook_mask) +{ + const u_int8_t dscp = ((struct xt_DSCP_info *)targinfo)->dscp; + + if ((dscp > XT_DSCP_MAX)) { + printk(KERN_WARNING "DSCP: dscp %x out of range\n", dscp); + return 0; + } + return 1; +} + +static struct xt_target xt_dscp_reg = { + .name = "DSCP", + .target = target, + .targetsize = sizeof(struct xt_DSCP_info), + .table = "mangle", + .checkentry = checkentry, + .family 
= AF_INET, + .me = THIS_MODULE, +}; + +static struct xt_target xt_dscp6_reg = { + .name = "DSCP", + .target = target6, + .targetsize = sizeof(struct xt_DSCP_info), + .table = "mangle", + .checkentry = checkentry, + .family = AF_INET6, + .me = THIS_MODULE, +}; + +static int __init xt_dscp_target_init(void) +{ + int ret; + ret = xt_register_target(&xt_dscp_reg); + if (ret) + return ret; + + ret = xt_register_target(&xt_dscp6_reg); + if (ret) + xt_unregister_target(&xt_dscp_reg); + + return ret; +} + +static void __exit xt_dscp_target_fini(void) +{ + xt_unregister_target(&xt_dscp_reg); + xt_unregister_target(&xt_dscp6_reg); +} + +module_init(xt_dscp_target_init); +module_exit(xt_dscp_target_fini); From kaber at trash.net Tue Aug 22 00:52:39 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:57 2006 Subject: [NETFILTER 15/18]: nfnetlink: remove unnecessary packed attributes In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225238.10288.62091.sendpatchset@localhost.localdomain> [NETFILTER]: nfnetlink: remove unnecessary packed attributes Remove unnecessary packed attributes in nfnetlink structures. Unfortunately in a few cases they have to stay to avoid changing structure sizes. 
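(Annotation, not part of the patch.) A small sketch of why the attribute is redundant for some structures but has to stay for others, assuming a GCC- or Clang-compatible compiler and a common ABI. The structure names are hypothetical and only mimic the shape of such message headers; they are not the nfnetlink definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every member is naturally aligned and the total size is already a
 * multiple of the largest alignment, so packing changes nothing. */
struct hdr_plain  { uint16_t len; uint16_t type; };
struct hdr_packed { uint16_t len; uint16_t type; } __attribute__((packed));

/* 4 + 2 + 1 = 7 bytes of members: without the attribute the compiler
 * appends tail padding, so dropping it would change sizeof(). */
struct pkt_plain  { uint32_t id; uint16_t proto; uint8_t hook; };
struct pkt_packed { uint32_t id; uint16_t proto; uint8_t hook; } __attribute__((packed));

/* Returns 1 if removing "packed" leaves the structure size unchanged. */
static int packed_is_redundant(size_t plain_size, size_t packed_size)
{
	return plain_size == packed_size;
}
```

Here `packed_is_redundant(sizeof(struct hdr_plain), sizeof(struct hdr_packed))` is 1, while the second pair differs (7 bytes packed, typically 8 unpacked). Since these structures are exchanged with userspace, a size change of that kind would alter the message layout, which is presumably why the attribute stays in those cases.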
Signed-off-by: Patrick McHardy --- commit 9c8627cc82b512bf2c7d07f0a9afd9a6afc9e7ec tree e8491a9be51317f39b98618dfee2564b27dca2c2 parent 0945146f9085ef6469e137000fd125df648a137f author Patrick McHardy Fri, 18 Aug 2006 04:38:47 +0200 committer Patrick McHardy Fri, 18 Aug 2006 04:38:47 +0200 include/linux/netfilter/nfnetlink.h | 4 ++-- include/linux/netfilter/nfnetlink_log.h | 6 +++--- include/linux/netfilter/nfnetlink_queue.h | 8 ++++---- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h index 9f5b12c..6d8e3e5 100644 --- a/include/linux/netfilter/nfnetlink.h +++ b/include/linux/netfilter/nfnetlink.h @@ -43,7 +43,7 @@ struct nfattr u_int16_t nfa_len; u_int16_t nfa_type; /* we use 15 bits for the type, and the highest * bit to indicate whether the payload is nested */ -} __attribute__ ((packed)); +}; /* FIXME: Apart from NFNL_NFA_NESTED shamelessly copy and pasted from * rtnetlink.h, it's time to put this in a generic file */ @@ -79,7 +79,7 @@ struct nfgenmsg { u_int8_t nfgen_family; /* AF_xxx */ u_int8_t version; /* nfnetlink version */ u_int16_t res_id; /* resource id */ -} __attribute__ ((packed)); +}; #define NFNETLINK_V0 0 diff --git a/include/linux/netfilter/nfnetlink_log.h b/include/linux/netfilter/nfnetlink_log.h index a7497c7..87b92f8 100644 --- a/include/linux/netfilter/nfnetlink_log.h +++ b/include/linux/netfilter/nfnetlink_log.h @@ -19,18 +19,18 @@ struct nfulnl_msg_packet_hdr { u_int16_t hw_protocol; /* hw protocol (network order) */ u_int8_t hook; /* netfilter hook */ u_int8_t _pad; -} __attribute__ ((packed)); +}; struct nfulnl_msg_packet_hw { u_int16_t hw_addrlen; u_int16_t _pad; u_int8_t hw_addr[8]; -} __attribute__ ((packed)); +}; struct nfulnl_msg_packet_timestamp { aligned_u64 sec; aligned_u64 usec; -} __attribute__ ((packed)); +}; #define NFULNL_PREFIXLEN 30 /* just like old log target */ diff --git a/include/linux/netfilter/nfnetlink_queue.h 
b/include/linux/netfilter/nfnetlink_queue.h index 9e77437..36af036 100644 --- a/include/linux/netfilter/nfnetlink_queue.h +++ b/include/linux/netfilter/nfnetlink_queue.h @@ -22,12 +22,12 @@ struct nfqnl_msg_packet_hw { u_int16_t hw_addrlen; u_int16_t _pad; u_int8_t hw_addr[8]; -} __attribute__ ((packed)); +}; struct nfqnl_msg_packet_timestamp { aligned_u64 sec; aligned_u64 usec; -} __attribute__ ((packed)); +}; enum nfqnl_attr_type { NFQA_UNSPEC, @@ -49,7 +49,7 @@ #define NFQA_MAX (__NFQA_MAX - 1) struct nfqnl_msg_verdict_hdr { u_int32_t verdict; u_int32_t id; -} __attribute__ ((packed)); +}; enum nfqnl_msg_config_cmds { @@ -64,7 +64,7 @@ struct nfqnl_msg_config_cmd { u_int8_t command; /* nfqnl_msg_config_cmds */ u_int8_t _pad; u_int16_t pf; /* AF_xxx for PF_[UN]BIND */ -} __attribute__ ((packed)); +}; enum nfqnl_config_mode { NFQNL_COPY_NONE, From kaber at trash.net Tue Aug 22 00:52:30 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:24:59 2006 Subject: [NETFILTER 09/18]: replace open coded checksum updates In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225229.10288.30059.sendpatchset@localhost.localdomain> [NETFILTER]: replace open coded checksum updates Replace open coded checksum update by nf_csum_update calls and clean up the surrounding code a bit. 
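(Annotation, not part of the patch.) The principle behind such a checksum update helper can be sketched in userspace C as below. This is not the kernel's nf_csum_update and makes no claim about its argument conventions; it is the generic incremental update from RFC 1624, HC' = ~(~HC + ~m + m'), with hypothetical function names and a made-up sample header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full ones'-complement checksum over n 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~fold16(sum);
}

/* Incremental update (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m'). */
static uint16_t csum_replace16(uint16_t check, uint16_t old_word,
			       uint16_t new_word)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return (uint16_t)~fold16(sum);
}

/* Demo: change the first word of a sample header (version/IHL + TOS) and
 * report whether the incremental result equals a full recomputation. */
static int incremental_matches_full(uint16_t new_word0)
{
	uint16_t hdr[10] = { 0x4500, 0x0054, 0x1c46, 0x4000, 0x4001,
			     0x0000, /* checksum field, zero while summing */
			     0xc0a8, 0x0001, 0xc0a8, 0x00c7 };
	uint16_t check = csum_full(hdr, 10);
	uint16_t old = hdr[0];

	hdr[0] = new_word0;
	return csum_replace16(check, old, new_word0) == csum_full(hdr, 10);
}
```

The point of the incremental form is that only the old and new value of the changed 16-bit word are needed, so a TOS or TTL rewrite does not have to touch the rest of the header.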
Signed-off-by: Patrick McHardy --- commit 54ba0f09d3cb3d4ce48e4eb8cb9cae3ac60bade1 tree a34a4a57ebe2e4f12711b66f913a81fef9619713 parent 94964e26cff67825112477f3c8bae88539245d72 author Patrick McHardy Fri, 11 Aug 2006 21:15:53 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:15:53 +0200 net/ipv4/netfilter/ipt_ECN.c | 22 +++++++++------------- net/ipv4/netfilter/ipt_TOS.c | 22 ++++++++-------------- net/ipv4/netfilter/ipt_TTL.c | 9 +++------ 3 files changed, 20 insertions(+), 33 deletions(-) diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c index 35916c7..7e30e6d 100644 --- a/net/ipv4/netfilter/ipt_ECN.c +++ b/net/ipv4/netfilter/ipt_ECN.c @@ -27,22 +27,18 @@ MODULE_DESCRIPTION("iptables ECN modific static inline int set_ect_ip(struct sk_buff **pskb, const struct ipt_ECN_info *einfo) { - if (((*pskb)->nh.iph->tos & IPT_ECN_IP_MASK) - != (einfo->ip_ect & IPT_ECN_IP_MASK)) { - u_int16_t diffs[2]; + struct iphdr *iph = (*pskb)->nh.iph; + u_int16_t oldtos; + if ((iph->tos & IPT_ECN_IP_MASK) != (einfo->ip_ect & IPT_ECN_IP_MASK)) { if (!skb_make_writable(pskb, sizeof(struct iphdr))) return 0; - - diffs[0] = htons((*pskb)->nh.iph->tos) ^ 0xFFFF; - (*pskb)->nh.iph->tos &= ~IPT_ECN_IP_MASK; - (*pskb)->nh.iph->tos |= (einfo->ip_ect & IPT_ECN_IP_MASK); - diffs[1] = htons((*pskb)->nh.iph->tos); - (*pskb)->nh.iph->check - = csum_fold(csum_partial((char *)diffs, - sizeof(diffs), - (*pskb)->nh.iph->check - ^0xFFFF)); + iph = (*pskb)->nh.iph; + oldtos = iph->tos; + iph->tos &= ~IPT_ECN_IP_MASK; + iph->tos |= (einfo->ip_ect & IPT_ECN_IP_MASK); + iph->check = nf_csum_update(oldtos ^ 0xFFFF, iph->tos, + iph->check); } return 1; } diff --git a/net/ipv4/netfilter/ipt_TOS.c b/net/ipv4/netfilter/ipt_TOS.c index 1c7a5ca..52e9d70 100644 --- a/net/ipv4/netfilter/ipt_TOS.c +++ b/net/ipv4/netfilter/ipt_TOS.c @@ -30,23 +30,17 @@ target(struct sk_buff **pskb, void *userinfo) { const struct ipt_tos_target_info *tosinfo = targinfo; + struct iphdr *iph = (*pskb)->nh.iph; + 
u_int16_t oldtos; - if (((*pskb)->nh.iph->tos & IPTOS_TOS_MASK) != tosinfo->tos) { - u_int16_t diffs[2]; - + if ((iph->tos & IPTOS_TOS_MASK) != tosinfo->tos) { if (!skb_make_writable(pskb, sizeof(struct iphdr))) return NF_DROP; - - diffs[0] = htons((*pskb)->nh.iph->tos) ^ 0xFFFF; - (*pskb)->nh.iph->tos - = ((*pskb)->nh.iph->tos & IPTOS_PREC_MASK) - | tosinfo->tos; - diffs[1] = htons((*pskb)->nh.iph->tos); - (*pskb)->nh.iph->check - = csum_fold(csum_partial((char *)diffs, - sizeof(diffs), - (*pskb)->nh.iph->check - ^0xFFFF)); + iph = (*pskb)->nh.iph; + oldtos = iph->tos; + iph->tos = (iph->tos & IPTOS_PREC_MASK) | tosinfo->tos; + iph->check = nf_csum_update(oldtos ^ 0xFFFF, iph->tos, + iph->check); } return IPT_CONTINUE; } diff --git a/net/ipv4/netfilter/ipt_TTL.c b/net/ipv4/netfilter/ipt_TTL.c index f48892a..2afb2a8 100644 --- a/net/ipv4/netfilter/ipt_TTL.c +++ b/net/ipv4/netfilter/ipt_TTL.c @@ -27,7 +27,6 @@ ipt_ttl_target(struct sk_buff **pskb, { struct iphdr *iph; const struct ipt_TTL_info *info = targinfo; - u_int16_t diffs[2]; int new_ttl; if (!skb_make_writable(pskb, (*pskb)->len)) @@ -55,12 +54,10 @@ ipt_ttl_target(struct sk_buff **pskb, } if (new_ttl != iph->ttl) { - diffs[0] = htons(((unsigned)iph->ttl) << 8) ^ 0xFFFF; + iph->check = nf_csum_update((iph->ttl << 8) ^ 0xFFFF, + new_ttl << 8, + iph->check); iph->ttl = new_ttl; - diffs[1] = htons(((unsigned)iph->ttl) << 8); - iph->check = csum_fold(csum_partial((char *)diffs, - sizeof(diffs), - iph->check^0xFFFF)); } return IPT_CONTINUE; From kaber at trash.net Tue Aug 22 00:52:32 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:25:01 2006 Subject: [NETFILTER 11/18]: x_tables: add helpers for mass match/target registration In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225232.10288.34148.sendpatchset@localhost.localdomain> [NETFILTER]: x_tables: add helpers for mass 
match/target registration Signed-off-by: Patrick McHardy --- commit 15e38196fbab0fbedc31889f45440b9fe6fdf257 tree 13de7a026236dacc50d6db347169b1132d30caff parent c3e06d8b091765def127afcc148835736d64fad5 author Patrick McHardy Sun, 13 Aug 2006 19:01:17 +0200 committer Patrick McHardy Sun, 13 Aug 2006 19:01:17 +0200 include/linux/netfilter/x_tables.h | 5 +++ net/netfilter/x_tables.c | 60 ++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 0 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 48cc32d..9a99124 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -290,8 +290,13 @@ struct xt_table_info extern int xt_register_target(struct xt_target *target); extern void xt_unregister_target(struct xt_target *target); +extern int xt_register_targets(struct xt_target *target, unsigned int n); +extern void xt_unregister_targets(struct xt_target *target, unsigned int n); + extern int xt_register_match(struct xt_match *target); extern void xt_unregister_match(struct xt_match *target); +extern int xt_register_matches(struct xt_match *match, unsigned int n); +extern void xt_unregister_matches(struct xt_match *match, unsigned int n); extern int xt_check_match(const struct xt_match *match, unsigned short family, unsigned int size, const char *table, unsigned int hook, diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 174e8f9..8037ba6 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -87,6 +87,36 @@ xt_unregister_target(struct xt_target *t EXPORT_SYMBOL(xt_unregister_target); int +xt_register_targets(struct xt_target *target, unsigned int n) +{ + unsigned int i; + int err = 0; + + for (i = 0; i < n; i++) { + err = xt_register_target(&target[i]); + if (err) + goto err; + } + return err; + +err: + if (i > 0) + xt_unregister_targets(target, i); + return err; +} +EXPORT_SYMBOL(xt_register_targets); + +void 
+xt_unregister_targets(struct xt_target *target, unsigned int n) +{ + unsigned int i; + + for (i = 0; i < n; i++) + xt_unregister_target(&target[i]); +} +EXPORT_SYMBOL(xt_unregister_targets); + +int xt_register_match(struct xt_match *match) { int ret, af = match->family; @@ -113,6 +143,36 @@ xt_unregister_match(struct xt_match *mat } EXPORT_SYMBOL(xt_unregister_match); +int +xt_register_matches(struct xt_match *match, unsigned int n) +{ + unsigned int i; + int err = 0; + + for (i = 0; i < n; i++) { + err = xt_register_match(&match[i]); + if (err) + goto err; + } + return err; + +err: + if (i > 0) + xt_unregister_matches(match, i); + return err; +} +EXPORT_SYMBOL(xt_register_matches); + +void +xt_unregister_matches(struct xt_match *match, unsigned int n) +{ + unsigned int i; + + for (i = 0; i < n; i++) + xt_unregister_match(&match[i]); +} +EXPORT_SYMBOL(xt_unregister_matches); + /* * These are weird, but module loading must not be done with mutex From kaber at trash.net Tue Aug 22 00:52:40 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:25:04 2006 Subject: [NETFILTER 16/18]: x_tables: add data member to struct xt_match In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225240.10288.33283.sendpatchset@localhost.localdomain> [NETFILTER]: x_tables: add data member to struct xt_match Shared match functions can use this to make runtime decisions based on the used match.
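For illustration only, and not part of the patch: a minimal user-space sketch of how one match function might be shared by two descriptors and branch on a per-descriptor data value. struct demo_match, the DEMO_HDR_* constants and demo_shared_match() are hypothetical stand-ins, not kernel definitions.

```c
/* Hypothetical stand-ins for header type values; not kernel definitions. */
enum { DEMO_HDR_HOP = 0, DEMO_HDR_DEST = 60 };

/* Simplified model of a match descriptor carrying a per-match value. */
struct demo_match {
	const char *name;
	int (*match)(const struct demo_match *m, int hdr_type);
	unsigned long data;	/* free to use by each match */
};

/* One function shared by both entries; the result depends on m->data. */
static int demo_shared_match(const struct demo_match *m, int hdr_type)
{
	return (unsigned long)hdr_type == m->data;
}

/* Two descriptors that differ only in name and data. */
static struct demo_match demo_matches[] = {
	{ .name = "hbh", .match = demo_shared_match, .data = DEMO_HDR_HOP },
	{ .name = "dst", .match = demo_shared_match, .data = DEMO_HDR_DEST },
};
```

In this sketch the "hbh" entry accepts only DEMO_HDR_HOP and the "dst" entry only DEMO_HDR_DEST, although both point at the same function.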
Signed-off-by: Patrick McHardy --- commit 0da4cda5f20c851bf4a7da543d6eb91cad7810aa tree 4f6bc6de8304515ddcfd03d99e876f13d7299190 parent 9c8627cc82b512bf2c7d07f0a9afd9a6afc9e7ec author Patrick McHardy Fri, 18 Aug 2006 06:06:17 +0200 committer Patrick McHardy Fri, 18 Aug 2006 06:06:17 +0200 include/linux/netfilter/x_tables.h | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 9d97102..03d1027 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -185,6 +185,9 @@ struct xt_match /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; + /* Free to use by each match */ + unsigned long data; + char *table; unsigned int matchsize; unsigned int hooks; From kaber at trash.net Tue Aug 22 00:52:26 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:25:05 2006 Subject: [NETFILTER 06/18]: ctnetlink: check for listeners before sending expectation events In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225225.10288.61654.sendpatchset@localhost.localdomain> [NETFILTER]: ctnetlink: check for listeners before sending expectation events This patch uses nfnetlink_has_listeners to check for listeners in userspace. 
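For illustration only, and not part of the patch: a user-space sketch of the general idea of returning early when nobody is listening, so that no buffer is allocated and no message is built. demo_listeners, demo_allocs, demo_has_listeners() and demo_send_event() are hypothetical names, not the nfnetlink API.

```c
#include <stdlib.h>

/* Hypothetical counters standing in for netlink state. */
static int demo_listeners;	/* number of subscribed listeners */
static int demo_allocs;		/* buffers allocated so far */

static int demo_has_listeners(void)
{
	return demo_listeners > 0;
}

/* Returns 1 if an event was built and "sent", 0 if it was skipped. */
static int demo_send_event(void)
{
	char *buf;

	/* Bail out before any allocation or message building. */
	if (!demo_has_listeners())
		return 0;

	buf = malloc(64);
	if (!buf)
		return 0;
	demo_allocs++;
	/* ... fill in and deliver the message here ... */
	free(buf);
	return 1;
}
```

With no listeners the function returns before malloc() is reached, which is the saving the check is meant to provide.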
Signed-off-by: Pablo Neira Ayuso Signed-off-by: Patrick McHardy --- commit 67b49f9ce48d6acb259d20a38bf1d131250a01c6 tree a68fa4897286bc09b28047f8de29f38c57301f51 parent 430bb812a0703f2faddbe92a097d0ef7289b963b author Pablo Neira Ayuso Fri, 11 Aug 2006 21:01:23 +0200 committer Patrick McHardy Fri, 11 Aug 2006 21:01:23 +0200 net/ipv4/netfilter/ip_conntrack_netlink.c | 3 +++ net/netfilter/nf_conntrack_netlink.c | 3 +++ 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c index 319022e..090df76 100644 --- a/net/ipv4/netfilter/ip_conntrack_netlink.c +++ b/net/ipv4/netfilter/ip_conntrack_netlink.c @@ -1260,6 +1260,9 @@ static int ctnetlink_expect_event(struct } else return NOTIFY_DONE; + if (!nfnetlink_has_listeners(NFNLGRP_CONNTRACK_EXP_NEW)) + return NOTIFY_DONE; + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb) return NOTIFY_DONE; diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index ed8268a..81bcbe8 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1281,6 +1281,9 @@ static int ctnetlink_expect_event(struct } else return NOTIFY_DONE; + if (!nfnetlink_has_listeners(NFNLGRP_CONNTRACK_EXP_NEW)) + return NOTIFY_DONE; + skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb) return NOTIFY_DONE; From kaber at trash.net Tue Aug 22 00:52:41 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:25:07 2006 Subject: [NETFILTER 17/18]: ip6_tables: consolidate dst and hbh matches In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225241.10288.17588.sendpatchset@localhost.localdomain> [NETFILTER]: ip6_tables: consolidate dst and hbh matches The matches are identical besides one looking for NEXTHDR_HOP, the other for NEXTHDR_DEST. 
Remove ip6t_dst.c and handle both in ip6t_hbh.c. Signed-off-by: Patrick McHardy --- commit e60478517acbd24f288c0d579adaea625c3419fb tree ff12a24846c7d1ba54c02426598db9f1dd3e348b parent 0da4cda5f20c851bf4a7da543d6eb91cad7810aa author Patrick McHardy Tue, 22 Aug 2006 00:36:32 +0200 committer Patrick McHardy Tue, 22 Aug 2006 00:36:32 +0200 net/ipv6/netfilter/Makefile | 2 net/ipv6/netfilter/ip6t_dst.c | 219 ----------------------------------------- net/ipv6/netfilter/ip6t_hbh.c | 48 ++++----- 3 files changed, 25 insertions(+), 244 deletions(-) diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile index eeeb57d..ac1dfeb 100644 --- a/net/ipv6/netfilter/Makefile +++ b/net/ipv6/netfilter/Makefile @@ -5,7 +5,7 @@ # # Link order matters here. obj-$(CONFIG_IP6_NF_IPTABLES) += ip6_tables.o obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o -obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o ip6t_dst.o +obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o obj-$(CONFIG_IP6_NF_MATCH_IPV6HEADER) += ip6t_ipv6header.o obj-$(CONFIG_IP6_NF_MATCH_FRAG) += ip6t_frag.o obj-$(CONFIG_IP6_NF_MATCH_AH) += ip6t_ah.o diff --git a/net/ipv6/netfilter/ip6t_dst.c b/net/ipv6/netfilter/ip6t_dst.c deleted file mode 100644 index 223c335..0000000 --- a/net/ipv6/netfilter/ip6t_dst.c +++ /dev/null @@ -1,219 +0,0 @@ -/* Kernel module to match Hop-by-Hop and Destination parameters. */ - -/* (C) 2001-2002 Andras Kis-Szabo - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ - -#include -#include -#include -#include -#include -#include - -#include - -#include -#include - -#define HOPBYHOP 0 - -MODULE_LICENSE("GPL"); -#if HOPBYHOP -MODULE_DESCRIPTION("IPv6 HbH match"); -#else -MODULE_DESCRIPTION("IPv6 DST match"); -#endif -MODULE_AUTHOR("Andras Kis-Szabo "); - -#if 0 -#define DEBUGP printk -#else -#define DEBUGP(format, args...) 
-#endif - -/* - * (Type & 0xC0) >> 6 - * 0 -> ignorable - * 1 -> must drop the packet - * 2 -> send ICMP PARM PROB regardless and drop packet - * 3 -> Send ICMP if not a multicast address and drop packet - * (Type & 0x20) >> 5 - * 0 -> invariant - * 1 -> can change the routing - * (Type & 0x1F) Type - * 0 -> Pad1 (only 1 byte!) - * 1 -> PadN LENGTH info (total length = length + 2) - * C0 | 2 -> JUMBO 4 x x x x ( xxxx > 64k ) - * 5 -> RTALERT 2 x x - */ - -static int -match(const struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - const struct xt_match *match, - const void *matchinfo, - int offset, - unsigned int protoff, - int *hotdrop) -{ - struct ipv6_opt_hdr _optsh, *oh; - const struct ip6t_opts *optinfo = matchinfo; - unsigned int temp; - unsigned int ptr; - unsigned int hdrlen = 0; - unsigned int ret = 0; - u8 _opttype, *tp = NULL; - u8 _optlen, *lp = NULL; - unsigned int optlen; - -#if HOPBYHOP - if (ipv6_find_hdr(skb, &ptr, NEXTHDR_HOP, NULL) < 0) -#else - if (ipv6_find_hdr(skb, &ptr, NEXTHDR_DEST, NULL) < 0) -#endif - return 0; - - oh = skb_header_pointer(skb, ptr, sizeof(_optsh), &_optsh); - if (oh == NULL) { - *hotdrop = 1; - return 0; - } - - hdrlen = ipv6_optlen(oh); - if (skb->len - ptr < hdrlen) { - /* Packet smaller than it's length field */ - return 0; - } - - DEBUGP("IPv6 OPTS LEN %u %u ", hdrlen, oh->hdrlen); - - DEBUGP("len %02X %04X %02X ", - optinfo->hdrlen, hdrlen, - (!(optinfo->flags & IP6T_OPTS_LEN) || - ((optinfo->hdrlen == hdrlen) ^ - !!(optinfo->invflags & IP6T_OPTS_INV_LEN)))); - - ret = (oh != NULL) && - (!(optinfo->flags & IP6T_OPTS_LEN) || - ((optinfo->hdrlen == hdrlen) ^ - !!(optinfo->invflags & IP6T_OPTS_INV_LEN))); - - ptr += 2; - hdrlen -= 2; - if (!(optinfo->flags & IP6T_OPTS_OPTS)) { - return ret; - } else if (optinfo->flags & IP6T_OPTS_NSTRICT) { - DEBUGP("Not strict - not implemented"); - } else { - DEBUGP("Strict "); - DEBUGP("#%d ", optinfo->optsnr); - for (temp = 0; temp < optinfo->optsnr; 
temp++) { - /* type field exists ? */ - if (hdrlen < 1) - break; - tp = skb_header_pointer(skb, ptr, sizeof(_opttype), - &_opttype); - if (tp == NULL) - break; - - /* Type check */ - if (*tp != (optinfo->opts[temp] & 0xFF00) >> 8) { - DEBUGP("Tbad %02X %02X\n", - *tp, - (optinfo->opts[temp] & 0xFF00) >> 8); - return 0; - } else { - DEBUGP("Tok "); - } - /* Length check */ - if (*tp) { - u16 spec_len; - - /* length field exists ? */ - if (hdrlen < 2) - break; - lp = skb_header_pointer(skb, ptr + 1, - sizeof(_optlen), - &_optlen); - if (lp == NULL) - break; - spec_len = optinfo->opts[temp] & 0x00FF; - - if (spec_len != 0x00FF && spec_len != *lp) { - DEBUGP("Lbad %02X %04X\n", *lp, - spec_len); - return 0; - } - DEBUGP("Lok "); - optlen = *lp + 2; - } else { - DEBUGP("Pad1\n"); - optlen = 1; - } - - /* Step to the next */ - DEBUGP("len%04X \n", optlen); - - if ((ptr > skb->len - optlen || hdrlen < optlen) && - (temp < optinfo->optsnr - 1)) { - DEBUGP("new pointer is too large! \n"); - break; - } - ptr += optlen; - hdrlen -= optlen; - } - if (temp == optinfo->optsnr) - return ret; - else - return 0; - } - - return 0; -} - -/* Called when user tries to insert an entry of this type. 
*/ -static int -checkentry(const char *tablename, - const void *info, - const struct xt_match *match, - void *matchinfo, - unsigned int hook_mask) -{ - const struct ip6t_opts *optsinfo = matchinfo; - - if (optsinfo->invflags & ~IP6T_OPTS_INV_MASK) { - DEBUGP("ip6t_opts: unknown flags %X\n", optsinfo->invflags); - return 0; - } - return 1; -} - -static struct ip6t_match opts_match = { -#if HOPBYHOP - .name = "hbh", -#else - .name = "dst", -#endif - .match = match, - .matchsize = sizeof(struct ip6t_opts), - .checkentry = checkentry, - .me = THIS_MODULE, -}; - -static int __init ip6t_dst_init(void) -{ - return ip6t_register_match(&opts_match); -} - -static void __exit ip6t_dst_fini(void) -{ - ip6t_unregister_match(&opts_match); -} - -module_init(ip6t_dst_init); -module_exit(ip6t_dst_fini); diff --git a/net/ipv6/netfilter/ip6t_hbh.c b/net/ipv6/netfilter/ip6t_hbh.c index 72defc8..d32a205 100644 --- a/net/ipv6/netfilter/ip6t_hbh.c +++ b/net/ipv6/netfilter/ip6t_hbh.c @@ -19,15 +19,10 @@ #include #include #include -#define HOPBYHOP 1 - MODULE_LICENSE("GPL"); -#if HOPBYHOP -MODULE_DESCRIPTION("IPv6 HbH match"); -#else -MODULE_DESCRIPTION("IPv6 DST match"); -#endif +MODULE_DESCRIPTION("IPv6 opts match"); MODULE_AUTHOR("Andras Kis-Szabo "); +MODULE_ALIAS("ip6t_dst"); #if 0 #define DEBUGP printk @@ -71,11 +66,7 @@ match(const struct sk_buff *skb, u8 _optlen, *lp = NULL; unsigned int optlen; -#if HOPBYHOP - if (ipv6_find_hdr(skb, &ptr, NEXTHDR_HOP, NULL) < 0) -#else - if (ipv6_find_hdr(skb, &ptr, NEXTHDR_DEST, NULL) < 0) -#endif + if (ipv6_find_hdr(skb, &ptr, match->data, NULL) < 0) return 0; oh = skb_header_pointer(skb, ptr, sizeof(_optsh), &_optsh); @@ -193,26 +184,35 @@ checkentry(const char *tablename, return 1; } -static struct ip6t_match opts_match = { -#if HOPBYHOP - .name = "hbh", -#else - .name = "dst", -#endif - .match = match, - .matchsize = sizeof(struct ip6t_opts), - .checkentry = checkentry, - .me = THIS_MODULE, +static struct xt_match opts_match[] = { + { + .name 
= "hbh", + .family = AF_INET6, + .match = match, + .matchsize = sizeof(struct ip6t_opts), + .checkentry = checkentry, + .me = THIS_MODULE, + .data = NEXTHDR_HOP, + }, + { + .name = "dst", + .family = AF_INET6, + .match = match, + .matchsize = sizeof(struct ip6t_opts), + .checkentry = checkentry, + .me = THIS_MODULE, + .data = NEXTHDR_DEST, + }, }; static int __init ip6t_hbh_init(void) { - return ip6t_register_match(&opts_match); + return xt_register_matches(opts_match, ARRAY_SIZE(opts_match)); } static void __exit ip6t_hbh_fini(void) { - ip6t_unregister_match(&opts_match); + xt_unregister_matches(opts_match, ARRAY_SIZE(opts_match)); } module_init(ip6t_hbh_init); From kaber at trash.net Tue Aug 22 00:52:35 2006 From: kaber at trash.net (Patrick McHardy) Date: Tue Aug 22 01:25:09 2006 Subject: [NETFILTER 13/18]: x_tables: remove unused argument to target functions In-Reply-To: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> References: <20060821225217.10288.69738.sendpatchset@localhost.localdomain> Message-ID: <20060821225235.10288.7810.sendpatchset@localhost.localdomain> [NETFILTER]: x_tables: remove unused argument to target functions Signed-off-by: Patrick McHardy --- commit 855a763072e92f8b1e4931003cc82a0b3ba1131d tree e683e7d21c3d96032d90c0634a6d62c2473932ac parent 160a7782ed42f6295b9d68b18bda7eb1f37e86a0 author Patrick McHardy Sun, 13 Aug 2006 19:46:20 +0200 committer Patrick McHardy Sun, 13 Aug 2006 19:46:20 +0200 include/linux/netfilter/x_tables.h | 3 +-- include/linux/netfilter_arp/arp_tables.h | 3 +-- include/linux/netfilter_ipv4/ip_tables.h | 3 +-- include/linux/netfilter_ipv6/ip6_tables.h | 3 +-- net/ipv4/netfilter/arp_tables.c | 9 +++------ net/ipv4/netfilter/arpt_mangle.c | 2 +- net/ipv4/netfilter/arptable_filter.c | 2 +- net/ipv4/netfilter/ip_nat_rule.c | 8 +++----- net/ipv4/netfilter/ip_tables.c | 9 +++------ net/ipv4/netfilter/ipt_CLUSTERIP.c | 3 +-- net/ipv4/netfilter/ipt_ECN.c | 3 +-- net/ipv4/netfilter/ipt_LOG.c | 3 +-- 
net/ipv4/netfilter/ipt_MASQUERADE.c | 3 +-- net/ipv4/netfilter/ipt_NETMAP.c | 3 +-- net/ipv4/netfilter/ipt_REDIRECT.c | 3 +-- net/ipv4/netfilter/ipt_REJECT.c | 3 +-- net/ipv4/netfilter/ipt_SAME.c | 3 +-- net/ipv4/netfilter/ipt_TCPMSS.c | 3 +-- net/ipv4/netfilter/ipt_TOS.c | 3 +-- net/ipv4/netfilter/ipt_TTL.c | 2 +- net/ipv4/netfilter/ipt_ULOG.c | 2 +- net/ipv4/netfilter/iptable_filter.c | 4 ++-- net/ipv4/netfilter/iptable_mangle.c | 4 ++-- net/ipv4/netfilter/iptable_raw.c | 2 +- net/ipv6/netfilter/ip6_tables.c | 9 +++------ net/ipv6/netfilter/ip6t_HL.c | 2 +- net/ipv6/netfilter/ip6t_LOG.c | 3 +-- net/ipv6/netfilter/ip6t_REJECT.c | 3 +-- net/ipv6/netfilter/ip6table_filter.c | 4 ++-- net/ipv6/netfilter/ip6table_mangle.c | 4 ++-- net/ipv6/netfilter/ip6table_raw.c | 2 +- net/netfilter/xt_CLASSIFY.c | 3 +-- net/netfilter/xt_CONNMARK.c | 3 +-- net/netfilter/xt_CONNSECMARK.c | 2 +- net/netfilter/xt_DSCP.c | 6 ++---- net/netfilter/xt_MARK.c | 6 ++---- net/netfilter/xt_NFQUEUE.c | 3 +-- net/netfilter/xt_NOTRACK.c | 3 +-- net/netfilter/xt_SECMARK.c | 2 +- net/netfilter/xt_connbytes.c | 2 +- net/sched/act_ipt.c | 3 +-- 41 files changed, 55 insertions(+), 91 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 9a99124..9cef0e9 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -211,8 +211,7 @@ struct xt_target const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userdata); + const void *targinfo); /* Called when user tries to insert an entry of this type: hook_mask is a bitmask of hooks from which it can be diff --git a/include/linux/netfilter_arp/arp_tables.h b/include/linux/netfilter_arp/arp_tables.h index 62cc27d..149e87c 100644 --- a/include/linux/netfilter_arp/arp_tables.h +++ b/include/linux/netfilter_arp/arp_tables.h @@ -248,8 +248,7 @@ extern unsigned int arpt_do_table(struct unsigned int hook, const struct 
net_device *in, const struct net_device *out, - struct arpt_table *table, - void *userdata); + struct arpt_table *table); #define ARPT_ALIGN(s) (((s) + (__alignof__(struct arpt_entry)-1)) & ~(__alignof__(struct arpt_entry)-1)) #endif /*__KERNEL__*/ diff --git a/include/linux/netfilter_ipv4/ip_tables.h b/include/linux/netfilter_ipv4/ip_tables.h index c0dac16..a536bbd 100644 --- a/include/linux/netfilter_ipv4/ip_tables.h +++ b/include/linux/netfilter_ipv4/ip_tables.h @@ -312,8 +312,7 @@ extern unsigned int ipt_do_table(struct unsigned int hook, const struct net_device *in, const struct net_device *out, - struct ipt_table *table, - void *userdata); + struct ipt_table *table); #define IPT_ALIGN(s) XT_ALIGN(s) diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h index d0d5d1e..d7a8e9c 100644 --- a/include/linux/netfilter_ipv6/ip6_tables.h +++ b/include/linux/netfilter_ipv6/ip6_tables.h @@ -300,8 +300,7 @@ extern unsigned int ip6t_do_table(struct unsigned int hook, const struct net_device *in, const struct net_device *out, - struct ip6t_table *table, - void *userdata); + struct ip6t_table *table); /* Check for an extension */ extern int ip6t_ext_hdr(u8 nexthdr); diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 80c73ca..c38c6c4 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -208,8 +208,7 @@ static unsigned int arpt_error(struct sk const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { if (net_ratelimit()) printk("arp_tables: error: '%s'\n", (char *)targinfo); @@ -226,8 +225,7 @@ unsigned int arpt_do_table(struct sk_buf unsigned int hook, const struct net_device *in, const struct net_device *out, - struct arpt_table *table, - void *userdata) + struct arpt_table *table) { static const char nulldevname[IFNAMSIZ]; unsigned int verdict = NF_DROP; @@ -301,8 +299,7 
@@ unsigned int arpt_do_table(struct sk_buf in, out, hook, t->u.kernel.target, - t->data, - userdata); + t->data); /* Target might have changed stuff. */ arp = (*pskb)->nh.arph; diff --git a/net/ipv4/netfilter/arpt_mangle.c b/net/ipv4/netfilter/arpt_mangle.c index a58325c..05fb242 100644 --- a/net/ipv4/netfilter/arpt_mangle.c +++ b/net/ipv4/netfilter/arpt_mangle.c @@ -11,7 +11,7 @@ static unsigned int target(struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, void *userinfo) + const void *targinfo) { const struct arpt_mangle *mangle = targinfo; struct arphdr *arp; diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c index d7c472f..7edea2a 100644 --- a/net/ipv4/netfilter/arptable_filter.c +++ b/net/ipv4/netfilter/arptable_filter.c @@ -155,7 +155,7 @@ static unsigned int arpt_hook(unsigned i const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return arpt_do_table(pskb, hook, in, out, &packet_filter, NULL); + return arpt_do_table(pskb, hook, in, out, &packet_filter); } static struct nf_hook_ops arpt_ops[] = { diff --git a/net/ipv4/netfilter/ip_nat_rule.c b/net/ipv4/netfilter/ip_nat_rule.c index 1aba926..1aa0e4f 100644 --- a/net/ipv4/netfilter/ip_nat_rule.c +++ b/net/ipv4/netfilter/ip_nat_rule.c @@ -104,8 +104,7 @@ static unsigned int ipt_snat_target(stru const struct net_device *out, unsigned int hooknum, const struct ipt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; @@ -147,8 +146,7 @@ static unsigned int ipt_dnat_target(stru const struct net_device *out, unsigned int hooknum, const struct ipt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; @@ -255,7 +253,7 @@ int ip_nat_rule_find(struct sk_buff **ps { int ret; - ret = 
ipt_do_table(pskb, hooknum, in, out, &nat_table, NULL); + ret = ipt_do_table(pskb, hooknum, in, out, &nat_table); if (ret == NF_ACCEPT) { if (!ip_nat_initialized(ct, HOOK2MANIP(hooknum))) diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index fc5bdd5..bdf9196 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -180,8 +180,7 @@ ipt_error(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { if (net_ratelimit()) printk("ip_tables: error: `%s'\n", (char *)targinfo); @@ -217,8 +216,7 @@ ipt_do_table(struct sk_buff **pskb, unsigned int hook, const struct net_device *in, const struct net_device *out, - struct ipt_table *table, - void *userdata) + struct ipt_table *table) { static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long)))); u_int16_t offset; @@ -307,8 +305,7 @@ #endif in, out, hook, t->u.kernel.target, - t->data, - userdata); + t->data); #ifdef CONFIG_NETFILTER_DEBUG if (((struct ipt_entry *)table_base)->comefrom diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index d994c5f..a08383c 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -302,8 +302,7 @@ target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { const struct ipt_clusterip_tgt_info *cipinfo = targinfo; enum ip_conntrack_info ctinfo; diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c index 7e30e6d..1c3da4a 100644 --- a/net/ipv4/netfilter/ipt_ECN.c +++ b/net/ipv4/netfilter/ipt_ECN.c @@ -85,8 +85,7 @@ target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { const 
struct ipt_ECN_info *einfo = targinfo; diff --git a/net/ipv4/netfilter/ipt_LOG.c b/net/ipv4/netfilter/ipt_LOG.c index b98f7b0..a8d356c 100644 --- a/net/ipv4/netfilter/ipt_LOG.c +++ b/net/ipv4/netfilter/ipt_LOG.c @@ -416,8 +416,7 @@ ipt_log_target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { const struct ipt_log_info *loginfo = targinfo; struct nf_loginfo li; diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c index ebd94f2..9659793 100644 --- a/net/ipv4/netfilter/ipt_MASQUERADE.c +++ b/net/ipv4/netfilter/ipt_MASQUERADE.c @@ -64,8 +64,7 @@ masquerade_target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; diff --git a/net/ipv4/netfilter/ipt_NETMAP.c b/net/ipv4/netfilter/ipt_NETMAP.c index 736c4b5..fd5e74a 100644 --- a/net/ipv4/netfilter/ipt_NETMAP.c +++ b/net/ipv4/netfilter/ipt_NETMAP.c @@ -55,8 +55,7 @@ target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; diff --git a/net/ipv4/netfilter/ipt_REDIRECT.c b/net/ipv4/netfilter/ipt_REDIRECT.c index f290463..839fe99 100644 --- a/net/ipv4/netfilter/ipt_REDIRECT.c +++ b/net/ipv4/netfilter/ipt_REDIRECT.c @@ -58,8 +58,7 @@ redirect_target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c index 95c6662..1dfd8e5 100644 --- 
a/net/ipv4/netfilter/ipt_REJECT.c +++ b/net/ipv4/netfilter/ipt_REJECT.c @@ -228,8 +228,7 @@ static unsigned int reject(struct sk_buf const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { const struct ipt_reject_info *reject = targinfo; diff --git a/net/ipv4/netfilter/ipt_SAME.c b/net/ipv4/netfilter/ipt_SAME.c index 7169b09..cf80174 100644 --- a/net/ipv4/netfilter/ipt_SAME.c +++ b/net/ipv4/netfilter/ipt_SAME.c @@ -133,8 +133,7 @@ same_target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; diff --git a/net/ipv4/netfilter/ipt_TCPMSS.c b/net/ipv4/netfilter/ipt_TCPMSS.c index 0fce85e..6d668dc 100644 --- a/net/ipv4/netfilter/ipt_TCPMSS.c +++ b/net/ipv4/netfilter/ipt_TCPMSS.c @@ -41,8 +41,7 @@ ipt_tcpmss_target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { const struct ipt_tcpmss_info *tcpmssinfo = targinfo; struct tcphdr *tcph; diff --git a/net/ipv4/netfilter/ipt_TOS.c b/net/ipv4/netfilter/ipt_TOS.c index 52e9d70..043df01 100644 --- a/net/ipv4/netfilter/ipt_TOS.c +++ b/net/ipv4/netfilter/ipt_TOS.c @@ -26,8 +26,7 @@ target(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { const struct ipt_tos_target_info *tosinfo = targinfo; struct iphdr *iph = (*pskb)->nh.iph; diff --git a/net/ipv4/netfilter/ipt_TTL.c b/net/ipv4/netfilter/ipt_TTL.c index 2afb2a8..1640071 100644 --- a/net/ipv4/netfilter/ipt_TTL.c +++ b/net/ipv4/netfilter/ipt_TTL.c @@ -23,7 +23,7 @@ static unsigned int ipt_ttl_target(struct sk_buff **pskb, const struct net_device *in, const struct 
net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, void *userinfo) + const void *targinfo) { struct iphdr *iph; const struct ipt_TTL_info *info = targinfo; diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c index d7dd7fe..062b456 100644 --- a/net/ipv4/netfilter/ipt_ULOG.c +++ b/net/ipv4/netfilter/ipt_ULOG.c @@ -303,7 +303,7 @@ static unsigned int ipt_ulog_target(stru const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, void *userinfo) + const void *targinfo) { struct ipt_ulog_info *loginfo = (struct ipt_ulog_info *) targinfo; diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c index 7f41748..e2e7dd8 100644 --- a/net/ipv4/netfilter/iptable_filter.c +++ b/net/ipv4/netfilter/iptable_filter.c @@ -90,7 +90,7 @@ ipt_hook(unsigned int hook, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ipt_do_table(pskb, hook, in, out, &packet_filter, NULL); + return ipt_do_table(pskb, hook, in, out, &packet_filter); } static unsigned int @@ -108,7 +108,7 @@ ipt_local_out_hook(unsigned int hook, return NF_ACCEPT; } - return ipt_do_table(pskb, hook, in, out, &packet_filter, NULL); + return ipt_do_table(pskb, hook, in, out, &packet_filter); } static struct nf_hook_ops ipt_ops[] = { diff --git a/net/ipv4/netfilter/iptable_mangle.c b/net/ipv4/netfilter/iptable_mangle.c index 4e7998b..79336cb 100644 --- a/net/ipv4/netfilter/iptable_mangle.c +++ b/net/ipv4/netfilter/iptable_mangle.c @@ -119,7 +119,7 @@ ipt_route_hook(unsigned int hook, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL); + return ipt_do_table(pskb, hook, in, out, &packet_mangler); } static unsigned int @@ -148,7 +148,7 @@ ipt_local_hook(unsigned int hook, daddr = (*pskb)->nh.iph->daddr; tos = (*pskb)->nh.iph->tos; - ret = ipt_do_table(pskb, hook, in, out, &packet_mangler, 
NULL); + ret = ipt_do_table(pskb, hook, in, out, &packet_mangler); /* Reroute for ANY change. */ if (ret != NF_DROP && ret != NF_STOLEN && ret != NF_QUEUE && ((*pskb)->nh.iph->saddr != saddr diff --git a/net/ipv4/netfilter/iptable_raw.c b/net/ipv4/netfilter/iptable_raw.c index 7912cce..bcbeb4a 100644 --- a/net/ipv4/netfilter/iptable_raw.c +++ b/net/ipv4/netfilter/iptable_raw.c @@ -95,7 +95,7 @@ ipt_hook(unsigned int hook, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ipt_do_table(pskb, hook, in, out, &packet_raw, NULL); + return ipt_do_table(pskb, hook, in, out, &packet_raw); } /* 'raw' is the very first table. */ diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index f26898b..1978b6c 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -220,8 +220,7 @@ ip6t_error(struct sk_buff **pskb, const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, - void *userinfo) + const void *targinfo) { if (net_ratelimit()) printk("ip6_tables: error: `%s'\n", (char *)targinfo); @@ -258,8 +257,7 @@ ip6t_do_table(struct sk_buff **pskb, unsigned int hook, const struct net_device *in, const struct net_device *out, - struct xt_table *table, - void *userdata) + struct xt_table *table) { static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long)))); int offset = 0; @@ -349,8 +347,7 @@ #endif in, out, hook, t->u.kernel.target, - t->data, - userdata); + t->data); #ifdef CONFIG_NETFILTER_DEBUG if (((struct ip6t_entry *)table_base)->comefrom diff --git a/net/ipv6/netfilter/ip6t_HL.c b/net/ipv6/netfilter/ip6t_HL.c index b8eff8e..c85d124 100644 --- a/net/ipv6/netfilter/ip6t_HL.c +++ b/net/ipv6/netfilter/ip6t_HL.c @@ -22,7 +22,7 @@ static unsigned int ip6t_hl_target(struc const struct net_device *out, unsigned int hooknum, const struct xt_target *target, - const void *targinfo, void *userinfo) + const void *targinfo) { struct ipv6hdr *ip6h; 
 	const struct ip6t_HL_info *info = targinfo;
diff --git a/net/ipv6/netfilter/ip6t_LOG.c b/net/ipv6/netfilter/ip6t_LOG.c
index 73c6300..acb9173 100644
--- a/net/ipv6/netfilter/ip6t_LOG.c
+++ b/net/ipv6/netfilter/ip6t_LOG.c
@@ -427,8 +427,7 @@ ip6t_log_target(struct sk_buff **pskb,
 		const struct net_device *out,
 		unsigned int hooknum,
 		const struct xt_target *target,
-		const void *targinfo,
-		void *userinfo)
+		const void *targinfo)
 {
 	const struct ip6t_log_info *loginfo = targinfo;
 	struct nf_loginfo li;
diff --git a/net/ipv6/netfilter/ip6t_REJECT.c b/net/ipv6/netfilter/ip6t_REJECT.c
index 7929ff4..343acd3 100644
--- a/net/ipv6/netfilter/ip6t_REJECT.c
+++ b/net/ipv6/netfilter/ip6t_REJECT.c
@@ -180,8 +180,7 @@ static unsigned int reject6_target(struc
 			   const struct net_device *out,
 			   unsigned int hooknum,
 			   const struct xt_target *target,
-			   const void *targinfo,
-			   void *userinfo)
+			   const void *targinfo)
 {
 	const struct ip6t_reject_info *reject = targinfo;
 
diff --git a/net/ipv6/netfilter/ip6table_filter.c b/net/ipv6/netfilter/ip6table_filter.c
index 60976c0..2fc07c7 100644
--- a/net/ipv6/netfilter/ip6table_filter.c
+++ b/net/ipv6/netfilter/ip6table_filter.c
@@ -108,7 +108,7 @@ ip6t_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ip6t_do_table(pskb, hook, in, out, &packet_filter, NULL);
+	return ip6t_do_table(pskb, hook, in, out, &packet_filter);
 }
 
 static unsigned int
@@ -128,7 +128,7 @@ #if 0
 	}
 #endif
 
-	return ip6t_do_table(pskb, hook, in, out, &packet_filter, NULL);
+	return ip6t_do_table(pskb, hook, in, out, &packet_filter);
 }
 
 static struct nf_hook_ops ip6t_ops[] = {
diff --git a/net/ipv6/netfilter/ip6table_mangle.c b/net/ipv6/netfilter/ip6table_mangle.c
index 03a13ea..32db04f 100644
--- a/net/ipv6/netfilter/ip6table_mangle.c
+++ b/net/ipv6/netfilter/ip6table_mangle.c
@@ -138,7 +138,7 @@ ip6t_route_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ip6t_do_table(pskb, hook, in, out, &packet_mangler, NULL);
+	return ip6t_do_table(pskb, hook, in, out, &packet_mangler);
 }
 
 static unsigned int
@@ -174,7 +174,7 @@ #endif
 	/* flowlabel and prio (includes version, which shouldn't change either */
 	flowlabel = *((u_int32_t *) (*pskb)->nh.ipv6h);
 
-	ret = ip6t_do_table(pskb, hook, in, out, &packet_mangler, NULL);
+	ret = ip6t_do_table(pskb, hook, in, out, &packet_mangler);
 
 	if (ret != NF_DROP && ret != NF_STOLEN
 		&& (memcmp(&(*pskb)->nh.ipv6h->saddr, &saddr, sizeof(saddr))
diff --git a/net/ipv6/netfilter/ip6table_raw.c b/net/ipv6/netfilter/ip6table_raw.c
index 61a7c58..b4154da 100644
--- a/net/ipv6/netfilter/ip6table_raw.c
+++ b/net/ipv6/netfilter/ip6table_raw.c
@@ -122,7 +122,7 @@ ip6t_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ip6t_do_table(pskb, hook, in, out, &packet_raw, NULL);
+	return ip6t_do_table(pskb, hook, in, out, &packet_raw);
 }
 
 static struct nf_hook_ops ip6t_ops[] = {
diff --git a/net/netfilter/xt_CLASSIFY.c b/net/netfilter/xt_CLASSIFY.c
index 2d77ebb..5b3bff6 100644
--- a/net/netfilter/xt_CLASSIFY.c
+++ b/net/netfilter/xt_CLASSIFY.c
@@ -29,8 +29,7 @@ target(struct sk_buff **pskb,
        const struct net_device *out,
        unsigned int hooknum,
        const struct xt_target *target,
-       const void *targinfo,
-       void *userinfo)
+       const void *targinfo)
 {
 	const struct xt_classify_target_info *clinfo = targinfo;
 
diff --git a/net/netfilter/xt_CONNMARK.c b/net/netfilter/xt_CONNMARK.c
index e577356..c2125f6 100644
--- a/net/netfilter/xt_CONNMARK.c
+++ b/net/netfilter/xt_CONNMARK.c
@@ -38,8 +38,7 @@ target(struct sk_buff **pskb,
        const struct net_device *out,
        unsigned int hooknum,
        const struct xt_target *target,
-       const void *targinfo,
-       void *userinfo)
+       const void *targinfo)
 {
 	const struct xt_connmark_target_info *markinfo = targinfo;
 	u_int32_t diff;
diff --git a/net/netfilter/xt_CONNSECMARK.c b/net/netfilter/xt_CONNSECMARK.c
index 48f7fc3..4b9cc65 100644
--- a/net/netfilter/xt_CONNSECMARK.c
+++ b/net/netfilter/xt_CONNSECMARK.c
@@ -66,7 +66,7 @@ static void secmark_restore(struct sk_bu
 static unsigned int target(struct sk_buff **pskb, const struct net_device *in,
 			   const struct net_device *out, unsigned int hooknum,
 			   const struct xt_target *target,
-			   const void *targinfo, void *userinfo)
+			   const void *targinfo)
 {
 	struct sk_buff *skb = *pskb;
 	const struct xt_connsecmark_target_info *info = targinfo;
diff --git a/net/netfilter/xt_DSCP.c b/net/netfilter/xt_DSCP.c
index a1cd972..9d23c95 100644
--- a/net/netfilter/xt_DSCP.c
+++ b/net/netfilter/xt_DSCP.c
@@ -32,8 +32,7 @@ static unsigned int target(struct sk_buf
 			   const struct net_device *out,
 			   unsigned int hooknum,
 			   const struct xt_target *target,
-			   const void *targinfo,
-			   void *userinfo)
+			   const void *targinfo)
 {
 	const struct xt_DSCP_info *dinfo = targinfo;
 	u_int8_t dscp = ipv4_get_dsfield((*pskb)->nh.iph) >> XT_DSCP_SHIFT;
@@ -54,8 +53,7 @@ static unsigned int target6(struct sk_bu
 			    const struct net_device *out,
 			    unsigned int hooknum,
 			    const struct xt_target *target,
-			    const void *targinfo,
-			    void *userinfo)
+			    const void *targinfo)
 {
 	const struct xt_DSCP_info *dinfo = targinfo;
 	u_int8_t dscp = ipv6_get_dsfield((*pskb)->nh.ipv6h) >> XT_DSCP_SHIFT;
diff --git a/net/netfilter/xt_MARK.c b/net/netfilter/xt_MARK.c
index 0a61272..95a171c 100644
--- a/net/netfilter/xt_MARK.c
+++ b/net/netfilter/xt_MARK.c
@@ -27,8 +27,7 @@ target_v0(struct sk_buff **pskb,
 	  const struct net_device *out,
 	  unsigned int hooknum,
 	  const struct xt_target *target,
-	  const void *targinfo,
-	  void *userinfo)
+	  const void *targinfo)
 {
 	const struct xt_mark_target_info *markinfo = targinfo;
 
@@ -44,8 +43,7 @@ target_v1(struct sk_buff **pskb,
 	  const struct net_device *out,
 	  unsigned int hooknum,
 	  const struct xt_target *target,
-	  const void *targinfo,
-	  void *userinfo)
+	  const void *targinfo)
 {
 	const struct xt_mark_target_info_v1 *markinfo = targinfo;
 	int mark = 0;
diff --git a/net/netfilter/xt_NFQUEUE.c b/net/netfilter/xt_NFQUEUE.c
index 7b98228..db9b896 100644
--- a/net/netfilter/xt_NFQUEUE.c
+++ b/net/netfilter/xt_NFQUEUE.c
@@ -29,8 +29,7 @@ target(struct sk_buff **pskb,
        const struct net_device *out,
        unsigned int hooknum,
        const struct xt_target *target,
-       const void *targinfo,
-       void *userinfo)
+       const void *targinfo)
 {
 	const struct xt_NFQ_info *tinfo = targinfo;
 
diff --git a/net/netfilter/xt_NOTRACK.c b/net/netfilter/xt_NOTRACK.c
index cab881d..6d00dca 100644
--- a/net/netfilter/xt_NOTRACK.c
+++ b/net/netfilter/xt_NOTRACK.c
@@ -16,8 +16,7 @@ target(struct sk_buff **pskb,
        const struct net_device *out,
        unsigned int hooknum,
        const struct xt_target *target,
-       const void *targinfo,
-       void *userinfo)
+       const void *targinfo)
 {
 	/* Previously seen (loopback)? Ignore. */
 	if ((*pskb)->nfct != NULL)
diff --git a/net/netfilter/xt_SECMARK.c b/net/netfilter/xt_SECMARK.c
index 4300988..8a04dcf 100644
--- a/net/netfilter/xt_SECMARK.c
+++ b/net/netfilter/xt_SECMARK.c
@@ -31,7 +31,7 @@ static u8 mode;
 static unsigned int target(struct sk_buff **pskb, const struct net_device *in,
 			   const struct net_device *out, unsigned int hooknum,
 			   const struct xt_target *target,
-			   const void *targinfo, void *userinfo)
+			   const void *targinfo)
 {
 	u32 secmark = 0;
 	const struct xt_secmark_target_info *info = targinfo;
diff --git a/net/netfilter/xt_connbytes.c b/net/netfilter/xt_connbytes.c
index 2d49948..d725e8b 100644
--- a/net/netfilter/xt_connbytes.c
+++ b/net/netfilter/xt_connbytes.c
@@ -143,7 +143,7 @@ static int check(const char *tablename,
 	return 1;
 }
 
-static struct xt_match xt_connbytes_match = {
+static struct xt_match xt_connbytes_match[] = {
 	{
 		.name		= "connbytes",
 		.family		= AF_INET,
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index d799e01..1a5f49e 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -230,8 +230,7 @@ tcf_ipt(struct sk_buff *skb, struct tc_a
 	 * needs to be replaced. We don't own the skb, so this must not
 	 * happen. The pskb_expand_head above should make sure of this */
 	ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL, p->hook,
-					    p->t->u.kernel.target, p->t->data,
-					    NULL);
+					    p->t->u.kernel.target, p->t->data);