pablo | 8 May 02:21 2012

[PATCH 00/25] netfilter updates for net-next (upcoming 3.5)

From: Pablo Neira Ayuso <pablo <at> netfilter.org>

Hi David,

The following patchset contains the Netfilter updates for net-next.
Most notably:

* The new /proc/sys/net/netfilter/nf_conntrack_helper entry from
  Eric Leblond, which allows disabling the automatic conntrack helper
  assignment. This patch also emits a warning to inform the user
  that this behaviour will be removed at some point. With incorrect
  configurations, the automatic conntrack helper assignment may allow
  attackers to open holes in the firewall to access the protected
  network segments. More information on this issue at:

  https://home.regit.org/netfilter-en/secure-use-of-helpers/

  In the near future, all conntrack helpers will be explicitly
  attached via the CT target, as we discussed at length during
  the last netfilter workshop.

* One new sysctl from Florian Westphal to translate the input device
  to the vlan device name. He needed this to get the REDIRECT target
  working in vlan-on-top-of-bridge setups.

* Major improvements in the ip_vs_sync code from Julian Anastasov.
  They aim to improve scalability and to address possible message
  loss due to socket buffer overrun under a high rate of
  synchronization messages.

pablo | 8 May 02:21 2012

[PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment

From: Eric Leblond <eric <at> regit.org>

This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, e.g.:

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helpers to flows.

There are good reasons to stop supporting automatic helper
assignment; for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds a message informing the user that automatic
helper assignment is deprecated and will be removed soon (it is
printed only once, for the first flow that gets a helper attached,
to make it as unobtrusive as possible).
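
To illustrate the print-once behaviour described above, here is a minimal
userspace C sketch. It is not the kernel code: `struct net_model` and
`helper_assign()` are hypothetical stand-ins, and only the per-namespace
`auto_assign_helper_warned` flag name is taken from the patch hunk quoted
in the review of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-netns conntrack state. */
struct net_model {
	bool auto_assign_helper;        /* value of the nf_conntrack_helper sysctl */
	bool auto_assign_helper_warned; /* set after the first deprecation notice */
	unsigned int warnings_printed;  /* for demonstration only */
};

/*
 * Model of the automatic lookup path: returns true if a helper would be
 * attached automatically. The deprecation notice is printed only for the
 * first flow that gets a helper this way.
 */
static bool helper_assign(struct net_model *net, bool port_matches_helper)
{
	assert(net != NULL);

	if (!net->auto_assign_helper || !port_matches_helper)
		return false;

	if (!net->auto_assign_helper_warned) {
		printf("nf_conntrack: automatic helper assignment is deprecated\n");
		net->auto_assign_helper_warned = true;
		net->warnings_printed++;
	}
	return true;
}
```

Keeping the flag per network namespace rather than in a global means each
namespace gets its own single notice.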

Signed-off-by: Eric Leblond <eric <at> regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 include/net/netfilter/nf_conntrack_helper.h |    4 +-
 include/net/netns/conntrack.h               |    3 +
 net/netfilter/nf_conntrack_core.c           |   15 ++--
 net/netfilter/nf_conntrack_helper.c         |  107 ++++++++++++++++++++++++---

David Miller | 8 May 03:34 2012

Re: [PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment

From: pablo <at> netfilter.org
Date: Tue,  8 May 2012 02:21:56 +0200

> +	if (!net->ct.helper_sysctl_header) {
> +		printk(KERN_ERR "nf_conntrack_helper: can't register to sysctl.\n");
> +		goto out_register;
> +	}

Please use pr_err().

> +			printk(KERN_INFO "nf_conntrack: automatic helper "
> +				"assignment is deprecated. Please, read "
> +				"http://www.netfilter.org/news.html#2012-04-03\n");
> +			net->ct.auto_assign_helper_warned = true;

Please use pr_info().

Pointers to web sites to explain a problem is absolutely not
appropriate in kernel log messages, nor commit messages.

Either add a document to the kernel tree, or explain things
fully both in the kernel log message and the commit message.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Pablo Neira Ayuso | 8 May 09:31 2012

Re: [PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment

On Mon, May 07, 2012 at 09:34:18PM -0400, David Miller wrote:
> From: pablo <at> netfilter.org
> Date: Tue,  8 May 2012 02:21:56 +0200
> 
> > +	if (!net->ct.helper_sysctl_header) {
> > +		printk(KERN_ERR "nf_conntrack_helper: can't register to sysctl.\n");
> > +		goto out_register;
> > +	}
> 
> Please use pr_err().
> 
> > +			printk(KERN_INFO "nf_conntrack: automatic helper "
> > +				"assignment is deprecated. Please, read "
> > +				"http://www.netfilter.org/news.html#2012-04-03\n");
> > +			net->ct.auto_assign_helper_warned = true;
> 
> Please use pr_info().
> 
> Pointers to web sites to explain a problem is absolutely not
> appropriate in kernel log messages, nor commit messages.
> 
> Either add a document to the kernel tree, or explain things
> fully both in the kernel log message and the commit message.

I'll fix those. Thanks for spotting this issue.

pablo | 8 May 02:22 2012

[PATCH 11/25] ipvs: use GFP_KERNEL allocation where possible

From: Sasha Levin <levinsasha928 <at> gmail.com>

Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.

This is safe since it will always run from a process context.

Signed-off-by: Sasha Levin <levinsasha928 <at> gmail.com>
Acked-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 net/netfilter/ipvs/ip_vs_proto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index a981b7c..8726488 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -71,7 +71,7 @@ register_ip_vs_proto_netns(struct net *net, struct ip_vs_protocol *pp)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	unsigned int hash = IP_VS_PROTO_HASH(pp->protocol);
 	struct ip_vs_proto_data *pd =
-			kzalloc(sizeof(struct ip_vs_proto_data), GFP_ATOMIC);
+			kzalloc(sizeof(struct ip_vs_proto_data), GFP_KERNEL);

 	if (!pd)
 		return -ENOMEM;
--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 07/25] ipvs: DH scheduler does not need GFP_ATOMIC allocation

From: Julian Anastasov <ja <at> ssi.bg>

	Schedulers are initialized and bound to services only
in response to configuration commands, i.e. from process context.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Hans Schillstrom <hans <at> schillstrom.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_dh.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 1a53a7a..8b7dca9 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -149,7 +149,7 @@ static int ip_vs_dh_init_svc(struct ip_vs_service *svc)

 	/* allocate the DH table for this service */
 	tbl = kmalloc(sizeof(struct ip_vs_dh_bucket)*IP_VS_DH_TAB_SIZE,
-		      GFP_ATOMIC);
+		      GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;

--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 12/25] ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server

From: Julian Anastasov <ja <at> ssi.bg>

	As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method, we should get it from conn_flags, just
like we do for the IP_VS_CONN_F_FWD_MASK bits when binding
to the real server.
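
Here is a small userspace model of the flag handling. It assumes the bit
values below match include/linux/ip_vs.h of that era. Only
IP_VS_CONN_F_TEMPLATE and IP_VS_CONN_F_ONE_PACKET are visible in this
thread, so treat the rest as assumptions. It also assumes, as in IPVS, that
non-NAT forwarding methods set NOOUTPUT. `bind_flags_old()` and
`bind_flags_new()` are hypothetical helpers that isolate the one-line
change. Consider a synced connection that arrived with a direct-routing
method, and therefore with NOOUTPUT set. If the local real server uses NAT,
the connection keeps a stale NOOUTPUT bit unless both masks are cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define IP_VS_CONN_F_FWD_MASK  0x0007  /* forwarding method bits */
#define IP_VS_CONN_F_MASQ      0x0000  /* NAT */
#define IP_VS_CONN_F_DROUTE    0x0003  /* direct routing */
#define IP_VS_CONN_F_SYNC      0x0020  /* entry created by sync */
#define IP_VS_CONN_F_NOOUTPUT  0x0080  /* no output packets */

/* Before the patch: only the forwarding-method bits are replaced. */
static uint32_t bind_flags_old(uint32_t cp_flags, uint32_t conn_flags)
{
	cp_flags &= ~IP_VS_CONN_F_FWD_MASK;
	return cp_flags | conn_flags;
}

/* After the patch: NOOUTPUT depends on the forwarding method, so it is
 * replaced together with the FWD_MASK bits. */
static uint32_t bind_flags_new(uint32_t cp_flags, uint32_t conn_flags)
{
	/* sanity check: NOOUTPUT must not overlap the method bits */
	assert((IP_VS_CONN_F_NOOUTPUT & IP_VS_CONN_F_FWD_MASK) == 0);

	cp_flags &= ~(IP_VS_CONN_F_FWD_MASK | IP_VS_CONN_F_NOOUTPUT);
	return cp_flags | conn_flags;
}
```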

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_conn.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 4a09b78..f562e63 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -567,7 +567,7 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
 		if (!(cp->flags & IP_VS_CONN_F_TEMPLATE))
 			conn_flags &= ~IP_VS_CONN_F_INACTIVE;
 		/* connections inherit forwarding method from dest */
-		cp->flags &= ~IP_VS_CONN_F_FWD_MASK;
+		cp->flags &= ~(IP_VS_CONN_F_FWD_MASK | IP_VS_CONN_F_NOOUTPUT);
 	}
 	cp->flags |= conn_flags;
 	cp->dest = dest;
--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 15/25] ipvs: always update some of the flags bits in backup

From: Julian Anastasov <ja <at> ssi.bg>

	As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure cp->flags are
updated even if cp is not yet bound to dest. If cp->flags
are not updated, ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks of the protocol state, rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov <at> gmail.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 include/linux/ip_vs.h           |    5 +++
 net/netfilter/ipvs/ip_vs_sync.c |   65 ++++++++++++++-------------------------
 2 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/include/linux/ip_vs.h b/include/linux/ip_vs.h
index be0ef3d..8a2d438 100644
--- a/include/linux/ip_vs.h
+++ b/include/linux/ip_vs.h
@@ -89,6 +89,7 @@
 #define IP_VS_CONN_F_TEMPLATE	0x1000		/* template, not connection */
 #define IP_VS_CONN_F_ONE_PACKET	0x2000		/* forward only one packet */

+/* Initial bits allowed in backup server */
 #define IP_VS_CONN_F_BACKUP_MASK (IP_VS_CONN_F_FWD_MASK | \
 				  IP_VS_CONN_F_NOOUTPUT | \
 				  IP_VS_CONN_F_INACTIVE | \

pablo | 8 May 02:22 2012

[PATCH 21/25] ipvs: ip_vs_proto: local functions should not be exposed globally

From: H Hartley Sweeten <hartleys <at> visionengravers.com>

Functions not referenced outside of a source file should be marked
static to prevent them from being exposed globally.

This quiets the sparse warning:

warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten <at> visionengravers.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 8726488..e3f4bb0 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -153,7 +153,7 @@ EXPORT_SYMBOL(ip_vs_proto_get);
 /*
  *	get ip_vs_protocol object data by netns and proto
  */
-struct ip_vs_proto_data *
+static struct ip_vs_proto_data *
 __ipvs_proto_data_get(struct netns_ipvs *ipvs, unsigned short proto)
 {
 	struct ip_vs_proto_data *pd;
--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 20/25] ipvs: ip_vs_ftp: local functions should not be exposed globally

From: H Hartley Sweeten <hartleys <at> visionengravers.com>

Functions not referenced outside of a source file should be marked
static to prevent them from being exposed globally.

This quiets the sparse warning:

warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten <at> visionengravers.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ftp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index debb8c7..091bec9 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -483,7 +483,7 @@ static struct pernet_operations ip_vs_ftp_ops = {
 	.exit = __ip_vs_ftp_exit,
 };

-int __init ip_vs_ftp_init(void)
+static int __init ip_vs_ftp_init(void)
 {
 	int rv;

--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 24/25] netfilter: nf_conntrack: fix explicit helper attachment and NAT

From: Pablo Neira Ayuso <pablo <at> netfilter.org>

Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:

9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment

Basically, nf_conntrack_alter_reply looks up the helper again
if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.

Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment; thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).

Interestingly, this bug was hidden by the automatic helper look-up
code. But it can easily be triggered if you attach the helper to
a non-standard port, e.g.:

iptables -I PREROUTING -t raw -p tcp --dport 8888 \
	-j CT --helper ftp

and automatic helper assignment has been disabled.

pablo | 8 May 02:22 2012

[PATCH 25/25] netfilter: remove ip_queue support

From: Pablo Neira Ayuso <pablo <at> netfilter.org>

This patch removes ip_queue support, which was marked as obsolete
years ago. The nfnetlink_queue module provides a more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, compilation breaks.

Several warnings regarding this have been sent to the mailing list
in the past month without anyone raising their hand to stop this
with a strong argument.

Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 Documentation/ABI/removed/ip_queue      |    9 +
 include/linux/netfilter_ipv4/Kbuild     |    1 -
 include/linux/netfilter_ipv4/ip_queue.h |   72 ----
 include/linux/netlink.h                 |    2 +-
 net/ipv4/netfilter/Makefile             |    3 -
 net/ipv4/netfilter/ip_queue.c           |  639 ------------------------------
 net/ipv6/netfilter/Kconfig              |   22 --
 net/ipv6/netfilter/Makefile             |    1 -
 net/ipv6/netfilter/ip6_queue.c          |  641 -------------------------------
 security/selinux/nlmsgtab.c             |   13 -
 10 files changed, 10 insertions(+), 1393 deletions(-)
 create mode 100644 Documentation/ABI/removed/ip_queue
 delete mode 100644 include/linux/netfilter_ipv4/ip_queue.h
 delete mode 100644 net/ipv4/netfilter/ip_queue.c
 delete mode 100644 net/ipv6/netfilter/ip6_queue.c

pablo | 8 May 02:22 2012

[PATCH 19/25] ipvs: optimize the use of flags in ip_vs_bind_dest

From: Julian Anastasov <ja <at> ssi.bg>

	cp->flags is marked volatile but ip_vs_bind_dest
can safely modify the flags, so save some CPU cycles by
using a temporary variable.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_conn.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 6a43c93..7f21b91 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -548,6 +548,7 @@ static inline void
 ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
 {
 	unsigned int conn_flags;
+	__u32 flags;

 	/* if dest is NULL, then return directly */
 	if (!dest)
@@ -559,17 +560,19 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
 	conn_flags = atomic_read(&dest->conn_flags);
 	if (cp->protocol != IPPROTO_UDP)
 		conn_flags &= ~IP_VS_CONN_F_ONE_PACKET;
+	flags = cp->flags;
 	/* Bind with the destination and its corresponding transmitter */

pablo | 8 May 02:22 2012

[PATCH 23/25] netfilter: nf_ct_expect: partially implement ctnetlink_change_expect

From: Kelvie Wong <kelvie <at> ieee.org>

This refreshes the "timeout" attribute in existing expectations if one is
given.

The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.

I use this specifically for forwarding DCERPC traffic through:

DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.

Signed-off-by: Kelvie Wong <kelvie <at> ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 462ec2d..6f4b00a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2080,7 +2080,15 @@ static int
 ctnetlink_change_expect(struct nf_conntrack_expect *x,
 			const struct nlattr * const cda[])
 {

pablo | 8 May 02:22 2012

[PATCH 18/25] ipvs: add support for sync threads

From: Julian Anastasov <ja <at> ssi.bg>

	Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use a single UDP port:
thread 0 will use the default port 8848 while the last thread
will use port 8848+sync_ports-1.

	The sync traffic for connections is distributed across the
master threads based on the cp address, but a given connection
is always assigned to the same thread to avoid reordering of
the sync messages.
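
Here is a sketch of the port layout and the thread selection described
above. `SYNC_PORT_BASE`, `sync_thread_port()` and `select_thread()` are
hypothetical names. The shift-and-modulo hash on the connection address is
an illustrative choice and not necessarily the one used in ip_vs_sync.c.
The property that matters is that the same connection always maps to the
same thread.

```c
#include <assert.h>
#include <stdint.h>

#define SYNC_PORT_BASE 8848  /* default sync port, per the text */

/* UDP port used by sync thread 'id' (0 .. sync_ports-1). */
static unsigned int sync_thread_port(unsigned int id, unsigned int sync_ports)
{
	assert(id < sync_ports);
	return SYNC_PORT_BASE + id;
}

/*
 * Pick a master thread from the connection's address. The low pointer bits
 * are mostly constant because of allocation alignment, so they are shifted
 * out before reducing modulo the thread count (hypothetical hash).
 */
static unsigned int select_thread(const void *cp, unsigned int sync_ports)
{
	assert(sync_ports > 0);
	return (unsigned int)(((uintptr_t)cp >> 6) % sync_ports);
}
```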

	Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

	Make sure the backup sockets do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov <at> gmail.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 include/net/ip_vs.h             |   34 +++-
 net/netfilter/ipvs/ip_vs_conn.c |    7 +
 net/netfilter/ipvs/ip_vs_ctl.c  |   29 ++-
 net/netfilter/ipvs/ip_vs_sync.c |  401 ++++++++++++++++++++++++---------------
 4 files changed, 305 insertions(+), 166 deletions(-)

pablo | 8 May 02:22 2012

[PATCH 22/25] net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync

From: Hans Schillstrom <hans.schillstrom <at> ericsson.com>

To build ip_vs as a module, sysctl_rmem_max and sysctl_wmem_max
need to be exported.

The dependency was added by "ipvs: wakeup master thread" patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom <at> ericsson.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
Acked-by: David S. Miller <davem <at> davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 net/core/sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index c7e60ea..ac3131a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -258,7 +258,9 @@ static struct lock_class_key af_callback_keys[AF_MAX];

 /* Run time adjustable parameters. */
 __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
+EXPORT_SYMBOL(sysctl_wmem_max);
 __u32 sysctl_rmem_max __read_mostly = SK_RMEM_MAX;
+EXPORT_SYMBOL(sysctl_rmem_max);
 __u32 sysctl_wmem_default __read_mostly = SK_WMEM_MAX;
 __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;

--

-- 

pablo | 8 May 02:22 2012

[PATCH 14/25] ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter

From: Julian Anastasov <ja <at> ssi.bg>

	Initially, when the synced connection is created, we
use the forwarding method provided by the master, but once we
bind to the destination it can be changed. As a result, we must
update the application and the transmitter.

	As ip_vs_try_bind_dest is always called for connections
that require dest binding, there is no need to validate the
cp and dest pointers.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_conn.c |   33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 7647f3b..9d237d7 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -612,14 +612,33 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
 {
 	struct ip_vs_dest *dest;

-	if ((cp) && (!cp->dest)) {
-		dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
-				       cp->dport, &cp->vaddr, cp->vport,
-				       cp->protocol, cp->fwmark, cp->flags);
+	dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,

pablo | 8 May 02:22 2012

[PATCH 17/25] ipvs: reduce sync rate with time thresholds

From: Julian Anastasov <ja <at> ssi.bg>

	Add two new sysctl vars to control the sync rate. The
main idea is to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should also be useful for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers a new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if the connection state
	has not changed since the last sync.

sync_retries: integer, 0..3, defines sync retries with a period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only a single sync message is sent
if sync_refresh_period is also 0.

	Add a new field "sync_endtime" in the connection structure to
hold the reported time when the connection expires. The 2 lowest
bits will represent the retry count.

	As sysctl_sync_period can now be 0, use ACCESS_ONCE to
avoid division by zero.
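
The packing of the retry count into the two lowest bits of `sync_endtime`,
and the retry interval of sync_refresh_period/8, can be modelled as
follows. The helper names and `SYNC_RETRY_MASK` are hypothetical. The
sketch only shows the bit arithmetic, not the kernel's timer handling.

```c
#include <assert.h>

#define SYNC_RETRY_MASK 3UL  /* the 2 lowest bits hold the retry count */

/* Pack an expiry time (in ticks) with a retry count in 0..3. */
static unsigned long pack_sync_endtime(unsigned long endtime,
				       unsigned int retries)
{
	assert(retries <= SYNC_RETRY_MASK);
	return (endtime & ~SYNC_RETRY_MASK) | retries;
}

static unsigned int sync_endtime_retries(unsigned long packed)
{
	return (unsigned int)(packed & SYNC_RETRY_MASK);
}

static unsigned long sync_endtime_time(unsigned long packed)
{
	return packed & ~SYNC_RETRY_MASK;
}

/* Interval between retries: sync_refresh_period / 8, in the same units. */
static unsigned long sync_retry_interval(unsigned long refresh_period)
{
	return refresh_period / 8;
}
```

The cost of this packing is that the stored expiry time loses up to 3
ticks of precision, which is negligible next to connection timeouts.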

	Special thanks to Aleksey Chudov for being patient with me,

pablo | 8 May 02:22 2012

[PATCH 16/25] ipvs: wakeup master thread

From: Julian Anastasov <ja <at> ssi.bg>

	A high rate of sync messages in the master can lead to
overflowing the socket buffer and dropping the messages.
A fixed sleep of 1 second without wakeup events is not suitable
for loaded masters.

	Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups, but to avoid sending long bursts we
wake up the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

	Add a hard limit on the queued messages before sending
by using the "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents a number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

	As suggested by Pablo, add a new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. The default value is 0 (preserve
system defaults).

	Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors, just drop the
messages.

	Change master thread to enter TASK_INTERRUPTIBLE

pablo | 8 May 02:22 2012

[PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

From: Julian Anastasov <ja <at> ssi.bg>

	As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kinds of connections, we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for the first time. Now the
logic looks just like in ip_vs_unbind_dest.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_conn.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index f562e63..7647f3b 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -585,11 +585,10 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)

 	/* Update the connection counters */
 	if (!(cp->flags & IP_VS_CONN_F_TEMPLATE)) {
-		/* It is a normal connection, so increase the inactive
-		   connection counter because it is in TCP SYNRECV
-		   state (inactive) or other protocol inacive state */
-		if ((cp->flags & IP_VS_CONN_F_SYNC) &&
-		    (!(cp->flags & IP_VS_CONN_F_INACTIVE)))
+		/* It is a normal connection, so modify the counters
+		 * according to the flags, later the protocol can
+		 * update them on state change */

David Miller | 8 May 03:36 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

From: pablo <at> netfilter.org
Date: Tue,  8 May 2012 02:22:07 +0200

> +		/* It is a normal connection, so modify the counters
> +		 * according to the flags, later the protocol can
> +		 * update them on state change */

Rather:

		/* It is a normal connection, so modify the counters
		 * according to the flags, later the protocol can
		 * update them on state change
		 */

Simon Horman | 8 May 04:08 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

On Mon, May 07, 2012 at 09:36:07PM -0400, David Miller wrote:
> From: pablo <at> netfilter.org
> Date: Tue,  8 May 2012 02:22:07 +0200
> 
> > +		/* It is a normal connection, so modify the counters
> > +		 * according to the flags, later the protocol can
> > +		 * update them on state change */
> 
> Rather:
> 
> 		/* It is a normal connection, so modify the counters
> 		 * according to the flags, later the protocol can
> 		 * update them on state change
> 		 */

Hi Dave,

can I fix this up as a subsequent patch?

David Miller | 8 May 04:16 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

From: Simon Horman <horms <at> verge.net.au>
Date: Tue, 8 May 2012 11:08:44 +0900

> can I fix this up as a subsequent patch?

Pablo's tree needs to get respun to address the other feedback
I gave, so no reason for him or you to not fix this as well.

Simon Horman | 8 May 05:15 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

On Mon, May 07, 2012 at 10:16:17PM -0400, David Miller wrote:
> From: Simon Horman <horms <at> verge.net.au>
> Date: Tue, 8 May 2012 11:08:44 +0900
> 
> > can I fix this up as a subsequent patch?
> 
> Pablo's tree needs to get respun to address the other feedback
> I gave, so no reason for him or you to not fix this as well.

Understood.

Pablo Neira Ayuso | 8 May 09:32 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

On Tue, May 08, 2012 at 12:15:47PM +0900, Simon Horman wrote:
> On Mon, May 07, 2012 at 10:16:17PM -0400, David Miller wrote:
> > From: Simon Horman <horms <at> verge.net.au>
> > Date: Tue, 8 May 2012 11:08:44 +0900
> > 
> > > can I fix this up as a subsequent patch?
> > 
> > Pablo's tree needs to get respun to address the other feedback
> > I gave, so no reason for him or you to not fix this as well.
> 
> Understood.

Yes, I'll fix it myself.

Expect a new batch in a couple of minutes.

Jan Engelhardt | 8 May 04:15 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

On Tuesday 2012-05-08 03:36, David Miller wrote:

>From: pablo <at> netfilter.org
>Date: Tue,  8 May 2012 02:22:07 +0200
>
>> +		/* It is a normal connection, so modify the counters
>> +		 * according to the flags, later the protocol can
>> +		 * update them on state change */
>
>Rather:
>
>		/* It is a normal connection, so modify the counters
>		 * according to the flags, later the protocol can
>		 * update them on state change
>		 */

Well, CodingStyle even says

>		/*
>		 * It is a normal connection, so modify the counters
>		 * according to the flags, later the protocol can
>		 * update them on state change.
>		 */

David Miller | 8 May 04:17 2012

Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

From: Jan Engelhardt <jengelh <at> inai.de>
Date: Tue, 8 May 2012 04:15:15 +0200 (CEST)

> On Tuesday 2012-05-08 03:36, David Miller wrote:
> 
>>From: pablo <at> netfilter.org
>>Date: Tue,  8 May 2012 02:22:07 +0200
>>
>>> +		/* It is a normal connection, so modify the counters
>>> +		 * according to the flags, later the protocol can
>>> +		 * update them on state change */
>>
>>Rather:
>>
>>		/* It is a normal connection, so modify the counters
>>		 * according to the flags, later the protocol can
>>		 * update them on state change
>>		 */
> 
> Well, CodingStyle even says

We've discussed this to death, subsystem maintainers can ask
for whatever they want and this is what I've asked for for years.

pablo | 8 May 02:21 2012

[PATCH 05/25] ipvs: timeout tables do not need GFP_ATOMIC allocation

From: Julian Anastasov <ja <at> ssi.bg>

	They are called only on initialization.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Hans Schillstrom <hans <at> schillstrom.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 6eda11d..a981b7c 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -196,7 +196,7 @@ void ip_vs_protocol_timeout_change(struct netns_ipvs *ipvs, int flags)
 int *
 ip_vs_create_timeout_table(int *table, int size)
 {
-	return kmemdup(table, size, GFP_ATOMIC);
+	return kmemdup(table, size, GFP_KERNEL);
 }

 
--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 10/25] ipvs: SH scheduler does not need GFP_ATOMIC allocation

From: Julian Anastasov <ja <at> ssi.bg>

	Schedulers are initialized and bound to services only
in response to configuration commands, i.e. from process context.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Hans Schillstrom <hans <at> schillstrom.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_sh.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 91e97ee..0512652 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -162,7 +162,7 @@ static int ip_vs_sh_init_svc(struct ip_vs_service *svc)

 	/* allocate the SH table for this service */
 	tbl = kmalloc(sizeof(struct ip_vs_sh_bucket)*IP_VS_SH_TAB_SIZE,
-		      GFP_ATOMIC);
+		      GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;

--

-- 
1.7.9.5
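The reasoning in this patch and in the LBLC, LBLCR and WRR patches below is the same. Per-service scheduler state is allocated once, when a service is configured, and the per-packet scheduling path only reads it. Here is a userspace sketch of that split for source hashing. The names, the round-robin bucket fill and the multiplicative hash constant are assumptions for illustration, not a copy of ip_vs_sh.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a per-service scheduler table. */
#define SH_TAB_BITS 8
#define SH_TAB_SIZE (1u << SH_TAB_BITS)
#define SH_TAB_MASK (SH_TAB_SIZE - 1)

struct sh_bucket {
	int dest;		/* index of the real server for this bucket */
};

struct service {
	struct sh_bucket *sched_data;
};

/* Command path: runs when a service is configured, so it may block.
 * This corresponds to where the kernel now uses GFP_KERNEL. */
int sh_init_svc(struct service *svc, int ndests)
{
	struct sh_bucket *tbl;
	unsigned int i;

	assert(ndests > 0);
	tbl = calloc(SH_TAB_SIZE, sizeof(*tbl));
	if (tbl == NULL)
		return -ENOMEM;
	/* Simplification: spread buckets round-robin over the servers. */
	for (i = 0; i < SH_TAB_SIZE; i++)
		tbl[i].dest = (int)(i % (unsigned int)ndests);
	svc->sched_data = tbl;
	return 0;
}

void sh_done_svc(struct service *svc)
{
	free(svc->sched_data);
	svc->sched_data = NULL;
}

/* Packet path: only reads the table and never allocates.
 * saddr is a host-order IPv4 address. */
int sh_schedule(const struct service *svc, uint32_t saddr)
{
	uint32_t idx = (saddr * 2654435761u) & SH_TAB_MASK;

	return svc->sched_data[idx].dest;
}
```

A given source address always maps to the same bucket and so to the same server, as long as the table is not rebuilt.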

pablo | 8 May 02:22 2012

[PATCH 06/25] ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init

From: Julian Anastasov <ja <at> ssi.bg>

	Schedulers are initialized and bound to services only in
response to configuration commands, so GFP_ATOMIC is not needed.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Hans Schillstrom <hans <at> schillstrom.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_lblc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 27c24f1..7ba1672 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -342,7 +342,7 @@ static int ip_vs_lblc_init_svc(struct ip_vs_service *svc)
 	/*
 	 *    Allocate the ip_vs_lblc_table for this service
 	 */
-	tbl = kmalloc(sizeof(*tbl), GFP_ATOMIC);
+	tbl = kmalloc(sizeof(*tbl), GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;

--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 09/25] ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init

From: Julian Anastasov <ja <at> ssi.bg>

	Schedulers are initialized and bound to services only in
response to configuration commands, so GFP_ATOMIC is not needed.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Hans Schillstrom <hans <at> schillstrom.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_lblcr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 7498756..00906ea 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -511,7 +511,7 @@ static int ip_vs_lblcr_init_svc(struct ip_vs_service *svc)
 	/*
 	 *    Allocate the ip_vs_lblcr_table for this service
 	 */
-	tbl = kmalloc(sizeof(*tbl), GFP_ATOMIC);
+	tbl = kmalloc(sizeof(*tbl), GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;

--

-- 
1.7.9.5

pablo | 8 May 02:22 2012

[PATCH 08/25] ipvs: WRR scheduler does not need GFP_ATOMIC allocation

From: Julian Anastasov <ja <at> ssi.bg>

	Schedulers are initialized and bound to services only in
response to configuration commands, so GFP_ATOMIC is not needed.

Signed-off-by: Julian Anastasov <ja <at> ssi.bg>
Signed-off-by: Hans Schillstrom <hans <at> schillstrom.com>
Signed-off-by: Simon Horman <horms <at> verge.net.au>
---
 net/netfilter/ipvs/ip_vs_wrr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
index fd0d4e0..231be7d 100644
--- a/net/netfilter/ipvs/ip_vs_wrr.c
+++ b/net/netfilter/ipvs/ip_vs_wrr.c
@@ -84,7 +84,7 @@ static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
 	/*
 	 *    Allocate the mark variable for WRR scheduling
 	 */
-	mark = kmalloc(sizeof(struct ip_vs_wrr_mark), GFP_ATOMIC);
+	mark = kmalloc(sizeof(struct ip_vs_wrr_mark), GFP_KERNEL);
 	if (mark == NULL)
 		return -ENOMEM;

--

-- 
1.7.9.5
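The per-service mark allocated here carries the weighted round-robin position from one scheduling decision to the next. The sketch below models the classic LVS WRR loop in userspace. It assumes the mark holds the current index, the current weight, the maximum weight and the gcd of all weights. Field names follow struct ip_vs_wrr_mark loosely, server lists are plain arrays rather than kernel lists, and overload handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the per-service WRR state. */
struct wrr_mark {
	int cl;		/* index of the current server, -1 before first pick */
	int cw;		/* current weight */
	int mw;		/* maximum weight */
	int di;		/* decreasing interval: gcd of all weights */
};

static int gcd(int a, int b)
{
	while (b != 0) {
		int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Done once per service, on the configuration path. */
void wrr_init(struct wrr_mark *m, const int *w, int n)
{
	int i;

	assert(w != NULL && n > 0);
	m->cl = -1;
	m->cw = 0;
	m->mw = 0;
	m->di = 0;
	for (i = 0; i < n; i++) {
		if (w[i] > m->mw)
			m->mw = w[i];
		m->di = gcd(m->di, w[i]);	/* gcd(0, x) == x */
	}
	if (m->di == 0)
		m->di = 1;
}

/* Returns the index of the next server, or -1 if all weights are 0. */
int wrr_schedule(struct wrr_mark *m, const int *w, int n)
{
	if (m->mw == 0)
		return -1;
	for (;;) {
		m->cl = (m->cl + 1) % n;
		if (m->cl == 0) {
			/* Wrapped around: lower the bar, reset at zero. */
			m->cw -= m->di;
			if (m->cw <= 0)
				m->cw = m->mw;
		}
		if (w[m->cl] >= m->cw)
			return m->cl;
	}
}
```

With weights {3, 1} the first eight picks are 0, 0, 0, 1, 0, 0, 0, 1, a 3:1 ratio.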


pablo | 8 May 02:21 2012

[PATCH 03/25] netfilter: nf_conntrack: use this_cpu_inc()

From: Eric Dumazet <edumazet <at> google.com>

this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.

Signed-off-by: Eric Dumazet <edumazet <at> google.com>
Cc: Patrick McHardy <kaber <at> trash.net>
Cc: Christoph Lameter <cl <at> linux.com>
Cc: Tejun Heo <tj <at> kernel.org>
Reviewed-by: Christoph Lameter <cl <at> linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 include/net/netfilter/nf_conntrack.h |   10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index ab86036..cce7f6a 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -321,14 +321,8 @@ extern unsigned int nf_conntrack_max;
 extern unsigned int nf_conntrack_hash_rnd;
 void init_nf_conntrack_hash_rnd(void);

-#define NF_CT_STAT_INC(net, count)	\
-	__this_cpu_inc((net)->ct.stat->count)
-#define NF_CT_STAT_INC_ATOMIC(net, count)		\
-do {							\
-	local_bh_disable();				\
-	__this_cpu_inc((net)->ct.stat->count);		\
-	local_bh_enable();				\
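The statistics updated by these macros are per-CPU counters. Each CPU increments only its own copy, so no lock is needed, and readers add the copies up when they want a total. The userspace model below shows only that data layout. NR_MODEL_CPUS, the field names and the explicit cpu argument are hypothetical, and the model cannot reproduce the interrupt safety that this_cpu_inc() provides in the kernel.

```c
#include <assert.h>

#define NR_MODEL_CPUS 4		/* hypothetical CPU count */

/* One statistics block per CPU, in the spirit of net->ct.stat. */
struct ct_stat_model {
	unsigned int found;
	unsigned int invalid;
};

static struct ct_stat_model ct_stats[NR_MODEL_CPUS];

/* Each CPU updates only its own slot. In the kernel the CPU is implicit
 * and this_cpu_inc() performs the update as one IRQ-safe operation. */
static void stat_inc_found(int cpu)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	ct_stats[cpu].found++;
}

/* A reader that wants a global figure sums every CPU's copy. */
static unsigned int stat_sum_found(void)
{
	unsigned int sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		sum += ct_stats[cpu].found;
	return sum;
}
```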

pablo | 8 May 02:21 2012

[PATCH 04/25] netfilter: bridge: optionally set indev to vlan

From: Florian Westphal <fw <at> strlen.de>

If the net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter temporarily removes the vlan header and then feeds the packet
to ip(6)tables.

When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), bridge netfilter also sets the in-interface to the
vlan interface, if such an interface exists.

This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.

Also update Documentation with current brnf default settings.

Signed-off-by: Florian Westphal <fw <at> strlen.de>
Acked-by: Bart De Schuymer <bdschuym <at> pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 Documentation/networking/ip-sysctl.txt |   13 +++++++++++--
 net/bridge/br_netfilter.c              |   26 ++++++++++++++++++++++++--
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index bd80ba5..edff76d 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1287,13 +1287,22 @@ bridge-nf-call-ip6tables - BOOLEAN
 bridge-nf-filter-vlan-tagged - BOOLEAN
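The in-interface rule described above can be modeled as a small function. This is a userspace sketch, not br_netfilter code. struct netdev, the vlan_entry lookup table and logical_indev() are hypothetical. `dflt` stands for whatever in-device bridge netfilter would otherwise report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct netdev {
	const char *name;
};

/* Hypothetical table of vlan devices stacked on the bridge. */
struct vlan_entry {
	unsigned short vid;
	struct netdev *dev;
};

static struct netdev *find_vlan_dev(const struct vlan_entry *tbl, size_t n,
				    unsigned short vid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].vid == vid)
			return tbl[i].dev;
	return NULL;
}

/* dflt: the in-device reported when the sysctl is off.
 * pass_vlan_indev: value of bridge-nf-pass-vlan-input-device. */
struct netdev *logical_indev(struct netdev *dflt, int pass_vlan_indev,
			     int tagged, unsigned short tci,
			     const struct vlan_entry *tbl, size_t n)
{
	struct netdev *vlan;

	assert(dflt != NULL);
	if (!pass_vlan_indev || !tagged)
		return dflt;
	/* The VLAN ID is the low 12 bits of the tag control information. */
	vlan = find_vlan_dev(tbl, n, tci & 0x0fff);
	return vlan != NULL ? vlan : dflt;
}
```

With the sysctl on, a frame tagged with VLAN 100 on a bridge that carries a (hypothetical) br0.100 device is reported as arriving on br0.100, which is what an `iptables -i br0.100` rule would match.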

pablo | 8 May 02:21 2012

[PATCH 01/25] netfilter: nf_ct_ecache: refactor notifier registration

From: Tony Zelenoff <antonz <at> parallels.com>

* remove the useless initialization of the ret variable
* merge duplicated code so that the functions' control flow
  is simpler

Signed-off-by: Tony Zelenoff <antonz <at> parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo <at> netfilter.org>
---
 net/netfilter/nf_conntrack_ecache.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 5bd3047d..3a3409f 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -84,7 +84,7 @@ EXPORT_SYMBOL_GPL(nf_ct_deliver_cached_events);
 int nf_conntrack_register_notifier(struct net *net,
 				   struct nf_ct_event_notifier *new)
 {
-	int ret = 0;
+	int ret;
 	struct nf_ct_event_notifier *notify;

 	mutex_lock(&nf_ct_ecache_mutex);
@@ -95,8 +95,7 @@ int nf_conntrack_register_notifier(struct net *net,
 		goto out_unlock;
 	}
 	rcu_assign_pointer(net->ct.nf_conntrack_event_cb, new);
-	mutex_unlock(&nf_ct_ecache_mutex);
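The refactoring points toward the usual single-exit locking pattern. The function takes the lock, jumps to one unlock label on error, and assigns ret explicitly on every path, which is why its initializer can go. Below is a userspace sketch with a pthread mutex standing in for nf_ct_ecache_mutex. The names are hypothetical, and a plain pointer store replaces rcu_assign_pointer(). Returning -EBUSY when a notifier is already registered is an assumption, because that line of the diff is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nf_ct_event_notifier. */
struct event_notifier {
	int (*fcn)(unsigned int events, void *item);
};

/* Stand-ins for net->ct.nf_conntrack_event_cb and nf_ct_ecache_mutex. */
static struct event_notifier *event_cb;
static pthread_mutex_t ecache_mutex = PTHREAD_MUTEX_INITIALIZER;

int register_notifier(struct event_notifier *new)
{
	int ret;	/* no initializer: every path below assigns it */

	assert(new != NULL);
	pthread_mutex_lock(&ecache_mutex);
	if (event_cb != NULL) {
		ret = -EBUSY;	/* only one notifier may be registered */
		goto out_unlock;
	}
	event_cb = new;	/* the kernel publishes with rcu_assign_pointer() */
	ret = 0;

out_unlock:
	pthread_mutex_unlock(&ecache_mutex);	/* single unlock site */
	return ret;
}

void unregister_notifier(struct event_notifier *old)
{
	pthread_mutex_lock(&ecache_mutex);
	if (event_cb == old)
		event_cb = NULL;
	pthread_mutex_unlock(&ecache_mutex);
}
```

Keeping a single unlock site means a later change cannot add an early return that forgets to release the mutex.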