Dave Täht | 14 Mar 20:26
Gravatar

shaper team forming up


There are multiple people on these lists looking into existing and
potential shaper designs. I'd like it very much if we could co-ordinate
our efforts. I've had several folk volunteer their efforts, could use more.

Also...

Over the weekend, Dan Siemons uncovered a possible bad interaction
between ECN and the default pfifo_fast qdisc in Linux.

http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/

Perhaps some people could look into this, determine the extent that this
is a real problem (it seems very bad to me that acks would get tossed
into different queues based on interpreting mmc instead of ecn)

and figure out where the right place(s) are for a patch?

--

-- 
Dave Taht
http://nex-6.taht.net
Jonathan Morton | 14 Mar 20:55
Picon

Re: shaper team forming up


On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote:

> Over the weekend, Dan Siemons uncovered a possible bad interaction
> between ECN and the default pfifo_fast qdisc in Linux.
> 
> http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/

This seems to be more complicated that it appears.  It looks as though Linux has re-used the LSB of the old TOS
field for some "link local" flag which is used by routing.

It's not immediately obvious whether pfifo_fast is using this new interpretation though.  If it isn't, the
fix should be to remove the RTO_ONLINK bit from the mask it's using on the tos field.  The other half of the
mask correctly excludes the ECN bits from the field.

 - Jonathan
Eric Dumazet | 14 Mar 21:24
Picon

Re: [Bloat] shaper team forming up

Le lundi 14 mars 2011 à 21:55 +0200, Jonathan Morton a écrit :
> On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote:
> 
> > Over the weekend, Dan Siemons uncovered a possible bad interaction
> > between ECN and the default pfifo_fast qdisc in Linux.
> > 
> > http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/
> 
> This seems to be more complicated that it appears.  It looks as though
> Linux has re-used the LSB of the old TOS field for some "link local"
> flag which is used by routing.
> 
> It's not immediately obvious whether pfifo_fast is using this new
> interpretation though.  If it isn't, the fix should be to remove the
> RTO_ONLINK bit from the mask it's using on the tos field.  The other
> half of the mask correctly excludes the ECN bits from the field.
> 

CC netdev, where linux network dev can take a look.

I would say that this is a wrong analysis : 

1) ECN uses two low order bits of TOS byte

2) pfifo_fast uses skb->priority

skb->priority = rt_tos2priority(iph->tos);

#define IPTOS_TOS_MASK            0x1E
#define IPTOS_TOS(tos)            ((tos)&IPTOS_TOS_MASK)
(Continue reading)

Eric Dumazet | 15 Mar 05:42
Picon

Re: [Bloat] shaper team forming up

Le lundi 14 mars 2011 à 21:24 +0100, Eric Dumazet a écrit :

remove CC to bloat lists for now, adding David Miller to thread.

> Le lundi 14 mars 2011 à 21:55 +0200, Jonathan Morton a écrit :
> > On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote:
> > 
> > > Over the weekend, Dan Siemons uncovered a possible bad interaction
> > > between ECN and the default pfifo_fast qdisc in Linux.
> > > 
> > > http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/
> > 
> > This seems to be more complicated that it appears.  It looks as though
> > Linux has re-used the LSB of the old TOS field for some "link local"
> > flag which is used by routing.
> > 
> > It's not immediately obvious whether pfifo_fast is using this new
> > interpretation though.  If it isn't, the fix should be to remove the
> > RTO_ONLINK bit from the mask it's using on the tos field.  The other
> > half of the mask correctly excludes the ECN bits from the field.
> > 
> 
> CC netdev, where linux network dev can take a look.
> 
> I would say that this is a wrong analysis : 
> 
> 1) ECN uses two low order bits of TOS byte
> 
> 2) pfifo_fast uses skb->priority
> 
(Continue reading)

Dave Täht | 15 Mar 06:27
Gravatar

ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)

Eric Dumazet <eric.dumazet <at> gmail.com> writes:

> Le lundi 14 mars 2011 à 21:24 +0100, Eric Dumazet a écrit :
>
> remove CC to bloat lists for now, adding David Miller to thread.
>
>> Le lundi 14 mars 2011 à 21:55 +0200, Jonathan Morton a écrit :
>> > On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote:
>> > 
>> > > Over the weekend, Dan Siemon uncovered a possible bad interaction
>> > > between ECN and the default pfifo_fast qdisc in Linux.
>> > > 
>> > > http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/
>> > 

Totally irrelevant bit elided

>> CC netdev, where linux network dev can take a look.
>> 
>> I would say that this is a wrong analysis : 
>> 
>> 1) ECN uses two low order bits of TOS byte
>> 
>> 2) pfifo_fast uses skb->priority
>> 
>> 
>> skb->priority = rt_tos2priority(iph->tos);
>> 
>> #define IPTOS_TOS_MASK            0x1E
>> #define IPTOS_TOS(tos)            ((tos)&IPTOS_TOS_MASK)
(Continue reading)

Eric Dumazet | 15 Mar 07:15
Picon

Re: ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)

Le lundi 14 mars 2011 à 23:27 -0600, Dave Täht a écrit :

> Well, that makes 3 of us that think it's wrong. Can we get more? 
> 
> (I'll run through the math again in the morning)
> 
> It's most often not actually "enablement" but "assertion", when for
> example an ECN bit is put on an ACK packet (by an application, or qdisc)
> , it drops that ACK packet into the 2 queue - leaving all the other
> non-ECN asserted packets in that flow to flow out ahead of it.
> 

There are two ECN bits, not one.
The low order bit is not taken into account by skb->priority mapping.
The high order bit cannot be changed during flow lifetime.
(So : no OOO (Out Of Order) problems on say TCP flows)

> Or so dan siemon & I & now you, think. It's late and I really want to recheck
> the math and the shifts in the morning. However, if true... this would
> explain much ECN related weirdness precisely where it has been hard to
> measure, on heavily loaded systems.
> 
> >
> > Thanks
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 6ed6603..fabfe81 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -171,7 +171,7 @@ static struct dst_ops ipv4_dst_ops = {
(Continue reading)

Jonathan Morton | 15 Mar 18:09
Picon

Re: ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)


On 15 Mar, 2011, at 8:15 am, Eric Dumazet wrote:

> band 0 : high priority packets (like now)
> band 1 : (old band 1, ECN capable flows)
> band 2 : (old band 1, no ECN flows)
> band 3 : low priority packets (old band 2)

This seems good to me.  It would provide a concrete (if minor) enticement to turn ECN on.

 - Jonathan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet | 15 Mar 19:28
Picon

Re: ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)

Le mardi 15 mars 2011 à 19:09 +0200, Jonathan Morton a écrit :
> On 15 Mar, 2011, at 8:15 am, Eric Dumazet wrote:
> 
> > band 0 : high priority packets (like now)
> > band 1 : (old band 1, ECN capable flows)
> > band 2 : (old band 1, no ECN flows)
> > band 3 : low priority packets (old band 2)
> 
> This seems good to me.  It would provide a concrete (if minor) enticement to turn ECN on.
> 
>  

Here is a patch to implement that, on top of net-next-2.6 git tree

qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 4 priomap  2 1 3 3 2 3 0 0 2 2 2 2 2 2 2 2
 Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index c84b659..95ddf54 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -412,19 +412,39 @@ static struct Qdisc noqueue_qdisc = {
 };

 
-static const u8 prio2band[TC_PRIO_MAX + 1] = {
-	1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
+/* 4-band FIFO queue: old style, but should be a bit faster than
+   generic prio+fifo combination.
(Continue reading)

Jonathan Morton | 15 Mar 19:37
Picon

Re: ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)


On 15 Mar, 2011, at 8:28 pm, Eric Dumazet wrote:

>>> band 0 : high priority packets (like now)
>>> band 1 : (old band 1, ECN capable flows)
>>> band 2 : (old band 1, no ECN flows)
>>> band 3 : low priority packets (old band 2)
>> 
>> This seems good to me.  It would provide a concrete (if minor) enticement to turn ECN on.
> 
> Here is a patch to implement that, on top of net-next-2.6 git tree

Does this take both ECN bits into account?  The ECT(0), ECT(1) and ECE codepoints all need to be recognised equally.

 - Jonathan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet | 15 Mar 20:56
Picon

Re: ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)

Le mardi 15 mars 2011 à 20:37 +0200, Jonathan Morton a écrit :
> On 15 Mar, 2011, at 8:28 pm, Eric Dumazet wrote:
> 
> >>> band 0 : high priority packets (like now)
> >>> band 1 : (old band 1, ECN capable flows)
> >>> band 2 : (old band 1, no ECN flows)
> >>> band 3 : low priority packets (old band 2)
> >> 
> >> This seems good to me.  It would provide a concrete (if minor) enticement to turn ECN on.
> > 
> > Here is a patch to implement that, on top of net-next-2.6 git tree
> 
> Does this take both ECN bits into account?  The ECT(0), ECT(1) and ECE codepoints all need to be recognised equally.

This is done in a different layer, as already explained.

Current linux code ignores low order bit when doing TOS -> skb->priority
mapping.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Miller | 15 Mar 23:51
Favicon

Re: [Bloat] shaper team forming up

From: Eric Dumazet <eric.dumazet <at> gmail.com>
Date: Tue, 15 Mar 2011 05:42:46 +0100

> Le lundi 14 mars 2011 à 21:24 +0100, Eric Dumazet a écrit :
> 
> David, it seems ip_tos2prio is wrong on its 2nd entry :

Indeed, and in context, this is simply a thinko which has survived since
ECN support first got added to the tree, here's the relevant hunk:

--------------------
@@ -152,23 +152,29 @@ struct dst_ops ipv4_dst_ops =
 	sizeof(struct rtable),
 };

+#ifdef CONFIG_INET_ECN
+#define ECN_OR_COST(class)	TC_PRIO_##class
+#else
+#define ECN_OR_COST(class)	TC_PRIO_FILLER
+#endif
+
 __u8 ip_tos2prio[16] = {
 	TC_PRIO_BESTEFFORT,
-	TC_PRIO_FILLER,
+	ECN_OR_COST(FILLER),
 	TC_PRIO_BESTEFFORT,
-	TC_PRIO_FILLER,
+	ECN_OR_COST(BESTEFFORT),
 	TC_PRIO_BULK,
-	TC_PRIO_FILLER,
(Continue reading)

David Miller | 15 Mar 23:53
Favicon

Re: [Bloat] shaper team forming up

From: Eric Dumazet <eric.dumazet <at> gmail.com>
Date: Tue, 15 Mar 2011 05:42:46 +0100

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6ed6603..fabfe81 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -171,7 +171,7 @@ static struct dst_ops ipv4_dst_ops = {
>  
>  const __u8 ip_tos2prio[16] = {
>  	TC_PRIO_BESTEFFORT,
> -	ECN_OR_COST(FILLER),
> +	ECN_OR_COST(BESTEFFORT),
>  	TC_PRIO_BESTEFFORT,
>  	ECN_OR_COST(BESTEFFORT),
>  	TC_PRIO_BULK,
> 
> 

Eric can you please formally submit this fix?  I'd like to apply it soon.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet | 16 Mar 00:00
Picon

Re: [Bloat] shaper team forming up

Le mardi 15 mars 2011 à 15:53 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet <at> gmail.com>
> Date: Tue, 15 Mar 2011 05:42:46 +0100
> 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 6ed6603..fabfe81 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -171,7 +171,7 @@ static struct dst_ops ipv4_dst_ops = {
> >  
> >  const __u8 ip_tos2prio[16] = {
> >  	TC_PRIO_BESTEFFORT,
> > -	ECN_OR_COST(FILLER),
> > +	ECN_OR_COST(BESTEFFORT),
> >  	TC_PRIO_BESTEFFORT,
> >  	ECN_OR_COST(BESTEFFORT),
> >  	TC_PRIO_BULK,
> > 
> > 
> 
> Eric can you please formally submit this fix?  I'd like to apply it soon.
> 
> Thanks.

Sure, I can do this right now, but I want to fully credit Dan Siemon on
this one, as the author of the patch.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo <at> vger.kernel.org
(Continue reading)

Eric Dumazet | 16 Mar 00:56
Picon

[PATCH] net_sched: fix ip_tos2prio

From: Dan Siemon <dan <at> coverfire.com>

ECN support incorrectly maps ECN BESTEFFORT packets to TC_PRIO_FILLER
(1) instead of TC_PRIO_BESTEFFORT (0)

This means ECN enabled flows are placed in pfifo_fast/prio low priority
band, giving ECN enabled flows [ECT(0) and CE codepoints] higher drop
probabilities.

This is rather unfortunate, given we would like ECN being more widely
used.

Ref : http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/

Signed-off-by: Dan Siemon <dan <at> coverfire.com>
Signed-off-by: Eric Dumazet <eric.dumazet <at> gmail.com>
Cc: Dave Täht <d <at> taht.net>
Cc: Jonathan Morton <chromatix99 <at> gmail.com>
---
Note: I left TC_PRIO_FILLER definition, this can be removed later.

 net/ipv4/route.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 209989c..870b518 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -204,7 +204,7 @@ static struct dst_ops ipv4_dst_ops = {

(Continue reading)

David Miller | 16 Mar 02:54
Favicon

Re: [PATCH] net_sched: fix ip_tos2prio

From: Eric Dumazet <eric.dumazet <at> gmail.com>
Date: Wed, 16 Mar 2011 00:56:07 +0100

> From: Dan Siemon <dan <at> coverfire.com>
> 
> ECN support incorrectly maps ECN BESTEFFORT packets to TC_PRIO_FILLER
> (1) instead of TC_PRIO_BESTEFFORT (0)
> 
> This means ECN enabled flows are placed in pfifo_fast/prio low priority
> band, giving ECN enabled flows [ECT(0) and CE codepoints] higher drop
> probabilities.
> 
> This is rather unfortunate, given we would like ECN being more widely
> used.
> 
> Ref : http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/
> 
> Signed-off-by: Dan Siemon <dan <at> coverfire.com>
> Signed-off-by: Eric Dumazet <eric.dumazet <at> gmail.com>
> Cc: Dave Täht <d <at> taht.net>
> Cc: Jonathan Morton <chromatix99 <at> gmail.com>

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Gmane