Beverly Schwartz | 20 Apr 2012 01:39
Picon

BNX driver problem when mbuf clusters run out

In bnx_rx_intr, there is a while loop:

while (sw_cons != hw_cons)

Inside this loop, we grab the next mbuf that's available.

m = sc->rx_mbuf_ptr[sw_chain_cons];
sc->rx_mbuf_ptr[sw_chain_cons] = NULL;

It then goes on and tries to get an mbuf cluster to replace the one we just took off the ring.

if (bnx_get_buf(sc, &sw_prod, &sw_chain_prod, &sw_prod_bseq))

If bnx_get_buf fails, the code then calls bnx_add_buf to put the mbuf we just received back on the ring.

bnx_add_buf(sc, m, &sw_prod, &sw_chain_prod, &sw_prod_bseq)

Inside bnx_add_buf, first we put the mbuf (the recycled m) at sw_chain_prod.  Because this is the same as
sw_chain_cons, m gets placed back at the point we just nulled out.

sc->rx_buf_ptr[*chain_prod] = m_new;

Then sw_chain_prod gets bumped up.

*prod = NEXT_RX_BD(*prod);
*chain_prod = RX_CHAIN_IDX(*prod);

When the code returns from bnx_add_buf, a "continue" is executed, thus going around the loop again.

sw_chain_cons has NOT been incremented, since that call to NEXT_RX_BD is further down in processing in the loop.
(Continue reading)

Joerg Sonnenberger | 20 Apr 2012 01:43
Picon

Re: BNX driver problem when mbuf clusters run out

On Thu, Apr 19, 2012 at 07:39:33PM -0400, Beverly Schwartz wrote:
> Note, there is another condition in which we recycle mbufs,
> which suffers from the same problem.

The easiest approach for this is generally to invert the order. *First*
try to obtain an mbuf for the list. If that fails, just drop the newly
received packet. That avoids most of the complications.

Joerg

Beverly Schwartz | 20 Apr 2012 01:45
Picon

Re: BNX driver problem when mbuf clusters run out

That won't work in this case.  If bnx_get_buf succeeds in getting an mbuf cluster, it then puts it in the rx
ring.  If we haven't removed the mbuf first, there will be no place to put the new mbuf.

-Bev

On Apr 19, 2012, at 7:43 PM, Joerg Sonnenberger wrote:

> On Thu, Apr 19, 2012 at 07:39:33PM -0400, Beverly Schwartz wrote:
>> Note, there is another condition in which we recycle mbufs,
>> which suffers from the same problem.
> 
> The easiest approach for this is generally to invert the order. *First*
> try to obtain an mbuf for the list. If that fails, just drop the newly
> received packet. That avoids most of the complications.
> 
> Joerg

Joerg Sonnenberger | 20 Apr 2012 01:54
Picon

Re: BNX driver problem when mbuf clusters run out

On Thu, Apr 19, 2012 at 07:45:35PM -0400, Beverly Schwartz wrote:
> That won't work in this case.  If bnx_get_buf succeeds in getting
> an mbuf cluster, it then puts it in the rx ring.  If we haven't
> removed the mbuf first, there will be no place to put the new mbuf.

That's circular reasoning. There are three places currently using
bnx_get_buf:

bnx_init_rx_chain
bnx_rx_intr
bnx_tick

The last one can be dropped with the changed semantic, since the RX ring
will never be depleted. The rest is solved by properly splitting up the
actions into allocating a mbuf cluster for the RX ring, loading a
cluster into the RX ring and unloading an already mapped entry.
bnx_init_rx_chain wants to the first two in order, bnx_rx_intr wants to
alloc, if that works: unload and load new cluster.

Joerg

Matt Thomas | 20 Apr 2012 02:02

Re: BNX driver problem when mbuf clusters run out


On Apr 19, 2012, at 4:54 PM, Joerg Sonnenberger wrote:

> On Thu, Apr 19, 2012 at 07:45:35PM -0400, Beverly Schwartz wrote:
>> That won't work in this case.  If bnx_get_buf succeeds in getting
>> an mbuf cluster, it then puts it in the rx ring.  If we haven't
>> removed the mbuf first, there will be no place to put the new mbuf.
> 
> That's circular reasoning. There are three places currently using
> bnx_get_buf:
> 
> bnx_init_rx_chain
> bnx_rx_intr
> bnx_tick
> 
> The last one can be dropped with the changed semantic, since the RX ring
> will never be depleted. The rest is solved by properly splitting up the
> actions into allocating a mbuf cluster for the RX ring, loading a
> cluster into the RX ring and unloading an already mapped entry.
> bnx_init_rx_chain wants to the first two in order, bnx_rx_intr wants to
> alloc, if that works: unload and load new cluster.

Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts without
making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.
Beverly Schwartz | 20 Apr 2012 02:08
Picon

Re: BNX driver problem when mbuf clusters run out


On Apr 19, 2012, at 8:02 PM, Matt Thomas wrote:

> 
> On Apr 19, 2012, at 4:54 PM, Joerg Sonnenberger wrote:
> 
>> On Thu, Apr 19, 2012 at 07:45:35PM -0400, Beverly Schwartz wrote:
>>> That won't work in this case.  If bnx_get_buf succeeds in getting
>>> an mbuf cluster, it then puts it in the rx ring.  If we haven't
>>> removed the mbuf first, there will be no place to put the new mbuf.
>> 
>> That's circular reasoning. There are three places currently using
>> bnx_get_buf:
>> 
>> bnx_init_rx_chain
>> bnx_rx_intr
>> bnx_tick
>> 
>> The last one can be dropped with the changed semantic, since the RX ring
>> will never be depleted. The rest is solved by properly splitting up the
>> actions into allocating a mbuf cluster for the RX ring, loading a
>> cluster into the RX ring and unloading an already mapped entry.
>> bnx_init_rx_chain wants to the first two in order, bnx_rx_intr wants to
>> alloc, if that works: unload and load new cluster.
> 
> Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts without
making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.

(Continue reading)

Matt Thomas | 20 Apr 2012 02:27

Re: BNX driver problem when mbuf clusters run out


On Apr 19, 2012, at 5:08 PM, Beverly Schwartz wrote:

> 
> On Apr 19, 2012, at 8:02 PM, Matt Thomas wrote:
>> 
>> Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts without
making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.
> 
> I tried that.  And at first, things move along.  But eventually, bnx_rx_intr never gets called, because 
> if (sblk->status_rx_quick_consumer_index0 != sc->nw_rx_cons)
> in bnx_intr always fails.

For now, can you try the patch at

http://www.netbsd.org/~matt/if_bnx-diff.txt

It's kind of ugly but it is a minimal change.
Beverly Schwartz | 20 Apr 2012 03:48
Picon

Re: BNX driver problem when mbuf clusters run out


On Apr 19, 2012, at 8:27 PM, Matt Thomas wrote:

> 
> On Apr 19, 2012, at 5:08 PM, Beverly Schwartz wrote:
> 
>> 
>> On Apr 19, 2012, at 8:02 PM, Matt Thomas wrote:
>>> 
>>> Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts without
making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.
>> 
>> I tried that.  And at first, things move along.  But eventually, bnx_rx_intr never gets called, because 
>> if (sblk->status_rx_quick_consumer_index0 != sc->nw_rx_cons)
>> in bnx_intr always fails.
> 
> For now, can you try the patch at
> 
> http://www.netbsd.org/~matt/if_bnx-diff.txt
> 
> It's kind of ugly but it is a minimal change.

Doesn't quite do it.  With this change, the sw_cons index doesn't get incremented.  Next time bnx_rx_intr is
called, we'll have the same overwrite problem.  I'm now trying incremeing sw_cons before calling break. 
When I left work, it was still running.  Tomorrow morning, I'll see if everything is still intact.

-Bev
(Continue reading)

Matt Thomas | 20 Apr 2012 07:27

Re: BNX driver problem when mbuf clusters run out


On Apr 19, 2012, at 6:48 PM, Beverly Schwartz wrote:

> 
> On Apr 19, 2012, at 8:27 PM, Matt Thomas wrote:
> 
>> 
>> On Apr 19, 2012, at 5:08 PM, Beverly Schwartz wrote:
>> 
>>> 
>>> On Apr 19, 2012, at 8:02 PM, Matt Thomas wrote:
>>>> 
>>>> Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts without
making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.
>>> 
>>> I tried that.  And at first, things move along.  But eventually, bnx_rx_intr never gets called, because 
>>> if (sblk->status_rx_quick_consumer_index0 != sc->nw_rx_cons)
>>> in bnx_intr always fails.
>> 
>> For now, can you try the patch at
>> 
>> http://www.netbsd.org/~matt/if_bnx-diff.txt
>> 
>> It's kind of ugly but it is a minimal change.
> 
> Doesn't quite do it.  With this change, the sw_cons index doesn't get incremented.  Next time bnx_rx_intr
is called, we'll have the same overwrite problem.  I'm now trying incremeing sw_cons before calling
break.  When I left work, it was still running.  Tomorrow morning, I'll see if everything is still intact.
(Continue reading)

Beverly Schwartz | 20 Apr 2012 12:29
Picon

Re: BNX driver problem when mbuf clusters run out


On Apr 20, 2012, at 1:27 AM, Matt Thomas wrote:

> 
> On Apr 19, 2012, at 6:48 PM, Beverly Schwartz wrote:
> 
>> 
>> On Apr 19, 2012, at 8:27 PM, Matt Thomas wrote:
>> 
>>> 
>>> On Apr 19, 2012, at 5:08 PM, Beverly Schwartz wrote:
>>> 
>>>> 
>>>> On Apr 19, 2012, at 8:02 PM, Matt Thomas wrote:
>>>>> 
>>>>> Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts without
making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.
>>>> 
>>>> I tried that.  And at first, things move along.  But eventually, bnx_rx_intr never gets called, because 
>>>> if (sblk->status_rx_quick_consumer_index0 != sc->nw_rx_cons)
>>>> in bnx_intr always fails.
>>> 
>>> For now, can you try the patch at
>>> 
>>> http://www.netbsd.org/~matt/if_bnx-diff.txt
>>> 
>>> It's kind of ugly but it is a minimal change.
>> 
(Continue reading)

Matt Thomas | 20 Apr 2012 15:45

Re: BNX driver problem when mbuf clusters run out


On Apr 20, 2012, at 3:29 AM, Beverly Schwartz wrote:

> 
> On Apr 20, 2012, at 1:27 AM, Matt Thomas wrote:
> 
>> 
>> On Apr 19, 2012, at 6:48 PM, Beverly Schwartz wrote:
>> 
>>> 
>>> On Apr 19, 2012, at 8:27 PM, Matt Thomas wrote:
>>> 
>>>> 
>>>> On Apr 19, 2012, at 5:08 PM, Beverly Schwartz wrote:
>>>> 
>>>>> 
>>>>> On Apr 19, 2012, at 8:02 PM, Matt Thomas wrote:
>>>>>> 
>>>>>> Unfortunately, that can lead to receiver livelock (spend all the time servicing interrupts
without making progress).  The best thing to do is pass the packet up the stack and just another packet to the
hardware.  Instead use bnx_tick to add mbufs if needed.  This gives the stack a chance to run and hopefully
free some mbufs.
>>>>> 
>>>>> I tried that.  And at first, things move along.  But eventually, bnx_rx_intr never gets called,
because 
>>>>> if (sblk->status_rx_quick_consumer_index0 != sc->nw_rx_cons)
>>>>> in bnx_intr always fails.
>>>> 
>>>> For now, can you try the patch at
>>>> 
(Continue reading)

Beverly Schwartz | 20 Apr 2012 17:06
Picon

Re: BNX driver problem when mbuf clusters run out


On Apr 20, 2012, at 9:45 AM, Matt Thomas wrote:

> I don't think you applied the patch properly.
> 
> The "if" for testing the mbuf was changed to a while loop so the breaks 
> just terminate that loop.  
> 
> The patch has been updated a bit to include a missing bus_dmamap_sync

You are correct, I missed the added loop.

My interfaces still freeze.  Here's what I think the problem is.

If bnx_get_buf fails, we still call bnx_add_buf.  The mbuf gets put back in the ring.  We break out of the inner
loop, and then end up calling ifp->ip_input, which eventually frees the mbuf.

I think there are some other problems with this change:

If there are receive frame problems, the mbuf is put back into the ring, and the mbuf is also processed.  This
will have the same problem as bnx_get_buf failure.  However, in this case, we add processing a packet that
probably shouldn't be processed.

Further down in the inner loop, we check for a VLAN tag.  This has a continue statement.  I believe we will now be
in an infinite loop.  (There are some other weird outcomes I could posit, but in any case, I think we will have
undesired behavior.)

I also think there's another bug in the original code.  sc->free_rx_bd is incremented before we check if the
the mbuf pointer is NULL.  Shouldn't we only increment free_rx_bd if we're actually processing a packet?

(Continue reading)

Beverly Schwartz | 21 Apr 2012 23:07
Picon

Re: BNX driver problem when mbuf clusters run out

On Apr 20, 2012, at 9:45 AM, Matt Thomas wrote:
> The patch has been updated a bit to include a missing bus_dmamap_sync

I do believe I have tracked down the problem. I have not added the bus_dmamap_sync to my patch since Matt
already got that one.

-Bev

Here's my description:

  Problem 1:
  In bnx_rx_intr, there are two places where the producer can get pushed
  ahead of the consumer.  In the while (sw_cons != hw_cons) loop,
  there is a check for a packet whose layer 2 data has been corrupted,
  and a check for being able to allocate a new mbuf cluster to be
  placed in the receive queue.  In both cases, if the test fails,
  the mbuf cluster that has just been removed from the receive ring
  buffer is put back, and the producer index is pushed forward.  However,
  the consumer index is not pushed forward, so the next time around,
  the mbuf cluster is removed from where the consumer points, but a
  new (or recycled) mbuf cluster is put where the producer points.  This
  over writes the pointer to another mbuf cluster.  The mbuf cluster is
  lost forever, and the receive buffer ends up filled with NULLs because
  the consumer and produce indices stay out of sync.  Eventually all
  mbuf clusters are lost, and no more communication can happen over any
  interface.

  To fix this, I changed the action taken in each test in case of failure.
  For the botched layer 2 data, I advance the consumer pointer, and then
  break out of the loop.  For the failure to get a new mbuf cluster,
(Continue reading)

Greg Troxel | 22 Apr 2012 01:47
Picon

Re: BNX driver problem when mbuf clusters run out


If there are no objections I'll commit Bev's patch to current on Monday
or Tuesday.  It's getting pretty serious stress testing and has
survived.

<pointy-haired>
Also, should anyone feel in a rush to commit it before I do, please
include in the commit message that this patch is:

  Approved for Public Release, Distribution Unlimited

  This material is based upon work supported by the Defense Advanced Research
  Projects Agency and Space and Naval Warfare Systems Center, Pacific, under
  Contract No. N66001-09-C-2073.
</>

Greg Troxel | 20 Apr 2012 01:48
Picon

Re: BNX driver problem when mbuf clusters run out


Joerg Sonnenberger <joerg <at> britannica.bec.de> writes:

> On Thu, Apr 19, 2012 at 07:39:33PM -0400, Beverly Schwartz wrote:
>> Note, there is another condition in which we recycle mbufs,
>> which suffers from the same problem.
>
> The easiest approach for this is generally to invert the order. *First*
> try to obtain an mbuf for the list. If that fails, just drop the newly
> received packet. That avoids most of the complications.

A fair point, but the driver doesn't actually free an mbuf with the
packet.  The 'drop' mechanism is to leave the cluster in place (to be
overwritten later), and the error is in the ring pointer accounting.

My approach would be to try to have the code path in the error case be
as similar to the normal case as possible (in terms of updating ring
pointers), and just not call *if_input with the received mbuf.
Manuel Bouyer | 20 Apr 2012 13:05

Re: BNX driver problem when mbuf clusters run out

On Thu, Apr 19, 2012 at 07:48:44PM -0400, Greg Troxel wrote:
> A fair point, but the driver doesn't actually free an mbuf with the
> packet.  The 'drop' mechanism is to leave the cluster in place (to be
> overwritten later), and the error is in the ring pointer accounting.
> 
> My approach would be to try to have the code path in the error case be
> as similar to the normal case as possible (in terms of updating ring
> pointers), and just not call *if_input with the received mbuf.

My feeling is that this is the right approach. When we're out of
mbuf cluster we should drop the received packet(s) and reuse the
mbuf(s) from the ring. If we fail to ack the received packets,
we can either create an interrupt storm or stop the receive process
completely.

--

-- 
Manuel Bouyer <bouyer <at> antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--


Gmane