Vladimir Ivanov | 1 Nov 2007 01:00

Re: Proposed #ifdef change to em

Scott Long wrote:
> Jack Vogel wrote:
>> I have found that the FAST interrupt handling is  implicated
>> in the watchdog resets that I have seen.

That's not true. I saw watchdogs long before FASTINTR.
Also, please note that the older driver had a bug that prevented watchdogs
from being reported (see http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92895).

>>
>> What I plan to do is revert to the way 6.2 had things, meaning
>> that FAST interrupts will be available but defined off by default.
>>
>> I wanted to know if anyone has an issue with this. And more
>> importantly, I have personally not seen this problem on 7, but
>> I could set up #ifdef's in that driver to be the same way.
>>
>> What does everyone think?
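
To make the proposal above concrete, a compile-time switch that is available but off by default might look like the sketch below. The macro name EM_FAST_INTR and the helper em_intr_mode() are assumptions for illustration only; they are not necessarily the names used in the driver.

```c
#include <stdio.h>

/*
 * Hypothetical option name. Nothing defines it by default, so the
 * legacy path is compiled unless the build passes -DEM_FAST_INTR.
 */
static const char *
em_intr_mode(void)
{
#ifdef EM_FAST_INTR
	return ("fast");	/* opt-in: fast interrupt handler */
#else
	return ("legacy");	/* default: ordinary interrupt handler */
#endif
}

/* Print which path this build selected. */
static void
em_print_intr_mode(void)
{
	printf("em interrupt mode: %s\n", em_intr_mode());
}
```

Built without extra flags this selects the legacy path; adding -DEM_FAST_INTR to the compiler flags selects the fast path without touching the source.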

We have a lot of machines running FASTINTR (more or less patched). They carry
huge traffic, but I can't remember when I last saw a watchdog.

E.g.:

pitman:~# sysctl dev.em.0.stats=1; dmesg  | tail -30
dev.em.0.stats: -1 -> -1
[skip]
em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0

Jack Vogel | 1 Nov 2007 01:28

Re: Proposed #ifdef change to em

Vladimir,

  Your one phrase "more or less patched" invalidated the whole
data point. We are talking about code that's checked in and bound
for 6.3 :)

   I have hundreds of machines here at Intel that DON'T have the
problem. That is why early 20th century philosophers concluded
that verification is ineffective as a scientific method, whereas
falsification is powerful. So if any users out there have
a problem, I am trying to understand why. The only way I have
so far reproduced something like their failure is with FAST
interrupts enabled; when I then disable them on that same
machine, the problem disappears. I have still not figured out
why this is; I'm trying to do that as I write this.

I am also not saying that nothing ever caused a watchdog
before FAST handling, only that, as best I can tell right now,
the one repro I have on STABLE, October Snapshot, is related to it.

Regards,

Jack

On 10/31/07, Vladimir Ivanov <wawa <at> yandex-team.ru> wrote:
> Scott Long wrote:
> > Jack Vogel wrote:
> >> I have found that the FAST interrupt handling is  implicated
> >> in the watchdog resets that I have seen.
>

Vladimir Ivanov | 1 Nov 2007 09:36

Re: Proposed #ifdef change to em

Hi,

Jack Vogel wrote:
> Vladimir,
> 
>   Your one phrase "more or less patched" invalidated the whole
> data point. We are talking about code that's checked in and bound
> for 6.3 :)

Oops, I get it now. Maybe we are talking about different kinds of
watchdog. I meant TX queue watchdogs.

Yes, there is a problem with the system watchdog in the mainstream driver.
Sometimes the system stops responding for a minute or less because of
kernel activity, and a hardware watchdog can reset the system during that time.
This issue is specific to the taskq (fastintr) version of the driver.

The fix is very simple: we have to give the RX thread a lower priority. We
use PRI_MAX_KERN instead of PI_NET in Yandex's revision of the driver.
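
As a rough user-space sketch of why that swap demotes the RX thread: FreeBSD priorities are numeric, and a lower number means a higher priority. The constant values below are assumptions copied from memory of sys/priority.h in the 6.x/7.x era and should be checked against your own tree. The helpers em_rx_thread_pri() and pri_runs_before() are hypothetical names for illustration, not driver code.

```c
#include <stdio.h>

/*
 * Assumed values mirroring sys/priority.h (FreeBSD 6.x/7.x era).
 * Verify against your own tree. Numerically lower = higher priority.
 */
#define PRI_MIN_ITHD		0
#define PRI_MIN_REALTIME	128
#define PI_NET			(PRI_MIN_ITHD + 16)	/* interrupt-thread range */
#define PRI_MAX_KERN		(PRI_MIN_REALTIME - 1)	/* lowest kernel priority */

/*
 * Hypothetical helper: priority for the RX taskqueue thread.
 * In the kernel this value would be the "pri" (third) argument
 * of taskqueue_start_threads().
 */
static int
em_rx_thread_pri(int demote)
{
	return (demote ? PRI_MAX_KERN : PI_NET);
}

/* Returns 1 if priority a is scheduled ahead of priority b. */
static int
pri_runs_before(int a, int b)
{
	return (a < b);
}

/* Show the stock and the demoted priority side by side. */
static void
em_print_pri(void)
{
	printf("stock RX pri = %d, demoted RX pri = %d\n",
	    em_rx_thread_pri(0), em_rx_thread_pri(1));
}
```

With the demoted value the RX thread no longer outranks other kernel threads, which matches the description above of the system becoming unresponsive only with the taskq version of the driver.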

> 
>    I have hundreds of machines here at Intel that DON'T have the
> problem. That is why early 20th century philosophers concluded
> that verification is ineffective as a scientific method, whereas
> falsification is powerful. So if any users out there have
> a problem, I am trying to understand why. The only way I have
> so far reproduced something like their failure is with FAST
> interrupts enabled; when I then disable them on that same
> machine, the problem disappears. I have still not figured out
> why this is; I'm trying to do that as I write this.