Michael L. Hitch | 14 Jun 2007 19:54

Re: yamt-idlelwp fallout for mips/cobalt?

On Fri, 25 May 2007, Izumi Tsutsui wrote:

> The following patch makes a LOCKDEBUG kernel work,
> but I don't know if it's really correct.

   I don't think it's correct.  This was changed several years ago
(starting with revision 1.175) because kernel threads would start
and run with interrupts disabled.  I had the same problem with my
amiga (m68k) not all that long ago because I had added a raidframe
drive and was getting lots of clock skew when parity rebuilding was
going on.  I finally figured out that the raidframe thread was running
with interrupts blocked, and started looking at several other ports
to see how they started kthread processes and found that they had
the same problem, but had fixed for several years.

   Looking back in the mail archives, this seems to be an attempt
to fix a locking against myself panic, so I suspect it's more likely
a locking error somewhere.  I remember that the m68k port had a similar
problem when running a DIAGNOSTIC kernel (which I had not tested at
the time), and I tracked down a small section of code I had missed
for the idlelwp changes.

> Index: arch/mips/mips/vm_machdep.c
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/mips/mips/vm_machdep.c,v
> retrieving revision 1.117
> diff -u -r1.117 vm_machdep.c
> --- arch/mips/mips/vm_machdep.c	17 May 2007 14:51:25 -0000	1.117
> +++ arch/mips/mips/vm_machdep.c	25 May 2007 14:47:42 -0000
>  <at>  <at>  -170,7 +170,9  <at>  <at> 
(Continue reading)

Andrew Doran | 14 Jun 2007 22:37
Picon

Re: yamt-idlelwp fallout for mips/cobalt?

On Thu, Jun 14, 2007 at 11:54:39AM -0600, Michael L. Hitch wrote:

> On Fri, 25 May 2007, Izumi Tsutsui wrote:
> 
> >The following patch makes a LOCKDEBUG kernel work,
> >but I don't know if it's really correct.
> 
>   I don't think it's correct.  This was changed several years ago
> (starting with revision 1.175) because kernel threads would start
> and run with interrupts disabled.  I had the same problem with my
> amiga (m68k) not all that long ago because I had added a raidframe
> drive and was getting lots of clock skew when parity rebuilding was
> going on.  I finally figured out that the raidframe thread was running
> with interrupts blocked, and started looking at several other ports
> to see how they started kthread processes and found that they had
> the same problem, but had fixed for several years.

> >Index: arch/mips/mips/vm_machdep.c
> >===================================================================
> >RCS file: /cvsroot/src/sys/arch/mips/mips/vm_machdep.c,v
> >retrieving revision 1.117
> >diff -u -r1.117 vm_machdep.c
> >--- arch/mips/mips/vm_machdep.c	17 May 2007 14:51:25 -0000	1.117
> >+++ arch/mips/mips/vm_machdep.c	25 May 2007 14:47:42 -0000
> > <at>  <at>  -170,7 +170,9  <at>  <at> 
> >	pcb->pcb_context[MIPS_CURLWP_CARD - 16] = (intptr_t)l2;/* S? */
> >	pcb->pcb_context[8] = (intptr_t)f;		/* SP */
> >	pcb->pcb_context[10] = (intptr_t)lwp_trampoline;/* RA */
> >+#if 0
> >	pcb->pcb_context[11] |= PSL_LOWIPL;		/* SR */
(Continue reading)

Izumi Tsutsui | 15 Jun 2007 15:26
Picon
Gravatar

Re: yamt-idlelwp fallout for mips/cobalt?

ad <at> NetBSD.org wrote:

> > Looking back in the mail archives, this seems to be an attempt
> > to fix a locking against myself panic, so I suspect it's more likely
> > a locking error somewhere. 
> 
> The call into lwp_startup() does an spl0(). Before it does that, it also
> unlocks the previous LWP if any. If we enable interrupts before unlocking
> the previous LWP, we can end up taking an interrupt and trying to acquire
> a spinlock that is already held.

ddb trace on today's -current kernel shows:
---
Mounting all filesystems...
Mutex error: lockdebug_wantlock: locking against myself

lock address : 0x0000000080318d00 type     :               spin
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  0 last held:                  0
current lwp  : 0x000000008fc8b000 last held: 0x000000008fc8b700
last locked  : 0x000000008018ad6c unlocked : 0x0000000080184010
owner field  : 000000000000000000 wait/spin:                0/1

panic: LOCKDEBUG
Stopped in pid 249.1 (nfsio) at netbsd:cpu_Debugger+0x4:        jr      ra
                bdslot: nop
db> tr
cpu_Debugger+4 (8fffe000,802f3370,d,0) ra 801a6508 sz 0
panic+190 (8fffe000,802f3370,d,0) ra 8019e218 sz 48
(Continue reading)

Andrew Doran | 15 Jun 2007 15:33
Picon

Re: yamt-idlelwp fallout for mips/cobalt?

On Fri, Jun 15, 2007 at 10:26:02PM +0900, Izumi Tsutsui wrote:

> ad <at> NetBSD.org wrote:
> 
> > > Looking back in the mail archives, this seems to be an attempt
> > > to fix a locking against myself panic, so I suspect it's more likely
> > > a locking error somewhere. 
> > 
> > The call into lwp_startup() does an spl0(). Before it does that, it also
> > unlocks the previous LWP if any. If we enable interrupts before unlocking
> > the previous LWP, we can end up taking an interrupt and trying to acquire
> > a spinlock that is already held.
> 
> ddb trace on today's -current kernel shows:
> ---
> Mounting all filesystems...
> Mutex error: lockdebug_wantlock: locking against myself
> 
> lock address : 0x0000000080318d00 type     :               spin
> shared holds :                  0 exclusive:                  1
> shares wanted:                  0 exclusive:                  1
> current cpu  :                  0 last held:                  0
> current lwp  : 0x000000008fc8b000 last held: 0x000000008fc8b700
> last locked  : 0x000000008018ad6c unlocked : 0x0000000080184010
> owner field  : 000000000000000000 wait/spin:                0/1
> 
> panic: LOCKDEBUG
> Stopped in pid 249.1 (nfsio) at netbsd:cpu_Debugger+0x4:        jr      ra
>                 bdslot: nop
> db> tr
(Continue reading)

Izumi Tsutsui | 15 Jun 2007 15:42
Picon
Gravatar

Re: yamt-idlelwp fallout for mips/cobalt?

ad <at> NetBSD.org wrote:

> > so softintr(9) occurred (or was enabled) as soon as
> > a kernel jumped to lwp_trampoline() before lwp_startup()
> > was called.
> 
> Is this with or without your patch?

Without my patch.

With the patch (#ifdef'ed out PSL_LOWIPL), this doesn't happen.
---
Izumi Tsustui

Izumi Tsutsui | 17 Jul 2007 16:11
Picon
Gravatar

Re: yamt-idlelwp fallout for mips/cobalt?

I wrote:

> ad <at> NetBSD.org wrote:
> 
> > > so softintr(9) occurred (or was enabled) as soon as
> > > a kernel jumped to lwp_trampoline() before lwp_startup()
> > > was called.
> > 
> > Is this with or without your patch?
> 
> Without my patch.
> 
> With the patch (#ifdef'ed out PSL_LOWIPL), this doesn't happen.

Is it okay to commit my patch?
http://mail-index.netbsd.org/port-mips/2007/05/25/0000.html

The spl is lowered by spl0() in lwp_startup() which is called
at the beginning of lwp_trampoline() anyway.
---
Izumi Tsutsui

Andrew Doran | 18 Jul 2007 15:26
Picon

Re: yamt-idlelwp fallout for mips/cobalt?

On Tue, Jul 17, 2007 at 11:11:24PM +0900, Izumi Tsutsui wrote:

> Is it okay to commit my patch?
> http://mail-index.netbsd.org/port-mips/2007/05/25/0000.html

Please - I think we should just delete the line.

Thanks,
Andrew

Michael L. Hitch | 17 Jul 2007 18:41

Re: yamt-idlelwp fallout for mips/cobalt?

On Tue, 17 Jul 2007, Izumi Tsutsui wrote:

>> Without my patch.
>>
>> With the patch (#ifdef'ed out PSL_LOWIPL), this doesn't happen.
>
> Is it okay to commit my patch?
> http://mail-index.netbsd.org/port-mips/2007/05/25/0000.html

   I say just remove the setting PSL_LOWIPL, since it's no longer needed.

--
Michael L. Hitch			mhitch <at> montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA

Izumi Tsutsui | 18 Jul 2007 16:31
Picon
Gravatar

Re: yamt-idlelwp fallout for mips/cobalt?

> > Is it okay to commit my patch?
> > http://mail-index.netbsd.org/port-mips/2007/05/25/0000.html

mhitch <at> lightning.msu.montana.edu wrote:
>    I say just remove the setting PSL_LOWIPL, since it's no longer needed.

ad <at> NetBSD.org wrote:
> Please - I think we should just delete the line.

Fix committed. Thank you for comments.
---
Izumi Tsutsui

Michael L. Hitch | 14 Jun 2007 22:56

Re: yamt-idlelwp fallout for mips/cobalt?

On Thu, 14 Jun 2007, Andrew Doran wrote:

> I'm not yet sure what we need to set into SR in this case, but post
> yamt-idlelwp, LWPs should start up at IPL_SCHED as far as MD code is
> concerned. cpu_switchto() is expected to maintain the IPL at IPL_SCHED or
> above across the switch. If I read the code correctly, PSL_LOWIPL enables
> all interrupts.

   Ah - I just looked at the alpha changes, and the code to set the IPL
level for the new lwp was removed, leaving it what was copied from the
parent.  I thought I had found something simllar in the i386 code, but
I can't find what I remember.

> The call into lwp_startup() does an spl0(). Before it does that, it also
> unlocks the previous LWP if any. If we enable interrupts before unlocking
> the previous LWP, we can end up taking an interrupt and trying to acquire
> a spinlock that is already held.

   In that case, I think not changing the interrupt masking when forking a 
new lwp is probably what is now desired.  The new process will inherit the
IPL level from the parent process (although I'm now sure what it actually
is for the various ports).

--
Michael L. Hitch			mhitch <at> montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA

(Continue reading)


Gmane