Derrick Lobo | 30 Apr 2012 20:57

BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

Hi all,

I am using the /201204271810Z/ build of NetBSD 6.0 BETA on an H8DG6-F; it
has one AMD 6272 with 16 cores.

The server crashes, with no errors on the console or in the logs, during a
simple untar. Booting with -12 (disable SMP) brings it up on one core, and
the server is then stable.

I have the following turned on in the kernel config and still don't see
anything in the logs:

# Diagnostic/debugging support options
options         DIAGNOSTIC      # expensive kernel consistency checks
                                # XXX to be commented out on release branch
options         DEBUG           # expensive debugging checks/support
options         LOCKDEBUG       # expensive locking checks/support
options         KMEMSTATS       # kernel memory statistics (vmstat -m)

options         DDB             # in-kernel debugger
options         DDB_ONPANIC=1   # see also sysctl(8): `ddb.onpanic'
options         DDB_HISTORY_SIZE=512    # enable history editing in DDB
#options        KGDB            # remote debugger
#options        KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600

I would definitely like to use the whole CPU, but NetBSD seems to hang on any
activity. Maybe I have something set up incorrectly.

Regards,

Derrick Lobo

Manuel Bouyer | 30 Apr 2012 22:00

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

On Mon, Apr 30, 2012 at 02:57:04PM -0400, Derrick Lobo wrote:
> Hi All.
> 
> I am using the /201204271810Z/ built version of BETAnetbsd6 on a H8DG6-F, it
> has one AMD 6272 with 16 cores.
> 
> The server crashes with no errors on the console or logs for a simple
> untar. Booting the server with -12 (disable SMP) boots it with one core and
> the server is stable.

Can you try a HEAD kernel? I've been playing with NetBSD on a 4-6282 system
and couldn't make the kernel crash.

-- 
Manuel Bouyer <bouyer <at> antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

Derrick Lobo | 30 Apr 2012 22:42

RE: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

OK, I will give it a try.


David Laight | 30 Apr 2012 23:15

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

On Mon, Apr 30, 2012 at 04:42:36PM -0400, Derrick Lobo wrote:
> Ok I will give it try.
> 
> -----Original Message-----
> From: Manuel Bouyer [mailto:bouyer <at> antioche.eu.org] 
> Sent: Monday, April 30, 2012 4:01 PM
> To: Derrick Lobo
> Cc: 'Port-Amd64 <at> Netbsd. Org'
> Subject: Re: BETA6.0 - AMD Opetron 6272 (16 core) multiprocessor config
> crashes
> 
> On Mon, Apr 30, 2012 at 02:57:04PM -0400, Derrick Lobo wrote:
> > Hi All.
> > 
> > I am using the /201204271810Z/ built version of BETAnetbsd6 on a 
> > H8DG6-F, it has one AMD 6272 with 16 cores..
> > 
> > The server crashes with no errors on the console or logs for a simple 
> > untar.. booting the server with -12(Disable SMP) boots it with one 
> > core and the server is stable..
> 
> Can you try a HEAD kernel ? I've been playing with NetBSD on a 4-6282 system
> and couldn't make the kernel crash ...

I just had my i7 system lock solid.
Was just running a kernel build.
Has run a few full builds without problems.
May, or may not, be related.

	David

Tom Ivar Helbekkmo | 1 May 2012 08:53

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

David Laight <david <at> l8s.co.uk> writes:

> I just had my i7 system lock solid.
> Was just running a kernel build.
> Has run a few full builds without problems.
> May, or may not, be related.

My Dell PE2850 identifies thus at boot:

Dell Computer Corporation PowerEdge 2850
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu1 at mainbus0 apid 6: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu2 at mainbus0 apid 1: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu3 at mainbus0 apid 7: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43

Among other things, it is an NFS server for a couple of other
systems.  Lately, running a -current from April 10th, it has been hanging
from time to time during heavy disk access from clients.
Everything else still works, but anything that tries to access the disk
locks up.  No error messages on the console or in the logs.

I can't break into the debugger on this box, so it's hard to debug
further.  I've tried setting hw.cnmagic to something simple, even just
a single character, but it doesn't do anything...

-tih
-- 
"The market" is a bunch of 28-year-olds who don't know anything. --Paul Krugman


Tom Ivar Helbekkmo | 1 May 2012 09:08

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

Tom Ivar Helbekkmo <tih <at> Hamartun.Priv.NO> writes:

> Lately, using a -current from April 10th, it's been hanging
> itself up from time to time, during heavy disk access from clients.
> Everything else still works, but anything that tries to access disk
> locks.  No error messages on the console or in the logs.

...and it's managed to do something wrong, too.  The system has a RAID
controller with battery backed RAM on it, and /usr is mounted thus:

/dev/ld0e       /usr            ffs     rw,log          0 2

I just got this on the console and in /var/log/messages, a few minutes
after the latest forced reboot:

free inode /usr/2067949 had 4294967264 blocks
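
A side note on that number: 4294967264 is exactly 2^32 - 32, which is what
-32 looks like when printed as an unsigned 32-bit value. That would be
consistent with a block count that was decremented below zero, though the
message alone doesn't prove it. The quick check below is only a sketch; the
32-bit width and the helper name as_signed32 are assumptions made for
illustration, not something taken from the kernel source.

```python
import struct


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer.

    Hypothetical helper for illustration only; it assumes the counter
    in the console message is 32 bits wide.
    """
    return struct.unpack("<i", struct.pack("<I", value))[0]


reported = 4294967264            # block count from the console message

print(reported == 2**32 - 32)    # is it 32 short of wrapping around?
print(as_signed32(reported))     # the same bits read as a signed value
```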

-tih
-- 
"The market" is a bunch of 28-year-olds who don't know anything. --Paul Krugman

Martin Husemann | 1 May 2012 11:00

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

I have one machine (Phenom II X6) which locks up on heavy disk access
"sometimes". Usually a build.sh -j 12 is good enough to trigger it.
The lockup is really hard: no ddb.

I have no idea how to debug this :-(

On other amd64 machines -current runs just fine for me, no matter how hard
I beat on it.

Martin

Tom Ivar Helbekkmo | 2 May 2012 19:56

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

Tom Ivar Helbekkmo <tih <at> Hamartun.Priv.NO> writes:

> One of the things it does is be an NFS server for a couple other
> systems.  Lately, using a -current from April 10th, it's been hanging
> itself up from time to time, during heavy disk access from clients.
> Everything else still works, but anything that tries to access disk
> locks.  No error messages on the console or in the logs.

It happened again after updating to -current as of May 1st.  I've now
booted it with SMP disabled.  If it stays up for a few days running on
one CPU, that should be a reasonably good indication that the problem is
SMP related.

> I can't break into the debugger on this box, so it's hard to debug
> further.  I've tried setting hw.cnmagic to something simple, even just
> a single character, but it doesn't do anything...

I'd still like to get this to work with a serial console.  Any hints?

-tih
-- 
"The market" is a bunch of 28-year-olds who don't know anything. --Paul Krugman

Tom Ivar Helbekkmo | 8 May 2012 10:57

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

Tom Ivar Helbekkmo <tih <at> Hamartun.Priv.NO> writes:

> Tom Ivar Helbekkmo <tih <at> Hamartun.Priv.NO> writes:
>
>> One of the things it does is be an NFS server for a couple other
>> systems.  Lately, using a -current from April 10th, it's been hanging
>> itself up from time to time, during heavy disk access from clients.
>> Everything else still works, but anything that tries to access disk
>> locks.  No error messages on the console or in the logs.
>
> It happened again after updating to -current as of May 1st.  I've now
> booted it with SMP disabled.  If it stays up for a few days running on
> one CPU, that should be a reasonably good indication that the problem is
> SMP related.

This box has been getting these hangs annoyingly often, and has also
been dropping into the debugger on NMI from time to time.  One or the
other, most often the hang, would happen once or twice per day.  I never
wrote down any backtraces from the NMI traps, thinking it was a flaky
RAM module (even though this was kind of strange, since the machine has
redundant (mirrored) RAM), but at least once I noticed the kernel was
doing something memory mapping-related when it happened.

Then, Martin Husemann suggested I try disabling the direct map stuff, by
editing sys/arch/amd64/include/types.h, and getting rid of these defines,
near the end of the file:

#include "opt_xen.h"
#if defined(__x86_64__) && !defined(XEN)
#define	__HAVE_DIRECT_MAP 1
[...]

Derrick Lobo | 1 May 2012 17:38

RE: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

Manuel,

I have tried the latest HEAD kernel from yesterday's build, and the server
crashed when I started scp'ing a file.

I can reproduce this issue on all three of the new servers.

I am also using an LSI 2108 as the primary disk controller, so I am not sure
whether there is some IRQ clash. However, disabling SMP seems to fix the
issue right away. I was also able to build a couple of custom kernels after
turning off SMP.

Derrick


Manuel Bouyer | 1 May 2012 20:10

Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

On Tue, May 01, 2012 at 11:38:59AM -0400, Derrick Lobo wrote:
> Manuel
> 
> I have tried the latest HEAD kernel from yesterdays build and the server
> crashed when I started scp'ing a file..
> 
> I can replicate this issue on all three of the new servers.. 
> 
> I am also using a LSI 2108 as the primary disk controller so not sure if
> theres some irq clash.. however seems like disabling SMP seems to fix the
> issue rightaway.. I was also able to build couple of custom kernel by
> turning of SMP...

I tested again with an up-to-date HEAD kernel on the 4-6282 server.
I've scp'ed 29GB of data, containing both small files (src and pkgsrc trees)
and large ones (xen disk images). No problems.

Could it be related to some BIOS settings? I don't remember the details;
I think I just disabled the on-board IDE controller. Console is on
com2 (which I had to enable in the kernel) if that makes a difference.
The network controller is:
wm0 at pci2 dev 0 function 0: 82576 1000BaseT Ethernet (rev. 0x01)

connected to a 100 Mb/s switch. Disk controller is:
mpii0 at pci1 dev 0 function 0: vendor 0x1000 product 0x0072 (rev. 0x03)

-- 
Manuel Bouyer <bouyer <at> antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

Derrick Lobo | 30 Apr 2012 21:04

RE: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

We also see the server crash/freeze when we scp a huge file to the server.


