Bolo | 10 Jan 2009 02:55

Re: 4.01 panic/lock issues?

> > Model: sun3 60
> > fpu: mc68882
> > 
> > This may be part of the problem. It appears that at some point in the
> > past life of this box, someone did the 68882 mod on it. I never caught
> > this before since it was never a problem on past SunOS or NetBSD
> > versions, but something here may be triggering it.
> 
> According to the Sun3 Archive page:
> http://www.sun3arc.org/hardpatches/3.60-tuning/fpu.phtml
> 68882 is software compatible with 68881 (except CPI),
> so I wonder if it could cause FPU related errors.
> Actually I used 68882 on 3/60 about 13 years ago (with 1.1 or 1.2?)
> and it worked fine with X11 etc.

Off the top of my head...

The size of the floating point save context can be different.
If the kernel was only compiled with code for the 68881 floating point
save size, a larger 68882 context could overflow into something else.

It won't matter for executables that don't use floating point, because
of the only-save-if-used fp code.
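
To make the size argument concrete, here is a minimal C sketch of the
arithmetic. It assumes the idle-frame sizes usually quoted from the
MC68881/MC68882 user's manual (0x18 bytes for the 68881, 0x38 for the
68882, each preceded by a 4-byte format word), and treats a null frame
(FPU not used) as the format word alone. The names are hypothetical and
this is not the kernel's actual code or save-area layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes; check them against the coprocessor manual. */
enum {
    FSAVE_FORMAT_WORD = 4,    /* format word written before every frame */
    FSAVE_NULL_SIZE   = 0x00, /* null frame: FPU state was never touched */
    FSAVE_IDLE_68881  = 0x18, /* idle frame payload on a 68881 */
    FSAVE_IDLE_68882  = 0x38  /* idle frame payload on a 68882 */
};

/* Total bytes a save would write for a frame with the given payload size. */
size_t fsave_bytes(uint8_t frame_size)
{
    return FSAVE_FORMAT_WORD + (size_t)frame_size;
}

/* Bytes written past the end of a save area of `area` bytes (0 if it fits). */
size_t fsave_overrun(size_t area, uint8_t frame_size)
{
    size_t need = fsave_bytes(frame_size);
    return need > area ? need - area : 0;
}
```

Under these assumptions, a save area sized for the 68881 idle frame
(28 bytes) would be overrun by 32 bytes when a 68882 idle frame is
saved, while a null frame from a process that never used the FPU
always fits.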

> Hmm, does only the mutt binary cause the problem?

Mutt probably does some floating point arithmetic, which
causes a save-context to be generated on context switch, and boom,
something dies.

Izumi Tsutsui | 11 Jan 2009 01:30

Re: 4.01 panic/lock issues?

Bolo wrote:

> The size of the floating point save context can be different.
> If the kernel was only compiled with code for the 68881 floating point
> save size, a larger 68882 context could overflow into something else.
> 
> It won't matter for executables that don't use floating point, because
> of the only-save-if-used fp code.

Yes, I think the frame size is how src/sys/arch/sun3/sun3/fpu.c
tells a 68881 from a 68882.  I guess the MI m68k code handles
the frame size via the "fpu_type" variable, since the 3/80 (68030+68882)
uses the same fpu.c.
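
As a rough illustration of detection by frame size, the following C
sketch classifies the coprocessor from the size byte of a saved idle
frame. The enum and function names are hypothetical (they are not the
identifiers used in fpu.c), and the 0x18/0x38 values are the idle-frame
sizes usually quoted for the 68881 and 68882, so treat them as an
assumption to verify.

```c
#include <stdint.h>

/* Hypothetical result type, not the kernel's own constants. */
enum fpu_kind {
    FPU_KIND_UNKNOWN,
    FPU_KIND_68881,
    FPU_KIND_68882
};

/* Map the size byte of an idle save frame to a coprocessor type. */
enum fpu_kind classify_fpu(uint8_t idle_frame_size)
{
    switch (idle_frame_size) {
    case 0x18:                  /* assumed 68881 idle frame size */
        return FPU_KIND_68881;
    case 0x38:                  /* assumed 68882 idle frame size */
        return FPU_KIND_68882;
    default:                    /* anything else is not handled here */
        return FPU_KIND_UNKNOWN;
    }
}
```

A caller would obtain the size byte from a frame saved right after a
harmless FP instruction, so that the frame is idle rather than null.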

> Mutt probably does some floating point arithmetic, which
> causes a save-context to be generated on context switch, and boom,
> something dies.
> 
> To try and reproduce it just have something that does a floating
> point op, and context switches -- ato[df] or printf("%f") might be
> enough to trigger it.
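
A minimal sketch of such a test in C might look like the following. It
assumes a POSIX system with sched_yield(); the function name fp_churn
is made up for illustration, and yielding only invites a context
switch, it does not guarantee one.

```c
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Do floating point work, then offer the CPU to other processes, so
 * that live FP state exists whenever a context switch does happen.
 * Returns the accumulated value so the caller can sanity-check it.
 */
double fp_churn(int iterations)
{
    volatile double acc = 0.0;  /* volatile: keep the FP ops from being optimized away */
    char buf[32];

    for (int i = 0; i < iterations; i++) {
        acc += atof("1.5") * 2.0;              /* string-to-double plus a multiply */
        snprintf(buf, sizeof buf, "%f", acc);  /* "%f" formatting path */
        sched_yield();                         /* give the scheduler a chance to switch */
    }
    return acc;
}
```

Running several copies at once with a large iteration count makes it
more likely that a switch lands while FP state is live.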

Many simple commands (like ps(1)) use FP ops, so if FP instructions
did not work at all, the machine would be unlikely to boot up to multiuser.
(See "LC040 FPE problem" on the mac68k port page.)

In this case, an unimplemented FP instruction trap happens
even though the machine has a 68020+68882.

The trap invokes FP emulation functions, but I'm afraid