Christophe Rhodes | 4 Jan 2004 00:01

Various brokennesses

So, firstly, as pointed out by Paul Dietz, my implementation of
modular multiplication on the x86 appears to be broken, in that (as of
sbcl-0.8.7.5):

  * (funcall
     (compile
      nil
      '(lambda (a)
         (declare (type (integer 177547470 226026978) a))
         (declare (optimize (speed 3) (space 0) (safety 0) (debug 0)
                            (compilation-speed 1)))
         (logand a (* a 438810))))
     215067723)
  ; in: LAMBDA (A)
  ;     (LOGAND A (* A 438810))
  ; 
  ; note: deleting unreachable code
  ; compilation unit finished
  ;   printed 1 note

  debugger invoked on a TYPE-ERROR in thread 2649:
    The value 215067723 is not of type (INTEGER 177547470 226026978).

[ this does not happen if the form is (logand #xffffffff (* a 438810));
  it _does_ happen if the form is (logand #xffff0000 (* a 438810)).
  Any ideas?  It looks like another "lying to the compiler" thing
  prompted by OPTIMIZE-MULTIPLY, but what? ]

So, given that I'm not in a great position for first-stone casting,
I'm not convinced that the latest GC refactoring is entirely
[...]

Paul F. Dietz | 4 Jan 2004 09:55

Re: Various brokennesses

Christophe Rhodes wrote:
> So, firstly, as pointed out by Paul Dietz, my implementation of
> modular multiplication on the x86 appears to be broken, in that (as of
> sbcl-0.8.7.5):

I've added three tests (from the random test generator) for this bug
to gcl/ansi-tests/misc.lsp.

	Paul

-------------------------------------------------------
This SF.net email is sponsored by: IBM Linux Tutorials.
Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
Alexey Dejneka | 4 Jan 2004 10:01

Re: Various brokennesses

Christophe Rhodes <csr21 <at> cam.ac.uk> writes:

> So, firstly, as pointed out by Paul Dietz, my implementation of
> modular multiplication on the x86 appears to be broken

It is. You need to choose the behavior of %LEA for overflow: should it
cut the result to 32 bits (in this case the definition and the type
deriver are wrong) or to pass as is (then the declared type is wrong
and %LEA-MOD32 is necessary)?
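
To make the two options concrete, here is a minimal C sketch. It is not
SBCL's %LEA implementation; mul_full, mul_mod32, masked_full and
masked_mod32 are hypothetical names used only for illustration. It takes
the values from the failing form and shows that the untruncated product
does not fit in 32 bits, while ANDing with any 32-bit mask gives the same
answer whether or not the product is first cut to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the product "passed as is", kept in 64 bits. */
static uint64_t mul_full(uint32_t a, uint32_t k)
{
    return (uint64_t)a * (uint64_t)k;
}

/* Hypothetical helper: the product "cut to 32 bits" (mod 2^32). */
static uint32_t mul_mod32(uint32_t a, uint32_t k)
{
    return (uint32_t)((uint64_t)a * (uint64_t)k);
}

/* (logand mask (* a k)) computed on the untruncated product. */
static uint32_t masked_full(uint32_t mask, uint32_t a, uint32_t k)
{
    return (uint32_t)((uint64_t)mask & mul_full(a, k));
}

/* The same expression computed on the truncated product. */
static uint32_t masked_mod32(uint32_t mask, uint32_t a, uint32_t k)
{
    return mask & mul_mod32(a, k);
}
```

With a = 215067723 and k = 438810, mul_full(a, k) exceeds 0xFFFFFFFF, so
a type derived for a 32-bit result cannot describe the untruncated value;
masked_full and masked_mod32 still agree for the masks a, 0xffffffff and
0xffff0000.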

-- 
Regards,
Alexey Dejneka

"Alas, the spheres of truth are less transparent than those of
illusion." -- L.E.J. Brouwer

Christophe Rhodes | 4 Jan 2004 18:44

Re: Various brokennesses

Alexey Dejneka <adejneka <at> comail.ru> writes:

> Christophe Rhodes <csr21 <at> cam.ac.uk> writes:
>
>> So, firstly, as pointed out by Paul Dietz, my implementation of
>> modular multiplication on the x86 appears to be broken
>
> It is. You need to choose the behavior of %LEA for overflow: should it
> cut the result to 32 bits (in this case the definition and the type
> deriver are wrong) or to pass as is (then the declared type is wrong
> and %LEA-MOD32 is necessary)?

Thanks.  I think I've fixed this in 0.8.7.6; I went for the second
option, making %LEA a little more generic, and defining a
modular-fun-optimizer to convert it into %LEA-MOD32 as appropriate.

Cheers,

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)

Paul F. Dietz | 4 Jan 2004 00:18

Re: Various brokennesses

Christophe Rhodes wrote:
>  running Paul Dietz' regular test suite
> yielded me most recently vast numbers of SIMPLE-ERRORs (probably
> segfaults of some kind) followed eventually by
> 
>   fatal error encountered in SBCL pid 26584:
>   no transport function for object 0x0c91d1ef (widetag 0x0)
>   LDB monitor
>   ldb>
> 
> It's possible of course that my hardware is bad, or that I'm running a
> broken tree, so could someone try to reproduce this?

This bug can be invoked by something as simple as:

* (loop repeat 100 do (compile nil '(lambda (a b) (+ a b))) do (sb-ext::gc :full t))
fatal error encountered in SBCL pid 2685:
no transport function for object 0x0901e11f (widetag 0x89)
There's no LDB in this build; exiting.

	Paul

Paul F. Dietz | 4 Jan 2004 00:40

Re: Various brokennesses

I wrote:

> This bug can be invoked by something as simple as:
> 
> * (loop repeat 100 do (compile nil '(lambda (a b) (+ a b))) do 
> (sb-ext::gc :full t))
> fatal error encountered in SBCL pid 2685:
> no transport function for object 0x0901e11f (widetag 0x89)
> There's no LDB in this build; exiting.

Even simpler:

This is SBCL 0.8.7.5, an implementation of ANSI Common Lisp.

More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* (loop repeat 2 do (compile nil '(lambda () nil)) do (sb-ext::gc :full t))
fatal error encountered in SBCL pid 2712:
GC invariant lost, file "gc-common.c", line 154
There's no LDB in this build; exiting.

	Paul

Paul F. Dietz | 4 Jan 2004 00:45

Re: Various brokennesses

Or even:

This is SBCL 0.8.7.5, an implementation of ANSI Common Lisp.

More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* (progn (compile nil '(lambda (a) (+ a a))) (sb-ext::gc :full t))
fatal error encountered in SBCL pid 2734:
no transport function for object 0x0900d1d7 (widetag 0x41)
There's no LDB in this build; exiting.

Daniel Barlow | 7 Jan 2004 04:47

GC (was Re: Re: Various brokennesses)

"Paul F. Dietz" <dietz <at> dls.net> writes:

> * (progn (compile nil '(lambda (a) (+ a a))) (sb-ext::gc :full t))

Or in my copy (which has a working variant of the RESCAN_CHECK turned
on, and is also slightly further mutated but not I believe in any
relevant way - oh how I wish I had the Intarweb again and I could
check these things easily)

* (let () (sb-ext:gc :full t)

What?  This /isn't/ a competition to find the shortest form that'll kill
SBCL's GC?

There is a bug, yes.  I'm not sure where, but I am looking for it.  I
suspect it's not in the last round of refactoring: more likely it's in
some fairly old code (maybe CMUCL code, or maybe code I screwed up in
some previous refactoring) and merely exacerbated by the last round of
changes.  Untested, but I expect the relevant bits to back out if
you'd like to return to the previous
not-actually-reliable-but-still-better state are

1) revert gencgc_pickup_dynamic to the form it had in 0.8.7 
2) revert the (gc_alloc_generation == 0) clause of the very large
if statement in gc_find_freeish_pages - and, if you like, the comment
about things breaking randomly if it's omitted ;-)

This will land you back with the same ghastly fragmentation behaviour
that 0.8.7 had, but I think it's the case that we had said behaviour
throughout the 0.8 series, and I'm the only person to have objected so
[...]

Christophe Rhodes | 13 Jan 2004 18:57

Re: GC

Daniel Barlow <dan <at> telent.net> writes:

> 1) revert gencgc_pickup_dynamic to the form it had in 0.8.7 
> 2) revert the (gc_alloc_generation == 0) clause of the very large
> if statement in gc_find_freeish_pages - and, if you like, the comment
> about things breaking randomly if it's omitted ;-)

It would seem, at least on casual testing[*], that the attached is
sufficient to restore good behaviour.  Why the gc_alloc_generation
test is needed -- that is, why we only open an allocation region on a
page with some stuff already on it, and why this matters (is this
right?) -- is beyond me currently.

Index: src/runtime/gencgc.c
===================================================================
RCS file: /cvsroot/sbcl/sbcl/src/runtime/gencgc.c,v
retrieving revision 1.46
diff -u -r1.46 gencgc.c
--- src/runtime/gencgc.c	8 Jan 2004 16:26:33 -0000	1.46
+++ src/runtime/gencgc.c	13 Jan 2004 17:14:03 -0000
@@ -966,6 +966,8 @@
 		    (unboxed ? UNBOXED_PAGE : BOXED_PAGE)) &&
 		   (page_table[first_page].large_object == 0) &&
 		   (page_table[first_page].gen == gc_alloc_generation) &&
+		   /* No idea */
+		   (gc_alloc_generation == 0) &&
 		   (page_table[first_page].bytes_used < (PAGE_BYTES-32)) &&
 		   (page_table[first_page].write_protected == 0) &&
[...]

Christophe Rhodes | 8 Jan 2004 15:42

Re: GC

Daniel Barlow <dan <at> telent.net> writes:

> "Paul F. Dietz" <dietz <at> dls.net> writes:
>
>> * (progn (compile nil '(lambda (a) (+ a a))) (sb-ext::gc :full t))
>
> Or in my copy (which has a working variant of the RESCAN_CHECK turned
> on, and is also slightly further mutated but not I believe in any
> relevant way - oh how I wish I had the Intarweb again and I could
> check these things easily)
>
> * (let () (sb-ext:gc :full t)
>
> What?  This /isn't/ a competition to find the shortest form that'll kill
> SBCL's GC?
>
> There is a bug, yes.  I'm not sure where, but I am looking for it.  I
> suspect it's not in the last round of refactoring: more likely it's in
> some fairly old code (maybe CMUCL code, or maybe code I screwed up in
> some previous refactoring) and merely exacerbated by the last round of
> changes.  Untested, but I expect the relevant bits to back out if
> you'd like to return to the previous
> not-actually-reliable-but-still-better state are
>
> 1) revert gencgc_pickup_dynamic to the form it had in 0.8.7 
> 2) revert the (gc_alloc_generation == 0) clause of the very large
> if statement in gc_find_freeish_pages - and, if you like, the comment
> about things breaking randomly if it's omitted ;-)

There must be something else, though, because current sources on PPC
[...]

Christophe Rhodes | 8 Jan 2004 15:52

Re: Re: GC

Christophe Rhodes <csr21 <at> cam.ac.uk> writes:

> In words: running PURIFY appears to break the disassembler on purified
> functions.  Before purification, SB-DISASSEM::CODE-INST-AREA-LENGTH
> returns 408 for FOO; afterwards, it returns NIL.
> CODE-INST-AREA-LENGTH appears to be a simple slot reference, so it
> appears that purify isn't copying everything it needs to?  Or is
> corrupting data?  Ugh.
>
> I don't have sbcl-0.8.7 to hand; can someone with a PPC (preferably
> MacOS X, to minimise the number of variables) version of that verify
> whether it suffers from this problem?  sbcl-0.8.6 is known not to.

Oh, bah.  Inevitably, I find the problem just after sending the
message:

  -    if (new->trace_table_offset & 0x3)
  +    /* FIXME: why would this be a fixnum? */
  +    if (!(new->trace_table_offset & (EVEN_FIXNUM_LOWTAG|ODD_FIXNUM_LOWTAG)))
   #if 0
  -      pscav(&new->trace_table_offset, 1, 0);
  +	pscav(&new->trace_table_offset, 1, 0);
   #else
  -      new->trace_table_offset = NIL; /* limit lifetime */
  +        new->trace_table_offset = NIL; /* limit lifetime */
   #endif
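
The two conditions in that diff are not the same predicate, which a
small C sketch can show. Everything here is an assumption made for
illustration: the lowtag values (EVEN_FIXNUM_LOWTAG as 0,
ODD_FIXNUM_LOWTAG as 4), the 32-bit fixnum encoding with two tag bits,
and the helper names make_fixnum, old_test and new_test. None of it is
copied from the runtime sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed lowtag values for a 32-bit build (not taken from the thread). */
#define EVEN_FIXNUM_LOWTAG 0u
#define ODD_FIXNUM_LOWTAG  4u

/* Assumed encoding: a fixnum n is stored as n shifted left by two bits. */
static uint32_t make_fixnum(uint32_t n)
{
    return n << 2;
}

/* The "-" line: true when either of the two low bits is set,
 * that is, when the word is not a fixnum under the assumed encoding. */
static int old_test(uint32_t word)
{
    return (word & 0x3u) != 0;
}

/* The "+" line: true when bit 2 is clear, since the OR of the two
 * assumed lowtags is just 4. */
static int new_test(uint32_t word)
{
    return !(word & (EVEN_FIXNUM_LOWTAG | ODD_FIXNUM_LOWTAG));
}
```

Under these assumptions, make_fixnum(408) fails old_test but passes
new_test, so a branch guarded by the "+" condition would run for an
even fixnum such as an offset of 408, whereas a word with a pointer
lowtag such as 0x0900d1d7 passes old_test and fails new_test.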

So.  It would be a fixnum if it's a code component, apparently.  But
worse:

[...]

