Gary Byers | 10 Aug 12:12

Re: Tracking Down CFFI Problem

I just glanced at this and may be missing something, but the code
didn't look like it'd ever been built on a 64-bit system.  Just trying
to build the C library on a 64-bit Linux system (where the C compiler/
toolchain assume a 64-bit world) led to

/usr/bin/ld: betabase.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
betabase.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [liblispstat.so] Error 1

and there didn't seem to be anything in the Makefile that addressed this.
(Adding "-fPIC" to CFLAGS probably isn't the last change you'd have to
make; I don't know how sensitive the rest of the code is to word-size
issues, but the fact that the linker was complaining strongly suggests
that the C code has only ever been built or run in a 32-bit environment.)

If that assumption's correct, then it may also be the case that the
lisp code (and/or CFFI glue) have word-size issues.  Someone who is
interested in this should certainly look at the code carefully and
try to determine if it's making assumptions about the size or alignment
of foreign objects.
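
One quick way to start that check is to have the C side report the sizes
and alignments it was actually compiled with, and compare them against
what the Lisp/CFFI declarations assume.  The following is only a sketch:
the function names are made up for illustration and are not part of the
library under discussion.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a pointer fits in 32 bits on this
   build, 0 otherwise (e.g. 0 on a typical 64-bit Linux toolchain). */
int pointers_fit_in_32_bits(void)
{
    return sizeof(void *) <= 4;
}

/* Hypothetical helper: print the sizes and alignments that any
   foreign-type declarations on the Lisp side have to agree with. */
void report_foreign_layout(FILE *out)
{
    fprintf(out, "sizeof(int)      = %zu\n", sizeof(int));
    fprintf(out, "sizeof(long)     = %zu\n", sizeof(long));
    fprintf(out, "sizeof(double)   = %zu\n", sizeof(double));
    fprintf(out, "sizeof(void *)   = %zu\n", sizeof(void *));
    fprintf(out, "alignof(double)  = %zu\n", alignof(double));
    fprintf(out, "alignof(void *)  = %zu\n", alignof(void *));
}
```

Calling report_foreign_layout(stdout) from a small test program built
with the same compiler flags as the library shows which values differ
between the 32-bit and 64-bit builds.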

There are lots of things to go wrong, and it's certainly possible that
foreign memory access primitives in CCL could have bugs in them (though
I don't know of any such bugs.)  It seems likely that word-size issues
would be obscuring the issue enough that any such bugs would be very
difficult to isolate.

On Sat, 9 Aug 2008, Brent Fulgham wrote:

Brent Fulgham | 18 Aug 05:56

Re: Tracking Down CFFI Problem [SOLVED]

Hi,

Just for future web searchers, I thought I would report that I  
resolved the problem.

On Aug 10, 2008, at 3:12 AM, Gary Byers wrote:

> I just glanced at this and may be missing something, but the code
> didn't look like it'd ever been built on a 64-bit system.  Just trying
> to build the C library on a 64-bit Linux system (where the C compiler/
> toolchain assume a 64-bit world) led to

As Gary pointed out, there were several 32-bit assumptions in the
original code that caused the problem.  The matrix was correctly built
to hold 64-bit double values, but it assumed pointers were 32 bits
wide.  Consequently, the matrix was constructed as an array of three
32-bit values that were supposed to hold the pointers to corresponding
arrays of 64-bit doubles.  Through some luck (probably due to padding
on 64-bit boundaries) the first two pointers ended up being valid,
while the third was garbage.
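
For anyone hitting the same thing, here is a minimal sketch of the
word-size-safe way to build such a matrix.  This is not the actual
library code; alloc_matrix and free_matrix are hypothetical names used
only for illustration.  The point is that the outer array is sized with
sizeof on a pointer type rather than a hard-coded 4 bytes (or
sizeof(int)).

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols matrix as an array of
   row pointers, each pointing at a zero-filled array of doubles.
   Returns NULL if any allocation fails. */
double **alloc_matrix(size_t rows, size_t cols)
{
    /* sizeof *m is sizeof(double *): 4 on a 32-bit build, 8 on a
       typical 64-bit build.  Hard-coding 4 here only reserves room
       for half of each pointer on a 64-bit system. */
    double **m = calloc(rows, sizeof *m);
    if (m == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++) {
        m[i] = calloc(cols, sizeof *m[i]);   /* sizeof(double) */
        if (m[i] == NULL) {
            /* Undo the rows allocated so far, then the outer array. */
            while (i > 0)
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Hypothetical helper: release a matrix created by alloc_matrix. */
void free_matrix(double **m, size_t rows)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}
```

With this layout, m[2][2] is a valid element of a 3 x 3 matrix on both
32-bit and 64-bit builds, because all three row pointers are stored at
their full width.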

A thorough review of the entire C code-base (and use of the
-Wconversion and -Wshorten-64-to-32 flags) helped track down the cause
of this problem, as well as several other less catastrophic issues I
had not noticed yet.
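
As an illustration of the kind of thing those flags point at (a made-up
example, not code from the library): an implicit conversion from a
64-bit size_t to a 32-bit int, and one way to make the narrowing
explicit and checked.  The function name is hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* The pattern the warnings are meant to catch looks like this:
 *
 *     int n = strlen(s);
 *
 * On a 64-bit build strlen returns a 64-bit size_t, so the value is
 * silently narrowed to a 32-bit int.
 */

/* Hypothetical checked version: stores the length in *out and returns
   0 if it fits in an int, or returns -1 without touching *out. */
int length_checked(const char *s, int *out)
{
    size_t n = strlen(s);
    if (n > (size_t)INT_MAX)
        return -1;
    *out = (int)n;   /* explicit cast, safe after the range check */
    return 0;
}
```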

Thanks for the helpful pointers!

-Brent

