Michael Jones | 13 Aug 06:44 2014

Odd FFI behavior

I am seeing some strange behavior with GHC 7.6.3 on Ubuntu 14 LTS when using the FFI, and I am looking for ideas on
what could be going on.

Fundamentally, adding wait calls (delays coded in the C) changes the behavior of the C code: returned
status codes have proper values when there are delays, and indicate errors when there are no delays. Yet
these same calls result in proper behavior on the Aardvark’s serial bus, return proper data, etc. Only
the status gets messed up.

The module calls a thin custom C layer over the Aardvark C layer, which dynamically loads a dll and makes
calls into it. The thin layer just makes the use of c2hs easier.

It is always possible there is some kind of memory issue, but there is no pattern to the mishap. It is random.
Adding delays just changes probabilities of bad status.

I made a C version of my application calling the same custom C layer, and there are no problems. This sort of
indicates the problem is with the FFI.

Because the failures are not general in that they target one particular value, and seem to be affected by
time, it makes me wonder if there is some subtle Haskell run time issue. Like, could the garbage collector
be interacting with things?

Does anyone have an idea what kind of things to look for?

Mike

DLL loader

static void *_loadFunction (const char *name, int *result) {
    static DLL_HANDLE handle = 0;
    void * function = 0;

Donn Cave | 13 Aug 07:04 2014

Re: Odd FFI behavior

...
> Because the failures are not general in that they target one
> particular value, and seem to be affected by time, it makes me
> wonder if there is some subtle Haskell run time issue. Like,
> could the garbage collector be interacting with things?
> 
> Does anyone have an idea what kind of things to look for?

Sure - not that I have worked out in any detail how this would
do what you're seeing, but it's easy to do and often enough
works.

Compile with RTS options enabled (the -rtsopts link flag) and invoke the program with +RTS -V0.

That will disable the runtime internal timer, which uses signals.
The flood of signals from this source can interrupt functions
that aren't really designed to deal with that, because in a more
normal context they don't have to.
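
To illustrate what "interrupt" usually means here (this sketch is not from the original message): when a signal handler runs during a blocking system call, the call can return -1 with errno set to EINTR, and code that treats that as a real failure reports a bogus error status. A minimal sketch of the defensive pattern, assuming the affected code does plain read(2) calls; read_retrying is a hypothetical helper name, not part of the Aardvark library:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: behaves like read(2), but retries when a signal
 * handler interrupted the call before any data was transferred. */
static ssize_t read_retrying(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;  /* bytes read, 0 on end of file, -1 on a real error */
}
```

Whether the library in question actually fails this way is an assumption; it is only the most common failure mode under a periodic signal.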

	Donn

Michael Jones | 13 Aug 17:12 2014

Re: Odd FFI behavior

Donn,

Thanks, this solved the problem.

I would like to know more about what the signals are doing, and what am I giving up by disabling them?

My hope is I can then go back to the dll expert and ask why this is causing their library a problem and try to see
if they can solve the problem from their end, etc.

Mike


Donn Cave | 13 Aug 17:56 2014

Re: Odd FFI behavior

[ ... re -V0 ]
> Thanks, this solved the problem.
> 
> I would like to know more about what the signals are doing, and
> what am I giving up by disabling them?
> 
> My hope is I can then go back to the dll expert and ask why this
> is causing their library a problem and try to see if they can
> solve the problem from their end, etc.

I'm disgracefully ignorant about that.  When I've been forced to
run this way, it doesn't seem to do any very obvious immediate
harm to the application at all, but I could be missing long term
effects.

The problem with the library might be easy to fix, and in principle
it's sure worth looking into - while the GHC runtime delivers signals
on an exceptionally massive scale, there are plenty of normal UNIX
applications that use signals, maybe timers just like this for example,
and it's easy to set up a similar test environment using setitimer(2)
to provide the signal bombardment.  (I believe GHC actually uses
SIGVTALRM rather than SIGALRM, but don't think it will make any
difference.)
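
A test environment along those lines could be sketched as follows. This is a minimal sketch under stated assumptions, not the GHC runtime's actual mechanism: it uses ITIMER_REAL and SIGALRM so that the timer keeps firing while the process is blocked, and the names start_bombardment, stop_bombardment and blocking_call_under_test are hypothetical. The nanosleep call is only a stand-in for the library call being tested.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t tick_count = 0;

/* Handler does nothing but count deliveries. */
static void on_tick(int signo)
{
    (void)signo;
    tick_count++;
}

/* Deliver SIGALRM every interval_usec microseconds of wall-clock time.
 * Returns 0 on success, -1 on failure. */
static int start_bombardment(long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: interrupted calls fail with EINTR */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = interval_usec / 1000000;
    tv.it_interval.tv_usec = interval_usec % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer again. */
static int stop_bombardment(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}

/* Stand-in for the library call under test: one blocking sleep.
 * Returns 0 if it ran to completion, -1 with errno == EINTR if a
 * signal cut it short. */
static int blocking_call_under_test(long msec)
{
    struct timespec ts;
    ts.tv_sec = msec / 1000;
    ts.tv_nsec = (msec % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}
```

Calling start_bombardment(10000) and then blocking_call_under_test(200) should make the latter return -1 with errno set to EINTR; replacing the stand-in with the real library call shows whether it tolerates the same treatment. Substituting ITIMER_VIRTUAL and SIGVTALRM gives the signal mentioned above, but that timer only advances while the process is consuming user CPU time.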

But realistically, in the end sometimes we can't get a fix for it,
so it's interesting to know how -V0 works out as a work-around.
I hope you will keep us posted.

	Donn

Michael Jones | 14 Aug 14:32 2014

Re: Odd FFI behavior

Donn,

I was able to duplicate my problem in C using SIGVTALRM.

Can someone explain the impact of using -V0? What does it do to performance, etc.?
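
One further experiment for a C reproduction like this, offered as an assumption rather than something tried in this thread: block SIGVTALRM around the suspect call and check whether the bad status disappears. A minimal sketch for a single-threaded test program; start_vtalrm, call_with_vtalrm_blocked and busy_work are hypothetical names, and busy_work is only a stand-in for the real library call.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t vt_ticks = 0;

static void on_vtalrm(int signo)
{
    (void)signo;
    vt_ticks++;
}

/* Deliver SIGVTALRM every interval_usec microseconds of user CPU time. */
static int start_vtalrm(long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_vtalrm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGVTALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = interval_usec / 1000000;
    tv.it_interval.tv_usec = interval_usec % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_VIRTUAL, &tv, NULL);
}

/* Run fn(arg) with SIGVTALRM blocked, then restore the previous mask.
 * sigprocmask is used because the test program is assumed to be
 * single-threaded; a threaded program would use pthread_sigmask. */
static int call_with_vtalrm_blocked(int (*fn)(void *), void *arg)
{
    sigset_t block, saved;
    int rc;

    sigemptyset(&block);
    sigaddset(&block, SIGVTALRM);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;
    rc = fn(arg);
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}

/* Stand-in for the library call: burn roughly *arg milliseconds of CPU
 * and return how many times the handler ran in the meantime. */
static int busy_work(void *arg)
{
    long cpu_ms = *(long *)arg;
    sig_atomic_t before = vt_ticks;
    clock_t end = clock() + (clock_t)(cpu_ms * (CLOCKS_PER_SEC / 1000));
    volatile int spin;

    while (clock() < end) {
        for (spin = 0; spin < 100000; spin++) {
            /* spin in user space so the virtual timer advances */
        }
    }
    return (int)(vt_ticks - before);
}
```

If the real call behaves correctly only when wrapped in call_with_vtalrm_blocked, that points at signal delivery during the call rather than at the FFI itself.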

Mike


