diego souza | 25 Oct 14:58 2013

fcntl locks, executeFile and threaded runtime

Hi,

I'm having problems with executeFile as it seems to clear the advisory
locks using the threaded runtime. Consider the following snippet (a
simplification of what I'm doing):

    import System.IO
    import Control.Monad
    import System.Posix.IO
    import Control.Concurrent
    import System.Posix.Files
    import System.Posix.Process

    main = do
      let lock = (WriteLock, AbsoluteSeek, 0, 0)
      fd  <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
      pid <- forkProcess $ do
        setLock fd lock >> putStrLn "child: ok"
        executeFile "/usr/bin/sleep" False ["5"] Nothing
      threadDelay $ 1 * 1000 * 1000
      setLock fd lock >> putStrLn "parent: fail!"
      void $ getProcessStatus True False pid

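Then I consistently get these results:

$ ghc -threaded --make test.hs; ./test
child: ok
parent: fail!

$ ghc -rtsopts --make test.hs; ./test
child: ok
test: setLock: resource exhausted (Resource temporarily unavailable)

Any pointers? At first I thought it might be an issue with the unix
package but that doesn't seem to be the case.

$ ghc-pkg list | grep unix
    unix-2.6.0.1

$ ./test +RTS --info
 [("GHC RTS", "YES")
 ,("GHC version", "7.6.3")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "unknown")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "unknown")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "unknown")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ]

Thanks!
~dsouza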

Donn Cave | 25 Oct 17:52 2013

Re: fcntl locks, executeFile and threaded runtime

diego souza <dsouza <at> c0d3.xxx>,

> I'm having problems with executeFile as it seems to clear the advisory
> locks using the threaded runtime.

I'm stumped, and unfortunately can't duplicate it here (no surprise as
I have a different platform and GHC version.)  But in case it helps ...
your fcntl(2) file lock will be lost if your process closes any fd open
on that file.  So if the threaded runtime for some reason were to dup
random fds and then close them, around a fork, that would do it.  You
might be able to pick that up in an strace (or whatever your platform
utility for system call tracing.) But I don't see how executeFile could
make any difference, in that scenario.
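
A minimal sketch of the "lock is lost when any fd on the file is closed" rule, for anyone who wants to see it in isolation. It assumes the openFd signature of unix < 2.8 used elsewhere in this thread (on unix >= 2.8 the mode moves into defaultFileFlags { creat = Just stdFileMode }); the probe helper is hypothetical, and /tmp/foobar is just the path from the original snippet.

```haskell
import Control.Monad (void)
import System.IO (SeekMode (AbsoluteSeek), hFlush, stdout)
import System.Posix.Files (stdFileMode)
import System.Posix.IO
import System.Posix.Process (forkProcess, getProcessStatus)

path :: FilePath
path = "/tmp/foobar"

wholeFile :: FileLock
wholeFile = (WriteLock, AbsoluteSeek, 0, 0)

-- Hypothetical helper: fork a child that asks (via F_GETLK) whether another
-- process holds a conflicting lock on the file, print the answer, and wait.
probe :: String -> IO ()
probe label = do
  pid <- forkProcess $ do
    fd <- openFd path ReadWrite Nothing defaultFileFlags
    holder <- getLock fd wholeFile
    putStrLn (label ++ ": " ++ maybe "no lock held" (\(p, _) -> "locked by pid " ++ show p) holder)
    hFlush stdout
  void (getProcessStatus True False pid)

main :: IO ()
main = do
  fd1 <- openFd path ReadWrite (Just stdFileMode) defaultFileFlags
  setLock fd1 wholeFile
  probe "after setLock"
  -- Open and close an unrelated descriptor on the same file. POSIX says this
  -- releases every fcntl lock the process holds on that file, fd1's included.
  fd2 <- openFd path ReadWrite Nothing defaultFileFlags
  closeFd fd2
  probe "after closing a second fd"
```

On a POSIX system the first probe should report the parent's pid and the second should report no lock, even though fd1 is still open.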

	Donn


Brandon Allbery | 25 Oct 18:32 2013

Re: fcntl locks, executeFile and threaded runtime

On Fri, Oct 25, 2013 at 11:52 AM, Donn Cave <donn <at> avvanta.com> wrote:
> But I don't see how executeFile could
> make any difference, in that scenario.

Look for fcntl(fd, FD_CLOEXEC, 1) calls?

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b <at> gmail.com                                  ballbery <at> sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Donn Cave | 25 Oct 19:20 2013

Re: fcntl locks, executeFile and threaded runtime

> 
> On Fri, Oct 25, 2013 at 11:52 AM, Donn Cave <donn <at> avvanta.com> wrote:
> 
>> But I don't see how executeFile could
>> make any difference, in that scenario.
> 
> Look for fcntl(fd, FD_CLOEXEC, 1) calls?

Oh, that would be heinous!

	Donn
Brandon Allbery | 25 Oct 19:28 2013

Re: fcntl locks, executeFile and threaded runtime

On Fri, Oct 25, 2013 at 1:20 PM, Donn Cave <donn <at> avvanta.com> wrote:
> > On Fri, Oct 25, 2013 at 11:52 AM, Donn Cave <donn <at> avvanta.com> wrote:
> >
> >> But I don't see how executeFile could
> >> make any difference, in that scenario.
> >
> > Look for fcntl(fd, FD_CLOEXEC, 1) calls?
>
> Oh, that would be heinous!

It would be because I got that completely wrong. fcntl(fd, F_SETFD, FD_CLOEXEC). sigh.
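
For reference, a small sketch of how that same call looks from Haskell, in case someone wants to check the flag on the locked descriptor directly rather than hunting through strace output. It assumes the unix < 2.8 openFd signature used in this thread and reuses the /tmp/foobar path from the original snippet; setFdOption with CloseOnExec is the unix package's wrapper for F_SETFD/FD_CLOEXEC.

```haskell
import System.Posix.Files (stdFileMode)
import System.Posix.IO

main :: IO ()
main = do
  fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags
  -- queryFdOption reads the flag back with F_GETFD.
  before <- queryFdOption fd CloseOnExec
  -- Equivalent of fcntl(fd, F_SETFD, FD_CLOEXEC).
  setFdOption fd CloseOnExec True
  after <- queryFdOption fd CloseOnExec
  putStrLn ("close-on-exec before: " ++ show before ++ ", after: " ++ show after)
```

If the flag were set on the locked fd, the descriptor would be closed at exec time, and closing it would release the process's fcntl lock on the file.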

diego souza | 26 Oct 01:49 2013

Re: fcntl locks, executeFile and threaded runtime

Yeah, it was my first thought too, but I didn't see anything like this
in the strace output.

What I do see, though, are two additional forks when using -threaded
that seem to die early. This could very well explain why I'm
losing the lock.

But then, why does this only happen when using executeFile?

Thanks!
~dsouza

ghc -rtsopts --make test.hs; strace -f -e trace=fork,fcntl,dup,dup2,close -e signal=\!SIGVTALRM ./test >/dev/null
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
Process 11591 attached
[pid 11591] fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 11591] close(4)                    = 0
[pid 11591] close(4)                    = 0
[pid 11591] close(4)                    = 0
[pid 11590] fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
test: setLock: resource exhausted (Resource temporarily unavailable)
[pid 11590] +++ exited with 1 +++
close(1)                                = 0
close(2)                                = 0
+++ exited with 0 +++

ghc -threaded -rtsopts --make test.hs; strace -f -e trace=fork,fcntl,dup,dup2,close -e signal=\!SIGVTALRM ./test >/dev/null
Linking test ...
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
close(3)                                = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(5, F_GETFL)                       = 0x1 (flags O_WRONLY)
fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK)  = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
Process 11610 attached
[pid 11609] fcntl(6, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 11609] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 11609] fcntl(6, F_SETFD, FD_CLOEXEC) = 0
Process 11611 attached
Process 11612 attached
[pid 11612] close(3)                    = 0
[pid 11612] close(4)                    = 0
[pid 11612] close(5)                    = 0
[pid 11612] close(6)                    = 0
[pid 11612] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 11612] fcntl(5, F_GETFL)           = 0x1 (flags O_WRONLY)
[pid 11612] fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
[pid 11612] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 11612] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
Process 11613 attached
[pid 11612] fcntl(6, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 11612] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 11612] fcntl(6, F_SETFD, FD_CLOEXEC) = 0
[pid 11612] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 11613] +++ exited with 0 +++
[pid 11612] close(3)                    = 0
[pid 11612] close(3)                    = 0
[pid 11612] close(3)                    = 0
[pid 11609] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 11612] close(1)                    = 0
[pid 11612] close(2)                    = 0
[pid 11612] +++ exited with 0 +++
[pid 11609] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11612, si_status=0, si_utime=0, si_stime=0} ---
[pid 11610] close(3)                    = 0
[pid 11610] close(4)                    = 0
[pid 11610] close(5)                    = 0
[pid 11610] close(6)                    = 0
[pid 11610] +++ exited with 0 +++
[pid 11611] +++ exited with 0 +++
+++ exited with 0 +++

Brandon Allbery | 26 Oct 02:03 2013

Re: fcntl locks, executeFile and threaded runtime

On Fri, Oct 25, 2013 at 7:49 PM, diego souza <dsouza <at> c0d3.xxx> wrote:
> Yeah, it was my first thought too, but I didn't see anything like this
> in the strace output.
>
> What I do see, though, are two additional forks when using -threaded
> that seem to die early. This could very well explain why I'm
> losing the lock.

If this is Linux then you also want to track clone() calls. It's possible, depending on Linux kernel and/or glibc version, that you are seeing threads.

diego souza | 26 Oct 02:32 2013

Re: fcntl locks, executeFile and threaded runtime

Good catch! Tomorrow (I'm too sleepy to do this right now) I'm going
to try it out with different ghc versions as well.

I'll let you know about my findings.

Thanks!
~dsouza

> If this is Linux then you also want to track clone() calls. It's possible, depending on Linux kernel and/or glibc
> version, that you are seeing threads.
diego souza | 28 Oct 18:14 2013

Re: fcntl locks, executeFile and threaded runtime

Howdy,

I've tested the previous program with all versions down to 6.12.3 and
I've got the same results. Then I tried something different:

  main = do
    let lock = (WriteLock, AbsoluteSeek, 0, 0)
    fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
    setLock fd lock >> putStrLn "parent: locked!"

    pid <- forkProcess $ do
      setLock fd lock >> putStrLn "child: locked!"
      executeFile "/usr/bin/sleep" False ["5"] Nothing

    void $ getProcessStatus True False pid

Which always works as it is supposed to: the child process always fails
to acquire the lock.

The following one is quite interesting, though. The moment I insert
the threadDelay function (like in the previous example), it sometimes
fails (it seems to have something to do with CPU idleness). I guess
this explains why the previous version didn't work properly for
me:

  main = do
    let lock = (WriteLock, AbsoluteSeek, 0, 0)
    fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}

    pid0 <- forkProcess $ do
      setLock fd lock >> putStrLn "child0: locked!"
      executeFile "/usr/bin/sleep" False ["5"] Nothing

    pid1 <- forkProcess $ do
      setLock fd lock >> putStrLn "child1: locked!"
      executeFile "/usr/bin/sleep" False ["5"] Nothing

    threadDelay $ 1 * 1000 * 1000 -- take out this line and everything works
    mapM_ (getProcessStatus True False) [pid0, pid1]

$ ghc -threaded -fforce-recomp --make -O2 ~/test; for _ in `seq 1 10`; do ~/test; echo; done;                                                           
[1 of 1] Compiling Main             ( /home/dsouza/test.hs, /home/dsouza/test.o )
Linking /home/dsouza/test ...
parent: locked!
test: setLock: resource exhausted (Resource temporarily unavailable)

parent: locked!
child: locked!

child: locked!
test: setLock: resource exhausted (Resource temporarily unavailable)

child: locked!
test: setLock: resource exhausted (Resource temporarily unavailable)

parent: locked!
test: setLock: resource exhausted (Resource temporarily unavailable)

parent: locked!
child: locked!

child: locked!
test: setLock: resource exhausted (Resource temporarily unavailable)

parent: locked!
child: locked!

parent: locked!
child: locked!

child: locked!
parent: locked!

Am I doing something wrong?

Thanks!
~dsouza
diego souza | 28 Oct 18:16 2013

Re: fcntl locks, executeFile and threaded runtime

Sorry, I've sent the wrong snippet. This is the correct one:

  main = do
    let lock = (WriteLock, AbsoluteSeek, 0, 0)
    fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}

    pid1 <- forkProcess $ do
      setLock fd lock >> putStrLn "child: locked!"
      executeFile "/usr/bin/sleep" False ["5"] Nothing

    pid2 <- forkProcess $ do
      setLock fd lock >> putStrLn "parent: locked!"
      executeFile "/usr/bin/sleep" False ["5"] Nothing

    threadDelay $ 1 * 1000 * 1000
    mapM_ (getProcessStatus True False) [pid1, pid2]

~dsouza
Donn Cave | 28 Oct 19:13 2013

Re: fcntl locks, executeFile and threaded runtime

Quoth diego souza,
...
> The following one is quite interesting, though. The moment I insert
> the threadDelay function (like in the previous example), it fails some
> times (it seems to have something to do with cpu idleness). I guess
> this this explains why the previous version didn't work properly for
> me:

For me, this example is kind of ambiguous.  I thought it seemed clear enough
from your earlier results, that executeFile played some role in the problem,
but in this example the two locking forks are parallel, and it's entirely
possible that both lock syscalls will complete before either executeFile has
finished or even begun, so ... unless I'm missing something (again!) I guess
I would say this calls for a lot more tests to verify that you have this
problem only with executeFile, and not with, say, a Haskell fork that does
the same thing (sleep and exit.)
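
A rough sketch of that control experiment, assuming the same setup as the first program in this thread (unix < 2.8 openFd signature, /tmp/foobar as the lock file). The only intended difference is that the child stays in Haskell and sleeps with threadDelay instead of calling executeFile; the messages printed by the parent are made up for illustration.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import Control.Monad (void)
import System.IO
import System.Posix.Files (stdFileMode)
import System.Posix.IO
import System.Posix.Process (forkProcess, getProcessStatus)

main :: IO ()
main = do
  let lock = (WriteLock, AbsoluteSeek, 0, 0)
  fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc = True}
  pid <- forkProcess $ do
    setLock fd lock >> putStrLn "child: ok" >> hFlush stdout
    -- Stay alive for 5 seconds in Haskell; no executeFile here.
    threadDelay (5 * 1000 * 1000)
  threadDelay (1 * 1000 * 1000)
  -- The child should still hold the lock, so this attempt should be refused.
  r <- try (setLock fd lock) :: IO (Either IOException ())
  putStrLn (either (const "parent: lock refused, as expected")
                   (const "parent: got the lock (unexpected)") r)
  void (getProcessStatus True False pid)
```

Building this both with and without -threaded and comparing it against the executeFile version would show whether the exec is actually needed to trigger the problem.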

By the way, I haven't been able to duplicate the problem with 6.12.3 on MacOS.

	Donn
Diego Souza | 28 Oct 20:34 2013

Re: fcntl locks, executeFile and threaded runtime

> For me, this example is kind of ambiguous.  I thought it seemed clear enough
> from your earlier results, that executeFile played some role in the problem,
> but in this example the two locking forks are parallel, and it's entirely
> possible that both lock syscalls will complete before either executeFile has
> finished or even begun, so ... unless I'm missing something (again!) I guess
> I would say this calls for a lot more tests to verify that you have this
> problem only with executeFile, and not with, say, a Haskell fork that does
> the same thing (sleep and exit.)
>
> By the way, I haven't been able to duplicate the problem with 6.12.3 on MacOS.

I ruled out executeFile, as creating the lock prior to forking makes the problem vanish. So I thought it
must be something else, which led me to the second example. I don't think it is related to threadDelay;
actually I'm thinking it must have something to do with the threaded runtime.

I guess what you are missing is that fcntl locks are atomic and per process, so that:

  * forkIO or forkOS are no replacement for this (they don't create new processes, just threads);

  * setLock before executeFile is fine (fcntl locks are atomic), as long as the process does not terminate; that is, sleep must keep running for the lock to remain held.

I found no evidence that sleep is terminating early (but I'll double check), and I'm certain that the two locks took place. But I can make these tests better.

I don't know why you can't reproduce this on MacOS. I have tried it in another linux machine (same architecture, different kernel, libc) and got pretty much the same results.

At any rate, if you have a better way to reproduce or rule out the problem, let me know. But I'll keep digging on this.

Posix locks are hard to work with. Flock is much better, which is what I'm using now.

Thanks!
~dsouza
Donn Cave | 28 Oct 21:42 2013

Re: fcntl locks, executeFile and threaded runtime

quoth Diego Souza,
...
> I ruled out executeFile as creating the lock prior forking makes the
> problem vanish.

Well, yes, but you'd expect this if there's a problem with executeFile,
wouldn't you?  Because here both locks are attempted prior to executeFile,
so it's kind of out of the picture.

It might be interesting to use a `sleeplock' as below that accepts a FD
parameter and attempts to lock it, and exec that from your Haskell main program.
Then you can verify (I think) that if you use that in your initial configuration
(where the parent locks second), it will always work when the exec'd program
does the lock, but maybe fail when it's done prior to the exec (same program
with no FD parameter.)

> Posix locks are hard to work with. Flock is much better, which is what I'm
> using now

For sure, I'll go along with that.  The operation of posix file locks should
be down to the kernel/filesystem/etc., though, so it seems to me, if it's
GHC's fault, the runtime must be doing something different to this fd at the
syscall level.  That's a fairly narrow set of possibilities.  If we rule that
out, and it really is something about thread scheduling etc., then it would
have to be a Linux bug, wouldn't it?

	Donn
------------
import System.IO
import System.Posix.IO
import System.Posix.Files
import System.Environment (getArgs)
import System.Posix.Unistd (sleep)

possiblyLock [] = return ()
possiblyLock (a:_) = do
      setLock (read a) (WriteLock, AbsoluteSeek, 0, 0)
      putStrLn "exec lock OK"

main = do
      args <- getArgs
      possiblyLock (tail args)
      sleep (read (head args))
      putStrLn "child waking up!"
diego souza | 29 Oct 19:52 2013

Re: fcntl locks, executeFile and threaded runtime


> Well, yes, but you'd expect this if there's a problem with executeFile,
> wouldn't you?  Because here both locks are attempted prior to executeFile,
> so it's kind of out of the picture.

Right, it does make a lot of sense. :-)

> It might be interesting to use a `sleeplock' as below that accepts a FD
> parameter and attempts to lock it, and exec that from your Haskell main program.
> Then you can verify (I think) that if you use that in your initial configuration
> (where the parent locks second), it will always work when the exec'd program
> does the lock, but maybe fail when it's done prior to the exec (same program
> with no FD parameter.)

Locking from the child *always* works, so I hope you guys trust me;
I'm not including any code for that.

I've written two versions of the same simplified program, one in C
and one in Haskell. They just 'forkProcess', 'setLock' and
'executeFile'. I have tested the Haskell version on three different
Linux systems and one Mac OS X box:

  * ubuntu (kernel 3.11/ghc 7.6.3);
  * archlinux (kernel 3.11/ghc 7.6.3);
  * debian (kernel 3.9/ghc 7.4.1);
  * macos mavericks/ghc 7.6.3.

I can't reproduce the bug on macos. It never fails.

The C version (attached file c_fcntl.c), run with test_fcntl.sh (also
attached), never fails, as expected:

$ gcc -o c_fcntl c_fcntl.c
$ sh test_fcntl.sh ./c_fcntl 2>/dev/null
./c_fcntl: ok: 1; fail: 0
./c_fcntl: ok: 2; fail: 0
./c_fcntl: ok: 3; fail: 0
./c_fcntl: ok: 4; fail: 0
./c_fcntl: ok: 5; fail: 0
./c_fcntl: ok: 6; fail: 0
./c_fcntl: ok: 7; fail: 0
./c_fcntl: ok: 8; fail: 0
./c_fcntl: ok: 9; fail: 0
./c_fcntl: ok: 10; fail: 0
./c_fcntl: ok: 11; fail: 0

Now the Haskell version (attached file hs_fcntl.hs). I'm including
only one output, but it is pretty much the same on all Linux machines:

$ ghc -threaded hs_fcntl.hs
$ sh test_fcntl.sh ./hs_fcntl 2>/dev/null
./hs_fcntl: ok: 0; fail: 1
./hs_fcntl: ok: 0; fail: 2
./hs_fcntl: ok: 0; fail: 3
./hs_fcntl: ok: 0; fail: 4
./hs_fcntl: ok: 0; fail: 5
./hs_fcntl: ok: 0; fail: 6
./hs_fcntl: ok: 0; fail: 7
./hs_fcntl: ok: 0; fail: 8
./hs_fcntl: ok: 0; fail: 9
./hs_fcntl: ok: 0; fail: 10
./hs_fcntl: ok: 0; fail: 11

Now this outcome seems pretty easy to reproduce on Linux systems, I
guess. If you guys could try it on some Linux machine I would
appreciate it [or let me know if you don't think this is a valid test].

Regarding the previous program, especially the one that did two
'forkProcess', I could see something in the strace output, but I
can't relate it to anything documented or that I'm aware of.

On my machine, it seems that if the 'execve' happens before the second
'fcntl' the lock fails (as must be happening in hs_fcntl.hs). I
have tried a number of times and it really seems to be the case. It
only works when the two 'fcntl' calls happen before any 'execve'. But I
can't say for sure this is the case on other systems.

The only thing that happens during an 'execve' that comes to mind is
that it kills all threads but the current one. And I do see this
happening on the trace output.

But I guess this should make no difference (although it seems it does).

Well, right now I'm pretty much without pointers. :-)

Thanks!
~dsouza

Attachment (hs_fcntl.hs): application/octet-stream, 937 bytes

Attachment (c_fcntl.c): application/octet-stream, 1123 bytes

Attachment (test_fcntl.sh): application/octet-stream, 385 bytes

diego souza | 29 Oct 20:11 2013

Re: fcntl locks, executeFile and threaded runtime

And for the record, the non-threaded runtime never fails (at least
I've never once seen a failure):

$ ghc hs_fcntl.hs
$ sh test_fcntl.sh ./hs_fcntl 2>/dev/null
./hs_fcntl: ok: 1; fail: 0
./hs_fcntl: ok: 2; fail: 0
./hs_fcntl: ok: 3; fail: 0
./hs_fcntl: ok: 4; fail: 0
./hs_fcntl: ok: 5; fail: 0
./hs_fcntl: ok: 6; fail: 0
./hs_fcntl: ok: 7; fail: 0
./hs_fcntl: ok: 8; fail: 0
./hs_fcntl: ok: 9; fail: 0
./hs_fcntl: ok: 10; fail: 0
./hs_fcntl: ok: 11; fail: 0

Thanks!
~dsouza
Donn Cave | 29 Oct 21:37 2013

Re: fcntl locks, executeFile and threaded runtime

Quoth diego souza,
...
> The only thing that happens during an 'execve' that comes to mind is
> that it kills all threads but the current one. And I do see this
> happening on the trace output.
> 
> But I guess this should make no difference (unless it seems it does).

It could, if 1) with Linux "clone" threads, the file lock is a property
of only the thread that acquired the lock, and 2) the thread that survives
execve is not that one.

	Donn
Donn Cave | 30 Oct 15:35 2013

Re: fcntl locks, executeFile and threaded runtime

While I'm grasping at straws, might as well mention that the
threaded runtime uses signals, lots of signals, and that can
break things that are interruptible and haven't been adequately
signal-proofed.  For example, earlier in this exchange I included
a short program that uses System.Posix.Unistd.sleep, and on
MacOS anyway, that breaks with -threaded -- the sleep doesn't
sleep for any appreciable time before it gets interrupted by
the flood of runtime ALRM signals.

I can't account for any obvious reason why signal interrupts
could cause the present problem, but it's easy enough to test
if you're curious - just compile with -rtsopts, and pass the
-V0 flag to the runtime.  (E.g., ./a.out +RTS -V0 -RTS,
GHCRTS=-V0 ./a.out, ...)

	Donn
Donn Cave | 26 Oct 17:38 2013

Re: fcntl locks, executeFile and threaded runtime

Question about the syscall trace -- in the second, threaded version,

> [pid 11612] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
...
> [pid 11612] +++ exited with 0 +++

?? In the not threaded version, I don't see the child process exit - and
wasn't expecting to, since it's supposed to have exec'd to /usr/bin/sleep.

	Donn
Brandon Allbery | 26 Oct 17:58 2013

Re: fcntl locks, executeFile and threaded runtime

On Sat, Oct 26, 2013 at 11:38 AM, Donn Cave <donn <at> avvanta.com> wrote:
> Question about the syscall trace -- in the second, threaded version,
>
> > [pid 11612] fcntl(7, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
> ...
> > [pid 11612] +++ exited with 0 +++
>
> ?? In the not threaded version, I don't see the child process exit - and
> wasn't expecting to, since it's supposed to have exec'd to /usr/bin/sleep.

You don't see it in the non-threaded one because the parent throws an exception while setting the lock and exits first. The sleep is only 5 seconds, so the second one reaches waitForProcess and collects it (note that the "parent: fail!" is a putStrLn, not an error) before exiting.

diego souza | 26 Oct 01:37 2013

Re: fcntl locks, executeFile and threaded runtime

Hi Donn,

Thanks for your response. Here is more info, first what I'm trying to
accomplish.

The program I'm writing is a kind of supervisor (like daemontools),
something to daemonize a process then report the state on zookeeper
which is why I need the threaded runtime.

That said, I'm aware of the limitations. But for this particular case
I believe it is going to work. I was using the posix locks only to
test whether or not the program I'm supervising is still alive, as
they have the nice feature of returning the pid that actually holds
the lock.
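
A minimal sketch of that liveness check, assuming the supervised process took a whole-file write lock on /tmp/foobar as in the test programs in this thread, and assuming the unix < 2.8 openFd signature. getLock is the unix package's wrapper around F_GETLK; it does not take the lock, it only reports who would block it.

```haskell
import System.IO (SeekMode (AbsoluteSeek))
import System.Posix.Files (stdFileMode)
import System.Posix.IO

main :: IO ()
main = do
  fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags
  -- Ask whether a whole-file write lock would conflict with a lock held by
  -- some other process; if so, the holder's pid comes back with the answer.
  holder <- getLock fd (WriteLock, AbsoluteSeek, 0, 0)
  case holder of
    Nothing       -> putStrLn "nobody else holds the lock"
    Just (pid, _) -> putStrLn ("lock held by pid " ++ show pid)
```

Locks held by the calling process itself are never reported, so the check has to run in a process other than the one holding the lock.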

Before writing to haskell-cafe <at> I had been "stracing" this program,
trying to make some sense of it, with no luck. I could attach the
results, but there is nothing there that could explain this
behavior. Well, at least I could not figure it out.

The reason that makes me believe that executeFile is the culprit is
that without that line it works. If you take that out and use
'threadDelay' for instance, the problem is gone.

Ah, and in the meantime I'm using flock. This has no problems, which
makes sense as the lock is per file handle, not per process like the
POSIX ones.
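
For anyone wanting to try the flock route: the unix package of this era has no flock binding as far as I know, so the sketch below goes through the FFI. The names c_flock, lockEx, lockNb and flockExclusive are made up for illustration, the numeric constants are the Linux values of LOCK_EX and LOCK_NB from <sys/file.h> (an assumption to check on other platforms), and the openFd call assumes unix < 2.8.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Bits ((.|.))
import Foreign.C.Error (throwErrnoIfMinus1_)
import Foreign.C.Types (CInt (..))
import System.Posix.Files (stdFileMode)
import System.Posix.IO
import System.Posix.Types (Fd (..))

-- Direct binding to flock(2).
foreign import ccall unsafe "flock"
  c_flock :: CInt -> CInt -> IO CInt

-- Assumed values of LOCK_EX and LOCK_NB on Linux.
lockEx, lockNb :: CInt
lockEx = 2
lockNb = 4

-- Take an exclusive lock without blocking; throws an IOError (from errno)
-- if the lock is already held through another open file description.
flockExclusive :: Fd -> IO ()
flockExclusive (Fd n) =
  throwErrnoIfMinus1_ "flock" (c_flock n (lockEx .|. lockNb))

main :: IO ()
main = do
  fd <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags
  flockExclusive fd
  putStrLn "flock acquired"
```

Because the lock belongs to the open file description rather than to the process, it should stay in place across fork and exec as long as the descriptor is inherited and stays open. Unlike the fcntl lock, it does not report the holder's pid.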

~dsouza

At Fri, 25 Oct 2013 08:52:38 -0700 (PDT),
Donn Cave wrote:
> 
> diego souza <dsouza <at> c0d3.xxx>,
> 
> > I'm having problems with executeFile as it seems to clear the advisory
> > locks using the threaded runtime.
> 
> I'm stumped, and unfortunately can't duplicate it here (no surprise as
> I have a different platform and GHC version.)  But in case it helps ...
> your fcntl(2) file lock will be lost if your process closes any fd open
> on that file.  So if the threaded runtime for some reason were to dup
> random fds and then close them, around a fork, that would do it.  You
> might be able to pick that up in an strace (or whatever your platform
> utility for system call tracing.) But I don't see how executeFile could
> make any difference, in that scenario.
> 
> 	Donn
> 
> 
> > Consider the following snippet (a simplification of what I'm doing):
> > 
> >     import System.IO
> >     import Control.Monad
> >     import System.Posix.IO
> >     import Control.Concurrent
> >     import System.Posix.Files
> >     import System.Posix.Process
> >      
> >     main = do
> >       let lock = (WriteLock, AbsoluteSeek, 0, 0)
> >       fd  <- openFd "/tmp/foobar" ReadWrite (Just stdFileMode) defaultFileFlags {trunc=True}
> >       pid <- forkProcess $ do
> >         setLock fd lock >> putStrLn "child: ok"
> >         executeFile "/usr/bin/sleep" False ["5"] Nothing
> >       threadDelay $ 1 * 1000 * 1000
> >       setLock fd lock >> putStrLn "parent: fail!"
> >       void $ getProcessStatus True False pid
> > 
> > Then I consistently get these results:
> > 
> > $ ghc -threaded --make test.hs; ./test
> > child: ok
> > parent: fail!
> > 
> > $ ghc -rtsopts --make test.hs; ./test
> > child: ok
> > test: setLock: resource exhausted (Resource temporarily unavailable)
> > 
> > Any pointers? At first I thought it might be an issue with the unix
> > package but that doesn't seem to be the case.
> > 
> > $ ghc-pkg list | grep unix
> >     unix-2.6.0.1
> > 
> > $ ./test +RTS --info
> >  [("GHC RTS", "YES")
> >  ,("GHC version", "7.6.3")
> >  ,("RTS way", "rts_thr")
> >  ,("Build platform", "x86_64-unknown-linux")
> >  ,("Build architecture", "x86_64")
> >  ,("Build OS", "linux")
> >  ,("Build vendor", "unknown")
> >  ,("Host platform", "x86_64-unknown-linux")
> >  ,("Host architecture", "x86_64")
> >  ,("Host OS", "linux")
> >  ,("Host vendor", "unknown")
> >  ,("Target platform", "x86_64-unknown-linux")
> >  ,("Target architecture", "x86_64")
> >  ,("Target OS", "linux")
> >  ,("Target vendor", "unknown")
> >  ,("Word size", "64")
> >  ,("Compiler unregisterised", "NO")
> >  ,("Tables next to code", "YES")
> >  ]
> > 
> > Thanks!
> > ~dsouza