John Lato | 20 Dec 02:36 2013

memory ordering

Hello,

I'm working on a lock-free algorithm that's meant to be used in a concurrent setting, and I've run into a possible issue.

The crux of the matter is that a particular function needs to perform the following:

> x <- MVector.read vec ix
> position <- readIORef posRef

and the algorithm is only safe if these two reads are not reordered (both the vector and IORef are written to by other threads).

My concern is this: according to standard Haskell semantics the code should be safe, as IO sequencing should guarantee that the reads happen in order.  Of course this also relies upon the architecture's memory model, but x86 guarantees that loads are not reordered with other loads.  However, doubts remain; I do not have confidence that the code generator will handle this properly.  In particular, LLVM may freely re-order loads of NotAtomic and Unordered values.

The one hope I have is that GHC will preserve IO semantics through the entire pipeline.  This seems like it would be necessary for proper handling of exceptions, for example.  So, can anyone tell me if my worries are unfounded, or if there's any way to ensure the behavior I want?  I could change the readIORef to an atomicModifyIORef, which should issue an mfence, but that seems heavy-handed when just a read fence would be sufficient (and even that seems more than necessary).
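(For concreteness, the heavy-handed variant would look something like the following self-contained sketch, with a plain IORef standing in for the real state; the assumption here is that the no-op read-modify-write's CAS acts as a full barrier, at the cost of a redundant store.)

```haskell
import Data.IORef

main :: IO ()
main = do
  posRef <- newIORef (0 :: Int)
  -- In the real algorithm, `x <- MVector.read vec ix` would come first.
  -- A no-op atomicModifyIORef reads the value with CAS/fence semantics,
  -- which should keep the surrounding loads from being reordered past it:
  position <- atomicModifyIORef posRef (\p -> (p, p))
  print position
```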

Thanks,
John L.
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: memory ordering

Hi John,

I guess you probably want `pseq x`, since your second action does not
depend on your first.  See below for an example.

Gruss,
Christian

import Debug.Trace (traceShow)
import GHC.Conc (pseq)

main :: IO ()
main = do
  x <- return (traceShow "1" (1 :: Int))  -- lazy: the trace only fires when x is demanded
  -- x `pseq` print (2 :: Int)            -- forcing x here would order the two effects
  print (2 :: Int)
  print x

Carter Schonwald | 20 Dec 18:05 2013

Re: memory ordering

Hey John, so you're wanting atomic reads and writes? 

I'm pretty sure that you want to use atomic memory operations for this.  I believe Ryan Newton has some tooling you can use right now for that.
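(For anyone following along: assuming the tooling meant is the atomic-primops package, a minimal compare-and-swap sketch looks like this; `readForCAS` and `casIORef` are from that package's Data.Atomics module.)

```haskell
import Data.IORef
import Data.Atomics (readForCAS, casIORef)

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  ticket <- readForCAS ref          -- read plus a "ticket" witnessing the observed value
  (ok, _) <- casIORef ref ticket 1  -- succeeds only if the value is still the one observed
  v <- readIORef ref
  print (ok, v)
```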


John Lato | 23 Dec 00:33 2013

Re: memory ordering

Hi Carter,

Atomics are more or less what I'm after.

I've taken a look at Ryan's work (https://github.com/rrnewton/haskell-lockfree/), and there are certainly some very useful items there, but I don't see anything quite like what I'm looking for (except perhaps for a read barrier).

The problem is that I don't actually need atomic operations for this; I'm just doing reads, after all.  My concern is that many optimization pipelines (e.g. LLVM) don't guarantee the ordering of reads unless you use atomic variables.

The IORef docs warn that IORef operations may appear out-of-order depending on the architecture's memory model.  On (newer) x86, loads won't move relative to other loads, so that should be ok, and Haskell's semantics should guarantee that two IO operations will happen in program order.

It's the Haskell semantics guarantee I'm concerned about; I guess I'm not entirely sure I believe that it's implemented properly (although I have no reason to believe it's wrong either).  Perhaps I'm just overly paranoid.

John Lato

Carter Schonwald | 23 Dec 04:58 2013

Re: memory ordering

Well, if you hit any weird, surprising, or non-obvious issues with the memory-ordering model, please share! It's also very relevant to some of the GHC hacking I hope to make time for from January onwards.

Edward Z. Yang | 30 Dec 15:04 2013

Re: memory ordering

Hello John,

Here are some prior discussions (which I will attempt to summarize
below):

    http://www.haskell.org/pipermail/haskell-cafe/2011-May/091878.html
    http://www.haskell.org/pipermail/haskell-prime/2006-April/001237.html
    http://www.haskell.org/pipermail/haskell-prime/2006-March/001079.html

The guarantees that Haskell and GHC give in this area are hand-wavy at
best; at the moment, I don't think Haskell or GHC have a formal memory
model—this seems to be an open research problem. (Unfortunately, AFAICT
all the researchers working on relaxed memory models have their hands
full with things like C++ :-)

If you want to go ahead and build something that /just/ works for a
/specific version/ of GHC, you will need to answer this question
separately for every phase of the compiler.  For Core and STG, monads
will preserve ordering, so there is no trouble.  However, for C--, we
will almost certainly apply optimizations which reorder reads (look at
CmmSink.hs).  To properly support your algorithm, you will have to add
some new read barrier mach-ops, and teach the optimizer to respect them.
(This could be fiendishly subtle; it might be better to give C-- a
memory model first.)  These mach-ops would then translate into
appropriate arch-specific assembly or LLVM instructions, preserving
the guarantees further.

This is not related to your original question, but the situation is a
bit better with regards to reordering stores: we have a WriteBarrier
MachOp, which in principle, prevents store reordering.  In practice, we
don't seem to actually have any C-- optimizations that reorder stores.
So, at least you can assume these will work OK!

Hope this helps (and is not too inaccurate),
Edward

John Lato | 30 Dec 20:01 2013

Re: memory ordering

Hi Edward,

Thanks very much for this reply, it answers a lot of questions I'd had.  I'd hoped that ordering would be preserved through C--, but c'est la vie.  Optimizing compilers are ever the bane of concurrent algorithms!

stg/SMP.h does define a loadLoadBarrier, which is exposed in Ryan Newton's atomic-primops package.  From the docs, I think that's a general read barrier, and should do what I want.  Assuming it works properly, of course.  If I'm lucky it might even be optimized out.
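(A sketch of how the pair of reads would use it, assuming atomic-primops' `Data.Atomics.loadLoadBarrier` and the vector package; the toy values here are just for illustration.)

```haskell
import Data.IORef
import qualified Data.Vector.Mutable as MVector  -- vector package
import Data.Atomics (loadLoadBarrier)            -- atomic-primops package

main :: IO ()
main = do
  vec    <- MVector.replicate 4 (0 :: Int)
  posRef <- newIORef (0 :: Int)
  MVector.write vec 0 42
  x <- MVector.read vec 0
  loadLoadBarrier        -- keeps the two loads ordered; should be cheap on x86
  position <- readIORef posRef
  print (x, position)
```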

Thanks,
John


Edward Z. Yang | 31 Dec 15:04 2013

Re: memory ordering

I was thinking about my response, and realized there was one major
misleading thing in my description.  The load reordering I described
applies to load instructions in C-- proper, i.e. things that show up
in the C-- dump as:

    W_ x = I64[...addr...]

Reads to IORefs and reads to vectors get compiled inline (as they
eventually translate into inline primops), so my admonitions are
applicable.

However, the story with *foreign primops* (which is how loadLoadBarrier
in atomic-primops is defined, and how you might imagine defining a custom
read function as a primop) is a little different.  First, what does a
call to a foreign primop compile into? It is *not* inlined, so it will
really trying to squeeze out performance!)  Second, the optimizer is a
bit more conservative when it comes to primop calls (internally referred
to as "unsafe foreign calls"); at the moment, the optimizer assumes
these foreign calls clobber heap memory, so we *automatically* will not
push loads/stores beyond this boundary. (NB: We reserve the right to
change this in the future!)

This is probably why atomic-primops, as it is written today, seems to
work OK, even in the presence of the optimizer.  But I also have a hard
time believing it gives the speedups you want, due to the current
design. (CC'd Ryan Newton, because I would love to be wrong here, and
maybe he can correct me on this note.)

Cheers,
Edward

P.S. loadLoadBarrier compiles to a no-op on x86 architectures, but
because it's not inlined I think you will still end up with a jump (LLVM
might be able to eliminate it).

Edward Z. Yang | 31 Dec 15:45 2013

Re: memory ordering

> Second, the optimizer is a bit more conservative when it comes to
> primop calls (internally referred to as "unsafe foreign calls")

Sorry, I need to correct myself here.  Foreign primops, and most
out-of-line primops, compile into jumps which end basic blocks, and
these constitute hard boundaries since we don't really do inter-block
optimization.  Unsafe foreign calls are generally reserved for function
calls which use the C calling convention; primops manage the return
convention themselves.

Edward
Carter Schonwald | 31 Dec 16:12 2013

Re: memory ordering

I suspect it's not by design, because there are certainly plans to make them inline primops, and the reordering issue of the C-- optimizer hasn't come up in the design discussions previously.  (I should add those notes to the associated tickets.)

> > also
> > > guarantees that reads happen in order.  However doubts remain; I do not
> > > have confidence that the code generator will handle this properly.  In
> > > particular, LLVM may freely re-order loads of NotAtomic and Unordered
> > > values.
> > >
> > > The one hope I have is that ghc will preserve IO semantics through the
> > > entire pipeline.  This seems like it would be necessary for proper
> > handling
> > > of exceptions, for example.  So, can anyone tell me if my worries are
> > > unfounded, or if there's any way to ensure the behavior I want?  I could
> > > change the readIORef to an atomicModifyIORef, which should issue an
> > mfence,
> > > but that seems a bit heavy-handed as just a read fence would be
> > sufficient
> > > (although even that seems more than necessary).
> > >
> > > Thanks,
> > > John L.
> >
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
John Lato | 1 Jan 20:38 2014

Re: memory ordering

One point I'm getting from this discussion is that perhaps not much time has been spent considering these issues in GHC's backends. If so, it's probably a good thing to work through them now.

For myself, I guess the only option I have now is to measure using loadLoadBarrier and see if it's better or worse than calling atomicModifyIORef.

On Dec 31, 2013 6:42 AM, "Edward Z. Yang" <ezyang <at> mit.edu> wrote:
I was thinking about my response, and realized there was one major
misleading thing in my description.  The load reordering I described
applies to load instructions in C-- proper, i.e. things that show up
in the C-- dump as:

    W_ x = I64[...addr...]

Reads to IORefs and reads to vectors get compiled inline (as they
eventually translate into inline primops), so my admonitions are
applicable.

However, the story with *foreign primops* (which is how loadLoadBarrier
in atomic-primops is defined, how you might imagine defining a custom
read function as a primop) is a little different.  First, what does a
call to a foreign primop compile into? It is *not* inlined, so it will
eventually get compiled into a jump (this could be a problem if you're
really trying to squeeze out performance!)  Second, the optimizer is a
bit more conservative when it comes to primop calls (internally referred
to as "unsafe foreign calls"); at the moment, the optimizer assumes
these foreign calls clobber heap memory, so we *automatically* will not
push loads/stores beyond this boundary. (NB: We reserve the right to
change this in the future!)

This is probably why atomic-primops, as it is written today, seems to
work OK, even in the presence of the optimizer.  But I also have a hard
time believing it gives the speedups you want, due to the current
design. (CC'd Ryan Newton, because I would love to be wrong here, and
maybe he can correct me on this note.)

Cheers,
Edward

P.S. loadLoadBarrier compiles to a no-op on x86 architectures, but
because it's not inlined I think you will still end up with a jump (LLVM
might be able to eliminate it).
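For what it's worth, placing that barrier between the two reads from the start of the thread would look something like the sketch below (assuming the atomic-primops and vector packages; `vec` and `posRef` are the hypothetical names from John's example):

```haskell
import Data.Atomics (loadLoadBarrier)
import Data.IORef
import qualified Data.Vector.Mutable as MV

-- Read the vector element, then the IORef, with a read fence in between.
readInOrder :: MV.IOVector Int -> IORef Int -> IO (Int, Int)
readInOrder vec posRef = do
  x <- MV.read vec 0
  loadLoadBarrier  -- a no-op instruction on x86, but an out-of-line call
  position <- readIORef posRef
  return (x, position)
```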

Excerpts from John Lato's message of 2013-12-31 03:01:58 +0800:
> Hi Edward,
>
> Thanks very much for this reply, it answers a lot of questions I'd had.
>  I'd hoped that ordering would be preserved through C--, but c'est la vie.
>  Optimizing compilers are ever the bane of concurrent algorithms!
>
> stg/SMP.h does define a loadLoadBarrier, which is exposed in Ryan Newton's
> atomic-primops package.  From the docs, I think that's a general read
> barrier, and should do what I want.  Assuming it works properly, of course.
>  If I'm lucky it might even be optimized out.
>
> Thanks,
> John
>
> On Mon, Dec 30, 2013 at 6:04 AM, Edward Z. Yang <ezyang <at> mit.edu> wrote:
>
> > Hello John,
> >
> > Here are some prior discussions (which I will attempt to summarize
> > below):
> >
> >     http://www.haskell.org/pipermail/haskell-cafe/2011-May/091878.html
> >     http://www.haskell.org/pipermail/haskell-prime/2006-April/001237.html
> >     http://www.haskell.org/pipermail/haskell-prime/2006-March/001079.html
> >
> > The guarantees that Haskell and GHC give in this area are hand-wavy at
> > best; at the moment, I don't think Haskell or GHC have a formal memory
> > model—this seems to be an open research problem. (Unfortunately, AFAICT
> > all the researchers working on relaxed memory models have their hands
> > full with things like C++ :-)
> >
> > If you want to go ahead and build something that /just/ works for a
> > /specific version/ of GHC, you will need to answer this question
> > separately for every phase of the compiler.  For Core and STG, monads
> > will preserve ordering, so there is no trouble.  However, for C--, we
> > will almost certainly apply optimizations which reorder reads (look at
> > CmmSink.hs).  To properly support your algorithm, you will have to add
> > some new read barrier mach-ops, and teach the optimizer to respect them.
> > (This could be fiendishly subtle; it might be better to give C-- a
> > memory model first.)  These mach-ops would then translate into
> > appropriate arch-specific assembly or LLVM instructions, preserving
> > the guarantees further.
> >
> > This is not related to your original question, but the situation is a
> > bit better with regards to reordering stores: we have a WriteBarrier
> > MachOp, which in principle, prevents store reordering.  In practice, we
> > don't seem to actually have any C-- optimizations that reorder stores.
> > So, at least you can assume these will work OK!
> >
> > Hope this helps (and is not too inaccurate),
> > Edward
> >
> > Excerpts from John Lato's message of 2013-12-20 09:36:11 +0800:
> > > Hello,
> > >
> > > I'm working on a lock-free algorithm that's meant to be used in a
> > > concurrent setting, and I've run into a possible issue.
> > >
> > > The crux of the matter is that a particular function needs to perform the
> > > following:
> > >
> > > > x <- MVector.read vec ix
> > > > position <- readIORef posRef
> > >
> > > and the algorithm is only safe if these two reads are not reordered (both
> > > the vector and IORef are written to by other threads).
> > >
> > > My concern is, according to standard Haskell semantics this should be
> > safe,
> > > as IO sequencing should guarantee that the reads happen in-order.  Of
> > > course this also relies upon the architecture's memory model, but x86
> > also
> > > guarantees that reads happen in order.  However doubts remain; I do not
> > > have confidence that the code generator will handle this properly.  In
> > > particular, LLVM may freely re-order loads of NotAtomic and Unordered
> > > values.
> > >
> > > The one hope I have is that ghc will preserve IO semantics through the
> > > entire pipeline.  This seems like it would be necessary for proper
> > handling
> > > of exceptions, for example.  So, can anyone tell me if my worries are
> > > unfounded, or if there's any way to ensure the behavior I want?  I could
> > > change the readIORef to an atomicModifyIORef, which should issue an
> > mfence,
> > > but that seems a bit heavy-handed as just a read fence would be
> > sufficient
> > > (although even that seems more than necessary).
> > >
> > > Thanks,
> > > John L.
> >