Brandon Simmons | 9 May 01:18 2014
Picon

Using mutable array after an unsafeFreezeArray, and GC details

I have an unusual application with some unusual performance problems
and I'm trying to understand how I might use unsafeFreezeArray to help
me, as well as understand in detail what's going on with boxed mutable
arrays and GC. I'm using the interface from 'primitive' below.

First some basic questions, then a bit more background

1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a
>> return a` and then use `a`? What problems could that cause?

2) And what if a do a `cloneMutableArray` on `a` and likewise use the
resulting array?

Background: I've been looking into an issue [1] in a library in which
as more mutable arrays are allocated, GC dominates (I think I verified
this?) and all code gets slower in proportion to the number of mutable
arrays that are hanging around.

I've been trying to understand how this is working internally. I don't
quite understand how the "remembered set" works with respect to
MutableArray. As best I understand: the remembered set in generation G
points to certain objects in older generations, which objects hold
references to objects in G. Then for MutableArrays specifically,
card-marking is used to mark regions of the array with garbage in some
way.

So my hypothesis is the slowdown is associated with the size of the
remembered set, and whatever the GC has to do on it. And in my tests,
freezing the array seems to make that overhead (at least the overhead
proportional to number of arrays) disappear.
(Continue reading)

Edward Z. Yang | 9 May 08:06 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

Hello Brandon,

Excerpts from Brandon Simmons's message of 2014-05-08 16:18:48 -0700:
> I have an unusual application with some unusual performance problems
> and I'm trying to understand how I might use unsafeFreezeArray to help
> me, as well as understand in detail what's going on with boxed mutable
> arrays and GC. I'm using the interface from 'primitive' below.
> 
> First some basic questions, then a bit more background
> 
> 1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a
> >> return a` and then use `a`? What problems could that cause?

Your code as written wouldn't compile, but assuming you're talking about
the primops newArray# and unsafeFreezeArray#, what this operation does
is allocate a new array of pointers (initially recorded as mutable), and
then freezes it in-place (by changing the info-table associated with
it), but while maintaining a pointer to the original mutable array.  Nothing bad
will happen immediately, but if you use this mutable reference to mutate
the pointer array, you can cause a crash (in particular, if the array
makes it to the old generation, it will not be on the mutable list and
so if you mutate it, you may be missing roots.)

> 2) And what if a do a `cloneMutableArray` on `a` and likewise use the
> resulting array?

If you do the clone before freezing, that's fine for all use-cases;
if you do the clone after, you will end up with the same result as (1).

> Background: I've been looking into an issue [1] in a library in which
(Continue reading)

Brandon Simmons | 10 May 01:25 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details


On May 9, 2014 5:13 PM, "Edward Z. Yang" <ezyang <at> mit.edu> wrote:
>
> Hello Brandon,
>
> Excerpts from Brandon Simmons's message of 2014-05-08 16:18:48 -0700:
> > I have an unusual application with some unusual performance problems
> > and I'm trying to understand how I might use unsafeFreezeArray to help
> > me, as well as understand in detail what's going on with boxed mutable
> > arrays and GC. I'm using the interface from 'primitive' below.
> >
> > First some basic questions, then a bit more background
> >
> > 1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a
> > >> return a` and then use `a`? What problems could that cause?
>
> Your code as written wouldn't compile, but assuming you're talking about
> the primops newArray# and unsafeFreezeArray#, what this operation does
> is allocate a new array of pointers (initially recorded as mutable), and
> then freezes it in-place (by changing the info-table associated with
> it), but while maintaining a pointer to the original mutable array.  Nothing bad
> will happen immediately, but if you use this mutable reference to mutate
> the pointer array, you can cause a crash (in particular, if the array
> makes it to the old generation, it will not be on the mutable list and
> so if you mutate it, you may be missing roots.)
>
> > 2) And what if a do a `cloneMutableArray` on `a` and likewise use the
> > resulting array?
>
> If you do the clone before freezing, that's fine for all use-cases;
> if you do the clone after, you will end up with the same result as (1).
>
> > Background: I've been looking into an issue [1] in a library in which
> > as more mutable arrays are allocated, GC dominates (I think I verified
> > this?) and all code gets slower in proportion to the number of mutable
> > arrays that are hanging around.
> >
> > I've been trying to understand how this is working internally. I don't
> > quite understand how the "remembered set" works with respect to
> > MutableArray. As best I understand: the remembered set in generation G
> > points to certain objects in older generations, which objects hold
> > references to objects in G. Then for MutableArrays specifically,
> > card-marking is used to mark regions of the array with garbage in some
> > way.
> >
> > So my hypothesis is the slowdown is associated with the size of the
> > remembered set, and whatever the GC has to do on it. And in my tests,
> > freezing the array seems to make that overhead (at least the overhead
> > proportional to number of arrays) disappear.
>
> You're basically correct.  In the current GC design, mutable arrays of
> pointers are always placed on the mutable list.  The mutable list of
> generations which are not being collected are always traversed; thus,
> the number of pointer arrays corresponds to a linear overhead for minor GCs.
>
> Here is a feature request tracking many of the infelicities that our
> current GC design has:  https://ghc.haskell.org/trac/ghc/ticket/7662
> The upshot is that the Haskell GC is very nicely tuned for mostly
> immutable workloads, but there are some bad asymptotics when your
> heap has lots of mutable objects.  This is generally a hard problem:
> tuned GC implementations for mutable languages are a lot of work!
> (Just ask the JVM implementors.)
>

Very helpful, thanks! And take some internet points.

> > Now I'm really lost in the woods though. My hope is that I might be
> > able to safely use unsafeFreezeArray to help me here [3]. Here are the
> > particulars of how I use MutableArray in my algorithm, which are
> > somewhat unusual:
> >   - keep around a small template `MutableArray Nothing`
> >   - use cloneMutableArray for fast allocation of new arrays
> >   - for each array only a *single* write (CAS actually) happens at each position
> >
> > In fact as far as I can reason, there ought to be no garbage to
> > collect at all until the entire array becomes garbage (the initial
> > value is surely shared, especially since I'm keeping this template
> > array around to clone from, right?). In fact I was even playing with
> > the idea of rolling a new CAS that skips the card-marking stuff.
>
> I don't understand your full workload, but if you have a workload that
> involves creating an array, mutating it over a short period of time,
> and then never mutating it afterwards, you should simply freeze it after
> you are done writing it.  Once frozen, the array will no longer be kept
> on the mutable list and you won't pay for it when doing GC.  However,
> the fact that you are doing a CAS makes it seem to me that your workflow
> may be more complicated than that...

Yes I think for my use case the overhead required to determine when the array can be frozen would not be worth it. I think I have some other knobs I can twist here.

I'll keep an eye on that ticket and maybe chime in if I have any ideas.

Thanks,
Brandon

>
> Cheers,
> Edward

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Carter Schonwald | 10 May 01:49 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

Any chance you could try to use storable or unboxed vectors?

On Friday, May 9, 2014, Brandon Simmons <brandon.m.simmons <at> gmail.com> wrote:


On May 9, 2014 5:13 PM, "Edward Z. Yang" <ezyang <at> mit.edu> wrote:
>
> Hello Brandon,
>
> Excerpts from Brandon Simmons's message of 2014-05-08 16:18:48 -0700:
> > I have an unusual application with some unusual performance problems
> > and I'm trying to understand how I might use unsafeFreezeArray to help
> > me, as well as understand in detail what's going on with boxed mutable
> > arrays and GC. I'm using the interface from 'primitive' below.
> >
> > First some basic questions, then a bit more background
> >
> > 1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a
> > >> return a` and then use `a`? What problems could that cause?
>
> Your code as written wouldn't compile, but assuming you're talking about
> the primops newArray# and unsafeFreezeArray#, what this operation does
> is allocate a new array of pointers (initially recorded as mutable), and
> then freezes it in-place (by changing the info-table associated with
> it), but while maintaining a pointer to the original mutable array.  Nothing bad
> will happen immediately, but if you use this mutable reference to mutate
> the pointer array, you can cause a crash (in particular, if the array
> makes it to the old generation, it will not be on the mutable list and
> so if you mutate it, you may be missing roots.)
>
> > 2) And what if a do a `cloneMutableArray` on `a` and likewise use the
> > resulting array?
>
> If you do the clone before freezing, that's fine for all use-cases;
> if you do the clone after, you will end up with the same result as (1).
>
> > Background: I've been looking into an issue [1] in a library in which
> > as more mutable arrays are allocated, GC dominates (I think I verified
> > this?) and all code gets slower in proportion to the number of mutable
> > arrays that are hanging around.
> >
> > I've been trying to understand how this is working internally. I don't
> > quite understand how the "remembered set" works with respect to
> > MutableArray. As best I understand: the remembered set in generation G
> > points to certain objects in older generations, which objects hold
> > references to objects in G. Then for MutableArrays specifically,
> > card-marking is used to mark regions of the array with garbage in some
> > way.
> >
> > So my hypothesis is the slowdown is associated with the size of the
> > remembered set, and whatever the GC has to do on it. And in my tests,
> > freezing the array seems to make that overhead (at least the overhead
> > proportional to number of arrays) disappear.
>
> You're basically correct.  In the current GC design, mutable arrays of
> pointers are always placed on the mutable list.  The mutable list of
> generations which are not being collected are always traversed; thus,
> the number of pointer arrays corresponds to a linear overhead for minor GCs.
>
> Here is a feature request tracking many of the infelicities that our
> current GC design has:  https://ghc.haskell.org/trac/ghc/ticket/7662
> The upshot is that the Haskell GC is very nicely tuned for mostly
> immutable workloads, but there are some bad asymptotics when your
> heap has lots of mutable objects.  This is generally a hard problem:
> tuned GC implementations for mutable languages are a lot of work!
> (Just ask the JVM implementors.)
>

Very helpful, thanks! And take some internet points.

> > Now I'm really lost in the woods though. My hope is that I might be
> > able to safely use unsafeFreezeArray to help me here [3]. Here are the
> > particulars of how I use MutableArray in my algorithm, which are
> > somewhat unusual:
> >   - keep around a small template `MutableArray Nothing`
> >   - use cloneMutableArray for fast allocation of new arrays
> >   - for each array only a *single* write (CAS actually) happens at each position
> >
> > In fact as far as I can reason, there ought to be no garbage to
> > collect at all until the entire array becomes garbage (the initial
> > value is surely shared, especially since I'm keeping this template
> > array around to clone from, right?). In fact I was even playing with
> > the idea of rolling a new CAS that skips the card-marking stuff.
>
> I don't understand your full workload, but if you have a workload that
> involves creating an array, mutating it over a short period of time,
> and then never mutating it afterwards, you should simply freeze it after
> you are done writing it.  Once frozen, the array will no longer be kept
> on the mutable list and you won't pay for it when doing GC.  However,
> the fact that you are doing a CAS makes it seem to me that your workflow
> may be more complicated than that...

Yes I think for my use case the overhead required to determine when the array can be frozen would not be worth it. I think I have some other knobs I can twist here.

I'll keep an eye on that ticket and maybe chime in if I have any ideas.

Thanks,
Brandon

>
> Cheers,
> Edward

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Edward Z. Yang | 10 May 01:55 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

Excerpts from Carter Schonwald's message of 2014-05-09 16:49:07 -0700:
> Any chance you could try to use storable or unboxed vectors?

Neither of those will work if, at the end of the day, you need to
store pointers to heap objects

Edward
Carter Schonwald | 10 May 03:34 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

true enough 


On Fri, May 9, 2014 at 7:55 PM, Edward Z. Yang <ezyang <at> mit.edu> wrote:
Excerpts from Carter Schonwald's message of 2014-05-09 16:49:07 -0700:
> Any chance you could try to use storable or unboxed vectors?

Neither of those will work if, at the end of the day, you need to
store pointers to heap objects

Edward

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Brandon Simmons | 10 May 20:47 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

On Fri, May 9, 2014 at 7:49 PM, Carter Schonwald
<carter.schonwald <at> gmail.com> wrote:
> Any chance you could try to use storable or unboxed vectors?
>

Yeah, I'd thought about a variant for Storables, but it didn't seem
like it would be a win on paper (probably requiring a CAS on one
unboxed array, a loadLoadBarrier, and a second write to a different
unboxed array), but it would probably be worth playing with in light
of this issue. Thanks for the suggestion!

Brandon
Brandon Simmons | 10 May 20:52 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

On Sat, May 10, 2014 at 2:47 PM, Brandon Simmons
<brandon.m.simmons <at> gmail.com> wrote:
> On Fri, May 9, 2014 at 7:49 PM, Carter Schonwald
> <carter.schonwald <at> gmail.com> wrote:
>> Any chance you could try to use storable or unboxed vectors?
>>
>
> Yeah, I'd thought about a variant for Storables, but it didn't seem
> like it would be a win on paper (probably requiring a CAS on one
> unboxed array, a loadLoadBarrier, and a second write to a different

Sorry s/loadLoadBarrier/writeBarrier/  i.e. write_barrier() from SMP.h

By the way, I'm playing with concurrent FIFO queue designs, I realized
I haven't mentioned!

> unboxed array), but it would probably be worth playing with in light
> of this issue. Thanks for the suggestion!
>
> Brandon
Brandon Simmons | 10 May 22:57 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

On Fri, May 9, 2014 at 2:06 AM, Edward Z. Yang <ezyang <at> mit.edu> wrote:
> Hello Brandon,
>
> Excerpts from Brandon Simmons's message of 2014-05-08 16:18:48 -0700:
>> I have an unusual application with some unusual performance problems
>> and I'm trying to understand how I might use unsafeFreezeArray to help
>> me, as well as understand in detail what's going on with boxed mutable
>> arrays and GC. I'm using the interface from 'primitive' below.
>>
>> First some basic questions, then a bit more background
>>
>> 1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a
>> >> return a` and then use `a`? What problems could that cause?
>
> Your code as written wouldn't compile, but assuming you're talking about
> the primops newArray# and unsafeFreezeArray#, what this operation does
> is allocate a new array of pointers (initially recorded as mutable), and
> then freezes it in-place (by changing the info-table associated with
> it), but while maintaining a pointer to the original mutable array.  Nothing bad
> will happen immediately, but if you use this mutable reference to mutate
> the pointer array, you can cause a crash (in particular, if the array
> makes it to the old generation, it will not be on the mutable list and
> so if you mutate it, you may be missing roots.)
>
>> 2) And what if a do a `cloneMutableArray` on `a` and likewise use the
>> resulting array?
>
> If you do the clone before freezing, that's fine for all use-cases;
> if you do the clone after, you will end up with the same result as (1).
>
>> Background: I've been looking into an issue [1] in a library in which
>> as more mutable arrays are allocated, GC dominates (I think I verified
>> this?) and all code gets slower in proportion to the number of mutable
>> arrays that are hanging around.
>>
>> I've been trying to understand how this is working internally. I don't
>> quite understand how the "remembered set" works with respect to
>> MutableArray. As best I understand: the remembered set in generation G
>> points to certain objects in older generations, which objects hold
>> references to objects in G. Then for MutableArrays specifically,
>> card-marking is used to mark regions of the array with garbage in some
>> way.
>>
>> So my hypothesis is the slowdown is associated with the size of the
>> remembered set, and whatever the GC has to do on it. And in my tests,
>> freezing the array seems to make that overhead (at least the overhead
>> proportional to number of arrays) disappear.
>
> You're basically correct.  In the current GC design, mutable arrays of
> pointers are always placed on the mutable list.  The mutable list of
> generations which are not being collected are always traversed; thus,
> the number of pointer arrays corresponds to a linear overhead for minor GCs.
>
> Here is a feature request tracking many of the infelicities that our
> current GC design has:  https://ghc.haskell.org/trac/ghc/ticket/7662
> The upshot is that the Haskell GC is very nicely tuned for mostly
> immutable workloads, but there are some bad asymptotics when your
> heap has lots of mutable objects.  This is generally a hard problem:
> tuned GC implementations for mutable languages are a lot of work!
> (Just ask the JVM implementors.)
>
>> Now I'm really lost in the woods though. My hope is that I might be
>> able to safely use unsafeFreezeArray to help me here [3]. Here are the
>> particulars of how I use MutableArray in my algorithm, which are
>> somewhat unusual:
>>   - keep around a small template `MutableArray Nothing`
>>   - use cloneMutableArray for fast allocation of new arrays
>>   - for each array only a *single* write (CAS actually) happens at each position
>>
>> In fact as far as I can reason, there ought to be no garbage to
>> collect at all until the entire array becomes garbage (the initial
>> value is surely shared, especially since I'm keeping this template
>> array around to clone from, right?). In fact I was even playing with
>> the idea of rolling a new CAS that skips the card-marking stuff.
>
> I don't understand your full workload, but if you have a workload that
> involves creating an array, mutating it over a short period of time,
> and then never mutating it afterwards, you should simply freeze it after
> you are done writing it.  Once frozen, the array will no longer be kept
> on the mutable list and you won't pay for it when doing GC.  However,
> the fact that you are doing a CAS makes it seem to me that your workflow
> may be more complicated than that...
>
> Cheers,
> Edward

Another silly question: when card-marking happens after a write or
CAS, does that indicate "this segment maybe contains old-to-new
generation references, so be sure to preserve (scavenge?) them from
collection "? In my initial question I was thinking of the cards as
indicating "here be garbage" (e.g. a previous overwritten array
value), but I think I had the wrong idea about how copying GC works
generally (shouldn't it really be called Non-Garbage Preservation?).

Thanks again,
Brandon
Simon Marlow | 12 May 10:29 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

On 10/05/2014 21:57, Brandon Simmons wrote:
> Another silly question: when card-marking happens after a write or
> CAS, does that indicate "this segment maybe contains old-to-new
> generation references, so be sure to preserve (scavenge?) them from
> collection "?

Yes, that's exactly right.

Cheers,
Simon
Edward Z. Yang | 11 May 02:28 2014
Picon

Re: Using mutable array after an unsafeFreezeArray, and GC details

Excerpts from Brandon Simmons's message of 2014-05-10 13:57:40 -0700:
> Another silly question: when card-marking happens after a write or
> CAS, does that indicate "this segment maybe contains old-to-new
> generation references, so be sure to preserve (scavenge?) them from
> collection "? In my initial question I was thinking of the cards as
> indicating "here be garbage" (e.g. a previous overwritten array
> value), but I think I had the wrong idea about how copying GC works
> generally (shouldn't it really be called Non-Garbage Preservation?).

That's correct.

Cheers,
Edward

Gmane