C K Kashyap | 29 Jul 09:52 2012

Current state of garbage collection in Haskell

Hi,

I was looking at a video that talks about GC pauses. That got me curious about the current state of GC in Haskell - say GHC 7.4.1.
Would it suffer from lengthy pauses when we talk about memory in the range of 500 MB or more?
What would be a good way to keep abreast of the progress on Haskell GC?
Regards,
Kashyap
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Alexander Solla | 29 Jul 17:16 2012

Re: Current state of garbage collection in Haskell

On Sun, Jul 29, 2012 at 12:52 AM, C K Kashyap <ckkashyap <at> gmail.com> wrote:

> Hi,
> I was looking at a video that talks about GC pauses. That got me curious
> about the current state of GC in Haskell - say GHC 7.4.1.
> Would it suffer from lengthy pauses when we talk about memory in the range
> of 500 MB or more?
> What would be a good way to keep abreast of the progress on Haskell GC?
> Regards,
> Kashyap

Have you read the latest GHC manual pages?[1]  They contain a list of options, suggestions, gotchas, etc.  I haven't read the GHC-specific mailing lists, but cvs-ghc sounds like the place to get real-time updates.



Thomas Schilling | 29 Jul 20:29 2012

Re: Current state of garbage collection in Haskell

GHC does not provide any form of real-time guarantees (and support for
them is not planned).

That said, it's not as bad as it sounds:

 - Collecting the first (young) generation is fast and you can control
the size of that first generation via runtime system (RTS) options.

 - The older generation is collected rarely and can be collected in parallel.

 - You can explicitly invoke the GC via System.Mem.performGC
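To make the last point concrete, here is a minimal sketch of forcing a collection at a point where a pause is acceptable and timing it. The names timedGC and nsToMs and the workload are hypothetical, and GHC.Clock assumes a much newer base than the one shipped with 7.4.1. The young-generation size mentioned in the first point is set on the command line, e.g. `./Main +RTS -A64m -s -RTS` (build with `-rtsopts`); the 64m value is only an illustration, not a recommendation.

```haskell
import Control.Exception (evaluate)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.Mem (performGC)

-- | Force a major collection now and return its wall-clock duration
-- in nanoseconds. (Hypothetical helper, not part of any library.)
timedGC :: IO Word64
timedGC = do
  start <- getMonotonicTimeNSec
  performGC
  end <- getMonotonicTimeNSec
  return (end - start)

-- | Convert nanoseconds to milliseconds for printing.
nsToMs :: Word64 -> Double
nsToMs ns = fromIntegral ns / 1e6

main :: IO ()
main = do
  -- Hypothetical workload standing in for real application work.
  total <- evaluate (sum [1 .. 1000000 :: Int])
  print total
  -- A quiet point in the program: collect now rather than mid-request.
  ns <- timedGC
  putStrLn ("performGC took " ++ show (nsToMs ns) ++ " ms")
```

Running the same binary with different -A values and comparing the `-s` summary is probably the quickest way to see how the nursery size affects your own program.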

In a multi-threaded / multi-core program, collecting the first
generation still requires stopping all application threads, even though
only one thread (CPU) will perform the GC (having other threads help
out usually doesn't pay off due to locality issues). This can be
particularly expensive if the OS decides to deschedule an OS thread,
as the GHC RTS then has to wait for the OS to schedule it again. You
can avoid that particular problem by configuring the OS appropriately
(on Linux, via the isolcpus=... boot parameter and taskset(1)). The GHC
team has been working on an independent *local* GC, but it's unlikely
to make it into the main branch at this time. It turned out to be very
difficult to implement, and the gains were not large enough. Building a
fully concurrent GC is (AFAICT) even harder.

I don't know how long the pause times for your 500MB live heap would
be. Generally, you want your heap to be about twice the size of your
live data. Other than that, it depends heavily on the characteristics
of your heap objects. E.g., if it's mostly arrays of unboxed
non-pointer data, then it'll be very quick to collect (since the GC
doesn't have to do anything with the contents of these arrays). If
it's mostly small objects with pointers to other objects, GC will
be very expensive and bound by the latency of your RAM. So, I suggest
you run some tests with realistic heaps.
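A rough sketch of such a test, comparing the two heap shapes described above: one flat unboxed array versus a list of boxed Ints (many small objects linked by pointers). Everything here is an assumption for illustration: the helper names, the element count, and the use of Data.Array.Unboxed from the array package that ships with GHC. Real measurements should use your own data structures at realistic sizes.

```haskell
import Control.Exception (evaluate)
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.Mem (performGC)

-- | Time one forced major collection, in nanoseconds.
timedGC :: IO Word64
timedGC = do
  start <- getMonotonicTimeNSec
  performGC
  end <- getMonotonicTimeNSec
  return (end - start)

-- | Hypothetical element count; scale it towards your real live-data size.
elems :: Int
elems = 2000000

-- | Live data is one unboxed array: the GC does not scan its contents.
-- Expects n >= 1.
gcWithUnboxedArray :: Int -> IO Word64
gcWithUnboxedArray n = do
  let arr = listArray (0, n - 1) [0 .. n - 1] :: UArray Int Int
  _ <- evaluate (arr ! (n - 1))   -- force construction of the array
  ns <- timedGC
  _ <- evaluate (arr ! 0)         -- later use keeps arr live during the GC
  return ns

-- | Live data is a list: every cons cell holds pointers the GC must follow.
-- Expects n >= 1.
gcWithList :: Int -> IO Word64
gcWithList n = do
  let xs = [0 .. n - 1] :: [Int]
  _ <- evaluate (length xs)       -- force the whole spine
  ns <- timedGC
  _ <- evaluate (last xs)         -- later use keeps xs live during the GC
  return ns

main :: IO ()
main = do
  flat <- gcWithUnboxedArray elems
  linked <- gcWithList elems
  putStrLn ("unboxed array: " ++ show flat ++ " ns")
  putStrLn ("boxed list:    " ++ show linked ++ " ns")
```

A single forced collection is a crude measure; running with `+RTS -s` and repeating the measurement several times should give a steadier picture.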

Regarding keeping up, Simon Marlow is the main person working on GHC's
GC (often collaborating with others) and he keeps a list of papers on
his homepage: http://research.microsoft.com/en-us/people/simonmar/

If you have further questions about GHC's GC, you can ask them on the
glasgow-haskell-users <at> haskell.org mailing list (but please consult the
GHC user's guide section on RTS options first).

HTH

-- 
Push the envelope. Watch it bend.
C K Kashyap | 30 Jul 11:55 2012

Re: Current state of garbage collection in Haskell

Thank you so much Alexander and Thomas.


Regards,
Kashyap
