oleg | 26 Feb 11:00 2013

Thunks and GHC pessimisation


Tom Ellis wrote:
> To avoid retaining a large lazy data structure in memory it is useful to
> hide it behind a function call.  Below, "many" is used twice.  It is hidden
> behind a function call so it can be garbage collected between uses. 

As you discovered, it is quite challenging to ``go against the grain''
and force recomputation. GHC is quite good at avoiding
recomputation. This is a trade-off, of time vs space. For large
search tree, it is space that is a premium, and laziness and similar
strategies are exactly the wrong trade-off. 

The solution (which I've seen in some of the internal library code) is
to confuse GHC with extra functions:
        http://okmij.org/ftp/Haskell/misc.html#memo-off

So, eventually it is possible to force recomputation. But the solution
leaves a poor taste -- fighting a compiler is never a good idea. So,
this is a bug of sort -- not the bug of GHC, but of lazy
evaluation. Lazy evaluation is not the best evaluation strategy. It is
a trade-off, which suits a large class of problems and punishes
another large class of problems.
Tom Ellis | 26 Feb 22:13 2013
Picon

Re: Thunks and GHC pessimisation

On Tue, Feb 26, 2013 at 10:00:32AM -0000, oleg <at> okmij.org wrote:
> Tom Ellis wrote:
> > To avoid retaining a large lazy data structure in memory it is useful to
> > hide it behind a function call.  Below, "many" is used twice.  It is hidden
> > behind a function call so it can be garbage collected between uses. 
> 
> As you discovered, it is quite challenging to ``go against the grain''
> and force recomputation. GHC is quite good at avoiding
> recomputation. This is a trade-off, of time vs space. For large
> search tree, it is space that is a premium, and laziness and similar
> strategies are exactly the wrong trade-off. 

Hi Oleg,

I have good news: Don's suggestion of -fno-full-laziness that fixed the
space leak in my toy example also fixes the space leak in your iterative
deepening code.  To be precise, there is no longer any space leak in your
second implementation where subtrees are hidden behind a function call.

With full laziness your second implementation consumes 99M.  To avoid
sharing hacks are required and your third implementation consumes 2M. 
However, without full laziness your second implementation only consumes 2M.

> The solution (which I've seen in some of the internal library code) is
> to confuse GHC with extra functions:
>         http://okmij.org/ftp/Haskell/misc.html#memo-off

Yes, when I read your exposition several months ago it started the thinking
process which culminated in this thread.

(Continue reading)


Gmane