### Re: Shared thunk optimization. Looking to solidify my understanding

Holger Siegel <holgersiegel74 <at> yahoo.de>

2010-09-22 15:42:36 GMT

Am 22.09.2010 um 17:10 schrieb David Sankel:
> The following code (full code available here[1], example taken from comments here[2]),
>
> const' = \a _ -> a
>
> test1 c = let a = const' (nthPrime 100000)
> in (a c, a c)
>
> test2 c = let a = \_ -> (nthPrime 100000)
> in (a c, a c)
>
> in a ghci session (with :set +s):
>
> *Primes> test1 1
> (9592,9592)
> (0.89 secs, 106657468 bytes)
> *Primes> test2 1
> (9592,9592)
> (1.80 secs, 213077068 bytes)
>
> test1, although denotationally equivalent to test2 runs about twice as fast.
>
> My questions are:
> • What is the optimization that test1 is taking advantage of called?
It would be called 'Common Subexpression Elimination' if it would occur here. ;)
> • Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
Yes, because the Haskell98 standard doesn't prescribe an operational semantics. A semantics that is
often used to model for Haskell's runtime behavior is John Launchbury's Natural Semantics for Lazy Evaluation:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2016
> • Could someone explain or point to a precise explanation of this optimization? If I'm doing an
operational reduction by hand on paper, how would I take account for this optimization?
In this semantics, only constants and variables may occur as arguments, so you have to rewrite test1 as
test1 = \ c -> let x = nthPrime 100000
a = const' x
l = a c
r = a c
in (l, r)
An actual value is a term combined with a heap that maps identifiers to terms. Evaluation of (test1 1) starts
with an empty heap and the expression (test1 1):
{},
test1 1
First, test1 is substituted:
{},
(\ c -> let x = nthPrime 100000
a = const' x
l = a c
r = a c
in (l, r)) 1
Now, the term is beta-reduced:
{},
(let x = nthPrime 100000
a = const' x
l = a 1
r = a 1
in (l, r))
Let-bindings on top level are replaced with bindings on the heaps, closures are created for the bound
terms.Here, a closure is created for the value of x:
{c_x = nthPrime 100000},
let a = const' c_x
l = a 1
r = a 1
in (l, r))
also, a, l and r are transformed into references to closures:
{c_x = nthPrime 100000
c_a = const' c_x
c_l = c_a 1
c_r = c_a 1},
(c_l, c_r))
When you access the first element of the tuple, c_l is evaluated by substituting c_a and const' and then
evaluating c_x, resulting in:
{c_x = <big_number>
c_a = const' c_x
c_l = c_x
c_r = c_a 1},
(c_l, c_r))
When you access the second element, the value of c_x is reused:
{c_x = <big_number>
c_a = const' c_x
c_l = c_x
c_r = c_x},
(c_l, c_r))
For test2, things are different, because the function (\_ -> nthPrime 100000) is substituted directly:
{},(let a = \_ -> (nthPrime 100000)
l = a 1
r = a 1
in (l, r))
becomes
{c_a = \_ -> (nthPrime 100000)
l = nthPrime 100000
r = nthPrime 100000
},(l, r)
Now, l and r refer to different calls to nthPrime.