David Sankel | 22 Sep 17:10 2010
Picon

Shared thunk optimization. Looking to solidify my understanding

The following code (full code available here[1], example taken from comments here[2]),

const' = \a _ -> a

test1 c = let a = const' (nthPrime 100000)
          in (a c, a c)

test2 c = let a = \_ -> (nthPrime 100000)
          in (a c, a c)

 in a ghci session (with :set +s):

*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)

test1, although denotationally equivalent to test2 runs about twice as fast.

My questions are:
  • What is the optimization that test1 is taking advantage of called?
  • Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
  • Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
Thanks,

David

[1] http://hpaste.org/40033/operational_semantics
[2] http://conal.net/blog/posts/lazier-functional-programming-part-2/

--
David Sankel
Sankel Software
www.sankelsoftware.com
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Holger Siegel | 22 Sep 17:42 2010
Picon

Re: Shared thunk optimization. Looking to solidify my understanding


Am 22.09.2010 um 17:10 schrieb David Sankel:

> The following code (full code available here[1], example taken from comments here[2]),
> 
> const' = \a _ -> a
> 
> test1 c = let a = const' (nthPrime 100000)
>          in (a c, a c)
> 
> test2 c = let a = \_ -> (nthPrime 100000)
>          in (a c, a c)
> 
> in a ghci session (with :set +s):
> 
> *Primes> test1 1
> (9592,9592)
> (0.89 secs, 106657468 bytes)
> *Primes> test2 1
> (9592,9592)
> (1.80 secs, 213077068 bytes)
> 
> test1, although denotationally equivalent to test2 runs about twice as fast.
> 
> My questions are:
> 	• What is the optimization that test1 is taking advantage of called?

It would be called 'Common Subexpression Elimination' if it would occur here. ;)

> 	• Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?

Yes, because the Haskell98 standard doesn't prescribe an operational semantics. A semantics that is
often used to model for Haskell's runtime behavior is John Launchbury's Natural Semantics for Lazy Evaluation:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2016

> 	• Could someone explain or point to a precise explanation of this optimization? If I'm doing an
operational reduction by hand on paper, how would I take account for this optimization?

In this semantics, only constants and variables may occur as arguments, so you have to rewrite test1 as

test1 = \ c -> let x = nthPrime 100000
                        a = const' x
                        l = a c
                        r = a c
                    in (l, r)

An actual value is a term combined with a heap that maps identifiers to terms. Evaluation of (test1 1) starts
with an empty heap and the expression (test1 1):

{},
test1 1

First, test1 is substituted:

{},
(\ c -> let x = nthPrime 100000
                        a = const' x
                        l = a c
                        r = a c
                    in (l, r)) 1

Now, the term is beta-reduced:

{},
(let x = nthPrime 100000
     a = const' x
     l = a 1
     r = a 1
in (l, r))

Let-bindings on top level are replaced with bindings on the heaps, closures are created for the bound
terms.Here, a closure is created for the value of x:

{c_x = nthPrime 100000},
let a = const'  c_x
    l = a 1
    r = a 1
in (l, r))

also, a, l and r are transformed into references to closures:

{c_x = nthPrime 100000
c_a = const'  c_x
c_l = c_a 1
c_r = c_a 1},
(c_l, c_r))

When you access the first element of the tuple, c_l is evaluated by substituting c_a and const' and then
evaluating c_x, resulting in:

{c_x = <big_number>
c_a = const'  c_x
c_l = c_x
c_r = c_a 1},
(c_l, c_r))

When you access the second element, the value of c_x is reused:

{c_x = <big_number>
c_a = const'  c_x
c_l = c_x
c_r = c_x},
(c_l, c_r))

For test2, things are different, because the function (\_ -> nthPrime 100000) is substituted directly:

{},(let a = \_ -> (nthPrime 100000)
        l = a 1
        r = a 1
    in (l, r))

becomes

{c_a = \_ -> (nthPrime 100000)
l = nthPrime 100000
r = nthPrime 100000
},(l, r)

Now, l and r refer to different calls to nthPrime.
Daniel Fischer | 22 Sep 17:48 2010
Picon

Re: Shared thunk optimization. Looking to solidify my understanding

On Wednesday 22 September 2010 17:10:06, David Sankel wrote:
> The following code (full code available here[1], example taken from
> comments here[2]),
>
> const' = \a _ -> a
>
> test1 c = let a = const' (nthPrime 100000)
>           in (a c, a c)
>
> test2 c = let a = \_ -> (nthPrime 100000)
>           in (a c, a c)
>
>
>  in a ghci session (with :set +s):
>
> *Primes> test1 1
> (9592,9592)
> (0.89 secs, 106657468 bytes)
> *Primes> test2 1
> (9592,9592)
> (1.80 secs, 213077068 bytes)
>
>
> test1, although denotationally equivalent to test2 runs about twice as
> fast.
>
> My questions are:
>
>    - What is the optimization that test1 is taking advantage of called?

Sharing. test1 computes nthPrime 100000 only once, test2 twice.
Note that when compiled with optimisations, test1 and test2 give exactly 
the same core.

>    - Is said optimization required to conform to the Haskell98 standard?
> If so, where is it stated?

No, it's not required. Unconditional sharing may be bad, so it would be 
unwise to demand sharing.

>    - Could someone explain or point to a precise explanation of this
>    optimization? If I'm doing an operational reduction by hand on paper,
> how would I take account for this optimization?

More or less

test1 c =
    let v = nthPrime 100000
        a = const' v
    in (a c, a c)

You (the compiler) give(s) a name to a value, when the value is referred to 
via that name, it's not recalculated (it's calculated on first demand).

>
> Thanks,
>
> David
>
> [1] http://hpaste.org/40033/operational_semantics
> [2] http://conal.net/blog/posts/lazier-functional-programming-part-2/
wren ng thornton | 22 Sep 22:04 2010

Re: Shared thunk optimization. Looking to solidify my understanding

On 9/22/10 11:48 AM, Daniel Fischer wrote:
> On Wednesday 22 September 2010 17:10:06, David Sankel wrote:
>> The following code (full code available here[1], example taken from
>> comments here[2]),
>>
>> const' = \a _ ->  a
>>
>> test1 c = let a = const' (nthPrime 100000)
>>            in (a c, a c)
>>
>> test2 c = let a = \_ ->  (nthPrime 100000)
>>            in (a c, a c)
>>
>> test1, although denotationally equivalent to test2 runs about twice as
>> fast. My questions are:
>>
>>     - What is the optimization that test1 is taking advantage of called?
>
> Sharing. test1 computes nthPrime 100000 only once, test2 twice.

Actually, I'd call it partial evaluation.

That is, if we expand the notation for multiple lambdas and inline 
const' then we get

     test1 c =
         let a = (\x -> \_ -> x) (nthPrime 100000)
         in (a c, a c)

     test2 c =
         let a = \_ -> (nthPrime 100000)
         in (a c, a c)

Now, if we follow the evaluation we'll see

     (test1 G) -- for some G
{force test1 to WHNF}
     (\c ->
         let a = (\x -> \_ -> x) (nthPrime 100000)
         in (a c, a c))
     g
{force application to WHNF}
     let c = G in
     let a = (\x -> \_ -> x) (nthPrime 100000) in
     (a c, a c)

And here we're already in WHNF, but lets assume you're going for full NF.

     let c = G in
     let a = (\x -> \_ -> x) (nthPrime 100000) in
     (a c, a c)
{force a to WHNF == force application to WHNF}
     let c = G in
     let a =
         let x = nthPrime 100000 in
         \_ -> x in
     (a c, a c)
{force x to WHNF in the application of a to c}
     let c = G in
     let a =
         let x = X in -- for brevity
         \_ -> x in
     (a c, a c)
{force the application (itself) of a to c}
     let c = G in
     let a =
         let x = X in
         \_ -> x in
     (X, a c)
{force the application of a to c, the second one}
     let c = G in
     let a =
         let x = X in
         \_ -> x in
     (X, X)

Now, notice that because the binding of x was floated out in front of 
the ignored parameter to a, we only end up computing it once and sharing 
that X among all the different calls. This is typically called partial 
evaluation because we can partially evaluate the function a, even 
without knowing its arguments. Partial evaluation is a general 
optimization technique. It is, in fact, the same technique as floating 
invariants out of loops in imperative languages.

When GHC's optimizations are turned on, it realizes that the binding of 
x can be floated out of the lambda because it does not depend on the 
(ignored) parameter. Thus, in optimized code the two will look the same, 
provided the optimizer is smart enough to figure out when things can be 
floated out. But noone wants to be forced to guarantee that the 
optimizer is smart enough, which is why the Haskell specs say nothing 
about the exact evaluation order.

--

-- 
Live well,
~wren
David Sankel | 24 Sep 23:58 2010
Picon

Re: Shared thunk optimization. Looking to solidify my understanding

On Wed, Sep 22, 2010 at 11:10 AM, David Sankel <camior <at> gmail.com> wrote:
<snip>
My questions are:
  • What is the optimization that test1 is taking advantage of called?
  • Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
  • Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?

Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.

Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.

When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs so depending on it when writing code is problematic.

wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them, perhaps a reference to the semantics would help.

Thanks again!

David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Bernie Pope | 25 Sep 06:46 2010
Picon

Re: Re: Shared thunk optimization. Looking to solidify my understanding

On 25 September 2010 07:58, David Sankel <camior <at> gmail.com> wrote:

> Thanks everyone for your responses. I found them very helpful. This is my
> current understanding, please correct me where I am wrong:
> When using Launchbury's Natural Semantics (LNS) as an operational model,
> this optimization is called sharing which would lie in a category of
> optimizations called common subexpression elimination. Holger Siegel's email
> provided steps of an evaluation using LNS to show the different runtimes
> between test1 and test2.
> Because Haskell98 does not specify an operational semantics, there is
> no guarantee that an implementation will provide a sharing optimization. On
> the other hand, Haskell implementations are all similar enough that the
> sharing optimization can be depended on. LNS was indeed written to model
> what is common in implementations for languages characteristically like
> Haskell.
> When compiled with ghc with optimizations, test1 and test2 result in
> identical runtime behaviors. This is an artifact of another, more
> aggressive, optimization that falls within common subexpression elimination
> optimizations. It is not easy to describe or predict when this optimization
> occurs so depending on it when writing code is problematic.
> wren ng thornton provided an evaluation using another operational semantics
> (reference?). Under this semantics, this optimization would be called
> partial evaluation. Unfortunately I couldn't follow the steps or the
> reasoning behind them, perhaps a reference to the semantics would help.
> Thanks again!

Hi David,

If you are interested in exploring operational semantics you might
also get some use out of MiniSTG:

   http://www.haskell.org/haskellwiki/Ministg

It implements the push-enter and eval-apply semantics for the STG
machine. Perhaps the most useful part is that it can generate a HTML
trace of a small-step reduction. The trace shows the code, stack and
heap, so you can see how structures are shared.

It might help to read the "fast curry" paper if you want to play with MiniSTG:

   http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

Cheers,
Bernie.
Jan-Willem Maessen | 25 Sep 21:43 2010
Picon

Re: Re: Shared thunk optimization. Looking to solidify my understanding

No one seems to have mentioned that this is a non-optimization in
call-by-need lambda-calculus (Ariola et al.), where it follows from
the standard reduction rules.  Since lazy implementations of Haskell
all use call-by-need evaluation in some form, I'd call this "playing
by the rules" rather than "optimization".  Unoptimized call-by-need
indeed evaluates (nthPrime 100000) twice in test2, but only once in
test1.  (Challenge: prove observationl equivalence of these two
fragments under call-by-need.)

-Jan-Willem Maessen

On Fri, Sep 24, 2010 at 5:58 PM, David Sankel <camior <at> gmail.com> wrote:
> On Wed, Sep 22, 2010 at 11:10 AM, David Sankel <camior <at> gmail.com> wrote:
>>
>> <snip>
>> My questions are:
>>
>> What is the optimization that test1 is taking advantage of called?
>> Is said optimization required to conform to the Haskell98 standard? If so,
>> where is it stated?
>> Could someone explain or point to a precise explanation of this
>> optimization? If I'm doing an operational reduction by hand on paper, how
>> would I take account for this optimization?
>
> Thanks everyone for your responses. I found them very helpful. This is my
> current understanding, please correct me where I am wrong:
> When using Launchbury's Natural Semantics (LNS) as an operational model,
> this optimization is called sharing which would lie in a category of
> optimizations called common subexpression elimination. Holger Siegel's email
> provided steps of an evaluation using LNS to show the different runtimes
> between test1 and test2.
> Because Haskell98 does not specify an operational semantics, there is
> no guarantee that an implementation will provide a sharing optimization. On
> the other hand, Haskell implementations are all similar enough that the
> sharing optimization can be depended on. LNS was indeed written to model
> what is common in implementations for languages characteristically like
> Haskell.
> When compiled with ghc with optimizations, test1 and test2 result in
> identical runtime behaviors. This is an artifact of another, more
> aggressive, optimization that falls within common subexpression elimination
> optimizations. It is not easy to describe or predict when this optimization
> occurs so depending on it when writing code is problematic.
> wren ng thornton provided an evaluation using another operational semantics
> (reference?). Under this semantics, this optimization would be called
> partial evaluation. Unfortunately I couldn't follow the steps or the
> reasoning behind them, perhaps a reference to the semantics would help.
> Thanks again!
>
> David
> --
> David Sankel
> Sankel Software
> www.sankelsoftware.com
> 585 617 4748 (Office)
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
wren ng thornton | 26 Sep 01:27 2010

Re: Re: Shared thunk optimization. Looking to solidify my understanding

On 9/25/10 3:43 PM, Jan-Willem Maessen wrote:
> No one seems to have mentioned that this is a non-optimization in
> call-by-need lambda-calculus (Ariola et al.), where it follows from
> the standard reduction rules.

Exactly. Then again, call-by-need gives a form of partial evaluation, 
which was what I was pointing out in in the post described as:

> On Wed, Sep 22, 2010 at 11:10 AM, David Sankel<camior <at> gmail.com>  wrote:
>> wren ng thornton provided an evaluation using another operational semantics
>> (reference?). Under this semantics, this optimization would be called
>> partial evaluation. Unfortunately I couldn't follow the steps or the
>> reasoning behind them, perhaps a reference to the semantics would help.

--

-- 
Live well,
~wren

Gmane