Mikhail Glushenkov | 20 Apr 10:55 2013
Picon

Why is GHC so much worse than JHC when computing the Ackermann function?

Hi all,

This came up on StackOverflow [1]. When compiled with GHC (7.4.2 &
7.6.2), this simple program:

main = print $ ack 4 1
  where ack :: Int -> Int -> Int
        ack 0 n = n+1
        ack m 0 = ack (m-1) 1
        ack m n = ack (m-1) (ack m (n-1))

consumes all available memory on my machine and slows down to a crawl.
However, when compiled with JHC it runs in constant space and is about
as fast as the straightforward Ocaml version (see the SO question for
benchmark numbers).

I was able to fix the space leak by using CPS-conversion, but the
CPS-converted version is still about 10 times slower than the naive
version compiled with JHC.

I looked both at the Core and Cmm, but couldn't find anything
obviously wrong with the generated code - 'ack' is compiled to a
simple loop of type 'Int# -> Int# -> Int#'. What's more frustrating is
that running the program with +RTS -hc makes the space leak
mysteriously vanish.

Can someone please explain where the space leak comes from and if it's
possible to further improve the runtime of this program with GHC?
Apparently it's somehow connected to the stack management strategy,
since running the program with a larger stack chunk size (+RTS -kc1M)
(Continue reading)

Christopher Done | 20 Apr 12:08 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

JHC compiles to C and last time I tried this it looked very much like the original C version: http://www.reddit.com/r/haskell/comments/1bcru7/damn_lies_and_haskell_performance/c9689a3

This is the same thing. Compile with --tdir=/tmp/ajhc and then cat /tmp/ajhc/main_code.c. Output should be like this:

static uint32_t A_STD
fW$__fMain_1__ack(gc_t gc,uint32_t v105553376,uint32_t v61835120)
{
        if (0 == v105553376) {
            return 1 + v61835120;
        } else {
            uint32_t v16;
            uint32_t v22;
            struct tup1 x2;
            if (0 == v61835120) {
                uint32_t v228308038 = (v105553376 - 1);
                x2.t0 = v228308038;
                x2.t1 = 1;
            } else {
                uint32_t v110947990;
                uint32_t v215884490 = (v61835120 - 1);
                uint32_t v62470112 = (v105553376 - 1);
                v110947990 = fW$__fMain_1__ack(gc,v105553376,v215884490);
                x2.t0 = v62470112;
                x2.t1 = v110947990;
            }
            v22 = x2.t0;
            v16 = x2.t1;
            return fW$__fMain_1__ack(gc,v22,v16);
        }
}

And it's as fast as C on my machine:

chris <at> midnight:~/Projects/me/ajhc$ time ./ack
65533

real 0m2.134s
user 0m2.124s
sys 0m0.000s
chris <at> midnight:~/Projects/me/ajhc$ gcc -O3 ack.c -o ack-c
ack.c: In function ‘main’:
ack.c:8:3: warning: incompatible implicit declaration of built-in function ‘printf’ [enabled by default]
chris <at> midnight:~/Projects/me/ajhc$ time ./ack-c
65533

real 0m2.255s
user 0m2.248s
sys 0m0.000s
chris <at> midnight:~/Projects/me/ajhc$ 




On 20 April 2013 10:55, Mikhail Glushenkov <the.dead.shall.rise <at> gmail.com> wrote:
Hi all,

This came up on StackOverflow [1]. When compiled with GHC (7.4.2 &
7.6.2), this simple program:

main = print $ ack 4 1
  where ack :: Int -> Int -> Int
        ack 0 n = n+1
        ack m 0 = ack (m-1) 1
        ack m n = ack (m-1) (ack m (n-1))

consumes all available memory on my machine and slows down to a crawl.
However, when compiled with JHC it runs in constant space and is about
as fast as the straightforward Ocaml version (see the SO question for
benchmark numbers).

I was able to fix the space leak by using CPS-conversion, but the
CPS-converted version is still about 10 times slower than the naive
version compiled with JHC.

I looked both at the Core and Cmm, but couldn't find anything
obviously wrong with the generated code - 'ack' is compiled to a
simple loop of type 'Int# -> Int# -> Int#'. What's more frustrating is
that running the program with +RTS -hc makes the space leak
mysteriously vanish.

Can someone please explain where the space leak comes from and if it's
possible to further improve the runtime of this program with GHC?
Apparently it's somehow connected to the stack management strategy,
since running the program with a larger stack chunk size (+RTS -kc1M)
makes the space leak go away. Interestingly, choosing smaller stack
chunk sizes (256K, 512K) causes it to die with an OOM exception:

$ time ./Test +RTS -kc256K
Test: out of memory (requested 2097152 bytes)


[1] http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Mikhail Glushenkov | 20 Apr 12:23 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

Hi,

On Sat, Apr 20, 2013 at 12:08 PM, Christopher Done <chrisdone <at> gmail.com> wrote:
> JHC compiles to C and last time I tried this it looked very much like the
> original C version:
> http://www.reddit.com/r/haskell/comments/1bcru7/damn_lies_and_haskell_performance/c9689a3
>
> This is the same thing.

Yes, but with the Fibonacci example GHC wasn't consuming all available
memory and running 10 times slower. Surely it must be possible to make
the GHC version at least 2x as slow as Ocaml/GHC?

--

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
Mikhail Glushenkov | 20 Apr 12:24 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

On Sat, Apr 20, 2013 at 12:23 PM, Mikhail Glushenkov
<the.dead.shall.rise <at> gmail.com> wrote:
> Surely it must be possible to make
> the GHC version at least 2x as slow as Ocaml/GHC?

s|Ocaml/GHC|Ocaml/JHC|

--

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
Dan Doel | 20 Apr 14:20 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

There's something strange going on in this example. For instance, setting (-M) heap limits as low as 40K seems to have no effect, even though the program easily uses more than 8G. Except, interrupting the program in such a case does seem to give a message about heap limits being exceeded (it won't stop on its own, though). Also, the program compiled without optimizations uses very little memory (though it's slow), which is odd, because the optimized version works exclusively on Int#, which shouldn't cause heap allocation.

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Mikhail Glushenkov | 20 Apr 14:53 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

Hi,

On 20 April 2013 14:20, Dan Doel <dan.doel <at> gmail.com> wrote:
> There's something strange going on in this example. For instance, setting
> (-M) heap limits as low as 40K seems to have no effect, even though the
> program easily uses more than 8G.

Apparently the only things it allocates is stack chunks.

I managed to produce a version of this program that runs approximately
as fast as the Ocaml one by manually unrolling the main loop [1], but
it still has to be run with +RTS -kc1M to avoid the memory leak.

[1] https://gist.github.com/23Skidoo/5425891

--

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
Edward Z. Yang | 20 Apr 22:03 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

I don't seem to get the leak on latest GHC head.  Running the program
in GC debug mode in 7.6.2 is quite telling; the program is allocating
*a lot* of megablocks.  We probably fixed it though?

Edward

Excerpts from Mikhail Glushenkov's message of Sat Apr 20 01:55:10 -0700 2013:
> Hi all,
> 
> This came up on StackOverflow [1]. When compiled with GHC (7.4.2 &
> 7.6.2), this simple program:
> 
> main = print $ ack 4 1
>   where ack :: Int -> Int -> Int
>         ack 0 n = n+1
>         ack m 0 = ack (m-1) 1
>         ack m n = ack (m-1) (ack m (n-1))
> 
> consumes all available memory on my machine and slows down to a crawl.
> However, when compiled with JHC it runs in constant space and is about
> as fast as the straightforward Ocaml version (see the SO question for
> benchmark numbers).
> 
> I was able to fix the space leak by using CPS-conversion, but the
> CPS-converted version is still about 10 times slower than the naive
> version compiled with JHC.
> 
> I looked both at the Core and Cmm, but couldn't find anything
> obviously wrong with the generated code - 'ack' is compiled to a
> simple loop of type 'Int# -> Int# -> Int#'. What's more frustrating is
> that running the program with +RTS -hc makes the space leak
> mysteriously vanish.
> 
> Can someone please explain where the space leak comes from and if it's
> possible to further improve the runtime of this program with GHC?
> Apparently it's somehow connected to the stack management strategy,
> since running the program with a larger stack chunk size (+RTS -kc1M)
> makes the space leak go away. Interestingly, choosing smaller stack
> chunk sizes (256K, 512K) causes it to die with an OOM exception:
> 
> $ time ./Test +RTS -kc256K
> Test: out of memory (requested 2097152 bytes)
> 
> 
> [1] http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074
> 
Joachim Breitner | 20 Apr 22:12 2013
Picon

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

Hi,

Am Samstag, den 20.04.2013, 13:03 -0700 schrieb Edward Z. Yang:
> I don't seem to get the leak on latest GHC head.  Running the program
> in GC debug mode in 7.6.2 is quite telling; the program is allocating
> *a lot* of megablocks.  We probably fixed it though?

it’s confirmed that it is fixed in HEAD. But it’d be really interesting
to know what the bug was and how it was fixed. Could someone who has a
setup for running bisect at his fingertips find out?

Greetings,
Joachim

--

-- 
Joachim "nomeata" Breitner
Debian Developer
  nomeata <at> debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nomeata <at> joachim-breitner.de | http://people.debian.org/~nomeata

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Thomas Schilling | 20 Apr 23:03 2013

Re: Why is GHC so much worse than JHC when computing the Ackermann function?

Sounds similar to the bug we had in 6.12.1.  It was a simple
accounting bug, i.e., the GC was not invoked, because it didn't know
that the memory was allocated.

On 20 April 2013 22:03, Edward Z. Yang <ezyang <at> mit.edu> wrote:
> I don't seem to get the leak on latest GHC head.  Running the program
> in GC debug mode in 7.6.2 is quite telling; the program is allocating
> *a lot* of megablocks.  We probably fixed it though?
>
> Edward
>
> Excerpts from Mikhail Glushenkov's message of Sat Apr 20 01:55:10 -0700 2013:
>> Hi all,
>>
>> This came up on StackOverflow [1]. When compiled with GHC (7.4.2 &
>> 7.6.2), this simple program:
>>
>> main = print $ ack 4 1
>>   where ack :: Int -> Int -> Int
>>         ack 0 n = n+1
>>         ack m 0 = ack (m-1) 1
>>         ack m n = ack (m-1) (ack m (n-1))
>>
>> consumes all available memory on my machine and slows down to a crawl.
>> However, when compiled with JHC it runs in constant space and is about
>> as fast as the straightforward Ocaml version (see the SO question for
>> benchmark numbers).
>>
>> I was able to fix the space leak by using CPS-conversion, but the
>> CPS-converted version is still about 10 times slower than the naive
>> version compiled with JHC.
>>
>> I looked both at the Core and Cmm, but couldn't find anything
>> obviously wrong with the generated code - 'ack' is compiled to a
>> simple loop of type 'Int# -> Int# -> Int#'. What's more frustrating is
>> that running the program with +RTS -hc makes the space leak
>> mysteriously vanish.
>>
>> Can someone please explain where the space leak comes from and if it's
>> possible to further improve the runtime of this program with GHC?
>> Apparently it's somehow connected to the stack management strategy,
>> since running the program with a larger stack chunk size (+RTS -kc1M)
>> makes the space leak go away. Interestingly, choosing smaller stack
>> chunk sizes (256K, 512K) causes it to die with an OOM exception:
>>
>> $ time ./Test +RTS -kc256K
>> Test: out of memory (requested 2097152 bytes)
>>
>>
>> [1] http://stackoverflow.com/questions/16115815/ackermann-very-inefficient-with-haskell-ghc/16116074#16116074
>>
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users <at> haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Gmane