Eugene Kirpichov | 18 Jan 17:22 2012
Picon

Encouraging performance with ghc 7.4

Hi cafe,


Just wanted to inform you that I've been benchmarking my compute-intensive stuff on ghc 7.0.4 vs 7.4.0, and 7.4.0 gave a large speedup - one program that took 2.9s on a particular input is now taking 2.2s. This result is repeatable.

So I encourage people to try out GHC 7.4.0. Some stuff may stop compiling (I had to do 3 basically one-line fixes in 3 packages before I got mine to compile), but I didn't encounter other problems.

Below is +RTS -s of two runs: one compiled with ghc 7.4.0 and another with 7.0.4.

I can make a more detailed comparison if it's useful and if someone tells me how - I thought about including +RTS -p, but Simon Marlow wrote recently that it has started giving meaningful results only in 7.4, so comparison with 7.0.4 would be unfair.

However in this case I'm sure that the program is heavily compute-bound by the following two functions (because it's always compute-bound by them):

jkff <at> jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.] cumsum 10' -o e.png +RTS -s
tplot -if e.trace -dk within[.] cumsum 10 -o e.png +RTS -s 
   2,751,756,296 bytes allocated in the heap
     135,129,112 bytes copied during GC
      33,149,720 bytes maximum residency (22 sample(s))
       1,755,868 bytes maximum slop
              56 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4994 collections,     0 parallel,  0.22s,  0.23s elapsed
  Generation 1:    22 collections,     0 parallel,  0.08s,  0.09s elapsed

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time    2.61s  (  2.91s elapsed)
  GC    time    0.30s  (  0.32s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    2.91s  (  3.22s elapsed)

  %GC time      10.1%  (9.8% elapsed)

  Alloc rate    1,051,262,083 bytes per MUT second

  Productivity  89.6% of total user, 80.9% of total elapsed

jkff <at> jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.] cumsum 10' -o e.png +RTS -s
   2,161,811,620 bytes allocated in the heap
     107,589,660 bytes copied during GC
      34,799,400 bytes maximum residency (22 sample(s))
       1,721,152 bytes maximum slop
              58 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3899 colls,     0 par    0.13s    0.14s     0.0000s    0.0003s
  Gen  1        22 colls,     0 par    0.08s    0.09s     0.0043s    0.0482s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.03s  (  2.28s elapsed)
  GC      time    0.21s  (  0.23s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.26s  (  2.51s elapsed)

  %GC     time       9.3%  (9.1% elapsed)

  Alloc rate    1,056,397,390 bytes per MUT second

  Productivity  90.7% of total user, 81.5% of total elapsed


--
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Eugene Kirpichov | 18 Jan 17:26 2012
Picon

Re: Encouraging performance with ghc 7.4

I got even more encouraging results on another input - 5 vs 7 seconds:


=========== GHC 7.4.0 ==============
jkff <at> jkff-laptop ~/projects/logs/... $ tplot -if lat.trace -dk 'within[.] duration quantile 10 0.25,0.5,0.75,0.9,0.95' -o lat3.png +RTS -s
   2,809,230,872 bytes allocated in the heap
     358,393,440 bytes copied during GC
      42,478,364 bytes maximum residency (68 sample(s))
       1,833,848 bytes maximum slop
             113 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4740 colls,     0 par    0.39s    0.40s     0.0001s    0.0005s
  Gen  1        68 colls,     0 par    0.42s    0.48s     0.0071s    0.1157s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.76s  (  4.01s elapsed)
  GC      time    0.82s  (  0.88s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    4.59s  (  4.89s elapsed)

  %GC     time      17.8%  (18.0% elapsed)

  Alloc rate    744,774,183 bytes per MUT second

  Productivity  82.2% of total user, 77.2% of total elapsed

============ GHC 7.0.4 ===============
jkff <at> jkff-laptop ~/projects/logs/... $ tplot -if lat.trace -dk 'within[.] duration quantile 10 0.25,0.5,0.75,0.9,0.95' -o lat3.png +RTS -s
tplot -if lat.trace -dk within[.] duration quantile 10 0.25,0.5,0.75,0.9,0.95 -o lat3.png +RTS -s 
   4,024,048,244 bytes allocated in the heap
     419,997,300 bytes copied during GC
      44,546,920 bytes maximum residency (70 sample(s))
       1,840,208 bytes maximum slop
             113 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  6995 collections,     0 parallel,  0.63s,  0.64s elapsed
  Generation 1:    70 collections,     0 parallel,  0.39s,  0.43s elapsed

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time    5.96s  (  6.17s elapsed)
  GC    time    1.02s  (  1.08s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    6.99s  (  7.24s elapsed)

  %GC time      14.6%  (14.9% elapsed)

  Alloc rate    674,478,767 bytes per MUT second

  Productivity  85.3% of total user, 82.3% of total elapsed


On Wed, Jan 18, 2012 at 8:22 PM, Eugene Kirpichov <ekirpichov <at> gmail.com> wrote:
Hi cafe,

Just wanted to inform you that I've been benchmarking my compute-intensive stuff on ghc 7.0.4 vs 7.4.0, and 7.4.0 gave a large speedup - one program that took 2.9s on a particular input is now taking 2.2s. This result is repeatable.

So I encourage people to try out GHC 7.4.0. Some stuff may stop compiling (I had to do 3 basically one-line fixes in 3 packages before I got mine to compile), but I didn't encounter other problems.

Below is +RTS -s of two runs: one compiled with ghc 7.4.0 and another with 7.0.4.

I can make a more detailed comparison if it's useful and if someone tells me how - I thought about including +RTS -p, but Simon Marlow wrote recently that it has started giving meaningful results only in 7.4, so comparison with 7.0.4 would be unfair.

However in this case I'm sure that the program is heavily compute-bound by the following two functions (because it's always compute-bound by them):

jkff <at> jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.] cumsum 10' -o e.png +RTS -s
tplot -if e.trace -dk within[.] cumsum 10 -o e.png +RTS -s 
   2,751,756,296 bytes allocated in the heap
     135,129,112 bytes copied during GC
      33,149,720 bytes maximum residency (22 sample(s))
       1,755,868 bytes maximum slop
              56 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4994 collections,     0 parallel,  0.22s,  0.23s elapsed
  Generation 1:    22 collections,     0 parallel,  0.08s,  0.09s elapsed

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time    2.61s  (  2.91s elapsed)
  GC    time    0.30s  (  0.32s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    2.91s  (  3.22s elapsed)

  %GC time      10.1%  (9.8% elapsed)

  Alloc rate    1,051,262,083 bytes per MUT second

  Productivity  89.6% of total user, 80.9% of total elapsed

jkff <at> jkff-laptop ~/projects/logs/... $ tplot -if e.trace -dk 'within[.] cumsum 10' -o e.png +RTS -s
   2,161,811,620 bytes allocated in the heap
     107,589,660 bytes copied during GC
      34,799,400 bytes maximum residency (22 sample(s))
       1,721,152 bytes maximum slop
              58 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3899 colls,     0 par    0.13s    0.14s     0.0000s    0.0003s
  Gen  1        22 colls,     0 par    0.08s    0.09s     0.0043s    0.0482s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.03s  (  2.28s elapsed)
  GC      time    0.21s  (  0.23s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.26s  (  2.51s elapsed)

  %GC     time       9.3%  (9.1% elapsed)

  Alloc rate    1,056,397,390 bytes per MUT second

  Productivity  90.7% of total user, 81.5% of total elapsed


--
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/



--
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane