Fixie Fixie | 29 Nov 19:09 2012

To my boss: The code is cool, but it is about 100 times slower than the old one...

Hi all haskellers

Every now and then I get the feeling that writing my work code in Haskell would be a good idea.

I have tried a couple of times, but each time I seem to run into performance problems - I do lots of heavy computing.

The problem seems to be connected to lazy evaluation, which makes my programs so slow that I really cannot show them to anyone. I have tried all the tricks in the book, like !, seq, non-lazy datatypes...
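For readers who have not met these tricks, here is a minimal sketch of what they look like (a hypothetical example for illustration, not the code under discussion):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy foldl builds a chain of unevaluated thunks before summing:
sumLazy :: [Double] -> Double
sumLazy = foldl (+) 0

-- The usual fixes: a strict fold, or a bang pattern on the
-- accumulator of a hand-written loop.
sumStrict :: [Double] -> Double
sumStrict = foldl' (+) 0

sumLoop :: [Double] -> Double
sumLoop = go 0
  where
    go !acc []     = acc              -- '!' forces acc at each step
    go !acc (x:xs) = go (acc + x) xs

-- A "non-lazy" datatype: strict fields are evaluated on construction.
data Point = Point !Double !Double
```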

I was poking around to see if this had changed, then I ran into this forum post: http://stackoverflow.com/questions/9409634/is-indexing-of-data-vector-unboxed-mutable-mvector-really-this-slow

The last solution was a Haskell program that ran within about 3x of the C version, which I think is OK. This was in the days of GHC 7.0.

I then tried to compile the programs myself (GHC 7.4.1), but found that the C program was now more than 100x faster. The GHC code was compiled with both -O2 and -O3, giving only small differences on my 64-bit Linux box.

So it seems something has changed - and even small examples are still not safe when it comes to the lazy-monster. It reminds me of some code I read a couple of years ago where one of the Simons actually fired off a new thread, just to make sure a variable was evaluated.

A sad thing, since I am more than willing to go for Haskell if it proves to be usable. If anyone can see what is wrong with the code (there are two Haskell versions on the page; I have tried the last and fastest one), that would also be interesting.

What is your experience, dear Haskellers? To me it seems this beautiful language is useless without a better strictness analyzer.

Cheers,

Felix
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Alfredo Di Napoli | 29 Nov 20:00 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

Hi there,

I'm only an amateur, so just my 2 cents: Haskell can be really fast, but reaching that speed can be anything but trivial. You need to use different data types (e.g. ByteString vs. the normal String type), rely on "unconventional" IO (e.g. Conduit, Iteratee), and still be ready to go "outside of base", using packages and functions which are not in base/the Haskell Platform (e.g. mwc-random).

My 2 cents :)
A.

Clark Gaebel | 29 Nov 20:17 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

If you can give an example of some underperforming code, I'm sure someone (or several people) on this list would be more than happy to help you make it more performant.

Generally, it doesn't take much. It's all in knowing where to look. Also, if you know performance is key, you should be using the performance-oriented data structures (ByteString, Text, Vector) from the very beginning. Personally, I never find myself using Data.Array, and rarely use String in real code. It's just not worth the performance headaches.
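To make the suggestion concrete, here is a small sketch of reaching for those types first (an illustrative fragment, assuming the text and vector packages, not code from the thread):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Vector.Unboxed as U

-- Unboxed vectors store raw Doubles contiguously, with no
-- per-element pointers or thunks:
sumSquares :: U.Vector Double -> Double
sumSquares = U.sum . U.map (\x -> x * x)

-- Text instead of String ([Char]) for real text processing:
shout :: T.Text -> T.Text
shout = T.toUpper

main :: IO ()
main = do
  print (sumSquares (U.enumFromN 1 1000000))
  print (shout "hello")
```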

And finally, depending on what you're doing, neither Haskell nor C might be right for you! Especially with certain numerics-related code, you might find Fortran, OpenCL, or CUDA easier to make performant.

Examples of this would be lovely.

  - Clark


Fixie Fixie | 29 Nov 20:57 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

Sure, the code is from the URL I mentioned, both in C and in Haskell.

It already seems to me like the Haskell version should run fast - it uses unboxed arrays and even unsafe functions to make it run faster. Well, have a look:

C-version:

#include <stdint.h>
#include <stdio.h>

#define VDIM    100
#define VNUM    100000

uint64_t prng (uint64_t w) {
    w ^= w << 13;
    w ^= w >> 7;
    w ^= w << 17;
    return w;
};

void kahanStep (double *s, double *c, double x) {
    double y, t;
    y  = x - *c;
    t  = *s + y;
    *c = (t - *s) - y;
    *s = t;
}

void kahan(double s[], double c[]) {
    for (int i = 1; i <= VNUM; i++) {
        uint64_t w = i;
        for (int j = 0; j < VDIM; j++) {
            kahanStep(&s[j], &c[j], w);
            w = prng(w);
        }
    }
};

int main (int argc, char* argv[]) {
    double acc[VDIM], err[VDIM];
    for (int i = 0; i < VDIM; i++) {
        acc[i] = err[i] = 0.0;
    };
    kahan(acc, err);
    printf("[ ");
    for (int i = 0; i < VDIM; i++) {
        printf("%g ", acc[i]);
    };
    printf("]\n");
};

And the haskell version:

{-# LANGUAGE CPP, BangPatterns #-}
module Main (main) where

#define VDIM 100
#define VNUM 100000

import Data.Array.Base
import Data.Array.ST
import Data.Array.Unboxed
import Control.Monad.ST
import GHC.Word
import Control.Monad
import Data.Bits

prng :: Word -> Word
prng w = w'
  where
    !w1 = w `xor` (w `shiftL` 13)
    !w2 = w1 `xor` (w1 `shiftR` 7)
    !w' = w2 `xor` (w2 `shiftL` 17)

type Vec s = STUArray s Int Double

kahan :: Vec s -> Vec s -> ST s ()
kahan s c = do
    let inner w j
            | j < VDIM  = do
                !cj <- unsafeRead c j
                !sj <- unsafeRead s j
                let !y = fromIntegral w - cj
                    !t = sj + y
                    !w' = prng w
                unsafeWrite c j ((t-sj)-y)
                unsafeWrite s j t
                inner w' (j+1)
            | otherwise = return ()
    forM_ [1 .. VNUM] $ \i -> inner (fromIntegral i) 0

calc :: ST s (Vec s)
calc = do
    s <- newArray (0,VDIM-1) 0
    c <- newArray (0,VDIM-1) 0
    kahan s c
    return s

main :: IO ()
main = print . elems $ runSTUArray calc

Cheers,

Felix

Fixie Fixie | 29 Nov 21:00 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

Oh, my - what an indentation :-)

New try:

----- Forwarded message -----
From: Fixie Fixie <fixie.fixie <at> rocketmail.com>
To: "haskell-cafe <at> haskell.org" <haskell-cafe <at> haskell.org>
Cc: Clark Gaebel <cgaebel <at> uwaterloo.ca>
Sent: Thursday, 29 November 2012 20:57
Subject: Re: [Haskell-cafe] To my boss: The code is cool, but it is about 100 times slower than the old one...

Sure, the code is from the URL I mentioned, both in C and in Haskell.

It already seems to me like the Haskell version should run fast - it uses unboxed arrays and even unsafe functions to make it run faster. Well, have a look:

C-version:

#include <stdint.h>
#include <stdio.h>


#define VDIM    100
#define VNUM    100000



uint64_t prng (uint64_t w) {
    w ^= w << 13;
    w ^= w >> 7;
    w ^= w << 17;
    return w;
};

void kahanStep (double *s, double *c, double x) {
    double y, t;
    y  = x - *c;
    t  = *s + y;
    *c = (t - *s) - y;
    *s = t;
}

void kahan(double s[], double c[]) {
    for (int i = 1; i <= VNUM; i++) {
        uint64_t w = i;
        for (int j = 0; j < VDIM; j++) {
                kahanStep(&s[j], &c[j], w);
                w = prng(w);
        }
    }
};


int main (int argc, char* argv[]) {
    double acc[VDIM], err[VDIM];
    for (int i = 0; i < VDIM; i++) {
        acc[i] = err[i] = 0.0;
    };
    kahan(acc, err);
    printf("[ ");
    for (int i = 0; i < VDIM; i++) {
        printf("%g ", acc[i]);
    };
    printf("]\n");
};

And the haskell version:

{-# LANGUAGE CPP, BangPatterns #-}
module Main (main) where

#define VDIM 100
#define VNUM 100000

import Data.Array.Base
import Data.Array.ST
import Data.Array.Unboxed
import Control.Monad.ST
import GHC.Word
import Control.Monad
import Data.Bits

prng :: Word -> Word
prng w = w'
  where
    !w1 = w `xor` (w `shiftL` 13)
    !w2 = w1 `xor` (w1 `shiftR` 7)
    !w' = w2 `xor` (w2 `shiftL` 17)

type Vec s = STUArray s Int Double

kahan :: Vec s -> Vec s -> ST s ()
kahan s c = do
    let inner w j
            | j < VDIM  = do
                !cj <- unsafeRead c j
                !sj <- unsafeRead s j
                let !y = fromIntegral w - cj
                    !t = sj + y
                    !w' = prng w
                unsafeWrite c j ((t-sj)-y)
                unsafeWrite s j t
                inner w' (j+1)
            | otherwise = return ()
    forM_ [1 .. VNUM] $ \i -> inner (fromIntegral i) 0

calc :: ST s (Vec s)
calc = do
    s <- newArray (0,VDIM-1) 0
    c <- newArray (0,VDIM-1) 0
    kahan s c
    return s

main :: IO ()
main = print . elems $ runSTUArray calc

Cheers,

Felix
Johan Tibell | 29 Nov 21:50 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

Ack, it seems like you're running into one of these bugs (all now
fixed, but I don't know in which GHC version):

http://hackage.haskell.org/trac/ghc/search?q=doubleFromInteger
Fixie Fixie | 29 Nov 22:00 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

The program seems to take around 6 seconds on my Linux box, while the C version goes for 0.06 seconds.

That is really some regression bug :-)

Anyone with a more recent version than 7.4.1?

Felix

Johan Tibell | 29 Nov 22:06 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thu, Nov 29, 2012 at 1:00 PM, Fixie Fixie <fixie.fixie <at> rocketmail.com> wrote:
> The program seems to take around 6 seconds on my Linux box, while the C
> version goes for 0.06 seconds.
>
> That is really some regression bug :-)
>
> Anyone with a more recent version than 7.4.1?

On 7.4.2:

$ time ./c_test
...

real	0m0.145s
user	0m0.040s
sys	0m0.003s

$ time ./Test
...

real	0m0.234s
user	0m0.220s
sys	0m0.006s

Both compiled with -O2.
Fixie Fixie | 29 Nov 22:23 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

That's really an argument for upgrading to 7.4.2 :-)

Another reason for doing things with haskell is this mailing list.

Thanks!

Felix

Johan Tibell | 29 Nov 22:26 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thu, Nov 29, 2012 at 1:23 PM, Fixie Fixie <fixie.fixie <at> rocketmail.com> wrote:
> That's really an argument for upgrading to 7.4.2 :-)
>
> Another reason for doing things with haskell is this mailing list.

FYI I'm still looking into this issue as I'm not 100% happy with the
code GHC generates.
Daniel Fischer | 29 Nov 22:32 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thursday, 29 November 2012, 21:00:36, Fixie Fixie wrote:
> The program seems to take around 6 seconds on my Linux box, while the C
> version goes for 0.06 seconds.
>
> That is really some regression bug :-)
>
> Anyone with a more recent version than 7.4.1?

I don't even have a problem with 7.4.1:

$ for ghc in $GHCS; do echo $ghc; time ./hskahan-$ghc > /dev/null; done;
7.0.4

real    0m0.217s
user    0m0.214s
sys     0m0.002s
7.2.1

real    0m0.197s
user    0m0.194s
sys     0m0.002s
7.2.2

real    0m0.187s
user    0m0.187s
sys     0m0.000s
7.4.1

real    0m0.253s
user    0m0.249s
sys     0m0.003s
7.4.2

real    0m0.250s
user    0m0.247s
sys     0m0.002s
7.6.1

real    0m0.224s
user    0m0.221s
sys     0m0.002s

$ time ./ckahan > /dev/null

real    0m0.102s
user    0m0.079s
sys     0m0.022s

We have an unpleasant regression in comparison to 7.2.*, and the 7.4.* releases were slower than 7.6.1 is, but it's all okay here (not that it wouldn't be nice to have it faster still).

Are you on a 32-bit system?
Johan Tibell | 29 Nov 22:40 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thu, Nov 29, 2012 at 1:32 PM, Daniel Fischer
<daniel.is.fischer <at> googlemail.com> wrote:
> We have an unpleasant regression in comparison to 7.2.* and the 7.4.* were
> slower than 7.6.1 is, but it's all okay here (not that it wouldn't be nice to
> have it faster still).
>
> Are you on a 32-bit system?

This version works around the Word->Double conversion bug and shows
good performance:

(Always compile with -Wall; it tells you if some arguments are
defaulted to slow Integers instead of fast Ints.)

{-# LANGUAGE CPP, BangPatterns, MagicHash #-}
module Main (main) where

#define VDIM 100
#define VNUM 100000

import Control.Monad.ST
import Data.Array.Base
import Data.Array.ST
import Data.Bits
import GHC.Word

import GHC.Exts

prng :: Word -> Word
prng w = w'
  where
    w1 = w `xor` (w `shiftL` 13)
    w2 = w1 `xor` (w1 `shiftR` 7)
    w' = w2 `xor` (w2 `shiftL` 17)

type Vec s = STUArray s Int Double

kahan :: Vec s -> Vec s -> ST s ()
kahan s c = do
    let inner !w j
            | j < VDIM  = do
                cj <- unsafeRead c j
                sj <- unsafeRead s j
                let y = word2Double w - cj
                    t = sj + y
                    w' = prng w
                unsafeWrite c j ((t-sj)-y)
                unsafeWrite s j t
                inner w' (j+1)
            | otherwise = return ()

        outer i | i < VNUM = inner (fromIntegral i) 0 >> outer (i + 1)
                | otherwise = return ()
    outer (0 :: Int)

calc :: ST s (Vec s)
calc = do
    s <- newArray (0,VDIM-1) 0
    c <- newArray (0,VDIM-1) 0
    kahan s c
    return s

main :: IO ()
main = print . elems $ runSTUArray calc

word2Double :: Word -> Double
word2Double (W# w) = D# (int2Double# (word2Int# w))

On my (64-bit) machine the Haskell and C versions are on par.

-- Johan
Johan Tibell | 29 Nov 22:42 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thu, Nov 29, 2012 at 1:40 PM, Johan Tibell <johan.tibell <at> gmail.com> wrote:
> This version works around the Word->Double conversion bug and shows
> good performance:

I'd also like to point out that I've removed lots of bang patterns
that weren't needed. This program runs fine without any bang patterns
(but I've kept the one that can possibly have any performance
implication at all).

-- Johan
Daniel Fischer | 29 Nov 23:01 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thursday, 29 November 2012, 13:40:42, Johan Tibell wrote:
> word2Double :: Word -> Double
> word2Double (W# w) = D# (int2Double# (word2Int# w))
> 
> On my (64-bit) machine the Haskell and C versions are on par.

Yes, but the result is very different.
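To see why at the ordinary (boxed) level: routing a Word through Int reinterprets the top bit as a sign, so large words come out negative (my own illustration, not code from the thread):

```haskell
main :: IO ()
main = do
  let w = maxBound :: Word       -- 2^64 - 1 on a 64-bit box
  -- The Word -> Int -> Double route, as in word2Double above:
  print (fromIntegral (fromIntegral w :: Int) :: Double)  -- -1.0
  -- The correct (but at the time slow) conversion via Integer:
  print (fromIntegral w :: Double)  -- 1.8446744073709552e19
```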
Johan Tibell | 29 Nov 23:02 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thu, Nov 29, 2012 at 2:01 PM, Daniel Fischer
<daniel.is.fischer <at> googlemail.com> wrote:
> On Thursday, 29 November 2012, 13:40:42, Johan Tibell wrote:
>> word2Double :: Word -> Double
>> word2Double (W# w) = D# (int2Double# (word2Int# w))
>>
>> On my (64-bit) machine the Haskell and C versions are on par.
>
> Yes, but the result is very different.

Doh, I guess I didn't look at the output carefully enough.
Johan Tibell | 29 Nov 23:08 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thu, Nov 29, 2012 at 2:02 PM, Johan Tibell <johan.tibell <at> gmail.com> wrote:
> On Thu, Nov 29, 2012 at 2:01 PM, Daniel Fischer
> <daniel.is.fischer <at> googlemail.com> wrote:
>> On Thursday, 29 November 2012, 13:40:42, Johan Tibell wrote:
>>> word2Double :: Word -> Double
>>> word2Double (W# w) = D# (int2Double# (word2Int# w))
>>>
>>> On my (64-bit) machine the Haskell and C versions are on par.
>>
>> Yes, but the result is very different.
>
> Doh, I guess I didn't look at the output carefully enough.

One obvious error is that the C code has one loop go from 1..n where I
just naively assumed all loops go from 0..n-1. This fixes that:

        outer i | i <= VNUM = inner (fromIntegral i) 0 >> outer (i + 1)
                | otherwise = return ()
    outer (1 :: Int)

Perhaps the other issue is that

    word2Double (W# w) = D# (int2Double# (word2Int# w))

is possibly the wrong way and we need a word2Double#.

-- Johan
Daniel Fischer | 29 Nov 23:07 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On Thursday, 29 November 2012, 13:40:42, Johan Tibell wrote:
> 
> word2Double :: Word -> Double
> word2Double (W# w) = D# (int2Double# (word2Int# w))
> 
> On my (64-bit) machine the Haskell and C versions are on par.

On my box, the Haskell is even faster then, but, as said, the result is incorrect.

With

correction :: Double
correction = 2 * int2Double minBound

word2Double :: Word -> Double
word2Double w = case fromIntegral w of
                   i | i < 0 -> int2Double i - correction
                     | otherwise -> int2Double i

I get

real    0m0.078s
user    0m0.077s
sys     0m0.001s

with correct results.

Okay, we **need** a better Word -> Double etc. conversion. We could start with 
the above; that seems not too shabby.
Johan Tibell | 29 Nov 20:15 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

Hi Felix,

On Thu, Nov 29, 2012 at 10:09 AM, Fixie Fixie
<fixie.fixie <at> rocketmail.com> wrote:
> The problem seems to be connected to lazy evaluation, which makes my
> programs so slow that I really cannot show them to anyone. I have tried all
> the tricks in the book, like !, seq, non-lazy datatypes...

My advice usually goes like this:

 1. Use standard, high-performance libraries (I've made a list of high
quality libraries at
https://github.com/tibbe/haskell-docs/blob/master/libraries-directory.md).
 2. Make your data type fields strict.
 3. Unpack primitive types (e.g. Int, Word, Float, Double).
 4. Reduce allocation in tight loops (e.g. avoid creating lots of
intermediate lists).

I always do 1-3, but only do 4 when it's really necessary (e.g. in the
inner loop of some machine learning algorithm).

Rarely are performance issues due to missing bang patterns on functions
(although there are cases, e.g. when writing recursive functions with
accumulators, where you need one).
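Points 2 and 3 look like this in practice (a hypothetical type for illustration, not from the thread):

```haskell
{-# LANGUAGE BangPatterns #-}
-- Strict, unpacked fields: each Double is stored inline in the
-- constructor instead of behind a pointer to a possibly-lazy box.
data Particle = Particle
  { px   :: {-# UNPACK #-} !Double
  , py   :: {-# UNPACK #-} !Double
  , mass :: {-# UNPACK #-} !Double
  }

-- And point 4: accumulate directly in a tight loop rather than
-- allocating an intermediate list of masses.
totalMass :: [Particle] -> Double
totalMass = go 0
  where
    go !acc []     = acc
    go !acc (p:ps) = go (acc + mass p) ps
```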

> I was poking around to see if this had changed, then I ran into this forum
> post:
> http://stackoverflow.com/questions/9409634/is-indexing-of-data-vector-unboxed-mutable-mvector-really-this-slow
>
> The last solution was a haskell program which was in the 3x range to C,
> which I think is ok. This was in the days of ghc 7.0
>
> I then tried to compile the programs myself (GHC 7.4.1), but found that the
> C program was now more than 100x faster. The GHC code was compiled with both
> -O2 and -O3, giving only small differences on my 64-bit Linux box.
>
> So it seems something has changed - and even small examples are still not
> safe when it comes to the lazy-monster. It reminds me of some code I read a
> couple of years ago where one of the Simons actually fired off a new thread,
> to make sure a variable was realized.

Note that the issues in the blog post are not due to laziness (i.e.
there are no space leaks), but due to the code being more polymorphic
than the C code, causing extra allocation and indirection.

> A sad thing, since I am more than willing to go for Haskell if it proves to
> be usable. If anyone can see what is wrong with the code (there are two
> Haskell versions on the page; I have tried the last and fastest one), that
> would also be interesting.
>
> What is your experience, dear haskellers? To me it seems this beautiful
> language is useless without a better lazy/eager-analyzer.

It's definitely possible to write fast Haskell code (as some Haskell
programmers manage to do so consistently), but I appreciate that it's
harder than it should be. In my opinion the major thing missing is a
good text on how to write fast Haskell code and some tweaks to the
compiler (e.g. unbox strict primitive fields like Int by default).

Hope this helps.

Cheers,
Johan
Ivan Salazar | 29 Nov 20:17 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

I hear you, my friend.


What I love about Haskell is that a lot of algorithms are very clean to express and understand compared to, say, Lisp or C. Compared to Lisp, function manipulation is also very clean (even compared to Racket). A great plus is also type inference.

The bad side is that direct translations of algorithms are almost always very slow, and the work needed to make them perform is very mind-bending. I sometimes feel as if I am doing the work of a compiler/optimizer, and I wonder whether a computer program could do these translations.

In the right hands (the Simons, Don Stewart, etc.) I am sure Haskell is a killer, but in the hands of lowly, imperative mortals it is a time bomb. A good example of this is narrated in Clarke's "Superiority".

I am a total Haskell newbie, so please ignore me if my opinion sounds stupid, but I think I read once that Peyton Jones said that the Haskell of the future will be strict.

Thanks!


2012/11/29 Fixie Fixie <fixie.fixie <at> rocketmail.com>
Hi all haskellers

I every now and then get the feeling that doing my job code in Haskell would be a good idea.

I have tried a couple of times, but each time I seem to run into performance problems - I do lots of heavy computing.

The problem seems to be connected to lazy loading, which makes my programs so slow that I really can not show them to anyone. I have tried all tricks in the books, like !, seq, non-lazy datatypes...

I was poking around to see if this had changed, then I ran into this forum post: http://stackoverflow.com/questions/9409634/is-indexing-of-data-vector-unboxed-mutable-mvector-really-this-slow

The last solution was a haskell program which was in the 3x range to C, which I think is ok. This was in the days of ghc 7.0

I then tried compile the programs myself (ghc 7.4.1), but found that now the C program now was more that 100x faster. The ghc code was compiled with both O2 and O3, giving only small differences on my 64-bit Linux box.

So it seems something has changed - and even small examples are still not safe when it comes to the lazy-monster. It reminds me of some code I read a couple of years ago where one of the Simons actually fired off a new thread, to make sure a variable was realized.

A sad thing, since I am more than willing to go for Haskell if it proves to be usable. If anyone can see what is wrong with the code (there are two Haskell versions on the page, I have tried the last and fastest one) it would also be interesting.

What is your experience, dear haskellers? To me it seems this beautiful language is useless without a better lazy/eager-analyzer.

Cheers,

Felix

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

wren ng thornton | 2 Dec 04:16 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On 11/29/12 2:17 PM, Ivan Salazar wrote:
> The bad side is that direct translation of algorithms are almost always
> very slow and the work needed to make them perform is very mind bending.

Indeed. The thing is, all algorithms make (implicit) assumptions about 
the cost model of the underlying language. The problem comes about 
because the assumptions made in most algorithms are not valid in 
Haskell. This isn't just an issue of laziness; the entire computational 
model of Haskell (e.g., STG) is radically different from imperative 
languages.

For example, in many cases just passing arguments around (a la the State 
monad) is much faster than using ST and explicit mutation. GHC does a 
lot of clever code reorganization, but mutation breaks countless purity 
laws and so it inhibits the application of these optimizations. 
Unfortunately, the vast majority of algorithms out there assume you're 
working in a language where mutation is essentially free. I'm not 
talking about any significant theoretical issues about using mutation or 
not; I'm just talking about the implicit assumptions that algorithm 
implementers make. If you believe mutation is essentially free then it 
makes sense to create an initial object and then incrementally mutate it 
this way and that until you get to where you want. But, a great deal of 
the time there's nothing actually stopping us from gathering all the 
information and allocating the final result directly. However, if you're 
not used to thinking in those terms, then the conceptual reorganization 
required to see how to gather all the data at once is indeed mind-bending.
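A minimal sketch of the contrast described above (my own example, not from the original post): the same sum of squares written first in the imperative "allocate and mutate" style with ST, then as a single strict pass that gathers the result directly.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)
import Data.List (foldl')

-- Imperative habit: allocate a mutable cell and update it step by step.
sumSquaresST :: [Int] -> Int
sumSquaresST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x * x)) xs
  readSTRef acc

-- Gathering the result directly: one strict pass, no mutable state.
-- GHC can typically unbox the accumulator and compile this to a tight loop.
sumSquaresFold :: [Int] -> Int
sumSquaresFold = foldl' (\acc x -> acc + x * x) 0
```

The second version gives the optimizer far more room precisely because there is no mutation for it to work around.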

-- 
Live well,
~wren
Ivan Salazar | 3 Dec 07:49 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

I see.

I read some chapters from Purely Functional Data Structures when I was in college in order to understand some tree algorithms, but not the whole book.
Do you think that could help me to understand performance problems with code (poorly) written in Haskell?

From reading your post, I can guess the book is a good start, but what about Haskell/GHC-specific hacks?
Is there a good written source for gathering the required knowledge?

For example, I remember I struggled a lot with Arrays once, and only after days of digging did I find that Arrays are a no-no in Haskell. That was before finding this mailing list, though.
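The usual advice in such cases (my own sketch, not something suggested in this thread) is to reach for unboxed arrays rather than lazy boxed ones, since a UArray holds plain machine values with no per-element thunks:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Array.Unboxed (UArray, listArray, bounds, (!))

-- A UArray stores plain Doubles with no per-element thunks, so a
-- strict indexed loop over it compiles to something close to a C loop.
sumU :: UArray Int Double -> Double
sumU a = go lo 0
  where
    (lo, hi) = bounds a
    go !i !acc
      | i > hi    = acc
      | otherwise = go (i + 1) (acc + a ! i)
```

The unboxed vector types from the vector package follow the same idea and are what the Stack Overflow thread above ends up using.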

On Dec 1, 2012 9:18 PM, "wren ng thornton" <wren <at> freegeek.org> wrote:
>
> On 11/29/12 2:17 PM, Ivan Salazar wrote:
>>
>> The bad side is that direct translation of algorithms are almost always
>> very slow and the work needed to make them perform is very mind bending.
>
>
> Indeed. The thing is, all algorithms make (implicit) assumptions about the cost model of the underlying language. The problem comes about because the assumptions made in most algorithms are not valid in Haskell. This isn't just an issue of laziness; the entire computational model of Haskell (e.g., STG) is radically different from imperative languages.
>
> For example, in many cases just passing arguments around (a la the State monad) is much faster than using ST and explicit mutation. GHC does a lot of clever code reorganization, but mutation breaks countless purity laws and so it inhibits the application of these optimizations. Unfortunately, the vast majority of algorithms out there assume you're working in a language where mutation is essentially free. I'm not talking about any significant theoretical issues about using mutation or not; I'm just talking about the implicit assumptions that algorithm implementers make. If you believe mutation is essentially free then it makes sense to create an initial object and then incrementally mutate it this way and that until you get to where you want. But, a great deal of the time there's nothing actually stopping us from gathering all the information and allocating the final result directly. However, if you're not used to thinking in those terms, then the conceptual reorganization required to see how to gather all the data at once is indeed mind-bending.
>
> --
> Live well,
> ~wren
>
>

Stephen Tetley | 29 Nov 23:02 2012

Re: To my boss: The code is cool, but it is about 100 times slower than the old one...

On 29 November 2012 18:09, Fixie Fixie <fixie.fixie <at> rocketmail.com> wrote:

>
> What is your experience, dear haskellers? To me it seems this beautiful
> language is useless without a better lazy/eager-analyzer.

Since when has speed been the sole arbiter of utility?

10 years ago I switched from Clean to Haskell, even though Clean was
then faster (and included a good strictness analyser). The libraries
available for Haskell are what swung the decision for me.
