Jeroen | 17 May 18:17

Another optimization question

Hi, I know there's been quite some performance/optimization post lately,
so I hope there still room for one more. While solving a ProjectEuler
problem (27), I saw a performance issue I cannot explain. I narrowed it
down to the following code (never mind that 'primes' is just [1..],
the problem is the same or worse with real primes):

primes :: [Int]
primes = [1..]

isPrime :: Int -> Bool
isPrime x = isPrime' x primes
    where isPrime' x (p:ps) | x == p = True
                            | x > p = isPrime' x ps
                            | otherwise = False

main = print $ length (filter (== True) (map isPrime [1..5000]))

$ time ./experiment1
5000

real	0m4.037s
user	0m3.378s
sys	0m0.060s

All good, but if I change isPrime to the simpeler

isPrime x = elem x (takeWhile (<= x) primes)

it takes twice as long:

(Continue reading)

John Dorsey | 17 May 19:45

Re: Another optimization question

Jeroen,

> isPrime :: Int -> Bool
> isPrime x = isPrime' x primes
>     where isPrime' x (p:ps) | x == p = True
>                             | x > p = isPrime' x ps
>                             | otherwise = False
> 
> main = print $ length (filter (== True) (map isPrime [1..5000]))
[...]
> isPrime x = elem x (takeWhile (<= x) primes)

Here's a couple of things, although I don't know if they account for what
you're seeing.  All code is untested.

1)  A simpler main would be:

main = print $ length $ filter isPrime [1..5000]

This version manually fuses your map and filter.  Of course it's not
the same if you're doing anything else besides 'length'.

2)  The takeWhile in your second isPrime creates a throwaway list, which
doesn't exist in the explicit-recursion isPrime.  Unless this gets
optimized out, this could be the droid you're looking for.  I'd compile
with profiling (ghc -O2 --make -prof -auto-all experiment2), and run
./experiment2 +RTS -p -s
Find profiling stats in experiment2.prof, and check whether the latter
version isn't allocating a lot more.  When you feel like Core-diving,
it's something specific to look for.
(Continue reading)

anton muhin | 17 May 19:52

Re: Another optimization question

On Sat, May 17, 2008 at 8:19 PM, Jeroen <yrn001 <at> gmail.com> wrote:
> Hi, I know there's been quite some performance/optimization post lately,
> so I hope there still room for one more. While solving a ProjectEuler
> problem (27), I saw a performance issue I cannot explain. I narrowed it
> down to the following code (never mind that 'primes' is just [1..],
> the problem is the same or worse with real primes):
>
> primes :: [Int]
> primes = [1..]
>
> isPrime :: Int -> Bool
> isPrime x = isPrime' x primes
>    where isPrime' x (p:ps) | x == p = True
>                            | x > p = isPrime' x ps
>                            | otherwise = False
>
> main = print $ length (filter (== True) (map isPrime [1..5000]))
>
> $ time ./experiment1
> 5000
>
> real    0m4.037s
> user    0m3.378s
> sys     0m0.060s
>
>
> All good, but if I change isPrime to the simpeler
>
> isPrime x = elem x (takeWhile (<= x) primes)
>
(Continue reading)

Daniel Fischer | 17 May 20:40

Re: Another optimization question

Am Samstag, 17. Mai 2008 19:52 schrieb anton muhin:
> On Sat, May 17, 2008 at 8:19 PM, Jeroen <yrn001 <at> gmail.com> wrote:
> > Hi, I know there's been quite some performance/optimization post lately,
> > so I hope there still room for one more. While solving a ProjectEuler
> > problem (27), I saw a performance issue I cannot explain. I narrowed it
> > down to the following code (never mind that 'primes' is just [1..],
> > the problem is the same or worse with real primes):
> >
> > primes :: [Int]
> > primes = [1..]
> >
> > isPrime :: Int -> Bool
> > isPrime x = isPrime' x primes
> >    where isPrime' x (p:ps) | x == p = True
> >
> >                            | x > p = isPrime' x ps
> >                            | otherwise = False
> >
> > main = print $ length (filter (== True) (map isPrime [1..5000]))
> >
> > $ time ./experiment1
> > 5000
> >
> > real    0m4.037s
> > user    0m3.378s
> > sys     0m0.060s
> >
> >
> > All good, but if I change isPrime to the simpeler
> >
(Continue reading)

anton muhin | 17 May 22:48

Re: Another optimization question

On Sat, May 17, 2008 at 10:40 PM, Daniel Fischer
<daniel.is.fischer <at> web.de> wrote:
> Am Samstag, 17. Mai 2008 19:52 schrieb anton muhin:
>> On Sat, May 17, 2008 at 8:19 PM, Jeroen <yrn001 <at> gmail.com> wrote:
>> > Hi, I know there's been quite some performance/optimization post lately,
>> > so I hope there still room for one more. While solving a ProjectEuler
>> > problem (27), I saw a performance issue I cannot explain. I narrowed it
>> > down to the following code (never mind that 'primes' is just [1..],
>> > the problem is the same or worse with real primes):
>> >
>> > primes :: [Int]
>> > primes = [1..]
>> >
>> > isPrime :: Int -> Bool
>> > isPrime x = isPrime' x primes
>> >    where isPrime' x (p:ps) | x == p = True
>> >
>> >                            | x > p = isPrime' x ps
>> >                            | otherwise = False
>> >
>> > main = print $ length (filter (== True) (map isPrime [1..5000]))
>> >
>> > $ time ./experiment1
>> > 5000
>> >
>> > real    0m4.037s
>> > user    0m3.378s
>> > sys     0m0.060s
>> >
>> >
(Continue reading)

Favicon

Re: Another optimization question


On 2008 May 17, at 16:48, anton muhin wrote:
> Why not -O3?

-O3 doesn't do anything over -O2 in ghc.  -fvia-c -optc-O3 *might* be  
an improvement, or might not.

--

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
anton muhin | 18 May 13:21

Re: Another optimization question

Thank you, Brandon!

yours,
anton.

On Sun, May 18, 2008 at 12:51 AM, Brandon S. Allbery KF8NH
<allbery <at> ece.cmu.edu> wrote:
>
> On 2008 May 17, at 16:48, anton muhin wrote:
>>
>> Why not -O3?
>
> -O3 doesn't do anything over -O2 in ghc.  -fvia-c -optc-O3 *might* be an
> improvement, or might not.
>
> --
> brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery <at> kf8nh.com
> system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
> electrical and computer engineering, carnegie mellon university    KF8NH
>
>
>
Daniel Fischer | 18 May 00:00

Re: Another optimization question

Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
>
> Why not -O3?

As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than 
-O2.
>
> > Using a real list of primes,
>
> What's the size of the real list?

arbitrary, however, since it's [Int], it will actually be at most 105097565 
primes long, but since only numbers up to 5000 are checked, only 670 primes 
will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 
5134 primes evaluated), with -O0 / -O2, it's
Jeroen 1 : 14.5 s / 2.4 s
Jeroen 2 : 52.5 s / 49.7 s
(== x) . head . dropWhile (< x) : 8.2 s /4.1 s
go primes : 5.5 s / 2.5 s

So Jeroen 1 is the best to be optimised :)

> >> but another
> >> implementation of isPrime:
> >>
> >> isPrime x = (== x) $ head $ dropWhile (< x) primes
> >
> > With -O2, this is about 20% slower than the Jeroen's first version,
> > without optimisations 50% faster.
> > Strange.
(Continue reading)

Jeroen | 18 May 07:45

Re: Another optimization question

Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> 
> Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
> >
> > Why not -O3?
> 
> As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than 
> -O2.
> >
> > > Using a real list of primes,
> >
> > What's the size of the real list?
> 
> arbitrary, however, since it's [Int], it will actually be at most 105097565 
> primes long, but since only numbers up to 5000 are checked, only 670 primes 
> will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 
> 5134 primes evaluated), with -O0 / -O2, it's
> Jeroen 1 : 14.5 s / 2.4 s
> Jeroen 2 : 52.5 s / 49.7 s
> (== x) . head . dropWhile (< x) : 8.2 s /4.1 s
> go primes : 5.5 s / 2.5 s
> 
> So Jeroen 1 is the best to be optimised :)
> 
> > >> but another
> > >> implementation of isPrime:
> > >>
> > >> isPrime x = (== x) $ head $ dropWhile (< x) primes
> > >
> > > With -O2, this is about 20% slower than the Jeroen's first version,
(Continue reading)

anton muhin | 18 May 14:54

Re: Re: Another optimization question

Any chances you're using Integer instead of Int?  On my box switch to
Integer is quite expensive (as one could expect).

yours,
anton.

On Sun, May 18, 2008 at 9:45 AM, Jeroen <yrn001 <at> gmail.com> wrote:
> Daniel Fischer <daniel.is.fischer <at> web.de> writes:
>>
>> Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
>> >
>> > Why not -O3?
>>
>> As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than
>> -O2.
>> >
>> > > Using a real list of primes,
>> >
>> > What's the size of the real list?
>>
>> arbitrary, however, since it's [Int], it will actually be at most 105097565
>> primes long, but since only numbers up to 5000 are checked, only 670 primes
>> will ever be considered - When I check numbers 1 to 50000 (5133 primes, so
>> 5134 primes evaluated), with -O0 / -O2, it's
>> Jeroen 1 : 14.5 s / 2.4 s
>> Jeroen 2 : 52.5 s / 49.7 s
>> (== x) . head . dropWhile (< x) : 8.2 s /4.1 s
>> go primes : 5.5 s / 2.5 s
>>
>> So Jeroen 1 is the best to be optimised :)
(Continue reading)

Luke Palmer | 18 May 17:11

Re: Re: Another optimization question

On Sat, May 17, 2008 at 11:45 PM, Jeroen <yrn001 <at> gmail.com> wrote:
> I only tested with -O2 and my primes implementation
> is the Sieve of Eratosthenes and has signature
>
> primes :: Integral a => [a]

I'm guessing that you already know this, but this declares that primes
should *not* be cached globally (and is the reason for the
monomorphism restriction).  Depending on how it is used, it means that
you might be computing the list of primes many times.  Using a
monomorphic signature like [Int] or [Integer] will allow primes to be
cached, probably increasing the performance quite dramatically.

Luke
Bulat Ziganshin | 18 May 17:14

Re[2]: Re: Another optimization question

Hello Luke,

Sunday, May 18, 2008, 7:11:13 PM, you wrote:

>> primes :: Integral a => [a]

> you might be computing the list of primes many times.  Using a
> monomorphic signature like [Int] or [Integer] will allow primes to be
> cached, probably increasing the performance quite dramatically.

besides caching, polymorphism decreases performance many tens times

--

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin <at> gmail.com
anton muhin | 18 May 14:50

Re: Another optimization question

On Sun, May 18, 2008 at 2:00 AM, Daniel Fischer
<daniel.is.fischer <at> web.de> wrote:
> Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
>>
>> Why not -O3?
>
> As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than
> -O2.

I didn't know that, sorry.

>> > Using a real list of primes,
>>
>> What's the size of the real list?
>
> arbitrary, however, since it's [Int], it will actually be at most 105097565
> primes long, but since only numbers up to 5000 are checked, only 670 primes
> will ever be considered - When I check numbers 1 to 50000 (5133 primes, so
> 5134 primes evaluated),

So it's just a relatively sparsed sorted list of 5134 numbers and the
greatest of them still fits into Int, correct?

> with -O0 / -O2, it's
> Jeroen 1 : 14.5 s / 2.4 s
> Jeroen 2 : 52.5 s / 49.7 s
probably Jeroen's hypothesis about temporary list built might explain
that slowdown, what do you think?

> (== x) . head . dropWhile (< x) : 8.2 s /4.1 s
(Continue reading)

Daniel Fischer | 18 May 18:03

Re: Another optimization question

Am Sonntag, 18. Mai 2008 14:50 schrieb anton muhin:
> On Sun, May 18, 2008 at 2:00 AM, Daniel Fischer
>
> <daniel.is.fischer <at> web.de> wrote:
> > Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
> >> Why not -O3?
> >
> > As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better
> > than -O2.
>
> I didn't know that, sorry.

No need to apologise. It was a perfectly reasonable question. Fortunately, 
Brandon just a few minutes before confirmed that my recollection was probably 
correct, otherwise I would've added "must check that, though".

>
> >> > Using a real list of primes,
> >>
> >> What's the size of the real list?
> >
> > arbitrary, however, since it's [Int], it will actually be at most
> > 105097565 primes long, but since only numbers up to 5000 are checked,
> > only 670 primes will ever be considered - When I check numbers 1 to 50000
> > (5133 primes, so 5134 primes evaluated),
>
> So it's just a relatively sparsed sorted list of 5134 numbers and the
> greatest of them still fits into Int, correct?

Technically, I think, it's a partially evaluated list with a thunk at the end 
(Continue reading)

anton muhin | 18 May 18:09

Re: Another optimization question

On Sat, May 17, 2008 at 10:40 PM, Daniel Fischer
<daniel.is.fischer <at> web.de> wrote:
> However, more than can be squished out of fiddling with these versions could
> be gained from a better algorithm.

Just for fun and there probably should be better implementation for
the same idea:

module Main where

data Tree a = Nil | Tree { el :: a, lft :: Tree a, rgt :: Tree a }
deriving (Eq, Ord, Show)

fromDistinctAscListN :: Int -> [a] -> ([a], Tree a)
fromDistinctAscListN 0 xs = (xs, Nil)
fromDistinctAscListN n xs = let ((e:xs'), l) = fromDistinctAscListN (n
- 1) xs in
  let (xs'', r) = fromDistinctAscListN (n - 1) xs' in (xs'', Tree { el
= e, lft = l, rgt = r })

branch :: Ord a => a -> a -> (a -> b) -> (a -> b) -> (a -> b) -> b
branch x y lt eq gt = case (compare x y) of
  LT -> lt x
  EQ -> eq x
  GT -> gt x

dispatch :: Ord a => a -> a -> (a -> Bool) -> (a -> Bool) -> Bool
dispatch x y lt gt = branch x y lt (const True) gt

member :: Ord a => a -> Tree a -> Bool
(Continue reading)


Gmane