Łukasz Dąbek | 4 Mar 19:08 2013

Concurrency performance problem

Hello Cafe!

I have a problem with the following code: http://hpaste.org/83460. It is a
simple Monte Carlo integration. The problem is that when I run my
program with +RTS -N1 I get:
Multi
693204.039020917 8.620632s
Single
693204.039020917 8.574839s
End

And with +RTS -N4 (I have four CPU cores):
Multi
693204.0390209169 11.877143s
Single
693204.039020917 11.399888s
End

I have two questions:
 1) Why does performance decrease when I add more cores to my program?
 2) Why does the performance of the single-threaded integration also change
with the number of cores?

Thanks for all answers,
Łukasz Dąbek.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Don Stewart | 4 Mar 20:13 2013

Re: Concurrency performance problem

Depends on your code...

Łukasz Dąbek | 4 Mar 20:25 2013

Re: Concurrency performance problem

What exactly do you mean? I have included a link to the full source listing:
http://hpaste.org/83460.

--
Łukasz Dąbek

Don Stewart | 4 Mar 20:30 2013

Re: Concurrency performance problem

Apologies, didn't see the link on my phone :)

As the comment on the link shows, you're accidentally migrating unevaluated work to the main thread, hence no speedup.

Be very careful with evaluation strategies (esp. lazy expressions) around MVar and TVar points. It's too easy to put a thunk in one.

The strict-concurrency package is one attempt to invert the conventional lazy box, to better match the most common case.
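
To illustrate the point about thunks in an MVar, here is a minimal sketch. It is not the code from the paste: `expensive`, `lazyWorker` and `strictWorker` are hypothetical names, and the workload is a placeholder. The lazy worker stores an unevaluated expression, so the thread that later reads the MVar pays for it; the strict worker forces the value with `evaluate` before handing it over.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical stand-in for one chunk of the integration.
expensive :: Int -> Double
expensive n = sum [sqrt (fromIntegral i) | i <- [1 .. n]]

-- Lazy: the MVar receives a thunk. The computation runs on whichever
-- thread first demands the value, usually the one calling takeMVar.
lazyWorker :: MVar Double -> Int -> IO ()
lazyWorker var n = putMVar var (expensive n)

-- Strict: force the result to weak head normal form (a fully evaluated
-- Double) on the worker thread, then store it.
strictWorker :: MVar Double -> Int -> IO ()
strictWorker var n = do
  r <- evaluate (expensive n)
  putMVar var r

main :: IO ()
main = do
  vars <- mapM (const newEmptyMVar) [1 .. 4 :: Int]
  mapM_ (\v -> forkIO (strictWorker v 1000000)) vars
  results <- mapM takeMVar vars
  print (sum results)
```

Swapping `strictWorker` for `lazyWorker` in `main` should reproduce the "no speedup" behaviour, since all four sums would then be computed by the main thread. For results that are not flat values (lists, records), `evaluate` alone only forces the outermost constructor, so a deeper forcing strategy would be needed.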

Łukasz Dąbek | 4 Mar 20:39 2013

Re: Concurrency performance problem

Thank you for your help! This solved my performance problem :)

Anyway, the second question remains. Why is the performance of the
single-threaded calculation affected by the RTS -N parameter? Is GHC doing
some parallelization behind the scenes?

--
Łukasz Dąbek.

Johan Tibell | 4 Mar 21:45 2013

Re: Concurrency performance problem

On Mon, Mar 4, 2013 at 11:39 AM, Łukasz Dąbek <sznurek <at> gmail.com> wrote:

> Thank you for your help! This solved my performance problem :)
>
> Anyway, the second question remains. Why performance of single
> threaded calculation is affected by RTS -N parameter. Is GHC doing
> some parallelization behind the scenes?

I believe it's because -N makes GHC use the threaded RTS, which is different from the non-threaded RTS and therefore has some overhead.

Łukasz Dąbek | 4 Mar 22:43 2013

Re: Concurrency performance problem

2013/3/4 Johan Tibell <johan.tibell <at> gmail.com>:
> I believe it's because -N makes GHC use the threaded RTS, which is different
> from the non-threaded RTS and has some overheads therefore.

That's interesting. Can you recommend some reading material about
this? Besides the GHC source, of course ;) An explanation of why the
decrease in performance is proportional to the number of cores would be great.

--
Łukasz Dąbek

Edward Z. Yang | 4 Mar 22:58 2013

Re: Concurrency performance problem

If you just pass -N, GHC automatically sets the number of threads
based on the number of cores on your machine. Do you mean -threaded?
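
A program can report which runtime it was linked against and how many capabilities it actually received, which helps separate the effect of -threaded from the effect of -N. The following is only a sketch; `describeRts` is a hypothetical helper introduced for illustration.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import GHC.Conc (getNumProcessors)

-- Hypothetical helper: summarise the runtime configuration as text.
describeRts :: Bool -> Int -> Int -> String
describeRts threaded caps procs =
  (if threaded then "threaded" else "non-threaded")
    ++ " RTS, " ++ show caps ++ " capabilities, "
    ++ show procs ++ " processors"

main :: IO ()
main = do
  -- Number of capabilities in use (what -N<x> sets).
  caps <- getNumCapabilities
  -- Number of processors the runtime detects on this machine.
  procs <- getNumProcessors
  -- rtsSupportsBoundThreads is True when linked with -threaded.
  putStrLn (describeRts rtsSupportsBoundThreads caps procs)
```

Built with -threaded and run with +RTS -N4, this should report the threaded RTS with 4 capabilities; built without -threaded it reports the non-threaded RTS.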

Excerpts from Łukasz Dąbek's message of Mon Mar 04 11:39:43 -0800 2013:
> Thank you for your help! This solved my performance problem :)
> 
> Anyway, the second question remains. Why performance of single
> threaded calculation is affected by RTS -N parameter. Is GHC doing
> some parallelization behind the scenes?
briand | 4 Mar 23:08 2013

Re: Concurrency performance problem

On Mon, 4 Mar 2013 20:39:43 +0100
Łukasz Dąbek <sznurek <at> gmail.com> wrote:

> Thank you for your help! This solved my performance problem :)
> 

Do you have a link to the new code?

It should be very instructive to see the differences.

Brian

Łukasz Dąbek | 4 Mar 23:23 2013

Re: Concurrency performance problem

2013/3/4  <briand <at> aracnet.com>:
> do you have a link to the new code ?

The diff is at the bottom of the original code: http://hpaste.org/83460.

> If you just pass -N, GHC automatically sets the number of threads
> based on the number of cores on your machine.

Yes, I know that. I am just wondering why a seemingly single-threaded computation (look at singleThreadIntegrate in the source code from the first post) runs slower as the number of available cores (set through the -N option) increases.

--
Łukasz Dąbek

Nathan Howell | 5 Mar 16:54 2013

Re: Concurrency performance problem

Depends on the application, of course. The (on by default) parallel GC tends to kill performance for me... you might try running both with "+RTS -sstderr" to see if GC time is significantly higher, and try adding "+RTS -qg1" if it is.
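
The same comparison can be made from inside the program with GHC.Stats, as an alternative to reading the -sstderr report. This is a rough sketch that assumes a reasonably recent base library and that the program is started with +RTS -T; `gcShare` is a hypothetical helper and the workload is a placeholder for the real computation.

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical helper: fraction of elapsed time spent in the collector,
-- given GC and mutator times in nanoseconds.
gcShare :: Int64 -> Int64 -> Double
gcShare gcNs mutNs
  | total == 0 = 0
  | otherwise = fromIntegral gcNs / fromIntegral total
  where
    total = gcNs + mutNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Statistics are off; rerun with +RTS -T."
    else do
      -- Placeholder workload that allocates; substitute the real computation.
      print (sum (map fromIntegral [1 .. 2000000 :: Int]) :: Double)
      s <- getRTSStats
      putStrLn ("collections: " ++ show (gcs s))
      putStrLn ("GC share of elapsed time: "
                  ++ show (gcShare (gc_elapsed_ns s) (mutator_elapsed_ns s)))
```

Running it once with +RTS -T -N4 and once with the parallel GC restricted or disabled (for example -qg1 or -qg) shows whether the GC share changes noticeably between the two configurations.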


Łukasz Dąbek | 5 Mar 17:46 2013

Re: Concurrency performance problem

2013/3/5 Nathan Howell <nathan.d.howell <at> gmail.com>:
> Depends on the application, of course. The (on by default) parallel GC tends to kill performance for me... you might try running both with "+RTS -sstderr" to see if GC time is significantly higher, and try adding "+RTS -qg1" if it is.

You are correct: parallel GC is slowing the computation down. After some experiments I can produce two behaviors: use a single-threaded GC (the multithreaded version is slowed down by a factor of 5, but the single-threaded version goes back to normal) or increase the heap size (the multithreaded version slows down by a factor of 2, and the single-threaded version runs normally). I guess I must live with this ;)

--
Łukasz Dąbek

 
