Christof Meerwald | 25 Mar 18:55 2012

boost asio overhead/scalability

Hi,

I have been trying to look at how much overhead boost asio introduces
and how it scales to multiple cores.

Essentially, I have written a small test program that just
sends/receives 4-byte packets between a datagram unix-domain
socketpair (this is on a quad-core Intel i7-2600K @ 3.4 GHz with
hyperthreading enabled and Linux 3.0). I am comparing a simple
blocking recv/send version (1 thread per socketpair) with a
hand-written edge-triggered epoll loop, a simple async I/O abstraction
library (using edge-triggered epoll) and boost asio (1.46) when
increasing the number of worker threads.
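
For readers without access to the test sources, here is a rough sketch of the kind of traffic being measured. It is not the code from mttest.cc: the function name `ping_pong` is hypothetical, and for brevity both ends of the socketpair are driven from a single thread rather than from one thread per socketpair.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>

// Bounce a 4-byte packet back and forth across a datagram unix-domain
// socketpair using plain blocking send/recv. Returns the number of
// completed round trips, or -1 if the socketpair cannot be created.
// Hypothetical helper for illustration only.
long ping_pong(long iterations)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
        return -1;

    const ssize_t len = static_cast<ssize_t>(sizeof(std::uint32_t));
    std::uint32_t value = 0;
    long completed = 0;

    for (long i = 0; i < iterations; ++i) {
        std::uint32_t out = value + 1;
        std::uint32_t in = 0;

        // "Client" end sends one 4-byte datagram.
        if (send(fds[0], &out, sizeof out, 0) != len) break;
        // "Server" end receives it and echoes it back.
        if (recv(fds[1], &in, sizeof in, 0) != len) break;
        if (send(fds[1], &in, sizeof in, 0) != len) break;
        // "Client" end receives the echo.
        if (recv(fds[0], &value, sizeof value, 0) != len) break;
        if (value != out) break;

        ++completed;
    }

    close(fds[0]);
    close(fds[1]);
    return completed;
}
```

Calling `ping_pong(1000)` performs 1000 round trips, i.e. 4000 system calls; with packets this small, per-call overhead in the I/O layer dominates the cost.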

I have run this with 1 socketpair and 16 socketpairs (to get some
concurrency in the latter case):

- http://www.editgrid.com/export/sheetobject/41780862.png
- http://www.editgrid.com/export/sheetobject/41780860.png

As there isn't any concurrency when there is only a single socketpair,
I don't expect any speedup when increasing the number of threads, but
with boost asio I am seeing a very significant slowdown here. And even
with 16 socketpairs, I am still seeing a quite significant slowdown
for boost asio - when the other options show a speedup.
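
As a point of reference for the hand-written edge-triggered epoll variant, the sketch below shows the core of that technique on Linux: register the socket with `EPOLLET` and, once notified, keep reading until `EAGAIN`. It is an assumption-laden illustration, not the code from eptest.cc; the function name `drain_edge_triggered` is made up for this example.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// Queue `packets` 4-byte datagrams on a socketpair, wait for a single
// edge-triggered readiness notification and drain the socket.
// Returns the number of datagrams read, or -1 on setup failure.
// Keep `packets` small: the kernel limits how many datagrams may be queued.
int drain_edge_triggered(int packets)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
        return -1;

    // Edge-triggered use requires a non-blocking descriptor.
    fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL, 0) | O_NONBLOCK);

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    epoll_event ev = {};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fds[1];
    epoll_ctl(ep, EPOLL_CTL_ADD, fds[1], &ev);

    std::uint32_t payload = 0;
    for (int i = 0; i < packets; ++i)
        send(fds[0], &payload, sizeof payload, 0);

    int received = 0;
    epoll_event out = {};
    if (epoll_wait(ep, &out, 1, 1000) == 1 && (out.events & EPOLLIN)) {
        // With EPOLLET there is no reminder for data left unread,
        // so read until the kernel reports EAGAIN.
        for (;;) {
            ssize_t n = recv(out.data.fd, &payload, sizeof payload, 0);
            if (n == static_cast<ssize_t>(sizeof payload)) {
                ++received;
                continue;
            }
            break;  // EAGAIN/EWOULDBLOCK (drained) or an error
        }
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return received;
}
```

One wake-up from `epoll_wait` serves every datagram that was pending, which is part of why this style can get by with few system calls per packet.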

BTW, source code for the test programs is available from:

- http://svn.cmeerw.net/src/nginetd/trunk/test/mttest.cc
- http://svn.cmeerw.net/src/nginetd/trunk/test/eptest.cc

Marat Abrarov | 25 Mar 21:18 2012

Re: boost asio overhead/scalability

Hi, Christof.

Thanks for the benchmark.

The only things I can suggest are:
1. Boost.Asio 1.8.0 (Boost 1.49) is rather different from Boost.Asio 1.4.8 (Boost 1.46).
2. Given the impact of increasing the number of threads, I suggest using "Custom Memory Allocation"
(http://www.boost.org/doc/libs/1_49_0/doc/html/boost_asio/overview/core/allocation.html).
It is a must-have when using Asio with a large number of async operations; otherwise Asio-based code is
often limited by the memory allocator in use.
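
To make the custom memory allocation suggestion concrete: Asio lets a handler type supply its own memory through free functions named `asio_handler_allocate` and `asio_handler_deallocate`. The sketch below shows only the storage part, modelled on the idea used in Asio's allocation example (one reusable block per connection, heap as fallback). The class name `handler_memory`, the 1024-byte size and the `owns` helper are assumptions made for this illustration, and the class is not thread-safe.

```cpp
#include <cstddef>
#include <new>

// A single reusable block of memory for one outstanding async operation.
// If the block is busy or too small, fall back to the global heap.
// Illustrative only; not thread-safe.
class handler_memory {
public:
    handler_memory() : in_use_(false) {}
    handler_memory(const handler_memory&) = delete;
    handler_memory& operator=(const handler_memory&) = delete;

    void* allocate(std::size_t size)
    {
        if (!in_use_ && size <= sizeof storage_) {
            in_use_ = true;
            return storage_;
        }
        return ::operator new(size);
    }

    void deallocate(void* pointer)
    {
        if (pointer == storage_)
            in_use_ = false;          // block becomes reusable
        else
            ::operator delete(pointer);
    }

    // True if `pointer` refers to the internal block (handy for testing).
    bool owns(const void* pointer) const { return pointer == storage_; }

private:
    alignas(std::max_align_t) unsigned char storage_[1024];
    bool in_use_;
};
```

A handler wrapper would forward its `asio_handler_allocate` and `asio_handler_deallocate` hooks to `allocate` and `deallocate` of a `handler_memory` owned by the connection, so that steady-state operation does not touch the global allocator.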

Regards, 
Marat Abrarov.

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
asio-users mailing list
asio-users@...
https://lists.sourceforge.net/lists/listinfo/asio-users
_______________________________________________
Using Asio? List your project at
http://think-async.com/Asio/WhoIsUsingAsio

Christof Meerwald | 29 Mar 15:23 2012

Re: boost asio overhead/scalability

On Sun, 25 Mar 2012 23:18:23 +0400, Marat Abrarov wrote:
> The only things I can suggest are:
> 
> 1. Boost.Asio 1.8.0 (Boost 1.49) is rather different from the
> Boost.Asio 1.4.8 (Boost 1.46).

Indeed, I need to properly re-run with Boost 1.49 (some initial tests
suggest that there is a huge difference - at least when there is some
concurrency).

> 2. Taking into account the impact of increasing the number of
> threads I can suggest to use "Custom Memory Allocation"

I have tried doing this, but it doesn't seem to make much difference.

Christof

--
http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org


Marat Abrarov | 29 Mar 15:56 2012

Re: boost asio overhead/scalability

Hi, Christof.

> > 2. Taking into account the impact of increasing the number of
> > threads I can suggest to use "Custom Memory Allocation"
> 
> I have tried doing this, but it doesn't seem to make much difference.

Can we see the sources with "Custom Memory Allocation"?
I have heard that on Ubuntu with 4 cores, "Custom Memory Allocation" gives a 2.5x speedup (in Russian:
http://asio-samples.blogspot.com/2011/06/design-journeys-with-asio.html?showComment=1312223771487#c5101121775924857582).

Regards, 
Marat Abrarov.


Arash Partow | 8 Apr 01:36 2012

Re: boost asio overhead/scalability

On 30/03/2012 12:56 AM, Marat Abrarov wrote:
> Can we see sources with "Custom Memory Allocation"?
> I heard on Ubuntu with 4 cores "Custom Memory Allocation" gives speedup in 2.5 times (in Russian:
> http://asio-samples.blogspot.com/2011/06/design-journeys-with-asio.html?showComment=1312223771487#c5101121775924857582).
>

Agreed, seeing the updated test source code with custom allocator changes would definitely be a good idea.


Gruenke, Matt | 8 Apr 01:49 2012

Re: boost asio overhead/scalability

Please also try without Boost.Bind.  At event frequencies such as this,
I'd guess it's adding measurable overhead.  In at least some cases,
memory allocation is incurred by its use.

Matt


Christof Meerwald | 8 Apr 10:49 2012

Re: boost asio overhead/scalability

On Sun, 08 Apr 2012 09:36:39 +1000, Arash Partow wrote:
> On 30/03/2012 12:56 AM, Marat Abrarov wrote:
>> Can we see sources with "Custom Memory Allocation"?
>> I heard on Ubuntu with 4 cores "Custom Memory Allocation" gives speedup in 2.5 times (in Russian:
>> http://asio-samples.blogspot.com/2011/06/design-journeys-with-asio.html?showComment=1312223771487#c5101121775924857582).
> Agreed, seeing the updated test source code with custom allocator changes would definitely be a good idea.

Well, I tried using custom allocators - for the recv side it's pretty
straightforward to implement them (without any locking), but I didn't
see any measurable impact.

For the send side it's actually quite tricky as boost asio might cling
on to the memory for longer than you might think - so you would have
to at least use some locking there which then didn't seem sensible.

Christof


Arash Partow | 8 Apr 12:30 2012

Re: boost asio overhead/scalability

On 8/04/2012 6:49 PM, Christof Meerwald wrote:
> On Sun, 08 Apr 2012 09:36:39 +1000, Arash Partow wrote:
>> On 30/03/2012 12:56 AM, Marat Abrarov wrote:
>>> Can we see sources with "Custom Memory Allocation"?
>>> I heard on Ubuntu with 4 cores "Custom Memory Allocation" gives speedup in 2.5 times (in Russian:
>>> http://asio-samples.blogspot.com/2011/06/design-journeys-with-asio.html?showComment=1312223771487#c5101121775924857582).
>> Agreed, seeing the updated test source code with custom allocator changes would definitely be a good idea.
>
> Well, I tried using custom allocators - for the recv side it's pretty
> straightforward to implement them (without any locking), but I didn't
> see any measureable impact.
>

Don't expect to see a huge difference, but there should be a difference nonetheless. Did you try the suggestion
someone made earlier about pinning the process to a specific core (i.e. setting the CPU affinity explicitly)?
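
For anyone wanting to try the pinning suggestion on Linux, a minimal sketch follows. It uses `sched_getaffinity` and `sched_setaffinity` with a pid of 0, which applies to the calling thread; threads created afterwards inherit the mask. The function name is hypothetical, and the `CPU_*` macros assume glibc with `_GNU_SOURCE` defined (g++ defines it by default).

```cpp
#include <sched.h>

// Pin the calling thread to the first CPU it is currently allowed to
// run on. Returns that CPU's index, or -1 on failure.
// Hypothetical helper for illustration only.
int pin_to_first_allowed_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;  // empty mask; should not happen in practice
}
```

Calling this at the top of `main`, before any worker threads are started, confines the whole test to one core; calling it from each worker with a different target CPU would pin threads individually instead.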


Christof Meerwald | 29 Mar 22:36 2012

Re: boost asio overhead/scalability

On Thu, 29 Mar 2012 15:23:35 +0200, Christof Meerwald wrote:
> On Sun, 25 Mar 2012 23:18:23 +0400, Marat Abrarov wrote:
>> 1. Boost.Asio 1.8.0 (Boost 1.49) is rather different from the
>> Boost.Asio 1.4.8 (Boost 1.46).
> Indeed, I need to properly re-run with Boost 1.49 (some initial tests
> suggest that there is a huge difference - at least when there is some
> concurrency).

I have now re-run the test with boost 1.49.0 - while there is a huge
difference with 16 socketpairs, the times for only 1 socketpair and
multiple worker threads are still bad:

http://www.editgrid.com/export/sheetobject/41940529.png
http://www.editgrid.com/export/sheetobject/41940528.png

Christof


Marsh Ray | 26 Mar 21:36 2012

Re: boost asio overhead/scalability

On 03/25/2012 11:55 AM, Christof Meerwald wrote:
> Essentially, I have written a small test program that just
> sends/receives 4-byte packets between a datagram unix-domain
> socketpair (this is on a quad-core Intel i7-2600K @ 3.4 GHz with
> hyperthreading enabled and Linux 3.0). I am comparing a simple
> blocking recv/send version (1 thread per socketpair) with a
> hand-written edge-triggered epoll loop, a simple async I/O abstraction
> library (using edge-triggered epoll) and boost asio (1.46) when
> increasing the number of worker threads.
>
> I have run this with 1 socketpair and 16 socketpairs (to get some
> concurrency in the latter case):

I think this research is very helpful. I did some similar testing a few 
years ago. (I've since learned that Ubuntu wasn't ever taking 7 of my 8 
HT cores out of low power mode!)

Some questions and suggestions if you do another run:

How many iterations were used for the test?

How many core-sec does it take per async op? Could be easily figured 
from iter count.

Are you setting CPU affinity on the process or on the threads?

It might be helpful to graph the threads on a linear rather than a log_2 
scale. Could be interesting to see if the perf takes a big jump from 4 
to 5 threads, or grows smoothly over 4 to 8.
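
One possible way to produce the core-seconds-per-operation figure asked about above is sketched here. It assumes that "core-sec" means total CPU time (user plus system, summed over all threads) divided by the number of completed operations; both function names are made up for this example.

```cpp
#include <sys/time.h>
#include <sys/resource.h>

// CPU time (user + system) consumed so far by the whole process,
// summed over all of its threads, in seconds. Returns -1.0 on failure.
double process_cpu_seconds()
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return static_cast<double>(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + static_cast<double>(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

// Core-seconds per operation for a run that consumed `cpu_seconds`
// of CPU time and completed `operations` send/receive operations.
double core_seconds_per_op(double cpu_seconds, unsigned long long operations)
{
    if (operations == 0)
        return 0.0;
    return cpu_seconds / static_cast<double>(operations);
}
```

Sampling `process_cpu_seconds()` before and after the timed loop and passing the difference, together with the iteration count, to `core_seconds_per_op` gives a number that stays comparable across different thread counts, unlike wall-clock time.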


Gruenke, Matt | 28 Mar 01:54 2012

Re: boost asio overhead/scalability

I seem to recall someone observing that threads calling
io_service::run() enter a FIFO to process the next event that's ready to
be handled.  If that's true, then it seems like high-frequency workloads
could experience cache thrashing.

I know this hearsay isn't very useful, but it would give you an avenue
for investigation.  I also mention this to point out that your
experimental workload might be introducing artifacts that obscure what
you're really trying to measure.

I think Marat's point makes sense.  If this workload is fairly
representative of your intended usage, it would be interesting to try to
minimize memory allocation overhead.  Otherwise, perhaps you might
increase the number of connections and/or the amount of per-packet
processing overhead.

Matt


Christof Meerwald | 29 Mar 15:31 2012

Re: boost asio overhead/scalability

On Tue, 27 Mar 2012 19:54:22 -0400, Gruenke, Matt wrote:
> I seem to recall someone observing that threads calling
> io_service::run() enter a FIFO to process the next event that's ready to
> be handled.  If that's true, then it seems like high-frequency workloads
> could experience cache thrashing.

Yes, I need to look at that in a bit more detail - but doing a
"strace" on the process shows that only one thread is doing an
epoll_wait at any one time.

> I know this hearsay isn't very useful, but it would give you an avenue
> for investigation.  I also mention this to point out that your
> experimental workload might be introducing artifacts that obscure what
> you're really trying to measure.

Essentially, I am just trying to find the limits of boost asio - i.e.
at what point does the overhead introduced by asio become the limiting
factor (but the OS would still be able to handle it).

Christof


Gruenke, Matt | 29 Mar 18:46 2012

Re: boost asio overhead/scalability

From: Christof Meerwald
Sent: Thursday, March 29, 2012 09:31

> On Tue, 27 Mar 2012 19:54:22 -0400, Gruenke, Matt wrote:
> > I seem to recall someone observing that threads calling
> > io_service::run() enter a FIFO to process the next event that's
> > ready to be handled.  If that's true, then it seems like
> > high-frequency workloads could experience cache thrashing.
> 
> Yes, I need to look at that in a bit more detail - but doing a
> "strace" on the process shows that only one thread is doing an
> epoll_wait at any one time.

That's exactly the point.  The others are either running handlers or
waiting (possibly in a FIFO) for their turn to get another event from
epoll_wait().  Keep in mind that the FIFO might be implicit, arising from how
certain synchronization primitives are implemented on the platform.
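
The hand-off effect described here can be explored with a small simulation. The sketch below is not how Asio is implemented; it only models the assumption under discussion: several workers wait on one shared queue, and a single "connection" produces its next event as soon as the previous one has been handled. All names are invented for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct handoff_stats {
    long handled;          // events processed in total
    long thread_switches;  // consecutive events handled by different workers
};

// Simulate `events` back-to-back events of one connection being served
// by `workers` threads that all wait on a single shared queue.
handoff_stats simulate_shared_queue(int workers, long events)
{
    std::mutex m;
    std::condition_variable cv;
    std::queue<long> ready;
    handoff_stats stats = {0, 0};
    int last_worker = -1;
    bool done = (events <= 0);
    if (!done)
        ready.push(0);

    auto worker = [&](int id) {
        for (;;) {
            long ev;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return done || !ready.empty(); });
                if (ready.empty())
                    return;  // finished and nothing left to handle
                ev = ready.front();
                ready.pop();
                if (last_worker != -1 && last_worker != id)
                    ++stats.thread_switches;
                last_worker = id;
                ++stats.handled;
            }
            // The "handler" runs outside the lock and posts the next event.
            const bool more = (ev + 1 < events);
            {
                std::lock_guard<std::mutex> lock(m);
                if (more)
                    ready.push(ev + 1);
                else
                    done = true;
            }
            if (more)
                cv.notify_one();
            else
                cv.notify_all();
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back(worker, i);
    for (auto& t : pool)
        t.join();
    return stats;
}
```

With one worker, `thread_switches` is always 0. With several workers the count varies from run to run and between platforms; every switch stands for a handler that runs on a core whose cache probably does not hold the connection's state.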

> > I also mention this to point out that your experimental workload
> > might be introducing artifacts that obscure what you're really
> > trying to measure.
>
> Essentially, I am just trying to find the limits of boost asio - i.e.
> at what point does the overhead introduced by asio become the limiting
> factor (but the OS would still be able to handle it).

Well, anything you measure is likely to be interesting to someone.

Arash Partow | 8 Apr 01:53 2012

Re: boost asio overhead/scalability

On 30/03/2012 3:46 AM, Gruenke, Matt wrote:
> From: Christof Meerwald
> Sent: Thursday, March 29, 2012 09:31
>
> I'd also avoid Boost.Bind.

Agreed, most TR1 implementations of std::function/std::bind will be faster than boost::bind (which
just does a lot of fancy things in the background). Also, my understanding is that a version of ASIO
may be released that is completely free of Boost dependencies, using only C++11 and TR1 instead.

Christof, furthermore, have you considered the asio coroutine example? I believe your eptest.cc would
map quite easily onto that approach, which would result in a comparison that is more 'accurate/reasonable'.

http://think-async.com/Asio/boost_asio_1_4_8/doc/html/boost_asio/example/http/server4/coroutine.hpp


Christof Meerwald | 8 Apr 11:03 2012

Re: boost asio overhead/scalability

On Sun, 08 Apr 2012 09:53:40 +1000, Arash Partow wrote:
> Christof, Furthermore, have you considered the asio coroutine example? I believe your eptest.cc would
> map quite easily with that particular approach - which would result in a comparison that is more 'accurate/reasonable'

Not yet - but would you expect that to provide any performance
improvement or would that add a bit more overhead?

Christof


Arash Partow | 8 Apr 12:30 2012

Re: boost asio overhead/scalability

On 8/04/2012 7:03 PM, Christof Meerwald wrote:
> On Sun, 08 Apr 2012 09:53:40 +1000, Arash Partow wrote:
>> Christof, Furthermore, have you considered the asio coroutine example? I believe your eptest.cc would
>> map quite easily with that particular approach - which would result in a comparison that is more 'accurate/reasonable'
>
> Not yet - but would you expect that to provide any performance
> improvement or would that add a bit more overhead?
>

  -> 'accurate/reasonable' <-

In short if done correctly, it should wipe the floor clean.


Christof Meerwald | 8 Apr 11:47 2012

Re: boost asio overhead/scalability

On Sun, 08 Apr 2012 09:53:40 +1000, Arash Partow wrote:
>> I'd also avoid Boost.Bind.
> Agreed, most implementations of tr1 for std::function/std::bind will
> be faster than boost::bind (it just does a lot of fancy things in the
> background)

Well, replacing boost::bind with std::tr1::bind (and std::tr1::mem_fn)
seems to result in a minor slowdown (about 1-2 %) with gcc 4.6.1.

Christof


Christof Meerwald | 8 Apr 12:04 2012

Re: boost asio overhead/scalability

On Sun, Apr 08, 2012 at 11:47:04AM +0200, Christof Meerwald wrote:
> On Sun, 08 Apr 2012 09:53:40 +1000, Arash Partow wrote:
> >> I'd also avoid Boost.Bind.
> > Agreed, most implementations of tr1 for std::function/std::bind will
> > be faster than boost::bind (it just does a lot of fancy things in the
> > background)
> 
> Well, replacing boost::bind with std::tr1::bind (and std::tr1::mem_fn)
> seems to result in a minor slowdown (about 1-2 %) with gcc 4.6.1.

And if I replace the boost::bind with some hand-written function
objects I see something like a 1-2 % speedup - can't really get
excited about that either.
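
For context, this is roughly what replacing a bind expression with a hand-written function object amounts to. The sketch is not taken from the test sources: it uses `std::bind` from C++11 as a stand-in for `boost::bind`, simplifies the handler signature to a single byte count (real Asio completion handlers also receive an error code), and all type names are invented.

```cpp
#include <cstddef>
#include <functional>

// Stand-in for a per-socket object with a completion callback.
struct connection {
    long bytes_seen;
    connection() : bytes_seen(0) {}
    void on_receive(std::size_t n) { bytes_seen += static_cast<long>(n); }
};

// Hand-written equivalent of
//   std::bind(&connection::on_receive, c, std::placeholders::_1)
struct receive_handler {
    connection* self;
    explicit receive_handler(connection* c) : self(c) {}
    void operator()(std::size_t n) const { self->on_receive(n); }
};

// Stand-in for the library side: accepts any callable and invokes it
// with the number of bytes transferred.
template <typename Handler>
void deliver(Handler handler, std::size_t bytes)
{
    handler(bytes);
}
```

Both `deliver(std::bind(&connection::on_receive, &c, std::placeholders::_1), 4)` and `deliver(receive_handler(&c), 4)` end up in the same member function. The hand-written object is exactly one pointer in size, which matters if a library stores handlers in a small fixed buffer before falling back to the heap.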

Christof


Arash Partow | 8 Apr 12:32 2012

Re: boost asio overhead/scalability

On 8/04/2012 8:04 PM, Christof Meerwald wrote:
> On Sun, Apr 08, 2012 at 11:47:04AM +0200, Christof Meerwald wrote:
>> On Sun, 08 Apr 2012 09:53:40 +1000, Arash Partow wrote:
>>>> I'd also avoid Boost.Bind.
>>> Agreed, most implementations of tr1 for std::function/std::bind will
>>> be faster than boost::bind (it just does a lot of fancy things in the
>>> background)
>>
>> Well, replacing boost::bind with std::tr1::bind (and std::tr1::mem_fn)
>> seems to result in a minor slowdown (about 1-2 %) with gcc 4.6.1.
>
> And if I replace the boost::bind with some hand-written function
> objects I see something like a 1-2 % speedup - can't really get
> excited about that either.
>

Have you tried any of these:

http://www.codeproject.com/Articles/7150/Member-Function-Pointers-and-the-Fastest-Possible

http://www.codeproject.com/Articles/13287/Fast-C-Delegate

http://www.codeproject.com/Articles/18389/Fast-C-Delegate-Boost-Function-drop-in-replacement


Gruenke, Matt | 9 Apr 03:53 2012

Re: boost asio overhead/scalability

How big are your function objects?  I think boost::function reserves a small amount of internal space, then falls back on heap-based storage.

Have you used oprofile to verify that you're not using any standard memory allocation?

Matt

 
Christof Meerwald | 9 Apr 10:57 2012

Re: boost asio overhead/scalability

On Sun, 8 Apr 2012 21:53:21 -0400, Gruenke, Matt wrote:
> How big are your function objects?  I think boost::function reserves a
> small amount of internal space, then falls back on heap-based storage.
>
> Have you used oprofile to verify that you're not using any standard
> memory allocation?

I really had hoped that we could move on from trying to micro-optimise
function objects that don't appear to be significant anyway (as "And
if I replace the boost::bind with some hand-written function objects I
see something like a 1-2 % speedup - can't really get excited about
that either.")

I'll leave it for others to investigate that particular area further
if they are that keen...

Christof


Gruenke, Matt | 9 Apr 17:04 2012

Re: boost asio overhead/scalability

Have you published the version without boost.bind?  If so, I'd be
willing to take a look at it.

The concern is that your function objects are getting wrapped by
boost.function, and copied around.  If your function objects are large
enough that boost.function is using heap memory, then perhaps the reason
you only saw 1-2% speedup is a small change in size from the boost.bind
version.  To the extent that you care about avoiding standard memory
allocation, I think this is a worthwhile course of investigation - not a
micro optimization.

I really think the more likely cause of poor scaling on a small set of
connections is the FIFO feeding of worker threads, which I previously
mentioned.  One would expect this to cause a large number of context
switches and poor cache efficiency.

Matt


Christof Meerwald | 10 Apr 21:33 2012

Re: boost asio overhead/scalability

On Mon, 9 Apr 2012 11:04:35 -0400, Gruenke, Matt wrote:
> Have you published the version without boost.bind?  If so, I'd be
> willing to take a look at it.

It is available now:
http://svn.cmeerw.net/src/nginetd/trunk/test/asiosrv-nobind.cc

> I really think the more likely cause of poor scaling on a small set of
> connections is the FIFO feeding of worker threads, which I previously
> mentioned.  One would expect this to cause a large number of context
> switches and poor cache efficiency.

I agree and this is the area I need to concentrate on next.

Christof


Yuri Timenkov | 9 Apr 21:10 2012

Re: boost asio overhead/scalability

Just out of curiosity I did similar research recently.


1) direct virtual method call:
        if (o) 
            o->f(i); 
2) calling method via boost::function:
    boost::function<void (int)> func; 
    func = boost::bind(&Test::f, o, _1); 
        if (func) 
            func(i); 
3) calling via boost::signals:
    boost::signal<void (int)> sig; 
    sig.connect(boost::bind(&Test::f, o, _1)); 
        sig(i); 
4) calling via boost::signals2:
    boost::signals2::signal<void (int)> sig2;
    sig2.connect(boost::bind(&Test::f, o, _1));
    for (unsigned int i = 0; i < N; ++i)
    {
        sig2(i);
    }

Test direct: 0.3 
Test boost::function+bind: 0.48 
Test boost::signal: 8.13 
Test boost::signals2: 7.28

Times are in seconds for 40,000,000 calls, measured on a 1 GHz CPU (Celeron, IIRC).

Thus, for my case I consider the boost::function + bind overhead quite negligible. I don't know how fast boost::bind is with non-virtual functions compared with virtual ones.
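
Dividing the times above by 40,000,000 calls gives roughly 7.5 ns per direct call and 12 ns per boost::function + bind call, against about 203 ns and 182 ns for the two signal libraries. For anyone wishing to repeat the first two measurements, here is a minimal sketch. It is not the original test program: `std::function` and `std::bind` stand in for their Boost counterparts, `Target` plays the role of `Test`, and the `timing` struct is invented for this example.

```cpp
#include <chrono>
#include <functional>

struct Target {
    long long sum;
    Target() : sum(0) {}
    virtual ~Target() {}
    virtual void f(int i) { sum += i; }
};

struct timing {
    double seconds;      // wall-clock time for the loop
    long long checksum;  // sum accumulated by the calls
};

// Case 1: direct virtual call through a pointer.
timing time_direct(Target& o, unsigned n)
{
    Target* p = &o;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < n; ++i)
        if (p)
            p->f(static_cast<int>(i));
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    timing t = { elapsed.count(), o.sum };
    return t;
}

// Case 2: the same call routed through a type-erased function object.
timing time_via_function(Target& o, unsigned n)
{
    std::function<void(int)> func =
        std::bind(&Target::f, &o, std::placeholders::_1);
    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < n; ++i)
        if (func)
            func(static_cast<int>(i));
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    timing t = { elapsed.count(), o.sum };
    return t;
}
```

The checksum exists so that the calls have an observable effect and the two variants can be checked against each other. An optimizing compiler may still devirtualize or inline the direct case, so absolute numbers should be read with care.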

On Mon, Apr 9, 2012 at 12:57 PM, Christof Meerwald <cmeerw-CmbjLal2ID4dnm+yROfE0A@public.gmane.org> wrote:
On Sun, 8 Apr 2012 21:53:21 -0400, Gruenke, Matt wrote:
> How big are your function objects?  I think boost::function reserves a
> small amount of internal space, then falls back on heap-based storage.
>
> Have you used oprofile to verify that you're not using any standard
> memory allocation?

I really had hoped that we could move on from trying to micro-optimise
function objects that don't appear to be significant anyway (as "And
if I replace the boost::bind with some hand-written function objects I
see something like a 1-2 % speedup - can't really get excited about
that either.")

I'll leave it for others to investigate that particular area further
if they are that keen...


Christof

--

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

Marat Abrarov | 11 Apr 09:49 2012

Re: boost asio overhead/scalability

Hi, Christof.

I don't think I can really speed up your Asio-based test but couldn't you try this:
http://liveworkspace.org/code/724d87ae7f5daf2e3d558b7fc0290770
?

I have no Linux running on real hardware and virtual machines are not acceptable for any benchmarking.

The Asio-based test uses a pool of threads, so it will be more efficient (and more applicable to real life) when (in order of importance):
1. Number of socket pairs >> size of the used thread pool.
2. Size of the used thread pool == number of logical processors (cores x processors, without Hyper-threading).
3. Threads of the used thread pool are bound to logical processors (affinity mask). 
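
Points 2 and 3 above can be sketched roughly as follows. This is only an illustration under several assumptions: it is Linux/glibc-specific (pthread_setaffinity_np is a non-portable extension), the helper names pin_current_thread and run_pinned_pool are hypothetical, and std::thread::hardware_concurrency() typically counts hyper-threaded siblings too, so the pool size would need adjusting to match "without Hyper-threading". In an Asio program the work function would typically call io_service::run().

```cpp
#include <pthread.h>   // pthread_setaffinity_np (glibc extension)
#include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Pins the calling thread to one logical CPU. Returns false if the kernel
// rejects the mask (for example in a container restricted to other CPUs).
bool pin_current_thread(unsigned cpu) {
    assert(cpu < static_cast<unsigned>(CPU_SETSIZE));
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Starts one worker per logical CPU; worker i tries to pin itself to CPU i
// and then runs work(i). Returns how many workers were pinned successfully.
unsigned run_pinned_pool(const std::function<void(unsigned)>& work) {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;  // hardware_concurrency() may return 0 if unknown

    std::atomic<unsigned> pinned{0};
    std::vector<std::thread> pool;
    pool.reserve(n);
    for (unsigned i = 0; i < n; ++i) {
        pool.emplace_back([&pinned, &work, i] {
            if (pin_current_thread(i)) ++pinned;
            work(i);
        });
    }
    for (auto& t : pool) t.join();
    return pinned.load();
}
```

Each worker pins itself rather than being pinned from the outside, so the affinity call can never target a thread that has already finished.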

Regards, 
Marat Abrarov.

Christof Meerwald | 12 Apr 12:53 2012

Re: boost asio overhead/scalability

On Wed, 11 Apr 2012 11:49:55 +0400, Marat Abrarov wrote:
> I don't think I can really speed up your Asio-based test but couldn't you try this:
> http://liveworkspace.org/code/724d87ae7f5daf2e3d558b7fc0290770
> ?

First observation is that it sometimes doesn't successfully complete -
which might be some kind of race condition with send_in_progress /
has_data_to_send.

Other observation is that the added synchronisation (via strand) seems
to reduce the number of context switches and might slightly improve
performance (but I am only seeing 1-2 % in some cases and a more
significant slowdown in other cases).

Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

Marat Abrarov | 12 Apr 13:07 2012

Re: boost asio overhead/scalability

> First observation is that it sometimes doesn't successfully complete -
> which might be some kind of race condition with send_in_progress /
> has_data_to_send.

You are right - I'm investigating it right now. asio::io_service::strand should have resolved any race conditions, but it didn't.

Marat Abrarov | 12 Apr 15:45 2012

Re: boost asio overhead/scalability

Good evening.

> On Wed, 11 Apr 2012 11:49:55 +0400, Marat Abrarov wrote:
> > I don't think I can really speed up your Asio-based test but couldn't you try this:
> > http://liveworkspace.org/code/724d87ae7f5daf2e3d558b7fc0290770
> > ?
> 
> First observation is that it sometimes doesn't successfully complete -
> which might be some kind of race condition with send_in_progress /
> has_data_to_send.

Fixed: http://liveworkspace.org/code/90670b78ce727ba27f5f2e03dd0bb20e

Let's see test results and diagrams.

Regards, 
Marat Abrarov.

Christof Meerwald | 17 Apr 20:44 2012

Re: boost asio overhead/scalability

On Thu, 12 Apr 2012 17:45:57 +0400, Marat Abrarov wrote:
> Fixed: http://liveworkspace.org/code/90670b78ce727ba27f5f2e03dd0bb20e
>
> Let's see test results and diagrams.

I had to make some minor changes to make it compile without C++11
extensions
(http://svn.cmeerw.net/src/nginetd/trunk/test/asiosrv-alloc.cc).

But in the best case performance is the same as with my original
version and in some cases it's more than 10 % slower.

Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

Marat Abrarov | 17 Apr 22:15 2012

Re: boost asio overhead/scalability

> I had to make some minor changes to make it compile without C++11
> extensions
> (http://svn.cmeerw.net/src/nginetd/trunk/test/asiosrv-alloc.cc).

Just to be more accurate: http://liveworkspace.org/code/915570b8ba4a436e735f426679ed9a00

> But in the best case performance is the same as with my original
> version and in some cases it's more than 10 % slower.

This behavior is quite expected. So the only thing left is thread scheduling - recently I came across an Asio bug tracker record, "epoll_reactor could cause unnecessary wakeups"
(http://sourceforge.net/tracker/?func=detail&aid=3494858&group_id=122478&atid=694037)

Regards, 
Marat Abrarov.

Christof Meerwald | 17 Apr 23:01 2012

Re: boost asio overhead/scalability

On Wed, 18 Apr 2012 00:15:48 +0400, Marat Abrarov wrote:
> This behavior is quite expected. So the only thing left is thread scheduling - recently I came across an Asio bug
> tracker's record "epoll_reactor could cause unnecessary wakeups"
> (http://sourceforge.net/tracker/?func=detail&aid=3494858&group_id=122478&atid=694037)

Actually, I have seen this behaviour - what's even more annoying is
that the Linux kernel seems to happily post EPOLLOUT events after each
send (even if the state hasn't changed).

I did try to patch that in a very crude way by just removing the
EPOLLOUT registration from the asio header files (as it's not needed
for my tests at all) - unfortunately, the run-times didn't improve
with that change.
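
The EPOLLOUT behaviour described above can be probed with a small stand-alone sketch like the one below. It is an assumption-laden illustration, not part of the test programs linked in this thread: count_epollout is a hypothetical helper that sends 4-byte datagrams over a unix-domain socketpair and counts how often epoll_wait() on the sending end reports EPOLLOUT, with or without EPOLLOUT in the edge-triggered registration. The count obtained with EPOLLOUT registered depends on the kernel, so no particular value is claimed here.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>

// Returns the number of epoll_wait() calls that reported EPOLLOUT on the
// sending socket, or -1 if setting up the sockets or epoll failed.
int count_epollout(bool register_out, int sends) {
    assert(sends >= 0);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) return -1;
    int ep = epoll_create1(0);
    if (ep < 0) { close(sv[0]); close(sv[1]); return -1; }

    // Edge-triggered registration of the sending end, optionally with EPOLLOUT.
    std::uint32_t events = EPOLLIN | EPOLLET;
    if (register_out) events |= EPOLLOUT;
    epoll_event ev{};
    ev.events = events;
    ev.data.fd = sv[0];
    int count = (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0) ? 0 : -1;

    const char msg[4] = {'p', 'i', 'n', 'g'};
    char buf[4];
    for (int i = 0; count >= 0 && i < sends; ++i) {
        if (send(sv[0], msg, sizeof msg, 0) != static_cast<ssize_t>(sizeof msg)) break;
        // Drain the peer so the datagram queue never fills up.
        if (recv(sv[1], buf, sizeof buf, 0) != static_cast<ssize_t>(sizeof buf)) break;
        epoll_event out{};
        if (epoll_wait(ep, &out, 1, 0) == 1 && (out.events & EPOLLOUT)) ++count;
    }
    close(ep);
    close(sv[0]);
    close(sv[1]);
    return count;
}
```

Comparing count_epollout(true, n) with count_epollout(false, n) shows how many extra wakeups the EPOLLOUT registration costs on a given kernel; with EPOLLOUT left out of the registration the function always returns 0, because epoll only reports the requested events (plus error and hang-up conditions).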

Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

Marat Abrarov | 12 Apr 16:04 2012

Re: boost asio overhead/scalability

> > I don't think I can really speed up your Asio-based test but couldn't you try this:
> > http://liveworkspace.org/code/724d87ae7f5daf2e3d558b7fc0290770
> > ?
> 
> First observation is that it sometimes doesn't successfully complete -
> which might be some kind of race condition with send_in_progress /
> has_data_to_send.
> 
> Other observation is that the added synchronisation (via strand) seems
> to reduce the number of context switches and might slightly improve
> performance (but I am only seeing 1-2 % in some cases and a more
> significant slowdown in other cases).

asio::io_service::strand is used only to synchronize the use of the custom handler allocators.
Such synchronization is required when handler allocators are reused. I chose asio::io_service::strand
as the default synchronization mechanism for Asio-based programs.
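
To illustrate why a reused handler allocator needs that synchronization, here is a minimal sketch modelled on the handler_allocator class from the custom memory allocation example in the Asio documentation, rewritten with standard C++ only. It is not the code behind the liveworkspace links above, and the owns() helper is an addition made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hands out one fixed in-object buffer at a time and falls back to the heap
// when the buffer is busy or too small. in_use_ is a plain bool, so two
// handlers sharing one allocator must never run concurrently.
class handler_allocator {
public:
    handler_allocator() : in_use_(false) {}
    handler_allocator(const handler_allocator&) = delete;
    handler_allocator& operator=(const handler_allocator&) = delete;

    void* allocate(std::size_t size) {
        if (!in_use_ && size <= sizeof(storage_)) {
            in_use_ = true;          // reuse the embedded buffer
            return storage_;
        }
        return ::operator new(size); // fall back to the global heap
    }

    void deallocate(void* pointer) {
        assert(pointer != nullptr);
        if (owns(pointer))
            in_use_ = false;         // buffer becomes available again
        else
            ::operator delete(pointer);
    }

    // True if pointer refers to the embedded buffer (illustration helper).
    bool owns(const void* pointer) const { return pointer == storage_; }

private:
    alignas(std::max_align_t) unsigned char storage_[1024];
    bool in_use_;
};
```

Because allocate() and deallocate() read and write in_use_ without any locking, the allocator is only safe when all handlers that share it are serialized, which is the role the strand plays in the description above.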

strand adds some overhead too. It's interesting that strand reduces context switches. What about FIFO thread
scheduling and the affinity mask - have you tried them?

Regards, 
Marat Abrarov.

Cory Nelson | 17 Apr 21:10 2012

Re: boost asio overhead/scalability

On Wed, Apr 11, 2012 at 2:49 AM, Marat Abrarov <abrarov-JGs/UdohzUI@public.gmane.org> wrote:
> Hi, Christof.
>
> I don't think I can really speed up your Asio-based test but couldn't you try this:
> http://liveworkspace.org/code/724d87ae7f5daf2e3d558b7fc0290770
> ?
>
> I have no Linux running on real hardware and virtual machines are not acceptable for any benchmarking.
>
> Asio-based test uses pool of threads so it will be more efficient (more applicable in real life) when (in order of
> importance):
> 1. Number of socket pairs >> size of the used thread pool.
> 2. Size of the used thread pool == number of logical processors (cores x processors, without Hyper-threading).
> 3. Threads of the used thread pool are bound to logical processors (affinity mask).

Multi-threading (even pooling) introduces a good deal of overhead and will easily just make your app consume more CPU and battery life. Don't use it unless you've already hit the limits of a single thread and need to scale for more I/Os per second. Number of sockets is irrelevant.

--
Cory Nelson
http://int64.org
Christof Meerwald | 8 Apr 12:18 2012

Re: boost asio overhead/scalability

On Thu, 29 Mar 2012 12:46:19 -0400, Gruenke, Matt wrote:
> That's exactly the point.  The others are either running handlers or
> waiting (possibly in a FIFO) for their turn to get another event from
> epoll().  Keep in mind that the FIFO might be implicit, arising from how
> certain synchronization primitives are implemented on the platform.

Another observation is that I am seeing a lot more context switches
for boost asio compared with the other options - particularly when
using multiple threads, but only a single socketpair - e.g. for 8
threads, 1 socketpair and 200000 iterations I get about 880000
voluntary context switches for boost asio compared with about 200000
for the other epoll-based solutions.
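
For readers who want to collect comparable numbers, one possible way to read the voluntary context switch count from inside a process is getrusage(). This is only a sketch: the thread does not say how the figures above were obtained, and voluntary_switches and switches_during are hypothetical helper names.

```cpp
#include <sys/resource.h>
#include <cassert>

// Voluntary context switches of the whole process so far (all threads),
// or -1 if getrusage() fails.
long voluntary_switches() {
    rusage ru{};
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_nvcsw;
}

// Runs work() and returns how many voluntary context switches the process
// accumulated while it ran.
template <typename Work>
long switches_during(Work&& work) {
    long before = voluntary_switches();
    work();
    long after = voluntary_switches();
    assert(before >= 0 && after >= before);
    return after - before;
}
```

Wrapping the benchmark loop in switches_during() gives a per-run delta that can be compared across the blocking, epoll and asio variants. Involuntary switches are available in the same structure as ru_nivcsw.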

And I think part of the reason might even be the Linux kernel which
posts EPOLLOUT notifications for each send (even if there isn't any
need for it).

Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

Arash Partow | 8 Apr 12:39 2012

Re: boost asio overhead/scalability

On 8/04/2012 8:18 PM, Christof Meerwald wrote:
> On Thu, 29 Mar 2012 12:46:19 -0400, Gruenke, Matt wrote:
>> That's exactly the point.  The others are either running handlers or
>> waiting (possibly in a FIFO) for their turn to get another event from
>> epoll().  Keep in mind that the FIFO might be implicit, arising from how
>> certain synchronization primitives are implemented on the platform.
>
> Another observation is that I am seeing a lot more context switches
> for boost asio compared with the other options - particularly when
> using multiple threads, but only a single socketpair - e.g. for 8
> threads, 1 socketpair and 200000 iterations I get about 880000
> voluntary context switches for boost asio compared with about 200000
> for the other epoll-based solutions.
>

Do these other epoll-based solutions support time-out events as well or is it only recv ready events?

Though a 4-fold increase in context switches over a stock implementation is pretty bad... I'm becoming very intrigued by
these results. BTW, you mentioned the Linux 3.0 kernel; could you please provide the exact version number and
distro (if possible, the output of 'uname -a')?

Arash Partow | 8 Apr 01:42 2012

Re: boost asio overhead/scalability

Hi Christof,

On 26/03/2012 3:55 AM, Christof Meerwald wrote:
> - http://svn.cmeerw.net/src/nginetd/trunk/asyncsrv.cc
>

Could you please provide a link to the full listing of the "async_XXX" headers that are called in the above source file.

Christof Meerwald | 8 Apr 10:50 2012

Re: boost asio overhead/scalability

On Sun, 08 Apr 2012 09:42:32 +1000, Arash Partow wrote:
> On 26/03/2012 3:55 AM, Christof Meerwald wrote:
>> - http://svn.cmeerw.net/src/nginetd/trunk/asyncsrv.cc
> Could you please provide a link to the full listing of the "async_XXX" headers that are called in the above
> source file.

All the source code (plus a bit more) is available in
http://svn.cmeerw.net/src/nginetd/trunk/

Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org
