Edward J. Yoon | 13 Feb 06:56 2012

BSP interface in the SciPy.

Hi community,

My name is Edward, and I'm a committer on the Apache Hama project[1],
a Bulk Synchronous Parallel (BSP) framework for massive scientific
computation on top of Hadoop[2].

Today I noticed that there is a BSP interface based on BSPLib in
SciPy, and thought we might work together on supporting SciPy
programs running on an existing Hadoop YARN[3] or Hama cluster.

I would like to know if anyone would be willing to participate in this project.

Thanks!

1. http://incubator.apache.org/hama/ or check more recent
2. http://hadoop.apache.org/
3. http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/

-- 
Best Regards, Edward J. Yoon
@eddieyoon
Ralf Gommers | 13 Feb 07:11 2012

Re: BSP interface in the SciPy.



On Mon, Feb 13, 2012 at 6:56 AM, Edward J. Yoon <edwardyoon <at> apache.org> wrote:
> Hi community,
>
> My name is Edward, and I'm a committer of Apache Hama project[1] which
> is a Bulk Synchronous Parallel framework for massive scientific
> computation on top of Hadoop[2].
>
> Today, I just noticed that there's a BSP interface based on BSPLib in
> the SciPy, and thought maybe we could work together, on supporting
> SciPy programs to run on existing Hadoop YARN[3] or Hama cluster.

To anyone else who can't find this interface: it is actually in Scientific (ScientificPython), not SciPy. It looks like the source of Scientific is not even publicly available anymore; all I can find is one non-working SourceForge link.

Ralf


_______________________________________________
SciPy-Dev mailing list
SciPy-Dev <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev

Edward J. Yoon | 13 Feb 09:19 2012

Re: BSP interface in the SciPy.

Hmm, yes. I just looked again and I misread it. It is part of
Scientific, not SciPy.

But if you are interested in figuring out whether we can
collaborate on high-performance parallel processing, let's continue
the discussion here :)

Thanks!

-- 
Best Regards, Edward J. Yoon
@eddieyoon
Thomas Kluyver | 13 Feb 11:40 2012

Re: BSP interface in the SciPy.

On 13 February 2012 06:11, Ralf Gommers <ralf.gommers <at> googlemail.com> wrote:
> To anyone else who can't find this interface: this is actually in
> Scientific, not SciPy. It looks like the source of Scientific is not even
> publicly available anymore, all I can find is one non-working sourceforge
> link.

There are still tarballs here: https://sourcesup.cru.fr/projects/scientific-py/

And a bit of detective work found that the author has put the repository on bitbucket: https://bitbucket.org/khinsen/scientificpython/overview

Thomas
Sturla Molden | 14 Feb 17:05 2012

Re: BSP interface in the SciPy.

On 13.02.2012 06:56, Edward J. Yoon wrote:

> Today, I just noticed that there's a BSP interface based on BSPLib in
> the SciPy, and thought maybe we could work together, on supporting
> SciPy programs to run on existing Hadoop YARN[3] or Hama cluster.

AFAIK, BSP is a coding style, not a particular API.

If you need a barrier for BSP synchronization, this is the simplest 
implementation I can think of:

from multiprocessing import Event
from math import ceil, log
from contextlib import contextmanager

def _barrier(b):
    # Build a context manager that synchronizes on entry and on exit.
    @contextmanager
    def _context(rank):
        b.wait(rank)
        yield
        b.wait(rank)
    return _context

class Barrier(object):

    def __init__(self, numproc):
        # one event per ordered (sender, receiver) pair of ranks
        self._events = [Event() for n in range(numproc**2)]
        self._numproc = numproc
        self.barrier = _barrier(self)

    def wait(self, rank):
        # loop log2(numproc) times, rounding up
        for k in range(int(ceil(log(self._numproc)/log(2)))):

            # send event to process
            # (rank + 2**k) % numproc
            receiver = (rank + 2**k) % self._numproc
            evt = self._events[rank * self._numproc + receiver]
            evt.set()

            # wait for event from process
            # (rank - 2**k) % numproc
            sender = (rank - 2**k) % self._numproc
            evt = self._events[sender * self._numproc + rank]
            evt.wait()
            evt.clear()

Now BSP code could look like this:

    barrier = Barrier(numprocs)

    for data in container:

        <process data>

        with barrier.barrier(rank):

             <communicate>

Sturla

	
Sturla Molden | 14 Feb 17:49 2012

Re: BSP interface in the SciPy.

On 14.02.2012 17:05, Sturla Molden wrote:

> If you need a barrier for BSP synchronization, this is the simplest
> implementation I can think of:

Moving the context manager into __call__ and adding a timeout to wait(),
it becomes the code below.

It's strange that Python does not have a barrier object in the standard
lib. Considering its usefulness to scientific computing, it could be worth
adding to numpy or scipy.

Sturla

from multiprocessing import Event # or threading.Event

from math import ceil, log
from contextlib import contextmanager
from time import time  # wall-clock time; clock() measures CPU time on Unix

class Barrier(object):

    def __init__(self, numproc):
        # one event per ordered (sender, receiver) pair of ranks
        self._events = [Event() for n in range(numproc**2)]
        self._numproc = numproc

    @contextmanager
    def __call__(self, rank):
        # synchronize on entering and on leaving the with-block
        self.wait(rank)
        yield
        self.wait(rank)

    def wait(self, rank, timeout=None):
        # returns True on success, False if the timeout expired
        t0 = time()
        if (timeout is not None) and (not isinstance(timeout, float)):
            raise ValueError('timeout must be None or a float')
        # loop log2(numproc) times, rounding up
        for k in range(int(ceil(log(self._numproc)/log(2)))):
            # send event to process (rank + 2**k) % numproc
            receiver = (rank + 2**k) % self._numproc
            evt = self._events[rank * self._numproc + receiver]
            evt.set()
            # wait for event from process (rank - 2**k) % numproc
            sender = (rank - 2**k) % self._numproc
            evt = self._events[sender * self._numproc + rank]
            if timeout is not None:
                # wait only for what is left of the overall timeout
                remaining = max(0.0, timeout - (time() - t0))
                if not evt.wait(remaining):
                    return False
            else:
                evt.wait()
            evt.clear()
        return True
Thomas Kluyver | 14 Feb 17:58 2012

Re: BSP interface in the SciPy.

On 14 February 2012 16:49, Sturla Molden <sturla <at> molden.no> wrote:
> It's strange that Python does not have a barrier object in the standard
> lib. Considering usefulness to scientific computing it could be worth
> adding to numpy or scipy.

Python 3.2 has a Barrier class for threading, but seemingly not yet
for multiprocessing. I imagine it would be a logical addition, since
the other synchronisation types from threading are available for
multiprocessing.

Thomas
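
For reference, a minimal sketch of the standard-library barrier mentioned above, assuming Python 3.2 or later. threading.Barrier and its wait(timeout=...) method are real; the helper names run_with_barrier and wait_alone are made up for illustration. multiprocessing.Barrier, added in Python 3.3, exposes the same interface for processes.

```python
import threading


def run_with_barrier(num_threads):
    """Start num_threads workers that all meet at one barrier."""
    barrier = threading.Barrier(num_threads)
    arrival = [None] * num_threads

    def worker(rank):
        # wait() blocks until all parties have arrived and returns
        # a distinct integer in range(num_threads) to each caller.
        arrival[rank] = barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrival


def wait_alone(timeout):
    """A lone thread on a two-party barrier: the timeout expires,
    the barrier breaks, and wait() raises BrokenBarrierError."""
    barrier = threading.Barrier(2)
    try:
        barrier.wait(timeout=timeout)
    except threading.BrokenBarrierError:
        return barrier.broken
    return False
```

Unlike the hand-written class, the standard barrier signals a timeout by raising an exception rather than by returning False.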
Sturla Molden | 14 Feb 18:55 2012

Re: BSP interface in the SciPy.

On 14.02.2012 17:58, Thomas Kluyver wrote:

> Python 3.2 has a Barrier class for threading, but seemingly not yet
> for multiprocessing. I imagine it would be a logical addition, since
> the other synchronisation types from threading are available for
> multiprocessing.

Ok, I am still on 2.7 :-)

There are dozens of ways to make a barrier, too. The one I used is
certainly not the fastest, but it has a combinatorial beauty to it, like
a butterfly :-)

Sturla
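
The butterfly-like pattern can be made visible by listing who signals whom in each round. The sketch below is only an illustration of the index arithmetic used in the Barrier class posted earlier; the function names are hypothetical, and the round count uses int.bit_length() as an integer-only stand-in for ceil(log2(numproc)).

```python
def dissemination_schedule(numproc):
    """Return, for each round, a list of (rank, receiver, sender) triples."""
    rounds = (numproc - 1).bit_length()  # ceil(log2(numproc)) for numproc >= 1
    schedule = []
    for k in range(rounds):
        step = 2 ** k
        schedule.append([(rank, (rank + step) % numproc, (rank - step) % numproc)
                         for rank in range(numproc)])
    return schedule


def everyone_heard_from_everyone(numproc):
    """Check that after the last round each rank has, directly or
    transitively, received a signal from every other rank."""
    known = [{rank} for rank in range(numproc)]
    for round_ in dissemination_schedule(numproc):
        # All ranks exchange simultaneously, so the new state is
        # computed from the previous round's state only.
        known = [known[rank] | known[sender]
                 for rank, _receiver, sender in round_]
    return all(k == set(range(numproc)) for k in known)


for k, round_ in enumerate(dissemination_schedule(4)):
    for rank, receiver, sender in round_:
        print("round %d: rank %d signals %d, waits for %d"
              % (k, rank, receiver, sender))
```

Because the distance doubles every round, each rank has heard from all others after about log2(numproc) rounds, which is why the barrier needs only that many steps.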
Pierre Haessig | 14 Feb 17:53 2012

Re: BSP interface in the SciPy.

On 14/02/2012 17:05, Sturla Molden wrote:
> AFAIK, BSP is a coding style, not a particular API.
Hi Sturla,
I'm not very familiar with parallel processing.
Do you have a short reference on this BSP style? (Oh, there is an
international organization supporting it: http://www.bsp-worldwide.org/ !)

-- 
Pierre

I was wondering whether your class example comes from preexisting code or
whether you just speak "parallel computing" as a mother tongue ;-)
Robert Kern | 14 Feb 17:58 2012

Re: BSP interface in the SciPy.

On Tue, Feb 14, 2012 at 16:53, Pierre Haessig <pierre.haessig <at> crans.org> wrote:
> On 14/02/2012 17:05, Sturla Molden wrote:
>> AFAIK, BSP is a coding style, not a particular API.
> Hi Sturla,
> I'm not so familiar with parallel processing.
> Do you have a short reference on this BSP style ? (oh, there is an
> international organization supporting it : http://www.bsp-worldwide.org/ !)

http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

--

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Pierre Haessig | 14 Feb 18:42 2012

Re: BSP interface in the SciPy.

On 14/02/2012 17:58, Robert Kern wrote:
> http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>
Fair enough ;-)
Thanks!
--

-- 
Pierre

Robert Kern | 14 Feb 19:48 2012

Re: BSP interface in the SciPy.

On Tue, Feb 14, 2012 at 17:42, Pierre Haessig <pierre.haessig <at> crans.org> wrote:
> On 14/02/2012 17:58, Robert Kern wrote:
>> http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>>
> Fair enough ;-)

I apologize. I didn't mean to give such a useless response. I scanned
your email too quickly as I was leaving work and thought that you had
googled the acronym and only got an unrelated company in the results.

--

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Sturla Molden | 14 Feb 19:03 2012

Re: BSP interface in the SciPy.

On 14.02.2012 17:58, Robert Kern wrote:
> On Tue, Feb 14, 2012 at 16:53, Pierre Haessig <pierre.haessig <at> crans.org> wrote:
>> On 14/02/2012 17:05, Sturla Molden wrote:
>>> AFAIK, BSP is a coding style, not a particular API.
>> Hi Sturla,
>> I'm not so familiar with parallel processing.
>> Do you have a short reference on this BSP style ?

You probably got the links.

Short answer:

It is a way to avoid deadlocks and livelocks in parallel computing.
Computation and IPC are separated into discrete blocks, with barrier
synchronization in between.

ipc -> barrier -> compute -> barrier -> ipc -> ...

But it can often be difficult to fit a problem into a BSP paradigm, and 
sometimes it yields an inefficient program (the CPUs can spend a 
significant amount of time idle on the barriers).

Sturla
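
A small runnable sketch of that compute / barrier / communicate rhythm, assuming Python 3.2+ and threads rather than processes so that a shared list can stand in for IPC. The function name bsp_sum_of_squares and the mailbox layout are illustrative choices, not part of any BSP library.

```python
import threading


def bsp_sum_of_squares(chunks):
    """Each rank squares and sums its own chunk, publishes the partial
    result, and after a barrier every rank combines all partials."""
    n = len(chunks)
    barrier = threading.Barrier(n)
    mailbox = [None] * n   # slot r is written only by rank r
    totals = [None] * n

    def worker(rank):
        # Superstep 1: purely local computation.
        partial = sum(x * x for x in chunks[rank])
        # Communication phase: publish the partial result.
        mailbox[rank] = partial
        # Barrier: no rank reads the mailbox before all have written.
        barrier.wait()
        # Superstep 2: combine what the other ranks sent.
        totals[rank] = sum(mailbox)
        # Barrier: end of the superstep, mailbox may be reused afterwards.
        barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(bsp_sum_of_squares([[1, 2], [3], [4, 5]]))
```

The cost noted above shows up here as well: a rank with a short chunk sits idle at the first barrier until the slowest rank has finished computing.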
Pierre Haessig | 15 Feb 11:10 2012

Re: BSP interface in the SciPy.

On 14/02/2012 19:03, Sturla Molden wrote:
> It is a way to avoid deadlocks and livelocks in parallel computing.
> Computation and ipc are separated in discrete blocks with barrier synch
> in between.
>
> ipc ->  barrier ->  compute ->  barrier ->  ipc ->  ...
>
> But it can often be difficult to fit a problem into a BSP paradigm, and
> sometimes it yields an inefficient program (the CPUs can spend a
> significant amount of time idle on the barriers).
Ok, I think I got the global idea now. Thanks a lot.

So far, my exploration of parallel computing hasn't gone further than
using Pool.map() from the multiprocessing module. I just run the same
simulation with different sets of input parameters. I'm guessing it's
a classic use case, similar to that of Octave's parcellfun.
Simple, but powerful enough!

Best,
Pierre
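
A minimal sketch of that kind of parameter sweep. The simulate function and its (rate, steps) parameters are hypothetical placeholders. To keep the sketch runnable without a __main__ guard it imports Pool from multiprocessing.dummy, the thread-backed class with the same map() interface; for CPU-bound simulations one would import Pool from multiprocessing instead and call sweep() under `if __name__ == "__main__":`.

```python
from multiprocessing.dummy import Pool  # thread-backed, same map() API


def simulate(params):
    """Hypothetical simulation: compound growth over a number of steps."""
    rate, steps = params
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + rate
    return value


def sweep(parameter_sets, workers=4):
    """Run simulate() once per parameter set; results keep input order."""
    with Pool(workers) as pool:
        return pool.map(simulate, parameter_sets)


results = sweep([(0.0, 10), (0.5, 2), (1.0, 3)])
print(results)
```

Since the runs are independent, no barrier is needed; the only synchronization point is map() returning once every run has finished.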