Ryan Newton | 28 Sep 19:09 2013
Picon

What class for splittable data / balanced-fold?

Hi all,

We all know and love Data.Foldable and are familiar with left folds and right folds.  But what you want in a parallel program is a balanced fold over a tree.  Fortunately, many of our datatypes (Sets, Maps) actually ARE balanced trees.  Hmm, but how do we expose that?

It seems like it would be nice to have a standard class that allows you to split a datatype into roughly even halves, until you get down to the leaves.  This goes along with Guy Steele's argument that we should use "append lists" as primitive rather than "cons-lists", and it's why we added append-lists within the monad-par library.

Does this class exist already?  A random google search brought up this module by the name Data.Splittable, but it's not quite the right thing.

Thanks,
  -Ryan


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
adam vogt | 29 Sep 03:46 2013
Picon

Re: What class for splittable data / balanced-fold?

On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton <at> gmail.com> wrote:
> Hi all,
>
> We all know and love Data.Foldable and are familiar with left folds and
> right folds.  But what you want in a parallel program is a balanced fold
> over a tree.  Fortunately, many of our datatypes (Sets, Maps) actually ARE
> balanced trees.  Hmm, but how do we expose that?

Hi Ryan,

At least for Data.Map, the Foldable instance seems to have a
reasonably balanced fold called fold (or foldMap):

>  fold t = go t
>    where   go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)

This doesn't seem to be guaranteed though. For example ghc's derived
instance writes the foldr only, so fold would be right-associated for
a:

> data T a = B (T a) (T a) | L a deriving (Foldable)

Regards,
Adam
Mike Izbicki | 29 Sep 09:31 2013

Re: What class for splittable data / balanced-fold?

I've got a Partitionable class that I've been using for this purpose:
The function called "parallel" in the HLearn library will automatically parallelize any homomorphism from a Partionable to a Monoid.  I specifically use that to parallelize machine learning algorithms.

I have two thoughts for better abstractions:

1)  This Partitionable class is essentially a comonoid.  By reversing the arrows of mappend, we get:

comappend :: a -> (a,a)

By itself, this works well if the number of processors you have is a power of two, but it needs some more fanciness to get things balanced properly for other numbers of processors.  I bet there's another algebraic structure that would capture these other cases, but I'm not sure what it is.

2) I'm working with parallelizing tree structures right now (kd-trees, cover trees, oct-trees, etc.).  The real problem is not splitting the number of data points equally (this is easy), but splitting the amount of work equally.  Some points take longer to process than others, and this cannot be determined in advance.  Therefore, an equal split of the data points can result in one processor getting 25% of the work load, and the second processor getting 75%.  Some sort of lazy Partitionable class that was aware of processor loads and didn't split data points until they were needed would be ideal for this scenario.

On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.adam <at> gmail.com> wrote:
On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton <at> gmail.com> wrote:
> Hi all,
>
> We all know and love Data.Foldable and are familiar with left folds and
> right folds.  But what you want in a parallel program is a balanced fold
> over a tree.  Fortunately, many of our datatypes (Sets, Maps) actually ARE
> balanced trees.  Hmm, but how do we expose that?

Hi Ryan,

At least for Data.Map, the Foldable instance seems to have a
reasonably balanced fold called fold (or foldMap):

>  fold t = go t
>    where   go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r)

This doesn't seem to be guaranteed though. For example ghc's derived
instance writes the foldr only, so fold would be right-associated for
a:

> data T a = B (T a) (T a) | L a deriving (Foldable)

Regards,
Adam
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Jan-Willem Maessen | 29 Sep 15:37 2013
Picon

Re: What class for splittable data / balanced-fold?



On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnewton <at> gmail.com> wrote:
Hi all,

We all know and love Data.Foldable and are familiar with left folds and right folds.  But what you want in a parallel program is a balanced fold over a tree.  Fortunately, many of our datatypes (Sets, Maps) actually ARE balanced trees.  Hmm, but how do we expose that?

It seems like it would be nice to have a standard class that allows you to split a datatype into roughly even halves, until you get down to the leaves.  This goes along with Guy Steele's argument that we should use "append lists" as primitive rather than "cons-lists", and it's why we added append-lists within the monad-par library.

Interestingly, in my Fortress days we looked at both using a split-like interface and at a more foldMap / reduce - like interface, and it seemed like the latter worked better – it requires a lot less boilerplate for controlling recursion, and better matches the fanout of whatever structure you're actually using underneath.

So I'd just go with a hand-written Foldable instance here.

But I'd love to hear if you've come up with an application that requires split itself, and that *isn't* zip.  I recall we decided zip was better done with element-and-index iteration over one of the structures and indexing into the other since most tree structures don't actually zip properly anyway.

-Jan-Willem Maessen
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Ryan Newton | 30 Sep 03:13 2013
Picon

Re: What class for splittable data / balanced-fold?

Thanks, that's interesting to know (re: Fortress).

Interestingly, in my Fortress days we looked at both using a split-like interface and at a more foldMap / reduce - like interface, and it seemed like the latter worked better – it requires a lot less boilerplate for controlling recursion, and better matches the fanout of whatever structure you're actually using underneath.

Ok, we'll have to try that.  I may be underestimating the power of a newtype and a monoid instance to expose the structure..  I was wrong about this before [1].  Here's the foldMap instance for Data.Map:
foldMap _ Tip = mempty foldMap f (Bin _ _ v l r) = Foldable.foldMap f l `mappend` f v `mappend` Foldable.foldMap f r Simon Marlow in his recent Haxl talk also had a domain where they wanted a symmetric (non-monadic) parallel spawn operation... But it remains pretty hard for me to reason about the operational behavior of these things... especially since foldMap instances may vary.  
Thanks,   -Ryan [1] For example, here is a non-allocating traverseWithKey_ that I failed to come up with: -- Version of traverseWithKey_ from Shachaf Ben-Kiki -- (See thread on Haskell-cafe.) -- Avoids O(N) allocation when traversing for side-effect. newtype Traverse_ f = Traverse_ { runTraverse_ :: f () } instance Applicative f => Monoid (Traverse_ f) where mempty = Traverse_ (pure ()) Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b) -- Since the Applicative used is Const (newtype Const m a = Const m), the -- structure is never built up. --(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows: traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f () traverseWithKey_ f = runTraverse_ . foldMapWithKey (\k x -> Traverse_ (void (f k x))) foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Jan-Willem Maessen | 30 Sep 16:25 2013
Picon

Re: What class for splittable data / balanced-fold?

On Sun, Sep 29, 2013 at 9:13 PM, Ryan Newton <rrnewton <at> gmail.com> wrote:
Thanks, that's interesting to know (re: Fortress).

Interestingly, in my Fortress days we looked at both using a split-like interface and at a more foldMap / reduce - like interface, and it seemed like the latter worked better – it requires a lot less boilerplate for controlling recursion, and better matches the fanout of whatever structure you're actually using underneath.

Ok, we'll have to try that.  I may be underestimating the power of a newtype and a monoid instance to expose the structure..  I was wrong about this before [1].  Here's the foldMap instance for Data.Map:
foldMap _ Tip = mempty foldMap f (Bin _ _ v l r) = Foldable.foldMap f l `mappend` f v `mappend` Foldable.foldMap f r Simon Marlow in his recent Haxl talk also had a domain where they wanted a symmetric (non-monadic) parallel spawn operation... But it remains pretty hard for me to reason about the operational behavior of these things... especially since foldMap instances may vary.  

I'll note that there's really a documentation responsibility here that hasn't been honored as much as it should (possibly because lots of folks are driving Foldable, which other commenters have noted doesn't seem to do what you want for tree-like data structures – I certainly didn't realize that).

It'd be worth thinking about doing the derivation of foldMap directly from the structure of the underlying type.

It'd also be worth documenting when we get tree-structured traversal out of a Foldable instance, and fixing the ones that don't provide it.

And I agree that getting down to non-allocating traversals is the ultimate goal here.  If we leak space or lose parallelism we might as well not bother.

-Jan

Thanks,
   -Ryan [1] For example, here is a non-allocating traverseWithKey_ that I failed to come up with: -- Version of traverseWithKey_ from Shachaf Ben-Kiki -- (See thread on Haskell-cafe.) -- Avoids O(N) allocation when traversing for side-effect. newtype Traverse_ f = Traverse_ { runTraverse_ :: f () } instance Applicative f => Monoid (Traverse_ f) where mempty = Traverse_ (pure ()) Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b) -- Since the Applicative used is Const (newtype Const m a = Const m), the -- structure is never built up. --(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows: traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f () traverseWithKey_ f = runTraverse_ . foldMapWithKey (\k x -> Traverse_ (void (f k x))) foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Ryan Newton | 30 Sep 17:23 2013
Picon

Re: What class for splittable data / balanced-fold?

Oops, this email got stuck in the pipe (flaky internet):
 
foldMap _ Tip = mempty foldMap f (Bin _ _ v l r) = Foldable.foldMap f l `mappend` f v `mappend` Foldable.foldMap f r

Btw, from my perspective, one problem with relying on foldMap is that it treats the whole structure uniformly, whereas the split approach would let one, for example, bottom out to a sequential implementation at a certain granularity.  Perhaps that is the "boilerplate for controlling recursion" that you referred to... but isn't it sometimes necessary?

   -Ryan
 
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane