Edward Kmett | 29 Aug 19:34 2012
Picon

Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

I would like to propose improving the Data instances for a number of currently completely opaque data types in the containers package, by using virtual constructors.

The instance for Data.Map already uses fromList for gfoldl, it just stops there.

Extending it to be able to gunfold and mention the name of that constructor would enable generic traversal libraries like uniplate, etc. to work over the contents of the Map, rather than bailing out in fear or crashing at the sight of a mkNoRepType.

An example of the changes for Data.Map are highlighted below.

instance (Data k, Data a, Ord k) => Data (Map k a) where
  gfoldl f z m   = z fromList `f` toList m
  toConstr _     = fromListConstr
  gunfold k z c  = case constrIndex c of
    1 -> k (z fromList)
    _ -> error "gunfold"
  dataTypeOf _   = mapDataType
  dataCast2 f    = gcast2 f

fromListConstr :: Constr
fromListConstr = mkConstr mapDataType "fromList" [] Prefix

mapDataType :: DataType
mapDataType = mkDataType "Data.Map.Map" [fromListConstr]

I've used this approach for years on my own libraries to great effect.

Discussion Period: 3 Weeks

(I added a week for ICFP)

-Edward
_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Milan Straka | 29 Aug 21:24 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

Hi Edward,

> I would like to propose improving the Data instances for a number of
> currently completely opaque data types in the containers package, by using
> virtual constructors.
> 
> The instance for Data.Map already uses fromList for gfoldl, it just stops
> there.
> 
> Extending it to be able to gunfold and mention the name of that constructor
> would enable generic traversal libraries like uniplate, etc. to work over
> the contents of the Map, rather than bailing out in fear or crashing at the
> sight of a mkNoRepType.
> 
> An example of the changes for Data.Map are highlighted below.
> 
> instance (Data k, Data a, Ord k) => Data (Map k a) where
>   gfoldl f z m   = z fromList `f` toList m
>   toConstr _     = fromListConstr
>   gunfold k z c  = case constrIndex c of
>     1 -> k (z fromList)
>     _ -> error "gunfold"
>   dataTypeOf _   = mapDataType
>   dataCast2 f    = gcast2 f
> 
> fromListConstr :: Constr
> fromListConstr = mkConstr mapDataType "fromList" [] Prefix
> 
> mapDataType :: DataType
> mapDataType = mkDataType "Data.Map.Map" [fromListConstr]
> 
> I've used this approach for years on my own libraries to great effect.

+1 here.

I am not very familiar with the Data instances -- is it true that the
parameter of the `fromList` in the Data instance will often be sorted
(i.e., result of `toList` or `filter . toList`)? If so, we could use
fromMaybeAscList which would look like
fromMaybeAscList list | isDistinctAsc list = fromDistinctAscList list
                      | otherwise = fromList list
There is a big gain in using a linear-time fromDistinctAscList over O(N
log N) fromList, but there is a linear-time check and the list must be
kept around until isDistinctAsc finishes.

Cheers,
Milan
Edward Kmett | 30 Aug 01:54 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

On Wed, Aug 29, 2012 at 3:24 PM, Milan Straka <fox <at> ucw.cz> wrote:

Hi Edward,

> I would like to propose improving the Data instances for a number of
> currently completely opaque data types in the containers package, by using
> virtual constructors.
>
> The instance for Data.Map already uses fromList for gfoldl, it just stops
> there.
>
> Extending it to be able to gunfold and mention the name of that constructor
> would enable generic traversal libraries like uniplate, etc. to work over
> the contents of the Map, rather than bailing out in fear or crashing at the
> sight of a mkNoRepType.
>
> An example of the changes for Data.Map are highlighted below.
>
> instance (Data k, Data a, Ord k) => Data (Map k a) where
>   gfoldl f z m   = z fromList `f` toList m
>   toConstr _     = fromListConstr
>   gunfold k z c  = case constrIndex c of
>     1 -> k (z fromList)
>     _ -> error "gunfold"
>   dataTypeOf _   = mapDataType
>   dataCast2 f    = gcast2 f
>
> fromListConstr :: Constr
> fromListConstr = mkConstr mapDataType "fromList" [] Prefix
>
> mapDataType :: DataType
> mapDataType = mkDataType "Data.Map.Map" [fromListConstr]
>
> I've used this approach for years on my own libraries to great effect.

+1 here.

I am not very familiar with the Data instances -- is it true that the
parameter of the `fromList` in the Data instance will often be sorted
(i.e., result of `toList` or `filter . toList`)? If so, we could use
fromMaybeAscList which would look like
fromMaybeAscList list | isDistinctAsc list = fromDistinctAscList list
                      | otherwise = fromList list
There is a big gain in using a linear-time fromDistinctAscList over O(N
log N) fromList, but there is a linear-time check and the list must be
kept around until isDistinctAsc finishes.

The users of Data.Data could in theory do anything they want to the keys, but I do confess for most scenarios they'll come back to you ordered.

Hrmm. A more nuanced fromList construction could definitely help, though I suppose that could apply in the general case as well. 

We should be able to fuse this "try to construct linearly, but fall back on N-log-N" version of fromList in one pass even for normal uses of fromList.

e.g. assume that you are constructing a sorted tree until you find a key out of order, then take the tree you've built so far and union it appropriately with the slower constructed fromList of the remainder. That way you don't have to retain the storage for both the list and the map, and we only force the list once.

-Edward
_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Johan Tibell | 30 Aug 02:02 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

On Wed, Aug 29, 2012 at 4:54 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> We should be able to fuse this "try to construct linearly, but fall back on
> N-log-N" version of fromList in one pass even for normal uses of fromList.
>
> e.g. assume that you are constructing a sorted tree until you find a key out
> of order, then take the tree you've built so far and union it appropriately
> with the slower constructed fromList of the remainder. That way you don't
> have to retain the storage for both the list and the map, and we only force
> the list once.

Could you instead of falling back to the slower fromList just keep
building new trees and then union them all in the end. Real world data
is often is semi-sorted order. I guess we would need to detect some
worst case scenarios (i.e. sorted in reverse order) and fall back to
the slower fromList in those cases.

-- Johan
Edward Kmett | 30 Aug 02:57 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

This might pay off as well, but I am leery that its a rather tricky balancing act and will take a lot of profiling to find the right balance in practical performance vs. asymptotic bounds.


Should we split the notion of improving the performance of fromList into a separate project/proposal
that simply exposes synergies with this one?

-Edward

On Wed, Aug 29, 2012 at 8:02 PM, Johan Tibell <johan.tibell <at> gmail.com> wrote:
On Wed, Aug 29, 2012 at 4:54 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> We should be able to fuse this "try to construct linearly, but fall back on
> N-log-N" version of fromList in one pass even for normal uses of fromList.
>
> e.g. assume that you are constructing a sorted tree until you find a key out
> of order, then take the tree you've built so far and union it appropriately
> with the slower constructed fromList of the remainder. That way you don't
> have to retain the storage for both the list and the map, and we only force
> the list once.

Could you instead of falling back to the slower fromList just keep
building new trees and then union them all in the end. Real world data
is often is semi-sorted order. I guess we would need to detect some
worst case scenarios (i.e. sorted in reverse order) and fall back to
the slower fromList in those cases.

-- Johan

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Johan Tibell | 30 Aug 03:06 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> This might pay off as well, but I am leery that its a rather tricky
> balancing act and will take a lot of profiling to find the right balance in
> practical performance vs. asymptotic bounds.
>
> Should we split the notion of improving the performance of fromList into a
> separate project/proposal
> that simply exposes synergies with this one?

Sure. Lets continue this thread on its original topic. I'm +1 on the proposal.
Brent Yorgey | 30 Aug 15:50 2012

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
> On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> > This might pay off as well, but I am leery that its a rather tricky
> > balancing act and will take a lot of profiling to find the right balance in
> > practical performance vs. asymptotic bounds.
> >
> > Should we split the notion of improving the performance of fromList into a
> > separate project/proposal
> > that simply exposes synergies with this one?
> 
> Sure. Lets continue this thread on its original topic. I'm +1 on the
> proposal.

+1 from me too.  A smarter fromList also sounds cool but I agree that
should be a separate proposal, if only to avoid derailing this one.

-Brent
Edward Kmett | 1 Sep 14:23 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

For what it is worth, Milan already implemented the smarter fromList and it is considerably faster. =)


If I'm reading his numbers right, its ~50% faster than the original in the worst case, and its about ~10x faster on large sorted inputs, and within 5% of the speed of fromDistinctAscList.

Overall, it is a huge win.

-Edward

On Thu, Aug 30, 2012 at 9:50 AM, Brent Yorgey <byorgey <at> seas.upenn.edu> wrote:
On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
> On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> > This might pay off as well, but I am leery that its a rather tricky
> > balancing act and will take a lot of profiling to find the right balance in
> > practical performance vs. asymptotic bounds.
> >
> > Should we split the notion of improving the performance of fromList into a
> > separate project/proposal
> > that simply exposes synergies with this one?
>
> Sure. Lets continue this thread on its original topic. I'm +1 on the
> proposal.

+1 from me too.  A smarter fromList also sounds cool but I agree that
should be a separate proposal, if only to avoid derailing this one.

-Brent

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Edward Kmett | 22 Nov 04:32 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

As the deadline has long since past and this appears to have been relatively non-controversial, what do we need to do to get these into containers?

-Edward

On Sat, Sep 1, 2012 at 8:23 AM, Edward Kmett <ekmett <at> gmail.com> wrote:
For what it is worth, Milan already implemented the smarter fromList and it is considerably faster. =)

If I'm reading his numbers right, its ~50% faster than the original in the worst case, and its about ~10x faster on large sorted inputs, and within 5% of the speed of fromDistinctAscList.

Overall, it is a huge win.

-Edward


On Thu, Aug 30, 2012 at 9:50 AM, Brent Yorgey <byorgey <at> seas.upenn.edu> wrote:
On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
> On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> > This might pay off as well, but I am leery that its a rather tricky
> > balancing act and will take a lot of profiling to find the right balance in
> > practical performance vs. asymptotic bounds.
> >
> > Should we split the notion of improving the performance of fromList into a
> > separate project/proposal
> > that simply exposes synergies with this one?
>
> Sure. Lets continue this thread on its original topic. I'm +1 on the
> proposal.

+1 from me too.  A smarter fromList also sounds cool but I agree that
should be a separate proposal, if only to avoid derailing this one.

-Brent

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries


_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Milan Straka | 22 Nov 09:18 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

Hi Edward,

> As the deadline has long since past and this appears to have been
> relatively non-controversial, what do we need to do to get these into
> containers?

the best way would be to create the patch and then create a pull request
at https://github.com/haskell/containers . Johan and me will be notified
then, will review the code and merge it.

Cheers,
Milan

> 
> -Edward
> 
> On Sat, Sep 1, 2012 at 8:23 AM, Edward Kmett <ekmett <at> gmail.com> wrote:
> 
> > For what it is worth, Milan already implemented the smarter fromList and
> > it is considerably faster. =)
> >
> > If I'm reading his numbers right, its ~50% faster than the original in the
> > worst case, and its about ~10x faster on large sorted inputs, and within 5%
> > of the speed of fromDistinctAscList.
> >
> > Overall, it is a huge win.
> >
> > -Edward
> >
> >
> > On Thu, Aug 30, 2012 at 9:50 AM, Brent Yorgey <byorgey <at> seas.upenn.edu>wrote:
> >
> >> On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
> >> > On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
> >> > > This might pay off as well, but I am leery that its a rather tricky
> >> > > balancing act and will take a lot of profiling to find the right
> >> balance in
> >> > > practical performance vs. asymptotic bounds.
> >> > >
> >> > > Should we split the notion of improving the performance of fromList
> >> into a
> >> > > separate project/proposal
> >> > > that simply exposes synergies with this one?
> >> >
> >> > Sure. Lets continue this thread on its original topic. I'm +1 on the
> >> > proposal.
> >>
> >> +1 from me too.  A smarter fromList also sounds cool but I agree that
> >> should be a separate proposal, if only to avoid derailing this one.
> >>
> >> -Brent
> >>
> >> _______________________________________________
> >> Libraries mailing list
> >> Libraries <at> haskell.org
> >> http://www.haskell.org/mailman/listinfo/libraries
> >>
> >
> >
José Pedro Magalhães | 1 Sep 13:14 2012
Picon

Re: Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

+1


Pedro

On Wed, Aug 29, 2012 at 6:34 PM, Edward Kmett <ekmett <at> gmail.com> wrote:
I would like to propose improving the Data instances for a number of currently completely opaque data types in the containers package, by using virtual constructors.

The instance for Data.Map already uses fromList for gfoldl, it just stops there.

Extending it to be able to gunfold and mention the name of that constructor would enable generic traversal libraries like uniplate, etc. to work over the contents of the Map, rather than bailing out in fear or crashing at the sight of a mkNoRepType.

An example of the changes for Data.Map are highlighted below.

instance (Data k, Data a, Ord k) => Data (Map k a) where
  gfoldl f z m   = z fromList `f` toList m
  toConstr _     = fromListConstr
  gunfold k z c  = case constrIndex c of
    1 -> k (z fromList)
    _ -> error "gunfold"
  dataTypeOf _   = mapDataType
  dataCast2 f    = gcast2 f

fromListConstr :: Constr
fromListConstr = mkConstr mapDataType "fromList" [] Prefix

mapDataType :: DataType
mapDataType = mkDataType "Data.Map.Map" [fromListConstr]

I've used this approach for years on my own libraries to great effect.

Discussion Period: 3 Weeks

(I added a week for ICFP)

-Edward

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries


_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries

Gmane