Bart Massey | 28 Sep 09:45
Favicon
Gravatar

Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

http://hackage.haskell.org/trac/ghc/ticket/2629

Everyone always complains about nub, but nobody ever does
anything about it except (map head . group . sort), which
isn't lazy and isn't always faster. :-)

I've implemented a new function nubWith that takes a
"stop list" as an argument and filters its target list
against the stop list.  I've then re-implemented nub and
implemented nubOrd and nubInt in terms of nubWith: the stop
list is a typeclass, so these implementations are trivial
and new implementations are easily added.  nubBy is left
alone, since there's nothing obvious to be done about it.
All of the nubs are still fully lazy.

Basic QuickCheck tests are provided, and pass.

Performance benchmarking code is provided.  The performance
of my nub implementation is quite comparable to that of
the standard one.  My nubOrd and nubInt implementations
are dramatically faster, since they use a Set and IntSet
respectively for the stop list.  In particular, they
are performant on long lists with long nubs, unlike the
basic nub.

My implementation is available via git at
  git://svcs.cs.pdx.edu/git/nub.git
or can be browsed at
  http://svcs.cs.pdx.edu/gitweb?p=nub.git;a=tree
and has a maybe-outdated tarball at
(Continue reading)

Alexander Dunlap | 28 Sep 20:05

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Sun, Sep 28, 2008 at 12:49 AM, Bart Massey <bart <at> cs.pdx.edu> wrote:
> http://hackage.haskell.org/trac/ghc/ticket/2629
>
> Everyone always complains about nub, but nobody ever does
> anything about it except (map head . group . sort), which
> isn't lazy and isn't always faster. :-)
>
> I've implemented a new function nubWith that takes a
> "stop list" as an argument and filters its target list
> against the stop list.  I've then re-implemented nub and
> implemented nubOrd and nubInt in terms of nubWith: the stop
> list is a typeclass, so these implementations are trivial
> and new implementations are easily added.  nubBy is left
> alone, since there's nothing obvious to be done about it.
> All of the nubs are still fully lazy.
>
> Basic QuickCheck tests are provided, and pass.
>
> Performance benchmarking code is provided.  The performance
> of my nub implementation is quite comparable to that of
> the standard one.  My nubOrd and nubInt implementations
> are dramatically faster, since they use a Set and IntSet
> respectively for the stop list.  In particular, they
> are performant on long lists with long nubs, unlike the
> basic nub.
>
> My implementation is available via git at
>  git://svcs.cs.pdx.edu/git/nub.git
> or can be browsed at
>  http://svcs.cs.pdx.edu/gitweb?p=nub.git;a=tree
(Continue reading)

Krasimir Angelov | 28 Sep 20:48

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

I also think that it is better to use RULES and to hide at least
nubInt. For nubOrd I am not sure whether it should be exported or
hidden because it requires different type class.

Isn't the usage of multi-parameter type class with functional
dependencies and flexible instances an overkill for such a simple job?
I would write nubWith in this way:

nubWith :: (e -> s -> Maybe s) -> s -> [e] -> [e]
nubWith _ _ [] = []
nubWith f  s (e : es)
   case f e s of
      Just s   -> e : nubWith f s es
      Nothing -> nubWith f s es

It is portable and perhaps more flexible. With your type class you
assume that it is efficient and possible to have separated memberS and
insertS which might not be the case. If you have only one function
then you leave the decision to the implementor.

Regards,
  Krasimir

On Sun, Sep 28, 2008 at 8:05 PM, Alexander Dunlap
<alexander.dunlap <at> gmail.com> wrote:
> On Sun, Sep 28, 2008 at 12:49 AM, Bart Massey <bart <at> cs.pdx.edu> wrote:
>> http://hackage.haskell.org/trac/ghc/ticket/2629
>>
>> Everyone always complains about nub, but nobody ever does
>> anything about it except (map head . group . sort), which
(Continue reading)

Jeremy Shaw | 28 Sep 21:40
Favicon

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

At Sun, 28 Sep 2008 11:05:42 -0700,
Alexander Dunlap wrote:

> This seems like a good idea but it's kind of strange to have three
> different exposed versions of nub. Would it be possible to hide them,
> hide the StopList typeclass and use {-# RULES #-} pragmas to use the
> faster versions when possible?

The RULES pragma is GHC-specific. Would it make sense to export the
faster versions so that people targeting alternative compilers could
use them? 

Additionally, I think RULES only take effect if you are compiling with
optimizations, but not if you just load the files into GHCi. I have
written programs which run fine with nubOrd but are unacceptable with
nubEq. So, I would certainly want the option to force the use of
nubOrd at all times, rather than rely on the compiler to sometimes
make the required optimization for me.

j.
Isaac Dupree | 28 Sep 23:09

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

Jeremy Shaw wrote:
> Additionally, I think RULES only take effect if you are compiling with
> optimizations, but not if you just load the files into GHCi. I have
> written programs which run fine with nubOrd but are unacceptable with
> nubEq. So, I would certainly want the option to force the use of
> nubOrd at all times, rather than rely on the compiler to sometimes
> make the required optimization for me.

indeed! Could we provide a function named "nubEq" too, so 
that you can be explicit about only requiring (Eq) when 
needed, if you want to get rid of all the "deprecated" 
(word-ambiguous and aesthetically too enticingly short) uses 
of "nub".

-Isaac
apfelmus | 29 Sep 18:40
Favicon

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

Alexander Dunlap wrote:
> 
> This seems like a good idea but it's kind of strange to have three
> different exposed versions of nub. Would it be possible to hide them,
> hide the StopList typeclass and use {-# RULES #-} pragmas to use the
> faster versions when possible?

I don't think that using RULES pragmas is a good solution to the problem.

Instead, there's an old-fashioned way to use the name  nub  for all
cases: make it a member of the Eq typeclass!

  class Eq a where
    (==) :: a -> a -> Bool
    (/=) :: a -> a -> Bool
    nub  :: [a] -> [a]

    nub = -- default definition

Now,  nub  can be specialized at will.

Changing the default definition of  nub  to use Set if we have an Ord
class is tricky, though. Basically, we would need this proposal:

http://www.haskell.org/haskellwiki/Class_system_extension_proposal#Allowing_superclass_methods_to_be_overridden_in_derived_classes

Does it have a formal definition yet and guarantees that it works? In
any case, consider  nub  to be another example for the usefulness of
this proposal.

(Continue reading)

Alexander Dunlap | 30 Sep 00:55

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Mon, Sep 29, 2008 at 9:40 AM, apfelmus <apfelmus <at> quantentunnel.de> wrote:
> Alexander Dunlap wrote:
>>
>> This seems like a good idea but it's kind of strange to have three
>> different exposed versions of nub. Would it be possible to hide them,
>> hide the StopList typeclass and use {-# RULES #-} pragmas to use the
>> faster versions when possible?
>
> I don't think that using RULES pragmas is a good solution to the problem.

Why not? I thought that was the major purpose of RULES - to implement
transformations that don't affect semantics. It seems silly to clutter
up classes with extra functions just for efficiency or to have to
change programs every time the types change. The fact that RULES don't
work in GHCi is admittedly a downside; is there any plan to change
that?

Alex
Favicon

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On 2008 Sep 29, at 18:55, Alexander Dunlap wrote:
> On Mon, Sep 29, 2008 at 9:40 AM, apfelmus  
> <apfelmus <at> quantentunnel.de> wrote:
>> Alexander Dunlap wrote:
>>> This seems like a good idea but it's kind of strange to have three
>>> different exposed versions of nub. Would it be possible to hide  
>>> them,
>>> hide the StopList typeclass and use {-# RULES #-} pragmas to use the
>>> faster versions when possible?
>>
>> I don't think that using RULES pragmas is a good solution to the  
>> problem.
>
> Why not? I thought that was the major purpose of RULES - to implement
> transformations that don't affect semantics. It seems silly to clutter

So what of yhc, nhc98, jhc?  ghc is not the be-all and end-all of  
Haskell.

--

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
Alexander Dunlap | 30 Sep 01:40

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Mon, Sep 29, 2008 at 4:06 PM, Brandon S. Allbery KF8NH
<allbery <at> ece.cmu.edu> wrote:
> On 2008 Sep 29, at 18:55, Alexander Dunlap wrote:
>>
>> On Mon, Sep 29, 2008 at 9:40 AM, apfelmus <apfelmus <at> quantentunnel.de>
>> wrote:
>>>
>>> Alexander Dunlap wrote:
>>>>
>>>> This seems like a good idea but it's kind of strange to have three
>>>> different exposed versions of nub. Would it be possible to hide them,
>>>> hide the StopList typeclass and use {-# RULES #-} pragmas to use the
>>>> faster versions when possible?
>>>
>>> I don't think that using RULES pragmas is a good solution to the problem.
>>
>> Why not? I thought that was the major purpose of RULES - to implement
>> transformations that don't affect semantics. It seems silly to clutter
>
> So what of yhc, nhc98, jhc?  ghc is not the be-all and end-all of Haskell.
>

Well, yes, but presumably different compilers could optimize in
different ways. (Perhaps they could use some class behind the scenes,
or their own optimizing mechanism?) My point is that we shouldn't have
multiple exposed functions for _exactly_ the same semantic operation.

Alex
Favicon

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On 2008 Sep 29, at 19:40, Alexander Dunlap wrote:
> On Mon, Sep 29, 2008 at 4:06 PM, Brandon S. Allbery KF8NH
> <allbery <at> ece.cmu.edu> wrote:
>> On 2008 Sep 29, at 18:55, Alexander Dunlap wrote:
>>> On Mon, Sep 29, 2008 at 9:40 AM, apfelmus  
>>> <apfelmus <at> quantentunnel.de>
>>> wrote:
>>>> Alexander Dunlap wrote:
>>>>>
>>>>> This seems like a good idea but it's kind of strange to have three
>>>>> different exposed versions of nub. Would it be possible to hide  
>>>>> them,
>>>>> hide the StopList typeclass and use {-# RULES #-} pragmas to use  
>>>>> the
>>>>> faster versions when possible?
>>>>
>>>> I don't think that using RULES pragmas is a good solution to the  
>>>> problem.
>>>
>>> Why not? I thought that was the major purpose of RULES - to  
>>> implement
>>> transformations that don't affect semantics. It seems silly to  
>>> clutter
>>
>> So what of yhc, nhc98, jhc?  ghc is not the be-all and end-all of  
>> Haskell.
>
> Well, yes, but presumably different compilers could optimize in
> different ways. (Perhaps they could use some class behind the scenes,
> or their own optimizing mechanism?) My point is that we shouldn't have
(Continue reading)

Henning Thielemann | 30 Sep 16:09

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith


On Mon, 29 Sep 2008, Brandon S. Allbery KF8NH wrote:

> On 2008 Sep 29, at 19:40, Alexander Dunlap wrote:
>> 
>> Well, yes, but presumably different compilers could optimize in
>> different ways. (Perhaps they could use some class behind the scenes,
>> or their own optimizing mechanism?) My point is that we shouldn't have
>> multiple exposed functions for _exactly_ the same semantic operation.
>
> Tell it to the Prelude.  (^), (^^), (**) anyone?

I don't see the relation between the different kind of powers and the 
different implementations of 'nub'. As far as I understand 'nubInt' is 
just an optimization of the generic 'nub'. In contrast to that, the three 
power operators denote really different operations:
   http://www.haskell.org/haskellwiki/Power_function

I think it is reasonable to drop in an optimized 'nub' for certain types 
by an optimizer rule, just like it is legitimate to drop in a special GSL 
function for (GSLMatrix.map sin), or a determinant algorithm for Double 
matrices which uses division, whereas the generic determinant computation 
does not need divisions.
apfelmus | 1 Oct 11:34
Favicon

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

Alexander Dunlap wrote:
> apfelmus wrote:
>> Alexander Dunlap wrote:
>>> This seems like a good idea but it's kind of strange to have three
>>> different exposed versions of nub. Would it be possible to hide them,
>>> hide the StopList typeclass and use {-# RULES #-} pragmas to use the
>>> faster versions when possible?
>>
>> I don't think that using RULES pragmas is a good solution to the problem.
> 
> Why not? I thought that was the major purpose of RULES - to implement
> transformations that don't affect semantics. It seems silly to clutter
> up classes with extra functions just for efficiency or to have to
> change programs every time the types change. The fact that RULES don't
> work in GHCi is admittedly a downside; is there any plan to change
> that?

Why not type classes? :) Their major purpose is to overload a value at
different types, like

  nub :: Eq a => [a] -> [a]
  nub :: Ord a => [a] -> [a]
  nub :: [Int] -> [Int]

and that's exactly what we're trying to do. Adding  nub  with a default
definition is a two-line change to the definition of the  Eq  class and
it's completely backwards compatible, no existing program needs to be
changed.

But as said, Haskell98 is currently unable to specialize  nub  for a
(Continue reading)

Bart Massey | 30 Sep 07:46
Favicon
Gravatar

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

Bart Massey <bart <at> cs.pdx.edu> writes:
> http://hackage.haskell.org/trac/ghc/ticket/2629
>
> Everyone always complains about nub, but nobody ever does
> anything about it except (map head . group . sort), which
> isn't lazy and isn't always faster.

Just thought I'd post a note to thank you all for the informative and
interesting discussion of my proposal so far.  Thanks much to all who are
helping to figure out what we should do!

It looks to me like there's consensus on at least one point so far.  If someone
is really not agreeing with this, please post.

(1) The StopList class is too "heavyweight" to naturally
    live in Data.List and should not be exported.  Given
    this, the obvious choices are (a) to drop nubWith from
    the interface altogether, or (b) to replace it with a
    version that takes a stoplist function and value as an
    argument, as suggested by Krasimir Angelov earlier:

	nubWith :: (a -> b -> Maybe b) -> b -> [a] -> [a]

    I've implemented and benchmarked (b), and it seems to
    perform fine.  You can get it from
      git://svcs.cs.pdx.edu/git/nub.git
    in the branch nubWith-function if you want to look for
    yourself.

    (Another possibility which hasn't been discussed is (c1)
(Continue reading)

Dan Doel | 30 Sep 08:20

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Tuesday 30 September 2008 1:46:23 am Bart Massey wrote:
> Just thought I'd post a note to thank you all for the informative and
> interesting discussion of my proposal so far.  Thanks much to all who are
> helping to figure out what we should do!

One additional issue:

Data.List is in the base package. Data.Set is in containers.

Data.Set also imports Data.List, so using sets in the latter creates a 
circular dependency.

The latter isn't as big a deal, but circular package dependences sounds like a 
problem to me (as I imagine containers depends on base).

Something else to discuss. :)

-- Dan

P.S.: On point 4, I'd probably lean away from StopList as well. Requiring 
fundeps seems undesirable for a rather core, H98-esque module.
Bart Massey | 30 Sep 18:24
Favicon
Gravatar

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

Dan Doel <dan.doel <at> gmail.com> writes:
> Data.List is in the base package. Data.Set is in containers.
> 
> Data.Set also imports Data.List, so using sets in the latter creates a 
> circular dependency.

Oh.  Ow.  My bad for not doing a proper patch in the first place, which would
have exposed the problem before it got this far.  My apologies.  Thanks much for
the catch.

You said that the circular module dependency is "not as big a deal"?  I'm not
getting why...it seems pretty serious to me.  Is there some refactoring that
makes it go away without doing huge violence to the existing structure?  I mean,
we could split the Data.List module up into pieces to break the dependency, but
that doesn't seem like it's going to happen...

I'd completely forgotten about the recent package split of containers and base.
 No idea what to do there.  I suppose at worst we could have a Data.Nubs module,
but that just seems silly, as all it's going to export at this point is nubOrd
and (StopList-less) nubWith.  Besides, what package do we put it in?  Do we now
have container, base, and nubs?? :-) :-)

(In general, it seems like this sort of thing is going to become a problem as
the libraries grow.  I note, for example, that Data.Function is in base, which
makes putting any reasonable implementation of transitiveClosure in there pretty
hard for this same reason.)

Any suggestions from anyone about how to proceed?  Or are we just done at this
point?  So close...

(Continue reading)

Dan Doel | 30 Sep 18:53

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Tuesday 30 September 2008 12:24:29 pm Bart Massey wrote:
> You said that the circular module dependency is "not as big a deal"?  I'm
> not getting why...it seems pretty serious to me.  Is there some refactoring
> that makes it go away without doing huge violence to the existing
> structure?  I mean, we could split the Data.List module up into pieces to
> break the dependency, but that doesn't seem like it's going to happen...

Well, in GHC, at least, there are ways to handle mutually recursive modules. 
I've never encountered the need myself, so I don't know the specifics, but it 
involves making an hs-boot file or something of the sort. Of course, that only 
works if they're in the same package, I suspect, so maybe it is a big problem. 
:)

I don't know how any other compilers handle such things.

> I'd completely forgotten about the recent package split of containers and
> base. No idea what to do there.  I suppose at worst we could have a
> Data.Nubs module, but that just seems silly, as all it's going to export at
> this point is nubOrd and (StopList-less) nubWith.  Besides, what package do
> we put it in?  Do we now have container, base, and nubs?? :-) :-)
>
> (In general, it seems like this sort of thing is going to become a problem
> as the libraries grow.  I note, for example, that Data.Function is in base,
> which makes putting any reasonable implementation of transitiveClosure in
> there pretty hard for this same reason.)
>
> Any suggestions from anyone about how to proceed?  Or are we just done at
> this point?  So close...

Lists are a container, so I guess it'd be nice if Data.List could be moved to 
(Continue reading)

Henning Thielemann | 30 Sep 20:54

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith


On Tue, 30 Sep 2008, Dan Doel wrote:

> Lists are a container, so I guess it'd be nice if Data.List could be moved to
> containers, but I think the prelude imports and uses stuff from it, and you're
> not going to move that.
>
> nubOrd could be stuck in Data.Set, but that's kind of an odd place for it.

Data.List could import list functions from Prelude and re-export them.
Bart Massey | 1 Oct 10:19
Favicon
Gravatar

A different approach to nubOrd (Re: Proposal #2629)

Bart Massey <bart <at> cs.pdx.edu> writes:
> Any suggestions from anyone about how to proceed?  Or are we just done at this
> point?  So close...

OK, I've got a new version of nubOrd together.  This one is just nubOrd, no
nubWith, no nubInt, nothing fancy.  There's two variants, nubOrd' and nubOrd''.
 The latter gets a slight performance win by omitting some work that would
preserve the order of output with respect to nub.

The basic strategy of nubOrd' is to pull off a prefix l of the input list as
long as the current stoplist s, nubSort l, merge l with s to get a new stoplist
s', use s in a merge step to filter l, then sort the remainder of l back to the
original order.  Finally, the resulting l' and s' are prepended to a recursive
call on the rest of the list.

nubOrd' has roughly the same time complexity as the set-based nubOrd did.  This
is because each step of nubOrd requires log m time to search the stoplist, and
each collection of steps in nubOrd' requires amortized log m time for the sorts.

Let's do some side by side comparison:

               nub      nubSort       nubOrd        nubOrd'      nubOrd''

laziness       full     no            full          spine        spine

worst-case     O(mn)    O(n log n)    O(n log m)    O(n log m)   O(n log m)
complexity
(m symbols,
 length n)

(Continue reading)

Bart Massey | 4 Oct 09:26
Favicon
Gravatar

Status of nubOrd (Proposal #2629)

I need to know what the community wants me to do to close
out my proposal to add nubOrd to the standard libraries.
After ruling out a lot of marginal choices, I guess I see
three leading alternatives, all of which have negatives.
I'd love to have some feedback on these so I can start
thinking about other things.

1) Stick nubOrd' from my previous message into Data.List and
   call it a day.

   Advantages: Does no violence to the current library
   structure.  Provides a nubOrd that has O(n log m)
   asymptotic performance, and performs reasonably in
   practice with large m.

   Disadvantages: In the worst case, 5x performance is left
   on the floor.  Not particularly lazy: will work with
   infinite lists, but not with lists terminating in bottom;
   will only randomly work with lists containing bottom
   elements.  Quite ugly implementation.  No nubWith.

   My score: 2/5

2) Stick (non-StopList) nubWith in Data.List.  Stick nubOrd
   in Data.Set, implemented using nubWith.

   Advantages: Provides a highly efficient, fully lazy
   nubOrd.  Provides nubWith.  Reasonable implementation.

   Disadvantages:  Sticking nubOrd in Data.Set is weird.
(Continue reading)

Isaac Dupree | 4 Oct 14:21

Re: Status of nubOrd (Proposal #2629)

Bart Massey wrote:
> 2) Stick (non-StopList) nubWith in Data.List.  Stick nubOrd
>    in Data.Set, implemented using nubWith.
> 
>    Advantages: Provides a highly efficient, fully lazy
>    nubOrd.  Provides nubWith.  Reasonable implementation.
> 
>    Disadvantages:  Sticking nubOrd in Data.Set is weird.
> 
>    My score: 4/5

indeed a weird location, but only an import-line needs to 
change if someone invents a super-duper nubOrd that's even 
faster but depends on more libraries.  Would it be better to 
stick it in a new module e.g. Data.Set.Nub?

-Isaac
Bart Massey | 4 Oct 18:23
Favicon
Gravatar

Re: Status of nubOrd (Proposal #2629)

Isaac Dupree <isaacdupree <at> charter.net> writes:
> Bart Massey wrote:
> > 2) Stick (non-StopList) nubWith in Data.List.
> 
> >    Disadvantages:  Sticking nubOrd in Data.Set is weird.
> 
> indeed a weird location.  Would it be better to 
> stick it in a new module e.g. Data.Set.Nub?

Another possibility I considered is to stick nubOrd in Data.List.Nub.  As far as
I know, since containers depends on base and Data.Set depends on Data.List,
there's nothing preventing us from putting Data.List.Nub in containers even
though Data.List is in base.  Does someone know otherwise?

> only an import-line needs to 
> change if someone invents a super-duper nubOrd that's even 
> faster but depends on more libraries.

I believe that, given sufficient effort, I could prove that the nubOrd as given
is optimal in asymptotic complexity. The constant factors seem really good and
entirely dependent on those of the Set implementation.  So I'm not so worried
about a faster nubOrd. :-)

Leaving nubOrd in Data.Set does make a weird kind of sense; we put the various
nub* in whatever module their underlying stop list representation is in.  Thus
nubIx might go in Data.Array.  Also, it's less of a change to the existing
structure.

If we made a Data.List.Nub, presumably as more nub* implementations were added,
they would all pile in there.  One problem with this plan is that if someone
(Continue reading)

David Menendez | 6 Oct 06:07

Re: Status of nubOrd (Proposal #2629)

On Sat, Oct 4, 2008 at 3:26 AM, Bart Massey <bart <at> cs.pdx.edu> wrote:
> I need to know what the community wants me to do to close
> out my proposal to add nubOrd to the standard libraries.
> After ruling out a lot of marginal choices, I guess I see
> three leading alternatives, all of which have negatives.
> I'd love to have some feedback on these so I can start
> thinking about other things.

I realize I'm coming into this discussion late, but has anyone
surveyed existing Haskell code to see how often nub is used? How many
of those cases require the specific functionality of nub, as opposed
to something like "Set.toList . Set.fromList"? I may be misjudging
things, but I can't imagine it's often enough to justify three or more
implementations in the standard library.

If there are really a lot of cases where people need a collection that
(1) has no duplicates, and (2) preserves an arbitrary order of
elements, maybe we'd be better off designing a data structure
specifically for that.

--

-- 
Dave Menendez <dave <at> zednenem.com>
<http://www.eyrie.org/~zednenem/>
Favicon

Re: Status of nubOrd (Proposal #2629)

On 2008 Oct 6, at 0:07, David Menendez wrote:
> On Sat, Oct 4, 2008 at 3:26 AM, Bart Massey <bart <at> cs.pdx.edu> wrote:
>> I need to know what the community wants me to do to close
>> out my proposal to add nubOrd to the standard libraries.
>> After ruling out a lot of marginal choices, I guess I see
>> three leading alternatives, all of which have negatives.
>> I'd love to have some feedback on these so I can start
>> thinking about other things.
>
> I realize I'm coming into this discussion late, but has anyone
> surveyed existing Haskell code to see how often nub is used? How many

It's not so much that nub is used often. it's that it's often  
*reimplemented*.

> If there are really a lot of cases where people need a collection that
> (1) has no duplicates, and (2) preserves an arbitrary order of
> elements, maybe we'd be better off designing a data structure
> specifically for that.

Feel free to contribute.  Make sure it supports the whole spectrum of  
list operations, including list comprehensions.

--

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery <at> kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery <at> ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
David Menendez | 6 Oct 08:02

Re: Status of nubOrd (Proposal #2629)

On Mon, Oct 6, 2008 at 12:13 AM, Brandon S. Allbery KF8NH
<allbery <at> ece.cmu.edu> wrote:
> On 2008 Oct 6, at 0:07, David Menendez wrote:
>>
>> On Sat, Oct 4, 2008 at 3:26 AM, Bart Massey <bart <at> cs.pdx.edu> wrote:
>>>
>>> I need to know what the community wants me to do to close
>>> out my proposal to add nubOrd to the standard libraries.
>>> After ruling out a lot of marginal choices, I guess I see
>>> three leading alternatives, all of which have negatives.
>>> I'd love to have some feedback on these so I can start
>>> thinking about other things.
>>
>> I realize I'm coming into this discussion late, but has anyone
>> surveyed existing Haskell code to see how often nub is used? How many
>
> It's not so much that nub is used often. it's that it's often
> *reimplemented*.

By people who *need* a faster nub, or by people who figured they could
make nub faster by changing the type? How often does someone need the
exact behavior of nub, instead of something faster like "Set.toList .
Set.fromList"?

All this effort to optimize a seemingly obscure function feels like
overkill. We might be better off just adding a note to the
documentation of nub suggesting that people use Data.Set instead. (We
can't get rid of it, because it's in the H98 Report.)

In any case, my only objection is to complicating the standard
(Continue reading)

Simon Marlow | 7 Oct 14:40

Re: Status of nubOrd (Proposal #2629)

Bart Massey wrote:

> 2) Stick (non-StopList) nubWith in Data.List.  Stick nubOrd
>    in Data.Set, implemented using nubWith.
> 
>    Advantages: Provides a highly efficient, fully lazy
>    nubOrd.  Provides nubWith.  Reasonable implementation.
> 
>    Disadvantages:  Sticking nubOrd in Data.Set is weird.
> 
>    My score: 4/5

(2) seems reasonable to me, as long as we improve the documentation for nub 
to point out the quadratic complexity and direct users who want a faster 
version to Data.Set.nubOrd.

Cheers,
	Simon

Re: Status of nubOrd (Proposal #2629)

Bart Massey wrote:
> 1) Stick nubOrd' from my previous message into Data.List and
>    call it a day.

I don't like it, for the simple reason that its exact strictness
properties are hard to explain. Low performance hurts, too.

> 2) Stick (non-StopList) nubWith in Data.List.  Stick nubOrd
>    in Data.Set, implemented using nubWith.
>
>    Advantages: Provides a highly efficient, fully lazy
>    nubOrd.  Provides nubWith.  Reasonable implementation.

That's nubWith :: (a -> b -> Maybe b) -> b -> [a] -> [a], right?

I have no strong opinion about this proposal either way.

>    Disadvantages:  Sticking nubOrd in Data.Set is weird.

That doesn't worry me much. We can always add a link to the
documentation of 'nub'.

All in all I must say that the design space for nub functions is
surprisingly big. One aspect that hasn't been mentioned yet is
special handling for finite types. For example,

   nubBool (True:True:False:undefined)

could well return [True,False], because all possible values have
been exhausted at that point.
(Continue reading)

Bart Massey | 9 Oct 10:41
Favicon
Gravatar

Re: Status of nubOrd (Proposal #2629)

Bertram Felgenhauer <bertram.felgenhauer <at> googlemail.com> writes:
> Bart Massey wrote:
> That's nubWith :: (a -> b -> Maybe b) -> b -> [a] -> [a], right?

Yep.

> All in all I must say that the design space for nub functions is
> surprisingly big. One aspect that hasn't been mentioned yet is
> special handling for finite types.

We actually did talk about it a little earlier in this sprawling
thread.  But nubWith should handle it nicely if folks care.

>   nubBool = take 2 . nub

nub works fine on Bool, but you probably want nubWith for Char.

Thanks much for the comments!

    Bart Massey
    bart <at> cs.pdx.edu
Wolfgang Jeltsch | 9 Oct 15:41

Re: Status of nubOrd (Proposal #2629)

Am Donnerstag, 9. Oktober 2008 10:41 schrieb Bart Massey:
> >   nubBool = take 2 . nub

nub [False,True,undefined] => _|_

but

nubBool [False,True,undefined] => [False,True]

right?  Is this really intended?

Take care.  Bool has three values (False, True and _|_).  This is no Agda. ;-) 

Best wishes,
Wolfgang

Re: Status of nubOrd (Proposal #2629)

Wolfgang Jeltsch wrote:
> Am Donnerstag, 9. Oktober 2008 10:41 schrieb Bart Massey:
> > >   nubBool = take 2 . nub

Hmm, I wrote that.

> nub [False,True,undefined] => _|_

Surely you mean  False : True : _|_

> but
> 
> nubBool [False,True,undefined] => [False,True]
> 
> right?  Is this really intended?

Yes. It's a feature. I'm perfectly happy with

  nub :: [Bool] -> [Bool] ⊑ nubBool,

which means that pure functions using 'nubBool' will never fail when
using 'nub' instead works, because pure functions are monotonous. Most
of the time, relaxing the strictness of functions is not a problem.

> Take care.  Bool has three values (False, True and _|_).
> This is no Agda. ;-)

Yes, I know. But for some reason people rarely try to actually evaluate
a _|_ value, even in Haskell. ;-)

(Continue reading)

Johannes Waldmann | 30 Sep 10:03
Favicon

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

>  The StopList class  [...]

BTW, that is a horrible name for what is simply a *set*
(look at the methods you wrote: empty, insert, member)

The design problem here is that you don't want to use Data.Set
because currently it is an implementation, not an interface.

J.W.

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Bart Massey | 30 Sep 17:55
Favicon
Gravatar

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

Johannes Waldmann <waldmann <at> imn.htwk-leipzig.de> writes:
> >  The StopList class  [...]
> 
> BTW, that is a horrible name for what is simply a *set*
> (look at the methods you wrote: empty, insert, member).

I originally called the typeclass Set, but it isn't, so I changed it.  It omits
some Set functionality.  One couldn't write an implementation of union or
intersect with it, because there's no way to enumerate its elements.

In any case, it's moot, because we're ditching StopLists. :-)

> The design problem here is that you don't want to use Data.Set
> because currently it is an implementation, not an interface.

If by that you mean that it would be nice if Set was a typeclass, and there was
a [] implementation as well as the current OrdSet implementation, I tend to
agree.  C'est la vie.
David Roundy | 30 Sep 17:58
Gravatar

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Tue, Sep 30, 2008 at 1:46 AM, Bart Massey wrote:
>    Note, though, that one can easily imagine a nubAscii
>    that is linear-time rather than the n log n time of
>    nubOrd / nubInt, using a small bit vector to track the
>    Chars.  Certainly nubBool has this property.  Hmm.
>    What's our criterion for when the performance difference
>    between two functions constitutes a practical semantic
>    difference?  I'm claiming it's asymptotic complexity
>    class, in which case (a) we should probably figure out how
>    to expose some kind of nubFinite.  Or we could just take
>    the position that in the vast universe of possible
>    functions, our library cannot provide them all.  In
>    which case (1b) nubWith is back in.  I guess I tend
>    toward (1b).

Actually, the asymptotic complexity (measured in terms of list length)
of nubOrd is identical to the asymptotic complexity of nubAscii,
nubBool or nubFinite.  They differ by a constant factor of the log(#
possible data values).

David
David Roundy | 30 Sep 18:55
Favicon
Gravatar

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

On Tue, Sep 30, 2008 at 11:58:07AM -0400, David Roundy wrote:
> On Tue, Sep 30, 2008 at 1:46 AM, Bart Massey wrote:
> >    Note, though, that one can easily imagine a nubAscii
> >    that is linear-time rather than the n log n time of
> >    nubOrd / nubInt, using a small bit vector to track the
> >    Chars.  Certainly nubBool has this property.  Hmm.
> >    What's our criterion for when the performance difference
> >    between two functions constitutes a practical semantic
> >    difference?  I'm claiming it's asymptotic complexity
> >    class, in which case (a) we should probably figure out how
> >    to expose some kind of nubFinite.  Or we could just take
> >    the position that in the vast universe of possible
> >    functions, our library cannot provide them all.  In
> >    which case (1b) nubWith is back in.  I guess I tend
> >    toward (1b).
> 
> Actually, the asymptotic complexity (measured in terms of list length)
> of nubOrd is identical to the asymptotic complexity of nubAscii,
> nubBool or nubFinite.  They differ by a constant factor of the log(#
> possible data values).

Incidentally, for data types with finite value ranges, nub itself
isn't bad, as it'd just O(n * # of values present).  So, for instance,
nub on a list of Bool is likely to be faster than nubOrd on a list of
bools, and it's certainly O(n)... I just ran a timing, and nub takes
about as long as length does.

David
Isaac Dupree | 3 Oct 19:35

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

David Roundy wrote:
> Actually, the asymptotic complexity (measured in terms of list length)
> of nubOrd is identical to the asymptotic complexity of nubAscii,
> nubBool or nubFinite.  They differ by a constant factor of the log(#
> possible data values).

except for infinite possible data values, such as Integer or 
String -- then they don't differ by a constant factor of 
infinity!  (And the constant factor on, say, Int64 and 
higher could be a lot larger than some implementations might 
be able to make it, I'm not sure)

-Isaac
Bart Massey | 4 Oct 06:25
Favicon
Gravatar

Re: Proposal #2629: Data.List: Replace nub; add nubOrd, nubInt, nubWith

I think that the easiest way to think about the asymptotic
complexity of the nub* functions is in terms of the length n
of the input list and the number of unique elements m in
that list, as I did in a previous posting.  My original
point in this thread, which I stated badly, is this: Since
array access is constant time, one can imagine an
implementation nubIx that has O(n) worst-case time on any Ix
datatype, regardless of m. This contrasts with the O(mn)
complexity of nub and the O(n log m) complexity of nubOrd on
values of type Ix.  It is true that for any fixed m any nub*
is O(n), but IMHO this fact is a bit misleading; we probably
want to think about the asymptotic complexity in m as well
as n to get a clear picture of what's going on.

    Bart Massey
    bart <at> cs.pdx.edu

Gmane