Niklas Hambüchen <mail <at> nh2.me>
2013-10-16 17:32:32 GMT
On 16/10/13 17:46, Yitzchak Gale wrote:
> On Sun, Oct 13, 2013 at 6:25 PM, Niklas Hambüchen <mail <at> nh2.me> wrote:
>> ordNub has the same behaviour as nub, while (Set.toList .
>> Set.fromList) doesn't.
> Perhaps you mean that they produce exactly the same
> output when fully evaluated. But I doubt that their behavior
> is *exactly* the same in every aspect. Given the differences
> in strictness between sets and lists, can you prove that
> nub and nubOrd have *exactly* the same strictness properties
> in every possible case?
Yes. The elements are evaluated in the same order as in `nub`.
We can see that by comparing the implementations.
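For reference, here is a sketch of the two implementations side by side (the `ordNub` shown is the usual Set-based formulation; `nub'` mirrors the Data.List definition). Both walk the list left to right and emit the first occurrence of each element:

```haskell
import qualified Data.Set as Set

-- Reference nub, as in Data.List: keeps the first occurrence,
-- O(n^2) comparisons via the accumulating "seen" list.
nub' :: Eq a => [a] -> [a]
nub' = go []
  where
    go _    []     = []
    go seen (x:xs)
      | x `elem` seen = go seen xs
      | otherwise     = x : go (x : seen) xs

-- The Set-based version: same left-to-right order of evaluation,
-- O(n log n) comparisons via the accumulating "seen" set.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _    []     = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs
```

Both functions produce elements as soon as they are known to be new, so on fully evaluated inputs they give identical output in identical order.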
If you have objects that need not be fully evaluated to use (==) on
them, and need not be fully evaluated *in a different way* when using
`compare` on them, then of course the strictness properties are different.
That is why one function has "Eq a =>" and the other "Ord a =>" in its
type signature.
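To make that concrete, here is a deliberately contrived type (all names invented for illustration) whose Eq instance forces less of the value than its Ord instance. Note the instances break the usual Eq/Ord consistency law on purpose; that inconsistency is exactly what makes the strictness of the two functions diverge:

```haskell
import Data.List (nub)

-- Contrived: (==) looks only at the tag, while compare also forces
-- the payload. The instances are deliberately inconsistent.
data Item = Item { tag :: Int, payload :: Int }

instance Eq Item where
  a == b = tag a == tag b                    -- never forces payload

instance Ord Item where
  compare a b = compare (tag a, payload a)
                        (tag b, payload b)   -- forces payload

-- nub only uses (==), so undefined payloads are harmless here:
demo :: Int
demo = length (nub [Item 1 undefined, Item 1 undefined, Item 2 0])
-- demo == 2; an Ord-based ordNub would force the payloads and crash.
```

For lawful instances (where `compare a b == EQ` exactly when `a == b`), no such divergence arises.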
>> ...ordNub is *always* faster than nub, even for singleton lists.
> This is also a claim that I doubt. You say that you tested the
> case of singletons. How about this:
> 2 : replicate 100000 1 ++ 
> Well, you could optimize for that by having nubOrd squeeze
> runs of equal elements in the input. But then how about
> something like this:
> [3,2,1] ++ take 100000 (cycle [3,2,1]) ++ 
> Or variations on that, cycling some other fairly small number of
> elements. Set containers must do extra comparisons, whereas
> the cost of running down the first few hundred elements of a linked
> list (as long as the compiler fuses the iteration down to a tight
> loop in the output code and you remain within the cache) is near zero.
> How could nubOrd be as fast as nub in these kinds of cases?
I make the following, really cool proposal:
* You add your test cases that you just mentioned to the benchmark list
* run `cabal build`
* run `dist/build/ordnub/ordnub -o report.html`.
Then you will see that ordNub is, indeed, faster.
> (as long as the compiler fuses the iteration down to a tight
> loop in the output code and you remain within the cache)
Even though a list is conceptually simpler than a set, a list node and a
set node are not that different (as I explained in the last email).
Both store an element plus pointer(s) to further nodes. The only
difference is that the Set node is optimised with strictness and
unboxing, and the list node is not.
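To illustrate, here is the field layout of both, paraphrased in the style of the containers source (a simplified sketch, not the actual library code):

```haskell
-- A plain list cell: both fields are lazy pointers to thunks.
data List a = Nil | Cons a (List a)

-- A Set node in the style of the containers package: the cached size
-- is unboxed into the constructor and every field is strict.
data Tree a
  = Tip
  | Bin {-# UNPACK #-} !Int  -- cached subtree size
        !a                   -- the element, forced on insert
        !(Tree a)            -- left subtree
        !(Tree a)            -- right subtree
```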
> The only advantage of nubOrd
> is for very large lists where the complexity advantage overtakes
> the excellent constants of nub to provide better speed.
Sorry, I think this is absolutely invented.
What "excellent constants" do you mean?
How large are "very large lists"? I find lists of length 1000 a very
common case. An `n log2 n` algorithm is already 100 times faster on
those than an n² one.
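Spelling out the arithmetic behind that "100 times" figure (back-of-the-envelope operation counts only, ignoring constant factors):

```haskell
-- Rough comparison counts for a list of length n = 1000.
quadratic, linearithmic, ratio :: Double
quadratic    = 1000 ** 2                  -- 1,000,000 comparisons
linearithmic = 1000 * logBase 2 1000      -- roughly 10,000 comparisons
ratio        = quadratic / linearithmic   -- roughly 100
```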
I do not dispute that there exist n² algorithms that are faster than
others with better complexity for small inputs.
nub is not one of them.