Niklas Hambüchen | 11 Sep 18:20 2012

Adding scanl'

I would like to propose adding a the functions scanl' and scanl1' to
Data.List.

The presence and regular use of foldl' and foldl1' suggests that
corresponding scan functions should be available.

> scanl'                   :: (a -> b -> a) -> a -> [b] -> [a]
> scanl' f q ls            =  q `seq` (q : (case ls of
>                                      []   -> []
>                                      x:xs -> scanl' f (f q x) xs))

What do you think?

Niklas
Henning Thielemann | 11 Sep 20:43 2012
Picon

Re: Adding scanl'


On Tue, 11 Sep 2012, Niklas Hambüchen wrote:

> I would like to propose adding a the functions scanl' and scanl1' to
> Data.List.
>
> The presence and regular use of foldl' and foldl1' suggests that
> corresponding scan functions should be available.
>
>> scanl'                   :: (a -> b -> a) -> a -> [b] -> [a]
>> scanl' f q ls            =  q `seq` (q : (case ls of
>>                                      []   -> []
>>                                      x:xs -> scanl' f (f q x) xs))
>
> What do you think?

I don't know whether this "scanl'" is of much use. "foldl'" is required 
because we can access its accumulator only after "foldl'" finished. But in 
"scanl'" you can and usually should access the interim accumulator values 
that are contained in the result list.

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
Evan Laforge | 11 Sep 20:51 2012
Picon

Re: Adding scanl'

On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann
<lemming <at> henning-thielemann.de> wrote:
> I don't know whether this "scanl'" is of much use. "foldl'" is required
> because we can access its accumulator only after "foldl'" finished. But in
> "scanl'" you can and usually should access the interim accumulator values
> that are contained in the result list.

I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle
[2, 1, 2, 2, 1, 2, 2])'
Henning Thielemann | 11 Sep 20:57 2012
Picon

Re: Adding scanl'


On Tue, 11 Sep 2012, Evan Laforge wrote:

> On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann
> <lemming <at> henning-thielemann.de> wrote:
>> I don't know whether this "scanl'" is of much use. "foldl'" is required
>> because we can access its accumulator only after "foldl'" finished. But in
>> "scanl'" you can and usually should access the interim accumulator values
>> that are contained in the result list.
>
> I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle
> [2, 1, 2, 2, 1, 2, 2])'

And what do you do with the elements of 'intervals'? I guess you consume 
them and then you force their evaluation this way.

However, it might be that you filter out some elements and thus consume 
only some of the elements. In this case it might be better to have a 
"scanl'".
Evan Laforge | 11 Sep 21:06 2012
Picon

Re: Adding scanl'

On Tue, Sep 11, 2012 at 11:57 AM, Henning Thielemann
<lemming <at> henning-thielemann.de> wrote:
>
> On Tue, 11 Sep 2012, Evan Laforge wrote:
>
>> On Tue, Sep 11, 2012 at 11:43 AM, Henning Thielemann
>> <lemming <at> henning-thielemann.de> wrote:
>>>
>>> I don't know whether this "scanl'" is of much use. "foldl'" is required
>>> because we can access its accumulator only after "foldl'" finished. But
>>> in
>>> "scanl'" you can and usually should access the interim accumulator values
>>> that are contained in the result list.
>>
>>
>> I use it relatively frequently, e.g. 'intervals = scanl (+) 0 (cycle
>> [2, 1, 2, 2, 1, 2, 2])'
>
>
> And what do you do with the elements of 'intervals'? I guess you consume
> them and then you force their evaluation this way.

Yes, you're right, I don't rely on laziness in the result.  And even
if I did, in the case of (+) it probably isn't worth it anyway.
Sorry, I missed that the key point was whether or not the interim
values are used.
Niklas Hambüchen | 11 Sep 21:24 2012

Re: Adding scanl'

On 11/09/12 19:57, Henning Thielemann wrote:
> However, it might be that you filter out some elements and thus consume
> only some of the elements. In this case it might be better to have a
> "scanl'".

Exactly. But the even more problematic case is when *you* don't do
anything with it, but write a library for others. You don't know what
they are going to do with the list you give them, and they don't know
(and shouldn't need to know) about the implementation detail that you
are using scanl - they only see the type, e.g. [Int], and can easily
fall into the trap of calling last on that.

I give a real-world example:

I wrote a small statistics library that calculates moving sums,
averages, standard deviations and so on. You give it the infinite list,
it could give you the infinite list of moving sums of size n (sums of
successive n elements).

movingWindowSum :: Int -> [Int] -> [Int]

Somewhere in movingWindowSum I use "scanl (+) 0". If the user of my
library calls last on that, which is not unexpected for my use case, or
is only interested in certain parts of that list, he gets a space leak
due to my implementation detail.
So I either have to disclose my implementation detail or hand-write a
scanl' into my library.

As a side note:
Data.List.sum [1..100000000] eats all my memory in my ghci.
(Continue reading)

Ben Millwood | 11 Sep 23:23 2012
Picon

Re: Adding scanl'

On Tue, Sep 11, 2012 at 8:24 PM, Niklas Hambüchen <mail <at> nh2.me> wrote:
> As a side note:
> Data.List.sum [1..100000000] eats all my memory in my ghci.
> Why does it use foldl, not foldl'?

Strictly* speaking, it's possible for (+) to be lazy in its
parameters; in any case, if you compile with optimisations you may
find the memory leak goes away.

* No pun intended.
Niklas Hambüchen | 12 Sep 14:30 2012

Re: Adding scanl'

On 11/09/12 22:23, Ben Millwood wrote:
> if you compile with optimisations you may
> find the memory leak goes away.

Sure, but should users expect the space complexity being O(n) vs O(1)
depending on a compiler switch?

Was it intentional that sum is defined in terms of foldl and not foldl'?
Henning Thielemann | 12 Sep 14:49 2012
Picon

Re: Adding scanl'


On Wed, 12 Sep 2012, Niklas Hambüchen wrote:

> On 11/09/12 22:23, Ben Millwood wrote:
>> if you compile with optimisations you may
>> find the memory leak goes away.
>
> Sure, but should users expect the space complexity being O(n) vs O(1)
> depending on a compiler switch?
>
> Was it intentional that sum is defined in terms of foldl and not foldl'?

I think yes, because (+) can be lazy, as Ben mentioned. Think of the sum 
of power series. Actually, foldl' would also work for power series, 
because it evaluates only the top-most constructor. This is also 
surprising. For me foldl' is quite a hack, since it relies on the hack 
'seq'.

_______________________________________________
Libraries mailing list
Libraries <at> haskell.org
http://www.haskell.org/mailman/listinfo/libraries
wren ng thornton | 14 Sep 01:18 2012

Re: Adding scanl'

On 9/12/12 2:30 PM, Niklas Hambüchen wrote:
> Was it intentional that sum is defined in terms of foldl and not foldl'?

Yes, because sum is defined in the Report as using foldl. This is silly, 
but it is what it is. With optimizations on, GHC often converts foldl 
into foldl' (when it's semantics-preserving) because of this and other 
functions which "must" use foldl.

--

-- 
Live well,
~wren
Niklas Hambüchen | 14 Sep 02:03 2012

Re: Adding scanl'

> Yes, because sum is defined in the Report as using foldl. This is silly,
> but it is what it is. With optimizations on, GHC often converts foldl
> into foldl' (when it's semantics-preserving) because of this and other
> functions which "must" use foldl.

I see. This, though, sounds like an argument for scanl' for me, so that
I don't have to rely on it doing optimisations like this "often" (in
last . scanl, it obviously doesn't), and not to have the usefulness of
my library hovering between fast/constant-memory and eats-all-my-ram
based on the hope that -O guesses what I want (doesn't work in ghci, of
course) *and* that the user knows how I construct my lists internally.

Gmane