30 Aug 20:18 2013

## Performance of delete-and-return-last-element

```Suppose I need to get an element from a data structure, and also
modify the data structure. For example, I might need to get and delete
the last element of a list:

darle xs = ((last xs), (rmlast xs)) where
rmlast [_] = []
rmlast (y:ys) = y:(rmlast ys)

There are probably other and better ways to write rmlast, but I want
to focus on the fact that darle here, for lack of a better name off
the top of my head, appears to traverse the list twice. Once to get
the element, and once to remove it to produce a new list. This seems
bad. Especially for large data structures, I don't want to be
traversing twice to do what ought to be one operation. To fix it, I
might be tempted to write something like:

darle' [a] = (a, [])
darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))

But this version has lost its elegance. It was also kind of harder to
come up with, and for more complex data structures (like the binary
search tree) the simpler expression is really desirable. Can a really
smart compiler transform/optimize the first definition into something
that traverses the data structure only once? Can GHC?

- Lucas
```
30 Aug 22:04 2013

### Re: Performance of delete-and-return-last-element

I don't think a really smart compiler can make that transformation. It looks like an exponential-time algorithm would be required, but I can't prove that.

GHC definitely won't...

For this specific example, though, I'd probably do:

darle :: [a] -> (a, [a])
darle xs =
case reverse xs of
[]       -> error "darle: empty list"
(x:xs) -> (x, reverse xs)

- Clark

On Fri, Aug 30, 2013 at 2:18 PM, Lucas Paul wrote:
Suppose I need to get an element from a data structure, and also
modify the data structure. For example, I might need to get and delete
the last element of a list:

darle xs = ((last xs), (rmlast xs)) where
rmlast [_] = []
rmlast (y:ys) = y:(rmlast ys)

There are probably other and better ways to write rmlast, but I want
to focus on the fact that darle here, for lack of a better name off
the top of my head, appears to traverse the list twice. Once to get
the element, and once to remove it to produce a new list. This seems
bad. Especially for large data structures, I don't want to be
traversing twice to do what ought to be one operation. To fix it, I
might be tempted to write something like:

darle' [a] = (a, [])
darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))

But this version has lost its elegance. It was also kind of harder to
come up with, and for more complex data structures (like the binary
search tree) the simpler expression is really desirable. Can a really
smart compiler transform/optimize the first definition into something
that traverses the data structure only once? Can GHC?

- Lucas

_______________________________________________

```_______________________________________________
```
30 Aug 22:17 2013

### Re: Performance of delete-and-return-last-element

```isn't this what zippers are for?

b

On Aug 30, 2013, at 1:04 PM, Clark Gaebel wrote:

> I don't think a really smart compiler can make that transformation. It looks like an exponential-time
algorithm would be required, but I can't prove that.
>
> GHC definitely won't...
>
> For this specific example, though, I'd probably do:
>
> darle :: [a] -> (a, [a])
> darle xs =
>   case reverse xs of
>     []       -> error "darle: empty list"
>     (x:xs) -> (x, reverse xs)
>
>   - Clark
>
>
> On Fri, Aug 30, 2013 at 2:18 PM, Lucas Paul <reilithion <at> gmail.com> wrote:
> Suppose I need to get an element from a data structure, and also
> modify the data structure. For example, I might need to get and delete
> the last element of a list:
>
> darle xs = ((last xs), (rmlast xs)) where
>   rmlast [_] = []
>   rmlast (y:ys) = y:(rmlast ys)
>
> There are probably other and better ways to write rmlast, but I want
> to focus on the fact that darle here, for lack of a better name off
> the top of my head, appears to traverse the list twice. Once to get
> the element, and once to remove it to produce a new list. This seems
> bad. Especially for large data structures, I don't want to be
> traversing twice to do what ought to be one operation. To fix it, I
> might be tempted to write something like:
>
> darle' [a] = (a, [])
> darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))
>
> But this version has lost its elegance. It was also kind of harder to
> come up with, and for more complex data structures (like the binary
> search tree) the simpler expression is really desirable. Can a really
> smart compiler transform/optimize the first definition into something
> that traverses the data structure only once? Can GHC?
>
> - Lucas
>
> _______________________________________________
>
> _______________________________________________
```
31 Aug 14:35 2013

### Re: Performance of delete-and-return-last-element

One solution would be to fold over a specific semigroup instead of a recursive function:

import Data.Semigroup import Data.Foldable (foldMap) import Data.Maybe (maybeToList) data Darle a = Darle { getInit :: [a], getLast :: a } deriving Show instance Semigroup (Darle a) where ~(Darle xs1 l1) <> ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2 darle :: [a] -> Darle a darle = foldr1 (<>) . map (Darle [])

It's somewhat more verbose, but the core idea is clearly expressed in the one line that defines <>, and IMHO it better shows what are we doing rather than how. It's sufficiently lazy so that you can do something like head . getInit \$ darle [1..].

Best regards,
Petr

Dne 08/30/2013 08:18 PM, Lucas Paul napsal(a):

Suppose I need to get an element from a data structure, and also modify the data structure. For example, I might need to get and delete the last element of a list: darle xs = ((last xs), (rmlast xs)) where rmlast [_] = [] rmlast (y:ys) = y:(rmlast ys) There are probably other and better ways to write rmlast, but I want to focus on the fact that darle here, for lack of a better name off the top of my head, appears to traverse the list twice. Once to get the element, and once to remove it to produce a new list. This seems bad. Especially for large data structures, I don't want to be traversing twice to do what ought to be one operation. To fix it, I might be tempted to write something like: darle' [a] = (a, []) darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys)) But this version has lost its elegance. It was also kind of harder to come up with, and for more complex data structures (like the binary search tree) the simpler expression is really desirable. Can a really smart compiler transform/optimize the first definition into something that traverses the data structure only once? Can GHC? - Lucas _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe <at> haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

```_______________________________________________
```
1 Sep 21:13 2013

### Re: Performance of delete-and-return-last-element

```Am 31.08.13 14:35, schrieb Petr Pudlák:
> One solution would be to fold over a specific semigroup instead of a
> recursive function:
>
> |import  Data.Semigroup
> import  Data.Foldable(foldMap)
> import  Data.Maybe(maybeToList)
>
> data  Darle  a =Darle  {getInit  :: [a],getLast  ::a  }
>   deriving  Show
> instance  Semigroup  (Darle  a)where
>     ~(Darle  xs1 l1) <> ~(Darle  xs2 l2) =Darle  (xs1 ++ [l1] ++ xs2) l2
>
> darle  :: [a] ->Darle  a
> darle  = foldr1 (<>) . map (Darle  [])|
>
> It's somewhat more verbose, but the core idea is clearly expressed in
> the one line that defines |<>|, and IMHO it better shows /what/ are we
> doing rather than /how/. It's sufficiently lazy so that you can do
> something like |head . getInit \$ darle [1..]|.

I am wondering why you put the Semigroup instance there and what the
other imports are for. Doesn't this work just as well?

data  Darle  a = Darle  {getInit  :: [a], getLast  :: a}
deriving  Show

~(Darle  xs1 l1) <> ~(Darle  xs2 l2) = Darle  (xs1 ++ [l1] ++ xs2) l2

darle  :: [a] ->Darle  a
darle  = foldr1 (<>) . map (Darle  [])

Seems to work here. I am still puzzled, though, if this is really a good
idea performance-wise. I am afraid I don't understand it well enough.

Harald

--

--
Harald Bögeholz    <bo <at> ct.de> (PGP key available from servers)
Redaktion c't      Tel.: +49 511 5352-300  Fax: +49 511 5352-417
http://www.ct.de/

int f[9814],b,c=9814,g,i;long a=1e4,d,e,h;
main(){for(;b=c,c-=14;i=printf("%04d",e+d/a),e=d%a)
while(g=--b*2)d=h*b+a*(i?f[b]:a/5),h=d/--g,f[b]=d%g;}
(Arndt/Haenel)

Affe Apfel Vergaser

/* Heise Zeitschriften Verlag GmbH & Co. KG * Karl-Wiechert-Allee 10 *
30625 Hannover * Registergericht: Amtsgericht Hannover HRA 26709 *
Persönlich haftende Gesellschafterin: Heise Zeitschriften Verlag *
Geschäftsführung GmbH * Registergericht: Amtsgericht Hannover, HRB
60405 * Geschäftsführer: Ansgar Heise, Dr. Alfons Schräder */
```
5 Sep 20:29 2013

### Re: Performance of delete-and-return-last-element

```Dne 09/01/2013 09:13 PM, Harald Bögeholz napsal(a):
> Am 31.08.13 14:35, schrieb Petr Pudlák:
>> One solution would be to fold over a specific semigroup instead of a
>> recursive function:
>>
>> |import  Data.Semigroup
>> import  Data.Foldable(foldMap)
>> import  Data.Maybe(maybeToList)
>>
>> data  Darle  a =Darle  {getInit  :: [a],getLast  ::a  }
>>    deriving  Show
>> instance  Semigroup  (Darle  a)where
>>      ~(Darle  xs1 l1) <> ~(Darle  xs2 l2) =Darle  (xs1 ++ [l1] ++ xs2) l2
>>
>> darle  :: [a] ->Darle  a
>> darle  = foldr1 (<>) . map (Darle  [])|
>>
>> It's somewhat more verbose, but the core idea is clearly expressed in
>> the one line that defines |<>|, and IMHO it better shows /what/ are we
>> doing rather than /how/. It's sufficiently lazy so that you can do
>> something like |head . getInit \$ darle [1..]|.
> I am wondering why you put the Semigroup instance there and what the
> other imports are for. Doesn't this work just as well?
Sorry, the two other imports are redundant, I forgot to erase them when
playing with various ideas.

The Semigroup instance of course isn't necessary for this particular
purpose. But having it (1) signals that the operation satisfies some
laws (associativity) and (2) allows the structure to be reused anywhere
where a Semigroup is required.

For example, we can wrap it into `Option` to get a monoid, and perhaps
use it in `foldMap`. This way we extend the functionality to empty
collections: