Lucas Paul | 30 Aug 20:18 2013

Performance of delete-and-return-last-element

Suppose I need to get an element from a data structure, and also
modify the data structure. For example, I might need to get and delete
the last element of a list:

darle xs = ((last xs), (rmlast xs)) where
  rmlast [_] = []
  rmlast (y:ys) = y:(rmlast ys)

There are probably other and better ways to write rmlast, but I want
to focus on the fact that darle here, for lack of a better name off
the top of my head, appears to traverse the list twice. Once to get
the element, and once to remove it to produce a new list. This seems
bad. Especially for large data structures, I don't want to be
traversing twice to do what ought to be one operation. To fix it, I
might be tempted to write something like:

darle' [a] = (a, [])
darle' (x:xs) = let (a, ys) = darle' xs in (a, (x:ys))

But this version has lost its elegance. It was also kind of harder to
come up with, and for more complex data structures (like the binary
search tree) the simpler expression is really desirable. Can a really
smart compiler transform/optimize the first definition into something
that traverses the data structure only once? Can GHC?

- Lucas
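
For reference, here is a minimal sketch of a total single-pass variant of darle' above. The name `darleMaybe` and the helper `go` are hypothetical, not from the original post. It relies on the same lazy `let` pattern as darle', so the list is walked once and the init is produced incrementally; it says nothing about whether GHC could derive this from the two-pass definition.

```haskell
-- Hypothetical total variant of darle': one pass, Nothing for the empty list.
darleMaybe :: [a] -> Maybe (a, [a])
darleMaybe []     = Nothing
darleMaybe (x:xs) = Just (go x xs)
  where
    -- go carries the element seen most recently; the lazy let-pattern
    -- means the pair is returned before the rest of the list is inspected.
    go y []     = (y, [])
    go y (z:zs) = let (l, ys) = go z zs in (l, y : ys)
```

Because the pair is built lazily, something like `fmap (take 3 . snd) (darleMaybe [1..])` should still terminate, although asking for the last element of an infinite list naturally does not.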
Clark Gaebel | 30 Aug 22:04 2013

Re: Performance of delete-and-return-last-element

I don't think a really smart compiler can make that transformation. It looks like an exponential-time algorithm would be required, but I can't prove that.

GHC definitely won't...

For this specific example, though, I'd probably do:

darle :: [a] -> (a, [a])
darle xs =
  case reverse xs of
    []     -> error "darle: empty list"
    (y:ys) -> (y, reverse ys)

  - Clark

Ben | 30 Aug 22:17 2013

Re: Performance of delete-and-return-last-element

isn't this what zippers are for?

b
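
To make the zipper suggestion concrete, here is a minimal sketch of a list zipper whose focus sits on the last element. All names (`EndZipper`, `toEnd`, `popLast`, `fromEnd`) are hypothetical and chosen for illustration. The idea is to pay for one traversal up front and then remove elements from the end in constant time.

```haskell
-- Hypothetical zipper focused on the last element: the elements before
-- the focus are stored in reverse, so the nearest neighbour is at the head.
data EndZipper a = EndZipper { before :: [a], focus :: a }
  deriving (Eq, Show)

-- One O(n) pass to move the focus to the end; Nothing for the empty list.
toEnd :: [a] -> Maybe (EndZipper a)
toEnd []     = Nothing
toEnd (x:xs) = Just (foldl step (EndZipper [] x) xs)
  where
    step (EndZipper bs f) y = EndZipper (f : bs) y

-- O(1): return the focused (last) element and refocus on its predecessor, if any.
popLast :: EndZipper a -> (a, Maybe (EndZipper a))
popLast (EndZipper []     f) = (f, Nothing)
popLast (EndZipper (b:bs) f) = (f, Just (EndZipper bs b))

-- Rebuild an ordinary list (O(n)); only needed when leaving the zipper.
fromEnd :: EndZipper a -> [a]
fromEnd (EndZipper bs f) = reverse (f : bs)
```

This only pays off if several operations happen near the end of the list; for a single delete-and-return, the conversion to and from the zipper is itself two traversals.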

Petr Pudlák | 31 Aug 14:35 2013

Re: Performance of delete-and-return-last-element

One solution would be to fold over a specific semigroup instead of a recursive function:

import Data.Semigroup
import Data.Foldable (foldMap)
import Data.Maybe (maybeToList)

data Darle a = Darle { getInit :: [a], getLast :: a }
  deriving Show

instance Semigroup (Darle a) where
  ~(Darle xs1 l1) <> ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2

darle :: [a] -> Darle a
darle = foldr1 (<>) . map (Darle [])

It's somewhat more verbose, but the core idea is clearly expressed in the one line that defines `<>`, and IMHO it better shows what we are doing rather than how. It's sufficiently lazy that you can do something like `head . getInit $ darle [1..]`.

Best regards,
Petr
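
The laziness claim can be checked with a small self-contained sketch. It assumes base >= 4.11 (GHC 8.4 or later), where `Semigroup` and `<>` come from the Prelude, so no imports are needed; `firstOfInfinite` is a hypothetical name added for the demonstration.

```haskell
-- Assumes base >= 4.11, where Semigroup and (<>) are exported by the Prelude.
data Darle a = Darle { getInit :: [a], getLast :: a }
  deriving Show

instance Semigroup (Darle a) where
  -- The lazy patterns (~) let the result be constructed before
  -- either argument has been forced.
  ~(Darle xs1 l1) <> ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2

darle :: [a] -> Darle a
darle = foldr1 (<>) . map (Darle [])

-- Hypothetical demo value: first element of the init of an infinite list.
-- Only a finite prefix of [1 ..] is ever forced.
firstOfInfinite :: Integer
firstOfInfinite = head (getInit (darle [1 ..]))
```

Without the `~` on the second argument, matching on the result of the recursive fold would force the whole (infinite) list before anything is returned, so the lazy patterns are what make this example work.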

Harald Bögeholz | 1 Sep 21:13 2013

Re: Performance of delete-and-return-last-element

I am wondering why you put the Semigroup instance there and what the
other imports are for. Doesn't this work just as well?

data Darle a = Darle { getInit :: [a], getLast :: a }
  deriving Show

~(Darle xs1 l1) <> ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2

darle :: [a] -> Darle a
darle = foldr1 (<>) . map (Darle [])

Seems to work here. I am still puzzled, though, if this is really a good
idea performance-wise. I am afraid I don't understand it well enough.

Harald
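
On the performance question, one way to reason about it is to look at how the fold nests. The sketch below uses a hypothetical named function `combine` in place of a top-level `<>` (on base >= 4.11 the Prelude already exports `<>`, so a second top-level definition would be ambiguous at its use sites); `darleR` and `darleL` are likewise illustrative names. As far as I can tell, the right-nested fold does a constant amount of appending per element, while a left-nested fold would re-walk the accumulated init at every step.

```haskell
data Darle a = Darle { getInit :: [a], getLast :: a }
  deriving (Eq, Show)

-- Same operation as (<>) in the thread, under a hypothetical plain name.
combine :: Darle a -> Darle a -> Darle a
combine ~(Darle xs1 l1) ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2

-- Right-nested: d1 `combine` (d2 `combine` (d3 ...)).
-- Every left operand is a singleton, so xs1 is [] and each step
-- only conses one element onto the init of the rest.
darleR :: [a] -> Darle a
darleR = foldr1 combine . map (Darle [])

-- Left-nested: ((d1 `combine` d2) `combine` d3) ...
-- Here xs1 is the init accumulated so far, and (++) walks it again at
-- every step, which suggests quadratic cost when the init is consumed.
darleL :: [a] -> Darle a
darleL = foldl1 combine . map (Darle [])
```

Both folds give the same answer because the operation is associative; only the amount of work differs. This is reasoning about the shape of the computation, not a benchmark.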

Petr Pudlák | 5 Sep 20:29 2013

Re: Performance of delete-and-return-last-element

On 09/01/2013 09:13 PM, Harald Bögeholz wrote:
> I am wondering why you put the Semigroup instance there and what the
> other imports are for. Doesn't this work just as well?
Sorry, the two other imports are redundant; I forgot to remove them when
playing with various ideas.

The Semigroup instance of course isn't necessary for this particular 
purpose. But having it (1) signals that the operation satisfies some 
laws (associativity) and (2) allows the structure to be reused anywhere 
where a Semigroup is required.

For example, we can wrap it into `Option` to get a monoid, and perhaps 
use it in `foldMap`. This way we extend the functionality to empty 
collections:
```haskell
darle :: Foldable f => f a -> Maybe (Darle a)
darle = getOption . foldMap (Option . Just . Darle [])
```

   Best regards,
   Petr
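
A note for readers on newer compilers: as far as I know, recent versions of base no longer ship `Option`, and since base 4.11 `Maybe a` is itself a `Monoid` whenever `a` is a `Semigroup`. Under that assumption, the same idea can be sketched without the wrapper. This is an adaptation, not the code from the message above.

```haskell
-- Assumes base >= 4.11 (GHC 8.4 or later): Semigroup is in the Prelude and
-- Maybe a is a Monoid whenever a is a Semigroup.
data Darle a = Darle { getInit :: [a], getLast :: a }
  deriving (Eq, Show)

instance Semigroup (Darle a) where
  ~(Darle xs1 l1) <> ~(Darle xs2 l2) = Darle (xs1 ++ [l1] ++ xs2) l2

-- Nothing stands for the empty collection; any Foldable container works.
darle :: Foldable f => f a -> Maybe (Darle a)
darle = foldMap (Just . Darle [])
```

For example, `darle [1, 2, 3]` gives `Just (Darle [1, 2] 3)` and `darle []` gives `Nothing`.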
