Andrew Coppin | 10 May 14:41

Copy on read

I just had a random idea, which I thought I'd share with you all.

I've heard about systems that do "copy on write". That is, all users 
share a single copy of some structure, until somebody tries to write on 
it. At that moment they get a personal copy to modify so they don't 
upset everybody else. This way, you don't have to go to all the expense 
of copying a potentially large structure unless it's actually necessary 
to do so.

Then I started thinking about Clean's "uniqueness typing". If I'm 
understanding this correctly, it lets you do things like mutate data 
in-place without requiring a monad. The restriction is that the compiler 
has to be satisfied, at compile-time, that you will never try to access 
the old data you just overwrote.

Although I once held more optimistic beliefs, as far as I can tell no 
current, real-world Haskell implementation ever mutates the fields of 
algebraic data types in-place. (Ignoring for a moment the issue of 
overwriting a thunk with it's result.) This strikes me as rather 
pessimistic. I was thinking... If Haskell had uniqueness typing, maybe 
the compiler could be made to infer when values are used in a unique 
way. And maybe if the compiler can detect data being used in a unique 
way, it could code for in-place updates instead of copying, and thereby 
gain a performance advantage.

I don't know how viable this would be - the thing that immediately jumps 
out at me is that in practice there might be comparatively few places 
where you can *guarantee* the transformation is safe, and so it might 
not get applied very much. And considering all this would likely take a 
fair bit of work on the compiler, that wouldn't be a great payoff. 
(Continue reading)

Neil Mitchell | 10 May 16:20
Picon

Re: Copy on read

Hi

You're not the first:

Jurriaan Hage and Stefan Holdermans. Heap recycling for lazy
languages. In John Hatcliff, Robert Glück, and Oege de Moor, editors,
_Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation
and Semantics-Based Program Manipulation_, PEPM'08, San Francisco,
California, USA, January 7--8, 2008, pages 189--197. ACM Press, 2008.
http://doi.acm.org/10.1145/1328408.1328436.

Like I said to them in private mail - nice idea, but without
benchmarks its only an idea. You have to consider effects of cache,
generational garbage collection, complexity etc. I think they are
going to do the benchmarks at some point, when we'll know if the idea
was a good one.

Thanks

Neil

On Sat, May 10, 2008 at 1:42 PM, Andrew Coppin
<andrewcoppin <at> btinternet.com> wrote:
> I just had a random idea, which I thought I'd share with you all.
>
> I've heard about systems that do "copy on write". That is, all users share a
> single copy of some structure, until somebody tries to write on it. At that
> moment they get a personal copy to modify so they don't upset everybody
> else. This way, you don't have to go to all the expense of copying a
> potentially large structure unless it's actually necessary to do so.
(Continue reading)

Dan Piponi | 13 May 03:10
Picon

Re: Copy on read

On Sat, May 10, 2008 at 7:20 AM, Neil Mitchell <ndmitchell <at> gmail.com> wrote:
>  Jurriaan Hage and Stefan Holdermans. Heap recycling for lazy
>  languages. In John Hatcliff, Robert Glück, and Oege de Moor, editors,
>  _Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation
>  and Semantics-Based Program Manipulation_, PEPM'08, San Francisco,
>  California, USA, January 7--8, 2008, pages 189--197. ACM Press, 2008.
>  http://doi.acm.org/10.1145/1328408.1328436.

If I'm reading it aright, it works by tagging the types of values as
unique or not but keeps that annotation secret from the programmer.
The typing rules are incredibly complicated so to make things less
scary, they are also hidden from the user. As a result, this idea
makes it impossible for a developer to reason about whether their code
will compile. That doesn't seem like a viable solution. Have I read
this correctly?

Cute idea though.
--
Dan
Stefan Holdermans | 14 May 23:43
Picon

Re: Copy on read

Dan,

Let me first apologize for this late reply.

Neil pointed you to

>> Jurriaan Hage and Stefan Holdermans. Heap recycling for lazy
>> languages. In John Hatcliff, Robert Glück, and Oege de Moor, editors,
>> _Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation
>> and Semantics-Based Program Manipulation_, PEPM'08, San Francisco,
>> California, USA, January 7--8, 2008, pages 189--197. ACM Press, 2008.
>> http://doi.acm.org/10.1145/1328408.1328436.

and you commented:

> If I'm reading it aright, it works by tagging the types of values as
> unique or not but keeps that annotation secret from the programmer.
> The typing rules are incredibly complicated so to make things less
> scary, they are also hidden from the user. As a result, this idea
> makes it impossible for a developer to reason about whether their code
> will compile. That doesn't seem like a viable solution. Have I read
> this correctly?

Well, let me first out point out that the paper by no means describes  
a complete solution. When we wrote it, we were mainly interested in  
the static semantics of pure nonstrict functional languages featuring  
a means to perform in-place updates explicitly. We use uniqueness  
types to ensure that such updates are safe, i.e., do not violate  
referential transparency, so that we can keep on enjoying the benefits  
of purity---equational reasoning being the most obvious example of  
(Continue reading)

Malcolm Wallace | 10 May 19:31
Picon

Re: Copy on read

> If Haskell had uniqueness typing, maybe the compiler could be made  
> to infer when values are used in a unique way. And maybe if the  
> compiler can detect data being used in a unique way, it could code  
> for in-place updates instead of copying, and thereby gain a  
> performance advantage.

Uniqueness typing does not lead to in-place update.  If a value is  
only used once, then there is no need to update it at all!  In fact, I  
believe ghc does this analysis, and has a minor optimisation that  
omits the thunk-update.  That is, if an unevaluated thunk is forced,  
but it has a unique usage, the original thunk is not overwritten with  
an indirection to the reduced value (as it normally would be), but the  
reduced value is just used directly.

I believe that rather than _uniqueness_, you are really trying to  
describe _linear_ usage, that is, multiple uses, but in a single non- 
branching sequence.  Single-threaded usage of a data structure might  
permit in-place modification of its parts.  The analysis for linear  
usage is much more difficult than for uniqueness, and David Wakeling's  
PhD Thesis "Linearity and Laziness" (circa 1990) did some experiments,  
but was ultimately pessimistic about the value.

Regards,
    Malcolm
Matthew Naylor | 11 May 17:02
Picon

Re: Copy on read

> Uniqueness typing does not lead to in-place update.  If a value is  
> only used once, then there is no need to update it at all!

my understanding is that if a value is uniquely-typed then it is
statically known never to have more than one reference, thus it can be
modified in-place.  Some potential advantages seem to be:

  * Less heap is used, reducing the strain on the garbage collector.

  * Memory writes can sometimes be avoided. Imagine changing the value of
    a leaf in a large unique tree.  Only one memory write is required
    rather than N, where N is the length of the path to that leaf
    from the root.  Such paths can obviously be quite long when working
    with lists.  Also, the cost of rebuilding constructors with many
    fields -- only to change a single field -- could be expensive.

I wonder to what extent deforestation supersedes such optimisations.
This would be a good topic to raise with a Cleaner.  The paper Neil
mentions looks like a nice alternative to uniqueness typing -- and it
appears that there will be a FitA talk about it, this Thursday.

Matt.
Andrew Coppin | 12 May 22:05

Re: Copy on read

Matthew Naylor wrote:
> I wonder to what extent deforestation supersedes such optimisations.
> This would be a good topic to raise with a Cleaner.  The paper Neil
> mentions looks like a nice alternative to uniqueness typing -- and it
> appears that there will be a FitA talk about it, this Thursday.
>   

What if you have, say, a giant array, and you run a loop that updates 
the entire array. Rather expensive to keep copying and collecting all 
that space. That's why most array-based code is explicitly in-place. But 
wouldn't it be nice if it didn't have to be?

Similarly, there are recursion patterns for which fusion isn't very 
easy. (How would you fuse, say, a Gaussian elimination algorithm?)
Matthew Naylor | 13 May 10:16
Picon

Re: Copy on read

Hi Andrew,

my probably dodgy reason for mentioning deforestation is that sharing
of intermediate values is a major stumbling block; code that uses data
linearly is possibly well suited for deforesting.  See Frankau's SASL
for a language that deforests all lists simply by not letting you copy
them!  (IIRC there is another constraint that forbids accumulating
parameters too.)

> Similarly, there are recursion patterns for which fusion isn't very 
> easy.

Yep, I suspect you're right.

> That's why most array-based code is explicitly in-place.  wouldn't
> it be nice if it didn't have to be?

I agree.  As an aside, DiffArray looks quite nice:

  http://www.haskell.org/haskellwiki/Modern_array_libraries

``if a diff array is used in a single-threaded style, ..., a!i takes
O(1) time and a//d takes O(length d).''

Notice the use of "seq" in 2nd example to enforce a kind of
single-threaded behaviour.  Seems nasty!  I wonder if this could be
avoided by providing a (*!*) such that

  arr *!* i = seq x (arr, x)
    where x = arr ! i
(Continue reading)


Gmane