Johan Brinch | 30 Oct 13:50 2012
Picon

ANNOUNCE: btree-concurrent

Greetings Haskellers,

We are pleased to announce this new package, featuring a concurrent
b-tree developed as part of our master's thesis on backup systems. The
package features code related to STM, caching, QuickCheck and of
course concurrency and parallelism.

And it's also on Hackage:
http://hackage.haskell.org/package/btree-concurrent

In case the build fails on Hackage and the documentation doesn't work,
it's also available here:
http://hindsight.dk/doc/html/btree-concurrent

Credits to Thomas Conway of Melbourne University, for discussions
related to the implementation, and to our supervisor Ken Friis Larsen
of University of Copenhagen.

And now, the description of the package:
=====

A backend agnostic, concurrent BTree with relaxed balance[1] written
in Haskell using a mix of IO operations and pure STM.

Although the code does work, it is neither production-ready nor complete.

Features include:
* Caching: While nodes are periodically saved on persistent storage
(e.g. disk) they are cached in-memory during operations.

(Continue reading)

Alberto G. Corona | 30 Oct 22:09 2012
Picon

Re: [Haskell-cafe] ANNOUNCE: btree-concurrent

Hi.

That´s fine. I missed an implementation of a  persistent b-tree in haskell. I planned to do my own, but it is not trivial task.

how the IO and STM is managed? . The serialization- deserialization is automatic  or programmer must write the cached blocks?  (i suppose that the block reads are automatic on data requests). 

The state in the disk is coherent at every moment?. I mean, if there is a program failure, the state in the btree in the file would be coherent?

For example, if I write two TVars in two different blocks within the same STM transaction, do I have the guarantee that either both updated values will be in the file storage or none of them  will be?


Alberto


2012/10/30 Johan Brinch <brinchj <at> gmail.com>
Greetings Haskellers,

We are pleased to announce this new package, featuring a concurrent
b-tree developed as part of our master's thesis on backup systems. The
package features code related to STM, caching, QuickCheck and of
course concurrency and parallelism.

And it's also on Hackage:
http://hackage.haskell.org/package/btree-concurrent

In case the build fails on Hackage and the documentation doesn't work,
it's also available here:
http://hindsight.dk/doc/html/btree-concurrent


Credits to Thomas Conway of Melbourne University, for discussions
related to the implementation, and to our supervisor Ken Friis Larsen
of University of Copenhagen.


And now, the description of the package:
=====

A backend agnostic, concurrent BTree with relaxed balance[1] written
in Haskell using a mix of IO operations and pure STM.

Although the code does work, it is neither production-ready nor complete.

Features include:
* Caching: While nodes are periodically saved on persistent storage
(e.g. disk) they are cached in-memory during operations.

* Live flushing: It is possible to save the current version of the
tree to disk and run modifying operations at the same time. The nodes
are updated buttom-up to ensure a valid tree in the case of a crash.

* Backend agnosticism: A simple API is used as an abstraction for the
persistent storage.

* Verification: The test-suite uses QuickCheck to compare the trees
behaviour to that of Data.Map.

Deficients include:
* Memory greedy. Nodes are not stored in a compact fashion and are
constantly being serialised/deserialised.

* findMin and findMax are incomplete and may fail (see the TODO for details).

* The implementation does not parallelise well.


[1] B-trees with relaxed balance, K. S. Larsen & R. Fagerberg,
Parallel Processing Symposium, 1995. Proceedings., 9th International

--
Johan Brinch & Morten Brøns-Pedersen

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



--
Alberto.
_______________________________________________
Haskell mailing list
Haskell <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell
Johan Brinch | 31 Oct 11:45 2012
Picon

Re: [Haskell-cafe] ANNOUNCE: btree-concurrent

On Tue, Oct 30, 2012 at 10:09 PM, Alberto G. Corona <agocorona <at> gmail.com> wrote:
> how the IO and STM is managed?

STM and IO works in cooperation in a safe manner using the error
monad. Something like ErrorT (IO ()) STM, which means that either the
STM action will result in an IO action that needs to run (like reading
a node and storing it in the cache), or it will succeed with the
requested result (after some btree operation).

When the result is an error (with an IO action), the IO action is run
and the STM operation is retried, thus retrying the btree operation
until it has everything it needs and can succeed.

> The serialization- deserialization is
> automatic  or programmer must write the cached blocks?  (i suppose that the
> block reads are automatic on data requests).
>

The cereal package is used to automatic serialize and deserialize keys
and values. This makes for an easy-to-use API (just dump data in
there), but it also takes up performance, and requires a lot of memory
(pointers everywhere!).

I've considered changing the API to require bytestring keys and values
for simplicity.

> The state in the disk is coherent at every moment?. I mean, if there is a
> program failure, the state in the btree in the file would be coherent?
>
> For example, if I write two TVars in two different blocks within the same
> STM transaction, do I have the guarantee that either both updated values
> will be in the file storage or none of them  will be?

Well, STM is used internally, but it's not exposed. Thus, simple
operations (lookup, insert, modify) will run atomically and in
isolation, but they are not composable. They cannot be composed with
other STM operations or with each other. Saving to persistent storage
is done buttom-up to ensure integrity in case of a crash (the parent
is stored /after/ the nodes it is referencing).

I don't know whether composing tree-operations is possible somehow by
rewriting the operations, but composing with arbitrary STM operations
likely isn't due to the need for IO actions (if the node isn't in the
cache, it needs to be fetched from the external persistent storage).

Composing non-modifying or idempotent operations are theoretically
composable (like ensuring two keys are both present at the same time),
but non-idempotent operations are more difficult if not impossible due
to retrying.

--

-- 
Johan Brinch

Gmane