Re: Implementation of Red-Black Trees
On Jul 6, 2008, at 12:28 PM, Raymond Myers wrote:
> The Red-Black Tree in "Purely Functional Data Structures", by Chris
> Okasaki is a good starting point. However, it is very minimal and
> does not include things like Delete and Union/Merge. For a linear
> time implementation of Union, you might look at the method addAll
> in Java's TreeMap, and its helpers.
I haven't thought much about a functional red-black tree, but if you
are looking for the standard imperative kind, the article in
Wikipedia on red-black trees is excellent--one of the clearest I've
seen. And it covers Delete, which I find much trickier than Insert.
I looked at Okasaki's "Red-Black Trees in a Functional Setting" (an
earlier paper that I could get on the web), which describes a
slightly different algorithm, but close enough.
> One of the trickier parts of a Red-Black Tree is balancing after
> insertion.
>
> (Adapted from Chris Okasaki's Haskell)
> balance :: RB a -> a -> RB a -> RB a
> balance (T R a x b) y (T R c z d) = T R (T B a x b) y (T B c z d)
> balance (T R (T R a x b) y c) z d = T R (T B a x b) y (T B c z d)
> balance (T R a x (T R b y c)) z d = T R (T B a x b) y (T B c z d)
> balance a x (T R b y (T R c z d)) = T R (T B a x b) y (T B c z d)
> balance a x (T R (T R b y c) z d) = T R (T B a x b) y (T B c z d)
> balance a x b = T B a x b
>
> Here is a balance() from one of the Fortress versions I am kicking
> around. (I hope it's ok to use the Unicode representation over the
> mailing list.)
> This version only implements a Set, not a Map, to keep things
> simple. I also made RedNode and BlackNode separate types, both
> extending the NodeSet trait. Overall, it is probably better to make
> NodeSet an object with a color field, which makes balance() a bit
> more hairy.
Hmm... not sure which is preferable. Making object a color field is
good in an imperative version because then you can just change the
field rather than allocating a new node. In a functional version,
you don't have that option, and distinguishing RedNode and BlackNode
types allows you to use dispatch (or typecase).
> Note that the Fortress code handles the base case "BlackNode⟦E⟧
> (l,v,r)" three different times, due to the nature of typecase.
> There may be ways of avoiding this.
>
> balance⟦E⟧(l:Set⟦E⟧, v:E, r:Set⟦E⟧) : Set⟦E⟧ = do
> typecase (l,r) of
> (RedNode⟦E⟧, RedNode⟦E⟧) ⇒ RedNode(l.toBlack(), v,
> r.toBlack())
> (RedNode⟦E⟧, Set⟦E⟧) ⇒
> typecase (ll,lr) = (l.l,l.r) of
> (RedNode⟦E⟧, Set⟦E⟧) ⇒ RedNode(ll.toBlack(), l.v,
> BlackNode(l.r, v, r))
> (Set⟦E⟧, RedNode⟦E⟧) ⇒ RedNode(ll.toBlack(), l.v,
> BlackNode(l.r, v, r))
I think this case should yield RedNode(BlackNode(ll, l.v, lr.l),
lr.v, BlackNode(lr.r, v, r)). (It's the "double rotation" case.)
Also, you can use lr rather than l.r in the two previous cases, and
rl and rr instead of r.l and r.r in the two analogous cases below.
> (Set⟦E⟧, Set⟦E⟧) ⇒ BlackNode⟦E⟧(l,v,r)
> end
> (Set⟦E⟧, RedNode⟦E⟧) ⇒
> typecase (rl,rr) = (r.l,r.r) of
> (Set⟦E⟧, RedNode⟦E⟧) ⇒ RedNode(BlackNode(l, v,
> r.l), r.v, rr.toBlack())
> (RedNode⟦E⟧, Set⟦E⟧) ⇒ RedNode(BlackNode(l, v,
> rl.l), rl.v, BlackNode(rl.r, r.v, r.r))
> (Set⟦E⟧, Set⟦E⟧) ⇒ BlackNode⟦E⟧(l,v,r)
> end
> (Set⟦E⟧, Set⟦E⟧) ⇒ BlackNode⟦E⟧(l,v,r)
> end
> end
Since the first typecase is just dispatching on the arguments of
balance, you can just overload balance and let dispatch do the work:
balance[\E\](l: RedNode[\E\], v: E, r: RedNode[\E\]) = RedNode
(l.toBlack(), v, r.toBlack())
balance[\E\](l: RedNode[\E\], v: E, r: Set[\E\]) = typecase (ll, lr)
= (l.l, l.r) of
(RedNode[\E\], Set[\E\]) => RedNode(ll.toBlack(), l.v, BlackNode
(lr, v, r))
(Set[\E\], RedNode[\E\]) => RedNode(BlackNode(ll, l.v, lr.l),
lr.v, BlackNode(lr.r, v, r))
else => BlackNode(l,v,r)
end
balance[\E\](l: Set[\E\], v: E, r: RedNode[\E\]) = typecase (rl, rr)
= (r.l, r.r) of
(Set[\E\], RedNode[\E\]) => RedNode(BlackNode(l, v, rl), r.v,
rr.toBlack())
(RedNode[\E\], Set[\E\]) => RedNode(BlackNode(l, v, rl.l), rl.v,
BlackNode(rl.r, r.v, rr))
else => BlackNode(l,v,r)
end
balance[\E\](l: Set[\E\], v: E, r: Set[\E\]) = BlackNode(l,v,r)
It's a matter of style which is preferable, but I prefer the
overloading form.
Alternatively, if you defined isRed and isBlack predicates (and you
could use these whether you chose to represent colors by types (as
you do) or using fields), you could write the following:
balance[\E\](l: Set[\E\], v: E, r: Set[\E\]) =
if l.isRed() AND r.isRed() then (* (R a x b) y (R c z d) *)
RedNode(l.toBlack(), v, r.toBlack())
elif l.isRed() AND: l.l.isRed() then (* (R (R a x b) y c) z d *)
RedNode(l.l.toBlack(), l.v, BlackNode(l.r, v, r))
elif l.isRed() AND: l.r.isRed() then (* (R a x (R b y c)) z d *)
RedNode(BlackNode(l.l, l.v, l.r.l), l.r.v, BlackNode(l.r.r, v, r))
elif r.isRed() AND: r.r.isRed() then (* a x (R b y (R c z d)) *)
RedNode(BlackNode(l, v, r.l), r.v, r.r.toBlack())
elif r.isRed() AND: r.l.isRed() then (* a x (R (R b y c) z d)) *)
RedNode(BlackNode(l, v, r.l.l), r.l.v, BlackNode(r.l.r, r.v, r.r))
else
BlackNode(l,v,r)
end
This matches the structure of the Haskell code more closely--it only
lacks the nice pattern-matching syntax.
By the way, I don't think the first case (l.isRed() AND r.isRed()) is
necessary. It just pushes red nodes up the tree, which is an
advantage for imperative red-black trees when insertions
significantly outnumber deletions (the usual case) because they allow
balancing after insertion to terminate early. But for functional red-
black trees, you have to go all the way to the root anyway.
One subtlety in the code above is the use of conditional AND (i.e.,
AND:), so that in the second case, for example, l.l is not evaluated
unless l.isRed() returns true. If l.isRed() is false, then l may be
empty, in which case l.l would not be well defined. We could also
avoid the problem by having an empty node return itself for .l
and .r. Alternatively, we could nest the if expressions, but that
reintroduces the extra BlackNode cases.
- Victor