Ketil Malde | 6 Dec 13:45 2013

hash tables, judy arrays, and more


Hi,

As some of you may know┬╣, I've been playing around with associative data
structures, and in particular Judy arrays, using the "judy" package by
Don S.  Now, I find these to be fast and (more importantly) frugal
concerning memory - but now I see what I can only interpret as memory
corruption - a bit depending on the ordering of the code and/or
insertion of print statements, output is truncated or contaminated.

The strange thing is that this only occurs when I also use Data.Vector
or Data.HashTable.IO in the same program - if it's all Judy, then things
work as expected - as far as I have seen, anyway.

First I thought it was my dubious "insertWith" function, but when I
reverted to separate "insert" and "lookup", I can still provoke the
behavior.

Has anybody else seen anything similar?  Any idea how to debug something
like this?

Then I thought I'd look at hash tables, using the 'hashtables' package.
I haven't tested it much yet, but it appears to be a lot slower than
Judy (maybe as much as 10x), and uses a lot more memory (also perhaps a
factor of 10).  I guess I might be able to improve things a bit by
judiciously applying strictness, but it seems to be storing both keys
and values unboxed, so I don't expect to come close to Judy - I guess
there isn't any unboxed hash table implementations around?

Anyway - I seem to have programmed myself into a corner here, so
(Continue reading)

Gregory Collins | 6 Dec 14:52 2013
Picon

Re: hash tables, judy arrays, and more


On Fri, Dec 6, 2013 at 1:45 PM, Ketil Malde <ketil <at> malde.org> wrote:
Then I thought I'd look at hash tables, using the 'hashtables' package.
I haven't tested it much yet, but it appears to be a lot slower than
Judy (maybe as much as 10x), and uses a lot more memory (also perhaps a
factor of 10).  I guess I might be able to improve things a bit by
judiciously applying strictness, but it seems to be storing both keys
and values unboxed, so I don't expect to come close to Judy - I guess
there isn't any unboxed hash table implementations around?

If you want to try the git master version of the hashtables library, I've made some performance and memory overhead improvements that haven't been released yet (I still need to run more benchmarks before release). Try both the "basic" and "cuckoo" hash tables (cuckoo might be better). IIRC we force keys stored in the hash tables but not the values -- you might want to confirm you're not building up value thunks.

G
--
Gregory Collins <greg <at> gregorycollins.net>
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane