Dave Smith | 24 Sep 00:37 2011

Bloom filters

Hi all,

I have now seen (in production) a situation where a user is writing
uuids into a LevelDB instance. Unfortunately, before the write can be
done we need to do a read. This, obviously, gets very expensive as the
dataset grows in size. I was wondering if the Google team had any
thoughts or plans to put Bloom filters into LevelDB, or even
suggestions for the best place to hang one?
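For concreteness, a minimal standalone sketch of the idea (not LevelDB code; the sizing, hash scheme, and names here are all made up). A negative answer from the filter would let us skip the read entirely before writing a fresh uuid:

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical Bloom filter: k hash probes into a fixed bit array.
// "Definitely absent" answers are exact; "possibly present" answers
// may be false positives, so a hit still requires the real read.
class BloomFilter {
 public:
  void Add(const std::string& key) {
    for (std::size_t i = 0; i < kHashes; ++i) bits_.set(Hash(key, i));
  }
  bool MayContain(const std::string& key) const {
    for (std::size_t i = 0; i < kHashes; ++i) {
      if (!bits_.test(Hash(key, i))) return false;  // definitely absent
    }
    return true;  // possibly present
  }

 private:
  static const std::size_t kBits = 1 << 16;  // made-up sizing
  static const std::size_t kHashes = 4;
  static std::size_t Hash(const std::string& key, std::size_t seed) {
    // Cheap seeded hash, for illustration only.
    return std::hash<std::string>()(key + static_cast<char>('0' + seed)) % kBits;
  }
  std::bitset<kBits> bits_;
};
```

With uuid keys, almost every pre-write lookup is for an absent key, which is exactly the case a Bloom filter can answer without touching disk.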

Thanks,

D.

Sanjay Ghemawat | 24 Sep 01:33 2011

Re: Bloom filters

On Fri, Sep 23, 2011 at 3:37 PM, Dave Smith <dizzyd@gmail.com> wrote:
> Hi all,
>
> I have now seen (in production) a situation where a user is writing
> uuids into a LevelDB instance. Unfortunately, before the write can be
> done we need to do a read. This, obviously, gets very expensive as the
> dataset grows in size. I was wondering if the Google team had any
> thoughts or plans to put Bloom filters into LevelDB, or even
> suggestions for the best place to hang one?

There are two reasons I am reluctant to put bloom filter support directly
in leveldb:

(1) It increases the footprint/complexity of the library.

(2) Many storage system wrappers will want a bloom filter key
that does not exactly match a leveldb key.  Suppose, for example, that
a higher-level storage engine creates leveldb keys that
look like
    concat(userkey, metadata)
The reads generated by such a storage engine will be range reads
(not plain gets) and for bloom filters to be effective, we would need to
have enough hooks so that leveldb can know the range that is being
read, and map down from there to the userkey to look for in bloom
filters.

This might be doable, but no simple interface springs to mind.  That's
why I am hesitant to add support inside leveldb.
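To make the mismatch concrete, here is a toy illustration (hypothetical names; exact set membership stands in for a Bloom filter, since the probabilistic part is not the issue):

```cpp
#include <set>
#include <string>

// The higher-level engine stores concat(userkey, metadata).
inline std::string StoredKey(const std::string& userkey,
                             const std::string& metadata) {
  return userkey + '\x01' + metadata;  // '\x01' is a made-up separator
}

// The hook such an engine would have to expose: map a stored key back
// to its userkey, so filters can be built and probed at that
// granularity instead of on full leveldb keys.
inline std::string UserKeyOf(const std::string& stored) {
  return stored.substr(0, stored.find('\x01'));
}
```

A filter built over full stored keys never matches a probe that knows only the userkey, while a filter built over UserKeyOf(stored) does; that mapping lives in the wrapper, where leveldb cannot see it.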

On the other hand, I see no easy way for the application to build
(Continue reading)
