Matt Chaput | 19 Mar 2010 01:00
Favicon
Gravatar

64-bit CDB

Hi, I've written a Python implementation of the CDB algorithm as part of 
a project. I'd like to convert it to use 64-bit disk offsets to get 
around the 4-GB limit, but I don't understand how the 
hash-to-disk-offset lookup process works and what depends on 32-bit-ness 
well enough to do it with confidence.

Has anyone done this already or can anyone point out what I'd need to 
change in the algorithm to get it to work?

Thanks very much!

Matt

Chris Dean | 19 Mar 2010 01:37
Favicon
Gravatar

Re: 64-bit CDB

Matt Chaput <matt <at> sidefx.com> writes:
> Hi, I've written a Python implementation of the CDB algorithm as part
> of a project. I'd like to convert it to use 64-bit disk offsets to get
> around the 4-GB limit, but I don't understand how the
> hash-to-disk-offset lookup process works and what depends on
> 32-bit-ness well enough to do it with confidence.
>
> Has anyone done this already or can anyone point out what I'd need to
> change in the algorithm to get it to work?

Have you seen the diagram at 

  http://www.unixuser.org/~euske/doc/cdbinternals/index.html

To use language on that page, you need to change the pointer part of the
data structure to take 64bits instead of 32.  This will also make your
implementation incompatible with existing cdb file formats.

Cheers,
Chris Dean

Thomas Mangin | 19 Mar 2010 11:05
Picon
Favicon

Re: 64-bit CDB

> Matt Chaput <matt <at> sidefx.com> writes:
>> Hi, I've written a Python implementation of the CDB algorithm as part
>> of a project.

So did we two years ago, and put it in the public domain at the time (I believe I published it here at the time)
http://thomas.mangin.com/data/source/cdb.py

Writing CDB files is slow (compared to cdbmake) and eats much memory but it is correct - and unlike the Python
C binding does not cause us segfaults on some machines.

>  http://www.unixuser.org/~euske/doc/cdbinternals/index.html

Thanks, never saw it before.

Thomas
Matt Chaput | 19 Mar 2010 16:51
Favicon
Gravatar

re: 64-bit CDB

Let me try to be clearer: does anyone understand the hashtable lookup 
part of the CDB algorithm (i.e. start at (hash >> 8) % tablesize, half 
the table kept empty, etc.) sufficiently to explain if it could/how it 
would change if I use 12-byte records (4-byte hash + 8-byte pointer)?

Thanks,

Matt

Matt Chaput | 19 Mar 2010 18:27
Favicon
Gravatar

Re: 64-bit CDB

On 3/19/2010 11:51 AM, Matt Chaput wrote:
> Let me try to be clearer: does anyone understand the hashtable lookup
> part of the CDB algorithm (i.e. start at (hash >> 8) % tablesize, half
> the table kept empty, etc.) sufficiently to explain if it could/how it
> would change if I use 12-byte records (4-byte hash + 8-byte pointer)?

Sorry, y'know what, I think I grok it now. Thanks anyway :)

Matt

Andy Bradford | 22 Mar 2010 05:33

Re: 64-bit CDB

Thus said Matt Chaput on Fri, 19 Mar 2010 13:27:50 EDT:

> Sorry, y'know what, I think I grok it now. Thanks anyway :)

Care to share for the archives?

Andy
--

-- 
[-----------[system uptime]--------------------------------------------]
 10:33pm  up  4:07,  1 user,  load average: 1.56, 1.41, 1.21


Gmane