Christopher Barker | 2 Sep 2009 18:30
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary

Sanne Korzec wrote:
> The main bottleneck in my code is a large dictionary / hash table which 
> I would like to optimize.

In what way do you need to optimize it? i.e. how is it used? do you have 
memory issues or speed issues? python dicts are highly optimized 
already, so you're not likely to do much better with the look-up speed.

-Chris

--

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@...
Robert Bradshaw | 2 Sep 2009 19:22
Favicon

Re: [Cython] FW: cython and hash tables / dictionary

On Wed, 2 Sep 2009, Christopher Barker wrote:

> Sanne Korzec wrote:
>> The main bottleneck in my code is a large dictionary / hash table which
>> I would like to optimize.
>
> In what way do you need to optimize it? i.e. how is it used? do you have
> memory issues or speed issues? python dicts are highly optimized
> already, so you're not likely to do much better with the look-up speed.

It could help with both memory and speed. In particular, to do a lookup in 
a Python hashtable you need to

(1) Wrap the float in a Python object
(2) call __hash__ on that new object
(3) actually do the lookup
(4) unwrap the result back into a float.

Python does have a highly optimized (3), but the overhead for the rest 
will probably overwhelm it speedwise, so I bet a simple, unwrapped 
implementation would still be quite a win.

- Robert
Dag Sverre Seljebotn | 2 Sep 2009 19:52
Picon
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Robert Bradshaw wrote:
> On Wed, 2 Sep 2009, Christopher Barker wrote:
> 
>> Sanne Korzec wrote:
>>> The main bottleneck in my code is a large dictionary / hash table which
>>> I would like to optimize.
>> In what way do you need to optimize it? i.e. how is it used? do you have
>> memory issues or speed issues? python dicts are highly optimized
>> already, so you're not likely to do much better with the look-up speed.
> 
> It could help with both memory and speed. In particular, to do a lookup in 
> a Python hashtable you need to
> 
> (1) Wrap the float in a Python object
> (2) call __hash__ on that new object

I hope that isn't what actually happens :-) Floating point and hashes 
don't seem like a good idea.

(Just a note, I believe the OP was talking about floats in the values so 
we're good.)

> (3) actually do the lookup
> (4) unwrap the result back into a float.
> 
> Python does have a highly optimized (3), but the overhead for the rest 
> will probably overwhelm it speedwise, so I bet a simple, unwrapped 
> implementation would still be quite a win.
> 
> - Robert
(Continue reading)

Stefan Behnel | 2 Sep 2009 20:01
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary


Dag Sverre Seljebotn wrote:
> Robert Bradshaw wrote:
>> to do a lookup in a Python hashtable you need to
>>
>> (1) Wrap the float in a Python object
>> (2) call __hash__ on that new object
> 
> I hope that isn't what actually happens :-) Floating point and hashes 
> don't seem like a good idea.

That certainly depends on the hash function. A float value is not more than
a bunch of bits after all, just like an int or string. It even has the
advantage of being exactly a multiple of 4 bytes large, so a hash function
can actually deploy extremely efficient CPU operations.

Stefan

Dag Sverre Seljebotn | 2 Sep 2009 21:21
Picon
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Stefan Behnel wrote:
> Dag Sverre Seljebotn wrote:
>   
>> Robert Bradshaw wrote:
>>     
>>> to do a lookup in a Python hashtable you need to
>>>
>>> (1) Wrap the float in a Python object
>>> (2) call __hash__ on that new object
>>>       
>> I hope that isn't what actually happens :-) Floating point and hashes 
>> don't seem like a good idea.
>>     
>
> That certainly depends on the hash function. A float value is not more than
> a bunch of bits after all, just like an int or string. It even has the
> advantage of being exactly a multiple of 4 bytes large, so a hash function
> can actually deploy extremely efficient CPU operations.
>   
Yes, but the hash function to use would in the majority of real world 
cases depend heavily on the context. Up to what precision should two 
floats be compared for equality? Since roundoff errors will usually lead 
to slightly different values for storage and retrieval, unless you're 
really, really careful. (Always compare floats by abs(a - b) < eps and 
so on).

(I'd like to hear about actual use cases if there is any though! -- 
anything I can think of where store/retrieve would make sense is better 
represented by intervals on the real line than a single float.)

(Continue reading)

Stefan Behnel | 2 Sep 2009 21:29
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary


Dag Sverre Seljebotn wrote:
> the hash function to use would in the majority of real world 
> cases depend heavily on the context. Up to what precision should two 
> floats be compared for equality? Since roundoff errors will usually lead 
> to slightly different values for storage and retrieval, unless you're 
> really, really careful.

Ah, ok, that's what you meant. Yes, I agree that it's rather futile to
discuss this without a real use case in mind.

Stefan


Gmane