Sebastien Binet | 2 Sep 2009 19:01
Picon
Gravatar

Re: [Cython] FW: cython and hash tables / dictionary

On Wednesday 02 September 2009 18:01:43 Robert Bradshaw wrote:
> On Sep 2, 2009, at 5:34 AM, Sanne Korzec wrote:
> > Hi mailing,
> >
> > I’ve been writing a complex program in python, which I am currently
> > scaling up. I find myself in the position now, where I run out of
> > memory or out of time. I have been looking at alternatives like
> > cython and ctypes. I implemented ctypes which fixes the memory
> > problem but doubles the time problem.
> >
> > Currently I am implementing a cython version and ran into a
> > problem. I hope someone can help me out.
> >
> > The main bottleneck in my code is a large dictionary / hash table
> > which I would like to optimize. Since a dictionary is a python
> > datatype I have no idea how to make this cython.
> >
> > Currently I have tried to keep the ‘keys’ intact and store the
> > ‘values’ as ctypes floats, but I think it might be better to do
> > something else. Do I need to make the entire hash table c? Or is
> > there a more simple solution like combining the python dict with
> > cython? If so, how do I do this?
> >
> > Thanks in advance.
> >
> > Additional details: I use a double dict where the key of the first
> > dict stores another dict as value.
> >
> > S.
>
(Continue reading)

Sanne Korzec | 3 Sep 2009 11:02
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Thanks for the link.

But I'm a little confused again. I thought cython is used when you do not
want to write c yourself. But use it to write 'pythonic' c.

In this case, the c code is ready and needs to be included in my python
program.

What I need is simplicity and c. Should I use cython to extend my main
python program with this c datatype? If so, how? Or is there a better/faster
way to let these two communicate?

I basically need a fast way of accessing the hash table from python. If
someone can refer me to some of the many docs, I would be very happy.

To summarize this is when my python program needs to read or write to the
hash table.

Main python program does:

-some preprocessing
-loop over a large data file
	-1) count and calculate and do some tricks (read from hash table)
	-2) return a float, with two indices
	-3) store this output (write to hash table)
-write final hash table to output

Thanks.

-----Original Message-----
(Continue reading)

Stefan Behnel | 2 Sep 2009 20:08
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary


Sebastien Binet wrote:
> I'd recommand using this C library:
> http://c-algorithms.sourceforge.net/

Interesting. That would certainly make a nice C-level standard library.

Would you have ready-made .pxd files for this library? Or even some Cython
example code that you could post somewhere?

Stefan
Sanne Korzec | 4 Sep 2009 15:50
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Hi,

I would also appreciate an example for the hash table too. I don't mind
writing documentation for it, if someone can help me out getting it to work.

I am unsure how and were to declare the arguments that hash_table_new takes.

In the documentation
http://c-algorithms.sourceforge.net/doc/hash-table_8h.html#e361c4c0256ec6c74
1ecfeabef33d891 , I can find:

HashTable* hash_table_new 	( 	HashTableHashFunc  	hash_func,
		HashTableEqualFunc  	equal_func	 
	) 			

To create a new hash table. But I can't find were the HashTableHashFunc and
HashTableEqualFunc are declared. The only thing I can find is in the header
file which state:

typedef unsigned long(* HashTableHashFunc)(HashTableKey value)
typedef unsigned long(* HashTableHashFunc)(HashTableKey value)

Does this mean I have to write these functions myself? In c? And how then do
I call them from cython?

My guess: 

hashtable.pyx

cdef extern from "hash_table.h":
(Continue reading)

Stefan Behnel | 4 Sep 2009 16:03
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary


Sanne Korzec wrote:
> In the documentation
> http://c-algorithms.sourceforge.net/doc/hash-table_8h.html#e361c4c0256ec6c74
> 1ecfeabef33d891 , I can find:
> 
> HashTable* hash_table_new 	( 	HashTableHashFunc  	hash_func,
> 		HashTableEqualFunc  	equal_func	 
> 	) 			
> 
> To create a new hash table. But I can't find were the HashTableHashFunc and
> HashTableEqualFunc are declared. The only thing I can find is in the header
> file which state:
> 
> typedef unsigned long(* HashTableHashFunc)(HashTableKey value)
> typedef unsigned long(* HashTableHashFunc)(HashTableKey value)
> 
> Does this mean I have to write these functions myself?

Yes.

> In c?

You can write them in Cython:

    cdef unsigned long c_hash(HashTableKey value):
        return huge_calculation_on(value)

> And how then do I call them from cython?

(Continue reading)

Sanne Korzec | 7 Sep 2009 15:15
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Ok, I now have this:

cythonHash.pxd:

cdef extern from "hash-table.h":
    ctypedef struct HashTable
    ctypedef void *HashTableKey
    ctypedef unsigned long HashTableHashFunc(HashTableKey value)
    ctypedef unsigned long HashTableEqualFunc(HashTableKey value)
    HashTable *hash_table_new(HashTableHashFunc hash_func,
HashTableEqualFunc equal_func)

cdef inline unsigned long c_hash_func(HashTableKey value):
    return 1

cdef inline unsigned long c_hash_equal(HashTableKey value):
    return 1

cythonPT.pyx:

cimport cythonHash
from cythonHash cimport HashTable

class MY_Phrase_Table(object):

    def __init__(self):

	pp = HashTable		#error here
      print type(pp), pp        

(Continue reading)

Sebastien Binet | 7 Sep 2009 21:28
Picon
Gravatar

Re: [Cython] FW: cython and hash tables / dictionary

hi there,

attached is a simple cy_stl.pyx file (together with its setup.py companion)

really just to get started :)

to test:
$ python -c 'import cy_stl as cc; cc.test()'

hth,
sebastien.

> cythonHash.pxd:
> 
> cdef extern from "hash-table.h":
>     ctypedef struct HashTable
>     ctypedef void *HashTableKey
>     ctypedef unsigned long HashTableHashFunc(HashTableKey value)
>     ctypedef unsigned long HashTableEqualFunc(HashTableKey value)
>     HashTable *hash_table_new(HashTableHashFunc hash_func,
> HashTableEqualFunc equal_func)
> 
> cdef inline unsigned long c_hash_func(HashTableKey value):
>     return 1
> 
> cdef inline unsigned long c_hash_equal(HashTableKey value):
>     return 1
> 
> 
> cythonPT.pyx:
(Continue reading)

Robert Bradshaw | 7 Sep 2009 23:53
Favicon

Re: [Cython] FW: cython and hash tables / dictionary

Thanks for the implementation. I noticed that your compare was only  
comparing pointers, so if you had two equal strings at different  
memory addresses it wouldn't find them (which may or may not be what  
you'd want) Also, your pointers would go out of scope as soon as test  
ended (so you couldn't return ht and use it later).

I built on what you had to get a float -> float hashtable. Note that  
this technique only works since the float value fits inside a void*,  
anything bigger and you'd have to allocate memory manually to stick  
it into the hashtable.

- Robert

Attachment (cy_stl.pyx): application/octet-stream, 2844 bytes

sage: from cy_stl import *
sage: time time_c_hashtable(10**5)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
sage: time time_c_hashtable(10**6)
CPU times: user 0.08 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s
sage: time time_c_hashtable(10**7)
CPU times: user 0.75 s, sys: 0.00 s, total: 0.75 s
Wall time: 0.75 s
sage: time time_py_hashtable(10**5)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s
(Continue reading)

Sanne Korzec | 9 Sep 2009 16:18
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Ok, I have played around with this hash table and understand most of the
basics...

I now would like to create the datastructure I need for my project.

Basically what I need is two linked hash tables like this

Key : int ---> value : hashtable { key: int ---> value: list(float, float) }

And

Key : int ---> value : hashtable { key: int ---> value: float }

I am starting to wonder since this hash table works with voids for key and
value only if I should change the .c and .h files myself. I think I would
prefer not to.

Does anybody have a suggestion what would be wise? Preference goes to quick
implementation, not total optimization.

Sanne Korzec | 8 Sep 2009 16:28
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Thanks for both examples. 

Is there a way to check the value of a float at different stages as it is
being cast from one type to another?

e.g.

is there a way to write to the screen the value of a void*  void** or
float*, cause currently all float values return (0.0) in my implementation.

-----Original Message-----
From: Robert Bradshaw [mailto:robertwb@...] 
Sent: maandag 7 september 2009 23:53
To: Cython-dev; Sebastien Binet
Cc: sanne@...
Subject: Re: [Cython] FW: cython and hash tables / dictionary

Thanks for the implementation. I noticed that your compare was only  
comparing pointers, so if you had two equal strings at different  
memory addresses it wouldn't find them (which may or may not be what  
you'd want) Also, your pointers would go out of scope as soon as test  
ended (so you couldn't return ht and use it later).

I built on what you had to get a float -> float hashtable. Note that  
this technique only works since the float value fits inside a void*,  
anything bigger and you'd have to allocate memory manually to stick  
it into the hashtable.

- Robert

(Continue reading)

Greg Ewing | 9 Sep 2009 01:36
Picon
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary

Sanne Korzec wrote:

> Is there a way to check the value of a float at different stages as it is
> being cast from one type to another?

If you're building a specialised hash table just for floats,
why are you casting them to void * at all? Why not just
store them as floats?

--

-- 
Greg
Robert Bradshaw | 9 Sep 2009 04:09
Favicon

Re: [Cython] FW: cython and hash tables / dictionary

On Sep 8, 2009, at 4:36 PM, Greg Ewing wrote:

> Sanne Korzec wrote:
>
>> Is there a way to check the value of a float at different stages  
>> as it is
>> being cast from one type to another?
>
> If you're building a specialised hash table just for floats,
> why are you casting them to void * at all? Why not just
> store them as floats?

It was an example of using an (existing) hashtable that was written  
to store void*.

- Robert
Lisandro Dalcin | 8 Sep 2009 16:52
Picon
Gravatar

Re: [Cython] FW: cython and hash tables / dictionary

On Tue, Sep 8, 2009 at 11:28 AM, Sanne Korzec<sanne@...> wrote:
>
> is there a way to write to the screen the value of a void*  void** or
> float*, cause currently all float values return (0.0) in my implementation.
>

What about using C stdlib printf() using "%p" format, like this

cdef extern from "stdio.h":
   int printf(char*,...)

cdef float *ptr = NULL
printf("%p\n", <void*>ptr)

>
>
>
>
> -----Original Message-----
> From: Robert Bradshaw [mailto:robertwb@...]
> Sent: maandag 7 september 2009 23:53
> To: Cython-dev; Sebastien Binet
> Cc: sanne@...
> Subject: Re: [Cython] FW: cython and hash tables / dictionary
>
> Thanks for the implementation. I noticed that your compare was only
> comparing pointers, so if you had two equal strings at different
> memory addresses it wouldn't find them (which may or may not be what
> you'd want) Also, your pointers would go out of scope as soon as test
> ended (so you couldn't return ht and use it later).
(Continue reading)

Sanne Korzec | 8 Sep 2009 13:41
Picon

Re: [Cython] FW: cython and hash tables / dictionary

Thanks for the implementation, can't wait to try. But again I'm running into
some problems. 

I installed the c stl with configure --prefix='/home/me/libcalg'

Changed the setup.py:

#!/usr/bin/env python
# python setup.py build_ext --inplace
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext = Extension(
    "cy_hash",
    ["cy_hash.pyx"],
    language="c",
    include_dirs=['/home/me/libcalg/include/libcalg-1.0'],
    library_dirs=['/home/me/libcalg/lib'],
    libraries=['calg'],			#also tried 'libcalg/calg'
    cmdclass = {'build_ext': build_ext}
    )

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[ext]
    )

When I run python setup.py build_ext --inplace

(Continue reading)

Lisandro Dalcin | 8 Sep 2009 16:26
Picon
Gravatar

Re: [Cython] FW: cython and hash tables / dictionary

On Tue, Sep 8, 2009 at 8:41 AM, Sanne Korzec<sanne@...> wrote:
> Thanks for the implementation, can't wait to try. But again I'm running into
> some problems.
>
> I installed the c stl with configure --prefix='/home/me/libcalg'
>
> Changed the setup.py:
>
> #!/usr/bin/env python
> # python setup.py build_ext --inplace
> from distutils.core import setup
> from distutils.extension import Extension
> from Cython.Distutils import build_ext
>
> ext = Extension(
>    "cy_hash",
>    ["cy_hash.pyx"],
>    language="c",
>    include_dirs=['/home/me/libcalg/include/libcalg-1.0'],
>    library_dirs=['/home/me/libcalg/lib'],
>    libraries=['calg'],                 #also tried 'libcalg/calg'
>    cmdclass = {'build_ext': build_ext}
>    )
>
> setup(
>    cmdclass={'build_ext': build_ext},
>    ext_modules=[ext]
>    )
>
> When I run python setup.py build_ext --inplace
(Continue reading)

Sebastien Binet | 8 Sep 2009 13:45
Picon
Gravatar

Re: [Cython] FW: cython and hash tables / dictionary

On Tuesday 08 September 2009 13:41:40 Sanne Korzec wrote:
> Thanks for the implementation, can't wait to try. But again I'm running
>  into some problems.
> 
> I installed the c stl with configure --prefix='/home/me/libcalg'
> 
> Changed the setup.py:
> 
> #!/usr/bin/env python
> # python setup.py build_ext --inplace
> from distutils.core import setup
> from distutils.extension import Extension
> from Cython.Distutils import build_ext
> 
> ext = Extension(
>     "cy_hash",
>     ["cy_hash.pyx"],
>     language="c",
>     include_dirs=['/home/me/libcalg/include/libcalg-1.0'],
>     library_dirs=['/home/me/libcalg/lib'],
>     libraries=['calg'],			#also tried 'libcalg/calg'
>     cmdclass = {'build_ext': build_ext}
>     )
> 
> setup(
>     cmdclass={'build_ext': build_ext},
>     ext_modules=[ext]
>     )
> 
> When I run python setup.py build_ext --inplace
(Continue reading)

Sebastien Binet | 8 Sep 2009 08:51
Picon
Gravatar

Re: [Cython] FW: cython and hash tables / dictionary

On Monday 07 September 2009 23:53:03 Robert Bradshaw wrote:
> Thanks for the implementation. I noticed that your compare was only
> comparing pointers, so if you had two equal strings at different
> memory addresses it wouldn't find them (which may or may not be what
> you'd want)
yeah, for strings, the c-alg library has a dedicated string-hash:
http://c-algorithms.sourceforge.net/doc/hash-
string_8h.html#6eb697fb58d3de146a2ddd76a1900f83

(as well as a case insensitive version)

> Also, your pointers would go out of scope as soon as test
> ended (so you couldn't return ht and use it later).
well, the C-way is to malloc everything (and c-alg's hash-table provides a way 
to register the proper 'free' frunctions)

> 
> I built on what you had to get a float -> float hashtable. Note that
> this technique only works since the float value fits inside a void*,
> anything bigger and you'd have to allocate memory manually to stick
> it into the hashtable.
right.

[..snip..]
> Not near the speed gains I was expecting...disappointing.
I (probably very naively) suspect this is coming from all the type conversions 
void*<->float
but proper profiling would tell :)

cheers,
(Continue reading)

Stefan Behnel | 8 Sep 2009 09:48
Picon
Favicon

Re: [Cython] FW: cython and hash tables / dictionary


Sebastien Binet wrote:
> On Monday 07 September 2009 23:53:03 Robert Bradshaw wrote:
>> I noticed that your compare was only comparing pointers, so if you had
>> two equal strings at different memory addresses it wouldn't find them
>> (which may or may not be what you'd want)
>
> yeah, for strings, the c-alg library has a dedicated string-hash:
> http://c-algorithms.sourceforge.net/doc/hash-
> string_8h.html#6eb697fb58d3de146a2ddd76a1900f83
> 
> (as well as a case insensitive version)

Regarding string hashes, this is worth a read:

http://burtleburtle.net/bob/hash/doobs.html

In version 2.7.x, libxml2's tag dictionaries switched to one of those,
which brought a major performance boost for large sets of XML tags in
documents.

Stefan
Robert Bradshaw | 8 Sep 2009 00:19
Favicon

Re: [Cython] FW: cython and hash tables / dictionary

On Sep 7, 2009, at 2:53 PM, Robert Bradshaw wrote:

> Not near the speed gains I was expecting...disappointing.

I was timing the wrong thing

sage: time time_c_hashtable(10**5)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
sage: time time_c_hashtable(10**6)
CPU times: user 0.08 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s
sage: time time_c_hashtable(10**7)
CPU times: user 0.76 s, sys: 0.00 s, total: 0.76 s
Wall time: 0.76 s
sage: time time_py_hashtable(10**5)
CPU times: user 0.06 s, sys: 0.00 s, total: 0.06 s
Wall time: 0.06 s
sage: time time_py_hashtable(10**6)
CPU times: user 0.50 s, sys: 0.00 s, total: 0.50 s
Wall time: 0.50 s
sage: time time_py_hashtable(10**7)
CPU times: user 5.14 s, sys: 0.01 s, total: 5.15 s
Wall time: 5.15 s

still 6.7x is not much.

>
> - Robert
>
(Continue reading)


Gmane