Dimitri Tcaciuc | 21 Apr 21:17 2012

[Cython] `cdef inline` and typed memory views

Hey everyone,

Congratulations on shipping 0.16! I think I found a problem which
seems pretty straightforward. Say I want to factor out the inner part of
some N^2 loops over a float array, so I write something like

  from libc.math cimport sqrtf  # C square root for floats

  cdef inline float _inner(size_t i, size_t j, float[:] x):
     cdef float d = x[i] - x[j]
     return sqrtf(d * d)

In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
the function is declared as inline, which is great. However, the
memoryview structure is passed by value:

  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
     ...

This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
perform efficient inlining (although the function does in fact get
inlined). If I manually inline that distance calculation, I get a 3x
speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
elements). When I manually modified the generated .c file to pass the
memory view slice by pointer, the slowdown was eliminated completely.

On a somewhat related note, have you considered enabling the Issues page on GitHub?

Thanks!

Dimitri.

Stefan Behnel | 21 Apr 23:48 2012

Re: [Cython] `cdef inline` and typed memory views

Dimitri Tcaciuc, 21.04.2012 21:17:
> On a somewhat related note, have you considered enabling the Issues page on GitHub?

It was discussed, but the drawback of having two separate bug trackers is
non-negligible.

Stefan

Dimitri Tcaciuc | 22 Apr 20:14 2012

Re: [Cython] `cdef inline` and typed memory views

On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel <stefan_ml@...> wrote:
> Dimitri Tcaciuc, 21.04.2012 21:17:
>> On a somewhat related note, have you considered enabling the Issues page on GitHub?
>
> It was discussed, but the drawback of having two separate bug trackers is
> non-negligible.

Ok. I was wondering since it would make it much easier to connect
issue/patch/discussion together without, say, me needlessly adding to
the development mailing list and/or manually registering for trac and
sending htpasswd digest over the mail. Here's something to consider if
you ever want to migrate over from trac:
https://github.com/adamcik/github-trac-ticket-import

Cheers,

Dimitri.

mark florisson | 22 Apr 22:21 2012

Re: [Cython] `cdef inline` and typed memory views

On 22 April 2012 19:14, Dimitri Tcaciuc <dtcaciuc@...> wrote:
> On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel <stefan_ml@...> wrote:
>> Dimitri Tcaciuc, 21.04.2012 21:17:
>>> On a somewhat related note, have you considered enabling the Issues page on GitHub?
>>
>> It was discussed, but the drawback of having two separate bug trackers is
>> non-negligible.
>
> Ok. I was wondering since it would make it much easier to connect
> issue/patch/discussion together without, say, me needlessly adding to
> the development mailing list and/or manually registering for trac and
> sending htpasswd digest over the mail. Here's something to consider if
> you ever want to migrate over from trac:
> https://github.com/adamcik/github-trac-ticket-import
>
> Cheers,
>
> Dimitri.

I haven't heard very good things about github issues, but I like to
have everything in one place, and I'm not too fond of trac in any
regard. It's also quite a barrier to get trac access, so I'd be in
favour of moving tickets.


Stefan Behnel | 23 Apr 08:19 2012

Re: [Cython] `cdef inline` and typed memory views

mark florisson, 22.04.2012 22:21:
> On 22 April 2012 19:14, Dimitri Tcaciuc wrote:
>> On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote:
>>> Dimitri Tcaciuc, 21.04.2012 21:17:
>>>> On a somewhat related note, have you considered enabling the Issues page on GitHub?
>>>
>>> It was discussed, but the drawback of having two separate bug trackers is
>>> non-negligible.
>>
>> Ok. I was wondering since it would make it much easier to connect
>> issue/patch/discussion together without, say, me needlessly adding to
>> the development mailing list and/or manually registering for trac and
>> sending htpasswd digest over the mail. Here's something to consider if
>> you ever want to migrate over from trac:
>> https://github.com/adamcik/github-trac-ticket-import
>
> I haven't heard very good things about github issues

I find them nicely accessible from the user side, but hardly usable for the
developers. All you get is basically a blog style comment system with bare
tag support. Sure, you can build many of the necessary features on top of
tags, but trac (or any other real issue tracker) already provides a lot more.

Pull request tracking works well in github, but I consider their general
issue tracker a last resort if you don't have anything else.

Stefan

mark florisson | 22 Apr 22:20 2012

Re: [Cython] `cdef inline` and typed memory views

On 21 April 2012 20:17, Dimitri Tcaciuc <dtcaciuc <at> gmail.com> wrote:
> Hey everyone,
>
> Congratulations on shipping 0.16! I think I found a problem which
> seems pretty straightforward. Say I want to factor out the inner part of
> some N^2 loops over a float array, so I write something like
>
>  cdef inline float _inner(size_t i, size_t j, float[:] x):
>     cdef float d = x[i] - x[j]
>     return sqrtf(d * d)
>
> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
> the function is declared as inline, which is great. However, the
> memoryview structure is passed by value:
>
>  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>     ...
>
> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
> perform efficient inlining (although the function does in fact get
> inlined). If I manually inline that distance calculation, I get a 3x
> speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
> elements). When I manually modified the generated .c file to pass the
> memory view slice by pointer, the slowdown was eliminated completely.
>
> On a somewhat related note, have you considered enabling the Issues page on GitHub?
>
>
> Thanks!

Dimitri Tcaciuc | 22 Apr 23:45 2012

Re: [Cython] `cdef inline` and typed memory views

On Sun, Apr 22, 2012 at 1:20 PM, mark florisson
<markflorisson88@...> wrote:
> On 21 April 2012 20:17, Dimitri Tcaciuc <dtcaciuc@...> wrote:
>> Hey everyone,
>>
>> Congratulations on shipping 0.16! I think I found a problem which
>> seems pretty straightforward. Say I want to factor out the inner part of
>> some N^2 loops over a float array, so I write something like
>>
>>  cdef inline float _inner(size_t i, size_t j, float[:] x):
>>     cdef float d = x[i] - x[j]
>>     return sqrtf(d * d)
>>
>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
>> the function is declared as inline, which is great. However, the
>> memoryview structure is passed by value:
>>
>>  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>>     ...
>>
>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
>> perform efficient inlining (although the function does in fact get
>> inlined). If I manually inline that distance calculation, I get a 3x
>> speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
>> elements). When I manually modified the generated .c file to pass the
>> memory view slice by pointer, the slowdown was eliminated completely.
>>
>> On a somewhat related note, have you considered enabling the Issues page on GitHub?
>>

Stefan Behnel | 23 Apr 08:24 2012

Re: [Cython] `cdef inline` and typed memory views

mark florisson, 22.04.2012 22:20:
> On 21 April 2012 20:17, Dimitri Tcaciuc wrote:
>> Say I want to factor out the inner part of
>> some N^2 loops over a float array, so I write something like
>>
>>  cdef inline float _inner(size_t i, size_t j, float[:] x):
>>     cdef float d = x[i] - x[j]
>>     return sqrtf(d * d)
>>
>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
>> the function is declared as inline, which is great. However, the
>> memoryview structure is passed by value:
>>
>>  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>>     ...
>>
>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
>> perform efficient inlining (although the function does in fact get
>> inlined). If I manually inline that distance calculation, I get a 3x
>> speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
>> elements). When I manually modified the generated .c file to pass the
>> memory view slice by pointer, the slowdown was eliminated completely.
> 
> Although it is neither documented nor tested, it works if you just
> take the address of the memoryview. You can then index it using
> memoryview_pointer[0][i].

Are you advertising this as an actual feature here? I'm just asking because
supporting hacks can be nasty in the long run. What if we ever want to make

mark florisson | 23 Apr 10:39 2012

Re: [Cython] `cdef inline` and typed memory views

On 23 April 2012 07:24, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
> mark florisson, 22.04.2012 22:20:
>> On 21 April 2012 20:17, Dimitri Tcaciuc wrote:
>>> Say I want to factor out the inner part of
>>> some N^2 loops over a float array, so I write something like
>>>
>>>  cdef inline float _inner(size_t i, size_t j, float[:] x):
>>>     cdef float d = x[i] - x[j]
>>>     return sqrtf(d * d)
>>>
>>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
>>> the function is declared as inline, which is great. However, the
>>> memoryview structure is passed by value:
>>>
>>>  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
>>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>>>     ...
>>>
>>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
>>> perform efficient inlining (although the function does in fact get
>>> inlined). If I manually inline that distance calculation, I get a 3x
>>> speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
>>> elements). When I manually modified the generated .c file to pass the
>>> memory view slice by pointer, the slowdown was eliminated completely.
>>
>> Although it is neither documented nor tested, it works if you just
>> take the address of the memoryview. You can then index it using
>> memoryview_pointer[0][i].
>
> Are you advertising this as an actual feature here? I'm just asking because