5 Feb 22:57
[Cython] OpenCL support
Hey, I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl What do you think? Mark
Mark,

Couple of thoughts based on some experience with OpenCL...

1. This may be going outside the proposed purpose, but some algorithms, such as molecular simulations, can benefit from a fairly large amount of constant data loaded at the beginning of the program and persisted between invocations of a function. If I understand the proposal, the entire program would need to be within one `with` block, which would certainly limit the architecture. E.g.

    # run.py
    from cython_module import Evaluator

    # Arrays are loaded into device memory here
    x = Evaluator(params...)
    for i in range(N):
        # Calculations are performed with
        # mostly data in the device memory
        data_i = x.step()
        ...

2. AFAIK, given a device, OpenCL basically takes it over (which would be e.g. 8 cores on a 2 CPU x 4 core machine), so I'm not sure how the `num_cores` parameter would work here. There's the fission extension that allows you to selectively run on a portion of the device, but the idea is that you're still dedicating the entire device to your process and merely giving more organization to your processing tasks, where you have to specify the core numbers you want to use. I may very well be wrong here, bashing is welcome :)
On 5 February 2012 22:39, Dimitri Tcaciuc <dtcaciuc <at> gmail.com> wrote:
> Mark,
>
> Couple of thoughts based on some experience with OpenCL...
>
> 1. This may be going outside the proposed purpose, but some algorithms
> such as molecular simulations can benefit from a fairly large amount
> of constant data loaded at the beginning of the program and persisted
> in between invocations of a function. If I understand the proposal,
> entire program would need to be within one `with` block, which would
> certainly be limiting to the architecture. Eg.
>
> # run.py
> from cython_module import Evaluator
>
> # Arrays are loaded into device memory here
> x = Evaluator(params...)
> for i in range(N):
>     # Calculations are performed with
>     # mostly data in the device memory
>     data_i = x.step()
>     ...

The point of the proposal is that the slices will actually stay on the GPU as long as possible, until they absolutely need to be copied back (e.g. when you go back to NumPy-land). You can do anything you want in-between (outside any parallel section), e.g. call other functions that run on the CPU, call Python functions, whatever. When you continue processing data that is still on the GPU, it will simply continue from there.
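The "stay on the GPU until a copy back is really needed" idea can be sketched with a small toy model. This is only an illustration under loud assumptions: `DeviceSlice`, `scale`, `to_host` and `cpu_side_work` are hypothetical names, the "device" is just a Python list, and nothing here is taken from the CEP or from Cython itself. The counters exist only to make the simulated transfers visible.

```python
class DeviceSlice:
    """Toy stand-in for a slice whose data may live on a device.

    Hypothetical illustration: the "device" buffer is a plain Python list,
    and uploads/downloads count the simulated transfers.
    """

    def __init__(self, host_data):
        self._host = list(host_data)
        self._device = None       # no device copy yet
        self._host_valid = True   # host copy is up to date
        self.uploads = 0
        self.downloads = 0

    def _ensure_on_device(self):
        # Upload only the first time device-side work is requested.
        if self._device is None:
            self._device = list(self._host)  # simulated host -> device copy
            self.uploads += 1

    def scale(self, factor):
        """Simulated device-side operation; the result stays on the device."""
        self._ensure_on_device()
        self._device = [v * factor for v in self._device]
        self._host_valid = False
        return self

    def to_host(self):
        """Copy back only if the host copy is stale (the 'NumPy-land' step)."""
        if not self._host_valid:
            self._host = list(self._device)  # simulated device -> host copy
            self.downloads += 1
            self._host_valid = True
        return list(self._host)


def cpu_side_work():
    """Arbitrary CPU-only code that runs between device operations."""
    return sum(range(10))
```

In this model, calling `scale` twice with `cpu_side_work()` in between triggers one upload and no download; only `to_host()` forces a copy back, and a second `to_host()` with no intervening device work copies nothing.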
mark florisson, 06.02.2012 00:12:
> On 5 February 2012 22:39, Dimitri Tcaciuc wrote:
>> 3. Does it make sense to make OpenCL more explicit? Heuristics and
>> automatic switching between, say, CPU and GPU is great for eg. Sage
>> users, but maybe not so much if you know exactly what you're doing
>> with your machine resources. E.g just having a library with thin
>> cython-adapted wrappers would be awesome. I imagine this can be
>> augmented by arrays having a knowledge of device-side/client-side
>> (which would go towards addressing the issue 1. above)
>
> Hm, there are several advantages to supporting this in the language.

... and there's always the obvious disadvantage of making the language too complex and magic to learn and understand. Worth balancing.

Stefan
On 6 February 2012 07:22, Stefan Behnel <stefan_ml@...> wrote: > mark florisson, 06.02.2012 00:12: >> On 5 February 2012 22:39, Dimitri Tcaciuc wrote: >>> 3. Does it make sense to make OpenCL more explicit? Heuristics and >>> automatic switching between, say, CPU and GPU is great for eg. Sage >>> users, but maybe not so much if you know exactly what you're doing >>> with your machine resources. E.g just having a library with thin >>> cython-adapted wrappers would be awesome. I imagine this can be >>> augmented by arrays having a knowledge of device-side/client-side >>> (which would go towards addressing the issue 1. above) >> >> Hm, there are several advantages to supporting this in the language. > > ... and there's always the obvious disadvantage of making the language too > complex and magic to learn and understand. Worth balancing. Definitely. This would however introduce very minor changes to the language (no new syntax at least, just a few memoryview methods), but more major changes to the compiler. The support would mostly be transparent. Clyther (http://srossross.github.com/Clyther/) is a related project, which does a similar thing by compiling python (bytecode) to opencl. What I want for Cython is something even more transparent, the user wouldn't perhaps even know opencl was involved, and the compiler has more control over how data is handled. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel@...(Continue reading)
On Mon, Feb 6, 2012 at 2:21 AM, mark florisson <markflorisson88@...> wrote: > On 6 February 2012 07:22, Stefan Behnel <stefan_ml@...> wrote: >> mark florisson, 06.02.2012 00:12: >>> On 5 February 2012 22:39, Dimitri Tcaciuc wrote: >>>> 3. Does it make sense to make OpenCL more explicit? Heuristics and >>>> automatic switching between, say, CPU and GPU is great for eg. Sage >>>> users, but maybe not so much if you know exactly what you're doing >>>> with your machine resources. E.g just having a library with thin >>>> cython-adapted wrappers would be awesome. I imagine this can be >>>> augmented by arrays having a knowledge of device-side/client-side >>>> (which would go towards addressing the issue 1. above) >>> >>> Hm, there are several advantages to supporting this in the language. >> >> ... and there's always the obvious disadvantage of making the language too >> complex and magic to learn and understand. Worth balancing. > > Definitely. This would however introduce very minor changes to the > language (no new syntax at least, just a few memoryview methods), but > more major changes to the compiler. The support would mostly be > transparent. > Clyther (http://srossross.github.com/Clyther/) is a related project, > which does a similar thing by compiling python (bytecode) to opencl. > What I want for Cython is something even more transparent, the user > wouldn't perhaps even know opencl was involved, and the compiler has > more control over how data is handled. What I'm absolutely certain of is that sort of complete transparency will eventually start getting edge cases and from there on additional(Continue reading)
On 05.02.2012 23:39, Dimitri Tcaciuc wrote:
> 3. Does it make sense to make OpenCL more explicit?

No, it takes the usefulness of OpenCL away, which is that kernels are text strings compiled at run-time.

> Heuristics and
> automatic switching between, say, CPU and GPU is great for eg. Sage
> users, but maybe not so much if you know exactly what you're doing
> with your machine resources. E.g just having a library with thin
> cython-adapted wrappers would be awesome. I imagine this can be
> augmented by arrays having a knowledge of device-side/client-side
> (which would go towards addressing the issue 1. above)

Just use PyOpenCL and manipulate kernels as text. Python is excellent for that - Cython is not needed. If you think using Cython instead of Python (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem that warrants the use of OpenCL.

Sturla
On Tue, Feb 7, 2012 at 5:52 AM, Sturla Molden <sturla@...> wrote: > On 05.02.2012 23:39, Dimitri Tcaciuc wrote: > >> 3. Does it make sense to make OpenCL more explicit? > > > No, it takes the usefuness of OpenCL away, which is that kernels are text > strings and compiled at run-time. I'm not sure I understand you, maybe you could elaborate on that? By "explicit" I merely meant that the user will explicitly specify that they're working on OpenCL-enabled array or certain bit of Cython code will get compiled into OpenCL program etc. > >> Heuristics and >> automatic switching between, say, CPU and GPU is great for eg. Sage >> users, but maybe not so much if you know exactly what you're doing >> with your machine resources. E.g just having a library with thin >> cython-adapted wrappers would be awesome. I imagine this can be >> augmented by arrays having a knowledge of device-side/client-side >> (which would go towards addressing the issue 1. above) > > > Just use PyOpenCL and manipulate kernels as text. Python is excellent for > that - Cython is not needed. If you think using Cython instead of Python > (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem > that warrants the use of OpenCL. Again, not sure what you mean here. As I mentioned in the thread,(Continue reading)
On 07.02.2012 18:22, Dimitri Tcaciuc wrote:
> I'm not sure I understand you, maybe you could elaborate on that?

OpenCL code is a text string that is compiled when the program runs. So it can be generated from run-time data. Think of it like dynamic HTML.

> Again, not sure what you mean here. As I mentioned in the thread,
> PyOpenCL worked quite fine, however if Cython is getting OpenCL
> support, I'd much rather use that than keeping a dependency on another
> library.

You can use PyOpenCL or OpenCL C or C++ headers with Cython. The latter you just use as you would with any other C or C++ library. You don't need to change the compiler to use a library.

It seems like you think OpenCL is compiled from code when you build the program. It is actually compiled from text strings when you run the program. It is meaningless to ask if Cython supports OpenCL because Cython supports any C library.

Sturla
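To make the "kernels are text generated from run-time data" point concrete, here is a minimal sketch in plain Python. The helper name `make_elementwise_kernel`, the template, and the allowed-type set are assumptions made for illustration, not part of any library. The sketch only builds the OpenCL C source string; with PyOpenCL one would typically hand such a string to `pyopencl.Program(context, source).build()`, which is left out here so the example runs without an OpenCL device.

```python
from string import Template

# OpenCL C source kept as a Python template. $name, $dtype and $expr are
# filled in at run time (hypothetical template, for illustration only).
_KERNEL_TEMPLATE = Template("""\
__kernel void $name(__global const $dtype *a,
                    __global const $dtype *b,
                    __global $dtype *out)
{
    size_t i = get_global_id(0);
    out[i] = $expr;
}
""")

# Restricting the element type keeps the generated source predictable.
_ALLOWED_DTYPES = {"float", "int"}


def make_elementwise_kernel(name, dtype, expr):
    """Return OpenCL C source for an element-wise kernel over a[i] and b[i]."""
    if dtype not in _ALLOWED_DTYPES:
        raise ValueError("unsupported dtype: %r" % (dtype,))
    return _KERNEL_TEMPLATE.substitute(name=name, dtype=dtype, expr=expr)


if __name__ == "__main__":
    # The expression could come from user input, a config file, etc.
    print(make_elementwise_kernel("add", "float", "a[i] + b[i]"))
```

Because the kernel is ordinary text, the same Python code can emit a different kernel for each data type or expression chosen while the program is running, which is the "dynamic HTML" analogy in the message above.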
On 7 February 2012 17:58, Sturla Molden <sturla@...> wrote: > On 07.02.2012 18:22, Dimitri Tcaciuc wrote: > >> I'm not sure I understand you, maybe you could elaborate on that? > > > OpenCL code is a text string that is compiled when the program runs. So it > can be generated from run-time data. Think of it like dynamic HTML. > > >> Again, not sure what you mean here. As I mentioned in the thread, >> PyOpenCL worked quite fine, however if Cython is getting OpenCL >> support, I'd much rather use that than keeping a dependency on another >> library. > > > You can use PyOpenCL or OpenCL C or C++ headers with Cython. The latter you > just use as you would with any other C or C++ library. You don't need to > change the compiler to use a library: It seems like you think OpenCL is > compiled from code when you build the program. It is actually compiled from > text strings when you run the program. It is meaningless to ask if Cython > supports OpenCL because Cython supports any C library. > Sturla, in general we appreciate your input, you usually have useful things to say. But I really don't believe you have read the CEP, so please do, and then comment on what is proposed there if you want. Here is the link: http://wiki.cython.org/enhancements/opencl > Sturla(Continue reading)
On Tue, Feb 7, 2012 at 9:58 AM, Sturla Molden <sturla@...> wrote: > On 07.02.2012 18:22, Dimitri Tcaciuc wrote: > >> I'm not sure I understand you, maybe you could elaborate on that? > > > OpenCL code is a text string that is compiled when the program runs. So it > can be generated from run-time data. Think of it like dynamic HTML. > > >> Again, not sure what you mean here. As I mentioned in the thread, >> PyOpenCL worked quite fine, however if Cython is getting OpenCL >> support, I'd much rather use that than keeping a dependency on another >> library. > > > You can use PyOpenCL or OpenCL C or C++ headers with Cython. The latter you > just use as you would with any other C or C++ library. You don't need to > change the compiler to use a library: It seems like you think OpenCL is > compiled from code when you build the program. It is actually compiled from > text strings when you run the program. It is meaningless to ask if Cython > supports OpenCL because Cython supports any C library. I view this more as a proposal to have an OpenCL backend for prange loops and other vectorized operations. The advantage of integrating OpenCL into Cython is that one can write a single implementation of your algorithm (using traditional for...(p)range loops) and have it use the GPU in the background transparently (without having to manually learn and call the library yourself). This is analogous to the compiler/runtime system deciding to use sse instructions for a(Continue reading)
On 7 February 2012 17:22, Dimitri Tcaciuc <dtcaciuc@...> wrote:
> On Tue, Feb 7, 2012 at 5:52 AM, Sturla Molden <sturla@...> wrote:
>> On 05.02.2012 23:39, Dimitri Tcaciuc wrote:
>>
>>> 3. Does it make sense to make OpenCL more explicit?
>>
>>
>> No, it takes the usefuness of OpenCL away, which is that kernels are text
>> strings and compiled at run-time.
>
> I'm not sure I understand you, maybe you could elaborate on that? By
> "explicit" I merely meant that the user will explicitly specify that
> they're working on OpenCL-enabled array or certain bit of Cython code
> will get compiled into OpenCL program etc.

I gave that some thought as well, like 'cdef double[::view.gpu, :] myarray', which would mean that the data is on the GPU. Generally though, I think it kind of defeats the purpose. E.g. if you have small arrays you probably don't want anything to be on the GPU, whereas if you have larger ones and sufficient computation operating on them, it might be worthwhile. The point is, as a user you don't care, you want your runtime to make a sensible decision. If you don't want anything to do with OpenCL, you can disable it, or if you want to only ever stay on the CPU, you could "pin" it there.

>>
>>> Heuristics and
>>> automatic switching between, say, CPU and GPU is great for eg. Sage
>>> users, but maybe not so much if you know exactly what you're doing
>>> with your machine resources. E.g just having a library with thin [...]
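The kind of runtime decision described in the reply above (small arrays stay on the CPU, large arrays with enough computation may go to the GPU, and the user can disable OpenCL or pin to the CPU) could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name `choose_backend`, its parameters, and especially the `min_work` threshold are made up and are not taken from the CEP.

```python
def choose_backend(n_elements, ops_per_element, opencl_enabled=True,
                   pin_to_cpu=False, min_work=1_000_000):
    """Pick "cpu" or "gpu" for one operation (hypothetical heuristic).

    n_elements      -- number of array elements involved
    ops_per_element -- rough amount of computation per element
    opencl_enabled  -- False models "I want nothing to do with OpenCL"
    pin_to_cpu      -- True models pinning the data to the CPU
    min_work        -- made-up threshold below which a transfer is assumed
                       not to pay off
    """
    # Explicit user choices always win over the heuristic.
    if not opencl_enabled or pin_to_cpu:
        return "cpu"
    # Estimated total work: only large, compute-heavy jobs go to the GPU.
    work = n_elements * ops_per_element
    return "gpu" if work >= min_work else "cpu"
```

For example, `choose_backend(100, 2)` stays on the CPU, while `choose_backend(10_000_000, 4)` selects the GPU unless `pin_to_cpu=True` or `opencl_enabled=False` is passed. A real runtime would presumably also account for where the data currently lives and for transfer cost, which this sketch ignores.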
On 7 February 2012 18:01, mark florisson <markflorisson88@...> wrote: > On 7 February 2012 17:22, Dimitri Tcaciuc <dtcaciuc@...> wrote: >> On Tue, Feb 7, 2012 at 5:52 AM, Sturla Molden <sturla@...> wrote: >>> On 05.02.2012 23:39, Dimitri Tcaciuc wrote: >>> >>>> 3. Does it make sense to make OpenCL more explicit? >>> >>> >>> No, it takes the usefuness of OpenCL away, which is that kernels are text >>> strings and compiled at run-time. >> >> I'm not sure I understand you, maybe you could elaborate on that? By >> "explicit" I merely meant that the user will explicitly specify that >> they're working on OpenCL-enabled array or certain bit of Cython code >> will get compiled into OpenCL program etc. > > I gave that some thought as well, like 'cdef double[::view.gpu, :] > myarray', which would mean that the data is on the gpu. Generally > though, I think it kind of defeats the purpose. E.g. if you have small > arrays you probably don't want anything to be on the gpu, whereas if > you have larger ones and sufficient computation operating on them, it > might be worthwhile. The point is, as a user you don't care, you want > your runtime to make a sensible decision. If you don't want anything > to do with OpenCL, you can disable it, or if you want to only ever > stay on the CPU, you could "pin" it there. As for code regions, only operations on memoryview slices (most notably vector operations) and prange sections would be compiled (and only if possible at all). Maybe normal loops could be compiled as(Continue reading)
On 7 February 2012 13:52, Sturla Molden <sturla@...> wrote:
> On 05.02.2012 23:39, Dimitri Tcaciuc wrote:
>
>> 3. Does it make sense to make OpenCL more explicit?
>
>
> No, it takes the usefuness of OpenCL away, which is that kernels are text
> strings and compiled at run-time.
>

I don't know why you think that is necessary. Obviously Cython's translated OpenCL would also be compiled at runtime (or loaded from a cache). If you mean you can't do string interpolation, I don't see why you would need that.

>> Heuristics and
>> automatic switching between, say, CPU and GPU is great for eg. Sage
>> users, but maybe not so much if you know exactly what you're doing
>> with your machine resources. E.g just having a library with thin
>> cython-adapted wrappers would be awesome. I imagine this can be
>> augmented by arrays having a knowledge of device-side/client-side
>> (which would go towards addressing the issue 1. above)
>
>
> Just use PyOpenCL and manipulate kernels as text. Python is excellent for
> that - Cython is not needed. If you think using Cython instead of Python
> (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem
> that warrants the use of OpenCL.
>
> Sturla
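The "compiled at runtime (or loaded from a cache)" remark can be illustrated with a small in-memory cache keyed by kernel source and device. This is a sketch under stated assumptions: `KernelCache` and `fake_compile` are hypothetical names, and `fake_compile` merely stands in for a real OpenCL build step so that the example runs without any OpenCL installation. It does not describe how Cython would actually implement caching.

```python
import hashlib


class KernelCache:
    """Compile each (device, source) pair at most once (hypothetical sketch)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # callable(source, device_name) -> artifact
        self._cache = {}
        self.compilations = 0        # number of real builds performed

    def get(self, source, device_name):
        # Hash device name and source together; a NUL byte separates the two
        # parts so different splits cannot collide.
        payload = (device_name + "\0" + source).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._compile(source, device_name)
            self.compilations += 1
        return self._cache[key]


def fake_compile(source, device_name):
    """Stand-in for a real build step; returns a tagged tuple, not a binary."""
    return ("binary", device_name, len(source))
```

Requesting the same source for the same device twice triggers one build; the same source for a different device triggers a second one. A persistent variant would store the built artifact on disk under the same key.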
On 02/05/2012 10:57 PM, mark florisson wrote:
> Hey,
>
> I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl
> What do you think?

To start with my own conclusion on this, my feeling is that it is too little gain, at least for a GPU solution. There's already Theano for trivial SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of course, this CEP would be more convenient to use than Theano if one is already using Cython.)

But that's just my feeling, and I'm not the one potentially signing up to do the work, so whether it is "worth it" is really not my decision; the weighing is done with your weights, not mine. Given an implementation, I definitely support the inclusion in Cython of this kind of feature (FWIW).

First, CPU:

OpenCL is probably a very good way of portably making use of SSE/AVX etc. But to really get a payoff, I would think that the real value would be in *not* using OpenCL vector types, just many threads, so that the OpenCL driver does the dirty work of mapping each thread to each slot in the CPU registers? I'd think the gain in using OpenCL is to emit scalar code and leave the dirty work to OpenCL. If one does the hard part and maps variables to vectors and memory accesses to shuffles, one might as well go the whole length and emit SSE/AVX rather than OpenCL to avoid the startup overhead.
On Wed, Feb 8, 2012 at 6:46 AM, Dag Sverre Seljebotn <d.s.seljebotn@...> wrote:
> On 02/05/2012 10:57 PM, mark florisson wrote:
>
> I don't really know how good the Intel and AMD CPU drivers are w.r.t. this
> -- I have seen the Intel driver emit "vectorizing" and "could not
> vectorize", but didn't explore the circumstances.

For our project, we've tried both Intel and AMD (previously ATI) backends. The AMD experience somewhat mirrors what this developer described (http://www.msoos.org/2012/01/amds-opencl-heaven-and-hell/), although not as bad in terms of silent failures (or maybe I just haven't caught any!).

The Intel backend was great and clearly better in terms of performance, sometimes by about 20-30%. However, when run on an older AMD-based machine as opposed to an Intel one, the resulting kernel simply segfaulted without any warning about an unsupported architecture (I think it's because it didn't have SSE3 support).

>
> Dag Sverre

I know Intel is working with LLVM/Clang folks to introduce their vectorization additions, at least to some degree, and LLVM seems to be [...]
On 8 February 2012 17:35, Dimitri Tcaciuc <dtcaciuc@...> wrote: > On Wed, Feb 8, 2012 at 6:46 AM, Dag Sverre Seljebotn > <d.s.seljebotn@...> wrote: >> On 02/05/2012 10:57 PM, mark florisson wrote: >> >> I don't really know how good the Intel and AMD CPU drivers are w.r.t. this >> -- I have seen the Intel driver emit "vectorizing" and "could not >> vectorize", but didn't explore the circumstances. > > For our project, we've tried both Intel and AMD (previously ATI) > backends. The AMD experience somewhat mirrors what this developer > described (http://www.msoos.org/2012/01/amds-opencl-heaven-and-hell/), > although not as bad in terms of silent failures (or maybe I just > havent caught any!). > > Intel backend was great and clearly better in terms of performance, > sometimes by about 20-30%. However, when ran on older AMD-based > machine as opposed to Intel one, the resulting kernel simply > segfaulted without any warning about an unsupported architecture (I > think its because it didn't have SSE3 support). > >> >> Dag Sverre >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel@... >> http://mail.python.org/mailman/listinfo/cython-devel > >(Continue reading)
On 8 February 2012 14:46, Dag Sverre Seljebotn <d.s.seljebotn <at> astro.uio.no> wrote: > On 02/05/2012 10:57 PM, mark florisson wrote: >> >> Hey, >> >> I created a CEP for opencl support: >> http://wiki.cython.org/enhancements/opencl >> What do you think? > > > To start with my own conclusion on this, my feel is that it is too little > gain, at least for a GPU solution. There's already Theano for trivial > SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of course, this > CEP would be more convenient to use than Theano if one is already using > Cython.) Yes, vector operations and elemental or reduction functions operator on vectors (which is what we can use Theano for, right?) don't quite merit the use of OpenCL. However, the upside is that OpenCL allows easier vectorization and multi-threading. We can appease to auto-vectorizing compilers, but e.g. using OpenMP for multithreading will still segfault the program if used outside the main thread with gcc's implementation. I believe intel allows you to use it in any thread. (Of course, keeping a thread pool around and managing it manually isn't too hard, but...) > But that's just my feeling, and I'm not the one potentially signing up to do > the work, so whether it is "worth it" is really not my decision, the > weighing is done with your weights, not mine. Given an implementation, I(Continue reading)
On 02/08/2012 11:11 PM, mark florisson wrote: > On 8 February 2012 14:46, Dag Sverre Seljebotn > <d.s.seljebotn@...> wrote: >> On 02/05/2012 10:57 PM, mark florisson wrote: >>> >>> Hey, >>> >>> I created a CEP for opencl support: >>> http://wiki.cython.org/enhancements/opencl >>> What do you think? >> >> >> To start with my own conclusion on this, my feel is that it is too little >> gain, at least for a GPU solution. There's already Theano for trivial >> SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of course, this >> CEP would be more convenient to use than Theano if one is already using >> Cython.) > > Yes, vector operations and elemental or reduction functions operator > on vectors (which is what we can use Theano for, right?) don't quite > merit the use of OpenCL. However, the upside is that OpenCL allows > easier vectorization and multi-threading. We can appease to > auto-vectorizing compilers, but e.g. using OpenMP for multithreading > will still segfault the program if used outside the main thread with > gcc's implementation. I believe intel allows you to use it in any > thread. (Of course, keeping a thread pool around and managing it > manually isn't too hard, but...) > >> But that's just my feeling, and I'm not the one potentially signing up to do >> the work, so whether it is "worth it" is really not my decision, the(Continue reading)
On 02/09/2012 12:15 AM, Dag Sverre Seljebotn wrote: > On 02/08/2012 11:11 PM, mark florisson wrote: >> On 8 February 2012 14:46, Dag Sverre Seljebotn >> <d.s.seljebotn@...> wrote: >>> On 02/05/2012 10:57 PM, mark florisson wrote: >>>> >>>> Hey, >>>> >>>> I created a CEP for opencl support: >>>> http://wiki.cython.org/enhancements/opencl >>>> What do you think? >>> >>> >>> To start with my own conclusion on this, my feel is that it is too >>> little >>> gain, at least for a GPU solution. There's already Theano for trivial >>> SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of >>> course, this >>> CEP would be more convenient to use than Theano if one is already using >>> Cython.) >> >> Yes, vector operations and elemental or reduction functions operator >> on vectors (which is what we can use Theano for, right?) don't quite >> merit the use of OpenCL. However, the upside is that OpenCL allows >> easier vectorization and multi-threading. We can appease to >> auto-vectorizing compilers, but e.g. using OpenMP for multithreading >> will still segfault the program if used outside the main thread with >> gcc's implementation. I believe intel allows you to use it in any >> thread. (Of course, keeping a thread pool around and managing it >> manually isn't too hard, but...)(Continue reading)
On 8 February 2012 23:28, Dag Sverre Seljebotn <d.s.seljebotn@...> wrote: > On 02/09/2012 12:15 AM, Dag Sverre Seljebotn wrote: >> >> On 02/08/2012 11:11 PM, mark florisson wrote: >>> >>> On 8 February 2012 14:46, Dag Sverre Seljebotn >>> <d.s.seljebotn@...> wrote: >>>> >>>> On 02/05/2012 10:57 PM, mark florisson wrote: >>>>> >>>>> >>>>> Hey, >>>>> >>>>> I created a CEP for opencl support: >>>>> http://wiki.cython.org/enhancements/opencl >>>>> What do you think? >>>> >>>> >>>> >>>> To start with my own conclusion on this, my feel is that it is too >>>> little >>>> gain, at least for a GPU solution. There's already Theano for trivial >>>> SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of >>>> course, this >>>> CEP would be more convenient to use than Theano if one is already using >>>> Cython.) >>> >>> >>> Yes, vector operations and elemental or reduction functions operator(Continue reading)