Dag Sverre Seljebotn | 20 Feb 18:46 2012

ndarray and lazy evaluation (was: Proposed Roadmap Overview)

On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
> Hi Dag,
>
> Would you mind elaborating a bit on that example you mentioned at the
> end of your email? I don't quite understand what behavior you would like
> to achieve

Sure, see below. I think we should continue discussion on numpy-discuss.

I wrote:

> You need at least a slightly different Python API to get anywhere, so
> numexpr/Theano is the right place to work on an implementation of this
> idea. Of course it would be nice if numexpr/Theano offered something as
> convenient as
>
> with lazy:
>      arr = A + B + C # with all of these NumPy arrays
> # compute upon exiting...

More information:

The disadvantage today of using Theano (or numexpr) is that they require 
using a different API, so that one has to learn and use Theano "from the 
ground up", rather than just slap it on in an optimization phase.

The alternative would require extensive changes to NumPy, so I guess 
Theano authors or Francesc would need to push for this.

The alternative would be (with A, B, C ndarray instances):
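Purely as an illustration of the `with lazy:` idea quoted above (the `lazy` helper and `LazyArray` wrapper below are hypothetical names, not an actual NumPy or numexpr API), deferred evaluation can be mimicked with wrapper objects whose arithmetic builds thunks instead of computing:

```python
import numpy as np
from contextlib import contextmanager

class LazyArray:
    """Wraps an ndarray; `+` builds a deferred thunk instead of computing."""
    def __init__(self, value=None, thunk=None):
        self.value = value
        self.thunk = thunk

    def __add__(self, other):
        return LazyArray(thunk=lambda: self.compute() + other.compute())

    def compute(self):
        return self.value if self.thunk is None else self.thunk()

@contextmanager
def lazy(*arrays):
    # Hand back wrapped views of the inputs; nothing is evaluated
    # until .compute() is called on a result.
    yield tuple(LazyArray(a) for a in arrays)

A, B, C = (np.ones(3) for _ in range(3))
with lazy(A, B, C) as (a, b, c):
    arr = a + b + c        # builds an expression graph, no flops yet
result = arr.compute()     # evaluates the whole sum in one go
```

A real implementation would have to intercept operations on genuine ndarrays (e.g. via an ndarray subclass overriding the arithmetic operators) and decide when to force evaluation, which is exactly the hard design question being discussed in this thread.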

Francesc Alted | 20 Feb 19:04 2012

Re: ndarray and lazy evaluation (was: Proposed Roadmap Overview)

On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote:

> [...]

Dag Sverre Seljebotn | 20 Feb 19:14 2012

Re: ndarray and lazy evaluation

On 02/20/2012 10:04 AM, Francesc Alted wrote:
> [...]

James Bergstra | 20 Feb 20:26 2012

Re: ndarray and lazy evaluation


On Mon, Feb 20, 2012 at 12:28 PM, Francesc Alted <francesc <at> continuum.io> wrote:
> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
>> You need at least a slightly different Python API to get anywhere, so
>> numexpr/Theano is the right place to work on an implementation of this
>> idea. Of course it would be nice if numexpr/Theano offered something as
>> convenient as
>>
>> with lazy:
>>     arr = A + B + C # with all of these NumPy arrays
>> # compute upon exiting…
>
> Hmm, that would be cute indeed.  Do you have an idea on how the code in the with context could be passed to the Python AST compiler (à la numexpr.evaluate("A + B + C"))?


The biggest problem with the numexpr approach (e.g. evaluate("A + B + C")), whether the programmer has to type the quotes or not, is that the sub-program has to be completely expressed in the sub-language.

If I write 

>>> def f(x): return x[:3]
>>> numexpr.evaluate("A + B + f(C)")

I would like that to be fast, but it's not at all obvious how that would work. We would be asking numexpr to introspect arbitrary callable Python objects and recompile arbitrary Python code, effectively setting up the expectation in the user's mind that numexpr is re-implementing an entire compiler. That can obviously be made fast, but it seems to me to represent a significant departure from numpy's focus, which I always thought was the data container rather than expression evaluation (though maybe this firestorm of discussion is aimed at changing that?)

Theano went with another option which was to replace the A, B, and C variables with objects that have a modified __add__. Theano's back-end can be slow at times and the codebase can feel like a heavy dependency, but my feeling is still that this is a great approach to getting really fast implementations of compound expressions.

The context syntax you suggest is a little ambiguous, in that the indented block of a with statement contains *statements*, whereas what you mean to build in the indented block is a *single expression* graph.  You could maybe get the right effect with something like

A, B, C = np.random.rand(3, 5)

expr = np.compound_expression()
with np.expression_builder(expr) as foo:
   arr = A + B + C
   brr = A + B * C
   foo.returns((arr, brr))  # 'return' is a reserved word, so the method needs another name

# compute arr and brr as quickly as possible
a, b = expr.run()

# modify one of the arrays that the expression was compiled to use
A[:] += 1

# re-run the compiled expression on the new value
a, b = expr.run()
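A minimal pure-Python sketch of the compile-once / run-many behaviour described above (all class and method names here are made up, and a real graph compiler like Theano's does vastly more). Because each leaf closes over the array object itself, the in-place `A[:] += 1` is visible on the next `run()`:

```python
import numpy as np

class Sym:
    """Symbolic node: arithmetic builds a graph instead of computing."""
    def __init__(self, fn):
        self.fn = fn                      # () -> ndarray, evaluated lazily
    def __add__(self, other):
        return Sym(lambda: self.fn() + other.fn())
    def __mul__(self, other):
        return Sym(lambda: self.fn() * other.fn())

def wrap(arr):
    # Close over the array object itself, so in-place updates are seen.
    return Sym(lambda: arr)

class CompoundExpression:
    def returns(self, outputs):
        self.outputs = outputs
    def run(self):
        return tuple(o.fn() for o in self.outputs)

A, B, C = (np.full(5, v) for v in (1.0, 2.0, 3.0))
expr = CompoundExpression()
a_, b_, c_ = wrap(A), wrap(B), wrap(C)
expr.returns((a_ + b_ + c_, a_ + b_ * c_))

r1, r2 = expr.run()          # 1+2+3 = 6, 1+2*3 = 7, elementwise
A[:] += 1                    # mutate an input in place
r3, r4 = expr.run()          # now 7 and 8
```

This sketch re-walks the graph on every run; the point of a Theano-style compiler is that run() would instead dispatch to fused, pre-compiled kernels.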

- JB

-- 
James Bergstra, Ph.D.
Research Scientist
Rowland Institute, Harvard University
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
James Bergstra | 20 Feb 20:26 2012

Re: ndarray and lazy evaluation


On Mon, Feb 20, 2012 at 1:01 PM, James Bergstra <james.bergstra <at> gmail.com> wrote:
[...]

I should add that the biggest benefit of expressing things as compound expressions in this way is not saving temporaries (though that is nice); it's being able to express enough computational work at a time that it offsets the time required to ship the arguments off to a GPU for evaluation!  This has been a *huge* win reaped by the Theano approach; it works really well. The abstraction boundary offered by this sort of expression graph has been really effective.

This speaks even more to the importance of distinguishing between the data container (e.g. numpy, Theano's internal ones, PyOpenCL's, PyCUDA's) and the expression compilation and evaluation infrastructure (e.g. Theano, numexpr, Cython). The goal should be to separate these two as much as possible, so that programs can be expressed in a natural way and then evaluated using containers suited to the program.

- JB  

-- 
James Bergstra, Ph.D.
Research Scientist
Rowland Institute, Harvard University

Lluís | 20 Feb 20:57 2012

Re: ndarray and lazy evaluation

James Bergstra writes:
[...]
> I should add that the biggest benefit of expressing things as compound
> expressions in this way is not in saving temporaries (though that is nice) it's
> being able to express enough computation work at a time that it offsets the time
> required to ship the arguments off to a GPU for evaluation!

Right, that's exactly what you need for an "external computation" to pay off.

Just out of curiosity (feel free to respond with a RTFM or a RTFP :)), do you
support any of these? (sorry for the made-up names)

* automatic transfer double-buffering

* automatic problem partitioning into domains (e.g., multiple GPUs; even better
  if also supports nodes - MPI -)

* point-specific computations (e.g., code dependent on the thread id, although
  this can also be expressed in other ways, like index ranges)

* point-relative computations (the most common would be a stencil)

If you have all of them, then I'd say the project has a huge potential for total
world dominance :)

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth
James Bergstra | 20 Feb 21:30 2012

Re: ndarray and lazy evaluation



On Mon, Feb 20, 2012 at 2:57 PM, Lluís <xscript <at> gmx.net> wrote:
> James Bergstra writes:
> [...]
>> I should add that the biggest benefit of expressing things as compound
>> expressions in this way is not in saving temporaries (though that is nice) it's
>> being able to express enough computation work at a time that it offsets the time
>> required to ship the arguments off to a GPU for evaluation!
>
> Right, that's exactly what you need for an "external computation" to pay off.
>
> Just out of curiosity (feel free to respond with a RTFM or a RTFP :)), do you
> support any of these? (sorry for the made-up names)
>
> * automatic transfer double-buffering

Not currently, but it would be quite straightforward to do it. Email theano-dev and ask how if you really want to know.
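For concreteness, transfer double-buffering means overlapping the transfer of chunk i+1 with the computation on chunk i. A toy host-side sketch (a worker thread stands in for the DMA engine, and plain Python functions stand in for the device copy and the kernel):

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):          # stand-in for a host-to-device copy
    return list(chunk)

def compute(buf):             # stand-in for a device kernel
    return sum(buf)

def pipelined(chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(transfer, chunks[0])
        for nxt in chunks[1:]:
            buf = pending.result()                # wait for current transfer
            pending = pool.submit(transfer, nxt)  # start the next transfer...
            results.append(compute(buf))          # ...while computing this one
        results.append(compute(pending.result()))
    return results

print(pipelined([[1, 2], [3, 4], [5, 6]]))  # [3, 7, 11]
```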
 

> * automatic problem partitioning into domains (e.g., multiple GPUs; even better
>   if also supports nodes - MPI -)

Not currently, and it would be hard.
 

> * point-specific computations (e.g., code dependent on the thread id, although
>   this can also be expressed in other ways, like index ranges)


No.
 
> * point-relative computations (the most common would be a stencil)


No, but I think Theano provides a decent expression language to tackle this. The "Composite" element-wise code generator is an example of how I would think about this: it provides point-relative computations across several arguments.  You might want something different that applies a stencil computation across one or several arguments. The "scan" operator was another foray into this territory; it got tricky when the stencil operation could have side effects (like random number generation) and could define its own input domain (stencil shape), but the result is quite powerful.
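To make "point-relative" concrete: a simple three-point stencil can already be phrased as a compound array expression using shifted slices (plain NumPy here, nothing Theano-specific), which is exactly the kind of expression a graph compiler could fuse:

```python
import numpy as np

def smooth3(a):
    """Three-point moving average: out[i] = mean(a[i-1:i+2]) for interior i."""
    # Each shifted slice is a view; the whole stencil is one compound expression.
    return (a[:-2] + a[1:-1] + a[2:]) / 3.0

x = np.array([0.0, 3.0, 6.0, 9.0, 12.0])
y = smooth3(x)   # averages of each interior window: 3, 6, 9
```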

--
http://www-etud.iro.umontreal.ca/~bergstrj
