Eric Firing | 6 Oct 02:46

Axes.add_line() is oddly slow?

I am getting very inconsistent timings when looking into plotting a line 
with a very large number of points.  Axes.add_line() is very slow, and 
the time is taken by Axes._update_line_limits().  But when I simply run 
the latter, on a Line2D of the same dimensions, it can be fast.

import matplotlib
matplotlib.use('template')
import numpy as np
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
from time import time
t = time(); ax.add_line(LL); time()-t
###16.621543884277344
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax.add_line(LL); time()-t
###16.579419136047363
## We added two identical lines, each took 16 seconds.

LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.1733548641204834
## But when we made another identical line, updating the limits was
## fast.

# Below are similar experiments:
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.18362092971801758

Michael Droettboom | 7 Oct 16:18

Re: Axes.add_line() is oddly slow?

According to lsprofcalltree, the slowness appears to be entirely in the 
units code by a wide margin -- which is unfortunately code I understand 
very little about.  The difference in timing before and after adding the 
line to the axes appears to be because the unit conversion is not 
invalidated until the line has been added to an axes.

In units.get_converter(), it iterates through every *value* in the data 
to see if any of them require unit conversion, and returns the first one 
it finds.  It seems like if we're passing in a numpy array of numbers 
(i.e. not array of objects), then we're pretty much guaranteed from the 
get-go not to find a single value that requires unit conversion so we 
might as well not look.  Am I making the wrong assumption?
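
As a rough illustration of that short-circuit, here is a minimal sketch. The helper name can_skip_unit_scan is made up for this example and is not part of matplotlib.units; it only shows the kind of dtype check that would let the per-value scan be skipped for plain numeric arrays.

```python
import numpy as np


def can_skip_unit_scan(x):
    """Hypothetical helper (not a matplotlib function).

    Return True when x is a NumPy array with a numeric dtype, in which
    case no element can be a unit-carrying object and a per-value scan
    for converters would find nothing.
    """
    return isinstance(x, np.ndarray) and np.issubdtype(x.dtype, np.number)


# A large float array can be ruled out by looking at the dtype alone.
big = np.sin(np.arange(1.5e6))
skip_numeric = can_skip_unit_scan(big)

# Object arrays and plain lists may still hold unit-carrying values,
# so they would still need to be inspected.
skip_objects = can_skip_unit_scan(np.array([object()], dtype=object))
skip_list = can_skip_unit_scan([1.0, 2.0, 3.0])
```

The check costs the same regardless of array length, which is the point: the decision is made from the dtype, not from the values.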

However, for lists, it also seems that, since the code returns the first 
converter it finds, maybe it could just look at the first element of the 
sequence, rather than the entire sequence.  If the first is not in the 
same unit as everything else, then the result will be broken anyway.  
For example, if I hack evans_test.py to contain a single int amongst the 
list of "Foo" objects in the data, I get an exception anyway, even as 
the code stands now.
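
To make the comparison concrete, here is a toy sketch of the two lookup strategies. Everything in it (the Foo stand-in, the REGISTRY dict, and both function names) is invented for illustration and is not the actual units code; it assumes a converter is found by looking up the type of a value in a registry.

```python
class Foo:
    """Stand-in for a unit-carrying value (hypothetical)."""

    def __init__(self, value):
        self.value = value


# Hypothetical registry mapping a value type to its converter.
REGISTRY = {Foo: "foo-converter"}


def converter_from_all(seq):
    """Scan every element and return the first converter found."""
    for item in seq:
        conv = REGISTRY.get(type(item))
        if conv is not None:
            return conv
    return None


def converter_from_first(seq):
    """Look only at the first element, whatever the length of seq."""
    for item in seq:
        return REGISTRY.get(type(item))
    return None  # empty sequence


homogeneous = [Foo(i) for i in range(5)]
mixed = [1, Foo(2), Foo(3)]

# Same answer on homogeneous data, at a fraction of the work.
same = converter_from_first(homogeneous) == converter_from_all(homogeneous)

# The two strategies only disagree on mixed data, which is the case
# that already fails elsewhere.
first_mixed = converter_from_first(mixed)
all_mixed = converter_from_all(mixed)
```

In this sketch the strategies differ only when the first element has no converter but a later one does, which matches the mixed-list case described above.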

I have attached a patch against units.py to speed up the first case 
(passing Numpy arrays).  I think I need more feedback from the units 
experts whether my suggestion for lists (to only look at the first 
element) is reasonable.

Feel free to commit the patch if it seems reasonable to those who know 
more about units than I do.

Mike

John Hunter | 7 Oct 17:41

Re: Axes.add_line() is oddly slow?

On Tue, Oct 7, 2008 at 9:18 AM, Michael Droettboom <mdroe@...> wrote:
> According to lsprofcalltree, the slowness appears to be entirely in the
> units code by a wide margin -- which is unfortunately code I understand very
> little about.  The difference in timing before and after adding the line to
> the axes appears to be because the unit conversion is not invalidated until
> the line has been added to an axes.
>
> In units.get_converter(), it iterates through every *value* in the data to
> see if any of them require unit conversion, and returns the first one it
> finds.  It seems like if we're passing in a numpy array of numbers (i.e. not
> array of objects), then we're pretty much guaranteed from the get-go not to
> find a single value that requires unit conversion so we might as well not
> look.  Am I making the wrong assumption?
>
> However, for lists, it also seems that, since the code returns the first
> converter it finds, maybe it could just look at the first element of the
> sequence, rather than the entire sequence.  If the first is not in the same
> unit as everything else, then the result will be broken anyway.

I made this change -- return the converter from the first element --
and added Michael's non-object numpy array optimization too.  The
units code needs some attention; I just haven't been able to get to
it...

This helps performance considerably -- on backend driver:

Before:
  Backend agg took 1.32 minutes to complete
  Backend ps took 1.37 minutes to complete
  Backend pdf took 1.78 minutes to complete

