On Wednesday, February 1, 2012, Pierre Haessig <
pierre.haessig <at> crans.org> wrote:
> Hi,
>
> [I'm not sure whether this discussion belongs to numpy-discussion or scipy-dev]
>
> In day to day time series analysis I regularly need to look at the data autocorrelation ("acorr" or "acf" depending on the software package).
> The straighforward available function I have is matplotlib.pyplot.acorr. However, for a moderately long time series (say of length 10**5) it taking a huge time just to just dislays the autocorrelation values within a small range of time lags.
> The main reason being it is relying on np.correlate(x,x, mode=2) while only a few lags are needed.
> (I guess mode=2 is an (old fashioned?) way to set mode='full')
>
> I know that np.correlate performance issue has been discussed already, and there is a *somehow* related ticket (
http://projects.scipy.org/numpy/ticket/1260). I noticed in the ticket's change number 2 the following comment by Josef : "Maybe a truncated convolution/correlation would be good". I'll come back to this soon.
>
> I made an example script "acf_timing.py" to start my point with some timing data :
>
> In Ipython:
>>>> run acf_timing.py # it imports statsmodel's acf + define 2 other acf implementations + an example data 10**5 samples long
>
> %time l,c = mpl_acf(a, 10)
> CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s
> Wall time: 11.18 s # pretty long...
>
> %time c = sm_acf(a, 10)
> CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s
> Wall time: 10.79 s # long as well. statsmodel has a similar underlying implementation
> #
http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf
>
> #Now, better option : use the fft convolution
> %time c=sm_acf(a, 10,fft=True)
> CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s
> Wall time: 0.07 s
> # Fast, but I'm not sure about the memory implication of using fft though.
>
> #The naive option : just compute the acf lags that are needed
> %time l,c = naive_acf(a, 10)
> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
> Wall time: 0.01 s
> # Iterative computation. Pretty silly but very fast
> # (Now of course, this naive implementation won't scale nicely for a lot of lags)
>
> Now comes (at last) the question : what should be done about this performance issue ?
> - should there be a truncated np.convolve/np.correlate function, as Josef suggested ?
> - or should people in need of autocorrelation find some workarounds because this usecase is not big enough to call for a change in np.convolve ?
>
> I really feel this question is about *where* a change should be implemented (numpy, scipy.signal, maplotlib ?) so that it makes sense while not breaking 10^10 lines of numpy related code...
>
> Best,
> Pierre
>
>
Speaking for matplotlib, the acorr() (and xcorr()) functions in mpl are merely a convenience. The proper place for any change would not be mpl (although, we would certainly take advantage of any improved acorr() and xcorr() that are made available in numpy.
Ben Root