Jeffrey | 28 Jul 2012 18:23

scipy.stats.kendalltau bug?

Dear all,

     The statements below always raise an exception, as shown, which 
seems anomalous. Is this a bug?

     >>> u1=numpy.random.rand(100000)
     >>> u2=numpy.random.rand(100000)
     >>> scipy.stats.kendalltau(u1,u2)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/zfyuan/phd/paper1/pyvine_lap/<ipython-input-28-98f367090ed1> in <module>()
----> 1 sp.stats.kendalltau(u1,u2)

/usr/lib64/python2.7/site-packages/scipy/stats/stats.pyc in kendalltau(x, y, initial_lexsort)
    2673
    2674     tau = ((tot - (v + u - t)) - 2.0 * exchanges) / \
-> 2675                     np.sqrt((tot - u) * (tot - v))
    2676
    2677     # what follows reproduces the ending of Gary Strangman's original

AttributeError: sqrt

-- 
Jeffrey
Jeffrey | 28 Jul 2012 18:48

Re: scipy.stats.kendalltau bug?

Sorry, I didn't describe this bug in detail. What I mean is that when
the two arrays are longer, for example of length 100000, the error is
more likely to occur.

My scipy version is 0.9.0 and numpy is 1.6.2.

Thanks a lot for your answers.

eat | 28 Jul 2012 19:06

Re: scipy.stats.kendalltau bug?

Hi,

On Sat, Jul 28, 2012 at 7:48 PM, Jeffrey <zfyuan <at> mail.ustc.edu.cn> wrote:
> Sorry, I didn't describe this bug in detail. What I mean is that when
> the two arrays are longer, for example of length 100000, the error is
> more likely to occur.
>
> My scipy version is 0.9.0 and numpy is 1.6.2.
>
> Thanks a lot for your answers.

I can confirm this, like
In []: os.sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'
In []: sp.version.version
Out[]: '0.9.0'

In []: stats.kendalltau(rand(77929), rand(77929))
Out[]: (0.0060807135427758865, 0.010891543687108114)
In []: stats.kendalltau(rand(77939), rand(77939))
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scipy\stats\stats.py", line 2675, in kendalltau
    np.sqrt((tot - u) * (tot - v))
AttributeError: sqrt

There really seems to be an odd problem above a certain array length.


My 2 cents,
-eat 


_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Jeffrey | 29 Jul 2012 09:27

Re: scipy.stats.kendalltau bug?

Thanks, eat. I found that the reason is that numpy.sqrt cannot deal with numbers that are too large. When calculating kendalltau, let n=len(x); the total number of pairs is then 'tot' below:

    tot=(n-1)*n//2

When calculating tau, the denominator is as below:

    np.sqrt((tot-u)*(tot-v))

u and v stand for the ties in x[] and y[perm[]], and are zero if the two arrays are sampled from continuous distributions. Hence (tot-u)*(tot-v) may be out of range for the ufunc 'np.sqrt', which is written in C, and an error is then raised.

What about using math.sqrt here, or multiplying two np.sqrt calls in the denominator? Big data sets are common these days.
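
As a rough check of the size argument above, the sketch below uses only plain Python integers to find the smallest n for which tot*tot no longer fits in a signed 64-bit integer. The helper names `total_pairs` and `smallest_overflowing_n`, and the choice of the signed 64-bit limit as the relevant bound, are assumptions made for illustration; they are not taken from the scipy source.

```python
INT64_MAX = 2**63 - 1  # assumed bound: largest signed 64-bit integer


def total_pairs(n):
    """Number of unordered pairs among n observations (hypothetical helper)."""
    return n * (n - 1) // 2


def smallest_overflowing_n():
    """Smallest n whose tot*tot exceeds INT64_MAX, assuming no ties (u = v = 0)."""
    n = 2
    while total_pairs(n) ** 2 <= INT64_MAX:
        n += 1
    return n


threshold = smallest_overflowing_n()
print(threshold)
```

Under these assumptions the cutoff falls between the lengths 77929 (which worked) and 77939 (which failed) reported earlier in this thread.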

Thanks a lot !
 

-- Jeffrey
Nathaniel Smith | 29 Jul 2012 09:47

Re: scipy.stats.kendalltau bug?

It seems like the bug is that np.sqrt is raising an AttributeError on
valid input... can you give an example of a value that np.sqrt fails
on? Like

>>> np.sqrt(<something>)
AttributeError

-n
Jeffrey | 29 Jul 2012 11:42

Re: scipy.stats.kendalltau bug?

On 07/29/2012 03:47 PM, Nathaniel Smith wrote:
> It seems like the bug is that np.sqrt is raising an AttributeError on
> valid input... can you give an example of a value that np.sqrt fails
> on?

Assume the input arrays x and y have length n=100000, which is common, 
and assume there are no ties in either x or y, hence u=0, v=0 and t=0 
in the scipy.stats.kendalltau routine. The denominator of the 
expression for calculating tau is then as follows:

     np.sqrt( (tot-u) * (tot-v) )

Here tot = n * (n-1) // 2 = 4999950000, and (tot-u) * (tot-v) = tot*tot 
= 24999500002500000000L. This long int raises the error when np.sqrt is 
applied. I think a type conversion such as 'float()' should be done 
before np.sqrt, or the expression should be written as 
np.sqrt(tot-u) * np.sqrt(tot-v) to avoid the long integer.
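
A minimal sketch of the two workarounds just described, written with the standard-library math.sqrt so that it runs without NumPy. The variable names follow the thread, the no-ties values u = v = 0 are assumed, and the names `denom_float` and `denom_split` are made up for illustration. This is not a claim about how scipy itself was eventually changed.

```python
import math

n = 100000
tot = n * (n - 1) // 2  # total number of pairs
u = v = 0               # assumed: no ties in x or y

# Workaround 1: convert the large integer product to float before the square root.
denom_float = math.sqrt(float((tot - u) * (tot - v)))

# Workaround 2: take the square root of each factor, so no huge integer is formed.
denom_split = math.sqrt(tot - u) * math.sqrt(tot - v)

print(denom_float, denom_split)
```

Both forms should agree up to floating-point rounding; the second never builds the product that exceeds the 64-bit integer range.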

Thanks a lot : )

-- 
Yuan Zhenfei
Department of Statistics and Finance, University of Science and Technology of China
Hefei, Anhui Province, 230026
Phone: 13155190081

Nathaniel Smith | 29 Jul 2012 12:30

Re: scipy.stats.kendalltau bug?

Thanks, that clarifies things: https://github.com/numpy/numpy/issues/368

For now, yeah, some sort of workaround makes sense, though... in
addition to the ones you mention, I noticed that this also seems to
work:

np.sqrt(bignum, dtype=float)

You should submit a pull request :-).

-n
Jeffrey | 29 Jul 2012 13:25

Re: scipy.stats.kendalltau bug?

:-) Thanks for suggesting a pull request. My English is a little poor, and I'm 
new to Python.

-- 
Jeffrey
Ralf Gommers | 1 Aug 2012 21:03

Re: scipy.stats.kendalltau bug?



On Sun, Jul 29, 2012 at 12:30 PM, Nathaniel Smith <njs <at> pobox.com> wrote:
> You should submit a pull request :-).

This was already fixed for 0.10.x: https://github.com/scipy/scipy/commit/ce14ddb

Ralf
 

Sergio Rojas | 1 Aug 2012 16:06

Re: scipy.stats.kendalltau bug? (Jeffrey)


> Dear all,
>
>     The statements below always raise an exception, as shown, which
> seems anomalous. Is this a bug?
>
>     >>> u1=numpy.random.rand(100000)
>     >>> u2=numpy.random.rand(100000)
>     >>> scipy.stats.kendalltau(u1,u2)
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> /home/zfyuan/phd/paper1/pyvine_lap/<ipython-input-28-98f367090ed1> in <module>()
> ----> 1 sp.stats.kendalltau(u1,u2)
>
> /usr/lib64/python2.7/site-packages/scipy/stats/stats.pyc in kendalltau(x, y, initial_lexsort)
>    2673
>    2674     tau = ((tot - (v + u - t)) - 2.0 * exchanges) / \
> -> 2675                     np.sqrt((tot - u) * (tot - v))
>    2676
>    2677     # what follows reproduces the ending of Gary Strangman's original
>
> AttributeError: sqrt
>
> -- Jeffrey

>>> import numpy as np
>>> import scipy.stats
>>> u1=np.random.rand(100000)
>>> u2=np.random.rand(100000)
>>> scipy.stats.kendalltau(u1,u2)
(0.00094913269132691487, 0.65256243280384563)
>>> np.version.version
'1.6.1'
>>> scipy.version.version
'0.10.1'
>>> import os
>>> os.sys.version
'2.7.2 (default, Apr 21 2012, 14:16:53) \n[GCC 4.6.1]'

Sergio