Travis Oliphant | 27 Jul 2012 08:30

Status of NumPy and Python 3.3

Hey all, 

I'm wondering who has tried to make NumPy work with Python 3.3.   The Unicode handling was significantly
improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects)
is not working now.  

It would be nice to get 1.7.0 working with Python 3.3 if possible before the release.     Anyone interested in
tackling that little challenge?   If someone has already tried it would be nice to hear your experience. 

-Travis
David Cournapeau | 27 Jul 2012 10:28

Re: Status of NumPy and Python 3.3

On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant <travis <at> continuum.io> wrote:
> Hey all,
>
> I'm wondering who has tried to make NumPy work with Python 3.3.   The Unicode handling was significantly
> improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects)
> is not working now.
>
> It would be nice to get 1.7.0 working with Python 3.3 if possible before the release.     Anyone interested in
> tackling that little challenge?   If someone has already tried it would be nice to hear your experience.

Given that we're late with 1.7, I would suggest passing this to the
next release, unless the fix is simple (just a change of API).

cheers,

David
David Cournapeau | 27 Jul 2012 10:43

Re: Status of NumPy and Python 3.3

On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau <cournape <at> gmail.com> wrote:
> On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant <travis <at> continuum.io> wrote:
>> Hey all,
>>
>> I'm wondering who has tried to make NumPy work with Python 3.3.   The Unicode handling was significantly
>> improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects)
>> is not working now.
>>
>> It would be nice to get 1.7.0 working with Python 3.3 if possible before the release.     Anyone interested in
>> tackling that little challenge?   If someone has already tried it would be nice to hear your experience.
>
> Given that we're late with 1.7, I would suggest passing this to the
> next release, unless the fix is simple (just a change of API).

I took a brief look at it, and from the errors I have seen, one is
cosmetic, the other one is a bit more involved (rewriting
PyArray_Scalar unicode support). While it is not difficult in nature,
the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
would require multiple configurations on multiple python versions to
be tested.

I don't think Python 3.3 support is critical - people who want to play
with beta interpreters can build numpy by themselves from master, so I
am -1 on integrating this into 1.7.

I may have a fix for it by tonight, though.

David
Ralf Gommers | 27 Jul 2012 15:47

Re: Status of NumPy and Python 3.3



On Fri, Jul 27, 2012 at 10:43 AM, David Cournapeau <cournape <at> gmail.com> wrote:
> On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau <cournape <at> gmail.com> wrote:
> > On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant <travis <at> continuum.io> wrote:
> >> Hey all,
> >>
> >> I'm wondering who has tried to make NumPy work with Python 3.3.   The Unicode handling was significantly improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now.
> >>
> >> It would be nice to get 1.7.0 working with Python 3.3 if possible before the release.     Anyone interested in tackling that little challenge?   If someone has already tried it would be nice to hear your experience.
> >
> > Given that we're late with 1.7, I would suggest passing this to the
> > next release, unless the fix is simple (just a change of API).
>
> I took a brief look at it, and from the errors I have seen, one is
> cosmetic, the other one is a bit more involved (rewriting
> PyArray_Scalar unicode support). While it is not difficult in nature,
> the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
> would require multiple configurations on multiple python versions to
> be tested.
>
> I don't think Python 3.3 support is critical - people who want to play
> with beta interpreters can build numpy by themselves from master, so I
> am -1 on integrating this into 1.7.
>
> I may have a fix for it by tonight, though.

There are 2 tickets about this:
http://projects.scipy.org/numpy/ticket/2145
http://projects.scipy.org/numpy/ticket/1471

Ralf

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Ondřej Čertík | 27 Jul 2012 23:00

Re: Status of NumPy and Python 3.3

On Fri, Jul 27, 2012 at 6:47 AM, Ralf Gommers
<ralf.gommers <at> googlemail.com> wrote:
>
>
> On Fri, Jul 27, 2012 at 10:43 AM, David Cournapeau <cournape <at> gmail.com>
> wrote:
>>
>> On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau <cournape <at> gmail.com>
>> wrote:
>> > On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant <travis <at> continuum.io>
>> > wrote:
>> >> Hey all,
>> >>
>> >> I'm wondering who has tried to make NumPy work with Python 3.3.   The
>> >> Unicode handling was significantly improved in Python 3.3 and the
>> >> array-scalar code (which assumed a certain structure for UnicodeObjects) is
>> >> not working now.
>> >>
>> >> It would be nice to get 1.7.0 working with Python 3.3 if possible
>> >> before the release.     Anyone interested in tackling that little challenge?
>> >> If someone has already tried it would be nice to hear your experience.
>> >
>> > Given that we're late with 1.7, I would suggest passing this to the
>> > next release, unless the fix is simple (just a change of API).
>>
>> I took a brief look at it, and from the errors I have seen, one is
>> cosmetic, the other one is a bit more involved (rewriting
>> PyArray_Scalar unicode support). While it is not difficult in nature,
>> the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
>> would require multiple configurations on multiple python versions to
>> be tested.
>>
>> I don't think Python 3.3 support is critical - people who want to play
>> with beta interpreters can build numpy by themselves from master, so I
>> am -1 on integrating this into 1.7.
>>
>> I may have a fix for it by tonight, though.
>
>
> There are 2 tickets about this:
> http://projects.scipy.org/numpy/ticket/2145
> http://projects.scipy.org/numpy/ticket/1471

I am currently working on a PR trying to fix the unicode failures:

https://github.com/numpy/numpy/pull/366

It's a work in progress and I still have some little issues; see the
PR for up-to-date details.

Ondrej
Stefan Krah | 28 Jul 2012 11:36

Re: Status of NumPy and Python 3.3

Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> >> I took a brief look at it, and from the errors I have seen, one is
> >> cosmetic, the other one is a bit more involved (rewriting
> >> PyArray_Scalar unicode support). While it is not difficult in nature,
> >> the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
> >> would require multiple configurations on multiple python versions to
> >> be tested.

The cleanest way might be to leave the existing code in place and write
completely new and independent code for Python 3.3.

> https://github.com/numpy/numpy/pull/366
> 
> It's a work in progress and I still have some little issues; see the
> PR for up-to-date details.

I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether.

What should matter in 3.3 is the maximum character in a Unicode string that
determines the kind of the string:

   PyUnicode_1BYTE_KIND  ->  Py_UCS1
   PyUnicode_2BYTE_KIND  ->  Py_UCS2
   PyUnicode_4BYTE_KIND  ->  Py_UCS4

So Py_UNICODE_WIDE should not matter as all builds support PyUnicode_4BYTE_KIND.
That's why I /think/ it's possible to drop Py_UNICODE altogether. For instance,
the line in https://github.com/certik/numpy/commit/d02e36e5c85d5ee444614254643037aafc8deccc
should probably be:

  itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)
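
To make the kind rule concrete, here is a minimal pure-Python sketch of how a Python 3.3 (PEP 393) string picks its storage width from the largest code point it contains. The helper names `kind_of` and `itemsize_of` are made up for illustration and are not part of any C API; the second one only mirrors the `PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)` expression above.

```python
def kind_of(s):
    """Bytes per character that a PEP 393 string would use for s."""
    largest = max(map(ord, s), default=0)
    if largest < 0x100:
        return 1  # PyUnicode_1BYTE_KIND -> Py_UCS1
    if largest < 0x10000:
        return 2  # PyUnicode_2BYTE_KIND -> Py_UCS2
    return 4      # PyUnicode_4BYTE_KIND -> Py_UCS4


def itemsize_of(s):
    """Illustrative analogue of PyUnicode_GetLength(s) * PyUnicode_KIND(s)."""
    return len(s) * kind_of(s)


# Show string, kind, and resulting item size for three sample strings.
for text in ("abc", "Ond\u0159ej", "\U0001F600"):
    print(repr(text), kind_of(text), itemsize_of(text))
```

Note that the width depends on the string's contents, so two strings of the same length can have different item sizes under this rule.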

Stefan Krah
Ondřej Čertík | 28 Jul 2012 16:58

Re: Status of NumPy and Python 3.3

Stefan,

On Sat, Jul 28, 2012 at 2:36 AM, Stefan Krah <stefan-usenet <at> bytereef.org> wrote:
> Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>> >> I took a brief look at it, and from the errors I have seen, one is
>> >> cosmetic, the other one is a bit more involved (rewriting
>> >> PyArray_Scalar unicode support). While it is not difficult in nature,
>> >> the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
>> >> would require multiple configurations on multiple python versions to
>> >> be tested.
>
> The cleanest way might be to leave the existing code in place and write
> completely new and independent code for Python 3.3.
>
>
>> https://github.com/numpy/numpy/pull/366
>>
>> It's a work in progress and I still have some little issues; see the
>> PR for up-to-date details.
>
> I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether.

I think so too.

>
> What should matter in 3.3 is the maximum character in a Unicode string that
> determines the kind of the string:
>
>    PyUnicode_1BYTE_KIND  ->  Py_UCS1
>    PyUnicode_2BYTE_KIND  ->  Py_UCS2
>    PyUnicode_4BYTE_KIND  ->  Py_UCS4
>
>
> So Py_UNICODE_WIDE should not matter as all builds support PyUnicode_4BYTE_KIND.
> That's why I /think/ it's possible to drop Py_UNICODE altogether. For instance,
> the line in https://github.com/certik/numpy/commit/d02e36e5c85d5ee444614254643037aafc8deccc
> should probably be:
>
>   itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)

Yes, I think that's it. I've changed it and pushed the change into the PR.

I am now seeing failures like these:

======================================================================
ERROR: test_rmul (test_defchararray.TestOperations)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_defchararray.py",
line 592, in test_rmul
    Ar = np.array([[A[0,0]*r, A[0,1]*r],
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/defchararray.py",
line 1916, in __getitem__
    if issubclass(val.dtype.type, character) and not _len(val) == 0:
AttributeError: 'str' object has no attribute 'dtype'

Here is the code in defchararray.py:

1911 	        if not _globalvar and self.dtype.char not in 'SUbc':
1912 	            raise ValueError("Can only create a chararray from
string data.")
1913 	
1914 	    def __getitem__(self, obj):
1915 	        val = ndarray.__getitem__(self, obj)
1916 ->	        if issubclass(val.dtype.type, character) and not _len(val) == 0:
1917 	            temp = val.rstrip()
1918 	            if _len(temp) == 0:
1919 	                val = ''
1920 	            else:
1921 	                val = temp

and here is some debugging info:

(Pdb) p self
(Pdb) p obj
(0, 0)
(Pdb) p val
'abc'
(Pdb) p type(val)
<class 'str'>

So "val" is a Python string, which of course doesn't have .dtype. What
I don't understand yet is why

val = ndarray.__getitem__(self, obj)

returns a Python string. I spent a few hours debugging it
yesterday, but so far no luck.

Then there are failures in the test_unicode.py of the following type:

======================================================================
FAIL: Check byteorder of single-dimensional objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
line 286, in test_valuesSD
    self.assertTrue(ua[0] != ua2[0])
AssertionError: False is not true

I didn't dig into those yet.

If anyone has any ideas, let me know.

Ondrej
Ondřej Čertík | 28 Jul 2012 17:04

Re: Status of NumPy and Python 3.3

On Sat, Jul 28, 2012 at 7:58 AM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
[...]
> Here is the code in defchararray.py:
>
>
> 1911            if not _globalvar and self.dtype.char not in 'SUbc':
> 1912                raise ValueError("Can only create a chararray from
> string data.")
> 1913
> 1914        def __getitem__(self, obj):
> 1915            val = ndarray.__getitem__(self, obj)
> 1916 ->         if issubclass(val.dtype.type, character) and not _len(val) == 0:
> 1917                temp = val.rstrip()
> 1918                if _len(temp) == 0:
> 1919                    val = ''
> 1920                else:
> 1921                    val = temp
>
>
> and here is some debugging info:
>

Python 3.3:

>
> (Pdb) p self
> (Pdb) p obj
> (0, 0)
> (Pdb) p val
> 'abc'
> (Pdb) p type(val)
> <class 'str'>

Python 3.2:

(Pdb) p self
chararray([['abc', '123'],
       ['789', 'xyz']],
      dtype='<U3')
(Pdb) p obj
(0, 0)
(Pdb) p val
'abc'
(Pdb) p type(val)
<class 'numpy.str_'>

So I think there might be a conversion issue in the chararray:
instead of using numpy.str_, it uses Python's str.
Weird.

Ondrej
Ondřej Čertík | 28 Jul 2012 17:12

Re: Status of NumPy and Python 3.3

On Sat, Jul 28, 2012 at 8:04 AM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> On Sat, Jul 28, 2012 at 7:58 AM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> [...]
>> Here is the code in defchararray.py:
>>
>>
>> 1911            if not _globalvar and self.dtype.char not in 'SUbc':
>> 1912                raise ValueError("Can only create a chararray from
>> string data.")
>> 1913
>> 1914        def __getitem__(self, obj):
>> 1915            val = ndarray.__getitem__(self, obj)
>> 1916 ->         if issubclass(val.dtype.type, character) and not _len(val) == 0:
>> 1917                temp = val.rstrip()
>> 1918                if _len(temp) == 0:
>> 1919                    val = ''
>> 1920                else:
>> 1921                    val = temp
>>
>>
>> and here is some debugging info:
>>
>
> Python 3.3:
>
>>
>> (Pdb) p self
>> (Pdb) p obj
>> (0, 0)
>> (Pdb) p val
>> 'abc'
>> (Pdb) p type(val)
>> <class 'str'>
>
> Python 3.2:
>
> (Pdb) p self
> chararray([['abc', '123'],
>        ['789', 'xyz']],
>       dtype='<U3')
> (Pdb) p obj
> (0, 0)
> (Pdb) p val
> 'abc'
> (Pdb) p type(val)
> <class 'numpy.str_'>
>
>
> So I think there might be a conversion issue in the chararray:
> instead of using numpy.str_, it uses Python's str.
> Weird.

Ok, found this minimal example of the problem. Python 3.3:

>>> from numpy import array
>>> a = array(["123", "abc"])
>>> a
array(['123', 'abc'],
      dtype='<U3')
>>> a[0]
'123'
>>> type(a[0])
<class 'str'>

Python 3.2:

>>> from numpy import array
>>> a = array(["123", "abc"])
>>> a
array(['123', 'abc'],
      dtype='<U3')
>>> a[0]
'123'
>>> type(a[0])
<class 'numpy.str_'>

So at some point, the strings get converted to numpy strings in 3.2,
but not in 3.3.

Ondrej
Stefan Krah | 28 Jul 2012 20:19

Re: Status of NumPy and Python 3.3

Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> So at some point, the strings get converted to numpy strings in 3.2,
> but not in 3.3.

PyArray_Scalar() must return a subtype of PyUnicodeObject. I'm boldly
assuming that data is in utf-32. If so, then this unoptimized version
should work:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index 2e255c0..c134aed 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -643,7 +643,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
     }
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
-        return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, data, itemsize/4);
+        PyObject *b, *args;
+        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (b == NULL) {
+            return NULL;
+        }
+        args = Py_BuildValue("(Os)", b, "utf-32");
+        if (args == NULL) {
+            Py_DECREF(b);
+            return NULL;
+        }
+        obj = type->tp_new(type, args, NULL);
+        Py_DECREF(b);
+        Py_DECREF(args);
+        return obj;
     }
 #endif
     if (type->tp_itemsize != 0) {
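
For readers less familiar with the C API, the `tp_new` call in the patch is roughly what calling a `str` subclass with `(bytes, encoding)` does from Python. The sketch below is a pure-Python analogy only: `StrScalar` and `make_scalar` are hypothetical stand-ins, not NumPy names, and the explicit `utf-32-le` codec is an assumption made so the result does not depend on the host byte order.

```python
class StrScalar(str):
    """Hypothetical stand-in for a NumPy unicode scalar type (a str subclass)."""


def make_scalar(data):
    # str.__new__ decodes the bytes and returns an instance of the subclass,
    # much like type->tp_new(type, (bytes, "utf-32"), NULL) in the patch.
    return StrScalar(data, "utf-32-le")


raw = "abc".encode("utf-32-le")   # 4 bytes per character, 12 bytes in total
s = make_scalar(raw)
print(type(s).__name__, s, len(raw))
```

Because the result is an instance of the subclass rather than a plain `str`, it can carry extra attributes such as `dtype`, which is what the `chararray.__getitem__` code relies on.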

Stefan Krah
Ondřej Čertík | 28 Jul 2012 22:43

Re: Status of NumPy and Python 3.3

On Sat, Jul 28, 2012 at 11:19 AM, Stefan Krah
<stefan-usenet <at> bytereef.org> wrote:
> Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>> So at some point, the strings get converted to numpy strings in 3.2,
>> but not in 3.3.
>
> PyArray_Scalar() must return a subtype of PyUnicodeObject. I'm boldly
> assuming that data is in utf-32. If so, then this unoptimized version
> should work:
>
> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
> index 2e255c0..c134aed 100644
> --- a/numpy/core/src/multiarray/scalarapi.c
> +++ b/numpy/core/src/multiarray/scalarapi.c
> @@ -643,7 +643,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
>      }
>  #if PY_VERSION_HEX >= 0x03030000
>      if (type_num == NPY_UNICODE) {
> -        return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, data, itemsize/4);

Why doesn't PyUnicode_FromKindAndData return a subtype of PyUnicodeObject?

http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromKindAndData

> +        PyObject *b, *args;
> +        b = PyBytes_FromStringAndSize(data, itemsize);
> +        if (b == NULL) {
> +            return NULL;
> +        }
> +        args = Py_BuildValue("(Os)", b, "utf-32");
> +        if (args == NULL) {
> +            Py_DECREF(b);
> +            return NULL;
> +        }
> +        obj = type->tp_new(type, args, NULL);
> +        Py_DECREF(b);
> +        Py_DECREF(args);
> +        return obj;
>      }
>  #endif
>      if (type->tp_itemsize != 0) {

Nice!! I pushed your patch into the PR, now it works great in Python
3.3. There are still other failures:

https://gist.github.com/3194707

But this particular bug is fixed.

Thanks for your help!

Ondrej
Ondřej Čertík | 29 Jul 2012 00:04

Re: Status of NumPy and Python 3.3

Many of the failures in
https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
are of the type:

======================================================================
FAIL: Check byteorder of single-dimensional objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
line 286, in test_valuesSD
    self.assertTrue(ua[0] != ua2[0])
AssertionError: False is not true

and those are caused by the following minimal example:

Python 3.2:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U6')
>>> a[0] == b[0]
False
>>> a[0]
'abc'
>>> b[0]
'ៀ\udc00埀\udc00韀\udc00'

Python 3.3:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
True
>>> a[0]
'abc'
>>> b[0]
'abc'

So somehow the newbyteorder() method doesn't change the dtype of the
elements in our new code. This method is implemented in
numpy/core/src/multiarray/descriptor.c (I think), but so far I don't
see where the problem could be.

Any ideas?

Ondrej
Ondřej Čertík | 29 Jul 2012 00:31

Re: Status of NumPy and Python 3.3

On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> Many of the failures in
> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
> are of the type:
>
> ======================================================================
> FAIL: Check byteorder of single-dimensional objects
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
> line 286, in test_valuesSD
>     self.assertTrue(ua[0] != ua2[0])
> AssertionError: False is not true
>
>
> and those are caused by the following minimal example:
>
> Python 3.2:
>
>>>> from numpy import array
>>>> a = array(["abc"])
>>>> b = a.newbyteorder()
>>>> a.dtype
> dtype('<U3')
>>>> b.dtype
> dtype('>U3')
>>>> a[0].dtype
> dtype('<U3')
>>>> b[0].dtype
> dtype('<U6')
>>>> a[0] == b[0]
> False
>>>> a[0]
> 'abc'
>>>> b[0]
> 'ៀ\udc00埀\udc00韀\udc00'
>
>
> Python 3.3:
>
>
>>>> from numpy import array
>>>> a = array(["abc"])
>>>> b = a.newbyteorder()
>>>> a.dtype
> dtype('<U3')
>>>> b.dtype
> dtype('>U3')
>>>> a[0].dtype
> dtype('<U3')
>>>> b[0].dtype
> dtype('<U3')
>>>> a[0] == b[0]
> True
>>>> a[0]
> 'abc'
>>>> b[0]
> 'abc'
>
>
> So somehow the newbyteorder() method doesn't change the dtype of the
> elements in our new code.
> This method is implemented in numpy/core/src/multiarray/descriptor.c
> (I think), but so far I don't see
> where the problem could be.
>
> Any ideas?

Ok, after some investigating, I think we need to do something along these lines:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            // We have to deallocate this later, otherwise we get a segfault...
+            //free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }

This particular implementation still fails though:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

But I think that we simply need to take into account the "swap" flag.
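
The error message above is what the UTF-32 codec reports when the byte order of the data and the codec disagree, and it can be reproduced without NumPy. A minimal sketch of that situation: read in the wrong order, the first unit of "abc" becomes 0x61000000, which is outside the Unicode range, while swapping each 4-byte unit first decodes fine. `swap_ucs4` is a hypothetical helper, not NumPy's `byte_swap_vector`.

```python
def swap_ucs4(data):
    """Reverse the byte order of every 4-byte unit in data."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


big = "abc".encode("utf-32-be")   # big-endian UTF-32 bytes

try:
    big.decode("utf-32-le")       # wrong order: first unit reads as 0x61000000
except UnicodeDecodeError as exc:
    print("unswapped:", exc.reason)

print("swapped:", swap_ucs4(big).decode("utf-32-le"))
```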

Ondrej
Ondřej Čertík | 29 Jul 2012 02:09

Re: Status of NumPy and Python 3.3

On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>> Many of the failures in
>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>> are of the type:
>>
>> ======================================================================
>> FAIL: Check byteorder of single-dimensional objects
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
>> line 286, in test_valuesSD
>>     self.assertTrue(ua[0] != ua2[0])
>> AssertionError: False is not true
>>
>>
>> and those are caused by the following minimal example:
>>
>> Python 3.2:
>>
>>>>> from numpy import array
>>>>> a = array(["abc"])
>>>>> b = a.newbyteorder()
>>>>> a.dtype
>> dtype('<U3')
>>>>> b.dtype
>> dtype('>U3')
>>>>> a[0].dtype
>> dtype('<U3')
>>>>> b[0].dtype
>> dtype('<U6')
>>>>> a[0] == b[0]
>> False
>>>>> a[0]
>> 'abc'
>>>>> b[0]
>> 'ៀ\udc00埀\udc00韀\udc00'
>>
>>
>> Python 3.3:
>>
>>
>>>>> from numpy import array
>>>>> a = array(["abc"])
>>>>> b = a.newbyteorder()
>>>>> a.dtype
>> dtype('<U3')
>>>>> b.dtype
>> dtype('>U3')
>>>>> a[0].dtype
>> dtype('<U3')
>>>>> b[0].dtype
>> dtype('<U3')
>>>>> a[0] == b[0]
>> True
>>>>> a[0]
>> 'abc'
>>>>> b[0]
>> 'abc'
>>
>>
>> So somehow the newbyteorder() method doesn't change the dtype of the
>> elements in our new code.
>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>> (I think), but so far I don't see
>> where the problem could be.
>>
>> Any ideas?
>
> Ok, after some investigating, I think we need to do something along these lines:
>
> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
> index c134aed..daf7fc4 100644
> --- a/numpy/core/src/multiarray/scalarapi.c
> +++ b/numpy/core/src/multiarray/scalarapi.c
> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>  #if PY_VERSION_HEX >= 0x03030000
>      if (type_num == NPY_UNICODE) {
>          PyObject *b, *args;
> -        b = PyBytes_FromStringAndSize(data, itemsize);
> +        if (swap) {
> +            char *buffer;
> +            buffer = malloc(itemsize);
> +            if (buffer == NULL) {
> +                PyErr_NoMemory();
> +            }
> +            memcpy(buffer, data, itemsize);
> +            byte_swap_vector(buffer, itemsize, 4);
> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
> +            // We have to deallocate this later, otherwise we get a segfault...
> +            //free(buffer);
> +        } else {
> +            b = PyBytes_FromStringAndSize(data, itemsize);
> +        }
>          if (b == NULL) {
>              return NULL;
>          }
>
> This particular implementation still fails though:
>
>
>>>> from numpy import array
>>>> a = array(["abc"])
>>>> b = a.newbyteorder()
>>>> a.dtype
> dtype('<U3')
>>>> b.dtype
> dtype('>U3')
>>>> a[0].dtype
> dtype('<U3')
>>>> b[0].dtype
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>>>> a[0] == b[0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>>>> a[0]
> 'abc'
>>>> b[0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>
>
>
> But I think that we simply need to take into account the "swap" flag.

Ok, so first of all, I tried to disable the swapping in Python 3.2:

                if (swap) {
                    byte_swap_vector(buffer, itemsize >> 2, 4);
                }

And then it behaves *exactly* as in Python 3.3. So I am pretty sure
that the problem is right there and something
along the lines of my patch above should fix it. I had a few bugs
there, here is the correct version:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
index c134aed..bed73f7 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                return PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize >> 2, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
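
The change from `itemsize` to `itemsize >> 2` matters because the call passes a count and a width separately. Here is a rough Python analogue, assuming (as the corrected call suggests) that the second argument of `byte_swap_vector` is the number of elements and the third is the element size in bytes. `swap_vector` is a hypothetical stand-in, not NumPy's function.

```python
def swap_vector(buf, n, size):
    """Reverse the bytes of each of the first n elements of width size, in place."""
    if n * size > len(buf):
        # A C routine that only receives a pointer cannot make this check;
        # with too large a count it would simply run past the buffer.
        raise ValueError("n * size exceeds the buffer length")
    for i in range(n):
        start = i * size
        buf[start:start + size] = buf[start:start + size][::-1]
    return buf


itemsize = 12                                  # three UCS4 characters
buf = bytearray("abc".encode("utf-32-le"))
swap_vector(buf, itemsize >> 2, 4)             # 3 elements of 4 bytes each
print(bytes(buf) == "abc".encode("utf-32-be"))
```

Passing `itemsize` (12) as the count would ask for 48 bytes to be swapped in a 12-byte buffer, which may explain the segfault mentioned in the first version of the patch.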

That works well, except that it gives the UnicodeDecodeError:

>>> b[0].dtype
NULL
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

This error is actually triggered by this line:

        obj = type->tp_new(type, args, NULL);

in the patch by Stefan above. So I think what is happening is that it
simply tries to decode the bytes into a string and fails. That makes
sense. The question is why it doesn't fail in exactly the same way in
Python 3.2. I think it's because the conversion check is bypassed
somehow. Stefan, I think we need to swap it after the object is
created. I am still experimenting with this.

Ondrej
Ondřej Čertík | 29 Jul 2012 03:09

Re: Status of NumPy and Python 3.3

On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>>> Many of the failures in
>>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>>> are of the type:
>>>
>>> ======================================================================
>>> FAIL: Check byteorder of single-dimensional objects
>>> ----------------------------------------------------------------------
>>> Traceback (most recent call last):
>>>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
>>> line 286, in test_valuesSD
>>>     self.assertTrue(ua[0] != ua2[0])
>>> AssertionError: False is not true
>>>
>>>
>>> and those are caused by the following minimal example:
>>>
>>> Python 3.2:
>>>
>>>>>> from numpy import array
>>>>>> a = array(["abc"])
>>>>>> b = a.newbyteorder()
>>>>>> a.dtype
>>> dtype('<U3')
>>>>>> b.dtype
>>> dtype('>U3')
>>>>>> a[0].dtype
>>> dtype('<U3')
>>>>>> b[0].dtype
>>> dtype('<U6')
>>>>>> a[0] == b[0]
>>> False
>>>>>> a[0]
>>> 'abc'
>>>>>> b[0]
>>> 'ៀ\udc00埀\udc00韀\udc00'
>>>
>>>
>>> Python 3.3:
>>>
>>>
>>>>>> from numpy import array
>>>>>> a = array(["abc"])
>>>>>> b = a.newbyteorder()
>>>>>> a.dtype
>>> dtype('<U3')
>>>>>> b.dtype
>>> dtype('>U3')
>>>>>> a[0].dtype
>>> dtype('<U3')
>>>>>> b[0].dtype
>>> dtype('<U3')
>>>>>> a[0] == b[0]
>>> True
>>>>>> a[0]
>>> 'abc'
>>>>>> b[0]
>>> 'abc'
>>>
>>>
>>> So somehow the newbyteorder() method doesn't change the dtype of the
>>> elements in our new code.
>>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>>> (I think), but so far I don't see
>>> where the problem could be.
>>>
>>> Any ideas?
>>
>> Ok, after some investigating, I think we need to do something along these lines:
>>
>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>> index c134aed..daf7fc4 100644
>> --- a/numpy/core/src/multiarray/scalarapi.c
>> +++ b/numpy/core/src/multiarray/scalarapi.c
>> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>  #if PY_VERSION_HEX >= 0x03030000
>>      if (type_num == NPY_UNICODE) {
>>          PyObject *b, *args;
>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>> +        if (swap) {
>> +            char *buffer;
>> +            buffer = malloc(itemsize);
>> +            if (buffer == NULL) {
>> +                PyErr_NoMemory();
>> +            }
>> +            memcpy(buffer, data, itemsize);
>> +            byte_swap_vector(buffer, itemsize, 4);
>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>> +            // We have to deallocate this later, otherwise we get a segfault...
>> +            //free(buffer);
>> +        } else {
>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>> +        }
>>          if (b == NULL) {
>>              return NULL;
>>          }
>>
>> This particular implementation still fails though:
>>
>>
>>>>> from numpy import array
>>>>> a = array(["abc"])
>>>>> b = a.newbyteorder()
>>>>> a.dtype
>> dtype('<U3')
>>>>> b.dtype
>> dtype('>U3')
>>>>> a[0].dtype
>> dtype('<U3')
>>>>> b[0].dtype
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>> codepoint not in range(0x110000)
>>>>> a[0] == b[0]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>> codepoint not in range(0x110000)
>>>>> a[0]
>> 'abc'
>>>>> b[0]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>> codepoint not in range(0x110000)
>>
>>
>>
>> But I think that we simply need to take into account the "swap" flag.
>
> Ok, so first of all, I tried to disable the swapping in Python 3.2:
>
>                 if (swap) {
>                     byte_swap_vector(buffer, itemsize >> 2, 4);
>                 }
>
> And then it behaves *exactly* as in Python 3.3. So I am pretty sure
> that the problem is right there and something
> along the lines of my patch above should fix it. I had a few bugs
> there, here is the correct version:
>
> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
> index c134aed..bed73f7 100644
> --- a/numpy/core/src/multiarray/scalarapi.c
> +++ b/numpy/core/src/multiarray/scalarapi.c
> @@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>  #if PY_VERSION_HEX >= 0x03030000
>      if (type_num == NPY_UNICODE) {
>          PyObject *b, *args;
> -        b = PyBytes_FromStringAndSize(data, itemsize);
> +        if (swap) {
> +            char *buffer;
> +            buffer = malloc(itemsize);
> +            if (buffer == NULL) {
> +                PyErr_NoMemory();
> +            }
> +            memcpy(buffer, data, itemsize);
> +            byte_swap_vector(buffer, itemsize >> 2, 4);
> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
> +            free(buffer);
> +        } else {
> +            b = PyBytes_FromStringAndSize(data, itemsize);
> +        }
>          if (b == NULL) {
>              return NULL;
>          }
>
>
> That works well, except that it gives the UnicodeDecodeError:
>
>>>> b[0].dtype
> NULL
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>
> This error is actually triggered by this line:
>
>
>         obj = type->tp_new(type, args, NULL);
>
> in the patch by Stefan above. So I think what is happening is that it
> simply tries to convert it from bytes
> to a string and fails. That makes great sense. The question is why
> doesn't it fail in exactly the same way
> in Python 3.2? I think it's because the conversion check is bypassed
> somehow. Stefan, I think
> we need to swap it after the object is created. I am still
> experimenting with this.

Well, I simply went to the Python sources and then implemented a
solution that works with this patch:

https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654

So now the PR actually seems to work. The rest of the failures are here:

https://gist.github.com/3195520

and they seem to be unrelated. Can somebody please review this PR?

https://github.com/numpy/numpy/pull/366

I will squash the commits after it's reviewed (I want to keep the
history there for now).

Ondrej
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Christoph Gohlke | 29 Jul 2012 03:17

Re: Status of NumPy and Python 3.3

On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
> Well, I simply went to the Python sources and then implemented a
> solution that works with this patch:
>
> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654
>
> So now the PR actually seems to work. The rest of the failures are here:
>
> https://gist.github.com/3195520
>
> and they seem to be unrelated. Can somebody please review this PR?
>
> https://github.com/numpy/numpy/pull/366
>
>
> I will squash the commits after it's reviewed (I want to keep the
> history there for now).
>
>
> Ondrej

Thank you. I backported the PR to numpy 1.6.2 and it works for me on 
win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures 
of the kind:

AssertionError:
Items are not equal:
  ACTUAL: ()
  DESIRED: None

Christoph
Christoph Gohlke | 29 Jul 2012 08:25

Re: Status of NumPy and Python 3.3

On 7/28/2012 6:17 PM, Christoph Gohlke wrote:
> Thank you. I backported the PR to numpy 1.6.2 and it works for me on
> win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures
> of the kind:
>
> AssertionError:
> Items are not equal:
>    ACTUAL: ()
>    DESIRED: None
>
>
> Christoph

Pull request #367 should fix the NewBufferProtocol test failures.

https://github.com/numpy/numpy/pull/367

Christoph
Stefan Krah | 29 Jul 2012 12:40

Re: Status of NumPy and Python 3.3

Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> Well, I simply went to the Python sources and then implemented a
> solution that works with this patch:
> 
> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654

> https://github.com/numpy/numpy/pull/366

Nice! I hit the same problem yesterday: unicode_new() does not accept
byte-swapped input with an encoding, since the input is not valid. But
your solution circumvents the validation.

I'm not sure what the use case is for byte-swapped (invalid?) unicode
strings, but the approach looks good to me in the sense that it does
the same thing as the Py_UNICODE_WIDE path in 3.2.
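To illustrate why the swapped input is rejected, here is a minimal Python sketch. It is not NumPy's code: `swap_ucs4` is a hypothetical helper that only mimics a 4-byte-wide byte swap on a UTF-32 buffer, and the example assumes a little-endian reader.

```python
import struct


def swap_ucs4(raw: bytes) -> bytes:
    """Reverse the byte order of every 4-byte unit (hypothetical helper)."""
    if len(raw) % 4:
        raise ValueError("length must be a multiple of 4")
    return b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))


little = "abc".encode("utf-32-le")   # 12 bytes, 3 code points
swapped = swap_ucs4(little)          # byte-for-byte the big-endian encoding

# What a little-endian reader sees when it interprets the swapped bytes
code_points = struct.unpack("<3I", swapped)
print([hex(cp) for cp in code_points])  # ['0x61000000', '0x62000000', '0x63000000']

# Every value is above 0x10FFFF, so a validating decoder refuses the data
try:
    swapped.decode("utf-32-le")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False
```

The values are far outside the Unicode range, which matches the "codepoint not in range(0x110000)" error quoted earlier in the thread.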

In PyArray_Scalar() I only have these comments, two of which are stylistic:

   - I think the 'size' parameter in PyUnicode_New() refers to the number
     of code points (UCS4 in this case), so:

        PyUnicode_New(itemsize >> 2, max_char)

   - The 'b' variable could be renamed to 'u' now.

   - PyArray_Scalar() is beginning to look a little crowded. Perhaps the whole
     PY_VERSION_HEX >= 0x03030000 block could go into a separate function such
     as:

        NPY_NO_EXPORT PyObject *
        get_unicode_scalar_3_3(PyTypeObject *type, void *data, Py_ssize_t itemsize,
                               int swap);
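The first comment above can be sketched in Python. This is only an illustration under my reading of PEP 393, not code from the PR: `code_point_count` and `storage_width` are hypothetical names. The first shows why the size is `itemsize >> 2` for a UCS4 buffer, the second how the maximum character would select the per-character storage width.

```python
def code_point_count(itemsize: int) -> int:
    """Number of UCS4 code points in a buffer of `itemsize` bytes (4 bytes each)."""
    return itemsize >> 2


def storage_width(max_char: int) -> int:
    """Bytes per character a PEP 393 string would use for this maximum code point."""
    if max_char > 0x10FFFF:
        raise ValueError("max_char is outside the Unicode range")
    if max_char < 0x100:      # ASCII and Latin-1
        return 1
    if max_char < 0x10000:    # Basic Multilingual Plane
        return 2
    return 4                  # everything else needs full UCS4


raw = "abc".encode("utf-32-le")
print(len(raw), code_point_count(len(raw)))   # 12 3
print(storage_width(max(map(ord, "abc"))))    # 1
```

Passing `itemsize` itself as the size would request four times as many characters as the buffer actually holds.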

Then there's another problem in numpy.test() if Python 3.3 is compiled
--with-pydebug:

.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper: Assertion
`((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted

Stefan Krah
Stefan Krah | 29 Jul 2012 13:52

Re: Status of NumPy and Python 3.3

Stefan Krah <stefan-usenet <at> bytereef.org> wrote:
> Then there's another problem in numpy.test() if Python 3.3 is compiled
> --with-pydebug:
> 
> .python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper: Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
> Aborted

This also occurs with Python 3.2, so it's unrelated to the Unicode changes:

http://projects.scipy.org/numpy/ticket/2193

Stefan Krah
Stefan Krah | 29 Jul 2012 15:42

Re: Status of NumPy and Python 3.3

Stefan Krah <stefan-usenet <at> bytereef.org> wrote:
> > .python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper: Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
> > Aborted
> 
> This also occurs with Python 3.2, so it's unrelated to the Unicode changes:
> 
> http://projects.scipy.org/numpy/ticket/2193

I've uploaded a patch for the issue.

Stefan Krah
Ondřej Čertík | 29 Jul 2012 17:50

Re: Status of NumPy and Python 3.3

On Sun, Jul 29, 2012 at 3:40 AM, Stefan Krah <stefan-usenet <at> bytereef.org> wrote:
> Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>> Well, I simply went to the Python sources and then implemented a
>> solution that works with this patch:
>>
>> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654
>
>> https://github.com/numpy/numpy/pull/366
>
>
> Nice! I hit the same problem yesterday: unicode_new() does not accept
> byte-swapped input with an encoding, since the input is not valid. But
> your solution circumvents the validation.
>
> I'm not sure what the use case is for byte-swapped (invalid?) unicode
> strings, but the approach looks good to me in the sense that it does
> the same thing as the Py_UNICODE_WIDE path in 3.2.
>
>
> In PyArray_Scalar() I only have these comments, two of which are stylistic:
>
>    - I think the 'size' parameter in PyUnicode_New() refers to the number
>      of code points (UCS4 in this case), so:
>
>         PyUnicode_New(itemsize >> 2, max_char)

Right. Done.

>
>    - The 'b' variable could be renamed to 'u' now.

Done.

>
>    - PyArray_Scalar() is beginning to look a little crowded. Perhaps the whole
>      PY_VERSION_HEX >= 0x03030000 block could go into a separate function such
>      as:
>
>         NPY_NO_EXPORT PyObject *
>         get_unicode_scalar_3_3(PyTypeObject *type, void *data, Py_ssize_t itemsize,
>                                int swap);

I didn't do this, as I think the function is fine as it is. If further
refactoring is needed, then one should
probably create 3 functions, one for 3.3, one for <3.3-wide and one
for <3.3-narrow.

I've also rebased and squashed the commits, so now it is ready to be merged:

https://github.com/numpy/numpy/pull/366

Thanks Stefan for your help.

Can somebody with push access please review it?

Ondrej
Stefan Krah | 29 Jul 2012 15:56

Re: Status of NumPy and Python 3.3

Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> https://github.com/numpy/numpy/pull/366

Using Python 3.3 compiled --with-pydebug it appears to be impossible
to fool the new Unicode implementation with byte-swapped data:

Apply the patch from:

http://projects.scipy.org/numpy/ticket/2193

Then:

Python 3.3.0b1 (default:68e2690a471d+, Jul 29 2012, 15:28:41) 
[GCC 4.4.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy import array
[206376 refs]
>>> a = array(["abc"])
[206382 refs]
>>> b = a.newbyteorder()
[206387 refs]
>>> b
python3.3: Objects/unicodeobject.c:401: _PyUnicode_CheckConsistency: Assertion `maxchar <=
0x10ffff' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) 

This should be expected since the byte-swapped strings aren't valid.

Stefan Krah
Ondřej Čertík | 29 Jul 2012 17:12

Re: Status of NumPy and Python 3.3

On Sun, Jul 29, 2012 at 6:56 AM, Stefan Krah <stefan-usenet <at> bytereef.org> wrote:
> Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
>> https://github.com/numpy/numpy/pull/366
>
> Using Python 3.3 compiled --with-pydebug it appears to be impossible
> to fool the new Unicode implementation with byte-swapped data:
>
>
> Apply the patch from:
>
> http://projects.scipy.org/numpy/ticket/2193
>
>
> Then:
>
> Python 3.3.0b1 (default:68e2690a471d+, Jul 29 2012, 15:28:41)
> [GCC 4.4.3] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from numpy import array
> [206376 refs]
>>>> a = array(["abc"])
> [206382 refs]
>>>> b = a.newbyteorder()
> [206387 refs]
>>>> b
> python3.3: Objects/unicodeobject.c:401: _PyUnicode_CheckConsistency: Assertion `maxchar <= 0x10ffff' failed.
>
> Program received signal SIGABRT, Aborted.
> 0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>         in ../nptl/sysdeps/unix/sysv/linux/raise.c
> (gdb)
>
>
> This should be expected since the byte-swapped strings aren't valid.

Exactly, I am aware that my solution is a hack. So is the Python 3.2
solution, except that Python 3.2 doesn't seem to have
the _PyUnicode_CheckConsistency() function, so no checks are done.
As such, I think that my PR simply extends the numpy approach to Python 3.3.

A separate issue is that the swapping thing is a hack -- Travis, what
is the purpose of the newbyteorder() and the need to swap the
internals of the unicode object?

Ondrej
Stefan Krah | 29 Jul 2012 17:26

Re: Status of NumPy and Python 3.3

Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> > This should be expected since the byte-swapped strings aren't valid.
> 
> Exactly, I am aware that my solution is a hack. So is the Python 3.2
> solution, except that Python 3.2 doesn't seem to have
> the _PyUnicode_CheckConsistency() function, so no checks are done.
> As such, I think that my PR simply extends the numpy approach to Python 3.3.

Absolutely, I also think that using invalid Unicode strings in 3.2 looks kind
of hackish. -- Nothing wrong with your 3.3 implementation, it's the general
concept that I don't understand.

Stefan Krah
Ronan Lamy | 29 Jul 2012 23:27

Re: Status of NumPy and Python 3.3

Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :

> 
> So now the PR actually seems to work. The rest of the failures are here:
> 
> https://gist.github.com/3195520
> 
I wanted to have a look at the import errors in your previous gist. How
did you get rid of them? I can't even install numpy on 3.3 as setup.py
chokes on 'import numpy.distutils.core':

(py33)ronan <at> ronan-desktop:~/dev/numpy$ python setup.py install
Converting to Python3 via 2to3...
Running from numpy source directory.
/home/ronan/dev/numpy/py33/lib/python3.3/distutils/__init__.py:16:
ResourceWarning: unclosed file <_io.TextIOWrapper
name='/usr/local/lib/python3.3/distutils/__init__.py' mode='r'
encoding='UTF-8'>
  exec(open(os.path.join(distutils_path, '__init__.py')).read())
Traceback (most recent call last):
  File "setup.py", line 214, in <module>
    setup_package()
  File "setup.py", line 191, in setup_package
    from numpy.distutils.core import setup
  File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/core.py", line
25, in <module>
    from numpy.distutils.command import config, config_compiler, \
  File
"/home/ronan/dev/numpy/build/py3k/numpy/distutils/command/__init__.py",
line 17, in <module>
    __import__('distutils.command',globals(),locals(),distutils_all)
ImportError: No module named 'distutils.command.install_clib'

Actually, I don't even understand how this __import__() call can work on
earlier versions, nor what it's trying to achieve.
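For reference, the general behaviour of `__import__()` with a non-empty `fromlist` can be sketched as below. This is an illustration with a stdlib package standing in for `distutils.command`, not the numpy.distutils code; the names in `names` are my own choice.

```python
import sys

# Stand-in for distutils_all: names of submodules to pull in
names = ["minidom", "pulldom"]

# With a non-empty fromlist, __import__ returns the package named in the
# first argument (not the top-level package) and imports each listed
# submodule that is not already an attribute of that package.
pkg = __import__("xml.dom", globals(), locals(), names)

print(pkg.__name__)                          # xml.dom
print(all(hasattr(pkg, n) for n in names))   # True
print("xml.dom.minidom" in sys.modules)      # True
```

So the call appears to be a way of importing a whole list of `distutils.command` submodules in one go. The traceback above suggests that one name in that list, `install_clib`, could not be found as a submodule of the package being imported.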

Ondřej Čertík | 29 Jul 2012 23:45

Re: Status of NumPy and Python 3.3

Hi Ronan!

On Sun, Jul 29, 2012 at 2:27 PM, Ronan Lamy <ronan.lamy <at> gmail.com> wrote:
> Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
>
>>
>> So now the PR actually seems to work. The rest of the failures are here:
>>
>> https://gist.github.com/3195520
>>
> I wanted to have a look at the import errors in your previous gist. How
> did you get rid of them? I can't even install numpy on 3.3 as setup.py

Do you mean this gist:

https://gist.github.com/3194707/482382fb6fd6f0d756128d97ea6c892ddb31fff9

? I had mistakenly run the tests from the wrong directory, and numpy
was picking up the wrong files to import. I think they came either from
the numpy directory directly (there is a check for this though) or from
numpy/core or something; I don't remember anymore. I then ran the
tests from /tmp and posted the correct result into the same gist as a
new commit:

https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
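A quick way to check which copy of a package is being picked up is to look at where the module was loaded from. The following is only a sketch with hypothetical helper names (`module_origin`, `loaded_from_tree`); it uses `json` as a stand-in so it runs without numpy installed.

```python
import importlib
import os


def module_origin(name: str) -> str:
    """Directory that the imported module `name` was loaded from."""
    mod = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(mod.__file__))


def loaded_from_tree(name: str, tree: str) -> bool:
    """True if module `name` was loaded from somewhere under directory `tree`."""
    origin = module_origin(name)
    tree = os.path.abspath(tree)
    try:
        return os.path.commonpath([origin, tree]) == tree
    except ValueError:
        # Paths on different drives (Windows) cannot share a common path
        return False


print(module_origin("json"))
print(loaded_from_tree("json", os.getcwd()))
```

Calling `loaded_from_tree("numpy", os.getcwd())` before running the tests would show whether the import resolves to the source tree rather than to the installed copy.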

> chokes on 'import numpy.distutils.core':
>
> (py33)ronan <at> ronan-desktop:~/dev/numpy$ python setup.py install
> Converting to Python3 via 2to3...
> Running from numpy source directory.
> /home/ronan/dev/numpy/py33/lib/python3.3/distutils/__init__.py:16:
> ResourceWarning: unclosed file <_io.TextIOWrapper
> name='/usr/local/lib/python3.3/distutils/__init__.py' mode='r'
> encoding='UTF-8'>
>   exec(open(os.path.join(distutils_path, '__init__.py')).read())
> Traceback (most recent call last):
>   File "setup.py", line 214, in <module>
>     setup_package()
>   File "setup.py", line 191, in setup_package
>     from numpy.distutils.core import setup
>   File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/core.py", line
> 25, in <module>
>     from numpy.distutils.command import config, config_compiler, \
>   File
> "/home/ronan/dev/numpy/build/py3k/numpy/distutils/command/__init__.py",
> line 17, in <module>
>     __import__('distutils.command',globals(),locals(),distutils_all)
> ImportError: No module named 'distutils.command.install_clib'
>
> Actually, I don't even understand how this __import__() call can work on
> earlier versions, nor what it's trying to achieve.

That's weird, I've never seen this error before. Try to install numpy
using your regular Python like this:

python setup.py install --prefix /tmp

let's say. If it works, then something is wrong with your Python 3.3
installation. If you want to reproduce my setup, check out my repo:

https://github.com/certik/python-3.3

and from inside it, run:

SPKG_LOCAL=`pwd`/xx MAKEFLAGS="-j4" sh spkg-install

(adjust the "-j4" flag, or remove it). You need a few packages
installed, such as zlib1g-dev. Then install virtualenv by
downloading the tar.gz and, from inside the unpacked directory, running
"/path/to/my/python-3.3/xx/bin/python3.3 setup.py install". Add the
directory containing
/path/to/my/python-3.3/xx/bin/virtualenv-3.3 to your $PATH.

Then:

rm -rf $HOME/py33
virtualenv-3.3 $HOME/py33
. $HOME/py33/bin/activate

go to your numpy directory and do "python setup.py install". To run
tests, you also need to:

TMPDIR=/tmp/numpy-env
rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR
tar xzf $tarballs/nose-1.1.2.tar.gz
cd nose-1.1.2
python setup.py install

using the virtualenv environment. When I tried to install nose directly
into the Python installation in python-3.3/xx, it failed...

Ondrej
Ronan Lamy | 30 Jul 2012 03:00

Re: Status of NumPy and Python 3.3

On Sunday, 29 July 2012 at 14:45 -0700, Ondřej Čertík wrote:
> Hi Ronan!
> 
> On Sun, Jul 29, 2012 at 2:27 PM, Ronan Lamy <ronan.lamy <at> gmail.com> wrote:
> > On Saturday, 28 July 2012 at 18:09 -0700, Ondřej Čertík wrote:
> >
> >>
> >> So now the PR actually seems to work. The rest of the failures are here:
> >>
> >> https://gist.github.com/3195520
> >>
> > I wanted to have a look at the import errors in your previous gist. How
> > did you get rid of them? I can't even install numpy on 3.3 as setup.py
> 
> Do you mean this gist:
> 
> https://gist.github.com/3194707/482382fb6fd6f0d756128d97ea6c892ddb31fff9
> 
> ? I had mistakenly run the tests from the wrong directory, so numpy
> was picking up the wrong files to import. I think they came either from
> the numpy directory directly (there is a check for this though), or from
> numpy/core or something; I don't remember anymore. I then ran the
> tests from /tmp and posted the correct result to the same gist as a
> new commit:
> 
> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71

Ah, OK. False alarm, then. I'm on the lookout for import errors with
Python 3.3, as the import system has been completely rewritten and
anything that relied on undocumented behaviour is likely to break.
> 
> > chokes on 'import numpy.distutils.core':
> >
> > (py33)ronan <at> ronan-desktop:~/dev/numpy$ python setup.py install
> > Converting to Python3 via 2to3...
> > Running from numpy source directory.
> > /home/ronan/dev/numpy/py33/lib/python3.3/distutils/__init__.py:16:
> > ResourceWarning: unclosed file <_io.TextIOWrapper
> > name='/usr/local/lib/python3.3/distutils/__init__.py' mode='r'
> > encoding='UTF-8'>
> >   exec(open(os.path.join(distutils_path, '__init__.py')).read())
> > Traceback (most recent call last):
> >   File "setup.py", line 214, in <module>
> >     setup_package()
> >   File "setup.py", line 191, in setup_package
> >     from numpy.distutils.core import setup
> >   File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/core.py", line
> > 25, in <module>
> >     from numpy.distutils.command import config, config_compiler, \
> >   File
> > "/home/ronan/dev/numpy/build/py3k/numpy/distutils/command/__init__.py",
> > line 17, in <module>
> >     __import__('distutils.command',globals(),locals(),distutils_all)
> > ImportError: No module named 'distutils.command.install_clib'
> >
> > Actually, I don't even understand how this __import__() call can work on
> > earlier versions, nor what it's trying to achieve.
> 
> That's weird, I've never seen this error before. Try to install numpy
> using your regular Python like this:
> 
> python setup.py install --prefix /tmp
> 
> let's say. If it works, then something is wrong with your Python 3.3

I simply used a virtualenv (you might need to get the latest from PyPI),
roughly as follows:
virtualenv -p python3.3 py33
py33/bin/python setup.py install

It worked fine with 3.2 and 2.7, but not with 3.3. 

> installation. If you want to reproduce my setup, check out my repo:
> 
> https://github.com/certik/python-3.3
> 
> and from inside it, run:
> 
> SPKG_LOCAL=`pwd`/xx MAKEFLAGS="-j4" sh spkg-install
> 
> 
> (adjust the "-j4" flag, or remove it). You need a few packages
> installed like zlib1g-dev and so on. Then install virtualenv by
> downloading the tar.gz and from inside it doing
> "/path/to/my/python-3.3/xx/bin/python3.3 setup.py install". Add the
> file
> /path/to/my/python-3.3/xx/bin/virtualenv-3.3 into your $PATH.
> 
> Then:
> 
> 
> rm -rf $HOME/py33
> virtualenv-3.3 $HOME/py33
> . $HOME/py33/bin/activate
> 
> go to your numpy directory and do "python setup.py install". To run
> tests, you also need to:
> 
> TMPDIR=/tmp/numpy-env
> rm -rf $TMPDIR
> mkdir $TMPDIR
> cd $TMPDIR
> tar xzf $tarballs/nose-1.1.2.tar.gz
> cd nose-1.1.2
> python setup.py install
> 
> using the virtualenv environment. When I tried to install nose directly
> into the Python installation in python-3.3/xx, it failed...

Installing nose from a git checkout works fine for me. Maybe nose-1.1.2
isn't really compatible with Python 3.3?

Anyway, I managed to compile (by blanking
numpy/distutils/command/__init__.py) and to run the tests. I only see
the 2 pickle errors from your latest gist. So that's all good!

Ronan Lamy | 30 Jul 2012 05:57

Re: Status of NumPy and Python 3.3

On Monday, 30 July 2012 at 02:00 +0100, Ronan Lamy wrote:

> 
> Anyway, I managed to compile (by blanking
> numpy/distutils/command/__init__.py) and to run the tests. I only see
> the 2 pickle errors from your latest gist. So that's all good!

And the cause of these errors is that running the test suite somehow
corrupts Python's internal cache of bytes objects, causing the
following:
>>> b'\x01XXX'[0:1]
b'\xbb'

Ronan Lamy | 30 Jul 2012 18:10

Re: Status of NumPy and Python 3.3

On Monday, 30 July 2012 at 04:57 +0100, Ronan Lamy wrote:
> On Monday, 30 July 2012 at 02:00 +0100, Ronan Lamy wrote:
> 
> > 
> > Anyway, I managed to compile (by blanking
> > numpy/distutils/command/__init__.py) and to run the tests. I only see
> > the 2 pickle errors from your latest gist. So that's all good!
> 
> And the cause of these errors is that running the test suite somehow
> corrupts Python's internal cache of bytes objects, causing the
> following:
> >>> b'\x01XXX'[0:1]
> b'\xbb'

The culprit is test_pickle_string_overwrite() in test_regression.py. The
test actually tries to check for that kind of problem, but on Python 3,
it only manages to trigger it without detecting it. Here's a simple way
to reproduce the issue:

>>> import pickle
>>> import numpy
>>> a = numpy.array([1], 'b')
>>> b = pickle.loads(pickle.dumps(a))
>>> b[0] = 77
>>> b'\x01  '[0:1]
b'M'

Actually, this problem is probably quite old: I can see it in 1.6.1 with
Python 3.2.3. Python 3.3 only makes it more visible.

I'll open an issue on GitHub ASAP.
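
One way to make such corruption detectable, rather than merely triggered, is to check afterwards that one-byte slices still hold the value they were sliced from. A minimal sketch using only the standard library: the function names are invented for this illustration, the one-byte cache is a CPython implementation detail, and array.array stands in for the numpy array so the snippet runs without numpy.

```python
import array
import pickle


def one_byte_slices_intact():
    """Check that slicing out a single byte yields that byte's value.

    On CPython, one-byte slices may come from an internal cache (an
    implementation detail). If an extension has written into a cached
    object, the value read back will not match.
    """
    data = bytes(range(256))
    return all(data[i:i + 1][0] == i for i in range(256))


def roundtrip_then_mutate(obj, mutate):
    """Pickle-roundtrip obj, mutate the copy, then verify bytes objects.

    Returns True when both the pickled string and the one-byte slices
    are unchanged after the mutation.
    """
    pickled = pickle.dumps(obj)
    reference = bytes(bytearray(pickled))  # independent copy for comparison
    copy = pickle.loads(pickled)
    mutate(copy)
    return pickled == reference and one_byte_slices_intact()


def set_first_to_77(a):
    a[0] = 77


# array.array('b', [1]) plays the role of numpy.array([1], 'b') here.
print(roundtrip_then_mutate(array.array('b', [1]), set_first_to_77))
```

On a healthy interpreter this prints True. With numpy installed, passing numpy.array([1], 'b') instead exercises the case from this thread.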

Ronan Lamy | 30 Jul 2012 19:04

Re: Status of NumPy and Python 3.3

On Monday, 30 July 2012 at 17:10 +0100, Ronan Lamy wrote:
> I'll open an issue on GitHub ASAP.
> 
https://github.com/numpy/numpy/issues/370

Ondřej Čertík | 30 Jul 2012 20:07

Re: Status of NumPy and Python 3.3

On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy <ronan.lamy <at> gmail.com> wrote:
> https://github.com/numpy/numpy/issues/370

Thanks Ronan, nice work!

Since you looked into this -- do you know a way to fix this? (Both
NumPy and the test.)

Ondrej
Ronan Lamy | 31 Jul 2012 02:00

Re: Status of NumPy and Python 3.3

On Monday, 30 July 2012 at 11:07 -0700, Ondřej Čertík wrote:
> Thanks Ronan, nice work!
> 
> Since you looked into this -- do you know a way to fix this? (Both
> NumPy and the test.)

Pauli found out how to fix the code, so I'll try to send a PR tonight.

Ondřej Čertík | 3 Aug 2012 17:03

Re: Status of NumPy and Python 3.3

On Mon, Jul 30, 2012 at 5:00 PM, Ronan Lamy <ronan.lamy <at> gmail.com> wrote:
> Pauli found out how to fix the code, so I'll try to send a PR tonight.

So this PR is now in and the issue is fixed.

As for the unicode swapping issues, I finally understand what is
going on. I posted my current understanding to the Python tracker
issue (http://bugs.python.org/issue15540), which was recently created
for this same problem:

http://bugs.python.org/msg167280

It was determined that this is not a bug in Python, so the issue is
closed now. Finally, I have submitted a reworked version of my patch here:

https://github.com/numpy/numpy/pull/372

It implements things in a clean way.

Ondrej
Ondřej Čertík | 4 Aug 2012 20:14

Re: Status of NumPy and Python 3.3

On Fri, Aug 3, 2012 at 8:03 AM, Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> So this PR is now in and the issue is fixed.
>
> As far as swapping the unicode issues, I finally understand what is
> going on and I posted my current understanding into the Python tracker
> issue (http://bugs.python.org/issue15540) which was recently created
> for this same issue:
>
> http://bugs.python.org/msg167280
>
> but it was determined that it is not a bug in Python so it is closed
> now. Finally, I have submitted a reworked version of my patch here:
>
> https://github.com/numpy/numpy/pull/372
>
> It implements things in a clean way.

Final update: the patch is in, so NumPy now passes all tests in Python 3.3.

There seems to be a better way to support unicode and that is
discussed in another thread.

Ondrej
Stefan Krah | 29 Jul 2012 23:55

Re: Status of NumPy and Python 3.3

Ronan Lamy <ronan.lamy <at> gmail.com> wrote:
> ImportError: No module named 'distutils.command.install_clib'

I'm seeing the same with Python 3.3.0b1 (68e2690a471d+) and this patch
solves the problem:

diff --git a/numpy/distutils/command/__init__.py b/numpy/distutils/command/__init__.py
index f8f0884..b9f0d09 100644
--- a/numpy/distutils/command/__init__.py
+++ b/numpy/distutils/command/__init__.py
@@ -7,13 +7,13 @@ __revision__ = "$Id: __init__.py,v 1.3 2005/05/16 11:08:49 pearu Exp $"

 distutils_all = [  #'build_py',
                    'clean',
-                   'install_clib',
                    'install_scripts',
                    'bdist',
                    'bdist_dumb',
                    'bdist_wininst',
                 ]

+from numpy.distutils.command import install_clib
 __import__('distutils.command',globals(),locals(),distutils_all)

 __all__ = ['build',

Stefan Krah
Ronan Lamy | 30 Jul 2012 02:52

Re: Status of NumPy and Python 3.3

On Sunday, 29 July 2012 at 23:55 +0200, Stefan Krah wrote:
> Ronan Lamy <ronan.lamy <at> gmail.com> wrote:
> > ImportError: No module named 'distutils.command.install_clib'
> 
> I'm seeing the same with Python 3.3.0b1 (68e2690a471d+) and this patch
> solves the problem:
> 
> diff --git a/numpy/distutils/command/__init__.py b/numpy/distutils/command/__init__.py
> index f8f0884..b9f0d09 100644
> --- a/numpy/distutils/command/__init__.py
> +++ b/numpy/distutils/command/__init__.py
> @@ -7,13 +7,13 @@ __revision__ = "$Id: __init__.py,v 1.3 2005/05/16 11:08:49 pearu Exp $"
>  
>  distutils_all = [  #'build_py',
>                     'clean',
> -                   'install_clib',
>                     'install_scripts',
>                     'bdist',
>                     'bdist_dumb',
>                     'bdist_wininst',
>                  ]
>  
> +from numpy.distutils.command import install_clib
>  __import__('distutils.command',globals(),locals(),distutils_all)
>  
>  __all__ = ['build',

That does indeed solve the problem, thanks. However, I'm quite sure that
'rm numpy/distutils/command/__init__.py && touch
numpy/distutils/command/__init__.py' works just as well - or probably
better, in fact, as it allows 'from numpy.distutils.command import *' to
run without error.
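
For readers puzzled by the __import__() call in that file: with a non-empty fromlist, __import__ returns the named package itself and also tries to import each listed name as a submodule, roughly like "from package import name1, name2". Since 'install_clib' appears to be a numpy.distutils command rather than a stock distutils one, that lookup is presumably what fails above. A minimal sketch using the standard-library package xml.dom as a stand-in (distutils is not available on every Python version); the helper name import_submodules is an assumption for illustration:

```python
def import_submodules(package, names):
    """Import `package` and the listed submodules via __import__.

    Returns the package module and the subset of names that ended up
    as attributes on it. Hypothetical helper, for illustration only.
    """
    module = __import__(package, globals(), locals(), names)
    loaded = [name for name in names if hasattr(module, name)]
    return module, loaded


# With a fromlist, the leaf package (xml.dom) is returned ...
dom, loaded = import_submodules("xml.dom", ["minidom", "pulldom"])
print(dom.__name__, loaded)

# ... whereas with an empty fromlist the top-level package comes back.
top = __import__("xml.dom", globals(), locals(), [])
print(top.__name__)
```

Seen this way, the call in numpy/distutils/command/__init__.py is mainly a way of pre-loading a list of distutils command modules by name.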

Stefan Krah | 29 Jul 2012 11:20

Re: Status of NumPy and Python 3.3

Ondřej Čertík <ondrej.certik <at> gmail.com> wrote:
> Why doesn't PyUnicode_FromKindAndData return a subtype of PyUnicodeObject?
> 
> http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromKindAndData

Well, it would need a PyTypeObject * parameter to do that. I agree that
many C-API functions would be more useful if they did this.
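
The same point can be seen from the Python side: a constructor can only produce an instance of a subtype if it is told which type to build. A small sketch, where UnicodeScalar is a made-up stand-in for a str subclass (not NumPy's actual scalar type):

```python
class UnicodeScalar(str):
    """Hypothetical str subclass used only for this illustration."""


def make_plain(text):
    # No type argument: the result is always an exact str.
    return str(text)


def make_typed(cls, text):
    # The type is passed explicitly, so subclasses can be created.
    return str.__new__(cls, text)


plain = make_plain("abc")
typed = make_typed(UnicodeScalar, "abc")
print(type(plain).__name__, type(typed).__name__)
```

The second helper mirrors what a C-level constructor with a type parameter would allow; the first mirrors a function that always returns the base type.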

Stefan Krah
Nathaniel Smith | 27 Jul 2012 12:24

Re: Status of NumPy and Python 3.3

On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau <cournape <at> gmail.com> wrote:
> On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant <travis <at> continuum.io> wrote:
>> Hey all,
>>
>> I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly
>> improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects)
>> is not working now.
>>
>> It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in
>> tackling that little challenge? If someone has already tried it would be nice to hear your experience.
>
> Given that we're late with 1.7, I would suggest passing this to the
> next release, unless the fix is simple (just a change of API).

IMO, it's not a regression so it's not a release blocker. Of course we
should release the fix whenever it's ready (in 1.7 if it's ready by
then, else in 1.7.1), but we shouldn't hold up the release for it.

-n
