Leo Soto M. | 16 Aug 02:23

Parsing and non-ASCII Input

This was mostly planned as a note-to-myself document, as I spent most
of today on this issue but have to put it on hold for a few days.
I now think it is a good idea to post it here for review by more
experienced Jython developers.

Problem
=======

We have problems parsing non-ASCII input under some circumstances. It
is mostly visible in doctest right now, but I'm afraid that the same
bugs can show up in other circumstances too. Here is a minimal
example:

>>> eval(u"'f\xf6\xf6'")
'f\xf6\xf6'
>>> eval(u"'b\u0105r'")
'b?r'

Cause
=====

The problem can be traced to the use of String.toBytes() before
reaching the parser, without taking into account the case when the
incoming String[*] contains non-ASCII characters.
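
To illustrate why the first example above survives and the second does
not, here is a minimal sketch in Python. It is not the Jython code.
It assumes the String-to-bytes step behaves like a Latin-1 encoder
that substitutes '?' for anything it cannot represent, which is
consistent with the output shown above but is not confirmed here. The
names lossy_to_bytes and survives_round_trip are hypothetical.

```python
# Assumption: the conversion done before parsing acts like a Latin-1
# encoder with '?' substitution. This is a stand-in, not Jython's code.

def lossy_to_bytes(source):
    """Hypothetical model of the byte conversion applied before parsing."""
    return source.encode("latin-1", "replace")


def survives_round_trip(source):
    """Return True if no character is lost by the byte conversion."""
    return lossy_to_bytes(source).decode("latin-1") == source


# U+00F6 fits in one Latin-1 byte, so the text reaches the parser intact.
print(repr(lossy_to_bytes(u"'f\xf6\xf6'")))

# U+0105 has no Latin-1 byte, so the encoder replaces it with '?'.
print(repr(lossy_to_bytes(u"'b\u0105r'")))

print(survives_round_trip(u"'f\xf6\xf6'"))
print(survives_round_trip(u"'b\u0105r'"))
```

Under that assumption, any code point above U+00FF is already gone
before the parser sees the input, so the parser cannot recover it.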

The offending code is in

 * ParserFacade#parse(String string, String kind): Unused, but
dangerous IMHO, so should be removed/reworked
 * Py#compile_flags(String data, String filename, String type,
CompilerFlags cflags): The entry point for most eval()/exec
operations.

Frank Wierzbicki | 17 Aug 02:35

Re: Parsing and non-ASCII Input

On Fri, Aug 15, 2008 at 8:25 PM, Leo Soto M. <leo.soto <at> gmail.com> wrote:

> The offending code is on
>
>  * ParserFacade#parse(String string, String kind): Unused, but
> dangerous IMHO, so should be removed/reworked
Deleted -- I'm pretty sure it was unused.

>  * Py#compile_flags(String data, String filename, String type,
> CompilerFlags cflags): The entry point for most eval()/exec
> operations.
Not as easy as the first one :)

I'll need to think some more and read through some code before I can
comment on the proposed solutions... but I do like the 2nd approach
(with Readers, etc) better on the first reading.

-Frank
