16 Aug 02:23
Parsing and non-ASCII Input
From: Leo Soto M. <leo.soto <at> gmail.com>
Subject: Parsing and non-ASCII Input
Newsgroups: gmane.comp.lang.jython.devel
Date: 2008-08-16 00:25:47 GMT
This was mostly planned as a note-to-myself document, as I spent most of today on this issue but had to put it on hold for a few days. I now think it is a good idea to post it here for review by more experienced Jython developers.

Problem
=======

We have problems parsing non-ASCII input under some circumstances. It is mostly visible in doctest right now, but I'm afraid the current bugs can show up in other circumstances too. Here is a minimal example:

    >>> eval(u"'f\xf6\xf6'")
    'f\xf6\xf6'
    >>> eval(u"'b\u0105r'")
    'b?r'

Cause
=====

The problem can be traced to the use of String.toBytes() before reaching the parser, without taking into account the case where the incoming String[*] contains non-ASCII characters. The offending code is in:

* ParserFacade#parse(String string, String kind): Unused, but dangerous IMHO, so it should be removed/reworked
* Py#compile_flags(String data, String filename, String type,
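
To make the symptom in the Problem section concrete, here is a minimal sketch written for CPython 3 rather than Jython. It assumes the string-to-bytes step behaves like a Latin-1 encode that substitutes '?' for anything it cannot represent; that is an assumption consistent with the 'b?r' output above, not a reading of the Jython source. The helper names to_single_byte and roundtrip are hypothetical and are not Jython APIs.

```python
def to_single_byte(text):
    """Lossy conversion (assumed behaviour): one byte per character,
    with '?' substituted for any code point above U+00FF."""
    return text.encode("latin-1", errors="replace")


def roundtrip(text):
    """Convert to bytes and back, the way a byte-oriented parser front end
    would see the source text under the assumption above."""
    return to_single_byte(text).decode("latin-1")


# U+00F6 fits in a single byte, so the literal survives the round trip.
ok = roundtrip("'f\xf6\xf6'")

# U+0105 does not fit in a single byte, so it is replaced before parsing.
lossy = roundtrip("'b\u0105r'")

print(ascii(ok))
print(ascii(lossy))
```

Under that assumption, the first literal comes back unchanged while the second loses its middle character, which matches the two eval() results shown earlier: characters up to U+00FF happen to work, and anything above that range is damaged before the parser ever sees it.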