Martin von Loewis | 1 Jul 2001 15:52

Support for "wide" Unicode characters

> The problem I have with this PEP is that it is a compile time option
> which makes it hard to work with both 32 bit and 16 bit strings in
> one program.

Can you elaborate why you think this is a problem?

> Can not the 32 bit string type be introduced as an additional type?

Yes, but not just "like that". You'd have to define an API for
creating values of this type, you'd have to teach all functions which
ought to accept it to process it, you'd have to define conversion
operations and all that: In short, you'd have to go through all the
trouble that introduction of the Unicode type gave us once again.
Also, I cannot see any advantages in introducing yet another type.

Implementing this PEP is straightforward, with almost no visible
effect on Python programs.

People have suggested making it a run-time decision, with the
internal representation switching on demand, but that would create an
API nightmare for C code that has to access such values.
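
To make the concern concrete, here is a rough sketch (in Python rather
than C, for brevity) of a representation that switches width on demand.
The names pack_units and unit_at are made up for this illustration and
are not part of any proposal; the point is only that every consumer of
the buffer has to check the width before it can index into it.

```python
import struct

def pack_units(text):
    """Store text with the narrowest fixed width (2 or 4 bytes) that fits."""
    # Hypothetical switching rule: widen only if a non-BMP code point occurs.
    width = 4 if any(ord(ch) > 0xFFFF for ch in text) else 2
    fmt = "<%d%s" % (len(text), "H" if width == 2 else "I")
    return width, struct.pack(fmt, *map(ord, text))

def unit_at(width, buf, i):
    """Read element i; the caller must know the width chosen at creation."""
    fmt = "<H" if width == 2 else "<I"
    return struct.unpack_from(fmt, buf, i * width)[0]

if __name__ == "__main__":
    for sample in ("abc", "a\U00010400"):
        width, buf = pack_units(sample)
        print(width, [hex(unit_at(width, buf, i)) for i in range(len(sample))])
```

In C, the equivalent of the branch in unit_at would have to appear in
every extension that touches the raw buffer.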

> u[i] is a character. If u is Unicode, then u[i] is a Python Unicode
> character.

>  This wasn't usefully true in the past for DBCS strings and is not the
> right way to think of either narrow or wide strings now. The idea
> that strings are arrays of characters gets in the way of dealing
> with many encodings and is the primary difficulty in localising
> software for Japanese.

Neil Hodgson | 2 Jul 2001 04:42

Re: Support for "wide" Unicode characters

Martin von Loewis:

> > The problem I have with this PEP is that it is a compile time option
> > which makes it hard to work with both 32 bit and 16 bit strings in
> > one program.
>
> Can you elaborate why you think this is a problem?

   A common role for Python is to act as glue between various modules. If
Paul produces some interesting code that depends on 32 bit strings and I
want to use that in conjunction with some Win32 specific or COM dependent
code that wants 16 bit strings then it may not be possible or may require
difficult workarounds.
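
   As a rough illustration, glue code can at least detect which kind of
build it is running under. The sketch below relies on sys.maxunicode,
which is 0xFFFF on a narrow build and 0x10FFFF on a wide one (Python 3.3
and later always report the wide value). The function name build_kind is
made up for this example.

```python
import sys

def build_kind():
    """Return "narrow" for a 16-bit Unicode build, "wide" otherwise."""
    # sys.maxunicode is the largest code point a single string element can hold.
    return "narrow" if sys.maxunicode == 0xFFFF else "wide"

if __name__ == "__main__":
    print(build_kind(), hex(sys.maxunicode))
```

   Detection does not solve the mixing problem; it only lets a module
fail early or pick a conversion path.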

> (*) Methinks that the primary difficulty still is translating all the
> documentation, and messages. Actually, keeping the translations
> up-to-date is even more challenging.

   Translation of documentation and strings can be performed by almost
anyone who writes both languages ("even managers") and can be budgeted by
working out the amount of text and applying a conversion rate. Code requires
careful thought and can lead to the typical buggy software schedule
blowouts.

   Neil

Martin von Loewis | 2 Jul 2001 09:25

Re: Support for "wide" Unicode characters

> > > The problem I have with this PEP is that it is a compile time option
> > > which makes it hard to work with both 32 bit and 16 bit strings in
> > > one program.
> >
> > Can you elaborate why you think this is a problem?
> 
>    A common role for Python is to act as glue between various modules. If
> Paul produces some interesting code that depends on 32 bit strings and I
> want to use that in conjunction with some Win32 specific or COM dependent
> code that wants 16 bit strings then it may not be possible or may require
> difficult workarounds.

Neither. All it will require is for you to recompile your Python
installation to use wide Unicode.

On Win32 APIs, this will mean that you cannot directly interpret
PyUnicode object representations as wchar_t pointers. This is no
problem, as you can transparently copy unicode objects into wchar_t
strings; it's a matter of coming up with a good C API for doing so
conveniently.
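
A minimal sketch of what such a copy would have to do when wchar_t is
16 bits wide, as on Win32. It assumes that code points above U+FFFF are
split into UTF-16 surrogate pairs, which is not spelled out above, and
it is written in Python for brevity rather than as the C API itself;
to_utf16_units is a hypothetical name.

```python
def to_utf16_units(text):
    """Copy code points into a list of 16-bit code units."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points fit into a single 16-bit unit.
            units.append(cp)
        else:
            # Assumption: non-BMP code points become a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units

if __name__ == "__main__":
    print([hex(u) for u in to_utf16_units("A\U00010400")])
```

The output buffer can be up to twice as long as the input, so a C
version would need to compute the length before allocating.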

Regards,
Martin

