Doug Ewell | 2 Jun 17:27

Re: Last Call: Preparation of Internationalized Strings

Simon,

There have been two corrections to normalization since Unicode 3.0.  One
involved a Chinese (Han) compatibility character that was mistakenly
mapped to the wrong "normal" character.  The other involved a Yiddish
(Hebrew) compatibility character that should have had a compatibility
mapping but, also by mistake, did not.

Both corrections were made to characters that are supposedly "very rare"
in actual use, so the real-world impact should be minimal.  Neither
one has anything to do with transcoding tables.
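
For anyone who wants to see how a current implementation treats the two
characters, here is a minimal Python sketch.  The code points are an
assumption: this message does not name them, and U+F951 and U+FB1D are
taken here to be the Han and Hebrew characters covered by the two
normalization corrigenda.  The output reflects whatever Unicode version
the local Python build ships with, not the Unicode 3.0 tables.

```python
import unicodedata

# Assumed code points (not named in the message above):
HAN_COMPAT = "\uf951"      # CJK COMPATIBILITY IDEOGRAPH-F951
YOD_WITH_HIRIQ = "\ufb1d"  # HEBREW LETTER YOD WITH HIRIQ


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


def normalized_forms(ch):
    """Map each normalization form name to the code points it yields for ch."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: code_points(unicodedata.normalize(form, ch)) for form in forms}


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for ch in (HAN_COMPAT, YOD_WITH_HIRIQ):
        print(code_points(ch)[0], normalized_forms(ch))
```

Running the same script against builds with different Unicode data
versions is one rough way to check whether a given pair of
implementations agrees on these two characters.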

I know you are very concerned that Unicode has "broken its promise" by
making changes to the normalization tables after claiming they would not
do so.  I think if the corrections had not been made, there would have
been an equal but opposite reaction that Unicode was too stubborn to
correct its own mistakes, and that NFKC was rendered "useless" because
of these two incorrect mappings.

The pages explaining the corrigenda include lengthy, detailed
explanations of why the Technical Committee felt they were necessary and
justified.  As someone already mentioned, one of the justifications
given for the Yiddish change was that no normative references existed
*yet* for the Unicode normalization tables (i.e. from IDN).  This
implies that once such normative references *do* exist, a similar
decision to correct an error might not be made.

I imagine these were very difficult decisions for the UTC, who knew that
someone would jump on the changes immediately as evidence that
normalization is inherently unstable and Unicode is therefore "not
