RE: Unicode CLDR Release 1.5.1 now available
Philippe Verdy <verdy_p <at> wanadoo.fr>
2008-01-11 05:26:38 GMT
On [20080110 18:34], Rick McGowan (rick <at> unicode.org) wrote:
>CLDR 1.5.1 is an update release, with no new translations. The main
>changes are a significant revision to the data and process for computing
>timezone names
The current data is still defective: it now uses exemplar cities instead of
countries as the primary selective name, and it qualifies the city with the
country name only when a country has several timezones, even though exemplar
city names are not necessarily unique in the existing timezone data. For
example, "Georgetown" is the capital and exemplar city of several countries
that don't have multiple timezones, so the city name is not qualified with
the country name. This defect comes directly from the fact that the
selective name has been reversed in CLDR 1.5 (and this is still not
corrected in 1.5.1).
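To illustrate the rule I would expect instead, here is a minimal Python
sketch that qualifies an exemplar city with its country whenever the city
name is shared by more than one zone, regardless of how many timezones the
country has. The zone ids, country names and the display_names helper are
all invented for the illustration; none of this is CLDR data or CLDR code.

```python
from collections import Counter

# Toy data, invented for illustration: (zone id, exemplar city, country).
ZONES = [
    ("Zone/A", "Georgetown", "Country A"),
    ("Zone/B", "Georgetown", "Country B"),
    ("Zone/C", "Paris", "Country C"),
]

def display_names(zones):
    """Qualify a city with its country only when the city name is
    shared by more than one zone in the data."""
    counts = Counter(city for _, city, _ in zones)
    names = {}
    for zone_id, city, country in zones:
        if counts[city] > 1:
            # Ambiguous city name: append the country to tell zones apart.
            names[zone_id] = f"{city} ({country})"
        else:
            names[zone_id] = city
    return names
```

With this toy data, both "Georgetown" zones get a country qualifier, while
"Paris" stays unqualified.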
> and additional data for finding default script or country
> given a language, or the converse.
There's still a problem for the converse.
Try applying it to "zh-SG"; you'll get:
* the maximized locale id "zh-Hans-SG" (any existing variants are kept in a
separate variable and appended at the end of the converse resolution).
* looking for "zh-Hans" will return nothing
* looking for "zh" will return "zh-Hans-CN", and substituting "SG" for "CN"
will return "zh-Hans-SG".
But if the last step had returned "zh-Hant-CN" (i.e. if China used the
traditional orthography by default), the conversion would have returned the
wrong default orthography for Singapore: "zh-Hant-SG".
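The failure mode can be reproduced with a small Python sketch. It assumes a
lookup table keyed by language only, mapping to a full
"language-Script-REGION" default; the fill_script function and both tables
are hypothetical and only mimic the substitution step described above, not
the actual CLDR algorithm.

```python
def fill_script(tag, defaults):
    """Add a default script to a 'language-REGION' tag.

    `defaults` maps a language to its full 'language-Script-REGION'
    default. The script is taken from that entry and the region of the
    input tag replaces the region of the entry.
    """
    language, region = tag.split("-")
    default = defaults.get(language)
    if default is None:
        return tag  # nothing known about this language
    _, script, _ = default.split("-")
    return f"{language}-{script}-{region}"

# Table matching the lookup result quoted above.
current = {"zh": "zh-Hans-CN"}
# Hypothetical table in which China defaulted to the traditional script.
hypothetical = {"zh": "zh-Hant-CN"}
```

Here fill_script("zh-SG", current) gives "zh-Hans-SG", but
fill_script("zh-SG", hypothetical) gives "zh-Hant-SG": the script of the
language-wide default leaks into every region that has no entry of its own.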
The problem may happen if the same language uses a different default script
in one country from the default script used in another. I am thinking here
about the orthography of Serbian which is now Latin by default in Serbia,
and still Cyrillic in Bosnia-Herzegovina.
From "sr" you would most probably expect "sr-Latn-RS" by default, and the
same would be obtained from "sr-RS" ("RS" being the ISO 3166 code for
Serbia).
From "sr-Cyrl", you would certainly get "sr-Cyrl-RS" by default, implying
that Serbian is most probably meant for Serbia, where it is mostly used.
But from "sr-BA" you would expect to find "sr-Cyrl-BA", not "sr-Latn-BA",
for political reasons: "sr-Latn" would be too easily confused with "hr-Latn"
or with "bs-Latn". The latter is the local version, less politically
oriented, but a "Bosnian" language is still rejected by Serbians in the
Serbian autonomous region in north-eastern Bosnia, who still refer to their
language as "Serbian" and want to maintain the Cyrillic script as a strong
cultural difference from Bosnian. These defaults may easily change again or
could be disputed: what is the preferred script now in Montenegro? And for
Serbians in Kosovo? This area (Bosnia, Serbia, Montenegro, Kosovo) is still
considered as using two conflicting scripts, and little can be chosen
arbitrarily given the ethnic and political preferences, even if the war is
now over. (I think this also applies, though less critically, in the FYRO
Macedonia, where there's also an active ethnic Albanian community that
prefers Latin, or may sometimes still use Arabic for religious purposes.)
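One way to express the Serbian expectations above is to let a
(language, region) entry take priority over the language-wide default
script. The Python sketch below assumes exactly that; the tables and the
maximize function are hypothetical, the values only encode the defaults I
argue for in this message (they are not taken from CLDR), and "RS" is used
as the ISO 3166 code for Serbia.

```python
# Hypothetical tables; the most specific key is tried first.
DEFAULT_SCRIPT = {
    ("sr", "BA"): "Cyrl",   # region-specific override
    ("sr", None): "Latn",   # language-wide default
}
DEFAULT_REGION = {"sr": "RS"}

def maximize(language, script=None, region=None):
    """Fill in the missing script and/or region of a locale id."""
    if region is None:
        region = DEFAULT_REGION.get(language)
    if script is None:
        # Try the (language, region) entry before the language alone.
        script = (DEFAULT_SCRIPT.get((language, region))
                  or DEFAULT_SCRIPT.get((language, None)))
    return "-".join(part for part in (language, script, region) if part)
```

With these tables, maximize("sr") and maximize("sr", region="RS") both give
"sr-Latn-RS", maximize("sr", script="Cyrl") gives "sr-Cyrl-RS", and
maximize("sr", region="BA") gives "sr-Cyrl-BA".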
When there's such a mosaic of ethnic groups in a small area, this conflict
will often carry over into their languages, notably if there are multiple
scripts and languages are easily mixed. I doubt that the situation in
Central Africa or India is any simpler, with all their many languages.