Rick McGowan | 10 Jan 2008 18:23

Unicode CLDR Release 1.5.1 now available

The Unicode(R) Consortium has announced the release of the new version of  
the Unicode Common Locale Data Repository (Unicode CLDR 1.5.1), providing  
key building blocks for software to support the world's languages. Unicode  
CLDR is by far the largest and most extensive standard repository of locale  
data. This data is used by a wide spectrum of companies for their software  
internationalization and localization: adapting software to the  
conventions of different languages for such common software tasks as  
formatting of dates, times, time zones, numbers, and currency values;  
sorting text; choosing languages or countries by name; transliterating  
different alphabets; and many others.

CLDR 1.5.1 is an update release, with no new translations. The main  
changes are a significant revision to the data and process for computing  
timezone names, and additional data for finding default script or country  
given a language, or the converse. The structure has also been updated for  
the latest version of BCP 47, and new currency codes. For more information,  
see http://unicode.org/cldr/


Re: Unicode CLDR Release 1.5.1 now available

Rick,

I also assume http://www.unicode.org/reports/tr35/ will be updated to reflect
that v9 is the latest version? Right now it shows only v8, with no apparent
link to v9 (aside from the CLDR 1.5.1 page at
http://unicode.org/cldr/version/1.5.1.html).

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Once sent from the Golden Hall...

Mark Davis | 10 Jan 2008 22:42

Re: Unicode CLDR Release 1.5.1 now available

Thanks for the comments -- those were oversights.

Mark


Re: Unicode CLDR Release 1.5.1 now available

Rick,

-On [20080110 18:34], Rick McGowan (rick <at> unicode.org) wrote:
>CLDR 1.5.1 is an update release, with no new translations. The main  
>changes are a significant revision to the data and process for computing  
>timezone names, and additional data for finding default script or country  
>given a language, or the converse. The structure has also been updated for  
>the latest version of BCP 47, and new currency codes. For more information,  
>see http://unicode.org/cldr/

On http://unicode.org/cldr/version/1.5.1.html, is "Likely Subtags" supposed
to link to http://unicode.org/cldr/version/1.5.1.html#Likely_Subtags ?
I think http://www.unicode.org/reports/tr35/tr35-9.html#Likely_Subtags was
intended.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Seize from every moment its unique novelty and do not prepare your joys...

Philippe Verdy | 11 Jan 2008 06:26

RE: Unicode CLDR Release 1.5.1 now available

On [20080110 18:34], Rick McGowan (rick <at> unicode.org) wrote:
>CLDR 1.5.1 is an update release, with no new translations. The main
>changes are a significant revision to the data and process for computing
>timezone names

The current data is still defective: it now uses exemplar cities instead of
countries as the primary distinguishing name, and it qualifies the city with
the country name only when a country has several timezones, even though
exemplar city names are not necessarily unique in the existing timezone
data. For example, "Georgetown" is the capital and exemplar city of several
countries that don't have multiple timezones, so the city name is not
qualified with the country name. This problem comes directly from the fact
that the distinguishing name was reversed in CLDR 1.5 (and this is still not
corrected in 1.5.1).

> and additional data for finding default script or country
> given a language, or the converse.

There's still a problem for the converse.

Try applying it to "zh-SG"; you'll get:
* the maximized locale ID "zh-Hans-SG" (any existing variants are kept in a
separate variable and appended at the end of the converse resolution).
* looking up "zh-Hans" returns nothing.
* looking up "zh" returns "zh-Hans-CN", and substituting "SG" for "CN"
gives "zh-Hans-SG".

But if the last step had returned "zh-Hant-CN" (i.e. if China used the
traditional orthography by default), the conversion would have returned the
wrong default orthography for Singapore: "zh-Hant-SG".
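
The lookup described above can be sketched roughly as follows. This is a
simplified illustration in Python, not CLDR's implementation: the table
LIKELY, its entries, and the functions parse and maximize are all assumptions
made for the example, and the key order (language-script-region,
language-region, language-script, language) is only one plausible reading of
the fallback.

```python
# Hypothetical likely-subtags table; entries are illustrative, not CLDR data.
# Each key maps to a full (language, script, region) triple.
LIKELY = {
    "zh": ("zh", "Hans", "CN"),
}


def parse(tag):
    """Split 'lang[-Script][-REGION]' into (lang, script, region)."""
    parts = tag.replace("_", "-").split("-")
    lang, script, region = parts[0].lower(), None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return lang, script, region


def maximize(tag, table=LIKELY):
    """Fill in a missing script and/or region from the first matching key."""
    lang, script, region = parse(tag)
    candidates = []
    if script and region:
        candidates.append(f"{lang}-{script}-{region}")
    if region:
        candidates.append(f"{lang}-{region}")
    if script:
        candidates.append(f"{lang}-{script}")
    candidates.append(lang)
    for key in candidates:
        if key in table:
            _, likely_script, likely_region = table[key]
            # Subtags given in the input always win over the table's defaults.
            return "-".join((lang, script or likely_script, region or likely_region))
    return tag  # no match: leave the tag unchanged


print(maximize("zh-SG"))
```

With the table above, maximize("zh-SG") gives "zh-Hans-SG". If the "zh" entry
pointed to Hant instead, the same call would give "zh-Hant-SG", unless the
table also carried a more specific "zh-SG" entry, which this sketch tries
first.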

The problem may happen if the same language uses a different default script
in one country from the default script used in another. I am thinking here
about the orthography of Serbian which is now Latin by default in Serbia,
and still Cyrillic in Bosnia-Herzegovina.

From "sr" you would most probably expect "sr-Latn-RS" by default, and the
same would be obtained from "sr-RS".
From "sr-Cyrl", you would certainly get "sr-Cyrl-RS" by default, implying
that Serbian most probably means Serbia, where it is mostly used.
But from "sr-BA" you would expect "sr-Cyrl-BA", not "sr-Latn-BA", for
political reasons: "sr-Latn" would be too easily confused with "hr-Latn" (or
with "bs-Latn", the less politically oriented local version; a "Bosnian"
language is still rejected by Serbs in the Serbian autonomous region in
north-eastern Bosnia, who still refer to their language as "Serbian" and
want to keep the Cyrillic script as a strong cultural difference from
Bosnian). These defaults may easily change again or could be disputed: what
is the preferred script now in Montenegro? And for Serbs in Kosovo? This
area (Bosnia, Serbia, Montenegro, Kosovo) is still considered to use two
conflicting scripts, and little can be chosen arbitrarily because of ethnic
and political preferences, even if the war is now over. (I think this also
applies, though less critically, in the FYRO Macedonia, where there is an
active ethnic Albanian community that prefers Latin, or may sometimes still
use Arabic for religious purposes.)

When there's such a mosaic of ethnic groups in a small area, the conflict
often carries over into their languages, notably where there are multiple
scripts and languages are easily mixed. I'm not sure the situation in
Central Africa or India is any simpler, with all their many languages.

Mark Davis | 15 Jan 2008 19:01

Re: Unicode CLDR Release 1.5.1 now available

If you think these are bugs, you should file them as such. (I don't think they are, but if they are filed as bugs we'll look into them).

Mark

Philippe Verdy | 11 Jan 2008 01:13

1.5.1 change: bug in survey (currency: ROL/symbol)

Bug related to the change announced in the CLDR 1.5.1 release:
http://unicode.org/cldr/apps/survey?_=fr&x=currencies

Displays "internal error" for (unconfirmed) ROL/symbol in example (root data
is "ROL", but English data "=0#Old lei|1#Old leu|1" fails.)

----

Consider also disambiguating "Guyana" in the English locale for timezones, as
this is not an exemplar city, and the name is easily confused with French
Guiana.

Proposed solution: use "Georgetown" as the exemplar city, and add the country
name "(Guyana)" as a suffix, because the city is also easily confused with
other cities of the Caribbean region. Note however that a country suffix is
currently added only when there are multiple timezones in a country, not when
the same exemplar city name is also the capital of another country/region
with its own timezone, where it would equally be the exemplar. In some
languages there is only a minor orthographic difference, and such a
difference is easily overlooked, as in "Guyana" vs. "Guiana" in English,
"Guyana" vs. "Guyane" in French, or "Georgetown" vs. "Georgestown"...
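
A minimal sketch of the proposed rule, in Python: qualify an exemplar city
with its country name whenever the same city name appears for more than one
zone, regardless of how many zones each country has. The sample list ZONES,
the zone ID "Example/Other", the name "Other Region", and the function
display_names are all made up for illustration and are not taken from CLDR.
Only exact (case-insensitive) duplicates are detected here; near-spellings
such as "Guyana" vs. "Guiana" would need a separate similarity check.

```python
from collections import Counter

# Hypothetical sample data: (zone id, exemplar city, country name).
# "Example/Other" stands in for any second zone sharing the same city name.
ZONES = [
    ("America/Guyana", "Georgetown", "Guyana"),
    ("Example/Other", "Georgetown", "Other Region"),
    ("America/Cayenne", "Cayenne", "French Guiana"),
]


def display_names(zones):
    """Return {zone id: display name}, adding a country suffix on collisions."""
    counts = Counter(city.casefold() for _, city, _ in zones)
    names = {}
    for zone_id, city, country in zones:
        if counts[city.casefold()] > 1:
            # The city name is shared by several zones: qualify it.
            names[zone_id] = f"{city} ({country})"
        else:
            names[zone_id] = city
    return names


for zone_id, name in display_names(ZONES).items():
    print(zone_id, "->", name)
```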

What would be the best solution to avoid such confusion?

Philippe Verdy | 11 Jan 2008 00:59

Beta Survey application bug: coverage

Something to fix for the coming new beta of the survey (starting on February
1st?)

http://unicode.org/cldr/apps/survey?_=fr

Possible problems with locale:
(null) : Error: Internal error in org.unicode.cldr.test.CheckCoverage.
Exception: java.lang.NullPointerException, Message:
java.lang.NullPointerException, Trace: []

Mark Davis | 15 Jan 2008 19:02

Re: Beta Survey application bug: coverage

Thanks. Can you file this as a bug?

Mark

