John C Klensin | 10 Dec 2003 15:11

Re: Global and national e-mail address


Dan,

Three observations (short this time)...

(i) Computer geeks and their possible preferences aside, people 
tend not to like transliterations (the writing of a name that would
normally be written in one character set in the characters of 
another).  Whether they like "better" transliterations more than 
"worse" transliterations is a cultural issue.

(ii) To reasonably transliterate names and languages, one needs 
not only a collection of the right phonemes, but an appropriate 
and accurate notation for tones.   You can't get those out of a 
small extension to Latin letters.  I'm told that one can get a 
reasonable approximation of all of the relevant phonemes and 
tones with IPA, but IPA not only uses some characters that are 
distinctly non-Latin-based, but also uses a rather complex 
collection of combining diacriticals.  And, at least unless one 
is a professional phonologist, learning IPA and how to use it 
accurately is _hard_ (I speak from experience: people have tried to teach it 
to me twice, once when I was young enough to learn these things). 
For some hints in a reference that is easily accessible to most 
of us, see the discussion of IPA characters in the Unicode 
Standard (version 3.0 or 4.0, take your pick).

(iii) If one wants even an approximation to accurate 
transliteration, the symbol-overloading in Latin scripts is bad 
news.  E.g., the sound of "ö" (o with diaeresis, U+00F6) is 
different in, e.g., Swedish and German.  I.e., they are 
different characters, even if they look the same and even if 
Unicode "unified" them.
[...]
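The unification point can be checked directly. The sketch below is a minimal illustration in Python using only the standard unicodedata module; the words "schön" (German) and "söt" (Swedish) are illustrative picks, not taken from the thread. It shows only the encoding fact: both texts use the single code point U+00F6, which canonically decomposes into a plain o plus a combining diaeresis. Whether the two are "different characters" is exactly what the replies go on to argue.

```python
import unicodedata

# Illustrative words (not from the thread) containing the same letter.
german = "schön"   # German, "beautiful"
swedish = "söt"    # Swedish, "sweet"

o_german = german[3]
o_swedish = swedish[1]

# Unicode stores one code point regardless of language; nothing in the
# text records whether the writer meant a German umlaut or a Swedish letter.
print(o_german == o_swedish)         # True
print(hex(ord(o_german)))            # 0xf6
print(unicodedata.name(o_german))    # LATIN SMALL LETTER O WITH DIAERESIS

# Canonical decomposition (NFD) splits it into a base letter plus a
# combining mark, the same mechanism IPA relies on for its diacritics.
decomposed = unicodedata.normalize("NFD", o_german)
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER O', 'COMBINING DIAERESIS']
```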

Mark Davis | 10 Dec 2003 16:33

Re: Global and national e-mail address


It is easy to fall into the trap of being Eurocentric: to overestimate the ease of
Latin and underestimate the difficulty of transliteration. For many languages
there are no good transliteration standards; or rather, there are many
conflicting ones. And many of these are transcriptions, not
transliterations. (The difference is that a transliteration into Latin is
reversible: one can recover the original text precisely. A transcription is not
reversible, but it is easier to pronounce.)

For example, here are some sample transliterations (from
http://oss.software.ibm.com/cgi-bin/icu/tr). If you asked average users of each
of these languages to transliterate this text, even those who know the Latin
alphabet, the odds of their coming up with exactly these results would be very small.

По своей природе компьютеры могут работать лишь с числами. И для того, чтобы они
могли хранить в памяти буквы или другие символы, каждому такому символу должно
быть поставлено в соответствие число.
=>
Po svoej prirode kompʹûtery mogut rabotatʹ lišʹ s čislami. I dlâ togo, čtoby oni
mogli hranitʹ v pamâti bukvy ili drugie simvoly, každomu takomu simvolu dolžno
bytʹ postavleno v sootvetstvie čislo. (ISO 9)
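Because ISO 9 assigns exactly one Latin character to each Cyrillic letter, it can be sketched as a plain lookup table. The Python below is an illustration only, not the ICU transliterator referenced above (which has its own API). The table holds the ISO 9:1995 values for the 33 Russian letters only, as transcribed here, and should be checked against the standard before any real use. It reproduces the ISO 9 line above and then reverses it exactly.

```python
# ISO 9:1995 values for the Russian alphabet (lowercase), transcribed here
# for illustration; other Cyrillic-script languages need more letters.
ISO9 = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ",
    "ъ": "\u02ba", "ы": "y", "ь": "\u02b9", "э": "è", "ю": "û", "я": "â",
}
# One character in, one character out, and no two letters share an output,
# so the table can be inverted without loss.
REVERSE = {lat: cyr for cyr, lat in ISO9.items()}
assert len(REVERSE) == len(ISO9)


def _map(ch, table):
    low = ch.lower()
    if low not in table:
        return ch  # spaces, punctuation and digits pass through unchanged
    out = table[low]
    return out.upper() if ch != low else out  # keep the original case


def to_latin(text):
    return "".join(_map(ch, ISO9) for ch in text)


def to_cyrillic(text):
    # Assumes the input came from to_latin(); genuine Latin letters in the
    # original text would be ambiguous on the way back.
    return "".join(_map(ch, REVERSE) for ch in text)


sample = ("По своей природе компьютеры могут работать лишь с числами. "
          "И для того, чтобы они могли хранить в памяти буквы или другие "
          "символы, каждому такому символу должно быть поставлено в "
          "соответствие число.")
latin = to_latin(sample)
print(latin)  # same text as the ISO 9 line quoted above, unwrapped
assert to_cyrillic(latin) == sample  # lossless round trip
```

Case is handled by mapping the lowercase letter and re-capitalizing the result. A transcription with multi-letter outputs could not be decoded this simply, which is part of Mark's point about the two kinds of scheme.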

Οι ηλεκτρονικοί υπολογιστές, σε τελική ανάλυση, χειρίζονται απλώς αριθμούς.
Αποθηκεύουν γράμματα και άλλους χαρακτήρες αντιστοιχώντας στο καθένα τους από
έναν αριθμό (ονομάζουμε μία τέτοια αντιστοιχία κωδικοσελίδα).
=>
[...]

Keld Jørn Simonsen | 10 Dec 2003 16:16

Re: Global and national e-mail address


On Wed, Dec 10, 2003 at 09:11:39AM -0500, John C Klensin wrote:
> 
> (iii) If one wants even an approximation to accurate 
> transliteration, the symbol-overloading in Latin scripts is bad 
> news.  E.g., the sound of "ö" (o with diaeresis, U+00F6) is 
> different in, e.g., Swedish and German.  I.e., they are 
> different characters, even if they look the same and even if 
> Unicode "unified" them.

Actually ö is pronounced the same in German and Swedish, and they
come out of the same typographical tradition of combining an o and an
e. But they are regarded differently: in German it is considered an
"o umlaut", while in Swedish it is considered a letter in its own right.
Both Germans and Swedes have a specific sound for the character
when they spell words, and incidentally it is the same sound (more or
less; dialects may vary). If you had cited ö as used in French or in
Dutch (where ö is less frequent), you would have been right. There ö is
considered an o with a diaeresis, placed to indicate that the o-sound
is pronounced on its own.

Another example of letters that are pronounced differently is "i" and
"e", which are pronounced very differently in standard English than in
standard German, French, and the Scandinavian languages.

But I don't think this matters for email addresses: as long as you can
write the correct name, it does not matter how it is pronounced.
E.g., my last name "Simonsen" is pronounced differently in Danish and by
uninitiated English-speaking people, but it fits perfectly into one of
my email addresses, Keld.Simonsen <at> dkuug.dk .
[...]
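Keld's point that a name already written in plain ASCII letters goes straight into an address can be sketched as a syntax check. The Python below is a minimal sketch assuming the unquoted "dot-atom" form of the local part from RFC 2822, the Internet message format standard current at the time; quoted local parts, which RFC 2822 also allows, are not covered. "Keld.Jørn.Simonsen" is a hypothetical address built from the full name in the message header, not one he claims to use.

```python
import string

# Characters RFC 2822 allows in an unquoted atom (its "atext" rule).
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(local_part):
    """True if local_part is one or more atoms joined by single dots,
    i.e. usable to the left of the @ without quoting or any extension."""
    atoms = local_part.split(".")
    return all(atom and set(atom) <= ATEXT for atom in atoms)


print(is_dot_atom("Keld.Simonsen"))       # True
print(is_dot_atom("Keld.Jørn.Simonsen"))  # False: ø is outside ASCII
```

The check says nothing about pronunciation, which matches the argument above: what matters is whether the written name survives, not how it sounds.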


Gmane