Gervase Markham | 9 Dec 2011 12:12

Browser IDN display policy: opinions sought

Recently, Mozilla community member Jothan Frakes was kind enough to do
some research about how different popular web browsers implement IDN,
and when they display the real characters and when they display
Punycode. This is in the context of a Mozilla review of our policy. I am
interested in the opinions of people on this list (see below).

As it turns out, the behaviour of all popular browsers is summarised at
the bottom of a Chromium project document here:
http://www.chromium.org/developers/design-documents/idn-in-google-chrome

The policies fall into 3 approximate buckets:

A (IE, Chrome): Unicode if the (single) 'language' of the string is
configured in the options, Punycode otherwise.

B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
otherwise. Arbitrary script mixing permitted (registry policy used to
prevent abuse).

C (Safari): Unicode if the script is in a whitelist (which by default
does not include Cyrillic or Greek), Punycode otherwise. Not sure about
script mixing.
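
As a rough illustration of the Type B rule above (a sketch only, not Firefox's actual code), the decision might look like this in Python. The whitelist entries and the function name are placeholders invented for this example; the conversion uses Python's built-in "idna" codec.

```python
# Placeholder entries for illustration; this is NOT Mozilla's actual whitelist.
IDN_TLD_WHITELIST = {"de", "jp"}

def display_form_type_b(hostname, whitelist=IDN_TLD_WHITELIST):
    """Return the string a Type B browser would show in the URL bar."""
    # The TLD is the last dot-separated label (ignoring a trailing root dot).
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in whitelist:
        return hostname  # trusted registry policy: show the Unicode form
    # Otherwise fall back to the ASCII-compatible (Punycode) form.
    return hostname.encode("idna").decode("ascii")
```

The same domain therefore displays identically in every copy of the browser, since the outcome depends only on the shipped whitelist and not on any user setting.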

Firefox has historically resisted adopting a Type A policy because we
consider it seriously detrimental to IDN adoption and use. It seems to
me that IDN can never be reliable for site owners, and therefore will
not succeed, if a significant proportion of the world's browsers adopt
Type A or Type C policies. This is because site owners can never know
what proportion of their visitors will see gobbledegook in the URL bar
rather than their nice domain name. Perhaps for sites whose visitors are

JFC Morfin | 9 Dec 2011 15:22

Re: Browser IDN display policy: opinions sought

Dear Gervase,

IMHO, if we try to force everyone to be happy in the one way we 
would want, and use mandatory lists, we will make no one happy. IDNofA 
as an architecture is broken, and your questions demonstrate it: five 
sister applications, at least three roughly different behaviours, 
and possibly five different IP resolutions.

The only stable solution is a single common IDNApplication because 
there is a single common Internet DNS. This was not in our charter, 
but it is what Lisa Dussault wished us to address once we had agreed on 
the WG/LC. Either the IAB eventually documents it, or we will 
disseminate it. The point is in the "WITISWISISWIS" IDNApplications 
requirement: what I type is what I see is what I send.

Let me expand on that:

- there are 22500 language entities throughout the world listed by the 
FLOSS Linguasphere database.
- there will be billions of TLDs and trillions of IDNs supported by 
IDv6 (IPv6 ID used on a global basis)
- the open code browsers will be forked to provide transparent IDNA support.

Why not just keep the browsers transparent and support your 3 
buckets as a single option set (one can choose the default depending 
on the pattern of local needs)? As they are, I really fail to see how 
each of these buckets can clearly and easily support plurilingual 
users or multilingual applications, or be used in an airport or hotel cybercafe.

jfc

Paul Hoffman | 9 Dec 2011 16:34

Re: Browser IDN display policy: opinions sought

On Dec 9, 2011, at 3:12 AM, Gervase Markham wrote:

> The policies fall into 3 approximate buckets:
> 
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.
> 
> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).
> 
> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.

Without understanding both how a TLD gets on "a whitelist", and how "registry policy (is) used to prevent
abuse", we cannot evaluate whether A or B would be better for Firefox. This information is critical to the analysis.

> By contrast, with a Type B policy, if your IDN domain works in one copy
> of Firefox, it works in them all. If everyone had Type B policies, there
> would be no risk of a properly-registered domain coming up as gibberish.

If Firefox (and Opera) were the only browsers that the site operator cared about, this would be good.
However, I believe that is true for approximately 0% of the sites in the world. (The same would be true if
there was a "D" that only applied to Chrome.)

> It has been suggested that Firefox switch to a Type A policy. As it is,
> the mix of policies means that the goal of universal acceptability is
> not being met anyway. Firefox switching to Type A would also not meet
> that goal by itself, but one could argue that there's a bit more

Gervase Markham | 9 Dec 2011 16:43

Re: Browser IDN display policy: opinions sought

On 09/12/11 15:34, Paul Hoffman wrote:
> Without understanding both how a TLD gets on "a whitelist", and how
> "registry policy (is) used to prevent abuse", we cannot evaluate
> whether A or B would be better for Firefox. This information is
> critical to the analysis.

OK :-)

Mozilla's current policy on such things is listed here:
http://www.mozilla.org/projects/security/tld-idn-policy-list.html

We try to avoid being prescriptive about what method registries should
use to avoid homograph problems. The obvious ones are blocking and
bundling, but some registries have come up with very creative solutions
- one registry registering Cyrillic domains, for example, requires that
every domain contain at least one Cyrillic letter which is not an
ASCII-confusable.
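
To make that last rule concrete, here is a minimal Python sketch of such a check. It is only an illustration: the set of look-alike letters below is a small hand-picked subset chosen for this example, not the registry's actual table, and the function names are hypothetical.

```python
# Illustrative subset of Cyrillic letters that resemble ASCII letters.
# Written as escapes so they cannot be mistaken for their Latin twins.
ASCII_LOOKALIKES = {
    "\u0430",  # а looks like a
    "\u0441",  # с looks like c
    "\u0435",  # е looks like e
    "\u043e",  # о looks like o
    "\u0440",  # р looks like p
    "\u0445",  # х looks like x
    "\u0443",  # у looks like y
    "\u0456",  # і looks like i
    "\u0458",  # ј looks like j
    "\u0455",  # ѕ looks like s
}

def is_cyrillic(ch):
    """True for characters in the basic Cyrillic block (U+0400..U+04FF)."""
    return "\u0400" <= ch <= "\u04ff"

def has_distinctive_cyrillic(label):
    """True if the label has at least one Cyrillic letter with no ASCII twin."""
    return any(
        is_cyrillic(ch) and ch not in ASCII_LOOKALIKES
        for ch in label.lower()
    )
```

A label built entirely from look-alikes fails the check, so it cannot pass for an ASCII name, while ordinary Cyrillic words pass easily.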

There is a certain amount of judgement applied as to whether a
registry's policies are adequate. While we try and be consistent and
fair, that potentially can lead to accusations of unreasonable treatment.

>> By contrast, with a Type B policy, if your IDN domain works in one
>> copy of Firefox, it works in them all. If everyone had Type B
>> policies, there would be no risk of a properly-registered domain
>> coming up as gibberish.
> 
> If Firefox (and Opera) were the only browsers that the site operator
> cared about, this would be good. However, I believe that is true for
> approximately 0% of the sites in the world. (The same would be true

Paul Hoffman | 9 Dec 2011 17:46

Re: Browser IDN display policy: opinions sought

On Dec 9, 2011, at 7:43 AM, Gervase Markham wrote:

> Mozilla's current policy on such things is listed here:
> http://www.mozilla.org/projects/security/tld-idn-policy-list.html
> 
> We try to avoid being prescriptive about what method registries should
> use to avoid homograph problems. The obvious ones are blocking and
> bundling, but some registries have come up with very creative solutions
> - one registry registering Cyrillic domains, for example, requires that
> every domain contain at least one Cyrillic letter which is not an
> ASCII-confusable.
> 
> There is a certain amount of judgement applied as to whether a
> registry's policies are adequate. While we try and be consistent and
> fair, that potentially can lead to accusations of unreasonable treatment.

Thank you for that explanation. So, "if your IDN domain works in one copy of Firefox, it works in them all",
but that might change over time if a TLD changes its policies. But we know that many TLDs very much want to
change their policies with respect to bundling.

>>> It has been suggested that Firefox switch to a Type A policy. As it
>>> is, the mix of policies means that the goal of universal
>>> acceptability is not being met anyway. Firefox switching to Type A
>>> would also not meet that goal by itself, but one could argue that
>>> there's a bit more consistency to browser behaviour.
>> 
>> That has been my feeling all along, although I stopped expressing it
>> a while ago when it seemed like the Firefox team would never change.
>> I'm glad to hear that the discussion is opening up.
> 

Gervase Markham | 9 Dec 2011 18:25

Re: Browser IDN display policy: opinions sought

On 09/12/11 16:46, Paul Hoffman wrote:
> Thank you for that explanation. So, "if your IDN domain works in one
> copy of Firefox, it works in them all", but that might change over
> time if a TLD changes its policies. But we know that many TLDs very
> much want to change their policies with respect to bundling.

Without disputing your assertion, I'd be very interested in more
information if you have it. Why do TLDs not like bundling?

> No; yes. :-) I think that if type A were not justified, there would
> have been many complaints about it over the years; I haven't heard
> any.

And you think that take-up and real-world use of IDNs on the consumer
Internet has been broad enough that we would have heard them?

>> If you would, in an ideal world, prefer everyone to be Type B,
>> would you be interested in a push to try and persuade other browser
>> makers to change tack instead of Firefox?
> 
> My ideal world is much closer to A than B. It is, in fact, one of the
> reasons that I switched to Chrome for my day-to-day work browsing
> that involves IDNs.

So do you add scripts for languages you don't speak to the Chrome
preferences to get the IDNs to render?

>> I wonder how we can get some statistics on that?
> 
> We can't. It is inherently impossible to measure because if some

Paul Hoffman | 9 Dec 2011 18:57

Re: Browser IDN display policy: opinions sought

On Dec 9, 2011, at 9:25 AM, Gervase Markham wrote:

> On 09/12/11 16:46, Paul Hoffman wrote:
>> Thank you for that explanation. So, "if your IDN domain works in one
>> copy of Firefox, it works in them all", but that might change over
>> time if a TLD changes its policies. But we know that many TLDs very
>> much want to change their policies with respect to bundling.
> 
> Without disputing your assertion, I'd be very interested in more
> information if you have it. Why do TLDs not like bundling?

Sorry, I was not clear: many TLDs want to start bundling. That is, many TLDs might be changing their policies,
and that might change what users see in Firefox.

>> No; yes. :-) I think that if type A were not justified, there would
>> have been many complaints about it over the years; I haven't heard
>> any.
> 
> And you think that take-up and real-world use of IDNs on the consumer
> Internet has been broad enough that we would have heard them?

Absolutely yes. You might not see them browsing the web from the US, but if you travel to other countries
(particularly Asia and many parts of Europe), you see them all the time.

>>> If you would, in an ideal world, prefer everyone to be Type B,
>>> would you be interested in a push to try and persuade other browser
>>> makers to change tack instead of Firefox?
>> 
>> My ideal world is much closer to A than B. It is, in fact, one of the
>> reasons that I switched to Chrome for my day-to-day work browsing

Andrew Sullivan | 9 Dec 2011 19:18

Re: Browser IDN display policy: opinions sought

On Fri, Dec 09, 2011 at 09:57:26AM -0800, Paul Hoffman wrote:
> 
> That may "surely" be the point for you, but I suspect you are in the
> minority. For me, the point is to have my name rendered as well as
> possible in as many places as possible. I won't bother getting a domain
> name that can't render well anywhere, but I will get one if I believe
> that it will look better than the inaccurate all-ASCII one I have now.
> You may be over-applying what you think surely should be others'
> expectations on people who have different desires.
> 

I think you're generalizing on what you think people believe about the
browser world, and I think Gerv has an excellent point.  A major
problem with approach A is that there is no way whatever to predict
how a given IDN will appear for people on the Internet.  That, at
least, is one of the complaints I've heard a lot of this year.

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Gervase Markham | 12 Dec 2011 11:42

Re: Browser IDN display policy: opinions sought

On 09/12/11 17:57, Paul Hoffman wrote:
>> On 09/12/11 16:46, Paul Hoffman wrote:
>>> Thank you for that explanation. So, "if your IDN domain works in
>>> one copy of Firefox, it works in them all", but that might change
>>> over time if a TLD changes its policies. But we know that many
>>> TLDs very much want to change their policies with respect to
>>> bundling.
>> 
>> Without disputing your assertion, I'd be very interested in more 
>> information if you have it. Why do TLDs not like bundling?
> 
> Sorry, I was not clear: many TLDs want to start bundling. That is,
> many TLDs might be changing their policies, and that might change
> what users see in Firefox.

Well yes, in that it would perhaps make them eligible for inclusion,
which means more IDNs would work, which would be a good thing.

If TLDs wanted to _stop_ bundling, that would be a problem.

>>> No; yes. :-) I think that if type A were not justified, there
>>> would have been many complaints about it over the years; I
>>> haven't heard any.
>> 
>> And you think that take-up and real-world use of IDNs on the
>> consumer Internet has been broad enough that we would have heard
>> them?
> 
> Absolutely yes. You might not see them browsing the web from the US,


Michel Suignard | 9 Dec 2011 18:26

RE: Browser IDN display policy: opinions sought

I am pretty much on Paul's side here. I was somewhat involved in the IE design concerning the display policy,
and although I knew the Firefox choice then, I never had much sympathy for it. It basically meant that you
were blacklisting large TLDs forever because you could not guarantee that all registered entries would
ever be 'safe'. And I don't want to go into how you establish public rules determining which TLDs are
blacklisted. It is a risky business to be in if you are a commercial entity.
Putting the user in charge through 'accepted' languages seemed to me a much better alternative. It has the
unfortunate side effect of isolating English-speaking folks from the benefits of IDN, but that can be worked out by
educating users. And it still offers a greater level of safety while allowing domain users of any TLD to
experiment with IDN.
Personally I still have no doubt that the A) solution was the best then and is still the best today.

Michel
Totally speaking for myself. I no longer have any link with Microsoft or IE design decisions.
Gervase Markham | 12 Dec 2011 12:16

Re: Browser IDN display policy: opinions sought

On 09/12/11 17:26, Michel Suignard wrote:
> I am pretty much on Paul's side here. I was somewhat involved in the
> IE design concerning the display policy and although I knew the
> Firefox choice then I never had much sympathy for it. It basically
> meant that you were blacklisting large TLDs forever because you could
> not guarantee that all registered entries would ever be 'safe'. 

Not so; if asked, I was perfectly willing (at least, during the couple
of years after we instituted the policy) to consider grandfathering
existing registrations. Any TLD which came to me and said "we have a
problem with some existing domains, but we want to make things better
going forward - here's our new policy" would have received a very
sympathetic hearing.

As it turned out, no-one asked.

> And I
> don't want to go into how you establish public rules determining
> which TLDs are blacklisted. It is a risky business to be in if you
> are a commercial entity. 

Mozilla may not be quite as sueable as Microsoft, but I don't think
that's down /per se/ to us being a non-profit.

> Putting the user in charge through
> 'accepted' languages seemed to me a much better alternative. 

If they even realise that they have been put in charge in that way. I
suspect very few users would jump from "this domain name looks funny" to
"my language settings are wrong".

Martin J. Dürst | 14 Dec 2011 12:06

Re: Browser IDN display policy: opinions sought

On 2011/12/12 20:16, Gervase Markham wrote:
> On 09/12/11 17:26, Michel Suignard wrote:

>> Putting the user in charge through
>> 'accepted' languages seemed to me a much better alternative.
>
> If they even realise that they have been put in charge in that way. I
> suspect very few users would jump from "this domain name looks funny" to
> "my language settings are wrong".
>
>> It has
>> the unfortunate side effect of isolating English speaking folks from
>> IDN benefit

So it was essentially a script-based solution, but with English as a 
separate case, as I suspected in my earlier mail? Or something else?

>> but that can be worked out by educating users.
>
> Has Microsoft, to your knowledge, taken any steps to do such education?

Yes, they have. As I explained in another mail, when punycode is 
displayed, you get a message, which leads (among other things) to some help pages.

Regards,    Martin.
Andrew Sullivan | 9 Dec 2011 18:52

Re: Browser IDN display policy: opinions sought

On Fri, Dec 09, 2011 at 08:46:36AM -0800, Paul Hoffman wrote:
> 
> No; yes. :-) I think that if type A were not justified, there would
> have been many complaints about it over the years; I haven't heard any.
> 

I have spent much of the summer and fall listening to people complain
about how browsers handle IDNs.  The meetings I had with people from
India on this topic might as well have included the words, "A pox on
all their houses."  Consistency would be an improvement, I think, but
barely.

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Paul Hoffman | 9 Dec 2011 19:10

Re: Browser IDN display policy: opinions sought

On Dec 9, 2011, at 9:52 AM, Andrew Sullivan wrote:

> On Fri, Dec 09, 2011 at 08:46:36AM -0800, Paul Hoffman wrote:
>> 
>> No; yes. :-) I think that if type A were not justified, there would
>> have been many complaints about it over the years; I haven't heard any.
>> 
> 
> I have spent much of the summer and fall listening to people complain
> about how browsers handle IDNs.  The meetings I had with people from
> India on this topic might as well have included the words, "A pox on
> all their houses."  Consistency would be an improvement, I think, but
> barely.

Were people complaining about A-type browsers, B-type browsers, or both? I note that none of the TLDs
commonly used in India are ones that Mozilla acknowledges as IDN-enabled.

--Paul Hoffman
Andrew Sullivan | 9 Dec 2011 19:19

Re: Browser IDN display policy: opinions sought

On Fri, Dec 09, 2011 at 10:10:04AM -0800, Paul Hoffman wrote:
> 
> Were people complaining about A-type browsers, B-type browsers, or
> both? I note that none of the TLDs commonly used in India are ones that
> Mozilla acknowledges as IDN-enabled.

Both.  And yes, they were complaining about that, too.

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Andrew Sullivan | 9 Dec 2011 18:49

Re: Browser IDN display policy: opinions sought

On Fri, Dec 09, 2011 at 11:12:29AM +0000, Gervase Markham wrote:
> 
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.

The problem with this, of course, is that in many cases there's no way
to tell what language a string is in.  If you get things all in the
Arabic Script, for instance, what language are you in?  And Latin is a
disaster for this.  

It would be quite another matter if there were a way for a zone
operator to publish somehow what languages (or maybe scripts?) they
support.  In that case, you could look it up and know what to do (you
might not even have to do this quickly).

> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).

The problem here has always been both the notion of TLD and the
whitelist maintenance.  The publicsuffix.org list is, in effect, a
lookaside root list, and it makes me extremely uncomfortable.
Moreover, what do you do about things lower in the tree?

> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.

This approach sucks in all the ways you say.  I think it is the worst
option.

Mark Davis ☕ | 9 Dec 2011 19:10

Re: Browser IDN display policy: opinions sought

I'm not familiar with the code, but I think that (A) may actually be:

A (IE, Chrome): Unicode if the (single) 'script' of the string matches one of the scripts of the user's language(s) in the options,
Punycode otherwise.

It is pretty easy and reliable to detect the script of the string, whereas language detection would be unreliable.

(It would be possible to match the characters of the string against the "customary" characters used in the user's languages in the options, but that would be trickier, and is probably not worth it.)
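
A minimal Python sketch of the script-based reading of (A) follows. It rests on assumptions made purely for illustration: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name (LATIN, CYRILLIC, GREEK, ...) is used as a rough stand-in, and the function names are hypothetical. A real implementation would use proper script data.

```python
import unicodedata

def scripts_of(label):
    """Approximate the set of scripts used by the letters in a label."""
    found = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def show_unicode(label, user_scripts):
    """Show Unicode only for a single-script label in a script the user reads."""
    found = scripts_of(label)
    return len(found) == 1 and found <= set(user_scripts)
```

For example, show_unicode("viag\u00e9nie", {"LATIN"}) is true, while a label mixing Latin and Cyrillic letters is rejected whatever the user's settings are.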

Mark
— Il meglio è l’inimico del bene —


On Fri, Dec 9, 2011 at 09:49, Andrew Sullivan <ajs <at> anvilwalrusden.com> wrote:
> This approach sucks in all the ways you say.  I think it is the worst
> option.
>
> I think that the right approach would be A _if_ you could get the
> advantages of B automatically somehow.  At the moment, however, I
> think all the answers are bad ones.

_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update
Gervase Markham | 12 Dec 2011 11:54

Re: Browser IDN display policy: opinions sought

On 09/12/11 18:10, Mark Davis ☕ wrote:
> I'm not familiar with the code, but I think that (A) may actually be:
> 
> A (IE, Chrome): Unicode if the (single) 'script' of the string matches
> one of the scripts of the user's language(s) in the options,
> Punycode otherwise.
> 
> It is pretty easy and reliable to detect the script of the string,
> whereas language detection would be unreliable.

I can quite believe it may be something like this; but how does one deal
with the impedance mismatch that users think they are defining
languages, but what you need is scripts? Does IE keep a script/language
mapping? Is that data (perhaps compiled by others) publicly available
somewhere, e.g. from the Unicode consortium?

Gerv
Martin J. Dürst | 14 Dec 2011 12:02

Re: Browser IDN display policy: opinions sought

On 2011/12/12 19:54, Gervase Markham wrote:
> On 09/12/11 18:10, Mark Davis ☕ wrote:
>> I'm not familiar with the code, but I think that (A) may actually be:
>>
>> A (IE, Chrome): Unicode if the (single) 'script' of the string matches
>> one of the scripts of the user's language(s) in the options,
>> Punycode otherwise.
>>
>> It is pretty easy and reliable to detect the script of the string,
>> whereas language detection would be unreliable.

I have to correct myself. In another mail, I was writing that I was 
quite sure that Mark's correction applied. But by playing around with 
IE, I found out that this may only partially be the case.

I looked at http://www.viagénie.com/ in IE (IE8 on Win7), and it showed 
punycode. I then added "en" (English) to my language preferences (which 
were just "ja" (Japanese) out of the box because I rarely use IE). 
viagénie was still shown in punycode. Then I added "de" (German), and 
now viagénie was shown. So either IE uses a separate "script" category 
"ASCII-only" (but the algorithm would still be script-oriented at the 
core) or the letters for a language are taken rather widely, with German 
including French accented letters and so on (which would be a 
language-only algorithm).

Michel, if you know any details (that you can talk about), it would be 
nice to hear from you.

When showing punycode, IE also displayed a one-line message just above 
the page itself and below the chrome (tabs and stuff), saying 
(translating back from Japanese) "This Web address contains letters or 
symbols that cannot be displayed with the current language settings. If 
you click here, options will be displayed...". When clicking, I get the 
options of changing my language settings, of not displaying the message 
anymore, or of getting some further explanations or help.

> I can quite believe it may be something like this; but how does one deal
> with the impedance mismatch that users think they are defining
> languages, but what you need is scripts? Does IE keep a script/language
> mapping? Is that data (perhaps compiled by others) publicly available
> somewhere, e.g. from the Unicode consortium?

Some of the data is in the suppress-script fields in the language subtag 
registry at IANA. At 
http://www.iana.org/assignments/language-subtag-registry, if you see 
something like:

%%
Type: language
Subtag: af
Description: Afrikaans
Added: 2005-10-16
Suppress-Script: Latn
%%

then Suppress-Script: Latn tells you that Afrikaans is, for all intents 
and purposes, written with the Latin script. This information isn't 
complete (given the number of languages in the subtag registry, that 
shouldn't be a surprise), but I'd say it's highly accurate where it's 
there, and it's there for most of the major languages for which it can 
be reasonably provided.
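
As a sketch of how that data could be pulled out, the following Python reads records in the registry's "%%"-separated format and builds a language-to-script table. The sample text is just the Afrikaans record quoted above; the function name is made up for this example, and a real consumer would download the full registry and also handle folded continuation lines, which this sketch simply skips.

```python
SAMPLE = """\
%%
Type: language
Subtag: af
Description: Afrikaans
Added: 2005-10-16
Suppress-Script: Latn
%%
"""

def suppress_scripts(registry_text):
    """Map language subtags to their Suppress-Script value, where present."""
    mapping = {}
    for record in registry_text.split("%%"):
        fields = {}
        for line in record.splitlines():
            # Skip blank lines and folded continuation lines (leading space).
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if fields.get("Type") == "language" and "Suppress-Script" in fields:
            mapping[fields["Subtag"]] = fields["Suppress-Script"]
    return mapping

print(suppress_scripts(SAMPLE))  # prints {'af': 'Latn'}
```

Languages without a Suppress-Script field are simply absent from the result, which matches the incompleteness noted above.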

For character coverage needed for a language, CLDR (the Unicode Common 
Locale Data Repository, http://cldr.unicode.org) provides quite a lot of 
data to work with, although you may want to have a closer look or talk 
with somebody more familiar with the data and processes before you work 
on a particular application.

While I'm mentioning data sources, I also wanted to mention 
http://www.unicode.org/reports/tr36/, Unicode Security Considerations, 
and http://www.unicode.org/reports/tr39/, Unicode Security Mechanisms, 
and the data sources mentioned there. I'm very surprised that nobody has 
mentioned them, because I think they are extremely relevant and helpful 
for our discussion and for actual implementations.

Regards,   Martin.
Michel Suignard | 14 Dec 2011 19:12

RE: Browser IDN display policy: opinions sought

> I looked at http://www.viagénie.com/ in IE (IE8 on Win7),
> and it showed punycode. I then added "en" (English) to my
> language preferences (which were just "ja" (Japanese)
> out of the box because I rarely use IE). 
> viagénie was still shown in punycode. Then I added "de"
> (German), and now viagénie was shown. So either IE uses
> a separate "script" category "ASCII-only" (but the algorithm
> would still be script-oriented at the core) or the letters for
> a language are taken rather widely, with German including
> French accented letters and so on (which would be a
> language-only algorithm).
>
> Michel, if you know any details (that you can talk about),
> it would be nice to hear from you.

Martin, you are correct: enabling any Latin-based language other than English unlocks IDN for Latin
script in IE. I was never a fan of blocking IDN for English users, but I was not part of the IE team and that was
their decision. Given that new devices are able to show most U-labels without installing new fonts, I agree that
nowadays browsers should show them. And being in charge of creating charts for all of them in both Unicode
and 10646, I can tell you it is no small feat.

I also found some public information about the whitelist that IE uses for script mixing. It is a bit old
(2006), and I don't think it has changed, though I obviously can't be sure. Check
http://blogs.msdn.com/b/ie/archive/2006/07/31/684337.aspx

Michel
Ken Whistler | 14 Dec 2011 23:36

Re: Browser IDN display policy: opinions sought

On 12/14/2011 3:02 AM, "Martin J. Dürst" wrote:
> On 2011/12/12 19:54, Gervase Markham wrote:
>
>
>> I can quite believe it may be something like this; but how does one deal
>> with the impedance mismatch that users think they are defining
>> languages, but what you need is scripts? Does IE keep a script/language
>> mapping? Is that data (perhaps compiled by others) publicly available
>> somewhere, e.g. from the Unicode consortium?
>
>
>
> For character coverage needed for a language, CLDR (the Unicode Common 
> Locale Data Repository, http://cldr.unicode.org) provides quite a lot 
> of data to work with, although you may want to have a closer look or 
> talk with somebody more familiar with the data and processes before 
> you work on a particular application.
>
>
>

Just following up this particular query about publicly available data
about script/language mapping, CLDR also makes available specific charts
which specify the (commonly used) scripts for a large number of
languages, including nearly all of the languages which would be used
for IDNs. See:

http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/languages_and_scripts.html

and the reverse indexed:

http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/scripts_and_languages.html

Although this data is not perfect or complete for *all* languages, it is
a very good statement of 99.9% of the significant facts of usage
relevant to the issues being debated on this thread, IMO.

Anyone making use of this data would need to become familiar with its
source, supplementalData.xml in the CLDR releases, and know something
about the extensions which CLDR makes to the Unicode notion of "script",
before just blindly implementing it. For example, the Japanese
*language* is identified as being written with the Japanese *script* in
languages_and_scripts.html. The Japanese "script" actually refers to the
Japanese writing system, which combines several scripts, but which, for
various implementation reasons, is identified in CLDR with an aggregated
script identifier. And so on.

However, I think this is the kind of machine-readable information that
Gervase was inquiring about.

Note also that CLDR is an ongoing project responsive to public input and
feedback, so if there are deficiencies, omissions, or outright errors in
the script and language data, the CLDR project would like to hear about
them via bug reports. See:

http://cldr.unicode.org/

--Ken

Gervase Markham | 12 Dec 2011 12:15

Re: Browser IDN display policy: opinions sought

On 09/12/11 17:49, Andrew Sullivan wrote:
> The problem here has always been both the notion of TLD and the
> whitelist maintenance.  The publicsuffix.org list is, in effect, a
> lookaside root list, and it makes me extremely uncomfortable.
> Moreover, what do you do about things lower in the tree?

The publicsuffix.org list and the IDN TLD whitelist are two separate
entities and used for different purposes (although I happen to be
involved in them both). Let's not get them mixed up :-)

>> C (Safari): Unicode if the script is in a whitelist (which by default
>> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
>> script mixing.
> 
> This approach sucks in all the ways you say.  I think it is the worst
> option.
> 
> I think that the right approach would be A _if_ you could get the
> advantages of B automatically somehow.

And ponies! ;-)

> Note, too, that if the root zone expands the way it sometimes
> threatens to, the whitelist approach will become impractical without
> an awful lot of failures.

Indeed. It is with an eye to that future event that we are re-evaluating
our position.

Gerv
Andrew Sullivan | 12 Dec 2011 16:23

Re: Browser IDN display policy: opinions sought

On Mon, Dec 12, 2011 at 11:15:55AM +0000, Gervase Markham wrote:
> 
> The publicsuffix.org list and the IDN TLD whitelist are two separate
> entities and used for different purposes (although I happen to be
> involved in them both). Let's not get them mixed up :-)

They work on the same principle, however: what matters is the policy of
some group other than the group that makes the delegation from the
root.

> > 
> > I think that the right approach would be A _if_ you could get the
> > advantages of B automatically somehow.
> 
> And ponies! ;-)

Well, it's bound to sound that way if you don't take seriously the
idea that there might be a way to figure these things out.

Suppose that zone operators (not just the root or TLDs, but any random
zone you liked) had a mechanism by which you could look up their
policies for, say, code point inclusion.  That is, I'm RegyCo, and I
run .example.  I put an SRV or URI or something record at .example
that points you to a policy document that tells you what code point
ranges are permitted together in a single label in my zone, and also
(for that matter) what code points I will register _at all_.  Now you
are in a position to decide whether you think my policy is sensible;
and you are also in a position to decide whether any given label
actually meets my own stated policies.  Finally, since this forms the
basis for a filter in your software, you have the ability to set a
default for your users that makes sense, but also a way for people who
want it to get the benefits of the most permissive settings available
under approach A.  Finally, it wouldn't involve a massive scaling
problem facing the whitelist in the case the root zone increases
dramatically in size, since most of the work (all?) could be
automated.
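
A minimal sketch of the label check described above, in Python. The
policy format here is an assumption made up for illustration: nothing in
this thread defines one. `ZonePolicy`, `EXAMPLE_POLICY` and the code
point ranges are hypothetical, and a real policy would be fetched from
wherever the zone's record points.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ZonePolicy:
    # Hypothetical format: each group is a tuple of inclusive
    # (start, end) code point ranges permitted together in one label.
    groups: tuple

def in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def label_meets_policy(label, policy):
    """True if all characters of the label fit inside one single group."""
    return any(all(in_ranges(ch, group) for ch in label)
               for group in policy.groups)

def display_form(label, policy):
    """Show the Unicode label if the policy allows it, else the A-label."""
    if label.isascii() or label_meets_policy(label, policy):
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

# Illustrative ranges only: lowercase letters, digits and hyphen.
LATIN = ((0x61, 0x7A), (0x30, 0x39), (0x2D, 0x2D))
CYRILLIC = ((0x0430, 0x044F), (0x30, 0x39), (0x2D, 0x2D))
EXAMPLE_POLICY = ZonePolicy(groups=(LATIN, CYRILLIC))
```

With this example policy, an all-Cyrillic label passes, while a label
mixing Latin letters with a Cyrillic "а" fits neither group and is shown
in its "xn--" form.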

Best,
A
-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Gervase Markham | 12 Dec 2011 17:54

Re: Browser IDN display policy: opinions sought

On 12/12/11 15:23, Andrew Sullivan wrote:
> On Mon, Dec 12, 2011 at 11:15:55AM +0000, Gervase Markham wrote:
>>
>> The publicsuffix.org list and the IDN TLD whitelist are two separate
>> entities and used for different purposes (although I happen to be
>> involved in them both). Let's not get them mixed up :-)

Hey, if the equivalent to the PSL info was published in DNS instead by
everyone, I'd be the first to applaud :-)

>> And ponies! ;-)
> 
> Well, it's bound to sound that way if you don't take seriously the
> idea that there might be a way to figure these things out.

There was a winking smiley ;-)

> Suppose that zone operators (not just the root or TLDs, but any random
> zone you liked) had a mechanism by which you could look up their
> policies for, say, code point inclusion.  That is, I'm RegyCo, and I
> run .example.  I put an SRV or URI or something record at .example
> that points you to a policy document that tells you what code point
> ranges are permitted together in a single label in my zone, and also
> (for that matter) what code points I will register _at all_.  Now you
> are in a position to decide whether you think my policy is sensible;
> and you are also in a position to decide whether any given label
> actually meets my own stated policies.

If I am to do such a check (and presumably to fail if the domain doesn't
meet it), what about when a policy changes to be more strict? How do you
deal with grandfathering?

What about performance? I would need to look up the rules for the zone
"foo.com" every time I accessed bar.foo.com, for lots of values of foo.
This doesn't sound like it would improve performance.

If there are going to be rules, by far the best place to enforce them is
once at domain registration time, not in real time in performance
critical code millions of times a day at access time.

Approaches like this were considered when we initially made the decision
to go with the solution we have. The bottom of
http://www.mozilla.org/projects/security/tld-idn-policy-list.html
says:

"The Moz/Opera anti-spoofing mechanism is the result of widespread
public analysis and discussion, and has the following advantages:

...

    It is simple to code and deploy: about ten lines of code for the
Mozilla implementation.
...
    It is the sole survivor of a large number of alternative proposals
that were considered and rejected. Unlike most of the other rejected
proposals, it does not need any modifications to the DNS protocol, or
distribution of "language" codes for labels, nor does it require
multiple DNS lookups, large character tables in the browser, or
real-time access to WHOIS information.
...
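
For comparison, the TLD-whitelist rule can be sketched in a few lines of
Python. This is only an illustration of the idea, not the Mozilla code:
the whitelist entries are placeholders, and the function names are
invented for this example.

```python
# Placeholder entries; the real list is maintained by the browser vendor.
IDN_TLD_WHITELIST = frozenset({"de", "jp"})

def decode_label(label):
    """Convert an 'xn--' A-label to Unicode; leave other labels alone."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label

def display_hostname(hostname):
    """Show Unicode only when the TLD is whitelisted, else keep Punycode."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] not in IDN_TLD_WHITELIST:
        return ".".join(labels)
    return ".".join(decode_label(label) for label in labels)
```

No network access or character tables are involved: the only input is
the hostname and a static set of TLDs.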

Gerv
Paul Hoffman | 12 Dec 2011 18:24

Re: Browser IDN display policy: opinions sought

Not speaking for Andrew, just myself:

On Dec 12, 2011, at 8:54 AM, Gervase Markham wrote:

> If I am to do such a check (and presumably to fail if the domain doesn't
> meet it), what about when a policy changes to be more strict? How do you
> deal with grandfathering?

The zone owner deals with grandfathering. They publish a policy that reflects all of the zones they control.

If you are asking "how does a browser deal with grandfathering when a policy changes?", I would say the same
thing: assume that the zone owner has reasons for publishing that policy.

> What about performance? I would need to look up the rules for the zone
> "foo.com" every time I accessed bar.foo.com, for lots of values of foo.
> This doesn't sound like it would improve performance.

A sane browser would look up policies based on the browser's own policy update strategy, and would risk
missing a policy update. Further, given that the policies would have TTLs on them, you don't even need to
think about looking up a policy until the TTL expires. Further, the browser will know whether or not the
zone had a policy before and base its lookup strategy on that.
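
A rough sketch of that caching behaviour in Python, under stated
assumptions: `fetch` is a hypothetical callable that returns a
`(policy, ttl_seconds)` pair for a zone, with `policy` set to `None`
when the zone publishes nothing, so that "no policy" is remembered too.

```python
import time

class PolicyCache:
    """Remember per-zone display policies until their TTL runs out."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # hypothetical: zone -> (policy, ttl_seconds)
        self._clock = clock
        self._entries = {}     # zone -> (policy, expires_at)

    def get(self, zone):
        now = self._clock()
        entry = self._entries.get(zone)
        if entry is not None and now < entry[1]:
            return entry[0]    # still fresh: no lookup needed
        policy, ttl = self._fetch(zone)
        self._entries[zone] = (policy, now + ttl)
        return policy
```

A lookup for a zone only happens on the first visit or after expiry;
every other access is answered from the cache.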

Or was that meant to be another strawman objection?

> If there are going to be rules, by far the best place to enforce them is
> once at domain registration time, not in real time in performance
> critical code millions of times a day at access time.

Fully disagree. That restricts TLDs to never changing their policies. A browser vendor might want this
convenience, but there are plenty of people who would like the browser vendors to be more responsive to
changes than that so that IDNs can be more useful.

> Approaches like this were considered when we initially made the decision
> to go with the solution we have. The bottom of
> http://www.mozilla.org/projects/security/tld-idn-policy-list.html
> says:
> 
> "The Moz/Opera anti-spoofing mechanism is the result of widespread
> public analysis and discussion, and has the following advantages:
> 
> ...
> 
>    It is simple to code and deploy: about ten lines of code for the
> Mozilla implementation.
> ...
>    It is the sole survivor of a large number of alternative proposals
> that were considered and rejected. Unlike most of the other rejected
> proposals, it does not need any modifications to the DNS protocol, or
> distribution of "language" codes for labels, nor does it require
> multiple DNS lookups, large character tables in the browser, or
> real-time access to WHOIS information.
> ...

Noted. So, if other browser vendors adopt a different approach, are you saying Mozilla won't? I thought the
purpose of this thread was to revisit the question of what would be best for the users.

--Paul Hoffman
Gervase Markham | 12 Dec 2011 18:33

Re: Browser IDN display policy: opinions sought

On 12/12/11 17:24, Paul Hoffman wrote:
> Not speaking for Andrew, just myself:
> 
> On Dec 12, 2011, at 8:54 AM, Gervase Markham wrote:
> 
>> If I am to do such a check (and presumably to fail if the domain
>> doesn't meet it), what about when a policy changes to be more
>> strict? How do you deal with grandfathering?
> 
> The zone owner deals with grandfathering. They publish a policy that
> reflects all of the zones they control.

So published policies can only ever be loosened, not tightened?

(If they were tightened, existing domains might fail the new policy,
which would result in them being blocked. This is what I mean by the
grandfathering problem.)

> If you are asking "how does a browser deal with grandfathering when a
> policy changes?", I would say the same thing: assume that the zone
> owner has reasons for publishing that policy.

So just block the domains, then?

>> What about performance? I would need to look up the rules for the
>> zone "foo.com" every time I accessed bar.foo.com, for lots of
>> values of foo. This doesn't sound like it would improve
>> performance.
> 
> A sane browser would look up policies based on the browser's own
> policy update strategy, and would risk missing a policy update.
> Further, given that the policies would have TTLs on them, you don't
> even need to think about looking up a policy until the TTL expires.
> Further, the browser will know whether or not the zone had a policy
> before and base its lookup strategy on that.

Except if it's the first time the user has visited the site, or the user
has cleared their browsing history, or is using private browsing, or
their cache has been cleared due to memory pressure on their mobile
device, or...

> Or was that meant to be another strawman objection?

Oh, no.

Historically, the idea that a browser will burden the Internet and its
own speed characteristics with millions or billions of additional
requests per day, (and making them blocking - so don't render the site
until it returns) for a purpose such as this has been met with
incredulity and barely-suppressed laughter from our performance and
networking teams.

>> If there are going to be rules, by far the best place to enforce
>> them is once at domain registration time, not in real time in
>> performance critical code millions of times a day at access time.
> 
> Fully disagree. That restricts TLDs to never changing their policies.

Surely the opposite? If a TLD enforces a policy at registration time, it
can change that policy and start accepting registrations under the new
one, without consulting anyone.

> Noted. So, if other browser vendors adopt a different approach, are
> you saying Mozilla won't? I thought the purpose of this thread was to
> revisit the question of what would be best for the users.

I'm not saying that. But no other browser vendor has adopted an approach
which requires extra network requests for each new site visited, and
periodically thereafter. And I don't expect them to.

Gerv
Paul Hoffman | 12 Dec 2011 19:18

Re: Browser IDN display policy: opinions sought

On Dec 12, 2011, at 9:33 AM, Gervase Markham wrote:

> On 12/12/11 17:24, Paul Hoffman wrote:
>> Not speaking for Andrew, just myself:
>> 
>> On Dec 12, 2011, at 8:54 AM, Gervase Markham wrote:
>> 
>>> If I am to do such a check (and presumably to fail if the domain
>>> doesn't meet it), what about when a policy changes to be more
>>> strict? How do you deal with grandfathering?
>> 
>> The zone owner deals with grandfathering. They publish a policy that
>> reflects all of the zones they control.
> 
> So published policies can only ever be loosened, not tightened?
> 
> (If they were tightened, existing domains might fail the new policy,
> which would result in them being blocked. This is what I mean by the
> grandfathering problem.)

Published policies can be tightened. No one so far has talked about *blocking* domains: we have been
talking about when to display names in one form or another. Note that "loosening" might be the same as
"tightening": it can change the way a domain appears in the location bar.

>> If you are asking "how does a browser deal with grandfathering when a
>> policy changes?", I would say the same thing: assume that the zone
>> owner has reasons for publishing that policy.
> 
> So just block the domains, then?

Why would you want to block a domain instead of just showing the Punycode-encoded name?

>>> What about performance? I would need to look up the rules for the
>>> zone "foo.com" every time I accessed bar.foo.com, for lots of
>>> values of foo. This doesn't sound like it would improve
>>> performance.
>> 
>> A sane browser would look up policies based on the browser's own
>> policy update strategy, and would risk missing a policy update.
>> Further, given that the policies would have TTLs on them, you don't
>> even need to think about looking up a policy until the TTL expires.
>> Further, the browser will know whether or not the zone had a policy
>> before and base its lookup strategy on that.
> 
> Except if it's the first time the user has visited the site, or the user
> has cleared their browsing history, or is using private browsing, or
> their cache has been cleared due to memory pressure on their mobile
> device, or...

When a user visits www.example.com and the browser only knows the policy for .com, the browser might look up
the policy for example.com. Or, it might not: a browser that cares about display speed might have a list of
single-script terminal labels that don't need looking up, such as "www".

You can code your browser however you want, of course. To me, a sane browser would not clear the Punycode
display history when the browser clears its browsing history or goes into private browsing. The display
of domain names (or anything!) in the location bar is unrelated to the content of the page. 

>> Or was that meant to be another strawman objection?
> 
> Oh, no.
> 
> Historically, the idea that a browser will burden the Internet and its
> own speed characteristics with millions or billions of additional
> requests per day, (and making them blocking - so don't render the site
> until it returns) for a purpose such as this has been met with
> incredulity and barely-suppressed laughter from our performance and
> networking teams.

That's fine: this isn't such a proposal. I'm not sure why you are treating it as one.

>>> If there are going to be rules, by far the best place to enforce
>>> them is once at domain registration time, not in real time in
>>> performance critical code millions of times a day at access time.
>> 
>> Fully disagree. That restricts TLDs to never changing their policies.
> 
> Surely the opposite? If a TLD enforces a policy at registration time, it
> can change that policy and start accepting registrations under the new
> one, without consulting anyone.

Have you forgotten that TLDs are registered with the root? Or are you making an exception for them in the
"once at domain registration time" rule above?

>> Noted. So, if other browser vendors adopt a different approach, are
>> you saying Mozilla won't? I thought the purpose of this thread was to
>> revisit the question of what would be best for the users.
> 
> I'm not saying that. But no other browser vendor has adopted an approach
> which requires extra network requests for each new site visited, and
> periodically thereafter. And I don't expect them to.

Noted. You also expected them to adopt the Mozilla IDN policy, but that didn't happen either. We all
surprise each other, often to good effect.

--Paul Hoffman
Gervase Markham | 13 Dec 2011 11:26

Re: Browser IDN display policy: opinions sought

On 12/12/11 18:18, Paul Hoffman wrote:
>> (If they were tightened, existing domains might fail the new
>> policy, which would result in them being blocked. This is what I
>> mean by the grandfathering problem.)
> 
> Published policies can be tightened. No one so far has talked about
> *blocking* domains: we have been talking about when to display names
> in one form or another. Note that "loosening" might be the same as
> "tightening": it can change the way a domain appears in the location
> bar.

OK, yes. I was typing faster than thinking. For "blocking", read "start
displaying the name as Punycode".

I guess what I'm saying is that with this browser-enforced mechanism,
someone can register a name, verify that it renders fine in all
browsers, start using it, build a business, and then a change of
registry policy leads to their name starting to appear as gobbledegook
everywhere simultaneously.

That doesn't sound awesome.

>> Except if it's the first time the user has visited the site, or the
>> user has cleared their browsing history, or is using private
>> browsing, or their cache has been cleared due to memory pressure on
>> their mobile device, or...
> 
> When a user visits www.example.com and the browser only knows the
> policy for .com, the browser might look up the policy for
> example.com. Or, it might not: a browser that cares about display
> speed might have a list of single-script terminal labels that don't
> need looking up, such as "www".

http://www.com?

This part of the idea sounds like it would need something like, er, the
Public Suffix List to make sure it worked correctly. ;-)

> You can code your browser however you want, of course. To me, a sane
> browser would not clear the Punycode display history when the browser
> clears its browsing history or goes into private browsing.

A user who clears their history wants, or is sufficiently likely to
want, the browser to entirely forget that he has visited the sites he
has visited - in such a way that someone examining his computer cannot
tell he has been there. Therefore, any browser record of domain names
(whether it's HSTS pin information, domain name letter display policy,
or anything like that) also has to be cleared.

>> Historically, the idea that a browser will burden the Internet and
>> its own speed characteristics with millions or billions of
>> additional requests per day, (and making them blocking - so don't
>> render the site until it returns) for a purpose such as this has
>> been met with incredulity and barely-suppressed laughter from our
>> performance and networking teams.
> 
> That's fine: this isn't such a proposal. I'm not sure why you are
> treating it as one.

Are you saying that these lookups would be non-blocking?

Or are you saying that implementing it in a browser used by 450 million
people wouldn't lead to billions of additional requests per day?

>>> Fully disagree. That restricts TLDs to never changing their
>>> policies.
>> 
>> Surely the opposite? If a TLD enforces a policy at registration
>> time, it can change that policy and start accepting registrations
>> under the new one, without consulting anyone.
> 
> Have you forgotten that TLDs are registered with the root? Or are you
> making an exception for them in the "once at domain registration
> time" rule above?

I'm sorry, I've failed to follow this subthread of the discussion. Can
you restate your point for me?

Gerv
Paul Hoffman | 13 Dec 2011 16:37

Re: Browser IDN display policy: opinions sought

On Dec 13, 2011, at 2:26 AM, Gervase Markham wrote:

> OK, yes. I was typing faster than thinking. For "blocking", read "start
> displaying the name as Punycode".

Whew. Good.

> I guess what I'm saying is that with this browser-enforced mechanism,
> someone can register a name, verify that it renders fine in all
> browsers, start using it, build a business, and then a change of
> registry policy leads to their name starting to appear as gobbledegook
> everywhere simultaneously.
> 
> That doesn't sound awesome.

A zone *always* has the right to change information for sub-domains in the zone; many of those changes will
affect the story you gave above. The policy for name display is one of the least-offensive of these changes.

As others have said, if you want permanent perfection, don't look for it in the DNS.

>>> Except if it's the first time the user has visited the site, or the
>>> user has cleared their browsing history, or is using private
>>> browsing, or their cache has been cleared due to memory pressure on
>>> their mobile device, or...
>> 
>> When a user visits www.example.com and the browser only knows the
>> policy for .com, the browser might look up the policy for
>> example.com. Or, it might not: a browser that cares about display
>> speed might have a list of single-script terminal labels that don't
>> need looking up, such as "www".
> 
> http://www.com?

Correct. If you haven't cached .com's name display policy when you get that, go ahead and display it.

> This part of the idea sounds like it would need something like, er, the
> Public Suffix List to make sure it worked correctly. ;-)

It sounds like you are fishing for reasons to support it; have a party with that.

>> You can code your browser however you want, of course. To me, a sane
>> browser would not clear the Punycode display history when the browser
>> clears its browsing history or goes into private browsing.
> 
> A user who clears their history wants, or is sufficiently likely to
> want, the browser to entirely forget that he has visited the sites he
> has visited - in such a way that someone examining his computer cannot
> tell he has been there. Therefore, any browser record of domain names
> (whether it's HSTS pin information, domain name letter display policy,
> or anything like that) also has to be cleared.

You are possibly mixing up levels again. If a user goes to www.nastypr0nsite.com and hides that by clearing
everything in his browser, that action does not clear the DNS cache at the same time.

>>> Historically, the idea that a browser will burden the Internet and
>>> its own speed characteristics with millions or billions of
>>> additional requests per day, (and making them blocking - so don't
>>> render the site until it returns) for a purpose such as this has
>>> been met with incredulity and barely-suppressed laughter from our
>>> performance and networking teams.
>> 
>> That's fine: this isn't such a proposal. I'm not sure why you are
>> treating it as one.
> 
> Are you saying that these lookups would be non-blocking?
> 
> Or are you saying that implementing it in a browser used by 450 million
> people wouldn't lead to billions of additional requests per day?

The latter. Repeating myself: the first time a user goes to www.somenewsite.com, if the policy for .com is
already in the user's DNS cache, there is no additional lookup.

>>>> Fully disagree. That restricts TLDs to never changing their
>>>> policies.
>>> 
>>> Surely the opposite? If a TLD enforces a policy at registration
>>> time, it can change that policy and start accepting registrations
>>> under the new one, without consulting anyone.
>> 
>> Have you forgotten that TLDs are registered with the root? Or are you
>> making an exception for them in the "once at domain registration
>> time" rule above?
> 
> I'm sorry, I've failed to follow this subthread of the discussion. Can
> you restate your point for me?

One level up, you said "If there are going to be rules, by far the best place to enforce them is once at domain
registration time, not in real time in performance critical code millions of times a day at access time". I
disagreed because TLDs are registered in the root, and I do not want ICANN enforcing a policy on TLDs that
the TLDs cannot change over time.

--Paul Hoffman
Gervase Markham | 14 Dec 2011 11:55

Re: Browser IDN display policy: opinions sought

On 13/12/11 15:37, Paul Hoffman wrote:
> You are possibly mixing up levels again. If a user goes to
> www.nastypr0nsite.com and hides that by clearing everything in his
> browser, that action does not clear the DNS cache at the same time.

In Firefox, it clears everything under our control - and, in fact, we
have had additional APIs added to the plugin interface used by e.g.
Flash so we can clear stuff not under our direct control as well, such
as 'Flash cookies'. If Firefox retains any sort of record that you've
visited a particular site after you have cleared all data, that's a bug.

> One level up, you said "If there are going to be rules, by far the
> best place to enforce them is once at domain registration time, not
> in real time in performance critical code millions of times a day at
> access time". I disagreed because TLDs are registered in the root,
> and I do not want ICANN enforcing a policy on TLDs that the TLDs
> cannot change over time.

I can't parse the last sentence. Are you saying:

a) "I do not want ICANN enforcing a policy on TLDs such that the TLDs
cannot change the policy over time."

or

b) "I do not want ICANN enforcing a policy on TLDs such that the nature
of the policy in regards to what TLDs can and cannot exist, cannot
change over time."

or something else? Surely, with respect to b), ICANN does this, but has
no problems today changing its policy about what is and isn't allowed?

Gerv
Paul Hoffman | 14 Dec 2011 18:32

Re: Browser IDN display policy: opinions sought

On Dec 14, 2011, at 2:55 AM, Gervase Markham wrote:

> On 13/12/11 15:37, Paul Hoffman wrote:
>> You are possibly mixing up levels again. If a user goes to
>> www.nastypr0nsite.com and hides that by clearing everything in his
>> browser, that action does not clear the DNS cache at the same time.
> 
> In Firefox, it clears everything under our control - and, in fact, we
> have had additional APIs added to the plugin interface used by e.g.
> Flash so we can clear stuff not under our direct control as well, such
> as 'Flash cookies'. If Firefox retains any sort of record that you've
> visited a particular site after you have cleared all data, that's a bug.

As far as I have seen in my tests with Firefox, the OS's DNS cache is not one of the things that falls under
"everything in our control". So, if I go to www.nastypr0nsite.com in Firefox private browsing and then
quit from Firefox, and someone grabs my computer right then, they can see that an application wanted the
DNS information for that site. The fact that the application also wanted the IDN display policy doesn't
seem any more damning than the application wanting the A/AAAA record.

>> One level up, you said "If there are going to be rules, by far the
>> best place to enforce them is once at domain registration time, not
>> in real time in performance critical code millions of times a day at
>> access time". I disagreed because TLDs are registered in the root,
>> and I do not want ICANN enforcing a policy on TLDs that the TLDs
>> cannot change over time.
> 
> I can't parse the last sentence. Are you saying:
> 
> a) "I do not want ICANN enforcing a policy on TLDs such that the TLDs
> cannot change the policy over time."
> 
> or
> 
> b) "I do not want ICANN enforcing a policy on TLDs such that the nature
> of the policy in regards to what TLDs can and cannot exist, cannot
> change over time."
> 
> or something else? Surely, with respect to b), ICANN does this, but has
> no problems today changing its policy about what is and isn't allowed?

I meant (a). I want the same for all zones, regardless of when they are registered. I agree that this takes
more work on the part of browsers, and will cause more traffic on the Internet: it is worth it to have IDNs
work better than they do today where some browsers block display for reasons unfathomable to users.

--Paul Hoffman
John C Klensin | 12 Dec 2011 20:30

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 09:24 -0800 Paul Hoffman
<phoffman <at> imc.org> wrote:

>...
>> If there are going to be rules, by far the best place to
>> enforce them is once at domain registration time, not in real
>> time in performance critical code millions of times a day at
>> access time.
> 
> Fully disagree. That restricts TLDs to never changing their
> policies. A browser vendor might want this convenience, but
> there are plenty of people who would like the browser vendors
> to be more responsive to changes than that so that IDNs can be
> more useful.

Paul, perhaps Gerv should have stated that rule-enforcement
provision differently (and maybe he should have said "by the
registration and delegation process" rather than at a specific
time), but I disagree with your inference:

-- A domain applicant who doesn't meet requirements at the time
of application should certainly be able to reapply if the
requirements change.

-- A domain applicant who meets requirements at the time of
registration and whose domain is delegated, still has to renew
the registration and, especially given appropriate contract
provisions could be subjected to newer rules at renewal time.
In the case of rules modified to deal with problems, really
egregious, problem-causing, variations from those new rules
could result in domain cancellation.  Note that, especially in
the last year, we've seen an increasing number of domain
cancellations at the demand of various governments.  That makes
me very nervous, but it is happening and, if the relevant
registry is within the jurisdiction of some body with
cancellation-demanding authority, it isn't likely that it will
change (even if the conditions under which cancellation can be
requested in some countries are significantly tightened).  From
one point of view, those external interventions are the
consequence of industry (read "ICANN, registries, registrars,
and the domaineers") unwillingness or inability to self-police
(see Eric
Brunner-Williams's recent note, which is probably a better
description of the problem than mine).

Whether cancelling registrations or waiting for renewal, changes
involve some time lag but that is true of almost everything else
in this space including both Gerv's list and various "embed the
lists in the DNS" ideas.

best,
   john
Andrew Sullivan | 13 Dec 2011 00:50

Re: Browser IDN display policy: opinions sought

On Mon, Dec 12, 2011 at 04:54:27PM +0000, Gervase Markham wrote:
> If I am to do such a check (and presumably to fail if the domain doesn't
> meet it), what about when a policy changes to be more strict? How do you
> deal with grandfathering?

Yes, we're going to have a problem with this.  But note that nothing
says that this policy needs be the one you actually register with;
it's just the way that you state, "I permit these things together."
But this is admittedly sort of hand-wavy right now.  It is entirely
possible that this is a fatal problem; but you already have that fatal
problem today, so I don't see how this is any worse than a problem you
have now.

> What about performance? I would need to look up the rules for the zone
> "foo.com" every time I accessed bar.foo.com, for lots of values of foo.
> This doesn't sound like it would improve performance.

I was sort of imagining that these policies (1) would be cacheable,
so that you wouldn't actually need to do things in real time all the
time, and (2) would fail soft, so that you fall back to A-label form
until you've managed to fetch the relevant policy (at which time you
can check the label against the policy and update the display as
necessary).
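
The fail-soft behaviour can be sketched as follows. This is a Python
illustration under assumptions: `cached_policy` is whatever the browser
currently holds for the zone (`None` if nothing has been fetched yet),
and `is_allowed` is a hypothetical predicate that checks a label against
a policy.

```python
def to_a_label(u_label):
    """Return the A-label ("xn--" form) for a Unicode label."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")

def label_to_show(u_label, cached_policy, is_allowed):
    """Fail soft: without a policy in hand, show the A-label.

    The caller can fetch the policy in the background and call this
    again to update the display once it arrives.
    """
    if cached_policy is None:
        return to_a_label(u_label)
    if is_allowed(u_label, cached_policy):
        return u_label
    return to_a_label(u_label)
```

Rendering never waits on the policy fetch; the worst case is that a
name is shown as Punycode until the policy has been retrieved.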

> If there are going to be rules, by far the best place to enforce them is
> once at domain registration time, not in real time in performance
> critical code millions of times a day at access time.

Right.  But you're talking about different kinds of rules: (1) how do
I display this? and (2) what is permitted for registration?  You want
(1) to be linked to (2) somehow, and I agree.  But I cannot see how
either shipping static lists around or else relying on
language-guessing of intended domains actually addresses the user
problems we're attempting to talk about.

>     It is the sole survivor of a large number of alternative proposals
> that were considered and rejected. Unlike most of the other rejected
> proposals, it does not need any modifications to the DNS protocol, or
> distribution of "language" codes for labels, nor does it require
> multiple DNS lookups, large character tables in the browser, or
> real-time access to WHOIS information.

The only reason the latter two of these are true is because the root
zone is small.  If it grows to several thousands of labels a
significant number of which are IDNs, the last two advantages turn out
to be fatal flaws, because there's no practical way to make the
decision that you need to make on heuristic grounds.  I'm not trying
to dismiss those factors; I think those are indeed advantages to the
existing solution.  But as you see in this thread, there are
disadvantages that also pile up; and I think that pile gets bigger as
the root zone expands.

Best,

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Martin J. Dürst | 13 Dec 2011 05:52

Re: Browser IDN display policy: opinions sought

On 2011/12/13 8:50, Andrew Sullivan wrote:
> On Mon, Dec 12, 2011 at 04:54:27PM +0000, Gervase Markham wrote:

>>      It is the sole survivor of a large number of alternative proposals
>> that were considered and rejected. Unlike most of the other rejected
>> proposals, it does not need any modifications to the DNS protocol, or
>> distribution of "language" codes for labels, nor does it require
>> multiple DNS lookups, large character tables in the browser, or
>> real-time access to WHOIS information.
>
> The only reason the latter two of these are true is because the root
> zone is small.  If it grows to several thousands of labels a
> significant number of which are IDNs, the last two advantages turn out
> to be fatal flaws, because there's no practical way to make the
> decision that you need to make on heuristic grounds.  I'm not trying
> to dismiss those factors; I think those are indeed advantages to the
> existing solution.  But as you see in this thread, there are
> disadvantages that also pile up; and I think that pile gets bigger as
> the root zone expands.

Even without significant growth in the root zone, "large character 
tables in the browser" is actually very relative. 
http://www.unicode.org/Public/UNIDATA/Scripts.txt is about 120kB, but 
most of it is spaces and comments, and it separates out characters by 
character class. Removing character class and taking into account gaps 
and stuff that's not allowed in IDNs anyway, the table can be 
*significantly* compacted.
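
One way to picture the compaction is a sorted range table searched by
bisection. The sketch below is illustrative only: the four ranges are a
hand-picked, block-level approximation and not the real Scripts.txt
data, and the names RANGES and script_of are made up for the example.

```python
from bisect import bisect_right

# (first code point, last code point, script). Coarse illustrative subset;
# the real data has many more ranges and finer boundaries.
RANGES = [
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0xAC00, 0xD7A3, "Hangul"),
]
STARTS = [start for start, _, _ in RANGES]


def script_of(ch):
    """Return the script of one character, or 'Unknown' if no range matches."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return "Unknown"
```

Storage grows with the number of ranges rather than the number of
characters, which is why the table can be far smaller than the text file.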

Regards,    Martin.
Mark Davis ☕ | 13 Dec 2011 06:15

Re: Browser IDN display policy: opinions sought

FYI, a simple binary data structure that contains all the script info is 2,156 bytes. The extended script info would add 385 bytes to that.

Mark
— Il meglio è l’inimico del bene —


On Mon, Dec 12, 2011 at 20:52, "Martin J. Dürst" <duerst <at> it.aoyama.ac.jp> wrote:
> On 2011/12/13 8:50, Andrew Sullivan wrote:
>> On Mon, Dec 12, 2011 at 04:54:27PM +0000, Gervase Markham wrote:
>>
>>>     It is the sole survivor of a large number of alternative proposals
>>> that were considered and rejected. Unlike most of the other rejected
>>> proposals, it does not need any modifications to the DNS protocol, or
>>> distribution of "language" codes for labels, nor does it require
>>> multiple DNS lookups, large character tables in the browser, or
>>> real-time access to WHOIS information.
>>
>> The only reason the latter two of these are true is because the root
>> zone is small.  If it grows to several thousands of labels a
>> significant number of which are IDNs, the last two advantages turn out
>> to be fatal flaws, because there's no practical way to make the
>> decision that you need to make on heuristic grounds.  I'm not trying
>> to dismiss those factors; I think those are indeed advantages to the
>> existing solution.  But as you see in this thread, there are
>> disadvantages that also pile up; and I think that pile gets bigger as
>> the root zone expands.
>
> Even without significant growth in the root zone, "large character
> tables in the browser" is actually very relative.
> http://www.unicode.org/Public/UNIDATA/Scripts.txt is about 120kB, but
> most of it is spaces and comments, and it separates out characters by
> character class. Removing character class and taking into account gaps
> and stuff that's not allowed in IDNs anyway, the table can be
> *significantly* compacted.
>
> Regards,    Martin.

_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update
John C Klensin | 12 Dec 2011 18:16

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 10:23 -0500 Andrew Sullivan
<ajs <at> anvilwalrusden.com> wrote:

>> And ponies! ;-)

> Well, it's bound to sound that way if you don't take seriously
> the idea that there might be a way to figure these things out.
> 
> Suppose that zone operators (not just the root or TLDs, but
> any random zone you liked) had a mechanism by which you could
> look up their policies for, say, code point inclusion.  That
> is, I'm RegyCo, and I run .example.  I put an SRV or URI or
> something record at .example that points you to a policy
> document that tells you what code point ranges are permitted
> together in a single label in my zone, and also (for that
> matter) what code points I will register _at all_.  Now you
> are in a position to decide whether you think my policy is
> sensible; and you are also in a position to decide whether any
> given label actually meets my own stated policies.  Finally,
> since this forms the basis for a filter in your software, you
> have the ability to set a default for your users that makes
> sense, but also a way for people who want it to get the
> benefits of the most permissive settings available under
> approach A.  Finally, it wouldn't involve a massive scaling
> problem facing the whitelist in the case the root zone
> increases dramatically in size, since most of the work (all?)
> could be automated.

Andrew, sure, but...   This comes back to the assumptions that: 

	-- all registries are good guys and enforce whatever
	rules they make.
	
	-- all registrars are good guys, with neither motivation
	nor will for getting around the rules.
	
	-- if either of the above fail, there is someone with
	both the authority and willingness to require that the
	rules be enforced and to enforce that requirement (or to
	enforce the rules itself, but that is even more
	farfetched).

Now, unless one believes in miracle turnarounds from history,
all of the above assumptions are demonstrably and massively
false.  If they were only occasionally false, Gerv would still
need to decide whether his obligation to protect users required
some additional measures.  But, despite believing strongly that
ICANN should be held responsible for stepping up to the role in
this that I read into their charter and bylaws, I think spending
energy on a policy that requires believing all three of the
above today should get you, not just a pony, but an opportunity
on a discount price on a bridge I understand is for sale.

    john
Eric Brunner-Williams | 12 Dec 2011 19:18

Re: Browser IDN display policy: opinions sought

On 12/12/11 12:16 PM, John C Klensin wrote:
> 	-- all registries are good guys and enforce whatever
> 	rules they make.

The incumbent monopoly operator and the larger two of the marginally
viable 2000 round gTLD operators are on record opposed to registry
liability for "willful blindness" to systematic misconduct by
registrars and their resellers.

> 	-- all registrars are good guys, with neither motivation
> 	nor will for getting around the rules.

Some 600 of the 900 or so entities ICANN has accredited as
"registrars" exist for the purpose of providing (race) access to the
"drop pool", and a significant number of the remaining 300 registrars
hold substantial self-owned portfolios of domains created prior to the
change of rule concerning "domain tasting".

To attempt a slightly less implied universal cynicism than John, where
the universe of "registries" contains .cat, .coop, .museum, and
perhaps twice that number of registries, none of which is price
capped, and the universe of "registrars" is similarly limited to those
that have not pursued "tasting" or the "secondary market" and which
are not committed to acquisition of a registry agreement and the
monetization strategy of unrestricted registration (and this is not a
null set), Andrew's assumption is not inherently doomed to fail.

So passing on to the third assumption:

> 	-- if either of the above fail, there is someone with
> 	both the authority and willingness to require that the
> 	rules be enforced and to enforce that requirement (or to
> 	enforce the rules itself, but that is even more
> 	farfetched).

At the Rome meeting ICANN took heat for Sitefinder, and after more
time than necessary, issued a statement on the harm of synthetic return.

The next major demonstration of sanity was pulling the plug on the
systemic exploit of the Add Grace Period, ending domain tasting. It
did not, however, act upon registrars which engaged in tasting, other
than de-accrediting those (low-tech and/or low-clue) unable to cease
tasting, and unable to pay the new fee to taste above the permitted
thresholds.

Those are the major instances of sanity-as-enforced-policy I recall.
For the past two years or more ICANN has had the opportunity to spend
more on contractual compliance. I observed at the Cartagena meeting
that the recent head of compliance hire was low-clue on well-known
forms of abuse. I don't think this has greatly improved, given my
reading of the weaknesses of the current CEO and of the transition
effect that has been present for the past six months or more and will
continue for the next several months.

So I share John's cynicism w.r.t. registry contract enforcement, and
registrar contract enforcement.

There will be safe and sane namespaces, and there will be namespaces
whose operators will maximize revenues, creating externalities to be
borne by others.

My two beads worth,
Eric
Andrew Sullivan | 13 Dec 2011 00:59

Re: Browser IDN display policy: opinions sought

On Mon, Dec 12, 2011 at 12:16:01PM -0500, John C Klensin wrote:
> Andrew, sure, but...   This comes back to the assumptions that: 
> 
> 	-- all registries are good guys and enforce whatever
> 	rules they make.

No, because you can check those rules yourself in your resolution
context: look at what you are looking up and compare it to the rules
to see whether it conforms.  Indeed, if that's not good enough, you
have this problem anyway.
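
As a rough illustration of "compare it to the rules", here is a sketch
that assumes a registry publishes its permitted code points as plain
hexadecimal ranges, one per line. That text format and the function
names are invented for the example; no such agreed format exists.

```python
def parse_policy(text):
    """Parse lines such as '0430-044F' or '002D' into (low, high) pairs."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        low, _, high = line.partition("-")
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


def conforms(label, ranges):
    """True if every code point of the label falls inside a permitted range."""
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in label)


# Hypothetical policy for a Cyrillic-only zone: lowercase letters, digits, hyphen.
EXAMPLE_POLICY = parse_policy(
    "0430-044F  # Cyrillic lowercase\n"
    "0030-0039  # digits\n"
    "002D       # hyphen\n"
)
```

A check like this only tells you whether a label matches the stated
policy; it says nothing about whether the policy itself is sensible.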

> 	
> 	-- all registrars are good guys, with neither motivation
> 	nor will for getting around the rules.
> 	
> 	-- if either of the above fail, there is someone with
> 	both the authority and willingness to require that the
> 	rules be enforced and to enforce that requirement (or to
> 	enforce the rules itself, but that is even more
> 	farfetched).
> 
> Now, unless one believes in miracle turnarounds from history,
> all of the above assumptions are demonstrably and massively
> false.  If they were only occasionally false, Gerv would still
> need to decide whether his obligation to protect users required
> some additional measures.  But, despite believing strongly that
> ICANN should be held responsible for stepping up to the role in
> this that I read into their charter and bylaws, I think spending
> energy on a policy that requires believing all three of the
> above today should get you, not just a pony, but an opportunity
> on a discount price on a bridge I understand is for sale.
> 
>     john

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Andrew Sullivan | 13 Dec 2011 03:10

Re: Browser IDN display policy: opinions sought

Sorry, I managed to send this before I intended to.  The rest is
below.

On Mon, Dec 12, 2011 at 06:59:37PM -0500, Andrew Sullivan wrote:
> On Mon, Dec 12, 2011 at 12:16:01PM -0500, John C Klensin wrote:
> > Andrew, sure, but...   This comes back to the assumptions that: 
> > 
> > 	-- all registries are good guys and enforce whatever
> > 	rules they make.
> 
> No, because you can check those rules yourself in your resolution
> context: look at what you are looking up and compare it to the rules
> to see whether it conforms.  Indeed, if that's not good enough, you
> have this problem anyway.
> 
> > 	
> > 	-- all registrars are good guys, with neither motivation
> > 	nor will for getting around the rules.

This is a problem we already have, for _any_ of these rules.  What's
special about the current approach that solves that?

> > 	-- if either of the above fail, there is someone with
> > 	both the authority and willingness to require that the
> > 	rules be enforced and to enforce that requirement (or to
> > 	enforce the rules itself, but that is even more
> > 	farfetched).

We also already don't have this.  On the contrary, what we have right
now is a case where rules are inconsistent among registries, there is
no way at all to find out the rules in zones not near the root, those
near-root zones are treated according to at least three different
display conventions, and one of those conventions entails using a
_different_ set of more or less arbitrary rules established under
conventions also not strictly rooted in the behaviour of anyone
operating the zones.  How is this better?

If the goal is, "Protect people from bad actors," my suggestion is,
"Don't use the DNS.  It's a worse match for that task than the
hundreds of others people seem to want to throw into it."  But if the
goal is to know whether there is something resembling a policy that
allows you to make slightly-informed guesses about whether it is sane
to treat U-labels in a zone as U-labels, I'm suggesting that we can do
better than either "SWAG about the language this label is supposed to
be in" or "I know who the bad guys are, trust me."

Best,

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
John C Klensin | 13 Dec 2011 11:34

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 21:10 -0500 Andrew Sullivan
<ajs <at> anvilwalrusden.com> wrote:

> Sorry, I managed to send this before I intended to.  The rest
> is below.
> 
> On Mon, Dec 12, 2011 at 06:59:37PM -0500, Andrew Sullivan
> wrote:
>...
>> No, because you can check those rules yourself in your
>> resolution context: look at what you are looking up and
>> compare it to the rules to see whether it conforms.  Indeed,
>> if that's not good enough, you have this problem anyway.

But you cannot check if the rules involve similarity with what
is present in the zone.  Certainly you can check for
mixed-script labels but (ignoring the complexities of the
various exception cases), that is a useless check to make
against a determined attacker because the recommendation to make
such checks is just too widely known and understood (yes, it
would give some protection against really dumb attackers,
but...).  To do a more sophisticated check, you'd need to be
able to ask the DNS server to return all of the labels that
might be confused with the one you are thinking about looking
up.  That would be a useful function for many purposes,
especially if one could ignore the issues associated with
perception and doing it algorithmically.   But it requires two
things of DNS servers (presumably all of them): that they be
able to convert IDNs back to Unicode code points so they could
do similarity searches on them and that they be able to do
similarity (fuzzy and/or distance measure) searches on those
converted strings.  Actually a third: the ability to return a
rather long list of labels, something that is not supported by
any existing DNS query other than zone transfers.  Or, I
suppose, one could ignore the security issues and just do the
zone transfer oneself and then perform the conversions,
searching, and matching locally.  Ignoring the questions of
which DNS this could be implemented in and how long it would
take to deploy, I can imagine Gerv's implementers and
performance folks who won't tolerate changes with far lower
performance consequences having ROFL responses or worse.
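
For reference, the simple mixed-script check mentioned above can be
approximated in a few lines. This sketch guesses a character's script
from the first word of its Unicode name, which is a heuristic chosen
for brevity and not the real Script property; the function names are
illustrative.

```python
import unicodedata


def scripts_in(label):
    """Approximate the set of scripts used by the letters of a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # digits and hyphens are shared by all scripts
            # Heuristic: 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """True if the letters of the label come from more than one script."""
    return len(scripts_in(label)) > 1
```

A label written entirely in Cyrillic lookalikes uses a single script
and passes this check, which is the weakness described above.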

>> > 	-- all registrars are good guys, with neither motivation
>> > 	nor will for getting around the rules.
> 
> This is a problem we already have, for _any_ of these rules.
> What's special about the current approach that solves that?

At some level, that is exactly my point.  Whether he gets the
list right or not, Gerv's "Type B" is based on a whitelist of
well-behaved registries, where well-behaved includes (even if
indirectly) assuring the good behavior of registrars.  In a
world in which stating policies and ignoring them is ok (because
there are no effective sanctions) and, as Tina indirectly points
out, having guidelines that are ignored by some registries is ok
(again, no consequences), then Gerv needs to run his registry
evaluation system anyway.  It makes it a little easier for him
to find the rules that make up part of the evaluation, but the
"do they actually follow those rules" part is unchanged.

Ignoring the performance issues, etc., there is another problem
with saying "let's put a pointer to the rules in the DNS".  If
those rules are going to be machine-processed, there must be an
agreed-upon format.  The diversity of the possible types of
rules and some experience with similar format discussions might
make the time needed to develop and agree upon the required
format, keywords, operators, etc., compare unfavorably with the
design, development, and deployment time for DNSng.  And that
assumes the IETF did the work; if the obvious other organization
tried it, I'd assume it would take that long before the
committees stopped delegating work to other committees and
actually sat down to do something.

>> > 	-- if either of the above fail, there is someone with
>> > 	both the authority and willingness to require that the
>> > 	rules be enforced and to enforce that requirement (or to
>> > 	enforce the rules itself, but that is even more
>> > 	farfetched).

> We also already don't have this.  On the contrary, what we
> have right now is a case where rules are inconsistent among
> registries, there is no way at all to find out the rules in
> zones not near the root, those near-root zones are treated
> according to at least three different display conventions, and
> one of those conventions entails using a _different_ set of
> more or less arbitrary rules established under conventions
> also not strictly rooted in the behaviour of anyone operating
> the zones.  How is this better?

And you left out "many of the rules that do exist are just
ignored in practice".  All I was suggesting was that your
proposal wouldn't help much.  Pointers to rules that are ignored
help no one.  Pointers to rules that cannot be parsed and
accurately understood don't help with lookup-time processing,
even if there were no performance issues.

Unless this situation is rationalized sufficiently by some
entity that has the authority to enforce it on some domains and
create, by example, a model that inspires (or creates pressure
on) others, then either: 

	Gerv (and many others) are wrong and confusable names
	will never amount to much as an attack vector except
	against the dumbest of users ... or ...
	
	IDNs don't have much future outside isolated,
	single-language communities because these
	blunt-instrument tools will exclude too much  and/or
	enough people will be victimized to create a general
	sense of fear.

> If the goal is, "Protect people from bad actors," my
> suggestion is, "Don't use the DNS.  It's a worse match for
> that task than the hundreds of others people seem to want to
> throw into it." 

Strongly agree.  But you know that already.

> But if the goal is to know whether there is
> something resembling a policy that allows you to make
> slightly-informed guesses about whether it is sane to treat
> U-labels in a zone as U-labels, I'm suggesting that we can do
> better than either "SWAG about the language this label is
> supposed to be in" or "I know who the bad guys are, trust me."

And I'm suggesting that any system that makes the rules easier
to find will ultimately come down to your second choice above
unless some entity starts enforcing (at least) conformance to
declared rules and preferably a minimum set of rules as well.

best,
   john
Andrew Sullivan | 13 Dec 2011 16:49

Re: Browser IDN display policy: opinions sought

On Tue, Dec 13, 2011 at 05:34:40AM -0500, John C Klensin wrote:

> but...).  To do a more sophisticated check, you'd need to be
> able to ask the DNS server to return all of the labels that
> might be confused with the one you are thinking about looking
> up.

Aha.  You want some kind of assurance that, if you are looking up the
label, you can rely on the party who told you what the policy is to
enforce that policy.

How is this different from the state of affairs that obtains now?  If
Afilias did something bonehead in .info tomorrow, I have little
confidence that Opera and Mozilla would detect it right away -- how
would they even know to look?

I claim that, if "be sure nobody is lying about what they are doing"
is the criterion for success, this effort is doomed.  That's like
wishing for a protocol that will prove the guys with the shell games
in your favourite tourist trap are never going to cheat.  Or, to beat
up on the usual metaphor, it's an invisible flying pony.  With sparkles.

> Ignoring the performance issues, etc., there is another problem
> with saying "let's put a pointer to the rules in the DNS".  If
> those rules are going to be machine-processed, there must be an
> agreed-upon format.

Yes, this is a problem.  OTOH, as we see in this thread, the existing
answers are all broken.  Perhaps the REPUTE WG offers us a chance at
a way to evaluate these things over time?

> Unless this situation is rationalized sufficiently by some
> entity that has the authority to enforce it on some domains and
> create, by example, a model that inspires (or creates pressure
> on) others

This sounds like a desire for a universal co-ordinator of the DNS.
The exact point of the protocol was to get rid of that choke point, so
I don't think we're going to re-invent it.

> > supposed to be in" or "I know who the bad guys are, trust me."
> 
> And I'm suggesting that any system that makes the rules easier
> to find will ultimately come down to your second choice above
> unless some entity starts enforcing (at least) conformance to
> declared rules and preferably a minimum set of rules as well.

In the area of spam control, despite all the nasty side effects,
consulting several different abuse lists (which we might like to think
of as "reputation services") gets you more information to base your
decisions on.  I don't see why a similar approach might not work for
IDN display, _provided that_ zones have a way of stating what it is
they're trying to do.  Such a mechanism (and calling this hand-wavy
sketch of an idea a "proposal" is giving it too much credit) would be
extremely imperfect and it would mean that new names always started at
a disadvantage.  But it would at least give us something to build on.

Best,

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Eric Brunner-Williams | 13 Dec 2011 18:09

Re: Browser IDN display policy: opinions sought

On 12/13/11 10:49 AM, Andrew Sullivan wrote:
> In the area of spam control, despite all the nasty side effects,
> consulting several different abuse lists (which we might like to think
> of as "reputation services") gets you more information to base your
> decisions on.  ...

Circa 2002 there was widespread filtering of one 2000 round new gTLD
due to the difference between its stated purpose and policy, and its
actual registration policy, as the latter made the namespace
accessible to unsolicited commercial emailers. The reputation effect
persisted for several years (and may still persist).

The point being that examples of autonomous mechanism behavior tending
towards apparent, even actual coherency of policy, exist for
namespaces, not just domain names, addresses, and address block
allocations.

> ... I don't see why a similar approach might not work for
> IDN display, _provided that_ zones have a way of stating what it is
> they're trying to do.  Such a mechanism (and calling this hand-wavy
> sketch of an idea a "proposal" is giving it too much credit) would be
> extremely imperfect and it would mean that new names always started at
> a disadvantage.  But it would at least give us something to build on.

The obverse would be the claim that non-global semantics can not
exist, and that state may not accumulate.

We've sort of been down this path before with gedanken experiments
about encoding discovery (the query-and-response problem). Here,
however, the problem is display, for which out-of-protocol mechanisms
are possible.

Since we're hand-waving, waving in the general direction of REPUTE
and/or DOMAINREP may be necessary to access accumulated state, but may
be insufficient to determine a locally likely display property of some
non-ASCII label.

Two beads and change,
Eric
John Levine | 13 Dec 2011 19:16

Re: Browser IDN display policy: opinions sought

Having been reading this discussion with great interest, I don't
understand what problem is being solved.  Is it:

A) Only display names that are not deceptive?

B) Don't display names that might be deceptive?

C) Don't display names that fail to meet some policy that
doesn't really have anything to do with deception?

D) Only display names that meet some policy?

E) Something else?

It clearly can't be A, since there's plenty of room for deception in
plain ASCII, and people can put random names at the Nth level, e.g.,
FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

R's,
John
Paul Hoffman | 13 Dec 2011 19:31

Re: Browser IDN display policy: opinions sought


On Dec 13, 2011, at 10:16 AM, John Levine wrote:

> Having been reading this discussion with great interest, I don't
> understand what problem is being solved.  Is it:
> 
> A) Only display names that are not deceptive?
> 
> B) Don't display names that might be deceptive?
> 
> C) Don't display names that fail to meet some policy that
> doesn't really have anything to do with deception?
> 
> D) Only display names that meet some policy?
> 
> E) Something else?
> 
> It clearly can't be A, since there's plenty of room for deception in
> plain ASCII, and people can put random names at the Nth level, e.g.,
> FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

The stated reason for not just displaying the Unicode every time is to avoid deception. So, (B).

--Paul Hoffman
Mark Davis ☕ | 13 Dec 2011 22:40

Re: Browser IDN display policy: opinions sought

Martin,

According to all of the information I have from our security people:

IDNA spoofing is far down on the list of importance compared to other ways to spoof. Average people are more swayed by the appearance of the page they land on than on the appearance of the url in the address bar. The average person doesn't distinguish:


The warnings that really grab people's attention are where (for example) a warning screen comes up before the contents appears, telling people that the content page is dangerous, and asking if they want to continue. Simply changing the appearance in the address bar is often overlooked.

That says to me that it would be much better to always show the Unicode characters (thus giving a uniform UI across browsers), but then provide a more obvious UI signal to users that the page is suspect (and for what reason). So from your example, the user should see http://www.viagénie.com and http://биатлон.рф in all of the browsers.
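
For readers unfamiliar with the two forms, the mapping between the Unicode and Punycode spellings of these example names can be reproduced with Python's built-in "idna" codec. Note that this codec implements the older IDNA2003 rules, so it is only a convenient illustration here; the helper names are made up.

```python
def to_a_label(domain):
    """Convert a domain name to its ASCII (Punycode) form, label by label."""
    return domain.encode("idna").decode("ascii")


def to_u_label(domain):
    """Convert an ASCII (Punycode) domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ("www.viagénie.com", "биатлон.рф"):
        print(name, "<->", to_a_label(name))
```

The choice being debated is only which of these two equivalent spellings the address bar shows; the name that is looked up is the same either way.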

The Unicode vs Punycode UI is a blunt tool anyway; a UI signal separate from that would allow for gradations in the levels of warnings given to users. Thus the following could get different levels of warnings (depending on the user's language settings)—some being of the "you can't go farther without confirmation" sort:
  • ѕех.рф (the 'sex' are all Cyrillic characters)
  • ѕех.com (the 'sex' are all Cyrillic characters)
  • pаypal.com (with just one Cyrillic а).
  • &c.
Users could also get settings to turn off classes of errors, if they find that those get in the way based on their environment.
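
A sketch of how such graded warnings might be computed, assuming a small hand-written lookalike table. The table, the level names "notice" and "confirm", and the rule for choosing between them are all assumptions made for illustration; real confusable data is far larger, and the levels would also depend on the user's language settings, which this sketch ignores.

```python
# Hypothetical, tiny map from Cyrillic letters to Latin lookalikes.
LOOKALIKES = {
    "\u0430": "a",  # а
    "\u0435": "e",  # е
    "\u043e": "o",  # о
    "\u0440": "p",  # р
    "\u0441": "c",  # с
    "\u0445": "x",  # х
    "\u0455": "s",  # ѕ
}


def skeleton(label):
    """Replace each known lookalike with the Latin letter it resembles."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def warning_level(label):
    """Return 'none', 'notice' or 'confirm' for a single label."""
    skel = skeleton(label)
    if skel == label or not skel.isascii():
        return "none"  # plain ASCII, or a label that does not read as Latin
    swapped = sum(ch in LOOKALIKES for ch in label)
    # A few lookalikes hidden among Latin letters is treated as the worst case.
    return "confirm" if swapped < len(label) else "notice"
```

With this table, an all-Cyrillic ѕех gets "notice", a mostly Latin paypal with one Cyrillic а gets "confirm", and an ordinary Cyrillic word such as биатлон gets "none".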

On determining which pages are suspect because of their URL: If we were in a world where we could depend on registries to police domain name labels, that would be simple for browsers and other clients. Such a Kumbaya planet bears little resemblance to our reality, though. And as far as I know, ICANN neither has the authority to require that every domain name label (at every level, such as the label 'foobar' in foobar.blogspot.co.uk) meets some particular set of requirements, nor would it be willing to certify (subject to legal damage claims?) that such is the case, even for those domain name labels that it can control.

That says to me that whatever level of signaling that is required is largely up to the browsers; depending on the registries is just wishful thinking. Given that, I think some refinements of A look promising. There are a variety of different possibilities; it would probably be useful for interested parties to brainstorm on the most effective ones in practice.
  • warn on mixed-script labels (allowing certain exceptions, essentially where there are no confusable characters between the scripts, like Latin + Hangul)
  • warn on mixed-script domain names.
  • warn on confusable characters outside of my languages
  • &c.
Mark
Simon Josefsson | 14 Dec 2011 11:05

Re: Browser IDN display policy: opinions sought

Mark Davis ☕ <mark <at> macchiato.com> writes:

> That says to me that it would be much better to always show the Unicode
> characters (thus giving a uniform UI across browsers),

+1 to that, and thanks for saying it.

I don't think it is constructive to frame a discussion like 'choose
between A, B and C but do not think about the general problem and
propose any other solution that might be better because we don't want to
hear about it'.

> but then provide a more obvious UI signal to users that the page is
> suspect (and for what reason).  So from your example, the user should
> see http://www.viagénie.com <http://www.xn--viagnie-eya.com/> and
> http://биатлон.рф <http://xn--80abvnkf0a.xn--p1ai/> in all of the
> browsers.

Exactly.

There is a market for software that protects against "dangerous"
websites.  Phishing is a technological problem that goes far beyond IDNs.
I suggest we let experts in that area handle that problem, and let us
focus on displaying IDNs to users.

As an analogy, consider if we took a similar approach to MIME
attachments.  The way some browsers implement IDNs today is similar to
letting e-mail clients display the raw MIME encoding of the entire
e-mail to the user when the client didn't have the attachment in a
whitelist.

/Simon
Gervase Markham | 14 Dec 2011 11:59

Re: Browser IDN display policy: opinions sought

On 13/12/11 18:16, John Levine wrote:
> Having been reading this discussion with great interest, I don't
> understand what problem is being solved.  Is it:
> 
> A) Only display names that are not deceptive?
> 
> B) Don't display names that might be deceptive?
> 
> C) Don't display names that fail to meet some policy that
> doesn't really have anything to do with deception?
> 
> D) Only display names that meet some policy?
> 
> E) Something else?
> 
> It clearly can't be A, since there's plenty of room for deception in
> plain ASCII, and people can put random names at the Nth level, e.g.,
> FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

Hence the highlighting of the "Public Suffix + 1" in recent versions of
Firefox and Chrome, and the blacklisting (even before IDNA2008) of
homographs of "." and "/".
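
To make "Public Suffix + 1" concrete, here is a small sketch that picks
out the part of a hostname a browser would highlight. The suffix set is
a made-up stand-in for the real list, and the sketch ignores the
wildcard and exception rules a real implementation has to handle.

```python
# Hypothetical suffix set for illustration; real browsers ship a much larger list.
PUBLIC_SUFFIXES = {"com", "tld", "uk", "co.uk", "blogspot.co.uk"}


def registrable_domain(hostname):
    """Return the longest matching public suffix plus one label, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix
```

For John's example, FDIC.GOV.FOO.BAR.SOMETHING.TLD, only something.tld
would be emphasised, so the misleading leading labels are visually
played down.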

The aim is some reasonable approximation of A, given that deceptiveness
is subjective.

(The logic "deceptiveness is subjective -> one should not attempt to do
anything about deceptiveness" is not considered reasonable.)

Gerv
Martin J. Dürst | 14 Dec 2011 12:25

Re: Browser IDN display policy: opinions sought

On 2011/12/14 3:16, John Levine wrote:
> Having been reading this discussion with great interest, I don't
> understand what problem is being solved.  Is it:
>
> A) Only display names that are not deceptive?
>
> B) Don't display names that might be deceptive?
>
> C) Don't display names that fail to meet some policy that
> doesn't really have anything to do with deception?
>
> D) Only display names that meet some policy?
>
> E) Something else?
>
> It clearly can't be A, since there's plenty of room for deception in
> plain ASCII, and people can put random names at the Nth level, e.g.,
> FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

The whole thing started with the paypаl.com scare (the second 'a' of 
paypal being Cyrillic). The goal of the browser makers was to come up 
with something that addressed this issue, and similar IDN-related and 
script-related potential deceptions. So the goal was:

Don't display names that are potentially deceptive because of 
similarities of letters in different scripts.

That's a pretty limited goal, and because there was quite a bit of 
perceived pressure to do something, and not too much time and not too 
many actual names out there yet that would make people complain, 
the job was overdone in many ways and not good enough in others (as 
mentioned, the Mozilla approach fails for cases such as wordpress.com).
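
As a rough illustration of the "similar letters in different scripts"
check described above, here is a minimal Python sketch. It approximates
a character's script by the first word of its Unicode name, which is an
assumption made for brevity and not how any browser is known to do it;
the function names are hypothetical. A label written entirely in one
script passes this check, so it addresses only the mixed-script case.

```python
import unicodedata


def scripts_in(label):
    """Approximate the set of scripts used by the letters of a label.

    The script is guessed from the first word of the Unicode character
    name (e.g. "LATIN", "CYRILLIC"); this is a heuristic, not the real
    Unicode Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label):
    """True if the letters of the label come from more than one script."""
    return len(scripts_in(label)) > 1


plain = "paypal"
spoof = "payp\u0430l"  # U+0430 is CYRILLIC SMALL LETTER A

print(is_mixed_script(plain))  # False
print(is_mixed_script(spoof))  # True
```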

Regards,   Martin.
John C Klensin | 9 Dec 2011 21:55

Re: Browser IDN display policy: opinions sought

Gerv,

This is going to be long -- you asked for opinions and analysis
and I want to give you some, rather than just "I prefer X".

--On Friday, December 09, 2011 11:12 +0000 Gervase Markham
<gerv <at> mozilla.org> wrote:

> Recently, Mozilla community member Jothan Frakes was kind
> enough to do some research about how different popular web
> browsers implement IDN, and when they display the real
> characters and when they display Punycode. This is in the
> context of a Mozilla review of our policy. I am interested in
> the opinions of people on this list (see below).
> 
> As it turns out, the behaviour of all popular browsers is
> summarised at the bottom of a Chromium project document here:
> http://www.chromium.org/developers/design-documents/idn-in-google-chrome
> 
> The policies fall into 3 approximate buckets:
> 
> A (IE, Chrome): Unicode if the (single) 'language' of the
> string is configured in the options, Punycode otherwise.
> 
> B (Firefox, Opera): Unicode if the TLD is in a whitelist,
> Punycode otherwise. Arbitrary script mixing permitted
> (registry policy used to prevent abuse).
> 
> C (Safari): Unicode if the script is in a whitelist (which by
> default does not include Cyrillic or Greek), Punycode
> otherwise. Not sure about script mixing.
> 
> 
> Firefox has historically resisted adopting a Type A policy
> because we consider it seriously detrimental to IDN adoption
> and use. It seems to me that IDN can never be reliable for
> site owners, and therefore will not succeed, if a significant
> proportion of the world's browsers adopt Type A or Type C
> policies. This is because site owners can never know what
> proportion of their visitors will see gobbledegook in the URL
> bar rather than their nice domain name. Perhaps for sites
> whose visitors are all guaranteed to be from a particular
> country or language group, with properly-configured browsers
> and OSes which know that they speak a certain language or use
> a certain script, it might work - but I suggest that's a small
> subset of all sites. Many people in non-English-speaking
> countries still use English OSes and English browsers, with
> default settings.
> 
> Type C is particularly bad - Russian and Greek IDNs are broken
> by default, but even if you persuade your users to turn it on,
> they can then be mixed-script spoofed. You get to choose
> between functionality and security.
> 
> By contrast, with a Type B policy, if your IDN domain works in
> one copy of Firefox, it works in them all. If everyone had
> Type B policies, there would be no risk of a
> properly-registered domain coming up as gibberish.
> 
> It has been suggested that Firefox switch to a Type A policy.
> As it is, the mix of policies means that the goal of universal
> acceptability is not being met anyway. Firefox switching to
> Type A would also not meet that goal by itself, but one could
> argue that there's a bit more consistency to browser behaviour.
> 
> I would be interested in the opinion of people on this list as
> to:
> 
> - whether my analysis seems reasonable;
> - whether they prefer type A, B or C; and
> - whether they see any particular policy as more damaging to
>   IDN adoption than another.
>...
> (Note that "no restrictions" is not an option, given what
> happened in 2005 with payp-cyrillic-a-l.com, and I would
> rather not derail this debate by rehearsing those arguments
> again.)

Gerv, let me disagree, at least slightly, with Paul and Michel.
I've got a lot of sympathy for Type B even though I've never
been happy with at least one aspect of it.  I see a few --fairly
serious, IMO-- issues with Type A and a few with parts of your
analysis.

To start with, let me use my own patterns as an example.  I
really want the same character-display behavior on every device
I use.  It isn't good enough that I get the same behavior on
every device on which I'm running Firefox because I have devices
on which that isn't practical (you don't have a version for that
device, it doesn't behave well, my configuration of the device
has footprint/capacity problems and Firefox is getting
bloated,...).  In a typical week, I move among circa a
half-dozen machines with four to six different operating systems
(depending on how you feel about versions, and versions do make
a difference in this area). I may also sit down at a public
terminal or a friend's machine.  I'm probably worse than
average, but I'm not a tablet user: I've now got several
non-specialist friends who are carrying a smartphone, a Kindle
Fire, and an iPad, so this gets them too.  With Type A, these
machines need to be configured one at a time, using different
interfaces and conventions and sometimes going back to physical
media.  I wouldn't know how to configure my Android 2.2 phone to
display Greek or Han (which I prefer to be able to see, even
when I can't read the languages) while retaining Latin if I
wanted to...   We have no mechanisms for smoothly exporting
those settings among machines running the same OS, much less
across heterogeneous OSs.   That is a big pain, one requiring a
fair amount of technical smarts and time investment to get and
keep consistent, and impossible if one is using borrowed or
kiosk machines.

Worse, for several systems (Windows and FreeBSD included),
"configured in the options" doesn't mean "set a browser flag
that says it is ok to display Slobbovian script characters".
Instead, unless some of the bits are there already (which they
might be because, e.g., fonts and type styles are typically
installed in units of groups of scripts while configuration is
often done by language), it means "set operating system options
and load hundreds of megabytes of fonts for the relevant script
in every type style installed on the machine, printer driver
extensions, spelling dictionaries, keyboard layouts, and
sometimes phonetic tables for screen readers and other good
things".  For browser display of domain names or IRIs in scripts
I don't use routinely, a single reference type style would not
only be adequate; I might even be better off with it (less
vulnerable to attack).  But loading only one type style for
scripts other than those for my favorite language or two is not
an option (or not one that an unsophisticated user can figure out
how to use).

Because I (and others) may want to see the characters of scripts
that are associated with languages we can't read (e.g., I can
read Greek characters, but cannot usefully read the language),
Type A has the inherent problem of being language-based and, in
at least most cases, requiring that I essentially lie to the
operating system and tell it I can read and write languages for
which I have no capability.

Your "Type B" model, if used by other browsers and systems,
would get me that consistency of behavior, especially if browser
installation gives me that default "universal" reference font
too.  It is less useful if you and Opera are the only ones
supporting it, but "Type A" is less useful unless there are
really easy ways to export settings and to make fonts and
display for particular scripts available without importing
large groups of scripts or all of the associated language
baggage.  Nothing else in this reality can give me that
consistency.  See below for a different reality, but I'm just
not convinced that we have it right with any of the three groups
of options.  Intentionally or not, I think they are even solving
different problems.

The big disadvantage of "Type B" is that it means I'm reliant on
your judgment about what TLDs are well-behaved (or worth the
risk) and which ones are not.  If I disagree, I lose -- either
because you are opposed to giving the user domain-by-domain
override capability or, if there is an override mechanism that
allows me to insert my own opinions, it would take me back to
the problem of keeping multiple machines consistent.   Even
overrides and your opinions might not be granular enough: It is
entirely possible that I would trust the registry and registrar
policies for the "frob" TLD for European scripts but not for
Asian ones (and vice versa) even if they permit registration
from those other scripts.  The current "Type B" doesn't even
have a vocabulary to talk about that.
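
To make the granularity point concrete, here is a minimal Python sketch
of what a per-TLD, per-script variant of a Type B whitelist might look
like. Everything in it is an assumption for illustration: the table
contents, the "frob" entry, the function names, and the script
heuristic (first word of the Unicode character name). It is not the
format of any real browser whitelist, and the A-label conversion skips
the IDNA mapping and validity checks.

```python
import unicodedata

# Hypothetical table: scripts we trust each TLD's registry to police.
TRUSTED_SCRIPTS_BY_TLD = {
    "frob": {"LATIN", "GREEK", "CYRILLIC"},
}


def scripts_in(label):
    """Approximate the scripts used by the letters of a label."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def to_a_label(label):
    """Raw Punycode conversion of one label (no IDNA mapping or checks)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def display_form(domain):
    """Show a label in Unicode only if its scripts are trusted for the TLD."""
    labels = domain.split(".")
    trusted = TRUSTED_SCRIPTS_BY_TLD.get(labels[-1], set())
    shown = []
    for label in labels:
        if label.isascii() or scripts_in(label) <= trusted:
            shown.append(label)
        else:
            shown.append(to_a_label(label))
    return ".".join(shown)
```

With this table, a Greek label under "frob" would be shown as typed,
while a Han label under "frob", or any non-ASCII label under an
unlisted TLD, would fall back to its A-label.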

In addition, if ICANN adds 500 new domains next year, many of
them with alternate actual orthography or renderings
("variants"), I can't imagine that the Type B approach is going
to scale well enough to remain timely.   If the goal is to protect
users, with errors on the side of displaying A-labels considered
acceptable, then that may be ok, but it isn't a good story to
tell folks who are interested in using or marketing a new
domain.   Scaling for Type A and Type C depends on the number of
languages with which the user claims familiarity; for Type B, it
depends on the total number of TLDs.

All of that said, I think you (as a Type B representative) and
Microsoft (as a Type A one) are, intentionally or not,
solving different problems.  Your concern --from the
explanations you gave years ago and reinforced by your use of
the bogus paypal example-- has been about protecting users from
intentionally-confusing or confusable strings.    From that
perspective, displaying A-labels rather than U-labels is a way
to convey a particularly forceful message warning that the user
had better pay careful attention.  Microsoft's concern is, as
I've understood it, much closer to the belief that the user
doesn't want to deal with domain name labels in languages or
scripts she does not read or use.  That may have some useful
anti-phishing (and anti-fraud more generally) properties, but
(intentionally or not, but I think intentionally) it is really much
more about an attractive user experience than it is about
protection against bad behaviors.   There is, I think, a dubious
underlying assumption that a domain name containing labels in
some script implies content in a language that uses that script,
but so it goes.

The "different problems" suggests that, if you wanted to, you
could adopt Type A behavior but still identify domain names from
suspect trees (Type B behavior) in some way other than A-label
display.  Whether that would be a good strategy or not depends
on what you think about language-based models and your analysis
of the effectiveness of various forms of user warnings (my own
guess is that shifting to A-labels as a warning is going to be
effective if you don't do it very often but, if the network and
TLD behavior evolve in a direction that results in a lot of
users seeing A-labels most of the time, the shock value would
rapidly disappear (the same comment would apply to Type C
behavior with most scripts disabled by default)).

To wrap this long analysis up, I think some analysis of what
problem you are trying to solve is in order.  Such an analysis
would go beyond my hypothesized difference between your concerns
and Microsoft's and would dig more deeply into your concerns.  I
think that what you are trying to do is ultimately a workaround
for a real solution.  Perhaps, almost nine years after work on
IDNA2003 completed and with the prospect of a huge upsurge of
IDN use in the relatively near future, the browser vendors and
the rest of us could band together to push for a solution to the
problem itself, not just workarounds that are not really very
good at user protection against skilled attackers.  

In principle, there is a much better answer than any of those
three models. It is for ICANN to get really serious about doing
properly what you are trying to approximate.  If ICANN had the
will to change its behavior and rules to stop the use of IDNs
(and, for that matter, domain names generally) for
mostly-untraceable deceptive behavior, the phishing piece of the
problem that all three of these approaches are trying to address
would stop for gTLDs.  If it did, it is likely that the natural,
domestic, pressures on and from governments to clean up the
ccTLD act would be immense.  For the countries that did not
conform, variations on your Type B model would be even more
attractive than they are today because they would represent real
consensus about indifferent behavior -- a sort of collective
Internet community shunning of those who were not interested in
conforming to community norms.   

It probably won't happen.  The communities that appear to have
the most influence in ICANN are far more interested in promoting
the sale of as many names as possible no matter who they are
sold to, under what conditions, for what purposes, etc.   Even
more sadly, the window on that approach probably closes forever
next month: while one can imagine transition plans based on
requiring registrant authentication at annual renewal now, once
ICANN starts signing long-term contracts with lots of "you pay
your money, you do what you like" gTLDs, I have no idea how the
community could go back even if that were considered a good idea.

At least in principle, ICANN is open enough that the situation
could be turned around.  Write your favorite ICANN SSAC member
asking that they review this situation and insist that it be
solved at the registration end, not leaving it to browser
vendors to perform heuristic or highly subjective tricks (note
that Patrik is now SSAC Chair).  Write your favorite ICANN GAC
member or government official who is concerned about Cybercrime
and suggest that this issue deserves attention.  If you are
involved with the ICANN at-large process, talk with your
representatives.  Or try notes to ICANN's CEO and/or Board
Chair.  These three types of browser filtering approaches each
have their strengths and weaknesses (which ones are best depends
on your assumptions), but the real solution to the user
protection part of the problem lies, at least IMO, at the other
end of the process.

regards,
     john
Paul Hoffman | 9 Dec 2011 22:32

Re: Browser IDN display policy: opinions sought

On Dec 9, 2011, at 12:55 PM, John C Klensin wrote:

> It is for ICANN to get really serious about doing
> properly what you are trying to approximate.  

<laughing too hard to catch my breath>

> It probably won't happen.  

Oh, good, you're not batshit insane.

--Paul, still wheezing
John C Klensin | 9 Dec 2011 23:01

Re: Browser IDN display policy: opinions sought


--On Friday, December 09, 2011 13:32 -0800 Paul Hoffman
<phoffman <at> imc.org> wrote:

> On Dec 9, 2011, at 12:55 PM, John C Klensin wrote:
> 
>> It is for ICANN to get really serious about doing
>> properly what you are trying to approximate.  
> 
> <laughing too hard to catch my breath>
> 
>> It probably won't happen.  
> 
> Oh, good, you're not batshit insane.

Always happy to have the opportunity to brighten your day.

I actually wrote "It obviously won't happen" first, but I'm
actually not quite that pessimistic (ok, emphasis on "quite").
Or I think the risks going forward are high enough that it is
worth trying to make that particular part of the system work
even if one is less than optimistic about success.  

Maybe insane, but possibly not over into "batshit".

FWIW, the IAB's iana-strategy program is busy trying to assemble
a letter/ questionnaire response that gives the current regime a
ringing endorsement.  That effort is mostly focused on IANA
performance, but there are overlaps and the driving question has
to do with what organization will be providing IANA services
later in 2012.  If you believe that things at ICANN are so
horribly broken that anyone who suggests trying to use the
system to get more responsible behavior is insane (batshit or
otherwise), maybe you want to speak up on that issue... perhaps
including about what you think the alternatives are.

    john

Re: Browser IDN display policy: opinions sought

I am afraid this debate is among people who read ASCII characters, believe that Registries have IDNA2003 policies when IDNA2008 default should be OK for all and is the rule for 3rd level DNs, confuse the TCP/IP international network with the Internet, and think that ICANN and GAC have something to do with the digital ecosystem technology.

There is only one thing which is demanded by users: WITISWISISWIS and no application interference. Otherwise every application, and each application version, will result in its own internet.

Now, if browser manufacturers want to support additional optional service plug-ins with A, B, C, ... Y, Z strategies, this is very nice of them: they can write an RFC describing their protocols and make their complete support optional.

1. Browsers MUST support any IDN made of any IDNA2008 permitted character, whatever the TLD, etc. etc.
2. Right-clicking on a URL should allow users to see its Punycode equivalent.
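
For reference, the Unicode-to-Punycode mapping behind point 2 can be sketched in a few lines of Python using the standard library's "punycode" codec. This is only a sketch under stated assumptions: the function names are hypothetical, and the code performs the raw per-label conversion without the IDNA2008 mapping and validity checks a real browser would apply. The sample domains are the ones quoted earlier in this thread.

```python
def to_a_label(label):
    """Convert one label to its A-label form (raw Punycode, no IDNA checks)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain):
    """Convert every label of a dotted domain name to its A-label form."""
    return ".".join(to_a_label(part) for part in domain.split("."))


def to_unicode_domain(domain):
    """Convert every 'xn--' label of a dotted domain name back to Unicode."""
    parts = []
    for part in domain.split("."):
        if part.lower().startswith("xn--"):
            parts.append(part[4:].encode("ascii").decode("punycode"))
        else:
            parts.append(part)
    return ".".join(parts)


print(to_ascii_domain("www.viag\u00e9nie.com"))        # www.xn--viagnie-eya.com
print(to_unicode_domain("xn--80abvnkf0a.xn--p1ai"))    # биатлон.рф
```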

PORTZAMPARC


Michel Suignard | 10 Dec 2011 04:58

RE: Browser IDN display policy: opinions sought

John and all, concerning the following:
<<
With Type A, these machines need to be configured one at a time, using different interfaces and conventions
and sometimes going back to physical media.  I wouldn't know how to configure my Android 2.2 phone to
display Greek or Han (which I prefer to be able to see, even when I can't read the languages) while retaining
Latin if I wanted to...
>>
This is not typically true. Display capabilities are much broader than other language processing
abilities on most platforms. Recent Windows platforms (at least since Vista) and the phone I am using
(iPhone with iOS4) can represent many more writing systems than I am familiar with. And they don't require
any user enabling. To a large degree the availability of font resources is a footprint issue on many of the
devices, which is more acute on smartphones and such. But even on those platforms there is a trend to display
more and more of the Unicode repertoire. I was in fact pretty amazed at how much an iPhone can display (sorry,
no experience on other smartphones).

I don't think there is a difference between type A and B in that respect. It may affect the choice of
'supported' languages, but adding more languages will not magically require the installation of more
resources; they are typically already installed. Keyboard entry and similar advanced language-related
processes are another story, but none of that is required for IDN purposes.

Either in A or B mode you are facing the prospect of displaying over 100 000 characters with the added
complexity of CJK variants and complex script shaping. It is always a compromise but it has nothing to do
with A or B. Operating systems tend to move away from on-demand installation of language-specific resources
because of customer confusion.

Interesting discussion btw. I don't see ICANN at fault here. It is a very complex issue.

Michel
Gervase Markham | 12 Dec 2011 13:05

Re: Browser IDN display policy: opinions sought

On 09/12/11 20:55, John C Klensin wrote:
> This is going to be long -- you asked for opinions and analysis
> and I want to give you some, rather than just "I prefer X".

Hi John,

Thank you :-)

> Your "Type B" model, if used by other browsers and systems,
> would get me that consistency of behavior, especially if browser
> installation gives me that default "universal" reference font
> too.

(Side point: I'd see universal font provision as something that's the
OS's responsibility.)

> The big disadvantage of "Type B" is that it means I'm reliant on
> your judgment about what TLDs are well-behaved (or worth the
> risk) and which ones are not. 

You are. Mozilla is fine with that, in that we make many similar
judgments for our users in lots of areas on a regular basis (the
inclusion of certificate authorities being one obvious example).

> The "different problems" suggests that, if you wanted to, you
> could adopt Type A behavior but still identify domain names from
> suspect trees (Type B behavior) in some way other than A-label
> display.  

You seem here to be suggesting a blacklist ("suspect trees") rather than
a whitelist. Depending on how it was constructed, that might be somewhat
controversial! :-)

> Whether that would be a good strategy or not depends
> on what you think about language-based models and your analysis
> of the effectiveness of various forms of user warnings 

My opinion is that other forms of user warning are not going to be
effective - if you want a user not to be fooled by a particular visual
presentation, the only way is to not make that visual presentation.

> In principle, there is a much better answer than any of those
> three models. It is for ICANN to get really serious about doing
> properly what you are trying to approximate.

Amen!

> It probably won't happen.  The communities that appear to have
> the most influence in ICANN are far more interested in promoting
> the sale of as many names as possible no matter who they are
> sold to, under what conditions, for what purposes, etc.   Even
> more sadly, the window on that approach probably closes forever
> next month: while one can imagine transition plans based on
> requiring registrant authentication at annual renewal now, once
> ICANN starts signing long-term contracts with lots of "you pay
> your money, you do what you like" gTLDs, I have no idea how the
> community could go back even if that were considered a good idea.

It's deeply disappointing that solving this problem wasn't an obvious,
basic goal of the process from the beginning!

Gerv
John C Klensin | 12 Dec 2011 18:54

Re: Browser IDN display policy: opinions sought

(Sorry... I've been necessarily offline for a couple of hours --
this note was started before I dropped out and is being finished
now, so some issues may have already been addressed by others.)

--On Monday, December 12, 2011 12:05 +0000 Gervase Markham
<gerv <at> mozilla.org> wrote:

> On 09/12/11 20:55, John C Klensin wrote:
>> This is going to be long -- you asked for opinions and
>> analysis and I want to give you some, rather than just "I
>> prefer X".
> 
> Hi John,
> 
> Thank you :-)
> 
>> Your "Type B" model, if used by other browsers and systems,
>> would get me that consistency of behavior, especially if
>> browser installation gives me that default "universal"
>> reference font too.
> 
> (Side point: I'd see universal font provision as something
> that's the OS's responsibility.)

I absolutely agree.  I also believe that being sure that domain
names are not confusing or otherwise intended for hostile
purposes is the job of ICANN (for TLD names, SLD policy, and
enforcement of both) and registries and, if necessary,
governments.  I believe that there are better ways to control
and fix the spam problem than recipient-end filters, no matter
how those filters are designed or how well they work.  I have
several similar beliefs that probably put me in an alternate
reality, if not over into "insane".  Understanding that those
beliefs aren't realistic today, I recognize and agree with your
desire to provide nearly-last-resort user protections (the last
resort is consistent user care and intelligence and even I'm not
sufficiently idealistic and unrealistic to believe in that).  

We, and certainly most others on this list, are painfully aware,
not only of the fundamental technical principle that it is
hard to display a script for which one has no fonts (however we
might feel about U-labels versus A-labels, either is clearly
better than "????" or its equivalent) but also of how much
confusion can be added simply by the choice of type styles and
sizes one uses (e.g., I believe the Firefox change in recent
versions, which drops the type size of the full-URI display shown
when my mouse pointer moves over a link and makes the location
of that display less predictable, to be security-lowering -- but
maybe that is just me).

In this modern world, I'd like to see every operating system
support a good reference type style, with highly differentiated
characters, for all or most of the characters defined by
Unicode.  I'd like to have that type style able to be referenced
in a way that makes it accessible to applications even when the
user has chosen different defaults.  And I'd like to see
applications use it for display of security-sensitive objects
(not just domain names and IRIs, but, e.g., user and certificate
names and certificate contents), no matter how conventional
content is displayed.  That implies that I'd like to see XML,
HTML, etc., tags to identify such data.  

Of course, that implies that I'd like such a drastic increase in
the world supply of ponies that everyone who wanted one could
have one (and the means to keep it in comfort).

>> The big disadvantage of "Type B" is that it means I'm reliant
>> on your judgment about what TLDs are well-behaved (or worth
>> the risk) and which ones are not. 
> 
> You are. Mozilla is fine with that, in that we make many
> similar judgments for our users in lots of areas on a regular
> basis (the inclusion of certificate authorities being one
> obvious example).

Indeed.  And reasonable people can question those decisions too.
You and I do have a difference of opinion about whether users
should be able to override those decisions and how painful (or
obvious) it should be to do so if it is possible.   In my world,
not only should those options exist, but users should be able to
easily import security preference profiles that might be offered
by interested third parties.  For example, if I believe that
Patrik and I are about equally paranoid, and about the same
things, and he wants to take the time to designate which
certificate providers, domains, etc., he is willing to trust,
then, if I can persuade him to share (either because he is nice
or because I am willing to pay for the privilege), he ought to be
able to easily export those settings and I ought to be able to
easily import them.  And that should be the case whether he and I
are using the same browser or different ones.  Again, I
understand these are fantasies and well out of IETF's scope, but
I think it is all part of advising you (and ICANN, and W3C, and
governments and other bodies who think they care) about how
Mozilla should proceed.

>> The "different problems" suggests that, if you wanted to, you
>> could adopt Type A behavior but still identify domain names
>> from suspect trees (Type B behavior) in some way other than
>> A-label display.  
> 
> You seem here to be suggesting a blacklist ("suspect trees")
> rather than a whitelist. Depending on how it was constructed,
> that might be somewhat controversial! :-)

No.  I think that, in the presence of sufficient authority and
consensus, there is no difference between a blacklist and a
whitelist.  Ultimately, if one maintains or uses a list, one has
to figure out what to do with the questionable cases.  To the
extent to which a policy amounts to "these are the known good
guys and we will reward everyone on that whitelist with 'good'
name displays and punish everyone else with 'bad' name displays",
you are basically running a blacklist whose membership is
determined by a simple set operation on the list of TLDs.   For
at least many of the folks who object to your particular version
of a "Type B" policy, that is the real source of the objection.

>> Whether that would be a good strategy or not depends
>> on what you think about language-based models and your
>> analysis of the effectiveness of various forms of user
>> warnings 

> My opinion is that other forms of user warning are not going
> to be effective - if you want a user not to be fooled by a
> particular visual presentation, the only way is to not make
> that visual presentation.

A reasonable position, even if one doesn't agree with it.   I
again suggest that the language-based models are concerned about
that visual confusion model only as a secondary goal (whether
they realize that or not) because language-based (or even
script-based) models are almost completely ineffective at
avoiding confusion caused by a determined attacker.  Your
approach, based on a judgment --however crude-- about registry
policies and tolerance for confusing registrations is much
better in that regard, if only because you are asking the right
question.

>> In principle, there is a much better answer than any of those
>> three models. It is for ICANN to get really serious about
>> doing properly what you are trying to approximate.
> 
> Amen!
> 
>> It probably won't happen.  The communities that appear to have
>> the most influence in ICANN are far more interested in
>> promoting the sale of as many names as possible no matter who
>> they are sold to, under what conditions, for what purposes,
>> etc.   Even more sadly, the window on that approach probably
>> closes forever next month: while one can imagine transition
>> plans based on requiring registrant authentication at annual
>> renewal now, once ICANN starts signing long-term contracts
>> with lots of "you pay your money, you do what you like"
>> gTLDs, I have no idea how the community could go back even if
>> that were considered a good idea.
> 
> It's deeply disappointing that solving this problem wasn't an
> obvious, basic goal of the process from the beginning!

I couldn't agree more.  But, ICANN, like every other body in
this strange activity we call the Internet, ultimately deals in
tradeoffs.  Independent of the history of how we got here --
interesting (and I could tell you a lot of stories), but largely
irrelevant going forward -- it is perfectly natural for someone
in the ICANN decision process to say "well, no one has
complained to _me_ about that, so it isn't really a problem".
I can assure you that lots of people are saying, loudly and
repeatedly and sometimes claiming to be speaking for the users
and/or us, that ICANN's goal should be to enable the selling of
as many names as possible, with as few restrictions as possible,
so the variation of "4000 people have told me that they want no rules
and not a single one has told me that there is a problem that
ICANN needs to solve" is an even more plausible excuse.   I've
come to the conclusion --perhaps much later than I should have--
that it is time to eliminate at least that set of particular
excuses for inaction and not taking responsibility.  And the
only way that is possible is if lots of folks who are concerned
about the issues and understand why start making the issues
clear to ICANN.   ICANN also isn't the IETF: whether it should
be that way or not, a hundred loud voices often speak much more
persuasively than a single coherent technical (or even security)
argument.

best,
    john
Olivier MJ Crepin-Leblond | 12 Dec 2011 23:23

Re: Browser IDN display policy: opinions sought

Hello John,

On 12/12/2011 18:54, John C Klensin wrote :
> I absolutely agree.  I also believe that being sure that domain
> names are not confusing or otherwise intended for hostile
> purposes is the job of ICANN (for TLD names, SLD policy, and
> enforcement of both) and registries and, if necessary,
> governments.

...bearing in mind that ICANN would have to look only at the string of
the domain name itself, and does not have any business looking at the
content of a Web site.
Hence, the complication re: objections on new gTLDs etc.
Kind regards,

Olivier

-- 
Olivier MJ Crépin-Leblond, PhD
http://www.gih.com/ocl.html
Eric Brunner-Williams | 13 Dec 2011 01:12

Re: Browser IDN display policy: opinions sought

> ...bearing in mind that ICANN would have to look only at the string of
> the domain name itself, and does not have any business looking at the
> content of a Web site.

The representation that an entity exercising US DoC delegated rule
making may only examine information denoted as "label" and may not
examine information denoted as "mapped resource", appears to be
inconsistent with the representation of the US DoJ concerning labels
and mapped resources.

-e
Martin J. Dürst | 14 Dec 2011 12:12

Re: Browser IDN display policy: opinions sought

On 2011/12/13 2:54, John C Klensin wrote:

> We, and certainly most others on this list, are painfully aware,
> not only about the fundamental technical principle that it is
> hard to display a script for which one has no fonts (however we
> might feel about U-labels versus A-labels, either is clearly
> better than "????" or its equivalent) and about how much
> confusion can be added simply by the choice of type styles and
> sizes one uses (e.g., I believe that the Firefox change in
> recent versions to drop the typesize of the display of full URIs
> when my mouse pointer moves over a link and to make the location
> of that display less predictable to be security-lowering -- but
> maybe that is just me).

A-labels may be better than "????" if the user wants to transcribe the 
address for later use. But in terms of phishing and spoofing, ???? may 
actually be better, if only a very tiny bit.

Regards,   Martin.
Martin J. Dürst | 10 Dec 2011 13:51

Re: Browser IDN display policy: opinions sought

Hello Gervase, others,

I'm replying to the original post because I didn't find the right 
message later in the thread to reply to.

About "payp-cyrillic-a-l.com": Exactly 0 people got robbed there, and 
way too many got scared. That very clearly includes the browser makers, 
who heavily overreacted.

I personally am very much with John in that I want to see as much of the 
IDNs as what they are: Unicode characters, not punycode. I tell my 
browser all the languages that I can understand to some extent, but I 
don't want to tell it that I understand Korean (because I don't), even 
though I know Hangul, and so on. On top of that, I don't see any problems
at all with scripts for which I would have to consult a code chart,
definitely not if they are as far apart from what I use daily as,
e.g., Devanagari. But even leaving personal preferences aside, I don't
know why we can't try to display as much Unicode as we reasonably can in
the address/location bar, the same way we display Unicode in the page 
itself.

I never really liked the type B approach for the reasons mentioned by 
others, but I also think that the A approach is way too restrictive.

I think none of the browsers have made more than a very quick stab at 
the issue. The A type browsers could easily extend their stuff to 
include all the scripts that the user won't confuse (such as Indic and 
East-Asian scripts for typical European users). On the other hand, they 
might want to be more careful for whole-script confusables for users who 
declare to read both (one of the many languages associated with) Latin 
and Greek.

The B type browsers could easily ADD A-type stuff (including the above 
improvements). They could also add some script-mixing detection to be 
able to be more generous with their TLD screening process.
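
A minimal sketch of the kind of script-mixing detection mentioned above.
It assumes (this is not taken from any browser's code) that the first
word of a character's Unicode name is a good enough stand-in for its
script; a real implementation would use the Unicode Script property.
The names scripts_in_label and is_mixed_script are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Return the set of rough script names used in one domain label.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) approximates the script.
    ASCII-style digits and the hyphen are ignored as script-neutral.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_mixed_script(label):
    """True if the label draws letters from more than one script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    # "\u0430" is CYRILLIC SMALL LETTER A, visually close to Latin "a".
    for label in ("paypal", "p\u0430ypal"):
        print(ascii(label), sorted(scripts_in_label(label)),
              is_mixed_script(label))
```

A check like this only catches mixing within a label; it does nothing
about whole-script confusables, which need a separate rule.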

I can't really judge type C, it very much depends on how big the 
whitelist is. If it's rather big, then C looks very good to me, 
otherwise not.

Also, now that we have non-ASCII TLDs, that gives us some new ideas. We 
should be able to assume that ICANN wouldn't be open to visual spoofing 
at the TLD level, such as e.g. not allowing whole-script confusables in 
Cyrillic or Greek. That should mean that cyrillic.cyrillic and 
equivalents are safe to display. And these are incidentally the domains 
where IDNs are really at their best, and where the growth should go.

So the question for Mozilla (and other browser vendors) isn't "should we 
switch from type X to type Y", but "how can we increase (potentially 
drastically) the number of IDNs we can display without creating visual 
spoofing traps". If Mozilla is able to show all IDNs that IE shows, and 
some more on top of that (without including something like 
payp-cyrillic-a-l.com), then that can only be an argument for using 
Mozilla, not against it. And that's not the slippery slope of 
bugwards-compatibility that continues to haunt HTML, but simply 
displaying the data that's there correctly.

Regards,   Martin.

On 2011/12/09 20:12, Gervase Markham wrote:
> Recently, Mozilla community member Jothan Frakes was kind enough to do
> some research about how different popular web browsers implement IDN,
> and when they display the real characters and when they display
> Punycode. This is in the context of a Mozilla review of our policy. I am
> interested in the opinions of people on this list (see below).
>
> As it turns out, the behaviour of all popular browsers is summarised at
> the bottom of a Chromium project document here:
> http://www.chromium.org/developers/design-documents/idn-in-google-chrome
>
> The policies fall into 3 approximate buckets:
>
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.
>
> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).
>
> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.
>
>
> Firefox has historically resisted adopting a Type A policy because we
> consider it seriously detrimental to IDN adoption and use. It seems to
> me that IDN can never be reliable for site owners, and therefore will
> not succeed, if a significant proportion of the world's browsers adopt
> Type A or Type C policies. This is because site owners can never know
> what proportion of their visitors will see gobbledegook in the URL bar
> rather than their nice domain name. Perhaps for sites whose visitors are
> all guaranteed to be from a particular country or language group, with
> properly-configured browsers and OSes which know that they speak a
> certain language or use a certain script, it might work - but I suggest
> that's a small subset of all sites. Many people in non-English-speaking
> countries still use English OSes and English browsers, with default
> settings.
>
> Type C is particularly bad - Russian and Greek IDNs are broken by
> default, but even if you persuade your users to turn it on, they can
> then be mixed-script spoofed. You get to choose between functionality
> and security.
>
> By contrast, with a Type B policy, if your IDN domain works in one copy
> of Firefox, it works in them all. If everyone had Type B policies, there
> would be no risk of a properly-registered domain coming up as gibberish.
>
> It has been suggested that Firefox switch to a Type A policy. As it is,
> the mix of policies means that the goal of universal acceptability is
> not being met anyway. Firefox switching to Type A would also not meet
> that goal by itself, but one could argue that there's a bit more
> consistency to browser behaviour.
>
> I would be interested in the opinion of people on this list as to:
>
> - whether my analysis seems reasonable;
> - whether they prefer type A, B or C; and
> - whether they see any particular policy as more damaging to IDN
>    adoption than another.
>
> Has anyone lobbied one browser manufacturer or another to change their
> policy? Is there another option that is not currently in use which would
> be better?
>
> (Note that "no restrictions" is not an option, given what happened in
> 2005 with payp-cyrillic-a-l.com, and I would rather not derail this
> debate by rehearsing those arguments again.)
>
> Gerv
> _______________________________________________
> Idna-update mailing list
> Idna-update <at> alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
John C Klensin | 10 Dec 2011 16:28

Re: Browser IDN display policy: opinions sought


--On Saturday, December 10, 2011 21:51 +0900 "Martin J. Dürst"
<duerst <at> it.aoyama.ac.jp> wrote:

>...
> Also, now that we have non-ASCII TLDs, that gives us some new
> ideas. We should be able to assume that ICANN wouldn't be open
> to visual spoofing at the TLD level, such as e.g. not allowing
> whole-script confusables in Cyrillic or Greek. That should
> mean that cyrillic.cyrillic and equivalents are safe to
> display. And these are incidentally the domains where IDNs are
> really at their best, and where the growth should go.
>...

Having suggested yesterday that we should try to get ICANN to
fix a large fraction of the underlying problem, and maybe could
succeed, and had my sanity questioned (in good humor; no offense
taken), a comment on the above...

I note that you say "should be able to assume".  It seems to me
that is correct.  It also keeps you out of this wagon I'm riding
to the nut house :-)

As to how realistic that assumption is, you might consider:

(1) ICANN's Board apparently (minutes have not yet been posted)
passed a resolution on Thursday exempting IDN variations on .EU
from review for visual spoofing and other forms of
confusability.   An optimist would assume that is a special
case, never to be repeated, and that EU itself will be careful
to avoid potentially-confusing strings even if doing so results
in unnatural translations.  A pessimist would assume that anyone
with sufficient power and leverage can get such an exemption and
that one should be careful about what protections one assumes
will come from that direction.

(2) The rules about what will or will not be accepted are
contained in the "Applicant Guidebook", which I strongly
recommend as recreational reading for anyone who is trying to
figure out what one should expect from ICANN in this area.
http://newgtlds.icann.org/applicants/agb
As a guide to relevant material in this 352+ page document
(maybe much longer-- the pre-assembled version seems to have
some sections omitted), see Section 1.1.2.10 on "String
Contention".  Those of you who took statistics courses may find
it particularly interesting to contemplate what a definition
actually means that is based on whether the combination of two
things "create[s] a probability of user confusion".

If you have any strong opinions about either that document or
exemptions from confusability reviews, remember that this list
is not the right place for the discussion.  The members of
ICANN's current Board of Directors, including the CEO, are
listed at http://www.icann.org/en/general/board.html should you
want to express your support (or offer other opinions).

    john

Patrik Fältström | 10 Dec 2011 17:35

Re: Browser IDN display policy: opinions sought

All,

Let me emphasize that what John writes here is extremely important.
If you have the slightest opinion of what "confusing" implies, and
what implications approval of "too similar" TLDs might have,
specifically cross scripts, you should let ICANN know.

Now.

I do explicitly also sign this email as the chair of SSAC. That is not a mistake.

   Patrik Fältström
   Chair ICANN Security and Stability Committee

On 10 dec 2011, at 16:28, John C Klensin wrote:

> --On Saturday, December 10, 2011 21:51 +0900 "Martin J. Dürst"
> <duerst <at> it.aoyama.ac.jp> wrote:
> 
>> ...
>> Also, now that we have non-ASCII TLDs, that gives us some new
>> ideas. We should be able to assume that ICANN wouldn't be open
>> to visual spoofing at the TLD level, such as e.g. not allowing
>> whole-script confusables in Cyrillic or Greek. That should
>> mean that cyrillic.cyrillic and equivalents are safe to
>> display. And these are incidentally the domains where IDNs are
>> really at their best, and where the growth should go.
>> ...
> 
> Having suggested yesterday that we should try to get ICANN to
> fix a large fraction of the underlying problem, and maybe could
> succeed, and had my sanity questioned (in good humor; no offense
> taken), a comment on the above...
> 
> I note that you say "should be able to assume".  It seems to me
> that is correct.  It also keeps you out of this wagon I'm riding
> to the nut house :-)
> 
> As to how realistic that assumption is, you might consider:
> 
> (1) ICANN's Board apparently (minutes have not yet been posted)
> passed a resolution on Thursday exempting IDN variations on .EU
> from review for visual spoofing and other forms of
> confusability.   An optimist would assume that is a special
> case, never to be repeated, and that EU itself will be careful
> to avoid potentially-confusing strings even if doing so results
> in unnatural translations.  A pessimist would assume that anyone
> with sufficient power and leverage can get such an exemption and
> that one should be careful about what protections one assumes
> will come from that direction.
> 
> (2) The rules about what will or will not be accepted are
> contained in the "Applicant Guidebook", which I strongly
> recommend as recreational reading for anyone who is trying to
> figure out what one should expect from ICANN in this area.
> http://newgtlds.icann.org/applicants/agb
> As a guide to relevant material in this 352+ page document
> (maybe much longer-- the pre-assembled version seems to have
> some sections omitted), see Section 1.1.2.10 on "String
> Contention".  Those of you who took statistics courses may find
> it particularly interesting to contemplate what a definition
> actually means that is based on whether the combination of two
> things "create[s] a probability of user confusion".
> 
> If you have any strong opinions about either that document or
> exemptions from confusability reviews, remember that this list
> is not the right place for the discussion.  The members of
> ICANN's current Board of Directors, including the CEO, are
> listed at http://www.icann.org/en/general/board.html should you
> want to express your support (or offer other opinions).
> 
>    john
Cary Karp | 10 Dec 2011 17:48

Re: Browser IDN display policy: opinions sought

Quoting Patrik:

> Let me emphasize that what John writes here is extremely important.
> If you have the slightest opinion of what "confusing" implies, and what
> implications approval of "too similar" TLDs might have, specifically cross
> scripts, you should let ICANN know.
>
> Now.

The advisory group for ICANN's Variant Issues Project will be holding
its wrap-up meeting in Marina del Rey on Monday and Tuesday. The issues
report it is focusing on will be finalized in short order thereafter. At
least three of the people on the idna-update list will be in MdR so
anything said on it can also be channeled into the VIP discussion.

The "now" that paf mentions really is NOW.

/Cary
Gervase Markham | 12 Dec 2011 13:06

Re: Browser IDN display policy: opinions sought

On 10/12/11 16:35, Patrik Fältström wrote:
> Let me emphasize that what John writes here is extremely important.
> If you have the slightest opinion of what "confusing" implies, and
> what implications approval of "too similar" TLDs might have,
> specifically cross scripts, you should let ICANN know.
> 
> Now.

How? Hopefully this is the right channel.

Mozilla is of the very strong opinion that once it has entered the
territory of creating non-ASCII TLDs and opening up the TLD process to
applicants on a wide scale, the "confusables" issue becomes a matter of
serious ICANN concern. If I pay my money, can I register
".cyrillic_c-om"? If not, why not? What about ".cöm"? The implications
of having no restrictions in this area are the risk of opening up
user-unfriendly scenarios ranging from the shady (typo interception to
drive search engine traffic) to the downright criminal (phishing).

(IMO, the definition of 'confusable' for TLDs needs to be wider than
just visually confusable; I don't think anyone should be able to get
".cmo" either, for pretty much the same reasons - but that's out of
scope for my comments here.)

For this purpose, ICANN should be setting out best practice in the area
of algorithmically determining visual confusability and, once it has
done so, it should be contractually obliging all registries with whom it
has a contract to follow that best practice in regard to end-user
registrations.

Registries should be free to implement whatever technical measures work
best for them - ICANN should define a required outcome ("no two domains
which are 'confusable' under these rules should be registered to two
distinct entities"), not a mechanism.
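
As a rough illustration of the outcome-based rule above, a registry
could key registrations on a "skeleton" of each label. This is only a
sketch under stated assumptions: the CONFUSABLE_TO_LATIN table is a
tiny hypothetical sample, not an ICANN or Unicode data set, and the
Registry class and skeleton function are invented names.

```python
# Hypothetical, deliberately tiny confusables table (illustration only).
CONFUSABLE_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(label):
    """Map each character to its look-alike so confusable labels collide."""
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in label.lower())


class Registry:
    """Refuses a label confusable with one held by a different registrant."""

    def __init__(self):
        self._owner_by_skeleton = {}

    def register(self, label, registrant):
        """Return True if accepted, False if it conflicts with another owner."""
        owner = self._owner_by_skeleton.setdefault(skeleton(label), registrant)
        return owner == registrant


if __name__ == "__main__":
    registry = Registry()
    print(registry.register("paypal", "alice"))
    print(registry.register("p\u0430yp\u0430l", "mallory"))
    print(registry.register("p\u0430yp\u0430l", "alice"))
```

The same registrant may hold several confusable labels here, which
matches the "two distinct entities" wording; how the table itself is
defined is exactly the part that would need an agreed best practice.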

As a browser maker, we would prefer not to have to put any restrictions
on the display of validly-registered IDNs, by being able to rely on
registries to do the right thing (and accept the blame if it goes
wrong). We want the registry community to solve this problem, rather
than having to impose a solution from outside.

Does that help? :-)

Gerv
Patrik Fältström | 12 Dec 2011 13:24

Re: Browser IDN display policy: opinions sought


On 12 dec 2011, at 13:06, Gervase Markham wrote:

> On 10/12/11 16:35, Patrik Fältström wrote:
>> Let me emphasize that what John writes here is extremely important.
>> If you have the slightest opinion of what "confusing" implies, and
>> what implications approval of "too similar" TLDs might have,
>> specifically cross scripts, you should let ICANN know.
>> 
>> Now.
> 
> How? Hopefully this is the right channel.

By contacting ICANN. That you communicate in the IETF does not matter.

You can for this specific topic contact Bart Boswinkel <bart.boswinkel <at> icann.org>. cc:ed.

   Patrik
John C Klensin | 12 Dec 2011 15:46

Re: Browser IDN display policy: opinions sought

--On Monday, December 12, 2011 13:24 +0100 Patrik Fältström
<patrik <at> frobbit.se> wrote:

>...
> On 12 dec 2011, at 13:06, Gervase Markham wrote:
> 
>> On 10/12/11 16:35, Patrik Fältström wrote:
>>> Let me emphasize that what John writes here is extremely
>>> important. If you have the slightest opinion of what
>>> "confusing" implies, and what implications approval of "too
>>> similar" TLDs might have, specifically cross scripts, you
>>> should let ICANN know.
>>> 
>>> Now.
>> 
>> How? Hopefully this is the right channel.
> 
> By contacting ICANN. That you communicate in the IETF does not
> matter.

Let me say that more strongly.  Fortunately or unfortunately,
one of the things ICANN seems to do best in recent years is
denial.  If a staff member complains about something that is not
in line with current policy and explains why it is important, he
or she can be answered with "policy matter that has to come
through the bottom-up public process and staff cannot originate
such processes" (or she or he can be forced out or otherwise
punished).  My suggestion, or any other suggestion that ICANN is
responsible for doing anything in the domain space that might
actually restrict the market for names or what can be registered,
is likely to be wildly unpopular in some ICANN quarters and
hence likely to be ignored if that is possible.  And it is
really easy to say "whatever was discussed on an IETF list, or
an internal (even if public) Mozilla list doesn't count because
it isn't part of the ICANN process".

> You can for this specific topic contact Bart Boswinkel
> <bart.boswinkel <at> icann.org>. cc:ed.

For the reasons above, while I think contacting Bart is a fine
idea (and I would welcome his involvement), I think that we
should not try to put the responsibility for this on a single
staff member.  If anything were actually to be done, it will
take rather direct pressure on (voting) Board members in their
capacity as Board members.  

I note that, presumably in the spirit of openness and
transparency, ICANN no longer makes the addresses of Board
members, or even an address for the Board list, available.  The
CEO's address is also no longer easily accessible.  Even SSAC's
page says to contact SSAC by sending mail to a named individual,
not to a list that might generate logs (even if private) and/or
tickets.  But I gave you the URL for the Board member list.
Most of those folks are not hard to find and, if you care, you
will find them.   And, as with Bart's address as provided in
Patrik's note and above, a reasonable person might quickly
figure out that most ICANN staff members can be reached using
the FirstName.LastName <at> icann.org convention.

    john

Patrik Fältström | 12 Dec 2011 16:20

Re: Browser IDN display policy: opinions sought

On 12 dec 2011, at 15:46, John C Klensin wrote:

>> You can for this specific topic contact Bart Boswinkel
>> <bart.boswinkel <at> icann.org>. cc:ed.
> 
> For the reasons above, while I think contacting Bart is a fine
> idea (and I would welcome his involvement), I think that we
> should not try to put the responsibility for this on a single
> staff member.  If anything were actually to be done, it will
> take rather direct pressure on (voting) Board members in their
> capacity as Board members.  

Correct.

> Even SSAC's
> page says to contact SSAC by sending mail to a named individual,
> not to a list that might generate logs (even if private) and/or
> tickets.

Julie is admin for SSAC, and email that is to SSAC is logged by her. We go through all such email every Friday.

Yes, it could have been something that does not look like an email address to an individual, but it would
still have ended up with a human. For spam reasons.

   Patrik
J-F C. Morfin | 11 Dec 2011 03:06

Re: Browser IDN display policy: opinions sought


>At 17:35 10/12/2011, Patrik Fältström wrote:
>Let me emphasize that what John writes here is extremely important. 
>If you have the slightest opinion of what "confusing" implies, and 
>what implications approval of "too similar" TLDs might have, 
>specifically cross scripts, you should let ICANN know.
>Now.
>I do explicitly also sign this email as the chair of SSAC. That is 
>not a mistake.
>Patrik Fältström
>Chair ICANN Security and Stability Committee

>At 17:48 10/12/2011, Cary Karp wrote:
>The advisory group for ICANN's Variant Issues Project will be holding
>its wrap-up meeting in Marina del Rey on Monday and Tuesday. The issues
>report it is focusing on will be finalized in short order thereafter. At
>least three of the people on the idna-update list will be in MdR so
>anything said on it can also be channeled into the VIP discussion.
>
>The "now" that paf mentions really is NOW.
>/Cary

At 16:28 10/12/2011, John C Klensin wrote:
>As to how realistic that assumption is, you might consider:
>
>(1) ICANN's Board apparently (minutes have not yet been posted)
>passed a resolution on Thursday exempting IDN variations on .EU
>from review for visual spoofing and other forms of
>confusability.

Dear John,

ICANN has nothing to do with the Internet *technology* that the IETF
is meant to positively influence (RFC 3935) and that Gervase has
questions about on behalf of Firefox.

The IAB has.

ICANN claims benefits from a limited set of TLDs in class IN, outside
of private networks, outside of IUsers interests, and outside of
non-Internet digital technologies that want to use/already use the
digital naming system and its root names (e.g.: .gsm, the Chinese TLDs
and keywords, etc.). The area of use of browsers (and probable forked
browsers, if open source ones become IUsers-foreign) is much larger.

IMHO, the last thing we need, hence my slowdown for more than one
year, is an inadvertent browser war, or Internet use confusion, over
IDNA. This is why, as IUsers, we certainly are interested in
presentation-layer-related-services being offered by browser
manufacturers, as long as they are RFC documented and we can turn them
up/down at our own will.

The way I understand the problem is that the ball has been in your IAB
field for more than one year after my appeal and the IESG/IAB have
clarified that my area of concerns did affect, but did not belong to,
the IETF scope. The IAB was (this is how I understood its road map) to
decide how IDN/IDNA SHOULD be used in the Internet technology context.
We did not move on this because I had hoped that we could maintain the
IDNA2008 consensus and permit the IAB to move faster in not
interfering. However, it turns out that I was wrong because the
blocking reality questioned by you, me, Lisa, etc. remains: the
IDNA2003/stringprep application-to-application architectural scheme
does not scale to IDNA2008.

We all have also known for years that this issue is to be addressed
prior to January 12, 2012.

The only viable architectural solution that I can figure out, but I
could also be totally wrong, is subsidiarity at the IUI (intelligent
use interface) on a fringe to fringe basis. The implications on many
other diversity related items should have been discussed, over the
course of the last two years, if some interests (IAB, IETF, ICANN,
IGF, GAC, Unicode, Google, etc.) wanted to keep the transition under
joint control. Since nothing has been discussed and, therefore,
prepared and tested, we are heading towards an entirely new internet
(IDv6 addressing, shared virtual unique root file, IDNgTLDs shambles,
IANA revamp, etc.) without any idea of the way we want to manage it
together, or even if we want to manage it together.

If the IAB wants to document its IAB chosen architecture for this new
internet, it has exactly one month left to be clear about a framework.
Then, practical experimentation will have to start prior to the first
ICANN IDNgTLD being officially accepted. Otherwise, it will not be
innovation towards a better internet for all. It will be a long-term
costly competition (global fight) between an obsolete use, a
commercial+ use and an emerging efficient use of the Internet.

I am going to document an IUse position that is to be I_Ded on January
10 if the IAB does not commit beforehand to a position that IUsers can
consensually support. I have no objection to discussing my memo while
I am preparing it (on the iucg <at> ietf.org mailing list: as you know, I
cannot discuss it on the ietf <at> ietf.org mailing list).

However, please let us seriously discuss only network architecture and
stop cosmetic layer violation confusions.

At 02:02 11/12/2011, Paul Hoffman wrote:
>[the response to the users] needs to come from a stable, trusted body.
>I question whether such a body who is willing to make a table exists.

Until now I thought that such a competent body was the IAB.

This "really NOW" I will know if I was right.

Best

jfc
Martin J. Dürst | 14 Dec 2011 11:24

Re: Browser IDN display policy: opinions sought

Hello John, others,

(sorry this mail took so long)

On 2011/12/11 0:28, John C Klensin wrote:

> I note that you say "should be able to assume".  It seems to me
> that is correct.  It also keeps you out of this wagon I'm riding
> to the nut house :-)
>
> As to how realistic that assumption is, you might consider:

Points below taken. My assumption was mainly based on the fact that even 
if we get to a few thousand TLDs, these are not handed out automatically 
like second- and third-level domains. If they get announced in any way, 
then there's a possibility for current holders to complain. As an 
example, I couldn't imagine that Verisign would be happy with somebody 
trying to get .СОМ (the visual equivalent of .COM in Cyrillic).

> (1) ICANN's Board apparently (minutes have not yet been posted)
> passed a resolution on Thursday exempting IDN variations on .EU
> from review for visual spoofing and other forms of
> confusability.   An optimist would assume that is a special
> case, never to be repeated, and that EU itself will be careful
> to avoid potentially-confusing strings even if doing so results
> in unnatural translations.  A pessimist would assume that anyone
> with sufficient power and leverage can get such an exemption and
> that one should be careful about what protections one assumes
> will come from that direction.

I sometimes like to think through concrete examples. Some replies seemed 
to assume that this was about what is going to happen on the second 
level. But the way I understand your description, it seems to be about 
variations of .EU itself (please correct me if I'm wrong). In that case, 
I assume the EU is mostly interested in Greek (for Greece) and Cyrillic 
(for Bulgaria).

Looking at pages such as http://europa.eu/index_en.htm, 
http://europa.eu/index_bg.htm, and http://europa.eu/index_el.htm, it 
seems that the Greek version of .EU would be .ΕΕ, and the 
Cyrillic/Slavic version would be .ЕС. Now if we go to IANA, we find that
.ec is Ecuador (http://www.iana.org/domains/root/db/ec.html), and
.ee is Estonia (http://www.iana.org/domains/root/db/ee.html).
For lower case, the Greek one is .εε, which is way less of a problem, 
but for Cyrillic, it's .ес, which is exactly what we are most worried about.
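
To make the .ec versus .ес case concrete, the two labels can be
compared by code point and by their encoded forms. This is a small
sketch, assuming Python's built-in "idna" codec (which follows the
older IDNA2003 rules) is close enough for illustration; the helper
names describe and a_label are hypothetical.

```python
import unicodedata

LATIN_LABEL = "ec"                # ccTLD label for Ecuador
CYRILLIC_LABEL = "\u0435\u0441"   # lower-case Cyrillic label discussed above


def describe(label):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


def a_label(label):
    """Encode a label with Python's built-in "idna" codec (IDNA2003)."""
    return label.encode("idna")


if __name__ == "__main__":
    for label in (LATIN_LABEL, CYRILLIC_LABEL):
        print(describe(label), a_label(label))
    # The strings render alike but are different sequences of code points.
    print(LATIN_LABEL == CYRILLIC_LABEL)
```

The point is that the difference is only visible in the code points or
the encoded form, which is what a user sees when a browser falls back
to Punycode.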

Now I wonder what Ecuador and Estonia, and their domain name 
administrations, think about this issue (if they ever were asked or 
happened to notice), or what action they have taken, or tried to take.

I also wonder why the Board made such an apparently general exception 
rather than limiting this to one or two very specific cases, or 
attaching some additional conditions.

It wouldn't be too difficult to solve the issue with some negotiation: 
If .ec (Ecuador) agrees to never register Cyrillic domain names, and .ес 
(Cyrillic EU) agrees to never register domain names in Latin, and to not 
register any whole-script confusables in Cyrillic (and they make a deal 
of how to treat Greek), then things might just work out fine.

Regards,   Martin.
Eric Brunner-Williams | 14 Dec 2011 13:39

Re: Browser IDN display policy: opinions sought

On 12/14/11 5:24 AM, "Martin J. Dürst" wrote:
> 
> Points below taken. My assumption was mainly based on the fact that
> even if we get to a few thousand TLDs, these are not handed out
> automatically like second- and third-level domains. If they get
> announced in any way, then there's a possibility for current holders
> to complain. As an example, I couldn't imagine that Verisign would be
> happy with somebody trying to get .СОМ (the visual equivalent of .COM
> in Cyrillic).

The "confusingly similar" restriction anticipates this.

VGRS's plan of record (to the best of my knowledge) is to acquire many
equivalent strings in scripts other than Latin.

There are pre-existing equivalent strings in Han script in a namespace
with more users than the namespace in which VGRS currently holds
several nominally renewable franchises.

Mariners. Dragons. Etc.

...

> It wouldn't be too difficult to solve the issue with some negotiation:
> If .ec (Ecuador) agrees to never register Cyrillic domain names, and
> .ес (Cyrillic EU) agrees to never register domain names in Latin, and
> to not register any whole-script confusables in Cyrillic (and they
> make a deal on how to treat Greek), then things might just work out fine.

Country code registries might voluntarily agree to a registration
restriction policy.

For counter-examples, see NU re-purposed as "new" in extended Latin,
VGRS managed BZ re-purposed as "business" oddly coincidental with the
2000 round launch of a non-VGRS BIZ gTLD, and the recent NuStar
managed CO re-purposed as an SEO match-before COM.

For general failures of registration restriction policy see:
http://www.icann.org/en/general/litigation-employ-media.htm

There may be others already. There will be others.

Eric

Paul Hoffman | 10 Dec 2011 18:26

Re: Browser IDN display policy: opinions sought

First, Mark's correction (which needs to be checked) is an important one:

On Dec 9, 2011, at 3:12 AM, Gervase Markham wrote:

> The policies fall into 3 approximate buckets:
> 
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.
> 
> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).
> 
> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.

Later, Mark Davis said:

On Dec 9, 2011, at 10:10 AM, Mark Davis ☕ wrote:

> I'm not familiar with the code, but I think that (A) may actually be:
> 
> A (IE, Chrome): Unicode if the (single) 'script' of the string matches one of the scripts of the user's
> language(s) in the options, Punycode otherwise.
> 
> It is pretty easy and reliable to detect the script of the string, whereas language detection would be unreliable.

What a few people might be asking for is:

D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.

Restated less tersely:

D: If every character in the label comes from a single script as defined in the Unicode Standard, and every
character is displayable by the browser without resorting to "unknown" or "fallback" glyphs, display
the label; otherwise show Punycode.
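
A minimal Python sketch of how policy D might look, under loud assumptions: the standard library does not expose the Unicode Script property, so rough_script() approximates it from the first word of the character name, and the displayable callback is a hypothetical stand-in for a real font-coverage check. Digits and hyphens are deliberately not special-cased here.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Approximate the script of a character from its Unicode name.

    Assumption: this is a crude stand-in for the real Script property.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def display_form(label: str, displayable=lambda ch: True) -> str:
    """Return the label as-is if it passes policy D, otherwise its A-label.

    `displayable` is a hypothetical callback answering "can the browser
    render this character without a fallback glyph?".
    """
    scripts = {rough_script(ch) for ch in label}
    if len(scripts) == 1 and all(displayable(ch) for ch in label):
        return label
    # Python's built-in "idna" codec adds the xn-- prefix for non-ASCII labels.
    return label.encode("idna").decode("ascii")


print(display_form("\u0431\u0438\u0430\u0442\u043b\u043e\u043d"))  # all Cyrillic
print(display_form("p\u0430ypal"))  # Latin with one Cyrillic letter
```

Note that no user option appears anywhere in the decision, which is the property the proposal is after.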

This would lead to zone owners having more assurance of their zones being displayed properly as long as
every label is single-script. It requires no options-setting on the part of the user, which is a big win
over (A) for users who are multi-lingual, and completely avoids the "TLDs we like" problem of B.

Thoughts?

--Paul Hoffman
Patrik Fältström | 10 Dec 2011 18:48

Re: Browser IDN display policy: opinions sought


On 10 dec 2011, at 18:26, Paul Hoffman wrote:

> D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.

+1

With the exceptions for combinations of various scripts and script COMMON.

Or in other words: If the domain name can be displayed as a U-label, in a technically safe way, why not display
it as a U-label?

If I, who do not know Han, get a label in Han, I claim it is as incomprehensible to me in Han as an A-label, but
I must say it is much nicer to see it in Han than the alternative.

Why discriminate against some scripts?

I.e. my view is that we should really limit the number of cases where A-label is displayed to the user.

If nothing else because people should and must start to understand we live in a multi script and multi
cultural world.

   Patrik
Michel Suignard | 10 Dec 2011 19:20

RE: Browser IDN display policy: opinions sought

> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
>
>> D: Unicode if the label is a single script that is displayable by the
>> browser, Punycode otherwise.
>
>+1
>
>With the exceptions for combinations of various scripts and script COMMON.

In fact, mixed scripts are fine and desirable in many situations. Think of Romaji in Japan, which cannot be
confused with Kana and Kanji. In that case it is perfectly OK to have a white list of scripts that can be
mixed with Latin (which is the typical case). You really want to block the obvious threats such as
Latin/Cyrillic and Latin/Greek but allow mixed scripts where there is no confusability. This is a case
where you want a white list rather than a black list, because new scripts that are highly confusable with
existing ones keep being added to Unicode (and eventually to IDN).
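
A minimal Python sketch of such a white list, assuming a hypothetical ALLOWED_MIXES table (its single entry is an example, not a recommendation) and approximating each character's script from the first word of its Unicode name, since the standard library has no Script property.

```python
import unicodedata

# Hypothetical white list: each entry is a set of scripts that may share a label.
ALLOWED_MIXES = [
    frozenset({"LATIN", "HIRAGANA", "KATAKANA", "CJK"}),  # Japanese with Romaji
]


def rough_script(ch: str) -> str:
    """Crude script guess from the Unicode character name (an assumption)."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def mixing_allowed(label: str) -> bool:
    """True for single-script labels and for white-listed script mixes."""
    scripts = {rough_script(ch) for ch in label}
    if len(scripts) == 1:
        return True
    return any(scripts <= allowed for allowed in ALLOWED_MIXES)


print(mixing_allowed("abc\u65e5\u672c\u304b\u306a"))  # Latin + Kanji + Hiragana
print(mixing_allowed("p\u0430ypal"))                   # Latin + Cyrillic
```

Because the table lists what is allowed rather than what is blocked, a script added to Unicode later stays blocked in mixed labels until someone reviews it.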
>
>Or in other words: If the domain name can be displayed as a U-label,
> in a technically safe way, why not display it as a U-label?
>
>If I, who do not know Han, get a label in Han, I claim it is as
> incomprehensible to me in Han as an A-label, but I must
>say it is much nicer to see it in Han than the alternative.
>
>Why discriminate against some scripts?
>
> I.e. my view is that we should really limit the number of cases
> where A-label is displayed to the user.

Agree, I also think that most 'type A' implementations are too restrictive in that aspect.

>If nothing else because people should and must start to
> understand we live in a multi script and multi cultural world.

+1

Michel
Paul Hoffman | 10 Dec 2011 21:12

Re: Browser IDN display policy: opinions sought

On Dec 10, 2011, at 10:20 AM, Michel Suignard wrote:

>> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
>> 
>>> D: Unicode if the label is a single script that is displayable by the
>>> browser, Punycode otherwise.
>> 
>> +1
>> 
>> With the exceptions for combinations of various scripts and script COMMON.
> 
> In fact, mixed scripts are fine and desired  in many situations.

Yes.

> Think of Romaji in Japan which cannot be confused with Kana and Kanji. So in that case it is perfectly OK to
> have a white list of scripts that can be mixed with Latin (which is the typical case).

Sure, if you think there is a single entity who can make the whitelist of scripts that can be combined in a
single label.

I believe that there is not. I would like to be wrong.

--Paul Hoffman
Mark Davis ☕ | 10 Dec 2011 21:58

Re: Browser IDN display policy: opinions sought

There is a set of basic levels for mixing scripts set up in http://www.unicode.org/reports/tr36/proposed.html#Security_Levels_and_Alerts (I'd recommend reading the material before that section, however, for background information.)

A more comprehensive approach would be to use some of the mixed-script and whole-script confusable data as described in http://www.unicode.org/reports/tr39/proposed.html. Note that that data will grow over time; in particular, I'd expect the Indic confusables to be fleshed out further. I'd expect it to incorporate some of the work ongoing in ICANN for identifying confusables, as well.

There is a table of recommended scripts for identifiers in http://www.unicode.org/reports/tr31/proposed.html#Table_Recommended_Scripts. I wouldn't recommend mixtures of those with other scripts (like Latin), or allowing whole-script confusables of other scripts that match those. That helps to keep out many of the characters that are extremely confusable with Latin.

I'm pointing to the proposed-update versions of these documents; comments are welcome on the text. The next meeting of the UTC to consider feedback will be in February.

Mark
— Il meglio è l’inimico del bene —


2011/12/10 Paul Hoffman <phoffman <at> imc.org>
On Dec 10, 2011, at 10:20 AM, Michel Suignard wrote:

>> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
>>
>>> D: Unicode if the label is a single script that is displayable by the
>>> browser, Punycode otherwise.
>>
>> +1
>>
>> With the exceptions for combinations of various scripts and script COMMON.
>
> In fact, mixed scripts are fine and desired  in many situations.

Yes.

> Think of Romaji in Japan which cannot be confused with Kana and Kanji. So in that case it is perfectly OK to have a white list of scripts that can be mixed with Latin (which is the typical case).

Sure, if you think there is a single entity who can make the whitelist of scripts that can be combined in a single label.

I believe that there is not. I would like to be wrong.

--Paul Hoffman

James Seng | 11 Dec 2011 01:15

Re: Browser IDN display policy: opinions sought

On Sunday, December 11, 2011, Paul Hoffman wrote:
On Dec 10, 2011, at 10:20 AM, Michel Suignard wrote:

>> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
>>
>>> D: Unicode if the label is a single script that is displayable by the
>>> browser, Punycode otherwise.
>>
>> +1
>>
>> With the exceptions for combinations of various scripts and script COMMON.
>
> In fact, mixed scripts are fine and desired  in many situations.

Yes.

Many languages use more than one script in their writing system. Even Chinese, which most people think is merely CJK Unified Ideographs, uses ASCII and sometimes other scripts such as Bopomofo.

 

> Think of Romaji in Japan which cannot be confused with Kana and Kanji. So in that case it is perfectly OK to have a white list of scripts that can be mixed with Latin (which is the typical case).

Sure, if you think there is a single entity who can make the whitelist of scripts that can be combined in a single label.

I believe that there is not. I would like to be wrong.


Instead of trying to say which language uses which set of scripts, and therefore displaying it as a U-label, why not do it the other way round? We know that a Latin/Cyrillic combination would be a problem. We know that other combinations of scripts would be a problem too. We take combinations of those scripts and display them in Punycode UNLESS the "language" of the string is configured in the options.

I think we need a combination of auto-detecting problem U-labels and a whitelist.

-James Seng
Paul Hoffman | 11 Dec 2011 02:02

Re: Browser IDN display policy: opinions sought

On Dec 10, 2011, at 4:15 PM, James Seng wrote:

> Many languages use more than one script in their writing system. Even Chinese, which most people think is
> merely CJK Unified Ideographs, uses ASCII and sometimes other scripts such as Bopomofo.

Of course. Greek uses Latin digits. And so on.

We sang this song about a decade ago. Nothing has changed.

> Instead of trying to say which language uses which set of scripts, and therefore displaying it as a U-label, why not
> do it the other way round? We know that a Latin/Cyrillic combination would be a problem. We know that other
> combinations of scripts would be a problem too. We take combinations of those scripts and display them in
> Punycode UNLESS the "language" of the string is configured in the options.

And we sang that one too. What "language" is a string that is two Han characters and some Latin digits?

> I think we need a combination of auto-detect problem U-labels and a whitelist.

We don't need anything: the rest of the users do. And it needs to come from a stable, trusted body. I question
whether a body that is willing to make such a table exists.

--Paul Hoffman
Andrew Sullivan | 11 Dec 2011 20:20

Re: Browser IDN display policy: opinions sought

On Sat, Dec 10, 2011 at 06:48:20PM +0100, Patrik Fältström wrote:
> 
> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
> 
> > D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.
> 
> +1
> 
> With the exceptions for combinations of various scripts and script COMMON.

But this boils down to, "We should do that, unless we can't and it
makes sense to do something else."  Which is practically equivalent to
"we should follow this rule except when we don't."

Note that even LDH is not in one script.
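
A minimal Python sketch of this point. In the Unicode script data, ASCII letters are Latin while the digits and the hyphen-minus are Common, so a strict "exactly one script" test rejects ordinary LDH labels. The rough_script() helper below is a hand-written approximation for ASCII only, not a real Script property lookup.

```python
def rough_script(ch: str) -> str:
    """Hand-written approximation of the Script property for LDH characters."""
    if ch.isascii() and ch.isalpha():
        return "Latin"
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "Common"
    return "Other"


def scripts_of(label: str) -> set[str]:
    """Collect the (approximate) scripts used in a label."""
    return {rough_script(ch) for ch in label}


def strictly_single_script(label: str) -> bool:
    """The naive reading of 'every character comes from a single script'."""
    return len(scripts_of(label)) == 1


print(scripts_of("example"), strictly_single_script("example"))
print(scripts_of("example-2"), strictly_single_script("example-2"))
```

Any workable rule therefore has to say how Common (and Inherited) characters are treated, which is where the exceptions start.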

A

--

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com

Paul Hoffman | 11 Dec 2011 21:03

Re: Browser IDN display policy: opinions sought

On Dec 11, 2011, at 11:20 AM, Andrew Sullivan wrote:

> Note that even LDH is not in one script.

Ouch. Exceptionally good point.

--Paul Hoffman
John C Klensin | 11 Dec 2011 21:12

Re: Browser IDN display policy: opinions sought


--On Sunday, December 11, 2011 14:20 -0500 Andrew Sullivan
<ajs <at> anvilwalrusden.com> wrote:

> On Sat, Dec 10, 2011 at 06:48:20PM +0100, Patrik Fältström
> wrote:
>> 
>> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
>> 
>> > D: Unicode if the label is a single script that is
>> > displayable by the browser, Punycode otherwise.
>> 
>> +1
>> 
>> With the exceptions for combinations of various scripts and
>> script COMMON.
> 
> But this boils down to, "We should do that, unless we can't
> and it makes sense to do something else."  Which is
> practically equivalent to "we should follow this rule except
> when we don't."
> 
> Note that even LDH is not in one script.

It also raises a very complex problem from which none of these
strategies are immune (unless they are completely focused on
user experience without even a hint of protecting people from
harm).  What we know already is that script-mixing tests aren't
much good.  Yes, preventing them (preferably as a registration
norm) unless they are actually necessary, is a good thing to do.
But, if someone is actually planning an attack, there are more
than enough "all in one script but confusable with another"
examples to provide ample opportunities.

If we tell, or appear to tell, the poor lusers that we are
protecting them against a particular variety of attack --such as
confusing names-- and end up doing that often enough to be
persuasive that we are accomplishing something while remaining
open to slightly-more-clever attacks, we actually decrease
effective security by encouraging the user to become less wary.
Repeated observations from many parts of the world that adding
traffic signals to interactions that were known to be dangerous
often increases the accident rate come to mind here.

So let's be a little careful about our assumptions about who we
are helping and with what.

   john

Martin J. Dürst | 12 Dec 2011 07:54

Re: Browser IDN display policy: opinions sought

On 2011/12/12 5:12, John C Klensin wrote:

> It also raises a very complex problem from which none of these
> strategies are immune (unless they are completely focused on
> user experience without even a hint of protecting people from
> harm).  What we know already is that script-mixing tests aren't
> much good.  Yes, preventing them (preferably as a registration
> norm) unless they are actually necessary, is a good thing to do.
> But, if someone is actually planning an attack, there are more
> than enough "all in one script but confusable with another"
> examples to provide ample opportunities.

Yes indeed. The browser vendors overreacted on issues such as 
script-mixing, stuff that the user isn't able to read, and so on, 
because overreacting was easier than a more careful reaction, and they 
were able to say that they did something.

But they didn't do much for in-script attacks, because that's much more 
difficult. (Not that I'm advocating a browser that shows MlCR0S0FT.com 
with punycode.)

> If we tell, or appear to tell, the poor lusers that we are
> protecting them against a particular variety of attack --such as
> confusing names-- and end up doing that often enough to be
> persuasive that we are accomplishing something while remaining
> open to slightly-more-clever attacks, we actually decrease
> effective security by encouraging the user to become less wary.

With regards to wrong messages, I'm not so concerned about the typical 
"luser", but about the people between the end users and the hard-core 
tech experts. I'm not so much concerned about the actual loss of money. 
People who are stupid enough to click before thinking will click before 
thinking, whatever the circumstances. The APWG and others will be busy 
taking down phishers as fast as they can, regardless of what we may or 
may not tell people. But I'm concerned about the wasted effort on 
implementations and the damage from suboptimal implementations (e.g. 
showing only a small part of what could be shown without any direct spam 
potential).

Regards,    Martin.
Vint Cerf | 12 Dec 2011 08:38

Re: Browser IDN display policy: opinions sought

as ugly as it might be, I wonder whether showing both UNICODE and
PUNYCODE expressions would be useful? By itself, this might not reveal
a spoof unless one had bookmarked domain names against which to
compare, of course. If the viewer does not have an appropriate
font/script, of course, then only the PUNYCODE is displayable to any
useful effect. Again, one would need reference points to detect a
spoof.
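
A minimal Python sketch of what showing both forms could look like, assuming a hypothetical both_forms() helper and a "U-label [A-label]" layout chosen purely for illustration. It relies on Python's built-in "idna" codec (IDNA 2003), not on any browser code.

```python
def both_forms(domain: str) -> str:
    """Return 'unicode [punycode]' for an IDN, or the plain name for ASCII."""
    a_form = domain.encode("idna").decode("ascii")   # ASCII-compatible form
    u_form = a_form.encode("ascii").decode("idna")   # Unicode form
    if a_form == u_form:
        return a_form  # pure ASCII: the two forms are identical
    return f"{u_form} [{a_form}]"


print(both_forms("xn--80abvnkf0a.xn--p1ai"))
print(both_forms("example.com"))
```

As noted above, the pair is only useful to a viewer who has a known-good reference to compare against.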

v

On Mon, Dec 12, 2011 at 1:54 AM, "Martin J. Dürst"
<duerst <at> it.aoyama.ac.jp> wrote:
> On 2011/12/12 5:12, John C Klensin wrote:
>
>> It also raises a very complex problem from which none of these
>> strategies are immune (unless they are completely focused on
>> user experience without even a hint of protecting people from
>> harm).  What we know already is that script-mixing tests aren't
>> much good.  Yes, preventing them (preferably as a registration
>> norm) unless they are actually necessary, is a good thing to do.
>> But, if someone is actually planning an attack, there are more
>> than enough "all in one script but confusable with another"
>> examples to provide ample opportunities.
>
>
> Yes indeed. The browser vendors overreacted on issues such as script-mixing,
> stuff that the user isn't able to read, and so on, because overreacting was
> easier than a more careful reaction, and they were able to say that they did
> something.
>
> But they didn't do much for in-script attacks, because that's much more
> difficult. (Not that I'm advocating a browser that shows MlCR0S0FT.com with
> punycode.)
>
>
>> If we tell, or appear to tell, the poor lusers that we are
>> protecting them against a particular variety of attack --such as
>> confusing names-- and end up doing that often enough to be
>> persuasive that we are accomplishing something while remaining
>> open to slightly-more-clever attacks, we actually decrease
>> effective security by encouraging the user to become less wary.
>
>
> With regards to wrong messages, I'm not so concerned about the typical
> "luser", but about the people between the end users and the hard-core tech
> experts. I'm not so much concerned about the actual loss of money. People
> who are stupid enough to click before thinking will click before thinking,
> whatever the circumstances. The APWG and others will be busy to take down
> phishers as fast as they can independent of what we may or may not tell
> people. But I'm concerned about the wasted effort on implementations and the
> damage from suboptimal implementations (e.g. showing only a small part of
> what could be shown without any direct spam potential).
>
> Regards,    Martin.
Gervase Markham | 12 Dec 2011 13:04

Re: Browser IDN display policy: opinions sought

On 12/12/11 06:54, "Martin J. Dürst" wrote:
> Yes indeed. The browser vendors overreacted on issues such as
> script-mixing, stuff that the user isn't able to read, and so on,
> because overreacting was easier than a more careful reaction, and they
> were able to say that they did something.

Perhaps if we are throwing blame around, a little could be reserved for
registries which said "this is not our problem, guvnor, it's intended to
work this way"?

I deny the charge that we over-reacted. The list of reasons for our
policy at the bottom of this page:
http://www.mozilla.org/projects/security/tld-idn-policy-list.html
was compelling then and is still pretty compelling today IMO.

Gerv
Martin J. Dürst | 13 Dec 2011 11:19

Re: Browser IDN display policy: opinions sought

On 2011/12/12 21:04, Gervase Markham wrote:
> On 12/12/11 06:54, "Martin J. Dürst" wrote:
>> Yes indeed. The browser vendors overreacted on issues such as
>> script-mixing, stuff that the user isn't able to read, and so on,
>> because overreacting was easier than a more careful reaction, and they
>> were able to say that they did something.
>
> Perhaps if we are throwing blame around,

Sorry about that.

> a little could be reserved for
> registries which said "this is not our problem, guvnor, it's intended to
> work this way"?

Please rest assured that I didn't want to single out the browser side, 
nor Firefox in particular. I have plenty of blame left for the 
above-mentioned registries, and will not hesitate to throw it their way 
as much as they deserve.

> I deny the charge that we over-reacted.

To take some very simple examples, why do I have to look at 
http://www.xn--viagnie-eya.com/ in Firefox when I can see 
http://www.viagénie.com in Chrome? The other way round, why do I have to 
look at http://xn--80abvnkf0a.xn--p1ai in Chrome when I can see 
http://биатлон.рф in Firefox? (and why can I see both of these in Opera 
and in Safari?)
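
For readers who want to check the correspondence between the two forms, here is a minimal Python sketch using the standard "punycode" codec. The helper names are illustrative assumptions, and the sketch skips the normalisation and validity checks that real IDNA processing performs.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def to_a_label(u_label: str) -> str:
    """Encode one label; pure ASCII labels are returned unchanged."""
    if u_label.isascii():
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode one label; labels without the prefix are returned unchanged."""
    if a_label.startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


print(to_a_label("viag\u00e9nie"))
print(to_u_label("xn--80abvnkf0a"), to_u_label("xn--p1ai"))
```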

There is absolutely *nothing* wrong with displaying either of these 
domain names. Otherwise, the other browsers displaying them, and their 
users, would be in deep trouble, wouldn't they? That's the reason I say 
that browsers overreacted.

> The list of reasons for our
> policy at the bottom of this page:
> http://www.mozilla.org/projects/security/tld-idn-policy-list.html
> was compelling then and is still pretty compelling today IMO.

Others have poked holes at that already. I personally don't think it was 
such a bad idea to try and get the registries to do their job by some 
more or less gentle pressure. And I don't think that the criteria you 
have are overall unreasonable. And it might even have worked quite a bit 
better if all the browsers had collaborated.

But if I were in the position of a registry, I'm not sure I'd have taken the 
time to submit bug reports. One browser might be okay, but what if all 
browsers were doing this? And what if their criteria were all slightly 
different? The number of browsers is another dimension over which this 
approach doesn't scale. (For those people who think that there are only 
four or five browsers around: That's totally wrong. There are four or 
five major browsers, but there are many, many others.)

Another potentially very important aspect is 'who is in charge'. I have 
absolutely zero insider knowledge, and am just speculating, but it might 
be possible that Verisign simply doesn't think they should in any way 
obey instructions of a browser maker as to what they can and can't 
register, or what they should and shouldn't document, or that they have 
to submit a bug report (rather than the browser maker doing the legwork 
if they choose to use their own criteria).

If something like the above speculation is true, then the whole thing 
may have ended up as a power struggle between registries and browsers, 
fought on the backs of users (both those registering IDNs and those 
simply wanting to browse them).

Anyway, I don't necessarily want Firefox to abandon policy B. Definitely 
not if that meant switching to A, which has just about the same number 
of problems, just different ones. As you already are thinking about 
changing policies (which would mean implementing them), why not 
implement another policy and then OR the results together. That way, the 
huge number of current false positives gets reduced quite a bit.
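
A minimal Python sketch of OR-ing policies together. Everything named here is an assumption for illustration: the TLD_WHITELIST and USER_SCRIPTS contents are made up, the two policy functions are simplified stand-ins for policy B and a script-based policy, and scripts are guessed from Unicode character names.

```python
import unicodedata

TLD_WHITELIST = {"\u0440\u0444"}   # hypothetical policy B list
USER_SCRIPTS = {"LATIN"}           # hypothetical scripts the user can read


def tld_policy(label: str, tld: str) -> bool:
    """Simplified policy B: trust the registry of a white-listed TLD."""
    return tld in TLD_WHITELIST


def script_policy(label: str, tld: str) -> bool:
    """Simplified script policy: every character is in a user script."""
    scripts = {unicodedata.name(ch, "UNKNOWN").split(" ")[0] for ch in label}
    return scripts <= USER_SCRIPTS


POLICIES = [tld_policy, script_policy]


def show_unicode(label: str, tld: str, policies=POLICIES) -> bool:
    """Show the U-label if ANY policy considers it safe (logical OR)."""
    return any(policy(label, tld) for policy in policies)


print(show_unicode("viag\u00e9nie", "com"))   # accepted by script_policy
print(show_unicode("p\u0430ypal", "com"))     # rejected by both policies
```

A label is shown in Unicode as soon as one policy vouches for it, so each added policy can only reduce false positives; the cost is that it can also only add false negatives.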

Regards,    Martin.
Gervase Markham | 13 Dec 2011 11:43

Re: Browser IDN display policy: opinions sought

On 13/12/11 10:19, "Martin J. Dürst" wrote:
> To take some very simple examples, why do I have to look at
> http://www.xn--viagnie-eya.com/ in Firefox when I can see
> http://www.viagénie.com in Chrome? The other way round, why do I have to
> look at http://xn--80abvnkf0a.xn--p1ai in Chrome when I can see
> http://биатлон.рф in Firefox? (and why can I see both of these in Opera
> and in Safari?)
> 
> There is absolutely *nothing* wrong with displaying either of these
> domain names. Otherwise, the other browsers displaying them, and their
> users, would be in deep trouble, wouldn't they? That's the reason I say
> that browsers overreacted.

That sounds like your definition of "not overreacting" is "solving the
problem with a 0% false positive and 0% false negative rate". (Or
perhaps you would have been happy with us doing nothing, which has a 0%
false positive rate, but a non-0 false negative rate.)

> But if I were in the position of a registry, I'm not sure I'd have taken the
> time to submit bug reports. One browser might be okay, but what if all
> browsers were doing this? And what if their criteria were all slightly
> different? The number of browsers is another dimension over which this
> approach doesn't scale. (For those people who think that there are only
> four or five browsers around: That's totally wrong. There are four or
> five major browsers, but there are many, many others.)

In practice, as with CA inclusions, it tends to be that the smaller
browsers just adopt the decisions of the larger ones. Even some large
browsers do - Google Chrome uses the OS cert store on each platform, AIUI.

This does have, of course, pros and cons. But given the fairly low
effort of filing a bug with us, and the corresponding gain that an
additional 30% of the Internet will be able to correctly see the IDNs
you are selling, that seems like a good use of registry time to me...

> Anyway, I don't necessarily want Firefox to abandon policy B. Definitely
> not if that meant switching to A, which has just about the same number
> of problems, just different ones. As you already are thinking about
> changing policies (which would mean implementing them), why not
> implement another policy and then OR the results together. That way, the
> huge number of current false positives gets reduced quite a bit.

We are certainly open to proposals D, E, F and beyond. (Although due to
the way Mozilla works, I can't promise an implementation timescale!)

Gerv
Martin J. Dürst | 13 Dec 2011 12:24

Re: Browser IDN display policy: opinions sought

On 2011/12/13 19:43, Gervase Markham wrote:
> On 13/12/11 10:19, "Martin J. Dürst" wrote:
>> To take some very simple examples, why do I have to look at
>> http://www.xn--viagnie-eya.com/ in Firefox when I can see
>> http://www.viagénie.com in Chrome? The other way round, why do I have to
>> look at http://xn--80abvnkf0a.xn--p1ai in Chrome when I can see
>> http://биатлон.рф in Firefox? (and why can I see both of these in Opera
>> and in Safari?)
>>
>> There is absolutely *nothing* wrong with displaying either of these
>> domain names. Otherwise, the other browsers displaying them, and their
>> users, would be in deep trouble, wouldn't they? That's the reason I say
>> that browsers overreacted.
>
> That sounds like your definition of "not overreacting" is "solving the
> problem with a 0% false positive and 0% false negative rate".

Well, that would indeed have been wonderful :-). But without attaching 
any significance to actual numbers, what I was trying to say is the 
following:

Currently you have essentially 0% false negatives if we define the 
problem narrowly enough (i.e. we exclude in-script confusables, 
orthographic stuff such as color/colour, and the like), but something 
like 50% or 80% or so false positives. And you could easily decrease the 
number of false positives by quite a lot (possibly getting into a 5% or 
even a 1% range) with a different policy or a combination of policies.

> (Or
> perhaps you would have been happy with us doing nothing, which has a 0%
> false positive rate, but a non-0 false negative rate.)

[Me personally, why not. But because I'm using a Japanese OS, I'm in the 
lucky situation that Cyrillic and Greek are displayed in full-width, 
which makes them really stick out. And I'm also a rather careful 
browser, or so I hope.]

For the overall Web, not displaying (as U-labels) mixed-script labels 
(except for where exceptions are truly needed) and whole-script 
confusables is definitely a good thing. But that's far fewer labels than 
are affected in the browsers currently.

>> But if I were in the position of a registry, I'm not sure I'd have taken the
>> time to submit bug reports. One browser might be okay, but what if all
>> browsers were doing this? And what if their criteria were all slightly
>> different? The number of browsers is another dimension over which this
>> approach doesn't scale. (For those people who think that there are only
>> four or five browsers around: That's totally wrong. There are four or
>> five major browsers, but there are many, many others.)
>
> In practice, as with CA inclusions, it tends to be that the smaller
> browsers just adopt the decisions of the larger ones.

Good to know.

> Even some large
> browsers do - Google Chrome uses the OS cert store on each platform, AIUI.
>
> This does have, of course, pros and cons. But given the fairly low
> effort of filing a bug with us, and the corresponding gain that an
> additional 30% of the Internet will be able to correctly see the IDNs
> you are selling, that seems like a good use of registry time to me...

To you of course :-). But some registries might be more concerned about 
authority and 'face' than about practicalities.

Regards,     Martin.

Re: Browser IDN display policy: opinions sought



2011/12/13 Gervase Markham <gerv <at> mozilla.org>

> We are certainly open to proposals D, E, F and beyond. (Although due to
> the way Mozilla works, I can't promise an implementation timescale!)

Users only expect every browser to do its job as a browser and, possibly, to offer a choice among the technically and commercially competitive protection options 0, A, B, C, ... X, Y, Z. Please remember that ICANN and the IETF are meant to foster competition in these areas in the best public interest.

Disregarding the 0-protection option is therefore a marketing choice. My opinion is that it will lead many, if not most, to use another browser that offers that option.

Portzamparc
Kent Karlsson | 13 Dec 2011 12:49

Re: Browser IDN display policy: opinions sought


Den 2011-12-13 11:19, skrev "Martin J. Dürst" <duerst <at> it.aoyama.ac.jp>:

> To take some very simple examples, why do I have to look at
> http://www.xn--viagnie-eya.com/ in Firefox when I can see
> http://www.viagénie.com in Chrome? The other way round, why do I have to
> look at http://xn--80abvnkf0a.xn--p1ai in Chrome when I can see
> http://биатлон.рф in Firefox? (and why can I see both of these in Opera
> and in Safari?)

Again I feel I must reiterate that the very idea of using the punycode
version of a domain name as a kind of error/warning indication is
severely flawed.

1) Pure ASCII domain names cannot be warning/error indicated this way.
The coded version is the same as the "cleartext" version. There is no
reason to assume that just because a domain name is in pure ASCII that
it should not get any error/warning indication. Nor is there a reason to
think that *just because* a domain name isn't pure ASCII, it should be
subject to scrutiny that is not afforded to pure ASCII domain names.

2) Using the "punycoded" version of a domain name as some form of
error/warning indication was never a design point for that encoding.
It is purely a content transfer encoding, designed to be efficient
(short) and the encoded form to be (a subset of) pure ASCII. It is
strongly ill-suited as an error/warning indication. It rather indicates
that there is something wrong with the browser (whatever), especially
if other browsers (whatever) can display the "proper" name. (That
there are apparently punycoded domain names that cannot be decoded
without errors is a different matter.)

3) The kind of error/warning is in no way indicated by using the
punycoded version. Could it not be decoded? Does it mix scripts
inappropriately? Is it not in the "whitelist" (of some sort)?  Is it
(in part) in a "whole script" confusable script (with a script the
user has declared wishing to see)? Or what?

4) When the warning is misdirected (as in Martin's examples above) it is
hard for the end user to get the decoded (plaintext) form. It is usually
not possible to easily toggle the display form. And cut&paste may be
even more flawed.

So please use another way of indicating this kind of problem, one that
is not thus flawed.
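
[Editorial sketch, not part of the original message.] To make point 2 concrete, the two names from Martin's examples can be round-tripped between their Unicode and ASCII ("xn--") forms. The sketch below uses Python's built-in "idna" codec, which implements the older IDNA2003 rules; the helper names to_a_label and to_u_label are made up for illustration. It also shows point 1: a pure ASCII name encodes to itself, so the encoded form cannot act as a warning there.

```python
# Round-trip domain names between Unicode and ASCII ("xn--") form using
# Python's built-in "idna" codec (IDNA2003 rules).

def to_a_label(domain: str) -> str:
    """Return the ASCII-compatible form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_u_label(domain: str) -> str:
    """Return the Unicode form of an ASCII-encoded domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ("www.viagénie.com", "биатлон.рф", "example.com"):
        ascii_form = to_a_label(name)
        # A pure ASCII name comes back unchanged from to_a_label().
        print(name, "->", ascii_form, "->", to_u_label(ascii_form))
```

The encoding is purely mechanical and reversible, which is consistent with the argument that it carries no information about why a browser chose to show it.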

    /Kent K

John C Klensin | 12 Dec 2011 22:44

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 15:54 +0900 "\"Martin J.
Dürst\"" <duerst <at> it.aoyama.ac.jp> wrote:

>...
> Yes indeed. The browser vendors overreacted on issues such as
> script-mixing, stuff that the user isn't able to read, and so
> on, because overreacting was easier than a more careful
> reaction, and they were able to say that they did something.

I think "overreacted" is debatable.  It depends on one's
evaluation of the threat... and even one's guesses about how
much worse the threat might be if the browser vendors hadn't
tried to take precautions.  There is a very big difference
between an accidental source of confusion and a deliberate
attack on the user's perceptions and gullibility by a Bad Guy.
I saw signs of some pretty careful analysis, too.

> But they didn't do much for in-script attacks, because that's
> much more difficult. (Not that I'm advocating a browser that
> shows MlCR0S0FT.com with punycode.)

Quite the contrary.  If Gerv's "Type B" model were actually
effective and workable (another question, for better or worse),
it would address in-script attacks, and so-called whole-label or
whole-script attacks (two labels, each with all of its
characters in a single script, that look alike), because it is
a model based on whether or not the registry has effective
policies to prevent such registrations.
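
[Editorial sketch, not part of the original message.] A registry-side check for whole-script look-alikes could, under some assumptions, work by mapping each label to a "skeleton" and comparing skeletons. The mapping table below is a tiny hand-picked sample of Cyrillic letters that render like Latin ones; it is illustrative only, and a real registry would need far more complete data. The names SAMPLE_CONFUSABLES, skeleton and collides are hypothetical.

```python
from typing import Iterable

# Illustrative sample only: a few Cyrillic letters that look like Latin ones.
SAMPLE_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def skeleton(label: str) -> str:
    """Replace each sampled look-alike character with its Latin twin."""
    return "".join(SAMPLE_CONFUSABLES.get(ch, ch) for ch in label.lower())


def collides(candidate: str, registered: Iterable[str]) -> bool:
    """True if candidate differs from, but looks like, a registered label."""
    candidate_skeleton = skeleton(candidate)
    return any(
        candidate != other and candidate_skeleton == skeleton(other)
        for other in registered
    )


if __name__ == "__main__":
    existing = {"coco"}
    all_cyrillic = "\u0441\u043e\u0441\u043e"  # four Cyrillic letters
    print(collides(all_cyrillic, existing))  # look-alike of "coco"
    print(collides("coco", existing))        # the same label, no collision
```

A check like this needs the list of already registered labels, which is why it fits at registration time rather than in the browser.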

>> If we tell, or appear to tell, the poor lusers that we are
>> protecting them against a particular variety of attack --such
>> as confusing names-- and end up doing that often enough to be
>> persuasive that we are accomplishing something while remaining
>> open to slightly-more-clever attacks, we actually decrease
>> effective security by encouraging the user to become less
>> wary.

> With regards to wrong messages, I'm not so concerned about the
> typical "luser", but about the people between the end users
> and the hard-core tech experts. I'm not so much concerned
> about the actual loss of money. People who are stupid enough
> to click before thinking will click before thinking, whatever
> the circumstances. The APWG and others will be busy to take
> down phishers as fast as they can independent of what we may
> or may not tell people. But I'm concerned about the wasted
> effort on implementations and the damage from suboptimal
> implementations (e.g. showing only a small part of what could
> be shown without any direct spam potential).

That is fair and clearly part of the tradeoff we (and Gerv and
his colleagues and competitors in particular) are facing.  You
are concerned about that issue.  He is concerned about a harm
that he could reasonably prevent.  The position he has taken
with "Type B" is actually a fairly moderate one: if one
accepts his position about avoidable harm and combines it with
your "stupid people" hypothesis (in this context essentially
equivalent to the abbreviated form "luser"), then the action he
should be taking might be to display the link (in either A-label
or U-label form) but respond to anyone clicking on it by popping
up a rather threatening "IDNs in this domain are not policed for
spoofing behavior and this one might be Really Evil; if you want
to continue, type the square root of three into this box to at
least digits of accuracy" (the latter to prevent anyone
mindlessly clicking "yes" and to exclude any user who doesn't
know what a square root is entirely) :-(.  I think both concerns
are reasonable and the difficulties lie in the tradeoffs with no
"right" answers (other than, possibly, "browsers shouldn't have
to deal with this nonsense because someone further upstream has
done so").

best,
   john

Tina Dam | 12 Dec 2011 23:28

Re: Browser IDN display policy: opinions sought

Trying to catch up on the many emails on this topic.

2011/12/10 Patrik Fältström <patrik <at> frobbit.se>
>
>
> On 10 dec 2011, at 18:26, Paul Hoffman wrote:
>
> > D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.
>
> +1
>
> With the exceptions for combinations of various scripts and script COMMON.
>
> Or in other words: If the domain name can be displayed as a U-label, in a
> technically safe way, why not display it as a U-label?

+1.

I agree with others that the three other options are not to be
desired. But since I don't see us reaching a 100% solution anytime
soon, if I had to select between A, B, and C, I would select A.
The issues between the different options have been discussed at
length, so let me just say that the biggest problem I have with B is
that it leaves it up to Firefox to decide what is a good/bad TLD
registry. I think that belongs elsewhere, namely with ICANN.

An additional couple of points as I have been reading through all the emails:

They remind me a lot of the discussions we had during the IDN
Guidelines revisions. For example, as opposed to making a list of
scripts allowed to be mixed the rule is:

"All code points in a single label will be taken from the same script
as determined by the Unicode Standard Annex #24: Script Names
<http://www.unicode.org/reports/tr24>. Exceptions to this guideline
are permissible for languages with established orthographies and
conventions that require the commingled use of multiple scripts. Even
in the case of this exception, visually confusable characters from
different scripts will not be allowed to co-exist in a single set of
permissible code points unless a corresponding policy and character
table is clearly defined."

It also requires registries to submit to ICANN (for display) their IDN Tables.

So I would suggest that part of the inquiry to ICANN would be to
enforce compliance with the IDN Guidelines. This should at minimum
help to:

1) ensure TLD registries supporting IDNs do so in a responsible manner.

2) display in one place the languages and scripts that each registry
is supporting (I lost track but it was requested in one of the emails
on this topic).

I understand that there are linguistic requirements in the Guidelines
that are not within ICANN's area of expertise, and that it is a major
undertaking for the ICANN staff, but there still is the opportunity
for a relevant entity to be contracted with ICANN to do some/all of
this work.

Tina
Tina Dam | 13 Dec 2011 00:03

Re: Browser IDN display policy: opinions sought

One more thing. Perhaps we need to treat the IDN Guidelines the same
way the protocol revision did - i.e. separate guidelines for
registration and resolution/display? Or is that re-opening the
discussion that Gerv tried to avoid?

Martin J. Dürst | 13 Dec 2011 06:12

Re: Browser IDN display policy: opinions sought

On 2011/12/13 8:03, Tina Dam wrote:
> One more thing. Perhaps we need to treat the IDN Guidelines the same
> way the protocol revision did - i.e. separate guidelines for
> registration and resolution/display? Or is that re-opening the
> discussion that Gerv tried to avoid?

The reason this was separated in the protocol was that registering as 
yet unassigned codepoints is total nonsense, whereas accepting as yet 
unassigned codepoints for resolution/display makes sense because that 
avoids the need for software updates.

For the protocol, it made sense to be a bit looser on the receiving 
side. But for the security protections we are talking about now, and on 
the level of general guidelines, I don't see that making sense. If 
something "looks dangerous", then it shouldn't be displayed. If 
something "looks dangerous", then it shouldn't be registered.

I may be wrong, but it looks like the main problem isn't that the two 
sides might be different. The main problem is that both sides would be 
the same, and therefore every side tries to blame the other, and get 
away with it.

Regards,    Martin.
John C Klensin | 13 Dec 2011 10:37

Re: Browser IDN display policy: opinions sought


--On Tuesday, December 13, 2011 14:12 +0900 "\"Martin J.
Dürst\"" <duerst <at> it.aoyama.ac.jp> wrote:

> On 2011/12/13 8:03, Tina Dam wrote:
>> One more thing. Perhaps we need to treat the IDN Guidelines
>> the same way the protocol revision did - i.e. separate
>> guidelines for registration and resolution/display? Or is
>> that re-opening the discussion that Gerv tried to avoid?

> The reason this was separated in the protocol was that
> registering as yet unassigned codepoints is total nonsense,
> whereas accepting as yet unassigned codepoints for
> resolution/display makes sense because that avoids the need
> for software updates.

Actually, that breaks down too if one is using a system (like
IDNA2008) that is dependent on Unicode properties that cannot be
known until a character is bound to the code point.  If you
don't care about properties (except insofar as they are
reflected in a highly reified table), then "don't register
strings containing unknown code points but it is ok to resolve
them" is a reasonable strategy (and the IDNA2003 strategy).
Unfortunately, defining the standard in terms of the required
table creates significant version dependencies.  By contrast, if
one gets rid of the version dependencies (modulo the presumably
infrequent need to deal with exception cases) by going to a
property model, the properties have to be known in principle at
both registration and lookup time.  That, in turn, prevents
looking up unknown code points because one cannot know if they
are valid... at least without putting a lot of trust in  the
registrars and registries who, from Gerv's point of view (and
that of many others) are Part Of The Problem.
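
[Editorial sketch, not part of the original message.] The version dependency described above can be illustrated with Python's unicodedata module: whether a code point is assigned, and hence whether its properties are known at all, depends on the Unicode version the software ships with. The function names are hypothetical, and the "refuse to look up" rule is a simplified reading of the property-based model, not the full IDNA2008 derivation.

```python
import unicodedata


def has_unassigned(label: str) -> bool:
    """True if any code point is unassigned (general category Cn)
    in the Unicode version bundled with this Python."""
    return any(unicodedata.category(ch) == "Cn" for ch in label)


def may_look_up(label: str) -> bool:
    """Simplified property-based rule: the properties of an unassigned
    code point cannot be known, so such a label is not looked up."""
    return not has_unassigned(label)


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print(may_look_up("биатлон"))
    # U+50000 lies in a plane with no assigned characters.
    print(may_look_up("abc" + chr(0x50000)))
```

Two installations with different unidata_version values can disagree about the same label, which is the update problem the table-based IDNA2003 approach avoided at lookup time.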

> For the protocol, it made sense to be a bit looser on the
> receiving side. But for the security protections we are
> talking about now, and on the level of general guidelines, I
> don't see that making sense. If something "looks dangerous",
> then it shouldn't be displayed. If something "looks
> dangerous", then it shouldn't be registered.

Exactly.  And, seen in that light, what we are looking at with
Types A, B, and C are different lookup-side surrogates for
"looks dangerous".  And, as you and others have (IMO correctly)
pointed out, none of them is very good for that purpose.

> I may be wrong, but it looks like the main problem isn't that
> the two sides might be different. The main problem is that
> both sides would be the same, and therefore every side tries
> to blame the other, and get away with it.

I think that is right, but things are a little more complex.
Gerv's "Type B" model looks at registry policy and then, of
necessity, treats perfectly reasonable names as potentially bad
because they are registered in a TLD whose policies would allow
less reasonable names to be registered.  The language-based
approaches of Type A treat some perfectly reasonable names as
potentially bad because (I think, with some supporting evidence)
they are written in a script that isn't associated with one of
the languages the user more or less claims to read and write.
The slightly more script-based Type C treats some perfectly
reasonable names as bad because they are written in scripts that
the user hasn't certified she uses, even though the characters
of that script might be perfectly differentiable to that
particular user.  From a whitelisting perspective, there are
lots and lots of false negatives at the name/label level because
no one can really do much with labels at lookup time so they are
using these language/ registry/ script surrogates instead.
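
[Editorial sketch, not part of the original message.] The way these surrogates produce false negatives can be shown with a toy model of Type A and Type B applied to Martin's two example names. Everything here is an assumption made for illustration: scripts are approximated by the first word of each Unicode character name (a rough heuristic, since Python's standard library does not expose the Script property), the user's script set is invented, and the TLD whitelist is hypothetical rather than any browser's real list.

```python
import unicodedata
from typing import Set


def a_label(domain: str) -> str:
    """ASCII ("xn--") form via Python's IDNA2003 codec."""
    return domain.encode("idna").decode("ascii")


def scripts_of(label: str) -> Set[str]:
    """Rough script guess: first word of each character's Unicode name.
    ASCII digits and the hyphen are skipped as script-neutral."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch not in "-0123456789"
    }


def display_type_a(domain: str, user_scripts: Set[str]) -> str:
    """Type A: Unicode only if every label is in one user-accepted script."""
    for label in domain.split("."):
        found = scripts_of(label)
        if len(found) > 1 or not found <= user_scripts:
            return a_label(domain)
    return domain


def display_type_b(domain: str, tld_whitelist: Set[str]) -> str:
    """Type B: Unicode only if the TLD (in ASCII form) is whitelisted."""
    tld = a_label(domain).rsplit(".", 1)[-1]
    return domain if tld in tld_whitelist else a_label(domain)


if __name__ == "__main__":
    user_scripts = {"LATIN"}      # hypothetical user configuration
    tld_whitelist = {"xn--p1ai"}  # hypothetical whitelist
    for name in ("www.viagénie.com", "биатлон.рф"):
        print(name,
              "| A:", display_type_a(name, user_scripts),
              "| B:", display_type_b(name, tld_whitelist))
```

Under these assumptions each policy hides one of the two harmless names and shows the other, which mirrors the Firefox versus Chrome difference reported earlier in the thread.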

Remember that the fact that two strings can be confused
shouldn't prohibit either from being delegated and used but
should only either prevent delegation of one of them or apply
restrictions to its ownership and/or use.  At least in general,
per label tests can be reasonably carried out only at
registration time because only the registries and registration
processes can examine exclusion lists, do (non-DNS) fuzzy match
searches over whatever is registered, usefully apply.  That may
not eliminate the need for any or all of Types A, B, and C, but
it would certainly contain a lot of the problem.

best,
    john

Martin J. Dürst | 13 Dec 2011 06:18

Re: Browser IDN display policy: opinions sought

On 2011/12/13 7:28, Tina Dam wrote:

>> Or in other words: If the domain name can be displayed as a U-label, in a
>> technically safe way, why not display it as a U-label?
>
> +1.
>
> I agree with others that the three other options are not to be
> desired. But since I don't see us reaching a 100% solution anytime
> soon, if I had to select between A, B, and C, I would select A.
> The issues between the different options have been discussed at
> length, so let me just say that the biggest problem I have with B is
> that it leaves it up to Firefox to decide what is a good/bad TLD
> registry. I think that belongs elsewhere, namely with ICANN.

If we had enough faith in ICANN, that might work. But judging from the 
mails on this list from people close to the ICANN process, I see lots of 
doubts and not much faith at all.

Also, as I have said earlier, asking "one only of A or B or C" is 
essentially the wrong question. A 100% solution will indeed be 
difficult, but each of A or B or C only are essentially something like 
20% solutions, throwing most of the baby (in terms of the problem, 
totally harmless IDNs) out with the bathwater.

Regards,   Martin.
Tina Dam | 13 Dec 2011 06:48

Re: Browser IDN display policy: opinions sought

feedback below....

2011/12/12 "Martin J. Dürst" <duerst <at> it.aoyama.ac.jp>:
> On 2011/12/13 7:28, Tina Dam wrote:
>
>>> Or in other words: If the domain name can be displayed as a U-label, in a
>>> technically safe way, why not display it as a U-label?
>>
>>
>> +1.
>>
>> I agree with others that the three other options are not to be
>> desired. But since I don't see us reaching a 100% solution anytime
>> soon, if I had to select between A, B, and C, I would select A.
>> The issues between the different options have been discussed at
>> length, so let me just say that the biggest problem I have with B is
>> that it leaves it up to Firefox to decide what is a good/bad TLD
>> registry. I think that belongs elsewhere, namely with ICANN.
>
>
> If we had enough faith in ICANN, that might work. But judging from the mails
> on this list from people close to the ICANN process, I see lots of doubts
> and not much faith at all.

I saw the doubt and no-faith as well, but I still did not see
suggestions that it belong elsewhere...I do not think ICANN has the
necessary staff resources to it either, but that does not change the
fact that the job belongs there. As with other things ICANN can hire
or outsource the necessary resources.

> Also, as I have said earlier, asking "one only of A or B or C" is
> essentially the wrong question. A 100% solution will indeed be difficult,
> but each of A or B or C only are essentially something like 20% solutions,
> throwing most of the baby (in terms of the problem, totally harmless IDNs)
> out with the bathwater.

Well, Gerv did ask for a selection, so that was mine. I don't think
it is as low as a 20% solution - at least I have personally not seen
_that_ many complaints. I also think modifying A into D is easier than
going from B to D, so I think for Firefox to go from B to A _is_ an
improvement and a step in the right direction.
Patrik Fältström | 13 Dec 2011 07:44

Re: Browser IDN display policy: opinions sought

On 13 dec 2011, at 06:48, Tina Dam wrote:

> I saw the doubt and no-faith as well, but I still did not see
> suggestions that it belong elsewhere...I do not think ICANN has the
> necessary staff resources to it either, but that does not change the
> fact that the job belongs there. As with other things ICANN can hire
> or outsource the necessary resources.

ICANN only polices the TLD, no other levels in the domain name tree.

   Patrik
Tina Dam | 13 Dec 2011 16:49

Re: Browser IDN display policy: opinions sought

Hi Patrik,

2011/12/12 Patrik Fältström <patrik <at> frobbit.se>:
> On 13 dec 2011, at 06:48, Tina Dam wrote:
>
>> I saw the doubt and no-faith as well, but I still did not see
>> suggestions that it belong elsewhere...I do not think ICANN has the
>> necessary staff resources to it either, but that does not change the
>> fact that the job belongs there. As with other things ICANN can hire
>> or outsource the necessary resources.
>
> ICANN only polices the TLD, no other levels in the domain name tree.
>

We agree there are certain areas that are none of ICANN's business. For
example they should not police the IDN Table content (other than
making sure it conforms to the required format, holds contact details
and things like that).

But ICANN should police that the tables are being posted in the IANA
Repository, thereby making sure we have one place where we can look up
what any TLD is doing (surely not the ccTLDs, but does count for gTLDs
and the Internationalized ccTLDs).

In the same manner ICANN should police the IDN Guidelines compliance
or adherence.

I think those two things will help tremendously, although I am not
claiming that this would mean 'no more problems'.

Tina
JFC Morfin | 14 Dec 2011 01:56

Re: Browser IDN display policy: opinions sought

At 07:44 13/12/2011, Patrik Fältström wrote:
>ICANN only polices the TLD, no other levels in the domain name tree.

Incorrect. ICANN only tries to police the minority of NTIA root TLD
managers who contracted with it. The IETF is for everyone who uses TCP/IP.

Anyway, does this whole debate not look like the IETF people being reasonable
but trying to evade responsibility by hiding behind ICANN? Let me
clarify: Vint attempted to transfer responsibility for the
post-IDNA2008 work to ICANN. However:

- ICANN has not taken the bait.
- Lisa's architectural questions belong to the IETF scope and have to 
be addressed first.
- ICANN, as everyone else, has no capacity to speak on behalf of all 
the Internet Users.

Gervase was correct in seeking opinions. No one has the legitimate
authority or the practical capacity to decide for the whole existing and
future internet community. The only thing which can be done is for 
some to document what they suggest (IAB), what they think (this 
thread), or what they intend to do or already did.

Pragmatically, since the IAB did not wish to provide guidance and
opinions are divided, those who will actually forge the future are those
who specify, develop, test and deploy. They are authoritative for the
community of those adhering to their postulates or locked into their
"+" services. In IETF jargon this is traditionally called a market
decision. It should traditionally be made under IETF influence (RFC
3935) for the Internet to work better. However, this thread shows
that an RFC 3935bis is first needed to document what "work better"
means today. This is what browser manufacturers, like many others, are
trying to be told, in order to build a "better internet" window in the http
web application context.

jfc  
Paul Hoffman | 13 Dec 2011 16:42

Re: Browser IDN display policy: opinions sought

On Dec 12, 2011, at 9:48 PM, Tina Dam wrote:

> I saw the doubt and no-faith as well, but I still did not see
> suggestions that it belong elsewhere...

Much of this thread is about allowing zones themselves to state their policies, again through the DNS.

> I do not think ICANN has the
> necessary staff resources to it either, but that does not change the
> fact that the job belongs there.

Many people disagree with that "fact", and some disagree with it simply because ICANN has for a decade not
applied "necessary staff resources to it" while choosing to instead apply significant staff resources
to things that are trivial.

> As with other things ICANN can hire
> or outsource the necessary resources.

Or not.

--Paul Hoffman
Eric Brunner-Williams | 13 Dec 2011 16:21

Re: Browser IDN display policy: opinions sought


>> The issues between the different options have been discussed at
>> length, so let me just say that the biggest problem I have with B is
>> that it leaves it up to Firefox to decide what is a good/bad TLD
>> registry. I think that belongs elsewhere, namely with ICANN.
> 
> If we had enough faith in ICANN, that might work. But judging from the
> mails on this list from people close to the ICANN process, I see lots
> of doubts and not much faith at all.

Randy Bush occasionally reminds the NANOG community that "procmail is
your friend".

There are mechanisms, and there are policies.

Whatever the economic dependencies, and therefore the limits on its
policies and choices of mechanism, the dependence of entities engaged
in the development of applications consuming rfc1034/35 et seq.
services is, to a first order, distinct from the dependencies of the
Marina del Rey entity on the Reston entity, and therefore its limits
on policies.

This first order distinction is present in mail transport agents, in
stateful applications, virtualized-circuit applications, etc.

Assuming, for the sake of intellectual interest, that "agency capture"
exists, that the Reston entity determines root zone access policy, and
gTLD policy, more than any other entity, then the recourse to a single
source of policy and the necessary deprecation of all mechanisms
capable of supporting other sources of policy has the effect of
extending the scope of control of a dominating actor in one market to
others.

The act of turning up a nameserver constellation in November 2001 for
which resolution of HAN script labels was neither administratively
failed nor third-party synthesized at a subsequent point in time,
several years before this semantic was available from a pre-existing
nameserver constellation, demonstrates that independent of the
intentions of the IAB of 2000, there are limits to unitary policy.

Perhaps the HTTP application authors will voluntarily adopt a unitary
policy that does not arise from first order dependencies such as
private equity, or public agency capture, and perhaps other
application authors will as well, and perhaps all will adopt the same,
single, unique policy, and perhaps the differences between any two
instances of a zone will not grow without bound, through prudent
stewardship by each cooperating publisher. However, this may not occur.

While it may be feasible to attempt to persuade each application
author set to use a particular source of policy, it is an unfortunate
fact that retail and wholesale monetization schemes are not
distinguished from the public interest. As Peter Dengate-Thrush
responded when I reminded him at the Nairobi Public Meeting that there
is no agreement as to what constitutes "the public interest",
(paraphrase of the original) 'the interests of domainers are a public
interest.'

I honestly don't know if adding brands in many scripts to some zones
or extending the typosquatting "natural traffic" economic model to
more zones or enfranchising indefinitely private parties with limited
resources is in any application author's interest, but given the
plurality of existing views into the union of one or more data sets,
and the expressions of intent to create further views, by governments,
institutional actors, and others whose requirements are not
controlling a particular source of policy, a uni-polar policy
assumption appears tenuous.

My two beads worth,
Eric
Martin J. Dürst | 12 Dec 2011 08:10

Re: Browser IDN display policy: opinions sought

On 2011/12/11 2:26, Paul Hoffman wrote:
> First, Mark's correction (which needs to be checked) is an important one:

I very much think it is correct. From their very timid start with IDNs, 
ICANN has a strong and confusing tendency to cast issues in terms of 
"language" even if they are script issues. That has influenced the 
surroundings, too.

> On Dec 9, 2011, at 3:12 AM, Gervase Markham wrote:
>
>> The policies fall into 3 approximate buckets:
>>
>> A (IE, Chrome): Unicode if the (single) 'language' of the string is
>> configured in the options, Punycode otherwise.
>>
>> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
>> otherwise. Arbitrary script mixing permitted (registry policy used to
>> prevent abuse).
>>
>> C (Safari): Unicode if the script is in a whitelist (which by default
>> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
>> script mixing.
>
> Later, Mark Davis said:
>
> On Dec 9, 2011, at 10:10 AM, Mark Davis ☕ wrote:
>
>> I'm not familiar with the code, but I think that (A) may actually be:
>>
>> A (IE, Chrome): Unicode if the (single) 'script' of the string matches one
>> of the scripts of the user's language(s) in the options, Punycode otherwise.
>>
>> It is pretty easy and reliable to detect the script of the string, whereas language detection would be unreliable.
>
> What a few people might be asking for is:
>
> D: Unicode if the label is a single script that is displayable by the browser, Punycode otherwise.
>
> Restated less tersely:
>
> D: If every character in the label comes from a single script as defined in
> the Unicode Standard, and every character is displayable by the browser
> without resorting to "unknown" or "fallback" glyphs, display the label;
> otherwise show Punycode.

Yes, with the caveat that Patrik gave for punctuation and the additional
caveat that whole-script confusables (confusable where e.g. one side is 
all-Latin and the other side is all-Cyrillic) should be checked for and 
addressed.
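
[Editorial sketch, not part of the original message.] Proposal D as restated above could look roughly like the following. Several things are assumptions: the script of a character is guessed from the first word of its Unicode name (Python's standard library has no Script property), ASCII digits and the hyphen stand in for script COMMON, and can_render is a hypothetical callback for "the browser has a real glyph for this character". The whole-script confusable check mentioned in the caveat is deliberately not included.

```python
import unicodedata
from typing import Callable, Set


def label_scripts(label: str) -> Set[str]:
    """Approximate the scripts used in a label (heuristic, see lead-in)."""
    scripts = set()
    for ch in label:
        if ch in "-0123456789":
            continue  # stand-in for characters of script COMMON
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_type_d(label: str, can_render: Callable[[str], bool]) -> str:
    """Show the label itself if it is single-script and fully renderable,
    otherwise fall back to its ASCII ("xn--") form."""
    single_script = len(label_scripts(label)) <= 1
    renderable = all(can_render(ch) for ch in label)
    if single_script and renderable:
        return label
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    def has_glyph(ch: str) -> bool:
        return True  # pretend every glyph is available

    print(display_type_d("viagénie", has_glyph))
    print(display_type_d("биатлон", has_glyph))
    # Latin letters with one Cyrillic "а" mixed in: falls back to ASCII form.
    print(display_type_d("p\u0430ypal", has_glyph))
```

Note that no per-user or per-TLD configuration appears in this rule, which is what distinguishes it from Types A, B and C.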

Regards,   Martin.
Kent Karlsson | 10 Dec 2011 21:59

Re: Browser IDN display policy: opinions sought


On 2011-12-09 12:12, "Gervase Markham" <gerv <at> mozilla.org> wrote:

... 
> The policies fall into 3 approximate buckets:
> 
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.
... 
> (Note that "no restrictions" is not an option, given what happened in
> 2005 with payp-cyrillic-a-l.com, and I would rather not derail this
> debate by rehearsing those arguments again.)

There must be better ways to indicate that a particular domain name is
"not altogether fine" (whatever criteria are used for determining that)
than to use punycode display, which is altogether user unfriendly.

I would suggest never using punycode display (and always displaying the
decoded form) in display places where the conversion takes place (like the
address bar of a browser, to/from fields of a GUI-based email client,
...). That would hold even if there is no font on the system that
can display the characters in the domain name. The punycode version
should be extractable somehow, even though never used for normal display.

That a displayed domain name is "not altogether fine" could instead be
indicated by, e.g., red background, overstriking, or explicit error
messages (in the UI language of the application/app/...) associated
somehow with the display of the IDN, or a combination of these. Error
(or warning, if you like) messages could include "This domain name is
not in the global/personal whitelist", "This domain name mixes scripts
in an inappropriate manner", "This domain name is in the global
blacklist/has been reported as fraudulent", "This domain name uses
characters in a script not used for any of the languages you have in
your language preference list" or whatever else is considered
"suspicious". Such error/warning messages would make it clear why
a particular domain name is flagged, while using punycode display is
on the level of using "?" as the sole error message for any error...
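
A rough sketch of what that could look like, in Python. Everything here is
hypothetical: the Assessment type, the assess() function and the use of a
TLD whitelist are assumptions made for the example, and script detection is
approximated from Unicode character names because the standard library has
no Script property. The warning strings are the ones suggested above.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Assessment:
    unicode_form: str   # what would always be displayed
    ascii_form: str     # punycode form, kept extractable but not displayed
    warnings: list = field(default_factory=list)


def scripts_in(label: str) -> set:
    """Approximate the scripts in a label from Unicode character names."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch not in "0123456789-"}


def assess(domain: str, tld_whitelist: set, blacklist: set) -> Assessment:
    """Return the decoded name plus explicit reasons, instead of falling back to punycode."""
    # Python's built-in "idna" codec converts each label to its ASCII form.
    result = Assessment(domain, domain.encode("idna").decode("ascii"))
    labels = domain.split(".")
    if labels[-1] not in tld_whitelist:
        result.warnings.append(
            "This domain name is not in the global/personal whitelist")
    if any(len(scripts_in(label)) > 1 for label in labels):
        result.warnings.append(
            "This domain name mixes scripts in an inappropriate manner")
    if domain in blacklist:
        result.warnings.append(
            "This domain name is in the global blacklist/has been reported as fraudulent")
    return result
```

The UI would then show unicode_form in every case and attach the warnings
(red background, message, etc.) only when the list is non-empty.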

    /Kent K
John Levine | 10 Dec 2011 22:30

Re: Browser IDN display policy: opinions sought

>That a displayed domain name is "not altogether fine" could instead be
>indicated by, e.g., red background, overstriking, or explicit error
>messages (in the UI language of the application/app/...) associated
>somehow with the display of the IDN, or a combination of these. ...

Please, no.  A decade of bad UI design has trained users that no
matter what a warning says, you ignore it and click through.  Let's
not make that mistake again.

Either display the Unicode name or don't.

R's,
John
Raed Al-Fayez | 18 Dec 2011 10:50

RE: Browser IDN display policy: opinions sought

Dear All,

My name is Raed Al-Fayez, and I am from SaudiNIC (the .sa and .السعودية ccTLD registry).

First of all, I would like to thank Gervase Markham for raising such an important issue; I also thank everyone who has contributed to the discussion.

Please allow me to share our opinions and thoughts regarding "Browser IDN display policy":

IDNs should not be treated as second-class citizens on the Internet. They should be displayed in their native languages, not in their corresponding ASCII encodings. Users should not have to do anything to get IDNs displayed correctly on their screens. IDNs should not be forced into their ASCII form just because, as some vendors claim, "The standard text encoding is used because it's possible for letters and symbols in some languages to be used to impersonate English language websites for phishing scams". Please note that not ALL scripts can be used to "impersonate English language websites".

I would repeat here what Gervase Markham from Mozilla accurately said in one of the ICANN meetings:

"They [IDNs] are going to be just as safe to use as ASCII domain names. You don't need extra alerts you don't need extra warnings, don't need extra training to use them. If they are less safe because characters are allowed which might confuse people, and therefore there is some danger, then we have failed in that goal ..".

At SaudiNIC, we have experience with providing Arabic IDNs, as we were the first ccTLD registry to provide IDN registration in our region. We have observed that our local users lost their trust in Arabic IDNs when IDN addresses appeared in ASCII encoding. Many of them were shocked to see not the Arabic IDNs but rather "garbage text" not in their native language. Furthermore, they had difficulty fixing this, or did not know how to, within the operating systems or browsers they were using.

Please note that in our registry we provide a conservative and complete IDN solution, which includes variant management at the Arabic script level, and we do not allow script mixing. Registrations are based on a very well-defined and limited language table, so the security issue is already taken care of by the registry (us). Hence, users should have no security issues with Arabic IDNs, which should be displayed in their native language automatically, without intervention by anyone, including the users themselves.
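
As an illustration only, a registry-side check of this kind could be sketched as follows in Python. The code point table below is hypothetical and is NOT SaudiNIC's actual language table; variant management is not modelled at all. The sketch only shows how a limited table rules out script mixing at registration time.

```python
# HYPOTHETICAL language table, for illustration only (not SaudiNIC's table):
# basic Arabic letters U+0621..U+063A and U+0641..U+064A, plus hyphen.
ARABIC_LETTERS = set(range(0x0621, 0x063B)) | set(range(0x0641, 0x064B))
HYPHEN = {0x2D}
ALLOWED_CODE_POINTS = ARABIC_LETTERS | HYPHEN


def registrable(label: str) -> bool:
    """Accept a label only if every code point is in the registry's table."""
    if not label or label.startswith("-") or label.endswith("-"):
        return False
    # Any character outside the table (Latin, Cyrillic, ...) rejects the
    # whole label, so mixed-script labels can never be registered.
    return all(ord(ch) in ALLOWED_CODE_POINTS for ch in label)


if __name__ == "__main__":
    print(registrable("السعودية"))         # all characters are in the table
    print(registrable("a" + "السعودية"))   # contains a Latin letter
```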

 

Here are some comments regarding Option A:

- A language/country is added to the supported list, but it is not associated with an IDN TLD. Hence, if a user adds the Arabic language to the supported list, all Arabic IDNs will always be displayed, regardless of whether the IDN TLD is trusted or not.

- This treats IDNs as second-class citizens of the Internet, which leads to mistrust of IDNA among users, as it always needs intervention from the user (which is neither easy nor standard).

- This option requires different actions from users to enable IDNs, depending on the operating system type/version and the applications used (e.g., browsers). It therefore adds extra complexity to user acceptance and registry customer care.

- In airports or at public Internet hotspots, IDNs would not work properly, as users have no control over the application settings.

- IDNA deals with domain labels at the script level, but "Option A" works at the language (and country) level; the two are incompatible.

As IDNs are about making the Internet more global and accessible for everyone, and many companies/communities are investing heavily in IDNs, it would not be fair to treat IDN domain names differently from their ASCII/English counterparts. IDNs should be used and displayed automatically without the intervention of users. This is very important so that these investments are not jeopardized by this kind of treatment, which will drive users away from IDNs. Thus, IDNs should be treated as first-class citizens of the Internet.

We have to be very careful, as new gTLDs are coming to the market that will be used by almost all users around the world. It is neither adequate nor justifiable that new ASCII gTLDs work from day one while new IDN gTLDs are handicapped by applications and need extra intervention and involvement from the user to make them work properly.

In our judgment, the issue of "impersonate English language websites for phishing scams" is unjustly magnified based on a bad example of registry practice. For example, under ".com", IDN was supported with almost all possible Unicode characters without any restrictions or variant management. Registries that provide clear language tables, policies, and variant-handling mechanisms do not fall into these kinds of problems, yet they have been penalized.

The problems attributed to IDNs can be addressed through mechanisms similar to current practices for fighting phishing scams and malware. Currently, web browsers and search engines have their own means of flagging suspicious sites, as a service provided by application vendors to their users.

Hence, we recommend the following:

- IDNs should be treated and handled similarly to ASCII domains. It should not require extra effort from users to make them work correctly. The handling should be transparent to users, except for suspicious sites (which can be handled via a blacklist).

- Otherwise, if this is not possible to implement, we recommend having a centralized authority (e.g., ICANN/IANA) that maintains a repository for each TLD registry containing information about the registry's language tables and list of supported languages; it should be filled in automatically (online) as part of the delegation process.

With best regards,

Raed I. Al-Fayez
------------------------------------------
Senior IT Projects Specialist, M.Sc, PMP
Saudi Network Information Center (SaudiNIC)
Communication and Information Technology Commission (CITC)
Tel: + 966-1-2639235   - Fax: + 966-1-2639393
http://www.nic.net.sa

 

-----Original Message-----
From: idna-update-bounces <at> alvestrand.no [mailto:idna-update-bounces <at> alvestrand.no] On Behalf Of Gervase Markham
Sent: Friday, December 09, 2011 2:12 PM
To: idna-update <at> alvestrand.no
Subject: Browser IDN display policy: opinions sought

 

Recently, Mozilla community member Jothan Frakes was kind enough to do some research about how different popular web browsers implement IDN, and when they display the real characters and when they display Punycode. This is in the context of a Mozilla review of our policy. I am interested in the opinions of people on this list (see below).

As it turns out, the behaviour of all popular browsers is summarised at the bottom of a Chromium project document here:
http://www.chromium.org/developers/design-documents/idn-in-google-chrome

The policies fall into 3 approximate buckets:

A (IE, Chrome): Unicode if the (single) 'language' of the string is configured in the options, Punycode otherwise.

B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode otherwise. Arbitrary script mixing permitted (registry policy used to prevent abuse).

C (Safari): Unicode if the script is in a whitelist (which by default does not include Cyrillic or Greek), Punycode otherwise. Not sure about script mixing.

Firefox has historically resisted adopting a Type A policy because we consider it seriously detrimental to IDN adoption and use. It seems to me that IDN can never be reliable for site owners, and therefore will not succeed, if a significant proportion of the world's browsers adopt Type A or Type C policies. This is because site owners can never know what proportion of their visitors will see gobbledegook in the URL bar rather than their nice domain name. Perhaps for sites whose visitors are all guaranteed to be from a particular country or language group, with properly-configured browsers and OSes which know that they speak a certain language or use a certain script, it might work - but I suggest that's a small subset of all sites. Many people in non-English-speaking countries still use English OSes and English browsers, with default settings.

Type C is particularly bad - Russian and Greek IDNs are broken by default, but even if you persuade your users to turn it on, they can then be mixed-script spoofed. You get to choose between functionality and security.

By contrast, with a Type B policy, if your IDN domain works in one copy of Firefox, it works in them all. If everyone had Type B policies, there would be no risk of a properly-registered domain coming up as gibberish.
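
The difference between the three buckets can be sketched in a few lines of Python. This is a simplified illustration, not the code of any browser: the language-to-script map is hypothetical, script detection is approximated from Unicode character names, Type A uses the script-based reading of the policy, and the function names are made up for the example.

```python
import unicodedata

# Hypothetical language-to-script map; a real browser would use locale data.
LANGUAGE_SCRIPTS = {"en": "LATIN", "ru": "CYRILLIC", "el": "GREEK", "ar": "ARABIC"}


def label_scripts(label: str) -> set:
    """Approximate the scripts in a label from Unicode character names."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch not in "0123456789-"}


def idn_labels(domain: str) -> list:
    """Only non-ASCII labels are subject to the display policy."""
    return [label for label in domain.split(".") if not label.isascii()]


def punycode(domain: str) -> str:
    return domain.encode("idna").decode("ascii")


def type_a(domain: str, user_languages: list) -> str:
    """Unicode only if each IDN label is in one script the user's languages use."""
    allowed = {LANGUAGE_SCRIPTS[l] for l in user_languages if l in LANGUAGE_SCRIPTS}
    ok = all(len(s) == 1 and s <= allowed for s in map(label_scripts, idn_labels(domain)))
    return domain if ok else punycode(domain)


def type_b(domain: str, tld_whitelist: set) -> str:
    """Unicode if the TLD is whitelisted; the registry is trusted to police labels."""
    return domain if domain.split(".")[-1] in tld_whitelist else punycode(domain)


def type_c(domain: str, script_whitelist: set) -> str:
    """Unicode if every script used is whitelisted; mixing is not checked here."""
    ok = all(label_scripts(label) <= script_whitelist for label in idn_labels(domain))
    return domain if ok else punycode(domain)
```

With these definitions, the result of type_a for the same domain changes with the user's language settings, whereas type_b gives the same answer for every user who has the same whitelist, which is the consistency argument made above.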

 

It has been suggested that Firefox switch to a Type A policy. As it is, the mix of policies means that the goal of universal acceptability is not being met anyway. Firefox switching to Type A would also not meet that goal by itself, but one could argue that there's a bit more consistency to browser behaviour.

I would be interested in the opinion of people on this list as to:

- whether my analysis seems reasonable;
- whether they prefer type A, B or C; and
- whether they see any particular policy as more damaging to IDN adoption than another.

Has anyone lobbied one browser manufacturer or another to change their policy? Is there another option that is not currently in use which would be better?

(Note that "no restrictions" is not an option, given what happened in 2005 with payp-cyrillic-a-l.com, and I would rather not derail this debate by rehearsing those arguments again.)

Gerv



John C Klensin | 18 Dec 2011 17:45

RE: Browser IDN display policy: opinions sought


--On Sunday, December 18, 2011 09:50 +0000 Raed Al-Fayez
<rfayez <at> citc.gov.sa> wrote:

> Dear All,
> 
> My name is Raed Al-Fayez I am from SaudiNIC (.sa &
> .السعودية    ccTLD Registry).
> 
> First of all I would like to thank Gervase Markham for opening
> such important issue; Also I thank everyone who have
> contributed in its discussion.
> 
> Please allow me to share with you our opinions and thoughts
> regarding "Browser IDN display policy":
>...

Raed,

Thanks for your careful and thoughtful note.  In the hope of
encouraging more discussion and understanding, I'd like to
clarify some things that often get in the way of clear thinking:

Equating the pre-IDN domain name policies, and the hostname
policies that preceded them, with "English" or "ASCII" is
convenient but misleading.  The decisions about allowed
characters and the structure of names, going back to the 1970s,
were made in order to have a character repertoire that was
easily and accurately mapped back and forth among coded
character sets.  Many characters that might have made sense were
excluded because they did not appear the same way in the
conventional glyphs associated with different character sets and
codings, and others because they were inherently confusable when
written in different ways (the controversial decision to exclude
spacing underscore ("_") is the example I remember most vividly,
although there were other issues with that character).  Non-Latin
scripts were excluded either because they had too many
characters to represent easily or because properly representing
them in the 6 and 7 bit codes of the day required multi-"byte"
sequences, and sometimes because they just had too many distinct
characters or glyph forms.  Some of them could be represented
reasonably well in 8 bits, others could not, but 8 bit codes
were not universally available.

Even the decision to make upper case and lower case Latin
characters match --which many of us believe in retrospect to
have been a mistake-- was driven in part by the recognition that
there were systems attached to the network that were single-case
only (usually upper, and often because of histories in six-bit
coding systems).

There were also considerations about characters used in common
operating system command lines, but they didn't dominate.  For
example, Multics and later Unix used "-" as a command argument
introducer, but it is permitted in domain names (although not as
a leading character, partially for that reason).

So we ended up with undecorated Latin characters with one
widely available in-label separator character (hyphen-minus in
Unicode-speak) and one label separator (period or dot).  Yes,
the DNS uses ASCII, but that is ultimately an "on the wire"
convention to prevent total confusion (see RFCs 20 and 5198 for
discussions of that issue).  If people wanted or needed to do
something else in the local operating system, they did.
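
For readers who want the resulting rule in concrete form, here is a small
Python sketch of the conventional letters-digits-hyphen check together with
case-insensitive matching.  The 63-character label limit and the ban on a
trailing hyphen come from the usual hostname rules rather than from the
text above, and the function names are invented for this illustration.

```python
import re

# Letters, digits and hyphen only; no leading or trailing hyphen;
# 1 to 63 characters per label (conventional hostname rules).
LDH_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def strip_root_dot(name: str) -> str:
    """Drop a single trailing dot, which only marks the DNS root."""
    return name[:-1] if name.endswith(".") else name


def is_ldh_hostname(name: str) -> bool:
    """True if every dot-separated label follows the LDH rule."""
    labels = strip_root_dot(name).split(".")
    return all(LDH_LABEL.match(label) is not None for label in labels)


def same_name(a: str, b: str) -> bool:
    """Upper and lower case Latin letters match, as described above."""
    return strip_root_dot(a).lower() == strip_root_dot(b).lower()


if __name__ == "__main__":
    print(is_ldh_hostname("example-1.com"))       # hyphen inside a label
    print(is_ldh_hostname("-leading.com"))        # leading hyphen
    print(is_ldh_hostname("under_score.com"))     # underscore is excluded
    print(same_name("Example.COM", "example.com"))
```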

Could folks have started with something other than Latin?  Yes,
but it wasn't practical given the state of computer developments
at the time.  Even ignoring that, the choices are more limited
than one would think and the precedents predate the Internet and
computing by many years.  Arabic would not have worked because
the connected characters and differentiation issues that have
occupied a good deal of time within Arabic script IDN
discussions make it unsuitable for this type of
mnemonic-identifier use by those who don't already read the
script.  Most other scripts have the same issues or others.
Japanese Kana (one set or the other, not both, and certainly not
Kanji) might have done the job.  Cyrillic might have worked if
it had been possible to pick an acceptable subset.  Greek would
not have worked due to some of the same matching issues that now
contribute to variant discussions.  And there were very few
scripts for which there were stable coding standards by the time
these specifications started to solidify.

Anyway, please don't assume that all of these decisions were
made out of ignorance or indifference to other scripts or
codings.  The issues associated with internationalization were
very much discussed and considered, long before the DNS.  It
wasn't practical to do anything much different from what was
done -- I note that the ITU made essentially the same decisions,
agreeing on nearly the same set of characters, in the design of
protocols that use letter-based keywords and parameters -- and a
great deal of the decision-making was about differentiability
and existing international practices, not about either ASCII or
English.

    best,
    john

Martin J. Dürst | 20 Dec 2011 06:31

Re: Browser IDN display policy: opinions sought

On 2011/12/19 1:45, John C Klensin wrote:

> Thanks for your careful and thoughtful note.  In the hope of
> encouraging more discussion and understanding, I'd like to
> clarify some things that often get in the way of clear thinking:
>
> Equating the pre-IDN domain name policies, and the hostname
> policies that preceded them, with "English" or "ASCII" is
> convenient but misleading.

For better or worse, that's often the easiest way to refer to them.
While the technical choices you describe are not, in a strict
absolute sense, related to either ASCII or English, they are nevertheless
very strongly related to both, at least indirectly. For better or worse,
most computers of that time were developed in and for English-speaking
countries, and the same applies to the various encodings (from 6 bits
upwards) that went with them.

Anyway, the technical choices you describe may very well have been
appropriate 30 or 40 years ago; I don't think anybody is challenging
that. What's important are the technical choices today. Neither 6-bit
encodings nor multibyte character encodings (incl. UTF-8) nor most of
the other aspects you describe are relevant today, nor are any of
them a reason for using punycode rather than Unicode to display IDNs.

Regards,   Martin.
James Seng | 19 Dec 2011 01:19

Re: Browser IDN display policy: opinions sought

+1

Sent from my iPhone

On 18 Dec, 2011, at 6:02 PM, Raed Al-Fayez <rfayez <at> citc.gov.sa> wrote:

Dear All,

 

My name is Raed Al-Fayez I am from SaudiNIC (.sa & .السعودية    ccTLD Registry).

 

First of all I would like to thank Gervase Markham for opening such important issue; Also I thank everyone who have contributed in its discussion.

 

Please allow me to share with you our opinions and thoughts regarding "Browser IDN display policy":

 

IDNs should not be treated as second-class citizens on the Internet. They should be displayed in their native languages and not treated/displayed in their corresponding ASCII encodings. Users should not do anything to get their IDNs displayed correctly in their screens. IDNs should not be forced to be displayed in their ASCII format just – as some vendor claims: "The standard text encoding is used because it's possible for letters and symbols in some languages to be used to impersonate English language websites for phishing scams".  Please note that not ALL SCRIPTs can be used to "impersonate English language websites".

 

I would repeat here what Gervase Markham from Mozilla had – accurately - said in one of the ICANN meetings:

"They [IDNs] are going to be just as safe to use as ASCII domain names. You don't need extra alerts you don't need extra warnings, don't need extra training to use them. If they are less safe because characters are allowed which might confuse people, and therefore there is some danger, then we have failed in that goal ..".

 

At SaudiNIC, we have experiences with providing Arabic IDNs as we were the 1st ccTLD registry provided IDN registration in our region. We have observed that our local users lost their trust in the Arabic IDNs when IDN addresses appeared in ASCII encoding. Many of them were shocked not to see the Arabic IDNs but rather "garbage text" not in their native language. Furthermore, they had difficulties or did not know how to fix this within the operating systems or the browsers they were using.

 

Please note in our registry we provide a conservative and a complete IDN solution which include Variants managements at the Arabic Script level as well as we do not allow script mixing. Registration are based on a very well defined and limited language table. So the security issue is already taking care by the registry (us). Hence, users should have no security issue when they get Arabic IDNs and should be displayed in their native language automatically without the intervention by anyone including the users him/herself.

 

Here are some comments regarding Option A:

-          A language/country is added to the supported list but it is not associated with an IDN TLD. Hence, if a user add the Arabic language to the supported list this will always displays all the Arabic IDNs regardless wither the IDN TLD is trusted or not.

-          This treats IDNs as second criticizes of the Internet which leads to miss-trustiness from users in the IDNA as it always needs intervention from the user (which is not easy and not standard).

-          This option leads to different actions from the users to enable IDNs depending on operating system type/version and used applications (e.g., browsers). Therefore, it adds extra complexity to user acceptance and registry customer care.

-          In Airports or public internet hotspots IDNs would not work properly as user have no control in the application settings.

-          IDNA deals with domain labels at the script level, but "Option A" works at the language (and country) level which are incompatible behavior.

 

As IDNs are about making the Internet more global and accessible for everyone and many companies/communities are investing heavily in IDNs, it would not be fair to treat IDN domain names differently from their ASCII/English ones. IDNs should be used and displayed automatically without the intervention of users. This is very important so that these investments will not be jeopardized just because of these kind of treatments which will shun away the users from IDNs . Thus, IDNs should be treated as first-class citizens of Internet.

 

We have to be very careful as new gTLDs are coming in the market that will be used by almost all users around the world. It is not adequate and justifiable that new ASCII gTLDs are working from day one while new IDN gTLDs are handicapped by the applications and need some extra intervention and involvement from the user side to make them working probably.

 

In our judgment, the issue of " impersonate English language websites for phishing scams" is unjustly magnified based on bad example of a registry practice. For example, under ".com", IDN was supported with almost all possible Unicode characters without any restriction or variants managements. Hence, registries who provide clear language tables, policies, and variants handling mechanisms do not fall in these kind of problems but they have been penalize.

 

The problems cited as they are from IDNs can be addressed through mechanisms similar to the current practices to fight phishing scams and malware. Currently, web browsers and search engines have their own means to flag suspicious sites as a service provided from the application vendors to their users.

 

Hence, we recommend the following:

-       IDNs should be treated and handled similarly to ASCII domains. It should not require extra efforts from the users to make them work correctly. It should be transparent to the users, except for the suspicious sites (that can be handled via a black list).

-       Otherwise, if it is not possible to execute, we recommend to have a centralized authority (e.g., ICANN/IANA) that maintain a repository for each TLD registry that contains information about the registry's language tables, list of supported languages and it should be filled automatically (online) as part of the delegation process.

 

 

 

With best regards,

 

Raed I. Al-Fayez

------------------------------------------

Senior IT Projects Specialist, M.Sc, PMP

Saudi Network Information Center (SaudiNIC)

Communication and Information Technology Commission (CITC)

Tel: + 966-1-2639235   - Fax: + 966-1-2639393

http://www.nic.net.sa

 

-----Original Message-----
From: idna-update-bounces <at> alvestrand.no [mailto:idna-update-bounces <at> alvestrand.no] On Behalf Of Gervase Markham
Sent: Friday, December 09, 2011 2:12 PM
To: idna-update <at> alvestrand.no
Subject: Browser IDN display policy: opinions sought

 

Recently, Mozilla community member Jothan Frakes was kind enough to do some research about how different popular web browsers implement IDN, and when they display the real characters and when they display Punycode. This is in the context of a Mozilla review of our policy. I am interested in the opinions of people on this list (see below).

 

As it turns out, the behaviour of all popular browsers is summarised at the bottom a Chromium project document here:

http://www.chromium.org/developers/design-documents/idn-in-google-chrome

 

The policies fall into 3 approximate buckets:

 

A (IE, Chrome): Unicode if the (single) 'language' of the string is configured in the options, Punycode otherwise.

 

B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode otherwise. Arbitrary script mixing permitted (registry policy used to prevent abuse).

 

C (Safari): Unicode if the script is in a whitelist (which by default does not include Cyrillic or Greek), Punycode otherwise. Not sure about script mixing.

 

 

Firefox has historically resisted adopting a Type A policy because we consider it seriously detrimental to IDN adoption and use. It seems to me that IDN can never be reliable for site owners, and therefore will not succceed, if a significant proportion of the world's browsers adopt Type A or Type C policies. This is because site owners can never know what proportion of their visitors will see gobbledegook in the URL bar rather than their nice domain name. Perhaps for sites whose visitors are all guaranteed to be from a particular country or language group, with properly-configured browsers and OSes which know that they speak a certain language or use a certain script, it might work - but I suggest that's a small subset of all sites. Many people in non-English-speaking countries still use English OSes and English browsers, with default settings.

 

Type C is particularly bad - Russian and Greek IDNs are broken by default, but even if you persuade your users to turn it on, they can then be mixed-script spoofed. You get to choose between functionality and security.

 

By contrast, with a Type B policy, if your IDN domain works in one copy of Firefox, it works in them all. If everyone had Type B policies, there would be no risk of a properly-registered domain coming up as gibberish.

 

It has been suggested that Firefox switch to a Type A policy. As it is, the mix of policies means that the goal of universal acceptability is not being met anyway. Firefox switching to Type A would also not meet that goal by itself, but one could argue that there's a bit more consistency to browser behaviour.

 

I would be interested in the opinion of people on this list as to:

 

- whether my analysis seems reasonable;

- whether they prefer type A, B or C; and

- whether they see any particular policy as more damaging to IDN adoption than another.

 

Has anyone lobbied one browser manufacturer or another to change their policy? Is there another option that is not currently in use which would be better?

 

(Note that "no restrictions" is not an option, given what happened in 2005 with payp-cyrillic-a-l.com, and I would rather not derail this debate by rehearsing those arguments again.)

 

Gerv

_______________________________________________

Idna-update mailing list

Idna-update <at> alvestrand.no

http://www.alvestrand.no/mailman/listinfo/idna-update


-----------------------------------------------------------------------
Notice:
This message and its attachments (if any) constitute a confidential document that may contain legally protected and privileged information. If you are not the intended recipient of this message, you must alert the sender
that it reached you in error, and delete the message and its attachments (if any) from your computer. You may not copy this message or its attachments (if any) or any part of them, or
disclose their contents to any person or use them for any purpose. The statements and opinions contained in this message express only the opinion of the sender and not necessarily the opinion of the Communications and
Information Technology Commission, and the Commission bears no responsibility for damages resulting from this email.
Gervase Markham | 19 Dec 2011 11:42

Re: Browser IDN display policy: opinions sought

Hi Raed,

Thank you for your input.

I would say that the vendor you quote is using sloppy language and it's
not actually about 'prioritizing English', but your point about
first-class citizenship is a good one.

If all registries were as responsible as yours, there would hardly be a
need for browser restrictions on IDN display at all (perhaps just a
blacklist of homographs for "." and "/", and that should have been
achieved by IDNA2008 anyway).

> IDNs should not be treated as second-class citizens on the Internet.

I entirely agree. This is why we went for option B - once a registry has
a responsible policy, their IDNs are treated as first-class citizens
everywhere (at least, in Firefox). No additional configuration required.

Unfortunately, IDNs are still not treated as first-class citizens. So
the question is: how do we get from where we are now to a situation
where they are treated that way?

Gerv
J-F C. Morfin | 19 Dec 2011 15:16

Re: Browser IDN display policy: opinions sought

At 11:42 19/12/2011, Gervase Markham wrote:
>Hi Raed,
>Thank you for your input.
>
>I would say that the vendor you quote is using sloppy language and it's
>not actually about 'prioritizing English', but your point about
>first-class citizenship is a good one.

No one is prioritizing English. English IS, by mutual convenience, 
the Internet language (cf. RFC 3935).

This is why the IETF solution is "internationalization" and not 
"multilingualization". Internationalization is a component of 
"globalization" together with "localization" and "langtags" and is 
embodied in CLDR tools. Internationalization is to make the medium 
(here the Internet) language neutral. For that it uses an 
International English and ASCII programming to quote Unicoded strings 
independently from the actual language these strings correspond to 
(identified by the langtag), including English. These quotes are of 
two forms: the real text, i.e. a batch of code points, or a URL that 
permits an optional indirect access. For good consistency (RFC 1958) the 
quotes should use the same script displaying context (UTF-8).

IDNA2008 has organized this to properly work through:

- an Internet technology core,
- and a subsidiary support of languages at the periphery.

In this architecture the Internet architectural core happens to 
currently work in ASCII, but it could work the same in any other 
charset, including binary (which may be the evolution of the 
Internet 2.0 architecture).

IDNA2008 however has left two unaddressed issues.

1. to be broadly explained as being an example of the way the end to 
end Internet architecture actually (RFC 1958) supports diversity, any 
diversity. This support is at the intelligent fringe to fringe layer 
set which includes the presentation layer.

The difficulty is that OSI was wrong: the OSI model is incomplete and 
the layer pile must dissociate Network and Subsidiary Network Use 
layers. I appealed against the lack of an IAB/IESG disclaimer which 
would have prevented the current debate. The response I obtained 
clearly indicated that this issue did not belong to the IETF scope, 
but that IETF wanted to detail its own needs in this area. This is 
why I delayed as much as I could the description of the fringe to 
fringe IUI framework. However, the technically illegitimate ICANN 
opening to gTLD registration calls for it to be published before January 10.

2. fringe to fringe calls for metadata exchanges: IDNA2008 has not 
considered the need to support orthotypography (i.e. script use 
variations depending on the meaning). Basically what you say about 
second-zone languages also applies to real life English which 
actually needs more than the ASCII charset, and obviously to French 
as I reported it to the IESG. The management of the fringe to fringe 
metadata intercommunications is more complex than the data 
intercommunications. It calls for a dedicated channel that has to be 
transparent to the existing core technology and common usage. What 
Gervase asks for is to be told how this is to work.

>If all registries were as responsible as yours, there would hardly be a
>need for browser restrictions on IDN display at all (perhaps just a
>blacklist of homographs for "." and "/", and that should have been
>achieved by IDNA2008 anyway).

This "registries being responsible" is a wrong interference where we 
technicians are judging use, instead of supporting it properly.  The 
principle of the Internet is that technology is to best support the 
user needs and the IETF is to influence those who design, use and 
manage the Internet for it to work better. The problems are (1) that 
no one ever explained what working "better" means and (2) that some 
of us want to unilaterally constrain uses in line with their own 
understanding of what "better" means, moreover in areas where things 
have not been fully defined.

> > IDNs should not be treated as second-class citizens on the Internet.
>
>I entirely agree. This is why we went for option B - once a registry has
>a responsible policy, their IDNs are treated as first-class citizens
>everywhere (at least, in Firefox). No additional configuration required.

This means that you decide who is a second-zone Registry and treat 
their registrants as third-zone netizens. Then the question 
unfortunately comes: "Who made you king to decide this?"

>Unfortunately, IDNs are still not treated as first-class citizens. So
>the question is: how do we get from where we are now to a situation
>where they are treated that way?

This is easy as far as you are concerned.

- you treat them as your user application layer is to treat them: as 
unique-class Internet Domain Name citizens.
- you demand those who are to take care of the difficulties you 
identify to address them properly. They are the IAB and the emerging 
IUTF (on the iucg <at> ietf.org) where this mailing thread belongs since 
the WG/IDNAbis is closed, its RFC set is published, and the remaining 
points are on the IDNA2008 use side.

Cheers.

jfc
Paul Hoffman | 19 Dec 2011 17:55

Re: Browser IDN display policy: opinions sought

On Dec 19, 2011, at 2:42 AM, Gervase Markham wrote:

> I entirely agree. This is why we went for option B - once a registry has
> a responsible policy, their IDNs are treated as first-class citizens
> everywhere (at least, in Firefox). No additional configuration required.

In this case, however, the "responsible policy" is limited to TLDs registering SLDs. People have already
pointed out on this thread that Firefox's restriction on script-confusables only goes one layer down,
and that for LDH labels, Firefox (and all other browsers) don't do anything about names like www.bankofamerica.com.deposits.index-action.me.

> Unfortunately, IDNs are still not treated as first-class citizens. So
> the question is: how do we get from where we are now to a situation
> where they are treated that way?

One way, which you have rejected earlier in this thread, is to simply display all IDNs as Unicode (where the
display is possible), just the same way you display all possibly-fraudulent LDH labels. That would make
them all first-class. If you choose to do some checking on the domain names for possible fraud based on
other heuristics (as Firefox and all other browsers do), and then show an interstitial warning or change
the navigation chrome in some way, you can do that for IDNs as well *following the same rules you use for
non-IDN names*.

If you want to get additional heuristics from TLDs about policies to help you decide when you should add a
warning, the technical community can talk about how to make that happen in a way that would be useful to
application vendors. (So could ICANN, but I suspect that would be a waste of everyone's time.)

The choice to not treat IDNs as second-class in applications remains with the application vendors. Being
consistent in pointing out possible fraud would go a long way towards making IDNs more useful to everyone
except to fraudsters.

--Paul Hoffman
Gervase Markham | 21 Dec 2011 12:57

Re: Browser IDN display policy: opinions sought

On 19/12/11 16:55, Paul Hoffman wrote:
> In this case, however, the "responsible policy" is limited to TLDs
> registering SLDs. People have already pointed out on this thread that
> Firefox's restriction on script-confusables only goes one layer down,
> and that for LDH labels, Firefox (and all other browsers) don't do
> anything about names like
> www.bankofamerica.com.deposits.index-action.me.

And I have responded that this is not your problem; we are tackling that
sort of thing via other means (such as domain highlighting).

> One way, which you have rejected earlier in this thread, is to simply
> display all IDNs as Unicode (where the display is possible), just the
> same way you display all possibly-fraudulent LDH labels. That would
> make them all first-class. If you choose to do some checking on the
> domain names for possible fraud based on other heuristics (as Firefox
> and all other browsers do), and then show an interstitial warning or
> change the navigation chrome in some way, you can do that for IDNs as
> well *following the same rules you use for non-IDN names*.

(For those not familiar: Firefox can use various data sources, but by
default uses the Google SafeBrowsing list, to put up warnings whenever a
site on the list is encountered.)

> If you want to get additional heuristics from TLDs about policies to
> help you decide when you should add a warning, the technical
> community can talk about how to make that happen in a way that would
> be useful to application vendors. (So could ICANN, but I suspect that
> would be a waste of everyone's time.)

Could you expand on how that might happen?

Gerv
Paul Hoffman | 21 Dec 2011 16:24

Re: Browser IDN display policy: opinions sought

On Dec 21, 2011, at 3:57 AM, Gervase Markham wrote:

> On 19/12/11 16:55, Paul Hoffman wrote:
>> In this case, however, the "responsible policy" is limited to TLDs
>> registering SLDs. People have already pointed out on this thread that
>> Firefox's restriction on script-confusables only goes one layer down,
>> and that for LDH labels, Firefox (and all other browsers) don't do
>> anything about names like
>> www.bankofamerica.com.deposits.index-action.me.
> 
> And I have responded that this is not your problem; we are tackling that
> sort of thing via other means (such as domain highlighting).

It is "our problem" in that you are introducing multiple ways to alert users about questionable domain
names, where many valid IDNs get worse display than all-ASCII names that are clearly fraudulent. The
proposal people are making is that, if your motivation for showing Punycode is that there might be fraud,
that you instead use the same alert technologies that you { are | will be } using for all-ASCII names that you
believe are fraudulent.

>> One way, which you have rejected earlier in this thread, is to simply
>> display all IDNs as Unicode (where the display is possible), just the
>> same way you display all possibly-fraudulent LDH labels. That would
>> make them all first-class. If you choose to do some checking on the
>> domain names for possible fraud based on other heuristics (as Firefox
>> and all other browsers do), and then show an interstitial warning or
>> change the navigation chrome in some way, you can do that for IDNs as
>> well *following the same rules you use for non-IDN names*.
> 
> (For those not familiar: Firefox can use various data sources, but by
> default uses the Google SafeBrowsing list, to put up warnings whenever a
> site on the list is encountered.)

That seems quite reasonable to me.

>> If you want to get additional heuristics from TLDs about policies to
>> help you decide when you should add a warning, the technical
>> community can talk about how to make that happen in a way that would
>> be useful to application vendors. (So could ICANN, but I suspect that
>> would be a waste of everyone's time.)
> 
> Could you expand on how that might happen?

Andrew already proposed maybe a new RRtype in which a zone can publish its script policy. That one attracts
me most so far because it keeps DNS-related decisions in the DNS and uses clear TTL semantics for caching.
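
As a rough illustration of the "clear TTL semantics for caching" point: no such RRtype exists, so the sketch below treats the DNS lookup as a hypothetical `fetch(zone)` callable returning a set of permitted scripts and a TTL in seconds. The class name and record shape are assumptions made purely for illustration.

```python
import time

class ScriptPolicyCache:
    """Caches a zone's published script policy until its TTL expires."""

    def __init__(self, fetch, clock=time.monotonic):
        # fetch is a hypothetical stand-in for a DNS query:
        # zone -> (set of permitted scripts, ttl in seconds)
        self._fetch = fetch
        self._clock = clock
        self._entries = {}  # zone -> (scripts, expiry time)

    def scripts_for(self, zone):
        now = self._clock()
        cached = self._entries.get(zone)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh, no new query needed
        scripts, ttl = self._fetch(zone)
        self._entries[zone] = (scripts, now + ttl)
        return scripts
```

The application would only re-query a zone once the previous answer has expired, which is the same behaviour resolvers already apply to every other record type.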

--Paul Hoffman
Gervase Markham | 21 Dec 2011 16:49

Re: Browser IDN display policy: opinions sought

On 21/12/11 15:24, Paul Hoffman wrote:
>> And I have responded that this is not your problem; we are tackling
>> that sort of thing via other means (such as domain highlighting).
> 
> It is "our problem" in that you are introducing multiple ways to
> alert users about questionable domain names, 

The methods share a common approach - attempt to avoid showing the user
something which might confuse them into believing a falsehood. In one
case, it's greying-out non-critical bits of the domain name, in another,
it's changing the characters to not use potentially-confusing glyphs.
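
A minimal sketch of the greying-out idea, for readers who have not seen it: split the host name into the part to de-emphasise and the registrable domain to emphasise. The suffix set below is a tiny hypothetical stand-in for a real public suffix list, and the function name is invented for this example; it is not Firefox's implementation.

```python
# Hypothetical, deliberately tiny stand-in for a public suffix list.
PUBLIC_SUFFIXES = {"com", "me", "co.uk"}

def split_for_highlighting(host):
    """Return (greyed_out_prefix, emphasised_registrable_domain)."""
    labels = host.lower().split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return "", host  # the host is itself a public suffix
            greyed = ".".join(labels[:i - 1])
            emphasised = ".".join(labels[i - 1:])
            return greyed, emphasised
    return "", host  # unknown suffix: emphasise everything
```

For the name quoted earlier in the thread, this greys out "www.bankofamerica.com.deposits" and emphasises "index-action.me", which is the part the user actually needs to judge.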

> The proposal people are making is that, if your motivation for
> showing Punycode is that there might be fraud, that you instead use
> the same alert technologies that you { are | will be } using for
> all-ASCII names that you believe are fraudulent.

The domain name highlighting approach doesn't make sense in this case of
potentially-confusing IDNs; in fact, the default behaviour makes the
problem worse. So I'm not sure how you are suggesting it could be used
as a mechanism for alerting the user to risk.

>> (For those not familiar: Firefox can use various data sources, but
>> by default uses the Google SafeBrowsing list, to put up warnings
>> whenever a site on the list is encountered.)
> 
> That seems quite reasonable to me.

Except that such mechanisms are always after-the-fact. The block list
data delta is downloaded once per day (live checking means that you send
your entire browser history to Google, something many people are not
comfortable with). The lifetime of individual phishing sites is usually
measured in hours.

I am not saying it is impossible that we will give up entirely and just
rely on "safe browsing" to protect Firefox users; but it would seem an
odd thing to do to say "yes, let people register domains of the
type/style of www.paypa-cyrillic_a-l.com; I'm sure eventually someone
will figure out it's a scam and add it to the blacklist". Such domains
seem to me to have no use but fooling people; the question is, how do
you distinguish them from other IDNs in a 0% false positive, 0% false
negative way? Is that even possible?

>>> If you want to get additional heuristics from TLDs about policies
>>> to help you decide when you should add a warning, the technical 
>>> community can talk about how to make that happen in a way that
>>> would be useful to application vendors. (So could ICANN, but I
>>> suspect that would be a waste of everyone's time.)
>> 
>> Could you expand on how that might happen?
> 
> Andrew already proposed maybe a new RRtype in which a zone can
> publish its script policy. That one attracts me most so far because
> it keeps DNS-related decisions in the DNS and uses clear TTL
> semantics for caching.

But the problem is not that the browser does not know the script policy
for a given TLD; the problem is that some TLDs do not have, or do not
have an acceptable, script policy. Even if we remove the notion of
"acceptable" entirely, then still, the only thing this would allow a
browser to check would be that a TLD is in conformance with its own
stated policy. That does not sound to me like a particularly useful
thing to check, as I would expect it to be true pretty much 100% of the
time. Why would they publish one policy and then accept registrations
under another?

Like the IANA registry of accepted scripts, this mechanism seems to
solve a problem that the browsers do not have.

Gerv
John C Klensin | 21 Dec 2011 18:24

Re: Browser IDN display policy: opinions sought

Folks,

Gerv's rather focused question seems to have turned into a
rather long and far-ranging set of threads.  I've learned a lot
from it; perhaps others have too. 

With the understanding that I'm speaking only for myself, let me
urge that people take a little bit of a break in honor of the
end of the conventional solar calendar year (and whatever
holidays people are celebrating, if relevant) and let some of
this absorb.  My sense of the high points/ separate issues is:

(1) Most browser vendors feel a need to protect their users
against known threats.  Whether the "confusion" or "phishing"
problems are appropriate threats to be considered in that
context is a separate issue (#2 below), as is how that
protection should be offered (#3 below).  Convincing them that
they should not be worried about such threats is probably a lost
cause.

(2) Regardless of how we feel about it and why, it is fairly
clear that the potential for confusion with a 37 character
repertoire is far less than the potential with a repertoire of
circa 50K characters or more.    That statement is true whether
the 37 glyphs are selected from "basic Latin" or just about any
other script.  In addition, there are huge differences between
confusion because of some inherent properties of the characters
(this is, I believe, the topic that Section 4 of UTR 39  and the
confusables.txt file are intended to address), confusion because
of user perception issues (e.g., "seeing what one expects to
see"), and confusion due to deliberate attacks (of which
phishing is a key, but not the only, example).  Whether the
(browser or otherwise) remedies for all three are the same is an
open question.  While I may be wrong, it continues to appear to
me that different browser strategies put different emphasis on
those three issues by, e.g., assuming that I'm more likely to be
confused by characters in an unfamiliar script than by a
familiar one.
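
To make the first kind of confusion (inherent properties of the characters) concrete, here is a minimal sketch in the spirit of the UTR 39 "skeleton" comparison. The mapping table is a small hand-picked subset chosen for illustration, not the real confusables.txt data, and the function names are assumptions of this example.

```python
import unicodedata

# Hand-picked illustrative subset; the real confusables.txt is far larger.
CONFUSABLE_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def skeleton(label):
    """Decompose, then map each character to its look-alike prototype."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in decomposed)

def confusable(a, b):
    """True if two different labels collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)
```

A check like this addresses only the character-property case; it says nothing about user expectation or deliberate attacks that stay within a single script.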

(3) If protection is going to be offered via the User Interface
(browser or otherwise), the mechanism chosen (Punycode, refusal
to render at all (via question marks, funny boxes, etc.),
highlighting, popups, etc.) are ultimately going to be a matter
of taste.  Some styles may be more appropriate to some browsers
than others, some more to some customers than others, and so on.
Even when appropriate fonts are not available for displaying the
native-character string, I imagine we could have a long debate
about whether it would be better to display Punycode or the
"undisplayable character" symbols of choice.   There are good
reasons why the IETF rarely enters deeply into that area and we
may be illustrating at least some of them.

(4) There is actually an IDNA2008 requirement that registries
handling IDNs establish policies for the strings that they are
willing to accept for registration.  A check on whether such
policies exist (however that is accomplished) is ultimately just
a check on conformance to the Standard.   Evaluation of whether
the policies are reasonable and/or adequate and/or actually
followed and enforced is, of course, a much different and harder
matter.

(5) Independent of how it is accomplished --or whether, in
today's environment, it can be accomplished at all-- it is clear
that the problems that could be evaluated and protected against
at the client UI end of things would be much reduced if there
were effective push-back against, or prevention of, deliberately
problematic registrations and delegations of names.    Some of
the browser policies started out as ways to put pressure on
registries to adopt such restrictions.    Unfortunately, most
"shun the bad guys" models work much better when almost everyone
conforms to community norms and only a few exceptional cases
need special treatment.   When a large fraction of the cases
need the special "your policy model and its enforcement aren't
good enough" treatment, the whole approach becomes somewhat less
effective (whether it is enough less effective to be worth
dropping depends on judgment calls about risks and
tradeoffs).

If people are going to consider the discussion and any
characterization like the above is helpful, might I suggest
separate threads?

  best,
    john
Gervase Markham | 21 Dec 2011 18:28

Re: Browser IDN display policy: opinions sought

On 21/12/11 17:24, John C Klensin wrote:
> With the understanding that I'm speaking only for myself, let me
> urge that people take a little bit of a break in honor of the
> end of the conventional solar calendar year (and whatever
> holidays people are celebrating, if relevant) and let some of
> this absorb.  My sense of the high points/ separate issues is:

Hi John,

Helpful as always :-) I plan to also take your advice, and mull things
over over mulled wine. Merry Christmas to everyone!

Gerv
Eric Brunner-Williams | 22 Dec 2011 17:05

Re: Browser IDN display policy: opinions sought

I've been thinking about John's recent remark that the "the potential
for confusion with a 37 character repertoire is far less than the
potential with a repertoire of circa 50K characters or more". I write
to point out a context, outline a mechanism, and offer a rationale.

For the sets of points in the string space defined by a distance 1 in
Damerau–Levenshtein metric from the strings "microsoft", "twitter",
"facebook", "google", and "apple", in the .COM namespace (Verisign
registry operator, ICANN policy authority, registration policy
implemented by ICANN accredited registrars), the non-intermediated
non-NXDOMAIN resolution ratios are 61%, 74%, 81%, 83%, and 86%,
respectively*,**.

1. Given the order of introduction of scripts, in both the MdR and
Beijing roots, that is, LDH first, the use of non-Latin homoglyphs to
extend the repertoire over which a distance 1 in Damerau–Levenshtein
metric resolvable string from an existing string in a namespace may be
found is likely to be preceded by non-zero resolvable strings in the
LDH distance 1 in Damerau–Levenshtein metric.

Restated, typosquatting in LDH precedes typosquatting in LDH+.

2. Densities of non-intermediated non-NXDOMAIN resolution in distance
1 in Damerau–Levenshtein metric from this, or any equivalent sample
set of strings in namespaces which differ in registry policy, e.g.,
.cat, .tel, .museum, or which differ in policy authority, e.g., .sa,
may be determined by resolving applications, at run time, as well as
earlier.

Restated, a browser vendor may determine that the root problem is in
namespace policy authority, not a property of glyphs, and test any
resolution for Damerau–Levenshtein metric 1 neighbors, and act upon
that resolution-time or earlier data.

3. With likelihood .3 or greater, unmediated non-NXDOMAIN resolutions
within a Damerau–Levenshtein metric 1 of these five strings contain
discoverable links to a monetizing agent -- Google's DoubleClick
business unit. High correlations exist for the associated NS records,
and the addresses mapped by A records, for strings within this metric
1 set.

I suggest that the context missing is, unfortunately, the policy
authority and its implementers.

A mechanism any resolution requesting application author may exercise,
at any point in time, including "run time", and hence not necessarily
limited to the static representations of a dynamic set of resolution
contexts, roots included, is to check for the Damerau–Levenshtein
metric 1 density of apparently unmediated non-NXDOMAIN returns for
resolutions, about some suite of strings, and/or around the string
whose resolution is attempted. High density is distinguishable from low
density, and the application logic may branch upon some empirically
observed property of a stringspace within a namespace.
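
The mechanism can be sketched as follows. This is an assumption-laden illustration: the 37-character LDH alphabet, the function names, and the `resolves` callable (a stand-in for an actual DNS lookup that reports a non-NXDOMAIN answer) are all choices made for this example, and the neighbour set it generates is not claimed to match the set used in the typosquatting study cited below.

```python
import string

LDH = string.ascii_lowercase + string.digits + "-"  # 37 characters

def dl1_neighbours(label, alphabet=LDH):
    """All labels at Damerau-Levenshtein distance 1 from `label`."""
    out = set()
    n = len(label)
    for i in range(n):
        out.add(label[:i] + label[i + 1:])  # deletion
        for c in alphabet:
            out.add(label[:i] + c + label[i + 1:])  # substitution
    for i in range(n - 1):
        # transposition of adjacent characters
        out.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    for i in range(n + 1):
        for c in alphabet:
            out.add(label[:i] + c + label[i:])  # insertion
    out.discard(label)
    out.discard("")
    # Labels may not begin or end with a hyphen.
    return {s for s in out if not s.startswith("-") and not s.endswith("-")}

def neighbour_density(label, resolves, alphabet=LDH):
    """Fraction of distance-1 neighbours for which `resolves` is true."""
    neighbours = dl1_neighbours(label, alphabet)
    hits = sum(1 for s in neighbours if resolves(s))
    return hits / len(neighbours)
```

An application could compute this density for the string being resolved, or for a fixed suite of well-known strings in the same namespace, and branch on whether the result is high or low.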

A rationale is the application author, unless specifically motivated,
say an "evil browser" (and we've seen plenty of browser hacks to fake
IDN TLDs over the past decade, which while not "evil", is not "good"
either), is unlikely to share in the recurring revenue obtained by any
registry operator and its database access providers (registrars), the
homomorph exploiting registrant, and the policy authority which
materially benefits from all of these registrations.

In my modest opinion, the observations made by operators of the .sa
registry are not well answered by appeals to script repertoires that
do not address pre-existing conditions, and their root causes in
policy and economics allowed by policy, in the LDH set of code points.

It's solstice, so I'm going to go hang apples on the trees for the
deer, and corn for the squirrels. Enjoy the day, tomorrow will be
longer, and the night, for no night will be longer.

Eric

(*) The authors of the typosquatting study do not appear to have
limited their Damerau–Levenshtein metric 1 sets to the AZERTY or
QWERTY keyboard adjacent set, the "i" "l" "1" or "o" "0" sets or
similar subsets of the 101 key, 37 character constraints. If their
data did not contain these limits, then for those subsets, the ratios
may be significantly higher, though not in excess of 100%.

(**) The authors of the typosquatting study found 1,502 of a total
possible 2,249 points in the .COM string space defined by six strings
(the sixth being the author's name "saphos") resolved. If the
resolutions were unmediated, these registrations created an additional
$294 revenue for ICANN, and approximately $9,000 additional revenue
for Verisign, and possibly a similar additional revenue amount for a
number of registrars.
Eric Brunner-Williams | 22 Dec 2011 18:21

Re: Browser IDN display policy: opinions sought

Apologies in advance. I wrote this note yesterday, and the note I
posted a few minutes ago this morning, after thinking about the "stale
knowledge" part of the solution proposed by Jothan Frakes.

Eric

> (1) Most browser vendors ...

are likely to implement independent policy evaluations, as mailer and
mail filter vendors did circa 2002 when one or more then new
namespaces were ... not to their taste, and as a major mail and
resolver response to an event not addressed by some source of policy
(sitefinder and the fixes).

> Convincing them that
> they should not be worried about such threats is probably a lost
> cause.

they may not share a critical material dependency with other sources
of policy, particularly those interested only in a small subset of the
several suites of technologies these vendors are primarily concerned with.

> (2) ... it is fairly
> clear that the potential for confusion with a 37 character
> repertoire is far less than the potential with a repertoire of
> circa 50K characters or more. 

demonstrably true where the registration policy allows replication of
the ownership and use patterns established in the .com namespace.

counter-examples may exist where the registration policy does not
allow replication of the ownership and use patterns established in the
.com namespace.

> (3) If protection is going to be offered via ...

it is likely to take more than a single form.

> (4) There is actually an IDNA2008 requirement that registries
> handling IDNs establish policies for the strings that they are
> willing to accept for registration.

the motivation for conformance is variable. for a set of registries a
source of motivation exists, at least in theory.

> (5) Independent of how it is accomplished --or whether, in
> today's environment, it can be accomplished at all-- it is clear
> that the problems that could be evaluated and protected against
> at the client UI end of things would be much reduced if there
> were effective push-back against, or prevention of, deliberately
> problematic registrations and delegations of names.

several similar observations have been made in the past.

the circa 2004/2005 .cat registry operator implemented a registration
restriction policy which did not (and still does not) allow replication of
the ownership and use patterns established in the .com namespace.

the circa 2009 IRT made, inter alia, an implementable recommendation.

the GAC has on several occasions made recommendations to this effect.

the FTC last week made just such a recommendation.

these recommendations have not been adopted by registrars, nor by
registries, while they remain outside the contractual obligations of
these entities to a source of policy. the necessity for repeatedly
making such policy recommendations may arise from the absence of
acceptance of such by a policy adopting entity of some relevance to
our problem domain, though, as i noted earlier, they have been adopted
by more registry operators than just the operator of .cat.

from where i sit it seems quite reasonable for application vendors to
be "policy aware" and distinguish between any two namespaces with
substantively distinct registration (and related, eventually we could
get to default TTLs and tolerance of fast flux hosting and other
well-known features of the landscape), and signal that to their users
or whatever consumes state and logic their applications produce.

after all, if browser vendors are expected to distinguish between
signed and unsigned zones and their leaves, why shouldn't they
distinguish between zones based upon other empirically observable
semantic differences?

-e
Abdulrahman I. ALGhadir | 21 Dec 2011 09:42

RE: Browser IDN display policy: opinions sought

I think currently both ICANN and IANA maintain a public list for new IDNs and their supported
script/language, see the 2 lists:
 - IANA : http://www.iana.org/domains/idn-tables/
 - ICANN: http://www.icann.org/en/topics/idn/fast-track/string-evaluation-completion-en.htm

These 2 lists can easily be combined and amended with some extra fields to suit application developers'
needs. For example, an additional field can be added to cover supported languages and scripts, which will help
applications to safely display U-Labels instead of A-Labels when the language/script rules are met.
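
A minimal sketch of how an application might consume such a combined table, assuming it maps each TLD to the scripts its registry permits. The table contents, the names TLD_SCRIPTS, scripts_of and display_form, and the use of the first word of a character's Unicode name as a rough stand-in for its script are all assumptions made for illustration; none of them comes from the IANA or ICANN lists. Python's built-in "idna" codec implements the older IDNA2003 rules and is used here only to produce an A-Label.

```python
import unicodedata

# Hypothetical combined table: TLD -> scripts the registry says it permits.
# These entries are invented for illustration only.
TLD_SCRIPTS = {
    "example": {"LATIN"},
}

def scripts_of(label):
    """Rough script detection: first word of each character's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphen are treated as script-neutral
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def display_form(u_label, tld):
    """Return the U-Label if the TLD's recorded rules are met, else the A-Label."""
    allowed = TLD_SCRIPTS.get(tld)
    if allowed is not None and scripts_of(u_label) <= allowed:
        return u_label
    # Built-in codec follows IDNA2003, not IDNA2008; adequate for a sketch.
    return u_label.encode("idna").decode("ascii")

print(display_form("bücher", "example"))   # script rules met
print(display_form("bücher", "unlisted"))  # TLD not in the table
```

A real implementation would use proper Unicode script properties rather than character names, and would still depend on the table reflecting rules the registry actually enforces.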

AbdulRahman,

-----Original Message-----
From: idna-update-bounces <at> alvestrand.no [mailto:idna-update-bounces <at> alvestrand.no] On Behalf Of
Gervase Markham
Sent: 19/Dec/2011 1:42 PM
To: Raed Al-Fayez
Cc: idna-update <at> alvestrand.no
Subject: Re: Browser IDN display policy: opinions sought

Hi Raed,

Thank you for your input.

I would say that the vendor you quote is using sloppy language and it's not actually about 'prioritizing
English', but your point about first-class citizenship is a good one.

If all registries were as responsible as yours, there would hardly be a need for browser restrictions on
IDN display at all (perhaps just a blacklist of homographs for "." and "/", and that should have been
achieved by IDNA2008 anyway).

> IDNs should not be treated as second-class citizens on the Internet.

I entirely agree. This is why we went for option B - once a registry has a responsible policy, their IDNs are
treated as first-class citizens everywhere (at least, in Firefox). No additional configuration required.
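
As a rough sketch of what an option B check amounts to, assuming the browser holds a set of TLDs whose registries have an acceptable policy: TRUSTED_TLDS and display_hostname are hypothetical names, the whitelist entry is invented, and this is not Firefox's actual code. Python's built-in "idna" codec (IDNA2003) does the decoding.

```python
# Hypothetical whitelist of TLDs with a responsible registration policy.
TRUSTED_TLDS = {"example"}

def display_hostname(a_form):
    """Show Unicode if the TLD is whitelisted, otherwise leave Punycode."""
    tld = a_form.rsplit(".", 1)[-1].lower()
    if tld in TRUSTED_TLDS:
        # Decode each xn-- label to its Unicode form.
        return a_form.encode("ascii").decode("idna")
    return a_form

print(display_hostname("xn--bcher-kva.example"))  # whitelisted TLD
print(display_hostname("xn--bcher-kva.test"))     # TLD not whitelisted
```

The decision depends only on the TLD, so the user needs no per-language configuration.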

Unfortunately, IDNs are still not treated as first-class citizens. So the question is: how do we get from
where we are now to a situation where they are treated that way?

Gerv
_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update

-----------------------------------------------------------------------------------
Disclaimer:
This message and its attachment, if any, are confidential and may contain legally
privileged information. If you are not the intended recipient, please contact the
sender immediately and delete this message and its attachment, if any, from your
system. You should not copy this message or disclose its contents to any other
person or use it for any purpose. Statements and opinions expressed in this e-mail
are those of the sender, and do not necessarily reflect those of the Communications
and Information Technology Commission (CITC). CITC accepts no liability for damage
caused by this email.
Gervase Markham | 21 Dec 2011 11:58

Re: Browser IDN display policy: opinions sought

On 21/12/11 08:42, Abdulrahman I. ALGhadir wrote:
> I think currently both ICANN and IANA maintain a public list for new
> IDNs and their supported script/language, see the 2 lists: - IANA :
> http://www.iana.org/domains/idn-tables/ - ICANN:
> http://www.icann.org/en/topics/idn/fast-track/string-evaluation-completion-en.htm
>
>  These 2 lists can be easily combined and amended with some extra
> fields to suit the application developer needs. For example,
> additional field can be added to cover supported languages and
> scripts that will help applications safely to display U-Labels
> instead of A-Labels when the language/script rules is met.

Hi Abdulrahman,

As I understand it, these sites are merely records of policy; no
restrictions are imposed. So if .newtld decided to register with IANA to
say "we accept every character", then the tables would duly record that
fact, despite there being a big potential problem.

So, unless I have missed something, such lists are of no help in this
situation.

Gerv
J-F C. Morfin | 21 Dec 2011 16:29

Re: Browser IDN display policy: opinions sought

At 11:58 21/12/2011, Gervase Markham wrote:
>As I understand it, these sites are merely records of policy; no
>restrictions are imposed. So if .newtld decided to register with IANA to
>say "we accept every character", then the tables would duly record that
>fact, despite there being a big potential problem.
>
>So, unless I have missed something, such lists are of no help in this
>situation.

Gervase,

please let us stay within the RFC framework.

The DNS is built to accept every byte. IDNA2008 accepts the Unicode
code points permitted by RFC 5892. Internet DNS practice favors ASCII
only. This is the universal policy by default. Every zone manager may
restrict its registration policy and publish it universally (TLDs) or
not (other zone managers). Your request is to know whether a domain
name entered by a user conforms to this registration policy, so that
you can apply a policy of your own, yet to be defined.

1. there is as yet no protocol for TLDs to disseminate that information
to application designers.

2. You may engage into a BoF to show that such a protocol is necessary
for the Internet to work better and propose an IETF WG Charter.

3. personally I think that such a protocol belongs to the whole
digital ecosystem area and is to be discussed and possibly
standardised outside of the sole IETF. Its discussion calls for the
preliminary emergence of its requirements out of experimentation. Such
an Internet+ community experimentation has been engaged through the
files documented by Abdulrahman, ICANN work on variants, and work
engaged within the IUse community within the so called "IDNA2012"
framework (the so called IDNA2010 being the issues on the user side).

jfc 
Abdulrahman I. ALGhadir | 24 Dec 2011 06:47

RE: Browser IDN display policy: opinions sought

I guess you have missed something.
The second link was broken because I didn't copy it correctly, and I think you haven't checked it, have you?

http://www.icann.org/en/topics/idn/fast-track/string-evaluation-completion-en.htm

When you want to register an IDN you should define the script and the language which your label will follow.

http://www.icann.org/en/topics/idn/fast-track/idn-cctld-implementation-plan-15dec11-en.pdf
(check 3.2)
http://www.icann.org/en/topics/new-gtlds/rfp-clean-19sep11-en.pdf (check 1.3.1)

As you can see, this is a mandatory part of the ICANN application process, and I think it will help us to have a
solid resource which can provide us with the needed information.

-----Original Message-----
From: idna-update-bounces <at> alvestrand.no [mailto:idna-update-bounces <at> alvestrand.no] On Behalf Of
Gervase Markham
Sent: 21/Dec/2011 1:58 PM
To: Abdulrahman I. ALGhadir
Cc: idna-update <at> alvestrand.no; Raed Al-Fayez
Subject: Re: Browser IDN display policy: opinions sought

On 21/12/11 08:42, Abdulrahman I. ALGhadir wrote:
> I think currently both ICANN and IANA maintain a public list for new 
> IDNs and their supported script/language, see the 2 lists: - IANA :
> http://www.iana.org/domains/idn-tables/ - ICANN:
> http://www.icann.org/en/topics/idn/fast-track/string-evaluation-comple
> tion-en.htm
>
>  These 2 lists can be easily combined and amended with some extra 
> fields to suit the application developer needs. For example, 
> additional field can be added to cover supported languages and scripts 
> that will help applications safely to display U-Labels instead of 
> A-Labels when the language/script rules is met.

Hi Abdulrahman,

As I understand it, these sites are merely records of policy; no restrictions are imposed. So if .newtld
decided to register with IANA to say "we accept every character", then the tables would duly record that
fact, despite there being a big potential problem.

So, unless I have missed something, such lists are of no help in this situation.

Gerv

