Andrew Sullivan | 9 Dec 18:49 2011

Re: Browser IDN display policy: opinions sought

On Fri, Dec 09, 2011 at 11:12:29AM +0000, Gervase Markham wrote:
> 
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.

The problem with this, of course, is that in many cases there's no way
to tell what language a string is in.  If you get things all in the
Arabic Script, for instance, what language are you in?  And Latin is a
disaster for this.  

It would be quite another matter if there were a way for a zone
operator to publish somehow what languages (or maybe scripts?) they
support.  In that case, you could look it up and know what to do (you
might not even have to do this quickly).

> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).

The problem here has always been both the notion of TLD and the
whitelist maintenance.  The publicsuffix.org list is, in effect, a
lookaside root list, and it makes me extremely uncomfortable.
Moreover, what do you do about things lower in the tree?

> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.

This approach sucks in all the ways you say.  I think it is the worst
option.

Mark Davis ☕ | 9 Dec 19:10 2011

Re: Browser IDN display policy: opinions sought

I'm not familiar with the code, but I think that (A) may actually be:

A (IE, Chrome): Unicode if the (single) 'script' of the string matches one of the scripts of the user's language(s) in the options,
Punycode otherwise.

It is pretty easy and reliable to detect the script of the string, whereas language detection would be unreliable.

(It would be possible to match the characters of the string against the "customary" characters used in the user's languages in the options, but that would be trickier, and is probably not worth it.)
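
A minimal sketch of the script-based reading of (A), for illustration only. It is not IE's or Chrome's code. The language-to-script table is a hypothetical stand-in for real data, and the script of a character is approximated by the first word of its Unicode character name, because Python's standard library does not expose the Script property. The "single script" rule is applied literally, so writing systems that legitimately mix scripts (Japanese, for example) would need extra handling.

```python
import unicodedata

# Hypothetical language -> script table; real data would come from a
# source such as CLDR or the IANA language subtag registry.
LANGUAGE_SCRIPTS = {
    "en": {"LATIN"},
    "de": {"LATIN"},
    "ru": {"CYRILLIC"},
    "el": {"GREEK"},
}

def char_script(ch):
    """Approximate a character's script by the first word of its name."""
    if ch in "0123456789-":
        return None  # digits and hyphen are shared by all scripts
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def label_scripts(label):
    """Return the set of (approximate) scripts used in a label."""
    return {char_script(ch) for ch in label} - {None}

def display_form(label, user_languages):
    """Return the label in Unicode if policy (A) allows it, else Punycode."""
    if label.isascii():
        return label  # nothing to decide for plain ASCII labels
    scripts = label_scripts(label)
    allowed = set()
    for lang in user_languages:
        allowed |= LANGUAGE_SCRIPTS.get(lang, set())
    if len(scripts) == 1 and scripts <= allowed:
        return label
    return "xn--" + label.encode("punycode").decode("ascii")
```

With this sketch, a Cyrillic label is shown as Unicode to a user who has "ru" configured and as Punycode to a user who has only "en"; a label mixing Latin and Cyrillic letters is always shown as Punycode.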

Mark
— Il meglio è l’inimico del bene —


On Fri, Dec 9, 2011 at 09:49, Andrew Sullivan <ajs <at> anvilwalrusden.com> wrote:
> This approach sucks in all the ways you say.  I think it is the worst
> option.
>
> I think that the right approach would be A _if_ you could get the
> advantages of B automatically somehow.  At the moment, however, I
> think all the answers are bad ones.

_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update
Gervase Markham | 12 Dec 11:54 2011

Re: Browser IDN display policy: opinions sought

On 09/12/11 18:10, Mark Davis ☕ wrote:
> I'm not familiar with the code, but I think that (A) may actually be:
> 
> A (IE, Chrome): Unicode if the (single) 'script' of the string matches
> one of the scripts of the user's language(s) in the options,
> Punycode otherwise.
> 
> It is pretty easy and reliable to detect the script of the string,
> whereas language detection would be unreliable.

I can quite believe it may be something like this; but how does one deal
with the impedance mismatch that users think they are defining
languages, but what you need is scripts? Does IE keep a script/language
mapping? Is that data (perhaps compiled by others) publicly available
somewhere, e.g. from the Unicode consortium?

Gerv
"Martin J. Dürst" | 14 Dec 12:02 2011

Re: Browser IDN display policy: opinions sought

On 2011/12/12 19:54, Gervase Markham wrote:
> On 09/12/11 18:10, Mark Davis ☕ wrote:
>> I'm not familiar with the code, but I think that (A) may actually be:
>>
>> A (IE, Chrome): Unicode if the (single) 'script' of the string matches
>> one of the scripts of the user's language(s) in the options,
>> Punycode otherwise.
>>
>> It is pretty easy and reliable to detect the script of the string,
>> whereas language detection would be unreliable.

I have to correct myself. In another mail, I was writing that I was 
quite sure that Mark's correction applied. But by playing around with 
IE, I found out that this may only partially be the case.

I looked at http://www.viagénie.com/ in IE (IE8 on Win7), and it showed 
punycode. I then added "en" (English) to my language preferences (which 
were just "ja" (Japanese) out of the box because I rarely use IE). 
viagénie was still shown in punycode. Then I added "de" (German), and 
now viagénie was shown. So either IE uses a separate "script" category 
"ASCII-only" (but the algorithm would still be script-oriented at the 
core) or the letters for a language are taken rather widely, with German 
including French accented letters and so on (which would be a 
language-only algorithm).
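
For readers unfamiliar with the two display forms being compared, the conversion between the Unicode form (U-label) and the Punycode form (A-label) of a label can be sketched with Python's built-in "idna" codec. Note that this codec implements the older IDNA 2003 rules rather than IDNA2008; the helper names are made up for this example.

```python
def to_a_label(u_label):
    """Convert a Unicode label to its ASCII-compatible (Punycode) form."""
    return u_label.encode("idna").decode("ascii")

def to_u_label(a_label):
    """Convert an ASCII-compatible label back to its Unicode form."""
    return a_label.encode("ascii").decode("idna")

if __name__ == "__main__":
    ascii_form = to_a_label("viag\u00e9nie")
    print(ascii_form)              # what a browser shows when it falls back
    print(to_u_label(ascii_form))  # what it shows when Unicode is allowed
```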

Michel, if you know any details (that you can talk about), it would be 
nice to hear from you.

When showing punycode, IE also displayed a one-line message just above 
the page itself and below the chrome (tabs and stuff), saying 
(translating back from Japanese) "This Web address contains letters or 
symbols that cannot be displayed with the current language settings. If 
you click here, options will be displayed...". When clicking, I get the 
options of changing my language settings, of not displaying the message 
anymore, or of getting some further explanations or help.

> I can quite believe it may be something like this; but how does one deal
> with the impedance mismatch that users think they are defining
> languages, but what you need is scripts? Does IE keep a script/language
> mapping? Is that data (perhaps compiled by others) publicly available
> somewhere, e.g. from the Unicode consortium?

Some of the data is in the suppress-script fields in the language subtag 
registry at IANA. At 
http://www.iana.org/assignments/language-subtag-registry, if you see 
something like:

%%
Type: language
Subtag: af
Description: Afrikaans
Added: 2005-10-16
Suppress-Script: Latn
%%

then Suppress-Script: Latn tells you that Afrikaans is, for all intents 
and purposes, written with the Latin script. This information isn't 
complete (given the number of languages in the subtag registry, that 
shouldn't be a surprise), but I'd say it's highly accurate where it's 
there, and it's there for most of the major languages for which it can 
be reasonably provided.
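
A minimal sketch of pulling the Suppress-Script values out of registry text in this format. The sample embeds the record quoted above plus an abbreviated record without the field; a real program would read the full registry file instead, and the function names are hypothetical.

```python
SAMPLE = """\
%%
Type: language
Subtag: af
Description: Afrikaans
Added: 2005-10-16
Suppress-Script: Latn
%%
Type: language
Subtag: zh
Description: Chinese
"""

def parse_records(text):
    """Split registry text into records; each record maps field -> list of values."""
    records, current, last_field = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":            # record separator
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field:
            # Folded continuation line: append to the previous value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            field, value = line.split(":", 1)
            last_field = field.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records

def suppress_scripts(text):
    """Return {language subtag: script} for records with a Suppress-Script field."""
    result = {}
    for record in parse_records(text):
        if record.get("Type") == ["language"] and "Suppress-Script" in record:
            result[record["Subtag"][0]] = record["Suppress-Script"][0]
    return result
```

Languages missing from the result are not necessarily multi-script; the field is simply absent for them, which is why the data is described above as accurate but incomplete.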

For character coverage needed for a language, CLDR (the Unicode Common 
Locale Data Repository, http://cldr.unicode.org) provides quite a lot of 
data to work with, although you may want to have a closer look or talk 
with somebody more familiar with the data and processes before you work 
on a particular application.

While I'm mentioning data sources, I also wanted to mention 
http://www.unicode.org/reports/tr36/, Unicode Security Considerations, 
and http://www.unicode.org/reports/tr39/, Unicode Security Mechanisms, 
and the data sources mentioned there. I'm very surprised that nobody has 
mentioned them, because I think they are extremely relevant and helpful 
for our discussion and for actual implementations.

Regards,   Martin.
Michel Suignard | 14 Dec 19:12 2011

RE: Browser IDN display policy: opinions sought

> I looked at http://www.viagénie.com/ in IE (IE8 on Win7),
> and it showed punycode. I then added "en" (English) to my
> language preferences (which were just "ja" (Japanese)
> out of the box because I rarely use IE). 
> viagénie was still shown in punycode. Then I added "de" 
> (German), and now viagénie was shown. So either IE uses
> a separate "script" category "ASCII-only" (but the algorithm
> would still be script-oriented at the core) or the letters for
> a language are taken rather widely, with German including
> French accented letters and so on (which would be a
> language-only algorithm).
>
> Michel, if you know any details (that you can talk about),
> it would be nice to hear from you.

Martin, you are correct: enabling any Latin-based language other than English unlocks IDN display for the
Latin script in IE. I was never a fan of blocking IDN for English users, but I was not part of the IE team and
that was their decision. Given that new devices can show most U-labels without installing new fonts, I agree
that browsers should nowadays display them. And being in charge of creating the charts for all of them in
both Unicode and 10646, I can tell you it is no small feat.

I also found some public information about the white list that IE uses for script mixing. It is a bit old
(2006); I don't think it has changed, but I obviously don't know for certain. See
http://blogs.msdn.com/b/ie/archive/2006/07/31/684337.aspx

Michel
Ken Whistler | 14 Dec 23:36 2011

Re: Browser IDN display policy: opinions sought

On 12/14/2011 3:02 AM, "Martin J. Dürst" wrote:
> On 2011/12/12 19:54, Gervase Markham wrote:
>
>
>> I can quite believe it may be something like this; but how does one deal
>> with the impedance mismatch that users think they are defining
>> languages, but what you need is scripts? Does IE keep a script/language
>> mapping? Is that data (perhaps compiled by others) publicly available
>> somewhere, e.g. from the Unicode consortium?
>
>
>
> For character coverage needed for a language, CLDR (the Unicode Common 
> Locale Data Repository, http://cldr.unicode.org) provides quite a lot 
> of data to work with, although you may want to have a closer look or 
> talk with somebody more familiar with the data and processes before 
> you work on a particular application.
>
>
>

Just following up this particular query about publicly available data about
script/language mapping, CLDR also makes available specific charts which specify
the (commonly used) scripts for a large number of languages, including nearly all
of the languages which would be used for IDNs. See:

http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/languages_and_scripts.html

and the reverse indexed:

http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/scripts_and_languages.html

Although this data is not perfect or complete for *all* languages, it is a very
good statement of 99.9% of the significant facts of usage relevant to the issues
being debated on this thread, IMO.

Anyone making use of this data would need to become familiar with its source,
supplementalData.xml in the CLDR releases, and know something about the extensions
which CLDR makes to the Unicode notion of "script", before just blindly implementing
it. For example, the Japanese *language* is identified as being written with the
Japanese *script* in languages_and_scripts.html. The Japanese "script" actually
refers to the Japanese writing system, which combines several scripts, but which,
for various implementation reasons, is identified in CLDR with an aggregated script
identifier. And so on.
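
A minimal sketch of reading such language/script data and expanding the aggregated identifiers. The XML fragment is hand-written in the general shape of the languageData section of supplementalData.xml; the element and attribute names and the sample values are assumptions to be checked against an actual CLDR release. The expansion table uses the ISO 15924 definitions of Jpan (Han + Hiragana + Katakana) and Kore (Hangul + Han).

```python
import xml.etree.ElementTree as ET

# Hand-written illustration, not copied from a CLDR release.
SAMPLE_XML = """
<supplementalData>
  <languageData>
    <language type="af" scripts="Latn"/>
    <language type="ja" scripts="Jpan"/>
    <language type="sr" scripts="Cyrl Latn"/>
  </languageData>
</supplementalData>
"""

# Aggregated script codes and the component scripts they stand for.
AGGREGATES = {
    "Jpan": {"Hani", "Hira", "Kana"},
    "Kore": {"Hang", "Hani"},
}

def language_scripts(xml_text):
    """Return {language code: set of component script codes}."""
    table = {}
    root = ET.fromstring(xml_text)
    for element in root.iter("language"):
        scripts = set()
        for code in element.get("scripts", "").split():
            scripts |= AGGREGATES.get(code, {code})
        # A language may appear in more than one element; merge them.
        table.setdefault(element.get("type"), set()).update(scripts)
    return table
```

Expanding the aggregates before comparing against the script of a label avoids treating an ordinary Japanese label, which mixes Han and kana, as suspicious script mixing.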

However, I think this is the kind of machine-readable information that Gervase
was inquiring about.

Note also that CLDR is an ongoing project responsive to public input and feedback,
so if there are deficiencies, omissions, or outright errors in the script and
language data, the CLDR project would like to hear about it via bug reports. See:

http://cldr.unicode.org/

--Ken

Gervase Markham | 12 Dec 12:15 2011

Re: Browser IDN display policy: opinions sought

On 09/12/11 17:49, Andrew Sullivan wrote:
> The problem here has always been both the notion of TLD and the
> whitelist maintenance.  The publicsuffix.org list is, in effect, a
> lookaside root list, and it makes me extremely uncomfortable.
> Moreover, what do you do about things lower in the tree?

The publicsuffix.org list and the IDN TLD whitelist are two separate
entities and used for different purposes (although I happen to be
involved in them both). Let's not get them mixed up :-)

>> C (Safari): Unicode if the script is in a whitelist (which by default
>> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
>> script mixing.
> 
> This approach sucks in all the ways you say.  I think it is the worst
> option.
> 
> I think that the right approach would be A _if_ you could get the
> advantages of B automatically somehow.

And ponies! ;-)

> Note, too, that if the root zone expands the way it sometimes
> threatens to, the whitelist approach will become impractical without
> an awful lot of failures.

Indeed. It is with an eye to that future event that we are re-evaluating
our position.

Gerv
Andrew Sullivan | 12 Dec 16:23 2011

Re: Browser IDN display policy: opinions sought

On Mon, Dec 12, 2011 at 11:15:55AM +0000, Gervase Markham wrote:
> 
> The publicsuffix.org list and the IDN TLD whitelist are two separate
> entities and used for different purposes (although I happen to be
> involved in them both). Let's not get them mixed up :-)

They work on the same principle, however: what matters is the policy
of some group other than the group that makes the delegation from the
root.

> > 
> > I think that the right approach would be A _if_ you could get the
> > advantages of B automatically somehow.
> 
> And ponies! ;-)

Well, it's bound to sound that way if you don't take seriously the
idea that there might be a way to figure these things out.

Suppose that zone operators (not just the root or TLDs, but any random
zone you liked) had a mechanism by which you could look up their
policies for, say, code point inclusion.  That is, I'm RegyCo, and I
run .example.  I put an SRV or URI or something record at .example
that points you to a policy document that tells you what code point
ranges are permitted together in a single label in my zone, and also
(for that matter) what code points I will register _at all_.  Now you
are in a position to decide whether you think my policy is sensible;
and you are also in a position to decide whether any given label
actually meets my own stated policies.  Since this forms the basis
for a filter in your software, you also have the ability to set a
default for your users that makes sense, while giving people who
want it the benefits of the most permissive settings available
under approach A.  Finally, it would avoid the massive scaling
problem the whitelist faces if the root zone increases dramatically
in size, since most (all?) of the work could be automated.
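
To make the idea concrete, here is a minimal sketch of the check a client could run once it had fetched such a policy. The policy format is entirely hypothetical (no such record or document format exists): a list of groups, each group being the code point ranges that may appear together in a single label.

```python
# Hypothetical policy for a zone: each group lists (first, last) code point
# ranges that are permitted together in one label.
EXAMPLE_POLICY = {
    "groups": [
        [(0x61, 0x7A), (0x30, 0x39), (0x2D, 0x2D)],      # a-z, 0-9, hyphen
        [(0x0430, 0x044F), (0x30, 0x39), (0x2D, 0x2D)],  # Cyrillic а-я, 0-9, hyphen
    ]
}

def in_ranges(code_point, ranges):
    """True if the code point falls inside any of the (first, last) ranges."""
    return any(first <= code_point <= last for first, last in ranges)

def label_meets_policy(label, policy):
    """True if every character of the label fits within a single group."""
    return any(
        all(in_ranges(ord(ch), group) for ch in label)
        for group in policy["groups"]
    )
```

A label drawn from one group passes, a label that mixes groups fails, and a label containing a code point the zone does not register at all fails as well, which covers both halves of the policy described above.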

Best,
A
-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Gervase Markham | 12 Dec 17:54 2011

Re: Browser IDN display policy: opinions sought

On 12/12/11 15:23, Andrew Sullivan wrote:
> On Mon, Dec 12, 2011 at 11:15:55AM +0000, Gervase Markham wrote:
>>
>> The publicsuffix.org list and the IDN TLD whitelist are two separate
>> entities and used for different purposes (although I happen to be
>> involved in them both). Let's not get them mixed up :-)

Hey, if the equivalent to the PSL info was published in DNS instead by
everyone, I'd be the first to applaud :-)

>> And ponies! ;-)
> 
> Well, it's bound to sound that way if you don't take seriously the
> idea that there might be a way to figure these things out.

There was a winking smiley ;-)

> Suppose that zone operators (not just the root or TLDs, but any random
> zone you liked) had a mechanism by which you could look up their
> policies for, say, code point inclusion.  That is, I'm RegyCo, and I
> run .example.  I put an SRV or URI or something record at .example
> that points you to a policy document that tells you what code point
> ranges are permitted together in a single label in my zone, and also
> (for that matter) what code points I will register _at all_.  Now you
> are in a position to decide whether you think my policy is sensible;
> and you are also in a position to decide whether any given label
> actually meets my own stated policies.

If I am to do such a check (and presumably to fail if the domain doesn't
meet it), what about when a policy changes to be more strict? How do you
deal with grandfathering?

What about performance? I would need to look up the rules for the zone
"foo.com" every time I accessed bar.foo.com, for lots of values of foo.
This doesn't sound like it would improve performance.

If there are going to be rules, by far the best place to enforce them is
once at domain registration time, not in real time in performance
critical code millions of times a day at access time.

Approaches like this were considered when we initially made the decision
to go with the solution we have. The bottom of
http://www.mozilla.org/projects/security/tld-idn-policy-list.html
says:

"The Moz/Opera anti-spoofing mechanism is the result of widespread
public analysis and discussion, and has the following advantages:

...

    It is simple to code and deploy: about ten lines of code for the
Mozilla implementation.
...
    It is the sole survivor of a large number of alternative proposals
that were considered and rejected. Unlike most of the other rejected
proposals, it does not need any modifications to the DNS protocol, or
distribution of "language" codes for labels, nor does it require
multiple DNS lookups, large character tables in the browser, or
real-time access to WHOIS information.
...
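
For a sense of scale, a TLD whitelist check of the kind described as option B can be sketched in a few lines. This is not the Mozilla code: the whitelist contents are placeholders (reserved TLDs), and the function name is made up for the example.

```python
# Placeholder whitelist; a real one would list TLDs whose registries have
# published an acceptable IDN registration policy.
IDN_TLD_WHITELIST = {"example", "test"}

def display_host(ascii_hostname):
    """Return the Unicode form if the TLD is whitelisted, else leave Punycode."""
    tld = ascii_hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in IDN_TLD_WHITELIST:
        return ascii_hostname.encode("ascii").decode("idna")
    return ascii_hostname
```

The check needs no network access and no character tables, but it looks only at the last label, which is why the questions about whitelist maintenance and about zones lower in the tree keep coming up in this thread.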

Gerv
Paul Hoffman | 12 Dec 18:24 2011

Re: Browser IDN display policy: opinions sought

Not speaking for Andrew, just myself:

On Dec 12, 2011, at 8:54 AM, Gervase Markham wrote:

> If I am to do such a check (and presumably to fail if the domain doesn't
> meet it), what about when a policy changes to be more strict? How do you
> deal with grandfathering?

The zone owner deals with grandfathering. They publish a policy that reflects all of the zones they control.

If you are asking "how does a browser deal with grandfathering when a policy changes?", I would say the same
thing: assume that the zone owner has reasons for publishing that policy.

> What about performance? I would need to look up the rules for the zone
> "foo.com" every time I accessed bar.foo.com, for lots of values of foo.
> This doesn't sound like it would improve performance.

A sane browser would look up policies based on the browser's own policy update strategy, and would risk
missing a policy update. Further, given that the policies would have TTLs on them, you don't even need to
think about looking up a policy until the TTL expires. Further, the browser will know whether or not the
zone had a policy before and base its lookup strategy on that.
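
A minimal sketch of the TTL-based caching described here, assuming a hypothetical fetch function that returns a (policy, ttl_seconds) pair for a zone, with policy set to None when the zone publishes nothing. How the policy is actually retrieved is deliberately left out.

```python
import time

class PolicyCache:
    """Cache per-zone display policies until their TTL expires."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # hypothetical: zone -> (policy or None, ttl_seconds)
        self._clock = clock
        self._entries = {}     # zone -> (policy, expiry time)

    def get(self, zone):
        now = self._clock()
        entry = self._entries.get(zone)
        if entry is not None and entry[1] > now:
            return entry[0]    # still fresh: no lookup needed
        policy, ttl = self._fetch(zone)
        # "No policy" answers (None) are cached too, so zones without a
        # policy are not looked up again on every visit.
        self._entries[zone] = (policy, now + ttl)
        return policy
```

Only the first visit to a zone, or the first visit after expiry, triggers a lookup; whether that lookup blocks rendering is a separate design decision that this sketch does not address.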

Or was that meant to be another strawman objection?

> If there are going to be rules, by far the best place to enforce them is
> once at domain registration time, not in real time in performance
> critical code millions of times a day at access time.

Fully disagree. That restricts TLDs to never changing their policies. A browser vendor might want this
convenience, but there are plenty of people who would like the browser vendors to be more responsive to
changes than that so that IDNs can be more useful.

> Approaches like this were considered when we initially made the decision
> to go with the solution we have. The bottom of
> http://www.mozilla.org/projects/security/tld-idn-policy-list.html
> says:
> 
> "The Moz/Opera anti-spoofing mechanism is the result of widespread
> public analysis and discussion, and has the following advantages:
> 
> ...
> 
>    It is simple to code and deploy: about ten lines of code for the
> Mozilla implementation.
> ...
>    It is the sole survivor of a large number of alternative proposals
> that were considered and rejected. Unlike most of the other rejected
> proposals, it does not need any modifications to the DNS protocol, or
> distribution of "language" codes for labels, nor does it require
> multiple DNS lookups, large character tables in the browser, or
> real-time access to WHOIS information.
> ...

Noted. So, if other browser vendors adopt a different approach, are you saying Mozilla won't? I thought the
purpose of this thread was to revisit the question of what would be best for the users.

--Paul Hoffman
Gervase Markham | 12 Dec 18:33 2011

Re: Browser IDN display policy: opinions sought

On 12/12/11 17:24, Paul Hoffman wrote:
> Not speaking for Andrew, just myself:
> 
> On Dec 12, 2011, at 8:54 AM, Gervase Markham wrote:
> 
>> If I am to do such a check (and presumably to fail if the domain
>> doesn't meet it), what about when a policy changes to be more
>> strict? How do you deal with grandfathering?
> 
> The zone owner deals with grandfathering. They publish a policy that
> reflects all of the zones they control.

So published policies can only ever be loosened, not tightened?

(If they were tightened, existing domains might fail the new policy,
which would result in them being blocked. This is what I mean by the
grandfathering problem.)

> If you are asking "how does a browser deal with grandfathering when a
> policy changes?", I would say the same thing: assume that the zone
> owner has reasons for publishing that policy.

So just block the domains, then?

>> What about performance? I would need to look up the rules for the
>> zone "foo.com" every time I accessed bar.foo.com, for lots of
>> values of foo. This doesn't sound like it would improve
>> performance.
> 
> A sane browser would look up policies based on the browser's own
> policy update strategy, and would risk missing a policy update.
> Further, given that the policies would have TTLs on them, you don't
> even need to think about looking up a policy until the TTL expires.
> Further, the browser will know whether or not the zone had a policy
> before and base its lookup strategy on that.

Except if it's the first time the user has visited the site, or the user
has cleared their browsing history, or is using private browsing, or
their cache has been cleared due to memory pressure on their mobile
device, or...

> Or was that meant to be another strawman objection?

Oh, no.

Historically, the idea that a browser will burden the Internet and its
own speed characteristics with millions or billions of additional
requests per day, (and making them blocking - so don't render the site
until it returns) for a purpose such as this has been met with
incredulity and barely-suppressed laughter from our performance and
networking teams.

>> If there are going to be rules, by far the best place to enforce
>> them is once at domain registration time, not in real time in
>> performance critical code millions of times a day at access time.
> 
> Fully disagree. That restricts TLDs to never changing their policies.

Surely the opposite? If a TLD enforces a policy at registration time, it
can change that policy and start accepting registrations under the new
one, without consulting anyone.

> Noted. So, if other browser vendors adopt a different approach, are
> you saying Mozilla won't? I thought the purpose of this thread was to
> revisit the question of what would be best for the users.

I'm not saying that. But no other browser vendor has adopted an approach
which requires extra network requests for each new site visited, and
periodically thereafter. And I don't expect them to.

Gerv
Paul Hoffman | 12 Dec 19:18 2011

Re: Browser IDN display policy: opinions sought

On Dec 12, 2011, at 9:33 AM, Gervase Markham wrote:

> On 12/12/11 17:24, Paul Hoffman wrote:
>> Not speaking for Andrew, just myself:
>> 
>> On Dec 12, 2011, at 8:54 AM, Gervase Markham wrote:
>> 
>>> If I am to do such a check (and presumably to fail if the domain
>>> doesn't meet it), what about when a policy changes to be more
>>> strict? How do you deal with grandfathering?
>> 
>> The zone owner deals with grandfathering. They publish a policy that
>> reflects all of the zones they control.
> 
> So published policies can only ever be loosened, not tightened?
> 
> (If they were tightened, existing domains might fail the new policy,
> which would result in them being blocked. This is what I mean by the
> grandfathering problem.)

Published policies can be tightened. No one so far has talked about *blocking* domains: we have been
talking about when to display names in one form or another. Note that "loosening" might be the same as
"tightening": it can change the way a domain appears in the location bar.

>> If you are asking "how does a browser deal with grandfathering when a
>> policy changes?", I would say the same thing: assume that the zone
>> owner has reasons for publishing that policy.
> 
> So just block the domains, then?

Why would you want to block a domain instead of just showing the Punycode-encoded name?

>>> What about performance? I would need to look up the rules for the
>>> zone "foo.com" every time I accessed bar.foo.com, for lots of
>>> values of foo. This doesn't sound like it would improve
>>> performance.
>> 
>> A sane browser would look up policies based on the browser's own
>> policy update strategy, and would risk missing a policy update.
>> Further, given that the policies would have TTLs on them, you don't
>> even need to think about looking up a policy until the TTL expires.
>> Further, the browser will know whether or not the zone had a policy
>> before and base its lookup strategy on that.
> 
> Except if it's the first time the user has visited the site, or the user
> has cleared their browsing history, or is using private browsing, or
> their cache has been cleared due to memory pressure on their mobile
> device, or...

When a user visits www.example.com and the browser only knows the policy for .com, the browser might look up
the policy for example.com. Or, it might not: a browser that cares about display speed might have a list of
single-script terminal labels that don't need looking up, such as "www".

You can code your browser however you want, of course. To me, a sane browser would not clear the Punycode
display history when the browser clears its browsing history or goes into private browsing. The display
of domain names (or anything!) in the location bar is unrelated to the content of the page. 

>> Or was that meant to be another strawman objection?
> 
> Oh, no.
> 
> Historically, the idea that a browser will burden the Internet and its
> own speed characteristics with millions or billions of additional
> requests per day, (and making them blocking - so don't render the site
> until it returns) for a purpose such as this has been met with
> incredulity and barely-suppressed laughter from our performance and
> networking teams.

That's fine: this isn't such a proposal. I'm not sure why you are treating it as one.

>>> If there are going to be rules, by far the best place to enforce
>>> them is once at domain registration time, not in real time in
>>> performance critical code millions of times a day at access time.
>> 
>> Fully disagree. That restricts TLDs to never changing their policies.
> 
> Surely the opposite? If a TLD enforces a policy at registration time, it
> can change that policy and start accepting registrations under the new
> one, without consulting anyone.

Have you forgotten that TLDs are registered with the root? Or are you making an exception for them in the
"once at domain registration time" rule above?

>> Noted. So, if other browser vendors adopt a different approach, are
>> you saying Mozilla won't? I thought the purpose of this thread was to
>> revisit the question of what would be best for the users.
> 
> I'm not saying that. But no other browser vendor has adopted an approach
> which requires extra network requests for each new site visited, and
> periodically thereafter. And I don't expect them to.

Noted. You also expected them to adopt the Mozilla IDN policy, but that didn't happen either. We all
surprise each other, often to good effect.

--Paul Hoffman
Gervase Markham | 13 Dec 11:26 2011

Re: Browser IDN display policy: opinions sought

On 12/12/11 18:18, Paul Hoffman wrote:
>> (If they were tightened, existing domains might fail the new
>> policy, which would result in them being blocked. This is what I
>> mean by the grandfathering problem.)
> 
> Published policies can be tightened. No one so far has talked about
> *blocking* domains: we have been talking about when to display names
> in one form or another. Note that "loosening" might be the same as
> "tightening": it can change the way a domain appears in the location
> bar.

OK, yes. I was typing faster than thinking. For "blocking", read "start
displaying the name as Punycode".

I guess what I'm saying is that with this browser-enforced mechanism,
someone can register a name, verify that it renders fine in all
browsers, start using it, build a business, and then a change of
registry policy leads to their name starting to appear as gobbledegook
everywhere simultaneously.

That doesn't sound awesome.

>> Except if it's the first time the user has visited the site, or the
>> user has cleared their browsing history, or is using private
>> browsing, or their cache has been cleared due to memory pressure on
>> their mobile device, or...
> 
> When a user visits www.example.com and the browser only knows the
> policy for .com, the browser might look up the policy for
> example.com. Or, it might not: a browser that cares about display
> speed might have a list of single-script terminal labels that don't
> need looking up, such as "www".

http://www.com?

This part of the idea sounds like it would need something like, er, the
Public Suffix List to make sure it worked correctly. ;-)

> You can code your browser however you want, of course. To me, a sane
> browser would not clear the Punycode display history when the browser
> clears its browsing history or goes into private browsing.

A user who clears their history wants, or is sufficiently likely to
want, the browser to entirely forget that he has visited the sites he
has visited - in such a way that someone examining his computer cannot
tell he has been there. Therefore, any browser record of domain names
(whether it's HSTS pin information, domain name letter display policy,
or anything like that) also has to be cleared.

>> Historically, the idea that a browser will burden the Internet and
>> its own speed characteristics with millions or billions of
>> additional requests per day, (and making them blocking - so don't
>> render the site until it returns) for a purpose such as this has
>> been met with incredulity and barely-suppressed laughter from our
>> performance and networking teams.
> 
> That's fine: this isn't such a proposal. I'm not sure why you are
> treating it as one.

Are you saying that these lookups would be non-blocking?

Or are you saying that implementing it in a browser used by 450 million
people wouldn't lead to billions of additional requests per day?

>>> Fully disagree. That restricts TLDs to never changing their
>>> policies.
>> 
>> Surely the opposite? If a TLD enforces a policy at registration
>> time, it can change that policy and start accepting registrations
>> under the new one, without consulting anyone.
> 
> Have you forgotten that TLDs are registered with the root? Or are you
> making an exception for them in the "once at domain registration
> time" rule above?

I'm sorry, I've failed to follow this subthread of the discussion. Can
you restate your point for me?

Gerv
Paul Hoffman | 13 Dec 16:37 2011

Re: Browser IDN display policy: opinions sought

On Dec 13, 2011, at 2:26 AM, Gervase Markham wrote:

> OK, yes. I was typing faster than thinking. For "blocking", read "start
> displaying the name as Punycode".

Whew. Good.

> I guess what I'm saying is that with this browser-enforced mechanism,
> someone can register a name, verify that it renders fine in all
> browsers, start using it, build a business, and then a change of
> registry policy leads to their name starting to appear as gobbledegook
> everywhere simultaneously.
> 
> That doesn't sound awesome.

A zone *always* has the right to change information for sub-domains in the zone; many of those changes will
affect the story you gave above. The policy for name display is one of the least-offensive of these changes.

As others have said, if you want permanent perfection, don't look for it in the DNS.

>>> Except if it's the first time the user has visited the site, or the
>>> user has cleared their browsing history, or is using private
>>> browsing, or their cache has been cleared due to memory pressure on
>>> their mobile device, or...
>> 
>> When a user visits www.example.com and the browser only knows the
>> policy for .com, the browser might look up the policy for
>> example.com. Or, it might not: a browser that cares about display
>> speed might have a list of single-script terminal labels that don't
>> need looking up, such as "www".
> 
> http://www.com?

Correct. If you haven't cached .com's name display policy when you get that, go ahead and display it.

> This part of the idea sounds like it would need something like, er, the
> Public Suffix List to make sure it worked correctly. ;-)

It sounds like you are fishing for reasons to support it; have a party with that.

>> You can code your browser however you want, of course. To me, a sane
>> browser would not clear the Punycode display history when the browser
>> clears its browsing history or goes into private browsing.
> 
> A user who clears their history wants, or is sufficiently likely to
> want, the browser to entirely forget that he has visited the sites he
> has visited - in such a way that someone examining his computer cannot
> tell he has been there. Therefore, any browser record of domain names
> (whether it's HSTS pin information, domain name letter display policy,
> or anything like that) also has to be cleared.

You are possibly mixing up levels again. If a user goes to www.nastypr0nsite.com and hides that by clearing
everything in his browser, that action does not clear the DNS cache at the same time.

>>> Historically, the idea that a browser will burden the Internet and
>>> its own speed characteristics with millions or billions of
>>> additional requests per day, (and making them blocking - so don't
>>> render the site until it returns) for a purpose such as this has
>>> been met with incredulity and barely-suppressed laughter from our
>>> performance and networking teams.
>> 
>> That's fine: this isn't such a proposal. I'm not sure why you are
>> treating it as one.
> 
> Are you saying that these lookups would be non-blocking?
> 
> Or are you saying that implementing it in a browser used by 450 million
> people wouldn't lead to billions of additional requests per day?

The latter. Repeating myself: the first time a user goes to www.somenewsite.com, if the policy for .com is
already in the user's DNS cache, there is no additional lookup.
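
A minimal sketch of that cache-first behaviour, in Python. The function name `zones_to_query`, the set of cached zones, and the `WELL_KNOWN_LEAVES` list are all hypothetical; nothing here comes from an actual browser or from any specified DNS record type. It only illustrates that a policy lookup is needed for a zone whose policy is not already cached, and that a well-known leftmost label such as "www" can be skipped.

```python
from typing import AbstractSet, List

# Hypothetical list of single-script leftmost labels that need no lookup.
WELL_KNOWN_LEAVES = frozenset({"www"})


def zones_to_query(hostname: str, cached_zones: AbstractSet[str]) -> List[str]:
    """Return the zones whose display policy would still have to be fetched.

    Assumption: the policy published by a zone governs the labels
    registered directly beneath it.
    """
    labels = hostname.rstrip(".").lower().split(".")
    needed = []
    for i, label in enumerate(labels):
        parent = ".".join(labels[i + 1:])  # zone that delegated this label
        if not parent:
            continue  # the TLD label itself; root policy is out of scope here
        if i == 0 and label in WELL_KNOWN_LEAVES:
            continue  # e.g. "www": no lookup for the enclosing zone
        if parent not in cached_zones:
            needed.append(parent)
    return needed
```

With the policy for "com" cached, www.example.com yields an empty list, while shop.example.com would still need the policy for example.com.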

>>>> Fully disagree. That restricts TLDs to never changing their
>>>> policies.
>>> 
>>> Surely the opposite? If a TLD enforces a policy at registration
>>> time, it can change that policy and start accepting registrations
>>> under the new one, without consulting anyone.
>> 
>> Have you forgotten that TLDs are registered with the root? Or are you
>> making an exception for them in the "once at domain registration
>> time" rule above?
> 
> I'm sorry, I've failed to follow this subthread of the discussion. Can
> you restate your point for me?

One level up, you said "If there are going to be rules, by far the best place to enforce them is once at domain
registration time, not in real time in performance critical code millions of times a day at access time". I
disagreed because TLDs are registered in the root, and I do not want ICANN enforcing a policy on TLDs that
the TLDs cannot change over time.

--Paul Hoffman
Gervase Markham | 14 Dec 11:55 2011

Re: Browser IDN display policy: opinions sought

On 13/12/11 15:37, Paul Hoffman wrote:
> You are possibly mixing up levels again. If a user goes to
> www.nastypr0nsite.com and hides that by clearing everything in his
> browser, that action does not clear the DNS cache at the same time.

In Firefox, it clears everything under our control - and, in fact, we
have had additional APIs added to the plugin interface used by e.g.
Flash so we can clear stuff not under our direct control as well, such
as 'Flash cookies'. If Firefox retains any sort of record that you've
visited a particular site after you have cleared all data, that's a bug.

> One level up, you said "If there are going to be rules, by far the
> best place to enforce them is once at domain registration time, not
> in real time in performance critical code millions of times a day at
> access time". I disagreed because TLDs are registered in the root,
> and I do not want ICANN enforcing a policy on TLDs that the TLDs
> cannot change over time.

I can't parse the last sentence. Are you saying:

a) "I do not want ICANN enforcing a policy on TLDs such that the TLDs
cannot change the policy over time."

or

b) "I do not want ICANN enforcing a policy on TLDs such that the nature
of the policy in regards to what TLDs can and cannot exist, cannot
change over time."

or something else? Surely, with respect to b), ICANN does this, but has
no problems today changing its policy about what is and isn't allowed?

Gerv
Paul Hoffman | 14 Dec 18:32 2011

Re: Browser IDN display policy: opinions sought

On Dec 14, 2011, at 2:55 AM, Gervase Markham wrote:

> On 13/12/11 15:37, Paul Hoffman wrote:
>> You are possibly mixing up levels again. If a user goes to
>> www.nastypr0nsite.com and hides that by clearing everything in his
>> browser, that action does not clear the DNS cache at the same time.
> 
> In Firefox, it clears everything under our control - and, in fact, we
> have had additional APIs added to the plugin interface used by e.g.
> Flash so we can clear stuff not under our direct control as well, such
> as 'Flash cookies'. If Firefox retains any sort of record that you've
> visited a particular site after you have cleared all data, that's a bug.

As far as I have seen in my tests with Firefox, the OS's DNS cache is not one of the things that falls under
"everything in our control". So, if I go to www.nastypr0nsite.com in Firefox private browsing and then
quit from Firefox, and someone grabs my computer right then, they can see that an application wanted the
DNS information for that site. The fact that the application also wanted the IDN display policy doesn't
seem any more damning than the application wanting the A/AAAA record.

>> One level up, you said "If there are going to be rules, by far the
>> best place to enforce them is once at domain registration time, not
>> in real time in performance critical code millions of times a day at
>> access time". I disagreed because TLDs are registered in the root,
>> and I do not want ICANN enforcing a policy on TLDs that the TLDs
>> cannot change over time.
> 
> I can't parse the last sentence. Are you saying:
> 
> a) "I do not want ICANN enforcing a policy on TLDs such that the TLDs
> cannot change the policy over time."
> 
> or
> 
> b) "I do not want ICANN enforcing a policy on TLDs such that the nature
> of the policy in regards to what TLDs can and cannot exist, cannot
> change over time."
> 
> or something else? Surely, with respect to b), ICANN does this, but has
> no problems today changing its policy about what is and isn't allowed?

I meant (a). I want the same for all zones, regardless of when they are registered. I agree that this takes
more work on the part of browsers, and will cause more traffic on the Internet: it is worth it to have IDNs
work better than they do today where some browsers block display for reasons unfathomable to users.

--Paul Hoffman
John C Klensin | 12 Dec 20:30 2011

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 09:24 -0800 Paul Hoffman
<phoffman <at> imc.org> wrote:

>...
>> If there are going to be rules, by far the best place to
>> enforce them is once at domain registration time, not in real
>> time in performance critical code millions of times a day at
>> access time.
> 
> Fully disagree. That restricts TLDs to never changing their
> policies. A browser vendor might want this convenience, but
> there are plenty of people who would like the browser vendors
> to be more responsive to changes than that so that IDNs can be
> more useful.

Paul, perhaps Gerv should have stated that rule-enforcement
provision differently (and maybe he should have said "by the
registration and delegation process" rather than at a specific
time), but I disagree with your inference:

-- A domain applicant who doesn't meet requirements at the time
of application should certainly be able to reapply if the
requirements change.

-- A domain applicant who meets requirements at the time of
registration and whose domain is delegated, still has to renew
the registration and, especially given appropriate contract
provisions could be subjected to newer rules at renewal time.
In the case of rules modified to deal with problems, really
egregious, problem-causing, variations from those new rules
could result in domain cancellation.  Note that, especially in
the last year, we've seen an increasing number of domain
cancellations at the demand of various governments.  That makes
me very nervous, but it is happening and, if the relevant
registry is within the jurisdiction of some body with
cancellation-demanding authority, it isn't likely that it will
change (even if the conditions under which cancellation can be
requested in some countries are significantly tightened).  From
one point of view, those external interventions are the
consequence of industry (read "ICANN, registries, registrars,
and the domaineers") unwillingness or inability to self-police
(see Eric Brunner-Williams's recent note, which is probably a
better description of the problem than mine).

Whether cancelling registrations or waiting for renewal, changes
involve some time lag but that is true of almost everything else
in this space including both Gerv's list and various "embed the
lists in the DNS" ideas.

best,
   john
Andrew Sullivan | 13 Dec 00:50 2011

Re: Browser IDN display policy: opinions sought

On Mon, Dec 12, 2011 at 04:54:27PM +0000, Gervase Markham wrote:
> If I am to do such a check (and presumably to fail if the domain doesn't
> meet it), what about when a policy changes to be more strict? How do you
> deal with grandfathering?

Yes, we're going to have a problem with this.  But note that nothing
says that this policy needs be the one you actually register with;
it's just the way that you state, "I permit these things together."
But this is admittedly sort of hand-wavy right now.  It is entirely
possible that this is a fatal problem; but you already have that fatal
problem today, so I don't see how this is any worse than a problem you
have now.

> What about performance? I would need to look up the rules for the zone
> "foo.com" every time I accessed bar.foo.com, for lots of values of foo.
> This doesn't sound like it would improve performance.

I was sort of imagining that these policies (1) would be cacheable,
so that you wouldn't actually need to do things in real time all the
time, and (2) would fail soft, so that you fall back to A-label form
until you've managed to fetch the relevant policy (at which time you
can check the label against the policy and update the display as
necessary).
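
A minimal sketch of the cacheable, fail-soft idea, in Python. The `PolicyCache` class, the `display_label` function, and the notion of a policy as a simple predicate over a label are assumptions made for illustration; no policy format or fetch mechanism has been defined. The A-label is produced with Python's built-in "idna" codec.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical policy: a predicate saying whether a zone's stated rules
# allow a given Unicode label to be displayed as-is.
Policy = Callable[[str], bool]


class PolicyCache:
    """In-memory cache of per-zone display policies (illustrative only)."""

    def __init__(self) -> None:
        self._policies: Dict[str, Policy] = {}
        self.pending: Set[str] = set()  # zones still waiting for a fetch

    def lookup(self, zone: str) -> Optional[Policy]:
        policy = self._policies.get(zone)
        if policy is None:
            # Cache miss: remember the zone for a background fetch
            # instead of blocking the display.
            self.pending.add(zone)
        return policy

    def store(self, zone: str, policy: Policy) -> None:
        self._policies[zone] = policy
        self.pending.discard(zone)


def display_label(u_label: str, zone: str, cache: PolicyCache) -> str:
    """Show the U-label only if a cached policy permits it."""
    a_label = u_label.encode("idna").decode("ascii")
    policy = cache.lookup(zone)
    if policy is not None and policy(u_label):
        return u_label
    return a_label  # fail soft: no policy yet, or the label violates it
```

Until `store` has been called for a zone, every label under it is shown in A-label form; once the policy arrives, the display can be re-evaluated.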

> If there are going to be rules, by far the best place to enforce them is
> once at domain registration time, not in real time in performance
> critical code millions of times a day at access time.

Right.  But you're talking about different kinds of rules: (1) how do
I display this? and (2) what is permitted for registration?  You want
(1) to be linked to (2) somehow, and I agree.  But I cannot see how
either shipping static lists around or else relying on
language-guessing of intended domains actually addresses the user
problems we're attempting to talk about.

>     It is the sole survivor of a large number of alternative proposals
> that were considered and rejected. Unlike most of the other rejected
> proposals, it does not need any modifications to the DNS protocol, or
> distribution of "language" codes for labels, nor does it require
> multiple DNS lookups, large character tables in the browser, or
> real-time access to WHOIS information.

The only reason the latter two of these are true is because the root
zone is small.  If it grows to several thousands of labels a
significant number of which are IDNs, the last two advantages turn out
to be fatal flaws, because there's no practical way to make the
decision that you need to make on heuristic grounds.  I'm not trying
to dismiss those factors; I think those are indeed advantages to the
existing solution.  But as you see in this thread, there are
disadvantages that also pile up; and I think that pile gets bigger as
the root zone expands.

Best,

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
"Martin J. Dürst" | 13 Dec 05:52 2011

Re: Browser IDN display policy: opinions sought

On 2011/12/13 8:50, Andrew Sullivan wrote:
> On Mon, Dec 12, 2011 at 04:54:27PM +0000, Gervase Markham wrote:

>>      It is the sole survivor of a large number of alternative proposals
>> that were considered and rejected. Unlike most of the other rejected
>> proposals, it does not need any modifications to the DNS protocol, or
>> distribution of "language" codes for labels, nor does it require
>> multiple DNS lookups, large character tables in the browser, or
>> real-time access to WHOIS information.
>
> The only reason the latter two of these are true is because the root
> zone is small.  If it grows to several thousands of labels a
> significant number of which are IDNs, the last two advantages turn out
> to be fatal flaws, because there's no practical way to make the
> decision that you need to make on heuristic grounds.  I'm not trying
> to dismiss those factors; I think those are indeed advantages to the
> existing solution.  But as you see in this thread, there are
> disadvantages that also pile up; and I think that pile gets bigger as
> the root zone expands.

Even without significant growth in the root zone, "large character 
tables in the browser" is actually very relative. 
http://www.unicode.org/Public/UNIDATA/Scripts.txt is about 120kB, but 
most of it is spaces and comments, and it separates out characters by 
character class. Removing character class and taking into account gaps 
and stuff that's not allowed in IDNs anyway, the table can be 
*significantly* compacted.

Regards,    Martin.
Mark Davis ☕ | 13 Dec 06:15 2011

Re: Browser IDN display policy: opinions sought

FYI, a simple binary data structure that contains all the script info is 2,156 bytes. The extended script info would add 385 bytes to that.
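
As a rough illustration of why script data can be stored compactly, here is a sketch in Python of a sorted range table searched with bisection. This is not the binary structure mentioned above, and the ranges are a hand-picked approximation (mostly whole blocks), not the real Scripts.txt data; the names `RANGES` and `script_of` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Illustrative ranges only: an approximation, not the Scripts.txt contents.
RANGES: List[Tuple[int, int, str]] = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x024F, "Latin"),     # approximate; includes a few symbols
    (0x0370, 0x03FF, "Greek"),     # approximate
    (0x0400, 0x04FF, "Cyrillic"),  # approximate
    (0x0600, 0x06FF, "Arabic"),    # approximate
]
STARTS = [start for start, _, _ in RANGES]


def script_of(ch: str) -> Optional[str]:
    """Return the script of one character, or None if it is not in the table."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```

Each entry needs only two code points and a small script identifier, so a table of a few hundred ranges fits in a couple of kilobytes.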

Mark
— Il meglio è l’inimico del bene —


On Mon, Dec 12, 2011 at 20:52, "Martin J. Dürst" <duerst <at> it.aoyama.ac.jp> wrote:
> On 2011/12/13 8:50, Andrew Sullivan wrote:
>> On Mon, Dec 12, 2011 at 04:54:27PM +0000, Gervase Markham wrote:
>>>
>>>     It is the sole survivor of a large number of alternative proposals
>>> that were considered and rejected. Unlike most of the other rejected
>>> proposals, it does not need any modifications to the DNS protocol, or
>>> distribution of "language" codes for labels, nor does it require
>>> multiple DNS lookups, large character tables in the browser, or
>>> real-time access to WHOIS information.
>>
>> The only reason the latter two of these are true is because the root
>> zone is small.  If it grows to several thousands of labels a
>> significant number of which are IDNs, the last two advantages turn out
>> to be fatal flaws, because there's no practical way to make the
>> decision that you need to make on heuristic grounds.  I'm not trying
>> to dismiss those factors; I think those are indeed advantages to the
>> existing solution.  But as you see in this thread, there are
>> disadvantages that also pile up; and I think that pile gets bigger as
>> the root zone expands.
>
> Even without significant growth in the root zone, "large character
> tables in the browser" is actually very relative.
> http://www.unicode.org/Public/UNIDATA/Scripts.txt is about 120kB, but
> most of it is spaces and comments, and it separates out characters by
> character class. Removing character class and taking into account gaps
> and stuff that's not allowed in IDNs anyway, the table can be
> *significantly* compacted.
>
> Regards,    Martin.

_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update

John C Klensin | 12 Dec 18:16 2011

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 10:23 -0500 Andrew Sullivan
<ajs <at> anvilwalrusden.com> wrote:

>> And ponies! ;-)

> Well, it's bound to sound that way if you don't take seriously
> the idea that there might be a way to figure these things out.
> 
> Suppose that zone operators (not just the root or TLDs, but
> any random zone you liked) had a mechanism by which you could
> look up their policies for, say, code point inclusion.  That
> is, I'm RegyCo, and I run .example.  I put an SRV or URI or
> something record at .example that points you to a policy
> document that tells you what code point ranges are permitted
> together in a single label in my zone, and also (for that
> matter) what code points I will register _at all_.  Now you
> are in a position to decide whether you think my policy is
> sensible; and you are also in a position to decide whether any
> given label actually meets my own stated policies.  Finally,
> since this forms the basis for a filter in your software, you
> have the ability to set a default for your users that makes
> sense, but also a way for people who want it to get the
> benefits of the most permissive settings available under
> approach A.  Finally, it wouldn't involve a massive scaling
> problem facing the whitelist in the case the root zone
> increases dramatically in size, since most of the work (all?)
> could be automated.

Andrew, sure, but...   This comes back to the assumptions that: 

	-- all registries are good guys and enforce whatever
	rules they make.
	
	-- all registrars are good guys, with neither motivation
	nor will for getting around the rules.
	
	-- if either of the above fail, there is someone with
	both the authority and willingness to require that the
	rules be enforced and to enforce that requirement (or to
	enforce the rules itself, but that is even more
	farfetched).

Now, unless one believes in miracle turnarounds from history,
all of the above assumptions are demonstrably and massively
false.  If they were only occasionally false, Gerv would still
need to decide whether his obligation to protect users required
some additional measures.  But, despite believing strongly that
ICANN should be held responsible for stepping up to the role in
this that I read into their charter and bylaws, I think spending
energy on a policy that requires believing all three of the
above today should get you, not just a pony, but an opportunity
for a discount price on a bridge I understand is for sale.

    john
Eric Brunner-Williams | 12 Dec 19:18 2011

Re: Browser IDN display policy: opinions sought

On 12/12/11 12:16 PM, John C Klensin wrote:
> 	-- all registries are good guys and enforce whatever
> 	rules they make.

The incumbent monopoly operator and the larger two of the marginally
viable 2000 round gTLD operators are on record opposed to registry
liability for "willful blindness" to systematic misconduct by
registrars and their resellers.

> 	-- all registrars are good guys, with neither motivation
> 	nor will for getting around the rules.

Some 600 of the 900 or so entities ICANN has accredited as
"registrars" exist for the purpose of providing (race) access to the
"drop pool", and a significant number of the remaining 300 registrars
hold substantial self-owned portfolios of domains created prior to the
change of rule concerning "domain tasting".

To attempt a slightly less implied universal cynicism than John, where
the universe of "registries" contains .cat, .coop, .museum, and
perhaps twice that number of registries, none of which is price
capped, and the universe of "registrars" is similarly limited to those
that have not pursued "tasting" or the "secondary market" and which
are not committed to acquisition of a registry agreement and the
monetization strategy of unrestricted registration (and this is not a
null set), Andrew's assumption is not inherently doomed to fail.

So passing on to the third assumption:

> 	-- if either of the above fail, there is someone with
> 	both the authority and willingness to require that the
> 	rules be enforced and to enforce that requirement (or to
> 	enforce the rules itself, but that is even more
> 	farfetched).

At the Rome meeting ICANN took heat for Sitefinder, and after more
time than necessary, issued a statement on the harm of synthetic return.

The next major demonstration of sanity was pulling the plug on the
systemic exploit of the Add Grace Period, ending domain tasting. It
did not, however, act upon registrars which engaged in tasting, other
than de-accrediting those (low-tech and/or low-clue) unable to cease
tasting, and unable to pay the new fee to taste above the permitted
thresholds.

Those are the major instances of sanity-as-enforced-policy I recall.
For the past two years or more ICANN has had the opportunity to spend
more on contractual compliance. I observed at the Cartagena meeting
that the recent head of compliance hire was low-clue on well-known
forms of abuse. I don't think this has greatly improved, given my
reading of the weaknesses of the current CEO and of the transition
effect that has been present for the past six months or more and will
continue for the next several months.

So I share John's cynicism w.r.t. registry contract enforcement, and
registrar contract enforcement.

There will be safe and sane namespaces, and there will be namespaces
whose operators will maximize revenues, creating externalities to be
borne by others.

My two beads worth,
Eric
Andrew Sullivan | 13 Dec 00:59 2011

Re: Browser IDN display policy: opinions sought

On Mon, Dec 12, 2011 at 12:16:01PM -0500, John C Klensin wrote:
> Andrew, sure, but...   This comes back to the assumptions that: 
> 
> 	-- all registries are good guys and enforce whatever
> 	rules they make.

No, because you can check those rules yourself in your resolution
context: look at what you are looking up and compare it to the rules
to see whether it conforms.  Indeed, if that's not good enough, you
have this problem anyway.

> 	
> 	-- all registrars are good guys, with neither motivation
> 	nor will for getting around the rules.
> 	
> 	-- if either of the above fail, there is someone with
> 	both the authority and willingness to require that the
> 	rules be enforced and to enforce that requirement (or to
> 	enforce the rules itself, but that is even more
> 	farfetched).
> 
> Now, unless one believes in miracle turnarounds from history,
> all of the above assumptions are demonstrably and massively
> false.  If they were only occasionally false, Gerv would still
> need to decide whether his obligation to protect users required
> some additional measures.  But, despite believing strongly that
> ICANN should be held responsible for stepping up to the role in
> this that I read into their charter and bylaws, I think spending
> energy on a policy that requires believing all three of the
> above today should get you, not just a pony, but an opportunity
> on a discount price on a bridge I understand is for sale.
> 
>     john
> 
> 
> 
> 
> 
> 

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Andrew Sullivan | 13 Dec 03:10 2011

Re: Browser IDN display policy: opinions sought

Sorry, I managed to send this before I intended to.  The rest is
below.

On Mon, Dec 12, 2011 at 06:59:37PM -0500, Andrew Sullivan wrote:
> On Mon, Dec 12, 2011 at 12:16:01PM -0500, John C Klensin wrote:
> > Andrew, sure, but...   This comes back to the assumptions that: 
> > 
> > 	-- all registries are good guys and enforce whatever
> > 	rules they make.
> 
> No, because you can check those rules yourself in your resolution
> context: look at what you are looking up and compare it to the rules
> to see whether it conforms.  Indeed, if that's not good enough, you
> have this problem anyway.
> 
> > 	
> > 	-- all registrars are good guys, with neither motivation
> > 	nor will for getting around the rules.

This is a problem we already have, for _any_ of these rules.  What's
special about the current approach that solves that?

> > 	-- if either of the above fail, there is someone with
> > 	both the authority and willingness to require that the
> > 	rules be enforced and to enforce that requirement (or to
> > 	enforce the rules itself, but that is even more
> > 	farfetched).

We also already don't have this.  On the contrary, what we have right
now is a case where rules are inconsistent among registries, there is
no way at all to find out the rules in zones not near the root, those
near-root zones are treated according to at least three different
display conventions, and one of those conventions entails using a
_different_ set of more or less arbitrary rules established under
conventions also not strictly rooted in the behaviour of anyone
operating the zones.  How is this better?

If the goal is, "Protect people from bad actors," my suggestion is,
"Don't use the DNS.  It's a worse match for that task than the
hundreds of others people seem to want to throw into it."  But if the
goal is to know whether there is something resembling a policy that
allows you to make slightly-informed guesses about whether it is sane
to treat U-labels in a zone as U-labels, I'm suggesting that we can do
better than either "SWAG about the language this label is supposed to
be in" or "I know who the bad guys are, trust me."

Best,

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
John C Klensin | 13 Dec 11:34 2011

Re: Browser IDN display policy: opinions sought


--On Monday, December 12, 2011 21:10 -0500 Andrew Sullivan
<ajs <at> anvilwalrusden.com> wrote:

> Sorry, I managed to send this before I intended to.  The rest
> is below.
> 
> On Mon, Dec 12, 2011 at 06:59:37PM -0500, Andrew Sullivan
> wrote:
>...
>> No, because you can check those rules yourself in your
>> resolution context: look at what you are looking up and
>> compare it to the rules to see whether it conforms.  Indeed,
>> if that's not good enough, you have this problem anyway.

But you cannot check if the rules involve similarity with what
is present in the zone.  Certainly you can check for
mixed-script labels but (ignoring the complexities of the
various exception cases), that is a useless check to make
against a determined attacker because the recommendation to make
such checks is just too widely known and understood (yes, it
would give some protection against really dumb attackers,
but...).  To do a more sophisticated check, you'd need to be
able to ask the DNS server to return all of the labels that
might be confused with the one you are thinking about looking
up.  That would be a useful function for many purposes,
especially if one could ignore the issues associated with
perception and doing it algorithmically.   But it requires two
things of DNS servers (presumably all of them): that they be
able to convert IDNs back to Unicode code points so they could
do similarity searches on them and that they be able to do
similarity (fuzzy and/or distance measure) searches on those
converted strings.  Actually a third: the ability to return a
rather long list of labels, something that is not supported by
any existing DNS query other than zone transfers.  Or, I
suppose, one could ignore the security issues and just do the
zone transfer oneself and then perform the conversions,
searching, and matching locally.  Ignoring the questions of
which DNS this could be implemented in and how long it would
take to deploy, I can imagine Gerv's implementers and
performance folks who won't tolerate changes with far lower
performance consequences having ROFL responses or worse.
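
A minimal sketch, in Python, of the kind of mixed-script check referred to above, and of its limits. As a crude stand-in for real script data, it takes the first word of each letter's Unicode character name ("LATIN", "CYRILLIC", ...); that shortcut and the function names are assumptions made for illustration, not how any browser is known to do it.

```python
import unicodedata
from typing import Set


def scripts_in(label: str) -> Set[str]:
    """Collect a crude script tag for each letter in the label.

    Assumption: the first word of a character's Unicode name is used
    as a proxy for its script.
    """
    found = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1
```

A label that mixes Latin letters with a Cyrillic "а" is flagged, but a label written entirely in Cyrillic letters that resemble Latin ones passes, which is why the check alone does little against a determined attacker.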

>> > 	-- all registrars are good guys, with neither motivation
>> > 	nor will for getting around the rules.
> 
> This is a problem we already have, for _any_ of these rules.
> What's special about the current approach that solves that?

At some level, that is exactly my point.  Whether he gets the
list right or not, Gerv's "Type B" is based on a whitelist of
well-behaved registries, where well-behaved includes (even if
indirectly) assuring the good behavior of registrars.  In a
world in which stating policies and ignoring them is ok (because
there are no effective sanctions) and, as Tina indirectly points
out, having guidelines that are ignored by some registries is ok
(again, no consequences), then Gerv needs to run his registry
evaluation system anyway.  It makes it a little easier for him
to find the rules that make up part of the evaluation, but the
"do they actually follow those rules" part is unchanged.

Ignoring the performance issues, etc., there is another problem
with saying "let's put a pointer to the rules in the DNS".  If
those rules are going to be machine-processed, there must be an
agreed-upon format.  The diversity of the possible types of
rules and some experience with similar format discussions might
make the time needed to develop and agree upon the required
format, keywords, operators, etc., compare unfavorably with the
design, development, and deployment time for DNSng.  And that
assumes the IETF did the work; if the obvious other organization
tried it, I'd assume it would take that long before the
committees stopped delegating work to other committees and
actually sat down to do something.

>> > 	-- if either of the above fail, there is someone with
>> > 	both the authority and willingness to require that the
>> > 	rules be enforced and to enforce that requirement (or to
>> > 	enforce the rules itself, but that is even more
>> > 	farfetched).

> We also already don't have this.  On the contrary, what we
> have right now is a case where rules are inconsistent among
> registries, there is no way at all to find out the rules in
> zones not near the root, those near-root zones are treated
> according to at least three different display conventions, and
> one of those conventions entails using a _different_ set of
> more or less arbitrary rules established under conventions
> also not strictly rooted in the behaviour of anyone operating
> the zones.  How is this better?

And you left out "many of the rules that do exist are just
ignored in practice".  All I was suggesting was that your
proposal wouldn't help much.  Pointers to rules that are ignored
help no one.  Pointers to rules that cannot be parsed and
accurately understood don't help with lookup-time processing,
even if there were no performance issues.

Unless this situation is rationalized sufficiently by some
entity that has the authority to enforce it on some domains and
create, by example, a model that inspires (or creates pressure
on) others, then either: 

	Gerv (and many others) are wrong and confusable names
	will never amount to much as an attack vector except
	against the dumbest of users ... or ...
	
	IDNs don't have much future outside isolated,
	single-language communities because these
	blunt-instrument tools will exclude too much and/or
	enough people will be victimized to create a general
	sense of fear.

> If the goal is, "Protect people from bad actors," my
> suggestion is, "Don't use the DNS.  It's a worse match for
> that task than the hundreds of others people seem to want to
> throw into it." 

Strongly agree.  But you know that already.

> But if the goal is to know whether there is
> something resembling a policy that allows you to make
> slightly-informed guesses about whether it is sane to treat
> U-labels in a zone as U-labels, I'm suggesting that we can do
> better than either "SWAG about the language this label is
> supposed to be in" or "I know who the bad guys are, trust me."

And I'm suggesting that any system that makes the rules easier
to find will ultimately come down to your second choice above
unless some entity starts enforcing (at least) conformance to
declared rules and preferably a minimum set of rules as well.

best,
   john
Andrew Sullivan | 13 Dec 16:49 2011

Re: Browser IDN display policy: opinions sought

On Tue, Dec 13, 2011 at 05:34:40AM -0500, John C Klensin wrote:

> but...).  To do a more sophisticated check, you'd need to be
> able to ask the DNS server to return all of the labels that
> might be confused with the one you are thinking about looking
> up.

Aha.  You want some kind of assurance that, if you are looking up the
label, you can rely on the party who told you what the policy is to
enforce that policy.

How is this different from the state of affairs that obtains now?  If
Afilias did something bonehead in .info tomorrow, I have little
confidence that Opera and Mozilla would detect it right away -- how
would they even know to look?

I claim that, if "be sure nobody is lying about what they are doing"
is the criterion for success, this effort is doomed.  That's like
wishing for a protocol that will prove the guys with the shell games
in your favourite tourist trap are never going to cheat.  Or, to beat
up on the usual metaphor, it's an invisible flying pony.  With sparkles.

> Ignoring the performance issues, etc., there is another problem
> with saying "lets put a pointer to the rules in the DNS".  If
> those rules are going to be machine-processed, there must be an
> agreed-upon format.

Yes, this is a problem.  OTOH, as we see in this thread, the existing
answers are all broken.  Perhaps the REPUTE WG offers us a chance at
a way to evaluate these things over time?

> Unless this situation is rationalized sufficiently by some
> entity that has the authority to enforce it on some domains and
> create, by example, a model that inspires (or creates pressure
> on) others

This sounds like a desire for a universal co-ordinator of the DNS.
The exact point of the protocol was to get rid of that choke point, so
I don't think we're going to re-invent it.

> > supposed to be in" or "I know who the bad guys are, trust me."
> 
> And I'm suggesting that any system that makes the rules easier
> to find will ultimately come down to your second choice above
> unless some entity starts enforcing (at least) conformance to
> declared rules and preferably a minimum set of rules as well.

In the area of spam control, despite all the nasty side effects,
consulting several different abuse lists (which we might like to think
of as "reputation services") gets you more information to base your
decisions on.  I don't see why a similar approach might not work for
IDN display, _provided that_ zones have a way of stating what it is
they're trying to do.  Such a mechanism (and calling this hand-wavy
sketch of an idea a "proposal" is giving it too much credit) would be
extremely imperfect and it would mean that new names always started at
a disadvantage.  But it would at least give us something to build on.
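
To make the hand-waving slightly more concrete, here is a minimal
sketch of the "consult several lists" idea.  Everything in it is an
assumption for illustration: the service functions, the zone names, and
the majority-vote threshold are hypothetical, and nothing like this is
deployed as far as I know.

```python
from typing import Callable, Iterable, Optional

# A hypothetical "reputation service": given a zone name, it returns True
# (the zone appears to follow its declared policy), False (it does not),
# or None (no opinion, e.g. the zone is too new to have a history).
Service = Callable[[str], Optional[bool]]


def display_as_unicode(zone: str, services: Iterable[Service],
                       min_votes: int = 2) -> bool:
    """Show U-labels only if enough services vouch for the zone."""
    votes = [v for v in (service(zone) for service in services)
             if v is not None]
    # Too few opinions: fall back to Punycode.  This is the "new names
    # always start at a disadvantage" property.
    if len(votes) < min_votes:
        return False
    # Simple majority of the services that expressed an opinion.
    return sum(votes) * 2 > len(votes)


# Three made-up services with made-up data.
def list_a(zone: str) -> Optional[bool]:
    return {"example.": True, "shady.": False}.get(zone)


def list_b(zone: str) -> Optional[bool]:
    return {"example.": True, "shady.": False}.get(zone)


def list_c(zone: str) -> Optional[bool]:
    return {"example.": True}.get(zone)


for zone in ("example.", "shady.", "brandnew."):
    print(zone, display_as_unicode(zone, [list_a, list_b, list_c]))
```

A majority vote is only one possible way to combine the answers; a
client could just as well weight the lists or require unanimity.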

Best,

A

-- 
Andrew Sullivan
ajs <at> anvilwalrusden.com
Eric Brunner-Williams | 13 Dec 18:09 2011

Re: Browser IDN display policy: opinions sought

On 12/13/11 10:49 AM, Andrew Sullivan wrote:
> In the area of spam control, despite all the nasty side effects,
> consulting several different abuse lists (which we might like to think
> of as "reputation services") gets you more information to base your
> decisions on.  ...

Circa 2002 there was widespread filtering of one 2000-round new gTLD
due to the difference between its stated purpose and policy, and its
actual registration policy, as the latter made the namespace
accessible to unsolicited commercial emailers. The reputation effect
persisted for several years (and may still persist).

The point being that examples of autonomous mechanism behavior tending
towards apparent, even actual coherency of policy, exist for
namespaces, not just domain names, addresses, and address block
allocations.

> ... I don't see why a similar approach might not work for
> IDN display, _provided that_ zones have a way of stating what it is
> they're trying to do.  Such a mechanism (and calling this hand-wavy
> sketch of an idea a "proposal" is giving it too much credit) would be
> extremely imperfect and it would mean that new names always started at
> a disadvantage.  But it would at least give us something to build on.

The obverse would be the claim that non-global semantics can not
exist, and that state may not accumulate.

We've sort of been down this path before with gedanken experiments
about encoding discovery (the query-and-response problem).  Here,
however, the problem is display, for which out-of-protocol mechanisms
are possible.

Since we're hand-waving, waving in the general direction of REPUTE
and/or DOMAINREP may be necessary to access accumulated state, but may
be insufficient to determine a locally likely display property of some
non-ASCII label.

Two beads and change,
Eric
John Levine | 13 Dec 19:16 2011

Re: Browser IDN display policy: opinions sought

Having been reading this discussion with great interest, I don't
understand what problem is being solved.  Is it:

A) Only display names that are not deceptive?

B) Don't display names that might be deceptive?

C) Don't display names that fail to meet some policy that
doesn't really have anything to do with deception?

D) Only display names that meet some policy?

E) Something else?

It clearly can't be A, since there's plenty of room for deception in
plain ASCII, and people can put random names at the Nth level, e.g.,
FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

R's,
John
Paul Hoffman | 13 Dec 19:31 2011

Re: Browser IDN display policy: opinions sought


On Dec 13, 2011, at 10:16 AM, John Levine wrote:

> Having been reading this discussion with great interest, I don't
> understand what problem is being solved.  Is it:
> 
> A) Only display names that are not deceptive?
> 
> B) Don't display names that might be deceptive?
> 
> C) Don't display names that fail to meet some policy that
> doesn't really have anything to do with deception?
> 
> D) Only display names that meet some policy?
> 
> E) Something else?
> 
> It clearly can't be A, since there's plenty of room for deception in
> plain ASCII, and people can put random names at the Nth level, e.g.,
> FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

The stated reason for not just displaying the Unicode every time is to avoid deception. So, (B).

--Paul Hoffman
Mark Davis ☕ | 13 Dec 22:40 2011

Re: Browser IDN display policy: opinions sought

Martin,

According to all of the information I have from our security people:

IDNA spoofing is far down on the list of importance compared to other ways to spoof. Average people are more swayed by the appearance of the page they land on than on the appearance of the url in the address bar. The average person doesn't distinguish:


The warnings that really grab people's attention are where (for example) a warning screen comes up before the content appears, telling people that the content page is dangerous, and asking if they want to continue. Simply changing the appearance in the address bar is often overlooked.

That says to me that it would be much better to always show the Unicode characters (thus giving a uniform UI across browsers), but then provide a more obvious UI signal to users that the page is suspect (and for what reason). So from your example, the user should see http://www.viagénie.com and http://биатлон.рф in all of the browsers.
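
For anyone who wants to see the two forms side by side, a small sketch follows. It uses Python's built-in "idna" codec, which is an assumption of convenience: that codec implements IDNA2003 rather than IDNA2008, so it is only an illustration of the Unicode/Punycode mapping, not of what any browser actually runs.

```python
# Sketch: convert between the Unicode form and the Punycode (xn--) form
# of a domain name with Python's standard "idna" codec.
# Assumption: the stdlib codec follows IDNA2003, not IDNA2008.


def to_ascii(name: str) -> str:
    """Return the Punycode form of each label in the name."""
    return name.encode("idna").decode("ascii")


def to_unicode(name: str) -> str:
    """Return the Unicode form of a name given in Punycode."""
    return name.encode("ascii").decode("idna")


for name in ("www.viagénie.com", "биатлон.рф"):
    ascii_form = to_ascii(name)
    print(name, "->", ascii_form, "->", to_unicode(ascii_form))
```

The point of the sketch is only that the mapping is mechanical and reversible; which form to show is purely a display decision.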

The Unicode vs Punycode UI is a blunt tool anyway; a UI signal separate from that would allow for gradations in the levels of warnings given to users. Thus the following could get different levels of warnings (depending on the user's language settings)—some being of the "you can't go farther without confirmation" sort:
  • ѕех.рф (the 'sex' are all Cyrillic characters)
  • ѕех.com (the 'sex' are all Cyrillic characters)
  • ypal.com (with just one Cyrillic а).
  • &c.
Users could also get settings to turn off classes of errors, if they find that those get in the way based on their environment.

On determining which pages are suspect because of their URL: If we were in a world where we could depend on registries to police domain name labels, that would be simple for browsers and other clients. Such a Kumbaya planet bears little resemblance to our reality, though. And as far as I know, ICANN neither has the authority to require that every domain name label (at every level, such as the label 'foobar' in foobar.blogspot.co.uk) meets some particular set of requirements, nor would it be willing to certify (subject to legal damage claims?) that such is the case, even for those domain name labels that it can control.

That says to me that whatever level of signaling that is required is largely up to the browsers; depending on the registries is just wishful thinking. Given that, I think some refinements of A look promising. There are a variety of different possibilities; it would probably be useful for interested parties to brainstorm on the most effective ones in practice.
  • warn on mixed-script labels (allowing certain exceptions, essentially where there are no confusable characters between the scripts, like Latin + Hangul)
  • warn on mixed-script domain names.
  • warn on confusable characters outside of my languages
  • &c.
Mark
_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update
Simon Josefsson | 14 Dec 11:05 2011

Re: Browser IDN display policy: opinions sought

Mark Davis ☕ <mark <at> macchiato.com> writes:

> That says to me that much it would be better to always show the Unicode
> characters (thus giving a uniform UI across browsers),

+1 to that, and thanks for saying it.

I don't think it is constructive to frame a discussion like 'choose
between A, B and C but do not think about the general problem or
propose any other solution that might be better because we don't want to
hear about it'.

> but then provide a more obvious UI signal to users that the page is
> suspect (and for what reason).  So from your example, the user should
> see http://www.viagénie.com<http://www.xn--viagnie-eya.com/> and
> http://биатлон.рф <http://xn--80abvnkf0a.xn--p1ai/> in all of the
> browsers.

Exactly.

There is a market for software that protects against "dangerous"
websites.  Phishing is a technological problem that goes far beyond IDNs.
I suggest we let experts in that area handle that problem, and let us
focus on displaying IDNs to users.

As an analogy, consider if we took a similar approach to MIME
attachments.  The way some browsers implement IDNs today is similar to
letting e-mail clients display the raw MIME encoding of the entire
e-mail to the user when the client didn't have the attachment in a
whitelist.

/Simon
Gervase Markham | 14 Dec 11:59 2011

Re: Browser IDN display policy: opinions sought

On 13/12/11 18:16, John Levine wrote:
> Having been reading this discussion with great interest, I don't
> understand what problem is being solved.  Is it:
> 
> A) Only display names that are not deceptive?
> 
> B) Don't display names that might be deceptive?
> 
> C) Don't display names that fail to meet some policy that
> doesn't really have anything to do with deception?
> 
> D) Only display names that meet some policy?
> 
> E) Something else?
> 
> It clearly can't be A, since there's plenty of room for deception in
> plain ASCII, and people can put random names at the Nth level, e.g.,
> FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

Hence the highlighting of the "Public Suffix + 1" in recent versions of
Firefox and Chrome, and the blacklisting (even before IDNA2008) of
homographs of "." and "/".
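
For readers unfamiliar with "Public Suffix + 1", a minimal sketch of
the computation follows.  The suffix set here is a tiny hypothetical
stand-in, not the real publicsuffix.org data, and the sketch ignores
the wildcard and exception rules that the real list has.

```python
# Hypothetical, tiny stand-in for the Public Suffix List.
SUFFIXES = {"com", "uk", "co.uk", "tld"}


def public_suffix_plus_one(host: str) -> str:
    """Return the longest matching public suffix plus one more label."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, then shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            if i == 0:
                # The host is itself a public suffix; nothing to add.
                return candidate
            return ".".join(labels[i - 1:])
    # No known suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


for host in ("FDIC.GOV.FOO.BAR.SOMETHING.TLD",
             "foobar.blogspot.co.uk",
             "www.example.com"):
    print(host, "->", public_suffix_plus_one(host))
```

With this toy list, John's example reduces to something.tld, which is
the part a browser would highlight.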

The aim is some reasonable approximation of A, given that deceptiveness
is subjective.

(The logic "deceptiveness is subjective -> one should not attempt to do
anything about deceptiveness" is not considered reasonable.)

Gerv
"Martin J. Dürst" | 14 Dec 12:25 2011

Re: Browser IDN display policy: opinions sought

On 2011/12/14 3:16, John Levine wrote:
> Having been reading this discussion with great interest, I don't
> understand what problem is being solved.  Is it:
>
> A) Only display names that are not deceptive?
>
> B) Don't display names that might be deceptive?
>
> C) Don't display names that fail to meet some policy that
> doesn't really have anything to do with deception?
>
> D) Only display names that meet some policy?
>
> E) Something else?
>
> It clearly can't be A, since there's plenty of room for deception in
> plain ASCII, and people can put random names at the Nth level, e.g.,
> FDIC.GOV.FOO.BAR.SOMETHING.TLD.  Beyond that, I'm baffled.

The whole thing started with the paypаl.com scare (the second 'a' of 
paypal being Cyrillic). The goal of the browser makers was to come up 
with something that addressed this issue, and similar IDN-related and 
script-related potential deceptions. So the goal was:

Don't display names that are potentially deceptive because of 
similarities of letters in different scripts.

That's a pretty limited goal, and because there was quite a bit of
perceived pressure to do something, not much time, and not many
actual names out there yet that would have made people complain,
the job was overdone in many ways and not good enough in others (as
mentioned, the Mozilla approach fails for cases such as wordpress.com).

Regards,   Martin.