Adam M. Costello | 1 Mar 23:17 2009

plea for NFKC & case-folding, suggestions for definitions

I have not had time to follow the progress of this
working group, but I have now read the latest
draft-ietf-idnabis-{defs,protocol,rationale,tables}, and I have a few
high-level comments.

 1) I am very happy that an internationalized generalization of
    preferred-syntax (host name) labels is being defined, based on the
    principle of including what's needed, rather than excluding only
    what's clearly useless/harmful.  I wanted to work on this in the
    first IDN working group, instead of or in addition to the wide-open
    IDNs of IDNA2003, but the rough consensus then was that it was not
    worth the delay it would cause.

 2) I am not persuaded that IDNAbis can avoid requiring the fundamentals
    of Nameprep: NFKC and case-folding.  More on this below.

 3) I think the approach taken in the definitions section, of building
    up the smaller concepts involved in the ACE architecture, is
    better than the approach I took in RFC 3490--referring to complex
    multi-step operations ToASCII and ToUnicode as primitives.  The
    small-concepts approach allows the reader to develop some intuition.
    I have some suggestions for more concise and rigorous definitions
    following that approach (see below).

Regarding NFKC and case-folding

I think the rationale draft is trying to have it both ways.  It says
a prefix change would be required if a label that is valid in both
IDNA2003 and IDNAbis is represented by different ASCII forms in the
two protocols.  To avoid triggering that incompatibility, it defines

Harald Alvestrand | 2 Mar 18:14 2009

Re: plea for NFKC & case-folding, suggestions for definitions

Adam,

one question.....

do you think that the canonicalization function you're positing can be
described in a Unicode-version-independent way?

                       harald

Adam M. Costello | 3 Mar 07:28 2009

Re: plea for NFKC & case-folding, suggestions for definitions

Harald Alvestrand <harald <at> alvestrand.no> wrote:

> do you think that the canonicalization function you're positing can be
> described in a Unicode-version-independent way?

I think it can, but I'll need help from our resident Unicode experts to
provide the definition.

The Unicode standard promises normalization stability from
version 4.1 onward, and case-folding stability from version
5.0 onward.  Section 3.13 of version 5.0 suggests this for
compatibility-normalize-and-case-fold:

     NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) =
     NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))

I don't know why so many iterations are needed, and I don't know why the
first step is NFD rather than NFKD.

We would of course want to change the last NFKD (or both of them) to
NFKC.

After applying this function, we would then check the result for
disallowed code points and other violations (described in the existing
drafts), and either return it or return an error.
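
A minimal Python sketch of the pipeline described above, for illustration
only.  It assumes Python's built-in str.casefold() is an acceptable
stand-in for Unicode toCasefold, and that the final NFKD is replaced by
NFKC as proposed.  The names nfkc_casefold_sketch, canonicalize_label,
and toy_is_disallowed are hypothetical; the real disallowed-code-point
rules are the ones in the drafts, not the placeholder check used here.
Results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from typing import Callable


def nfkc_casefold_sketch(text: str) -> str:
    """Apply NFD, casefold, NFKD, casefold, then NFKC (not a final NFKD)."""
    step = unicodedata.normalize("NFD", text)
    step = step.casefold()
    step = unicodedata.normalize("NFKD", step)
    step = step.casefold()
    return unicodedata.normalize("NFKC", step)


def toy_is_disallowed(ch: str) -> bool:
    # Placeholder rule (an assumption, not from the drafts):
    # reject whitespace and control characters.
    return ch.isspace() or unicodedata.category(ch) == "Cc"


def canonicalize_label(label: str,
                       is_disallowed: Callable[[str], bool]) -> str:
    """Map the label, then either return it or raise an error."""
    mapped = nfkc_casefold_sketch(label)
    for ch in mapped:
        if is_disallowed(ch):
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return mapped
```

Calling canonicalize_label("Straße", toy_is_disallowed) maps the label
before checking it, so the check only ever sees the folded, normalized
form.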

We'd have to consider & specify what to do about the handful of obscure
incompatibilities between Unicode 3.2 and 5.0, but we should never have
to do that again because of the newer Unicode stability policies.

Unicode experts, please comment on the feasibility of this approach, and
suggest ways to simplify it if you can think of any.

Thanks,
AMC

Mark Davis | 3 Mar 18:02 2009

Re: plea for NFKC & case-folding, suggestions for definitions

There is already a draft mapping on http://unicode.org/reports/tr46/ - feedback is welcome.

Mark


On Mon, Mar 2, 2009 at 22:28, Adam M. Costello <idna-update.amc+0+ <at> nicemice.net.removethisword> wrote:
> Harald Alvestrand <harald <at> alvestrand.no> wrote:
>
> > do you think that the canonicalization function you're positing can be
> > described in a Unicode-version-independent way?
>
> I think it can, but I'll need help from our resident Unicode experts to
> provide the definition.
>
> The Unicode standard promises normalization stability from
> version 4.1 onward, and case-folding stability from version
> 5.0 onward.  Section 3.13 of version 5.0 suggests this for
> compatibility-normalize-and-case-fold:
>
>     NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) =
>     NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))
>
> I don't know why so many iterations are needed, and I don't know why the
> first step is NFD rather than NFKD.
>
> We would of course want to change the last NFKD (or both of them) to
> NFKC.
>
> After applying this function, we would then check the result for
> disallowed code points and other violations (described in the existing
> drafts), and either return it or return an error.
>
> We'd have to consider & specify what to do about the handful of obscure
> incompatibilities between Unicode 3.2 and 5.0, but we should never have
> to do that again because of the newer Unicode stability policies.
>
> Unicode experts, please comment on the feasibility of this approach, and
> suggest ways to simplify it if you can think of any.
>
> Thanks,
> AMC
_______________________________________________
Idna-update mailing list
Idna-update <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update