Paul Hoffman / IMC | 14 Oct 04:47

Re: length restrictions on IDN label

At 11:43 AM +0900 10/14/02, Soobok Lee wrote:
>Then,
>  U+AC00 x 56 times (in my previous posting)  is a valid label 
>conforming to RFC1035 ?
>   and its equivalent ACE label (of  63 octets ) is a valid label ?

If it follows the rules for ToASCII, yes.
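As a quick check of the lengths under discussion, the sketch below computes both forms of the 56 x U+AC00 label. It relies on Python's built-in "punycode" codec and prepends the ACE prefix by hand. Two assumptions: "xn--" stands in for the 4-octet ACE prefix (any 4-octet prefix gives the same length), and the Nameprep step of ToASCII is skipped on the assumption that it leaves this label unchanged.

```python
# Label from the thread: U+AC00 repeated 56 times.
label = "\uac00" * 56

# Assumption: "xn--" stands in for the 4-octet ACE prefix.
ACE_PREFIX = b"xn--"

punycode_part = label.encode("punycode")   # Punycode output only
ace_label = ACE_PREFIX + punycode_part     # full ACE form of the label
utf8_label = label.encode("utf-8")         # the same label in UTF-8

print(len(punycode_part))  # octets of Punycode output
print(len(ace_label))      # octets in the ACE form
print(len(utf8_label))     # octets in the UTF-8 form
```

Under those assumptions the ACE form comes to 63 octets and the UTF-8 form to 168 octets, which matches the figures quoted in this thread.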

>   UTF8-encoded IDN labels are not governed by RFC1035 length restrictions ?

There is no such thing. IDN labels are always encoded in ASCII 
following the rules of STD 13, just as it says in the draft.

>   IDNA contains  brand new length restrictions for 8bit labels 
>which obsoletes RFC1035 ?

No. Where in the draft do you see such "brand new length restrictions"?

The current goal of the WG is to fix any unclear statements in the 
document. We can't do that if you don't say exactly what text you 
find unclear.

--Paul Hoffman, Director
--Internet Mail Consortium

Soobok Lee | 14 Oct 05:05

Re: length restrictions on IDN label

Paul Hoffman / IMC wrote:

> At 11:43 AM +0900 10/14/02, Soobok Lee wrote:
>  
>
>>   UTF8-encoded IDN labels are not governed by RFC1035 length 
>> restrictions ?
>
>
> There is no such thing. IDN labels are always encoded in ASCII 
> following the rules of STD 13, just as it says in the draft.
>
That is true only in protocols predating the IDNA draft.
IDN labels can be typed in, displayed, copied and pasted, or exchanged in
UTF-8 (or another) encoding in current and future application or protocol
slots, as described in the IDNA draft itself.
See the enclosed excerpts from the IDNA draft ("SEE HERE").

I think a length restriction in code points, rather than in octets, is
needed. IDNA is the right place to put such a restriction.

Soobok Lee

6.3 DNS servers

Domain names stored in zones follow the rules for "stored strings" from
[STRINGPREP].


James Seng | 14 Oct 05:45

Re: length restrictions on IDN label

There is already a restriction on the number of code points or octets in UTF-8
(or other) encoding.

The restriction is defined currently as octet(ToASCII(X)) < 63.

-James Seng

> That is true only in  protocols predating IDNA draft.
> IDN labels can be typed in/ displayed/ copy&pasted/ or exchanged in
>  UTF8 (or other) encoding
> in now and future applications or protocols slots as described in IDNA
> draft itself.
> See enclosed excerpts from IDNA draft  ( "SEE HERE").
>
> I think some length restriction in code points is needed, rather than in
> octets ....
> IDNA is the right place to put such things..
>
> Soobok Lee
>
> 6.3 DNS servers
>
> Domain names stored in zones follow the rules for "stored strings" from
> [STRINGPREP].
>
> For internationalized labels that cannot be represented directly in
> ASCII, DNS servers MUST use the ACE form produced by the ToASCII
> operation. All IDNs served by DNS servers MUST contain only ASCII
> characters.
>

Soobok Lee | 14 Oct 06:12

Re: length restrictions on IDN label

James Seng wrote:

>There is already restriction in the number of codepoint or octets in UTF-8
>(or other) encoding.
>
>The restriction is defined currently as octet(ToASCII(X)) < 63.
>
I already mentioned this in my first posting.
What I am requesting is confirmation and clarification of that restriction.

If UTF-8-encoded, that valid 8-bit label will exceed the 63-octet limit
(reaching 168 octets or more) that RFC1035 imposes even on non-ASCII
8-bit labels.
IDNA section 6.3 does not rule out that UTF-8-encoded labels may be used
in DNS wire protocols in the future. That would affect UDP-based DNS
protocols, which already suffer from the 512-octet limit on UDP packet
length. Packet truncation or protocol errors would be inevitable.

That is why I suggest that a separate length restriction on UTF-8 (or
other encoding) IDN labels is needed, or at least a clarification of
these problems.

Soobok Lee

>
>-James Seng
>
>  
>

Adam M. Costello | 14 Oct 06:47

Re: length restrictions on IDN label

Soobok Lee <lsb <at> postel.co.kr> wrote:

> If UTF8-encoded, that valid 8bit label will exceed 63 octets limits
> (up to 168 octets or more)

True.

> which is imposed by RFC1035 even upon non-ASCII 8bit labels .

Yes, but labels in DNS containing octets >= 128 are not
internationalized labels, because internationalized labels use only
octets <= 127 in DNS.  Labels in DNS containing octets >= 128 are
mysterious creatures that have no standard interpretation as text
(because ASCII is the only text encoding used by the DNS standard).

> IDNA section 6.3 does not rule out that utf8 encoded labels may be
> used in DNS wire protocols in the future.

In which case the specification of those future wire protocols will need
to deal with the fact that UTF-8 forms of internationalized labels can
have more than 63 octets.  (255 is an upper bound, though not the least
upper bound.)

AMC

Soobok Lee | 14 Oct 07:21

Re: length restrictions on IDN label

I read your previous long answer. Thanks, Adam.
My comments begin...

Adam M. Costello wrote:

>Soobok Lee <lsb <at> postel.co.kr> wrote:
>
>  
>
>>If UTF8-encoded, that valid 8bit label will exceed 63 octets limits
>>(up to 168 octets or more)
>>    
>>
>
>True.
>
>  
>
>>which is imposed by RFC1035 even upon non-ASCII 8bit labels .
>>    
>>
>
>Yes, but labels in DNS containing octets >= 128 are not
>internationalized labels, because internationalized labels use only
>octets <= 127 in DNS. 
>
Really? Then please go to the next comment below and compare your
claim with IDNA's UTF-8 position.

Length restriction itself in RFC1035 seems to have nothing to do with

Adam M. Costello | 14 Oct 07:52

Re: length restrictions on IDN label

Soobok Lee <lsb <at> postel.co.kr> wrote:

> > Yes, but labels in DNS containing octets >= 128 are not
> > internationalized labels, because internationalized labels use only
> > octets <= 127 in DNS.
>
> Really ?

Really.

> "UTF-8 forms of internationalized labels" are not "internationalized
> labels" ?

The UTF-8 form of an internationalized label is an internationalized
label.  But that's irrelevant, because labels in DNS containing octets
>= 128 are not UTF-8.  The only text encoding used by DNS is ASCII
(according to the current DNS standard).  The octets >= 128 in DNS are
non-ASCII, but that doesn't mean they are UTF-8.  We don't know what
they are, except octets.

The UTF-8 form of an internationalized label is an internationalized
label, but a sequence of octets with no charset tag is not an
internationalized label (it's not even text).

So I stand by my position quoted at the top of this message.

AMC

Soobok Lee | 14 Oct 08:29

Re: length restrictions on IDN label

Adam M. Costello wrote:

>Soobok Lee <lsb <at> postel.co.kr> wrote:
>
>  
>
>>>Yes, but labels in DNS containing octets >= 128 are not
>>>internationalized labels, because internationalized labels use only
>>>octets <= 127 in DNS.
>>>      
>>>
>>Really ?
>>    
>>
>
>Really.
>
>  
>
>>"UTF-8 forms of internationalized labels" are not "internationalized
>>labels" ?
>>    
>>
>
>The UTF-8 form of an internationalized label is an internationalized
>label.  But that's irrelevant, because labels in DNS containing octets
>  
>
>>= 128 are not UTF-8.  The only text encoding used by DNS is ASCII
>>    

Adam M. Costello | 14 Oct 09:55

Re: length restrictions on IDN label

Soobok Lee <lsb <at> postel.co.kr> wrote:

> UTF-8 forms make subset of the entire set of non-ASCII forms.
> Thus, the utf8-compliant subset has been under the overall length
> restriction imposed by RFC1035 on the entire set.

UTF-8 data stored directly in 8-bit DNS labels would be subject to
the 63-octet limit.  This is irrelevant to IDNs, because IDNs do not
store UTF-8 data directly in 8-bit DNS labels.  IDNA requires that
internationalized labels use their 7-bit ASCII form in DNS.

If someday you want to use UTF-8 forms of internationalized labels
directly in newDNS, you will need to make sure that newDNS allows more
than 63 octets per label.  Or you could use the UTF-8 form when it fits,
and fall back to the ASCII form when UTF-8 doesn't fit.
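The "use UTF-8 when it fits, otherwise fall back to ASCII" idea can be sketched roughly as below. Everything here is hypothetical: no such newDNS exists, choose_wire_label is an invented helper, "xn--" stands in for the ACE prefix, and the input is assumed to have been through Nameprep already and to contain at least one non-ASCII code point.

```python
MAX_LABEL_OCTETS = 63          # label limit from RFC 1035
ACE_PREFIX = b"xn--"           # assumption: stands in for the ACE prefix


def choose_wire_label(label: str) -> bytes:
    """Hypothetical helper: prefer UTF-8, fall back to the ACE form.

    Assumes `label` has already been through Nameprep and contains at
    least one non-ASCII code point.
    """
    utf8_form = label.encode("utf-8")
    if len(utf8_form) <= MAX_LABEL_OCTETS:
        return utf8_form
    ace_form = ACE_PREFIX + label.encode("punycode")
    if len(ace_form) > MAX_LABEL_OCTETS:
        raise ValueError("label too long in both forms")
    return ace_form


short = choose_wire_label("\uac00" * 10)   # 30 octets of UTF-8, fits
long_ = choose_wire_label("\uac00" * 56)   # 168 octets of UTF-8, falls back
print(len(short), len(long_))
```

A receiver would still need some way to tell the two forms apart; the sketch leaves that signaling question open.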

(Or you could decide it's easier to stick with ASCII in the DNS
protocol, and create the illusion of UTF-8 using a new resolver on the
client.)

Your argument seems to be:

1. An internationalized label in UTF-8 form is a sequence of octets.

2. RFC 1035 limits labels to 63 octets.

3. Therefore internationalized labels must have no more than 63 octets in
   UTF-8 form.

But you could try the same argument for UTF-16, and EUC-KR, and

Soobok Lee | 14 Oct 10:25

Re: length restrictions on IDN label

Adam M. Costello wrote:

>  
>
>IDNA created some brand new kinds of labels that had never existed
>before: non-ASCII textual labels.  They have never appeared in DNS,
>cannot appear in DNS, and will not be able to appear in DNS unless DNS
>is updated to support them (because the only text supported by today's
>DNS is ASCII).  These new non-ASCII textual labels are outside the
>universe of labels defined by RFC 1035, and therefore the RFC 1035
>length restriction does not apply to them (not directly, although it
>applies to their corresponding ASCII forms).
>
Okay.
Non-ACE textual labels can be used and rendered in document formats.

But IDNA section 6.1 goes further than that by allowing _protocols_ to
use non-ACE labels that are neither presentation forms nor textual labels,
but protocol elements.
What if a future ESMTP allows UTF-8 encodings in RCPT: headers?

Would you comment on this section?

6.1 Entry and display in applications

(snip)

In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in any
charset allowed by the protocol or document format. If a protocol or

Adam M. Costello | 15 Oct 00:22

Re: length restrictions on IDN label

Erik Nordmark <Erik.Nordmark <at> sun.com> wrote:

> > an internationalized label can represent at most 63 code points,
> > whether it's ACE or not.  A given encoding uses a bounded number of
> > octets per code point, so you can allocate your buffers based on
> > that.
>
> 63 code points is presumably a conservative number.  Given the 4 octet
> ACE prefix you can only fit a 59 octets worth of punycode output
> per label, hence presumably 59 code points is a tighter limit for
> non-ASCII internationalized labels while 63 code points is the limit
> for ASCII labels.

True, but which limit you care about depends on the encoding.  For
example, if you're using UTF-32, then a regular ASCII label can have 63
code points each occupying 4 octets.
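To make the point about encodings concrete, here is a small sketch that measures one 63-code-point ASCII label in several encodings. The choice of encodings is only illustrative, and the "-be" codec variants are used so that no byte order mark is counted.

```python
ascii_label = "a" * 63   # longest possible plain ASCII label

# The "-be" codec variants are used so that no byte order mark is added.
octets_per_encoding = {
    name: len(ascii_label.encode(name))
    for name in ("ascii", "utf-8", "utf-16-be", "utf-32-be")
}
for name, octets in octets_per_encoding.items():
    print(f"{name}: {octets} octets")
```

The same label occupies 63 octets in ASCII or UTF-8 but 252 octets in UTF-32, so a buffer sized in octets has to be chosen per encoding.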

Soobok Lee <lsb <at> postel.co.kr> wrote:

> IDNA section 6.1 goes further than that by allowing _protocols_ to use
> non-ACE labels which are not presentation forms nor textual labels,
> but protocol elements.  What if future ESMTP allows utf8 encodings in
> RCPT: headers ?

Then applications that implement future ESMTP will need to be prepared
for UTF-8 labels to contain more than 63 octets.  This is not a
problem, because any application that can even think about using
non-ASCII labels is aware of IDNA, and therefore knows the definition of
internationalized label, and therefore knows that the maximum possible
label length depends on the encoding used.

Soobok Lee | 15 Oct 08:10

Re: length restrictions on IDN label

2002-10-15   07:22, Adam M. Costello :
> Erik Nordmark <Erik.Nordmark <at> sun.com> wrote:
> 
> > > an internationalized label can represent at most 63 code points,
> > > whether it's ACE or not.  A given encoding uses a bounded number of
> > > octets per code point, so you can allocate your buffers based on
> > > that.
> >
> > 63 code points is presumably a conservative number.  Given the 4 octet
> > ACE prefix you can only fit a 59 octets worth of punycode output
> > per label, hence presumably 59 code points is a tighter limit for
> > non-ASCII internationalized labels while 63 code points is the limit
> > for ASCII labels.
> 
> True, but which limit you care about depends on the encoding.  For
> example, if you're using UTF-32, then a regular ASCII label can have 63
> code points each occupying 4 octets.

YES.

> 
> Soobok Lee <lsb <at> postel.co.kr> wrote:
> 
> > IDNA section 6.1 goes further than that by allowing _protocols_ to use
> > non-ACE labels which are not presentation forms nor textual labels,
> > but protocol elements.  What if future ESMTP allows utf8 encodings in
> > RCPT: headers ?
> 
> Then applications that implement future ESMTP will need to be prepared
> for UTF-8 labels to contain more than 63 octets.  This is not a

Adam M. Costello | 16 Oct 01:51

Re: length restrictions on IDN label

Soobok Lee <lsb <at> postel.co.kr> wrote:

> My focus was on whether the "UTF-8 labels in ESMTP sessions" are
> legitimate internationalized hostnames (labels) or not at the future
> and at the present time, since IDNA section 6.1 allows utf8 encodings
> of transmitted labels.  Does this section seems to propose changes in
> hostname rules ?

Nothing in section 6.1 reduces the requirement of section 3 item 2:
Whenever a domain name is put into an IDN-unaware domain name slot, it
MUST contain only ASCII characters.

At the present time, all domain name slots in ESMTP are IDN-unaware
(because they predate IDNA).  Therefore, all domain names put into
those slots must contain only ASCII characters.

Section 6.1 is not aimed primarily at this situation, but it does
have implications for this situation.  If ESMTP were to use EBCDIC or
UTF-16 for some slots (which isn't very likely), then domain names
being put into those slots would contain only ASCII characters (that
is, characters from the ASCII repertoire) as required by section 3
item 2, but would need to be encoded using EBCDIC or UTF-16 rather than
the ASCII encoding.  (The name ASCII is old, from the days when people
weren't so careful to distinguish between repertoires and encodings).

If a future revision/extension of ESMTP defines some IDN-aware slots,
then the implications of section 6.1 become more interesting.  If those
slots use, say, UTF-8, then IDNs containing non-ASCII characters could
be put into those slots, and they would need to be encoded using UTF-8.


Dave Crocker | 15 Oct 09:59

Re: length restrictions on IDN label

Erik,

Monday, October 14, 2002, 7:29:11 AM, you wrote:

EN> It might also be useful to make the IDNA specification more clear about
EN> the length limit and its implications somewhere in the document.
EN> Strawman:         The length limit for internationalized labels is the
EN> limit         of at most 63 octets of output from ToASCII. The design
EN> of the Punycode encoding is such that some sequences of up to 63 Unicode
EN> code points result in ToASCII output of 63 or less octets, but
EN> all possible Unicode strings of 64 code points result in ToASCII
EN> output of 64 or more octets.

If this limit is new and independent of the underlying "native" DNS,
what is the reason for it?  If it is not independent, why duplicate it?

d/

Erik Nordmark | 15 Oct 14:28

Re: length restrictions on IDN label

> If this limit is a new and independent from the underlying "native" DNS,
> what is the reason for it?  If it is not independent, why duplicate it?

It isn't independent since the limit derives from the label length limit
in the DNS protocol, but the limit is also a function of the amount
of compression that punycode performs.

The fact that punycode will never generate a string with fewer
code points than the number of code points in the input
is far from obvious, hence it makes sense in my mind to state this limit.
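The "never shorter than the input" property can be spot-checked with a few labels. This is a sketch using Python's built-in "punycode" codec on sample strings chosen here for illustration; it is a sanity check on those samples, not a proof.

```python
samples = [
    "abc",                  # basic code points only
    "b\u00fccher",          # mixed basic and non-basic
    "\u65e5\u672c\u8a9e",   # non-basic only
    "\uac00" * 56,          # the long label discussed in this thread
]

# Pairs of (input code points, Punycode output octets).
lengths = [(len(s), len(s.encode("punycode"))) for s in samples]
for n_in, n_out in lengths:
    print(n_in, n_out)
```

In every sample the output has at least as many octets as the input has code points, which is why 64 input code points can never fit in 63 octets of ToASCII output.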

  Erik

Dave Crocker | 15 Oct 16:16

Re: length restrictions on IDN label

Erik,

Tuesday, October 15, 2002, 5:28:23 AM, you wrote:
EN> It isn't independent since the limit derives from the label length limit
...
EN> The fact that punycode will never generate a string with less number of
EN> code points than the number of code points in the input is far from
EN> obvious, hence it makes sense in my mind to state this limit.

makes sense.  thanks.

/d
-----
Dave <mailto:dcrocker <at> brandenburg.com>

James Seng | 14 Oct 06:18

Re: length restrictions on IDN label

Let me repeat:

the restriction is octet(ToASCII(X)) <= 63 (sorry, I forgot the =).

The length restriction on a domain name in a DNS UDP packet (incidentally, a
UTF-8 encoded string that looks like a domain name is not a domain name) is
beyond the scope of this working group. Please bring it to the DNSEXT working
group.

-James Seng

> What I request is confirmation and clarification about that restriction.
>
> If UTF8-encoded, that valid 8bit label will exceed 63 octets limits (up
> to 168 octets or more)
>  which is imposed by RFC1035 even upon non-ASCII 8bit labels .
>  IDNA section 6.3 does not rule out that  utf8 encoded labels may be used
>  in DNS wire protocols in the future. And that will affect UDP based
>  DNS protocols
>  which suffers from lack of space in the UDP packet length limits (512).
>  Packet truncation or protcol errors are inevitable.
>
> That is why i suggest that some separate length restriction on utf8
> (other encoding) IDN labels
>  be needed. Or clarifications about those problems, at least.
>
> Soobok Lee
>
> >
> >-James Seng

Soobok Lee | 14 Oct 06:56

Re: length restrictions on IDN label


James Seng wrote:

>Let me repeat again,
>
>the restriction is octet(ToASCII(X)) <= 63 (sorry, forget the =).
>
>The length restriction of a domain name (incidently, UTF-8 encoded string
>which looks like domain name is not a domain name) in a DNS UDP packet is
>something beyond this working group. 
>
The restriction that originated in RFC1035 affects all label
creation and validation.
And "octet(ToASCII(X)) <= 63" seems to loosen the restriction on
8-bit labels. That is why I think the RFC1035 restriction is about to be
obsoleted.

>Please bring it to the DNSEXT working  group.
>
As a novice IETF participant, I have little experience with the DNSEXT WG.
Would the Area Directors comment on this issue, Erik?

Soobok Lee

Erik Nordmark | 14 Oct 16:29

Re: length restrictions on IDN label

> >Please bring it to the DNSEXT working  group.
> >
> As a novice IETF pariticpant, I have little experience with DNSEXT WG.
> Would  Area Directors make comment on this issue, Erik ?

When and if there is a need to carry IDNs in the DNS protocol using octet
values above 127, the DNSEXT WG would be the place to discuss this.
However, I still haven't seen a fully worked out approach for doing this
that takes DNSSEC into account. (draft-hall-dm-idns is the closest I've seen,
but it doesn't get into DNSSEC issues.)

When and if this approach is taken, there are presumably multiple encodings
that can be used. One is UTF-8; another possibility would be to
define a bootstring algorithm from Unicode to octets in the range 0-255.
The latter would carry the bootstring compression features forward into
these encodings.

Taking length limits into account would be one part of that work.

> UTF8-form of a label may be used as protocol elements , in addition to
> presentation forms.
> In the latter case, there will be no problem as you said above: just
> display it.
> 
> But, in the former case of being used as protocols elements, utf8-form
> label length limit is
> of our concern. IDNA drafts does not rule out that utf-8 form of labels
> may be used as protocol
> elements. This will clairfy my point.


Soobok Lee | 15 Oct 07:44

Re: length restrictions on IDN label

2002-10-14 (Mon) 23:29, Erik Nordmark :

> 
> But perhaps IDNA should be more clear that there is a length issue for
> such future work. For instance, would adding a sentence at the end of
> this paragraph be helpful?
> 	If a signaling system which makes negotiation possible between old and
> 	new DNS clients and servers is standardized in the future, the encoding
> 	of the query in the DNS protocol itself can be changed from ACE to
> 	something else, such as UTF-8. The question whether or not this should
> 	be used is, however, a separate problem and is not discussed in this
> 	memo.
> E.g. "Such work would have to take into account the length limit for IDNs
> as specified in this document".
> 
> It might also be useful to make the IDNA specification more clear about
> the length limit and its implications somewhere in the document.
> Strawman:
> 	The length limit for internationalized labels is the limit
> 	of at most 63 octets of output from ToASCII. The design
> 	of the Punycode encoding is such that some sequences of up to 63 Unicode
> 	code points result in ToASCII output of 63 or less octets, but
> 	all possible Unicode strings of 64 code points result in ToASCII
> 	output of 64 or more octets.
> 
> 	Implementation Note:
> 		Depending on the local encoding the software uses for
> 		Unicode this can correspond to a lot more than 63 octets.
> 		Similarly, the total length of an IDN expressed in the local
> 		encoding can significantly exceed 255 octets.

Mark.Andrews | 14 Oct 07:27

Re: length restrictions on IDN label


> 
> 
> James Seng wrote:
> 
> >Let me repeat again,
> >
> >the restriction is octet(ToASCII(X)) <= 63 (sorry, forget the =).
> >
> >The length restriction of a domain name (incidently, UTF-8 encoded string
> >which looks like domain name is not a domain name) in a DNS UDP packet is
> >something beyond this working group. 
> >
> The restriction orginated from RFC1035 affects all  label 
> creation/validations.
> And "octet(ToASCII(X)) <= 63" seems to loosen the restrictions about
> 8bit labels. That is why i think RFC1035 restriction is about to be 
> obsoleted.

	There is nothing new about the number of octets in presentation
	form being greater than the number on the wire.  In RFC
	1034 0x00 is presented as "\000", 0x5c as "\\" or "\092",
	0x2e as "\." or "\046".
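The expansion from wire octets to presentation text can be illustrated with a small sketch. The helper name label_to_presentation is hypothetical, and the escaping rules are simplified to the cases mentioned here (a backslash before "." and "\", and \DDD in decimal for octets outside printable ASCII); real zone-file writers escape a few more characters.

```python
def label_to_presentation(label: bytes) -> str:
    """Hypothetical helper: render one wire-format label as text."""
    parts = []
    for octet in label:
        if octet in (0x2E, 0x5C):          # "." and "\" get a backslash
            parts.append("\\" + chr(octet))
        elif 0x21 <= octet <= 0x7E:        # other printable ASCII as-is
            parts.append(chr(octet))
        else:                              # everything else as \DDD (decimal)
            parts.append("\\%03d" % octet)
    return "".join(parts)


print(label_to_presentation(b"\x00"))            # \000
print(label_to_presentation(b"a.b"))             # a\.b
print(len(label_to_presentation(b"\x00" * 63)))  # 252
```

A 63-octet label made entirely of octets that need the \DDD form therefore takes 252 characters to present, four times its wire length.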

	Mark

> 
> >Please bring it to the DNSEXT working  group.
> >
> As a novice IETF pariticpant, I have little experience with DNSEXT WG.

Soobok Lee | 14 Oct 07:40
Picon

Re: length restrictions on IDN label

Mark.Andrews <at> isc.org wrote:

>>The restriction orginated from RFC1035 affects all  label 
>>creation/validations.
>>And "octet(ToASCII(X)) <= 63" seems to loosen the restrictions about
>>8bit labels. That is why i think RFC1035 restriction is about to be 
>>obsoleted.
>>    
>>
>
>	There is nothing new with the number of octets in presentation
>	form being greater those presented on the wire.  In RFC
>	1034 0x00 is presented as "\000", 0x5c as "\\" or "\092",
>	0x2e as "\." or "\046".
>
The UTF-8 form of a label may be used as a protocol element, in addition to
a presentation form.
In the latter case, there is no problem, as you said above: just
display it.

But in the former case, where it is used as a protocol element, the length
limit on the UTF-8 form of a label is our concern. The IDNA draft does not
rule out that the UTF-8 form of a label may be used as a protocol
element. I hope this clarifies my point.

Thanks.

Soobok Lee


Mark.Andrews | 14 Oct 08:09

Re: length restrictions on IDN label


> Mark.Andrews <at> isc.org wrote:
> 
> >>The restriction orginated from RFC1035 affects all  label 
> >>creation/validations.
> >>And "octet(ToASCII(X)) <= 63" seems to loosen the restrictions about
> >>8bit labels. That is why i think RFC1035 restriction is about to be 
> >>obsoleted.
> >>    
> >>
> >
> >	There is nothing new with the number of octets in presentation
> >	form being greater those presented on the wire.  In RFC
> >	1034 0x00 is presented as "\000", 0x5c as "\\" or "\092",
> >	0x2e as "\." or "\046".
> >
> UTF8-form of a label may be used as protocol elements , in addition to
> presentation forms.
> In the latter case, there will be no problem as you said above: just
> display it.
> 
> But, in the former case of being used as protocols elements, utf8-form
> label length limit is
> of our concern. IDNA drafts does not rule out that utf-8 form of labels
> may be used as protocol
> elements. This will clairfy my point.

	The restrictions have always derived from the DNS wire format.
	Each individual presentation format will have its own maximum
	number of octets however just because a string fits within

Soobok Lee | 14 Oct 08:55
Picon

Re: length restrictions on IDN label

About presentation formats, your argument stands.
The UTF-8 form of a label can further be URL-encoded with %xx sequences,
tripling the length of the character stream.
That means "56 x U+AC00" can be converted into a string of
56 x 3 (UTF-8) x 3 (URL encoding) = 504 octets.
The RFC1035 length restriction does not apply to presentation forms. This is
your point, right?
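The 504-octet figure can be reproduced with a short sketch, assuming Python's urllib.parse.quote as the URL encoder (it encodes to UTF-8 by default and writes each non-ASCII octet as %XX).

```python
from urllib.parse import quote

label = "\uac00" * 56

utf8_octets = len(label.encode("utf-8"))
# quote() encodes to UTF-8 by default and writes each non-ASCII octet as %XX.
url_encoded = quote(label)

print(utf8_octets)       # 56 code points x 3 octets each
print(len(url_encoded))  # 3 characters per UTF-8 octet
```

Each U+AC00 becomes the nine characters %EA%B0%80, so 56 of them give 168 octets of UTF-8 and 504 characters once URL-encoded.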

But my question is about the UTF-8 form as a protocol element (a wire format
for future application or DNS protocols), not as a presentation form.
You can find that in IDNA section 6.3.
Would you comment further on that section?

Thanks.

Soobok Lee

Mark.Andrews <at> isc.org wrote:

>	The restrictions have always derived from the DNS wire format.
>	Each individual presentation format will have its own maximum
>	number of octets however just because a string fits within
>	that number of octets doesn't mean that it will be valid.
>
>	Invalid 
>	0123456789012345678901234567890123456789012345678901234567891234
>
>	Valid
>	\048\049\050\051\052\053\054\055\056\057\048\049\050\051\052\053

Mark.Andrews | 14 Oct 09:52

Re: length restrictions on IDN label


> About presentation formats, you argument stands.
> UTF8 form of labels further can be URLEncoded with %xx sequences into
> tripled length of character streams.
> That means "56 x U+AC00 " can be converted into
> 56 x 3 (utf8) x 3 (URLencode) = 504 octets string.
> RFC1035 length restriction does not apply to presentation forms. This is
> your point. right?

	Yes.

> But, my question is about utf8 form as protocol elements (wire format
> for future
> application or DNS protocols), not as presentation forms.
> You can find that in IDNA section 6.3.
> Would you make another comments on that section ?

	Well, if you use raw UTF-8 today in the DNS you are limited to
	63 octets per label and 255 octets per domain name.  As long as
	you continue to use the same label encoding you are limited to
	63 octets.  YMMV at levels other than the DNS.

	However 

> THanks.
> 
> Soobok Lee
> 
> 
> Mark.Andrews <at> isc.org wrote:

Soobok Lee | 14 Oct 15:10
Picon

Re: length restrictions on IDN label

Mark.Andrews <at> isc.org wrote:

>	Well if you use raw utf-8 today int the DNS you are limited to
>	63 octets per label.  255 octets for a domain name.  As long as
>	you continue to use the same label encoding you are limited to
>	63 octets.  YMMV at levels other than the DNS.
>
Sure.
If the limit may vary at levels (applications) other than the DNS, UTF-8
labels may exceed 63 octets in application protocol formats (not just for
display), and implementors should reserve enough buffer space for the
UTF-8 labels produced by ToUnicode(ACE). This really matters because many
programmers favor UTF-8 as the internal representation of Unicode strings
for its ASCII compatibility.

Most application programmers have been reserving 256 bytes of buffer
space for any LDH FQDN.
But that convention should be changed to cover long UTF-8 IDN FQDNs, which
may be 3 or 4 times longer than 256 octets. So 768 or 1024 bytes would be
good. But such UTF-8 FQDNs cannot be put into a single UDP packet of a DNS
query or response. This will constrain future efforts to update the DNS
protocol to support UTF-8 in the wire format. Today's long IDNs may be one
of the obstacles in the way of that effort.

If this warning is neglected by application programmers,

Mark.Andrews | 14 Oct 23:35

Re: length restrictions on IDN label


> Mark.Andrews <at> isc.org wrote:
> 
> >	Well if you use raw utf-8 today int the DNS you are limited to
> >	63 octets per label.  255 octets for a domain name.  As long as
> >	you continue to use the same label encoding you are limited to
> >	63 octets.  YMMV at levels other than the DNS.
> >
> Sure.
> If the limit may vary at levels (applications)other than the DNS, the
> utf8 labels may exceed 63 octets
> in appliction protocol formats (not for display) and the implementors
> should reserve enough buffer
> spaces for ToUnicoded(ACE) utf8 labels. This really matters because many
> programmers favor
> utf8 as internal representation format of unicode strings for its ascii
> compatibility.
> 
> Most applications programmer have been reserving 256 bytes for any LDH
> FQDN buffer space .

	Most applications will use NS_MAXDNAME or MAXDNAME to store
	a domain name today as that is enough to cover all the
	potential expansions from wire to RFC 1034 presentation
	format.  Have a look at NS_MAXDNAME in <arpa/nameser.h>.
	Unless you have a really old OS it will be at least (250*4
	+ 5) (the minimum number of labels that can appear in a full
	length domain name is 4 labels + root, hence 4 periods +
	null).
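The (250*4 + 5) figure can be reconstructed as follows. This is a sketch of the arithmetic only; the 63/63/63/61 split of label sizes is one assumed way to fill a 255-octet wire-format name with the fewest labels.

```python
MAX_WIRE_NAME = 255      # octets in a wire-format name (RFC 1035)
MAX_LABEL = 63           # octets of data in one label

# Assumed split: the fewest labels that fill 255 octets. Each label costs
# one length octet, and the root label costs one more.
labels = [63, 63, 63, 61]
assert max(labels) <= MAX_LABEL

data_octets = sum(labels)
wire_octets = data_octets + len(labels) + 1
assert wire_octets == MAX_WIRE_NAME

# Worst case in presentation form: every data octet written as \DDD,
# one period after each label, plus a terminating NUL for a C string.
worst_case = data_octets * 4 + len(labels) + 1

print(wire_octets, data_octets, worst_case)
```

With 250 data octets, the worst case comes to 250*4 + 5 = 1005 characters, which is the lower bound given above for NS_MAXDNAME.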


Soobok Lee | 15 Oct 03:07
Picon

Re: length restrictions on IDN label

Mark.Andrews <at> isc.org wrote:

>> Most applications programmer have been reserving 256 bytes for any LDH
>>
>>FQDN buffer space .
>>    
>>
>
>	Most applications will use NS_MAXDNAME or MAXDNAME to store
>	a domain name today as that is enough to cover all the
>	potential expansions from wire to RFC 1034 presentation
>	format.  Have a look at NS_MAXDNAME in <arpa/nameser.h>.
>	Unless you have a really old OS it will be at least (250*4
>	+ 5) (minimum number of label that can appear in a full
>	length domain name is 4 labels + root, all 4 periods +
>	null).
>
If you have a Linux (Red Hat) box, "chdir /usr/include; grep MAXHOST */*".
Then you will get the enclosed list of headers.
Many application headers use 64/65/257, except for the resolver/network
library headers, which use 1024/1025 for an FQDN. If you have the OpenSSH
3.4p1 sources, you can find that it uses MAXHOSTNAMELEN (64) in
some places in addition to 1025 in other places. Of course, 64 is for a
hostname label length.
See the rpc/* headers.

asm/param.h:#define MAXHOSTNAMELEN 64 /* max length of hostname */
imap/mail.h:#define NETMAXHOST 65
imap/mail.h: char host[NETMAXHOST]; /* host name (may be canonicalized) */

Mark.Andrews | 15 Oct 04:23

Re: length restrictions on IDN label


> Mark.Andrews <at> isc.org wrote:
> 
> >> Most applications programmer have been reserving 256 bytes for any LDH
> >>
> >>FQDN buffer space .
> >>    
> >>
> >
> >	Most applications will use NS_MAXDNAME or MAXDNAME to store
> >	a domain name today as that is enough to cover all the
> >	potential expansions from wire to RFC 1034 presentation
> >	format.  Have a look at NS_MAXDNAME in <arpa/nameser.h>.
> >	Unless you have a really old OS it will be at least (250*4
> >	+ 5) (minimum number of label that can appear in a full
> >	length domain name is 4 labels + root, all 4 periods +
> >	null).
> >
> If you have a linux (Redhat) box, "chdir /usr/include; grep MAXHOST */*".
> Then you will get the enclosed header lists.
> Many applications headers use 64/65/257 except for resolver/network
> libraries headers which
> use 1024/1025 for FQDN. If you have openssh 3.4p1 sources, you can find
> that it uses MAXHOSTNAMELEN (64) in
> some place in addition to 1025 in other places. of course, 64 is for a
> hostname label length.
> See rpc/* headers.

	Repeat after me: "HOSTNAME != DOMAIN NAME".


Soobok Lee | 15 Oct 05:18
Picon

Re: length restrictions on IDN label

Mark.Andrews <at> isc.org wrote:

>
>	Repeat after me: "HOSTNAME != DOMAIN NAME".
>
Everyone here knows that. :-)
Applications, in general, deal with hostnames (SMTP?), not abstract
domain names (nslookup, dig?).
It has been clear that I am speaking in the context of applications'
hostname label buffer space problems.

>
>	Hostnames are a subset of domain names (ignoring hostnames
>	that are longer than 253 and hence not supported by the
>	DNS).  If you are only dealing with hostnames then you
>	should be rejecting domain names that are not legal hostnames.
>
Yes.
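
One way to do that rejection, sketched under the usual LDH reading of
RFC 952/1123 (letters, digits and hyphen only; no leading or trailing hyphen;
labels of 1 to 63 octets; at most 253 characters in total). The function name
is hypothetical, and the rule set is an assumption about what "legal hostname"
means here:

```python
import re

# One LDH label: 1 to 63 characters, alphanumeric at both ends.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_legal_hostname(name: str) -> bool:
    """Return True if name is a syntactically legal LDH hostname."""
    if name.endswith("."):
        name = name[:-1]                    # tolerate one trailing root dot
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must match the LDH pattern in full.
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))
```

An ACE-form label such as xn--bcher-kva passes this check, while the same
label in its Unicode form does not, so the conversion has to happen before
the name reaches code that assumes hostnames.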

>	Anything reading unsanitized domain names has to expect
>	strings longer than 255 when converted to RFC 1034 presentation
>	format.
>
You repeated this twice, and I had agreed.

>
>	Some implementations of gethostbyaddr() do this sanitization
>	for you.  This was also one of the reasons IDNs are converted
>	to and from ASCII.  ToUnicode should be performed outside of
>	gethostbyaddr().  Moving it inside of gethostbyaddr() is an API

Paul Hoffman / IMC | 14 Oct 15:47

Re: length restrictions on IDN label

At 10:10 PM +0900 10/14/02, Soobok Lee wrote:
>Most application programmers have been reserving 256 bytes for any LDH
>FQDN buffer space.

It is amazingly arrogant for anyone to make statements about "most 
application programmers".

>But that convention should be changed to cover the case of long UTF-8
>IDN FQDNs, which may be
>3 or 4 times longer than 256 octets.

Why just UTF8? Why not UTF16? Or GB? Or ... ?

>If this warning is neglected by application programmers,
>remote malicious crackers will send users' applications long ACE
>IDNs crafted to
>cause buffer overflow errors when ToUnicode is applied, and seize control
>of the machine.

Oh, come on. Step 6 of ToUnicode is exactly two words long. Which one 
of those two words do you think that other applications programmers 
will not understand?
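
To make the size question concrete, here is a rough sketch that measures how
many UTF-8 octets an ACE label needs once decoded, so that a caller can check
its buffer before copying. It assumes Python's built-in "idna" codec as a
stand-in for ToUnicode (the codec follows the final RFC 3490 with the "xn--"
prefix, which may differ in detail from the draft), and the function names
are illustrative only:

```python
def utf8_size_after_decoding(ace_label: bytes) -> int:
    """Decode one ACE label and return its length in UTF-8 octets."""
    unicode_label = ace_label.decode("idna")
    return len(unicode_label.encode("utf-8"))


def fits(ace_label: bytes, capacity: int) -> bool:
    """True if the decoded UTF-8 form plus a NUL fits in capacity octets."""
    return utf8_size_after_decoding(ace_label) + 1 <= capacity


# A label of 50 Hangul syllables (U+AC00), each 3 octets in UTF-8.
ace = ("\uac00" * 50).encode("idna")
```

The decoded form of this label does not fit a 64-octet label buffer even
though its ACE form does. For the label discussed earlier in the thread
(U+AC00 repeated 56 times) the UTF-8 form is 56 * 3 = 168 octets.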

--Paul Hoffman, Director
--Internet Mail Consortium

Soobok Lee | 14 Oct 16:02

Re: length restrictions on IDN label

Paul Hoffman / IMC wrote:

> At 10:10 PM +0900 10/14/02, Soobok Lee wrote:
>
>> Most application programmers have been reserving 256 bytes for any LDH
>> FQDN buffer space.
>
>
> It is amazingly arrogant for anyone to make statements about "most 
> application programmers".

I accept that. :-)

>
>
>> But that convention should be changed to cover the case of long UTF-8
>> IDN FQDNs, which may be
>> 3 or 4 times longer than 256 octets.
>
>
> Why just UTF8? Why not UTF16? Or GB? Or ... ?

I already mentioned other encodings in my earlier postings.

>
>
>> If this warning is neglected by application programmers,
>> remote malicious crackers will send users' applications long ACE
>> IDNs crafted to
>> cause buffer overflow errors when ToUnicode is applied, and seize control of the

Soobok Lee | 14 Oct 07:06

Re: length restrictions on IDN label

Soobok Lee wrote:

>
>
> James Seng wrote:
>
>> Let me repeat again,
>>
>> the restriction is octet(ToASCII(X)) <= 63 (sorry, I forgot the =).
>>
>> The length restriction of a domain name (incidentally, a UTF-8 encoded
>> string which looks like a domain name is not a domain name)
>
I can't agree with your parenthesized claim.  Please look at the IDNA draft 
sections
which mention UTF-8 as an alternative encoding for labels in 
applications,
future protocols, and even future DNS protocols.

Soobok Lee

>> in a DNS UDP packet is
>> something beyond this working group.
>
> The restriction originating from RFC1035 affects all label 
> creation and validation.
> And "octet(ToASCII(X)) <= 63" seems to loosen the restriction on
> 8-bit labels. That is why I think the RFC1035 restriction is about to be 
> obsoleted.

James Seng | 14 Oct 11:13

Re: length restrictions on IDN label

> >>(incidentally, a UTF-8 encoded string which looks like a domain name is
> >>not a domain name)
>
> I can't agree with your parenthesized claim.  Please look at the IDNA draft
> sections which mention UTF-8 as an alternative encoding for labels in
> applications, future protocols, and even future DNS protocols.

Adam has already answered you on this.

"Yes, but labels in DNS containing octets >= 128 are not
internationalized labels, because internationalized labels use only
octets <= 127 in DNS.  Labels in DNS containing octets >= 128 are
mysterious creatures that have no standard interpretation as text
(because ASCII is the only text encoding used by the DNS standard)."

Therefore, a UTF-8 string which was somehow squeezed into a DNS packet has
no meaning right now.

IDNA does not ignore the possibility of using UTF-8 in DNS packets. But that
use is yet to be defined. Until it is defined, it is useless to discuss that
possibility.

> As a novice IETF participant, I have little experience with the DNSEXT WG.
> Would the Area Directors comment on this issue, Erik?

I think even a novice participant like yourself would know to look up
DNSEXT yourself rather than bother the ADs with such little nitpicks.

http://www.ietf.org/html.charters/dnsext-charter.html

Soobok Lee | 14 Oct 11:47

Re: length restrictions on IDN label

James Seng wrote:

>Therefore, a UTF-8 string which was somehow squeezed into a DNS packet has
>no meaning right now.
>
>IDNA does not ignore the possibility of using UTF-8 in DNS packets. But that
>use is yet to be defined. Until it is defined, it is useless to discuss that
>possibility.
>
>  
>
Your argument stands only in the DNS on-the-wire context. In all other 
protocol/display
contexts, UTF-8 labels are legitimate domain labels. That is why zone 
admins may enter
their labels in UTF-8 in their zone files, and end users recognize them 
as domain names.

Moreover,
I didn't limit my question to DNS and its protocols (I said just 'protocol'; 
DNS is not all of them).
Rather, it may extend to all application protocols that use domain names,
like a future ESMTP.

There will be many application protocols (IETF's or home-made) that
may exchange UTF-8-form labels as protocol elements (not for human eyes)
according to IDNA sections 6.1-6.3. The IDNA draft allows such use of UTF-8
labels as legitimate.

The length restriction should be clarified before IDN deployment.

James Seng | 14 Oct 12:18

Re: length restrictions on IDN label

> Your argument stands only in the DNS on-the-wire context. In all other
> protocol/display
> contexts, UTF-8 labels are legitimate domain labels. That is why zone
> admins may enter
> their labels in UTF-8 in their zone files, and end users recognize them
> as domain names.

UTF-8 labels have no meaning as domain name labels, as currently defined in
RFC 1034/1035, whether on the wire or otherwise. Anything above 127 is a
no-man's-land: use it at your own risk.

If you mean IDN labels encoded in UTF-8, then the restriction is fairly
simple, and has already been explained to you multiple times:
octet(ToASCII(X)) <= 63.
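
A minimal sketch of that check, assuming Python's built-in "idna" codec as
the ToASCII implementation (it follows the final RFC 3490 with the "xn--"
prefix, so it may differ in detail from the draft under discussion). The
helper names are made up for illustration, and the input is assumed to be a
single label with no dots in it:

```python
def ace_octets(label: str) -> int:
    """Return the octet length of ToASCII(label).

    The "idna" codec raises UnicodeError when ToASCII fails, which
    includes a result longer than 63 octets.
    """
    return len(label.encode("idna"))


def is_valid_length(label: str) -> bool:
    """True if 1 <= octet(ToASCII(label)) <= 63.

    The length of the input in code points or in UTF-8 octets plays
    no part in the decision; only the ACE form is measured.
    """
    try:
        return 1 <= ace_octets(label) <= 63
    except UnicodeError:
        return False
```

For example, is_valid_length("a" * 64) is False, while a short non-ASCII
label such as "b\u00fccher" passes because its ACE form is well under 63
octets.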

> Moreover,
> I didn't limit my question to DNS and its protocols (I said just
> 'protocol'; DNS is not all of them).
> Rather, it may extend to all application protocols that use domain names,
> like a future ESMTP.

Then it is worse than you think. This working group is definitely not going
to be able to address all the concerns of all application protocols that
use domain names, much less future work on ESMTP. That work should be
addressed in the specific application protocols and DEFINITELY not in this
working group.

I still hope to conclude this working group within the next few months and
get on with my life, dealing with other IDN related issues.


Soobok Lee | 14 Oct 12:49

Re: length restrictions on IDN label

James Seng wrote:

>>Your argument stands only in the DNS on-the-wire context. In all other
>>protocol/display
>>contexts, UTF-8 labels are legitimate domain labels. That is why zone
>>admins may enter
>>their labels in UTF-8 in their zone files, and end users recognize them
>>as domain names.
>
>UTF-8 labels have no meaning as domain name labels, as currently defined in
>RFC 1034/1035, whether on the wire or otherwise. Anything above 127 is a
>no-man's-land: use it at your own risk.
>
Then you should not convert an ACE-form IDN into UTF-8 and render it for users.
Doesn't this look absurd?  Your claims seem to contradict the IDNA draft.
My current argument is based on IDNA draft section 6.

>
>If you mean IDN labels encoded in UTF-8, then the restriction is fairly
>simple, and has already been explained to you multiple times:
>octet(ToASCII(X)) <= 63.
>
>  
>
>>Moreover,
>>I didn't limit my question to DNS and its protocols (I said just
>>'protocol'; DNS is not all of them).
>>Rather, it may extend to all application protocols that use domain
>>    

James Seng | 14 Oct 12:52

Re: length restrictions on IDN label

> Then you should not convert an ACE-form IDN into UTF-8 and render it for
> users.
> Doesn't this look absurd?  Your claims seem to contradict the IDNA draft.
> My current argument is based on IDNA draft section 6.

Section 6.1 describes "Entry and display in applications". It makes no
mention of UTF-8 as far as I can read. Which paragraph are you referring to?

> As a security programmer, I think the above issue is very important.
> James, you are getting nervous, which is typical of you. Relax.
> The audience is knowledgeable, experienced, and prudent enough.

What makes you think I am nervous? Obviously you didn't catch the sarcasm in
FUD...

-James Seng

James Seng | 14 Oct 13:00

Re: length restrictions on IDN label

The claim that the length restriction on IDN labels is confusing comes from a
confused individual. The specification is quite clear that the only
restriction is based on the octet length of the ToASCII string.

So unless there are others who wish to continue this topic, let's put this
thread to rest.

-James Seng

Soobok Lee | 14 Oct 13:03

Re: length restrictions on IDN label

James Seng wrote:

>>Then you should not convert an ACE-form IDN into UTF-8 and render it for
>>users.
>>Doesn't this look absurd?  Your claims seem to contradict the IDNA draft.
>>My current argument is based on IDNA draft section 6.
>
>Section 6.1 describes "Entry and display in applications". It makes no
>mention of UTF-8 as far as I can read. Which paragraph are you referring to?
>
>
Please look at Section 6.1 and the IDNA architecture figure.
The authors themselves can clarify the reading of the section
better than you, James, or I can.

============================

6.1 Entry and display in applications

(snip)

In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in any
charset allowed by the protocol or document format. If a protocol or

Soobok Lee | 14 Oct 11:55

Re: length restrictions on IDN label

Soobok Lee wrote:

> James Seng wrote:
>
>> Therefore, a UTF-8 string which was somehow squeezed into a DNS 
>> packet has
>> no meaning right now.
>>
>> IDNA does not ignore the possibility of using UTF-8 in DNS packets. But 
>> that use
>> is yet to be defined. Until it is defined, it is useless to discuss that
>> possibility.
>>
> Your argument stands only in the DNS on-the-wire context. In all other 
> protocol/display
> contexts, UTF-8 labels are legitimate domain labels. That is why zone 
> admins may enter
> their labels in UTF-8 in their zone files and

Of course, the DNS server may read them and convert them into ACE form on 
the fly while reading
zone files. I add this just to prevent misunderstanding of that sentence.

Soobok Lee

