John C Klensin | 16 May 2012 23:52

LANG, UTF-8, and POP3

Questions for the working group:

(1) In reading the POP3 document, I notice that it is possible
for a client to request and a server to process LANG without
supporting UTF8.  In the interest of simplification of
variations and with the knowledge that very few languages can be
completely supported by ASCII (and English is _not_ one of
them), I suggest we consider requiring UTF8 capability if LANG
is supported/requested.  

The logical alternative is for a server to have to support both
UTF-8 and, e.g., transliterated / ASCII-mapped message forms for
the various languages and, at least IMO, we really don't want to
go there.

UTF8 without LANG makes sense (elementary EAI/SMTPUTF8 support
for addresses and headers) but I suggest that LANG without UTF8
does not.

Note that adopting this change requires no change to the
document other than a well-placed MUST and maybe a one sentence
explanation.

    john
Barry Leiba | 17 May 2012 03:56

Re: LANG, UTF-8, and POP3

> UTF8 without LANG makes sense (elementary EAI/ SMTPUTF8 support
> for addresses and headers) but I suggest that LANG without UTF8
> does not

I don't see how they're related.  The server's being able to put
non-English on protocol messages, and having its messages translated
into 17 languages, is entirely unrelated to its ability to handle
UTF-8 in email address fields.

On the other hand, I can't see anyone looking at this spec and saying,
"Well, I can't/won't do the rest of it, but the LANG command looks
interesting.  I think I'll just implement that."

Barry
Martin J. Dürst | 22 May 2012 06:06

Re: LANG, UTF-8, and POP3

On 2012/05/17 10:56, Barry Leiba wrote:
>> UTF8 without LANG makes sense (elementary EAI/ SMTPUTF8 support
>> for addresses and headers) but I suggest that LANG without UTF8
>> does not
>
> I don't see how they're related.  The server's being able to put
> non-English on protocol messages, and having its messages translated
> into 17 languages, is entirely unrelated to its ability to handle
> UTF-8 in email address fields.

Well, except that most languages, to be written decently, need non-ASCII 
characters.

> On the other hand, I can't see anyone looking at this spec and saying,
> "Well, I can't/won't do the rest of it, but the LANG command looks
> interesting.  I think I'll just implement that."

Agreed. Also, the other way round, I can't see anyone looking at the 
spec and saying: Implementing LANG is way too complex, I'll never get 
that done. A trivial implementation just means that the following two 
things have to work:

       C: LANG Foo
       S: -ERR invalid language Foo

       C: LANG
       S: -ERR Server is unable to list languages
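
The two exchanges above could be handled by something as small as the following Python sketch. The function name `handle_lang` and its calling convention are hypothetical and not taken from the draft; it only illustrates a server that accepts the command but offers no languages.

```python
def handle_lang(argument=None):
    """Return the reply line for a LANG command on a server with no language support.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if argument is None:
        # A bare "LANG" asks for the list of languages; this server has none to list.
        return "-ERR Server is unable to list languages"
    # "LANG <tag>" asks the server to switch languages; every tag is rejected.
    return f"-ERR invalid language {argument}"
```

Calling `handle_lang("Foo")` and `handle_lang()` yields the two replies shown in the exchange.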

I'm not very familiar with POP3, but maybe this even could be shortened 
to a single uniform reply:

Barry Leiba | 22 May 2012 14:58

Re: LANG, UTF-8, and POP3

>>> UTF8 without LANG makes sense (elementary EAI/ SMTPUTF8 support
>>> for addresses and headers) but I suggest that LANG without UTF8
>>> does not
>>
>> I don't see how they're related.  The server's being able to put
>> non-English on protocol messages, and having its messages translated
>> into 17 languages, is entirely unrelated to its ability to handle
>> UTF-8 in email address fields.
>
> Well, except that most languages, to be written decently, need non-ASCII
> characters.

So?  What does being able to put non-ASCII characters in the
human-readable messages have to do with being able to handle UTF-8 in
email address fields?

b
Arnt Gulbrandsen | 22 May 2012 15:11

Re: LANG, UTF-8, and POP3

We have two options:

1. The server is permitted to issue "unknown command" on LANG if it does
UTF8 but not LANG.
2. The server is permitted to issue "unimplemented command" on LANG if
it does UTF8 but not LANG.

Pardon my yawn.

Arnt
John C Klensin | 22 May 2012 22:01

Re: LANG, UTF-8, and POP3


--On Tuesday, May 22, 2012 15:11 +0200 Arnt Gulbrandsen
<arnt <at> gulbrandsen.priv.no> wrote:

> We have two options:
> 
> 1. The server is permitted to issue "unknown command" on LANG
> if it does UTF8 but not LANG.
> 2. The server is permitted to issue "unimplemented command" on
> LANG if it does UTF8 but not LANG.

Because I really hate the "try to guess what character code the
string might be in" case, I pretty much agree.  There is a third
alternative,

	3. LANG implies UTF-8 for anything that might be
	affected by LANG but not otherwise.

But that takes us right back to "UTF-8 for some things but not
others" issue that we rejected on the Jabber chat.

Wrt what message the server should issue if the client issues
LANG but not UTF8, I share your yawn but would prefer that
something be picked.

   john
Arnt Gulbrandsen | 22 May 2012 22:11

Re: LANG, UTF-8, and POP3

On 05/22/2012 10:01 PM, John C Klensin wrote:
> But that takes us right back to "UTF-8 for some things but not
> others" issue that we rejected on the Jabber chat.

We have that anyway. Consider a server which speaks LANG and a client 
which groks EAI. That pair may still deal with a good old MIME message 
containing

   content-type: text/plain; charset=iso-646-10

If you feel strongly about this, I'll go along on the strength of your 
intuition. A sense of design is very often prescient, even if nothing 
concrete is known to support it at first.

Arnt

PS: I've seen ISO 646 in real mail this millennium.
John C Klensin | 22 May 2012 18:21

Re: LANG, UTF-8, and POP3


--On Tuesday, May 22, 2012 08:58 -0400 Barry Leiba
<barryleiba <at> computer.org> wrote:

>>>> UTF8 without LANG makes sense (elementary EAI/ SMTPUTF8
>>>> support for addresses and headers) but I suggest that LANG
>>>> without UTF8 does not
>>> 
>>> I don't see how they're related.  The server's being able
>>> to put non-English on protocol messages, and having its
>>> messages translated into 17 languages, is entirely unrelated
>>> to its ability to handle UTF-8 in email address fields.
>> 
>> Well, except that most languages, to be written decently,
>> need non-ASCII characters.

"Most" is a bit of an understatement.  Not even English
qualifies (although many texts in English do).

> So?  What does being able to put non-ASCII characters in the
> human-readable messages have to do with being able to handle
> UTF-8 in email address fields?

Barry, please reread what I wrote:  IMO, UTF-8 without LANG
makes perfectly good sense and email address fields and similar
information is certainly part of the reason.  I was just
objecting to LANG without UTF-8, which I think is a setup for
the use of (and battles over) arbitrary character codings.

I do not believe this is a big enough deal to justify holding up

Shawn Steele | 23 May 2012 21:24

Re: LANG, UTF-8, and POP3

I've been ignoring the whole thread, but I don't get what LANG is for?  

If I'm receiving mail in UTF-8, then I get UTF-8 mail and all is fine, what's LANG do for me?

If I'm sending mail in UTF-8, then the server can send it with SMTPUTF8.  If it can't be forwarded it's no more
stuck than current EAI stuff, so what's LANG for?

So, IMO, the LANG thing seems orthogonal to the EAI UTF8 support problem, so I don't get why it lives in this draft.

-Shawn
John C Klensin | 23 May 2012 22:55

Re: LANG, UTF-8, and POP3


--On Wednesday, May 23, 2012 19:24 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

> I've been ignoring the whole thread, but I don't get what LANG
> is for?  
> 
> If I'm receiving mail in UTF-8, then I get UTF-8 mail and all
> is fine, what's LANG do for me?

Primarily, it permits the server to deliver error message text
in the local language.  Whether that is a good idea or not
depends on the implementation model and, in particular, how one
breaks up MUA and IMAP/POP server responsibilities in that
particular UA model.

FWIW, I'm not a good person to be defending this because I have
long favored codes that are as specific as possible,
standardized text in a canonical/interchange language to the
degree needed to supplement those codes, and then local
translation in the client.  I.e., localized message strings
appear only in the UI and not on the wire, even when that wire
is between the two parts of a split UA.  But I've been very much
in the rough on most of the occasions when this, and similar
issues, have come up in the IETF.  That said, POP (especially)
is an extra-difficult case because, unlike, e.g., SMTP, it
doesn't really use codes -- all the information other than
"success" or "failure" is in the text.

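The "codes plus local translation in the client" model described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it relies on the bracketed extended response codes that the POP3 extension mechanism defines (such as `[IN-USE]` and `[LOGIN-DELAY]`), while the `TRANSLATIONS` table, its German strings, and the `localize_error` function are hypothetical.

```python
import re

# Hypothetical translation table keyed by response code; the entries are illustrative only.
TRANSLATIONS = {
    "de": {
        "IN-USE": "Das Postfach wird gerade von einer anderen Sitzung verwendet.",
        "LOGIN-DELAY": "Bitte warten Sie, bevor Sie sich erneut anmelden.",
    },
}

# Matches replies of the form "-ERR [CODE] free text".
_CODE = re.compile(r"^-ERR \[([A-Z/-]+)\]\s*(.*)$")


def localize_error(reply, language):
    """Return text for the UI: translated when the code is known, else the server's text."""
    match = _CODE.match(reply)
    if match is None:
        # No code present, so the server's own text is all the client has.
        return reply
    code, server_text = match.groups()
    return TRANSLATIONS.get(language, {}).get(code, server_text)
```

The fallback branches show the difficulty noted for POP3: when a reply carries no code, the client has nothing to key a translation on and can only display the server's text.
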
Partially because of that and partially because I believe that

Shawn Steele | 23 May 2012 23:04

Re: LANG, UTF-8, and POP3

> Primarily, it permits the server to deliver error message text in the local language.

I'd forgotten that :)

>>  So, IMO, the LANG thing seems orthogonal to the EAI UTF8 support 
>>  problem, so I don't get why it lives in this draft.

> Where "this draft" is the IMAP spec?

Yes, and I still see it as orthogonal to UTF-8 support.  Eg: IMO, this is a "localized error messages"
feature, not a "deliver/receive UTF-8 mail" feature.  And, as you mentioned, it could be done on the
client.  Additionally, there are some tools/forwarders that use POP or IMAP to grab mailboxes but aren't
the actual end user client.  In those cases, it could even be kind of bad because it might end up at some
intermediate language instead of the true end user's language.

I'd be happy if LANG was cut so we don't have to talk about it.  If there's enough interest, it could be its own draft.

-Shawn
John C Klensin | 24 May 2012 00:17

Re: LANG, UTF-8, and POP3


--On Wednesday, May 23, 2012 21:04 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

>> Primarily, it permits the server to deliver error message
>> text in the local language.
> 
> I'd forgotten that :)
> 
>>>  So, IMO, the LANG thing seems orthogonal to the EAI UTF8
>>>  support  problem, so I don't get why it lives in this draft.
> 
>> Where "this draft" is the IMAP spec?

First, please will everyone I've confused forgive me.  LANG is
in the POP3 spec but is not in the (our) IMAP spec.  So we are
talking about POP3 here, per the subject line.

> Yes, and I still see it as orthogonal to UTF-8 support.  Eg:
> IMO, this is a "localized error messages" feature, not a
> "deliver/receive UTF-8 mail" feature.  And, as you mentioned,
> it could be done on the client.  Additionally, there are some
> tools/forwarders that use POP or IMAP to grab mailboxes but
> aren't the actual end user client.  In those cases, it could
> even be kind of bad because it might end up at some
> intermediate language instead of the true end user's language.

I think those tools would be pretty stupid to specify a language
and, if they were implemented for that purpose only, to even
have the code there to support LANG.  From a client point of

Jiankang YAO | 24 May 2012 05:17

Re: LANG, UTF-8, and POP3


Why is LANG there, given that UTF-8 is enabled?
I think one reason is this:
Unicode includes the characters of many languages, such as Chinese, Korean, and Russian.
If there is a LANG command, the server and client can talk in a language that suits both of them.
Otherwise, the server may respond in Korean when the user of the client cannot read it and would rather say
"please send it to me in Chinese".

Jiankang Yao 

----- Original Message ----- 
From: "Shawn Steele" <Shawn.Steele <at> microsoft.com>
To: "John C Klensin" <klensin <at> jck.com>; <ima <at> ietf.org>
Sent: Thursday, May 24, 2012 5:04 AM
Subject: Re: [EAI] LANG, UTF-8, and POP3

>> Primarily, it permits the server to deliver error message text in the local language.
> 
> I'd forgotten that :)
> 
>>>  So, IMO, the LANG thing seems orthogonal to the EAI UTF8 support 
>>>  problem, so I don't get why it lives in this draft.
> 
>> Where "this draft" is the IMAP spec?
> 
> Yes, and I still see it as orthogonal to UTF-8 support.  Eg: IMO, this is a "localized error messages"
feature, not a "deliver/receive UTF-8 mail" feature.  And, as you mentioned, it could be done on the
client.  Additionally, there are some tools/forwarders that use POP or IMAP to grab mailboxes but aren't
the actual end user client.  In those cases, it could even be kind of bad because it might end up at some
intermediate language instead of the true end user's language.

Shawn Steele | 24 May 2012 18:56

Re: LANG, UTF-8, and POP3

It had come up before & I'd forgotten :)  I'm happy with or without it, whatever's easier.

-Shawn

-----Original Message-----
From: Jiankang YAO [mailto:yaojk <at> cnnic.cn] 
Sent: Wednesday, May 23, 2012 8:17 PM
To: Shawn Steele
Cc: John C Klensin; ima <at> ietf.org
Subject: Re: [EAI] LANG, UTF-8, and POP3

Why is LANG there, given that UTF-8 is enabled?
I think one reason is this:
Unicode includes the characters of many languages, such as Chinese, Korean, and Russian.
If there is a LANG command, the server and client can talk in a language that suits both of them.
Otherwise, the server may respond in Korean when the user of the client cannot read it and would rather say
"please send it to me in Chinese".

Jiankang Yao 
ned+ima | 24 May 2012 02:51

Re: LANG, UTF-8, and POP3


> --On Wednesday, May 23, 2012 19:24 +0000 Shawn Steele
> <Shawn.Steele <at> microsoft.com> wrote:

> > I've been ignoring the whole thread, but I don't get what LANG
> > is for?
> >
> > If I'm receiving mail in UTF-8, then I get UTF-8 mail and all
> > is fine, what's LANG do for me?

> Primarily, it permits the server to deliver error message text
> in the local language.  Whether that is a good idea or not
> depends on the implementation model and, in particular, how one
> breaks up MUA and IMAP/POP server responsibilities in that
> particular UA model.

As a practical matter, localization has to happen somewhere, and there
are advantages and disadvantages to every possible location.

Putting it in the back end server has the advantage that it saves having to do
the work in what are often multiple clients. And the server is in the best
position to know what a given error actually means.

The main disadvantage is lack of flexibility - the server cannot respond to
local client conditions because it is usually unaware of them.

The conditions are flipped in the client case, of course.

The one place I really don't like seeing it done is in the web service layer.
There are probably arguments for doing it there, but I've yet to hear one

Jiankang YAO | 25 May 2012 09:47

Re: LANG, UTF-8, and POP3


----- Original Message ----- 
From: "John C Klensin" <klensin <at> jck.com>
To: <ima <at> ietf.org>
Sent: Thursday, May 17, 2012 5:52 AM
Subject: [EAI] LANG, UTF-8, and POP3

> Questions for the working group:
> 
> (1) In reading the POP3 document, I notice that it is possible
> for a client to request and a server to process LANG without
> supporting UTF8.  In the interest of simplification of
> variations and with the knowledge that very few languages can be
> completely supported by ASCII (and English is _not_ one of
> them), I suggest we consider requiring UTF8 capability if LANG
> is supported/requested.  
> 
> The logical alternative is for a server to have to support both
> UTF-8 and, e.g., transliterated / ASCII-mapped message forms for
> the various languages and, at least IMO, we really don't want to
> go there.
> 
> UTF8 without LANG makes sense (elementary EAI/ SMTPUTF8 support
> for addresses and headers) but I suggest that LANG without UTF8
> does not
> 
> Note that adopting this change requires no change to the
> document other than a well-placed MUST and maybe a one sentence
> explanation.
> 

John C Klensin | 25 May 2012 10:07

Re: LANG, UTF-8, and POP3


--On Friday, May 25, 2012 15:47 +0800 Jiankang YAO
<yaojk <at> cnnic.cn> wrote:

>...
> LANG without supporting UTF8 is not good. LANG must have UTF8.
> I will try to clarify some words in the new version.

Excellent, thanks.   The clarification could be as simple as a
sentence in the LANG description that says "Because of the need
to use non-ASCII characters to represent almost all languages in
a comprehensive way, the LANG option MUST NOT be specified
unless UTF8 is also specified".  But I think that is a matter
for editorial discretion.

Anyone who objects to such a change should speak up, and
explain, immediately.

   john
ned+ima | 25 May 2012 21:06

Re: LANG, UTF-8, and POP3

I have to say I don't really care for this approach, because it effectively
requires a server to implement UTF8 (difficult) in order to implement LANG
(easy).

I'd rather say that the client's use of LANG implicitly authorizes the server
to send replies in UTF-8, no matter what language ends up getting selected (or
not selected).

But if people really prefer the dependency, I can live with it.

				Ned

P.S. Looking at the specification, another question: Why was the approach of
enabling the use of UTF-8 in USER and PASS taken rather than simply requiring
AUTH support in order to support UTF8? This is a case where coupling makes
considerably more sense.

> --On Friday, May 25, 2012 15:47 +0800 Jiankang YAO
> <yaojk <at> cnnic.cn> wrote:

> >...
> > LANG without supporting UTF8 is not good. LANG must have UTF8.
> > I will try to clarify some words in the new version.

> Excellent, thanks.   The clarification could be as simple as a
> sentence in the LANG description that says "Because of the need
> to use non-ASCII characters to represent almost all languages in
> a comprehensive way, the LANG option MUST NOT be specified
> unless UTF8 is also specified".  But I think that is a matter
> for editorial discretion.

John C Klensin | 25 May 2012 21:59

Re: LANG, UTF-8, and POP3

Ned,

Just trying to make sure we are seeing the issues the same way,
even if we still reach different conclusions.

--On Friday, May 25, 2012 12:06 -0700 Ned Freed
<ned.freed <at> mrochek.com> wrote:

> I have to say I don't really care for this approach, because
> it effectively requires a server to implement UTF8 (difficult)
> in order to implement LANG (easy).

First, ancient history now, but I'd contend that standalone LANG
would/should have been out of scope for EAI.  One can argue that
it has utility with traditional ASCII addresses and headers, but
not a lot... and it should have been a separate proposal, not
handled through EAI.  RFC 5721's title reinforces that view: it
is called "POP3 Support for UTF-8", not, e.g., "Assorted i18n
extensions for POP3".  Given that LANG was included in 5721, I
hope that we can just skip dealing with that issue, but...

Second, while there are several languages for which, with care,
one can construct reply messages in ASCII only (e.g., English;
German if the "ue", "oe", etc., conventions are acceptable; I
suppose Chinese if one were willing to write in Pinyin without
tone marks) the vast majority, both Latin-script-based and
otherwise, require non-ASCII characters and that, these days,
effectively means UTF-8.    One could avoid that problem and
make LANG independent of UTF-8 by specifying a mechanism for
encoded words or some other escape convention in the relevant

ned+ima | 25 May 2012 22:19

Re: LANG, UTF-8, and POP3

> Ned,

> Just trying to make sure we are seeing the issues the same way,
> even if we still reach different conclusions.

> --On Friday, May 25, 2012 12:06 -0700 Ned Freed
> <ned.freed <at> mrochek.com> wrote:

> > I have to say I don't really care for this approach, because
> > it effectively requires a server to implement UTF8 (difficult)
> > in order to implement LANG (easy).

> First, ancient history now, but I'd contend that standalone LANG
> would/should have been out of scope for EAI.  One can argue that
> it has utility with traditional ASCII addresses and headers, but
> not a lot... and it should have been a separate proposal, not
> handled through EAI.  RFC 5721's title reinforces that view: it
> is called "POP3 Support for UTF-8", not, e.g., "Assorted i18n
> extensions for POP3".  Given that LANG was included in 5721, I
> hope that we can just skip dealing with that issue, but...

It's the same sort of argument that was used, back in the day, against MIME:
The WG was chartered to internationalize email content, so what's all this
multimedia crap doing in there?

As you might guess, that past experience doesn't give me a lot of sympathy for
this argument.

> Second, while there are several languages for which, with care,
> one can construct reply messages in ASCII only (e.g., English;

John C Klensin | 25 May 2012 23:28

Re: LANG, UTF-8, and POP3

Ok.  IMO, Ned's position is entirely reasonable.  I don't think
I agree, but the difference is mostly a matter of taste.  Up to
the WG.  Those who have opinions, speak up.  If no one other
than Ned and myself care, I'll either defer to Ned's
implementation experience or ask Joseph to flip a coin.

   john

--On Friday, May 25, 2012 13:19 -0700 Ned Freed
<ned.freed <at> mrochek.com> wrote:

>> Ned,
> 
>> Just trying to make sure we are seeing the issues the same
>> way, even if we still reach different conclusions.
> 
>> --On Friday, May 25, 2012 12:06 -0700 Ned Freed
>> <ned.freed <at> mrochek.com> wrote:
> 
>> > I have to say I don't really care for this approach, because
>> > it effectively requires a server to implement UTF8
>> > (difficult) in order to implement LANG (easy).
> 
>> First, ancient history now, but I'd contend that standalone
>> LANG would/should have been out of scope for EAI.  One can
>> argue that it has utility with traditional ASCII addresses
>> and headers, but not a lot... and it should have been a
>> separate proposal, not handled through EAI.  RFC 5721's title
>> reinforces that view: it is called "POP3 Support for UTF-8",
>> not, e.g., "Assorted i18n extensions for POP3".  Given that

Barry Leiba | 26 May 2012 01:30

Re: LANG, UTF-8, and POP3

>> But it isn't and there aren't. MIME and SASL already gave POP3
>> internationalized bodies, header text, usernames, and
>> passwords. So there are a lot of people out there using POP3
>> who potentially have a problem dealing with POP3 errors as they
>> stand, especially given the crappy error code situation.
>>
>> It also would be different if UTF8 was a snap to implement.
>> But that's not the case either. LANG is *far* simpler than
>> implementing UTF8, mostly because of downgrading.
>
> Ok.  IMO, Ned's position is entirely reasonable.  I don't think
> I agree, but the difference is mostly a matter of taste.  Up to
> the WG.  Those who have opinions, speak up.  If no one other
> than Ned and myself care, I'll either defer to Ned's
> implementation experience or ask Joseph to flip a coin.

Well, I've already said that I don't see the reason to tie them
together, but was willing to agree with your (John's) proposal.  Ned
appears to be in the same boat that I am, so we now have two who think
that UTF-8 support in error messages (LANG) does not need to be tied
to UTF-8 support for email addresses.

Not a strong opinion, and I'm happy to accept it either way, but my
preference would be to keep them separate.

Barry
John C Klensin | 26 May 2012 08:26

Re: LANG, UTF-8, and POP3


--On Friday, May 25, 2012 19:30 -0400 Barry Leiba
<barryleiba <at> computer.org> wrote:

>...
>>> It also would be different if UTF8 was a snap to implement.
>>> But that's not the case either. LANG is *far* simpler than
>>> implementing UTF8, mostly because of downgrading.
>> 
>> Ok.  IMO, Ned's position is entirely reasonable.  I don't
>> think I agree, but the difference is mostly a matter of
>> taste.  Up to the WG.  Those who have opinions, speak up.
>>  If no one other than Ned and myself care, I'll either defer
>> to Ned's implementation experience or ask Joseph to flip a
>> coin.
> 
> Well, I've already said that I don't see the reason to tie them
> together, but was willing to agree with your (John's)
> proposal.  Ned appears to be in the same boat that I am, so we
> now have two who think that UTF-8 support in error messages
> (LANG) does not need to be tied to UTF-8 support for email
> addresses.
> 
> Not a strong opinion, and I'm happy to accept it either way,
> but my preference would be to keep them separate.

Note that there are two separate issues here:

(1) Is support for UTF-8 (the charset) required if LANG is used,
i.e., if LANG is offered, does the server have the right to

Arnt Gulbrandsen | 26 May 2012 09:34

Re: LANG, UTF-8, and POP3

Maybe I'm woozy from too much travel now, but I can't tell which question the imap parameter one refers to.

Arnt
_______________________________________________
IMA mailing list
IMA <at> ietf.org
https://www.ietf.org/mailman/listinfo/ima
Barry Leiba | 26 May 2012 15:02

Re: LANG, UTF-8, and POP3

Yes, and I prefer 2.1 but would accept 2.2.


b

On Saturday, May 26, 2012, John C Klensin wrote:


--On Friday, May 25, 2012 19:30 -0400 Barry Leiba
<barryleiba <at> computer.org> wrote:

>...
>>> It also would be different if UTF8 was a snap to implement.
>>> But that's not the case either. LANG is *far* simpler than
>>> implementing UTF8, mostly because of downgrading.
>>
>> Ok.  IMO, Ned's position is entirely reasonable.  I don't
>> think I agree, but the difference is mostly a matter of
>> taste.  Up to the WG.  Those who have opinions, speak up.
>>  If no one other than Ned and myself care, I'll either defer
>> to Ned's implementation experience or ask Joseph to flip a
>> coin.
>
> Well, I've already said that I don't see the reason to tie them
> together, but was willing to agree with your (John's)
> proposal.  Ned appears to be in the same boat that I am, so we
> now have two who think that UTF-8 support in error messages
> (LANG) does not need to be tied to UTF-8 support for email
> addresses.
>
> Not a strong opinion, and I'm happy to accept it either way,
> but my preference would be to keep them separate.

Note that there are two separate issues here:

(1) Is support for UTF-8 (the charset) required if LANG is used,
i.e., if LANG is offered, does the server have the right to
transmit responses in UTF-8 (the charset)?  The current version
of the spec actually does say "yes".  IMO, it does so in a way
that is a bit too hard to find -- my initial quick reading
didn't find it, which is what started this discussion thread and
proposal.  Worse, what it does say is "This and subsequent
protocol-level human-readable text is encoded in the UTF-8
charset" which could be a little over-broad because _all_ POP3
exchanges are "human-readable text", thereby possibly making the
LANG capability a superset of the UTF8 one.  I don't think
anyone intends that, but that is what a plausible reading of the text
says.

(2) Given a "yes, if LANG is used the server must be able to
send +OK and -ERR replies in UTF-8 (the charset)" answer, is the
best way to enable that:

(2.1) Make that a property of LANG, such that using the option
authorizes the server to send non-ASCII responses (in UTF-8, the
charset)?   (This is Ned's proposal and probably what the
existing text was intended to say.)

(2.2) Require support for the whole EAI package in order for
LANG to be used, i.e., the LANG capability requires the UTF8
capability.  (This is my original proposal, consistent with what
I thought was the "don't allow implementing pieces and subsets"
conclusion about IMAP parameters on the call.)
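
The difference between 2.1 and 2.2 can be made concrete with a small Python sketch. It is an illustration only: the function `lang_allowed`, the policy labels, and the assumption that capabilities arrive as bare names (without parameters) are all hypothetical and not part of the spec.

```python
def lang_allowed(capabilities, policy):
    """Return True if the advertised capability names are consistent with the chosen option.

    Hypothetical helper: `capabilities` is assumed to be an iterable of bare
    capability names, and `policy` is "2.1" or "2.2" as labelled in this thread.
    """
    caps = {name.upper() for name in capabilities}
    if "LANG" not in caps:
        # Nothing to check when LANG is not offered at all.
        return True
    if policy == "2.1":
        # LANG on its own authorizes UTF-8 replies.
        return True
    if policy == "2.2":
        # LANG is valid only when the full UTF8 capability is offered too.
        return "UTF8" in caps
    raise ValueError(f"unknown policy: {policy}")
```

Under this sketch a server advertising LANG without UTF8 passes the check for 2.1 and fails it for 2.2, which is the whole practical difference between the two options.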

If the preference is 2.1, then does anyone want to review the
IMAP parameter question?

For the record, I actually don't have a strong preference
between 2.1 and 2.2.   I just want to be sure that we make a
decision and that whatever decision we make is completely clear
and obvious in the final text.

  john
John C Klensin | 26 May 2012 15:26

Re: LANG, UTF-8, and POP3

After thinking about this for a few more hours, I've changed my
mind.  See below.

--On Saturday, May 26, 2012 02:26 -0400 John C Klensin
<klensin <at> jck.com> wrote:

>...
> (2) Given a "yes, if LANG is used the server must be able to
> send +OK and -ERR replies in UTF-8 (the charset)" answer, is
> the best way to enable that:
> 
> (2.1) Make that a property of LANG, such that using the option
> authorizes the server to send non-ASCII responses (in UTF-8,
> the charset)?   (This is Ned's proposal and probably what the
> existing text was intended to say.)
> 
> (2.2) Require support for the whole EAI package in order for
> LANG to be used, i.e., the LANG capability requires the UTF8
> capability.  (This is my original proposal, consistent with
> what I thought was the "don't allow implementing pieces and
> subsets" conclusion about IMAP parameters on the call.)
> 
> If the preference is 2.1, then does anyone want to review the
> IMAP parameter question?
> 
> For the record, I actually don't have a strong preference
> between 2.1 and 2.2.   I just want to be sure that we make a
> decision and that whatever decision we make is completely clear
> and obvious in the final text.

Writing as an individual participant only...

Mixing the principle of making minimal changes at this late date
with Ned's reasoning, I've changed my mind about preferred
outcome.  I propose that we go with 2.1 and that we change the
second sentence of the Section 2 discussion of LANG in 5721bis
to read something like:

	"The LANG capability and command permit a POP3 client to
	negotiate which language the server uses when sending
	human-readable text in replies and changes the character
	set used in those replies from ASCII to Unicode encoded
	in UTF-8."

That correctly explains what the LANG capability and command
actually do up front, rather than burying the UTF-8 part in what
appears to be almost an offhand comment.
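
On the server side, the behaviour the proposed sentence describes might look like the following Python sketch. The class `ReplyEncoder` and its method names are hypothetical; the sketch only assumes that replies are ASCII until the client issues LANG and UTF-8 afterwards.

```python
class ReplyEncoder:
    """Encode reply lines, switching from ASCII to UTF-8 once LANG has been used.

    Hypothetical class: the name and methods are illustrative only.
    """

    def __init__(self):
        self.lang_used = False

    def on_lang(self):
        # Called when the client issues a LANG command in this session.
        self.lang_used = True

    def encode(self, text):
        # Before LANG, non-ASCII text raises UnicodeEncodeError instead of going on the wire.
        charset = "utf-8" if self.lang_used else "ascii"
        return (text + "\r\n").encode(charset)
```

Keeping the switch in one place means a localized reply cannot be sent by accident to a client that never asked for it.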

Others may still favor 2.2; if so, they should speak up.

best,
   john

	
ned+ima | 26 May 2012 15:59

Re: LANG, UTF-8, and POP3

> Mixing the principle of making minimal changes at this late date
> with Ned's reasoning, I've changed my mind about preferred
> outcome.  I propose that we go with 2.1 and that we change the
> second sentence of the Section 2 discussion of LANG in 5721bis
> to read something like:

> 	"The LANG capability and command permit a POP3 client to
> 	negotiate which language the server uses when sending
> 	human-readable text in replies and changes the character
> 	set used in those replies from ASCII to Unicode encoded
> 	in UTF-8."

This wording seems fine to me - it makes it clear where UTF-8 can
now be used.

> That correctly explains what the LANG capability and command
> actually do up front, rather than burying the UTF-8 part in what
> appears to be almost an offhand comment.

It was sufficiently offhand that I missed it completely.

				Ned
Martin J. Dürst | 29 May 2012 11:58

Re: LANG, UTF-8, and POP3

I'm fine with 2.1. In case we went for 2.2, I'd again want to propose 
that we'd actually not have a separate LANG capability, because 
implementing the LANG command on the server is, as Ned has said, fairly 
trivial, and even more so if it's only a trivial version (i.e. there's 
only a single language to 'choose' from, which is perfectly okay 
according to how I understand the spec).

Regards,   Martin.

On 2012/05/26 22:26, John C Klensin wrote:
> After thinking about this for a few more hours, I've changed my
> mind.  See below.
>
> --On Saturday, May 26, 2012 02:26 -0400 John C Klensin
> <klensin <at> jck.com>  wrote:
>
>> ...
>> (2) Given a "yes, if LANG is used the server must be able to
>> send +OK and -ERR replies in UTF-8 (the charset)" answer, is
>> the best way to enable that:
>>
>> (2.1) Make that a property of LANG, such that using the option
>> authorizes the server to send non-ASCII responses (in UTF-8,
>> the charset)?   (This is Ned's proposal and probably what the
>> existing text was intended to say.)
>>
>> (2.2) Require support for the whole EAI package in order for
>> LANG to be used, i.e., the LANG capability requires the UTF8
>> capability.  (This is my original proposal, consistent with
>> what I thought was the "don't allow implementing pieces and
>> subsets" conclusion about IMAP parameters on the call.)
>>
>> If the preference is 2.1, then does anyone want to review the
>> IMAP parameter question?
>>
>> For the record, I actually don't have a strong preference
>> between 2.1 and 2.2.   I just want to be sure that we make a
>> decision and that whatever decision we make is completely clear
>> and obvious in the final text.
>
> Writing as an individual participant only...
>
> Mixing the principle of making minimal changes at this late date
> with Ned's reasoning, I've changed my mind about preferred
> outcome.  I propose that we go with 2.1 and that we change the
> second sentence of the Section 2 discussion of LANG in 5721bis
> to read something like:
>
> 	"The LANG capability and command permit a POP3 client to
> 	negotiate which language the server uses when sending
> 	human-readable text in replies and changes the character
> 	set used in those replies from ASCII to Unicode encoded
> 	in UTF-8."
>
> That correctly explains what the LANG capability and command
> actually do up front, rather than burying the UTF-8 part in what
> appears to be almost an offhand comment.
>
> Others may still favor 2.2; if so, they should speak up.
>
> best,
>     john
Tony Hansen | 29 May 2012 14:59

Re: LANG, UTF-8, and POP3

On 5/26/2012 9:26 AM, John C Klensin wrote:
> Writing as an individual participant only...
>
> Mixing the principle of making minimal changes at this late date
> with Ned's reasoning, I've changed my mind about preferred
> outcome.  I propose that we go with 2.1 and that we change the
> second sentence of the Section 2 discussion of LANG in 5721bis
> to read something like:
>
> 	"The LANG capability and command permit a POP3 client to
> 	negotiate which language the server uses when sending
> 	human-readable text in replies and changes the character
> 	set used in those replies from ASCII to Unicode encoded
> 	in UTF-8."
>
> That correctly explains what the LANG capability and command
> actually do up front, rather than burying the UTF-8 part in what
> appears to be almost an offhand comment.

I'm happy with this text.

     Tony
Alexey Melnikov | 30 May 2012 13:32

Re: LANG, UTF-8, and POP3

On 29/05/2012 13:59, Tony Hansen wrote:
> On 5/26/2012 9:26 AM, John C Klensin wrote:
>> Writing as an individual participant only...
>>
>> Mixing the principle of making minimal changes at this late date
>> with Ned's reasoning, I've changed my mind about preferred
>> outcome.  I propose that we go with 2.1 and that we change the
>> second sentence of the Section 2 discussion of LANG in 5721bis
>> to read something like:
>>
>>     "The LANG capability and command permit a POP3 client to
>>     negotiate which language the server uses when sending
>>     human-readable text in replies and changes the character
>>     set used in those replies from ASCII to Unicode encoded
>>     in UTF-8."
>>
>> That correctly explains what the LANG capability and command
>> actually do up front, rather than burying the UTF-8 part in what
>> appears to be almost an offhand comment.
>
> I'm happy with this text.

Works for me as well.
Shawn Steele | 30 May 2012 01:07

Re: LANG, UTF-8, and POP3

While, technically I agree that LANG shouldn't depend on UTF8, and I think it's not really needed, I'm happy
with keeping it.  I'm also happy with requiring UTF8 because, while LANG might be easier, certainly UTF8 is
much more important to the entire mail ecosystem and I'd much prefer developers spend time getting UTF8
right.  If that makes one extra app have UTF8 because they wanted LANG, I can live with that :)

So I prefer the 2.2:  Use of LANG requires UTF8 support, because that pushes the message that UTF8 support is important.

-Shawn
Barry Leiba | 30 May 2012 01:49

Re: LANG, UTF-8, and POP3

> So I prefer the 2.2:  Use of LANG requires UTF8 support, because that pushes
> the message that UTF8 support is important.

Sorry: I have to push back on this rather strongly.
We should not be making protocol decisions because we want to push any
messages about importance.  We need to make protocol decisions based
on what's right for the protocol.

Barry
ned+ima | 30 May 2012 01:52

Re: LANG, UTF-8, and POP3

> > So I prefer the 2.2:  Use of LANG requires UTF8 support, because that pushes
> > the message that UTF8 support is important.

> Sorry: I have to push back on this rather strongly.
> We should not be making protocol decisions because we want to push any
> messages about importance.  We need to make protocol decisions based
> on what's right for the protocol.

Although I disagree with Shawn, I wasn't going to comment on it until I
saw this.

Barry, you are of course correct. Pushing decisions about what's important
through protocol decisions is a bad thing, and especially when it results
in a coupling that doesn't need to be there.

				Ned
Shawn Steele | 30 May 2012 02:58

Re: LANG, UTF-8, and POP3

> > Sorry: I have to push back on this rather strongly.
> > We should not be making protocol decisions because we want to push any 
> > messages about importance.  We need to make protocol decisions based 
> > on what's right for the protocol.

> Barry, you are of course correct. Pushing decisions about what's important
> through protocol decisions is a bad thing

In that case LANG shouldn't be there.  It's doing the same thing in reverse.  Either LANG's important for
UTF-8 mail, or it's not.  But it was argued that at this point it should be left in basically for processes,
which seems way worse than for "what's important".

And, although I buy the higher level philosophical point, as devil's advocate isn't every protocol
decision a decision about what's important?  Eg: cool but unimportant things are often just left out. 
Other things get a MUST because they're the right thing to do, not necessarily because there are technical
limitations that force their presence.  For example, 3492 is very permissive, allowing Punycode most
anywhere.  However 6530 discourages that practice in EAI.

If LANG is permitted without UTF-8 mail, then there are a bunch of questions about code pages, encodings,
etc.  Technically they could probably be solved, or ignored, but it works way better with UTF-8 mail.  If a
solution is solved (or ignored) that allows LANG not requiring UTF-8, then I MUST test LANG and no UTF-8 for
interoperability, even if I have no intention in supporting the LANG w/o UTF8 case in my client or server. 
That protocol decision adds overhead and complexity for every application developer for a scenario that
we've all pretty much agreed is not a desirable case.  That means less time fixing real bugs or adding
meaningful extensions to the standard.  So I think it's fair to ask for a baseline of support for new
features, in this case supporting UTF8 when supporting LANG.

Making protocol decisions without considering the total cost of those decisions is also a bad thing.

That said, I may have misused the word "important".  LANG opens up the problems of other scripts and stuff,
which leads to the 3 options mentioned before:  Require UTF8, restrict it to ASCII, or figure out how to
declare/encode stuff in other encodings.  Since our entire set of RFCs is trying to advocate UTF8 over the
current mixed up mail encoding story, the 3rd option seems a non-starter.  So requiring UTF8 for LANG seems
reasonable as the simplest way to get reliable LANG support.

-Shawn
John C Klensin | 30 May 2012 06:20

Re: LANG, UTF-8, and POP3

Speaking personally only.  I think we have a combination of a
misunderstanding or two and a legitimate philosophical
disagreement here.  Mainly...

--On Wednesday, May 30, 2012 00:58 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

>> > Sorry: I have to push back on this rather strongly.
>> > We should not be making protocol decisions because we want
>> > to push any  messages about importance.  We need to make
>> > protocol decisions based  on what's right for the protocol.
> 
>> Barry, you are of course correct. Pushing decisions about
>> what's important through protocol decisions is a bad thing
> 
> In that case LANG shouldn't be there.  It's doing the same
> thing in reverse.  Either LANG's important for UTF-8 mail, or
> it's not.  But it was argued that at this point it should be
> left in basically for processes, which seems way worse than
> for "what's important".

I don't think anyone has argued "for processes".  Some,
including myself, have argued that the time to get rid of (or
not have) LANG was long ago and that, if we are going to make
progress, we should not reopen old issues without compelling
need.  That really isn't a process issue but a somewhat
different management one.  That said, there is a strong case for
removing LANG entirely that has nothing to do with importance.
While "ask the server to do what you want" is always a tempting
design issue, good protocol designs are almost always minimal,
with as few extra options or features as possible.  It is also
almost always better to use a canonical form on the wire and let
the systems at each end convert that to local form as necessary
than to convert an "each client needs to convert from the
canonical form to what it needs" problem into one in which each
server has to support all possible forms to make clients happy.

The case is not that straightforward for POP3 and LANG for two
reasons.  One is that POP3's reply structure is not particularly
friendly to a canonical form approach.  It differs in that
respect from systems that return canonical codes that contain
most of the needed information as part of the effort response.
The other is that the list of preferred languages approach
permits any given server to make its own decisions about what
languages to support with, at least in principle, no pressure to
support everything.  Whether those two characteristics add
enough value to overcome what I believe should be a bias against
complexity and _any_ extra commands and capabilities unless they
are clearly necessary, is a question on which the WG needs to
decide (and, I suggest, decided many months ago).

> And, although I buy the higher level philosophical point, as
> devil's advocate isn't every protocol decision a decision
> about what's important?  Eg: cool but unimportant things are
> often just left out.  Other things get a MUST because they're
> the right thing to do, not necessarily because there are
> technical limitations that force their presence.  For example,
> 3492 is very permissive, allowing Punycode most anywhere.
> However 6530 discourages that practice in EAI.

Right.  However...

> If LANG is permitted without UTF-8 mail, then there are a
> bunch of questions about code pages, encodings, etc.
> Technically they could probably be solved, or ignored, but it
> works way better with UTF-8 mail.  If a solution is solved (or
> ignored) that allows LANG not requiring UTF-8, then I MUST
> test LANG and no UTF-8 for interoperability, even if I have no
> intention in supporting the LANG w/o UTF8 case in my client or
> server.  That protocol decision adds overhead and complexity
> for every application developer for a scenario that we've all
> pretty much agreed is not a desirable case.  That means less
> time fixing real bugs or adding meaningful extensions to the
> standard.  So I think it's fair to ask for a baseline of
> support for new features, in this case supporting UTF8 when
> supporting LANG.

I don't see this problem.  You are making the same assumption
about what LANG specifies that Ned and I did after a few
readings, i.e., that LANG didn't imply/require character set
support at all.  What LANG actually says is that, if it is
enabled, the server gets to send messages associated with the
negotiated language in UTF-8 coding.  So no code page or
encoding issues, just UTF-8.  And your interoperability testing
issue is limited to being sure that a client that enables LANG
can handle receiving UTF-8 responses (presumably not
burdensome).  The server burden is one of keeping different
response tables for each language you choose to support... and
you need not support any at all, or even have such tables, to
conform to the requirements of the spec.

The fact that wasn't crystal-clear is exactly why I proposed new
text and placement to clarify the relationship.
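
As a rough illustration of the server-side burden described above (a
sketch only, not text from the draft): the language tags, the reply key,
the message texts, and the function names below are all hypothetical.
The one property taken from the discussion is that the human-readable
part of a reply is sent in UTF-8 once a language has been negotiated.

```python
# Hypothetical per-language reply tables; tags and texts are illustrative only.
REPLIES = {
    "en": {"bad_login": "authentication failed"},
    "fr": {"bad_login": "échec de l'authentification"},
}
DEFAULT_LANG = "en"


def select_language(requested, supported=REPLIES):
    """Return the requested tag (lowercased) if a table exists for it, else None."""
    tag = requested.lower()  # language tags are matched case-insensitively
    return tag if tag in supported else None


def err_reply(key, lang=DEFAULT_LANG):
    """Build a -ERR line; the human-readable text is always encoded as UTF-8."""
    text = REPLIES[lang][key]
    return ("-ERR " + text + "\r\n").encode("utf-8")
```

A server with a single table would still fit this shape: select_language
would simply accept one tag and reject the rest.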

> Making protocol decisions without considering the total cost
> of those decisions is also a bad thing.

Agreed.

> That said, I may have misused the word "important".  LANG
> opens up the problems of other scripts and stuff, which leads
> to the 3 options mentioned before:  Require UTF8, restrict it
> to ASCII, or figure out how to declare/encode stuff in other
> encodings.  Since our entire set of RFCs is trying to advocate
> UTF8 over the current mixed up mail encoding story, the 3rd
> option seems a non-starter.  So requiring UTF8 for LANG seems
> reasonable as the simplest way to get reliable LANG support.

And that is what the current text says and the clarification
reinforces.

best,
    john
Shawn Steele | 30 May 2012 07:05

Re: LANG, UTF-8, and POP3

Well, regardless of the methodology, it sounds like we're in violent agreement with the result :)
John C Klensin | 30 May 2012 12:50

Re: LANG, UTF-8, and POP3


--On Wednesday, May 30, 2012 05:05 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

> Well, regardless of the methodology, it sounds like we're in
> violent agreement with the result :)

At this stage, I think that is what is important.

   john
ned+ima | 30 May 2012 06:50

Re: LANG, UTF-8, and POP3

> > > Sorry: I have to push back on this rather strongly.
> > > We should not be making protocol decisions because we want to push any
> > > messages about importance.  We need to make protocol decisions based
> > > on what's right for the protocol.

> > Barry, you are of course correct. Pushing decisions about what's important
> > through protocol decisions is a bad thing

> In that case LANG shouldn't be there.  It's doing the same thing in reverse. 

It does nothing of the sort.

> Either LANG's important for UTF-8 mail, or it's not.

You're once again falling into the trap Barry warned about. This is about
creating standards with the greatest possible utility, not about trying to
assess importance, or worse, trying to use standards language to force a notion
of what we happen to think is important down people's throats.

Given the limitations of POP error codes, by providing the ability to select a
language and return responses in UTF-8, LANG provides a better client
internationalization experience. This improvement is actually largely
independent of whether the UTF8 capability is used to provide support for
downloading EAI messages, especially since email already has significant
internationalization facilities.

This argues strongly for it to be possible to implement the LANG capability
without also requiring the ability to use the UTF8 capability.

> But it was argued that at this point it should be left in basically for
> processes, which seems way worse than for "what's important".

That certainly isn't the point I've been making. This has essentially nothing
to do with process and everything to do with the utility of the outcome.

> And, although I buy the higher level philosophical point, as devil's advocate
> isn't every protocol decision a decision about what's important?

Nope. It's about utility. Very different.

> Eg: cool but unimportant things are often just left out.  Other things get a
> MUST because they're the right thing to do, not necessarily because there are
> technical limitations that force their presence.  For example, 3492 is very
> permissive, allowing Punycode most anywhere.  However 6530 discourages that
> practice in EAI.

That's the result of a technical assessment. (It happens to be one I actually
disagree with, but I accept that it's the consensus of the group.)

> If LANG is permitted without UTF-8 mail, then there are a bunch of questions
> about code pages, encodings, etc.

Nobody is saying that LANG should provide the ability to produce responses in
any charset other than UTF-8. There's only one code page and one encoding on
the table.

You appear to be confusing the UTF8 *capability*, which provides the
ability to download EAI messages in POP, with the UTF-8 *charset*. When
people talk about LANG requiring UTF8, they are talking about
coupling the capabilities with each other. They are most definitely not
talking about allowing LANG to return responses in charsets other than UTF-8.
That's completely off in left field.
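
To make the capability/charset distinction concrete, here is a minimal
sketch of a client reading a capability listing. It assumes the usual
POP3 multi-line response shape (a +OK status line, one capability per
line, a lone "." terminator) and uses the tags LANG and UTF8 as they are
named in this thread; the function name and the sample listing are
hypothetical.

```python
def parse_capa(lines):
    """Map each advertised capability tag to its list of parameters."""
    caps = {}
    for line in lines:
        if line == ".":            # end of the multi-line response
            break
        if line.startswith("+OK") or not line.strip():
            continue               # skip the status line and blank lines
        tag, *params = line.split()
        caps[tag.upper()] = params
    return caps


# Hypothetical server that offers LANG but not the UTF8 capability.
caps = parse_capa(["+OK Capability list follows", "TOP", "LANG", "."])
lang_offered = "LANG" in caps   # replies may be localized, in the UTF-8 charset
utf8_offered = "UTF8" in caps   # separate question: EAI message download
```

The two booleans are independent of each other, which is the whole
point: the charset used for LANG replies is fixed, while the UTF8
capability is a separate item in the list.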

> Technically they could probably be solved, or ignored, but it works way better
> with UTF-8 mail.  If a solution is solved (or ignored) that allows LANG not
> requiring UTF-8,

See above. This is effectively a strawman. All we are saying is that servers
should be allowed to support the LANG *capability* without also requiring
support for the UTF8 *capability*.

> then I MUST test LANG and no UTF-8 for interoperability, even if I have no
> intention in supporting the LANG w/o UTF8 case in my client or server.

Again, this is not about being able to use other charsets with LANG. As for the
UTF8 capability, whether or not we require servers to support it if they
support LANG has no impact on your test cases, for several reasons:

(1) Nothing about server-side requirements has any impact on what clients may 
    do - a client may choose to activate neither capability, LANG by itself,
    UTF8 by itself, or both. So if you implement both, your test cases have
    to cover all of the combinations in order to account for all possible
    client behavior.

(2) If as you say, your client is going to require both capabilities, then the
    only server it is going to work with is one that offers both extensions,
    and that's the only case you need to test. This means you may not work
    with some servers, but that's your choice as an implementor to make.

(3) The only place these capabilities actually overlap would be if there are
    internationalized error responses specific to UTF8 capability. That's
    not much overlap, so even if you decide to support one without the other
    in your client, it's not a significant test burden since the tests are
    mostly independent of each other.
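
The combinations in points (1) and (2) can be enumerated mechanically.
This is only a sketch: the function names are hypothetical, and it
models a capability as either activated or not, nothing more.

```python
from itertools import product

CAPABILITIES = ("LANG", "UTF8")


def client_combinations(capabilities=CAPABILITIES):
    """Every subset of capabilities a client might activate (point 1)."""
    combos = []
    for flags in product((False, True), repeat=len(capabilities)):
        combos.append(frozenset(c for c, on in zip(capabilities, flags) if on))
    return combos


def usable(client_needs, server_offers):
    """A client can work with a server that offers everything it needs (point 2)."""
    return client_needs <= server_offers
```

A server that implements both extensions sees all four client subsets
whether or not the spec couples the capabilities, so coupling them does
not shrink that server's test matrix.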

> That protocol decision adds overhead and complexity for every application
> developer for a scenario that we've all pretty much agreed is not a desirable case.

It does no such thing. See above.

> That means less time fixing real bugs or adding meaningful extensions to the
> standard.  So I think it's fair to ask for a baseline of support for new
> features, in this case supporting UTF8 when supporting LANG.

I'm afraid as a result of an apparent misunderstanding of the discussion, your
analysis is flawed and therefore your conclusions are erroneous.

> Making protocol decisions without considering the total cost of those
> decisions is also a bad thing.

> That said, I may have misused the word "important".  LANG opens up the
> problems of other scripts and stuff, which leads to the 3 options mentioned
> before:  Require UTF8, restrict it to ASCII, or figure out how to
> declare/encode stuff in other encodings.

> Since our entire set of RFCs is trying to advocate UTF8 over the current mixed
> up mail encoding story, the 3rd option seems a non-starter.  So requiring UTF8
> for LANG seems reasonable as the simplest way to get reliable LANG support.

Again, since nobody is proposing anything like that, this amounts to nothing
but a strawman.

				Ned
Shawn Steele | 30 May 2012 09:52

Re: LANG, UTF-8, and POP3

> This is about creating standards with the greatest possible utility, not about trying to
> assess importance, or worse, trying to use standards language to force a notion of 
> what we happen to think is important down people's throats.

Features is not necessarily == utility.  Standards, by necessity, "force what the authors happen to think
is important down people's throats."  No matter how excellent the standard, people can be found that
disagree with that standard, either in large parts or small parts.

Standards are limited to what the authors think is important.  (If they didn't think it was important, they
wouldn't waste time including it in the standard).  Hopefully there is consensus and a great process
driving those decisions.  I think we've done a decent job.

For example, if we were striving for "greatest possible utility", I'd suggest that a more flexible
approach would have been to not create "SMTPUTF8", but rather "SMTPCP", with a mechanism for selecting
from an unlimited set of code pages.  We all decided that UTF8 and consistency was more important for EAI
than allowing people to pick their favorite code page.  We sacrificed potential utility for the
uniformity we perceived was more important.  Someone stuck maintaining a system with some awkward code
page may well have preferred if we'd come to a different conclusion.

Standards, IMO, are around to create interoperable systems.  Many permitted variations make the systems
more complex and hinder the interoperability.  So, IMO, it's completely fair to require UTF8 for LANG in
order to reduce the complexity of the system.  I do not think that allowing additional code pages,
particularly since it would introduce ambiguity or some complex declaration system, would provide more
"utility".  In fact I think it'd be less useful than a LANG that was unambiguous in its encoding.

But I'd like to point out that it shouldn't be "forcing what the authors think is important down people's
throats".  It should be what the WG has decided is realistic and reasonable for the entire ecosystem and the
long term goals of the standard and related standards.  IMO EAI cannot succeed without widespread UTF8
adoption, so adding things to an EAI standard that encourage deviating from UTF8 could only delay EAI
adoption, conflicting with the core goal of EAI adoption that this WG is trying to achieve.

Anyway, this philosophical discussion is sort of orthogonal to what we're doing here.  I'm fine with
what John's been proposing, though there seems to be some disagreement since John just said "That is what
the current text says" and Ned said "nobody is proposing anything like that" :)

-Shawn
John C Klensin | 30 May 2012 13:51

Re: LANG, UTF-8, and POP3

Shawn,

Two added comments, just because the design philosophy
discussion is interesting (at least to me).  Again, speaking
personally only.

--On Wednesday, May 30, 2012 07:52 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

>...
> For example, if we were striving for "greatest possible
> utility", I'd suggest that a more flexible approach would have
> been to not create "SMTPUTF8", but rather "SMTPCP", with a
> mechanism for selecting from an unlimited set of code pages.
> We all decided that UTF8 and consistency was more important
> for EAI than allowing people to pick their favorite code page.
> We sacrificed potential utility for the uniformity we
> perceived was more important.

While you may see that in terms of importance, I see it as
maximizing interoperability by permitting only one form on the
wire.  One would see greater utility only if either all servers
or all clients supported all possible code pages, which is a
near-impossibility.   If not, an unlimited number of code pages
approach would require every client to support both its
preferred code page(s) and one or more that it was pretty sure
everyone would support so it could convert.  (As
standards-writers, we could have made that choice easier by
making something mandatory to implement; that something would
probably be UTF-8).  The result would be that a client
implementer would need to support its preferred code page(s),
UTF-8, and conversion between UTF-8 and one or more of those
code pages.  That is neither improved functionality nor
reduced implementation complexity.  It enables a performance
optimization for clients that run on operating systems that
prefer non-Unicode internally, but, given that these reply
strings tend to be very short, that optimization is trivial.  Of
course, if there is no mandatory to implement code page (other
than ASCII) on the server, and both server and client optimize
flexibility by supporting only their own code pages,
"flexibility" would translate into an interoperability horror.

For context, I believe we would not have adopted the multiple
code page/encoding "charset" approach with text/plain had it
been clear that Unicode and UTF-8 would be available and
successful.  That was not at all obvious in 1992.  Even if it
had been, UTF-8 support wasn't even near widely deployed and
supported enough to make trying to force its use (even on the
wire) plausible.  (I note that this wasn't long after the
emerging HTML community effectively decided that, if they used
8859-1, they would be finished with i18n problems -- we at least
knew that was a non-starter.)

> Someone stuck maintaining a
> system with some awkward code page may well have preferred if
> we'd come to a different conclusion.

Maybe.  But, if they wanted to interoperate with others, I don't
know how strong the preference would have been given that they
would have had to support mapping to and from UTF-8 for that
code page anyway.

> Standards, IMO, are around to create interoperable systems.
> Many permitted variations make the systems more complex and
> hinder the interoperability.  So, IMO, it's completely fair to
> require UTF8 for LANG in order to reduce the complexity of the
> system.  I do not think that allowing additional code pages,
> particularly since it would introduce ambiguity or some
> complex declaration system, would provide more "utility".  In
> fact I think it'd be less useful than a LANG that was
> unambiguous in its encoding.

Exactly.

> But I'd like to point out that it shouldn't be "forcing what
> the authors think is important down people's throats".  It
> should be what the WG has decided is realistic and reasonable
> for the entire ecosystem and the long term goals of the
> standard and related standards.  IMO EAI cannot succeed
> without widespread UTF8 adoption, so adding things to an EAI
> standard that encourage deviating from UTF8 could only delay
> EAI adoption, conflicting with the core goal of EAI adoption
> that this WG is trying to achieve.

Again, no one proposed LANG with arbitrary code pages.  My
initial reading of the spec was that it didn't say anything at
all on the subject, which meant that we either needed to specify
that it used UTF-8 coding on replies or to descend into
"ambiguity or some complex declaration system".  This discussion
has been entirely about exactly how the "if you are going to
enable LANG, replies MUST be in UTF-8" condition was going to be
specified.

> Anyway, this is a philosophical discussion is sort of
> orthogonal to what we're doing here.  I'm fine with what
> John's been proposing, though there seems to be some
> disagreement since John just said "That is what the current
> text says" and Ned said "nobody is proposing anything like
> that" :)

I actually don't think so.  A really careful reading of the text
actually turns up the requirement for replies in UTF-8 so my
suggested modified text clarifies the situation but doesn't
actually change anything.  After another look at the I-D this
morning, I believe that we should also change the abstract and
possibly even the title, to clarify that two separate and
orthogonal capabilities are being specified.  But even if the WG
agreed to those changes, they would be clarifications to text to
reduce confusion and increase the odds of correct
implementations, not alterations to the spec that would affect
any correct implementation.

   john
ned+ima | 30 May 2012 16:17

Re: LANG, UTF-8, and POP3

> > This is about creating standards with the greatest possible utility, not about trying to
> > assess importance, or worse, trying to use standards language to force a notion of
> > what we happen to think is important down people's throats.

> Features is not necessarily == utility.  Standards, by necessity, "force what
> the authors happen to think is important down people's throats."  No matter how
> excellent the standard, people can be found that disagree with that standard,
> either in large parts or small parts.

Of course features are not equivalent to utility. That's exactly the point of
this discussion. You're arguing for what amounts to a feature - coupling of
LANG to UTF8. I'm arguing that the feature has considerable negative utility,
not having it has negligible cost, and therefore it has no business being
there.

> Standards are limited to what the authors think is important.  (If they
> didn't think it was important, they wouldn't waste time including it in the
> standard).  Hopefully there is consensus and a great process driving those
> decisions.  I think we've done a decent job.

I actually disagree, but that's irrelevant at this point.

> For example, if we were striving for "greatest possible utility", I'd suggest
> that a more flexible approach would have been to not create "SMTPUTF8", but
> rather "SMTPCP", with a mechanism for selecting from an unlimited set of code
> pages.

Sorry, that's just flat out incorrect. There may have been a time when being
able to use multiple charsets resulted in an overall increase in utility
because of lack of widespread support for UTF-8, but that time is long since
past. It's especially true given the total hashup that's been made of various
CJK charsets.

And there's no way you can demand that servers be able to supply all possible
code pages. If you made that a requirement, it would either be ignored or
nobody would implement the extension. That's just reality.

> We all decided that UTF8 and consistency was more important for EAI than
> allowing people to pick their favorite code page.

No, we decided that having something that was implementable was more
important than having something that was not.

> We sacrificed potential utility for the uniformity we perceived was more
> important.  Someone stuck maintaining a system with some awkward code page may
> well have preferred if we'd come to a different conclusion.

Nope. You're comparing something that's implementable with something that is
not. There is no utility in something that will never see the light of day.

> Standards, IMO, are around to create interoperable systems.  Many permitted
> variations make the systems more complex and hinder the interoperability.  So,
> IMO, it's completely fair to require UTF8 for LANG in order to reduce the
> complexity of the system. 

You have not provided the slightest scintilla of evidence that separating the
two actually introduces significant added implementation complexity. I, OTOH,
have analyzed the situation fairly carefully and shown that the added
complexity is somewhere between nonexistent and negligible.

> I do not think that allowing additional code pages, particularly since it
> would introduce ambiguity or some complex declaration system, would provide
> more "utility".

And once again you're arguing a strawman. For the last time, nobody is arguing
for requiring support of additional code pages. That's not part of any
proposal anyone has made. Read the proposed text if you don't believe me.

> In fact I think it'd be less useful than a LANG that was unambiguous in its
> encoding.

Please explain why there's anything ambiguous about requiring the use of UTF-8.

> But I'd like to point out that it shouldn't be "forcing what the authors
> think is important down people's throats".  It should be what the WG has
> decided is realistic and reasonable for the entire ecosystem and the long term
> goals of the standard and related standards.  IMO EAI cannot succeed without
> widespread UTF8 adoption, so adding things to an EAI standard that encourage
> deviating from UTF8 could only delay EAI adoption, conflicting with the core
> goal of EAI adoption that this WG is trying to achieve.

You need to stop piling up the strawmen. There is nothing about allowing
LANG to be used separately from UTF8 that will discourage use of the UTF8
capability.

> Anyway, this is a philosophical discussion is sort of orthogonal to what
> we're doing here.  I'm fine with what John's been proposing, though there seems
> to be some disagreement since John just said "That is what the current text
> says" and Ned said "nobody is proposing anything like that" :)

John has apparently missed all your claims about how the revised LANG proposal
introduces the need to support additional code pages.

				Ned
Shawn Steele | 30 May 2012 18:40

Re: LANG, UTF-8, and POP3

> Sorry, that's just flat out incorrect. There may have been a time when 
> being able to use multiple charsets resulted in an overall increase in utility
> because of lack of widespread support for UTF-8, but that time is long 
> since past. It's especially true given the total hashup that's been made of
> various CJK charsets.

I was being facetious.   John said it wasn't "importance" but "maximizing interoperability" that led to
this decision.  I think that it is important that interoperability be maximized, so that seems like
picking nits to me.

And now I'm really confused, if code pages are so evil, why not require LANG to respond in UTF8?  (Which is what
I understand John is trying to agree on language for.)  The WG seems agreed that there's about 1 language
that ASCII is sufficient for, so it seems like having localized responses would immediately require a
non-ASCII response.

IMO if LANG replies MUST be in UTF-8, then you're a large part of the way towards supporting UTF8 already? 
Like, how can you reply in UTF8 if you haven't negotiated a UTF8 connection?  If I use LANG w/o using UTF8,
then how does that work?  That's where I'm a bit confused.

-Shawn
ned+ima | 30 May 2012 18:52

Re: LANG, UTF-8, and POP3

> And now I'm really confused, if code pages are so evil, why not require LANG
> to respond in UTF8?

I dislike shouting, but it seems to be in order here:

THAT'S EXACTLY WHAT IS BEING PROPOSED. ABSOLUTELY NOBODY IS PROPOSING ALLOWING
ANYTHING OTHER THAN UTF-8 WHEN LANG IS USED. YOU ARE MAKING UP ISSUES WHERE
NONE EXIST.

*Please* read the text. I defy you to find even the slightest mention of
allowing any other charset, any other code page, any other encoding.

LANG does two things:

(1) It allows the server to respond in UTF-8 instead of being restricted to ASCII.
(2) It allows the client to request responses in a different language, on the
    condition that the server is prepared to respond in that language.

That's it. There is nothing else there. When we talk about the coupling of LANG
with UTF8 (note the spelling - that's *not* a charset), we're talking about
there being a connection between supporting LANG and supporting the UTF8
capability, which enables client access to EAI messages without downgrading.
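
A rough client-side sketch of point (1), under stated assumptions:
PopSession and its method names are hypothetical, the sample reply
text is invented, and no socket handling is shown.  It only
illustrates that an accepted LANG switches response-line decoding
from ASCII to UTF-8, and that nothing else about the session changes.

```python
class PopSession:
    """Hypothetical holder for per-connection reply decoding state."""

    def __init__(self) -> None:
        self.lang_accepted = False

    def read_reply(self, raw: bytes) -> str:
        """Decode one status line using the charset currently in force."""
        charset = "utf-8" if self.lang_accepted else "ascii"
        return raw.rstrip(b"\r\n").decode(charset)

    def on_lang_reply(self, raw: bytes) -> str:
        """Handle the reply to a LANG command.

        The reply to LANG is itself decoded as UTF-8 (harmless for an
        ASCII -ERR line); a +OK switches later replies to UTF-8.
        """
        text = raw.rstrip(b"\r\n").decode("utf-8")
        if text.startswith("+OK"):
            self.lang_accepted = True
        return text


session = PopSession()
greeting = session.read_reply(b"+OK POP3 ready\r\n")            # ASCII only
lang_ok = session.on_lang_reply("+OK fr Français\r\n".encode("utf-8"))
later = session.read_reply("+OK terminé\r\n".encode("utf-8"))   # now UTF-8
```

Nothing in the sketch touches message content or mailbox access;
that is the business of the separate UTF8 capability.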

> (Which is what I understand John is trying to agree on language for.)

And your understanding is simply incorrect.

> The WG seems agreed that there's about 1 language that ASCII is sufficient
> for, so it seems like having localized responses would immediately
> require a non-ASCII response.

Which is why LANG is useful.

> IMO if LANG replies MUST be in UTF-8, then you're a large part of the way
>  towards supporting UTF8 already?

No you are not. If you believe that, you need to look at the various
downgrade specifications a *lot* more carefully.

> Like, how can you reply in UTF8 if you haven't negotiated a UTF8 connection? 
> If I use LANG w/o using UTF8, then how does that work?  That's where I'm a bit
> confused.

See above. These are separate things.

				Ned
John C Klensin | 30 May 2012 20:11

Re: LANG, UTF-8, and POP3


--On Wednesday, May 30, 2012 16:40 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

> And now I'm really confused, if code pages are so evil, why
> not require LANG to respond in UTF8?  (Which is what I
> understand John is trying to agree on language for.)  The WG
> seems agreed that there's about 1 language that ASCII is
> sufficient for, so it seems like having localized responses
> would immediately require a non-ASCII response.

Please read the spec.  In particular, the introduction to
draft-ietf-eai-rfc5721bis-04 says (second paragraph of Section
1): 

	"a mechanism to support UTF-8 characters in protocol
	level response strings as well as the ability to
	negotiate a language for such response strings."

and, in the third paragraph of the "Discussion" in Section 2:

	"This and subsequent protocol-level human-readable text
	is encoded in the UTF-8 charset."

That is really fairly clear and certainly doesn't suggest
returning anything but UTF-8 strings.  It has, IMO, only one
problem, which is that the two pieces of text are a little bit
buried and sound narrative rather than normative.   My proposed
fixes -- the modification to the actual description of LANG and
the suggestion that the WG consider changes to the title and/or
abstract -- simply clarify and highlight those statements; they
do not change the protocol spec in any way.

> IMO if LANG replies MUST be in UTF-8, then you're a large part
> of the way towards supporting UTF8 already?  Like, how can you
> reply in UTF8 if you haven't negotiated a UTF8 connection?  If
> I use LANG w/o using UTF8, then how does that work?  That's
> where I'm a bit confused.

Sending the LANG command and having it accepted constitutes
negotiating a connection over which UTF-8 (the charset, not
the other Capability in this spec) will be sent.  See above.
Given that, I'm not sure what you are talking about, so one of
us is probably _very_ confused.  I don't think it is me (or Ned
or Barry).

best,
    john
Shawn Steele | 30 May 2012 20:59

Re: LANG, UTF-8, and POP3

I'm happy with it, thanks :)

