Re: LANG, UTF-8, and POP3
Shawn Steele <Shawn.Steele <at> microsoft.com>
2012-05-30 05:05:46 GMT
Well, regardless of the methodology, it sounds like we're in violent agreement with the result :)
-----Original Message-----
From: John C Klensin [mailto:klensin <at> jck.com]
Sent: Tuesday, May 29, 2012 9:20 PM
To: Shawn Steele; Ned Freed; Barry Leiba
Cc: ima <at> ietf.org
Subject: Re: [EAI] LANG, UTF-8, and POP3
Speaking personally only. I think we have a combination of a misunderstanding or two and a legitimate
philosophical disagreement here. Mainly...
--On Wednesday, May 30, 2012 00:58 +0000 Shawn Steele <Shawn.Steele <at> microsoft.com> wrote:
>> > Sorry: I have to push back on this rather strongly.
>> > We should not be making protocol decisions because we want to push
>> > any messages about importance. We need to make protocol decisions
>> > based on what's right for the protocol.
>
>> Barry, you are of course correct. Pushing decisions about what's
>> important through protocol decisions is a bad thing
>
> In that case LANG shouldn't be there. It's doing the same thing in
> reverse. Either LANG's important for UTF-8 mail, or it's not. But it
> was argued that at this point it should be left in basically for
> processes, which seems way worse than for "what's important".
I don't think anyone has argued "for processes". Some, including myself, have argued that the time to get
rid of (or not have) LANG was long ago and that, if we are going to make progress, we should not reopen old
issues without compelling need. That really isn't a process issue but a somewhat different management
one. That said, there is a strong case for removing LANG entirely that has nothing to do with importance.
While "ask the server to do what you want" is always a tempting design approach, good protocol designs are
almost always minimal, with as few extra options or features as possible. It is also almost always better
to use a canonical form on the wire and let the systems at each end convert that to local form as necessary
than to convert an "each client needs to convert from the canonical form to what it needs" problem into one
in which each server has to support all possible forms to make clients happy.
The case is not that straightforward for POP3 and LANG for two reasons. One is that POP3's reply structure is
not particularly friendly to a canonical form approach. It differs in that respect from systems that
return canonical codes that contain most of the needed information as part of the error response.
The other is that the list of preferred languages approach permits any given server to make its own
decisions about what languages to support with, at least in principle, no pressure to support
everything. Whether those two characteristics add enough value to overcome what I believe should be a
bias against complexity and _any_ extra commands and capabilities unless they are clearly necessary is
a question that the WG needs to decide (and, I suggest, did decide many months ago).
> And, although I buy the higher level philosophical point, as devil's
> advocate isn't every protocol decision a decision about what's
> important? Eg: cool but unimportant things are often just left out.
> Other things get a MUST because they're the right thing to do, not
> necessarily because there are technical limitations that force their
> presence. For example, RFC 3492 is very permissive, allowing Punycode
> most anywhere. However, RFC 6530 discourages that practice in EAI.
Right. However...
> If LANG is permitted without UTF-8 mail, then there are a bunch of
> questions about code pages, encodings, etc.
> Technically they could probably be solved, or ignored, but it works
> way better with UTF-8 mail. If the problem is solved (or ignored)
> in a way that lets LANG work without UTF-8, then I MUST test LANG
> without UTF-8 for interoperability, even if I have no intention of
> supporting the LANG w/o UTF8 case in my client or server. That
> protocol decision adds overhead and complexity for every application
> developer for a scenario that we've all pretty much agreed is not a
> desirable case. That means less time fixing real bugs or adding
> meaningful extensions to the standard. So I think it's fair to ask
> for a baseline of support for new features, in this case supporting
> UTF8 when supporting LANG.
I don't see this problem. You are making the same assumption about what LANG specifies that Ned and I did
after a few readings, i.e., that LANG didn't imply/require character set support at all. What LANG
actually says is that, if it is enabled, the server gets to send messages associated with the negotiated
language in UTF-8 coding. So no code page or encoding issues, just UTF-8. And your interoperability
testing issue is limited to being sure that a client that enables LANG can handle receiving UTF-8
responses (presumably not burdensome). The server burden is one of keeping different response tables
for each language you choose to support... and you need not support any at all, or even have such tables, to
conform to the requirements of the spec.
The fact that this wasn't crystal clear is exactly why I proposed new text and placement to clarify the relationship.
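To make the division of labor described above concrete, here is a minimal sketch in Python. It is not taken from the draft: the table contents, the language tags, and the function names (select_language, error_line, client_decode) are all hypothetical and chosen only for illustration. It shows a server that keeps one response table per language it chooses to support and always emits UTF-8, and a client whose only added obligation is decoding UTF-8.

```python
# Hypothetical per-language response tables. The tags and message texts are
# illustrative assumptions, not wording from any specification.
RESPONSES = {
    "en": {"bad_login": "invalid credentials"},
    "fr": {"bad_login": "identifiants invalides"},
    "de": {"bad_login": "ungültige Anmeldedaten"},
}
DEFAULT_LANG = "en"


def select_language(requested):
    """Return the requested tag if the server has a table for it, else None."""
    tag = requested.lower()
    return tag if tag in RESPONSES else None


def error_line(key, lang=DEFAULT_LANG):
    """Build a '-ERR' reply line for the given language, encoded as UTF-8."""
    table = RESPONSES.get(lang, RESPONSES[DEFAULT_LANG])
    return ("-ERR " + table[key] + "\r\n").encode("utf-8")


def client_decode(line):
    """Client side: decode a reply as UTF-8 (strict decoding raises otherwise)."""
    return line.decode("utf-8").rstrip("\r\n")
```

A server that supports no additional languages would simply carry the default table, which matches the point that no server is obliged to maintain translations. The client never needs to know about code pages: whatever language was negotiated, the bytes it receives are UTF-8.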
> Making protocol decisions without considering the total cost of those
> decisions is also a bad thing.
Agreed.
> That said, I may have misused the word "important". LANG opens up the
> problems of other scripts and stuff, which leads to the 3 options
> mentioned before: Require UTF8, restrict it to ASCII, or figure out
> how to declare/encode stuff in other encodings. Since our entire set
> of RFCs is trying to advocate
> UTF8 over the current mixed up mail encoding story, the 3rd option
> seems a non-starter. So requiring UTF8 for LANG seems reasonable as
> the simplest way to get reliable LANG support.
And that is what the current text says and the clarification reinforces.
best,
john