Keith Moore | 1 Jan 2004 07:34

if you really want utf-8 headers...


Okay, I still see zero justification for utf-8 headers.  The 
improvement in transmission and storage efficiency is minuscule. They 
make both user agents and mail transports more complex and less 
reliable, because MTAs need to have conversion code (which will break 
messages and cause delivery failures) and UAs need to be able to handle 
old messages that use RFC 2047 (resulting in multiple code paths and 
additional failure modes).

(That, and they don't address the problem that this group is trying to 
solve...)

But if you believe that the very long term benefit of utf-8 headers (by 
which I mean that whatever benefit might result from using utf-8 - and 
it's by no means certain - won't be realized for a very long time) 
somehow outweighs the very high near-term cost, then may I suggest that 
the place to do the upgrade and negotiation is not in the mail 
transport, but at the message store and message submission.

That is, the major benefit of using utf-8 headers would be to make life 
easier for user agents and IMAP servers (for searching).  They don't 
benefit the transport at all.  But I could imagine POP and IMAP options 
that said "give me utf-8 headers instead of headers with RFC 2047 
and/or IMAAs in them", and I could imagine simplified UAs that would 
only talk to POP and IMAP servers that implemented that option.  (I'd 
hate the lack of interoperability between new simplified UAs and old 
POP and IMAP servers, but there's already some precedent for UAs 
insisting on nonstandard or optional features in POP and IMAP.)

Message stores could implement this in a variety of ways - they could 

Martin Duerst | 5 Jan 2004 21:10

Re: if you really want utf-8 headers...


At 01:34 04/01/01 -0500, Keith Moore wrote:

>Universal adoption of IMAAs is anything but assured.  The largest age 
>group of the world population is fairly young (say, less than 21 years 
>old).  Many of these people have grown up with cheap travel and good 
>communications, and an international popular culture.   They are used to 
>dealing with people from other countries, and in multiple languages. Many 
>of these people may find that  IMAAs don't benefit them so much and that 
>it's easier to get all email at an ASCII address (or for that matter at an 
>E164 number using ENUM) than it is to deal with IMAAs.

There are definitely a lot of young people who travel cheaply.
But there are also a lot of people who can't afford to travel,
but who still might be able to use a computer at some public
place. And while we can't do anything to make travel cheaper,
we can work on making computers easier to use, for everybody.

Regards,   Martin.

Keith Moore | 5 Jan 2004 21:29

Re: if you really want utf-8 headers...


> At 01:34 04/01/01 -0500, Keith Moore wrote:
> 
> >Universal adoption of IMAAs is anything but assured.  The largest age
> >group of the world population is fairly young (say, less than 21
> >years old).  Many of these people have grown up with cheap travel
> >and good communications, and an international popular culture.   They
> >are used to dealing with people from other countries, and in multiple
> >languages. Many of these people may find that  IMAAs don't benefit
> >them so much and that it's easier to get all email at an ASCII
> >address (or for that matter at an E164 number using ENUM) than it is
> >to deal with IMAAs.
> 
> There are definitely a lot of young people who travel cheaply.
> But there are also a lot of people who can't afford to travel,
> but who still might be able to use a computer at some public
> place. And while we can't do anything to make travel cheaper,
> we can work on making computers easier to use, for everybody.

Indeed, and seen at that level, it's a laudable goal.  The question is
whether it's worth it to upgrade the entire email infrastructure in
order to provide the specific feature of IMAs, when those IMAs might not
be widely used.  Especially given that
a) we can provide the same service without such an expensive upgrade, and
probably without nearly so much disruption, or
b) we could use the expensive upgrade to drastically improve email 
service in many other ways than just to provide IMAs.


Martin Duerst | 6 Jan 2004 21:08

Re: if you really want utf-8 headers...


At 15:29 04/01/05 -0500, Keith Moore wrote:

>Indeed, and seen at that level, it's a laudable goal.  The question is
>whether it's worth it to upgrade the entire email infrastructure in
>order to provide the specific feature of IMAs, when those IMAs might not
>be widely used.  Especially given that
>a) we can provide the same service without such an expensive upgrade, and
>probably without nearly so much disruption, or

Even potentially without an SMTP extension, I think that defining
a new header format and a way to down- and upgrade is important,
because only this will allow things such as simplified clients
based on upgraded delivery agents,... And once we are there,
defining an SMTP extension isn't really a big deal, although
adoption may take quite a while.

>b) we could use the expensive upgrade to drastically improve email
>service in many other ways than just to provide IMAs.

If it were up to you alone to choose, what would you do in that upgrade?

Regards,    Martin.

Keith Moore | 7 Jan 2004 01:10

Re: if you really want utf-8 headers...


> Even potentially without an SMTP extension, I think that defining
> a new header format and a way to down- and upgrade is important,
> because only this will allow things such as simplified clients
> based on upgraded delivery agents,...

I think we should think in terms of a new message format, not merely a 
new header format, because MIME is so complex and so irregular (some 
would say baroque) that the amount of simplification you get from 
having utf-8 in message headers is only a small part of that which 
could be gained.  even if you have utf-8 transparency, mail readers 
still need to know how to parse address fields, normalize/canonicalize 
addresses, look up IDNs, etc.  you still need to deal with RFC 2047 in 
received messages and old messages.  you still have to support a 
different syntax for each different header or bodypart field.  compared 
to all of this cruft, the extra overhead required to translate 
addresses between ACE and raw UTF-8 is minimal.
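
To give a sense of how small the ACE translation step is for the domain part, here is a hedged sketch using Python's built-in "idna" codec, which implements the IDNA ToASCII/ToUnicode operations of RFC 3490. The function names are hypothetical. The local-part is deliberately left untouched, since the IMAA encoding for local-parts is not specified in this thread and is not assumed here.

```python
def domain_to_ace(address: str) -> str:
    """Convert only the domain of user@domain to its ACE (xn--) form."""
    local, _, domain = address.rpartition("@")
    # str.encode("idna") applies ToASCII to each dot-separated label.
    return f"{local}@{domain.encode('idna').decode('ascii')}"


def domain_from_ace(address: str) -> str:
    """Convert only the domain of user@domain from ACE back to Unicode."""
    local, _, domain = address.rpartition("@")
    # bytes.decode("idna") applies ToUnicode to each label.
    return f"{local}@{domain.encode('ascii').decode('idna')}"


ace_address = domain_to_ace("user@bücher.example")
round_trip = domain_from_ace(ace_address)
```

The conversion itself is a couple of lines per direction; the parsing, canonicalization, and legacy handling listed above are where the remaining complexity sits.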

> And once we are there,
> defining an SMTP extension isn't really a big deal, although
> adoption may take quite a while.

the big deal is the leakage and damage to messages that we can expect 
from the extension.

>> b) we could use the expensive upgrade to drastically improve email
>> service in many other ways than just to provide IMAs.
>
> If it were up to you alone to choose, what would you do in that 
> upgrade?

Dan Oscarsson | 2 Jan 2004 15:28

Re: if you really want utf-8 headers...


John C Klensin wrote

>	(i) Maximizes global interoperability of the mail
>	infrastructure, especially when it is actually important
>	in practice (not just in theory).
>	
>	(ii) Minimizes damage when things leak out of Unicode or
>	local CCS environments.
>	
>	(iii) Avoids driving users and mail systems toward
>	proprietary environments because they provide a better
>	experience.
>
>I think those are our goals, or should be. 

They are goals I have worked for, and they are the reason I only
want ONE way to encode characters. I will come back to it later on.

>Now, it seems to me that there are two main possible models for 
>getting there. 
>
>(1) We accept the conclusion that the proprietary, local 
>CCS (which might be Unicode in UTF-8 or some other form), 
>local-header-definition, systems are out there and are going to 
>be with us forever.  We then view this strictly as a gateway 
>problem.  Given all of our other constraints, that gateway 
>problem is probably best dealt with by encapsulation, e.g.

This is like MIME/IDNS/IMAA - encapsulate non-ASCII inside ASCII

J-F C. (Jefsey) Morfin | 2 Jan 2004 20:45

Re: if you really want utf-8 headers...


At 15:28 02/01/04, Dan Oscarsson wrote:
> From my view gateways are best because then you either are in legacy 
> world or in single character set world.

Unfortunately there is also a real world.

Keith Moore | 2 Jan 2004 17:48

Re: if you really want utf-8 headers...


> A simple clean gateway point between old and new could be to
> define that protocols over IPv6 only use UTF-8. 

this is a really lousy idea, unless you want to make BOTH transitions
(IPv4 -> IPv6 and ascii -> utf-8) more difficult.  it's far easier to 
upgrade one part of the system at a time than to require that you have
to upgrade all of your MTAs, MUAs, gateways, firewalls, etc., at the 
same time you introduce IPv6.

Keith Moore | 2 Jan 2004 17:45

Re: if you really want utf-8 headers...


> Keith has wondered why we need UTF-8 in headers.

No, that's not what I'm wondering at all.  What I'm wondering is why
people think that merely saying "it's okay to use utf-8 in message
headers" will result in reduced complexity, when even a simple analysis
indicates that it will increase complexity of every part of the mail
system.  And I'm wondering why, given that mail is already getting less
and less useful and less and less reliable due to spam and viruses and
the countermeasures for these, that people think that having unencoded
utf-8 in email addresses is somehow worth making email even less
reliable.  We can provide IMAAs with far less disruption and
transition pain without introducing utf-8 in message headers. 

> If we instead agreed on using a single character set (like UCS) and
> a single transfer encoding on the wire (like UTF-8) things would be
> a lot easier! 

No, it wouldn't, because we're still going to need to accept all of
those various character encodings and ACEs in mail from legacy MUAs and
we're still going to need to accept all of those things in old mail.

But that's part of why I suggested making this change first at the MUA
and message store, without burdening mail transport at least initially -
because the message store is in a good position to convert legacy
messages to the new format, and because POP or IMAP provide the means to
allow simplified MUAs to refuse to deal with legacy messages,
and because it doesn't burden the rest of the mail transport in the near
term.


Martin Duerst | 2 Jan 2004 20:11

Re: if you really want utf-8 headers...


At 11:45 04/01/02 -0500, Keith Moore wrote:

>But that's part of why I suggested making this change first at the MUA
>and message store, without burdening mail transport at least initially -
>because the message store is in a good position to convert legacy
>messages to the new format, and because POP or IMAP provide the means to
>allow simplified MUAs to refuse to deal with legacy messages,
>and because it doesn't burden the rest of the mail transport in the near
>term.

I think getting UTF-8 for the message store and for POP and IMAP is
great. But I don't think Paul or John (and the rest of us) are starting
at the wrong end, I just see it as them doing one part of the overall work.
The work on downgrading/upgrading has to be done anyway, and better
only once. And looking at all the pieces has various advantages:

- We make sure that things work together
- We get some increased push on adoption, because people have
   various ways and places to start upgrading. It's often difficult
   to predict in which sequence upgrading will happen; the best thing
   to me seems to have consistent/compatible upgrades available
   across the board.
- We know that adoption won't necessarily be that quick. If we
   think it takes five years or more, we shouldn't start at one
   end and wait for that to be fairly upgraded and then start at
   the other end.
- A simplified MUA is an excellent example of the benefit of streamlining
   to UTF-8. But for that to work, the MUA not only has to be able
   to receive messages with UTF-8 headers, it also has to be able

Keith Moore | 2 Jan 2004 21:32

Re: if you really want utf-8 headers...


> I think getting UTF-8 for the message store and for POP and IMAP is
> great. But I don't think Paul or John (and the rest of us) are starting
> at the wrong end, I just see it as them doing one part of the overall
> work. The work on downgrading/upgrading has to be done anyway, and
> better only once.

I agree that a specification for how to do the conversion is necessary, 
and that it should only be done once.  The disagreement is really about 
whether it's a good idea to try to send these messages through SMTP 
(making SMTP more complex and error-prone in the process), and also 
about whether SMTP negotiation is a good way to define the boundary 
between the legacy mail system and the mail system that supports utf-8.

>  And looking at all the pieces has various advantages:
>
> - We make sure that things work together
> - We get some increased push on adoption, because people have
>   various ways and places to start upgrading. It's often difficult
>   to predict in which sequence upgrading will happen; the best thing
>   to me seems to have consistent/compatible upgrades available
>   across the board.
> - We know that adoption won't necessarily be that quick. If we
>   think it takes five years or more, we shouldn't start at one
>   end and wait for that to be fairly upgraded and then start at
>   the other end.
> - A simplified MUA is an excellent example of the benefit of
>   streamlining to UTF-8. But for that to work, the MUA not only
>   has to be able

Steve Hole | 2 Jan 2004 22:33

Re: if you really want utf-8 headers...


On Thu, 1 Jan 2004 01:34:38 -0500 Keith Moore <moore <at> cs.utk.edu> wrote:

> 
> Okay, I still see zero justification for utf-8 headers.  The 
> improvement in transmission and storage efficiency is minuscule. They 
> make both user agents and mail transports more complex and less 
> reliable, because MTAs need to have conversion code (which will break 
> messages and cause delivery failures) and UAs need to be able to handle 
> old messages that use RFC 2047 (resulting in multiple code paths and 
> additional failure modes).

I am completely in agreement with this.   I still see a huge delay in 
transport upgrades for features like DSN, 8BITMIME and PIPELINING because, 
basically, everybody has to play to make it useful.   If you *require* a 
transport upgrade by SMTP extension you will be looking at a decade-long 
deployment.
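
To make the "everybody has to play" point concrete, here is a small sketch of the per-hop check a sending MTA performs today for 8BITMIME. It parses the text of an EHLO reply and decides whether a message that needs 8-bit transport would have to be downgraded for this hop. The function names and the sample reply are illustrative assumptions; a utf-8 header extension would need the same check at every hop, under some keyword that this thread does not define.

```python
def advertised_extensions(ehlo_reply: str) -> set:
    """Return the set of extension keywords from a multi-line EHLO reply."""
    extensions = set()
    # The first line carries the server's greeting, not an extension.
    for line in ehlo_reply.splitlines()[1:]:
        if len(line) < 4 or not line.startswith("250"):
            continue
        # Skip the "250-" or "250 " prefix, keep the keyword before any
        # parameters (e.g. "SIZE 10240000" -> "SIZE").
        keyword = line[4:].split(" ", 1)[0]
        if keyword:
            extensions.add(keyword.upper())
    return extensions


def must_downgrade(ehlo_reply: str, needs_8bit: bool) -> bool:
    """True if the message needs 8-bit transport but the next hop lacks it."""
    return needs_8bit and "8BITMIME" not in advertised_extensions(ehlo_reply)


sample_reply = (
    "250-mail.example.org\r\n250-PIPELINING\r\n250-8BITMIME\r\n250 DSN"
)
```

A single hop that answers with only the greeting line forces the downgrade path, which is why partial deployment buys so little.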

---
Steve Hole
Chief Technology Officer - Billing and Payment Systems
ACI Worldwide
<mailto:holes <at> ACIWorldwide.com>
Phone: 780-424-4922

John C Klensin | 1 Jan 2004 18:40

Re: if you really want utf-8 headers...


Keith (and many others),

I've been sitting out this discussion because I've been trying 
to work through the long-term and transition cases and sort them 
out in a coherent way.  I've also been struck by the degree to 
which much of this discussion reflects an apparent lack of 
operational experience with the way email works in practice.

So let's take a few steps back:

Anything we do will cause some interoperability problems 
--whether impact on transport or software systems or as 
perceived by users -- somewhere.  "8:" headers will mess 
something up, somewhere, because of the issues you and others 
have identified (including parsing issues, header consolidation 
algorithms, special headers coming through but getting trashed 
without warnings to the recipient, and so on), even if the 
assumptions that cause the problems aren't justified in the 
standards.   And users will be furious if they see IMAA/ACE 
local-parts, even if the mail goes through, and will also be 
furious if mail that they consider to be well-formed bounces. 
While RFC 1342 and its successors were a brilliant solution 
given the constraints of the network at that time, you've 
certainly got enough operational experience, and are a keen 
enough observer, to know how much users hate actually seeing 
them (and how much abuse Quoted-printable took).   Any of these 
changes will cause problems, and will make people unhappy -- 
probably, in the short term, more people will be unhappy than 
happy about them.

Thomas Roessler | 2 Jan 2004 18:18

Re: if you really want utf-8 headers...


On 2004-01-01 12:40:59 -0500, John C Klensin wrote:

> 	1.2 We invent message/rfcNNNN, where "rfcNNNN" basically
> 	says "just like RFC2822, but all header fields are
> 	defined as being in UTF-8, not ASCII".
> 	
> 	1.3 The gateway converts all envelope addresses to IMAA
> 	form and encapsulates the original message using
> 	message/rfcNNNN, so we have a MIME body of...
> 
> 	   From: "1342/2047 PersonalName"
> 	       <IMAA-local-part <at> IDNA-domain>
> 	   To: "1342/2047 PersonalName2"
> 	       <IMAA-local-part2 <at> IDNA-domain2>
> 	   Date: RFC2822-date
> 	   MIME-Version: 1.0
> 	   content-type: message/rfcNNNN
> 	   content-transfer-encoding: <as needed>
> 	
> 	   <original message, with original headers, in original
> 	form>
> 
> 	I hope we can avoid it, but a charset parameter for
> 	message/rfcNNNN would certainly not be rocket science to
> 	define.

> 	1.4 Clever receiving systems notice "message/rfcNNNN" and
> 	unwind the situation in some appropriate way, with no
> 	information loss.   And note that the model above is

John C Klensin | 3 Jan 2004 17:01

Re: if you really want utf-8 headers...


--On Friday, 02 January, 2004 18:18 +0100 Thomas Roessler 
<roessler <at> does-not-exist.org> wrote:

> On 2004-01-01 12:40:59 -0500, John C Klensin wrote:
>
>> 	1.2 We invent message/rfcNNNN, where "rfcNNNN" basically
>> 	says "just like RFC2822, but all header fields are
>> 	defined as being in UTF-8, not ASCII".
>> 	
>> 	1.3 The gateway converts all envelope addresses to IMAA
>> 	form and encapsulates the original message using
>> 	message/rfcNNNN, so we have a MIME body of...
>>
>> 	   From: "1342/2047 PersonalName"
>> 	       <IMAA-local-part <at> IDNA-domain>
>> 	   To: "1342/2047 PersonalName2"
>> 	       <IMAA-local-part2 <at> IDNA-domain2>
>> 	   Date: RFC2822-date
>> 	   MIME-Version: 1.0
>> 	   content-type: message/rfcNNNN
>> 	   content-transfer-encoding: <as needed>
>> 	
>> 	   <original message, with original headers, in original
>> 	form>
>>
>> 	I hope we can avoid it, but a charset parameter for
>> 	message/rfcNNNN would certainly not be rocket science to
>> 	define.
>

Adam M. Costello | 5 Jan 2004 08:05

Re: if you really want utf-8 headers...


John C Klensin <john-ietf <at> jck.com> wrote:

> 1.3 The gateway converts all envelope addresses to IMAA
> form and encapsulates the original message using
> message/rfcNNNN, so we have a MIME body of...
> 
>     From: "1342/2047 PersonalName" <IMAA-local-part <at> IDNA-domain>
>     To: "1342/2047 PersonalName2" <IMAA-local-part2 <at> IDNA-domain2>
>     Date: RFC2822-date
>     MIME-Version: 1.0
>     content-type: message/rfcNNNN
>     content-transfer-encoding: <as needed>
> 
>     <original message, with original headers, in original form>

Thomas Roessler <roessler <at> does-not-exist.org> replied:

> Wouldn't that construction violate MIME's "no nested encodings"
> rule when transferred in a 7bit environment?

Yes.  RFC 2045 section 6.4 says:

   it is EXPRESSLY FORBIDDEN to use any encodings other than "7bit",
   "8bit", or "binary" with any composite media type, i.e. one that
   recursively includes other Content-Type fields.

Therefore, if you try to encapsulate the 8-bit header and pass the
message to a 7-bit MTA, you won't be able to apply quoted-printable or
base64 encoding, and you're stuck.
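
The rule can be checked mechanically. Below is a minimal sketch, using Python's standard email package, that walks a message and reports any composite part (multipart/* or message/*) whose Content-Transfer-Encoding is something other than 7bit, 8bit, or binary. The function name is hypothetical, and the sample uses message/rfc822 as a stand-in for the proposed message/rfcNNNN, which has no assigned number.

```python
from email import message_from_string

# Encodings RFC 2045 section 6.4 permits on composite media types.
ALLOWED_ON_COMPOSITE = {"7bit", "8bit", "binary"}


def composite_encoding_violations(raw_message: str) -> list:
    """List (content type, encoding) pairs that break the section 6.4 rule."""
    violations = []
    for part in message_from_string(raw_message).walk():
        if part.get_content_maintype() not in ("multipart", "message"):
            continue  # the rule applies only to composite types
        # An absent field means the default encoding, 7bit.
        cte = str(part.get("Content-Transfer-Encoding", "7bit"))
        cte = cte.strip().lower()
        if cte not in ALLOWED_ON_COMPOSITE:
            violations.append((part.get_content_type(), cte))
    return violations


encapsulated = (
    "From: a@example.org\n"
    "MIME-Version: 1.0\n"
    "Content-Type: message/rfc822\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "RnJvbTogYkBleGFtcGxlLm9yZw0KDQpoaQ==\n"
)
problems = composite_encoding_violations(encapsulated)
```

For the sample above the check flags the outer message/rfc822 part with base64, which is exactly the situation a gateway would create by encoding the encapsulated 8-bit header for a 7-bit hop.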

