John C Klensin | 23 Jan 2004 22:53

FWD: I-D ACTION:draft-klensin-email-envelope-00.txt

The draft posted/described in the attached is a bizarre idea, 
partially to see if it is possible to consider a radical 
solution to an increasingly troublesome problem, and partially 
to see if the supportive comments about "Email NG" in 
Minneapolis were really serious.

I am not at all convinced that it is a _good_ idea, only that, 
if we are talking about radical changes to the mail 
infrastructure to support various extended services, this is the 
sort of "clean up the warts that get in the way" option we might 
want to consider.

And even if it were a good idea, some of the details are 
probably not right -- if this looks like it received a day's 
thought, you would probably be guessing much too high.

Discussion should probably go to the SMTP list; IMAA is copied 
only because this could interact a bit with some of the "UTF-8 
header" discussions.

   john

Keith Moore | 26 Jan 2004 01:27

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


I'm all for separating the envelope from the message content, but I'd 
be interested in investigating an even more radical change - (probably) 
scrapping SMTP entirely or (less likely) branching out of the SMTP 
state machine very early.  Or in other words, I could see making the 
new protocol similar enough to SMTP that a single server could 
implement both, but that might just invite confusion.
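
A rough sketch of what "branching out of the SMTP state machine very
early" could mean on a shared port: the server looks at the first command
it receives and picks a protocol from that alone. This is only an
illustration in Python; the NGHELLO verb and the protocol labels are
hypothetical and do not come from any draft.

```python
def choose_protocol(first_line: str) -> str:
    """Pick a protocol from the first command a client sends.

    HELO and EHLO are the standard SMTP greetings. NGHELLO is a
    made-up verb standing in for whatever a new protocol might use.
    """
    verb = first_line.strip().split(" ", 1)[0].upper()
    if verb in ("HELO", "EHLO"):
        return "smtp"
    if verb == "NGHELLO":  # hypothetical greeting for the new protocol
        return "mail-ng"
    return "unknown"
```

Everything after that first decision could then run in a separate state
machine, which is where the risk of confusion mentioned above would have
to be managed.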

Hector Santos | 26 Jan 2004 08:55

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


I agree with you. I believe SMTP can be used to offer a "dual" state
transaction model.   Yes, design and implementation confusion can be an
issue. However, I believe this only means it would become one of many
aspects for the new design to address and minimize.

I would like to suggest an approach borrowing old concepts such as Fidonet
when it was faced with new industry needs and state machine protocols.

The original and standard protocol was called FTSC1.   This was mostly based
on a kludged XMODEM handshake. A second one called WAZOO offered an easier
session handshake and introduced ZMODEM for data transfer.  A 3rd one called
EMSI addressed the session handshake issues. EMSI introduced text-based tags
or attributes that defined and established the compatibility level between the
client and server.  It also introduced new tags to define new file transfer
protocols.
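
The negotiation pattern described here, with a baseline protocol that
every peer is assumed to support, might be sketched as follows. This is a
simplified Python illustration, not the actual EMSI handshake; the
protocol names are used only as labels and the function is hypothetical.

```python
BASELINE = "FTSC1"  # the original protocol, assumed to be supported by everyone

def negotiate(client_prefs, server_caps):
    """Return the first protocol the client prefers that the server offers.

    If the two sides share nothing newer, fall back to the baseline.
    """
    for proto in client_prefs:
        if proto in server_caps:
            return proto
    return BASELINE
```

For example, negotiate(["EMSI", "WAZOO"], {"WAZOO", "FTSC1"}) selects
WAZOO, while a peer that offers nothing newer ends up on FTSC1.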

If EMSI or WAZOO was not established,  the fallback was FTSC1.    As a side
note, for historical perspective, the demise of Fidonet was not helped by
the fact that the Fidonet or FTSC (Fidonet Technical Standards Committee)
protocol police insisted on backward compatibility.  WAZOO/EMSI flexibility
allowed for growth including new "internet readiness."  As new developers
came aboard to support the new direction (internet), most used the easier to
implement WAZOO/EMSI protocols and avoided FTSC1, mostly because it
was a complex, outdated, modified XMODEM7 binary protocol handshake, which
is probably akin to the idea of SMTP having no states other than DATA!   It
didn't fit well in packet switching networks. Security was weak and it
created conflicts between the development community and the FTSC.  Fidonet
operators or members found to be using non-FTSC1 compliant mailers were
excommunicated from Fidonet.  To get a Fidonet address (akin to getting a

william | 26 Jan 2004 02:47

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


First of all end-users would not care much what protocol it is when
"you've got mail" pops up.

And for designers, programmers, and quicker implementation it's quite a bit 
better if the new protocol and the old one share certain components. For example 
MIME encoding/decoding, etc. It would facilitate the process if the protocol can 
be implemented on the same mail server software, which would be able to decide 
based on certain DNS or other parameters what protocol to use when sending 
email to the other end.

And look at the example of IPv4 vs IPv6. While there are substantial 
improvements in features available with IPv6, as far as implementation, 
design, and application programming go, it's not that different.

On Sun, 25 Jan 2004, Keith Moore wrote:

> 
> I'm all for separating the envelope from the message content, but I'd 
> be interested in investigating an even more radical change - (probably) 
> scrapping SMTP entirely or (less likely) branching out of the SMTP 
> state machine very early.  Or in other words, I could see making the 
> new protocol similar enough to SMTP that a single server could 
> implement both, but that might just invite confusion.

Nathaniel Borenstein | 26 Jan 2004 14:54

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


On Sunday, January 25, 2004, at 08:47  PM, william <at> elan.net wrote:

> And for designers, programmers, and quicker implementation it's quite a 
> bit
> better if new protocol and old one share certain components. For 
> example
> MIME encoding/decoding, etc.

Well, let's not rule out improvements to MIME in the process.  The name 
MIME includes a lot of things, from the two-level content-type 
architecture to the quoted-printable and base64 transfer encodings.  
I'd certainly like to see us keep the former, but the latter can only 
be seen as warts for backward compatibility with SMTP/822.  I wouldn't 
make it one of the major goals of email NG, but I certainly wouldn't 
mind seeing the new protocol eliminate the need for 
content-transfer-encodings.

Another comment:  I strongly urge moving this discussion to the new 
mail-ng mailing list once it's up and running.  The imaa and smtp 
mailing lists will make a lot more short-term progress if the 
longer-term discussions have their own venue.  -- Nathaniel

william@elan.net | 26 Jan 2004 16:13

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


On Mon, 26 Jan 2004, Nathaniel Borenstein wrote:

> Another comment:  I strongly urge moving this discussion to the new 
> mail-ng mailing list once it's up and running.  The imaa and smtp 
> mailing lists will make a lot more short-term progress if the 
> longer-term discussions have their own venue.  -- Nathaniel

I tried forwarding the original draft and a couple of recent comments that 
are directly on the topic of a new mail system to the new mailing list, in the 
hope of moving those discussions there, but the system bounced it with an 
automated reply. Hopefully the list owner can do it himself if he thinks it's 
worth starting off the list with them. Here are the posts I wanted to forward:
 http://www.imc.org/ietf-smtp/mail-archive/msg00927.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00928.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00929.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00935.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00943.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00945.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00950.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00951.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00952.html
 http://www.imc.org/ietf-smtp/mail-archive/msg00954.html

--

-- 
William Leibzon
Elan Networks
william <at> elan.net


J-F C. (Jefsey) Morfin | 26 Jan 2004 04:57

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


An idea we are working on is to keep SMTP as it is, and use it for mail 
signaling (weemail). The advantage is that nothing needs to change except for 
a slight modification to the user agent and the development of a new agent: a 
store-and-retrieve mail server which may also be used as a gateway between the 
current and the new system. "You've got mail" then no longer refers to a 
piece of junk with an embedded virus that you have been forced to accept onto 
your computer and spend bandwidth on, but to a file of any kind that you can 
choose to disregard and leave on the sender's server, or read in whole or in 
part, etc.

This does not kill spam as there are ways to fake a server. But it makes 
wild spam far more complex and drastically reduces the bandwidth usage.

The possibilities it easily allows, like dynamic distribution lists, also 
make it possible to prioritize by reader interest and to leave the spam at the 
bottom of the basket. Mail URLs also make it easier to hunt for known 
spam or to disregard known commercial sources, or posts already read.

It is then quite easy to use 0-Z numbering to name the LHS and to have a 
resolution system into any vernacular (ML, menus, emoticon, etc.).
jfc

At 02:47 26/01/04, william <at> elan.net wrote:

>First of all end-users would not care much what protocol it is when
>"you've got mail" pops up.
>
>And for designers, programmers, and quicker implementation it's quite a bit
>better if new protocol and old one share certain components. For example
>MIME encoding/decoding, etc. It would facilitate process  if protocol can

Hector Santos | 26 Jan 2004 07:36

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


I don't wish to insult anyone with the obvious, but here are some points we should
keep in mind when considering revamping SMTP.

1) "worthy new solutions" are those that best address real problems.  While
there will also be a market where needless change is performed ("latest toy
syndrome"),  in general, costly revamps become more feasible for
consideration when they provide major benefits.

2) Consideration is enhanced when new functional specs include well outlined
migration, acceptance and deployment plans, ideally including or referencing
source code examples and/or API, SDK tools to help the migration process.

3) Responsive vendors will help the process by making the "conversion"
process easier via their software, and lastly

4) Where there is a market for change, a tertiary market of tools (proxies,
middle ware, APIs) will be made available to address legacy software and/or
provide easier development and migration.

In short, "let's not fear change"

Also, consider Yahoo's plan for implementing their YDK (Yahoo Domain Keys)
proposal.  According to recent cyber news,  Yahoo has said they are going to
modify open source mail servers (I believe they mentioned sendmail and
qmail) to implement their YDK  and, according to them, plan to release it when
it is all said and done.   I am seriously interested in what others think about
this.

Thanks

Martin Duerst | 26 Jan 2004 16:52

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


At 01:36 04/01/26 -0500, Hector Santos wrote:

>Also, consider Yahoo's plan for implementing their YDK (Yahoo Domain Keys)
>proposal.  According to recent cyber news,  Yahoo has said they are going to
>modify open source mail servers (I believe they mentioned sendmail and
>qmail) to implement their YDK  and, according to them, plan to release it when
>it is all said and done.   I am seriously interested in what others think about
>this.

I have just looked at an article about YDK for a couple minutes, and
don't claim to understand it fully. There seem to be some similarities
to SPF and other proposals. We (W3C) have deployed SPF records just
recently. The main benefit we are expecting is that we can avoid
ourselves and our mailing lists being spammed by impersonators faking
our own email addresses. That's not all of spam, but it's a very
nasty and troubling bit of it.

Regards,   Martin.

Hector Santos | 27 Jan 2004 03:35

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


Martin,

We have added DMP to our anti-spam package and it works great in this area
(checking your local domain and machine spoofs).  WCSAP performs 4 checks:
Internal White/Black List,   RBL,  DMP, and CBV and it is called at the RCPT
state (only if the recipient is acceptable, if refused, wcsap is not
called).

What I did not find 100% clear about SPF is whether it works from a central
authority (database) concept.  Its documentation seems to indicate both (a
central one and your own).  The author should clarify this more.  I am somewhat
resistant to having a "central" database system; however, I am not so naive as
not to know that this is probably the direction the email world will ultimately
head in.   So what we did was add a general DNS lookup rules/parser to our
system so that it can be ready for anything in the future.

In any case, in my research and testing of the various DNS lookup proposals,
the only "current" benefit I see from all the anti-spam DNS based lookup
proposals is checking spoofs against your own domains.   If you begin to
check other domains, the DNS overhead skyrockets.   It was pointed out to me
by the DMP author that this is a function of your DNS setup. I am not too
sure about this. But I don't pretend to be a DNS expert.  All I know is
that looking up an unknown domain (which is what a majority of the spammers
are) yields a long initial lookup delay.
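
To illustrate why checking spoofs of your own domains is cheap, here is a
minimal Python sketch that assumes the policy for local domains is
already held locally, so no remote DNS lookup is needed. It evaluates
only the ip4: mechanisms of a simplified SPF-style record; real SPF and
DMP evaluation involve much more, and the function names and record
table are hypothetical.

```python
import ipaddress

def ip_permitted(record: str, client_ip: str) -> bool:
    """Check a client IP against the ip4: terms of a simplified record."""
    addr = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:  # skip the leading version tag
        if term.startswith("ip4:"):
            if addr in ipaddress.ip_network(term[4:], strict=False):
                return True
    return False

def looks_spoofed(mail_from_domain: str, client_ip: str, own_records: dict) -> bool:
    """Flag mail that claims one of our own domains from an unlisted IP."""
    record = own_records.get(mail_from_domain.lower())
    if record is None:
        return False  # not our domain; this sketch does no remote lookups
    return not ip_permitted(record, client_ip)
```

Extending the check to arbitrary sender domains would require a DNS
query per unknown domain, which is where the lookup delay described above
comes in.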

Thanks

--

-- 
Hector Santos, Santronics Software, Inc.

Martin Duerst | 27 Jan 2004 16:45

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


Hello Hector,

At 21:35 04/01/26 -0500, Hector Santos wrote:

>What I did not get 100% clear about SPF, is whether it works from a central
>authority (database) concept.  Its documentation seems to indicate both (a
>central and your own).  The author should clarify this more.

Please tell him directly. I don't have any direct contacts.

>In any case, in my research and testing of the various DNS lookup proposals,
>the only "current" benefit I see from all the anti-spam DNS based lookup
>proposals is checking spoofs against your own domains.

That's the initial benefit, as I mentioned.

>If you begin to
>check other domains, the DNS overhead skyrockets.   It was pointed out to me
>by the DMP author that this is a function of your DNS setup. I am not too
>sure about this. But I don't pretend to be a DNS expert.  All I know is
>that looking up an unknown domain (which is what a majority of the spammers
>are) yields a long initial lookup delay.

I'm no DNS expert either. But giving the spammers some delay may be a
nice side effect :-).

Regards,   Martin.


Paul Hoffman / IMC | 27 Jan 2004 22:08

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


This discussion does not belong on the IMAA mailing list. Please trim 
your Cc lists. Thanks!

--Paul Hoffman, Director
--Internet Mail Consortium

Keith Moore | 26 Jan 2004 02:00

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


> First of all end-users would not care much what protocol it is when
> "you've got mail" pops up.

No, but they might care about having email work more reliably.

> And for designers, programmers, and quicker implementation it's quite a 
> bit
> better if new protocol and old one share certain components. For 
> example
> MIME encoding/decoding, etc. It would facilitate process  if protocol 
> can
> be implemented on same mail server software that would be able to 
> decide
> based on certain dns or other parameters what protocol to use when 
> sending
> email to the other end.

Yes, that's why I would consider running both protocols on the same port.

On the other hand if the new protocol were designed well then there 
would be less need to share components.  For instance if all mail were 
binary transparent and delivered directly from the sender's submission 
server to the recipient's message store (which is close to what you 
need for reliable error reporting) there would not be as much need 
for such an MTA to encode or decode MIME.

> And look at the example of IPv4 vs IPv6 - IPv6. While there are 
> substantial
> improvements in features available with IPv6, as far as implementation 

Hector Santos | 24 Jan 2004 02:03

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


Hello John,

In general,  I have much to say about any SMTP proposal that suggests SMTP
change at the software level. To me, a change in software opens the door for
other possible concepts that may address particular needs in the email
industry.   IMTV, acceptance and deployment aspects of the proposal are
important and major considerations of any SMTP change proposal.  IMTV,
that's a matter of having a strong functional specification dictation.

With that said,  I am putting my "Software Engineering Hat" on as if I were
going to implement your ENVL proposal today to see what the technical
implementation issues would be.  Based on this technical point of view,
here are my specific comments about your draft:

What does this ENVL proposal attempt to address?  (What problem(s) does it
solve?)

Excuse me if I missed it in the draft, but I see only three (3) concepts
that address a "problem solved":

1) Proposal provides a cleaner SMTP envelope or proposal to create a cleaner
RFC 822 trace header?

2) Cleaner SMTP envelope minimizes mail relay issues.

3) Syntax Error checking.

Is this correct?


william | 24 Jan 2004 03:25

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


The proposal does not really address "cleaner" or "more detailed" message 
routing data or define standards for it. What it does is to separate the 
actual message from the routing/envelope/trace data being added by mail 
servers while the message is in transit. Additional proposals would have 
to build on this one to define more trace data or more information about 
message routing, etc.

One possible benefit is that mail filtering systems would be able to deny 
a message or bounce it if the envelope data is not acceptable to them (not 
properly signed, bad origin, etc). This has certain processing/bandwidth 
benefits, as the message would not have to be processed or transmitted in full 
(and then bounced back to an "envelope-from" address which may not even exist); 
it could be bounced immediately, or possibly certain authentication data could 
be requested and checked immediately.
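
A minimal sketch of that idea in Python, assuming the receiver sees the
envelope as a simple dictionary before any message content is sent. The
keys "signed" and "origin" are hypothetical and are not taken from the
draft; 250 and 550 are the usual SMTP success and permanent-failure reply
codes.

```python
def check_envelope(envelope: dict, blocked_origins: set):
    """Decide on a message from its envelope alone, before the body arrives."""
    if not envelope.get("signed", False):
        return (550, "envelope not properly signed")
    if envelope.get("origin") in blocked_origins:
        return (550, "origin not acceptable")
    return (250, "envelope accepted, send message")
```

A refusal at this point costs one small exchange rather than a full
message transfer followed by a bounce.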

On the other remarks in general I have to agree: it is not perfectly clear 
whether it's better to add a number of additional extensions that may solve 
(close holes in) the current email transport system or to just design a 
new mail transport protocol altogether. However, it may be of benefit to 
everybody if we work on both approaches at the same time for now and 
decide what is better in the future (there are several precedents for that: 
CRISP has worked on both FIRS/LDAP and IRIS/XML variants; LEMONADE is working 
on extensions for IMAP to do both remote mail editing & submission and a 
different approach to reference parts of an IMAP message in a message being 
submitted directly from the end-user's computer).

On Fri, 23 Jan 2004, Hector Santos wrote:

> 

John C Klensin | 24 Jan 2004 03:36

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


Replying to ietf-smtp only...

Hector,

As I indicated in my response to Nathaniel, this proposal is one 
of a group, most of whose elements are still to come (and some 
of the elements of which are small variations of work already 
done by others).  Also, the draft assumes that one has read 
carefully and understood 2821 and its terminology.  If this goes 
anywhere --and the purpose for posting it and some of its 
pending relatives is to start some discussion, not necessarily 
to have this result come out-- then it may be desirable to make 
it more self-contained... or it may make better sense to combine 
it with some other proposals.

I'll respond to a few of your other questions/ comments below.

--On Friday, 23 January, 2004 20:03 -0500 Hector Santos 
<winserver.support <at> winserver.com> wrote:

> In general,  I have much to say about any SMTP proposal that
> suggests SMTP change at the software level. To me, a change in
> software opens the door for other possible concepts that may
> address particular needs in the email industry.   IMTV,
> acceptance and deployment aspects of the proposal are
> important and major considerations of any SMTP change
> proposal.  IMTV, that's a matter of having a strong
> functional specification dictation.
>

Hector Santos | 24 Jan 2004 08:17

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


>
> That is very encouraging, although I would encourage you to read
> and understand the proposal, and probably to follow the
> discussion that is likely to unfold on this list (and, for part
> of the work, the ietf-imaa one) before committing yourself to
> implement anything.
>

John,

Please give me the benefit of any lingering doubts you may have about me.
I am not an idiot.

Thanks

--

-- 
Hector Santos, Santronics Software, Inc.
http://www.santronics.com

Paul Hoffman / IMC | 16 Dec 2003 23:01

Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


OK, so here's my first pass at the "UTF-8 headers" strawman I put up 
a few weeks ago. I kinda rushed it together, so I might have missed 
some suggestions that I want to put in eventually.

Please remember to start threads with new Subject lines.

--Paul Hoffman

>To: IETF-Announce: ;
>From: Internet-Drafts <at> ietf.org
>Reply-to: Internet-Drafts <at> ietf.org
>Subject: I-D ACTION:draft-hoffman-utf8headers-00.txt
>Date: Tue, 16 Dec 2003 16:01:56 -0500
>Sender: owner-ietf-announce <at> ietf.org
>
>
>
>A New Internet-Draft is available from the on-line Internet-Drafts 
>directories.
>
>
>	Title		: SMTP Service Extensions for Transmission of 
>Headers in UTF-8 Encoding
>	Author(s)	: P. Hoffman
>	Filename	: draft-hoffman-utf8headers-00.txt
>	Pages		: 0
>	Date		: 2003-12-16
>
>Mailbox names often represent the names of human users. Many of these

Paul Hoffman / IMC | 22 Dec 2003 18:14

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


Er, any comments at all?

--Paul Hoffman, Director
--Internet Mail Consortium

Keith Moore | 1 Jan 2004 07:06

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


> Er, any comments at all?

there's no justification given for utf-8 headers.  the desired 
functionality can be accomplished by the address-map fields and 
encoding the fields in ascii.

there's no explanation as to where the address-map information would be 
obtained.

there's no cost analysis for a proposal which would appear to have a 
huge cost.  of course this is just a -00 version, but it's hard to 
evaluate the desirability of this proposal without considering the cost 
fairly quickly.

even accepting that it's a good idea to allow email addresses in raw 
utf-8 (and this is a stretch) many fields should remain ascii so that 
they can be read anywhere.  it will often make more sense to put 
ascii-encoded addresses, message-ids, etc. into log files than to put 
raw utf-8 there.

there are too many mail transport boundaries that don't use SMTP and 
thus may have no way to negotiate utf-8.

whether email addresses are in raw utf-8 or encoded in ascii there is 
still a need to define how they are compared, because there will often 
be more than one utf-8 representation of an address.
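
a small Python illustration of that last point: the same visible address
can arrive as two different UTF-8 byte strings, so a byte comparison
fails unless both sides are normalized first.  the choice of NFC below is
an assumption made for the example only; which normalization form (if
any) should apply is not settled here.

```python
import unicodedata

def same_address(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 addresses after NFC normalization (an assumed choice)."""
    na = unicodedata.normalize("NFC", a.decode("utf-8"))
    nb = unicodedata.normalize("NFC", b.decode("utf-8"))
    return na == nb

# "é" as one precomposed code point vs. "e" followed by a combining accent
composed = "jos\u00e9@example.org".encode("utf-8")
decomposed = "jose\u0301@example.org".encode("utf-8")
```

here composed and decomposed differ as bytes, yet same_address treats
them as equal.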

nit: the document repeatedly says that non-ASCII text is encoded in 
quoted-printable; this is incorrect.  RFC 2047 allows either a variant 

Charles Lindsey | 23 Dec 2003 00:54

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


On Mon, 22 Dec 2003 09:14:25 -0800, Paul Hoffman / IMC <phoffman <at> imc.org> 
wrote:

> Er, any comments at all?

OK, I'll bite. It has taken me a few days to get onto this mailing list, 
and some time to digest the document.

First let me introduce myself as the Editor of the Usefor Working Group. 
As some of you likely know, Usefor had intended UTF-8 headers to become 
the norm in Usenet, allowing for internationalized newsgroup-names. But it 
got bogged down in gatewaying into email, and forwards/backwards 
compatibility arguments, and "why didn't we invent yet another 8bit->7bit 
encoding". So our arms were twisted and we have now agreed to remove all 
that from the draft and, instead, produce an Experimental Protocol to deal 
with I18N issues. In the meantime, Usenet will have to get by with RFC 
2047 and RFC 2231.

So I was delighted to see this proposal, because if it gets accepted for 
Email, there is a greater chance that our Experimental I18N Protocol will 
be able to build upon it.

Now I note that the proposal comes in two parts. How to deal with local 
parts, and how to introduce UTF-8 in headers. These are somewhat 
orthogonal issues, so I will restrict my comments to the UTF-8 part, though 
I do have some concerns about local-parts too.

So here is your section 3, with my (indented) remarks.


Martin Duerst | 31 Dec 2003 22:41

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


At 23:54 03/12/22 +0000, Charles Lindsey wrote:

>Indeed, the next big problem is how servers and other agents are to
>recognize whether any of the headers of a message contain any non-ASCII.
>Yes, you could scan the headers of every message looking for an octet
> > 127, but that is a great expenditure of effort considering that 99.9% of
>the world's emails will have pure ASCII headers for several years to come.
>Far better to have some indication in the message that it contains 8bit
>stuff (most likely an extra header to say so). Indeed, Mark Crispin is on
>record as saying that, if he is to have his arm twisted into having UTF-8
>headers in IMAP, he would insist on such a header.

I think such a header is not a bad idea. I don't think it's particularly
important, but if it helps, why not. As for actually scanning the headers,
I'm not sure about the 'great expenditure'. If you have to scan all
headers to find the header that says it's UTF-8, doing the > 127 check
on the side is almost free.
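
As a rough Python sketch of that argument: a single pass over the header
block can look for a marker header and check for octets above 127 at the
same time. The marker name X-UTF8-Headers is purely hypothetical; no such
field has been defined.

```python
def scan_headers(raw: bytes, marker: bytes = b"x-utf8-headers"):
    """Scan the header block once; return (marker_present, has_non_ascii)."""
    marker_present = False
    has_non_ascii = False
    for line in raw.split(b"\r\n"):
        if not line:
            break  # a blank line ends the header block
        if any(octet > 127 for octet in line):
            has_non_ascii = True
        # Folded continuation lines start with whitespace and carry no field name.
        if not line[:1].isspace():
            name, sep, _ = line.partition(b":")
            if sep and name.strip().lower() == marker:
                marker_present = True
    return marker_present, has_non_ascii
```

Both answers come out of the same loop, so the extra cost of the octet
check is one comparison per byte already being read.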

>In addition to that, SMTP is not the only mechanism for transporting email
>(or netnews). There is UUCP. There is NNTP. There is X.400 (complete
>with complex gatewaying rules in and out). There are satellites and
>carrier pigeons and goodness knows what. Not all of these protocols will
>want to implement a UTF-8-HEADERS extension. Indeed, for UUCP and NNTP it
>is quite unnecessary, because they are 8bit clean already, and the
>upcoming NNTP draft already assumes UTF-8 (in the few places where it
>would notice).

For X.400 and UUCP, my assumption would be that things would be
downgraded anyway, which would mean to remove the header. Satellites

Adam M. Costello | 3 Jan 2004 11:11

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


Martin Duerst <duerst <at> w3.org> wrote:

> I definitely like a header much more than the 8: header prefix
> proposal, because ... 8-bit-clean software can just work on headers
> without having to care about 7-bit/8-bit issues except at very
> specific points (downgrading/upgrading).

Charles Lindsey <chl <at> clerew.man.ac.uk> wrote:

> ...I don't like the "8:" header prefix.  In some environments (notably
> Netnews) it would be much simpler to leave the headers in their
> present form (otherwise, all agents will have to learn to recognise a
> new set of headers which are really just synonyms for existing ones
> - that could be true of mail user agents too).  The advantage of the
> special header is that agents that don't need to be aware of the
> distinction can just ignore it.

There seems to be an assumption here that existing "8-bit clean" software
will automagically understand "UTF-8 header fields" that use the same
field-names as existing ASCII header fields.  But "UTF-8 header fields"
have not even been defined yet, and there are plenty of important
details to work out.  All standard header fields (like To:) are defined
by grammars that currently allow only ASCII characters.  UTF-8 header
fields would have different grammars.  Exactly which Unicode characters
would be allowed, and where?  The Unicode standard recommends that
equivalent strings be treated the same.  Will that be true for UTF-8
header fields?  If so, it means normalization needs to be done at some
point.  At what point?  When the field is created, or when it is parsed?
Which normalization, canonical or compatible?  Or some profile of

Charles Lindsey | 3 Jan 2004 20:30

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


On Sat, 3 Jan 2004 10:11:59 +0000, Adam M. Costello 
<ietf-imaa.amc+0 <at> nicemice.net.RemoveThisWord> wrote:

> Martin Duerst <duerst <at> w3.org> wrote:
>

> Charles Lindsey <chl <at> clerew.man.ac.uk> wrote:
>
>> ...I don't like the "8:" header prefix.  In some environments (notably
>> Netnews) it would be much simpler to leave the headers in their
>> present form (otherwise, all agents will have to learn to recognise a
>> new set of headers which are really just synonyms for existing ones
>> - that could be true of mail user agents too).  The advantage of the
>> special header is that agents that don't need to be aware of the
>> distinction can just ignore it.
>
> There seems to be an assumption here that existing "8-bit clean" software
> will automagically understand "UTF-8 header fields" that use the same
> field-names as existing ASCII header fields.

Eh? Of course they will, because those ASCII header fields are already 
correct UTF-8. Nothing automagic needed there.

>  But "UTF-8 header fields"
> have not even been defined yet, and there are plenty of important
> details to work out.  All standard header fields (like To:) are defined
> by grammars that currently allow only ASCII characters.  UTF-8 header
> fields would have different grammars.  Exactly which Unicode characters
> would be allowed, and where?  The Unicode standard recommends that

Keith Moore | 3 Jan 2004 16:36

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>> the possibility exists that the gateway isn't aware of the utf-8
>> extension, so it injects messages with utf-8 headers and addresses
>> into the legacy mail system without doing a conversion.
>
> Yes, that's a good argument that UTF-8 header fields are likely to
> fall unexpectedly into the hands of software that doesn't know how
> to handle them.  If UTF-8 header fields use the same field names as
> the corresponding ASCII header fields, there's no telling what will
> happen.  By using new never-before-used field names for the UTF-8 
> header
> fields (or even an entirely different header format) we could avoid 
> that
> pitfall.

but using new field names for utf-8 versions of existing fields has 
other pitfalls - e.g. that the utf-8 and ascii versions can get out of 
sync, that the utf-8 fields will get converted to 2047 or bit-stripped, 
etc.

Adam M. Costello | 5 Jan 2004 02:08

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


Keith Moore <moore <at> cs.utk.edu> wrote:

> > UTF-8 header fields are likely to fall unexpectedly into the hands
> > of software that doesn't know how to handle them.  If UTF-8 header
> > fields use the same field names as the corresponding ASCII header
> > fields, there's no telling what will happen.
>
> but using new field names for utf-8 versions of existing fields has
> other pitfalls - e.g. that the utf-8 and ascii versions can get out of
> sync,

That is indeed a concern that does not arise if UTF-8 header fields use
the same field names as ASCII header fields.

> that the utf-8 fields will get converted to 2047 or bit-stripped, etc.

That concern applies equally regardless of whether UTF-8 header fields
use the same field-names as ASCII header fields.

Charles Lindsey <chl <at> clerew.man.ac.uk> wrote:

> > There seems to be an assumption here that existing "8-bit clean"
> > software will automagically understand "UTF-8 header fields" that
> > use the same field-names as existing ASCII header fields.
>
> Eh? Of course they will, because those ASCII header fields are already
> correct UTF-8.  Nothing automagic needed there.

Non-ASCII field contents are invalid according to the spec that was

Keith Moore | 5 Jan 2004 05:16

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>> but using new field names for utf-8 versions of existing fields has
>> other pitfalls - e.g. that the utf-8 and ascii versions can get out of
>> sync,
>
> That is indeed a concern that does not arise if UTF-8 header fields use
> the same field names as ASCII header fields.

no, the concern is when the utf-8 fields are expected to contain the 
same information as the ascii fields, but in a different format.

Arnt Gulbrandsen | 29 Dec 2003 15:33

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


Nathaniel Borenstein writes:
> I would worry a bit that there may still be mailers out there that 
> don't always convey all instances of a header field that appears more 
> than once -- e.g. if they convert into, say, a database 
> representation and can only have one value indexed to the field name 
> "8", there might be information lost when that gets converted back 
> into RFC [2]822 format.  -- Nathaniel

Sure. I've seen such code convert

To: a <at> b.com
To: c <at> d.com

into

To: a <at> b.com, c <at> d.com

I don't think it's a problem. The former is illegal to start with IIRC, 
so forwarding it unchanged is just as illegal as forwarding it changed.

I can't, right now, remember seeing problems caused by such code. I'm 
sure there are some, for example with non-standard X- fields.

--Arnt

John Cowan | 29 Dec 2003 19:51

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


Arnt Gulbrandsen scripsit:

> Sure. I've seen such code convert
> 
> To: a <at> b.com
> To: c <at> d.com
> 
> into
> 
> To: a <at> b.com, c <at> d.com
> 
> I don't think it's a problem. The former is illegal to start with IIRC, 
> so forwarding it unchanged is just as illegal as forwarding it changed.

Multiple TO fields are discouraged by RFC 822, illegal according to
RFC 2822.  But the Comments, Keywords, Received, and Resent-* fields MAY
(and in some circumstances MUST) occur multiple times even according to
the stricter standards of 2822.
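
A minimal sketch of the difference, assuming Python 3's standard email
package; the relay names are hypothetical. It parses a header with
repeated fields, then shows what a store with one slot per field name
keeps.

```python
from email import message_from_string

raw = (
    "Received: from relay2.example\n"
    "Received: from relay1.example\n"
    "To: a@b.com\n"
    "To: c@d.com\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# get_all() returns every occurrence of a field, in order.
all_to = msg.get_all("To")
all_received = msg.get_all("Received")

# Folding the repeated To fields into one address list loses nothing.
merged_to = ", ".join(all_to)

# A store indexed by field name keeps only the last occurrence,
# which silently drops one of the two Received fields.
one_per_name = dict(msg.items())

print(merged_to)                 # a@b.com, c@d.com
print(one_per_name["Received"])  # from relay1.example
```

Merging works for address lists; for Received there is no merged form,
so the one-slot store loses trace information.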

--

-- 
Her he asked if O'Hare Doctor tidings sent from far     John Cowan
coast and she with grameful sigh him answered that      www.ccil.org/~cowan
O'Hare Doctor in heaven was. Sad was the man that word  www.reutershealth.com
to hear that him so heavied in bowels ruthful. All      jcowan <at> reutershealth.com
she there told him, ruing death for friend so young,
algate sore unwilling God's rightwiseness to withsay.   _Ulysses_, "Oxen"

Keith Moore | 29 Dec 2003 17:16

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


> Sure. I've seen such code convert
>
> To: a <at> b.com
> To: c <at> d.com
>
> into
>
> To: a <at> b.com, c <at> d.com
>
> I don't think it's a problem. The former is illegal to start with 
> IIRC, so forwarding it unchanged is just as illegal as forwarding it 
> changed.

nope, that's bad layering.  the job of an MTA is to forward each 
message intact, not to potentially change every message so that the 
message it forwards is correct syntax.  the latter behavior is much 
more error prone as the code that detects and corrects errors may be 
buggy, and often errors cannot be corrected without resorting to 
unreliable heuristics.  it also makes it much harder to upgrade the 
message format.  imagine what happens when utf-8 headers leak (as they 
inevitably will) into an MTA that tries to insist that all headers on 
messages that it forwards are 7bit only.  the output is not likely to 
be usable even if it is valid syntax.

we do recognize that there are some conversions and gateway operations 
that require well-formed input in order to produce well-formed output. 
such conversions are occasionally necessary but should be avoided when 
possible.

Arnt Gulbrandsen | 29 Dec 2003 17:40

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


Keith Moore writes:
>> Sure. I've seen such code convert
...
> nope, that's bad layering.

It is now. Back when the internet had lots of mail gateways that needed 
to do format conversion it wasn't, IMO.

> such conversions are occasionally necessary but should be avoided when 
> possible.

Agree 100%.

Luckily I've never seen this particular sort of conversion mess things 
up. Perhaps the consequences are so visible and so bad that problems 
get fixed before release.

Anyway, unless people have seen worse problems, the "8" field approach 
("8: from: ü <at> ü.de") shouldn't run into canonicalization problems. 
It may be zealously removed, it may be mishandled, but I don't see any 
reason that it'll be mashed together with other 8s.

--Arnt

Charles Lindsey | 1 Jan 2004 12:17

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


On Wed, 31 Dec 2003 16:41:55 -0500, Martin Duerst <duerst <at> w3.org> wrote:

> At 23:54 03/12/22 +0000, Charles Lindsey wrote:

>
>> In addition to that, SMTP is not the only mechanism for transporting
>> email (or netnews). There is UUCP. There is NNTP. ... Not all of these
>> protocols will want to implement a UTF-8-HEADERS extension.
>
> For X.400 and UUCP, my assumption would be that things would be
> downgraded anyway, which would mean to remove the header. Satellites
> are not a protocol, and carrier pigeons carry paper, where we don't
> even need UTF-8 :-). But in connection with NNTP, and for certain kinds
> of local processing (procmail,...), it would probably make sense.
> It may also ease implementation because it gives guidance for
> internal (mail spool) formats.

I don't think you would need to downgrade for UUCP, because it is already 
8-bit clean. But my point was that a message might happily wander around 
within one protocol (UUCP or NNTP) without anybody needing to care about 
the encoding or to check for "UTF-8-HEADERS". Then suddenly it arrives at 
a gateway into something else (e.g. SMTP or an IMAP store) where the 
distinction really matters. So the implementor of the gateway needs some 
quick way to discover whether this particular message needs special 
handling, and the presence of an extra header is probably the simplest way 
to do it.
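
For comparison, a minimal sketch in Python 3 of what a gateway is left
with when no such header is present: scanning the header octets itself.
The function name and the three labels are hypothetical.

```python
def classify_header(raw: bytes) -> str:
    """Classify raw header octets as seen by a gateway.

    Returns "ascii" if no octet has the high bit set, "utf-8" if the
    octets decode as UTF-8, and "8bit-unknown" otherwise.
    """
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "8bit-unknown"
    return "utf-8"


print(classify_header(b"Subject: hello"))                       # ascii
print(classify_header("Subject: Grüße".encode("utf-8")))        # utf-8
print(classify_header("Subject: Grüße".encode("iso-8859-1")))   # 8bit-unknown
```

The scan has to read the whole header and can only guess at intent:
octets that happen to decode as UTF-8 may have been written in another
charset. An explicit marker header avoids both costs.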

Keith Moore | 1 Jan 2004 15:04

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


On Jan 1, 2004, at 6:17 AM, Charles Lindsey wrote:

> I don't think you would need to downgrade for UUCP, because it is 
> already 8-bit clean. But my point was that a message might happily 
> wander around within one protocol (UUCP or NNTP) without anybody 
> needing to care about the encoding or to check for "UTF-8-HEADERS". 
> Then suddenly it arrives at a gateway into something else (e.g. SMTP 
> or an IMAP store) where the distinction really matters.

the possibility exists that the gateway isn't aware of the utf-8 
extension, so it injects messages with utf-8 headers and addresses into 
the legacy mail system without doing a conversion.

> Yes, Email carries more weight within IETF, and if that means this 
> can be brought straight to standards track, then I would be delighted.

I think it's exactly the opposite.  email is viewed as an essential 
service; usenet isn't.  also, many people feel that usenet is already a 
hopeless mess and they haven't quite gotten to feeling that way about 
email (though there is a trend in this direction).  so there is 
considerable reluctance to making disruptive changes to email, whereas 
with usenet, the attitude is more likely to be "who cares?"  or "why 
are you bothering to upgrade usenet anyway?"

> Yes, our standard will say that the code used in headers MUST be 
> UTF-8, and other codes MUST NOT be used. That, sadly, is not 
> sufficient to prevent it from happening. Which is why I suggest that 
> our Foobar header should contain a possible handle to indicate other 
> usages, though clearly that handle "MUST NOT be used".

John C Klensin | 1 Jan 2004 19:05

UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


--On Thursday, 01 January, 2004 09:04 -0500 Keith Moore 
<moore <at> cs.utk.edu> wrote:

> On Jan 1, 2004, at 6:17 AM, Charles Lindsey wrote:
>
>> I don't think you would need to downgrade for UUCP, because
>> it is  already 8-bit clean. But my point was that a message
>> might happily  wander around within one protocol (UUCP or
>> NNTP) without anybody  needing to care about the encoding or
>> to check for "UTF-8-HEADERS".  Then suddenly it arrives at a
>> gateway into something else (e.g. SMTP  or an IMAP store)
>> where the distinction really matters.
>
> the possibility exists that the gateway isn't aware of the
> utf-8 extension, so it injects messages with utf-8 headers and
> addresses into the legacy mail system without doing a
> conversion.

Or, worse, that it "downgrades" the UTF-8 by zeroing out all of 
the high bits.   For anyone who doesn't know (I know Keith 
does), we have seen both behaviors many times.   And what this 
really says is that, if UUCP-based mail is now defined as "8-bit 
clean", it is a requirement of a gateway that conforms to RFC 
2821 that it detect the presence of 8bit characters and do 
something intelligent.  Now that requirement exists today, and 
existed long before this particular discussion and mailing list 
got started.  If a UUCP-based mail message that contains 8bit 
information in the body gets to a gateway into an [E]SMTP 
environment, it must (sorry, MUST) tag that information 

Keith Moore | 1 Jan 2004 20:11

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


> > at the time we were working on what became RFC 1342 we
> > realized that a single header field to tag the charset used
> > throughout the header would not be sufficient, because
> > different parts of the header are generated by different
> > agents on different machines.   one of the reasons for 1342
> > was to be able to encode such things in ASCII, but another
> > reason was to be able to tag each bit of human-readable text
> > with a separate charset.  what we might be finding out is that
> > it's not reasonable to expect everyone to use utf-8, and that
> > we're going to continue to need to deal with multiple charsets
> > (though perhaps fewer than are in use now) perhaps including
> > different charsets in different parts of the message header.
> 
> Keith, while I agreed with (and strongly supported) that 
> reasoning at the time, in the ensuing eleven or so years Unicode 
> (and maybe UTF-8) have achieved sufficient adoption that it 
> might now be reasonable to say "systems injecting non-ASCII 
> characters into header fields or equivalent contexts are 
> required to take responsibility for converting to UTF-8"... 

that's onerous but more-or-less doable.  what doesn't seem doable is
to prevent subsequent systems that handle the message (or reply
to it) from adding their own, non-utf-8, header contents.

John C Klensin | 1 Jan 2004 20:42

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


--On Thursday, 01 January, 2004 14:11 -0500 Keith Moore 
<moore <at> cs.utk.edu> wrote:

>> > at the time we were working on what became RFC 1342 we
>> > realized that a single header field to tag the charset used
>> > throughout the header would not be sufficient, because
>> > different parts of the header are generated by different
>> > agents on different machines.   one of the reasons for 1342
>> > was to be able to encode such things in ASCII, but another
>> > reason was to be able to tag each bit of human-readable text
>> > with a separate charset.  what we might be finding out is
>> > that it's not reasonable to expect everyone to use utf-8,
>> > and that we're going to continue to need to deal with
>> > multiple charsets (though perhaps fewer than are in use
>> > now) perhaps including different charsets in different
>> > parts of the message header.
>>
>> Keith, while I agreed with (and strongly supported) that
>> reasoning at the time, in the ensuing eleven or so years
>> Unicode  (and maybe UTF-8) have achieved sufficient adoption
>> that it  might now be reasonable to say "systems injecting
>> non-ASCII  characters into header fields or equivalent
>> contexts are  required to take responsibility for converting
>> to UTF-8"...
>
> that's onerous but more-or-less doable.  what doesn't seem
> doable is to prevent subsequent systems that handle the
> message (or reply to it) from adding their own, non-utf-8,
> header contents.

Keith Moore | 1 Jan 2004 21:55

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


> >> Keith, while I agreed with (and strongly supported) that
> >> reasoning at the time, in the ensuing eleven or so years
> >> Unicode  (and maybe UTF-8) have achieved sufficient adoption
> >> that it  might now be reasonable to say "systems injecting
> >> non-ASCII  characters into header fields or equivalent
> >> contexts are  required to take responsibility for converting
> >> to UTF-8"...
> >
> > that's onerous but more-or-less doable.  what doesn't seem
> > doable is to prevent subsequent systems that handle the
> > message (or reply to it) from adding their own, non-utf-8,
> > header contents.
> 
> I'm not sure I see the issue.  At one level, nothing can prevent 
> anything or anyone from adding trash, anywhere they like.

no, but the "reply" operation is fairly normal - in particular,
taking existing to/cc/reply-to and subject header fields, adding 
new things to them, removing other things, and generally rearranging
them are all common operations - and these "new things" can be
from machine or human sources.

> At another level, if there is a 
> specification that says "if you add 8bit header content, it must 
> be UTF-8; anything else must either be converted into RFC 2047 
> form or must be converted to UTF-8", then we are probably ok. 

well, we already have widespread practice of taking rfc 2047 and
decoding it into whatever charset the MUA happens to want to use -

Martin Duerst | 2 Jan 2004 22:51

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


At 15:55 04/01/01 -0500, Keith Moore wrote:

> > At another level, if there is a
> > specification that says "if you add 8bit header content, it must
> > be UTF-8; anything else must either be converted into RFC 2047
> > form or must be converted to UTF-8", then we are probably ok.
>
>well, we already have widespread practice of taking rfc 2047 and
>decoding it into whatever charset the MUA happens to want to use -
>mixing utf-8 with that probably produces unpredictable results,
>and insisting that all non-tagged non-ASCII text is utf-8 is probably
>just naive.

This is an interesting point. But it doesn't exactly work that way.
There are MUAs that can handle only one charset, or a few very
related ones (I still use such a beast, but I'm thinking hard
about how to get rid of it). Those won't feel very well with UTF-8,
but they also won't feel well with IDNA, and a few other things.
The MUAs that can handle a wide variety of encodings will just
use Unicode inside. Being able to handle something like
    Subject: =?iso-8859-1?Q?...?= =?iso-2022-jp?B?...?=
in a way that the user can actually look at means using
Unicode (or it means using some very messy code that would get a
lot easier if they switched to Unicode).
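
A minimal sketch of that approach, assuming Python 3's standard email
package; the helper encoded_word_b and the sample texts are
hypothetical. It builds a Subject from two encoded-words in different
charsets and decodes both into one Unicode string.

```python
import base64
from email.header import decode_header


def encoded_word_b(text: str, charset: str) -> str:
    """Build an RFC 2047 "B" encoded-word for text in the given charset."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


# Two encoded-words, each tagged with its own charset.
subject = (
    encoded_word_b("Grüße", "iso-8859-1")
    + " "
    + encoded_word_b("日本語", "iso-2022-jp")
)

# decode_header() yields (chunk, charset) pairs; converting every chunk
# to str (Unicode) gives the display code one common representation.
decoded = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in decode_header(subject)
)

print(decoded)  # Grüße日本語
```

White space between two adjacent encoded-words is dropped on decoding,
as RFC 2047 specifies, so the two parts join directly.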

>IIRC, unicode wasn't known to be stable at that time.  it certainly
>wasn't widely adopted, and it's hard to imagine that we could have
>gotten consensus on such a rule.  it's only after 10 years' experience
>with unicode that we have some confidence in its character repertoire,

Keith Moore | 3 Jan 2004 03:58

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


>> well, we already have widespread practice of taking rfc 2047 and
>> decoding it into whatever charset the MUA happens to want to use -
>> mixing utf-8 with that probably produces unpredictable results,
>> and insisting that all non-tagged non-ASCII text is utf-8 is probably
>> just naive.
>
> This is an interesting point. But it doesn't exactly work that way.
> There are MUAs that can handle only one charset, or a few very
> related ones (I still use such a beast, but I'm thinking hard
> about how to get rid of it). Those won't feel very well with UTF-8,
> but they also won't feel well with IDNA, and a few other things.
> The MUAs that can handle a wide variety of encodings will just
> use Unicode inside.

MUAs don't have to handle a wide variety of encodings in order to 
translate encodings that they do understand to a character encoding 
other than utf-8 and put *that* in the message header.

>> It's only after 10 years' experience with unicode that we have some 
>> confidence in its character repertoire, and we have even less 
>> experience with other aspects of it.
>
> Others have discussed the first part of this paragraph.
> For the second part (from the "it's only"), I think the
> 'some confidence' and 'even less experience' is a clear
> understatement if the 'we' means the overall Internet or
> IT community.

I disagree.  It has taken most of those 10 years to get enough software 

ned.freed | 2 Jan 2004 00:24

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


> > We _could_ have adopted a rigid, Unicode-only, rule a decade ago
> > rather than doing charset-specific tagging in text content types
> > and 2047 encodings.

> IIRC, unicode wasn't known to be stable at that time.  it certainly
> wasn't widely adopted, and it's hard to imagine that we could have
> gotten consensus on such a rule.  it's only after 10 years' experience
> with unicode that we have some confidence in its character repertoire,
> and we have even less experience with other aspects of it.

A decade ago isn't a particularly relevant date in the development of MIME --
at that point in time MIME was already a draft standard, making such a change
quite difficult to make.

A much more relevant date is November 18-22, 1991. This is the date of the last
IETF meeting prior to the approval of MIME as a proposed standard.
Realistically, this was the last point in time at which a change as major as
uncategorical endorsement of a single, universal charset  specification could
have been made. The actual MIME specifications were subsequently submitted to
the IESG for approval in January 1992.

According to the Unicode web site the complete specification of Unicode 1.0
wasn't published until June, 1992. (Amusingly enough, that was the same month
in which RFC 1341 appeared.) In November 1991 the universal charset situation
was far from clear: What was then called 10646 seemed to be on the way
out and Unicode seemed to be on the way in but no conclusions had been reached.

This led to the following text appearing in RFC 1341:

Keld Jørn Simonsen | 2 Jan 2004 01:23

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


On Thu, Jan 01, 2004 at 03:24:59PM -0800, ned.freed <at> mrochek.com wrote:
> 
> > > We _could_ have adopted a rigid, Unicode-only, rule a decade ago
> > > rather than doing charset-specific tagging in text content types
> > > and 2047 encodings.
> 
> > IIRC, unicode wasn't known to be stable at that time.  it certainly
> > wasn't widely adopted, and it's hard to imagine that we could have
> > gotten consensus on such a rule.  it's only after 10 years' experience
> > with unicode that we have some confidence in its character repertoire,
> > and we have even less experience with other aspects of it.
> 
> A decade ago isn't a particularly relevant date in the development of MIME --
> at that point in time MIME was already a draft standard, making such a change
> quite difficult to make.
> 
> A much more relevant date is November 18-22, 1991. This is the date of the last
> IETF meeting prior to the approval of MIME as a proposed standard.
> Realistically, this was the last point in time at which a change as major as
> uncategorical endorsement of a single, universal charset  specification could
> have been made. The actual MIME specifications were subsequently submitted to
> the IESG for approval in January 1992.
> 
> According to the Unicode web site the complete specification of Unicode 1.0
> wasn't published until June, 1992. (Amusingly enough, that was the same month
> in which RFC 1341 appeared.) In November 1991 the universal charset situation
> was far from clear: What was then called 10646 seemed to be on the way
> out and Unicode seemed to be on the way in but no conclusions had been reached.

ned.freed | 2 Jan 2004 05:35

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


> On Thu, Jan 01, 2004 at 03:24:59PM -0800, ned.freed <at> mrochek.com wrote:
> >
> > > > We _could_ have adopted a rigid, Unicode-only, rule a decade ago
> > > > rather than doing charset-specific tagging in text content types
> > > > and 2047 encodings.
> >
> > > IIRC, unicode wasn't known to be stable at that time.  it certainly
> > > wasn't widely adopted, and it's hard to imagine that we could have
> > > gotten consensus on such a rule.  it's only after 10 years' experience
> > > with unicode that we have some confidence in its character repertoire,
> > > and we have even less experience with other aspects of it.
> >
> > A decade ago isn't a particularly relevant date in the development of MIME --
> > at that point in time MIME was already a draft standard, making such a change
> > quite difficult to make.
> >
> > A much more relevant date is November 18-22, 1991. This is the date of the last
> > IETF meeting prior to the approval of MIME as a proposed standard.
> > Realistically, this was the last point in time at which a change as major as
> > uncategorical endorsement of a single, universal charset  specification could
> > have been made. The actual MIME specifications were subsequently submitted to
> > the IESG for approval in January 1992.
> >
> > According to the Unicode web site the complete specification of Unicode 1.0
> > wasn't published until June, 1992. (Amusingly enough, that was the same month
> > in which RFC 1341 appeared.) In November 1991 the universal charset situation
> > was far from clear: What was then called 10646 seemed to be on the way
> > out and Unicode seemed to be on the way in but no conclusions had been reached.

Keld Jørn Simonsen | 2 Jan 2004 18:19

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


On Thu, Jan 01, 2004 at 08:35:24PM -0800, ned.freed <at> mrochek.com wrote:
> 
> 
> > On Thu, Jan 01, 2004 at 03:24:59PM -0800, ned.freed <at> mrochek.com wrote:
> > >
> > > > > We _could_ have adopted a rigid, Unicode-only, rule a decade ago
> > > > > rather than doing charset-specific tagging in text content types
> > > > > and 2047 encodings.
> > >
> > > > IIRC, unicode wasn't known to be stable at that time.  it certainly
> > > > wasn't widely adopted, and it's hard to imagine that we could have
> > > > gotten consensus on such a rule.  it's only after 10 years' experience
> > > > with unicode that we have some confidence in its character repertoire,
> > > > and we have even less experience with other aspects of it.
> > >
> > > A decade ago isn't a particularly relevant date in the development of MIME --
> > > at that point in time MIME was already a draft standard, making such a change
> > > quite difficult to make.
> > >
> > > A much more relevant date is November 18-22, 1991. This is the date of the last
> > > IETF meeting prior to the approval of MIME as a proposed standard.
> > > Realistically, this was the last point in time at which a change as major as
> > > uncategorical endorsement of a single, universal charset  specification could
> > > have been made. The actual MIME specifications were subsequently submitted to
> > > the IESG for approval in January 1992.
> > >
> > > According to the Unicode web site the complete specification of Unicode 1.0
> > > wasn't published until June, 1992. (Amusingly enough, that was the same month
> > > in which RFC 1341 appeared.) In November 1991 the universal charset situation

Martin Duerst | 2 Jan 2004 19:27

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


At 18:19 04/01/02 +0100, Keld Jørn Simonsen wrote:

>I am not sure that we today should only go for UTF-8 for the
>enhancements on email addresses, as UTF-8 as used in IETF specs is not
>an international standard.

Please explain what the difference is between "UTF-8 as used in IETF
specs" and UTF-8 as defined in ISO 10646? And in what way is this
difference relevant to the task at hand?

>I also fear that just sending 8 bit will harm existing conforming
>implementations.

As far as I understand, that's not what the UTF-8-HEADERS extension
is about.

Regards,  Martin.

John C Klensin | 2 Jan 2004 18:49

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


--On Friday, 02 January, 2004 18:19 +0100 Keld Jørn Simonsen 
<keld <at> dkuug.dk> wrote:

>...
> I am not sure that we today should only go for UTF-8 for the
> enhancements on email addresses, as UTF-8 as used in IETF
> specs is not an international standard. We should probably
> then rather use what we already did for names, and the
> functions to handle this is already there. I also fear that
> just sending 8 bit will harm existing conforming
> implementations.

Keld,

I'll let others respond to others of your points if they feel 
like it.  To them, the strongest thing I can say is what I have 
said before -- fewer options promote better interoperability 
and, while I'm not a huge fan of UTF-8, I consider, in the 
absence of overwhelming arguments and something _clearly_ 
better, UTF-8 alone to be a better choice than UTF-8 plus 
something.

I don't know that anyone has proposed "just sending 8 bit".  An 
existing implementation that has any 8 bit characters in its 
headers is not conforming, so the question of a conforming 
implementation interpreting 8 bit strings as other than UTF-8 
(or whatever a standard specifies) is just not an issue.   And, 
because of the history of non-conforming implementations sending 
assorted unlabeled 8bit stuff (much more in message bodies than 

John C Klensin | 2 Jan 2004 01:05

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


Ned,

Thanks for clarifying the dates in my deliberately-vague 
"decade".

We are, I think, in complete agreement: we couldn't have 
rationally done anything else than what we did, and the text you 
cite explains exactly why we made that decision.  I was only 
suggesting that, faced with similar decisions, but a new 
context, today, we are not, and should not be, obligated to 
replicate the decision of what is now thirteen or fourteen years 
ago.

    john

--On Thursday, 01 January, 2004 15:24 -0800 
ned.freed <at> mrochek.com wrote:

>> > We _could_ have adopted a rigid, Unicode-only, rule a
>> > decade ago rather than doing charset-specific tagging in
>> > text content types and 2047 encodings.
>
>> IIRC, unicode wasn't known to be stable at that time.  it
>> certainly wasn't widely adopted, and it's hard to imagine
>> that we could have gotten consensus on such a rule.  it's
>> only after 10 years' experience with unicode that we have
>> some confidence in its character repertoire, and we have even
>> less experience with other aspects of it.
>

Dave Crocker | 2 Jan 2004 08:05

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


On Thu, 01 Jan 2004 19:05:06 -0500, John C Klensin wrote:
>  Thanks for clarifying the dates in my deliberately-vague "decade".
>
>  We are, I think, in complete agreement: we couldn't have rationally done
>  anything else than what we did, and the text you cite explains exactly why
>  we made that decision.

I believe the situation was stronger than "we couldn't have rationally done 
anything else."  My recollection is that MIME was delayed roughly a year due 
to the claims that Unicode was the One True solution, in spite of it not yet 
having attained stability or installed base.  Indeed, we could not delay MIME 
longer.

d/
--
Dave Crocker <dcrocker-at-brandenburg-dot-com>
Brandenburg InternetWorking <http://brandenburg.com>

Keith Moore | 2 Jan 2004 05:58

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


> We are, I think, in complete agreement: we couldn't have rationally 
> done anything else than what we did, and the text you cite explains 
> exactly why we made that decision.  I was only suggesting that, faced 
> with similar decisions, but a new context, today, we are not, and 
> should not be, obligated to replicate the decision of what is now 
> thirteen or fourteen years ago.

Of course not, but nobody has really suggested that we do so.  At the 
same time there are some things about email that were true then that 
remain true now - one of which is that different pieces of the message 
header are generated at different places by different agents, which 
won't all use the same native character encoding and won't all get 
upgraded to utf-8 at the same time.

Martin Duerst | 2 Jan 2004 18:03

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


At 23:58 04/01/01 -0500, Keith Moore wrote:

>Of course not, but nobody has really suggested that we do so.  At the same 
>time there are some things about email that were true then that remain 
>true now - one of which is that different pieces of the message header are 
>generated at different places by different agents, which won't all use the 
>same native character encoding and won't all get upgraded to utf-8 at the 
>same time.

This is a valid point, and it seems to lead to an interesting question
that I haven't seen discussed yet: For the SMTP extension proposed in
Paul's draft, and for Charles' header, what's the policy with respect
to stuff encoded with RFC 2047? In detail:

- Is a message tagged with Charles' header allowed to contain RFC 2047 stuff?
   (I would propose we say: MAY contain RFC 2047-encoded stuff)
- Is a message passed over SMTP with UTF-8-HEADERS allowed to contain
   RFC 2047 stuff? The way I understand SMTP extensions (experts on this
   list, please correct me if I'm wrong), this is a somewhat moot question,
   because it's the server that says what extensions it supports; the client
   doesn't say which extensions it uses (unless through the use of
   parameters in commands, but there are none for UTF-8-HEADERS).
   So my understanding is that RFC 2047-encoded headers are not disallowed.
- Does 'upgrade' include conversion from RFC 2047-encoded headers to
   raw UTF-8 (even if the RFC 2047 encoding doesn't use UTF-8)?
   I didn't find this in Paul's current draft; there is at the moment
   not yet much about upgrading overall. I would propose we say
   "upgrading MUST convert RFC 2047-encoded text to UTF-8 if the charset
   used in the RFC 2047-encoding is UTF-8, and SHOULD (or MAY?) convert

Charles Lindsey | 2 Jan 2004 18:53

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


On Fri, 02 Jan 2004 12:03:50 -0500, Martin Duerst <duerst <at> w3.org> wrote:

> - Is a message tagged with Charles' header allowed to contain RFC 2047
>    stuff?
>    (I would propose we say: MAY contain RFC 2047-encoded stuff)

Yes, I would think so.

> - Is a message passed over SMTP with UTF-8-HEADERS allowed to contain
>    RFC 2047 stuff?

Yes, I don't see why not. In the fullness of time, one hopes that the use 
of RFC 2047 will gradually disappear, but it is for the marketplace to 
determine when.
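
As a rough illustration of the "upgrade" step Martin asks about (converting RFC 2047 encoded-words to raw UTF-8), here is a minimal sketch using Python's standard `email.header.decode_header`. The function name `rfc2047_to_utf8` is made up for this example, and nothing here comes from Paul's draft.

```python
from email.header import decode_header


def rfc2047_to_utf8(value: str) -> bytes:
    """Decode RFC 2047 encoded-words in a header value; return raw UTF-8 octets.

    Hypothetical helper: the name and the error policy are assumptions.
    """
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header yields bytes for encoded-words and may yield str
        # when the value contains no encoded-words at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts).encode("utf-8")


if __name__ == "__main__":
    # An ISO-8859-1 encoded-word becomes UTF-8 octets.
    print(rfc2047_to_utf8("=?iso-8859-1?q?p=F6stal?="))
```

Note that this converts regardless of the charset named in the encoded-word, which corresponds to the stronger of the two options Martin floats; a converter that only touches encoded-words already labelled UTF-8 would check `charset` first.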

> The way I understand SMTP extensions (experts on this
>    list, please correct me if I'm wrong), this is a somewhat moot 
> question,
>    because it's the server that says what extensions it supports; the 
> client
>    doesn't say which extensions it uses (unless through the use of
>    parameters in commands, but there are none for UTF-8-HEADERS).
>    So my understanding is that RFC 2047-encoded headers are not 
> disallowed.

But here I have a problem. RFC 2821 seems written on the assumption that 
the client is just the last server in the chain, so of course it can 
announce whether it supports UTF-8-HEADERS or not.
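
To make the negotiation point concrete: in ESMTP the server lists its extensions in the multi-line reply to EHLO, and the client only learns what is on offer. The sketch below parses such a reply into a set of keywords. The function name and the sample reply are invented for illustration; `UTF-8-HEADERS` is simply the keyword used in this thread.

```python
def ehlo_keywords(reply: str) -> set:
    """Return the extension keywords announced in an EHLO reply.

    Assumes a well-formed 250 reply; the first line carries the server's
    greeting/domain rather than an extension, so it is skipped.
    """
    keywords = set()
    for line in reply.splitlines()[1:]:
        if len(line) < 4 or not line.startswith("250"):
            continue
        rest = line[4:].strip()  # drop "250-" or "250 "
        if rest:
            # The keyword is the first token; anything after it is a parameter.
            keywords.add(rest.split()[0].upper())
    return keywords


if __name__ == "__main__":
    sample = (
        "250-mail.example.org\r\n"
        "250-8BITMIME\r\n"
        "250-SIZE 1000000\r\n"
        "250 UTF-8-HEADERS\r\n"
    )
    print("UTF-8-HEADERS" in ehlo_keywords(sample))
```

Nothing in this exchange tells the server which extensions the client will actually use, which is the gap Martin and Charles are discussing.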


John C Klensin | 2 Jan 2004 17:47

Re: UUCP, etc., and SMTP/822/MIME mail (was: Re: I-D ACTION:draft-hoffman-utf8headers-00.txt)


--On Thursday, 01 January, 2004 23:58 -0500 Keith Moore 
<moore <at> cs.utk.edu> wrote:

>> We are, I think, in complete agreement: we couldn't have
>> rationally  done anything else than what we did, and the text
>> you cite explains  exactly why we made that decision.  I was
>> only suggesting that, faced  with similar decisions, but a
>> new context, today, we are not, and  should not be, obligated
>> to replicate the decision of what is now  thirteen or
>> fourteen years ago.
>
> Of course not, but nobody has really suggested that we do so.
> At the same time there are some things about email that were
> true then that remain true now - one of which is that
> different pieces of the message header are generated at
> different places by different agents, which won't all use the
> same native character encoding and won't all get upgraded to
> utf-8 at the same time.

Of course not.

Keith, might I respectfully suggest that you stop firing off 
responses and, instead, try reading the notes to which you are 
responding carefully enough to understand what they really say 
before reacting to them.

I'm certainly not naive enough to believe in "all get upgraded 
to UTF-8 at the same time" and didn't suggest that.  It is 
obvious to me that we will be living with (and _should_ be 

Keith Moore | 1 Jan 2004 06:15

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>> Indeed, the next big problem is how servers and other agents are to
>> recognize whether any of the headers of a message contain any 
>> Non_ASCII.
>> Yes, you could scan the headers of every message looking for an octet
>> > 127, but that is a great expenditure of effort considering that 
>> 99.9% of
>> the world's emails will have pure ASCII headers for several years to 
>> come.
>> Far better to have some indication in the message that it contains 
>> 8bit
>> stuff (most likely an extra header to say so). Indeed, Mark Crispin 
>> is on
>> record as saying that, if he is to have his arm twisted into having 
>> UTF-8
>> headers in IMAP, he would insist on such a header).
>
> I think such a header is not a bad idea. I don't think it's 
> particularly
> important, but if it helps, why not. As for actually scanning the 
> headers,
> I'm not sure about the 'great expenditure'. If you have to scan all
> headers to find the header that says it's UTF-8, doing the > 127 check
> on the side is almost free.

having a single flag to say that fields are in utf-8 is ridiculous - 
first because the fields aren't all generated at the same place, and 
second because (as you point out) you potentially have to scan the 
whole header anyway to find the new header field.
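
For reference, the "> 127 check" under discussion is a single pass over the header octets. A minimal sketch, assuming the message is available as raw bytes with CRLF line endings (the function name is hypothetical):

```python
def header_has_8bit(raw_message: bytes) -> bool:
    """Return True if any octet in the header section is above 127.

    Assumes the header ends at the first empty line (CRLF CRLF); a message
    with no body separator is treated as all header.
    """
    header, _, _ = raw_message.partition(b"\r\n\r\n")
    return any(octet > 127 for octet in header)


if __name__ == "__main__":
    print(header_has_8bit(b"Subject: caf\xc3\xa9\r\n\r\nbody"))
    print(header_has_8bit(b"Subject: hi\r\n\r\nbody \xc3\xa9"))
```

The scan says only that 8-bit octets are present. It says nothing about which charset produced them, which is the objection raised above.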


Paul Hoffman / IMC | 3 Jan 2004 02:31

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


At 12:15 AM -0500 1/1/04, Keith Moore wrote:

>having a single flag to say that fields are in utf-8 is ridiculous - 
>first because the fields aren't all generated at the same place, and 
>second because (as you point out) you potentially have to scan the 
>whole header anyway to find the new header field.

Neither of those arguments seems that relevant.

- It doesn't matter if all are generated in the same place, just that 
they are all generated the same way. Non-updated MUAs and MTAs 
generate headers in UTF-8 (that is, in ASCII, a proper subset of 
UTF-8), and updated MUAs and MTAs generate headers in UTF-8. 
Non-compliant MUAs and MTAs will mess up whatever we do.

- What's the problem with having to scan the whole header? Why is 
this onerous for a terminal MTA? (It is already done by the MUA.)

>but as far as I'm concerned putting utf-8 in headers is a nonstarter 
>anyway.  there's simply no justification for  it.

The justification is that the only proposal that doesn't involve 
non-ASCII in the headers, draft-hoffman-imaa-03.txt, has two fairly 
significant side-effects, namely that senders who have not updated 
their MUAs will not sanely be able to initiate mail to non-ASCII 
mailboxes and that recipients who have not updated their MUAs will 
see gibberish.

At 1:06 AM -0500 1/1/04, Keith Moore wrote:

Keith Moore | 3 Jan 2004 03:46

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>> having a single flag to say that fields are in utf-8 is ridiculous - 
>> first because the fields aren't all generated at the same place, and 
>> second because (as you point out) you potentially have to scan the 
>> whole header anyway to find the new header field.
>
> Neither of those arguments seems that relevant.
>
> - It doesn't matter if all are generated in the same place, just that 
> they are all generated the same way. Non-updated MUAs and MTAs 
> generate headers in UTF-8 (that is, in ASCII, a proper subset of 
> UTF-8),

actually, they generate headers in a variety of charsets.

>> but as far as I'm concerned putting utf-8 in headers is a nonstarter 
>> anyway.  there's simply no justification for  it.
>
> The justification is that the only proposal that doesn't involve 
> non-ASCII in the headers, draft-hoffman-imaa-03.txt, has two fairly 
> significant side-effects, namely that senders who have not updated 
> their MUAs will not sanely be able to initiate mail to non-ASCII 
> mailboxes
> and that recipients who have not updated their MUAs will see gibberish.

both of those side-effects also exist for your utf-8 header proposal.

> At 1:06 AM -0500 1/1/04, Keith Moore wrote:
>
>> there's no justification given for utf-8 headers.  the desired 

Paul Hoffman / IMC | 3 Jan 2004 04:25

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


At 9:46 PM -0500 1/2/04, Keith Moore wrote:
>>>having a single flag to say that fields are in utf-8 is ridiculous 
>>>- first because the fields aren't all generated at the same place, 
>>>and second because (as you point out) you potentially have to scan 
>>>the whole header anyway to find the new header field.
>>
>>Neither of those arguments seems that relevant.
>>
>>- It doesn't matter if all are generated in the same place, just 
>>that they are all generated the same way. Non-updated MUAs and MTAs 
>>generate headers in UTF-8 (that is, in ASCII, a proper subset of 
>>UTF-8),
>
>actually, they generate headers in a variety of charsets.

I'm not sure what you mean. All headers are in the ASCII character 
set currently.

>>>but as far as I'm concerned putting utf-8 in headers is a 
>>>nonstarter anyway.  there's simply no justification for  it.
>>
>>The justification is that the only proposal that doesn't involve 
>>non-ASCII in the headers, draft-hoffman-imaa-03.txt, has two fairly 
>>significant side-effects, namely that senders who have not updated 
>>their MUAs will not sanely be able to initiate mail to non-ASCII 
>>mailboxes
>>and that recipients who have not updated their MUAs will see gibberish.
>
>both of those side-effects also exist for your utf-8 header proposal.

Keith Moore | 3 Jan 2004 04:46

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


> I'm not sure what you mean. All headers are in the ASCII character set 
> currently.

according to the standards, yes.  that's not entirely true in the real 
world.
>
>>>> but as far as I'm concerned putting utf-8 in headers is a 
>>>> nonstarter anyway.  there's simply no justification for  it.
>>>
>>> The justification is that the only proposal that doesn't involve 
>>> non-ASCII in the headers, draft-hoffman-imaa-03.txt, has two fairly 
>>> significant side-effects, namely that senders who have not updated 
>>> their MUAs will not sanely be able to initiate mail to non-ASCII 
>>> mailboxes
>>> and that recipients who have not updated their MUAs will see 
>>> gibberish.
>>
>> both of those side-effects also exist for your utf-8 header proposal.

> That is false on both counts.

existing MUAs can't sanely initiate mail to non-ASCII mailboxes.  
they're not set up to accept UTF-8 input.  they're not set up to look 
up IDNs.  they're not set up to stringprep local parts or to encode 
them in a way that's compatible with either SMTP or other MUAs.

existing MUAs can't display non-ASCII mailboxes as anything but 
gibberish.  if they're in utf-8 form, they look like gibberish unless 
the output device happens to display utf-8.  if they're in ACE form, 

Thomas Roessler | 3 Jan 2004 11:47

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


On 2004-01-02 22:46:26 -0500, Keith Moore wrote:

> though I would want to think about the security implications of
> caching address maps from previous messages.  getting them from
> an oracle associated with the recipient's domain seems much
> safer.

Using a well-defined encoding to calculate them would seem even
safer, and also works for clients not directly connected to the
Internet.

Regards,
-- 
Thomas Roessler · Personal soap box at <http://log.does-not-exist.org/>.

Keith Moore | 4 Jan 2004 07:39

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>> though I would want to think about the security implications of
>> caching address maps from previous messages.  getting them from
>> an oracle associated with the recipient's domain seems much
>> safer.
>
> Using a well-defined encoding to calculate them would seem even
> safer, and also works for clients not directly connected to the
> Internet.

that lets them be conveyed in protocol fields that expect ASCII but it 
doesn't really make them easy to remember or transcribe, which seems to 
be part of the goal.
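
To show what "a well-defined encoding" buys and what it costs, here is a small sketch using Python's built-in `idna` codec (which implements the IDNA 2003 ToASCII/ToUnicode operations for domain names). It covers only the domain part; the local-part encoding in the IMAA draft is a separate matter and is not modelled here. The helper names are invented.

```python
def to_ace(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def from_ace(domain: str) -> str:
    """Convert an ACE-form domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_ace("b\u00fccher.example")
    print(ace)            # the form that fits in ASCII-only protocol fields
    print(from_ace(ace))  # the form a person would recognize
```

The mapping is computable offline in both directions, which is Thomas's point; the ACE string is not something a person would remember or transcribe, which is Keith's.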

Charles Lindsey | 1 Jan 2004 16:29

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


On Thu, 1 Jan 2004 00:15:34 -0500, Keith Moore <moore <at> cs.utk.edu> wrote:

> having a single flag to say that fields are in utf-8 is ridiculous - 
> first because the fields aren't all generated at the same place, and 
> second because (as you point out) you potentially have to scan the whole 
> header anyway to find the new header field.

He who knowingly generates a UTF-8 header would be responsible for 
ensuring that the "Foobar" header was added, if not already present. IOW, 
those who want to use this new-fangled UTF-8 stuff would be the ones to 
bear the cost.
>
> but as far as I'm concerned putting utf-8 in headers is a nonstarter 
> anyway.  there's simply no justification for  it.

Maybe not, but the problem is that it is going to happen whether you like 
it or not, because people will find that it "just works" (well, mostly). 
Indeed it is already happening, except that the code used is usually not 
UTF-8.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl <at> clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Keith Moore | 1 Jan 2004 17:20

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>> having a single flag to say that fields are in utf-8 is ridiculous - 
>> first because the fields aren't all generated at the same place, and 
>> second because (as you point out) you potentially have to scan the 
>> whole header anyway to find the new header field.
>
> He who knowingly generates a UTF-8 header would be responsible for 
> ensuring that the "Foobar" header was added, if not already present. 
> IOW, those who want to use this new-fangled UTF-8 stuff would be the 
> ones to bear the cost.

that's missing the point.  adding an extra field is easy, making sure 
that all non-ascii text that is in the header is in utf-8 at the time 
that that field is added is hard. making sure that any nonascii text 
that is subsequently added by other agents is also in utf-8 is 
impossible.

>> but as far as I'm concerned putting utf-8 in headers is a nonstarter 
>> anyway.  there's simply no justification for  it.
>
> Maybe not, but the problem is that it is going to happen whether you 
> like it or not, because people will find that it "just works" (well, 
> mostly).

lots of people do stupid things.  it's naive to believe that IETF can 
stop people from doing stupid things by defining other ways to do those 
things.

> Indeed it is already happening, except that the code used is usually 
> not UTF-8.

Charles Lindsey | 1 Jan 2004 20:18

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


On Thu, 1 Jan 2004 11:20:11 -0500, Keith Moore <moore <at> cs.utk.edu> wrote:

>> He who knowingly generates a UTF-8 header would be responsible for 
>> ensuring that the "Foobar" header was added, if not already present. 
>> IOW, those who want to use this new-fangled UTF-8 stuff would be the 
>> ones to bear the cost.
>
> that's missing the point.  adding an extra field is easy, making sure 
> that all non-ascii text that is in the header is in utf-8 at the time 
> that that field is added is hard. making sure that any nonascii text 
> that is subsequently added by other agents is also in utf-8 is 
> impossible.

No, the proposal is for a standard which says all Non-ASCII in headers 
MUST be in UTF-8 (well, you can still use RFC 2047 if you really need more 
flexibility). So if anyone includes other charsets naked in a set of 
headers that contains the Foobar header, then he is non-compliant (as is 
anybody who uses even UTF-8 without Foobar). He who generates 
non-compliant messages must put up with the consequences (and that 
includes anyone who invents some local variant of the Foobar header 
allowing other charsets, if his message escapes from his local 
environment).
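
The rule as stated can be sketched as a check over already-parsed header fields. This is only an illustration: "Foobar" is the placeholder name used in this thread, and the function name and return strings are made up.

```python
def check_headers(fields: list) -> str:
    """Classify a list of (name, body) byte pairs against the proposed rule.

    Rule sketched: any non-ASCII octets must be valid UTF-8, and a message
    using them must carry the (placeholder) Foobar flag field.
    """
    has_flag = any(name.lower() == b"foobar" for name, _ in fields)
    has_8bit = False
    for _, body in fields:
        if any(octet > 127 for octet in body):
            has_8bit = True
            try:
                body.decode("utf-8")
            except UnicodeDecodeError:
                return "non-compliant: non-UTF-8 octets"
    if has_8bit and not has_flag:
        return "non-compliant: UTF-8 without flag"
    return "compliant"


if __name__ == "__main__":
    utf8 = "caf\u00e9".encode("utf-8")
    latin1 = "caf\u00e9".encode("latin-1")
    print(check_headers([(b"Subject", utf8), (b"Foobar", b"yes")]))
    print(check_headers([(b"Subject", latin1), (b"Foobar", b"yes")]))
```

A caveat that bears on the disagreement here: octets that happen to decode as UTF-8 are not proof that the generating agent meant UTF-8, so a check like this can catch some violations but cannot certify a header.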
.
>
>> Indeed it is already happening, except that the code used is usually 
>> not UTF-8.
>
> which is exactly why tagging the entire header as either being utf-8 or 
> not doesn't work.

Keith Moore | 2 Jan 2004 06:02

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


>>> He who knowingly generates a UTF-8 header would be responsible for 
>>> ensuring that the "Foobar" header was added, if not already present. 
>>> IOW, those who want to use this new-fangled UTF-8 stuff would be the 
>>> ones to bear the cost.
>>
>> that's missing the point.  adding an extra field is easy, making sure 
>> that all non-ascii text that is in the header is in utf-8 at the time 
>> that that field is added is hard. making sure that any nonascii text 
>> that is subsequently added by other agents is also in utf-8 is 
>> impossible.
>
> No, the proposal is for a standard which says all Non-ASCII in headers 
> MUST be in UTF-8 (well, you can still use RFC 2047 if you really need 
> more flexibility). So if anyone includes other charsets naked in a set 
> of headers that contains the Foobar header, then he is non-compliant 
> (as is anybody who uses even UTF-8 without Foobar).

the question is not whether an implementation that used some other 
charset without encoding it would violate the standard.  the question 
is whether this would work well in practice given that various other 
charsets are already being used without encoding, and also given that 
even within the same header field different bits of text can come from 
different places and be in different charsets.

as for the extra header, I suspect it would be about as useless as 
MIME-Version.

> He who generates non-compliant messages must put up with the 
> consequences (and that includes anyone who invents some local variant 

Paul Hoffman / IMC | 1 Jan 2004 01:12

Use of 8: headers


Glad to see this being discussed more heavily. Adam's "8:" proposal 
is interesting and easy to describe. However, I think Martin has 
brought up a good point:

At 4:41 PM -0500 12/31/03, Martin Duerst wrote:
>I definitely like a header much more than the 8: header prefix
>proposal, because it looks to me that it is much more straight-
>forward to implement. There are no issues such as "what happens
>if there is a To: and an 8:To: header?", and 8-bit-clean software
>can just work on headers without having to care about 7-bit/8-bit
>issues except at very specific points (downgrading/upgrading).

Comments on this balance are most welcome!

>>Header-Transfer-Encoding : "Header-Transfer-Encoding:" ( "8bit" / "7bit" )
>>                                *( ";" parameter )
>>
>>OK, it needs CFWS and all that jazz in the proper places. We can argue
>>later whether the operative keyword is "8bit" or "utf-8".
>
>Allow me to start now: I think the name "Header-Transfer-Encoding"
>is problematic, because it will further increase confusion about
>the various encoding layers. Second, I very much think the
>distinction should be between US-ASCII and UTF-8, not 8bit and 7bit.

Martin is correct. If we go with a new "flagging" header, it could 
simply be "Has-UTF-8-headers: yes".

--Paul Hoffman, Director

Adam M. Costello | 27 Dec 2003 09:02

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


Charles Lindsey <chl <at> clerew.man.ac.uk> wrote:

> Far better to have some indication in the message that it contains
> 8bit stuff (most likely an extra header to say so).
>
> So let me suggest a header so that UTF-8 users can mark their messages
> as "unclean".
>
> Header-Transfer-Encoding : "Header-Transfer-Encoding:" ( "8bit" /
>   "7bit" ) *( ";" parameter )
>
> It might be argued that this header SHOULD precede any use of
> Non-ASCII in the headers (but given the propensity for transports to
> reorder headers, I doubt that would survive).

Consider this:

    UTF-8-header-field = "8:" field-name ":" utf-8-field-body

where field-name is the same as always, and utf-8-field-body is like
the normal field body for that field-name except that certain Unicode
characters are allowed in certain places (encoded as UTF-8) (details to
be worked out).

For example:

    8:From: blah blah <blah <at> blah>
    Date: Fri, 26 Dec 2003 12:00:00 -0000
    8:Subject: blah blah blah

John C Klensin | 27 Dec 2003 17:07

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


Adam,

While this idea is an interesting one in principle, the
particular proposal you make would break a very large fraction
of the RFC822/2822 parsers in the world, which assume 
   Header = *C ":"
where "C" is an instance of a permitted character.

   They may then treat the character after the colon, if it is
not a space, as an error or as the first character in the field
that follows the header.  They will break either way.

While Paul and I continue to disagree about the level of badness
associated with transport bouncing of an extension, I think
that "deliver and then fail badly" is the worst of all possible
cases, since it may not even permit delivering a competent error
message.

      john

--On Saturday, 27 December, 2003 08:02 +0000 "Adam M. Costello"
<ietf-imaa.amc+0 <at> nicemice.net.RemoveThisWord> wrote:

> 
> Charles Lindsey <chl <at> clerew.man.ac.uk> wrote:
> 
>> Far better to have some indication in the message that it
>> contains 8bit stuff (most likely an extra header to say so).
>> 

Adam M. Costello | 27 Dec 2003 23:21

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


John C Klensin <john-ietf <at> jck.com> wrote:

> While this idea is an interesting one in principle, the
> particular proposal you make would break a very large fraction
> of the RFC822/2822 parsers in the world, which assume 
>    Header = *C ":"
> where "C" is an instance of a permitted character.

Either I don't understand your objection, or I didn't make my proposal
clear enough.

The previous UTF-8-headers proposals have been redefining the syntax
of existing fields (like From:, Subject:, etc.) to allow UTF-8.  I am
suggesting creating a new field, 8: (which may appear multiple times),
and allowing UTF-8 only inside 8:, and leaving existing field syntax
unchanged.  Existing software, which does not recognize 8:, will not try
to interpret it (unrecognized fields are ignored).  New software that
recognizes 8: will know that the first thing inside the contents of an
8: field is a sub-field-name, with the same semantics as a top-level
field name, but with slightly different syntax beyond that (UTF-8 is
allowed).
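
A small sketch of how the two kinds of software would see the same line, assuming the simplest possible parser that splits at the first colon. Both function names are invented, and the details of the 8: syntax are, as noted earlier in the thread, still to be worked out.

```python
def split_field(line: str) -> tuple:
    """Legacy view: field name is everything before the first colon."""
    name, _, body = line.partition(":")
    return name.strip(), body.strip()


def unwrap_8(line: str) -> tuple:
    """New-software view: an 8: field carries a sub-field name and body."""
    name, body = split_field(line)
    if name == "8":
        # The first thing inside the 8: body is itself "sub-field-name: body".
        return split_field(body)
    return name, body


if __name__ == "__main__":
    line = "8:Subject: blah blah blah"
    print(split_field(line))  # what an unaware parser sees
    print(unwrap_8(line))     # what an 8:-aware parser sees
```

An unaware parser sees one unknown field named "8" and never looks inside it, which is the hiding effect the proposal relies on.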

The intention of this proposal is to make breakage less likely than it
would be if UTF-8 were used directly inside today's standard fields
(From:, To:, etc.).

We can't expect existing parsers to understand non-ASCII fields, so
the best we can do is try to hide the non-ASCII fields from them.  An
SMTP extension keyword is one line of defense, but Charles Lindsey was

Nathaniel Borenstein | 28 Dec 2003 18:37

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


I would worry a bit that there may still be mailers out there that 
don't always convey all instances of a header field that appears more 
than once -- e.g. if they convert into, say, a database representation 
and can only have one value indexed to the field name "8", there might 
be information lost when that gets converted back into RFC [2]822 
format.  -- Nathaniel

On Saturday, December 27, 2003, at 05:21  PM, Adam M. Costello wrote:

>
> John C Klensin <john-ietf <at> jck.com> wrote:
>
>> While this idea is an interesting one in principle, the
>> particular proposal you make would break a very large fraction
>> of the RFC822/2822 parsers in the world, which assume
>>    Header = *C ":"
>> where "C" is an instance of a permitted character.
>
> Either I don't understand your objection, or I didn't make my proposal
> clear enough.
>
> The previous UTF-8-headers proposals have been redefining the syntax
> of existing fields (like From:, Subject:, etc.) to allow UTF-8.  I am
> suggesting creating a new field, 8: (which may appear multiple times),
> and allowing UTF-8 only inside 8:, and leaving existing field syntax
> unchanged.  Existing software, which does not recognize 8:, will not 
> try
> to interpret it (unrecognized fields are ignored).  New software that
> recognizes 8: will know that the first thing inside the contents of an

John Cowan | 28 Dec 2003 20:02

Re: I-D ACTION:draft-hoffman-utf8headers-00.txt


Nathaniel Borenstein scripsit:

> I would worry a bit that there may still be mailers out there that 
> don't always convey all instances of a header field that appears more 
> than once -- e.g. if they convert into, say, a database representation 
> and can only have one value indexed to the field name "8", there might 
> be information lost when that gets converted back into RFC [2]822 
> format.  -- Nathaniel

Such mailers are obviously broken, and can't represent the "Received:"
header as commonly used -- not to mention that the RFC allows most
headers to be repeated.
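
The failure mode Nathaniel describes is easy to reproduce with any store that keeps one value per field name. A minimal sketch with made-up field values and hypothetical function names:

```python
def collapse_by_name(fields: list) -> dict:
    """One value per field name: later occurrences overwrite earlier ones."""
    return dict(fields)


def group_by_name(fields: list) -> dict:
    """Keep every occurrence, in order, under its field name."""
    grouped = {}
    for name, body in fields:
        grouped.setdefault(name, []).append(body)
    return grouped


if __name__ == "__main__":
    fields = [
        ("Received", "from a.example"),
        ("Received", "from b.example"),
        ("8", "From: blah"),
        ("8", "Subject: blah blah"),
    ]
    print(collapse_by_name(fields))  # loses one Received and one 8: field
    print(group_by_name(fields))     # keeps all four
```

The same collapse that would drop an 8: field also drops a Received: field, which is why such a mailer is already broken for ordinary mail.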

-- 
Schlingt dreifach einen Kreis vom dies!    John Cowan <jcowan <at> reutershealth.com>
Schliesst euer Aug vor heiliger Schau,     http://www.reutershealth.com      
Denn er genoss vom Honig-Tau,              http://www.ccil.org/~cowan  
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)

Martin Duerst | 22 Dec 2003 19:16

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


Hello Paul,

I'm surprised too that there haven't been any comments so far on your
draft. I have read about half of your draft, and already have quite a few
comments (mostly positive/clarifying), but I want to finish reading
it before writing things up.

Regards,     Martin.

At 09:14 03/12/22 -0800, Paul Hoffman / IMC wrote:

>Er, any comments at all?
>
>--Paul Hoffman, Director
>--Internet Mail Consortium

James Seng | 23 Dec 2003 02:32

Re: Fwd: I-D ACTION:draft-hoffman-utf8headers-00.txt


perhaps it is the christmas season? i know i've been running like crazy the 
last few days.

x-mas :-)

james

Martin Duerst wrote:

> 
> Hello Paul,
> 
> I'm surprised too that there haven't been any comments so far on your
> draft. I have read about half of your draft, and already have quite a few
> comments (mostly positive/clarifying), but I want to finish reading
> it before writing things up.
> 
> Regards,     Martin.
> 
> At 09:14 03/12/22 -0800, Paul Hoffman / IMC wrote:
> 
>> Er, any comments at all?
>>
>> --Paul Hoffman, Director
>> --Internet Mail Consortium
> 
> 
> 
> 

Paul Hoffman / IMC | 21 Apr 2003 17:55

Fwd: I-D ACTION:draft-hoffman-imaa-01.txt


Adam and I have significantly revised the document based on the 
earlier discussion. Please let us know what you think, and please 
*don't* reply to this message, but start a new thread with a 
different subject line. Thanks!

>To: IETF-Announce: ;
>From: Internet-Drafts <at> ietf.org
>Reply-to: Internet-Drafts <at> ietf.org
>Subject: I-D ACTION:draft-hoffman-imaa-01.txt
>Date: Mon, 21 Apr 2003 07:32:19 -0400
>Sender: owner-ietf-announce <at> ietf.org
>
>
>
>A New Internet-Draft is available from the on-line Internet-Drafts 
>directories.
>
>
>	Title		: Internationalizing Mail Addresses in Applications
>                           (IMAA)
>	Author(s)	: P. Hoffman, A. Costello
>	Filename	: draft-hoffman-imaa-01.txt
>	Pages		: 11
>	Date		: 2003-4-18
>
>The Internationalizing Domain Names in Applications (IDNA)
>specification describes how to process domain names that have
>characters outside the ASCII repertoire.  A user who has an
>internationalized domain name may want to have their full Internet

Nathaniel Borenstein | 23 Jan 2004 23:52

Re: I-D ACTION:draft-klensin-email-envelope-00.txt

John -- It's a pity to say this about such a nicely-written draft, but 
I fear this proposal comes remarkably close to maximizing the 
cost/benefit ratio.  When you consider distributed costs of testing and 
deployment, I would bet that implementing this proposal would cost at 
*least* 50% of what it would cost to deploy a far more radical set of 
changes to the email infrastructure.

What we were discussing in Minneapolis was a once-in-a-generation scale 
of protocol reform.  I think we should think big, because there is a 
large fixed cost to pushing any structural change through the entire 
Internet community.  I'm particularly concerned that you seem to be 
punting on internationalization -- I can't see putting much energy into 
*any* major reform that doesn't address the most widely-perceived 
failing of the current system.

At root, what I don't buy is the notion that, in this case, we can take 
an incremental approach to radical change.  The Wright Brothers didn't 
learn to fly by practicing the high jump, and I don't think we're going 
to clean up email by creating a new architectural entity (the extended 
envelope) that only works in ASCII.

Now, if the extended envelope could be done completely in UTF-8, with a 
coevolutionary model for son-of-822 header fields and binary body 
transport, now *that* would be a more compelling story.  Delivery 
tracing, internationalization, and binary transport would be a nice 
trio of goals for a next-generation email system.  I can certainly 
think of a few others, though.  -- Nathaniel

On Friday, January 23, 2004, at 04:53  PM, John C Klensin wrote:


John C Klensin | 24 Jan 2004 01:40

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


Nathaniel,

We may be less in disagreement than you assume.  Briefly, I 
think it is useful to write some things up separately, because, 
that way, it is easier to think about whether the pieces are 
right or there are better ways to do them.   And I am not 
punting on internationalization at all -- it is my main concern 
here and I just want to see if we can build a healthy 
environment for it, one in which, as I've said a few times, we 
move aggressively toward an internationalized/multilingual 
Internet rather than continuing with a predominantly Roman 
character/ English network with various kludges, patches, and 
warts to permit other scripts and languages to sort of work. 
So, fwiw...

I would be happy, indeed delighted, to be wrong, but I see just 
about no hope for designing and deploying a completely new email 
structure.  More on that in draft-klensin-emailaddr-02.txt, 
which I hope I can get wrapped up and posted on Monday or 
Tuesday (it is a fairly major revision of the ideas in -01).

The nature of the internationalization problem, as I see it, is 
precisely something that will need a number of interdependent 
changes.  To try to make them independently, or to work around 
some of them, or to try for small incremental chunks, is likely 
to get us into the land of "nasty kludges forever".  I don't see 
that as flying, and I don't see it as being successful.  On the 
other hand, if we look at the whole picture (and if I'm right), 
when we get through, we have a rather different-looking email 

Eric A. Hall | 25 Jan 2004 00:08

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


On 1/23/2004 6:40 PM, John C Klensin wrote:

> I would be happy, indeed delighted, to be wrong, but I see just 
> about no hope for designing and deploying a completely new email 
> structure.  More on that in draft-klensin-emailaddr-02.txt, 
> which I hope I can get wrapped up and posted on Monday or 
> Tuesday (it is a fairly major revision of the ideas in -01).

I disagree, but I'll read your draft when it's published anyway. From a
design perspective (or rather, ignoring the installed base as a criterion
of 'design', which is the usual and erroneous counter-argument), SMTP is
broken. The layering is munged and none of the advanced features will work
right until the transfer-headers and data-headers are clearly separated (I
know you're advocating a slight separation here, but that'll only get you
so far). Also, the messaging network is mostly limited to per-hop
negotiation, and does not provide the kind of end-to-end negotiation
(which is still possible in a store-and-forward environment) that is going
to be necessary for advanced features.

I agree with most of the rest of your message. I do not think it is
possible to do that with the existing infrastructure.

http://www.ehsco.com/misc/mt2/ has an outline of a replacement service
that covers most of the same ground and goes a fair bit further, the
latter of which is afforded by the fact that it's a wholesale replacement
[and no, Valdis, it doesn't require everybody to upgrade first :)]. I had
meant to get it into I-D form last year but I've got too many other pro-bono
projects going already to take on another one. Steal as you wish.


Nathaniel Borenstein | 24 Jan 2004 21:49

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


John -- I'm glad to hear that you see this as part of a larger effort.  
I am certainly not opposed to incrementalism wherever possible, it just 
wasn't clear to me from this particular document, out of context, how 
it fit.

For my part, I think I might slice the problem a bit differently, but I 
might be wrong.  So I'd like to see our efforts start with some 
requirements analysis.  If we're going to build email NG, we need to 
make some decisions about what the goals are and then try to cut it 
off, rather than let it grow indefinitely as we're trying to make 
progress.

I've already suggested the following key goals:

	-- internationalization, esp. of addresses
	-- enhanced tracing mechanisms
	-- binary transport  (But what I really mean by binary transport is
	   probably "deprecation of historical cruft," e.g.
	   Content-Transfer-Encoding shouldn't be necessary where the
	   extensions are used.)

I also think that there's enormous potential in generalizing and 
standardizing the kind of structured local-parts that people currently 
use "+" for.  And on and on.  The list of possible additions is fairly 
large, so I'd like to have at least a rough consensus before we put a lot 
of work into documents.  -- Nathaniel
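
As a rough illustration of the "+" convention mentioned above, the following Python sketch splits an address into mailbox, detail, and domain. It is not a standardized algorithm: the function name split_local_part and the choice of a single "+" delimiter are assumptions made for this example, and real systems differ on the delimiter and on whether the detail part is honored at all.

```python
def split_local_part(address, delimiter="+"):
    """Split 'mailbox+detail@domain' into (mailbox, detail, domain).

    Hypothetical helper for illustration. The detail is None when the
    local-part has no delimiter or nothing follows it.
    """
    # Split on the last "@" so the domain is whatever follows it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not a mailbox@domain address: %r" % (address,))
    # Only the first delimiter separates mailbox from detail.
    mailbox, _, detail = local.partition(delimiter)
    return mailbox, (detail or None), domain


if __name__ == "__main__":
    print(split_local_part("nsb+ietf-smtp@example.com"))
    print(split_local_part("nsb@example.com"))
```

A generalized scheme of the kind suggested would need to agree on at least what this sketch hard-codes: which delimiter is used, and who (sender, relay, or final delivery) is allowed to interpret the detail part.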

On Friday, January 23, 2004, at 07:40  PM, John C Klensin wrote:


Hector Santos | 24 Jan 2004 03:21

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


----- Original Message ----- 
From: "John C Klensin" <john-ietf <at> jck.com>
To: "Nathaniel Borenstein" <nsb <at> guppylake.com>
Cc: <ietf-smtp <at> imc.org>
Sent: Friday, January 23, 2004 7:40 PM
Subject: Re: I-D ACTION:draft-klensin-email-envelope-00.txt

>
> Nathaniel,
>
> We may be less in disagreement than you assume.  Briefly, I
> think it is useful to write some things up separately, because,
> that way, it is easier to think about whether the pieces are
> right or there are better ways to do them.   And I am not
> punting on internationalization at all -- it is my main concern
> here and I just want to see if we can build a healthy
> environment for it, one in which, as I've said a few times, we
> move aggressively toward an internationalized/multilingual
> Internet rather than continuing with a predominantly Roman
> character/ English network with various kludges, patches, and
> warts to permit other scripts and languages to sort of work.

Is this one of the main reasons for the ENVL proposal?
Internationalization?   If so, this sort of answers my question on what the
ENVL header includes.  It implies MIME content headers are part of the ENVL
header block.  No?

> So, fwiw...
>

Valdis.Kletnieks | 24 Jan 2004 04:24

Re: I-D ACTION:draft-klensin-email-envelope-00.txt

On Fri, 23 Jan 2004 21:21:15 EST, Hector Santos said:
> mindset.  Quite frankly,  there seems to be conflictive old guru closed
> minded attitude about the realities of the mail industry dilemmas which
> perpetuates a dearth of interest to finally address the issues one way or
> another.    You guys (speaking in general) need to listen and begin "opening
> your minds."

What is often mistaken as "old guru closed mind attitude" is the experience
to be able to tell that *any* proposal in category X is doomed to failure unless
problems A, B, and C are addressed, and any proposal in Y has almost-certain
fatal flaws D and E, and so on.

Another common error is failing to understand the realities of scaling and deployment.
It would take me a short afternoon to put something up to test, understanding that
it was beta-quality software.  Getting everybody in my department moved is a
much bigger challenge, and migrating 60K users is a year-long proposition, and
migrating the millions of users of an AOL or Hotmail is a daunting issue indeed.
(Hint - how many extra first-line help desk people would AOL need to hire, and
what would the price be for that?  Is the cost of the migration more than their
current anti-spam costs, or not?  Don't forget to include burning 100M new
coaster^H^H^H^H^H CD's with 'AOL 10.0 - now with new anti-spam' on it).

TV ads.  Lots of them.  You'll need them.  If you don't understand why,
don't bother commenting until you do. ;)

Oh, and don't forget that you're not allowed to break anybody's connectivity at any
point along the line.  The NCP->IP migration was bad enough when there were only
several hundred hosts.  We'll never get to do that again.  If you have half the users
migrated to SMTP-bis and half are still on SMTP-classic, they *will* have to
be able to communicate, or you *will* be fired.

John C Klensin | 24 Jan 2004 14:31

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


--On Friday, 23 January, 2004 22:24 -0500 
Valdis.Kletnieks <at> vt.edu wrote:

>...
> So for instance, the previous paragraph is why *any* proposal
> that doesn't provide benefit until 98% or so of the net has
> deployed it is doomed to fail unless you give a *really* good
> explanation of why 98% of the people should deploy it when it
> isn't going to give them any benefit if 3% of the OTHER people
> don't bother....

Valdis,

I want to comment on the above, since I am in the process of 
proposing some things (the subject draft is only part of the 
package) that, from a deployment standpoint, give me the shakes 
-- and that should give anyone who has dealt with those issues, 
as you obviously have, similar anxieties.

At the antispam conference at MIT a week ago, there were an 
astonishing (to me) number of proposals that seemed to be based 
on the premise of "as soon as everyone on the network adopts 
this technology, everything will be wonderful".   And, as you 
suggest, the reaction among all of us with significant 
deployment experience was either "yes, right.  next?" or "when 
pigs fly".  It isn't even worth the effort to stand up and ask 
those folks pointed questions any more.  If you ever see a 
proposal with my name on it that proposes to cure spam (or the 
common cold) but requires Internet-wide deployment to be 

Hector Santos | 24 Jan 2004 08:16

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


----- Original Message ----- 
From: <Valdis.Kletnieks <at> vt.edu>
To: "Hector Santos" <winserver.support <at> winserver.com>
Cc: <ietf-smtp <at> imc.org>
Sent: Friday, January 23, 2004 10:24 PM
Subject: Re: I-D ACTION:draft-klensin-email-envelope-00.txt

> What is often mistaken as "old guru closed mind attitude" is the
> experience to be able to tell that *any* proposal in category X
> is doomed to failure unless problems A, B, and C are addressed,
> and any proposal in Y has almost-certain fatal flaws D and E, and
> so on.

May I ask why you assume I lack system design or system-wide engineering
integration, design impact, understanding, capability or experience?

What is becoming quite clear to me is that you seem to have a fundamentally
flawed assumption about me.  I am trying to be nice.  I do not make the
assumption that you or anyone else here is not competent.

> Another common error is failing to understand the realities of
> scaling and deployment.

Yet another fundamentally flawed assumption. Why do you assume I lack a basic
understanding of high-end product operations?

> It would take me a short afternoon to put
> something up to test, understanding that it was beta-quality
> software.

Valdis.Kletnieks | 24 Jan 2004 09:36

Re: I-D ACTION:draft-klensin-email-envelope-00.txt

On Sat, 24 Jan 2004 02:16:34 EST, Hector Santos said:

> May I ask, why do you assume I lack system design or system wide engineering
> integration, design impact, understanding, capability or experience?

I didn't.

You're the one who brought up "closed mind attitude".  You're also the one
who asserted "You guys need to listen and begin opening your minds", without
considering that maybe our minds *are* opened, and we're willing to listen
to new ideas, but *no new ideas are forthcoming*.  More about that later.

You're also making the rash assumption that my reply was directed entirely to
you personally, and not to the lurkers on the list, or to the people who may
read the list archives in the future, trying to figure out why *their*
brilliant pet proposal got shot down...

> 1) the flawed assumption a proposed consideration, concept or idea is not a
> good idea to begin with at any level minimizing possibility or consideration
> for implementation, and

OK, you tell me.  What do *you* tell one of your engineers when they come in
with the *same* idea that 8 other engineers have suggested, and this latest one
hasn't done anything about the fatal flaw you already pointed out to 8 others?

> All I am saying is that it is a mistake to assume I lack reason, knowledge or
> insight in my statements.  I appreciate the experience and historical IETF
> perspectives shared by you and other IETF veterans here.  But I will not allow views I

Exactly.  Unfortunately, some of the "historical perspectives" the veterans have

Hector Santos | 24 Jan 2004 11:58

Re: I-D ACTION:draft-klensin-email-envelope-00.txt


----- Original Message ----- 
From: <Valdis.Kletnieks <at> vt.edu>
To: "Hector Santos" <winserver.support <at> winserver.com>
Cc: <ietf-smtp <at> imc.org>
Sent: Saturday, January 24, 2004 3:36 AM
Subject: Re: I-D ACTION:draft-klensin-email-envelope-00.txt

> You're the one who brought up "closed mind attitude".

It's true! <g>

> You're also the one who asserted "You guys need to listen and begin
> opening your minds", without considering that maybe our minds *are*
> opened, and we're willing to listen to new ideas, but *no new ideas
> are forthcoming*.  More about that later.

In my view, the reason for this is that the functional specification is
inherently restrictive, and this is what I meant by "opening your minds,"
because maybe then some legitimate effort can begin.  I don't think I am off
base: the general consensus viewpoint most often cited about the IETF is
that it is a close-knit group of vets where proposals for change are
philosophically, militantly and fundamentally not welcome. :-)

> to the people who may read the list archives in the future, trying
> to figure out why *their* brilliant pet proposal got shot down...

People will read this tone, and the negative "brilliant pet proposal"
description, in the archives, and they will ASSUME a "closed minded attitude"
and become more resistant to participating because the tone ridicules them.