David Levine | 19 Jul 2012 03:05

Message-IDs and Content-IDs

I've been thinking more about Message-IDs.  The way nmh,
sendmail, and Fedora configure things by default, the host
part is worthless, but I expect that's common in these
halcyon days of NAT.  And on some hosts, I want the real
hostname to be masked, esp. when using a masqueraded From:
address.  (It can't be masked in Received headers, but I'll
do what I can.)

"Host part" is pseudo, it's after the @ and can include most
ASCII printable characters.

So I'm thinking of generating a Message-ID based on a
MAC address.  Something of the form:
pid-timestamp@md, where md is the sha1 hash of the pid,
timestamp, and MAC address.  That would take care of the
worthless and unmasked host part.

And as a bonus, I could easily determine if a message with a
particular Message-ID originated from my machine.
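
A minimal sketch of the pid-timestamp@md idea and the origin check, assuming the three inputs are joined with "-" and the MAC is written as 12 hex digits before hashing. The function names, the separator, and the timestamp precision are illustrative assumptions, not anything nmh implements.

```python
import hashlib


def make_message_id(pid: int, timestamp: float, mac: int) -> str:
    """Build '<pid-timestamp@md>' where md is SHA-1 over pid, timestamp, and MAC."""
    left = f"{pid}-{timestamp:.6f}"
    # Assumption: hash the left part plus the MAC as 12 lowercase hex digits.
    material = f"{left}-{mac:012x}".encode("ascii")
    md = hashlib.sha1(material).hexdigest()
    return f"<{left}@{md}>"


def originated_here(message_id: str, mac: int) -> bool:
    """Recompute the hash from the left part and this machine's MAC, then compare."""
    left, _, md = message_id.strip("<>").partition("@")
    expected = hashlib.sha1(f"{left}-{mac:012x}".encode("ascii")).hexdigest()
    return md == expected
```

On a real system the inputs might come from os.getpid(), time.time(), and uuid.getnode(); note that uuid.getnode() can return a random 48-bit number when no hardware address is found, which would defeat the origin check.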

One downside is that there's no portable way to retrieve
MAC addresses.  (In other words, I would just do it on Linux
but would incorporate contributions for other platforms.)
But Message-IDs are supposed to be globally unique, so I
don't think there are other easy alternatives.

We had talked about allowing the host part to be configured
by the user (identname/idname); this seems like a good place
to include that as well.

Tom Lane | 19 Jul 2012 03:36

Re: Message-IDs and Content-IDs

David Levine <levinedl <at> acm.org> writes:
> I've been thinking more about Message-IDs.  The way nmh,
> sendmail, and Fedora configure things by default, the host
> part is worthless, but I expect that's common in these
> halcyon days of NAT.

Agreed ... that's a problem ... but

> So I'm thinking of generating a Message-ID based on a
> MAC address.

... MAC addresses are just as forge-able; there are utilities on any
modern OS to let you spoof that.  Moreover, if I have a machine that
*is* exposing a "real", legitimate, vendor-assigned MAC address,
I'm not sure I want to publish that to the world.  At the least it's
telling anybody who wants to know what sort of hardware I'm running.

More generally, I'd prefer to opt out of any scheme that involves
creating an anonymized message ID.  It looks way too much like something
a spammer would do.  I know I've got filters that will bit-bucket
messages with all-hex message IDs; I'm probably not the only one.

			regards, tom lane

_______________________________________________
Nmh-workers mailing list
Nmh-workers <at> nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Lyndon Nerenberg | 19 Jul 2012 04:43

Re: Message-IDs and Content-IDs

> "Host part" is pseudo, it's after the @ and can include most
> ASCII printable characters.

Just read, say, 64 bytes from /dev/random and base64 encode it.
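
A minimal sketch of that suggestion. It uses os.urandom as a stand-in for reading /dev/random directly, and the function name is hypothetical.

```python
import base64
import os


def random_id_right(nbytes: int = 64) -> str:
    """Return nbytes of OS-supplied randomness, base64-encoded as ASCII text."""
    # os.urandom draws from the operating system's random source; it is used
    # here in place of opening /dev/random by hand.
    return base64.b64encode(os.urandom(nbytes)).decode("ascii")
```

With 64 bytes the result is 88 characters long and ends in "==" padding. Standard base64 output can also contain '+' and '/', which matters to any software that expects the part after the @ to look like a hostname.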

Ralph Corderoy | 19 Jul 2012 15:24

Re: Message-IDs and Content-IDs

Hi,

Lyndon Nerenberg wrote:
> > "Host part" is pseudo, it's after the @ and can include most ASCII
> > printable characters.
> 
> Just read, say, 64 bytes from /dev/random and base64 encode it.

Would /dev/urandom suffice?  Entropy can be in short supply in some
environments, e.g. VMs, and given it's MH it may be being scripted to
send a few emails in a short time.

heirloom-mailx uses /dev/urandom with a fallback if it's not available.

Cheers, Ralph.

Lyndon Nerenberg | 19 Jul 2012 19:48

Re: Message-IDs and Content-IDs

> Would /dev/urandom suffice?

Sure. But for portability's sake, the code should read 4 bytes from 
/dev/random to seed srand() (fallback to (time XOR pid) or something 
similar if /dev/random can't be opened), then call rand() to supply 
however many bytes you want to use.
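
A rough sketch of that seed-with-fallback approach, with Python's random.Random standing in for srand()/rand(). The function names and the 32-bit mask on the fallback are assumptions made for illustration; this is not nmh code, and the generator is not cryptographically strong.

```python
import os
import random
import time


def seed_from_device(path: str = "/dev/random") -> int:
    """Read a 4-byte seed from the device, else fall back to (time XOR pid)."""
    try:
        # Unbuffered, so only the four requested bytes are pulled from the device.
        with open(path, "rb", buffering=0) as dev:
            data = dev.read(4)
        if data is not None and len(data) == 4:
            return int.from_bytes(data, "big")
    except OSError:
        pass  # device missing or unreadable: use the fallback below
    return (int(time.time()) ^ os.getpid()) & 0xFFFFFFFF


def random_bytes(count: int, path: str = "/dev/random") -> bytes:
    """Seed a PRNG once, then draw however many bytes are wanted."""
    rng = random.Random(seed_from_device(path))
    return bytes(rng.randrange(256) for _ in range(count))
```

Only the seed touches the device, so a script sending many messages in a short time draws just four bytes of entropy per run.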

Ken Hornstein | 19 Jul 2012 20:19

Re: Message-IDs and Content-IDs

>> Would /dev/urandom suffice?
>
>Sure. But for portability's sake, the code should read 4 bytes from 
>/dev/random to seed srand() (fallback to (time XOR pid) or something 
>similar if /dev/random can't be opened), then call rand() to supply 
>however many bytes you want to use.

Dumb question time: Do we want to introduce a dependency on /dev/random?
Looks like most people have it now.

--Ken

valdis.kletnieks | 19 Jul 2012 20:47

Re: Message-IDs and Content-IDs

On Thu, 19 Jul 2012 14:19:33 -0400, Ken Hornstein said:

> Dumb question time: Do we want to introduce a dependency on /dev/random?
> Looks like most people have it now.

Go for it.  It's 2012, and enough other things need a random number supply for
crypto that you can probably safely consider a system that still doesn't have
/dev/random as being somewhere between archaic and downright crippled.

What still doesn't have /dev/random, and what does software do to compensate?

Ken Hornstein | 19 Jul 2012 20:54

Re: Message-IDs and Content-IDs

>> Dumb question time: Do we want to introduce a dependency on /dev/random?
>> Looks like most people have it now.
>
>Go for it.  It's 2012, and enough other things need a random number supply for
>crypto that you can probably safely consider a system that still doesn't have
>/dev/random as being somewhere between archaic and downright crippled.
>
>What still doesn't have /dev/random, and what does software do to compensate?

I'm only mentioning this because we got recent bug reports about
things not working right because of a buggy snprintf() on an ancient
HPUX box.  Does the ancient HPUX box in question have /dev/random?  Good
question; dunno.

--Ken

Tom Lane | 20 Jul 2012 01:46

Re: Message-IDs and Content-IDs

Ken Hornstein <kenh <at> pobox.com> writes:
>> What still doesn't have /dev/random, and what does software do to compensate?

> I'm only mentioning this because we got recent bug reports about
> things not working right because of a buggy snprintf() on an ancient
> HPUX box.  Does the ancient HPUX box in question have /dev/random?  Good
> question; dunno.

That was mine, and no it doesn't have /dev/random.  But, as I said
earlier, I'd prefer to opt-out of this whole cryptographic host part
concept anyway.  I think as long as you provide an option to still use
the regular host name for message IDs, people who don't have /dev/random
have an escape.

			regards, tom lane

Lyndon Nerenberg | 20 Jul 2012 02:25

Re: Message-IDs and Content-IDs

> That was mine, and no it doesn't have /dev/random.  But, as I said
> earlier, I'd prefer to opt-out of this whole cryptographic host part
> concept anyway.

It's not cryptographic, it's just a case of trying to achieve uniqueness 
without too much effort. The world won't end if message-ids collide.

And don't get too caught up with local vs. host parts of message-id
strings. The idea of <mumble@host> was an inexpensive way to generate a
unique qualifier for the entropy, back when entropy was expensive. Today,
entropy is cheap, so any opaque random string works fine.  The only reason
we keep the '@' requirement is for compatibility with software written to
previous versions of the *822 specifications.

--lyndon

Lyndon Nerenberg | 19 Jul 2012 21:00

Re: Message-IDs and Content-IDs

> Dumb question time: Do we want to introduce a dependency on /dev/random?
> Looks like most people have it now.

That's why the fallback.  Everyone has getpid() and time().

David Levine | 19 Jul 2012 14:46

Re: Message-IDs and Content-IDs

Tom wrote:

> ... MAC addresses are just as forge-able; there are utilities on any
> modern OS to let you spoof that.  Moreover, if I have a machine that
> *is* exposing a "real", legitimate, vendor-assigned MAC address,
> I'm not sure I want to publish that to the world.

The idea was to sha1 hash them so they'd effectively be random.
But Lyndon's idea of using random makes more sense.

And it looks like one of my ISPs overwrites Message-ID completely,
anyway.

> More generally, I'd prefer to opt out of any scheme that involves
> creating an anonymized message ID.  It looks way too much like something
> a spammer would do.  I know I've got filters that will bit-bucket
> messages with all-hex message IDs; I'm probably not the only one.

Do you need anything beyond what's already provided?  I
thought you change LocalName(1) calls to LocalName(0) to
use localname?

David

David Levine | 19 Jul 2012 15:32

Re: Message-IDs and Content-IDs

Ralph wrote:

> Lyndon Nerenberg wrote:
> > Just read, say, 64 bytes from /dev/random and base64 encode it.
> 
> Would /dev/urandom suffice?  Entropy can be in short supply in some
> environments, e.g. VMs, and given it's MH it may be being scripted to
> send a few emails in a short time.
> 
> heirloom-mailx uses /dev/urandom with a fallback if it's not available.

/dev/urandom is fine.

64 bytes?  Though 64 bits seems too short.

David

valdis.kletnieks | 19 Jul 2012 16:02

Re: Message-IDs and Content-IDs

On Thu, 19 Jul 2012 08:32:59 -0500, David Levine said:

> 64 bytes?  Though 64 bits seems too short.

64 bits should be plenty.  At that point, the chance of an accidental collision
of Message-IDs due to a crypto failure is already far below that of other software
bugs causing Message-IDs to be reused.  Remember, Message-ID isn't guaranteed
to be crypto-secure - we'd need a *lot* more infrastructure worldwide to
actually guarantee that.
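
A back-of-the-envelope sketch of why 64 bits goes a long way, using the standard birthday approximation. It assumes the random parts are uniform and independent, and the function name is made up for illustration.

```python
import math


def collision_probability(n_ids: int, bits: int = 64) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_ids * (n_ids - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)
```

Under those assumptions, a million IDs collide with probability of roughly 3 in 100 million, and it takes about 2^32 IDs before a collision is close to even odds. A Message-ID that also carries a pid and timestamp on the left of the @ is less likely still to repeat.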

(Remember that many of us store those Message-IDs in 'procmail -D' files.
No sense in bloating them beyond what's needed ;)

Lyndon Nerenberg | 19 Jul 2012 19:37

Re: Message-IDs and Content-IDs

> 64 bytes?  Though 64 bits seems too short.

It's just a number :-)

David Levine | 20 Jul 2012 02:13

Re: Message-IDs and Content-IDs

Tom wrote:

> That was mine, and no it doesn't have /dev/random.  But, as I said
> earlier, I'd prefer to opt-out of this whole cryptographic host part
> concept anyway.  I think as long as you provide an option to still use
> the regular host name for message IDs,

That will remain the default.

Do you want LocalName(0), which can use localname from mts.conf?
Or is the current use of LocalName(1) OK?

David

Tom Lane | 20 Jul 2012 03:46

Re: Message-IDs and Content-IDs

David Levine <levinedl <at> acm.org> writes:
> Do you want LocalName(0), which can use localname from mts.conf?
> Or is the current use of LocalName(1) OK?

The current way is fine with me, but I don't think the other would
be a problem either.

			regards, tom lane

David Levine | 21 Jul 2012 23:25

Re: Message-IDs and Content-IDs

I added to send and post:

  The -messageid switch selects the style used for the part
  appearing after the @ in "Message-ID:", "Resent-Message-ID:",
  and "Content-ID:" header fields.  The two acceptable options
  are localname (which is the default), and random.  With
  localname, the local hostname is used.  With random, a random
  sequence of characters is used instead.  Note that the -msgid
  switch must be enabled for this switch to have any effect.

The default maintains the current behavior.

This message should have a random Message-ID.

David

Lyndon Nerenberg | 21 Jul 2012 23:36

Re: Message-IDs and Content-IDs

> This message should have a random Message-ID.

The trouble is, it's not a valid Message-ID. You sent:

   Message-ID: <16888-1342905949.421986@QcTLPy+DeAJLdhEN>

The grammar production requires:

   msg-id =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

So you need to stuff an '@' in there someplace.

--lyndon

valdis.kletnieks | 21 Jul 2012 23:43

Re: Message-IDs and Content-IDs

On Sat, 21 Jul 2012 14:36:19 -0700, Lyndon Nerenberg said:
> > This message should have a random Message-ID.
>
> The trouble is, it's not a valid Message-ID. You sent:
>
>    Message-ID: <16888-1342905949.421986@QcTLPy+DeAJLdhEN>
>
> The grammar production requires:
>
>    msg-id =   [CFWS] "<" id-left "@" id-right ">" [CFWS]
>
> So you need to stuff an '@' in there someplace.

Odd.. My copy of the message, and the one you quoted, both apparently
had an '@' in them, between the '6' and the 'Q'? ;)
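
A quick way to check this mechanically, sketched with a deliberately simplified pattern. It only looks for the angle brackets and a single literal '@'; it ignores CFWS and the finer rules for id-left and id-right, so it is not a full implementation of the grammar. The names are hypothetical.

```python
import re

# Simplified shape check: "<" id-left "@" id-right ">" with no spaces inside.
MSG_ID = re.compile(r"^<([^@<>\s]+)@([^@<>\s]+)>$")


def split_message_id(value: str):
    """Return (id_left, id_right) if value has the basic msg-id shape, else None."""
    match = MSG_ID.match(value.strip())
    return match.groups() if match else None
```

Fed the Message-ID quoted above, with the '@' written literally, this returns the pid-timestamp part and the random part as a pair.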

Lyndon Nerenberg | 21 Jul 2012 23:47

Re: Message-IDs and Content-IDs

> Odd.. My copy of the message, and the one you quoted, both apparently
> had an '@' in them, between the '6' and the 'Q'? ;)

Odd. My brain and eyeballs cannot spot an '@' between a '6' and a 'Q'.
And I was pretty sure my editor search for it, umm, failed.

Doh.

--lyndon

valdis.kletnieks | 21 Jul 2012 23:41

Re: Message-IDs and Content-IDs

On Sat, 21 Jul 2012 16:25:49 -0500, David Levine said:

> This message should have a random Message-ID.

Message-id: <16888-1342905949.421986@QcTLPy+DeAJLdhEN>

Looks good to me - but will some busticated software that assumes everything
after the @ is syntactically (if not semantically) a "hostname" get upset at
the + in there? (Personally, I don't care, all of my stuff just wants a unique
string inside the <>, and if my stuff is busted I'll beat the snot out of whoever
wrote it and then get it fixed ;)

David Levine | 22 Jul 2012 00:23

Re: Message-IDs and Content-IDs

Valdis wrote:

> Looks good to me - but will some busticated software that assumes everything
> after the @ is syntactically (if not semantically) a "hostname" get upset at
> the + in there? (Personally, I don't care, all of my stuff just wants a unique
> string inside the <>, and if my stuff is busted I'll beat the snot out of whoever
> wrote it and then get it fixed ;)

I'll find out :-)

Non-qualified hostnames do get used, even to this list.  I
looked at a small collection of spam and saw hardly any
random "hostname" parts, but the sample is biased (it got
through some filters) and very small.

David

Tom Lane | 22 Jul 2012 01:15

Re: Message-IDs and Content-IDs

David Levine <levinedl <at> acm.org> writes:
> Non-qualified hostnames do get used, even to this list.  I
> looked at a small collection of spam and saw hardly any
> random "hostname" parts, but the sample is biased (it got
> through some filters) and very small.

I have a rather larger collection of spam handy ... and there's
quite a lot of unqualified hostnames in there, as well as quite
a lot of raw IP addresses.

I would suggest trying to make sure that the phony-hostname part
doesn't look like either of those categories.  I don't personally
use such a thing as a spam sign, but I bet some people do.  Perhaps
it would do to intentionally insert a couple of dots in the otherwise
random string, ie instead of "...@QcTLPy+DeAJLdhEN" something like
"...@QcTLP.y+DeA.JLdhEN".

Personally I'd be inclined to limit the characters used for the "random"
data to alphanumerics, too, to make it look more like a hostname.
If you want 64 characters so that it works like base64, maybe add "-"
and "_" to the repertoire.

			regards, tom lane

valdis.kletnieks | 22 Jul 2012 02:42

Re: Message-IDs and Content-IDs

On Sat, 21 Jul 2012 17:23:38 -0500, David Levine said:
> Non-qualified hostnames do get used, even to this list.  I
> looked at a small collection of spam and saw hardly any
> random "hostname" parts, but the sample is biased (it got
> through some filters) and very small.

Non-qualified hostnames get used so much that I'm pretty sure that
everybody will accept a string that doesn't have a '.' in it.  I was
more worried about the presence of '+' in there giving something
a tummyache.

On Sat, 21 Jul 2012 19:15:04 -0400, Tom Lane said:
> Personally I'd be inclined to limit the characters used for the "random"
> data to alphanumerics, too, to make it look more like a hostname.
> If you want 64 characters so that it works like base64, maybe add "-"
> and "_" to the repertoire.

[A-Za-z0-9] '-' and '_', makes 64, and only '_' is at all controversial
(but then, it's been ever since RFC821 and 822 disagreed about it,
so I feel more confident that people accept _ out of self-defense than they
accept '+' that's never been fair game there..)
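
A small sketch of what such a generator might look like, assuming 16 random characters split into three dot-separated labels as in Tom's example. The function name, the defaults, and the use of os.urandom are illustrative choices, not a description of what nmh does.

```python
import os

# The 64-character repertoire discussed above: letters, digits, '-' and '_'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def hostname_like_random(length: int = 16, dots: int = 2) -> str:
    """Return `length` random characters split into dots+1 dot-separated labels."""
    if dots < 0 or length < dots + 1:
        raise ValueError("need at least one character per label")
    # 256 is a multiple of 64, so keeping the low six bits of each byte is unbiased.
    chars = [ALPHABET[b & 0x3F] for b in os.urandom(length)]
    step = length // (dots + 1)
    labels = ["".join(chars[i * step:(i + 1) * step]) for i in range(dots)]
    labels.append("".join(chars[dots * step:]))  # last label takes the remainder
    return ".".join(labels)
```

With the defaults this gives strings of the same 5.5.6 shape as Tom's dotted example. A label can still begin with '-' or contain '_', so the result only resembles a hostname rather than being a strictly valid one.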

David Levine | 22 Jul 2012 02:56

Re: Message-IDs and Content-IDs

Tom Lane wrote:

> it would do to intentionally insert a couple of dots in the otherwise
> random string, ie instead of "...@QcTLPy+DeAJLdhEN" something like
> "...@QcTLP.y+DeA.JLdhEN".
> 
> Personally I'd be inclined to limit the characters used for the "random"
> data to alphanumerics, too, to make it look more like a hostname.
> If you want 64 characters so that it works like base64, maybe add "-"
> and "_" to the repertoire.

Great ideas, done.

David
