Paul Fox | 22 Jun 2012 16:01

mime-aware filtering?


i realized yesterday that the ever-increasing base64-ization of mail
is starting to break my procmail filters, some of which look into the
body of the message.  i'd welcome any excuse to switch to something
other than procmail, so:  are there any mime-aware filtering programs
out there?  slocal doesn't even seem to look at the body at all.  i
looked at the maildrop docs -- i don't think it makes the cut either,
but i might have missed something.  the sieve specification seems like
it could do it, but all the implementations seem to be imap-centric,
and live on the server.  again, maybe i missed something.

perhaps it would be worth it to fix slocal, but i'm not sure it
has some of the other flexibility i might want -- multiple
conditions, nested conditions, etc. (am i wrong about that?)

paul
=---------------------
 paul fox, pgf <at> foxharp.boston.ma.us (arlington, ma, where it's 80.1 degrees)

_______________________________________________
Nmh-workers mailing list
Nmh-workers <at> nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Ken Hornstein | 22 Jun 2012 16:26

Re: mime-aware filtering?

>but i might have missed something.  the sieve specification seems like
>it could do it, but all the implementations seem to be imap-centric,
>and live on the server.  again, maybe i missed something.

I think you're right in that slocal probably isn't worth extending to do
what you want.  So to keep it within nmh we'd need a sieve implementation
which is probably a good idea anyway.

Some quick Googling shows me that there are some implementations out there
(libsieve for one), but it's going to be a job to implement it in nmh.
Thankfully it requires that everything be converted internally to UTF-8,
so at least that deals with some of the I18N issues.

To answer your original question ... I don't really know of anything
that does what you want.  Seems like the sieve implementations have
all focused on the server side.  I think the move to IMAP and webmail
has left client-side support for more complex filtering lagging behind.

--Ken


David Levine | 22 Jun 2012 21:53

Re: mime-aware filtering?

Paul wrote:

> i realized yesterday that the ever-increasing
> base64-ization of mail is starting to break my procmail
> filters, some of which look into the body of the message.

I've been feeling a need for that recently, too.  While I
wouldn't mind moving away from procmail, this is its only
drawback for me.

Could a filter that decodes just the base64-encoded parts of
a message solve the problem?  It should be fairly simple,
esp. if it can rely on an existing decoder such as base64,
openssl, or MIME::Base64.  And if it doesn't worry about
signature verification.

I run a script that de-base64's text/plain parts of messages
so that I can grep them.  I invoke it manually after inc'ing
the message, but automatically decoding upstream would save
me a step.  It uses nmh programs (mhlist, and mhshow or
mhstore) so is too kludgy other than as a proof of concept
that maybe has been a bit too successful.

> perhaps it would be worth it to fix slocal, but i'm not
> sure it has some of the other flexibility i might want --
> multiple conditions, nested conditions, etc. (am i wrong
> about that?)

I agree with you and Ken that slocal isn't worth extending.
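
A minimal sketch of the decode-first idea described above, written in
Python rather than with the nmh programs the original script uses.  The
function names are made up for illustration, and the sketch only looks
at text/* parts; it ignores signature verification, as suggested above.

```python
import email
import re


def decoded_text_parts(raw_message: bytes):
    """Yield the transfer-decoded text of every text/* leaf part."""
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            yield payload.decode("latin-1")


def body_matches(raw_message: bytes, pattern: str) -> bool:
    """True if the regex matches any decoded text part."""
    return any(re.search(pattern, text)
               for text in decoded_text_parts(raw_message))
```

A delivery filter could call body_matches() on the raw message before
deciding where to file it, so the pattern is tested against the decoded
text rather than the base64 on the wire.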


Paul Vixie | 22 Jun 2012 22:08

Re: mime-aware filtering?

On 6/22/2012 7:53 PM, David Levine wrote:
> Paul wrote:
>
>> i realized yesterday that the ever-increasing
>> base64-ization of mail is starting to break my procmail
>> filters, some of which look into the body of the message.

whoa. blast from the past.

> I've been feeling a need for that recently, too.  While I
> wouldn't mind moving away from procmail, this is its only
> drawback for me.
>
> Could a filter that decodes just the base64-encoded parts of
> a message solve the problem?  It should be fairly simple,
> esp. if it can rely on an existing decoder such as base64,
> openssl, or MIME::Base64.  And if it doesn't worry about
> signature verification.

i think the solution space is bimodal.

if someone is willing to write what they need in perl, they can create a
MIME data structure and surf that from within perl. mh need do nothing
except continue as it does now to supply files and filenames to such
outboard tools.

for people who aren't willing to do that, MH would need to be able to
export the structure of a message in a way that a shell script could
iterate through the parts, asking MH to provide a file or file name of
each attachment. none of these should still be in wire format, in case

norm | 23 Jun 2012 17:31

Re: mime-aware filtering?

Paul Vixie <paul <at> redbarn.org> writes:
>...
>i consider MH's basic mailbox format to be flawed in a MIME world for
>which MH was never designed or redesigned. every attachment should be in
>its own file, even if that meant that messages were directories no
>longer files themselves.
>...

Historical note:

In the earliest discussions of what was to become mh, Stock Gaines and I (This
was even before Bruce Borden was involved) considered making messages be
directories, whose file members were components. But we dismissed the idea on
efficiency and disk storage grounds. We were thinking in terms of maybe 20 or 30
users sharing a single PDP-11/45. There was also the issue of how radical to be,
in order to gain acceptance in a world where not messages, but folders, or even
ensembles of folders were single files.

    Norman Shapiro


Jerrad Pierce | 23 Jun 2012 21:42

Re: mime-aware filtering?

nmh tools ignore non-numeric filenames, don't they?

A possible way to solve the access to MIME parts problem
might be to store the parts as messageNumber.partNumber*
Creation of these parts would be optional, and eat space,
but it would make indexing/grepping easy.

It seems you'd just need a demultiplexer (we have one
already, it just names things a bit differently), and for
rmm plus refile to handle parts if present. That would be
sufficient for things to function I think, but optional
features might be for (mh)show, to use the pre-decoded
parts if present, etc.

Would this not be a workable solution for some of our
common woes?

*Maybe msg.part=filename if supplied


Ken Hornstein | 24 Jun 2012 04:34

Re: mime-aware filtering?

>nmh tools ignore non-numeric filenames, doesn't it?

To answer that question more specifically ... it will ignore any filename
that fails m_atoi(), which will reject anything that contains something
that is !isdigit().
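
As a rough illustration of that rule (a Python sketch, not a
transcription of the nmh source; the function name is invented here),
a folder entry counts as a message only if its name is made up
entirely of ASCII digits:

```python
def looks_like_message_file(name: str) -> bool:
    """True if the name is non-empty and every character is an ASCII digit.

    Anything with a dot, comma, or letter in it fails, which is why
    names such as "53.1" or "53.mime" would be skipped.
    """
    return name != "" and all(ch in "0123456789" for ch in name)
```

So "53" passes, while "53.1", "53.mime" and ",53" do not.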

>A possible way to solve the access to MIME parts problem
>might be to store the parts as messageNumber.partNumber*
>Creation of these parts would be optional, and eat space,
>but it would make indexing/grepping easy.

You know ... given that & Norm's comments, that actually might work.
Thoughts?  FWIW, in my view that really only helps with the use of
non-nmh tools, but it still might make it worth doing.

--Ken


Paul Fox | 24 Jun 2012 14:38

Re: mime-aware filtering?


ken wrote:
 > >nmh tools ignore non-numeric filenames, doesn't it?
 > 
 > To answer that question more specifically ... it will ignore any filename
 > that fails m_atoi(), which will reject anything that contains something
 > that is !isdigit().
 > 
 > >A possible way to solve the access to MIME parts problem
 > >might be to store the parts as messageNumber.partNumber*
 > >Creation of these parts would be optional, and eat space,
 > >but it would make indexing/grepping easy.
 > 
 > You know ... given that & Norm's comments, that actually might work.
 > Thoughts?  FWIW, in my view that really only helps with the use of
 > non-nmh tools, but it still might make it worth doing.

it's certainly a (relatively) simple way of implementing what paul (vixie)
has proposed a couple of times, while leveraging the filesystem namespace
for the storage of the parts.  i'm not convinced that introducing a
directory level might not be a good idea:  i.e., a message might have
the message file itself ("53") and a directory which mh would currently
ignore ("53.mime").  the directory could then contain lots of stuff that
would clutter the upper-level MH folder otherwise.

paul

 > 
 > --Ken
 > 

Ralph Corderoy | 24 Jun 2012 15:13

Re: mime-aware filtering?

Hi,

Paul Fox wrote:
> i'm not convinced that introducing a directory level might not be a
> good idea:  i.e., a message might have the message file itself ("53")
> and a directory which mh would currently ignore ("53.mime").  the
> directory could then contain lots of stuff that would clutter the
> upper-level MH folder otherwise.

I'd also prefer a directory for the email's contents.

Plan 9's upasfs(4) shows the kind of thing that can be done.
http://plan9.bell-labs.com/magic/man2html/4/upasfs

Cheers, Ralph.


Yoshi Rokuko | 25 Jun 2012 19:49

Re: mime-aware filtering?

------- Ralph Corderoy on Sun, 24 Jun 2012 14:13:45 +0100 -------
> Hi,
> 
> Paul Fox wrote:
> > i'm not convinced that introducing a directory level might not be a
> > good idea:  i.e., a message might have the message file itself ("53")
> > and a directory which mh would currently ignore ("53.mime").  the
> > directory could then contain lots of stuff that would clutter the
> > upper-level MH folder otherwise.
> 
> I'd also prefer a directory for the email's contents.
> 
> Plan 9's upasfs(4) shows the kind of thing that can be done.
> http://plan9.bell-labs.com/magic/man2html/4/upasfs

+1


Tethys | 26 Jun 2012 00:43

Re: mime-aware filtering?


Ken Hornstein writes:

>>A possible way to solve the access to MIME parts problem
>>might be to store the parts as messageNumber.partNumber*
>>Creation of these parts would be optional, and eat space,
>>but it would make indexing/grepping easy.
>
>You know ... given that & Norm's comments, that actually might work.
>Thoughts? 

My only thought is that MIME is more than just the linear list of
attachments that many seem to believe, and we need to come up with
a naming convention capable of representing that. And even then,
deciding what to store as content for a given part isn't necessarily
straightforward. For example, if you have a multipart/alternative
part, how do you represent that in the filesystem? We've briefly
touched on some of this before:

	http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html

But whatever we do, it needs careful thought to cover the edge cases
that are increasingly becoming the common case in mail I'm being sent.

Tet


Paul Vixie | 26 Jun 2012 00:58

Re: mime-aware filtering?

On 6/25/2012 10:43 PM, Tethys wrote:
> Ken Hornstein writes:
>
>>> A possible way to solve the access to MIME parts problem
>>> might be to store the parts as messageNumber.partNumber*
>>> Creation of these parts would be optional, and eat space,
>>> but it would make indexing/grepping easy.
>> You know ... given that & Norm's comments, that actually might work.
>> Thoughts? 

i'm opposed. what should be in the file system is what SMTP received and
handed to /var/mail or whatever.

> My only thought is that MIME is more than just the linear list of
> attachments that many seem to believe, and we need to come up with
> a naming convention capable of representing that. And even then,
> deciding what to store as content for a given part isn't necessarily
> straightforward. For example, if you have a multipart/alternative
> part, how do you represent that in the filesystem? We've briefly
> touched on some of this before:
>
> 	http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html
>
> But whatever we do, it needs careful thought to cover the edge cases
> that are increasingly becoming the common case in mail I'm being sent.

thus my proposal which is to provide shell level commands that can
expose the message structure (as "msg.part{.subpart ...}") and something
like mhpath that will make you a /var/tmp file from the specified
part/subpart without any encoding, and then update the rest of the

Jon Steinhart | 26 Jun 2012 01:03

Re: mime-aware filtering?

Paul Vixie writes:
> On 6/25/2012 10:43 PM, Tethys wrote:
> > Ken Hornstein writes:
> >
> >>> A possible way to solve the access to MIME parts problem
> >>> might be to store the parts as messageNumber.partNumber*
> >>> Creation of these parts would be optional, and eat space,
> >>> but it would make indexing/grepping easy.
> >> You know ... given that & Norm's comments, that actually might work.
> >> Thoughts? 
> 
> i'm opposed. what should be in the file system is what SMTP received and
> handed to /var/mail or whatever.
> 
> > My only thought is that MIME is more than just the linear list of
> > attachments that many seem to believe, and we need to come up with
> > a naming convention capable of representing that. And even then,
> > deciding what to store as content for a given part isn't necessarily
> > straightforward. For example, if you have a multipart/alternative
> > part, how do you represent that in the filesystem? We've briefly
> > touched on some of this before:
> >
> > 	http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html
> >
> > But whatever we do, it needs careful thought to cover the edge cases
> > that are increasingly becoming the common case in mail I'm being sent.
> 
> thus my proposal which is to provide shell level commands that can
> expose the message structure (as "msg.part{.subpart ...}") and something
> like mhpath that will make you a /var/tmp file from the specified

Ken Hornstein | 26 Jun 2012 01:57

Re: mime-aware filtering?

>>> You know ... given that & Norm's comments, that actually might work.
>>> Thoughts? 
>
>i'm opposed. what should be in the file system is what SMTP received and
>handed to /var/mail or whatever.

My personal thought in terms of implementation was that "53" would be
the original message.  "53.mime" would contain the decoded MIME parts.

But it's important to note that while I'm not AGAINST this, I'm not
going to work on it myself.  AFAICT it only helps out people who want
to use Unix tools on MIME messages; it doesn't help nmh to be more MIME
aware.  I'm not against that, but I would personally rather focus my
own energy and time on better native MIME support.

--Ken


Jerrad Pierce | 26 Jun 2012 05:01

Re: mime-aware filtering?

You seem to have misunderstood my proposal.

Paul, Message 76 would still be what came over the wire,
however something like mhstore could optionally make 76.*
as the split out components.

Tet, nothing in what I wrote implied you couldn't have
76.1.1.4; grep's not going to care.


Paul Vixie | 26 Jun 2012 05:14

Re: mime-aware filtering?

On 2012-06-26 3:01 AM, Jerrad Pierce wrote:
> You seem to have misunderstood my proposal.
>
> Paul, Message 76 would still be what came over the wire,
> however something like mhstore could optionally make 76.*
> as the split out components.
>
> Tet, nothing in what I wrote implied you couldn't have
> 76.1.1.4; grep's not going to care.

i think i understood; i just don't want these other files in the MH
store. but read on for a friendly closure.

On 2012-06-26 3:05 AM, David Levine wrote:
> Paul Vixie wrote:
>
>> mhpart (or whatever) would need a -clean option to get rid
>> of the /var/tmp files it has made for you in this session.
>>
>> but i do not think we should pollute the Mail subdirectory
>> hierarchy with permanent copies of parts.
> nmh already has nmh-cache, how about putting parts there?
> They could go into a hierarchy that shadows the MH
> hierarchy, but with one root directory, e.g., 53.mime (or
> just 53) corresponding to each message.  That way scripts
> that troll the two hierarchies would look similar.  And
> a script that trolls the MH hierarchy would know where
> to look for the parts.
>
> The cache would be populated on demand.  And cleaned up manually.

Jerrad Pierce | 26 Jun 2012 05:19

Re: mime-aware filtering?

Sorry for the premature reply.

I see now that Paul did understand my idea.
I can understand that some might not want duplicate
content, but that's why I proposed it be optional.
A temporary cache does not allow for indexing.
Keeping it in Mail means you have whichever
decoded messages you want greppable/indexable;
be it done to all on inc, or manually for a select
few. Then, when you remove the message, the parts
get automagically wiped out by rmm.

refile & rmm are the only things that need to be
aware of this AFAICT. It could probably all be done
with scripts if there was a refileproc profile
component (presumably passed source folder,
dest folder, msgnum[s]), although it may take
some effort to bend mhstore to these ends.
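
A sketch of what such scripts might do, in Python and using the
"53.mime" directory naming floated elsewhere in this thread rather than
flat 76.* files.  Everything here is hypothetical: the function names,
the ".mime" suffix, and the plain delete (nmh's own rmm and refile do
more than this).

```python
import shutil
from pathlib import Path

PARTS_SUFFIX = ".mime"  # hypothetical naming taken from this thread


def parts_dir(folder: Path, msgnum: int) -> Path:
    """Directory that would hold the decoded parts of one message."""
    return folder / f"{msgnum}{PARTS_SUFFIX}"


def remove_message(folder: Path, msgnum: int) -> None:
    """Delete the message file and, if present, its decoded parts."""
    (folder / str(msgnum)).unlink()
    shutil.rmtree(parts_dir(folder, msgnum), ignore_errors=True)


def refile_message(src: Path, dst: Path, msgnum: int, new_num: int) -> None:
    """Move the message file and carry any decoded parts along with it."""
    shutil.move(str(src / str(msgnum)), str(dst / str(new_num)))
    old_parts = parts_dir(src, msgnum)
    if old_parts.is_dir():
        shutil.move(str(old_parts), str(parts_dir(dst, new_num)))
```

The point is only that the parts follow the message: nothing else in
the folder needs to know they exist.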


Paul Vixie | 26 Jun 2012 06:30

Re: mime-aware filtering?

On 2012-06-26 3:19 AM, Jerrad Pierce wrote:
> Sorry for the premature reply.
>
> I see now that Paul did understand my idea.
> I can understand that some might not want duplicate
> content, but that's why I proposed it be optional.
> A temporary cache does not allow for indexing.

i'm ok with that. disk space is cheap. the index can keep copies of the
content. the mh hook system can keep them in synch. unless you have
multiple terabytes of stored e-mail you'll never feel the cost of the
second copy.

> Keeping it in Mail means you have whichever
> decoded messages you want greppable/indexable;
> be it done to all on inc, or manually for a select
> few. Then, when you remove the message, the parts
> get automagically wiped out by rmm.

i don't see how to support indexing on a read-only mail store if we're
interleaving the files. while bboards may be long gone usenet is still
out there, and imap too.

paul


Paul Fox | 26 Jun 2012 13:45

Re: mime-aware filtering?


paul vixie wrote:
 > On 2012-06-26 3:19 AM, Jerrad Pierce wrote:
 > > Sorry for the premature reply.
 > >
 > > I see now that Paul did understand my idea.
 > > I can understand that some might not want duplicate
 > > content, but that's why I proposed it be optional.
 > > A temporary cache does not allow for indexing.
 > 
 > i'm ok with that. disk space is cheap. the index can keep copies of the
 > content. the mh hook system can keep them in synch. unless you have
 > multiple terabytes of stored e-mail you'll never feel the cost of the
 > second copy.
 > 
 > > Keeping it in Mail means you have whichever
 > > decoded messages you want greppable/indexable;
 > > be it done to all on inc, or manually for a select
 > > few. Then, when you remove the message, the parts
 > > get automagically wiped out by rmm.
 > 
 > i don't see how to support indexing on a read-only mail store if we're
 > interleaving the files. while bboards may be long gone usenet is still
 > out there, and imap too.

why couldn't an indexer know the difference between the message file
and the content cache?

anyway:  i think i still prefer the idea that the content cache
directories be kept in the message tree.  but i also understand why

Paul Vixie | 26 Jun 2012 19:45

Re: mime-aware filtering?

On 2012-06-26 11:45 AM, Paul Fox wrote:
> anyway:  i think i still prefer the idea that the content cache
> directories be kept in the message tree.  but i also understand why
> one might want them separate.  if the idea is that the message tree
> and the cache tree are roughly isomorphic, i'll bet that could be made
> a per-user choice, as long as the content directories were really
> named "53.mime/" and not simply "53/" -- i.e., the messages and the
> mime-dirs could either live in the same tree or not, since they use
> different parts of the namespace.  (but clients certainly would need
> to be careful not to assume one model or the other.)

lots of code (here i'm thinking of uw-imap) makes the assumption that if
there's a directory then it's a folder. such names need not be
all-numeric or semi-numeric. you'd have to preface the name with a dot
('.') to prevent it from opendir()'ing or even chdir()'ing. i see this
as an unfortunate and unnecessary burden on code whose assumptions have
been valid for a long time.


Jerrad Pierce | 26 Jun 2012 20:13

Re: mime-aware filtering?

>as an unfortunate and unnecessary burden on code whose assumptions have
>been valid for a long time.
But it's still an assumption, and we know what those mean.
More seriously though, is there an actual spec for MH declaring
what valid folder and filenames are?

What's the worst-case for those using older software with assumptions?
They see sub-folders, all ending in .mime, with no valid messages within them.
Annoying perhaps, but not fatal. It could actually be useful, because parts
that are valid messages could be linked to a valid filename for them to access.

Making MIME directories dotted is a work-around, but that's a bit of an
annoyance for things/users wishing to access them, depending upon the
languages available to you, e.g., having to be sure to exclude . and ..


Paul Fox | 26 Jun 2012 20:15

Re: mime-aware filtering?


paul wrote:
 > On 2012-06-26 11:45 AM, Paul Fox wrote:
 > > anyway:  i think i still prefer the idea that the content cache
 > > directories be kept in the message tree.  but i also understand why
 > > one might want them separate.  if the idea is that the message tree
 > > and the cache tree are roughly isomorphic, i'll bet that could be made
 > > a per-user choice, as long as the content directories were really
 > > named "53.mime/" and not simply "53/" -- i.e., the messages and the
 > > mime-dirs could either live in the same tree or not, since they use
 > > different parts of the namespace.  (but clients certainly would need
 > > to be careful not to assume one model or the other.)
 > 
 > lots of code (here i'm thinking of uw-imap) makes the assumption that if
 > there's a directory then it's a folder. such names need not be
 > all-numeric or semi-numeric. you'd have to preface the name with a dot
 > ('.') to prevent it from opendir()'ing or even chdir()'ing. i see this
 > as an unfortunate and unnecessary burden on code whose assumptions have
 > been valid for a long time.

ah, good point.  i never ever use nested folders, and so didn't even
consider that issue.  and i also wasn't considering non-nmh clients
of the tree.

paul
=---------------------
 paul fox, pgf <at> foxharp.boston.ma.us (arlington, ma, where it's 64.6 degrees)


Anders Eriksson | 27 Jun 2012 10:44

Re: mime-aware filtering?

On 2012-06-26 06:30, Paul Vixie wrote:
> On 2012-06-26 3:19 AM, Jerrad Pierce wrote:
> > Sorry for the premature reply.
> >
> > I see now that Paul did understand my idea.
> > I can understand that some might not want duplicate
> > content, but that's why I proposed it be optional.
> > A temporary cache does not allow for indexing.
>
> i'm ok with that. disk space is cheap. the index can keep copies of the
> content. the mh hook system can keep them in synch. unless you have
> multiple terabytes of stored e-mail you'll never feel the cost of the
> second copy.
>
> > Keeping it in Mail means you have whichever
> > decoded messages you want greppable/indexable;
> > be it done to all on inc, or manually for a select
> > few. Then, when you remove the message, the parts
> > get automagically wiped out by rmm.
>
> i don't see how to support indexing on a read-only mail store if we're
> interleaving the files. while bboards may be long gone usenet is still
> out there, and imap too.
If we allow ourselves to touch the 822 part, we could annotate
it with a pointer (URL?) to the unrolled data. It would allow the
refile to continue to work unconstrained, and generally disconnect
the two sets nicely.

-Anders
> paul

Ken Hornstein | 26 Jun 2012 01:56

Re: mime-aware filtering?

>My only thought is that MIME is more than just the linear list of
>attachments that many seem to believe, and we need to come up with
>a naming convention capable of representing that. And even then,
>deciding what to store as content for a given part isn't necessarily
>straightforward. For example, if you have a multipart/alternative
>part, how do you represent that in the filesystem? We've briefly
>touched on some of this before:

I think that it's solvable; seems like the multipart "container" objects
wouldn't be represented in the filesystem.
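
One way to read that, sketched in Python under assumptions that are not
from this thread's authors: dotted part ids in the spirit of the
"msg.part.subpart" naming discussed here, a hypothetical "<msg>.mime/"
directory, and invented function names.  Containers add a level to the
id but get no file of their own; the alternatives inside a
multipart/alternative simply appear as siblings.

```python
import email


def leaf_parts(part, part_id=""):
    """Yield (part_id, content_type, decoded_bytes) for each non-container part."""
    if not part.is_multipart():
        # A message that is not multipart at all is treated as part "1".
        yield (part_id or "1"), part.get_content_type(), part.get_payload(decode=True)
        return
    # Containers get no entry of their own; they only add a level to the id.
    for index, child in enumerate(part.get_payload(), start=1):
        child_id = f"{part_id}.{index}" if part_id else str(index)
        yield from leaf_parts(child, child_id)


def decoded_part_names(msgnum: int, raw_message: bytes):
    """Map each leaf part to a hypothetical '<msgnum>.mime/<part_id>' path."""
    msg = email.message_from_bytes(raw_message)
    return [(f"{msgnum}.mime/{pid}", ctype, data)
            for pid, ctype, data in leaf_parts(msg)]
```

For a multipart/mixed message holding a multipart/alternative (plain
and HTML) plus one attachment, this gives three entries: 1.1, 1.2 and 2.
Which alternative a reader should prefer is left to whatever consumes
the files.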

>	http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html

I also note that thread included someone (who shall remain nameless) offering
to design a new API to replace m_getfld() :-)

--Ken


Paul Vixie | 26 Jun 2012 04:18

Re: mime-aware filtering?

On 2012-06-25 11:56 PM, Ken Hornstein wrote:
> I also note that thread included someone (who shall remain nameless)
> offering to design a new API to replace m_getfld() :-)

let's start talking about what it should look like?


Jon Steinhart | 26 Jun 2012 04:28

Re: mime-aware filtering?

Paul Vixie writes:
> On 2012-06-25 11:56 PM, Ken Hornstein wrote:
> > I also note that thread included someone (who shall remain nameless)
> > offering to design a new API to replace m_getfld() :-)
> 
> let's start talking about what it should look like?

Well, for starters, it shouldn't include any threatening commentary!
The big thing I think it needs, other than cleanup, is the ability
to scan for attachment part headers instead of stopping at the end of
regular headers.

Jon


Paul Vixie | 26 Jun 2012 04:33

Re: mime-aware filtering?


On 2012-06-26 2:28 AM, Jon Steinhart wrote:
> Paul Vixie writes:
>
>> let's start talking about what it should look like?
> Well, for starters, it shouldn't include any threatening commentary!
> Big thing that I think that it needs other than cleanup is the ability
> to scan for attachment part headers instead of stopping at the end of
> regular headers.

well that didn't take long. :-).

m_getfld() is the heart of MH. everything about the storage and access
model is contained in it, either by its signature or its logic. i'm
opposed to grafting MIME onto it with a couple more arguments that if
non-NULL will trigger additional behaviour.

so when i say "let's talk about what m_getfld should look like" i really
mean "let's talk about what MH's storage and access model should be."

int
m_getfld (int state, unsigned char *name, unsigned char *buf,
          int bufsz, FILE *iob)

your move.

paul


Jon Steinhart | 26 Jun 2012 04:50

Re: mime-aware filtering?

Paul Vixie writes:
> 
> On 2012-06-26 2:28 AM, Jon Steinhart wrote:
> > Paul Vixie writes:
> >
> >> let's start talking about what it should look like?
> > Well, for starters, it shouldn't include any threatening commentary!
> > Big thing that I think that it needs other than cleanup is the ability
> > to scan for attachment part headers instead of stopping at the end of
> > regular headers.
> 
> well that didn't take long. :-).
> 
> m_getfld() is the heart of MH. everything about the storage and access
> model is contained in it, either by its signature or its logic. i'm
> opposed to grafting MIME onto it with a couple more arguments that if
> non-NULL will trigger additional behaviour.
> 
> so when i say "let's talk about what m_getfld should look like" i really
> mean "let's talk about what MH's storage and access model should be."
> 
> int
> m_getfld (int state, unsigned char *name, unsigned char *buf,
>           int bufsz, FILE *iob)
> 
> your move.
> 
> paul

OK, well, I understand your point of view here but I really don't think

Ken Hornstein | 26 Jun 2012 04:59

Re: mime-aware filtering?

>hindsight, but one of them was to extend the definition of headers.  So,
>I'm proposing that m_getfld be extended so that it finds these "extended"
>headers.

I think Paul made his convincing case here that m_getfld() needs to die:

http://lists.gnu.org/archive/html/nmh-workers/2012-01/msg00248.html

--Ken


Paul Vixie | 26 Jun 2012 05:08

Re: mime-aware filtering?


On 2012-06-26 2:50 AM, Jon Steinhart wrote:
> Paul Vixie writes:
>> ...
>>
>> int
>> m_getfld (int state, unsigned char *name, unsigned char *buf,
>>           int bufsz, FILE *iob)
>>
>> your move.
> OK, well, I understand your point of view here but I really don't think
> that my point of view is really different.  As far as I can tell (once
> I get past the dire warnings), the m_getfld looks for stuff in a mail
> message and stops once it gets what it needs.  It was designed in the
> age before MIME, so its notion of what constituted headers was limited.

not just that. its idea of what content is, is limited. its callers
generally expect fully decoded text, but it's perfectly capable of
returning quoted-printable or base64. the caller has currently got the
responsibility to know that it's encoded and to know how to decode it.
unsurprisingly, most parts of MH don't do this. so, to decode something
you use a special version of the command (mhshow vs. show, for example).
this is wrong, and it isn't working.

to revise the API we have to figure out what the callers need, yes, but
also what the callers should be forced to do differently. i think we're
going to have to start from an idealized environment and work backward
to the practical.

> Now, MIME did many things that maybe should have been kept separate in
[...]

Ken Hornstein | 26 Jun 2012 05:21

Re: mime-aware filtering?

>at a high level, how do people feel about callbacks vs. state blobs?
>that is, would we like the replacement for m_getfld() to continue to
>return each time it finds something, maintaining its state in a
>caller-supplied opaque state blob, or would we like it to call the
>caller's "work function" every time it discovers a new object?
>that's the level we have to plan at, if we're going to get MH out of the
>1980's. (where it totally ruled, btw.)

So I've been inside all of that code a lot more than when we first had
that discussion in January.

In my experience, callers of m_getfld() want one of two things:

- They want everything of a particular "thing".  They want all headers (to
  iterate over all of them) or the complete body (to search/display it).
  Example: show.
- They want ONE particular thing; they just have to look through all of
  the parts to find it.  Example: anything that uses mh-format; the
  current design of m_getfld() means you have to look over all of the
  headers to get the ones you care about.  It would make things a lot
  cleaner if the API just let us pick out the one(s) we care about.

As for callbacks versus state blobs, I think callbacks are fine for
threaded or event-driven programming where you tend to do things
asynchronously.  But since we're going to be pretty synchronous (I
think) I'd rather have state blobs.
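
As a rough illustration of the state-blob style, here is a minimal sketch of a header iterator that returns to the caller each time it finds a header and keeps its position in a caller-owned struct. Everything in it (`struct hdr_state`, `hdr_init`, `hdr_next`) is hypothetical and is not the proposed nmh API. It assumes the message is already in memory as a NUL-terminated string and ignores folded header lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-owned state blob: remembers where the previous
 * call stopped. Nothing here is existing nmh API. */
struct hdr_state {
    const char *pos;   /* next unread character of the message text */
};

static void hdr_init(struct hdr_state *st, const char *msg)
{
    assert(st != NULL && msg != NULL);
    st->pos = msg;
}

/* Copy the next header name and value into the caller's buffers.
 * Returns 1 when a header was found, 0 at the blank line that ends the
 * header block (or at end of text), -1 on a malformed or oversized line.
 * Folded (continuation) lines are deliberately not handled. */
static int hdr_next(struct hdr_state *st, char *name, size_t namesz,
                    char *value, size_t valuesz)
{
    const char *line = st->pos;
    const char *eol, *colon, *val;
    size_t len;

    if (*line == '\0' || *line == '\n')
        return 0;

    len = strcspn(line, "\n");
    eol = line + len;
    colon = memchr(line, ':', len);
    if (colon == NULL)
        return -1;

    /* Skip whitespace between the colon and the value. */
    val = colon + 1;
    while (val < eol && (*val == ' ' || *val == '\t'))
        val++;

    if ((size_t)(colon - line) >= namesz || (size_t)(eol - val) >= valuesz)
        return -1;

    memcpy(name, line, (size_t)(colon - line));
    name[colon - line] = '\0';
    memcpy(value, val, (size_t)(eol - val));
    value[eol - val] = '\0';

    /* Advance past the newline so the next call starts on a new line. */
    st->pos = (*eol == '\n') ? eol + 1 : eol;
    return 1;
}
```

The caller drives the loop (`while (hdr_next(&st, ...) == 1)`) and can stop as soon as it has the one header it cares about, which suits the second kind of caller described above.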

--Ken


Ken Hornstein | 26 Jun 2012 05:02

Re: mime-aware filtering?

>m_getfld() is the heart of MH.

Truer words have never been spoken; it's used by so much (including the
profile parser).

>so when i say "let's talk about what m_getfld should look like" i really
>mean "let's talk about what MH's storage and access model should be."
>
>int
>m_getfld (int state, unsigned char *name, unsigned char *buf,
>          int bufsz, FILE *iob)

Okay ... just shooting from the hip, and based on our discussion back in
January ... here's something (I'm ignoring how this would be implemented
for now, and I'm not defining any of the structures).  I hope these
functions would be obvious in operation.

int nmh_openmsg(struct message, messagehandle *, char **error);

int nmh_getheader(messagehandle, const char *, char **header, int *numheaders,
		  char **error);

int nmh_getmime(messagehandle, mimehandle_ret *, char **error);

int nmh_openmime(mimehandle, char **type, char **subtype,
		 int *nested, mimehandle_ret *, char **error);

int nmh_nextmime(mimehandle, char **type, char **subtype, int *iterator,
		 char **error);

[...]

Paul Vixie | 27 Jun 2012 01:11

Re: mime-aware filtering?

On 2012-06-26 3:02 AM, Ken Hornstein wrote:
>> int
>> m_getfld (int state, unsigned char *name, unsigned char *buf,
>>           int bufsz, FILE *iob)
> Okay ... just shooting from the hip, and based on our discussion back
> in January ... here's something (I'm ignoring how this would be
> implemented for now, and I'm not defining any of the structures). I
> hope these functions would be obvious in operation.

this is a good start, assuming that the places which currently use
m_getfld() could be mollified by it.

> int nmh_openmsg(struct message, messagehandle *, char **error);
>
> int nmh_getheader(messagehandle, const char *, char **header,
>                   int *numheaders, char **error);
>
> int nmh_getmime(messagehandle, mimehandle_ret *, char **error);
>
> int nmh_openmime(mimehandle, char **type, char **subtype,
>                  int *nested, mimehandle_ret *, char **error);
>
> int nmh_nextmime(mimehandle, char **type, char **subtype,
>                  int *iterator, char **error);
>
> int nmh_closemime(mimehandle);
>
> int nmh_closemsg(message);
>
> I'm sure there are problems with this, just wanted to get the ball
> rolling.

i'm ignoring stylistic quirks, for example, i'd return a "struct
message *" from the open function, and it would contain function
pointers to the "methods" of the "object".

i'm ignoring correctness concerns, like how do the objects inside "char
**x; int *y" get freed.

i'm ignoring naming concerns, whereby i think that "nmh_" is the wrong
prefix for these, since they could be used for any message that's in a
[...]

Jeffrey Honig | 27 Jun 2012 01:18

Re: mime-aware filtering?

A few points on this discussion:


1) The person who promised to re-write the API was an Internet Elder.  Google it.

2) Callbacks vs data structures

   One reason you might want to have callbacks is that the content might be GPG or otherwise encrypted and you may want to prompt the user.  You could of course put methods/callbacks in the data structure to handle this.

3) Expanding MIME messages into dirs

   a) Don't forget about encrypted content when using a cache,  you probably don't want to cache it.
   b) If you use .msgnum.mime would most clients ignore the dirs (i.e. .55.mime)?

Paul Vixie | 27 Jun 2012 01:26

Re: mime-aware filtering?

On 2012-06-26 11:18 PM, Jeffrey Honig wrote:
> A few points on this discussion:
>
> 1) The person who promised to re-write the API was an Internet Elder.
>  Google it.

and after that... bite me.

> 2) Callbacks vs data structures
>
>    One reason you might want to have callbacks is that the content
> might be GPG or otherwise encrypted and you may want to prompt the
> user.  You could of course put methods/callbacks in the data structure
> to handle this.

i think a part handler could read/write from a mime_part_t into a gnupg
pipe either way. we may want to offer a recursive iterator that does the
callback thing, for callers who prefer working that way. but such
callers would have to maintain their own ancestor-state to know which
leg of an alternative-multipart they were in, and so on. so it's not
obviously easier, just different.

> 3) Expanding MIME messages into dirs
>
>    a) Don't forget about encrypted content when using a cache,  you
> probably don't want to cache it.

i agree that you certainly would not want to cache the cleartext. but
caching a second copy of the crypted text, where the part it was in got
copied to a file somewhere and all the base64 got decoded, is no big deal?

>    b) If you use .msgnum.mime would most clients ignore the dirs (i.e.
> .55.mime)?

all the mh directory processors i've written (or in the case of uw imap,
that i've patched) ignore any dirent whose name begins with a dot, before
they bother to stat() it to see if it's a directory or not. i think we
could ignore those who don't. but i still prefer not to permanently
unpack mimeballs. the authoritative source of a message is what came in
over SMTP, with a received: header added. in fact i'd've been willing to
keep the \r\n line terminations, though that ship has already sailed.
anything that's non-canonical should be in a separate storage container,
such as nmh-cache.
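
The "check the name before you stat()" habit described above can be sketched as follows. This is a minimal illustration using POSIX `opendir`/`readdir`/`stat`; the function names `is_hidden` and `count_visible_files` are hypothetical and do not come from nmh or uw imap.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical predicate: a dot-prefixed name (".55.mime",
 * ".mh_sequences", ".", "..") is never treated as a message. */
static int is_hidden(const char *name)
{
    assert(name != NULL);
    return name[0] == '.';
}

/* Count regular files in `dir` whose names do not start with a dot.
 * The name test runs first, so hidden entries never cost a stat().
 * Returns -1 if the directory cannot be opened. */
static long count_visible_files(const char *dir)
{
    DIR *d;
    struct dirent *e;
    long n = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;

    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat sb;
        int len;

        if (is_hidden(e->d_name))
            continue;                       /* skipped without stat() */

        len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                       /* path too long: skip it */

        if (stat(path, &sb) == 0 && S_ISREG(sb.st_mode))
            n++;
    }
    closedir(d);
    return n;
}
```

A scanner written this way would never see a `.55.mime` directory, which is the property the question in 3b) depends on.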

paul

Christian Neukirchen | 26 Jun 2012 19:40

Re: mime-aware filtering?

Paul Vixie <paul <at> redbarn.org> writes:

> i consider MH's basic mailbox format to be flawed in a MIME world for
> which MH was never designed or redesigned. every attachment should be in
> its own file, even if that meant that messages were directories no
> longer files themselves.
>
> note, i know we can't do that and i know why. i am not proposing it.

Some Plan 9 guys have explored this approach:
http://plan9.lsub.org/magic/man2html/1/mails

It should not pose a problem to modern filesystems to have that many
directories.

-- 
Christian Neukirchen  <chneukirchen <at> gmail.com>  http://chneukirchen.org

David Levine | 26 Jun 2012 05:05

Re: mime-aware filtering?

Paul Vixie wrote:

> mhpart (or whatever) would need a -clean option to get rid
> of the /var/tmp files it has made for you in this session.
> 
> but i do not think we should pollute the Mail subdirectory
> hierarchy with permanent copies of parts.

nmh already has nmh-cache, how about putting parts there?
They could go into a hierarchy that shadows the MH
hierarchy, but with one root directory, e.g., 53.mime (or
just 53) corresponding to each message.  That way scripts
that troll the two hierarchies would look similar.  And
a script that trolls the MH hierarchy would know where
to look for the parts.

The cache would be populated on demand.  And cleaned up manually.
Temporal locality of reference suggests not cleaning after
each use, but rather periodically.

David

David Levine | 27 Jun 2012 05:00

Re: mime-aware filtering?

Paul Fox wrote:

> why couldn't an indexer know the difference between the message file
> and the content cache?
> 
> anyway:  i think i still prefer the idea that the content cache
> directories be kept in the message tree.  but i also understand why
> one might want them separate.  if the idea is that the message tree
> and the cache tree are roughly isomorphic, i'll bet that could be made
> a per-user choice, as long as the content directories were really
> named "53.mime/" and not simply "53/" -- i.e., the messages and the
> mime-dirs could either live in the same tree or not, since they use
> different parts of the namespace.  (but clients certainly would need
> to be careful not to assume one model or the other.)

If we follow and enforce these rules:

1) Files in the message tree can only be named [1-9][0-9]*
   or `mhparam mh-sequences` (defaults to .mh_sequences).
   I think that's what an MH folder is.  The old
   documentation mentions "standard entries", but I can only
   find mh-sequences now.

2) Subfolders in the message tree cannot match the form
   specified in 1).  nmh doesn't currently enforce this:
   some nmh programs (scan) complain about a subfolder named
   inbox/2000, but folder happily creates it (but should not).

   It's OK for a top-level message folder to be named
   [1-9][0-9]* (or even .mh_sequences, but I wouldn't recommend
   that).

3) Files and directories in the cache tree cannot match the
   form specified in 1).

Then you could do, e.g.,

  Path: Mail
  nmh-private-cache: Mail

to have them in the same directory.
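
Rule 1 is easy to check mechanically. The following is a minimal sketch of such a check; the function name `is_message_name` is hypothetical and not taken from nmh. Rules 2 and 3 then amount to requiring that this function return 0 for every subfolder name and every cache entry name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check for rule 1: returns 1 if `name` matches
 * [1-9][0-9]*, i.e. a positive decimal number with no leading zero,
 * and 0 otherwise. */
static int is_message_name(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    assert(name != NULL);

    if (*p < '1' || *p > '9')
        return 0;
    for (p++; *p != '\0'; p++)
        if (!isdigit(*p))
            return 0;
    return 1;
}
```

Under this check "53" is a message, while "53.mime" and ".mh_sequences" are not, so a cache directory named "53.mime" can share a tree with message 53 without ambiguity.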

David
