LuKreme | 18 Aug 2011 12:57

Domain based sorting

A certain person in my household who shall remain nameless has a habit of signing up for email newsletters or
whatever at a rather astonishing rate, and I’ve been unable to keep up with any sort of sorting
methodology that can help her.. errr, can help this nameless person keep up with the mail.

I have tried to set up procmail so that ‘real’ mail gets sent to the inbox and everything else gets sent
to a misc box, but too much ‘real’ mail misses the inbox and it is difficult (for some reason I don’t
understand) for this person to search and find the important mail.

Now, this isn’t spam, it’s stuff like Amazon, Woot, Groupon, Lands’ End, and several dozen
others, all of it email that, at least in theory, is ‘wanted’.

So, my idea was instead of dumping it into Misc.2011-08 I would dump it in misc.<domain> and then as long as
she has some clue as to where the mail might be from, she should be able to find it. Er, this unnamed person,
that is.

This would be simple if the stupid mailers used Precedence: bulk like they are SUPPOSED to, but let’s not
go there.

So, my idea is this:

(cribbed from Sean)

# I’m already doing from here to defining FROM_DOMAIN anyway in the procmailrc
:0 h
CLEANFROM=|formail -IReply-To: -rtzxTo:

# username portion
:0
* CLEANFROM ?? ^\/[^@]+
{ FROM_USER=$MATCH }

LuKreme | 18 Aug 2011 20:14

Re: Domain based sorting

LuKreme <kremels <at> kreme.com> squawked out on Thursday 18-Aug-2011 <at> 04:57:46
> One thing I will have to fix is that the FROM_DOMAIN will contain, for example, mx3.domain.tld and I want it
> to contain just “domain”. That’s trivial though (and in fact, I may have to check the procmailrc,
> but it might already be grabbed into a variable I’ve forgotten about).

OK, this should have been trivial enough, but my fu has failed me.

Given Sean’s CLEANFROM and FROM_DOMAIN, I tried:

:0
* FROM_DOMAIN ?? .*\/([^\.]+)
{ ROOT_DOMAIN = $MATCH }

which works if the from domain is “domain.tld” but fails if the domain is “mail.domain.tld” (I get “mail”)

So, I tried to anchor it to the end:

:0
* FROM_DOMAIN ?? .*\/([^\.]+)\....?$
{ ROOT_DOMAIN = $MATCH }

but that always gives me “domain.tld” which confuses me because I thought the () match gave that
portion to $MATCH
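
For what it is worth, this follows from how \/ works in procmail: MATCH receives everything the regexp matches to the right of \/, and parentheses only group, they do not capture. A minimal sketch of one way around that, assuming FROM_DOMAIN is set as above, splits the job into two steps (TWO_NODES is a hypothetical variable name, not something from Sean's rc):

```procmail
# Step 1: capture the last two dot-separated tokens, e.g. "domain.tld".
# ^^ at the end anchors the match to the very end of FROM_DOMAIN.
:0
* FROM_DOMAIN ?? ()\/[^.]+[.][^.]+^^
{ TWO_NODES = $MATCH }

# Step 2: keep only what precedes the first dot, e.g. "domain".
:0
* TWO_NODES ?? ^^\/[^.]+
{ ROOT_DOMAIN = $MATCH }
```

This still assumes a single-token TLD, so something like host.demon.co.uk would come out as "co" rather than "demon".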

So, I started to think (dangerous, I know) and I searched and found Sean’s post from a few years ago about
dealing with getting domains in domain.co.uk sorts of situations:

Professional Software Engineering <PSE-L <at> mail.professional.org> squawked out on Sunday 25-Jan-2009 <at> 14:30:44
> # first, match the domain down to JUST the rightmost two tokens
> :0

LuKreme | 18 Aug 2011 20:31

Re: Domain based sorting

LuKreme <kremels <at> kreme.com> squawked out on Thursday 18-Aug-2011 <at> 12:14:46
> So, I started to think (dangerous, I know) and I searched and found Sean’s post from a few years ago
> about dealing with getting domains in domain.co.uk sorts of situations:

And a few minutes later I found Dan’s post in the same thread with (trimmed down to just the part I want)

TLDREGEX = ([cC][oO][.][^.][^.]|[^.]+)

# Get the domain name
:0
* $ FQDN  ?? ()\/[^.]+[.]$TLDREGEX^^
* MATCH ?? ^^\/[^.]+
{ DOMPART = $MATCH }

This works perfectly as far as I can tell.
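
A minimal sketch of the delivery step this would feed, assuming DOMPART was set by the recipe above and that the folders are mbox files named misc.<domain> as described at the start of the thread. The character check is an added precaution, not part of Dan's recipe:

```procmail
# Deliver to misc.<domain>, but only when DOMPART consists solely of
# letters, digits and hyphens, so a hostile From: cannot inject a "/".
:0:
* DOMPART ?? ^^[-a-z0-9]+^^
misc.$DOMPART
```

The trailing colon on `:0:` asks for a local lockfile, which matters for mbox delivery. Matching is case insensitive, but the folder name keeps whatever case the sender used, so Amazon.com and amazon.com would land in different folders.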

-- 
'It's vital to remember who you really are. It's very important. It
isn't a good idea to rely on other people or things to do it for you,
you see. They always get it wrong.' --Sourcery

Re: Domain based sorting

At 11:14 2011-08-18, LuKreme wrote:
>given Sean's CLEANFROM and FROM_DOMAIN I tried:
>
>:0
>* FROM_DOMAIN ?? .*\/([^\.]+)

break it down:
         .*              match zero or more of anything
         \/              start match capture
         ([^\.]+)        match one or more of anything NOT a dot

that regexp is intended to grab the first domain token.

Since the front isn't anchored, there's no real need for .* before
the match operator (though if you drop it, owing to some parsing
issues in procmail, you should put a () before the match
trigger).  Now, if you KNOW there's a hostname (or if, as you
presume, you won't be dealing with domain.co.uk style
domains), you could count dots.

Try:

* FROM_DOMAIN ?? ()\/[^\.]+\.[^\.]+$

That'll capture the last two nodes of a domain specification

         mail.domain.tld -> domain.tld
         domain.tld -> domain.tld
         foo.mail.domain.tld -> domain.tld
         host.demon.co.uk -> co.uk                       (!!!)

Re: Domain based sorting

At 11:31 2011-08-18, LuKreme wrote:
>LuKreme <kremels <at> kreme.com> squawked out on Thursday 18-Aug-2011 <at> 12:14:46
> > So, I started to think (dangerous, I know) and I searched and
> > found Sean's post from a few years ago about dealing with
> > getting domains in domain.co.uk sorts of situations:
>
>And a few minutes later I found Dan's post in the same thread with 
>(trimmed down to just the part I want)
>
>TLDREGEX = ([cC][oO][.][^.][^.]|[^.]+)

The [cC][oO] character classes aren't needed: matching is case
insensitive unless someone explicitly makes a recipe case sensitive
by specifying the 'D' flag.  The following is more succinct, and
accomplishes the same thing within the example recipe:

TLDREGEX = (co[.][^.][^.]|[^.]+)

Note that the [.] expression might more commonly be expressed as
\. but one would have to double-escape it to \\. for the backslash to
appear in the resulting regexp string, so character classing it is in
fact clearer.

># Get the domain name
>:0
>* $ FQDN  ?? ()\/[^.]+[.]$TLDREGEX^^
>* MATCH ?? ^^\/[^.]+
>{ DOMPART = $MATCH }
>
>This works perfectly as far as I can tell.


LuKreme | 19 Aug 2011 00:40

Re: Domain based sorting

Professional Software Engineering <PSE-L <at> mail.professional.org> squawked out on Thursday 18-Aug-2011 <at> 14:02:36
> Considering the ICANN decision to open up the TLD naming to pretty much anything, some thought needs to be
> put into how domains are parsed - there's sure to be a LOT of logic that will break.

I’m hoping that once I get a handle on who/what is sending all this email I will be able to eliminate a huge
bulk of the email, or at least funnel it all into one out of the way place.

As for new TLDs, it really depends on who uses them and how.

-- 
'You have the right to remain silent,' he [Carrot] said. 'You have the
right not to injure yourself falling down the steps on the way to the
cells. You have the right not to jump out of high windows. You do not
have to say anything, you see, but anything you do say, well, I have to
take it down and it might be used as evidence.' --Guards! Guards!

Alan Clifford | 19 Aug 2011 01:31

Re: Domain based sorting

On Thu, 18 Aug 2011, LuKreme wrote:

> A certain person in my household who shall remain nameless has a habit 
> of signing up for email newsletters or whatever at a rather astonishing 
> rate, and I’ve been unable to keep up with any sort of sorting 
> methodology that can help her.. errr, can help this nameless person keep 
> up with the mail.
>

In the first instance, couldn't you use a list in a file of domains or
email addresses of the bacn emails that you already know about and
redirect them to a separate mail folder?  It might be a bit simpler to do
and get rid of a lot of the problem very quickly.
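
A minimal sketch of that approach, assuming a hypothetical file $HOME/.bacn-senders holding one known domain or address per line (no blank lines, since an empty pattern matches everything) and a hypothetical mbox folder called bacn:

```procmail
# BACNLIST and the folder name "bacn" are placeholders for illustration.
BACNLIST = $HOME/.bacn-senders

# File the message if its From: header contains any line of the list
# as a fixed, case-insensitive substring.
:0:
* ? formail -x From: | grep -i -q -F -f "$BACNLIST"
bacn
```

The condition succeeds when grep exits 0, so adding a sender is just a matter of appending a line to the file; the rc file itself does not need to change.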

-- 
Alan

(  Please address personal email to alan+1 <at>  as email to lists <at> 
    is only read from my subscribed lists. )

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail <at> lists.RWTH-Aachen.de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail

LuKreme | 19 Aug 2011 07:46

Re: Domain based sorting

Alan Clifford <lists <at> clifford.ac> squawked out on Thursday 18-Aug-2011 <at> 17:31:16
> On Thu, 18 Aug 2011, LuKreme wrote:
> 
>> A certain person in my household who shall remain nameless has a habit of signing up for email newsletters
>> or whatever at a rather astonishing rate, and I’ve been unable to keep up with any sort of sorting
>> methodology that can help her.. errr, can help this nameless person keep up with the mail.
>> 
> 
> In the first instance, couldn't you use a list in a file of domains or email addresses of the bacn emails that
> you already know about and redirect them to a separate mail folder?

Yes, I do that, but the flood has become a torrent and I’ve been unable to keep up. And since *I* do not know
what is really wanted, what is sorta wanted in theory, and what is not wanted, I have to proceed with care.

Also, that means going in to the mailserver quite frequently and updating this list manually. The key
advantage to this domain sorting is that 1) it’s automatic and 2) it will give me a list very quickly that I
can then use to try to cull some of the crap.

Also, some of these ‘lists’ send a whole lot of messages. Many are simply once a month newsletters, but
many are once a day, or seemingly so. Putting the once a month stuff in with the once a day stuff means the
lower frequency is simply overwhelmed.

>  It might be a bit simpler to do and get rid of a lot of the problem very quickly.

I’m looking at this as a multi-stage problem. First, get a handle on everything that is coming in, THEN
figure out how best to sort it. Right now, too much important wheat is getting lost in the (theoretically)
desired chaff.

-- 
ARE YOU FAMILIAR WITH THE WORDS 'DEATH WAS HIS CONSTANT COMPANION'? 'But