Scott Lawrence | 3 Mar 21:21 2012

Suckless ML archiver?

I notice that project_ideas lists having a decent mailing list web archiver 
system as a goal - I've been parsing RFC5322 messages anyway, so here's a 
quick hack of an archiver[1]. 300 lines of go (not counting the go-mail 
library, which adds another 300). Takes an mbox file and spits out a 
directory full of html-ified messages and an index file, with threading shown 
in a manner similar to (hy|pi)permail et al. Sorry I don't have any demo 
online - I don't have any interesting mbox files to run it on. No multipart 
support ATM, although it's easy to add, since that's in the go stdlib.
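
A minimal sketch of the threading step described above, assuming messages are
linked through their Message-ID and In-Reply-To headers. This is not slark's
code: the names node, buildThreads and printTree are hypothetical, and parsing
uses net/mail from the Go standard library rather than the go-mail library.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// node is a hypothetical tree node: one message plus its direct replies.
type node struct {
	ID       string
	Subject  string
	Children []*node
}

// buildThreads parses raw RFC 5322 messages and attaches each one to the
// message named in its In-Reply-To header. Messages whose parent is
// missing or unknown become thread roots.
func buildThreads(raw []string) ([]*node, error) {
	byID := map[string]*node{}
	parentOf := map[*node]string{}
	var order []*node
	for _, r := range raw {
		m, err := mail.ReadMessage(strings.NewReader(r))
		if err != nil {
			return nil, err
		}
		n := &node{
			ID:      strings.TrimSpace(m.Header.Get("Message-Id")),
			Subject: m.Header.Get("Subject"),
		}
		if n.ID != "" {
			byID[n.ID] = n
		}
		parentOf[n] = strings.TrimSpace(m.Header.Get("In-Reply-To"))
		order = append(order, n)
	}
	var roots []*node
	for _, n := range order {
		p, ok := byID[parentOf[n]]
		if ok && p != n {
			p.Children = append(p.Children, n)
		} else {
			roots = append(roots, n)
		}
	}
	return roots, nil
}

// printTree prints subjects indented by reply depth.
func printTree(ns []*node, depth int) {
	for _, n := range ns {
		fmt.Printf("%s%s\n", strings.Repeat("  ", depth), n.Subject)
		printTree(n.Children, depth+1)
	}
}

func main() {
	roots, err := buildThreads([]string{
		"Message-ID: <a@example>\nSubject: Suckless ML archiver?\n\nbody\n",
		"Message-ID: <b@example>\nIn-Reply-To: <a@example>\nSubject: Re: Suckless ML archiver?\n\nbody\n",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	printTree(roots, 0)
}
```

A fuller version would probably also consult the References header, since
In-Reply-To alone loses the thread when an intermediate message is missing.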

There are plenty of things that still need to be done to make this decent; if 
there's interest, I'd be happy to take suggestions and get it fully working. 
This is just a "hey look at me!".

[1] https://github.com/bytbox/slark

p.s. thanks for dwm et al!

-- 
Scott Lawrence

Anselm R Garbe | 17 Mar 18:27 2012

Re: Suckless ML archiver?

On 3 March 2012 21:21, Scott Lawrence <bytbox <at> gmail.com> wrote:
> I notice that project_ideas lists having a decent mailing list web archiver
> system as a goal - I've been parsing RFC5322 messages anyway, so here's a
> quick hack of an archiver[1]. 300 lines of go (not counting the go-mail
> library, which adds another 300). Takes an mbox file and spits out a
> directory full of html-ified messages and an index file, with threading
> shown in a manner similar to (hy|pi)permail et al. Sorry I don't have any
> demo online - I don't have any interesting mbox files to run it on. No
> multipart support ATM, although it's easy to add, since that's in the go
> stdlib.
>
> There are plenty of things that still need to be done to make this decent;
> if there's interest, I'd be happy to take suggestions and get it fully
> working. This is just a "hey look at me!".
>
> [1] https://github.com/bytbox/slark

The mlmmj output format is a directory consisting of files (1-n) where
each contains a single message in mbox format. The number (1-n) is
incremented for each message. For instance the dev <at> suckless.org
mailing list directory contains 11359 message files as of now. You
could extend your archiver to work on such a directory structure. Once
done, I would give it a go on the dev <at> suckless.org messages.
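
A rough sketch of how an archiver might walk such a directory, assuming the
layout described above: files named by plain decimal numbers, one message per
file. The function names numericOrder and messageFiles are made up for
illustration and are not part of slark or mlmmj. The one subtlety is that the
names must be sorted by value, not as strings, so that "2" comes before "10".

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
)

// numericOrder keeps only names that parse as non-negative integers and
// returns them sorted by numeric value.
func numericOrder(names []string) []string {
	type entry struct {
		n    int
		name string
	}
	var es []entry
	for _, name := range names {
		n, err := strconv.Atoi(name)
		if err != nil || n < 0 {
			continue // skip index files, subdirectory names, etc.
		}
		es = append(es, entry{n, name})
	}
	sort.Slice(es, func(i, j int) bool { return es[i].n < es[j].n })
	out := make([]string, len(es))
	for i, e := range es {
		out[i] = e.name
	}
	return out
}

// messageFiles lists the numbered message files in dir, in message order.
func messageFiles(dir string) ([]string, error) {
	ents, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range ents {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	paths := numericOrder(names)
	for i, n := range paths {
		paths[i] = filepath.Join(dir, n)
	}
	return paths, nil
}

func main() {
	dir := "." // archive directory; defaults to the current directory
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	paths, err := messageFiles(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```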

Cheers,
Anselm

Scott Lawrence | 17 Mar 20:56 2012

Re: Suckless ML archiver?

Hi Anselm,

On Sat, 17 Mar 2012, Anselm R Garbe wrote:

> The mlmmj output format is a directory consisting of files (1-n) where
> each contains a single message in mbox format. The number (1-n) is
> incremented for each message. For instance the dev <at> suckless.org
> mailing list directory contains 11359 message files as of now. You
> could extend your archiver to work on such a directory structure. Once
> done, I would give it a go on the dev <at> suckless.org messages.

A single message in mbox format? Or a single message in RFC5322 format (as 
typically found in mboxes)? Or single message in not-quite-standard format 
(such as used by pipermail behind the scenes)?

If the former, a call to `cat` would suffice to "extend" my archiver.

-- 
Scott Lawrence

Linux jagadai 3.2.9-1-ARCH #1 SMP PREEMPT Thu Mar 1 09:31:13 CET 2012 x86_64 Intel(R) Core(TM)2 Duo CPU
P8700  <at>  2.53GHz GenuineIntel GNU/Linux

Anselm R Garbe | 17 Mar 21:01 2012

Re: Suckless ML archiver?

On 17 March 2012 20:56, Scott Lawrence <bytbox <at> gmail.com> wrote:
> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
>> The mlmmj output format is a directory consisting of files (1-n) where
>> each contains a single message in mbox format. The number (1-n) is
>> incremented for each message. For instance the dev <at> suckless.org
>> mailing list directory contains 11359 message files as of now. You
>> could extend your archiver to work on such a directory structure. Once
>> done, I would give it a go on the dev <at> suckless.org messages.
>
>
> A single message in mbox format? Or a single message in RFC5322 format (as
> typically found in mboxes)? Or single message in not-quite-standard format
> (such as used by pipermail behind the scenes)?

Sorry for the confusion, it is rfc5322 format.

> If the former, a call to `cat` would suffice to "extend" my archiver.

Ok, will give it a try.

Cheers,
Anselm

Scott Lawrence | 17 Mar 21:06 2012

Re: Suckless ML archiver?

On Sat, 17 Mar 2012, Anselm R Garbe wrote:

> On 17 March 2012 20:56, Scott Lawrence <bytbox <at> gmail.com> wrote:
>> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
>>> The mlmmj output format is a directory consisting of files (1-n) where
>>> each contains a single message in mbox format. The number (1-n) is
>>> incremented for each message. For instance the dev <at> suckless.org
>>> mailing list directory contains 11359 message files as of now. You
>>> could extend your archiver to work on such a directory structure. Once
>>> done, I would give it a go on the dev <at> suckless.org messages.
>>
>>
>> A single message in mbox format? Or a single message in RFC5322 format (as
>> typically found in mboxes)? Or single message in not-quite-standard format
>> (such as used by pipermail behind the scenes)?
>
> Sorry for the confusion, it is rfc5322 format.
>
>> If the former, a call to `cat` would suffice to "extend" my archiver.
>
> Ok, will give it a try.

Oh, if it's just rfc5322, then a simple 'cat' won't do (slark expects an 
actual mbox ATM). I'll patch it to handle a sensible directory layout in the 
next few days. (Sorry about being so slow to make improvements - I'm somewhat 
overloaded for a few weeks.)

Other improvements needed (in case anybody wants to learn go by patching the 
go-mail library): handle multipart and the common message encodings, handle 
HTML messages elegantly (sanitize but leave basic styling when available?), 
[...]
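
For the multipart item, a sketch of what the stdlib route could look like,
using net/mail, mime and mime/multipart. It is only an illustration under
assumptions: firstPlainText is a hypothetical helper, not anything in slark or
go-mail, and it ignores nested multiparts and base64 transfer encoding.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// firstPlainText returns the body of the first text/plain part of a raw
// message. If the message is not multipart (or has no usable
// Content-Type), the whole body is returned unchanged.
func firstPlainText(raw string) (string, error) {
	m, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", err
	}
	mt, params, err := mime.ParseMediaType(m.Header.Get("Content-Type"))
	if err != nil || !strings.HasPrefix(mt, "multipart/") {
		b, err := io.ReadAll(m.Body)
		return string(b), err
	}
	mr := multipart.NewReader(m.Body, params["boundary"])
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return "", fmt.Errorf("no text/plain part found")
		}
		if err != nil {
			return "", err
		}
		// A part without a Content-Type defaults to text/plain.
		pt, _, _ := mime.ParseMediaType(p.Header.Get("Content-Type"))
		if pt == "text/plain" || pt == "" {
			b, err := io.ReadAll(p)
			return string(b), err
		}
	}
}

func main() {
	raw := "Subject: demo\r\n" +
		"Content-Type: multipart/alternative; boundary=\"b\"\r\n\r\n" +
		"--b\r\nContent-Type: text/html\r\n\r\n<p>hi</p>\r\n" +
		"--b\r\nContent-Type: text/plain\r\n\r\nhi\r\n" +
		"--b--\r\n"
	text, err := firstPlainText(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

Preferring the text/plain alternative sidesteps the HTML sanitizing question
for messages that carry both forms; HTML-only mail would still need it.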

markus schnalke | 18 Mar 09:46 2012

Re: Suckless ML archiver?

[2012-03-17 16:06] Scott Lawrence <bytbox <at> gmail.com>
> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
> > On 17 March 2012 20:56, Scott Lawrence <bytbox <at> gmail.com> wrote:
> >> On Sat, 17 Mar 2012, Anselm R Garbe wrote:
> >>>
> >>> The mlmmj output format is a directory consisting of files (1-n) where
> >>> each contains a single message in mbox format. The number (1-n) is
> >>> incremented for each message. For instance the dev <at> suckless.org
> >>> mailing list directory contains 11359 message files as of now. You
> >>> could extend your archiver to work on such a directory structure. Once
> >>> done, I would give it a go on the dev <at> suckless.org messages.
> >>
> >> A single message in mbox format? Or a single message in RFC5322 format (as
> >> typically found in mboxes)? Or single message in not-quite-standard format
> >> (such as used by pipermail behind the scenes)?
> >
> > Sorry for the confusion, it is rfc5322 format.

That means, if you add a `.mh_sequences' file, then you have an MH
mail folder -- great.

> Oh, if it's just rfc5322, then a simple 'cat' won't do (slark expects an 
> actual mbox ATM).

If you have nmh installed, then you can use packf(1) to generate an
mbox, even to stdout (packf -file /dev/stdout | ...)

AFAIK the differences between an mbox containing one message and a
plain RFC822 message (MH mail store format) are the `From ' line (line
number 1) and that each subsequent line starting with ``From '' will
[...]
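
The message is cut off above. Assuming it goes on to describe the usual mbox
convention (a leading `From ' envelope line, and a `>' put in front of any
later line that starts with `From '), wrapping a plain RFC 5322 message could
be sketched as below. The helper name toMbox and the envelope string are
illustrative only; this follows the simple "mboxo" quoting rule and expects
LF line endings.

```go
package main

import (
	"fmt"
	"strings"
)

// toMbox wraps one plain message as a single mbox entry. envelope is the
// text that follows "From " on the separator line (sender and date).
// Lines beginning with "From " are quoted with '>' so that they are not
// mistaken for the start of the next message.
func toMbox(envelope, msg string) string {
	var b strings.Builder
	b.WriteString("From " + envelope + "\n")
	for _, line := range strings.Split(strings.TrimRight(msg, "\n"), "\n") {
		if strings.HasPrefix(line, "From ") {
			b.WriteByte('>')
		}
		b.WriteString(line)
		b.WriteByte('\n')
	}
	b.WriteByte('\n') // a blank line ends the entry
	return b.String()
}

func main() {
	msg := "Subject: demo\n\nFrom here on, things differ.\n"
	fmt.Print(toMbox("sender@example.org Sat Mar 17 21:06:00 2012", msg))
}
```

Concatenating the output for every numbered file, in order, yields one mbox
that an mbox-only tool can read.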

