Lars Eggert | 18 Jul 10:14

historic mailing list archives

Hi,

the mailing list search engine I emailed about earlier
(http://people.nokia.net/~lars/ietfsearch.html) only finds emails
that are available in an HTML-ified archive (hypermail, mhonarc,
pipermail, etc.).

Although most of our current mailing lists have such archives (esp.
all the ones hosted on ietf.org), many historic working groups have
archives that are only available in mbox format, if they are available
at all: when I checked the screen-scraped archive URLs on
http://tools.ietf.org/wg/concluded, I was shocked at how many are
unavailable. It looks like we're in serious danger of losing some of
our institutional memory.

So, I'm thinking about starting some concerted effort to collect and
HTMLify the archives of historic IETF mailing lists. For example, I
grabbed quite a large number of them from
http://gd.tuwien.ac.at/infosys/network/docs/ietf.org/concluded-wg-ietf-mail-archive/,
and I'm getting ready to convert them via mhonarc.

Questions:

   - do people think this is a worthwhile effort?

   - where should I collect the archives and host the HTML from?
     (I'd prefer something at ietf.org rather than my own server)

Lars
Frank Ellermann | 18 Jul 12:55

Re: historic mailing list archives

Lars Eggert wrote:

> I'm thinking about starting some concerted effort to collect  
> and HTMLify the archives of historic IETF mailing lists.

If a list is available on GMaNe, and its archive there isn't
complete, you can submit the mbox files to feed this archive:
<http://gmane.org/import.php>

The opposite case, where GMane has an archive and you want a
copy, might also be possible:  <http://gmane.org/export.php>

> - do people think this is a worthwhile effort?

Sure, that is why GMaNe offers this feature.  AFAIK it's the
only place with a complete USEFOR archive (before and after
the move of the list some years ago), just an example.

 Frank
Lars Eggert | 18 Jul 13:30

Re: historic mailing list archives

On 2008-7-18, at 13:55, ext Frank Ellermann wrote:
> Lars Eggert wrote:
>> I'm thinking about starting some concerted effort to collect
>> and HTMLify the archives of historic IETF mailing lists.
...
>> - do people think this is a worthwhile effort?
>
> Sure, that is why GMaNe offers this feature.  AFAIK it's the
> only place with a complete USEFOR archive (before and after
> the move of the list some years ago), just an example.

Right, but none of these third-party sites is ever complete (GMaNe for  
example didn't have the first WG I tried, which was TCPM), and because  
the data isn't under our control, they can go away and we're left with  
no archive. I think the IETF should hold this data.

Lars
Frank Ellermann | 18 Jul 14:18

Re: historic mailing list archives

Lars Eggert wrote:

> none of these third-party sites is ever complete

Actually *NO* site is ever complete, beginning with
the IETF, because that's a complex issue:

* Many IETF lists are not hosted by the IETF, but by 
  others
* Many lists start their life outside of the "IETF
  other lists" or "IETF WG lists", e.g., IDNAbis
* GMaNe uses its own "group names", e.g., IMAA and
  IMA lists (long before that became the EAI WG)
* Some "IETF" lists on GMaNe never became real IETF
  lists (e.g., CLEAR, COSMOGOL) for various reasons
* Some "other lists" such as xml2rfc or spf-discuss
  end up in other places when available on GMaNe:
  gmane.text.xml.rfc
  gmane.mail.spam.spf.discuss
* IETF lists that didn't interest any GMaNe user
  enough to propose a subscription are unavailable

That can lead to all variations you can think of,
including cases where the GMaNe archive is the best
available archive.  

> the data isn't under our control, they can go away
> and we're left with no archive.

My confidence in GMaNe is stronger than in the IETF

Lars Eggert | 18 Jul 14:41

Re: historic mailing list archives

Hi,

On 2008-7-18, at 15:18, ext Frank Ellermann wrote:
> Lars Eggert wrote:
>> none of these third-party sites is ever complete
>
> Actually *NO* site is ever complete, beginning with
> the IETF, because that's a complex issue:

Understood.

>> the data isn't under our control, they can go away
>> and we're left with no archive.
>
> My confidence in GMaNe is stronger than in the IETF
> wrt list archives:

I have no opinion about the longevity/availability of gmane, never  
having used it.

> GMaNe has search, NNTP access, various web interfaces
> with nice picons, access by Message-ID, permalinks,
> and the killer application:  Access on a raw message
> including all headers, exactly what I need to check
> obscure mail and MIME issues.  GMaNe can also filter
> obscure "list footers" or "subject tags".

> The IETF list archives offer none of these features,
> all essential from my POV.


Frank Ellermann | 18 Jul 15:26

Re: historic mailing list archives

Lars Eggert wrote:

> all this is about archive features, not about archive  
> longevity/availability. I agree that they are nice
> features, maybe even essential ones, but availability
> trumps them all.

I have been using GMaNe for 5+ years; sometimes there are
glitches, as with IETF lists hosted by the IETF, only different.

If you try to build new archives from whatever you can
find somewhere, it will be a huge project.

> This is the case even for recently concluded WGs such
> as pki4ipsec or, ahem, infamous WGs such as newtrk.

<http://dir.gmane.org/gmane.ietf.newtrk>  NNTP tells
me that GMaNe has 1214 articles (up to 2006-06).  It's
hard to tell what Google knows about this; you could
try two URL patterns:

<http://permalink.gmane.org/gmane.ietf.newtrk/*>
<http://article.gmane.org/gmane.ietf.newtrk/*>

However, articles cross-posted to more than one list
available on GMaNe only get a URL in the first group.
If somebody posted to the general list *and* Newtrk
(that happened), it would AFAIK only get a permalink
on the general list.


Lars Eggert | 18 Jul 15:49

Re: historic mailing list archives

Hi, Frank,

I don't think the IETF can depend on gmane or any other external  
archiving service.

There are simply too many past examples of services that hosted IETF
lists and then disappeared, as evidenced by the defunct archive links
on http://tools.ietf.org/wg/concluded.

Lars


Frank Ellermann | 18 Jul 16:28

Re: historic mailing list archives

Lars Eggert wrote:

> I don't think the IETF can depend on gmane or any
> other external archiving service.

Apparently it does in the case of the newtrk archive,
or various "other lists".  Depending on how Google
indexes such archives is also risky.  I just found
that Google has changed the format of "CSE annotation"
files *again*: shortly after the documentation reached
a remotely understandable state, what really happens
is *again* unrelated to this documentation.

I didn't have "*.psg.com/*" and added it; before
that addition, I got 69 hits for "clean slate rrg":

<http://purl.net/xyzzy/-a9/clean+slate+rrg>

Now I get fewer hits (62), LOL.  But with the dupes
included, PSG.COM shows up, with 100 hits.  The effect
also depends on the order of the keywords: 56 hits
for <http://purl.net/xyzzy/-a9/rrg+clean+slate>.

> too many past examples of such services that have  
> been used to host IETF lists that have disappeared,
> as evidenced by the defunct archive links on
> http://tools.ietf.org/wg/concluded

That list only has the former "original" archives;
for Newtrk you could use a still existing archive

Thomas Narten | 22 Jul 18:40

Re: historic mailing list archives

My take is that we need a short (very short, actually) requirements
document about mailing list archives. The topic has been a sore point
for me for many years already. E.g.,

- all archives should be downloadable as flat files that can be imported
  into a favorite local MUA

- web-based archives are fine too (OK, also a MUST), but they need to
  meet certain criteria:

   - have permanent (non-changing) URLs
   - have decent user interfaces (a number of IETF archives don't
     allow searching and don't even let you go to specific dates to
     browse, or just let you scroll forwards/backwards one screen at a
     time starting at the most recent message)
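The two requirements above (flat mbox downloads and permanent URLs) fit together naturally. What follows is a minimal sketch, assuming a hypothetical permalink scheme that hashes each message's Message-ID. It is not what mhonarc, GMaNe, or the IETF tools actually do, and the names `permalink_id` and `mbox_to_index` are illustrative only. It uses Python's standard `mailbox` module to read an mbox file and emit an HTML index whose anchors do not change when messages are added or re-sorted:

```python
import hashlib
import html
import mailbox


def permalink_id(message_id: str) -> str:
    """Hypothetical permalink scheme: short SHA-1 hash of the Message-ID.

    Angle brackets and surrounding whitespace are stripped first, so
    "<x@y>" and "x@y" map to the same anchor. The anchor depends only on
    the Message-ID, not on the message's position in the archive.
    """
    normalized = message_id.strip().strip("<>")
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]


def mbox_to_index(path: str) -> str:
    """Read a flat mbox file and return a minimal HTML list of its messages."""
    box = mailbox.mbox(path, create=False)
    rows = []
    try:
        for position, msg in enumerate(box):
            message_id = str(msg.get("Message-ID", "")).strip()
            # Fall back to the message's position only when there is no
            # Message-ID; such anchors are NOT stable across re-imports.
            if message_id:
                anchor = permalink_id(message_id)
            else:
                anchor = f"pos-{position}"
            subject = html.escape(str(msg.get("Subject", "(no subject)")))
            date = html.escape(str(msg.get("Date", "")))
            rows.append(
                f'<li id="{anchor}"><a href="#{anchor}">{subject}</a> {date}</li>'
            )
    finally:
        box.close()
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Calling `mbox_to_index("newtrk.mbox")` on a local copy of a list archive (the filename is hypothetical) returns the HTML index as a string. Deriving anchors from the Message-ID rather than numbering messages is what keeps the URLs permanent; messages that lack a Message-ID get position-based anchors, which do not meet that requirement.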

Once there is agreement on the basic requirements, we can talk about
which tools work best (or not), and which tool the IETF should deploy
on its IT infrastructure.

Thomas
