Paul Hoffman | 4 Aug 17:40

Python code for extracting title from an RFC or Draft?

Greetings again. I looked on the tools site and don't see any source 
code. I would love a tool that, given a draft or RFC, extracts the 
title from the top of the first page. (Extra points for pulling out 
the abstract). If I have to do it as "given the filename, look in 
1id-abstracts.txt or rfc-index.xml", that's fine, but I would like 
the code for that as well. No wheel-recreating, if not needed.

As a side note, it seems odd that the code y'all have created isn't 
on tools.ietf.org...

--Paul Hoffman, Director
--VPN Consortium

Henrik Levkowetz | 4 Aug 18:40

Re: Python code for extracting title from an RFC or Draft?


On 2008-08-04 17:43 Paul Hoffman said the following:
> Greetings again. I looked on the tools site and don't see any source 
> code. I would love a tool that, given a draft or RFC, extracts the 
> title from the top of the first page. (Extra points for pulling out 
> the abstract). If I have to do it as "given the filename, look in 
> 1id-abstracts.txt or rfc-index.xml", that's fine, but I would like 
> the code for that as well. No wheel-recreating, if not needed.
> 
> As a side note, it seems odd that the code y'all have created isn't 
> on tools.ietf.org...

Maybe it's mostly a matter of a missing index page and documentation,
or something, I guess ...

The datatracker source is available through
  http://tools.ietf.org/tools/ietfdb

and the following covers a number of other known and unknown tools.
Not all will be up-to-date and used in generating the tools pages,
but you're welcome to browse and comment.

  http://tools.ietf.org/tools/doublespace
  http://tools.ietf.org/tools/getdrafts
  http://tools.ietf.org/tools/html-404
  http://tools.ietf.org/tools/htmlize
  http://tools.ietf.org/tools/idcomments
  http://tools.ietf.org/tools/idnits
  http://tools.ietf.org/tools/idreplaced
  http://tools.ietf.org/tools/id_rss

Joe Touch | 4 Aug 18:42

Re: Python code for extracting title from an RFC or Draft?


Henrik Levkowetz wrote:
|
|
| On 2008-08-04 17:43 Paul Hoffman said the following:
|> Greetings again. I looked on the tools site and don't see any source
|> code. I would love a tool that, given a draft or RFC, extracts the
|> title from the top of the first page. (Extra points for pulling out
|> the abstract). If I have to do it as "given the filename, look in
|> 1id-abstracts.txt or rfc-index.xml", that's fine, but I would like the
|> code for that as well. No wheel-recreating, if not needed.

Why not pull the info from the XML rfc-index?

Joe

Paul Hoffman | 4 Aug 19:21

Re: Python code for extracting title from an RFC or Draft?

At 9:42 AM -0700 8/4/08, Joe Touch wrote:
>Why not pull the info from the XML rfc-index?

As I said, I'm happy to. If someone has already written the code to 
do so, that would help. I haven't done any XML coding in Python 
before. It is actually more important for me to be able to pull it 
for Internet Drafts.

I don't see anything on Henrik's list that does either of these...
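
For the RFC half of the question, a minimal Python sketch of the
rfc-index.xml lookup follows. It is not taken from any tool on Henrik's
list. It assumes each RFC is an <rfc-entry> element with direct <doc-id>
and <title> children (with doc-id values such as "RFC5037"); check that
layout against the real file before relying on it. The function name
title_from_index is hypothetical. Namespace prefixes are stripped so the
code does not depend on the exact namespace URI.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def title_from_index(root, doc_id):
    """Return the title for doc_id, or None if no entry matches.

    root is the parsed root element of rfc-index.xml. The element names
    used here (rfc-entry, doc-id, title) are assumptions about the index
    layout, not something confirmed in this thread.
    """
    for entry in root.iter():
        if local(entry.tag) != "rfc-entry":
            continue
        # Look only at direct children, so a nested doc-id (for example
        # inside an "obsoletes" list) is not mistaken for the entry's own.
        fields = {local(child.tag): (child.text or "").strip()
                  for child in entry}
        if fields.get("doc-id") == doc_id:
            return fields.get("title")
    return None
```

Typical use would be root = ET.parse("rfc-index.xml").getroot() followed
by title_from_index(root, "RFC5037"). This covers RFCs only; Internet
Drafts need a different source.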

--Paul Hoffman, Director
--VPN Consortium

Henrik Levkowetz | 4 Aug 20:53

Re: Python code for extracting title from an RFC or Draft?


On 2008-08-04 18:42 Joe Touch said the following:
> 
> 
> Henrik Levkowetz wrote:
> |
> |
> | On 2008-08-04 17:43 Paul Hoffman said the following:
> |> Greetings again. I looked on the tools site and don't see any source
> |> code. I would love a tool that, given a draft or RFC, extracts the
> |> title from the top of the first page. (Extra points for pulling out
> |> the abstract). If I have to do it as "given the filename, look in
> |> 1id-abstracts.txt or rfc-index.xml", that's fine, but I would like the
> |> code for that as well. No wheel-recreating, if not needed.
> 
> Why not pull the info from the XML rfc-index?

For RFCs specifically you could do that, and I also have the titles available,
in a variety of other formats, but that doesn't match Paul's specification of
being able to extract title from both Drafts and RFCs...

	Henrik

_______________________________________________
Tools-discuss mailing list
Tools-discuss@ietf.org
https://www.ietf.org/mailman/listinfo/tools-discuss

Joe Touch | 5 Aug 04:15

Re: Python code for extracting title from an RFC or Draft?


Henrik Levkowetz wrote:
|
|
| On 2008-08-04 18:42 Joe Touch said the following:
|>
|>
|> Henrik Levkowetz wrote:
|> |
|> |
|> | On 2008-08-04 17:43 Paul Hoffman said the following:
|> |> Greetings again. I looked on the tools site and don't see any source
|> |> code. I would love a tool that, given a draft or RFC, extracts the
|> |> title from the top of the first page. (Extra points for pulling out
|> |> the abstract). If I have to do it as "given the filename, look in
|> |> 1id-abstracts.txt or rfc-index.xml", that's fine, but I would like the
|> |> code for that as well. No wheel-recreating, if not needed.
|>
|> Why not pull the info from the XML rfc-index?
|
| For RFCs specifically you could do that, and I also have the titles
| available,
| in a variety of other formats, but that doesn't match Paul's
| specification of
| being able to extract title from both Drafts and RFCs...

Hmmm. Given the automation of the ID submission process, it might be
useful to ask the ID-administrator to support this rather than screen
scraping (or page scraping, as the case may be).

Dan Wing | 6 Aug 17:42

Re: Python code for extracting title from an RFC or Draft?

(resending; mailman didn't like my .zip attachment.)

I have been using abstract.sed for a while, and just whipped up title.sed
tonight for you.  Hope it helps; both are in http://www2.fuggles.com/paul.zip
(along with tighten.sed).

To use it:

  sed -f abstract.sed FILENAME | sed -f tighten.sed
  sed -f title.sed FILENAME

Examples with some RFCs and an I-D:

> sed -f abstract.sed rfc1122.txt
   This is one RFC of a pair that defines and discusses the requirements
   for Internet host software.  This RFC covers the communications
   protocol layers: link layer, IP layer, and transport layer; its
   companion RFC-1123 covers the application and support protocols.

> sed -f abstract.sed rfc5037.txt | sed -f tighten.sed
   The purpose of this memo is to document how some of the requirements
   specified in RFC 1264 for advancing protocols developed by working
   groups within the IETF Routing Area to Draft Standard have been
   satisfied by LDP (Label Distribution Protocol).  Specifically, this
   report documents operational experience with LDP, requirement 5 of
   section 5.0 in RFC 1264.

> sed -f title.sed rfc5037.txt
         Experience with the Label Distribution Protocol (LDP)
