brother.gabriel | 13 Dec 2010 18:27

a little pcre regex help

Hello, y'all! I need to match, with a PCRE regex, a text file that contains the following (this is a snippet):
    <date>2
    <Body text>The Most Holy Name of Jesus--W (II)
    <date>3
    <Body text>Ferial--W (IV)
    <Body text>St. Frances Xavier Cabrini
    <Saint kind>(See Nov. 13)
    <date>4
    <Body text>Ferial--W (IV)
    <date>5
    <Body text>Ferial--W (IV)
    <Body text>St. Telesphorus
    <Saint kind>Pope, Martyr--R (Comm.)
I need to match from "<date>" through the end of the line just before the next <date>.
I can already match just the line containing <date>, but I need the whole section. Note that the last section ends at the end of the file.
Would y'all be so kind as to help me out?
Thanks!
-Brother Gabriel-Marie

------------------------------------

PowerPro can be found here: http://www.ppro.org/
and here: http://ppro.pcrei.com/

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/powerpro-beginners/

Sheri | 13 Dec 2010 20:28

Re: a little pcre regex help

Hi Brother Gabriel-Marie,

In your snippet each line starts with four spaces. I'm going to assume the actual text is not indented. If it is, you would need to account for that.

Here are two patterns that should do it without including the leading "<date>" in the match:

(?sm)^<date>\K.*?(?=\R<date>)
or
(?sm)(?<=^<date>).*?(?=\R<date>)
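For anyone who wants to try the second pattern outside PowerPro, here is a minimal sketch in Python. It assumes Python's standard `re` module rather than PCRE: `re` has no `\K` or `\R`, so the sketch uses the lookbehind form and writes `\r?\n` as a stand-in for `\R`. The sample text and the `sections` helper are illustrative and not from the thread. It also shows that, as written, the lookahead needs a following <date>, so the final section is not matched.

```python
import re

# Illustrative sample: part of the snippet with the four-space
# indentation removed (the assumption made in the reply above).
TEXT = (
    "<date>2\n"
    "<Body text>The Most Holy Name of Jesus--W (II)\n"
    "<date>3\n"
    "<Body text>Ferial--W (IV)\n"
    "<Body text>St. Frances Xavier Cabrini\n"
    "<Saint kind>(See Nov. 13)\n"
    "<date>4\n"
    "<Body text>Ferial--W (IV)\n"
)

# (?s): dot also matches newlines; (?m): ^ matches at every line start.
# The lookbehind is fixed-width, so Python's re accepts it.
# \r?\n replaces PCRE's \R, which Python's re does not support.
PATTERN = re.compile(r"(?sm)(?<=^<date>).*?(?=\r?\n<date>)")


def sections(text):
    """Return the text after each <date> tag, up to the next <date> line."""
    return PATTERN.findall(text)


for s in sections(TEXT):
    print(repr(s))
# The "<date>4" section has no following <date>, so it is not matched.
```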

It seems pointless to include the leading <date> tag, but your description appears to want it. If so:

(?sm)^<date>.*?(?=\R<date>)
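Because the question notes that the last section ends at the end of the file, one possible extension (not part of the reply above) is to let the lookahead also accept the end of the input. In PCRE that might be written `(?sm)^<date>.*?(?=\R<date>|\s*\z)`. The sketch below tries that idea in Python's `re`, assuming `\r?\n` in place of `\R` and Python's `\Z` (absolute end of string, like PCRE's `\z`). The `\s*` keeps a trailing newline out of the last match. The sample text and the `sections` helper are illustrative.

```python
import re

# Illustrative sample: the last two sections of the snippet, unindented,
# with the file ending in a newline.
TEXT = (
    "<date>4\n"
    "<Body text>Ferial--W (IV)\n"
    "<date>5\n"
    "<Body text>Ferial--W (IV)\n"
    "<Body text>St. Telesphorus\n"
    "<Saint kind>Pope, Martyr--R (Comm.)\n"
)

# Keeps the leading <date> tag in each match. The "\s*\Z" alternative is
# an added assumption: it lets the final section end at end of input.
PATTERN = re.compile(r"(?sm)^<date>.*?(?=\r?\n<date>|\s*\Z)")


def sections(text):
    """Return each <date> section, including the final one."""
    return PATTERN.findall(text)


for s in sections(TEXT):
    print(s)
    print("---")
```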

If each line really does begin with four spaces, you could change the caret in any of the above to:

^\x20{4}

and the lookahead to:

(?=\R\x20{4}<date>)
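To check the indented variant, here is a small Python sketch applying both substitutions to the lookbehind pattern, run against the snippet exactly as posted (four leading spaces per line). As before, it assumes Python's `re` module, so `\R` is written as `\r?\n`. The lookbehind `(?<=^\x20{4}<date>)` is still fixed-width (ten characters), so `re` accepts it. The `sections` helper is illustrative. As with the unindented patterns, the final section has no following <date> and is not matched.

```python
import re

# The snippet as posted: every line indented by four spaces.
TEXT = (
    "    <date>2\n"
    "    <Body text>The Most Holy Name of Jesus--W (II)\n"
    "    <date>3\n"
    "    <Body text>Ferial--W (IV)\n"
    "    <date>4\n"
    "    <Body text>Ferial--W (IV)\n"
)

# ^ becomes ^\x20{4} inside the lookbehind, and the lookahead becomes
# (?=\R\x20{4}<date>), with \r?\n standing in for \R.
PATTERN = re.compile(r"(?sm)(?<=^\x20{4}<date>).*?(?=\r?\n\x20{4}<date>)")


def sections(text):
    """Return the text after each indented <date> tag, up to the next one."""
    return PATTERN.findall(text)


print(sections(TEXT))
```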

Let me know if you still have issues or need explanations.

Happy Holidays!
Sheri
