24 Aug 13:16
Text obscured by subelement
From: Richard Baron Penman <richardbp+lxml <at> gmail.com>
Subject: Text obscured by subelement
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-08-24 11:19:18 GMT
Hello,
I have a document with a format like this:
<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>
I want to extract 'text1text3text5' from <doc> but the text attribute returns just 'text1'. Here is an example:
from lxml import html
doc = html.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')
print(doc.text)  # 'text1'
print(doc.tail)  # None (nothing follows </doc>)
print(doc.text_content())  # 'text1text2text3text4text5'
# iterate over a copy of the child list, since drop_tree() modifies the tree
for child in list(doc):
    child.drop_tree()
print(doc.text)  # 'text1text3text5'
From the example you can see I can get what I want by first dropping the subelements.
Is there a better way to access this text?
regards,
Richard
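For reference, the behaviour described above follows from the ElementTree data model that lxml uses: an element's .text holds only the text before its first child, and the text after each child is stored on that child as .tail. A minimal sketch of a non-destructive way to collect the direct text, assuming that model; direct_text is a hypothetical helper name, not part of lxml, and the standard-library fallback is only there so the snippet runs without lxml installed:

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard-library ElementTree exposes the same
    # .text/.tail model, so the helper below behaves the same way.
    import xml.etree.ElementTree as etree


def direct_text(element):
    """Return the text directly inside element, skipping subelement text.

    Hypothetical helper: joins element.text with the .tail of each child.
    """
    parts = [element.text or '']
    for child in element:
        # .tail is the text that follows the child's closing tag
        parts.append(child.tail or '')
    return ''.join(parts)


doc = etree.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')
print(direct_text(doc))  # text1text3text5
print(len(doc))          # 2, the <b> children are still in the tree
```

With lxml specifically, ''.join(doc.xpath('text()')) should give the same string, since the XPath text() step selects only the text nodes that are direct children of the element.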