23 Aug 09:02
.text_content() should leave spaces. Tests included
From: Max Ivanov <ivanov.maxim <at> gmail.com>
Subject: .text_content() should leave spaces. Tests included
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-08-23 07:05:14 GMT
Hi! I've run into another strange behaviour. lxml.html.HTMLParser
produces HTML elements with an API similar to etree elements, plus
some additions. One of them is the .text_content() method. To quote
the docs: "Returns the text content of the element, including the text
content of its children, with no markup."
So, according to the description, it transforms
"<span>element1</span><span>element2</span>" into "element1element2".
Notice the lack of a space between the contents of the two elements.
From my point of view, that makes this method quite useless: it would
be better if it produced "element1 element2" from the same string.
Here is a test for test_htmlparser.py:
    def test_html_text_content(self):
        from lxml.html import HTMLParser
        element = self.etree.HTML(self.html_str, parser=HTMLParser())
        self.assertEquals(element.text_content(), "test page title")
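
To make the difference concrete, here is a rough sketch of the two behaviours. It is not lxml's implementation: it uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml installed, and spaced_text is a hypothetical helper name, not an existing lxml function. lxml elements also provide itertext(), so the same join should carry over, though I have only written it against ElementTree here.

```python
import xml.etree.ElementTree as ET


def spaced_text(element):
    """Hypothetical helper: join the text fragments of an element and
    its descendants with single spaces, skipping empty fragments."""
    fragments = (fragment.strip() for fragment in element.itertext())
    return " ".join(fragment for fragment in fragments if fragment)


# The spans from the example above, wrapped in a div so there is one root.
root = ET.fromstring("<div><span>element1</span><span>element2</span></div>")

# Plain concatenation, which matches the behaviour described for text_content().
concatenated = "".join(root.itertext())

# The behaviour proposed in this message.
spaced = spaced_text(root)

print(concatenated)  # element1element2
print(spaced)        # element1 element2
```

One caveat with joining on a space: inline markup inside a single word (say "<b>el</b>ement") would also be split, so a real fix would probably have to distinguish block-level from inline elements.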
