7 Sep 22:13
lxml.html adds a default doctype to HTML documents
From: James Graham <jg307 <at> cam.ac.uk>
Subject: lxml.html adds a default doctype to HTML documents
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-09-07 20:15:59 GMT
In [2]: from lxml import html
In [3]: t = html.fromstring("<html><p>Hello World")
In [4]: docinfo = t.getroottree().docinfo
In [5]: docinfo.public_id
Out[5]: '-//W3C//DTD HTML 4.0 Transitional//EN'
The input contains no doctype, yet the parsed document reports one. Is it
possible to prevent this? I couldn't see anything in the API documentation,
but I may have missed something obvious. Silently gaining incorrect data is
annoying :)
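
One possible stopgap, as a rough sketch only: look at the raw markup and check
whether it declares a doctype before trusting docinfo. The helper name
source_has_doctype is hypothetical and not part of lxml; it uses only the
standard library re module and assumes the markup is available as a str.

```python
import re

# Matches a doctype declaration at the start of the markup, allowing an
# optional BOM, leading whitespace, and comments before it.
# Assumption: this is a heuristic on the source text, not an HTML parser.
_DOCTYPE_RE = re.compile(
    r"^\ufeff?(?:\s|<!--.*?-->)*<!DOCTYPE\b",
    re.IGNORECASE | re.DOTALL,
)


def source_has_doctype(markup):
    """Return True if the raw markup text itself declares a doctype."""
    return _DOCTYPE_RE.match(markup) is not None


print(source_has_doctype("<html><p>Hello World"))                 # False
print(source_has_doctype("<!DOCTYPE html><html><p>Hello World"))  # True
```

If this returns False, any public_id reported by docinfo did not come from
the document itself and could be ignored by the caller.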
--
"Eternity's a terrible thought. I mean, where's it all going to end?"
-- Tom Stoppard, Rosencrantz and Guildenstern are Dead