James Graham | 7 Sep 22:13

lxml.html adds a default doctype to HTML documents

In [2]: from lxml import html

In [3]: t = html.fromstring("<html><p>Hello World")

In [4]: docinfo = t.getroottree().docinfo

In [5]: docinfo.public_id
Out[5]: '-//W3C//DTD HTML 4.0 Transitional//EN'

The input contains no doctype, so the public ID reported above does not come
from the source. Is it possible to prevent this from occurring? I couldn't see
anything in the API documentation, but I might be missing something obvious.
Silently gaining incorrect data is annoying :)
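
For reference, here is a minimal sketch of one way this might be avoided. It
assumes an lxml release whose HTMLParser accepts a default_doctype keyword
(and a libxml2 build that honours it); older versions may not have it. The
helper name parse_without_default_doctype is hypothetical, used only for
illustration.

```python
from lxml import html

SOURCE = "<html><p>Hello World"


def parse_without_default_doctype(text):
    """Parse HTML, asking the parser not to add a doctype the input lacks.

    Assumption: the installed lxml supports HTMLParser(default_doctype=...).
    On versions without that keyword this call raises TypeError.
    """
    parser = html.HTMLParser(default_doctype=False)
    return html.fromstring(text, parser=parser)


t = parse_without_default_doctype(SOURCE)
docinfo = t.getroottree().docinfo

# With no doctype in the input and none injected, there is no public ID.
print(repr(docinfo.public_id))
```

If the keyword is unavailable, checking the raw input for a doctype before
trusting docinfo is a possible fallback.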

-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead
