Getting info from an XML file that has invalid character data in it (and how to specify recover option)
From: Ben <bba <at> inbox.com>
Subject: Getting info from an XML file that has invalid character data in it (and how to specify recover option)
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-09 14:10:06 GMT
Hello,
I'm writing some code to check whether our daily backups worked. Backup Exec stores its
results in XML files. Sometimes bad characters (or maybe binary data) end up in these
XML files, and then lxml chokes:
C:\>python sb-lxml.py
Traceback (most recent call last):
  File "sb-lxml.py", line 5, in <module>
    Xml = etree.parse(XmlFileName)
  File "lxml.etree.pyx", line 2520, in lxml.etree.parse (src/lxml/lxml.etree.c:22062)
  File "parser.pxi", line 1309, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:53088)
  File "parser.pxi", line 1338, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:53337)
  File "parser.pxi", line 1248, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:52584)
  File "parser.pxi", line 828, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:50115)
  File "parser.pxi", line 452, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:47023)
  File "parser.pxi", line 536, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:47861)
  File "parser.pxi", line 478, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47285)
lxml.etree.XMLSyntaxError: PCDATA invalid Char value 11, line 132, column 95
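Char value 11 is U+000B (vertical tab). XML 1.0 does not allow it anywhere in a document, so by default libxml2 treats it as a fatal well-formedness error. The subject line asks how to specify lxml's recover option; a minimal sketch, assuming the option is passed through an `etree.XMLParser(recover=True)` instance given to `etree.parse()`, might look like this. The helper name `parse_backup_log` and the sample data are hypothetical.

```python
import io

from lxml import etree


def parse_backup_log(source):
    """Parse an XML log that may contain characters XML 1.0 forbids.

    `source` can be a filename or a binary file-like object, as with
    etree.parse(). The function name is hypothetical.
    """
    # recover=True makes libxml2 report well-formedness errors but keep
    # building the tree instead of raising XMLSyntaxError.
    parser = etree.XMLParser(recover=True)
    tree = etree.parse(source, parser)
    # The errors libxml2 recovered from are still recorded on the parser.
    for entry in parser.error_log:
        print("recovered: line %d: %s" % (entry.line, entry.message))
    return tree


if __name__ == "__main__":
    # A vertical tab (char 11) inside character data, as in the traceback.
    sample = b"<errors><error>Directory not found\x0b here</error></errors>"
    root = parse_backup_log(io.BytesIO(sample)).getroot()
    print(root.tag, [child.tag for child in root])
```

With a real file, `parse_backup_log(XmlFileName)` would stand in for the `etree.parse(XmlFileName)` call from the traceback. Recovery is lossy by design: exactly what libxml2 keeps around the bad character is up to libxml2, so the printed error log is worth checking when the message text matters.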
The offending line looks like this (I'm not sure whether the bad characters will survive
the email):
</error><error>Directory not found. Can not backup directory \Data\\l Strategy - Progress
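An alternative to parser-level recovery is to remove the characters XML 1.0 forbids before the bytes ever reach lxml, so nothing else in the document is at libxml2's discretion. A minimal sketch, assuming the Backup Exec files use an ASCII-compatible encoding such as UTF-8 (not UTF-16), and covering only the C0 control characters, which include the char 11 from the error above. The helper name `strip_invalid_xml_bytes` is hypothetical.

```python
import re

# XML 1.0 allows only TAB (0x09), LF (0x0A) and CR (0x0D) from the C0
# control range; every other byte below 0x20, including 0x0B ("Char value
# 11" in the traceback), makes the document ill-formed.
_FORBIDDEN_C0 = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_bytes(data, replacement=b""):
    """Replace forbidden C0 control bytes in `data` with `replacement`.

    Assumes an ASCII-compatible encoding (UTF-8, Windows-1252, ...), where
    these byte values can only stand for the control characters
    themselves. Not safe for UTF-16 input. Hypothetical helper name.
    """
    return _FORBIDDEN_C0.sub(replacement, data)


if __name__ == "__main__":
    raw = b"<error>Directory not found\x0b here</error>"
    print(strip_invalid_xml_bytes(raw))
    print(strip_invalid_xml_bytes(raw, b"?"))
```

The cleaned bytes can then be parsed as usual, e.g. with `etree.fromstring(cleaned)` or `etree.parse(io.BytesIO(cleaned))`. Passing `replacement=b"?"` instead of deleting leaves a visible marker where the bad character was.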