From: <mharper3 <at> uiuc.edu>
Subject: (no subject)
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-04 01:17:49 GMT
Hi lxml-dev:

I'm getting glibc/MemoryError/cStringIO crashes and exceptions from the following (minimal reproduction) code:

<code>
import lxml.etree

# from http://download.wikimedia.org/enwiki/latest/
wiki_xml_filename = 'enwiki-latest-pages-articles.xml'

context = lxml.etree.iterparse(wiki_xml_filename, events=("end",))
for action, elem in context:
    pass
</code>

The crash usually occurs about halfway through the file (around <page> 3,000,000). The same code runs on smaller MediaWiki XML files (200 MB) without error. I only get this error for this very large XML file (in this case about 13 GB uncompressed).

I had no trouble parsing the same file with the Python standard library SAX parser, but it is much slower and I don't like its API.

I'm using libxml2-2.6.32 (also used earlier versions), Python 2.5.2, python-lxml 2.0.5 (also tried earlier versions), and Kubuntu 8.04 with a 2.6.24 kernel (also tested on openSUSE 10.3 with an earlier kernel).

Some of the exceptions are MemoryErrors. The machine running the code has 4 GB of RAM. The kernel does not appear to hit the swap significantly during the run.

Here are the errors:

*** glibc detected *** python: free(): invalid pointer: 0x08220a15 ***
Aborted
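For context on the MemoryErrors: the loop above never releases any elements, so the tree built by iterparse keeps growing for the whole run. A pattern often used for large files is to clear each element once it has been handled. The following is only a minimal sketch of that pattern and not a diagnosis of the glibc abort reported here. The helper names `local_name` and `count_pages`, the sample document, and its namespace URI are made up for illustration, and the fallback to the standard library is an assumption so the sketch runs without lxml installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # assumption: fall back to the stdlib so the sketch still runs
    import xml.etree.ElementTree as etree


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_pages(source):
    """Stream over `source` (filename or file object) and count <page> elements."""
    pages = 0
    for event, elem in etree.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "page":
            pages += 1
            # Drop the element's text, attributes and children once handled,
            # so finished pages do not stay fully populated in memory.
            elem.clear()
    return pages


# Tiny stand-in for a MediaWiki dump; the namespace URI is a placeholder.
sample = io.BytesIO(
    b"<mediawiki xmlns='http://example.invalid/export'>"
    b"<page><title>A</title></page>"
    b"<page><title>B</title></page>"
    b"</mediawiki>"
)
print(count_pages(sample))
```

Note that the cleared (now empty) page elements are still attached to the root element, so memory use is reduced rather than made strictly constant.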