Stefan Behnel | 4 May 10:59

[Fwd: Re: (no subject)]

[Forwarding to the list ...]
From: <mharper3 <at> uiuc.edu>

Stefan --

Thanks so much for the quick response. I did consider that the tree was being
built in memory, but the documentation seems to suggest that is not the case.
Specifically, the language in the tutorial
(http://codespeak.net/lxml/tutorial.html), in both the 'incremental
parsing' and 'event-driven parsing' sections, seems to suggest using iterparse
to access the data without retaining the tree in memory. I see now that the documentation says
otherwise for iterparse, as you pointed out. If you don't mind, why does the
iterator retain the tree in memory? I would suspect otherwise from the
'natural' behavior of iterators/generators in general, though that may be an
invalid assumption. (i.e. I would parse the entire tree into memory if I
thought that I had enough memory to do so; otherwise I would _incrementally_
parse it.)

More specifically, I don't want to ignore any parts of the xml file in this
specific instance, so a ParserTarget is not the correct solution. Your
suggestion to use clear() works for me; maybe it should be made explicit in
the tutorial that memory is not cleared unless clear() is called. The only
mention in the tutorial is iterparse "also allows to clear() or modify the
content of an Element to save memory". My mistake was to assume that the
'used' elements would be freed without an explicit call to do so as the
iterator progressed.
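
As a minimal sketch of the clear() approach described above: the loop below
handles each element on its "end" event and then calls clear() on it, so its
text, attributes and children can be freed while parsing continues. The
function name count_records, the tag name "record" and the sample data are
made up for illustration. The standard library's ElementTree is used as a
fallback on the assumption that its iterparse() behaves compatibly for this
simple case.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib iterparse() is compatible enough for this sketch.
    import xml.etree.ElementTree as etree


def count_records(source, tag="record"):
    """Count elements named `tag`, clearing each one after it is processed.

    `tag` and this function are hypothetical names used for illustration.
    """
    count = 0
    # "end" events fire once an element and all of its children are parsed.
    for event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Without this call the finished element stays fully populated
            # in the tree that iterparse() keeps building in the background.
            elem.clear()
    return count


# Made-up sample document with 1000 <record> elements.
xml_data = b"<root>" + b"<record><value>x</value></record>" * 1000 + b"</root>"
total = count_records(io.BytesIO(xml_data))
print(total)
```

Note that clear() empties an element but does not detach it from its parent,
so the root still holds one (now empty) child per record. For very large
files that residue may need to be removed from the parent as well.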

Again, thank you for your quick reply!

 -- Marc

Stefan Behnel | 4 May 12:18

Re: saving memory with iterparse()

Hi,

Stefan Behnel wrote:
> From: <mharper3 <at> uiuc.edu>
> Thanks so much for the quick response. I did consider that the tree was being
> built in memory, but the documentation seems to suggest that is not the case.
> Specifically, the language in the tutorial
> (http://codespeak.net/lxml/tutorial.html), in both the 'incremental
> parsing' and 'event-driven parsing' sections, seems to suggest using iterparse
> to access the data without retaining the tree in memory.

It actually says:

"""
two event-driven parser interfaces, one that generates parser events
while building the tree (``iterparse``), and one that does not build the tree
at all, and instead calls feedback methods on a target object in a SAX-like
fashion.
"""

but I added a new example now that shows how to save memory.

http://codespeak.net/lxml/tutorial.html#event-driven-parsing
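
For contrast, here is a rough sketch of the second interface from the quoted
passage, the one that builds no tree at all and calls methods on a target
object instead. The RecordCounter class, the tag name "record" and the sample
input are invented for illustration, and this is not the example from the
tutorial. It assumes the usual target protocol of start(), end(), data() and
close() methods, with the standard library's ElementTree as a fallback.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib XMLParser accepts the same target protocol.
    import xml.etree.ElementTree as etree


class RecordCounter:
    """Hypothetical parser target that counts elements with a given tag."""

    def __init__(self, tag="record"):
        self.tag = tag
        self.count = 0

    def start(self, tag, attrib):
        # Called for each opening tag; no Element object is created.
        if tag == self.tag:
            self.count += 1

    def end(self, tag):
        # Called for each closing tag; nothing to do in this sketch.
        pass

    def data(self, data):
        # Called with chunks of text content; ignored here.
        pass

    def close(self):
        # The value returned here becomes the result of the parse call.
        return self.count


parser = etree.XMLParser(target=RecordCounter())
result = etree.fromstring(b"<root><record/><record/><record/></root>", parser)
print(result)
```

Because the target only receives callbacks, memory use depends on what the
target itself chooses to keep, not on the size of the document.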

> If you don't mind, why does the
> iterator retain the tree in memory? I would suspect otherwise from the
> 'natural' behavior of iterators/generators in general, though that may be an
> invalid assumption. [...]
> My mistake was to assume that the
> 'used' elements would be freed without an explicit call to do so as the

