Re: haskell xml parsing for larger files?
Christian Maeder <Christian.Maeder <at> dfki.de>
2014-02-20 16:02:28 GMT
I'm afraid our use case is not a lazy prefix traversal.
I'm more shocked that about 100 MB of xml content does not fit (as a
tree) into 3 GB of memory.
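Regarding the string-sharing idea below: a minimal intern-pool sketch, using only Data.Map from the containers package that ships with GHC. The names `Pool` and `intern` are illustrative, not from hexpat or any other library in this thread.

```haskell
import qualified Data.Map.Strict as Map

-- The pool maps each distinct string to its single shared copy.
type Pool = Map.Map String String

-- On a hit, return the pooled copy so the freshly parsed duplicate
-- can be garbage-collected; on a miss, remember this copy so later
-- occurrences of the same string share it.
intern :: String -> Pool -> (String, Pool)
intern s pool = case Map.lookup s pool of
  Just shared -> (shared, pool)
  Nothing     -> (s, Map.insert s s pool)
```

Threading such a pool through tree construction (e.g. over the tag and attribute names of the SAX events) keeps one copy of each distinct string; whether that is enough to fit a 100 MB document into 3 GB depends on how repetitive the names and text nodes actually are.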
Am 20.02.2014 16:49, schrieb malcolm.wallace:
> Is your usage pattern over the constructed tree likely to be a lazy
> prefix traversal? If so, then HaXml supports lazy construction of the
> parse tree. Some plots appear at the end of this paper, showing how
> memory usage can be reduced to a constant, even for very large inputs (1
> million tree nodes):
> On 20 Feb, 2014, at 11:30 AM, Christian Maeder <Christian.Maeder <at> dfki.de>
>> I've got some difficulties parsing "large" xml files (> 100MB).
>> A plain SAX parser, as provided by hexpat, is fine. However,
>> constructing a tree consumes too much memory on a 32bit machine.
>> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
>> I suspect that sharing strings when constructing trees might greatly
>> reduce memory requirements. What are suitable libraries for string pools?
>> Before trying to implement something myself, I'd like to ask who else
>> has tried to process large xml files (and met similar memory problems)?
>> I have not yet investigated xml-conduit and hxt for our purpose. (These
>> look scary.)
>> In fact, I've basically used the content trees from "The (simple) xml
>> package", and switching to another tree type is no fun, in particular
>> if it gains little.
>> Thanks Christian
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users <at> haskell.org