kris | 6 May 03:31

generative building of xml?


I am generating, processing and eventually serializing
several XML streams.   I was wondering if this was possible 
to do with lxml?

Here's the setup.  I've got several databases
generating XML content (which can be quite large). I want
to process the database records progressively, generating
XML for each and sending it out on its own stream.

An aggregator/filter (elsewhere) will read and parse the
streams, match similar members, and generate a new stream
from the combined input.

DB1    DB2    DB3    Core databases
XML    XML    XML    XML generation
WS     WS     WS     Delivery over a stream using a generator
 |      |      |
 +------+------+
       AGG           Parse and match incoming streams (iterparse)
       XML
       WS            Send the resulting merge as XML using a generator
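
For the AGG step, incremental parsing with iterparse might look roughly like the sketch below. It assumes a simple root/row/column layout, which is an assumption about the data and not something fixed by the setup above. The name iter_rows is hypothetical, and the fallback import is only there so the sketch also runs without lxml installed.

```python
import io

try:
    from lxml import etree                      # the library under discussion
except ImportError:
    import xml.etree.ElementTree as etree       # same iterparse interface


def iter_rows(stream):
    """Yield the column texts of each <row> as soon as the row is complete."""
    for _event, elem in etree.iterparse(stream, events=("end",)):
        if elem.tag == "row":
            yield [column.text for column in elem]
            # Drop the row's children so memory does not grow with the input.
            # The emptied <row> elements themselves stay attached to the root.
            elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<root><row><column>a</column><column>b</column></row></root>"
    )
    for row in iter_rows(sample):
        print(row)
```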

So the questions:

1.  Does anybody have a recipe to build a recursive generator using
    Element?

2.  Given the above generator, is there any such
    thing as a generator version of etree.tostring?

Stefan Behnel | 8 May 09:22

Re: generative building of xml?

Hi,

kris wrote:
> I am generating, processing and eventually serializing
> several XML streams.   I was wondering if this was possible 
> to do with lxml?

Probably, although lxml is not designed for pipelined XML processing (any
better than SAX, that is).

It also depends on what your XML looks like. If it's from a database, it's
probably something simple like

  <root>
    <row>
      <column>...</column>
      ...
    </row>
    ...
  </root>

That shouldn't cause too many problems: you can use the (SAX-like) target
parser to copy it into a simple Python container class, use that inside your
program, merge all of those objects into a single stream at some point, and
then generate a new XML stream from that.
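
A minimal sketch of the target-parser idea, assuming the root/row/column layout shown above. RowCollector and parse_rows are hypothetical names, and plain lists stand in for the "simple Python container class". The fallback import only exists so the sketch runs without lxml installed, since the standard library's ElementTree accepts the same kind of target object.

```python
try:
    from lxml import etree                      # the library under discussion
except ImportError:
    import xml.etree.ElementTree as etree       # same parser-target interface


class RowCollector:
    """Parser target that copies <row>/<column> content into plain lists."""

    def __init__(self):
        self.rows = []       # completed rows
        self._row = None     # row currently being built
        self._text = None    # text pieces of the current <column>, if any

    def start(self, tag, attrib):
        if tag == "row":
            self._row = []
        elif tag == "column":
            self._text = []

    def data(self, data):
        # Text may arrive in several pieces; ignore text outside <column>.
        if self._text is not None:
            self._text.append(data)

    def end(self, tag):
        if tag == "column":
            self._row.append("".join(self._text))
            self._text = None
        elif tag == "row":
            self.rows.append(self._row)
            self._row = None

    def close(self):
        # The return value of close() is what parser.close() hands back.
        rows, self.rows = self.rows, []
        return rows


def parse_rows(chunks):
    """Feed string chunks to a target parser and return the collected rows."""
    parser = etree.XMLParser(target=RowCollector())
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()
```

No tree is built here; the target only sees start/data/end events, so the rows can be handed on (or merged with rows from other sources) as ordinary Python objects.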

> Here's the setup.  I've got several databases
> generating XML content (which can be quite large), I really want
> to be able to process the database record progressively 
> generating XML and sending out on its own stream. 

kris | 8 May 20:03

Re: generative building of xml?

On Thu, 2008-05-08 at 09:22 +0200, Stefan Behnel wrote:
> Hi,

> Probably, although lxml is not designed for pipelined XML processing (any
> better than SAX, that is).
> 
> It also depends on how your XML looks like. If it's from a database, it's
> probably something simple like
> 
>   <root>
>     <row>
>       <column>...</column>
>       ...
>     </row>
>     ...
>   </root>
> 
> That shouldn't cause too many problems, you can use the (SAX-like) target
> parser to copy it into a simple Python container class, use that inside your
> program, merge all of those objects into a single stream at some point and
> then generate a new XML stream from that.
> 
> 
> > Here's the setup.  I've got several databases
> > generating XML content (which can be quite large), I really want
> > to be able to process the database record progressively 
> > generating XML and sending out on its own stream. 
> > 
> > An aggregator/filter  (elsewhere) will read the streams 
> > and parse them processing similar members and generate 

Stefan Behnel | 9 May 10:47

Re: generative building of xml?

Hi,

kris wrote:
> On Thu, 2008-05-08 at 09:22 +0200, Stefan Behnel wrote:
>> If the interface is a generator (yielding strings, I assume), then you will
>> have to use the feed parser interface to copy the data into the parser,
>> otherwise, you can just use one thread per DB connection and have it read and
>> parse the data for you.
>>
>>> 2.  Given the above generator, is there any such 
>>>     thing as a generator version etree.tostring?
>> Nothing keeps you from yielding "<root>", followed by the serialised stream
>> entries (call tostring() on each separately), followed by a "</root>".
> 
> Unfortunately it is a tree structure. I would like to walk the tree
> and emit something like:
> 
> yield "<root>"
> yield '  <child attr0="a" attr1="b">'
> yield '      <child ... '
> ...
> yield '      </child>'
> yield '  </child>'
> yield '  <child attr0="c" attr1="d">'
> ...
> yield '</root>'

I think that's a bad idea, as you lose semantics that you will need to
recover in each generator step.
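
For comparison, here is a rough sketch of the approach suggested earlier in the thread: yield the opening root tag, then the result of tostring() for each complete entry, then the closing tag. The record shape (a dict per row), the "name" attribute, and the names build_row and serialize_stream are assumptions made up for the illustration. The fallback import only exists so the sketch runs without lxml installed.

```python
try:
    from lxml import etree                      # the library under discussion
except ImportError:
    import xml.etree.ElementTree as etree       # compatible for this sketch


def build_row(record):
    """Build one <row> element from a dict (hypothetical record shape)."""
    row = etree.Element("row")
    for name, value in record.items():
        column = etree.SubElement(row, "column", name=name)
        column.text = str(value)
    return row


def serialize_stream(records, root_tag="root"):
    """Yield an XML document in pieces, one complete entry per step."""
    yield "<%s>" % root_tag
    for record in records:
        # Each entry is a small, self-contained subtree, so tostring()
        # takes care of escaping and nesting inside it.
        yield etree.tostring(build_row(record), encoding="unicode")
    yield "</%s>" % root_tag


if __name__ == "__main__":
    for piece in serialize_stream([{"id": 1}, {"id": 2}]):
        print(piece)
```

Each yielded piece between the root tags is a well-formed element on its own, so the consumer never has to reassemble half-open tags. Only one entry is held in memory at a time.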
