Anders Bruun Olsen | 11 Sep 22:13

Serialization with namespaces

Hi,

I need to chop up some XML based on XPath expressions and serialize the
resulting chunks individually. I thought LXML would be perfect for this
task but have run into some problems.

Here is the sample I use, test.xtm:
<?xml version="1.0" encoding="UTF-8"?>
<topicMap
        xmlns="http://www.topicmaps.org/xtm/1.0/"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        id="personnavnereg1">
        <topic id="abeleHenriksdatter">
                <instanceOf>
                        <topicRef xlink:href="person-template.xtmp#kvinde"/>
                </instanceOf>
                <baseName>
                        <baseNameString>Abele Henriksdatter i Radsted, Gotfred Bangs hustru</baseNameString>
                </baseName>
                <occurrence>
                        <instanceOf>
                                <topicRef xlink:href="DDref.xtmp#dato1407.01.06"/>
                        </instanceOf>
                        <resourceRef xlink:href="http://xxx/diplomer/07-002.html"/>
                </occurrence>
        </topic>
</topicMap>
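For comparison, here is a minimal sketch of the same chop-and-serialize task using Python's standard-library xml.etree.ElementTree instead of lxml. It assumes a trimmed copy of test.xtm, and that the limited XPath subset ElementTree supports (here `xtm:topic` with a prefix map) is enough to select the chunks. `split_topics` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared on <topicMap> in test.xtm.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# A trimmed copy of test.xtm: one topic, occurrence omitted for brevity.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          id="personnavnereg1">
  <topic id="abeleHenriksdatter">
    <instanceOf>
      <topicRef xlink:href="person-template.xtmp#kvinde"/>
    </instanceOf>
  </topic>
</topicMap>"""

# Reuse the document's own prefixes instead of generated ns0/ns1 prefixes.
# Note: this prefix registry is global to the xml.etree module.
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def split_topics(xml_bytes):
    """Return every top-level <topic> as a standalone XML string."""
    root = ET.fromstring(xml_bytes)
    # ElementTree supports only a limited XPath subset; a prefix map
    # is still needed to address elements in the default namespace.
    topics = root.findall("xtm:topic", {"xtm": XTM_NS})
    # tostring() declares every namespace the subtree uses on the chunk's
    # root element, so each chunk can be re-parsed on its own.
    return [ET.tostring(topic, encoding="unicode") for topic in topics]


for chunk in split_topics(SAMPLE):
    print(chunk)
```

Because ElementTree keeps names as `{uri}local` strings rather than as declarations attached to an ancestor node, each chunk should come out with its own `xmlns` and `xmlns:xlink` declarations; the `register_namespace` calls only choose which prefixes are used.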

Stefan Behnel | 12 Sep 12:19

Re: Serialization with namespaces


Anders Bruun Olsen wrote:
> Now the problem occurs when I try to serialize. When I serialize the
> root, everything looks fine:
> 
>    >>> etree.tostring(root, pretty_print=True)
>    '<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink" id="personnavnereg1">
>    ...
> 
> The XML Namespace is applied as it should. However on the topic-element
> that I found using XPath no XML Namespace is output:
> 
>    >>> etree.tostring(elem, pretty_print=True)
>    '<topic id="abeleHenriksdatter">\n\t\t<instanceOf>\n\t\t\t<topicRef
>   ...
> 
> Even though the nsmap attribute is set correctly:
> 
>    >>> elem.nsmap
>    {None: 'http://www.topicmaps.org/xtm/1.0/', 'xlink': 'http://www.w3.org/1999/xlink'}

Hmm, I actually thought these problems were gone with 1.3, but I can reproduce
this with the current trunk.

I'll look into it.

Stefan

Stefan Behnel | 12 Sep 12:47

Re: Serialization with namespaces


Stefan Behnel wrote:
> Hmm, I actually thought these problems were gone with 1.3, but I can reproduce
> this with the current trunk.

Ok, so the problem here is libxml2. It serialises only the namespaces that are
defined on the node itself, not all those that are defined in the node's context.
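The behaviour described here can be reproduced outside lxml and libxml2. The sketch below uses Python's standard-library xml.dom.minidom as a stand-in (an assumption, not lxml), because it likewise stores namespace declarations as attributes of the element that declares them: serializing the `<topic>` subtree alone drops the declarations inherited from `<topicMap>`. The hypothetical helper `with_inherited_namespaces` works around this by copying every in-scope declaration onto a clone before serializing. It only illustrates the idea and is not the lxml patch discussed in this thread.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# A trimmed copy of test.xtm; the namespaces are declared on the root only.
SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
    xmlns:xlink="http://www.w3.org/1999/xlink" id="personnavnereg1">
  <topic id="abeleHenriksdatter">
    <instanceOf><topicRef xlink:href="person-template.xtmp#kvinde"/></instanceOf>
  </topic>
</topicMap>"""


def with_inherited_namespaces(elem):
    """Return a deep copy of elem that re-declares its in-scope namespaces."""
    clone = elem.cloneNode(True)
    node = elem.parentNode
    # Walk outwards from the nearest ancestor, so the innermost
    # declaration of a prefix wins over outer ones.
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            is_ns_decl = name == "xmlns" or name.startswith("xmlns:")
            if is_ns_decl and not clone.hasAttribute(name):
                clone.setAttribute(name, value)
        node = node.parentNode
    return clone


doc = minidom.parseString(SAMPLE)
topic = doc.getElementsByTagNameNS(XTM_NS, "topic")[0]

# Naive: only attributes stored on <topic> itself are written, so the
# xlink prefix used further down ends up undeclared.
naive = topic.toxml()
try:
    minidom.parseString(naive)
except ExpatError as err:
    print("naive chunk is not namespace-well-formed:", err)

# Copying the inherited declarations onto a clone makes the chunk standalone.
fixed = with_inherited_namespaces(topic).toxml()
print(fixed.splitlines()[0])
```

Working on a clone keeps the original tree unchanged, so the same element can still be serialized in its document context afterwards.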


Stefan Behnel | 12 Sep 14:54

Re: Serialization with namespaces


Stefan Behnel wrote:
> Ok, so the problem here is libxml2. It serialises only the namespaces that are
> defined on the node itself, not all those that are defined in the node's context.


Anders Bruun Olsen | 12 Sep 15:19

Re: Serialization with namespaces

Stefan Behnel wrote:
> Here's a patch (against the trunk) that works for me. It copies the node

Anders Bruun Olsen | 12 Sep 15:22

Re: Serialization with namespaces

Anders Bruun Olsen wrote:
> Something seems amiss with the patch:

Sorry, my bad, you have of course already applied it to trunk.

-- 
Anders

Anders Bruun Olsen | 12 Sep 15:28

Re: Serialization with namespaces

Anders Bruun Olsen wrote:
> Anders Bruun Olsen wrote:
>> Something seems amiss with the patch:
> 
> Sorry, my bad, you have of course already applied it to trunk.
> 

However, it seems that trunk does not build:

$ make
python setup.py  build_ext -i
Building with Cython.
Building lxml version 2.0.alpha2-46501
running build_ext
building 'lxml.etree' extension

Error converting Pyrex file to C:
------------------------------------------------------------
...
include "xmlerror.pxi"     # Error and log handling
include "classlookup.pxi"  # Element class lookup mechanisms
include "nsclasses.pxi"    # Namespace implementation and registry
include "docloader.pxi"    # Support for custom document loaders
include "parser.pxi"       # XML Parser
include "parsertarget.pxi" # ET Parser target
^
------------------------------------------------------------

/home/abo/tmp/lxml/src/lxml/etree.pyx:2156:0: 'parsertarget.pxi' not found
make: *** [inplace] Error 1

Stefan Behnel | 12 Sep 22:02

Re: Serialization with namespaces


Anders Bruun Olsen wrote:
> /home/abo/tmp/lxml/src/lxml/etree.pyx:2156:0: 'parsertarget.pxi' not found

Ah, thanks. I forgot that file when committing the target parser
implementation. Fixed now.

Stefan

Anders Bruun Olsen | 13 Sep 13:58

Re: Serialization with namespaces

Stefan Behnel wrote:
>> /home/abo/tmp/lxml/src/lxml/etree.pyx:2156:0: 'parsertarget.pxi' not found
> Ah, thanks. I forgot that file when committing the target parser
> implementation. Fixed now.

Okay, trunk builds now. And I can confirm that the namespace patch works.

Thanks! :)

-- 
Anders
