C K Kashyap | 29 Jul 08:21 2012

Capturing the parent element as I parse XML using parsec

Hi,


With the help of the cafe I've been able to write an XML parser using Parsec: https://github.com/ckkashyap/really-simple-xml-parser/blob/master/RSXP.hs

I am struggling with an idea though - How can I capture the parent element of each element as I parse? Is it possible or would I have to do a second pass to do the fixup?

Regards,
Kashyap
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Antoine Latter | 29 Jul 17:54 2012

Re: Capturing the parent element as I parse XML using parsec

On Sun, Jul 29, 2012 at 1:21 AM, C K Kashyap <ckkashyap <at> gmail.com> wrote:
> Hi,
>
> With the help of the cafe I've been able to write up the xml parser using
> parsec -
> https://github.com/ckkashyap/really-simple-xml-parser/blob/master/RSXP.hs
>
> I am struggling with an idea though - How can I capture the parent element
> of each element as I parse? Is it possible or would I have to do a second
> pass to do the fixup?
>

What are you trying to do? Maybe you could give an example of what
you'd like to produce?

Generally speaking, having tree elements in a Haskell datatype point
to their parent and their children is asking for trouble - it means
you can't change any part of the tree without re-building the entire
tree (otherwise your parent pointers point to the parent in the old
version of the tree).
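
To make the problem concrete, here is a minimal sketch. The `Node` type and every function name in it are hypothetical and exist only for this illustration; none of them come from RSXP.hs or any library. It builds a small tree with parent links by tying the knot, then "renames" the root in the obvious way and shows that the child still sees the old parent.

```haskell
-- Hypothetical node type that stores a parent link, for illustration only.
data Node = Node
  { name     :: String
  , parent   :: Maybe Node
  , children :: [Node]
  }

-- Build a two-level tree by tying the knot: the child refers to the root
-- and the root refers to the child, which laziness makes possible.
mkTree :: String -> String -> Node
mkTree rootName childName = root
  where
    root  = Node rootName Nothing [child]
    child = Node childName (Just root) []

-- A naive "rename" that rebuilds only the root node and reuses the children.
renameRoot :: String -> Node -> Node
renameRoot new n = n { name = new }

-- The child reached through the renamed root still points at the old root,
-- so asking for its parent's name yields the old name.
staleParentName :: Maybe String
staleParentName =
  case children (renameRoot "new" (mkTree "old" "kid")) of
    (c : _) -> name <$> parent c
    []      -> Nothing
```

Here `staleParentName` evaluates to `Just "old"`, not `Just "new"`. Getting a consistent result would mean rebuilding every node that can reach the changed one, which with parent links is the whole tree.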

If you're interested in complex traversals and transformation of XML
trees, I like the cursor API here:
http://hackage.haskell.org/packages/archive/xml/1.3.12/doc/html/Text-XML-Light-Cursor.html
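
For readers who have not met the cursor (zipper) idea, the following is a hand-rolled sketch of it. It is not the API of Text.XML.Light.Cursor: the `XML` type is the simple one used later in this thread, and `Crumb`, `Cursor`, `fromTree`, `firstChild`, `right`, `up` and `setFocus` are hypothetical names chosen for illustration. The point is that the way back to the parent lives in the cursor, not in the tree.

```haskell
data XML
  = Element String [(String, String)] [XML]
  | Text String
  deriving (Eq, Show)

-- One step of context: the parent's tag and attributes, plus the siblings
-- to the left (nearest first) and to the right of the focused node.
data Crumb = Crumb String [(String, String)] [XML] [XML]
  deriving (Eq, Show)

-- A cursor is the focused node together with the path back to the root.
data Cursor = Cursor { focus :: XML, crumbs :: [Crumb] }
  deriving (Eq, Show)

-- Start at the root, with no context above it.
fromTree :: XML -> Cursor
fromTree t = Cursor t []

-- Move to the first child, if there is one.
firstChild :: Cursor -> Maybe Cursor
firstChild (Cursor (Element n as (k : ks)) cs) =
  Just (Cursor k (Crumb n as [] ks : cs))
firstChild _ = Nothing

-- Move to the next sibling, if there is one.
right :: Cursor -> Maybe Cursor
right (Cursor t (Crumb n as ls (r : rs) : cs)) =
  Just (Cursor r (Crumb n as (t : ls) rs : cs))
right _ = Nothing

-- Move to the parent, rebuilding it from the crumb and the current focus.
up :: Cursor -> Maybe Cursor
up (Cursor t (Crumb n as ls rs : cs)) =
  Just (Cursor (Element n as (reverse ls ++ t : rs)) cs)
up _ = Nothing

-- Replace the focused node; the change is picked up when moving back up.
setFocus :: XML -> Cursor -> Cursor
setFocus t c = c { focus = t }
```

As a usage example, `firstChild (fromTree t) >>= right >>= up . setFocus (Text "z")` replaces the second child of `t` and returns a cursor on the rebuilt parent, or `Nothing` if `t` has fewer than two children.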

HaXML is also popular for whole-tree queries and transformations.

Antoine
Richard O'Keefe | 30 Jul 00:44 2012

Re: Capturing the parent element as I parse XML using parsec


On 29/07/2012, at 6:21 PM, C K Kashyap wrote:
> I am struggling with an idea though - How can I capture the parent element of each element as I parse? Is it possible or would I have to do a second pass to do the fixup?

Why do you *want* the parent element of each element?
One of the insanely horrible aspects of the Document Object Model is that every
element is nailed in place by pointers everywhere, with the result that you
cannot share elements, and even moving an element is painful.
I still do a fair bit of SGML/XML processing in C using a "Document Value Model"
library that uses hash consing, and it's so much easier it isn't funny.

While you are traversing a document tree it is useful to keep track of the
path from the root.  Given

    data XML
       = Element String [(String,String)] [XML]
       | Text String

you do something like

    traverse :: ([XML] -> [a] -> a) -> ([XML] -> String -> a) -> XML -> a
    traverse f g xml = loop [] xml
      where loop ancs (Text s)           = g ancs  s
            loop ancs e@(Element _ _ ks) = f ancs' (map (loop ancs') ks)
                                           where ancs' = e:ancs

(This is yet another area where Haskell's non-strictness pays off.)
If you do that, then you have the parent information available without
it being stored in the tree.
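
As a usage sketch of the traversal above: the block below repeats the `XML` type and `traverse` from this message and adds a small client. `pathOf`, `textPaths` and `sample` are hypothetical names invented for the example. The `import Prelude hiding (traverse)` line is an addition that assumes a recent GHC, where the Prelude exports its own `traverse` and uses of the name would otherwise be ambiguous.

```haskell
import Prelude hiding (traverse) -- avoid the clash with Prelude's traverse

data XML
  = Element String [(String, String)] [XML]
  | Text String

-- The traversal from the message above; only the layout differs.
traverse :: ([XML] -> [a] -> a) -> ([XML] -> String -> a) -> XML -> a
traverse f g xml = loop [] xml
  where
    loop ancs (Text s) = g ancs s
    loop ancs e@(Element _ _ ks) = f ancs' (map (loop ancs') ks)
      where ancs' = e : ancs

-- Hypothetical helper: tag names from the root down to the current node.
-- The ancestor list is innermost first, so it is reversed before printing.
pathOf :: [XML] -> String
pathOf ancs = concatMap ('/' :) (reverse [n | Element n _ _ <- ancs])

-- Hypothetical client: pair every text node with the path of its ancestors.
textPaths :: XML -> [(String, String)]
textPaths = traverse (\_ kids -> concat kids) (\ancs s -> [(pathOf ancs, s)])

-- A small hand-built document to try it on.
sample :: XML
sample =
  Element "doc" []
    [ Element "title" [] [Text "Hello"]
    , Element "body" [] [Element "p" [] [Text "World"]]
    ]
```

With these definitions `textPaths sample` is `[("/doc/title","Hello"),("/doc/body/p","World")]`. Note that for a text node the head of the ancestor list is its parent, while for an element the list passed to `f` starts with the element itself, because the traversal conses `e` on before calling `f`.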
C K Kashyap | 30 Jul 11:53 2012

Re: Capturing the parent element as I parse XML using parsec

Thank you Richard and Antoine.


I think I see the pointlessness of my ask. 

Regards,
Kashyap

