Thomas Ginter | 14 Jun 2012 23:39

Exception thrown during CAS serialization for Remote UIMA-AS Service

We are getting an odd error while processing large datasets with UIMA-AS 2.3.1.  The XmiCasSerializer
in the client throws an exception while it is serializing a CAS to be sent to a remote service.  The
exception is as follows:

org.apache.uima.resource.ResourceProcessException
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:854)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:885)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.process(BaseUIMAAsynchronousEngineCommon_impl.java:734)
      at gov.va.vinci.flap.Client.run(Client.java:181)
      at gov.va.vinci.density.DensityClient.main(DensityClient.java:137)
Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: _, 0x1a
      at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
      at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
      at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
      at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1539)
      at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:136)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.serializeCAS(BaseUIMAAsynchronousEngineCommon_impl.java:260)
      at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:779)
      ... 4 more

It happens at apparently random points while processing the corpus and is never actually "thrown" to
our code; it is simply written to stderr.  Also, the serializer never seems to return, which means the
UimaAsynchronousEngine.process() method never returns and the client simply "hangs" until it is
manually terminated.  To try to resolve this issue I have implemented text filters for the incoming CAS
data to prevent anything outside the ASCII-8 range.  I have also tried switching the server and client to
binary serialization strategies but that causes the XmiCasSerializer in my UimaAsBaseListener object to
(Continue reading)

Jörn Kottmann | 14 Jun 2012 23:52

Re: Exception thrown during CAS serialization for Remote UIMA-AS Service

You are writing a string to the CAS that contains a non-XML character.
This character cannot be serialized into XMI, and that's what this
exception is about.

Have a look at our documentation explaining the issue:
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
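
To locate the offending character before the CAS is sent, a check along the following lines may help. This is only a minimal sketch in plain Java, assuming the character ranges of the XML 1.0 "Char" production; the class and method names (XmlCharCheck, isXml10Char, firstInvalidIndex) and the sample text are hypothetical and are not part of UIMA.

```java
// Hypothetical helper, not part of UIMA: finds the first character that
// XML 1.0 does not allow, such as the 0x1a reported in the stack trace.
final class XmlCharCheck {

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Offset (in UTF-16 chars) of the first invalid character, or -1 if none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            // Valid supplementary characters occupy two chars; skip both.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample text containing the control character 0x1a.
        String text = "blood pressure\u001A 120/80";
        int idx = firstInvalidIndex(text);
        if (idx >= 0) {
            System.out.printf("non-XML 1.0 character 0x%x at offset %d%n",
                    (int) text.charAt(idx), idx);
        }
    }
}
```

Running such a check on the document text and on every string feature before calling process() should show which input carries the bad character.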

Hope that helps,
Jörn

Thomas Ginter | 15 Jun 2012 03:13

Re: Exception thrown during CAS serialization for Remote UIMA-AS Service

Jörn,

Thanks for the link to that section of the documentation.  The mention of the XMLUtils class was just
what I needed.  I wrote an XmlFilter class that uses XMLUtils to detect invalid XML characters and
replace them with spaces so that our annotation offsets will still match the original text.  I was
thinking about the issue all wrong: I had assumed that all ASCII-8 characters are also valid XML 1.0
characters.
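
For readers who want to see the idea, the following is a minimal sketch of such a filter. It is not the actual XmlFilter class from this message. It is a plain-Java stand-in that assumes the XML 1.0 "Char" ranges instead of calling UIMA's XMLUtils, and the names XmlCharFilter and replaceInvalidXmlChars are hypothetical. Each invalid character is replaced by exactly one space, so the string length and all annotation offsets stay unchanged.

```java
// Hypothetical stand-in for the filter described above; it does not use
// UIMA's XMLUtils and only illustrates the offset-preserving replacement.
final class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces every character that XML 1.0 forbids with a single space.
     * All forbidden characters are single UTF-16 chars, so the result has
     * the same length as the input and existing offsets remain valid.
     */
    static String replaceInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(' ');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample text containing the control character 0x1a.
        String raw = "note\u001Atext";
        String clean = replaceInvalidXmlChars(raw);
        System.out.println(clean.length() == raw.length());
    }
}
```

The cleaned string would be what gets passed to the CAS as document text, in place of the raw input.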

Thanks,

Thomas Ginter
801-448-7676
thomas.ginter@...


