14 Jun 2012 23:39
Exception thrown during CAS serialization for Remote UIMA-AS Service
We are getting an odd error when processing large datasets with UIMA-AS 2.3.1. The XmiCasSerializer
in the client throws an exception while it is serializing a CAS to be sent to a remote service. The
exception is as follows:
org.apache.uima.resource.ResourceProcessException
at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:854)
at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:885)
at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.process(BaseUIMAAsynchronousEngineCommon_impl.java:734)
at gov.va.vinci.flap.Client.run(Client.java:181)
at gov.va.vinci.density.DensityClient.main(DensityClient.java:137)
Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: _, 0x1a
at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1539)
at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:136)
at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.serializeCAS(BaseUIMAAsynchronousEngineCommon_impl.java:260)
at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:779)
... 4 more
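The "Caused by" line comes from XMLSerializer's XML 1.0 character check. XML 1.0 only allows tab,
LF, CR and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. 0x1A is the ASCII SUB
(Ctrl-Z) control character, so it sits inside the 7-bit ASCII range but is still not legal XML 1.0
text. A filter that only removes characters above the ASCII range would therefore not remove it. The
following is a minimal plain-Java sketch, not UIMA's own implementation. It assumes the filter is
applied to the document text before that text is set on the CAS. The class name Xml10Chars and its
methods are hypothetical.

```java
/**
 * Hypothetical helper, not part of the UIMA API: checks code points against
 * the XML 1.0 "Char" production and drops the ones that are not allowed.
 */
public final class Xml10Chars {

    private Xml10Chars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of the text with every disallowed code point removed.
     * Iterating by code point keeps valid surrogate pairs intact, while
     * unpaired surrogates (0xD800-0xDFFF) fail the check and are dropped.
     */
    public static String stripInvalidXml10Chars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(Xml10Chars::isXml10Char)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // 0x1A (SUB) is plain 7-bit ASCII, yet XML 1.0 forbids it.
        String raw = "before" + (char) 0x1A + "after";
        System.out.println(isXml10Char(0x1A));           // false
        System.out.println(stripInvalidXml10Chars(raw)); // beforeafter
    }
}
```

With this approach, the check is the XML 1.0 rule itself rather than an ASCII range. Control
characters such as 0x1A are removed, and legitimate non-ASCII text is kept.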
It happens at apparently random points while processing the corpus, and the exception is never
actually "thrown" to our code; it is simply written to stderr. The serializer also never seems to
return, so UimaAsynchronousEngine.process() never returns either and the client simply "hangs" until
it is manually terminated. To resolve this issue I have implemented text filters on the incoming CAS
data to remove anything outside the ASCII-8 range. I have also tried switching the server and client
to binary serialization, but that causes the XmiCasSerializer in my UimaAsBaseListener object to