Ngan Nguyen | 18 May 17:58

Problem with subIterator of AnnotationIndex

The documentation of the AnnotationIndex class says that "Annotations are
sorted in increasing order of their start offset". However, when I use the
subiterator method of AnnotationIndex to get an iterator over the annotations
(tokens) inside another annotation (a sentence), the annotations come back in
a strange order. See the output below, printed as
token_string(begin,end,part_of_speech):

He (0,2,PRP)
. (13,14,.)
running(6,13,VBG)
is (3,5,VBZ)

Can such a strange order happen in the AnnotationIndex, or is it just a bug
in my program? I tried to find the reason but could not.

I find AnnotationIndex a bit inconvenient for programmers. Do you have any
effective strategy to deal with annotations in the annotation pool?
Thilo Goetz | 18 May 20:31

Re: Problem with subIterator of AnnotationIndex

Ngan Nguyen wrote:
> In AnnotationIndex class says that "Annotations are sorted in increasing
> order of their start offset". However when I use subIterator method of
> AnnotationIndex to get an iterator of annotations (token) inside an
> annotation (sentence), the returned annotations' order is strange. See 
> below
> token_string(begin,end,part_of_speech):
> 
> He(0/2/PRP)
> . (13,14,.)
> running(6,13,VBG)
> is (3,5,VBZ)
> 
> Can such strange orders happen in the AnnotationIndex? or just my program
> bugs? I tried to find the reason why but I still couldn't.
> 
> I find AnnotationIndex a bit inconvenient for programmers. Do you have any
> effective strategy to deal with annotations in the annotation pool?
> 

That sounds like a bug.  Could you provide a test case (code)?  Or maybe an
XCAS of a document plus instructions on how to reproduce this?  Thanks.

--Thilo

Ngan Nguyen | 19 May 11:00

Re: Problem with subIterator of AnnotationIndex

In my type system, PosTaggerToken is a subtype of FullToken type. I
aggregate UIMA SimpleTokenAndSentenceAnnotator, my type converter,
POSTagger, and another AE (with strange behavior) whose code is:

        AnnotationIndex sentenceIndex = (AnnotationIndex)
            aJCas.getJFSIndexRepository().getAnnotationIndex(Sentence.type);
        AnnotationIndex tokenIndex = (AnnotationIndex)
            aJCas.getJFSIndexRepository().getAnnotationIndex(FullToken.type);

        // iterate over Sentences
        FSIterator sentenceIterator = sentenceIndex.iterator();
        while (sentenceIterator.hasNext()) {
            Sentence sentence = (Sentence) sentenceIterator.next();
            // iterate over the Tokens covered by this Sentence
            FSIterator tokenIterator = tokenIndex.subiterator(sentence);
            while (tokenIterator.hasNext()) {
                System.out.println((FullToken) tokenIterator.next());
            }
        }
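
The ordering reported at the start of this thread can be checked without UIMA
by looking only at the printed (begin,end) offsets. The sketch below is not
part of the original annotator. The class name OffsetOrderCheck and its
methods are hypothetical, and it assumes the usual annotation ordering of
increasing begin offset, with the longer annotation first when two begin
offsets are equal. It reports the first position where the observed sequence
breaks that order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, independent of UIMA: each span is {begin, end}.
public class OffsetOrderCheck {

    // Assumed ordering: smaller begin first; for equal begin, larger end first.
    static final Comparator<int[]> SPAN_ORDER = (a, b) -> {
        if (a[0] != b[0]) {
            return Integer.compare(a[0], b[0]);
        }
        return Integer.compare(b[1], a[1]);
    };

    // Returns the index of the first span that sorts before its predecessor,
    // or -1 if the whole sequence is in order.
    static int firstOutOfOrder(int[][] spans) {
        for (int i = 1; i < spans.length; i++) {
            if (SPAN_ORDER.compare(spans[i - 1], spans[i]) > 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Offsets as printed by the subiterator in the first message.
        int[][] observed = { {0, 2}, {13, 14}, {6, 13}, {3, 5} };
        System.out.println("first out-of-order index: " + firstOutOfOrder(observed));

        // The same spans in the order the index is expected to return them.
        int[][] expected = observed.clone();
        Arrays.sort(expected, SPAN_ORDER);
        System.out.println("expected order: " + Arrays.deepToString(expected));
    }
}
```

For the offsets above, the first violation is at index 2: "running" (6,13)
is returned after "." (13,14).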

And here is the XCAS of a document containing only one test sentence:
"He is running."

<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
sofaString="He is running.                         &#10;"/>
<tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="40"
language="en"/>
<examples:SourceDocumentInformation xmi:id="13" sofa="1" begin="0" end="0"
uri="file:test2.txt" offsetInSource="0" documentSize="40"
lastSegment="true"/>

Thilo Goetz | 21 May 22:18

Re: Problem with subIterator of AnnotationIndex

Hi Ngan,

thanks for reporting this.

Could you send me a) the XMI file and b) your type system as attachments, please?
Don't send them to the list; it doesn't allow attachments.

Even better would be to open a Jira issue and attach the files to it, if you
don't mind.

Thanks.

--Thilo

