Xiao Yu | 6 Jun 08:22 2012

Exceptions using corpus benchmark tool

Dear all,

I ran into some problems when using the Corpus Benchmark Tool to evaluate performance. Here are the details:

I created a corpus containing a PDF file and saved the corpus into the "clean" folder. Then I manually annotated the file in the Default annotation set and saved the corpus into the "marked" folder. After that, I built my own rules, producing annotations in the NE annotation set (Chinese plugin), and saved the application. Finally, I ran "Human Marked Against Current Processing Results", selecting the main directory (containing the clean, marked and processed folders) and the saved application. However, some exceptions were thrown, which left me quite confused.

The contents of corpus_tool.properties in the build folder (GATE\build\corpus_tool.properties) are as follows:

threshold=0.8
annotSetName=
outputSetName=NE
annotTypes=Journal;Pyear;Random;TreatmentTeam;ComparisonTeam;TreatmentStandard
annotFeatures=content;isEmptyAndSpan;name;class
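
For readers setting up the same evaluation: this is an ordinary Java properties file, and in the example above the list-valued keys (annotTypes, annotFeatures) are separated by semicolons. The sketch below is not the CBT's own code; the class CorpusToolConfigSketch is hypothetical. It reads the keys shown above with java.util.Properties, splits the lists, and warns when annotTypes ends up empty, since then there is nothing to compare. How the CBT itself interprets an empty annotSetName is not modelled; the sketch only shows that the key is read as an empty string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of GATE): reads corpus_tool.properties-style text.
final class CorpusToolConfigSketch {

    // Splits a semicolon-separated value such as "Journal;Pyear" into trimmed,
    // non-empty items. A missing (null) or blank value yields an empty list.
    static List<String> splitList(String value) {
        List<String> items = new ArrayList<>();
        if (value == null) {
            return items;
        }
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }

    // Parses properties text; a line like "annotSetName=" is read as the empty string.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // The same keys and values as the file quoted above.
        String text = String.join("\n",
                "threshold=0.8",
                "annotSetName=",
                "outputSetName=NE",
                "annotTypes=Journal;Pyear;Random;TreatmentTeam;ComparisonTeam;TreatmentStandard",
                "annotFeatures=content;isEmptyAndSpan;name;class");
        Properties props = load(text);
        List<String> types = splitList(props.getProperty("annotTypes"));
        if (types.isEmpty()) {
            // Nothing to compare, cf. "No types given for evaluation" in the log below.
            System.err.println("Warning: annotTypes is empty");
        }
        System.out.println("threshold     = " + Double.parseDouble(props.getProperty("threshold")));
        System.out.println("annotSetName  = '" + props.getProperty("annotSetName") + "'");
        System.out.println("outputSetName = " + props.getProperty("outputSetName"));
        System.out.println("annotTypes    = " + types);
        System.out.println("annotFeatures = " + splitList(props.getProperty("annotFeatures")));
    }
}
```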


These are the exceptions I received:

Evaluating human-marked documents against current processing results.
C:\Xiao Yu\GATE\corpus_tool.properties
App file is: U:\GATE\save.gapp
Processing directory: U:\GATE\SmallEvaluation<P>
Warning: Document remains unparsed. 

  Stack Dump: 
com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </Feature>; expected </Value>.
 at [row,col,system-id]: [9,9,"file:/U:/GATE/SmallEvaluation/clean/“祛斑汤”内服外敷治疗黄褐斑41例临床观察.pdf.xml"]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:605)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
at com.ctc.wstx.sr.BasicStreamReader.reportWrongEndElem(BasicStreamReader.java:3256)
at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3198)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2830)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
at gate.corpora.DocumentStaxUtils.readFeatureNameOrValue(DocumentStaxUtils.java:480)
at gate.corpora.DocumentStaxUtils.readFeatureMap(DocumentStaxUtils.java:445)
at gate.corpora.DocumentStaxUtils.readGateXmlDocument(DocumentStaxUtils.java:130)
at gate.corpora.XmlDocumentFormat.unpackGateFormatMarkup(XmlDocumentFormat.java:179)
at gate.corpora.XmlDocumentFormat.unpackMarkup(XmlDocumentFormat.java:129)
at gate.corpora.XmlDocumentFormat.unpackMarkup(XmlDocumentFormat.java:83)
at gate.corpora.DocumentImpl.init(DocumentImpl.java:271)
at gate.Factory.createResource(Factory.java:384)
at gate.util.CorpusBenchmarkTool.evaluateMarkedClean(CorpusBenchmarkTool.java:897)
at gate.util.CorpusBenchmarkTool.evaluateCorpus(CorpusBenchmarkTool.java:553)
at gate.util.CorpusBenchmarkTool.execute(CorpusBenchmarkTool.java:196)
at gate.util.CorpusBenchmarkTool.execute(CorpusBenchmarkTool.java:61)
at gate.gui.MainFrame$CleanMarkedCorpusEvalAction$1.run(MainFrame.java:2508)
at java.lang.Thread.run(Thread.java:722)
<H2>“祛斑汤”内服外敷治疗黄褐斑41例临床观察.pdf.xml</H2>
Warning: Document remains unparsed. 

  Stack Dump: 
com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </Feature>; expected </Value>.
 at [row,col,system-id]: [9,9,"file:/U:/GATE/SmallEvaluation/marked/“祛斑汤”内服外敷治疗黄褐斑41例临床观察.pdf.xml"]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:605)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
at com.ctc.wstx.sr.BasicStreamReader.reportWrongEndElem(BasicStreamReader.java:3256)
at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3198)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2830)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
at gate.corpora.DocumentStaxUtils.readFeatureNameOrValue(DocumentStaxUtils.java:480)
at gate.corpora.DocumentStaxUtils.readFeatureMap(DocumentStaxUtils.java:445)
at gate.corpora.DocumentStaxUtils.readGateXmlDocument(DocumentStaxUtils.java:130)
at gate.corpora.XmlDocumentFormat.unpackGateFormatMarkup(XmlDocumentFormat.java:179)
at gate.corpora.XmlDocumentFormat.unpackMarkup(XmlDocumentFormat.java:129)
at gate.corpora.XmlDocumentFormat.unpackMarkup(XmlDocumentFormat.java:83)
at gate.corpora.DocumentImpl.init(DocumentImpl.java:271)
at gate.Factory.createResource(Factory.java:384)
at gate.util.CorpusBenchmarkTool.evaluateMarkedClean(CorpusBenchmarkTool.java:948)
at gate.util.CorpusBenchmarkTool.evaluateCorpus(CorpusBenchmarkTool.java:553)
at gate.util.CorpusBenchmarkTool.execute(CorpusBenchmarkTool.java:196)
at gate.util.CorpusBenchmarkTool.execute(CorpusBenchmarkTool.java:61)
at gate.gui.MainFrame$CleanMarkedCorpusEvalAction$1.run(MainFrame.java:2508)
at java.lang.Thread.run(Thread.java:722)
<H2> Statistics </H2>
No types given for evaluation, cannot obtain precision/recall
Overall average precision: NaN
Overall average recall: NaN
Overall average fMeasure : NaN
Finished!


What could be the problem here?

Thanks a lot.

Regards,
Xiao
Attachment (main directory.7z): application/octet-stream, 8 KiB
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
GATE-users mailing list
GATE-users@...
https://lists.sourceforge.net/lists/listinfo/gate-users
Diana Maynard | 6 Jun 12:48 2012

Re: Exceptions using corpus benchmark tool

It sounds as if you followed the correct procedure (I assume you have 
checked that your properties file is set correctly for the relevant 
names of sets, annotations and so on), but that for some reason you've 
got at least one document that GATE can't analyse. Can you send the 
corpus in another format (zip or tar) so I can investigate?
Diana

Ian Roberts | 6 Jun 13:02 2012

Re: Exceptions using corpus benchmark tool

On 06/06/2012 07:22, Xiao Yu wrote:
> The contents of the corpus_tool.properties under the build
> folder(GATE\build\corpus_tool.properties) are like this:
> 
> threshold=0.8
> annotSetName=
> outputSetName=NE
> annotTypes=Journal;Pyear;Random;TreatmentTeam;ComparisonTeam;TreatmentStandard
> annotFeatures=content;isEmptyAndSpan;name;class

The error message you're seeing suggests it's trying to use the wrong
character encoding when loading the documents from XML.  The files you
attached appear to be in UTF-8 so try adding a line

encoding=UTF-8

to your corpus_tool.properties to force it to use the right encoding.
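
Ian's diagnosis can be illustrated with a small, self-contained sketch that uses only java.nio.charset (the class EncodingMismatchSketch is hypothetical, not GATE code). It encodes a short Chinese string as UTF-8, as the saved documents appear to be, then decodes the bytes either as UTF-8 or as ISO-8859-1: only the matching charset gives the original text back. It does not reproduce the Woodstox error itself, which depends on how the XML parser reacts to mis-decoded bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo (not GATE code): decoding UTF-8 bytes with the wrong charset.
final class EncodingMismatchSketch {

    // Encodes text as UTF-8, then decodes the bytes with decodeAs, as a reader
    // configured for that charset would.
    static String roundTrip(String text, Charset decodeAs) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeAs);
    }

    public static void main(String[] args) {
        // Two CJK characters (U+4E2D, U+6587), written as escapes so the source stays ASCII.
        String text = "\u4E2D\u6587";
        String asUtf8 = roundTrip(text, StandardCharsets.UTF_8);
        String asLatin1 = roundTrip(text, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 decode preserves text:      " + asUtf8.equals(text));
        System.out.println("ISO-8859-1 decode preserves text: " + asLatin1.equals(text));
        // Each of these characters is 3 bytes in UTF-8; ISO-8859-1 maps every byte to one char.
        System.out.println("length " + text.length() + " -> " + asLatin1.length());
    }
}
```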

Ian

--
Ian Roberts               | Department of Computer Science
i.roberts@...  | University of Sheffield, UK

Xiao Yu | 6 Jun 13:12 2012

Re: Exceptions using corpus benchmark tool

Dear Ian,

I added the encoding line to the properties file and restarted GATE. However, the same errors still appeared.

Regards,
Xiao 
Ian Roberts | 6 Jun 13:38 2012

Re: Exceptions using corpus benchmark tool

On 06/06/2012 12:12, Xiao Yu wrote:
> Dear Ian,
> 
> I added the encoding line to the properties file and restarted GATE. However, the same errors still
> appeared.

Which corpus_tool.properties did you edit, the one you referred to
"under the build folder" or the one in C:\Xiao Yu\GATE\ which the tool
says it is using?

Ian

--
Ian Roberts               | Department of Computer Science
i.roberts@...  | University of Sheffield, UK

Xiao Yu | 6 Jun 13:47 2012

Re: Exceptions using corpus benchmark tool

Dear Ian,

I didn't find corpus_tool.properties in the GATE home directory, but I found one in the
build directory and modified that one. I will copy this file to the GATE home directory and try again.

Thanks.

Regards,
Xiao
Ian Roberts | 6 Jun 13:52 2012

Re: Exceptions using corpus benchmark tool

On 06/06/2012 12:47, Xiao Yu wrote:
> Dear Ian,
> 
> I didn't find corpus_tool.properties in the GATE home directory, but I found one in the
> build directory and modified that one. I will copy this file to the GATE home directory and try again.

The CBT looks for corpus_tool.properties in whatever directory was
current when GATE Developer was started up.  This usually means the
top-level GATE directory.  In any case, when you run the tool the first
thing it prints to the messages pane is the location it expects for
corpus_tool.properties, in your case
"C:\Xiao Yu\GATE\corpus_tool.properties", and this is where you should
put the file if it doesn't exist already.
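
The "current directory" here is the JVM's working directory, which Java exposes as the user.dir system property. The sketch below assumes the tool resolves the bare file name against that directory (an assumption based on the description above, not on the CBT source; PropertiesLocationSketch is a hypothetical name) and prints where such a lookup would look and whether the file is there.

```java
import java.io.File;

// Hypothetical helper (not GATE code): where a relative corpus_tool.properties resolves.
final class PropertiesLocationSketch {

    static File resolve(String workingDir, String fileName) {
        return new File(workingDir, fileName);
    }

    public static void main(String[] args) {
        // user.dir is the JVM's working directory, i.e. wherever the program was started from.
        String workingDir = System.getProperty("user.dir");
        File props = resolve(workingDir, "corpus_tool.properties");
        System.out.println("Expected location: " + props.getAbsolutePath());
        System.out.println("File exists:       " + props.isFile());
    }
}
```

Running it from the same directory GATE Developer was started in should print the same path the CBT reports as its first message.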

Ian

--
Ian Roberts               | Department of Computer Science
i.roberts@...  | University of Sheffield, UK

Ian Roberts | 6 Jun 14:01 2012

Re: Exceptions using corpus benchmark tool

On 06/06/2012 12:52, Ian Roberts wrote:
> The CBT looks for corpus_tool.properties in whatever directory was
> current when GATE Developer was started up.  This usually means the
> top-level GATE directory.  In any case, when you run the tool the first
> thing it prints to the messages pane is the location it expects for
> corpus_tool.properties

And note that it is *not* currently treated as an error if this file
doesn't exist, the tool simply uses a default configuration.  Maybe it
should complain more loudly, or at least print a warning, if the
properties file is not found.
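
A loader that warns instead of silently falling back might look roughly like the sketch below. It is not the change Ian committed (the thread does not describe it); the class name and the default threshold value are placeholders. A Properties object created with a defaults table falls back to it key by key, so a missing file simply leaves the defaults in effect, and the warning makes that visible.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Properties;

// Hypothetical loader (not GATE code): use the file if present, otherwise warn
// loudly and fall back to a defaults table.
final class PropertiesFallbackSketch {

    static Properties loadOrWarn(File file, Properties defaults) {
        // Lookups on this object fall back to 'defaults' for any key the file lacks.
        Properties props = new Properties(defaults);
        if (!file.isFile()) {
            System.err.println("WARNING: " + file.getAbsolutePath()
                    + " not found, using the default configuration");
            return props;
        }
        try (Reader in = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            // Also covers a file that is not valid UTF-8 (MalformedInputException).
            System.err.println("WARNING: could not read " + file.getAbsolutePath()
                    + ": " + e + "; defaults apply to any keys not read");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("threshold", "0.5"); // placeholder, not the CBT's real default
        File file = new File(System.getProperty("user.dir"), "corpus_tool.properties");
        Properties props = loadOrWarn(file, defaults);
        System.out.println("threshold = " + props.getProperty("threshold"));
    }
}
```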

Ian

--
Ian Roberts               | Department of Computer Science
i.roberts@...  | University of Sheffield, UK

Xiao Yu | 6 Jun 14:08 2012

Re: Exceptions using corpus benchmark tool

Sure. If it gave a warning, problems like this could be solved much more easily.

Thanks.
Ian Roberts | 6 Jun 14:18 2012

Re: Exceptions using corpus benchmark tool

On 06/06/2012 13:08, Xiao Yu wrote:
> Sure. If it gave a warning, problems like this could be solved much more easily.

Indeed, and I've just committed a change to make it clearer in future
versions.

Ian

--
Ian Roberts               | Department of Computer Science
i.roberts@...  | University of Sheffield, UK

Xiao Yu | 6 Jun 15:38 2012

Re: Exceptions using corpus benchmark tool

Something strange has just come up. In the final statistics table, all values are 0 except in the
Spurious column. For almost all documents, Precision is 0.0 and Recall is 1.0. I compared the
documents by opening them in GATE and checking that the annotations exist and are in the right places.
Almost all of them should be correct, so the precision should not be this bad for most types.

Take one document for example:

I opened the manually annotated file and the system-generated one in GATE, and ticked the annotation
types I wanted to compare. The annotation lists are given below (see the attachments):

Manually annotated document in GATE:

[manual_generated_annotation_list.png]

System-generated document according to my rules in GATE:
[system_generated_annotation_list.png]

The comparison table created by Corpus Benchmark tool:

[Corpus benchmark tool table.html]

What could be the problem?

Thanks.

Regards,
Xiao

Evaluating human-marked documents against current processing results.
C:\Xiao Yu\GATE\corpus_tool.properties
New threshold is: 0.8

Annotation set in processed docs is: NE

New encoding is: UTF-8

Using annotation types from the properties file.

Using annotation features from the properties file. Features: [TIKA_CREATOR, content, gate.SourceURL, isEmptyAndSpan, TIKA_TITLE, MimeType, name, CREATOR, class, TITLE]

App file is: U:\GATE\save.gapp
Processing directory: U:\GATE\SmallEvaluation

“祛斑汤”内服外敷治疗黄褐斑41例临床观察.pdf.xml

OrthoMatcher Warning: No annotations found for processing
Word count: 641
Annotation Type     Precision  Recall
Journal             0.0        1.0
Pyear               0.0        1.0
Random              0.0        1.0
TreatmentTeam       0.0        1.0
ComparisonTeam      0.0        1.0
TreatmentStandard   0.0        1.0

Statistics

Annotation Type     Correct  Partially Correct  Missing  Spurious  Precision  Recall  F-Measure
Journal             0        0                  0        1         0.0        0.0     0.0
Pyear               0        0                  0        1         0.0        0.0     0.0
Random              0        0                  0        1         0.0        0.0     0.0
TreatmentTeam       0        0                  0        1         0.0        0.0     0.0
ComparisonTeam      0        0                  0        1         0.0        0.0     0.0
TreatmentStandard   0        0                  0        1         0.0        0.0     0.0
Overall average precision: 0.0
Overall average recall: 1.0
Overall average fMeasure : 0.0
Finished!
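
The counts in the table make the odd precision/recall values easy to check by hand. Below is a sketch using the standard strict definitions, precision = correct / (correct + spurious) and recall = correct / (correct + missing), ignoring partially correct matches (the CBT's own lenient and average measures, which weight partial matches, are not modelled). With Correct 0, Missing 0, Spurious 1, precision is 0/1 = 0, while recall is 0/0, which is undefined; the output above reports it as 1.0 in the per-document table and 0.0 in the statistics table, so the sketch takes the value for an empty denominator as a parameter. Under the usual definitions, Correct 0 together with Missing 0 also suggests that no annotations of these types were counted on the human-marked side for this document. StrictScoresSketch is a hypothetical name.

```java
// Hypothetical helper (not GATE code): strict precision, recall and F1 from match counts.
final class StrictScoresSketch {

    // emptyValue is what to report when a denominator is zero (a convention, not a fact).
    static double ratio(int numerator, int denominator, double emptyValue) {
        return denominator == 0 ? emptyValue : (double) numerator / denominator;
    }

    static double precision(int correct, int spurious, double emptyValue) {
        return ratio(correct, correct + spurious, emptyValue);
    }

    static double recall(int correct, int missing, double emptyValue) {
        return ratio(correct, correct + missing, emptyValue);
    }

    // Harmonic mean of precision and recall; defined as 0 when both are 0.
    static double f1(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Counts for each type in the statistics table above.
        int correct = 0, missing = 0, spurious = 1;
        double p = precision(correct, spurious, 1.0);
        double r = recall(correct, missing, 1.0);
        System.out.println("P=" + p + " R=" + r + " F1=" + f1(p, r));
    }
}
```

With the empty-recall convention set to 1.0 this reproduces the per-document figures (P 0.0, R 1.0, F 0.0).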
Diana Maynard | 6 Jun 15:48 2012

Re: Exceptions using corpus benchmark tool

Hi Xiao
Judging by the message generated, you appear to have no annotation types 
in your properties file, so it's not comparing anything. I would expect 
to see the list of them here:

Using annotation types from the properties file.

So I think something is wrong with your properties file.
Diana

====================================================================

Evaluating human-marked documents against current processing results. 
C:\Xiao Yu\GATE\corpus_tool.properties New threshold is: 0.8

Annotation set in processed docs is: NE

New encoding is: UTF-8

Using annotation types from the properties file.

Using annotation features from the properties file. Features: 
[TIKA_CREATOR, content, gate.SourceURL, isEmptyAndSpan, TIKA_TITLE, 
MimeType, name, CREATOR, class, TITLE]

App file is: U:\GATE\save.gapp Processing directory: U:\GATE\SmallEvaluation

“祛斑汤”内服外敷治疗黄褐斑41例临床观察.pdf.xml

Xiao Yu | 6 Jun 13:55 2012

Re: Exceptions using corpus benchmark tool

Dear Ian,

Thanks for your help. The problem has been solved, and the expected results have come out. Great!

Regards,
Xiao