10 Dec 2004 23:26
RE: evaluation of Snowball stemmers
Martin Porter <martin.porter <at> grapeshot.co.uk>
2004-12-10 22:26:20 GMT
Fred,

Do you mean you got a 29%/56% average precision improvement when you switched stemming off? Anything is possible, but this does surprise me: I would have expected Russian, with its highly (and regularly) inflected vocabulary, to do quite well under stemming.

If you look at the paper at http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/16.pdf (Mono- and Crosslingual Retrieval Experiments at the University of Hildesheim - René Hackl, Thomas Mandl and Christa Womser-Hacker), the evidence, for Finnish, points the other way ("the snowball stemmer works very well"). Unfortunately, their Russian experiments were not taken to a conclusion, but I myself have much more confidence in the Snowball Russian stemmer than in the Snowball Finnish stemmer. On the other hand, I have had verbal notice (which I did not entirely trust!) of the Finnish stemmer doing badly in some other tests.

I should point out that although the version of the stemmer you picked up works for KOI-8, Snowball is designed to make switching to other character codes as easy as possible. See the notes at http://snowball.tartarus.org/codesets/guide.html

Martin