Martin Porter | 10 Dec 2004 23:26

RE: evaluation of Snowball stemmers


Fred,

Do you mean you got a 29%/56% average precision improvement when you
switched stemming off? Anything is possible, but this does surprise me: I
would have expected Russian, with its highly (and regularly) inflected
vocabulary to do quite well under stemming.

If you look at the paper at 

http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/16.pdf

(Mono- and Crosslingual Retrieval Experiments at the University of
Hildesheim - René Hackl, Thomas Mandl and Christa Womser-Hacker) the
evidence, for Finnish, points the other way ("the snowball stemmer works
very well"). Unfortunately their Russian experiments were not taken to a
conclusion, but I myself have much more confidence in the Snowball Russian
stemmer than in the Snowball Finnish stemmer.

On the other hand I have had verbal notice (which I did not entirely trust!)
of the Finnish stemmer doing badly in some other tests.

I should point out that although the version of the stemmer you picked up
works for KOI-8, Snowball is designed to make switching to other character
codes as easy as possible. See the notes at

http://snowball.tartarus.org/codesets/guide.html

Martin
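
Besides regenerating the stemmer for another character code as the guide
describes, input in a different encoding can be transcoded before it reaches
a KOI-8 stemmer. The following is only a minimal sketch under stated
assumptions: it treats "KOI-8" as Python's standard `koi8_r` codec, uses
Windows-1251 (`cp1251`) as an example source encoding, and the helper name
`to_koi8r` is hypothetical. No Snowball API is called here.

```python
def to_koi8r(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding into KOI8-R (assumed KOI-8 variant)."""
    # Decode to Unicode first, then encode into the target codeset.
    return raw.decode(source_encoding).encode("koi8_r")


# Example: a Russian word that arrives encoded as Windows-1251.
word = "стеммеры"
cp1251_bytes = word.encode("cp1251")

# Transcode so a KOI-8 build of the stemmer could read it.
koi8_bytes = to_koi8r(cp1251_bytes, "cp1251")

# The same text, but a different byte sequence.
print(koi8_bytes.decode("koi8_r") == word)
print(koi8_bytes != cp1251_bytes)
```

Going through Unicode as the intermediate form keeps the conversion
independent of which pair of codesets is involved.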


Fred Gey | 10 Dec 2004 23:44

RE: evaluation of Snowball stemmers

I meant that the performance with stemming was 29%/56% better than
performance without stemming (((prec-with minus prec-without) / prec-without)
* 100). I attach the fragment of text (as a Word document) which describes
the experiments.
Fred

-----Original Message-----
From: Martin Porter [mailto:martin.porter <at> grapeshot.co.uk] 
Sent: Friday, December 10, 2004 2:26 PM
To: gey <at> berkeley.edu; 'Diana Maynard'
Cc: snowball-discuss <at> lists.tartarus.org
Subject: RE: [Snowball-discuss] evaluation of Snowball stemmers

Fred,

Do you mean you got a 29%/56% average precision improvement when you
switched stemming off? Anything is possible, but this does surprise me: I
would have expected Russian, with its highly (and regularly) inflected
vocabulary to do quite well under stemming.

If you look at the paper at 

http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/16.pdf

(Mono- and Crosslingual Retrieval Experiments at the University of
Hildesheim  - René Hackl, Thomas Mandl and  Christa Womser-Hacker) the
evidence, for Finnish, points the other way ("the snowball stemmer works
very well"). Their Russian experiments were not unfortunately taken to
conclusion, but I feel much more confidence myself in the snowball Russian
stemmer than the snowball Finnish stemmer.

Fred Gey | 10 Dec 2004 23:52

RE: evaluation of Snowball stemmers

Sorry for the word order reversal -- should have been:

After the workshop we ran some no-stemmer/stemmer experiments. The results
were remarkable (I did not test statistical significance): for
Title-Description (shorter queries) average precision went from 0.259 to
0.334 (29% improvement), for Title-Description-Narrative (longer queries),
average precision went from 0.236 to 0.367 (56% improvement).

Fred
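
The percentages above follow from the relative-change formula given earlier
in the thread: ((prec-with minus prec-without) / prec-without) * 100. A
minimal Python sketch checks the arithmetic against the reported average
precision figures. The function name `relative_improvement` is chosen here
for illustration and does not come from the experiments themselves.

```python
def relative_improvement(without_stemming: float, with_stemming: float) -> float:
    """Percentage change in average precision relative to the unstemmed run."""
    return (with_stemming - without_stemming) / without_stemming * 100


# Average precision (without stemming, with stemming) as quoted in the message.
runs = {
    "Title-Description": (0.259, 0.334),
    "Title-Description-Narrative": (0.236, 0.367),
}

for name, (baseline, stemmed) in runs.items():
    gain = relative_improvement(baseline, stemmed)
    print(f"{name}: {gain:.1f}% improvement")
```

Rounded to whole numbers, the two results match the 29% and 56% quoted
above. As the message notes, statistical significance was not tested.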


Oleg Bartunov | 10 Dec 2004 23:35

RE: evaluation of Snowball stemmers

Here's a paper which compares several stemmers (including Snowball) on a
Russian corpus.
http://company.yandex.ru/articles/iseg-las-vegas.html

My own experience is good.

 	Oleg

