Grushevskiy Dmitry | 16 Jul 2012 11:35

Digraphs

Please help me

In Polish, digraphs are used like letters, but the Snowball compiler ignores them

stringdef ia   hex '69 61'
stringdef ia"  hex '69 105'
stringdef ie   hex '69 65'
stringdef ie"  hex '69 119'
stringdef io   hex '69 6F'
stringdef io"  hex '69 F3'
stringdef iu   hex '69 75'

stringdef ch   hex '63 68'
stringdef cz   hex '63 7A'
stringdef dz   hex '64 7A'
stringdef dz"  hex '64 17C'
stringdef dz`  hex '64 17A'
stringdef rz   hex '72 7A'
stringdef sz   hex '73 7A'

define v 'a{a"}e{e"}o{o"}uy{ia}{ia"}{ie}{ie"}{io}{io"}{iu}i'

static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 4, 0, 16, 0, 0, 1 };
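
Decoding the generated table shows which characters actually ended up in v. The Python sketch below is only an illustration: it assumes the usual layout of Snowball's generated grouping tables, one bit per code point, with bit 0 of the first byte standing for the smallest member (taken here to be 'a', code point 97, since the call that uses g_v is not shown). decode_grouping is a hypothetical helper, not part of Snowball.

```python
# Table copied from the generated C above.
G_V = [17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 4, 0, 16, 0, 0, 1]

# Assumption: bit 0 of byte 0 corresponds to 'a' (code point 97).
MIN_CODE_POINT = ord("a")


def decode_grouping(table, min_code_point):
    """Return the characters whose bits are set in a grouping bitmap."""
    members = []
    for byte_index, byte in enumerate(table):
        for bit in range(8):
            if byte & (1 << bit):
                members.append(chr(min_code_point + byte_index * 8 + bit))
    return "".join(members)


members = decode_grouping(G_V, MIN_CODE_POINT)
print(members)  # aeiouyóąę
```

Under that assumption every member is a single character (a, e, i, o, u, y, ó, ą, ę); no two-character sequence such as ia or ie appears in the table.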

How can I use digraphs in a stemmer?
Martin Porter | 16 Jul 2012 19:12

Re: Digraphs

On 7/16/12, Grushevskiy Dmitry <dgr <at> jooble.com> wrote:
> Please help me
>
> In Polish, digraphs are used like letters, but the Snowball compiler ignores them
>
> stringdef ia   hex '69 61'
. . . .
>
> define v 'a{a"}e{e"}o{o"}uy{ia}{ia"}{ie}{ie"}{io}{io"}{iu}i'
>

Dmitry,

It's because a stringdef, as you suppose, gives a name to a sequence
of characters, but a 'define' for a grouping defines a name which
stands for a set of single characters. So if {io} is a stringdef of a
pair of characters, define v '...{io}...' puts each member of the pair
into the group separately, not the digraph {io} as a unit. In other words,

stringdef io 'io'
define i_or_o 'io{io}'

puts i into i_or_o, then o, then {io}, which is just i and o again,
and so is the same as

define i_or_o 'io'
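
The same point can be modelled outside Snowball. The Python sketch below is an illustration only and says nothing about how the compiler is implemented: expand_stringdefs and define_grouping are hypothetical helpers that treat a grouping as the set of single characters left after every {name} has been replaced by its stringdef text.

```python
import re


def expand_stringdefs(literal, stringdefs):
    """Replace each {name} in a Snowball-style literal with its stringdef text."""
    return re.sub(r"\{([^}]*)\}", lambda m: stringdefs[m.group(1)], literal)


def define_grouping(literal, stringdefs):
    """Model a grouping as the set of single characters in the expanded literal."""
    return set(expand_stringdefs(literal, stringdefs))


stringdefs = {"io": "io"}
with_digraph = define_grouping("io{io}", stringdefs)
without_digraph = define_grouping("io", stringdefs)
print(with_digraph == without_digraph)  # True
```

Because the model keeps only single characters, adding {io} changes nothing: the pair is split into i and o, which are already members.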

It may be you don't need to worry too much about digraphs -- in
Spanish, ch and ll are digraphs, but that doesn't really affect
writing a stemmer. It might matter if you were counting letters, e.g.

