Richard Boulton | 28 Jul 09:22

Re: xapwrap to xappy migration


On Jul 25, 8:03 pm, dimazest <dimaz...@...> wrote:
> The problem is that it finds some documents, but I cannot get IDs of
> them. Any ideas how can i get IDs and other fields?

It very much depends on how xapwrap stores its data, but I'm afraid I
don't know anything about how it does that.

I suspect that it's going to be quite hard to make a database built
with xapwrap searchable with xappy.  Even if you can get the IDs and
other field prefix mappings set correctly (which would involve hacking
into the internals of xappy to directly set its prefix map - not a
very robust approach), the way in which text is indexed with xapwrap
is unlikely to be identical to xappy, which will lead to poor search
performance at best; often searches simply won't return the right
results.

Instead, I think you'd be much better off trying to build new indexes
from scratch with xappy.
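
To make the point above concrete, here is a toy sketch in plain Python. It does not use xapwrap or xappy, and the prefixes and normalisation rules are made up for illustration. It shows why patching the prefix map alone is not enough when two tools also process text differently.

```python
# Toy model only: nothing here reflects xapwrap's or xappy's real internals.

def make_terms(text, prefix, normalise):
    """Return the set of prefixed terms a toy indexer would produce for text."""
    return {prefix + normalise(word) for word in text.split()}

def tool_a_normalise(word):
    # Hypothetical tool A: lowercase only.
    return word.lower()

def tool_b_normalise(word):
    # Hypothetical tool B: lowercase plus crude plural stripping.
    word = word.lower()
    return word[:-1] if word.endswith("s") else word

TOOL_A_PREFIX = "S"    # assumed prefix tool A uses for titles
TOOL_B_PREFIX = "XA"   # assumed prefix tool B allocates for titles

# Terms written to the database by tool A at index time.
stored = make_terms("Wiki Pages", TOOL_A_PREFIX, tool_a_normalise)

# Tool B builds query terms with its own prefix and its own normalisation.
query_own_prefix = make_terms("pages", TOOL_B_PREFIX, tool_b_normalise)

# Even with the prefix patched by hand, the normalisation still differs.
query_patched_prefix = make_terms("pages", TOOL_A_PREFIX, tool_b_normalise)

print(stored & query_own_prefix)      # no overlap: the prefixes differ
print(stored & query_patched_prefix)  # still no overlap: 'Spage' was never stored
```

Rebuilding the index with a single tool avoids both mismatches, because the same code produces the terms at index time and at search time.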

--
Richard
dimazest | 28 Jul 12:57

Re: xapwrap to xappy migration


Hello,

On Jul 28, 9:22 am, Richard Boulton <boulton...@...> wrote:
> On Jul 25, 8:03 pm, dimazest <dimaz...@...> wrote:
>
>
> Instead, I think you'd be much better off trying to build new indexes
> from scratch with xappy.
>

We decided to build the index with xappy from scratch. Now I need to add
terms to fields::

            pdoc = connection.process(doc)
            pdoc.add_term('revision', 'XREV')
            pdoc.add_term('mimetype', 'T')
            pdoc.add_term('title', 'S')
            pdoc.add_term('fulltitle', 'XFT')
            pdoc.add_term('domain', 'XDOMAIN')

But I get
...
  File "/Volumes/RamDisk/moin/src/1.9-xapian-dmilajevs/MoinMoin/search/
Xapian.py", line 628, in _index_page_rev
    pdoc.add_term('revision', 'XREV')
  File "/Volumes/RamDisk/moin/src/1.9-xapian-dmilajevs/MoinMoin/
support/xappy/datastructures.py", line 116, in add_term
    prefix = self._fieldmappings.get_prefix(field)
  File "/Volumes/RamDisk/moin/src/1.9-xapian-dmilajevs/MoinMoin/

Richard Boulton | 28 Jul 15:49

Re: xapwrap to xappy migration

2009/7/28 dimazest <dimazest-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Hello,
>
> On Jul 28, 9:22 am, Richard Boulton <boulton...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> > On Jul 25, 8:03 pm, dimazest <dimaz...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >
> > Instead, I think you'd be much better off trying to build new indexes
> > from scratch with xappy.
>
> We decided to build the index with xappy from scratch. Now I need to add
> terms to fields::
>
>            pdoc = connection.process(doc)
>            pdoc.add_term('revision', 'XREV')
>            pdoc.add_term('mimetype', 'T')
>            pdoc.add_term('title', 'S')
>            pdoc.add_term('fulltitle', 'XFT')
>            pdoc.add_term('domain', 'XDOMAIN')
>
> Am I right that terms are added to the processed documents? Could you
> suggest some documentation describing terms?

You probably don't want to work at the term level at all. Instead, set up field actions on the database (via an IndexerConnection), create UnprocessedDocuments, and add them to the IndexerConnection. The terms will be generated from the text automatically.

See docs/introduction.rst for an introduction to the concepts.
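
As a rough picture of that workflow, here is a toy model in plain Python. ToyIndexer is a hypothetical stand-in written for this example, not xappy's implementation, and the field names and sample values are invented. The point is only that the caller declares what each field is for, and the indexer derives the prefixes and terms itself.

```python
import re

class ToyIndexer:
    """Conceptual stand-in for an indexer connection; not xappy's code."""

    def __init__(self):
        self.actions = {}    # field name -> "freetext" or "exact"
        self.prefixes = {}   # field name -> automatically allocated prefix
        self.postings = {}   # term -> set of document ids

    def add_field_action(self, field, action):
        self.actions[field] = action
        # The indexer allocates the prefix; the caller never chooses one.
        self.prefixes.setdefault(field, "X%d:" % len(self.prefixes))

    def add(self, doc_id, fields):
        for field, value in fields:
            prefix = self.prefixes[field]   # KeyError for unconfigured fields
            if self.actions[field] == "freetext":
                words = re.findall(r"\w+", value.lower())
            else:
                words = [value]             # exact fields keep the whole value
            for word in words:
                self.postings.setdefault(prefix + word, set()).add(doc_id)

    def search(self, field, word):
        return sorted(self.postings.get(self.prefixes[field] + word, ()))

indexer = ToyIndexer()
indexer.add_field_action("title", "freetext")
indexer.add_field_action("mimetype", "exact")

# Documents are plain (field, value) pairs; no terms are added by hand.
indexer.add("FrontPage/1", [("title", "Front Page"), ("mimetype", "text/plain")])
indexer.add("HelpContents/3", [("title", "Help Contents"), ("mimetype", "text/plain")])

print(indexer.search("title", "help"))
print(indexer.search("mimetype", "text/plain"))
```

In this model a search returns the document IDs directly, and a field that was never configured fails early in add() instead of silently producing unsearchable terms.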

-- 
Richard

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "xappy-discuss" group.
To post to this group, send email to xappy-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to xappy-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/xappy-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---

dimazest | 28 Jul 18:43

Re: xapwrap to xappy migration


Thank you for the reply.

Another question: do I need to care about stemming, or should I just set
the lang parameter for FREE_TEXT actions? Is it possible to set different
languages for different documents?

On Jul 28, 3:49 pm, Richard Boulton <rich...@...> wrote:
> 2009/7/28 dimazest <dimaz...@...>
>
>
>
> > Hello,
>
> > On Jul 28, 9:22 am, Richard Boulton <boulton...@...> wrote:
> > > On Jul 25, 8:03 pm, dimazest <dimaz...@...> wrote:
>
> > > Instead, I think you'd be much better off trying to build new indexes
> > > from scratch with xappy.
>
> > We decided to build index with xappy from scratch. Now I need to add
> > terms to fields::
>
> >            pdoc = connection.process(doc)
> >            pdoc.add_term('revision', 'XREV')
> >            pdoc.add_term('mimetype', 'T')
> >            pdoc.add_term('title', 'S')
> >            pdoc.add_term('fulltitle', 'XFT')
> >            pdoc.add_term('domain', 'XDOMAIN')
>
> > Am I right that terms are added to the processed documents? Could you
> > suggest some documentation describing terms.
>
> You probably don't want to work at the term level at all.  Instead, set up
> field actions on the database (via an IndexerConnection), create
> UnprocessedDocuments, and add the UnprocessedDocuments to an
> IndexerConnection.  The terms will be generated from the text automatically.
>
> See docs/introduction.rst for an introduction to the concepts.
>
> --
> Richard
Richard Boulton | 28 Jul 18:52

Re: xapwrap to xappy migration

2009/7/28 dimazest <dimazest-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> Thank you for the reply.
>
> Another question: do I need to care about stemming, or should I just set
> the lang parameter for FREE_TEXT actions? Is it possible to set different
> languages for different documents?

If you set the "language" parameter for free text actions, xappy will take care of stemming for you.

It's not really possible for different documents to have different languages. You can, however, give different fields different languages (so one field could be text_en, in English, and another could be text_fr, in French). If you do this, you'll need to decide which language to search in at query construction time and use the appropriate field (e.g., with query_parse(default_allow="text_fr")). You can't easily mix French and English queries, for example, because the stemming algorithm used at search time needs to be the same as the one applied to the field at index time.
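
Here is a small plain-Python sketch of why the stemmers have to match. The suffix rules below are a deliberately crude toy, not the stemmers xappy actually uses, and the helper names are invented for the example; only the text_en and text_fr field names come from the explanation above.

```python
# Toy suffix stripping; real stemmers are far more careful than this.
SUFFIXES = {
    "en": ("ing", "es", "s"),
    "fr": ("ements", "ement", "es", "s"),
}

# Each field is tied to exactly one language, fixed at index time.
FIELD_LANGUAGE = {"text_en": "en", "text_fr": "fr"}

def stem(word, language):
    """Strip the first matching suffix for the given language."""
    word = word.lower()
    for suffix in SUFFIXES[language]:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def index_terms(field, text):
    """Terms stored for a field, stemmed with that field's language."""
    language = FIELD_LANGUAGE[field]
    return {field + ":" + stem(word, language) for word in text.split()}

def query_terms(field, text, language=None):
    """Terms looked up at search time; language defaults to the field's own."""
    language = language or FIELD_LANGUAGE[field]
    return {field + ":" + stem(word, language) for word in text.split()}

stored = index_terms("text_fr", "changements rapides")

# Same stemmer at search time: the query term meets the stored term.
matching = query_terms("text_fr", "changement")

# English stemmer applied to the French field: the term is left unstemmed.
mismatched = query_terms("text_fr", "changement", language="en")

print(stored & matching)     # one shared term
print(stored & mismatched)   # nothing in common
```

So the language has to be chosen when the query is built, and the query then has to be routed to the field that was indexed with that language.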

-- 
Richard 
