Richard Heycock | 2 Jun 06:20 2009

search without flush.

Hi,

Is it possible to perform a search without flushing the index? I've got
an application that updates the index every 4 hours but I need to be
able to search the new data fairly quickly after the index is updated.
The problem revolves around the fact that the update is often much less
than 10 000 documents so it isn't being flushed until quite a bit
later. I realise I can do a flush after the documents have been updated
but I was trying to avoid the flush.

Is there a way of running a search after I've called add_document but
before flush is called?

rgh
Miki Tebeka | 2 Jun 20:36 2009

Re: search without flush.

Hello Richard,

> Is it possible to perform a search without flushing the index? I've got
> an application that updates the index every 4 hours but I need to be
> able to search the new data fairly quickly after the index is updated.
> The problem revolves around the fact that the update is often much less
> than 10 000 documents so it isn't being flushed until quite a bit
> later. I realise I can do a flush after the documents have been updated
> but I was trying to avoid the flush.
You can set the XAPIAN_FLUSH_THRESHOLD environment variable to a number
lower than that.
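
A minimal sketch of that suggestion in Python, assuming the Xapian Python
bindings are installed. The helper names (set_flush_threshold,
open_writable) and the path are hypothetical, and the sketch assumes the
variable is read when the writable database is opened, so it is set first.

```python
import os


def set_flush_threshold(threshold):
    """Ask Xapian to flush automatically after `threshold` changes.

    Assumption: the variable is read when the writable database is
    opened, so call this before opening it.
    """
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    os.environ["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)


def open_writable(path):
    """Open (or create) a writable database at `path`."""
    import xapian  # needs the Xapian Python bindings installed

    return xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN)


set_flush_threshold(1000)
# db = open_writable("/path/to/index")  # hypothetical path
```

Exporting the variable in the shell that starts the indexer should have
the same effect.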

All the best,
--
Miki Tebeka
miki <at> fattoc.com
Simon Roe | 3 Jun 11:40 2009

Re: search without flush.

On Tue, Jun 2, 2009 at 7:36 PM, Miki Tebeka <miki <at> fattoc.com> wrote:
> Hello Richard,
>
>> Is it possible to perform a search without flushing the index? I've got
>> an application that updates the index every 4 hours but I need to be
>> able to search the new data fairly quickly after the index is updated.
>> The problem revolves around the fact that the update is often much less
>> than 10 000 documents so it isn't being flushed until quite a bit
>> later. I realise I can do a flush after the documents have been updated
>> but I was trying to avoid the flush.
> You can set the XAPIAN_FLUSH_THRESHOLD to a number lower than that.

Or just manually flush the database after your indexer runs:

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#d0077acafa9485c97b73b8726c375732
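
A rough sketch of a manual flush at the end of an indexing run, assuming
the Python bindings. The function name index_batch is hypothetical; db is
expected to be a xapian.WritableDatabase and documents an iterable of
xapian.Document objects. Later Xapian releases name this method commit(),
so the sketch uses whichever is available.

```python
def index_batch(db, documents):
    """Add every document, then flush once so other readers see the batch."""
    docids = [db.add_document(doc) for doc in documents]
    # flush() was renamed commit() in later releases; use whichever exists.
    commit = getattr(db, "commit", None) or db.flush
    commit()
    return docids
```

One flush per update run keeps the cost to a single commit every 4 hours,
whatever the size of the batch.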

-- 
Help save the economy:
http://seriouschange.org.uk/

E: simon.roe <at> talusdesign.co.uk
M: 07742079314
Richard Boulton | 5 Jun 11:24 2009

Re: search without flush.

2009/6/2 Richard Heycock <rgh <at> roughage.com.au>:
> Is there a way of running a search after I've called add_document but
> before flush is called?

Yes - WritableDatabase is a subclass of Database, so you can pass it
to the Enquire object and use it to conduct a search.  If you do this,
the (possibly unflushed) version of the database accessible via the
WritableDatabase object will be used for the search, which will always
(by definition) contain the latest changes.  Note that you can only
use this from a single thread, though, so if you need concurrent
searches to be possible, you'll probably need to start flushing
faster, as the other responses discuss.
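
A small sketch of what this could look like with the Python bindings. The
function name search_unflushed and the single-term query are illustrative
assumptions, not part of Xapian's API.

```python
def search_unflushed(db, term, limit=10):
    """Return docids matching `term`, searching the writer's own view.

    `db` is the same xapian.WritableDatabase the indexer is writing to,
    so documents added but not yet flushed are included.
    """
    import xapian  # needs the Xapian Python bindings installed

    enquire = xapian.Enquire(db)  # a WritableDatabase is also a Database
    enquire.set_query(xapian.Query(term))
    return [match.docid for match in enquire.get_mset(0, limit)]


# Typical use, from the indexing thread only:
#   docid = db.add_document(doc)        # no flush yet
#   hits = search_unflushed(db, "hello")
```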

-- 
Richard
Olly Betts | 10 Jun 07:19 2009

Re: search without flush.

On Tue, Jun 02, 2009 at 02:20:03PM +1000, Richard Heycock wrote:
> Is it possible to perform a search without flushing the index? I've got
> an application that updates the index every 4 hours but I need to be
> able to search the new data fairly quickly after the index is updated.
> The problem revolves around the fact that the update is often much less
> than 10 000 documents so it isn't being flushed until quite a bit
> later. I realise I can do a flush after the documents have been updated
> but I was trying to avoid the flush.

If I can make an analogy here, think of loading a file into your text
editor and modifying it.  If you want to use the modified version,
either you need to save the file so that other processes can see it
(flush) or use the file in the text editor process (which is what
Richard Boulton suggests).

Hopefully this analogy illuminates why reading the unflushed changes
from other processes isn't possible - it's precisely the act of flushing
which makes them visible to those processes.

Cheers,
    Olly
Richard Heycock | 10 Jun 08:02 2009

Re: search without flush.

Excerpts from Olly Betts's message of Wed Jun 10 15:19:25 +1000 2009:
> On Tue, Jun 02, 2009 at 02:20:03PM +1000, Richard Heycock wrote:
> > Is it possible to perform a search without flushing the index? I've got
> > an application that updates the index every 4 hours but I need to be
> > able to search the new data fairly quickly after the index is updated.
> > The problem revolves around the fact that the update is often much less
> > than 10 000 documents so it isn't being flushed until quite a bit
> > later. I realise I can do a flush after the documents have been updated
> > but I was trying to avoid the flush.
> 
> If I can make an analogy here, think of loading a file into your text
> editor and modifying it.  If you want to use the modified version,
> either you need to save the file so that other processes can see it
> (flush) or use the file in the text editor process (which is what
> Richard Boulton suggests).
> 
> Hopefully this analogy illuminates why reading the unflushed changes
> from other processes isn't possible - it's precisely the act of flushing
> which makes them visible to those processes.

That does make sense! One other question then is what happens if the
program doing the updating crashes? Is the data lost?

rgh

> Cheers,
>     Olly
Olly Betts | 11 Jun 06:35 2009

Re: search without flush.

On Wed, Jun 10, 2009 at 04:02:07PM +1000, Richard Heycock wrote:
> That does make sense! One other question then is what happens if the
> program doing the updating crashes? Is the data lost?

Yes, the changes are only committed when flush() returns successfully.

Cheers,
    Olly
