Monsur Hossain | 1 Jun 2006 05:12

Re: preloading / "warming up" the index

The first time Lucene runs a query that sorts on a field, it builds a cache
of sort values (one value per document, plus a bit more if you are sorting
on strings), which takes a while.  Therefore, when our application first
starts up, we issue one query per sort type.  As I understand it, it doesn't
matter what the query is or how complicated it is.

Monsur
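
A minimal sketch of the warm-up idea above, written as a toy model rather than against the Lucene API. The class name SortCacheWarmup, the loader function, and the field names "date" and "price" are all assumptions for illustration; the real cache lives inside Lucene and is filled by running a sorted search.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Toy model of a per-field sort cache that is filled lazily on first use. */
public class SortCacheWarmup {
    // Hypothetical stand-in for the expensive step: read one sort value per document.
    private final Function<String, int[]> loader;
    private final Map<String, int[]> cache = new ConcurrentHashMap<>();

    public SortCacheWarmup(Function<String, int[]> loader) {
        this.loader = loader;
    }

    /** Returns the sort values for a field, loading them only on the first request. */
    public int[] sortValues(String field) {
        return cache.computeIfAbsent(field, loader);
    }

    /** Touches every sort field once so that no user query pays the load cost. */
    public void warmUp(List<String> sortFields) {
        for (String field : sortFields) {
            sortValues(field);
        }
    }

    /** True if the values for this field have already been loaded. */
    public boolean isWarm(String field) {
        return cache.containsKey(field);
    }

    public static void main(String[] args) {
        SortCacheWarmup cache = new SortCacheWarmup(field -> {
            int[] values = new int[1_000_000]; // pretend the index holds one million documents
            Arrays.fill(values, field.length());
            return values;
        });
        cache.warmUp(Arrays.asList("date", "price")); // one touch per sort type at startup
        System.out.println("date warm: " + cache.isWarm("date"));
        System.out.println("title warm: " + cache.isWarm("title"));
    }
}
```

The point of the model is that the cost is paid per sort field, not per query, so one cheap query per sort type at startup should be enough.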

On 5/31/06, Charles Mi <charlesmi <at> gmail.com> wrote:
> Is there a way to preload the index into memory when the process starts?
> Basically I want to warm up the index before processing user queries. What
> are some recommended ways to do this? Thanks.
>
>
Charles Mi | 1 Jun 2006 05:55

Re: preloading / "warming up" the index

Thanks for the advice, guys. I'm still not entirely clear on what a search
causes Lucene to do with respect to warming up or caching portions of the
index in memory.

If I warm up Lucene using a search for "apple", does Lucene load the entire
inverted index into memory, or just the part of the index that contains the
entry for "apple"?  Basically I'd like to make sure that the entire
inverted index (or as much of it as possible) is preloaded into memory, so
that a subsequent search for "microsoft" will be fast.  Does Lucene
have any mechanism for preloading the inverted index into memory?  Also, is
there a way to figure out what percentage of Lucene's data storage is
occupied by the inverted index, and what percentage is occupied by the other
info, such as the documents' stored field values?

Thanks!
Charles


Chris Hostetter | 1 Jun 2006 06:53

Re: preloading / "warming up" the index


: entry for "apple" ?   Basically I'd like to make sure that the entire
: inverted index (or as much as possible) is preloaded into memory, so if I

if you've got enough RAM, and you really want everything loaded into
memory, you can always use a RAMDirectory.

even if you want your index stored on disk for persistence, you can open a
RAMDirectory that mirrors an FSDirectory and use the RAMDirectory for
searching...

http://lucene.apache.org/java/docs/api/org/apache/lucene/store/RAMDirectory.html

...but i have no idea whether this is actually faster than just trusting
your OS's filesystem cache.

-Hoss
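
For the "trust the OS filesystem cache" route, one common approximation is to read the index files once at startup so the OS has a chance to keep them cached. The sketch below is plain Java, not a Lucene feature; the class name PageCachePrimer and the byte budget parameter are assumptions, and whether the pages actually stay cached is up to the OS and the amount of free memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageCachePrimer {
    /**
     * Reads every regular file in dir once, stopping after maxBytes in total.
     * Returns the number of bytes actually read.
     */
    public static long prime(Path dir, long maxBytes) throws IOException {
        byte[] buffer = new byte[1 << 20]; // 1 MiB read buffer; the contents are discarded
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                try (InputStream in = Files.newInputStream(file)) {
                    while (total < maxBytes) {
                        int want = (int) Math.min(buffer.length, maxBytes - total);
                        int n = in.read(buffer, 0, want);
                        if (n < 0) break; // end of this file
                        total += n;
                    }
                }
                if (total >= maxBytes) break; // budget used up
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args[0]);    // the index directory (assumed argument)
        long budget = Long.parseLong(args[1]); // bytes to read, chosen below free RAM
        System.out.println("read " + prime(indexDir, budget) + " bytes");
    }
}
```

The budget matters when the index is larger than memory: reading past what the OS can hold just evicts the pages read earlier.
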
Otis Gospodnetic | 1 Jun 2006 06:38

Re: preloading / "warming up" the index

Look in your index directory for a .tii file.  That file is read into RAM
(if there is enough of it; if there is not, you will see an OOM error).  What
Monsur was talking about is related to sorting and the warming up of
FieldCache instances.  If you don't sort your results by criteria other than
the default relevance, you can ignore FieldCache.

Any query should cause Lucene to read the whole .tii file into RAM.

If you do not see a .tii file in your index directory, and instead see one
or more .cfs files, you are using the compound index format.  Run
IndexReader as a Java app (e.g. java org.apache.lucene....IndexReader
/your/index/dir/file(?)) to get a listing of the individual index files
inside a single .cfs file.

Otis
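
To see how the on-disk size splits across file types, a small directory walk is enough when the index is not in compound format. This is a sketch in plain Java with no Lucene calls; the class name IndexSizeByExtension is an assumption. As far as I know, in the non-compound format .tii/.tis hold the term index and dictionary, .frq/.prx hold the postings, and .fdt/.fdx hold stored fields, so grouping by extension gives a rough answer to the percentage question.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class IndexSizeByExtension {
    /** Sums file sizes in dir, grouped by file extension (the text after the last dot). */
    public static Map<String, Long> sizes(Path dir) throws IOException {
        Map<String, Long> byExt = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "(none)" : name.substring(dot + 1);
                byExt.merge(ext, Files.size(file), Long::sum);
            }
        }
        return byExt;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> byExt = sizes(Paths.get(args[0])); // index directory (assumed argument)
        long total = 0;
        for (long size : byExt.values()) total += size;
        for (Map.Entry<String, Long> e : byExt.entrySet()) {
            double pct = total == 0 ? 0.0 : 100.0 * e.getValue() / total;
            System.out.printf("%-8s %12d bytes %6.1f%%%n", e.getKey(), e.getValue(), pct);
        }
    }
}
```

With a compound-format index this only shows the .cfs total, so the per-type breakdown would have to come from the listing of files inside the .cfs.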


Charles Mi | 1 Jun 2006 07:16

Re: preloading / "warming up" the index

Is there a way to preload portions of the other files, particularly .tis,
.frq, and .prx, into memory?  My total index size is roughly 4GB and we have
2GB of memory in the machine; the .tii file is tiny (about 1.5 MB).  Basically,
before my server starts accepting and handling queries, I'd like to load as
much of the index into memory as possible, so that Lucene doesn't always have
to hit the disk for each unique keyword, which is an order of magnitude
slower.  Does Lucene have this preloading feature?

If not, is there a way to approximate the preloading and get Lucene to cache
some of the disk data in memory?  I tried to warm up Lucene by searching
for 1000 random terms, but that doesn't seem to work.  After warming up,
searching for those exact same 1000 terms again was very fast, but querying
1000 *other* terms was just as slow as the warm-up queries.

Much thanks,
~Heng


Cheolgoo Kang | 1 Jun 2006 05:18

Re: preloading / "warming up" the index

Check this out.

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/%3c48b708490512102110s6913a4c3k1c2c152596e50e06 <at> mail.gmail.com%3e


-- 
Cheolgoo

