Olly Betts | 6 Mar 02:49

Merging stats from multiple databases for expand

In matcher/expandweight.cc we have:

OmExpandBits
operator+(const OmExpandBits &bits1, const OmExpandBits &bits2)
{
    OmExpandBits sum(bits1); 
    sum.multiplier += bits2.multiplier;
    sum.rtermfreq += bits2.rtermfreq;

    // FIXME - try to share this information rather than pick half of it
    if (bits2.dbsize > sum.dbsize) {
        DEBUGLINE(WTCALC, "OmExpandBits::operator+ using second operand: " <<
                  bits2.termfreq << "/" << bits2.dbsize << " instead of " <<
                  bits1.termfreq << "/" << bits1.dbsize);
        sum.termfreq = bits2.termfreq;
        sum.dbsize = bits2.dbsize;
    } else {
        DEBUGLINE(WTCALC, "OmExpandBits::operator+ using first operand: " <<
                  bits1.termfreq << "/" << bits1.dbsize << " instead of " <<
                  bits2.termfreq << "/" << bits2.dbsize);
        // sum already contains the parts of the first operand
    }
    return sum;
}

Why don't we "share this information" by just replacing the "if" by:

    sum.termfreq += bits2.termfreq;
    sum.dbsize += bits2.dbsize;
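
For comparison, here is a minimal sketch of the two merge strategies side by side. ExpandStats is a hypothetical stand-in for OmExpandBits, the field types are assumed rather than taken from expandweight.cc, and the function names are made up for illustration. Summing is presumably only meaningful if the databases hold disjoint sets of documents, so that no document is counted twice.

```cpp
// Hypothetical stand-in for OmExpandBits; the real class and its field
// types live in matcher/expandweight.cc and may differ.
struct ExpandStats {
    double multiplier;   // assumed type
    unsigned rtermfreq;  // assumed type
    unsigned termfreq;   // assumed type
    unsigned dbsize;     // assumed type
};

// Existing behaviour: multiplier and rtermfreq are added, but termfreq and
// dbsize are taken wholesale from whichever operand has the larger database.
inline ExpandStats merge_pick_larger(const ExpandStats &a, const ExpandStats &b)
{
    ExpandStats sum(a);
    sum.multiplier += b.multiplier;
    sum.rtermfreq += b.rtermfreq;
    if (b.dbsize > sum.dbsize) {
        sum.termfreq = b.termfreq;
        sum.dbsize = b.dbsize;
    }
    return sum;
}

// Proposed behaviour: add termfreq and dbsize as well, treating the two
// databases as one combined collection (assumes no shared documents).
inline ExpandStats merge_summed(const ExpandStats &a, const ExpandStats &b)
{
    ExpandStats sum(a);
    sum.multiplier += b.multiplier;
    sum.rtermfreq += b.rtermfreq;
    sum.termfreq += b.termfreq;
    sum.dbsize += b.dbsize;
    return sum;
}
```

With made-up figures of 10/100 for one database and 30/400 for the other, the first version reports 30/400 and discards the smaller database's counts, while the second reports 40/500.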


Richard Boulton | 6 Mar 10:19

Re: Merging stats from multiple databases for expand

Olly Betts wrote:
> In matcher/expandweight.cc we have:
> 
> OmExpandBits
> operator+(const OmExpandBits &bits1, const OmExpandBits &bits2)
> {
>     OmExpandBits sum(bits1); 
>     sum.multiplier += bits2.multiplier;
>     sum.rtermfreq += bits2.rtermfreq;
>     
>     // FIXME - try to share this information rather than pick half of it
>     if (bits2.dbsize > sum.dbsize) {
>         DEBUGLINE(WTCALC, "OmExpandBits::operator+ using second operand: " <<
>                   bits2.termfreq << "/" << bits2.dbsize << " instead of " <<
>                   bits1.termfreq << "/" << bits1.dbsize);
>         sum.termfreq = bits2.termfreq;
>         sum.dbsize = bits2.dbsize;
>     } else {
>         DEBUGLINE(WTCALC, "OmExpandBits::operator+ using first operand: " <<
>                   bits1.termfreq << "/" << bits1.dbsize << " instead of " <<
>                   bits2.termfreq << "/" << bits2.dbsize);
>         // sum already contains the parts of the first operand
>     }
>     return sum;
> }
> 
> Why don't we "share this information" by just replacing the "if" by:
> 
>     sum.termfreq += bits2.termfreq;
>     sum.dbsize += bits2.dbsize;

