17 Jul 2012 18:44
TermEnum.docFreq() includes deleted docs
Roman Chyla <roman.chyla <at> gmail.com>
2012-07-17 16:44:56 GMT
Hi,
Tests show that TermEnum.docFreq() returns the count of all docs
containing the term, including the deleted ones, which seems to
(indirectly) contradict the javadoc.
This frequency count is used to compute the uninverted index
(DocTermOrds.uninvert()). The code goes like:
final int df = te.docFreq();
if (df <= maxTermDocFreq) {
So, if I happen to have many deleted documents and maxTermDocFreq is
low, the term will be excluded (even if the frequency among the live
docs is OK). Most likely, the cache will be incomplete.
Can it be considered a feature? Or is it a bug?
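To make the concern concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class DocFreqSketch, its two methods, and the sample postings are hypothetical stand-ins. It only assumes that the raw frequency counts every doc in a term's postings while a "live" frequency would skip deleted ones, and it applies the same `df <= maxTermDocFreq` check to both.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names exist in Lucene.
final class DocFreqSketch {

    // Counts every doc in the postings, deleted or not
    // (the behaviour observed for docFreq() in the tests above).
    static int rawDocFreq(int[] postings) {
        return postings.length;
    }

    // Counts only the docs whose bit is set in liveDocs.
    static int liveDocFreq(int[] postings, BitSet liveDocs) {
        int count = 0;
        for (int docId : postings) {
            if (liveDocs.get(docId)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] postings = {0, 1, 2, 3, 4};   // term occurs in five docs
        BitSet liveDocs = new BitSet(5);
        liveDocs.set(0);                    // only docs 0 and 3 are live,
        liveDocs.set(3);                    // the other three are deleted
        int maxTermDocFreq = 3;             // assumed low threshold

        int raw = rawDocFreq(postings);
        int live = liveDocFreq(postings, liveDocs);

        // Same threshold check as in the quoted snippet.
        System.out.println("raw df=" + raw + " kept=" + (raw <= maxTermDocFreq));
        System.out.println("live df=" + live + " kept=" + (live <= maxTermDocFreq));
        // prints:
        // raw df=5 kept=false
        // live df=2 kept=true
    }
}
```

With these made-up numbers the term fails the check on the raw count but would pass it on the live count, which is the situation described above.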
Thanks,
roman