LUCENE-9565 Fix competitive iteration #1952
PR apache#1351 introduced a sort optimization where documents can be skipped. However, iteration over competitive iterators was not properly organized: the iterators did not store the current docID, so when a competitive iterator was updated, the current doc ID was lost. This patch fixes that. Relates to apache#1351
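To illustrate the failure mode, here is a minimal sketch. It is not the patch's actual code. DocIter, ArrayIter and CompetitiveIter are hypothetical, simplified stand-ins for Lucene's DocIdSetIterator and the sort's competitive iterator. The wrapper keeps its own current doc, so swapping in a narrower competitive iterator (which starts at -1) does not lose the position. If docID() were read from the swapped-in iterator instead, it would report -1 again.

```java
/** Simplified, hypothetical stand-in for Lucene's DocIdSetIterator. */
interface DocIter {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int docID();

  /** Moves to the first doc >= target and returns it. */
  int advance(int target);
}

/** Iterates over a sorted array of doc IDs (demo only). */
final class ArrayIter implements DocIter {
  private final int[] docs;
  private int idx = -1;

  ArrayIter(int... docs) {
    this.docs = docs;
  }

  public int docID() {
    if (idx < 0) return -1;
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  public int advance(int target) {
    // linear scan; always moves at least one position forward
    while (++idx < docs.length && docs[idx] < target) {}
    return docID();
  }
}

/** Hypothetical competitive iterator whose delegate is replaced as the queue fills up. */
final class CompetitiveIter implements DocIter {
  private DocIter in;
  private int doc = -1; // stored here, so it survives replacing 'in'

  CompetitiveIter(DocIter initial) {
    this.in = initial;
  }

  public int docID() {
    return doc;
  }

  public int advance(int target) {
    int current = in.docID();
    // reuse the delegate's doc if it already satisfies target, otherwise advance it
    return doc = current >= target ? current : in.advance(target);
  }

  int nextDoc() {
    return doc == NO_MORE_DOCS ? doc : advance(doc + 1);
  }

  /** Swaps in a narrower iterator; the current doc is kept in 'doc'. */
  void update(DocIter narrower) {
    this.in = narrower;
  }
}
```

After update(...), nextDoc() continues from the stored doc rather than restarting from the beginning of the new iterator.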
This patch works well with the checks in ConjunctionDISI, and all tests pass. After this PR is merged, we can reintroduce the checks in ConjunctionDISI.
```java
if (collectorIterator != null) {
  // If the scorer is already positioned, move the collector's iterator to the same doc
  if (scorerIterator.docID() != -1) {
    collectorIterator.advance(scorerIterator.docID());
  }
}
```
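The advance above puts collectorIterator on the same doc as scorerIterator before the two are intersected. Here is a minimal sketch of why that matters, assuming the ConjunctionDISI checks mentioned earlier reject sub-iterators that are not on the same doc. Conjunction is a hypothetical leapfrog intersection, not Lucene's ConjunctionDISI. DocIter and ArrayIter are simplified stand-ins for DocIdSetIterator.

```java
import java.util.List;

/** Simplified, hypothetical stand-in for Lucene's DocIdSetIterator. */
interface DocIter {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int docID();

  /** Moves to the first doc >= target and returns it. */
  int advance(int target);
}

/** Iterates over a sorted array of doc IDs (demo only). */
final class ArrayIter implements DocIter {
  private final int[] docs;
  private int idx = -1;

  ArrayIter(int... docs) {
    this.docs = docs;
  }

  public int docID() {
    if (idx < 0) return -1;
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  public int advance(int target) {
    while (++idx < docs.length && docs[idx] < target) {}
    return docID();
  }
}

/** Hypothetical leapfrog intersection that refuses misaligned sub-iterators. */
final class Conjunction implements DocIter {
  private final List<DocIter> iters;

  Conjunction(List<DocIter> iters) {
    int first = iters.get(0).docID();
    for (DocIter it : iters) {
      if (it.docID() != first) {
        throw new IllegalArgumentException("sub-iterators are not on the same doc");
      }
    }
    this.iters = iters;
  }

  public int docID() {
    return iters.get(0).docID();
  }

  public int advance(int target) {
    int doc = iters.get(0).advance(target); // the first iterator leads
    outer:
    while (doc != NO_MORE_DOCS) {
      for (DocIter it : iters) {
        int d = it.docID() < doc ? it.advance(doc) : it.docID();
        if (d > doc) { // some iterator skipped past doc: move the lead and retry
          doc = iters.get(0).advance(d);
          continue outer;
        }
      }
      return doc; // all iterators agree
    }
    return doc;
  }
}
```

In the usage below, the scorer is already on doc 3 while the collector's iterator is still at -1, so construction fails until the collector's iterator is advanced to the scorer's doc.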
I don't think this is good enough, as we might be advancing ahead of scorerIterator. This is why I thought we should instead wrap scorerIterator so that its initial docID would be -1.
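Here is a rough sketch of the wrapping idea under stated assumptions. MinusOneStart is a hypothetical wrapper, not Lucene code. It reports -1 until it is first advanced. On that first advance it reuses the doc the wrapped scorer is already sitting on instead of skipping past it. DocIter and ArrayIter are simplified stand-ins for DocIdSetIterator.

```java
/** Simplified, hypothetical stand-in for Lucene's DocIdSetIterator. */
interface DocIter {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int docID();

  /** Moves to the first doc >= target and returns it. */
  int advance(int target);
}

/** Iterates over a sorted array of doc IDs (demo only). */
final class ArrayIter implements DocIter {
  private final int[] docs;
  private int idx = -1;

  ArrayIter(int... docs) {
    this.docs = docs;
  }

  public int docID() {
    if (idx < 0) return -1;
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  public int advance(int target) {
    while (++idx < docs.length && docs[idx] < target) {}
    return docID();
  }
}

/** Hypothetical wrapper that hides the scorer's current position until first advanced. */
final class MinusOneStart implements DocIter {
  private final DocIter in;
  private int doc = -1; // what callers see, independent of in.docID()

  MinusOneStart(DocIter in) {
    this.in = in;
  }

  public int docID() {
    return doc;
  }

  public int advance(int target) {
    int current = in.docID();
    // the wrapped scorer may already sit on a doc that has not been consumed yet
    return doc = current >= target ? current : in.advance(target);
  }
}
```

Because the wrapper starts at -1, other iterators that also start at -1 can be combined with it without first being advanced to the scorer's doc.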
Oh, I had not considered setting `scorerIterator.docID()` as a min docID. Maybe this means that we no longer need the `min` parameter of `RangeDISIWrapper`?
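```java
// advance(target), excerpt: restrict the wrapped iterator to doc IDs below max
if (target >= max) {
  return docID = NO_MORE_DOCS; // target is past the end of the range
}
return docID = in.advance(target);
```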
It just occurred to me that this implementation is not correct when the minimum bound of the range of doc IDs to score is less than the current doc ID of the scorer. Have you seen any failures with your change? I wonder whether we would need to do
if (target >= scorer.docID()) { return scorer.docID(); }
but we should create a test that fails without this check.
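Here is a minimal sketch of a range wrapper with that guard. It reads the condition as target <= scorer.docID(), that is, the wrapped scorer is already at or past the target. RangeIter, DocIter and ArrayIter are hypothetical simplified stand-ins, not RangeDISIWrapper itself. The usage scores two consecutive windows over the same scorer, with the second window starting below the scorer's current doc. That is the situation that loses a doc: an unguarded in.advance(4) would return 7 and silently skip doc 5.

```java
/** Simplified, hypothetical stand-in for Lucene's DocIdSetIterator. */
interface DocIter {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int docID();

  /** Moves to the first doc >= target and returns it. */
  int advance(int target);
}

/** Iterates over a sorted array of doc IDs (demo only). */
final class ArrayIter implements DocIter {
  private final int[] docs;
  private int idx = -1;

  ArrayIter(int... docs) {
    this.docs = docs;
  }

  public int docID() {
    if (idx < 0) return -1;
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  public int advance(int target) {
    while (++idx < docs.length && docs[idx] < target) {}
    return docID();
  }
}

/** Hypothetical window [min, max) over a shared scorer iterator. */
final class RangeIter implements DocIter {
  private final DocIter in;
  private final int min;
  private final int max;
  private int doc = -1;

  RangeIter(DocIter in, int min, int max) {
    this.in = in;
    this.min = min;
    this.max = max;
  }

  public int docID() {
    return doc;
  }

  public int advance(int target) {
    target = Math.max(target, min); // never start before the window
    if (target >= max) {
      return doc = NO_MORE_DOCS;
    }
    int current = in.docID();
    // guard: the scorer may already sit on an unscored doc at or after target
    int d = current >= target ? current : in.advance(target);
    return doc = d >= max ? NO_MORE_DOCS : d; // docs past max are left for the next window
  }
}
```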
Indeed, the recent failures on Lucene are caused by this. `ReqExclBulkScorer` calls the `score` method several times, sometimes with a minimum bound that is <= `scorer.docID()`.
The logic is becoming more complex, so I am thinking of going back to the initial commit and advancing `collectorIterator` to the same doc as `scorerIterator`. At the beginning, `collectorIterator` matches all docs, so it should advance precisely to `scorerIterator.docID()`.
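To make the last point concrete, here is a tiny sketch. It assumes the collector's iterator initially behaves like a hypothetical AllDocsIter over [0, maxDoc). Because every doc matches, advance(target) lands exactly on target, so advancing it to scorerIterator.docID() leaves both iterators on the same doc.

```java
/** Hypothetical iterator matching every doc in [0, maxDoc), like a collector iterator before any pruning. */
final class AllDocsIter {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int maxDoc;
  private int doc = -1;

  AllDocsIter(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  int docID() {
    return doc;
  }

  int advance(int target) {
    // every doc matches, so the first match >= target is target itself, unless past the end
    return doc = target < maxDoc ? target : NO_MORE_DOCS;
  }
}
```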
By this
if (target >= scorer.docID()) { return scorer.docID(); }
did you mean if (target <= scorer.docID()) { return scorer.docID(); }?