LUCENE-9565 Fix competitive iteration #1952

Merged

mayya-sharipova
Contributor

@mayya-sharipova mayya-sharipova commented Oct 6, 2020

PR #1351 introduced a sort optimization where documents can be skipped.
But iteration over competitive iterators was not handled correctly:
they did not store the current doc ID, so
when a competitive iterator was updated, the current doc ID was lost.

This patch fixes it.

Relates to #1351
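
To make the problem concrete, here is a minimal sketch of how a position can be lost when an iterator is swapped, and one way a wrapper could keep it. All class names below are hypothetical stand-ins written for this illustration; they are not Lucene classes and not the code in this patch.

```java
// Illustration only: every type here is a hypothetical stand-in, not Lucene code.
final class LostPositionDemo {
  public static void main(String[] args) {
    // Naive swap: the replacement iterator starts unpositioned.
    SketchIterator naive = new ArraySketchIterator(1, 3, 5, 7);
    naive.nextDoc(); // 1
    naive.nextDoc(); // 3
    naive = new ArraySketchIterator(3, 5, 9); // "updated" competitive iterator
    System.out.println(naive.docID()); // -1: the position at doc 3 is gone

    // Wrapper: remembers the current doc ID across updates.
    UpdatableSketchIterator kept =
        new UpdatableSketchIterator(new ArraySketchIterator(1, 3, 5, 7));
    kept.nextDoc(); // 1
    kept.nextDoc(); // 3
    kept.update(new ArraySketchIterator(3, 5, 9));
    System.out.println(kept.docID());   // 3: position survives the update
    System.out.println(kept.nextDoc()); // 5: continues after doc 3
  }
}

// Hypothetical doc ID iterator contract, loosely modelled on a forward-only iterator.
abstract class SketchIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  abstract int docID();
  abstract int advance(int target);
  int nextDoc() { return advance(docID() + 1); }
}

// Iterates over a sorted array of doc IDs, starting unpositioned at -1.
final class ArraySketchIterator extends SketchIterator {
  private final int[] docs;
  private int idx = -1;
  private int doc = -1;

  ArraySketchIterator(int... docs) { this.docs = docs; }

  @Override int docID() { return doc; }

  @Override int advance(int target) {
    idx++; // always move past the current entry
    while (idx < docs.length && docs[idx] < target) idx++;
    return doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }
}

// Keeps its own doc ID so that swapping the delegate does not reset the position.
final class UpdatableSketchIterator extends SketchIterator {
  private SketchIterator in;
  private int doc = -1;

  UpdatableSketchIterator(SketchIterator in) { this.in = in; }

  void update(SketchIterator replacement) { in = replacement; } // doc is kept

  @Override int docID() { return doc; }

  @Override int advance(int target) { return doc = in.advance(target); }
}
```

The wrapper relies on callers only ever advancing past the stored doc ID, so a freshly swapped-in delegate is always moved forward to the right place on the next call.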

@mayya-sharipova
Contributor Author

This patch works well with checks in ConjunctionDISI, and all tests pass as well.

After we merge this PR, we can reintroduce the checks in ConjunctionDISI.

if (collectorIterator != null) {
  if (scorerIterator.docID() != -1) {
    collectorIterator.advance(scorerIterator.docID());
  }
}
Contributor

I don't think this is good enough, as we might be advancing ahead of scorerIterator. This is why I thought we should instead wrap scorerIterator in such a way that its initial docID would be -1.

Contributor Author

@jpountz Addressed in cba6cf7

Contributor

Oh, I had not considered setting scorerIterator.docID() as a min docID. Maybe this means that we no longer need the min parameter of RangeDISIWrapper?

Contributor Author

Thanks @jpountz , addressed in d42c464

@mayya-sharipova mayya-sharipova merged commit 874c446 into apache:master Oct 6, 2020
@mayya-sharipova mayya-sharipova deleted the competitive-iteration branch October 6, 2020 17:22
mayya-sharipova added a commit that referenced this pull request Oct 6, 2020

if (target >= max) {
  return docID = NO_MORE_DOCS;
}
return docID = in.advance(target);
Contributor

It just occurred to me that this implementation is not correct when the minimum bound of the range of doc IDs to score is less than the current doc ID of the scorer. Have you seen any failures with your change? I wonder whether we would need to do

if (target >= scorer.docID()) { return scorer.docID(); }

but we should create a test that fails without this

Contributor Author

@mayya-sharipova mayya-sharipova Oct 6, 2020

Indeed, the recent failures on Lucene are because of this. ReqExclBulkScorer calls the score method several times, sometimes with a minimum bound that is <= scorer.docID().

I think the logic is becoming more complicated. I am thinking of going back to the initial commit and advancing collectorIterator to the same doc as scorerIterator. At the beginning, a collectorIterator matches all docs, so it should advance precisely to `scorerIterator.docID()`.
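
A minimal sketch of that idea, assuming the collector's iterator initially matches every doc in [0, maxDoc). The MatchAllIterator class and the align helper are hypothetical names for this illustration, not Lucene code.

```java
// Illustration only: hypothetical stand-ins, not Lucene classes.
final class AlignToScorerDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Hypothetical iterator that matches every doc ID in [0, maxDoc).
  static final class MatchAllIterator {
    private final int maxDoc;
    private int doc = -1;

    MatchAllIterator(int maxDoc) { this.maxDoc = maxDoc; }

    int docID() { return doc; }

    // Every doc matches, so advancing lands exactly on the target (if in range).
    int advance(int target) {
      return doc = target >= maxDoc ? NO_MORE_DOCS : target;
    }
  }

  // Positions a fresh match-all collector iterator on the scorer's current doc.
  static int align(int scorerDoc, int maxDoc) {
    MatchAllIterator collectorIterator = new MatchAllIterator(maxDoc);
    if (scorerDoc != -1) { // an unpositioned scorer needs no alignment
      collectorIterator.advance(scorerDoc);
    }
    return collectorIterator.docID();
  }

  public static void main(String[] args) {
    System.out.println(align(7, 100));  // 7: both iterators on the same doc
    System.out.println(align(-1, 100)); // -1: both still unpositioned
  }
}
```

Because a match-all iterator never skips, the advance cannot overshoot the scorer, which is the property the paragraph above depends on.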

Contributor Author

@jpountz I've created a new PR: #1955. So sorry for the mess.

Contributor Author

@jpountz

By this if (target >= scorer.docID()) { return scorer.docID(); }

Did you mean if (target <= scorer.docID()) { return scorer.docID(); } ?
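
To illustrate the `<=` reading, here is a minimal sketch under the assumption that the wrapped scorer may already be positioned past the requested minimum bound. The classes are hypothetical stand-ins, not the actual RangeDISIWrapper, and the bound check is simplified.

```java
// Illustration only: hypothetical stand-ins, not Lucene's RangeDISIWrapper.
final class RangeGuardDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Hypothetical scorer iterator over a sorted array of doc IDs.
  static final class ArrayScorerIterator {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayScorerIterator(int... docs) { this.docs = docs; }

    int docID() { return doc; }

    int advance(int target) {
      idx++; // always moves past the current doc, as a forward-only iterator does
      while (idx < docs.length && docs[idx] < target) idx++;
      return doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
  }

  // Restricts a scorer iterator to doc IDs below max.
  static final class RangeWrapper {
    private final ArrayScorerIterator in;
    private final int max;
    private final boolean guarded;
    private int docID = -1;

    RangeWrapper(ArrayScorerIterator in, int max, boolean guarded) {
      this.in = in;
      this.max = max;
      this.guarded = guarded;
    }

    int advance(int target) {
      int candidate;
      if (guarded && target <= in.docID()) {
        candidate = in.docID(); // scorer is already at or past target: stay put
      } else {
        candidate = in.advance(target); // unguarded: may skip the current doc
      }
      return docID = candidate >= max ? NO_MORE_DOCS : candidate;
    }
  }

  // Scorer over {2, 5, 8} already sits on doc 5; the range to score is [3, 10).
  static int firstDocInRange(boolean guarded) {
    ArrayScorerIterator scorer = new ArrayScorerIterator(2, 5, 8);
    scorer.advance(5);
    return new RangeWrapper(scorer, 10, guarded).advance(3);
  }

  public static void main(String[] args) {
    System.out.println(firstDocInRange(true));  // 5: current doc is kept
    System.out.println(firstDocInRange(false)); // 8: doc 5 was skipped
  }
}
```

In this toy setup the unguarded variant silently skips doc 5, which is the kind of failure a regression test for this case would need to catch.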


2 participants