LUCENE-9565 Fix competitive iteration #1952
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -204,9 +204,17 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr | |
collector.setScorer(scorer); | ||
DocIdSetIterator scorerIterator = twoPhase == null ? iterator : twoPhase.approximation(); | ||
DocIdSetIterator collectorIterator = collector.competitiveIterator(); | ||
- // if possible filter scorerIterator to keep only competitive docs as defined by collector | ||
- DocIdSetIterator filteredIterator = collectorIterator == null ? scorerIterator : | ||
- ConjunctionDISI.intersectIterators(Arrays.asList(scorerIterator, collectorIterator)); | ||
+ DocIdSetIterator filteredIterator; | ||
+ if (collectorIterator == null) { | ||
+ filteredIterator = scorerIterator; | ||
+ } else { | ||
+ if (scorerIterator.docID() != -1) { | ||
+ // Wrap ScorerIterator to start from -1 for conjunction | ||
+ scorerIterator = new RangeDISIWrapper(scorerIterator, max); | ||
+ } | ||
+ // filter scorerIterator to keep only competitive docs as defined by collector | ||
+ filteredIterator = ConjunctionDISI.intersectIterators(Arrays.asList(scorerIterator, collectorIterator)); | ||
+ } | ||
if (filteredIterator.docID() == -1 && min == 0 && max == DocIdSetIterator.NO_MORE_DOCS) { | ||
scoreAll(collector, filteredIterator, twoPhase, acceptDocs); | ||
return DocIdSetIterator.NO_MORE_DOCS; | ||
|
@@ -266,4 +274,45 @@ static void scoreAll(LeafCollector collector, DocIdSetIterator iterator, TwoPhas | |
} | ||
} | ||
|
||
/** | ||
* Wraps an internal docIdSetIterator for it to start with docID = -1 | ||
*/ | ||
protected static class RangeDISIWrapper extends DocIdSetIterator { | ||
private final DocIdSetIterator in; | ||
private final int min; | ||
private final int max; | ||
private int docID = -1; | ||
|
||
public RangeDISIWrapper(DocIdSetIterator in, int max) { | ||
this.in = in; | ||
this.min = in.docID(); | ||
this.max = max; | ||
} | ||
|
||
@Override | ||
public int docID() { | ||
return docID; | ||
} | ||
|
||
@Override | ||
public int nextDoc() throws IOException { | ||
return advance(docID + 1); | ||
} | ||
|
||
@Override | ||
public int advance(int target) throws IOException { | ||
target = Math.max(min, target); | ||
if (target >= max) { | ||
return docID = NO_MORE_DOCS; | ||
} | ||
return docID = in.advance(target); | ||
It just occurred to me that this implementation is not correct when the minimum bound of the range of doc IDs to score is less than the current doc ID of the scorer. Have you seen any failures with your change? I wonder whether we would need to do
but we should create a test that fails without this.
Indeed, the recent failures on Lucene are because of this. The logic is becoming more difficult. I am thinking of going back to the initial commit and advancing collectorIterator to the same doc as
Did you mean if (target <= scorer.docID()) { return scorer.docID(); } ?
||
} | ||
|
||
@Override | ||
public long cost() { | ||
return Math.min(max - min, in.cost()); | ||
} | ||
|
||
} | ||
|
||
} |
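The change above filters the scorer's iterator by intersecting it with the collector's competitive iterator. As a rough illustration of what such an intersection does, here is a simplified leapfrog over two sorted doc ID streams. `LeapfrogSketch` and `Cursor` are hypothetical names invented for this sketch, not Lucene classes, and this is not Lucene's `ConjunctionDISI` code. The sketch assumes both cursors start unpositioned at -1, which is the precondition the wrapper in this diff tries to restore for an already-positioned scorer iterator.

```java
import java.util.ArrayList;
import java.util.List;

public class LeapfrogSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical forward-only cursor over sorted doc IDs; starts before the first doc (-1). */
  static final class Cursor {
    private final int[] docs;
    private int idx = -1;

    Cursor(int... docs) {
      this.docs = docs;
    }

    int docID() {
      if (idx < 0) return -1;
      return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    /** Moves to the first doc >= target; a no-op if the cursor is already there or beyond. */
    int advance(int target) {
      while (docID() < target) {
        idx++;
      }
      return docID();
    }
  }

  /** Collects the doc IDs present in both cursors by letting each one skip to the other's position. */
  static List<Integer> intersect(Cursor lead, Cursor other) {
    List<Integer> out = new ArrayList<>();
    int doc = lead.advance(0);
    while (doc != NO_MORE_DOCS) {
      int otherDoc = other.advance(doc);
      if (otherDoc == doc) {
        out.add(doc);                 // both cursors agree: a match
        doc = lead.advance(doc + 1);
      } else {
        doc = lead.advance(otherDoc); // skip the lead past the gap
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Cursor scorer = new Cursor(1, 3, 5, 7, 9);
    Cursor competitive = new Cursor(3, 4, 7, 10);
    System.out.println(intersect(scorer, competitive)); // prints [3, 7]
  }
}
```

If one cursor were already positioned past docs that the other has not yet visited, a leapfrog like this could not tell whether those docs were skipped on purpose, which is one way to read the concern raised in the review below.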
I don't think that this is good enough as we might be advancing ahead of scorerIterator? This was why I thought that we should instead wrap scorerIterator in such a way that its initial docID would be -1.
@jpountz Addressed in cba6cf7
Oh, I had not considered setting scorerIterator.docID() as a min docID. Maybe this means that we no longer need the min parameter of RangeDISIWrapper?
Thanks @jpountz, addressed in d42c464
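The guard suggested in the thread (`if (target <= scorer.docID()) { return scorer.docID(); }`) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: `DocIterator`, `ArrayIterator`, and `GuardedRangeWrapper` are hypothetical names made up for this example, not Lucene classes, and the sketch is not necessarily what the referenced commits implemented. The stand-in iterator throws when asked to advance to a target at or before its current doc, to make visible a call that Lucene's `DocIdSetIterator.advance` documents as having undefined behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedRangeSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for a doc ID iterator; not Lucene's DocIdSetIterator. */
  interface DocIterator {
    int docID();

    int advance(int target); // target must be greater than docID()
  }

  /** Iterates a sorted array and rejects targets that are not beyond the current doc. */
  static final class ArrayIterator implements DocIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayIterator(int... docs) {
      this.docs = docs;
    }

    public int docID() {
      if (idx < 0) return -1;
      return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    public int advance(int target) {
      if (target <= docID()) {
        throw new IllegalStateException("target " + target + " is not beyond doc " + docID());
      }
      do {
        idx++;
      } while (idx < docs.length && docs[idx] < target);
      return docID();
    }
  }

  /** Presents an already-positioned iterator as if it started at -1, bounded above by max. */
  static final class GuardedRangeWrapper implements DocIterator {
    private final DocIterator in;
    private final int max;
    private int doc = -1;

    GuardedRangeWrapper(DocIterator in, int max) {
      this.in = in;
      this.max = max;
    }

    public int docID() {
      return doc;
    }

    public int advance(int target) {
      int current = in.docID();
      // The guard: never ask the wrapped iterator to move to a target it has already reached.
      int candidate = target <= current ? current : in.advance(target);
      return doc = candidate >= max ? NO_MORE_DOCS : candidate;
    }
  }

  /** Collects every doc the iterator still returns, starting just after its current position. */
  static List<Integer> drain(DocIterator it) {
    List<Integer> out = new ArrayList<>();
    for (int d = it.advance(it.docID() + 1); d != NO_MORE_DOCS; d = it.advance(d + 1)) {
      out.add(d);
    }
    return out;
  }

  public static void main(String[] args) {
    ArrayIterator scorer = new ArrayIterator(2, 5, 9, 14);
    scorer.advance(5); // the scorer is already positioned on doc 5
    System.out.println(drain(new GuardedRangeWrapper(scorer, 12))); // prints [5, 9]
  }
}
```

Without the guard, the first call on the wrapper would forward a target at or before doc 5 to the wrapped iterator, which is the situation the review comment flags.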