LUCENE-9565 Fix competitive iteration #1952

Merged
55 changes: 52 additions & 3 deletions lucene/core/src/java/org/apache/lucene/search/Weight.java
@@ -204,9 +204,17 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
       collector.setScorer(scorer);
       DocIdSetIterator scorerIterator = twoPhase == null ? iterator : twoPhase.approximation();
       DocIdSetIterator collectorIterator = collector.competitiveIterator();
-      // if possible filter scorerIterator to keep only competitive docs as defined by collector
-      DocIdSetIterator filteredIterator = collectorIterator == null ? scorerIterator :
-          ConjunctionDISI.intersectIterators(Arrays.asList(scorerIterator, collectorIterator));
+      DocIdSetIterator filteredIterator;
+      if (collectorIterator == null) {
+        filteredIterator = scorerIterator;
+      } else {
+        if (scorerIterator.docID() != -1) {
+          // Wrap ScorerIterator to start from -1 for conjunction
+          scorerIterator = new RangeDISIWrapper(scorerIterator, max);
+        }
Contributor

I don't think that this is good enough, as we might be advancing ahead of scorerIterator. This is why I thought that we should instead wrap scorerIterator in such a way that its initial docID would be -1.

Contributor Author

@jpountz Addressed in cba6cf7

Contributor

Oh, I had not considered using scorerIterator.docID() as a min docID. Maybe this means that we no longer need the min parameter of RangeDISIWrapper?

Contributor Author

Thanks @jpountz , addressed in d42c464

+        // filter scorerIterator to keep only competitive docs as defined by collector
+        filteredIterator = ConjunctionDISI.intersectIterators(Arrays.asList(scorerIterator, collectorIterator));
+      }
       if (filteredIterator.docID() == -1 && min == 0 && max == DocIdSetIterator.NO_MORE_DOCS) {
         scoreAll(collector, filteredIterator, twoPhase, acceptDocs);
         return DocIdSetIterator.NO_MORE_DOCS;
@@ -266,4 +274,45 @@ static void scoreAll(LeafCollector collector, DocIdSetIterator iterator, TwoPhas
       }
     }

+    /**
+     * Wraps an internal docIdSetIterator for it to start with docID = -1
+     */
+    protected static class RangeDISIWrapper extends DocIdSetIterator {
+      private final DocIdSetIterator in;
+      private final int min;
+      private final int max;
+      private int docID = -1;
+
+      public RangeDISIWrapper(DocIdSetIterator in, int max) {
+        this.in = in;
+        this.min = in.docID();
+        this.max = max;
+      }
+
+      @Override
+      public int docID() {
+        return docID;
+      }
+
+      @Override
+      public int nextDoc() throws IOException {
+        return advance(docID + 1);
+      }
+
+      @Override
+      public int advance(int target) throws IOException {
+        target = Math.max(min, target);
+        if (target >= max) {
+          return docID = NO_MORE_DOCS;
+        }
+        return docID = in.advance(target);
Contributor

It just occurred to me that this implementation is not correct when the minimum bound of the range of doc IDs to score is less than the current doc ID of the scorer. Have you seen any failures with your change? I wonder whether we would need to do

if (target >= scorer.docID()) { return scorer.docID(); }

but we should create a test that fails without this.

Contributor Author

@mayya-sharipova Oct 6, 2020

Indeed, the recent failures on Lucene are because of this. ReqExclBulkScorer calls the `score` method several times, and sometimes with a minimum bound that is <= scorer.docID().

I think the logic is becoming too complicated. I am thinking of going back to the initial commit and advancing collectorIterator to the same doc as scorerIterator. At the beginning, a collectorIterator matches all docs, so it should advance precisely to `scorerIterator.docID()`.
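
The alignment idea in this comment, and the earlier concern about "advancing ahead of scorerIterator", can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ArrayIter`, `alignsExactly`, and the sample doc IDs are hypothetical stand-ins, not Lucene classes, and `advance` is modeled as always moving beyond the current position. It suggests that aligning lands exactly on the scorer's doc when the collector-side iterator matches all docs, and overshoots when it does not.

```java
import java.util.stream.IntStream;

public class AlignSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Array-backed stand-in for a doc ID iterator (hypothetical, not a Lucene class). */
    static final class ArrayIter {
        private final int[] docs;
        private int idx = -1;

        ArrayIter(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target that lies beyond the current position. */
        int advance(int target) {
            idx = Math.min(idx + 1, docs.length);
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }
    }

    /** Advances the collector-side iterator to the scorer's doc; true if it lands exactly there. */
    static boolean alignsExactly(ArrayIter scorer, ArrayIter collector) {
        return collector.advance(scorer.docID()) == scorer.docID();
    }

    /** A scorer over docs {2, 5, 8} that an earlier call has already positioned on doc 5. */
    static ArrayIter scorerOnDoc5() {
        ArrayIter scorer = new ArrayIter(2, 5, 8);
        scorer.advance(5);
        return scorer;
    }

    static boolean matchAllCase() {
        ArrayIter matchAll = new ArrayIter(IntStream.range(0, 10).toArray());
        return alignsExactly(scorerOnDoc5(), matchAll);
    }

    static boolean sparseCase() {
        // A collector-side iterator that no longer matches doc 5 lands on 7 instead.
        return alignsExactly(scorerOnDoc5(), new ArrayIter(3, 7));
    }

    public static void main(String[] args) {
        System.out.println("match-all collector aligns: " + matchAllCase()); // true
        System.out.println("sparse collector aligns: " + sparseCase());      // false
    }
}
```

In this model the approach holds only while the collector-side iterator still matches every doc, which is the condition the comment relies on.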

Contributor Author

@jpountz I've created a new PR: #1955. So sorry for the mess.

Contributor Author

@jpountz

By this if (target >= scorer.docID()) { return scorer.docID(); }

Did you mean if (target <= scorer.docID()) { return scorer.docID(); } ?
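
The case discussed in this thread, where the wrapper is asked for a target at or below the doc the wrapped iterator already sits on, can be reproduced in a simplified standalone model. The sketch below is an assumption-laden illustration, not the PR's code: `DocIter`, `ArrayIter`, `Wrapper`, and `firstDoc` are hypothetical names, and the wrapped iterator is modeled so that `advance` always moves beyond its current doc. The `guarded` flag applies the `<=` form of the check asked about above.

```java
public class RangeWrapperSketch {
    /** Simplified stand-in for a doc ID iterator (hypothetical, not Lucene's DocIdSetIterator). */
    interface DocIter {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int docID();
        int advance(int target);
    }

    /** Array-backed iterator whose advance() always moves beyond the current doc. */
    static final class ArrayIter implements DocIter {
        private final int[] docs;
        private int idx = -1;

        ArrayIter(int... docs) { this.docs = docs; }

        public int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            idx = Math.min(idx + 1, docs.length); // step past the current position first
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }
    }

    /** Mirrors the shape of the wrapper in the diff; 'guarded' adds the <= check. */
    static final class Wrapper implements DocIter {
        private final DocIter in;
        private final int min;
        private final int max;
        private final boolean guarded;
        private int docID = -1;

        Wrapper(DocIter in, int max, boolean guarded) {
            this.in = in;
            this.min = in.docID();
            this.max = max;
            this.guarded = guarded;
        }

        public int docID() { return docID; }

        public int advance(int target) {
            target = Math.max(min, target);
            if (target >= max) return docID = NO_MORE_DOCS;
            // Guard: the wrapped iterator already sits at or past the target, so reuse its doc.
            if (guarded && target <= in.docID()) return docID = in.docID();
            return docID = in.advance(target);
        }
    }

    /** First doc reported by a wrapper around a scorer already positioned on doc 5. */
    static int firstDoc(boolean guarded) {
        ArrayIter scorer = new ArrayIter(2, 5, 8);
        scorer.advance(5);
        return new Wrapper(scorer, 100, guarded).advance(0);
    }

    public static void main(String[] args) {
        System.out.println("unguarded first doc: " + firstDoc(false)); // 8, doc 5 is skipped
        System.out.println("guarded first doc: " + firstDoc(true));    // 5
    }
}
```

Under these assumptions, the unguarded wrapper silently drops the doc the scorer was positioned on, which is the kind of behavior a regression test for this change would need to catch.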

+      }
+
+      @Override
+      public long cost() {
+        return Math.min(max - min, in.cost());
+      }
+
+    }
+
   }
@@ -133,14 +133,16 @@ public DocIdSetIterator competitiveIterator() {
         return null;
       } else {
         return new DocIdSetIterator() {
+          private int docID = -1;
+
           @Override
           public int nextDoc() throws IOException {
-            return competitiveIterator.nextDoc();
+            return advance(docID + 1);
           }

           @Override
           public int docID() {
-            return competitiveIterator.docID();
+            return docID;
           }

           @Override
@@ -150,7 +152,7 @@ public long cost() {

           @Override
           public int advance(int target) throws IOException {
-            return competitiveIterator.advance(target);
+            return docID = competitiveIterator.advance(target);
           }
         };
       }
@@ -220,14 +220,16 @@ public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue
     public DocIdSetIterator competitiveIterator() {
       if (enableSkipping == false) return null;
       return new DocIdSetIterator() {
+        private int docID = -1;
+
         @Override
         public int nextDoc() throws IOException {
-          return competitiveIterator.nextDoc();
+          return advance(docID + 1);
         }

         @Override
         public int docID() {
-          return competitiveIterator.docID();
+          return docID;
         }

         @Override
@@ -237,7 +239,7 @@ public long cost() {

         @Override
         public int advance(int target) throws IOException {
-          return competitiveIterator.advance(target);
+          return docID = competitiveIterator.advance(target);
         }
       };
     }