LUCENE-9565 Fix competitive iteration #1952

mayya-sharipova · 2020-10-06T14:29:12Z

PR #1351 introduced a sort optimization where documents can be skipped.
But iteration over competitive iterators was not properly organized:
the iterators did not store the current docID, so
when a competitive iterator was updated, the current doc ID was lost.

This patch fixes it.

Relates to #1351
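To illustrate the problem described above, here is a minimal sketch (not the patch itself) of a competitive iterator that keeps its own doc ID. It assumes a simplified, hypothetical `DocIter` interface modeled loosely on Lucene's `DocIdSetIterator`; `ArrayIter`, `CompetitiveIter` and `update` are illustrative names, not Lucene API. Because `nextDoc()` is expressed as `advance(docID + 1)`, swapping in a freshly built iterator that starts at -1 does not rewind the iteration.

```java
final class CompetitiveIterSketch {

  /** Hypothetical, simplified stand-in for a doc ID iterator. */
  interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    /** Moves to the first doc >= target; callers should pass target > docID(). */
    int advance(int target);
  }

  /** Iterates over a sorted array of doc IDs; it always moves forward. */
  static final class ArrayIter implements DocIter {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayIter(int... docs) { this.docs = docs; }

    public int docID() { return doc; }
    public int nextDoc() { return advance(doc + 1); }
    public int advance(int target) {
      while (++idx < docs.length) {
        if (docs[idx] >= target) {
          return doc = docs[idx];
        }
      }
      return doc = NO_MORE_DOCS;
    }
  }

  /** Hypothetical competitive iterator that remembers its own position. */
  static final class CompetitiveIter implements DocIter {
    private DocIter delegate;
    private int docID = -1;

    CompetitiveIter(DocIter initial) { this.delegate = initial; }

    /** Swaps in a narrower set of competitive docs; the new delegate starts at -1. */
    void update(DocIter narrower) { this.delegate = narrower; }

    public int docID() { return docID; }
    // Derive the next target from our own docID, not from the delegate's position.
    public int nextDoc() { return advance(docID + 1); }
    public int advance(int target) { return docID = delegate.advance(target); }
  }

  public static void main(String[] args) {
    CompetitiveIter it = new CompetitiveIter(new ArrayIter(0, 1, 2, 3, 4, 5, 6, 7, 8, 9));
    for (int i = 0; i < 4; i++) {
      it.nextDoc(); // visits docs 0, 1, 2, 3
    }
    it.update(new ArrayIter(1, 4, 8)); // e.g. the sort's bottom value became more competitive
    System.out.println(it.nextDoc()); // 4, not 1
    System.out.println(it.nextDoc()); // 8
  }
}
```

A wrapper whose `nextDoc()` simply delegated to the new iterator's `nextDoc()` would return doc 1 after the swap, revisiting a doc that had already been passed.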


mayya-sharipova · 2020-10-06T14:33:32Z

This patch works well with the checks in ConjunctionDISI, and all tests pass as well.

After we merge this PR, we can reintroduce the checks in ConjunctionDISI.
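Assuming the ConjunctionDISI checks mentioned here require all sub-iterators to be on the same doc when a conjunction is built (as the "start from -1" wrapping later in this thread suggests), a hypothetical version of such a precondition looks like this. `checkSameDoc`, `DocIter` and `ArrayIter` are illustrative names, not Lucene's code; the sketch only shows why combining an already-positioned scorer iterator with a fresh competitive iterator would be rejected.

```java
final class ConjunctionCheckSketch {

  /** Hypothetical, simplified stand-in for a doc ID iterator. */
  interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
  }

  /** Iterates over a sorted array of doc IDs. */
  static final class ArrayIter implements DocIter {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayIter(int... docs) { this.docs = docs; }

    public int docID() { return doc; }
    public int nextDoc() { return advance(doc + 1); }
    public int advance(int target) {
      while (++idx < docs.length) {
        if (docs[idx] >= target) {
          return doc = docs[idx];
        }
      }
      return doc = NO_MORE_DOCS;
    }
  }

  /** Hypothetical precondition: all sub-iterators of a conjunction must start on the same doc. */
  static void checkSameDoc(DocIter... iterators) {
    int expected = iterators[0].docID();
    for (DocIter it : iterators) {
      if (it.docID() != expected) {
        throw new IllegalArgumentException(
            "Sub-iterators are not on the same document: " + expected + " vs " + it.docID());
      }
    }
  }

  public static void main(String[] args) {
    DocIter scorer = new ArrayIter(2, 5, 9);
    DocIter collector = new ArrayIter(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    checkSameDoc(scorer, collector); // passes: both are unpositioned (-1)

    scorer.nextDoc(); // a bulk scorer may hand over an already-positioned scorer
    try {
      checkSameDoc(scorer, collector);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```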

jpountz · 2020-10-06T15:09:31Z

lucene/core/src/java/org/apache/lucene/search/Weight.java

+      if (collectorIterator != null) {
+        if (scorerIterator.docID() != -1) {
+          collectorIterator.advance(scorerIterator.docID());
+        }


I don't think this is good enough, as we might be advancing ahead of scorerIterator. This is why I thought we should instead wrap scorerIterator so that its initial docID is -1.
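The wrapping idea can be sketched as follows. This is a hypothetical `StartFromMinusOne` class, not the code from cba6cf7, built on the same illustrative `DocIter`/`ArrayIter` types (not Lucene API). The wrapper reports -1 until it is first moved; on that first move it returns the wrapped iterator's current doc if that doc is already at or past the target, so a doc the scorer is sitting on but has not yet scored is not skipped.

```java
final class StartFromMinusOneSketch {

  /** Hypothetical, simplified stand-in for a doc ID iterator. */
  interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
  }

  /** Iterates over a sorted array of doc IDs. */
  static final class ArrayIter implements DocIter {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayIter(int... docs) { this.docs = docs; }

    public int docID() { return doc; }
    public int nextDoc() { return advance(doc + 1); }
    public int advance(int target) {
      while (++idx < docs.length) {
        if (docs[idx] >= target) {
          return doc = docs[idx];
        }
      }
      return doc = NO_MORE_DOCS;
    }
  }

  /** Hypothetical wrapper that looks unpositioned (-1) until it is first moved. */
  static final class StartFromMinusOne implements DocIter {
    private final DocIter in;
    private int docID = -1;

    StartFromMinusOne(DocIter in) { this.in = in; }

    public int docID() { return docID; }
    public int nextDoc() { return advance(docID + 1); }
    public int advance(int target) {
      if (target <= in.docID()) {
        // Only possible on the first move: the wrapped iterator already sits on a
        // doc >= target that this wrapper has not returned yet, so report it.
        return docID = in.docID();
      }
      return docID = in.advance(target);
    }
  }

  public static void main(String[] args) {
    DocIter scorer = new ArrayIter(2, 5, 9);
    scorer.nextDoc(); // already on doc 2, which has not been collected yet
    DocIter wrapped = new StartFromMinusOne(scorer);
    System.out.println(wrapped.docID());   // -1, like a freshly created iterator
    System.out.println(wrapped.nextDoc()); // 2
    System.out.println(wrapped.nextDoc()); // 5
  }
}
```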

@jpountz Addressed in cba6cf7

Oh, I had not considered setting scorerIterator.docID() as a min docID. Maybe this means that we no longer need the min parameter of RangeDISIWrapper?

Thanks @jpountz, addressed in d42c464.



jpountz · 2020-10-06T20:25:18Z

lucene/core/src/java/org/apache/lucene/search/Weight.java

+      if (target >= max) {
+        return docID = NO_MORE_DOCS;
+      }
+      return docID = in.advance(target);


It just occurred to me that this implementation is not correct when the minimum bound of the range of doc IDs to score is less than the current doc ID of the scorer. Have you seen any failures with your change? I wonder whether we would need to do

if (target >= scorer.docID()) { return scorer.docID(); }

But we should create a test that fails without this.

Indeed, the recent failures in Lucene are caused by this. ReqExclBulkScorer calls the score method several times, sometimes with a minimum bound that is <= scorer.docID().
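A sketch of a test that fails without the guard, along the lines requested above. It assumes a hypothetical `RangeIter` that restricts a shared scorer iterator to a window [min, max), and simulates a bulk scorer whose score method is called twice over consecutive windows on the same underlying iterator, as described for ReqExclBulkScorer. None of these classes are Lucene's, and the `guard` flag exists only to compare both behaviors. With the `target <= in.docID()` check, doc 5 is collected; without it, the second window calls `advance(5)` on an iterator that is already on 5 and moves past it.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeGuardSketch {

  /** Hypothetical, simplified stand-in for a doc ID iterator. */
  interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
  }

  /** Sorted-array iterator; like many real iterators it always moves forward, even if target <= docID(). */
  static final class ArrayIter implements DocIter {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayIter(int... docs) { this.docs = docs; }

    public int docID() { return doc; }
    public int nextDoc() { return advance(doc + 1); }
    public int advance(int target) {
      while (++idx < docs.length) {
        if (docs[idx] >= target) {
          return doc = docs[idx];
        }
      }
      return doc = NO_MORE_DOCS;
    }
  }

  /** Hypothetical view of a shared scorer iterator restricted to docs in [min, max). */
  static final class RangeIter implements DocIter {
    private final DocIter in;
    private final int min;
    private final int max;
    private final boolean guard;
    private int docID = -1;

    RangeIter(DocIter in, int min, int max, boolean guard) {
      this.in = in;
      this.min = min;
      this.max = max;
      this.guard = guard;
    }

    public int docID() { return docID; }
    public int nextDoc() { return advance(docID + 1); }
    public int advance(int target) {
      target = Math.max(target, min);
      if (target >= max) {
        return docID = NO_MORE_DOCS; // never move the shared iterator past the window
      }
      int doc;
      if (guard && target <= in.docID()) {
        doc = in.docID(); // already on a doc that the previous window did not collect
      } else {
        doc = in.advance(target);
      }
      return docID = doc < max ? doc : NO_MORE_DOCS;
    }
  }

  /** Simulates one call of a bulk scorer's score method over the window [min, max). */
  static void scoreWindow(DocIter scorer, int min, int max, boolean guard, List<Integer> collected) {
    DocIter range = new RangeIter(scorer, min, max, guard);
    for (int doc = range.nextDoc(); doc != DocIter.NO_MORE_DOCS; doc = range.nextDoc()) {
      collected.add(doc);
    }
  }

  static List<Integer> run(boolean guard) {
    DocIter scorer = new ArrayIter(2, 5, 9);
    List<Integer> collected = new ArrayList<>();
    scoreWindow(scorer, 0, 5, guard, collected);  // leaves the scorer on doc 5
    scoreWindow(scorer, 5, 10, guard, collected); // min (5) <= scorer.docID() (5)
    return collected;
  }

  public static void main(String[] args) {
    System.out.println(run(true));  // [2, 5, 9]
    System.out.println(run(false)); // [2, 9]: doc 5 is skipped
  }
}
```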

I think the logic is becoming more complicated. I am thinking of going back to the initial commit and advancing collectorIterator to the same doc as scorerIterator. At the beginning, a collectorIterator matches all docs, so it should advance precisely to `scorerIterator.docID()`.
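The alternative approach (advance collectorIterator to scorerIterator.docID() before intersecting) can be sketched as follows. It assumes a hypothetical `AllDocs` iterator standing in for a competitive iterator that has not pruned anything yet, the same illustrative `DocIter`/`ArrayIter` types, and a simple leapfrog `intersect` helper; none of these are Lucene API. Because `AllDocs.advance(d)` returns exactly `d`, both iterators end up on the same doc before the intersection starts.

```java
import java.util.ArrayList;
import java.util.List;

final class AlignThenIntersectSketch {

  /** Hypothetical, simplified stand-in for a doc ID iterator. */
  interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
  }

  /** Iterates over a sorted array of doc IDs. */
  static final class ArrayIter implements DocIter {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayIter(int... docs) { this.docs = docs; }

    public int docID() { return doc; }
    public int nextDoc() { return advance(doc + 1); }
    public int advance(int target) {
      while (++idx < docs.length) {
        if (docs[idx] >= target) {
          return doc = docs[idx];
        }
      }
      return doc = NO_MORE_DOCS;
    }
  }

  /** Hypothetical stand-in for a competitive iterator that has not pruned anything yet. */
  static final class AllDocs implements DocIter {
    private final int maxDoc;
    private int doc = -1;

    AllDocs(int maxDoc) { this.maxDoc = maxDoc; }

    public int docID() { return doc; }
    public int nextDoc() { return advance(doc + 1); }
    // Every doc matches, so advancing lands exactly on the target.
    public int advance(int target) { return doc = target < maxDoc ? target : NO_MORE_DOCS; }
  }

  /** Leapfrog intersection that continues from the lead's current doc (or its first doc if unpositioned). */
  static List<Integer> intersect(DocIter lead, DocIter other) {
    List<Integer> matches = new ArrayList<>();
    int doc = lead.docID() == -1 ? lead.nextDoc() : lead.docID();
    while (doc != DocIter.NO_MORE_DOCS) {
      int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
      if (otherDoc == doc) {
        matches.add(doc);
        doc = lead.nextDoc();
      } else {
        doc = lead.advance(otherDoc); // otherDoc > doc: let the lead catch up
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    DocIter scorer = new ArrayIter(1, 3, 5, 7, 8);
    scorer.advance(5); // positioned by an earlier window; docs 1 and 3 were already handled
    DocIter collector = new AllDocs(10);
    collector.advance(scorer.docID()); // lands exactly on 5
    System.out.println(intersect(scorer, collector)); // [5, 7, 8]
  }
}
```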

@jpountz I've created a new PR: #1955. So sorry for the mess.

@jpountz By `if (target >= scorer.docID()) { return scorer.docID(); }`, did you mean `if (target <= scorer.docID()) { return scorer.docID(); }`?

mayya-sharipova requested a review from jpountz October 6, 2020 14:29

jpountz requested changes Oct 6, 2020


Wrap ScorerIterator to start from -1 for conjunction

cba6cf7

jpountz approved these changes Oct 6, 2020


Address Adrien's comments

d42c464

mayya-sharipova merged commit 874c446 into apache:master Oct 6, 2020

mayya-sharipova deleted the competitive-iteration branch October 6, 2020 17:22

jpountz reviewed Oct 6, 2020


asfimport mentioned this pull request Oct 7, 2020

Fix iteration over competitive iterators [LUCENE-9565] apache/lucene#10605

Open
