LUCENE-9280: Collectors to skip noncompetitive documents #1351

Merged
26 commits
0d4b2f5
Collectors to skip noncompetitive documents
mayya-sharipova Mar 13, 2020
6384b15
Address feedback1
mayya-sharipova Mar 18, 2020
d732d7e
Address Feedback2
mayya-sharipova Mar 19, 2020
209bc21
Adjust tests
mayya-sharipova Mar 20, 2020
95e1bc1
Address feedback
mayya-sharipova Mar 23, 2020
0e3c7da
Add docs and correct bugs
mayya-sharipova Mar 25, 2020
d7e9507
Make constructor of LongDocValuesPointSortField public
mayya-sharipova Mar 25, 2020
1154d4a
Adjust docs and tests
mayya-sharipova Mar 26, 2020
39379a7
Optimized sort of field without points
mayya-sharipova Mar 30, 2020
6c628f7
Address Adrien's feedback
mayya-sharipova Mar 31, 2020
eeb23c1
Add IterableFieldComparator
mayya-sharipova Apr 2, 2020
89d241e
Address Alan's comments
mayya-sharipova Apr 6, 2020
4448499
Address Alan's feedback2
mayya-sharipova Apr 9, 2020
719882e
Address Alan's feedback 3
mayya-sharipova Apr 11, 2020
c84fe5e
Address Alan's feedback 4
mayya-sharipova Apr 15, 2020
d7ef9b6
Collector returns comparator's iterator
mayya-sharipova Apr 15, 2020
24c94ff
Address Adrien's feedback
mayya-sharipova Apr 21, 2020
2fd9075
Set scorer for inner comparator
mayya-sharipova Apr 23, 2020
b8e138c
Correct Indent
mayya-sharipova Apr 24, 2020
7120424
Separate classes for comparator and leaf comparator
mayya-sharipova Apr 29, 2020
6c62fd0
Merge remote-tracking branch 'upstream/master' into comparator-set-mi…
mayya-sharipova Apr 29, 2020
1ab2a6e
Address Adrien's comments
mayya-sharipova May 26, 2020
f910030
Merge remote-tracking branch 'upstream/master' into comparator-set-mi…
mayya-sharipova May 26, 2020
99dd0c1
Address Adrien's feedback
mayya-sharipova Jun 16, 2020
8ebcff8
Merge remote-tracking branch 'upstream/master' into comparator-set-mi…
mayya-sharipova Jun 16, 2020
55f2940
Add contributors' names
mayya-sharipova Jun 23, 2020
8 changes: 8 additions & 0 deletions lucene/CHANGES.txt
@@ -107,6 +107,14 @@ Improvements

* LUCENE-9074: Introduce Slice Executor For Dynamic Runtime Execution Of Slices (Atri Sharma)

* LUCENE-9280: Add the ability for field comparators to skip non-competitive documents.
Creating a TopFieldCollector with a totalHitsThreshold less than Integer.MAX_VALUE
instructs Lucene to skip non-competitive documents whenever possible. For numeric
sort fields, skipping works when the same field is indexed with both doc values
and points. In this case, Lucene assumes that the points and doc values store the
same data. (Mayya Sharipova, Jim Ferenczi, Adrien Grand)


Bug fixes

* LUCENE-8663: NRTCachingDirectory.slowFileExists may open a file while
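The threshold behaviour described in the LUCENE-9280 entry can be pictured with a small toy model. The sketch below is not Lucene code: `ThresholdTopKSketch`, `Result`, and `collect` are hypothetical names, the sort is ascending on plain `long` values, and the rule "skipping may start once the queue is full and the counted hits exceed the threshold" is an assumption made for illustration. A linear scan still reads every value here; actual skipping would rely on an index structure such as points.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Toy model of top-k collection with a total-hits threshold (not Lucene code). */
final class ThresholdTopKSketch {

  /** The k smallest values (ascending) and the number of docs that were counted. */
  static final class Result {
    final long[] top;
    final int counted;

    Result(long[] top, int counted) {
      this.top = top;
      this.counted = counted;
    }
  }

  static Result collect(long[] values, int k, int totalHitsThreshold) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be greater than zero");
    }
    // Max-heap: the head is the "bottom", i.e. the weakest value currently kept.
    PriorityQueue<Long> queue = new PriorityQueue<>(Comparator.reverseOrder());
    int counted = 0;
    boolean skipping = false;
    for (long value : values) {
      if (skipping && value >= queue.peek()) {
        continue; // non-competitive: neither collected nor counted
      }
      counted++;
      if (queue.size() < k) {
        queue.add(value);
      } else if (value < queue.peek()) {
        queue.poll();
        queue.add(value);
      }
      // Assumption: skipping may start once the queue is full and the
      // number of counted hits exceeds the threshold.
      if (queue.size() == k && counted > totalHitsThreshold) {
        skipping = true;
      }
    }
    long[] top = queue.stream().mapToLong(Long::longValue).sorted().toArray();
    return new Result(top, counted);
  }
}
```

With a threshold of `Integer.MAX_VALUE` the counter can never exceed the threshold, so every document is counted and nothing is skipped, which matches the entry's wording that only smaller thresholds enable skipping.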
6 changes: 6 additions & 0 deletions lucene/MIGRATE.md
@@ -287,3 +287,9 @@ TopDocsCollector shall no longer return an empty TopDocs for malformed arguments
Rather, an IllegalArgumentException shall be thrown. This is introduced for better
defence and to ensure that there is no bubbling up of errors when Lucene is
used in multi level applications

## Assumption of data consistency between different data-structures sharing the same field name

Sorting on a numeric field that is indexed with both doc values and points may use an
optimization to skip non-competitive documents. This optimization relies on the assumption
that the same data is stored in these points and doc values.
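To see why the consistency assumption matters, consider a deliberately simplified model in which candidate documents are chosen from one structure (a stand-in for points) and then ranked with another (a stand-in for doc values). Everything below is hypothetical and uses only the JDK: `PointsDocValuesConsistencySketch` and its methods are illustrative names, and the real optimization is more involved than picking the `k` smallest point values.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Toy model: candidates come from "points", ranking comes from "doc values". */
final class PointsDocValuesConsistencySketch {

  /** Returns the k docs from {@code docs} with the smallest keys, ties broken by doc ID. */
  private static int[] smallestK(long[] keys, int[] docs, int k) {
    Integer[] boxed = Arrays.stream(docs).boxed().toArray(Integer[]::new);
    Arrays.sort(boxed, (a, b) -> {
      int c = Long.compare(keys[a], keys[b]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    return Arrays.stream(boxed).limit(k).mapToInt(Integer::intValue).toArray();
  }

  /** Exhaustive path: every doc is ranked by its doc value. */
  static int[] topKExhaustive(long[] docValues, int k) {
    int[] allDocs = IntStream.range(0, docValues.length).toArray();
    return smallestK(docValues, allDocs, k);
  }

  /** Skipping path: only docs selected through the point values are ranked. */
  static int[] topKWithSkipping(long[] pointValues, long[] docValues, int k) {
    int[] allDocs = IntStream.range(0, pointValues.length).toArray();
    int[] candidates = smallestK(pointValues, allDocs, k);
    return smallestK(docValues, candidates, k);
  }
}
```

When both arrays hold the same data, the two paths agree. If doc 1 has doc value `1` but point value `50`, the skipping path never considers it and returns a different top hit than the exhaustive path, which is the kind of silent mismatch the migration note warns about.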
Original file line number Diff line number Diff line change
@@ -115,7 +115,7 @@ public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float bo
return new ConstantScoreWeight(this, boost) {
@Override
public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
- if (scoreMode == ScoreMode.TOP_SCORES) {
+ if (scoreMode.isExhaustive() == false) {
return super.bulkScorer(context);
}
final BulkScorer innerScorer = innerWeight.bulkScorer(context);
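The hunk above swaps an equality test against one mode for a property check. A rough illustration of the difference, assuming a simplified enum: `ScoreModeSketch` and its constants are stand-ins chosen for this example, not the real `ScoreMode` definition, and `TOP_DOCS` is used here only as an example of an additional non-exhaustive mode.

```java
/** Simplified stand-in for a score-mode enum with an "exhaustive" property. */
enum ScoreModeSketch {
  COMPLETE(true),
  TOP_SCORES(false),
  TOP_DOCS(false); // illustrative extra non-exhaustive mode

  private final boolean exhaustive;

  ScoreModeSketch(boolean exhaustive) {
    this.exhaustive = exhaustive;
  }

  boolean isExhaustive() {
    return exhaustive;
  }

  /** Old-style check: true for exactly one mode. */
  static boolean oldCheck(ScoreModeSketch mode) {
    return mode == TOP_SCORES;
  }

  /** New-style check: true for every mode that is allowed to skip hits. */
  static boolean newCheck(ScoreModeSketch mode) {
    return mode.isExhaustive() == false;
  }
}
```

Under these assumptions the two checks agree for `COMPLETE` and `TOP_SCORES`, and differ only for the additional non-exhaustive mode, which the property check picks up without further changes at the call site.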
Original file line number Diff line number Diff line change
@@ -165,8 +165,8 @@ protected NumericDocValues getNumericDocValues(LeafReaderContext context, String
* org.apache.lucene.index.LeafReader#getNumericDocValues} and sorts by ascending value */
public static class DoubleComparator extends NumericComparator<Double> {
private final double[] values;
- private double bottom;
- private double topValue;
+ protected double bottom;
+ protected double topValue;

/**
* Creates a new comparator based on {@link Double#compare} for {@code numHits}.
@@ -225,8 +225,8 @@ public int compareTop(int doc) throws IOException {
* org.apache.lucene.index.LeafReader#getNumericDocValues(String)} and sorts by ascending value */
public static class FloatComparator extends NumericComparator<Float> {
private final float[] values;
- private float bottom;
- private float topValue;
+ protected float bottom;
+ protected float topValue;

/**
* Creates a new comparator based on {@link Float#compare} for {@code numHits}.
@@ -285,8 +285,8 @@ public int compareTop(int doc) throws IOException {
* org.apache.lucene.index.LeafReader#getNumericDocValues(String)} and sorts by ascending value */
public static class IntComparator extends NumericComparator<Integer> {
private final int[] values;
- private int bottom; // Value of bottom of queue
- private int topValue;
+ protected int bottom; // Value of bottom of queue
+ protected int topValue;

/**
* Creates a new comparator based on {@link Integer#compare} for {@code numHits}.
@@ -347,8 +347,8 @@ public int compareTop(int doc) throws IOException {
* org.apache.lucene.index.LeafReader#getNumericDocValues(String)} and sorts by ascending value */
public static class LongComparator extends NumericComparator<Long> {
private final long[] values;
- private long bottom;
- private long topValue;
+ protected long bottom;
+ protected long topValue;

/**
* Creates a new comparator based on {@link Long#compare} for {@code numHits}.
Original file line number Diff line number Diff line change
@@ -58,8 +58,8 @@ private static final class OneComparatorFieldValueHitQueue<T extends FieldValueH
private final int oneReverseMul;
private final FieldComparator<?> oneComparator;

- public OneComparatorFieldValueHitQueue(SortField[] fields, int size) {
- super(fields, size);
+ public OneComparatorFieldValueHitQueue(SortField[] fields, int size, boolean filterNonCompetitiveDocs) {
+ super(fields, size, filterNonCompetitiveDocs);

assert fields.length == 1;
oneComparator = comparators[0];
@@ -95,8 +95,8 @@ protected boolean lessThan(final Entry hitA, final Entry hitB) {
*/
private static final class MultiComparatorsFieldValueHitQueue<T extends FieldValueHitQueue.Entry> extends FieldValueHitQueue<T> {

- public MultiComparatorsFieldValueHitQueue(SortField[] fields, int size) {
- super(fields, size);
+ public MultiComparatorsFieldValueHitQueue(SortField[] fields, int size, boolean filterNonCompetitiveDocs) {
+ super(fields, size, filterNonCompetitiveDocs);
}

@Override
@@ -121,7 +121,7 @@ protected boolean lessThan(final Entry hitA, final Entry hitB) {
}

// prevent instantiation and extension.
- private FieldValueHitQueue(SortField[] fields, int size) {
+ private FieldValueHitQueue(SortField[] fields, int size, boolean filterNonCompetitiveDocs) {
super(size);
// When we get here, fields.length is guaranteed to be > 0, therefore no
// need to check it again.
@@ -135,9 +135,15 @@ private FieldValueHitQueue(SortField[] fields, int size) {
reverseMul = new int[numComparators];
for (int i = 0; i < numComparators; ++i) {
SortField field = fields[i];
-
reverseMul[i] = field.reverse ? -1 : 1;
- comparators[i] = field.getComparator(size, i);
+ if (i == 0 && filterNonCompetitiveDocs) {
+ // try to rewrite the 1st comparator to the comparator that can skip non-competitive documents
+ // skipping functionality is beneficial only for the 1st comparator
+ comparators[i] = FilteringFieldComparator.wrapToFilteringComparator(field.getComparator(size, i),
+ field.reverse, numComparators == 1);
+ } else {
+ comparators[i] = field.getComparator(size, i);
+ }
}
}

@@ -152,17 +158,20 @@ private FieldValueHitQueue(SortField[] fields, int size) {
* priority first); cannot be <code>null</code> or empty
* @param size
* The number of hits to retain. Must be greater than zero.
+ * @param filterNonCompetitiveDocs
+ * {@code true} If comparators should be allowed to filter non-competitive documents, {@code false} otherwise
*/
- public static <T extends FieldValueHitQueue.Entry> FieldValueHitQueue<T> create(SortField[] fields, int size) {
+ public static <T extends FieldValueHitQueue.Entry> FieldValueHitQueue<T> create(SortField[] fields, int size,
+ boolean filterNonCompetitiveDocs) {

if (fields.length == 0) {
throw new IllegalArgumentException("Sort must contain at least one field");
}

if (fields.length == 1) {
- return new OneComparatorFieldValueHitQueue<>(fields, size);
+ return new OneComparatorFieldValueHitQueue<>(fields, size, filterNonCompetitiveDocs);
} else {
- return new MultiComparatorsFieldValueHitQueue<>(fields, size);
+ return new MultiComparatorsFieldValueHitQueue<>(fields, size, filterNonCompetitiveDocs);
}
}
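The constructor change above wraps only the first comparator and passes `numComparators == 1` as the single-sort flag. One way to read this: the primary sort key decides whether a document can still enter the queue, and a tie on that key is only a safe rejection when no further sort fields could break it. The sketch below encodes that reading; `CompetitivenessSketch` and `mayBeCompetitive` are hypothetical names, and the tie rule is an interpretation of the diff comments rather than the actual implementation.

```java
/** Toy rule for deciding whether a doc may still enter a full top-k queue. */
final class CompetitivenessSketch {

  /**
   * @param value      primary sort value of the candidate doc
   * @param bottom     primary sort value of the weakest doc in the queue
   * @param reverse    true for a descending sort (larger values are better)
   * @param singleSort true if there are no further sort fields to break ties
   */
  static boolean mayBeCompetitive(long value, long bottom, boolean reverse, boolean singleSort) {
    int cmp = Long.compare(value, bottom);
    if (reverse) {
      cmp = -cmp; // descending sort flips the comparison
    }
    if (cmp < 0) {
      return true; // strictly better than the current bottom
    }
    if (cmp == 0) {
      // Tie on the primary key: a later sort field could still favour this
      // doc, so it can only be rejected outright for a single-field sort.
      return !singleSort;
    }
    return false; // strictly worse on the primary key
  }
}
```

For example, a doc whose value equals the bottom is rejected when `singleSort` is true and kept as a candidate when it is false.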

Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.search;

import org.apache.lucene.index.LeafReaderContext;

import java.io.IOException;

/**
* A wrapper over {@code FieldComparator} that provides a leaf comparator that can filter non-competitive docs.
*/
abstract class FilteringFieldComparator<T> extends FieldComparator<T> {
protected final FieldComparator<T> in;
protected final boolean reverse;
// singleSort is true if the sort is based on a single sort field. With no other sort fields configured
// as tie-breakers, docs with equal values can be filtered out as well.
protected final boolean singleSort;
protected boolean hasTopValue = false;

public FilteringFieldComparator(FieldComparator<T> in, boolean reverse, boolean singleSort) {
this.in = in;
this.reverse = reverse;
this.singleSort = singleSort;
}

@Override
public abstract FilteringLeafFieldComparator getLeafComparator(LeafReaderContext context) throws IOException;

@Override
public int compare(int slot1, int slot2) {
return in.compare(slot1, slot2);
}

@Override
public T value(int slot) {
return in.value(slot);
}

@Override
public void setTopValue(T value) {
in.setTopValue(value);
hasTopValue = true;
}

@Override
public int compareValues(T first, T second) {
return in.compareValues(first, second);
}


/**
* Tries to wrap the given field comparator to add the ability to skip over non-competitive docs.
* If skipping is not implemented for the given comparator, returns the comparator itself.
* @param comparator – comparator to wrap
* @param reverse – if this sort is reverse
* @param singleSort – true if this sort is based on a single field and there are no other sort fields for tie breaking
* @return comparator wrapped as a filtering comparator or the original comparator if the filtering functionality
* is not implemented for it
*/
public static FieldComparator<?> wrapToFilteringComparator(FieldComparator<?> comparator, boolean reverse, boolean singleSort) {
Class<?> comparatorClass = comparator.getClass();
if (comparatorClass == FieldComparator.LongComparator.class) {
return new FilteringNumericComparator<>((FieldComparator.LongComparator) comparator, reverse, singleSort);
}
if (comparatorClass == FieldComparator.IntComparator.class) {
return new FilteringNumericComparator<>((FieldComparator.IntComparator) comparator, reverse, singleSort);
}
if (comparatorClass == FieldComparator.DoubleComparator.class) {
return new FilteringNumericComparator<>((FieldComparator.DoubleComparator) comparator, reverse, singleSort);
}
if (comparatorClass == FieldComparator.FloatComparator.class) {
return new FilteringNumericComparator<>((FieldComparator.FloatComparator) comparator, reverse, singleSort);
}
return comparator;
}

}
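The wrap-or-return-original pattern in `wrapToFilteringComparator` can be shown in isolation. The sketch below uses hypothetical JDK-only types (`WrapSketch`, `ValueComparator`, `AscendingComparator`, `AbsoluteValueComparator`, `FilteringWrapper`); none of them exist in Lucene. It also reflects one plausible reason for comparing exact classes instead of using `instanceof`: a subclass may order values differently, so the wrapper's extra logic would not be valid for it. That motivation is an assumption, not something the diff states.

```java
/** Decorator that is applied only to comparator classes it is known to support. */
final class WrapSketch {

  interface ValueComparator {
    int compare(long a, long b);
  }

  static class AscendingComparator implements ValueComparator {
    @Override
    public int compare(long a, long b) {
      return Long.compare(a, b);
    }
  }

  /** A subclass with a custom ordering that the wrapper knows nothing about. */
  static class AbsoluteValueComparator extends AscendingComparator {
    @Override
    public int compare(long a, long b) {
      return Long.compare(Math.abs(a), Math.abs(b));
    }
  }

  static final class FilteringWrapper implements ValueComparator {
    private final AscendingComparator in;

    FilteringWrapper(AscendingComparator in) {
      this.in = in;
    }

    @Override
    public int compare(long a, long b) {
      return in.compare(a, b); // plain delegation
    }

    /** Extra capability that relies on the wrapped comparator's known ordering. */
    boolean isCompetitive(long value, long bottom) {
      return in.compare(value, bottom) < 0;
    }
  }

  /** Wraps supported comparators; returns anything else unchanged. */
  static ValueComparator wrapIfSupported(ValueComparator comparator) {
    if (comparator.getClass() == AscendingComparator.class) {
      return new FilteringWrapper((AscendingComparator) comparator);
    }
    return comparator;
  }
}
```

Callers keep working with the common interface in both cases, so falling back to the original comparator only loses the optimization and does not change results.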


Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.search;

import java.io.IOException;

/**
* Decorates a wrapped LeafFieldComparator to add the ability to skip over non-competitive docs.
* FilteringLeafFieldComparator adds two methods to a LeafFieldComparator:
* {@code competitiveIterator()} and {@code setCanUpdateIterator()}.
*/
public interface FilteringLeafFieldComparator extends LeafFieldComparator {
/**
* Returns a competitive iterator
* @return an iterator over competitive docs that are stronger than already collected docs
* or {@code null} if such an iterator is not available for the current segment.
*/
DocIdSetIterator competitiveIterator() throws IOException;

/**
* Informs this leaf comparator that it is allowed to start updating its competitive iterator.
* This method is called from a collector when the queue becomes full and the threshold is reached.
*/
void setCanUpdateIterator() throws IOException;
}
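The two-method contract above can be modelled without Lucene types. In this sketch, `CompetitiveIteratorSketch` is a hypothetical class, the competitive "iterator" is returned as a plain ascending `int[]` of doc IDs, and the sort is assumed to be ascending on a single field. It only illustrates the ordering of calls: the set of competitive docs stays unrestricted until the collector has both set a bottom value and allowed updates.

```java
import java.util.stream.IntStream;

/** Toy model of a leaf comparator that narrows its set of competitive docs. */
final class CompetitiveIteratorSketch {
  private final long[] values; // per-doc sort values for one segment
  private boolean canUpdateIterator = false;
  private boolean hasBottom = false;
  private long bottom;

  CompetitiveIteratorSketch(long[] values) {
    this.values = values;
  }

  /** Called by the collector whenever the weakest queued value changes. */
  void setBottom(long bottom) {
    this.bottom = bottom;
    this.hasBottom = true;
  }

  /** Called by the collector once skipping is allowed. */
  void setCanUpdateIterator() {
    this.canUpdateIterator = true;
  }

  /** Doc IDs, in ascending order, that may still be competitive. */
  int[] competitiveDocs() {
    if (!canUpdateIterator || !hasBottom) {
      return IntStream.range(0, values.length).toArray(); // no restriction yet
    }
    // A scan is used here for brevity; an index structure would avoid it.
    return IntStream.range(0, values.length).filter(doc -> values[doc] < bottom).toArray();
  }
}
```

A collector following this model would call `setBottom` as the queue changes, call `setCanUpdateIterator` once, and from then on visit only the docs returned by `competitiveDocs`.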
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.search;

import org.apache.lucene.index.LeafReaderContext;

import java.io.IOException;

/**
* A wrapper over {@code NumericComparator} that provides a leaf comparator that can filter non-competitive docs.
*/
class FilteringNumericComparator<T extends Number> extends FilteringFieldComparator<T> {
public FilteringNumericComparator(NumericComparator<T> in, boolean reverse, boolean singleSort) {
super(in, reverse, singleSort);
}

@Override
public final FilteringLeafFieldComparator getLeafComparator(LeafReaderContext context) throws IOException {
LeafFieldComparator inLeafComparator = in.getLeafComparator(context);
Class<?> comparatorClass = inLeafComparator.getClass();
if (comparatorClass == FieldComparator.LongComparator.class) {
return new FilteringNumericLeafComparator.FilteringLongLeafComparator((FieldComparator.LongComparator) inLeafComparator, context,
((LongComparator) inLeafComparator).field, reverse, singleSort, hasTopValue);
} else if (comparatorClass == FieldComparator.IntComparator.class) {
return new FilteringNumericLeafComparator.FilteringIntLeafComparator((FieldComparator.IntComparator) inLeafComparator, context,
((IntComparator) inLeafComparator).field, reverse, singleSort, hasTopValue);
} else if (comparatorClass == FieldComparator.DoubleComparator.class) {
return new FilteringNumericLeafComparator.FilteringDoubleLeafComparator((FieldComparator.DoubleComparator) inLeafComparator, context,
((DoubleComparator) inLeafComparator).field, reverse, singleSort, hasTopValue);
} else if (comparatorClass == FieldComparator.FloatComparator.class) {
return new FilteringNumericLeafComparator.FilteringFloatLeafComparator((FieldComparator.FloatComparator) inLeafComparator, context,
((FloatComparator) inLeafComparator).field, reverse, singleSort, hasTopValue);
} else {
throw new IllegalStateException("Unexpected numeric class of ["+ comparatorClass + "] for [FieldComparator]!");
}
}

}