-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9384: Backport for field sort optimization #1610
LUCENE-9384: Backport for field sort optimization #1610
Conversation
Backport for: LUCENE-9280: Collectors to skip noncompetitive documents (apache#1351) Similar how scorers can update their iterators to skip non-competitive documents, collectors and comparators should also provide and update iterators that allow them to skip non-competive documents. To enable sort optimization for numeric sort fields, the following needs to be done: 1) the field should be indexed with both doc_values and points, that must have the same field name and same data 2) SortField#setCanSkipNonCompetitiveDocs must be set 3) totalHitsThreshold should not be set to max value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, I left a comment regarding the name of the option. @romseygeek any idea on how we should expose the option in 8x ?
* that these fields have the same name. | ||
* This allows to use sort optimization and skip non-competitive documents. | ||
*/ | ||
public void setSkipNonCompetitiveDocs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should make the option less ambiguous by naming it something like setCanUsePoints
or something along those lines ? I am afraid that this could be misleading if we set this option on a sort field that use the _score or a keyword field for instance. We should maybe at least validate that the field is not null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about setCanUsePointsForSort
? I am also fine with setCanUsePoints
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz Do you have suggestions how we should expose sort optimization in 8.x? Are you ok with a proposal to have a method for SortField::setCanUsePointsForSort
that when set true
indicates that the same data is used for indexing points and docValues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO setCanUsePoints
is good enough since this is a "SortField" so it should be obvious that this is for sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me. I wonder if updating equals/hashcode is right.
* that these fields have the same name. | ||
* This allows to use sort optimization and skip non-competitive documents. | ||
*/ | ||
public void setSkipNonCompetitiveDocs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO setCanUsePoints
is good enough since this is a "SortField" so it should be obvious that this is for sorting?
@@ -408,6 +428,7 @@ public boolean equals(Object o) { | |||
&& other.reverse == this.reverse | |||
&& Objects.equals(this.comparatorSource, other.comparatorSource) | |||
&& Objects.equals(this.missingValue, other.missingValue) | |||
&& other.skipNonCompetitiveDocs == this.skipNonCompetitiveDocs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should actually take it into account for equals/hashcode as this could disable index sorting optimizations, which check for equality of the index-time and search-time SortField instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz Thank you for your feedback. Alternatively we can add setCanUsePoints
parameter to Sort
object instead of SortField
? What do you think?
6827412
to
6158872
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -428,7 +428,6 @@ public boolean equals(Object o) { | |||
&& other.reverse == this.reverse | |||
&& Objects.equals(this.comparatorSource, other.comparatorSource) | |||
&& Objects.equals(this.missingValue, other.missingValue) | |||
&& other.skipNonCompetitiveDocs == this.skipNonCompetitiveDocs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment to say that we're not taking it into account on purpose? I foresee that someone will file a PR to add it after running a code analysis tool that would complain that not all fields are taken into account for equals otherwise.
Backport for: LUCENE-9280: Collectors to skip noncompetitive documents (apache#1351) Similar how scorers can update their iterators to skip non-competitive documents, collectors and comparators should also provide and update iterators that allow them to skip non-competive documents. To enable sort optimization for numeric sort fields, the following needs to be done: 1) the field should be indexed with both doc_values and points, that must have the same field name and same data 2) SortField#setCanUsePoints must be set 3) totalHitsThreshold should not be set to max value.
Backport for: LUCENE-9280: Collectors to skip noncompetitive documents (#1351)
Similar how scorers can update their iterators to skip non-competitive
documents, collectors and comparators should also provide and update
iterators that allow them to skip non-competive documents.
To enable sort optimization for numeric sort fields,
the following needs to be done:
must have the same field name and same data