Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field #677

Merged
merged 8 commits into from
Feb 17, 2022

Conversation

vigyasharma
Copy link
Contributor

@vigyasharma vigyasharma commented Feb 14, 2022

Description

Since all documents are required to use the same features (LUCENE-9334) we could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or points have a docCount that is equal to maxDoc.

Solution

Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc

Tests

Unit tests:

  1. All docs have terms
  2. All docs have PointValues
  3. Some docs are missing either Terms or PointValues. Query is not rewritten.
  4. All docs have the same fields, but they're neither Terms nor PointValues. Rewrite evaluation logic skips this case as it would need an iteration over all the docs.
  5. Existing tests pass.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

Copy link
Member

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I left a tiny comment related to tests. Could you also add an entry to CHANGES.txt under "Lucene 9.1.0"?

@vigyasharma
Copy link
Contributor Author

This looks great. I left a tiny comment related to tests. Could you also add an entry to CHANGES.txt under "Lucene 9.1.0"?

Thank you for reviewing this PR, @jtibshirani. I've added the Changes entry and updates UTs to not assert on the search result.

@jtibshirani jtibshirani merged commit c132bbf into apache:main Feb 17, 2022
jtibshirani pushed a commit that referenced this pull request Feb 17, 2022
…when all docs have the field (#677)

Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.
LeafReader leaf = context.reader();
Terms terms = leaf.terms(field);
PointValues pointValues = leaf.getPointValues(field);
if ((terms != null && terms.getDocCount() == leaf.maxDoc())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If condition false, maybe we should break for loop early?

wjp719 added a commit to wjp719/lucene that referenced this pull request Feb 23, 2022
* main:
  migrate to temurin (apache#697)
  LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery (apache#691)
  LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori
  Remove deprecated constructors in Nori (apache#695)
  LUCENE-10400: revise binary dictionaries' constructor in nori (apache#693)
  LUCENE-10408: Fix vector values iteration bug (apache#690)
  Temporarily mute TestKnnVectorQuery#testRandomWithFilter
  LUCENE-10382: Support filtering in KnnVectorQuery (apache#656)
  LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (apache#677)
  Add CHANGES entry for LUCENE-10398
  LUCENE-10398: Add static method for getting Terms from LeafReader (apache#678)
  LUCENE-10408 Better encoding of doc Ids in vectors (apache#649)
  LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (apache#685)
wjp719 added a commit to wjp719/lucene that referenced this pull request Feb 23, 2022
* main:
  LUCENE-10416: move changes entry to v10.0.0
  migrate to temurin (apache#697)
  LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery (apache#691)
  LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori
  Remove deprecated constructors in Nori (apache#695)
  LUCENE-10400: revise binary dictionaries' constructor in nori (apache#693)
  LUCENE-10408: Fix vector values iteration bug (apache#690)
  Temporarily mute TestKnnVectorQuery#testRandomWithFilter
  LUCENE-10382: Support filtering in KnnVectorQuery (apache#656)
  LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (apache#677)
  Add CHANGES entry for LUCENE-10398
  LUCENE-10398: Add static method for getting Terms from LeafReader (apache#678)
  LUCENE-10408 Better encoding of doc Ids in vectors (apache#649)
  LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (apache#685)
dantuzi pushed a commit to SeaseLtd/lucene that referenced this pull request Mar 10, 2022
…when all docs have the field (apache#677)

Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants