Index Segment without DocValues May Cause Search to Fail [LUCENE-9755] #10794

Open
asfimport opened this issue Feb 9, 2021 · 1 comment

Not sure if this can be considered a bug, but it is certainly a caveat that may slip through testing due to its nature.

Consider the following scenario:

  • all documents in the index have a field "numfield" indexed as IntPoint
  • in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name

The documents without DocValues cannot be matched by any query that involves sorting, so we save some space by omitting the DocValues for those documents.

This works perfectly fine, unless

  • the index contains a segment that only contains documents without the DocValues

In this case, running a query that sorts by "numfield" will throw the following exception:

java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type.
   at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
   at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
   at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
   at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155)
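
To make the failure mode concrete, here is a small plain-Java model (no Lucene dependency) of what the stack trace suggests: the sort comparator is set up once per segment, and each segment records its own doc-values type for a field. The names `Segment`, `DvType`, `SortModel`, `checkField`, and `sortAcross` are hypothetical stand-ins, not Lucene classes, and the logic is a simplification under the assumption that the check runs for every segment, whether or not that segment holds a matching document.

```java
import java.util.List;
import java.util.Map;

// Toy model only: these types imitate the idea of per-segment field metadata.
enum DvType { NONE, NUMERIC, SORTED_NUMERIC }

final class Segment {
    // Field name -> doc-values type recorded for this segment (hypothetical structure).
    final Map<String, DvType> fieldInfos;

    Segment(Map<String, DvType> fieldInfos) {
        this.fieldInfos = fieldInfos;
    }
}

final class SortModel {
    // Imitates the check that raises the exception shown in the report.
    static void checkField(Segment segment, String field) {
        DvType actual = segment.fieldInfos.get(field);
        if (actual == DvType.NONE) {
            throw new IllegalStateException(
                "unexpected docvalues type NONE for field '" + field + "'");
        }
    }

    // A sorted search prepares a comparator for each segment in turn.
    // Returns the number of segments that passed the check.
    static int sortAcross(List<Segment> segments, String field) {
        int prepared = 0;
        for (Segment s : segments) {
            checkField(s, field);
            prepared++;
        }
        return prepared;
    }
}
```

In this model, a single segment whose "numfield" type is SORTED_NUMERIC passes, while adding a second segment whose "numfield" has type NONE makes `sortAcross` throw, regardless of which segment contains the matching document.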

I have included a minimal example program that demonstrates the issue. This will

  • create an index with two documents, each having "numfield" indexed
  • add a DocValuesField "numfield" only for the first document
  • force the two documents into separate index segments
  • run a query that matches only the first document and sorts by "numfield"

This results in the aforementioned exception.

When removing the following lines from the code:

if (i==docCount/2) {
  iw.commit();
}

both documents are added to the same segment. When the code is re-run so that it creates a single index segment, the query works fine.

Tested with Lucene 8.3.1 and 8.8.0.

Like I said, this may not be considered a bug. But it has slipped through our testing because the existence of such a DocValues-free segment is such a rare and short-lived event.

We can avoid this issue in the future by using a different field name for the DocValuesField. But for our production systems we have to patch DocValues.checkField() to suppress the IllegalStateException as reindexing is not an option right now.
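
The rename workaround can be illustrated with a plain-Java model (no Lucene dependency). It rests on an assumption that is not stated in this report: a segment that has never seen a field name at all is treated as having no values for it, rather than as a type mismatch. `numfield_sort` is a hypothetical name for the separate doc-values field, and `DocValuesLookup` and `DvKind` are illustrative names, not Lucene classes.

```java
import java.util.Map;
import java.util.Optional;

// Toy model only: the doc-values kind a segment records for a field.
enum DvKind { NONE, SORTED_NUMERIC }

final class DocValuesLookup {
    // Returns the doc-values kind to read, or empty when the segment
    // has no record of the field at all (assumed to mean "no values").
    static Optional<DvKind> lookup(Map<String, DvKind> segmentFields, String field) {
        DvKind kind = segmentFields.get(field);
        if (kind == null) {
            return Optional.empty(); // unknown field: nothing to sort on, no error
        }
        if (kind == DvKind.NONE) {
            // Field is known (e.g. as a point) but carries no doc values.
            throw new IllegalStateException(
                "unexpected docvalues type NONE for field '" + field + "'");
        }
        return Optional.of(kind);
    }
}
```

Under that assumption, a segment that only knows "numfield" as a point would fail a lookup on "numfield" but return an empty result for "numfield_sort", which is why sorting on a separately named field would sidestep the exception.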


Migrated from LUCENE-9755 by Thomas Hecker, updated Feb 11 2021
Attachments: DocValuesTest.java

asfimport commented Feb 11, 2021

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

>> Consider the following scenario:

>> all documents in the index have a field "numfield" indexed as IntPoint

>> in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name

Thomas Hecker, I am working on #10374, which will ensure that this never happens. That is, if a document has "numfield" indexed as IntPoint, it must also have "numfield" indexed as SortedNumericDocValuesField. In other words, the data structures used for a field will be consistent across all the documents of an index.

But this will only apply from version 9.0 onward. Your point is still valid for 8.x.
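
As a rough illustration of the per-field consistency rule described in this comment, the following plain-Java sketch rejects a document that uses a field with a different set of data structures than the first document that used it. `FieldSchema`, `Structure`, and `accept` are hypothetical names; this is not the implementation from #10374, only a model of the rule as stated here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: the data structures a field can be indexed with.
enum Structure { POINTS, DOC_VALUES }

final class FieldSchema {
    // Field name -> set of data structures fixed by the first document using it.
    private final Map<String, Set<Structure>> schema = new HashMap<>();

    // Accepts a document (field name -> structures used) or throws if any
    // field is used with a different set of structures than before.
    void accept(Map<String, Set<Structure>> doc) {
        for (Map.Entry<String, Set<Structure>> e : doc.entrySet()) {
            Set<Structure> known = schema.get(e.getKey());
            if (known != null && !known.equals(e.getValue())) {
                throw new IllegalArgumentException(
                    "field '" + e.getKey() + "' was indexed as " + known
                        + " but this document uses " + e.getValue());
            }
        }
        // Record new fields only after the whole document passed the check.
        for (Map.Entry<String, Set<Structure>> e : doc.entrySet()) {
            schema.putIfAbsent(e.getKey(), new HashSet<>(e.getValue()));
        }
    }
}
```

In this model, once "numfield" has been accepted with both POINTS and DOC_VALUES, a later document that supplies "numfield" with POINTS only is rejected at indexing time, so a segment without doc values for that field could not come into existence.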

 

 
