Require consistency between data-structures on a per-field basis [LUCENE-9334] #10374
Comments
David Smiley (@dsmiley) (migrated from JIRA) The scope of this seems to subsume #10120, so I'm linking. No work was done there AFAIK. |
ASF subversion and git services (migrated from JIRA) Commit d03662c in lucene's branch refs/heads/main from Mayya Sharipova LUCENE-9334 Consistency of field data structures Require consistency between data-structures on a per-field basis A field must be indexed with the same index options and data-structures across As a consequence of this, doc values updates are |
Julie Tibshirani (@jtibshirani) (migrated from JIRA) I noticed a test failure from this commit (I haven't had the chance to dig in): ./gradlew test --tests TestLucene50TermVectorsFormat.testMerge -Dtests.seed=C091C70F50381021 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ml-IN -Dtests.timezone=America/Montevideo -Dtests.asserts=true -Dtests.file.encoding=UTF-8 |
Mayya Sharipova (@mayya-sharipova) (migrated from JIRA) @jtibshirani Thanks for the report on this test failure. I will investigate it. I've temporarily muted the test in #86 |
ASF subversion and git services (migrated from JIRA) Commit 49c7cc1 in lucene's branch refs/heads/main from Mayya Sharipova Fix test that modifies schema (#87) LUCENE-9334 requires that docs have the same schema Relates to #11 |
Mike Drob (@madrob) (migrated from JIRA) I think this is causing SOLR-15360, but I can't say for certain. If there's any chance that somebody can come over and help us understand a bit more, that would be much appreciated. |
David Smiley (@dsmiley) (migrated from JIRA) Mike, the issue you just filed is effectively a duplicate of SOLR-15356, which I spent time debugging. Already solved :-). I also sent a message to the dev list about this the other day. |
Mike Drob (@madrob) (migrated from JIRA) Thanks David! I tried searching the dev list and for existing issues, but it looks like I started with the other end of the failing tests than you did. Thanks for being proactive! |
Dawid Weiss (@dweiss) (migrated from JIRA) TestPerFieldConsistency has been failing on and off recently, and it seems relevant to this issue. The failures are not really reproducible, but they do look similar. Is this a known problem? Should the test be annotated as awaiting a fix?
Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/30142/
Java: 64bit/jdk-15 -XX:-UseCompressedOops -XX:+UseSerialGC
1 tests failed.
FAILED: org.apache.lucene.document.TestPerFieldConsistency.testDocWithMissingSchemaOptionsThrowsError
Error Message:
java.lang.AssertionError: expected:<4> but was:<0>
Stack Trace:
java.lang.AssertionError: expected:<4> but was:<0>
at __randomizedtesting.SeedInfo.seed([589A8C303E4D61F4:20DDC33151E81C85]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at org.apache.lucene.document.TestPerFieldConsistency.testDocWithMissingSchemaOptionsThrowsError(TestPerFieldConsistency.java:172)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) |
Ignacio Vera (@iverase) (migrated from JIRA) I had a look and I see the problem with the test. We need to add an IndexWriterConfig with SerialMergeScheduler in order to reproduce the failures:
IndexWriterConfig iwc = newIndexWriterConfig();
// Else seeds may not reproduce:
iwc.setMergeScheduler(new SerialMergeScheduler());
Adding that, the following seed reproduces the failure:
./gradlew cleanTest test --tests TestPerFieldConsistency -Dtests.seed=C40258ABF5E76DCB -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=te-IN -Dtests.timezone=SystemV/CST6CDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8 |
Ignacio Vera (@iverase) (migrated from JIRA) The test assumes that no merges happen in the background, which is not true. An easy fix may be to disable merges: iwc.setMergePolicy(NoMergePolicy.INSTANCE); |
Dawid Weiss (@dweiss) (migrated from JIRA) I think that's a good first step - I don't know this patch. @mayya-sharipova may have a better insight. |
Mayya Sharipova (@mayya-sharipova) (migrated from JIRA) @dweiss Thanks for raising the failure, and thanks @iverase for the investigation. @iverase Indeed, the test assumes no merging. I've created a fix in a PR and will merge it today. |
ASF subversion and git services (migrated from JIRA) Commit b5a77de in lucene's branch refs/heads/main from Mayya Sharipova Fix failures in TestPerFieldConsistency (#125) This test assumes that there is no merging, Relates to LUCENE-9334 |
Adrien Grand (@jpountz) (migrated from JIRA) Is my understanding correct that we still need to look into relaxing these checks for indexes created with version 8.x or earlier before resolving this issue? |
Mayya Sharipova (@mayya-sharipova) (migrated from JIRA) @jpountz For old indices with inconsistent data structures, the current behaviour is as follows:
I think
What do you think? |
Adrien Grand (@jpountz) (migrated from JIRA) Thanks @mayya-sharipova, this makes sense to me. I wasn't sure if we'd still allow reading old indices with inconsistencies, which felt important. |
Mayya Sharipova (@mayya-sharipova) (migrated from JIRA) @jpountz Thanks for your feedback and clarifications, it makes sense. I just need to modify the test org.apache.lucene.facet.taxonomy.directory.TestBackwardsCompatibility::testCreateNewTaxonomy, which I disabled because it introduces new data structures, and then I will close this issue. |
Mayya Sharipova (@mayya-sharipova) (migrated from JIRA) Closing this issue, as progress on the taxonomy backwards-compatibility test will be tracked in its own issue #10490 |
ASF subversion and git services (migrated from JIRA) Commit 7cb6960 in lucene's branch refs/heads/main from Gautam Worah Category documents added in the Lucene 9.0 taxonomy index use a Using BDV fields with a different "$full_path_binary$" name This commit also enables the back-compat check that was disabled |
Adrien Grand (@jpountz) (migrated from JIRA) Closing after the 9.0.0 release |
ASF subversion and git services (migrated from JIRA) Commit c132bbf in lucene's branch refs/heads/main from Vigya Sharma #11122: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can |
ASF subversion and git services (migrated from JIRA) Commit a9532f3 in lucene's branch refs/heads/branch_9x from Vigya Sharma #11122: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677) Since all documents are required to use the same features (LUCENE-9334) we can |
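The idea behind the rewrite named in the two commits above can be sketched with a toy model. This is not the Lucene code: the class FieldExistsRewrite, the Plan enum and the rewrite method are hypothetical names, and the sketch assumes that the number of documents carrying the field and the total document count of a segment are both known.

```java
/** Toy model of the rewrite decision; all names here are hypothetical, not Lucene's API. */
final class FieldExistsRewrite {

    /** The two possible outcomes of the rewrite in this simplified model. */
    enum Plan { MATCH_ALL_DOCS, FIELD_EXISTS_SCAN }

    /**
     * Chooses the cheaper plan.
     *
     * @param docsWithField number of documents that have a value for the field (assumed known)
     * @param maxDoc        total number of documents in the segment
     */
    static Plan rewrite(int docsWithField, int maxDoc) {
        if (docsWithField < 0 || docsWithField > maxDoc) {
            throw new IllegalArgumentException(
                "docsWithField must be in [0, " + maxDoc + "], got " + docsWithField);
        }
        // If every document has the field, "field exists" matches the same
        // documents as "match all", so the per-document check can be skipped.
        return docsWithField == maxDoc ? Plan.MATCH_ALL_DOCS : Plan.FIELD_EXISTS_SCAN;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(100, 100)); // every document has the field
        System.out.println(rewrite(42, 100));  // some documents lack the field
    }
}
```

The per-field consistency requirement is what makes a count taken from one data structure usable as evidence about another: in this model, a document that has the field at all is assumed to have it in every data structure the field uses.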
In Lucene 8 it was possible to add stored and indexed fields separately to the document (because they are effectively written to different files). In Lucene 9 after this change it does not work any more. E.g.
Does this mean that I can only use this code?
In the latter case I cannot control the stored and indexed values separately and have to use a common analyzer. |
Follow-up of https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E.
We would like to start requiring consistency across data-structures on a per-field basis in order to make it easier to do the right thing by default: range queries can run faster if doc values are enabled, sorted queries can run faster if points are indexed, etc.
This would be a big change, so it should be rolled out in a major.
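The requirement described above can be illustrated with a toy model. The following is a minimal sketch and not Lucene's implementation: FieldSchemaRegistry and FieldSchema are hypothetical names, and the schema is reduced to three boolean flags. The first use of a field fixes its schema, and any later use with a different combination of data structures is rejected.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-field schema consistency; hypothetical, not Lucene's implementation. */
final class FieldSchemaRegistry {

    /** Simplified, hypothetical schema: which data structures a field uses. */
    static final class FieldSchema {
        final boolean indexed;   // inverted index
        final boolean docValues; // doc values
        final boolean points;    // points

        FieldSchema(boolean indexed, boolean docValues, boolean points) {
            this.indexed = indexed;
            this.docValues = docValues;
            this.points = points;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FieldSchema)) {
                return false;
            }
            FieldSchema other = (FieldSchema) o;
            return indexed == other.indexed
                && docValues == other.docValues
                && points == other.points;
        }

        @Override
        public int hashCode() {
            return (indexed ? 1 : 0) | (docValues ? 2 : 0) | (points ? 4 : 0);
        }

        @Override
        public String toString() {
            return "[indexed=" + indexed + ", docValues=" + docValues + ", points=" + points + "]";
        }
    }

    private final Map<String, FieldSchema> schemas = new HashMap<>();

    /** Records the schema on first use and rejects any later use that differs from it. */
    void register(String field, FieldSchema schema) {
        FieldSchema existing = schemas.putIfAbsent(field, schema);
        if (existing != null && !existing.equals(schema)) {
            throw new IllegalArgumentException(
                "cannot change field \"" + field + "\" from " + existing + " to " + schema);
        }
    }

    public static void main(String[] args) {
        FieldSchemaRegistry registry = new FieldSchemaRegistry();
        registry.register("price", new FieldSchema(false, true, true));
        registry.register("price", new FieldSchema(false, true, true)); // same schema: accepted
        try {
            registry.register("price", new FieldSchema(false, true, false)); // points dropped
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because a field's schema cannot drift between documents in this model, a query planner could safely assume, for example, that a field with points also has doc values everywhere it appears.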
Strict validation is tricky to implement, but we should still implement best-effort validation:
Migrated from LUCENE-9334 by Adrien Grand (@jpountz), 2 votes, resolved Jul 16 2021
Linked issues:
Pull requests: apache/lucene-solr#2186