-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: add ignore missing field to text chunking processors #907
base: main
Are you sure you want to change the base?
[Feature]: add ignore missing field to text chunking processors #907
Conversation
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just need a few changes to the integration test.
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
Thanks for creating this PR @IanMenendez ! Please also update |
@zhichao-aws Can you help attach the backport 2.x label to this PR. Also, after @IanMenendez deal with the review comments, please approve the CI workflow. |
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. The integration tests also passed. Thanks for your contribution.
Pinging @zane-neo @zhichao-aws @martin-gaievski for review. |
@IanMenendez Please add BWC tests |
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/resources/processor/chunker/TextChunkingTestMissingFieldDocument.json
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from minor comments my main concerns are:
- extra integ test, please move test cases to uni tests
- field name, I suggested one in the issue
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Show resolved
Hide resolved
@IanMenendez The BWC test file for text_chunking processor are located in
|
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Why would BWC tests be needed for this feature? |
+1 I don't think BWC tests are needed |
@martin-gaievski Can you help review again? |
@IanMenendez thanks for addressing most comments. Please check one of previous comments, I've unresolved conversation. In future, can you please reply to each comment:
|
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
To avoid failures in BWC please update you main so it got code from #916, then rebase on main |
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
@@ -45,6 +48,16 @@ public TextChunkingProcessor create( | |||
) throws Exception { | |||
Map<String, Object> fieldMap = readMap(TYPE, processorTag, config, FIELD_MAP_FIELD); | |||
Map<String, Object> algorithmMap = readMap(TYPE, processorTag, config, ALGORITHM_FIELD); | |||
return new TextChunkingProcessor(processorTag, description, fieldMap, algorithmMap, environment, clusterService, analysisRegistry); | |||
boolean ignoreMissing = readBooleanProperty(TYPE, processorTag, config, IGNORE_MISSING, DEFAULT_IGNORE_MISSING); | |||
return new TextChunkingProcessor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about applying builder pattern and create a DTO object to form TextChunkingProcessor Object? Currently, argument size in the TextChunkingProcessor looks too long.
Thoughts @martin-gaievski ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this in a separate PR aiming refactoring of plugin classes, for scope of this PR I think we're good with adding just a field.
@IanMenendez can you please create a documentation PR, we need to add the description for new field. You can kick off the process from this page |
There is one already approved here opensearch-project/documentation-website#8266 |
noticed that changelog entry is missing
@IanMenendez could you please move your entry in CHANGELOG file under 2.x/Enhancements category. Every |
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
@martin-gaievski changed changelog |
Running the CI. Will approve it once all the checks have been passed |
Description
Adds ignore missing field to text chunking processors
Related Issues
#906
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.