Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: add ignore missing field to text chunking processors #907

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

IanMenendez
Copy link
Contributor

Description

Adds ignore missing field to text chunking processors

Related Issues

#906

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me

DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need a few changes to the integration test.

@yuye-aws
Copy link
Member

Thanks for creating this PR @IanMenendez ! Please also update CHANGELOG.md. We are after the 2.17 code freeze date in https://opensearch.org/releases.html, 2.18 will be the first release for you change to take effect.

@yuye-aws
Copy link
Member

@zhichao-aws Can you help attach the backport 2.x label to this PR. Also, after @IanMenendez deal with the review comments, please approve the CI workflow.

Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. The integration tests also passed. Thanks for your contribution.

@yuye-aws
Copy link
Member

Pinging @zane-neo @zhichao-aws @martin-gaievski for review.

@vibrantvarun
Copy link
Member

vibrantvarun commented Sep 18, 2024

@IanMenendez Please add BWC tests
cc: @yuye-aws

Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from minor comments my main concerns are:

  • extra integ test, please move test cases to uni tests
  • field name, I suggested one in the issue

@yuye-aws
Copy link
Member

yuye-aws commented Sep 18, 2024

@IanMenendez Please add BWC tests cc: @yuye-aws

@IanMenendez The BWC test file for text_chunking processor are located in

  1. qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/TextChunkingProcessorIT.java
  2. qa/rolling-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/TextChunkingProcessorIT.java
    You can take a look whether the BWC tests are needed.

Ian Menendez and others added 3 commits September 21, 2024 17:22
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
@IanMenendez
Copy link
Contributor Author

IanMenendez commented Sep 21, 2024

@IanMenendez Please add BWC tests cc: @yuye-aws

Why would BWC tests be needed for this feature?

@yuye-aws
Copy link
Member

+1 I don't think BWC tests are needed

@yuye-aws
Copy link
Member

@martin-gaievski Can you help review again?

@martin-gaievski
Copy link
Member

@IanMenendez thanks for addressing most comments. Please check one of previous comments, I've unresolved conversation.

In future, can you please reply to each comment:

  • whether you're making changes or not, just ask will be good
  • if you're making the change please state whether you agree with suggestion/request, and if not please put your reasoning

Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
@martin-gaievski
Copy link
Member

To avoid failures in BWC please update you main so it got code from #916, then rebase on main

Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
@@ -45,6 +48,16 @@ public TextChunkingProcessor create(
) throws Exception {
Map<String, Object> fieldMap = readMap(TYPE, processorTag, config, FIELD_MAP_FIELD);
Map<String, Object> algorithmMap = readMap(TYPE, processorTag, config, ALGORITHM_FIELD);
return new TextChunkingProcessor(processorTag, description, fieldMap, algorithmMap, environment, clusterService, analysisRegistry);
boolean ignoreMissing = readBooleanProperty(TYPE, processorTag, config, IGNORE_MISSING, DEFAULT_IGNORE_MISSING);
return new TextChunkingProcessor(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about applying builder pattern and create a DTO object to form TextChunkingProcessor Object? Currently, argument size in the TextChunkingProcessor looks too long.

Thoughts @martin-gaievski ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this in a separate PR aiming refactoring of plugin classes, for scope of this PR I think we're good with adding just a field.

@martin-gaievski
Copy link
Member

@IanMenendez can you please create a documentation PR, we need to add the description for new field. You can kick off the process from this page

@IanMenendez
Copy link
Contributor Author

@IanMenendez can you please create a documentation PR, we need to add the description for new field. You can kick off the process from this page

There is one already approved here opensearch-project/documentation-website#8266

martin-gaievski
martin-gaievski previously approved these changes Oct 1, 2024
@martin-gaievski martin-gaievski added backport 2.x Label will add auto workflow to backport PR to 2.x branch v2.18.0 labels Oct 1, 2024
@martin-gaievski martin-gaievski dismissed their stale review October 1, 2024 18:14

noticed that changelog entry is missing

@martin-gaievski
Copy link
Member

martin-gaievski commented Oct 1, 2024

@IanMenendez could you please move your entry in CHANGELOG file under 2.x/Enhancements category. Every Feature change should go through additional security review, but for Enhancement there is no such requirement

Signed-off-by: Ian Menendez <ianfmenendezd@gmail.com>
@IanMenendez
Copy link
Contributor Author

@martin-gaievski changed changelog

@yuye-aws
Copy link
Member

yuye-aws commented Oct 2, 2024

Running the CI. Will approve it once all the checks have been passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch v2.18.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants