[Backport 2.x] Block a vector field to have invalid characters for a physical file n… #1982

Merged 1 commit on Aug 16, 2024

Contributor

@0ctopus13prime 0ctopus13prime commented Aug 16, 2024

…ame.

Description

Issue : #1859.

Issue

OpenSearch allows a field name to contain a space, but it does not allow a space in a physical file name. KNNCodecUtil::buildEngineFileName uses the field name directly as part of the vector file name, so when the field name contains a character that is disallowed in a physical file name, validation in BlobStoreIndexShardSnapshot fails. For example, the field name 'my vector' produces the file name _0_2011_my vector.hnswc, and BlobStoreIndexShardSnapshot throws an exception reporting that the file name is not valid.
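
To make the failure concrete, here is a minimal sketch of how a field name can leak into a file name. The class, the method signature, and the concatenation pattern are assumptions inferred from the example file name above; they are not copied from KNNCodecUtil, whose actual implementation may differ.

```java
class EngineFileNameSketch {
    // Hypothetical stand-in for KNNCodecUtil::buildEngineFileName.
    // Assumption: the name is segment + "_" + version + "_" + field + extension,
    // which is what the example "_0_2011_my vector.hnswc" suggests.
    static String buildEngineFileName(String segmentName, String version, String fieldName, String extension) {
        return segmentName + "_" + version + "_" + fieldName + extension;
    }

    // Returns true if the file name contains a space, the character
    // that the snapshot validation rejects in the example above.
    static boolean containsSpace(String fileName) {
        return fileName.indexOf(' ') >= 0;
    }
}
```

With the field name 'my vector', the space is carried unchanged into the resulting file name, which is the condition the snapshot validation rejects.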

Solution

Add validation logic that throws an exception when the provided vector field name contains any invalid character.

private void validateFullFieldName(BuilderContext context) {
    final String fullFieldName = buildFullName(context);
    for (char ch : fullFieldName.toCharArray()) {
        if (Strings.INVALID_FILENAME_CHARS.contains(ch)) {
            throw new IllegalArgumentException(...);
        }
    }
}


public abstract class KNNVectorFieldMapper extends ParametrizedFieldMapper {
    ...
    public static class Builder extends ParametrizedFieldMapper.Builder {
        @Override
        public KNNVectorFieldMapper build(BuilderContext context) {
            validateFullFieldName(context);
            ...
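
For readers who want to try the check outside of OpenSearch, the following is a self-contained sketch of the same idea. The character set, the exception message, and the build method are illustrative assumptions: the real code uses Strings.INVALID_FILENAME_CHARS and a BuilderContext, and the actual message is elided in the snippet above.

```java
import java.util.Set;

class FieldNameValidationSketch {
    // Assumption: illustrative stand-in for Strings.INVALID_FILENAME_CHARS;
    // the set used by OpenSearch may differ.
    static final Set<Character> INVALID_FILENAME_CHARS =
        Set.of('\\', '/', '*', '?', '"', '<', '>', '|', ' ', ',');

    // Rejects a field name containing any character from the set above.
    static void validateFullFieldName(String fullFieldName) {
        for (char ch : fullFieldName.toCharArray()) {
            if (INVALID_FILENAME_CHARS.contains(ch)) {
                // Hypothetical message; the original message is not shown in the PR.
                throw new IllegalArgumentException(
                    "Vector field name [" + fullFieldName + "] contains invalid character [" + ch + "]");
            }
        }
    }

    // Hypothetical stand-in for Builder.build(context): validate first, then build.
    static String build(String fullFieldName) {
        validateFullFieldName(fullFieldName);
        return "mapper:" + fullFieldName;
    }
}
```

Validating at build time means an invalid field name is rejected when the mapping is created, rather than later when a snapshot tries to validate the physical file name.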

Related Issues

Issue : #1859.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ame.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
@0ctopus13prime
Contributor Author

The change log already contains the entry for bug fix #1936, so I did not include it in this PR.

@jmazanec15
Member

@0ctopus13prime I think we need to backport change log as well

@0ctopus13prime
Contributor Author

@0ctopus13prime I think we need to backport change log as well

Hi @jmazanec15
Just wondering, does this have the line we need already..?? 🤔 - Link
Please let me know if I missed sth.

@jmazanec15 jmazanec15 merged commit f3e644c into opensearch-project:2.x Aug 16, 2024
96 of 109 checks passed
3 participants