Disallow invalid characters for physical file name to be included within vector field name. #1936

Merged: 3 commits, Aug 12, 2024

Conversation

0ctopus13prime
Contributor

Description

Issue : #1859.

Issue

OpenSearch allows a field name to contain a space, but it does not allow a space in a physical file name. KNNCodecUtil::buildEngineFileName uses the field name directly as part of the vector file name. As a result, when the field name contains a character that is not allowed in a physical file name, validation in BlobStoreIndexShardSnapshot fails. For example, with the field name 'my vector' the file name becomes _0_2011_my vector.hnswc, and BlobStoreIndexShardSnapshot throws an exception complaining that the file name is not valid.
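
To make the failure mode concrete, here is a minimal sketch of how a field name can leak into a file name. The helper `engineFileName` and its parameters are hypothetical; it only mirrors the shape of the example file name above and is not the actual KNNCodecUtil::buildEngineFileName implementation.

```java
// Simplified stand-in for the file-name construction described above.
// The real KNNCodecUtil::buildEngineFileName may be structured differently;
// this only reproduces the shape of the example file name.
final class EngineFileNameSketch {

    // Hypothetical helper: joins segment name, build version, field name
    // and extension without sanitizing the field name.
    static String engineFileName(String segment, String version, String field, String extension) {
        return segment + "_" + version + "_" + field + extension;
    }

    public static void main(String[] args) {
        String name = engineFileName("_0", "2011", "my vector", ".hnswc");
        System.out.println(name);
        // The space from the field name is now part of the physical file name.
        System.out.println("contains space: " + name.contains(" "));
    }
}
```

Because the field name is copied verbatim, any character accepted in a field name but rejected in a file name surfaces only later, at snapshot time.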

Solution

Add validation logic that throws an exception when the provided vector field name contains any character that is invalid in a physical file name.

private void validateFullFieldName(BuilderContext context) {
    final String fullFieldName = buildFullName(context);
    for (char ch : fullFieldName.toCharArray()) {
        if (Strings.INVALID_FILENAME_CHARS.contains(ch)) {
            throw new IllegalArgumentException(...);
        }
    }
}


public abstract class KNNVectorFieldMapper extends ParametrizedFieldMapper {
    ...
    public static class Builder extends ParametrizedFieldMapper.Builder {
        @Override
        public KNNVectorFieldMapper build(BuilderContext context) {
            validateFullFieldName(context);
            ...
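
The check above can be tried in isolation with a self-contained sketch. Everything here is an assumption made for illustration: the local `INVALID_FILENAME_CHARS` set is a stand-in for `Strings.INVALID_FILENAME_CHARS` (the real set may differ), the method takes the field name as a plain string instead of a `BuilderContext`, and the exception message is made up, since the PR snippet elides it.

```java
import java.util.Set;

final class FieldNameValidationSketch {

    // Assumption: local stand-in for Strings.INVALID_FILENAME_CHARS.
    // The character set used by OpenSearch may differ from this one.
    static final Set<Character> INVALID_FILENAME_CHARS =
        Set.of('\\', '/', '*', '?', '"', '<', '>', '|', ' ', ',');

    // Rejects a field name as soon as one invalid character is found.
    // The message text is hypothetical; the original snippet elides it.
    static void validateFullFieldName(String fullFieldName) {
        for (char ch : fullFieldName.toCharArray()) {
            if (INVALID_FILENAME_CHARS.contains(ch)) {
                throw new IllegalArgumentException(
                    String.format("Vector field name [%s] contains invalid character [%c]", fullFieldName, ch));
            }
        }
    }

    public static void main(String[] args) {
        validateFullFieldName("my_vector"); // passes silently
        try {
            validateFullFieldName("my vector");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running the check when the mapper is built means the mapping is rejected up front, instead of the failure appearing later during a snapshot.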

Related Issues

Issue : #1859.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

heemin32
heemin32 previously approved these changes Aug 7, 2024
…ame.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
heemin32
heemin32 previously approved these changes Aug 7, 2024
navneet1v
navneet1v previously approved these changes Aug 9, 2024
Collaborator

@navneet1v navneet1v left a comment

Code looks good to me. Please fix the conflicts.

ryanbogan
ryanbogan previously approved these changes Aug 10, 2024
Member

@ryanbogan ryanbogan left a comment

LGTM!

Signed-off-by: Doo Yong Kim <0ctopus13prime@gmail.com>
heemin32
heemin32 previously approved these changes Aug 12, 2024
…ame.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
@navneet1v navneet1v merged commit f5ba771 into opensearch-project:main Aug 12, 2024
29 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1936-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f5ba77114ef662e91a8ce26838159f383931912c
# Push it to GitHub
git push --set-upstream origin backport/backport-1936-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1936-to-2.x.

@0ctopus13prime 0ctopus13prime deleted the fix-bug branch August 12, 2024 19:17
0ctopus13prime added a commit to 0ctopus13prime/k-NN that referenced this pull request Aug 16, 2024
…hin vector field name. (opensearch-project#1936)

* Block a vector field to have invalid characters for a physical file name.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>

---------

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
Signed-off-by: Doo Yong Kim <0ctopus13prime@gmail.com>
Co-authored-by: Dooyong Kim <kdooyong@amazon.com>
(cherry picked from commit f5ba771)
akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024
…hin vector field name. (opensearch-project#1936)

* Block a vector field to have invalid characters for a physical file name.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>

* Block a vector field to have invalid characters for a physical file name.

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>

---------

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>
Signed-off-by: Doo Yong Kim <0ctopus13prime@gmail.com>
Co-authored-by: Dooyong Kim <kdooyong@amazon.com>
Signed-off-by: Akash Shankaran <akash.shankaran@intel.com>
4 participants