Add faster scaling composite hash value encoding for remote path #13155

Merged: 7 commits, Apr 16, 2024

@ashking94 ashking94 commented Apr 11, 2024

Description

This PR is one of the tasks for #12589, which is part of feature request #12567. It is an incremental step toward completing the optimised prefix path work proposed in that feature request.

In this PR, we introduce a composite encoding for the 64-bit hash value. The hash is generated from multiple attributes: index UUID, shard ID, data category (segment/translog), and data type (data/metadata/lock_files). Based on tests performed during development, we found an encoding that works best as the index shard count or the indexing rate increases on a cluster. The encoding uses the first 6 bits to generate a URL-safe base64 character and the following 14 bits as their binary string equivalent. This encoded value is used as the prefix or infix, depending on the path type selected at the cluster level. Currently, we are making this encoding the default. The code has levers to set the hash algorithm in index metadata, so that in future we can offer a dynamic cluster setting for choosing the hash algorithm or the hash value encoding from multiple options.
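
The encoding described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class and method names (`CompositeEncodingSketch`, `fnv1a64`, `encode`), the `/`-joined input string, and the reading of "first 6 bits" as the most significant bits of the hash are all assumptions. FNV-1a is used only because an earlier title of this PR mentions FNV; the exact variant is also assumed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and input layout are illustrative assumptions.
class CompositeEncodingSketch {
    // URL-safe base64 alphabet (RFC 4648, section 5).
    private static final char[] URL_SAFE_BASE64 =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_".toCharArray();

    // Standard 64-bit FNV-1a over the UTF-8 bytes of the input.
    static long fnv1a64(String input) {
        long hash = 0xcbf29ce484222325L;           // FNV offset basis
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;                // FNV prime
        }
        return hash;
    }

    // Encode the top 20 bits of the hash: 6 bits as one base64 character,
    // then the following 14 bits as a string of '0' and '1'.
    static String encode(long hash) {
        int base64Index = (int) (hash >>> 58);            // most significant 6 bits
        int binaryBits = (int) ((hash >>> 44) & 0x3FFF);  // following 14 bits
        StringBuilder sb = new StringBuilder(15);
        sb.append(URL_SAFE_BASE64[base64Index]);
        for (int i = 13; i >= 0; i--) {
            sb.append(((binaryBits >>> i) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed input layout: indexUuid/shardId/dataCategory/dataType.
        String composite = String.join("/", "example-index-uuid", "0", "segments", "data");
        System.out.println(encode(fnv1a64(composite)));
    }
}
```

Under these assumptions the encoded value is always 15 characters long (1 base64 character plus 14 binary digits) and can take 2^20 distinct values.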

Simulation steps done using AWS S3 -

  1. Create an S3 bucket and upload at a very high rate across all possible characters of the URL-safe base64 charset. This warms up the bucket and increases the overall maximum request rate capacity for the prefix. We see throttling for a couple of hours before it stops. Note that at this point we have not yet started our OpenSearch cluster.
  2. Start the OpenSearch cluster against the same bucket used above. We see that it can support multiple active shards. As more indexes (or more shards) are created gradually, the S3 request rate capacity scales up faster with the binary encoding than with the base64 encoding; the time taken is considerably lower.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • [ ] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [ ] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❌ Gradle check result for a6a30c7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot commented Apr 11, 2024

Compatibility status:

Checks if related components are compatible with change d1ed1ad

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

… path

Signed-off-by: Ashish Singh <ssashish@amazon.com>
@ashking94 ashking94 changed the title Add exponential scaling FNV composite value hash algorithm for remote path Add faster scaling FNV composite value hash algorithm for remote path Apr 12, 2024
@ashking94 ashking94 changed the title Add faster scaling FNV composite value hash algorithm for remote path Add faster scaling composite hash value encoding for remote path Apr 12, 2024

❌ Gradle check result for b2ec3ef: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Ashish Singh <ssashish@amazon.com>

❌ Gradle check result for b8b1fb4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>

❕ Gradle check result for 9e86fad: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod
      1 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Apr 12, 2024

Codecov Report

Attention: Patch coverage is 89.28571% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 71.50%. Comparing base (b15cb0c) to head (d1ed1ad).
Report is 162 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| .../org/opensearch/index/remote/RemoteStoreUtils.java | 72.72% | 1 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13155      +/-   ##
============================================
+ Coverage     71.42%   71.50%   +0.08%     
- Complexity    59978    60674     +696     
============================================
  Files          4985     5039      +54     
  Lines        282275   285360    +3085     
  Branches      40946    41328     +382     
============================================
+ Hits         201603   204044    +2441     
- Misses        63999    64494     +495     
- Partials      16673    16822     +149     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ashking94
Member Author

❕ Gradle check result for 9e86fad: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod
      1 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test - #11979

Signed-off-by: Ashish Singh <ssashish@amazon.com>

✅ Gradle check result for 80f4ffa: SUCCESS

Signed-off-by: Ashish Singh <ssashish@amazon.com>

❌ Gradle check result for 122bcd3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Ashish Singh <ssashish@amazon.com>

✅ Gradle check result for d1ed1ad: SUCCESS
