[Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled #5332

Merged

Conversation

Rishikesh1159
Member

@Rishikesh1159 Rishikesh1159 commented Nov 22, 2022

Description

This PR triggers a round of segment replication during peer recovery, before the shard is marked as STARTED. It fixes a bug where, with segment replication enabled, newly added replica shards stay behind the primary shard until an operation is performed on the index. The bug is described in more detail in issue #5313.

Solution used to fix bug

With segment replication enabled, a new replica added to the cluster goes through the peer recovery process. Once peer recovery completes, and before the replica shard is marked as STARTED, the replica triggers a replication event to copy the latest segments from the primary shard.
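
The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `RecoveryFlowSketch`, `Replicator`, and `onPeerRecoveryDone` are hypothetical names, and shard state is modelled as a list of event strings rather than real cluster state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the recovery flow; these are not the real OpenSearch classes.
class RecoveryFlowSketch {

    // Assumed abstraction for "copy the latest segments from the primary".
    interface Replicator {
        void replicateFromPrimary() throws Exception;
    }

    // Returns the ordered events that happen once peer recovery has finished.
    static List<String> onPeerRecoveryDone(boolean segRepEnabled, boolean primary, Replicator replicator) {
        List<String> events = new ArrayList<>();
        events.add("peer recovery done");
        if (segRepEnabled && primary == false) {
            try {
                // Force one round of replication before the shard is marked STARTED.
                replicator.replicateFromPrimary();
                events.add("replication round done");
            } catch (Exception e) {
                // A failed round must not end with a STARTED shard.
                events.add("shard failed");
                return events;
            }
        }
        events.add("STARTED");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(onPeerRecoveryDone(true, false, () -> {}));
        System.out.println(onPeerRecoveryDone(false, false, () -> {}));
    }
}
```

The point of the ordering is that a replica only becomes STARTED after it has had a chance to catch up with the primary, so it does not have to wait for the next indexing operation to receive segments.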

Issues Resolved

Resolves #5313

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testCoordinatingPrimaryThreadedUpdateToShardLimitsAndRejections

@codecov-commenter

codecov-commenter commented Nov 22, 2022

Codecov Report

Merging #5332 (3e5a7c5) into main (438369c) will decrease coverage by 0.13%.
The diff coverage is 14.81%.

@@             Coverage Diff              @@
##               main    #5332      +/-   ##
============================================
- Coverage     71.06%   70.93%   -0.14%     
+ Complexity    58136    58092      -44     
============================================
  Files          4704     4704              
  Lines        277244   277270      +26     
  Branches      40137    40142       +5     
============================================
- Hits         197025   196669     -356     
- Misses        64095    64544     +449     
+ Partials      16124    16057      -67     
Impacted Files Coverage Δ
...s/replication/SegmentReplicationTargetService.java 62.26% <0.00%> (-0.60%) ⬇️
...ch/indices/cluster/IndicesClusterStateService.java 64.98% <15.38%> (-2.63%) ⬇️
...n/indices/forcemerge/ForceMergeRequestBuilder.java 0.00% <0.00%> (-75.00%) ⬇️
.../indices/forcemerge/TransportForceMergeAction.java 25.00% <0.00%> (-75.00%) ⬇️
...adonly/AddIndexBlockClusterStateUpdateRequest.java 0.00% <0.00%> (-75.00%) ⬇️
...pensearch/client/cluster/RemoteConnectionInfo.java 0.00% <0.00%> (-73.18%) ⬇️
...a/org/opensearch/client/cluster/SniffModeInfo.java 0.00% <0.00%> (-58.83%) ⬇️
...readonly/TransportVerifyShardIndexBlockAction.java 9.75% <0.00%> (-58.54%) ⬇️
...a/org/opensearch/client/cluster/ProxyModeInfo.java 0.00% <0.00%> (-55.00%) ⬇️
...n/admin/indices/readonly/AddIndexBlockRequest.java 17.85% <0.00%> (-53.58%) ⬇️
... and 486 more

for (int i = 0; i < 10; i++) {
    client().prepareIndex(INDEX_NAME).setId(Integer.toString(i)).setSource("field", "value" + i).execute().actionGet();
}
logger.info("--> flush so we have an actual index");
Member

@dreamer-89 dreamer-89 Nov 22, 2022

By "actual index", do you mean the index/segment files on disk?

Member Author

Yes, let me change the terminology here; "actual index" might be confusing.

*/
public void testAddNewReplica() throws Exception {
logger.info("--> starting [node1] ...");
final String node_1 = internalCluster().startNode();
Member

nit: Usually I find it better to name nodes by their role. This makes it easier to follow node-specific actions (e.g. restart(primary), stop(replica), etc.). Otherwise, we need to look back to where node_i was created to find its role.

Member Author

Sure, makes sense. I will rename both nodes accordingly

// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {
Member

Should this condition be ShardRoutingState.STARTED == shardRouting.state()? The existing condition also applies to UNASSIGNED and INITIALIZING shards; is that correct?

Member Author

No, I think ShardRoutingState.RELOCATING != shardRouting.state() is an edge case check we are doing, so that relocating shard doesn't receive any checkpoints.

For ShardRoutingState.STARTED == shardRouting.state() this check will be false at this point, because we are performing a round of replication before marking shard as STARTED. So, shard routing will never be in STARTED state at this point.

Yes, the existing condition holds for the INITIALIZING shard routing state, which is the state at this point. I am not sure whether the shard routing state can be UNASSIGNED here; after peer recovery completes, shard routing is usually in the INITIALIZING state.
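
To make the discussion above concrete, here is a small sketch of the guard as a pure function. It is an illustration only: `ReplicationGuardSketch` and `shouldForceReplication` are hypothetical names, and the enum is a local copy declared so the example compiles on its own.

```java
// Hypothetical sketch of the guard discussed above; not the real OpenSearch code.
class ReplicationGuardSketch {

    // Local copy of the routing states, declared here so the example is self-contained.
    enum ShardRoutingState { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }

    // True when a replica on a segment-replication index should run a forced round.
    static boolean shouldForceReplication(boolean segRepEnabled, boolean primary, ShardRoutingState state) {
        return segRepEnabled && primary == false && ShardRoutingState.RELOCATING != state;
    }

    public static void main(String[] args) {
        // Print the decision for a replica shard in each routing state.
        for (ShardRoutingState state : ShardRoutingState.values()) {
            System.out.println(state + " -> " + shouldForceReplication(true, false, state));
        }
    }
}
```

As written, the guard only excludes RELOCATING, so it would also pass for UNASSIGNED and STARTED. The author's argument is that those states are not expected at this point in recovery, which is why only the relocating case is checked explicitly.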

)
);
if (sendShardFailure == true) {
logger.error("replication failure", e);
Member

nit: These are logged at debug level on the failShard call. Maybe we can remove it from here.

@andrross
Member

Minor, but I would change the commit message/PR title to explain what you've done, as opposed to the side effect you're fixing. Something like "Trigger a round of replication during recovery" or whatever makes sense. In the description you can describe the bug you're fixing and any other details, but the message header should be a clear and concise description of what is changed.

@Rishikesh1159
Member Author

Minor, but I would change the commit message/PR title to explain what you've done, as opposed to the side effect you're fixing. Something like "Trigger a round of replication during recovery" or whatever makes sense. In the description you can describe the bug you're fixing and any other details, but the message header should be a clear and concise description of what is changed.

Thanks @andrross for pointing out. Sure, what you said makes sense. I will update the commit message and PR title.

…ication is enabled.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Merge branch 'seg-rep/force-replication' of https://github.com/Rishikesh1159/OpenSearch into seg-rep/force-replication
@Rishikesh1159 Rishikesh1159 changed the title [Segment Replication] Fix bug of newly added replica shards falling behind primary when segment replication is enabled [Segment Replication] Trigger a round of replication during peer recovery when segment replication is enabled Nov 22, 2022
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all}

@Rishikesh1159 Rishikesh1159 changed the title [Segment Replication] Trigger a round of replication during peer recovery when segment replication is enabled [Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled Nov 22, 2022
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

Member

@dreamer-89 dreamer-89 left a comment

Thanks @Rishikesh1159 for this quick fix. LGTM!

@dblock dblock requested a review from mch2 November 23, 2022 16:15
IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
// For Segment Replication enabled indices, we want replica shards to start a replication event to fetch latest segments before it
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
Member

You will need a null check here given you are invoking getShardOrNull above.

Member Author

@Rishikesh1159 Rishikesh1159 Nov 23, 2022

Thanks for pointing this out. Sure, I will add a null check.
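
A minimal sketch of the null check requested above, assuming a lookup that returns null for an unknown shard id. `NullCheckSketch`, `Shard`, and `ShardLookup` are hypothetical stand-ins; only the method name `getShardOrNull` is taken from the review comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not the real IndexService or IndexShard.
class NullCheckSketch {

    static final class Shard {
        final boolean segRepEnabled;

        Shard(boolean segRepEnabled) {
            this.segRepEnabled = segRepEnabled;
        }
    }

    static final class ShardLookup {
        private final Map<Integer, Shard> shards = new HashMap<>();

        void put(int id, Shard shard) {
            shards.put(id, shard);
        }

        // As the name suggests, returns null when no shard exists for the id.
        Shard getShardOrNull(int id) {
            return shards.get(id);
        }
    }

    static boolean shouldReplicate(ShardLookup lookup, int shardId) {
        Shard shard = lookup.getShardOrNull(shardId);
        // Check for null before dereferencing, since the shard may have been removed.
        return shard != null && shard.segRepEnabled;
    }

    public static void main(String[] args) {
        ShardLookup lookup = new ShardLookup();
        lookup.put(0, new Shard(true));
        System.out.println(shouldReplicate(lookup, 0));
        System.out.println(shouldReplicate(lookup, 1));
    }
}
```

Without the `shard != null` check, the lookup for a missing id would throw a NullPointerException at the field access.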

final String primary = internalCluster().startNode();

logger.info("--> creating test index ...");
prepareCreate(INDEX_NAME, Settings.builder().put("index.number_of_shards", 1).put("index.number_of_replicas", 1)).get();
Member

Please use the settings constants instead of string literals, e.g. IndexMetadata.SETTING_NUMBER_OF_SHARDS.

Member Author

Yes sure

* We don't perform any refresh on index and assert new replica shard on doc hit count.
* This test makes sure that when a new replica is added to an existing cluster it gets all latest segments from primary even without a refresh.
*/
public void testAddNewReplica() throws Exception {
Member

This test is very similar to testStartReplicaAfterPrimaryIndexesDocs, can we reuse that test? That test currently indexes a doc after the replica is recovered to force another round of replication, but you could assert the doc count is sync'd on line 412 after ensureGreen().

Member Author

Yes, I think you are right. Let me see if we can reuse it.

Member

@mch2 mch2 left a comment

A few small changes are required here, particularly the null check in handleRecoveryDone.

IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
// For Segment Replication enabled indices, we want replica shards to start a replication event to fetch latest segments before it
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
Member

You could also read the setting from indexSettings before fetching a reference to the IndexShard.

indexService.getIndexSettings().isSegRepEnabled().

Member Author

Sure I can add that

);
if (sendShardFailure == true) {
logger.error("replication failure", e);
indexShard.failShard("replication failure", e);
Member

I think we can reuse handleRecoveryFailure here instead of this added block.

Member

Err, sorry, I was off here. We'll need both: indexShard.failShard("replication failure", e), which fails the engine, followed by handleRecoveryFailure, which removes the shard.

Member

On that note, could you please add a test here for the failure case?

Member Author

Yes, this is important. Thanks for catching this. I will update it and add a unit/integration test for the failure case.
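
The two-step failure handling described in this thread (fail the engine first, then remove the shard) might look roughly like the sketch below. Everything here is a hypothetical stand-in: the methods only borrow the names `failShard` and `handleRecoveryFailure` from the discussion, and their signatures and bodies are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the failure path; not the real IndicesClusterStateService.
class FailureHandlingSketch {

    static final class Shard {
        boolean engineFailed;
    }

    final Map<Integer, Shard> shards = new HashMap<>();
    final List<String> calls = new ArrayList<>();

    // Stand-in for indexShard.failShard(...): marks the shard's engine as failed.
    void failShard(int shardId, String reason) {
        shards.get(shardId).engineFailed = true;
        calls.add("failShard");
    }

    // Stand-in for handleRecoveryFailure(...): removes the shard from the node.
    void handleRecoveryFailure(int shardId) {
        shards.remove(shardId);
        calls.add("handleRecoveryFailure");
    }

    // Both steps are needed, in this order, when the forced replication round fails.
    void onReplicationFailure(int shardId) {
        failShard(shardId, "replication failure");
        handleRecoveryFailure(shardId);
    }

    public static void main(String[] args) {
        FailureHandlingSketch sketch = new FailureHandlingSketch();
        sketch.shards.put(0, new Shard());
        sketch.onReplicationFailure(0);
        System.out.println(sketch.calls + " remaining=" + sketch.shards.keySet());
    }
}
```

A test for the failure case, as requested above, would then assert both effects: the engine is failed and the shard is no longer present.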

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Contributor

github-actions bot commented Dec 2, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Dec 2, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Dec 2, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Contributor

github-actions bot commented Dec 9, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Contributor

github-actions bot commented Dec 9, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Dec 9, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Dec 9, 2022

Gradle Check (Jenkins) Run Completed with:

Member

@mch2 mch2 left a comment

Only a nit so approving. Thanks for this change.

);
}
} else {
shardStateAction.shardStarted(
Member

Nit: this is now invoked 3x. You could clean this up by using a StepListener that marks the shard started when it completes.

Member Author

Thanks @mch2 sure I can do that
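
The idea behind the suggestion above is to funnel every code path through one completion callback, so the "mark started" call appears only once. The sketch below illustrates that pattern with the JDK's `CompletableFuture` as a stand-in; it does not use OpenSearch's StepListener, and `SingleStartedCallSketch` and `finishRecovery` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustration of the single-callback pattern using CompletableFuture as a stand-in.
class SingleStartedCallSketch {

    static List<String> finishRecovery(boolean needsReplication, boolean replicationSucceeds) {
        List<String> calls = new ArrayList<>();
        CompletableFuture<Void> replicationStep = new CompletableFuture<>();

        // The only place that decides between marking the shard started or failed.
        replicationStep.whenComplete((ignored, error) -> calls.add(error == null ? "shardStarted" : "shardFailed"));

        if (needsReplication == false || replicationSucceeds) {
            // No replication required, or the forced round finished: complete the step.
            replicationStep.complete(null);
        } else {
            replicationStep.completeExceptionally(new IllegalStateException("replication failure"));
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(finishRecovery(false, false));
        System.out.println(finishRecovery(true, true));
        System.out.println(finishRecovery(true, false));
    }
}
```

Each branch only completes the step; what happens on completion is defined in one place, which is what removes the repeated calls.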

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@Rishikesh1159 Rishikesh1159 merged commit 0cf6797 into opensearch-project:main Dec 12, 2022
@Rishikesh1159 Rishikesh1159 added the backport 2.x Backport to 2.x branch label Dec 12, 2022
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5332-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0cf67979064c6c8be95299911db0d1bf1ea5ed68
# Push it to GitHub
git push --set-upstream origin backport/backport-5332-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5332-to-2.x.

Rishikesh1159 added a commit to Rishikesh1159/OpenSearch that referenced this pull request Dec 12, 2022
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
ryanbogan pushed a commit that referenced this pull request Dec 13, 2022
…ds during peer recovery when segment replication is enabled (#5332)

* Fix new added replica shards falling behind primary.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Trigger a round of replication during peer recovery when segment replication is enabled.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Remove unnecessary start replication overloaded method.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Add test for failure case and refactor some code.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Apply spotless check.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Addressing comments on the PR.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Remove unnecessary condition check.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Apply spotless check.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Add step listeners to resolve forcing round of segment replication.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Rishikesh1159 added a commit that referenced this pull request Dec 13, 2022
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
saratvemulapalli pushed a commit that referenced this pull request Dec 15, 2022
…ature/identity (#5581)

* Fix flaky ShardIndexingPressureConcurrentExecutionTests (#5439)

Add conditional check on assertNull to fix flaky tests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Fix bwc for cluster manager throttling settings (#5305)

Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>

* Update ingest-attachment plugin dependencies: Apache Tika 3.6.0, Apache Mime4j 0.8.8, Apache Poi 5.2.3, Apache PdfBox 2.0.27 (#5448)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Enhance CheckpointState to support no-op replication (#5282)

* CheckpointState enhanced to support no-op replication

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Co-authored-by: Bukhtawar Khan<bukhtawa@amazon.com>

* [BUG] org.opensearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky: randomizing basePath (#5482)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* [Bug] fix case sensitivity for wildcard queries (#5462)

Fixes the wildcard query to not normalize the pattern when case_insensitive is
set by the user. This is achieved by creating a new normalizedWildcardQuery
method so that query_string queries (which do not support case sensitivity) can
still normalize the pattern when the default analyzer is used; maintaining
existing behavior.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* Support OpenSSL Provider with default Netty allocator (#5460)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Revert "build no-jdk distributions as part of release build (#4902)" (#5465)

This reverts commit 8c9ca4e.

It seems that this wasn't entirely the correct way and is currently
blocking us from removing the `build.sh` from the `opensearch-build`
repository (i.e. this `build.sh` here is not yet being used).
See the discussion in opensearch-project/opensearch-build#2835 for
further details.

Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>

Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>

* Add max_shard_size parameter for Shrink API (fix supported version after backport) (#5503)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Sync CODEOWNERS with MAINTAINERS. (#5501)

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

* Added jackson dependency to server (#5366)

* Added jackson dependency to server

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Updated CHANGELOG

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Update build.gradle files

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add RuntimePermission to fix errors

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix flaky test BulkIntegrationIT.testDeleteIndexWhileIndexing (#5491)

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* Add release notes for 2.4.1 (#5488)

Signed-off-by: Xue Zhou <xuezhou@amazon.com>

Signed-off-by: Xue Zhou <xuezhou@amazon.com>

* Properly skip OnDemandBlockSnapshotIndexInputTests.testVariousBlockSize on Windows. (#5511)

PR #5397 skipped this test in @before block but still
frequently throws a TestCouldNotBeSkippedException.  This is caused by the after block still executing and throwing  an exception
while cleaning the directory created at the path in @before.  Moving the assumption to the individual test prevents this exception by ensuring the path exists.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Merge first batch of feature/extensions into main (#5347)

* Merge first batch of feature/extensions into main

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fixed CHANGELOG

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fixed newline errors

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Renaming and CHANGELOG fixes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Refactor extension loading into private method

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Removed skipValidation and added connectToExtensionNode method

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove unnecessary feature flag calls

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Renaming and exception handling

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Change latches to CompletableFuture

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Removed unnecessary validateSettingKey call

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix azure-core dependency

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Update SHAs

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove unintended dependency changes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Removed dynamic settings regitration, removed info() method, and added NoopExtensionsManager

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add javadoc

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fixed spotless failure

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Removed NoopExtensionsManager

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Added functioning NoopExtensionsManager

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Added missing javadoc

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove forbiddenAPI

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix spotless

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Change logger.info to logger.error in handleException

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Fix ExtensionsManagerTests

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Removing unrelated change

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Update SHAs

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Bump commons-compress from 1.21 to 1.22 (#5520)

Bumps commons-compress from 1.21 to 1.22.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-compress
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled (#5332)

* Fix new added replica shards falling behind primary.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Trigger a round of replication during peer recovery when segment replication is enabled.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Remove unnecessary start replication overloaded method.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Add test for failure case and refactor some code.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Apply spotless check.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Addressing comments on the PR.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Remove unnecessary condition check.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Apply spotless check.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Add step listeners to resolve forcing round of segment replication.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Adding support to register settings dynamically (#5495)

* Adding support to register settings dynamically

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Update CHANGELOG

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Removed unnecessary registerSetting methods

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Change setting registration order

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add unregisterSettings method

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove unnecessary feature flag

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Updated 1.3.7 release notes date (#5536)

Signed-off-by: owaiskazi19 <owaiskazi19@gmail.com>

Signed-off-by: owaiskazi19 <owaiskazi19@gmail.com>

* Pre conditions check before updating weighted routing metadata (#4955)

* Pre conditions check to allow weight updates for non decommissioned attribute

Signed-off-by: Rishab Nahata <rnnahata@amazon.com>

* Atomically update cluster state with decommission status and corresponding action (#5093)

* Atomically update the cluster state with decommission status and its corresponding action in the same execute call

Signed-off-by: Rishab Nahata <rnnahata@amazon.com>

* Update Netty to 4.1.86.Final (#5529)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Update release date in 2.4.1 release notes (#5549)

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update 2.4.1 release notes (#5552)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Refactor fuzziness interface on query builders (#5433)

* Refactor Object to Fuzziness type for all query builders

Signed-off-by: noCharger <lingzhichu.clz@gmail.com>

* Revise on bwc

Signed-off-by: noCharger <lingzhichu.clz@gmail.com>

* Update change log

Signed-off-by: noCharger <lingzhichu.clz@gmail.com>

Signed-off-by: noCharger <lingzhichu.clz@gmail.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

* Upgrade lucene version (#5570)

* Added bwc version 2.4.2

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

* Added 2.4.2.

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

* Update Lucene snapshot to 9.5.0-snapshot-d5cef1c

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update changelog entry

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add 2.4.2 bwc version

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Internal changes post lucene upgrade

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Co-authored-by: opensearch-ci-bot <opensearch-ci-bot@users.noreply.github.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

* Add CI bundle pattern to distribution download (#5348)

* Add CI bundle pattern for ivy repo

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

* Gradle update

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

* Extract path

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

* Change with customDistributionDownloadType

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

* Add default for exception handle

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

* Add documentations

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

* Bump protobuf-java from 3.21.9 to 3.21.11 in /plugins/repository-hdfs (#5519)

* Bump protobuf-java from 3.21.9 to 3.21.11 in /plugins/repository-hdfs

Bumps [protobuf-java](https://github.com/protocolbuffers/protobuf) from 3.21.9 to 3.21.11.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](protocolbuffers/protobuf@v3.21.9...v3.21.11)

---
updated-dependencies:
- dependency-name: com.google.protobuf:protobuf-java
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updating SHAs

Signed-off-by: dependabot[bot] <support@github.com>

* Updated changelog

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
Co-authored-by: Owais Kazi <owaiskazi19@gmail.com>
Co-authored-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Dhwanil Patel <dhwanip@amazon.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>
Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Xue Zhou <xuezhou@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: owaiskazi19 <owaiskazi19@gmail.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: noCharger <lingzhichu.clz@gmail.com>
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
Co-authored-by: Rishikesh Pasham <62345295+Rishikesh1159@users.noreply.github.com>
Co-authored-by: Dhwanil Patel <dhwanip@amazon.com>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Ashish <ssashish@amazon.com>
Co-authored-by: Nick Knize <nknize@apache.org>
Co-authored-by: Ralph Ursprung <39383228+rursprung@users.noreply.github.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Co-authored-by: Ryan Bogan <10944539+ryanbogan@users.noreply.github.com>
Co-authored-by: Poojita Raj <poojiraj@amazon.com>
Co-authored-by: Xue Zhou <85715413+xuezhou25@users.noreply.github.com>
Co-authored-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Owais Kazi <owaiskazi19@gmail.com>
Co-authored-by: Rishab Nahata <rnnahata@amazon.com>
Co-authored-by: Suraj Singh <surajrider@gmail.com>
Co-authored-by: Louis Chu <lingzhichu.clz@gmail.com>
Co-authored-by: opensearch-ci-bot <opensearch-ci-bot@users.noreply.github.com>
Co-authored-by: Zelin Hao <87548827+zelinh@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
mch2 pushed a commit to mch2/OpenSearch that referenced this pull request Mar 4, 2023
…ds during peer recovery when segment replication is enabled (opensearch-project#5332)

Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
5 participants