-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cluster setting for shard movement strategy #4955
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this PR. It's a big help. I made some suggestions and included a few questions. Other than that, it looks good. I'll have another look after questions have been addressed.
@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API. | |||
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. | | |||
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. | | |||
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. | | |||
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `NO_PREFERENCE` - default behavior in which order of shard movement doesn't matter. <br /> <br /> `PRIMARY_FIRST` - primary shards are moved first. <br /> <br /> `REPLICA_FIRST` - replica shards are moved first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions that users might have from original content:
- Do we need to mention any performance issues in the description of this setting? Take, for example, jainankitk's comment: "The primary first behavior can impact the speed of relocation, especially where some nodes only have replicas and they won't contribute to relocation initially. Hence, the default is best effort node level primary first and can be enforced across cluster level using primary first setting." And shortly after, knize's comment: "I updated the commit message to be more descriptive of the problem this change addresses. Can we get a separate PR to add some more thorough documentation of this new cluster setting and behavior (including the performance as a function of scale?)" Do we need to explain why you would want to order primary shards before replica shards, or replica shards before primary shards, as a concern for performance?
NO_PREFERENCE
: When the order of shard movement doesn't matter, is there still some logic that determines how the shards are moved? Or is it random? If not random, can we still explain how the system will move the shards (what dictates the order in this scenario)?
I made some minor suggestions for the content we have here. Thanks.
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `NO_PREFERENCE` - default behavior in which order of shard movement doesn't matter. <br /> <br /> `PRIMARY_FIRST` - primary shards are moved first. <br /> <br /> `REPLICA_FIRST` - replica shards are moved first. | |
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `NO_PREFERENCE` - default behavior in which order of shard movement doesn't matter. <br /> <br /> `PRIMARY_FIRST` - primary shards are relocated first, before replica shards. <br /> <br /> `REPLICA_FIRST` - replica shards are moved first, before primary shards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- These settings were introduced to deal with specific scenarios.
PRIMARY_FIRST
prioritizes the primary of all shards first so the cluster does not become red if any excluded nodes (based on a cluster setting) were to go down before relocating other shards.REPLICA_FIRST
prioritizes the replica of all shards first so the cluster does not become red in a mixed version cluster if primaries are relocated to higher version nodes and send segments on a higher OS version to replicas that they cannot read, subsequently leading to shard failure.
Note that with this setting enabled performance of this change is a direct function of number of indices, shards, replicas, and nodes. The larger the indices, replicas, and distribution scale, the slower the allocation becomes. This should be used with care. This is true for both the above settings.
- No logic that determines shard movement order. It is random.
We can add in some information from (1) regarding performance and use cases for the settings. I had omitted that since the table of request field parameters didn't seem the right place to add in all this information - but if it's preferred that we add it here I'm good with including some more.
Let me know and I'll make the changes including your suggestion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Poojita-Raj Thanks for adding context here. I think including something about these two scenarios would be worthwhile. Have a look at these revisions to the setting descriptions and let me know if I misunderstood your explanations (I'm not familiar with cluster health and shard allocation. So this is highly possible):
PRIMARY_FIRST
-- Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if nodes excluded from shard relocation happened to go down during the process.
- I'm not sure what the nodes are excluded from. Are they excluded from shard relocation? Which cluster setting are we talking about here?
- In the original, your mentioned "before relocating other shards". I'm not sure what these "other shards" are. Are these primary or replica shards? Are these the replica shards that would get second priority to the primaries in this configuration?
REPLICA_FIRST
-- Replica shards are moved first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a cluster that includes nodes on different versions of OpenSearch. In this situation, primary shards relocated to later-version nodes could try to copy segment files to replica shards on an earlier version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters.
- Is this the idea here, to relocate the replica shards first and then relocate primary shards (including those on newer versions of OpenSearch) and remove the possibility of segment file read errors?
NO_PREFERENCE
- The default behavior in which the order of shard relocation has no importance.
Let's include these descriptions in this order. It seems the purpose of adding this new setting is to give users an important choice to relocate shards one way or another. Let's present them with the options and then, last, tell them the default doesn't consider order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cwillum Thanks for the suggestions! I'll change the order of the settings and include the descriptions for each setting.
PRIMARY_FIRST
- (1) When you exclude a group of nodes from a cluster, the shards on those nodes are moved to other nodes in the cluster. cluster.routing.allocation.exclude.
is the setting used for this. For the purpose of this setting, can be referred to as the relocating nodes.
(2) Yes, other shards refers to replica shards in this case that would get second priority.
REPLICA_FIRST
- (1) Yes, that's exactly it.
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
9952ea0
to
4e24859
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some last suggestions. After this, I think this update will be ready for a final editorial review.
And thank you!
@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API. | |||
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. | | |||
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. | | |||
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. | | |||
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed version, segment replication enabled OpenSearch cluster. In this situation, primary shards relocated to newer OpenSearch version nodes could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cluster.routing.allocation.shard_movement_strategy | Enum | Decides order in which to move shards from node when shards can not stay on node anymore. Supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed version, segment replication enabled OpenSearch cluster. In this situation, primary shards relocated to newer OpenSearch version nodes could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance. | |
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance. |
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Poojita-Raj @cwillum Please see my comments and changes (the changes are just changing the hyphens to en dashes and an added "the") and let me know if you have any questions. Thanks!
@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API. | |||
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. | | |||
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. | | |||
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. | | |||
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance. | |
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` -- Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` -- Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` -- The default behavior in which the order of shard relocation has no importance. |
@@ -102,6 +102,7 @@ The following request field parameters are compatible with the cluster API. | |||
| cluster.routing.allocation.include.<attribute> | Enum | Allocates shards to a node whose `attribute` has at least one of the included comma-separated values. | | |||
| cluster.routing.allocation.require.<attribute> | Enum | Only allocates shards to a node whose `attribute` has all of the included comma-separated values. | | |||
| cluster.routing.allocation.exclude.<attribute> | Enum | Does not allocate shards to a node whose `attribute` has any of the included comma-separated values. The cluster allocation settings support the following built-in attributes: <br /> <br /> `_name` – Match nodes by node name. <br /> <br /> `_host_ip` – Match nodes by host IP address. <br /> <br /> `_publish_ip` – Match nodes by publish IP address. <br /> <br /> `_ip` – Match either `_host_ip` or `_publish_ip`. <br /> <br /> `_host` – Match nodes by hostname. <br /> <br /> `_id` – Match nodes by node ID. <br /> <br /> `_tier` – Match nodes by data tier role. | | |||
| cluster.routing.allocation.shard_movement_strategy | Enum | Determines the order in which shards are relocated from outgoing to incoming nodes. This setting supports the following strategies: <br /> <br /> `PRIMARY_FIRST` - Primary shards are relocated first, before replica shards. This prioritization may help prevent a cluster's health from going red if the relocating nodes go down during the process. <br /> <br /> `REPLICA_FIRST` - Replica shards are relocated first, before primary shards. This prioritization may help prevent a cluster's health from going red when carrying out shard relocation in a mixed-version, segment-replication-enabled OpenSearch cluster. In this situation, primary shards relocated to OpenSearch nodes of a newer version could try to copy segment files to replica shards on an older version of OpenSearch, which would result in shard failure. Relocating replica shards first attempts to avoid this from happening in multi-version clusters. <br /> <br /> `NO_PREFERENCE` - The default behavior in which order of shard relocation has no importance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under PRIMARY_FIRST
: "This prioritization may help prevent a cluster's health status from going red if the relocating nodes fail during the process"?
Under REPLICA_FIRST
: "a cluster's health status"? Also, the last sentence of this description is slightly awkward. Do we mean something like "Relocating replica shards first may help to avoid this in multi-version clusters"?
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
@Poojita-Raj Thanks for taking care of the final suggested changes. This looks good. I've merged the PR. Your contribution and efforts on this issue are appreciated. |
@cwillum Thanks Chris, happy to help! |
* Add cluster setting for shard movement strategy Signed-off-by: Poojita Raj <poojiraj@amazon.com> * changes to doc Signed-off-by: Poojita Raj <poojiraj@amazon.com> * added suggestion Signed-off-by: Poojita Raj <poojiraj@amazon.com> * changes to en-dash Signed-off-by: Poojita Raj <poojiraj@amazon.com> --------- Signed-off-by: Poojita Raj <poojiraj@amazon.com>
) * Add cluster setting for shard movement strategy Signed-off-by: Poojita Raj <poojiraj@amazon.com> * changes to doc Signed-off-by: Poojita Raj <poojiraj@amazon.com> * added suggestion Signed-off-by: Poojita Raj <poojiraj@amazon.com> * changes to en-dash Signed-off-by: Poojita Raj <poojiraj@amazon.com> --------- Signed-off-by: Poojita Raj <poojiraj@amazon.com>
* Add cluster setting for shard movement strategy Signed-off-by: Poojita Raj <poojiraj@amazon.com> * changes to doc Signed-off-by: Poojita Raj <poojiraj@amazon.com> * added suggestion Signed-off-by: Poojita Raj <poojiraj@amazon.com> * changes to en-dash Signed-off-by: Poojita Raj <poojiraj@amazon.com> --------- Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Description
Adds cluster setting for shard movement strategy that was missing from the docs.
Issues Resolved
Resolves #4827
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.