Describe the bug
For an index with two or more replica shards, INDEX_DELAYED_NODE_LEFT_TIMEOUT_SETTING is not honoured when EXISTING_SHARDS_ALLOCATOR_BATCH_MODE is enabled. When the nodes holding the replica shards drop from the cluster, the replica shards are allocated to different nodes instead of being delayed for the time specified in the index setting.
This incorrect allocation occurs only when more than one replica shard of a shardID is unassigned due to node drops. With batch mode enabled, an allocation decision is made and executed for only one of the replica shards belonging to a shardID, so the remaining replica shards are not marked as ignored during the ReplicaShardBatchAllocator run. The subsequent run of BalancedShardAllocator (which runs after ReplicaShardBatchAllocator) ends up allocating those unassigned replica shards, which would have been delayed had a decision been taken and executed for them.
OpenSearch/server/src/main/java/org/opensearch/gateway/ShardsBatchGatewayAllocator.java
Lines 215 to 220 in 581fcd2
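The mechanism described above can be illustrated with a small, self-contained simulation. This is only a sketch of the reported behaviour, not the actual OpenSearch code: the class DelayedReplicaSketch, the Replica type, and the method allocatedImmediately are hypothetical names invented for illustration. It assumes that a replica marked as ignored is skipped by the follow-up allocator, and that any replica left unmarked is assigned right away.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DelayedReplicaSketch {

    // Hypothetical stand-in for an unassigned replica copy; not an OpenSearch class.
    static final class Replica {
        final String shardId;
        final String name;
        final boolean delayed; // true while the node-left delay has not yet expired

        Replica(String shardId, String name, boolean delayed) {
            this.shardId = shardId;
            this.name = name;
            this.delayed = delayed;
        }
    }

    /**
     * Returns the names of the replicas that a follow-up allocator would assign
     * immediately. With onlyFirstPerShardId = true, only the first replica of
     * each shardId receives a decision, mimicking the behaviour reported here.
     */
    static List<String> allocatedImmediately(List<Replica> unassigned, boolean onlyFirstPerShardId) {
        // Group the unassigned replicas by shardId, as a shardId-keyed batch would.
        Map<String, List<Replica>> byShardId = new LinkedHashMap<>();
        for (Replica r : unassigned) {
            byShardId.computeIfAbsent(r.shardId, k -> new ArrayList<>()).add(r);
        }

        // Replicas that received a "delayed" decision and are marked as ignored.
        List<Replica> ignored = new ArrayList<>();
        for (List<Replica> copies : byShardId.values()) {
            List<Replica> considered = onlyFirstPerShardId ? copies.subList(0, 1) : copies;
            for (Replica r : considered) {
                if (r.delayed) {
                    ignored.add(r);
                }
            }
        }

        // Anything not ignored is picked up by the follow-up allocator.
        List<String> result = new ArrayList<>();
        for (Replica r : unassigned) {
            if (!ignored.contains(r)) {
                result.add(r.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Replica> unassigned = List.of(
            new Replica("[idx][0]", "replica-1", true),
            new Replica("[idx][0]", "replica-2", true));
        System.out.println(allocatedImmediately(unassigned, true));  // prints [replica-2]
        System.out.println(allocatedImmediately(unassigned, false)); // prints []
    }
}
```

In the first call, replica-2 never receives a decision and is therefore assigned despite the delay. In the second call, every replica of the shardID is evaluated and both stay unassigned, which matches the expected behaviour.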
Related component
Cluster Manager
To Reproduce
60m) of INDEX_DELAYED_NODE_LEFT_TIMEOUT_SETTING for the index created as part of step 2.
Expected behavior
The replica shards should remain unassigned for the duration specified in the index's INDEX_DELAYED_NODE_LEFT_TIMEOUT_SETTING.
Additional Details