Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reassign topics uniformly after partition is added #9622

Merged
merged 5 commits into from
Mar 31, 2023

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Mar 22, 2023

Using reservoir sampling to chose the replicas that are going to be
reallocated to the new node provides better uniformity of topic replicas
distribution across the cluster. Additionally reservoir sampling
algorithm isn't prone to not selecting enough reallocations.

Fixes #7756

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • Topics are more uniformly distributed across the cluster after new nodes are added

@mmaslankaprv mmaslankaprv force-pushed the uniform-assignments branch 4 times, most recently from c604a7c to 969a8be Compare March 23, 2023 15:26
@mmaslankaprv mmaslankaprv changed the title Uniform assignments Reassign topics uniformly after partition is added Mar 23, 2023
@mmaslankaprv mmaslankaprv marked this pull request as ready for review March 23, 2023 19:37
@piyushredpanda piyushredpanda added this to the v23.1.4 milestone Mar 27, 2023
@piyushredpanda
Copy link
Contributor

@vshtokman : Added this to v23.1.4 (technically speaking the backport PR should be in that milestone rather than this PR thats targeted to dev, but just so we don't lose track that this is needed for v23.1.4).

src/v/cluster/tests/backend_reallocation_strategy_test.cc Outdated Show resolved Hide resolved
src/v/cluster/tests/backend_reallocation_strategy_test.cc Outdated Show resolved Hide resolved
src/v/cluster/members_backend.cc Outdated Show resolved Hide resolved
domain,
target_replicas_per_node,
diff);
err += std::abs(diff);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: squaring the diff here is probably better because if we have, say, node partition counts 0, 2 (with the target count of 3), this is worse than 1, 1 (and the least_allocated constraint will prefer the latter), even though they have the same error in the L1 metric.

src/v/cluster/members_backend.cc Outdated Show resolved Hide resolved
src/v/cluster/members_backend.cc Show resolved Hide resolved
src/v/cluster/members_backend.cc Show resolved Hide resolved
src/v/cluster/members_backend.cc Outdated Show resolved Hide resolved
src/v/cluster/members_backend.cc Outdated Show resolved Hide resolved
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Introduced a reallation strategy type to make reallocation code testable
and customizable outside of the members backend.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
@ztlpn
Copy link
Contributor

ztlpn commented Mar 30, 2023

hmm, scaling_up_test still failing?

Added simple test validating correctness of backend reallcation
strategy.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Using reservoir sampling to chose the replicas that are going to be
reallocated to the new node provides better uniformity of topic replicas
distribution across the cluster. Additionally reservoir sampling
algorithm isn't prone to not selecting enough reallocations.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv mmaslankaprv merged commit 8519b76 into redpanda-data:dev Mar 31, 2023
@mmaslankaprv
Copy link
Member Author

/backport v23.1.x

@vbotbuildovich
Copy link
Collaborator

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x fd7caa83afd96c55bb235516ea0efc374713038e a30853922fcf64e641ed9cf6475f88a35a572048 d5716ab4b6491e67ed652a8699df6fb32fc60a4b 7059fd1f93a6845363855845b4c386f32a124d01 d2456cc21da7b5a8428a8f5c2aa7fdb950c3662f

Workflow run logs.

@mmaslankaprv
Copy link
Member Author

/backport v23.1.x

@vbotbuildovich
Copy link
Collaborator

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x fd7caa83afd96c55bb235516ea0efc374713038e a30853922fcf64e641ed9cf6475f88a35a572048 d5716ab4b6491e67ed652a8699df6fb32fc60a4b 7059fd1f93a6845363855845b4c386f32a124d01 d2456cc21da7b5a8428a8f5c2aa7fdb950c3662f

Workflow run logs.

@ZeDRoman
Copy link
Contributor

ZeDRoman commented Apr 6, 2023

Backport only with fix #9870

};

using node_replicas_map_t = absl::node_hash_map<model::node_id, node_replicas>;
static absl::node_hash_map<model::node_id, node_replicas>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you are inside an anonymous namespace you don't need to mark the function as file scope to prevent it from becoming an exported symbol

@vshtokman
Copy link
Contributor

@mmaslankaprv , I think this still needs a v23.1.x backport.

@mmaslankaprv
Copy link
Member Author

This is already backported to v23.1.x

vshtokman added a commit that referenced this pull request Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI Failure (partitions_rebalanced times out) in ScalingUpTest.test_on_demand_rebalancing
7 participants