
Partition balancer: rack awareness constraint repair #6845

Merged
ztlpn merged 10 commits into redpanda-data:dev on Oct 28, 2022

Conversation

Contributor

@ztlpn ztlpn commented Oct 20, 2022

Cover letter

Add rack awareness repair to the partition balancer.

Add the partition_balancer_state class that captures the controller state needed by the balancer. This class maintains the set of ntps whose rack awareness constraint is violated (i.e. that have more than one replica in the same rack). The balancer goes over this set and, if there are suitable racks, tries to schedule repairing moves.
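
For illustration, here is a minimal self-contained sketch of that idea, using simplified stand-in types rather than the actual Redpanda classes: on every replica-set update, check whether any rack holds more than one replica of the partition and keep the set of violating ntps up to date.

// A minimal, self-contained sketch (simplified stand-in types, not the
// actual Redpanda classes) of maintaining the set of ntps that violate
// the rack awareness constraint.
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using ntp = std::string;     // stand-in for model::ntp
using node_id = int;         // stand-in for model::node_id
using rack_id = std::string; // stand-in for model::rack_id

struct balancer_state {
    // node -> rack mapping, normally taken from the cluster members table
    std::map<node_id, std::optional<rack_id>> node_racks;
    // ntps that currently have more than one replica in the same rack
    std::set<ntp> ntps_with_broken_rack_constraint;

    // Called whenever the replica set of partition `p` changes.
    void on_replicas_updated(const ntp& p, const std::vector<node_id>& replicas) {
        std::set<rack_id> seen_racks;
        bool violated = false;
        for (node_id n : replicas) {
            auto it = node_racks.find(n);
            if (it == node_racks.end() || !it->second) {
                continue; // a node without a configured rack can't violate
            }
            if (!seen_racks.insert(*it->second).second) {
                violated = true; // second replica landed in the same rack
                break;
            }
        }
        if (violated) {
            ntps_with_broken_rack_constraint.insert(p);
        } else {
            ntps_with_broken_rack_constraint.erase(p);
        }
    }
};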

Fixes #6355

TODO: add a "number of ntps with violated constraint" metric

Backport Required

  • not a bug fix

UX changes

Rack awareness constraint repair is added to partition balancing in the continuous mode. For a given partition, the balancer will try to move excess replicas from racks that have more than one replica to racks that have none.

Release notes

Features

  • Added rack awareness constraint repair in the continuous partition balancing mode.

This is a class that stores the state needed for the partition balancer to function. This commit also wires it up to topic_updates_dispatcher and adds code maintaining the set of ntps whose rack awareness constraint is violated.

The replication factor is now calculated from the number of replicas of partition 0 anyway, so we don't need the metadata object if we have the set of replicas.
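
As a toy illustration of that last point (hypothetical helper, simplified types, not the Redpanda API), the replication factor falls out of the size of partition 0's replica set, with no separate topic metadata needed:

// Illustrative sketch only: derive a topic's replication factor from the
// replica set of its partition 0.
#include <cstddef>
#include <vector>

using node_id = int; // stand-in for model::node_id

std::size_t replication_factor(const std::vector<node_id>& partition0_replicas) {
    // Every partition of a topic is expected to have the same number of
    // replicas, so partition 0's replica count is the replication factor.
    return partition0_replicas.size();
}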
Contributor

@ZeDRoman ZeDRoman left a comment


Looks great!
Have some minor questions

/// the ntp is replicated to, we try to schedule a move. For each rack we
/// arbitrarily choose the first appearing replica to remain there (note: this
/// is probably not optimal choice).
void partition_balancer_planner::get_rack_constraint_repair_reassignments(
Contributor


Maybe we should add rack_constraint violations into partition_balancer_violations?

Contributor Author


I thought about it but decided against it: a full list of partitions in the violations doesn't make sense (there could be thousands of them), but for the number of violations it makes more sense to have it as a metric, which is easier to observe and alert on.
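
For context, this is the kind of metric being discussed, as a hypothetical sketch (not the actual Redpanda metric) using the Seastar metrics API that Redpanda builds on: a gauge exposing the current number of violating ntps.

// Hypothetical sketch of a gauge counting rack-constraint violations,
// built on the Seastar metrics API.
#include <seastar/core/metrics.hh>
#include <seastar/core/metrics_registration.hh>
#include <set>
#include <string>

namespace sm = seastar::metrics;

class rack_constraint_metrics {
public:
    explicit rack_constraint_metrics(const std::set<std::string>& violating_ntps)
      : _violating_ntps(violating_ntps) {
        _metrics.add_group(
          "partition_balancer",
          {sm::make_gauge(
            "rack_constraint_violations",
            [this] { return _violating_ntps.size(); },
            sm::description(
              "Number of partitions violating the rack awareness constraint"))});
    }

private:
    // Reference to the set maintained by the balancer state.
    const std::set<std::string>& _violating_ntps;
    sm::metric_groups _metrics;
};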

ns.make_unavailable(node)
self.wait_until_ready(expected_unavailable_node=node)

self.redpanda.start_node(self.redpanda.nodes[4])
Contributor


Why not just fix the failed node?

Contributor Author


Because it is harder for the balancer :) (the movements introduced by adding a node interfere a bit).

/// is probably not optimal choice).
void partition_balancer_planner::get_rack_constraint_repair_reassignments(
plan_data& result, reallocation_request_state& rrs) {
if (_state.ntps_with_broken_rack_constraint().empty()) {
Member


nit: this condition is already checked by the caller

Contributor Author


I don't think this is true - we can end up here e.g. if there are some unavailable nodes but no violating ntps. Although it might make sense to make it so! It would be easier to read.

Contributor Author


Reshuffled the main function a bit. Not sure if this is much cleaner, but the idea is that a planner pass should decide for itself whether it needs to run, while we also want to avoid loading ntp sizes in the happy case (this is why we need an early exit when there are no violations).
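
A condensed, self-contained sketch of that early-exit shape (stand-in types, not the actual planner code): the rack-repair pass bails out before any size loading when the violation set is empty.

// Simplified sketch of the control flow discussed above.
#include <iostream>
#include <set>
#include <string>

struct balancer_state {
    std::set<std::string> ntps_with_broken_rack_constraint;
};

struct plan_data {
    int scheduled_moves = 0;
};

void rack_constraint_repair_pass(const balancer_state& state, plan_data& result) {
    // Happy case: nothing violates the constraint, so skip the pass entirely
    // and never request partition sizes.
    if (state.ntps_with_broken_rack_constraint.empty()) {
        return;
    }

    // Only now would we load ntp sizes and try to schedule repairing moves.
    for (const auto& ntp : state.ntps_with_broken_rack_constraint) {
        std::cout << "would schedule a repairing move for " << ntp << "\n";
        ++result.scheduled_moves;
    }
}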

@@ -118,6 +118,7 @@ v_cc_library(
remote_topic_configuration_source.cc
partition_balancer_planner.cc
partition_balancer_backend.cc
partition_balancer_state.cc
Member


this seems like a nice cleanup: consolidating state.

Contributor Author

@ztlpn ztlpn Oct 27, 2022


Yeah, that was the idea. Although there is not much consolidation right now, we can use this class to store some balancing-specific indexes (e.g. a node -> ntp map). This will be helpful when we eventually need to get rid of those "iterate over all ntps" loops.
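
As an illustration of the kind of index mentioned here (hypothetical, simplified types): a node -> ntps map kept up to date on replica-set changes would let the balancer look up the partitions hosted on a node directly instead of scanning all ntps.

// Hypothetical sketch of a balancing-specific index: node -> ntps.
#include <map>
#include <set>
#include <string>
#include <vector>

using ntp = std::string; // stand-in for model::ntp
using node_id = int;     // stand-in for model::node_id

class node_ntp_index {
public:
    // Apply a replica set change of partition `p` to the index.
    void update(const ntp& p,
                const std::vector<node_id>& previous_replicas,
                const std::vector<node_id>& current_replicas) {
        for (node_id n : previous_replicas) {
            _node2ntps[n].erase(p);
        }
        for (node_id n : current_replicas) {
            _node2ntps[n].insert(p);
        }
    }

    // Partitions currently hosted on node `n`, without a full ntp scan.
    const std::set<ntp>& ntps_on_node(node_id n) const {
        static const std::set<ntp> empty;
        auto it = _node2ntps.find(n);
        return it == _node2ntps.end() ? empty : it->second;
    }

private:
    std::map<node_id, std::set<ntp>> _node2ntps;
};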

Member


get rid of those "iterate over all ntps" loops.
😍

Member

@mmaslankaprv mmaslankaprv left a comment


lgtm

Contributor

@ZeDRoman ZeDRoman left a comment


lgtm

Contributor Author

ztlpn commented Oct 28, 2022

Unrelated test failure: #6991

restarted

@ztlpn ztlpn merged commit b6721de into redpanda-data:dev Oct 28, 2022
@ztlpn ztlpn deleted the rack-awareness-repair branch November 27, 2023 13:24
Labels
area/redpanda, kind/enhance (New feature or request)
Development

Successfully merging this pull request may close these issues:

  • Continual rack-awareness rebalancing for multi-az deployments