Decommissioning can be very slow #9593
Follow-up notes:
- This should be addressed by adding node-wide recovery throttling.
- Partially addressed with: #9992
- Improvements in bandwidth saturation added via #10339
Version & Environment
Redpanda version: v23.1.1
What went wrong?
I am decommissioning what I consider to be a moderately sized node: ~9 TB of local data across ~240 partition replicas in a single topic.
The decommissioning is going very slowly, and the speed varies a lot, from "slow" to "glacial". It seems likely the decommission will take more than 12 hours. In this particular case, all the decommissioned partitions are being sent to the same node; that is expected, since it is a new, empty node brought in to replace the old one.
These boxes have ~1.6 GB/s of disk bandwidth (and network bandwidth is better than that), so at full speed the decommission could finish in about 1.5 hours. We don't need to achieve full speed, but the observed speed seems unnecessarily slow, and can turn some day 2 operations into day 2 and day 3 operations.
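For reference, that bound is just data size over bandwidth; a back-of-the-envelope sketch using only the figures quoted above:

```python
# Lower bound on decommission time: bytes to move / disk bandwidth.
data_bytes = 9e12   # ~9 TB of local data on the node
disk_bw = 1.6e9     # ~1.6 GB/s disk bandwidth (network is faster)

print(f"best case: {data_bytes / disk_bw / 3600:.1f} h")  # -> ~1.6 h
```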
What should have happened instead?
Faster decommissioning.
How to reproduce the issue?
Additional information
The destination machine is mostly unloaded during the process.
The decommissioning appears to proceed in batches: initially a large block of ~50 partitions all decommission at a similar rate, and the speed is slow but not terrible, on the order of 300 MB/s. However, as partitions begin to finish, new ones are not added, and there is then a long period with only a few partitions decommissioning at a very slow speed.
Visually:
For about 30 minutes at the left we decommission at about 280 MB/s, but then the rate drops, and with fewer partitions left we spend nearly 90 minutes at about 26.1 MB/s. Then there is a short period at only ~3 MB/s where I think one partition was left. Finally another batch of ~50 partitions starts and the rate goes back up to ~315 MB/s.
This pattern roughly repeats, though the exact shape differs each time. If we assume an average speed of 150 MB/s, this will take ~17 hours to decommission.
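That estimate is the same arithmetic at the assumed average rate:

```python
# ~9 TB at an assumed sustained average of 150 MB/s.
print(f"ETA: {9e12 / 150e6 / 3600:.1f} h")  # -> ~16.7 h, i.e. ~17 h
```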
The node is very lightly loaded: close to zero reactor usage, and nothing much going on besides the incoming partitions.
Bamboo link covering the first few hours of the decommission (some samples are missing due to a cortex issue):
https://dev.bamboo.monitoring.dev.vectorized.cloud/grafana/d/vTJl4WBVz/redpanda-clusters-v1-for-15s-samples?orgId=1&from=1679412199363&to=1679428976063&var-node_shard=All&var-aggr_criteria=pod&var-data_cluster=amp13-3&var-node=192.168.1.96%3A9644&viewPanel=23763572007
Part of the problem is that even the "max rate" seems slow (~300 MB/s), but another part is the long, very slow periods of 30 MB/s or less. Those appear to be caused by decommissioning 50 partitions at once and not starting the next 50 until the whole batch is done: since the partitions take different amounts of time (which happens even when they are the same size), there can be a long period with only a few partitions left in the batch, and there seems to be a per-partition speed limit of single-digit MB/s. A toy simulation of this is sketched below.
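To illustrate why that batching policy would produce this shape, here is a toy model. All of the numbers are assumptions extrapolated from the figures above, not measurements of Redpanda's actual scheduler: 50 equal-size partitions per batch, each capped at ~6 MB/s, ~10% of them running at half speed, and the next batch held until the current one fully drains.

```python
# Toy model of the suspected behavior (all figures are assumptions): a batch
# of 50 equal-size partitions moves concurrently, each capped at ~6 MB/s
# (so the batch starts near 50 * 6 = 300 MB/s aggregate), and the next batch
# does not start until every partition in the current batch has finished.
PARTITION_BYTES = 37.5e9   # ~9 TB / ~240 partitions
CAP = 6e6                  # assumed per-partition cap, ~6 MB/s

# Assume ~10% of the batch runs at half speed, as in the screenshot below.
rates = [CAP] * 45 + [CAP / 2] * 5
finish = sorted(PARTITION_BYTES / r for r in rates)

total = len(rates) * PARTITION_BYTES
print(f"fast partitions finish at {finish[0] / 3600:.2f} h")       # ~1.74 h
print(f"batch fully drains at {finish[-1] / 3600:.2f} h")          # ~3.47 h
print(f"batch-wide average: {total / finish[-1] / 1e6:.0f} MB/s")  # ~150 MB/s
```

Under these assumptions the batch spends its last ~1.7 hours moving only the half-speed stragglers, and the batch-wide average drops to ~150 MB/s even though the aggregate starts near ~300 MB/s; refilling the batch as partitions finish would avoid the tail.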
Here's an example of a couple of partitions going slowly:
Note that nearly all the partitions are at 37% complete, but a couple are at 18 or 19%, almost exactly half the speed.