CI Failure (TimeoutError in wait_for_partitions_rebalanced) in ScalingUpTest.test_adding_nodes_to_cluster #11042

Closed
dlex opened this issue May 25, 2023 · 0 comments · Fixed by #11054
dlex commented May 25, 2023

https://buildkite.com/redpanda/redpanda/builds/29872#018851b3-5ba0-4acf-ac99-63298506a279

Module: rptest.tests.scaling_up_test
Class:  ScalingUpTest
Method: test_adding_nodes_to_cluster
Arguments:
{
  "partition_count": 1
}
test_id:    rptest.tests.scaling_up_test.ScalingUpTest.test_adding_nodes_to_cluster.partition_count=1
status:     FAIL
run time:   2 minutes 59.458 seconds


    TimeoutError('')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/scaling_up_test.py", line 162, in test_adding_nodes_to_cluster
    self.wait_for_partitions_rebalanced(total_replicas=total_replicas,
  File "/root/tests/rptest/tests/scaling_up_test.py", line 113, in wait_for_partitions_rebalanced
    wait_until(partitions_rebalanced,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

This one differs from #10024 in the partition distribution, which is perfect in this case ([1,1,2]), but the test criterion fails to recognize it.

replicas per domain per node: {-1: {1: 5, 2: 5, 3: 6}, 0: {1: 1, 2: 1, 3: 2}}

$mean(\{1,1,2\}) \cdot [0.8, 1.2] = [1.0(6), 1.6]$, a range that contains neither 1 nor 2
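To make the arithmetic above concrete, here is a minimal sketch of a 20% tolerance check applied to the per-node replica counts from the log line. The function name `within_tolerance` and its signature are hypothetical and are not taken from `scaling_up_test.py`; the actual test code may compute the range differently.

```python
from statistics import mean


def within_tolerance(replicas_per_node, tolerance=0.2):
    """Hypothetical check: every node's replica count must lie within
    mean * [1 - tolerance, 1 + tolerance]."""
    counts = list(replicas_per_node.values())
    expected = mean(counts)
    low = expected * (1 - tolerance)
    high = expected * (1 + tolerance)
    return all(low <= c <= high for c in counts)


# Values copied from "replicas per domain per node" in this issue.
replicas_per_domain = {-1: {1: 5, 2: 5, 3: 6}, 0: {1: 1, 2: 1, 3: 2}}

for domain, per_node in replicas_per_domain.items():
    print(domain, within_tolerance(per_node))
```

Under these assumptions domain -1 passes, while domain 0 is rejected because both 1 and 2 fall outside the allowed range, even though [1,1,2] is the most even split of four replicas across three nodes.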

dlex added the kind/bug and ci-failure labels May 25, 2023
dlex self-assigned this May 25, 2023
dlex added a commit to dlex/redpanda that referenced this issue May 26, 2023
In the criteria to determine that the partitions are now balanced,
check whether max(replicas per node) is greater than min(replicas
per node) by no more than 1, in this case the partitions are considered
balanced without checking the 20% tolerance range.

This enables distributions like [1,1,2] to be always successful regardless
of the tolerance range.

Fixes redpanda-data#11042
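
The criterion described in this commit message could be sketched as follows. This is an illustration only, not the code from the commit: the name `partitions_balanced`, its parameters, and the way the tolerance range is computed are assumptions.

```python
def partitions_balanced(replicas_per_node, tolerance=0.2):
    """Hypothetical balance check with a max - min <= 1 shortcut."""
    counts = list(replicas_per_node.values())

    # Counts that differ by at most one replica cannot be spread more
    # evenly, so accept them without consulting the tolerance range.
    if max(counts) - min(counts) <= 1:
        return True

    # Otherwise fall back to the tolerance range around the mean.
    expected = sum(counts) / len(counts)
    low = expected * (1 - tolerance)
    high = expected * (1 + tolerance)
    return all(low <= c <= high for c in counts)


print(partitions_balanced({1: 1, 2: 1, 3: 2}))  # distribution from this issue
print(partitions_balanced({1: 1, 2: 1, 3: 4}))  # a clearly skewed distribution
```

With the shortcut in place, [1,1,2] is accepted regardless of the tolerance, while a skewed distribution such as [1,1,4] is still rejected by the range check.
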
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue May 31, 2023
In the criteria to determine that the partitions are now balanced,
check whether max(replicas per node) is greater than min(replicas
per node) by no more than 1, in this case the partitions are considered
balanced without checking the 20% tolerance range.

This enables distributions like [1,1,2] to be always successful regardless
of the tolerance range.

Fixes redpanda-data#11042

(cherry picked from commit 2ce2238)