Check for perfect replicas distribution in ScalingUpTest.test_adding_nodes_to_cluster #11054

Merged
merged 1 commit into from
May 31, 2023

@dlex dlex commented May 26, 2023

In the criteria that determine whether the partitions are now balanced, check whether max(replicas per node) exceeds min(replicas per node) by no more than 1. If so, the partitions are considered balanced without checking the 20% tolerance range.

This enables distributions like [1,1,2] to be always successful regardless of the tolerance range.
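
A minimal sketch of the relaxed criterion, for illustration only. The function name `is_balanced`, the `tolerance` parameter, and the assumption that the 20% range is measured around the mean replica count per node are all hypothetical; the actual check in scaling_up_test.py may be structured differently.

```python
def is_balanced(replicas_per_node, tolerance=0.2):
    """Return True if the replica distribution counts as balanced.

    Hypothetical helper, not the code from scaling_up_test.py.
    """
    lowest = min(replicas_per_node)
    highest = max(replicas_per_node)

    # Shortcut described in this PR: a spread of at most one replica
    # is the best achievable distribution, so accept it outright.
    if highest - lowest <= 1:
        return True

    # Assumed fallback: every node must be within `tolerance` of the mean.
    expected = sum(replicas_per_node) / len(replicas_per_node)
    return all(
        abs(count - expected) <= tolerance * expected
        for count in replicas_per_node
    )
```

Under these assumptions, [1,1,2] has a mean of about 1.33, so a 20% range alone would reject both 1 and 2; the shortcut accepts it because the spread is exactly 1.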

Fixes #11042.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@dlex dlex self-assigned this May 26, 2023
@dlex dlex marked this pull request as ready for review May 26, 2023 15:44
@dlex
Contributor Author

dlex commented May 27, 2023

/ci-repeat 1
skip-unit
skip-rebase
dt-repeat=5
tests/rptest/tests/services_self_test.py::KgoRepeaterSelfTest.test_kgo_repeater
tests/rptest/tests/controller_log_limiting_test.py::ControllerLogLimitMirrorMakerTests.test_mirror_maker_with_limits

@dotnwat
Member

dotnwat commented May 30, 2023

/ci-repeat 5
skip-unit
dt-repeat=100
tests/rptest/tests/scaling_up_test.py

@dotnwat
Member

dotnwat commented May 30, 2023

ci-repeat 1
skip-unit
skip-rebase
dt-repeat=5
tests/rptest/tests/services_self_test.py::KgoRepeaterSelfTest.test_kgo_repeater
tests/rptest/tests/controller_log_limiting_test.py::ControllerLogLimitMirrorMakerTests.test_mirror_maker_with_limits

@dlex @mmaslankaprv I restarted a ci-repeat job, but I was a little curious about why these tests were being run. The code changed in this PR appears to only affect the scaling_up_test.py file. Am I missing something here?

@dlex
Contributor Author

dlex commented May 30, 2023

but I was a little curious about why these tests were being run. The code changed in this PR appears to only affect the scaling_up_test.py file

Those are the 2 tests that are failing in the CI in this PR fairly stably (basically 100%). I've checked with no-rebase to exclude upstream influence. Now I'm busy proving that the failures are not related, e.g. here

@dotnwat
Member

dotnwat commented May 30, 2023

Those are the 2 tests that are failing in the CI in this PR fairly stably (basically 100%). I've checked with no-rebase to exclude upstream influence. Now I'm busy proving that the failures are not related, e.g. #10865 (comment)

Thanks for the context @dlex, that makes sense. But they were existing failures? Do you have a hunch that your changes are related?

@dlex
Contributor Author

dlex commented May 30, 2023

Thanks for the context @dlex, that makes sense. But they were existing failures? Do you have a hunch that your changes are related?

I can't see how they can be related. However, even though the failures already existed, they were never as stable as in this PR. So better safe than sorry, right?

@dotnwat
Member

dotnwat commented May 30, 2023

RuntimeError: Storage usage inconsistency on nodes ['3:docker-rp-3']: max difference 0.06281437747399583 on node 3:docker-rp-3

These CI failures can be ignored in the context of this PR.

@dotnwat dotnwat merged commit 17b370b into redpanda-data:dev May 31, 2023
@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

/backport v22.3.x

@vbotbuildovich
Collaborator

/backport v22.2.x

Development

Successfully merging this pull request may close these issues.

CI Failure (TimeoutError in wait_for_partitions_rebalanced) in ScalingUpTest.test_adding_nodes_to_cluster