config: add "legacy defaults" capability and decrease partitions per shard default #9197
Conversation
This is a mechanism for us to set improved default values for cluster configuration properties, without impacting already-deployed systems. Systems with an older original_version in the feature table will apply the legacy default.
This enables the configuration object to apply any special legacy-only defaults.
...while leaving a legacy default to retain the old value for old clusters, in case anyone was running a special system with more partitions than this (e.g. on very fast CPU cores that can handle it).
This relied on the legacy 7000 partitions per shard default.
4fdc45f to 04c076e
```diff
@@ -88,16 +88,18 @@ def test_cpu_limited(self):

         rpk = RpkTool(self.redpanda)

-        # Three nodes, each with 1 core, 7000 partition-replicas
-        # per core, so with replicas=3, 7000 partitions should be the limit
+        # Three nodes, each with 1 core, 1000 partition-replicas
```
Not necessarily this test, but it'd be great to add an upgrade test that validates that this config doesn't change when upgrading from 23.1.1.
```cpp
// An alternative default that applies if the cluster's original logical
// version is <= the defined version
const std::optional<legacy_default<T>> _legacy_default;
```
Doesn't have to be now, but I can imagine this eventually evolving into a list of defaults as we continue to tune certain defaults.
Yeah, should be quick to add as/when we need it.
Historically it has been tough to change certain config defaults because it may be unsafe to e.g. decrease a limit if there are systems in the wild that are already beyond the new limit.
This PR adds a new mechanism where, early in startup, the feature table notifies the configuration object of the original logical version, and the configuration properties may override their `_default` based on that version.

That mechanism is then used to reduce the partitions per shard default to 1000. This change is important because the current limits are unsafe, both for raw partition count issues and for the default limits on tiered storage readers, which are driven by the max partition count.
The main limitation of the mechanism used here is that it will only help with configs that are read after cluster bootstrap happens, or in the case of a node joining a cluster, after the new node sees initial controller log messages. This is suitable for the particular config being handled in this PR, but it is important that any future uses of the mechanism bear this in mind (there is a boldface comment by the definition of the `legacy_version` struct).

Fixes #9179
Backports Required
Release Notes
Improvements
- Reduced the partitions per shard limit (`topic_partitions_per_shard`) to 1000 by default. Legacy clusters will continue to use the legacy limit (7000), although it is recommended to modify configuration on legacy clusters to use a lower limit, to avoid risk of instability if excessive partitions are created.