-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: make cluster default read/write properties "sticky" #6950
Conversation
73ae5af
to
94647fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I think we should update disk_log_impl::override_retention_config
to remove the usage cloud_storage_enable_remote_write
and use whatever is in the ntp_config
. Might be worth waiting for #6951 to merge though.
94647fc
to
f7c3a39
Compare
Agreed, added a commit that does that and rebased on dev since #6951 merged |
f7c3a39
to
9dd6401
Compare
Oops, the last force push picked up some unrelated commits, re-pushed. |
9dd6401
to
5d72c7d
Compare
Force push: update |
Code looks fine, but there's two tests failing:
|
This commit changes the code that determines what shadow indexing mode to use when a topic is created. Previously, if no remote write or read configs were specified via topic configs, the mode was left unitiliased and other subsystems decided its value independently based on cluster level configs. Now we always initiliase such with the cluster configs as defaults.
This commit removes all usages of the remote read/write cluster configs in the archival subsystem. Instead, we now always use whatever configuration is stored in the topic table as the source of truth. Essentially, the cluster level configs act as defaults and the topic level configs are overrides.
This commit updates the description of the remote read and write cluster level configs to reflect their new behaviour: defaults at topic creation.
We no longer need to check the global defaults at this point: during the upgrade migration, the SI mode will have been stored in the topic.
These are now simply read during topic creation (or upgrade), so runtime changes are handled implicitly.
Previously this could generate spurious "Nodes report restart required" messages if the cluster was newley restarted, and some nodes were slow to get config statuses, because the final check (for restart status) uses a different call into the admin API than the earlier check for config version. Fix + simplify by using a wait_until_result to keep the last polled config status for the final check on restart status.
This validates behavior in 22.3 where the current state of cluster-wide tiered storage settings is applied persistently to each topic that exists at the time of upgrade.
Test ShadowIndexingGlobalConfig.test_overrides_set was setting remote.read to false twice, instead of setting remote.read to false and remote.write to false. Apparently this used to somehow result in a subsequent describe seeing false for both properties, but that seems like a bug.
...for sticky tiered storage configuration: the defaults at cluster level now only apply at topic creation time, and remain set on that topic thereafter. Related: redpanda-data#6917
…on_active We now explicitly set cloud storage settings in the topic config, the cluster defaults don't matter.
it is no longer the case that a topic with remote.write=false will have shadow indexing enabled when the cluster property is true.
This is a subtle upgrade case. When a 22.3 node starts and has not yet executed the migration to move tiered storage enablement flags from global defaults onto the topic, the archival_metadata_stm::max_collectible_offset method will naively return offset::max (i.e. giving storage-gc permission to proceed without waiting for uploads). Fix this by reinstating the legacy behavior (of assuming tiered storage is enabled if the global default is true) conditionally in the case where we are in the middle of an upgrade (i.e. when 22.3 feature flag is not yet active).
5d72c7d
to
2d8348d
Compare
Pushed:
|
Similar to the previous commit in archival state machine, when the cluster is in an upgrading state, we need to apply old-style rules about the SI flags, to avoid passing through a state where we temporarily tell clients that there is no data earlier than the raft contents (this manifests as a client doing list_offsets during the upgrade and seeing their start_offset jump forward to the raft logs's start offset)
Push:
The failure for this last commit is in test_rolling_upgrade but not deterministic, reproduced when running lots of test runs in parallel on my workstation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Nice catch on applying the old behaviour during upgrades.
Cover letter
Instead of treating remote.read and remote.write as properties that can be overridden by topics or can be null and thereby get cluster defaults (including updating when those defaults change), we will now always store an explicit value for the properties on the topic, and the cluster defaults will simply control what that explicit value defaults to at time of topic creation (or at time of upgrade, for clusters coming from <= 22.2)
This is based on #6876
Fixes #6917
Backport Required
UX changes
See release notes
Release notes
Improvements
cloud_storage_enable_remote_[read|write]
are now applied to topics at creation time, and if they subsequently change, then existing topics' properties do not change. To modify the tiered storage mode of existing topics, you may set theredpanda.remote.[read|write]
properties on the topic.