Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-5632] cluster: Reset cloud storage metric in the cluster::partition::remove_persistent_state #21581

Merged
merged 1 commit into from
Jul 24, 2024

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Jul 23, 2024

The probe has access to the global state so it's not safe to keep it after the partition is stopped and remove_persistent_state method is invoked. In cases when the partition is recreated with different revision id the 'double metric registration' error could be triggered.

When the exception is triggered the partition_manager is not able to remove the partition from the collection and subsequent re-creation of the partition with different ntp produces assertion because ntp already exists.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

andrwng
andrwng previously approved these changes Jul 23, 2024
Copy link
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a pointer safety perspective. But a couple questions still

@@ -902,6 +902,7 @@ partition::get_cloud_term_last_offset(model::term_id term) const {
}

ss::future<> partition::remove_persistent_state() {
_cloud_storage_probe = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this belongs in partition::stop()? Also wondering if we should be calling _cloud_storage_probe->clear_metrics(), as we do with the partition::_probe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added clear metrics method

@Lazin Lazin changed the title cluster: Reset cloud storage metric in the cluster::partition::remove_persistent_state [CORE-5632] cluster: Reset cloud storage metric in the cluster::partition::remove_persistent_state Jul 24, 2024
... in the partition::remove_persistent state. The probe has access to
the global state so it's not safe to keep it after the partition is
stopped and `remove_persistent_state` method is invoked. In cases when
the partition is recreated with different revision id the 'double metric
registration' error could be triggered.

The double metric registration exception prevents normal parittion
removal and subsequent partition creation may trigger assertion.
@Lazin Lazin merged commit 78e5023 into redpanda-data:dev Jul 24, 2024
20 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants