
Delete orphan files for topics. #8185

Merged

2 commits merged into redpanda-data:v22.3.x on Jan 20, 2023

Conversation

@ZeDRoman (Contributor) commented on Jan 12, 2023

This PR backports a temporary fix to the previous release branch; a PR with the proper solution will be introduced later.

When a node is restarted while it is executing a partition delete operation, the operation may not finish, leaving orphan files for that partition on disk.
After the restart the node no longer has that partition's data in memory, so it cannot retry the delete operation and will not clean up the partition files during reconciliation.
This PR adds a garbage-collection mechanism that force-deletes the partition files once the controller log's delete-partition operation has been reconciled.
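
For context, the cleanup idea is roughly the following (a minimal standalone sketch using std::filesystem; the name remove_orphan_dirs is illustrative, and the real change lives in storage::log_manager, runs on seastar futures, and also skips partitions that are still live in memory):

```cpp
// Illustrative sketch only, not the actual PR code. Partition directories
// are named "<partition>_<revision>", e.g. "0_21"; any directory for the
// given partition whose revision is at or below the reconciled revision is
// treated as an orphan and removed.
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <regex>
#include <string>

namespace fs = std::filesystem;

void remove_orphan_dirs(
  const fs::path& topic_dir, uint64_t partition, uint64_t rev) {
    const std::regex re(R"(^(\d+)_(\d+)$)");
    for (const auto& entry : fs::directory_iterator(topic_dir)) {
        if (!entry.is_directory()) {
            continue;
        }
        const std::string name = entry.path().filename().string();
        std::smatch m;
        if (!std::regex_match(name, m, re)) {
            continue;
        }
        if (
          std::stoull(m[1].str()) == partition
          && std::stoull(m[2].str()) <= rev) {
            std::cout << "removing orphan partition directory "
                      << entry.path() << '\n';
            fs::remove_all(entry.path());
        }
    }
    // If nothing is left afterwards, the topic directory itself can go too.
    if (fs::exists(topic_dir) && fs::is_empty(topic_dir)) {
        fs::remove(topic_dir);
    }
}
```

For example, calling remove_orphan_dirs("/var/lib/redpanda/data/kafka/my-topic", 0, 21) (a hypothetical path) would delete a leftover directory such as 0_18 while leaving 0_22, a newer revision, untouched.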

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

  • Mechanism for cleaning up orphan partition files

Release Notes

Bug Fixes

  • Mechanism for cleaning up orphan partition files

model::revision_id(boost::lexical_cast<uint64_t>(match[2].str())));
}

ss::future<> log_manager::remove_orphan(

Member:

coroutine?

@bharathv (Contributor) left a comment:

test failures seem related.

down_node = self.redpanda.nodes[-1]
try:
    # Make topic directory immutable to prevent deleting
    down_node.account.ssh(

Contributor:

nice trick 👍

ntp_directory_data->first == ntp.tp.partition
&& ntp_directory_data->second <= rev) {
vlog(
stlog.debug,

Contributor:

nit: bump to info? This may be useful if we realize that something was deleted by mistake in the past; maybe it can be moved after L453 to log ntp_directory.

@@ -401,6 +402,69 @@ ss::future<> log_manager::remove(model::ntp ntp) {
});
}

/// Parse partition directory name
static std::optional<std::pair<model::partition_id, model::revision_id>>

Contributor:

nit: think this can go in storage/fs_utils.h

Comment on lines 462 to 463
.then([this, topic_directory_path]() {
return dispatch_topic_dir_deletion(topic_directory_path);

Contributor:

q: Is this needed to clean up the topic dir if this ntp is the last one to be cleaned up? I think this is okay because the newer topic revision (if one exists) will always have partitions. Hope there are no weird races if topics are deleted/created back to back in tight loops.

/// Parse partition directory name
static std::optional<std::pair<model::partition_id, model::revision_id>>
parse_partition_directory(const ss::sstring& name) {
const std::regex re(R"(^(\d+)_(\d+)$)");

Member:

nit: can we make this static thread_local so it doesn't have to be compiled every time?
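
For reference, the suggested change would look roughly like this (a simplified standalone sketch; plain uint64_t stands in for the model::partition_id / model::revision_id types used in the PR):

```cpp
#include <boost/lexical_cast.hpp>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parse "<partition>_<revision>" directory names. The regex is declared
// static thread_local so it is compiled once per thread rather than on
// every call, per the suggestion above.
static std::optional<std::pair<uint64_t, uint64_t>>
parse_partition_directory(const std::string& name) {
    static thread_local const std::regex re(R"(^(\d+)_(\d+)$)");
    std::smatch match;
    if (!std::regex_match(name, match, re)) {
        return std::nullopt;
    }
    return std::make_pair(
      boost::lexical_cast<uint64_t>(match[1].str()),
      boost::lexical_cast<uint64_t>(match[2].str()));
}
```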

}

ss::future<> log_manager::remove_orphan(
ss::sstring topic_directory_path, model::ntp ntp, model::revision_id rev) {

Member:

I think we do not need to provide the topic directory here; since we already have an ntp, it should be enough to provide the base data directory.

Comment on lines 1575 to 1577
// partition (i.e. removal mode is global), we need to delete from the
// table regardless of whether a replica of 'ntp' is present on the
// node.

Member:

nit: unrelated change

@ZeDRoman force-pushed the topic-delete-orphan-files branch 5 times, most recently from ff43304 to f617393 on January 16, 2023 at 21:08
@@ -401,6 +402,71 @@ ss::future<> log_manager::remove(model::ntp ntp) {
});
}

ss::future<> log_manager::remove_orphan(
ss::sstring data_directory_path, model::ntp ntp, model::revision_id rev) {
vlog(stlog.info, "Asked to remove orphan for: {} revision: {}", ntp, rev);

Member:

nit: can we be more specific here, i.e. "Asked to remove orphaned partition directory"?

Contributor (Author):

I think no, because we remove all directories with a rev less than the one provided.
We will log the exact directories later.

ss::sstring data_directory_path, model::ntp ntp, model::revision_id rev) {
vlog(stlog.info, "Asked to remove orphan for: {} revision: {}", ntp, rev);

auto topic_directory_path = (std::filesystem::path(data_directory_path)

Member:

const ?

Comment on lines +413 to +410
if (_logs.contains(ntp)) {
co_return;
}

Member:

This check can be done before creating the directory path.

.done()
.finally(
[topic_directory]() mutable { return topic_directory.close(); });
} catch (ss::broken_promise const&) {

Member:

We should not catch the broken_promise exception, as it indicates some other issue, e.g. a promise being deleted before the result is set.

Contributor (Author):

It occurs when we delete the topic directory on another shard before closing it here.
We don't have any cross-shard mutex to coordinate this, so I came up with this solution.

Contributor:

This smells like a bug in the list_directory() implementation in seastar. It seems that if there is an exception, the underlying stream is destroyed without calling close, which destroys the promise.

Comment on lines 237 to 233
for node in self.redpanda.nodes:
    self.logger.error(f"Storage listing on {node.name}:")
    for line in node.account.ssh_capture(
            f"find {self.redpanda.DATA_DIR}"):
        self.logger.error(line.strip())

raise

Member:

this can be extracted to a separate function

When redpanda is restarted while a delete operation is unfinished,
partition files might be left on disk.
We need to clean up orphan partition files.

@bharathv (Contributor) left a comment:

lgtm.

}
}
vlog(
stlog.info,

Contributor:

nit: think we can lower this to debug (logged for every orphan cleanup).

Trying to clean up orphan topic directory:

We don't know whether the topic directory is "orphan" at this point; there may be other partitions, and we are just attempting to schedule a deletion if it is empty.

@ZeDRoman merged commit c986662 into redpanda-data:v22.3.x on Jan 20, 2023

@ZeDRoman (Contributor, Author):

/backport v22.2.x

@ZeDRoman (Contributor, Author):

/backport v22.1.x

@vbotbuildovich (Collaborator):

The pull request's base branch is not the default one. Cancelling backport...

Workflow run logs.

@vbotbuildovich (Collaborator):

The pull request's base branch is not the default one. Cancelling backport...

Workflow run logs.
