Delete orphan files for topics. #8185
model::revision_id(boost::lexical_cast<uint64_t>(match[2].str())));
}

ss::future<> log_manager::remove_orphan(
coroutine?
test failures seem related.
down_node = self.redpanda.nodes[-1]
try:
    # Make topic directory immutable to prevent deleting
    down_node.account.ssh(
nice trick 👍
src/v/storage/log_manager.cc
Outdated
ntp_directory_data->first == ntp.tp.partition
&& ntp_directory_data->second <= rev) {
    vlog(
      stlog.debug,
nit: bump to info? This may be useful if we later realize that something was deleted by mistake. Maybe it can be moved after L453 to log ntp_directory.
src/v/storage/log_manager.cc
Outdated
@@ -401,6 +402,69 @@ ss::future<> log_manager::remove(model::ntp ntp) {
});
}

/// Parse partition directory name
static std::optional<std::pair<model::partition_id, model::revision_id>>
nit: think this can go in storage/fs_utils.h
src/v/storage/log_manager.cc
Outdated
.then([this, topic_directory_path]() {
    return dispatch_topic_dir_deletion(topic_directory_path);
q: Is this needed to clean up the topic dir if this ntp is the last one to be cleaned up? I think this is okay because the newer topic revision (if one exists) will always have partitions? Hope there are no weird races if topics are deleted/created back to back in tight loops.
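For illustration only, the "delete the topic directory only if it is empty" behaviour asked about above could look like the sketch below. It uses plain std::filesystem rather than Seastar, and `remove_topic_dir_if_empty` is a hypothetical name, not the PR's `dispatch_topic_dir_deletion`.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: removes `topic_dir` only when it exists and has no
// entries. Returns true if the directory was actually removed.
bool remove_topic_dir_if_empty(const fs::path& topic_dir) {
    std::error_code ec;
    if (!fs::is_directory(topic_dir, ec) || ec) {
        return false; // missing, not a directory, or not accessible
    }
    if (!fs::is_empty(topic_dir, ec) || ec) {
        return false; // other partition directories are still present
    }
    // fs::remove does not delete a non-empty directory, so if a partition
    // directory appears between the check and this call, the call fails
    // and nothing is lost.
    return fs::remove(topic_dir, ec) && !ec;
}
```

Because the final remove fails on a non-empty directory, a newer topic revision that already has partition directories is left alone even if it is created between the check and the removal.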
src/v/storage/log_manager.cc
Outdated
/// Parse partition directory name
static std::optional<std::pair<model::partition_id, model::revision_id>>
parse_partition_directory(const ss::sstring& name) {
    const std::regex re(R"(^(\d+)_(\d+)$)");
nit: can we make this static thread_local so it doesn't have to be compiled every time?
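A minimal standalone sketch of that suggestion, assuming plain standard-library types: `partition_id` and `revision_id` below are hypothetical stand-ins for `model::partition_id` and `model::revision_id`, and `std::string` replaces `ss::sstring`. The regex is the one from the diff.

```cpp
#include <cstdint>
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for model::partition_id and model::revision_id.
using partition_id = std::int32_t;
using revision_id = std::int64_t;

// Parses a directory name of the form "<partition>_<revision>", e.g. "0_42".
// Returns std::nullopt for any other name.
std::optional<std::pair<partition_id, revision_id>>
parse_partition_directory(const std::string& name) {
    // Constructed once per thread on the first call and reused afterwards,
    // instead of being recompiled on every invocation.
    static thread_local const std::regex re(R"(^(\d+)_(\d+)$)");
    std::smatch match;
    if (!std::regex_match(name, match, re)) {
        return std::nullopt;
    }
    try {
        return std::make_pair(
          static_cast<partition_id>(std::stoi(match[1].str())),
          static_cast<revision_id>(std::stoll(match[2].str())));
    } catch (const std::out_of_range&) {
        // The digits do not fit into the target integer type.
        return std::nullopt;
    }
}
```

With `thread_local`, each thread owns its own regex object, so no synchronization is needed when several threads (or shards) call the function.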
src/v/storage/log_manager.cc
Outdated
}

ss::future<> log_manager::remove_orphan(
  ss::sstring topic_directory_path, model::ntp ntp, model::revision_id rev) {
I think we do not need to provide the topic directory here, as we already have an ntp; it should be enough to provide the base data directory.
src/v/cluster/controller_backend.cc
Outdated
// partition (i.e. removal mode is global), we need to delete from the
// table regardless of whether a replica of 'ntp' is present on the
// node.
nit: unrelated change
@@ -401,6 +402,71 @@ ss::future<> log_manager::remove(model::ntp ntp) {
});
}

ss::future<> log_manager::remove_orphan(
  ss::sstring data_directory_path, model::ntp ntp, model::revision_id rev) {
    vlog(stlog.info, "Asked to remove orphan for: {} revision: {}", ntp, rev);
nit: can we be more specific here i.e. Asked to remove orphaned partition directory
I don't think so, because we remove all directories with a rev less than the one provided.
We will log the exact directories later.
src/v/storage/log_manager.cc
Outdated
  ss::sstring data_directory_path, model::ntp ntp, model::revision_id rev) {
    vlog(stlog.info, "Asked to remove orphan for: {} revision: {}", ntp, rev);

    auto topic_directory_path = (std::filesystem::path(data_directory_path)
const ?
if (_logs.contains(ntp)) {
    co_return;
}
This check can be done before creating a directory path
src/v/storage/log_manager.cc
Outdated
.done()
.finally(
  [topic_directory]() mutable { return topic_directory.close(); });
} catch (ss::broken_promise const&) {
We should not catch the broken_promise exception, as it indicates some other issue, e.g. the promise being deleted before the result is set.
It occurs when we delete the topic directory on another shard before closing it here.
We don't have any cross-shard mutex to prevent that, so I came up with this solution.
This smells like a bug in the list_directory() implementation in seastar. It seems that if there is an exception, the underlying stream is destroyed without calling close, which destroys the promise.
for node in self.redpanda.nodes:
    self.logger.error(f"Storage listing on {node.name}:")
    for line in node.account.ssh_capture(
            f"find {self.redpanda.DATA_DIR}"):
        self.logger.error(line.strip())

raise
this can be extracted to a separate function
When Redpanda is restarted before a delete operation has finished, partition files might be left on disk. We need to clean up these orphan partition files.
lgtm.
}
}
vlog(
  stlog.info,
nit: think we can lower this to debug (logged for every orphan cleanup).
Trying to clean up orphan topic directory:
We don't know if the topic directory is "orphan" at this point? There may be other partitions; we are just attempting to schedule a deletion if it is empty.
/backport v22.2.x
/backport v22.1.x
The pull request's base branch is not the default one. Cancelling backport...
The pull request's base branch is not the default one. Cancelling backport...
This PR was made to backport a temporary fix to a previous release. A PR with a proper solution will be introduced later.
When a node is restarted while it is executing a partition delete operation, the operation may not finish, leaving orphan files for that partition on disk.
After the restart, the node no longer has that partition's data in memory, so it cannot retry the partition delete operation and will not clean up the partition files during reconciliation.
This PR adds a garbage collection mechanism that force-deletes partition files after the partition delete operation from the controller log is reconciled.
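The cleanup described above can be approximated outside of Redpanda with the sketch below. It is an illustration under assumptions, not the PR's code. It uses std::filesystem instead of Seastar, `split_dir_name` and `remove_orphans` are hypothetical names, and the `partition_is_live` flag stands in for the `_logs.contains(ntp)` check. The selection rule (same partition, revision `<= rev`) follows the condition shown in the diff.

```cpp
#include <charconv>
#include <cstdint>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical parser: splits "<partition>_<revision>" into two numbers.
std::optional<std::pair<std::int32_t, std::int64_t>>
split_dir_name(const std::string& name) {
    const auto sep = name.find('_');
    if (sep == std::string::npos) {
        return std::nullopt;
    }
    const char* begin = name.data();
    const char* mid = begin + sep;
    const char* end = begin + name.size();
    std::int32_t partition = 0;
    std::int64_t revision = 0;
    const auto p = std::from_chars(begin, mid, partition);
    const auto r = std::from_chars(mid + 1, end, revision);
    // Both halves must parse completely and be non-negative.
    if (p.ec != std::errc{} || p.ptr != mid || partition < 0) {
        return std::nullopt;
    }
    if (r.ec != std::errc{} || r.ptr != end || revision < 0) {
        return std::nullopt;
    }
    return std::make_pair(partition, revision);
}

// Removes every directory under `topic_dir` that belongs to `partition` and
// has a revision <= `rev`. Returns the paths that were removed.
std::vector<fs::path> remove_orphans(
  const fs::path& topic_dir,
  std::int32_t partition,
  std::int64_t rev,
  bool partition_is_live) {
    std::vector<fs::path> removed;
    if (partition_is_live) {
        return removed; // never touch a partition that is still managed
    }
    // Collect candidates first so the directory is not modified mid-scan.
    std::vector<fs::path> candidates;
    std::error_code ec;
    for (fs::directory_iterator it(topic_dir, ec), last; !ec && it != last;
         it.increment(ec)) {
        std::error_code entry_ec;
        if (!it->is_directory(entry_ec)) {
            continue;
        }
        const auto parsed = split_dir_name(it->path().filename().string());
        if (parsed && parsed->first == partition && parsed->second <= rev) {
            candidates.push_back(it->path());
        }
    }
    for (const auto& dir : candidates) {
        std::error_code rm_ec;
        fs::remove_all(dir, rm_ec);
        if (!rm_ec) {
            removed.push_back(dir);
        }
    }
    return removed;
}
```

Directories of other partitions, and directories of the same partition with a newer revision, are left untouched, so a topic that was re-created after the delete keeps its data.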
Backports Required
UX Changes
Release Notes
Bug Fixes