
Add disk usage summary report interface #9672

Merged: 6 commits merged into redpanda-data:dev on Mar 29, 2023

Conversation

dotnwat (Member) commented Mar 28, 2023

A previous commit introduced the gc_estimate interface, effectively a dry run of retention policy. It was designed to let future disk space management functionality reason about which partitions have the most data available for removal from local disk in low disk space scenarios.

The flip side of knowing how much data can be reclaimed is knowing how much data is being used. For example, future disk space management functionality will need to understand how much space is attributable to Raft versus other sub-systems (e.g. in single-disk setups).

To that end this PR introduces disk_log_impl::disk_usage(), which replaces gc_estimate and returns a report covering both disk usage broken out into categories (e.g. data, index) and the previous dry-run retention calculation. They are combined because (1) it is more efficient to collect these statistics at once and (2) the only current user will need both pieces of information.

ss::future<usage_report> disk_log_impl::disk_usage(compaction_config cfg);

/*
 * disk usage report
 *
 * usage: disk usage summary for log.
 * reclaim: disk usage reclaim summary for log.
 */
struct usage_report {
    usage usage;
    reclaim_size_limits reclaim;
};
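
For illustration only, here is a minimal sketch of how a caller might consume the combined report. The namespace qualification and the `total` member on `reclaim` are assumptions for the example, not the actual fields introduced by this PR.

#include <cstddef>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
// plus the codebase headers for disk_log_impl / compaction_config (paths assumed)

// Hypothetical helper: could applying retention right now free enough space?
ss::future<bool> could_free(storage::disk_log_impl& log,
                            storage::compaction_config cfg,
                            size_t needed_bytes) {
    auto report = co_await log.disk_usage(cfg);
    // report.usage   -> space the partition occupies on disk, by category
    // report.reclaim -> dry-run retention estimate of what could be freed
    co_return report.reclaim.total >= needed_bytes; // `total` is an assumed field name
}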

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

dotnwat marked this pull request as ready for review March 28, 2023 04:48
dotnwat self-assigned this Mar 28, 2023
VladLazar (Contributor) previously approved these changes Mar 29, 2023 and left a comment:

Looks good. Only nits and suggestions.

Comment on lines +177 to +181
void segment::clear_cached_disk_usage() {
_idx.clear_cached_disk_usage();
_data_disk_usage_size.reset();
_compaction_index_size.reset();
}
Contributor:
nit: it's a bit odd that the normal index caches its own size, but the compaction index size is cached by the segment. We don't have a higher level abstraction for the compacted index, so I don't see another way of doing it.

dotnwat (Member Author):

Right. I tried very hard (and succeeded) to avoid refactoring a bunch of stuff to make this cleaner. It is indeed awkward to have the reader and appender and then the same path in the segment/reader/appender, etc.
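
As an aside, here is a small standalone sketch of the caching pattern the snippet above hints at: lazily memoize the queried size and reset it when the file changes. This is illustrative only, not the actual segment code.

#include <cstdint>
#include <optional>
#include <utility>

// Illustrative cache: the size is queried lazily and memoized; operations
// that change the on-disk footprint call invalidate() to drop the value.
class cached_disk_size {
public:
    template <typename QueryFn>
    uint64_t get(QueryFn&& query_os) {
        if (!_size) {
            // expensive OS query happens only on a cache miss
            _size = std::forward<QueryFn>(query_os)();
        }
        return *_size;
    }

    void invalidate() { _size.reset(); }

private:
    std::optional<uint64_t> _size;
};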

andrwng previously approved these changes Mar 29, 2023
@@ -1701,15 +1701,10 @@ log make_disk_backed_log(

ss::future<reclaim_size_limits>
disk_log_impl::estimate_reclaim_size(compaction_config cfg) {
Contributor (quoting the commit message):

"However, even when retention is disabled we may still want to delete data that has been uploaded to cloud storage."

Just making sure I understand the behavior before this: did we previously not locally GC for compacted topics? Or did we just not estimate the size, and therefore not prioritize running GC as highly as we should have been?

dotnwat (Member Author):

Before, there was no estimation. Now we can estimate, but we aren't yet using that in the scheduling decision. Also, generally, this work is tackling retention policies, so it is orthogonal to compaction. We aren't estimating for compaction.

Contributor:

Got it, thanks for clarifying.

The intention behind caching the size of removed persistent state for
GC'd log segments was reporting: if we held a segment reference it may
still have been a GC'd tombstone, and in that case reporting its size
as 0 was awkward.

However, this interface is now going to be primarily used for
determining GC schedules. In this new role it makes sense to report the
most up-to-date information.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Prior to this commit estimation of disk sizes involved reporting
user-level data: for example the number of bytes written into the
segment file. This is not particularly accurate because it doesn't
take into account the physical allocation of space by the underlying
file system.

For example, in a file system without optimizations such as storing
small-file data in inode blocks, a file of only a few bytes might still
require a minimum allocation of 4K. For workloads that produce a large
number of files on disk this can have a dramatic impact, and the
user-level accounting under-reports actual disk usage.

To make accounting more accurate we instead query the OS for the
physical file size. It's important to use this information
appropriately: it doesn't necessarily reflect user-level data sizes for
retention. Rather, it allows us to make better decisions about space
usage when scheduling data removal.

The impact of querying the OS is expected to be small because size
queries far outnumber the modifications that affect size, so caching is
feasible. Generally only segments being compacted and the active
segment for a partition see operations that invalidate the cache, and
these are also the segments that a given round of data reclamation
would skip when applying retention policy.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
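
To make the logical-vs-physical distinction from the commit message above concrete, here is a small standalone sketch (not Redpanda code) that compares the two numbers for a file using POSIX stat():

#include <cstdio>
#include <sys/stat.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    struct stat st {};
    if (::stat(argv[1], &st) != 0) {
        std::perror("stat");
        return 1;
    }
    // st_size: logical file length (roughly "bytes written" at user level).
    // st_blocks * 512: space the file system actually allocated, which is
    // what matters when reasoning about free disk space.
    std::printf("logical size:   %lld bytes\n", static_cast<long long>(st.st_size));
    std::printf("allocated size: %lld bytes\n", static_cast<long long>(st.st_blocks) * 512);
    return 0;
}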
It is more natural to report cumulative sizes for categories of
retention where possible rather than non-overlapping sizes.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Prior to this commit the GC estimate would report that no data was
reclaimable for non-collectible partitions. However, even when retention
is disabled we may still want to delete data that has been uploaded to
cloud storage. This commit fixes that accounting by reporting data that
is reclaimable for other reasons (such as having been uploaded to cloud
storage), even when normal retention policy would not collect it.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
This commit renames the `estimate_reclaim_size` method to the more
generically named `disk_usage`, which returns information about both
the estimated reclaim size as before and on-disk usage information for
the entire partition.

The two are combined because reporting all of the information together
makes for a more efficient implementation. Further, the two pieces of
information will be used together as part of disk space management.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
dotnwat merged commit c3cca09 into redpanda-data:dev Mar 29, 2023