Make S3PinotFS listFiles return directories when non-recursive #14073
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@              Coverage Diff              @@
##             master   #14073       +/-   ##
============================================
+ Coverage     61.75%   64.16%    +2.40%
- Complexity      207     1542     +1335
============================================
  Files          2436     2600      +164
  Lines        133233   143466    +10233
  Branches      20636    21975     +1339
============================================
+ Hits          82274    92050     +9776
+ Misses        44911    44627      -284
- Partials       6048     6789      +741

Flags with carried forward coverage won't be shown.
Thanks for the fix! @swaminathanmanish Can you help take a look?
Thanks for the fix @dd-willgan! I've left a few minor comments.
@@ -524,7 +525,7 @@ public List<FileMetadata> listFilesWithMetadata(URI fileUri, boolean recursive)
           .setIsDirectory(s3Object.key().endsWith(DELIMITER));
       listBuilder.add(fileBuilder.build());
     }
-  });
+  }, null);
The PinotFS::listFilesWithMetadata Javadoc seems to indicate that even this method should include directories in the returned list. Shouldn't we create an appropriate CommonPrefix consumer here as well?
Added!
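As a rough illustration of the reviewer's suggestion, the sketch below models a listing visitor that takes one consumer for objects and an optional one for common prefixes; passing null for the second skips directories entirely, which is the effect of the null argument added to the visitFiles call in the diff. ObjectSummary, Entry, visit and listWithMetadata are hypothetical stand-ins, not Pinot's FileMetadata or the AWS SDK's S3Object; the directory flag for objects mirrors the key().endsWith(DELIMITER) check shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a two-visitor listing; not Pinot's or the AWS SDK's real types. */
public class MetadataListingSketch {

  /** Stand-in for an S3 object summary: just the key and its size in bytes. */
  public static final class ObjectSummary {
    public final String key;
    public final long size;

    public ObjectSummary(String key, long size) {
      this.key = key;
      this.size = size;
    }
  }

  /** Stand-in for a file-metadata entry in the listing result. */
  public static final class Entry {
    public final String path;
    public final boolean isDirectory;
    public final long length;

    public Entry(String path, boolean isDirectory, long length) {
      this.path = path;
      this.isDirectory = isDirectory;
      this.length = length;
    }

    @Override
    public String toString() {
      return (isDirectory ? "dir  " : "file ") + path;
    }
  }

  /** Visits one page of results; a null prefixVisitor means common prefixes are ignored. */
  public static void visit(List<ObjectSummary> objects, List<String> commonPrefixes,
      Consumer<ObjectSummary> objectVisitor, Consumer<String> prefixVisitor) {
    objects.forEach(objectVisitor);
    if (prefixVisitor != null) {
      commonPrefixes.forEach(prefixVisitor);
    }
  }

  /** Builds metadata entries for both objects and directories (common prefixes). */
  public static List<Entry> listWithMetadata(List<ObjectSummary> objects, List<String> commonPrefixes) {
    List<Entry> entries = new ArrayList<>();
    visit(objects, commonPrefixes,
        // A key ending in the delimiter is treated as a directory marker object.
        o -> entries.add(new Entry(o.key, o.key.endsWith("/"), o.size)),
        // Each common prefix becomes a directory entry; S3 reports no size for it, so use 0.
        p -> entries.add(new Entry(p, true, 0L)));
    return entries;
  }

  public static void main(String[] args) {
    List<Entry> entries = listWithMetadata(
        List.of(new ObjectSummary("segments/seg_0.tar.gz", 1024L)),
        List.of("segments/tableA/"));
    entries.forEach(System.out::println);
  }
}
```

Keeping the prefix consumer nullable lets existing callers opt out of directory entries without a second code path.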
  return prefix.substring(0, prefix.length() - 1);
}

private void visitFiles(URI fileUri, boolean recursive, Consumer<S3Object> objectVisitor,
Maybe we could add a small comment here clarifying that CommonPrefix represents keys that act like subdirectories (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CommonPrefix.html).
Done!
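For context on what a CommonPrefix is, the following stdlib-only sketch simulates how an S3 ListObjectsV2 call with a delimiter splits the keys under a prefix into objects and common prefixes (the "subdirectories"). In the AWS SDK for Java v2 these arrive as ListObjectsV2Response.contents() and ListObjectsV2Response.commonPrefixes(); the CommonPrefixDemo class and its list method are hypothetical and only approximate the service (no pagination, no max-keys).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory model of S3 ListObjectsV2 delimiter grouping (no AWS SDK). */
public class CommonPrefixDemo {

  /** One simulated listing: objects directly under the prefix, plus rolled-up prefixes. */
  public static final class Listing {
    public final List<String> contents;
    public final List<String> commonPrefixes;

    Listing(List<String> contents, List<String> commonPrefixes) {
      this.contents = contents;
      this.commonPrefixes = commonPrefixes;
    }
  }

  public static Listing list(List<String> keys, String prefix, String delimiter) {
    List<String> contents = new ArrayList<>();
    TreeSet<String> prefixes = new TreeSet<>(); // de-duplicated, lexicographically sorted
    boolean grouping = delimiter != null && !delimiter.isEmpty();
    for (String key : keys) {
      if (!key.startsWith(prefix)) {
        continue; // not under the requested prefix
      }
      String rest = key.substring(prefix.length());
      int idx = grouping ? rest.indexOf(delimiter) : -1;
      if (idx >= 0) {
        // Roll the key up to the first delimiter after the prefix: a "subdirectory".
        prefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
      } else {
        contents.add(key);
      }
    }
    return new Listing(contents, new ArrayList<>(prefixes));
  }

  public static void main(String[] args) {
    List<String> keys = List.of(
        "Deleted_Segments/tableA/seg_0",
        "Deleted_Segments/tableA/seg_1",
        "Deleted_Segments/tableB/seg_0",
        "Deleted_Segments/readme.txt");
    Listing l = list(keys, "Deleted_Segments/", "/");
    System.out.println(l.contents);       // [Deleted_Segments/readme.txt]
    System.out.println(l.commonPrefixes); // [Deleted_Segments/tableA/, Deleted_Segments/tableB/]
  }
}
```

With no delimiter, every matching key is returned as an object and no common prefixes are produced, so a listing that only consumes objects never sees the directory level.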
-Assert.assertTrue(Arrays.equals(Arrays.stream(originalFiles)
+Assert.assertTrue(Arrays.equals(Arrays.stream(expectedFiles)
Let's update this to use Assert.assertEquals instead (which handles array equality checks appropriately) so that we get better error messages in failure scenarios.
Sounds good
Thanks for the review, @yashmayya! Would you mind taking a look again?
Motivation: #10956
Deleted_Segments may account for a significant portion of bucket usage and cost.
Problem:
S3PinotFS.listFiles, which the Deleted_Segments retention relies on, does not return directories, even though that is expected by the PinotFS interface and by SegmentDeletionManager.
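A minimal sketch of what a non-recursive listFiles result could look like once directories are included: regular objects and common prefixes are merged into one array of URIs, with the trailing delimiter trimmed from each prefix, similar to the prefix.substring(0, prefix.length() - 1) helper in the diff. The class, its methods and the s3://bucket/key output format are assumptions for illustration, not S3PinotFS's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: merge the two parts of an S3 delimiter listing into one result. */
public class NonRecursiveListing {

  static final String DELIMITER = "/";

  /** Drops a single trailing delimiter, e.g. "a/b/" becomes "a/b". */
  public static String stripTrailingDelimiter(String prefix) {
    return prefix.endsWith(DELIMITER)
        ? prefix.substring(0, prefix.length() - DELIMITER.length())
        : prefix;
  }

  public static String[] listFiles(String bucket, List<String> objectKeys, List<String> commonPrefixes) {
    List<String> result = new ArrayList<>();
    for (String key : objectKeys) {
      result.add("s3://" + bucket + "/" + key); // regular files directly under the prefix
    }
    for (String prefix : commonPrefixes) {
      // Directories: per the PR description, these were missing from the non-recursive result.
      result.add("s3://" + bucket + "/" + stripTrailingDelimiter(prefix));
    }
    return result.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] files = listFiles("my-bucket",
        List.of("Deleted_Segments/readme.txt"),
        List.of("Deleted_Segments/tableA/", "Deleted_Segments/tableB/"));
    System.out.println(String.join("\n", files));
  }
}
```

A caller such as a retention job can then iterate the per-table directories directly instead of receiving an empty list.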
Changes:
Tested:
Apart from SegmentDeletionManager, which we want to fix, the only other places where listFiles is called with recursive = false are SegmentGenerationUtils and PinotLLCRealtimeSegmentManager.