Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make S3PinotFS listFiles return directories when non-recursive #14073

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dd-willgan
Copy link
Contributor

Motivation: #10956

  • For Pinot clusters that have been up for a while Deleted_Segments may account for a significant portion of bucket usage and cost

Problem:

Changes:

  • Use CommonPrefixes in ListObjectsV2 response to get directories
  • Modify the visitFiles signature

Tested:

  • Updated existing unit test
  • Backwards incompatibilities
    • Recursive = true behavior unchanged
    • Based on screenshot below, aside from SegmentDeletionManager which we want to fix, the only other places listFiles is called with recursive = false is SegmentGenerationUtils and PinotLLCRealtimeSegmentManager
      • Those code locations won't be affected by this change

@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.

Project coverage is 64.16%. Comparing base (59551e4) to head (f2cce7f).
Report is 1117 commits behind head on master.

Files with missing lines Patch % Lines
.../org/apache/pinot/plugin/filesystem/S3PinotFS.java 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14073      +/-   ##
============================================
+ Coverage     61.75%   64.16%   +2.40%     
- Complexity      207     1542    +1335     
============================================
  Files          2436     2600     +164     
  Lines        133233   143466   +10233     
  Branches      20636    21975    +1339     
============================================
+ Hits          82274    92050    +9776     
+ Misses        44911    44627     -284     
- Partials       6048     6789     +741     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 64.13% <92.30%> (+2.42%) ⬆️
java-21 63.96% <92.30%> (+2.34%) ⬆️
skip-bytebuffers-false 64.15% <92.30%> (+2.40%) ⬆️
skip-bytebuffers-true 63.95% <92.30%> (+36.22%) ⬆️
temurin 64.16% <92.30%> (+2.40%) ⬆️
unittests 64.15% <92.30%> (+2.40%) ⬆️
unittests1 55.77% <ø> (+8.88%) ⬆️
unittests2 34.62% <92.30%> (+6.89%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang
Copy link
Contributor

Thanks for the fix! @swaminathanmanish Can you help take a look?

Copy link
Collaborator

@yashmayya yashmayya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @dd-willgan! I've left a few minor comments.

@@ -524,7 +525,7 @@ public List<FileMetadata> listFilesWithMetadata(URI fileUri, boolean recursive)
.setIsDirectory(s3Object.key().endsWith(DELIMITER));
listBuilder.add(fileBuilder.build());
}
});
}, null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PinotFS::listFilesWithMetadata Javadoc seems to indicate that even this method should include directories in the returned list. Shouldn't we create an appropriate CommonPrefix consumer here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

return prefix.substring(0, prefix.length() - 1);
}

private void visitFiles(URI fileUri, boolean recursive, Consumer<S3Object> objectVisitor,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add a small comment here clarifying that CommonPrefix represents keys that act like subdirectories (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CommonPrefix.html).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


Assert.assertTrue(Arrays.equals(Arrays.stream(originalFiles)
Assert.assertTrue(Arrays.equals(Arrays.stream(expectedFiles)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update this to use Assert.assertEquals instead (which handles array equality checks appropriately) so that we get better error messages in failure scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@dd-willgan
Copy link
Contributor Author

Thanks for the review @yashmayya ! Would you mind taking a look again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants