BagGenerator doesn't handle missing files/404 responses or no-file datasets well. #8603

Closed
qqmyers opened this issue Apr 12, 2022 · 0 comments · Fixed by #8609
Labels
GDCC: TDL supported by Texas Digital Library

qqmyers (Member) commented Apr 12, 2022

What steps does it take to reproduce the issue? Archive a Dataset that includes datafiles whose physical file has been removed or is otherwise inaccessible. Failed retrievals of this kind (e.g. 404 responses) can cause the BagGenerator to temporarily exhaust the available pool of threads. The accompanying PR properly closes connections after such failures, so that further file retrievals can proceed (e.g. when archiving many datasets via a batch API call).
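To illustrate the failure mode, here is a minimal sketch of why releasing a pooled resource on the failure path matters. It is not the BagGenerator code or the PR's change: the class `PoolReleaseSketch`, the `Semaphore` standing in for a bounded connection pool, the pool size of 2, and the convention that paths starting with `missing/` simulate a 404 are all assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.Semaphore;

public class PoolReleaseSketch {
    // Hypothetical stand-in for a bounded connection pool.
    static final int POOL_SIZE = 2;
    static final Semaphore POOL = new Semaphore(POOL_SIZE);

    // Hypothetical fetch: paths starting with "missing/" simulate a 404.
    static byte[] fetch(String path) throws IOException {
        if (!POOL.tryAcquire()) {
            throw new IllegalStateException("pool exhausted");
        }
        try {
            if (path.startsWith("missing/")) {
                throw new IOException("404 for " + path);
            }
            return new byte[0]; // placeholder for the file content
        } finally {
            // Runs on success and on failure, so a missing file
            // never keeps its slot in the pool.
            POOL.release();
        }
    }

    // Fetch every path, counting failures instead of aborting.
    static int countFailures(String[] paths) {
        int failures = 0;
        for (String path : paths) {
            try {
                fetch(path);
            } catch (IOException e) {
                failures++;
            }
        }
        return failures;
    }
}
```

Without the `finally` block, each simulated 404 would keep one permit, and the third missing file would hit "pool exhausted" even though no retrieval is still in progress.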

The PR also corrects a minor issue: Bags nominally require a manifest file containing the fixity hashes of the included data files. For datasets that have no datafiles (e.g. metadata-only datasets), this PR updates the BagGenerator to provide an empty manifest file, meeting the letter of the Bag specification.
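As a rough sketch of the empty-manifest case (not the BagGenerator implementation), the code below writes a payload manifest with one checksum-and-path line per data file, and still creates the file when there are no data files. The class name `EmptyManifestSketch`, the method `writeManifest`, and the choice of algorithm label passed in are assumptions for illustration; the `manifest-<algorithm>.txt` naming follows the BagIt convention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class EmptyManifestSketch {
    // Writes manifest-<algorithm>.txt into bagDir, one "<checksum> <path>"
    // line per entry. With no entries the file is still created, empty.
    static Path writeManifest(Path bagDir, String algorithm,
                              Map<String, String> checksumsByPath) {
        StringBuilder lines = new StringBuilder();
        for (Map.Entry<String, String> entry : checksumsByPath.entrySet()) {
            lines.append(entry.getValue())
                 .append(' ')
                 .append(entry.getKey())
                 .append('\n');
        }
        Path manifest = bagDir.resolve("manifest-" + algorithm + ".txt");
        try {
            // Files.write creates the file even for a zero-length byte array.
            return Files.write(manifest,
                    lines.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that the manifest is written unconditionally rather than only when the file list is non-empty, so a metadata-only dataset still yields a bag that contains a manifest.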
