-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevents files from being staged twice #43
Conversation
Codecov Report
@@ Coverage Diff @@
## main #43 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 23 23
Lines 1150 1164 +14
Branches 189 192 +3
=========================================
+ Hits 1150 1164 +14
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 I'm going to pre-approve, this looks great. Just some comments.
@@ -62,6 +64,27 @@ def test_that_a_local_temporary_path_is_created_when_staging_a_remote_file(test_ | |||
assert remote_file.local_path == staged_path | |||
|
|||
|
|||
def test_that_error_is_raised_when_a_file_has_been_staged_multiple_times(test_files): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we need a test for when the file already exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I also realized that now that we are preventing files from being staged more than once, that outcome is occurring after the first successful execution of stage
in a test for a remote file (or when I create fake staged files in a test. I.e. we are not testing what we used to be testing after that first test. I will add a helper function to the tests that clears out the dcqc-staged-*
directories to be used where necessary.
This PR implements a fix for the bug first reported by @adamjtaylor regarding a name collision error in Nextflow Tower runs of
nf-dcqc
. The error occurred because files were being staged twice in the Nextflow processwork
directory. I identified the error as having to do with one failed Batch job attempt leading to a successful re-try which was presumable happening after the file was initially staged. So, after the second attempt succeeded there were two copies of the target file in thework
directory for the Nextflow process.In order to prevent this from happening in this and other edge cases, I have added the
file.already_staged
method, which looks in the temporary directory to see if the file has already been staged in adcqc-staged-*
pattern-matched folder. If it has already been staged more than once somehow, it throws an error ahead of the inevitable Nextflow name collision error. If the file has been staged 1 or 0 times,already_staged
returns a list of file paths.In
file.stage
, if this list of file paths is empty the file has yet to be staged and the function continues as it has, staging the file. If it has already been staged, thePath
to the previously staged file is returned.This change has been tested in local and tower runs of
nf-dqc
. Unfortunately, because of the inconsistent and non-reproducible nature of the batch error which prompted this work it is difficult to be certain if it will prevent future issues.