
[HACK TOPIC] Process unit testing #7

Open · ewels opened this issue Nov 23, 2018 · 11 comments

@ewels (Member) commented Nov 23, 2018

Notes from our discussion about how a new generic unit testing module could work:

Present: @ewels, @fstrozzi, @LukeGoodsell, @micans

Testing would essentially act as a wrapper, with three steps:

  1. Set up the required files using storeDir
    • Needs some kind of Nextflow syntax to be able to define these test files
  2. Run Nextflow with a super-lenient caching method (file name only?) {new nf feature} so that it skips all of the upstream steps (see the config sketch below)
    • Super-lenient caching will mean that we can have empty files for all upstream steps except the penultimate process
    • Don't have staged cached files for the process that we're interested in. Nextflow will run this process for real.
    • Stop the workflow as soon as this process is done {new nf feature}
      • Squash the output channel
      • Required in case something goes wrong and the output filenames change. If this happens then the cache won't be valid and all downstream steps will run again.
  3. Check the results from that process

This tool could then be invoked, specifying which process should be tested, and run in parallel for each process of interest.
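
A minimal sketch of what the step-2 configuration could look like, assuming a hypothetical 'filenames' cache mode were added to Nextflow (the real cache modes today are true/false, 'lenient' and 'deep'):

    // test.config: sketch only; the 'filenames' cache mode does not exist yet
    process {
        // hash input file names only, so that the empty dummy files staged
        // for upstream steps still produce cache hits and those steps are skipped
        cache = 'filenames'
    }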

@ewels (Member, Author) commented Nov 23, 2018

x-ref nf-core/tools#209

@LukeGoodsell commented

My understanding of the user story:

  1. A developer writes a workflow and wishes to add unit tests for one or more processes.

  2. She prepares one or more sets of inputs for the processes being tested.

  3. She then writes a test file that would look something like:

     // First test
     process firstTest {
         // List of processes that are allowed to run
         runProcesses 'bqsr bwa'

         // A list of upstream process outputs that should be used
         upstreamOutputs {
             'fastqc': ['testdata/firstTest/input/data_R1.fastq.gz', 'testdata/firstTest/input/data_R2.fastq.gz'],
             'multiqc': ['testdata/firstTest/input/multiqc.html']
         }

         // Code to run to test the output
         test:
         '''
         #!/usr/bin/env python

         # Insert test code here
         '''
     }
    
  4. The developer can then either manually run the test, or incorporate it into CI, with a command like:

      nextflow test tests/firstTest.nft
    

@LukeGoodsell commented

I have a few ideas for consideration:

  1. Along with a super-lenient hash method, we could implement a new process executor, none, that will cause the pipeline to fail if a process that has it as an executor is launched. This would allow us to use a Nextflow config file to assign this executor to all processes that should not be executed, and thus prevent accidental runaway execution (see the config sketch after this list).

  2. Above I suggested a Nextflow-based test script, but I think we can re-use an existing test framework. For example, we could extend Python's unittest classes to create an nfunittest class. This would have a setup step that generates a Nextflow config file and runs the pipeline according to the test specification, and then runs tests of the data in Python. For example:

     import nfunittest
     
     class FirstTest(nfunittest.TestCase):
         def setUp(self):
             self.runProcesses = ['bqsr', 'bwa']
             self.upstreamOutputs = {
                 'fastqc': ['testdata/firstTest/input/data_R1.fastq.gz', 'testdata/firstTest/input/data_R2.fastq.gz'],
                 'multiqc': ['testdata/firstTest/input/multiqc.html']
             }
         
         def test_bwa(self):
             out_file = self.outputDir + '/data.bam'
             ...
    

    Such test scripts could make use of the wealth of existing test tooling and methodologies, rather than writing something new.

  3. We would probably keep the nftest tools/code in a separate repo, since it's not written in Groovy.
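
As a sketch of idea 1: the none executor could be assigned with Nextflow's existing withName config selectors (the 'none' value itself is the proposed, not-yet-existing feature):

    // test.config: 'none' is the proposed executor and does not exist yet;
    // the withName selector is standard Nextflow config syntax
    process {
        executor = 'none'        // fail fast if any unexpected process launches
        withName: 'bwa|bqsr' {
            executor = 'local'   // only the processes under test actually run
        }
    }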

@LukeGoodsell commented

I also don't think we're going to be able to get the correct hash name for each process, so I'd prefer injecting a storeDir via a Nextflow config file for all processes. We would, however, need a way for the process's name to be included in the storeDir directive, and I can't see a way to do that currently with config files. That might need to be added to Nextflow.

The user would then have to create empty/dummy files for all preceding, unused processes, of the form:

testdata/my_test/input/[PROCESS_NAME]/[OUTPUT_FILENAME]

The contents of testdata/my_test/input would then be copied to a temporary working directory, which will be injected as the storeDir (plus process name) for each process via a config file.
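
In config terms, something like this hypothetical sketch; interpolating the process name into storeDir from a config file is exactly the missing feature noted above, and params.test_workdir is an assumed parameter pointing at the temporary copy of testdata/my_test/input:

    // test.config: hypothetical; per-process storeDir interpolation from
    // config files is not currently possible in Nextflow
    process {
        storeDir = { "${params.test_workdir}/${task.process}" }
    }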

All processes that aren't to be executed should then have the none executor (mentioned above), and the test script will run Nextflow. This will:

  1. Allow selective testing of specific processes, using controlled inputs.
  2. Prevent other processes from running.
  3. Require minimal changes to Nextflow.

Thoughts?

@ewels (Member, Author) commented Nov 25, 2018

Great! Makes a lot of sense 👍

One nitpick: I love the executor: none idea, but maybe Nextflow should exit successfully instead of with a failure? That would be more helpful for the test exit status check.

Is there a comparable unit testing framework in Java? Nextflow already has unit tests, so I guess there must be. It would be nice to keep this inside Nextflow rather than in a separate program, if possible.

Phil

@ewels (Member, Author) commented Nov 25, 2018

Also: instead of telling downstream processes not to run, it could be better to squash the output channels of the selected process that will run. Then we don't need to know the shape of the DAG before writing the config: Nextflow can just pick one process at a time and squash its output channels.

Note that I still think executor: none could be a generally useful thing to have. It would make it easy to write a custom config that selectively disables parts of other people's pipelines, for example. At the moment we have tonnes of when: !params.skipProcessFoo statements in a few pipelines, which this could replace (see the sketch below).
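
As a sketch of that use case, assuming the proposed none executor existed (processFoo is a placeholder name):

    // skip.config: hypothetical; disables a single process from config alone,
    // replacing per-process 'when: !params.skipProcessFoo' guards
    process {
        withName: 'processFoo' {
            executor = 'none'
        }
    }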

@piotr-faba-ardigen commented

Has anyone given any thought to how this looks with DSL2 on the table? I'd like to be able to unit-test a process in a module.

@ewels (Member, Author) commented Nov 12, 2019

There was quite a bit of discussion around this at the 2019 meeting. However, I've not seen any working examples yet.

@sfehrmann commented Dec 1, 2022

Just cross-referencing, as this popped up in the same search: https://code.askimed.com/nf-test/getting-started/

Disclaimer: I had, at best, a five-minute glimpse at nf-test.

@ewels (Member, Author) commented Dec 1, 2022

Thanks @sfehrmann! This GitHub issue documents a Nextflow user meeting from 4 years ago; it's not an issue for active development :) nf-test didn't exist at the time, but you're absolutely right that it's a great tool 👍🏻 So this is good for future googlers.
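
For those future googlers: a minimal nf-test file looks roughly like this (the module path and process name are placeholders):

    nextflow_process {

        name "Test process bwa"
        script "modules/bwa/main.nf"   // placeholder module path
        process "bwa"                  // placeholder process name

        test("runs without failures") {
            when {
                process {
                    """
                    input[0] = file("testdata/data_R1.fastq.gz")
                    """
                }
            }
            then {
                assert process.success
                assert snapshot(process.out).match()
            }
        }
    }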

@schultzm commented

There's also this: https://github.com/LUMC/pytest-workflow
