
[HACK TOPIC] Process unit testing #7

Open · ewels opened this issue Nov 23, 2018 · 11 comments

@ewels (Member) commented Nov 23, 2018

Notes from our discussion about how a new generic unit testing module could work:

Present: @ewels, @fstrozzi, @LukeGoodsell, @micans

Testing would essentially act as a wrapper, with three steps:

  1. Set up the required files using storeDir
    • Needs some kind of Nextflow syntax to be able to define these test files
  2. Run Nextflow with a super-lenient caching method (file name only?) {new nf feature} so that it skips all of the upstream steps (see the config sketch below)
    • Super-lenient caching will mean that we can have empty files for all upstream steps except the penultimate process
    • Don't have staged cached files for the process that we're interested in. Nextflow will run this process for real.
    • Stop the workflow as soon as this process is done {new nf feature}
      • Squash the output channel
      • Required in case something goes wrong and the output filenames change. If this happens then the cache won't be valid and all downstream steps will run again.
  3. Check the results from that process

This tool could then be invoked, specifying which process should be tested, and run in parallel for each process of interest.
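
A minimal sketch of what the step-2 configuration could look like, assuming a hypothetical 'filenames' cache mode were added to Nextflow (the real cache modes today are true/false, 'lenient' and 'deep'):

    // test.config: sketch only; the 'filenames' cache mode does not exist yet
    process {
        // hash input file names only, so that the empty dummy files staged
        // for upstream steps still produce cache hits and those steps are skipped
        cache = 'filenames'
    }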

@ewels (Member, Author) commented Nov 23, 2018

x-ref nf-core/tools#209

@LukeGoodsell commented

My understanding of the user story:

  1. A developer writes a workflow and wishes to add unit tests for one or more processes.

  2. She prepares one or more sets of inputs for the processes being tested.

  3. She then writes a test file that would look something like:

     // First test
     process firstTest {
         // List of processes that are allowed to run
         runProcesses 'bqsr bwa'

         // A list of upstream process outputs that should be used
         upstreamOutputs {
             'fastqc': ['testdata/firstTest/input/data_R1.fastq.gz', 'testdata/firstTest/input/data_R2.fastq.gz'],
             'multiqc': ['testdata/firstTest/input/multiqc.html']
         }

         // Code to run to test the output
         test:
         '''
         #!/usr/bin/env python

         # Insert test code here
         '''
     }
    
  4. The developer can then either manually run the test, or incorporate it into CI, with a command like:

      nextflow test tests/firstTest.nft
    

@LukeGoodsell commented

I have a few ideas for consideration:

  1. Along with a super-lenient hash method, we could implement a new process executor, none, that will cause the pipeline to fail if a process that has it as an executor is launched. This would allow us to use a Nextflow config file to assign this executor to all processes that should not be executed, and thus prevent accidental runaway execution (see the config sketch after this list).

  2. Above I suggested a Nextflow-based test script, but I think we can re-use an existing test framework. For example, we could extend Python's unittest classes to create an nfunittest class. This would have a setup step that generates a Nextflow config file and runs the pipeline according to the test specification, and then runs tests of the data in Python. For example:

     import nfunittest
     
     class FirstTest(nfunittest.TestCase):
         def setUp(self):
             self.runProcesses = ['bqsr', 'bwa']
             self.upstreamOutputs = {
                 'fastqc': ['testdata/firstTest/input/data_R1.fastq.gz', 'testdata/firstTest/input/data_R2.fastq.gz'],
                 'multiqc': ['testdata/firstTest/input/multiqc.html']
             }
         
         def test_bwa(self):
             out_file = self.outputDir + '/data.bam'
             ...
    

    Such test scripts could make use of the wealth of existing test tooling and methodologies, rather than writing something new.

  3. We would probably keep the nftest tools/code in a separate repo, since it's not written in Groovy.
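
As a sketch of idea 1: the none executor could be assigned with Nextflow's existing withName config selectors (the 'none' value itself is the proposed, not-yet-existing feature):

    // test.config: 'none' is the proposed executor and does not exist yet;
    // the withName selector is standard Nextflow config syntax
    process {
        executor = 'none'        // fail fast if any unexpected process launches
        withName: 'bwa|bqsr' {
            executor = 'local'   // only the processes under test actually run
        }
    }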

@LukeGoodsell commented

I also don't think we're going to be able to get the correct hash name for each process, so I'd prefer injecting a storeDir via a Nextflow config file for all processes. We would, however, need a way for the process's name to be included in the storeDir directive, and I can't see a way to do that currently with config files. That might need to be added to Nextflow.

The user would then have to create empty/dummy files for all preceding, unused processes, of the form:

testdata/my_test/input/[PROCESS_NAME]/[OUTPUT_FILENAME]

The contents of testdata/my_test/input would then be copied to a temporary working directory, which will be injected as the storeDir (plus process name) for each process via a config file.
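
In config terms, something like this hypothetical sketch; interpolating the process name into storeDir from a config file is exactly the missing feature noted above, and params.test_workdir is an assumed parameter pointing at the temporary copy of testdata/my_test/input:

    // test.config: hypothetical; per-process storeDir interpolation from
    // config files is not currently possible in Nextflow
    process {
        storeDir = { "${params.test_workdir}/${task.process}" }
    }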

All processes that aren't to be executed should then have the none executor (mentioned above), and the test script will run Nextflow. This will:

  1. Allow selective testing of specific processes, using controlled inputs.
  2. Prevent other processes from running.
  3. Require minimal changes to Nextflow.

Thoughts?

@ewels (Member, Author) commented Nov 25, 2018

Great! Makes a lot of sense 👍

One nitpick: I love the executor: none idea, but maybe Nextflow should exit successfully instead of with a failure? That would be more helpful for the test exit status check.

Is there a comparable unit testing framework in Java? Nextflow already has unit tests, so I guess there must be. It would be nice to keep this inside Nextflow rather than in a separate program, if possible.

Phil

@ewels (Member, Author) commented Nov 25, 2018

Also: instead of telling downstream processes not to run, it could be better to squash the output channels of the selected process that will run. Then we don't need to know the shape of the DAG before writing the config: Nextflow can just pick one process at a time and squash its output channels.

Note that I still think executor: none could be a generally useful thing to have. It would make it easy to write a custom config that selectively disables parts of other people's pipelines, for example. At the moment we have tonnes of when: !params.skipProcessFoo statements in a few pipelines, which this could replace (see the sketch below).
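
As a sketch of that use case, assuming the proposed none executor existed (processFoo is a placeholder name):

    // skip.config: hypothetical; disables a single process from config alone,
    // replacing per-process 'when: !params.skipProcessFoo' guards
    process {
        withName: 'processFoo' {
            executor = 'none'
        }
    }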

@piotr-faba-ardigen commented

Has anyone given any thought to how this looks with DSL2 on the table? I'd like to be able to unit-test a process in a module.

@ewels (Member, Author) commented Nov 12, 2019

There was quite a bit of discussion around this at the 2019 meeting. However, I've not seen any working examples yet.

@sfehrmann commented Dec 1, 2022

Just cross-referencing, as this popped up in the same search: https://code.askimed.com/nf-test/getting-started/

Disclaimer: I had, at best, a five-minute glimpse at nf-test.

@ewels (Member, Author) commented Dec 1, 2022

Thanks @sfehrmann! This GitHub issue documents a Nextflow user meeting from 4 years ago; it's not an issue for active development :) nf-test didn't exist at the time, but you're absolutely right that it's a great tool 👍🏻 So this is good for future googlers.
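
For those future googlers: a minimal nf-test file looks roughly like this (the module path and process name are placeholders):

    nextflow_process {

        name "Test process bwa"
        script "modules/bwa/main.nf"   // placeholder module path
        process "bwa"                  // placeholder process name

        test("runs without failures") {
            when {
                process {
                    """
                    input[0] = file("testdata/data_R1.fastq.gz")
                    """
                }
            }
            then {
                assert process.success
                assert snapshot(process.out).match()
            }
        }
    }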

@schultzm commented

There's also this: https://github.com/LUMC/pytest-workflow
