Adding option for setting file-output-format for VEP #582

Merged (13 commits) on Jun 16, 2022

Contributor

@asp8200 asp8200 commented Jun 8, 2022

Adding a new CLI option, --vep_out_format, for setting the output file format for VEP. Possible values: json, tab and vcf (defaults to vcf).

Closes #575.

This solution uses a new version of the ensemblvep-module from nf-core/modules (nf-core/modules#1775).

The VEP-output-files get names like

sample_VEP.ann.vcf.gz
sample_VEP.ann.vcf.gz.tbi
sample_VEP.ann.json
sample_VEP.ann.tab
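
The naming scheme above can be sketched in plain shell. This is only an illustration, not the pipeline's code: the function name vep_output_files is hypothetical, and it assumes (from the file list above) that only vcf output is compressed and indexed.

```shell
#!/bin/sh
# Hypothetical helper: print the file names expected for one sample,
# given a prefix (e.g. "sample_VEP") and an output format.
# Assumption taken from the listing above: only vcf output is
# compressed (.gz) and indexed (.tbi); json and tab are written as-is.
vep_output_files() {
    prefix=$1
    format=${2:-vcf}   # mirrors --vep_out_format defaulting to vcf
    case "$format" in
        vcf)
            printf '%s.ann.vcf.gz\n' "$prefix"
            printf '%s.ann.vcf.gz.tbi\n' "$prefix"
            ;;
        json|tab)
            printf '%s.ann.%s\n' "$prefix" "$format"
            ;;
        *)
            printf 'unsupported format: %s\n' "$format" >&2
            return 1
            ;;
    esac
}

vep_output_files sample_VEP json
```

Calling the helper without a second argument lists the two vcf files, which matches the default described in this PR.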

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@maxulysse
Member

I think it's a good proof of concept.
As you said, it'll be perfect if you were to update the nf-core/modules ensemblvep module as well.
Any possibility to get the output extension in the args instead?

@asp8200
Contributor Author

asp8200 commented Jun 10, 2022

I think it's a good proof of concept. As you said, it'll be perfect if you were to update the nf-core/modules ensemblvep module as well. Any possibility to get the output extension in the args instead?

@maxulysse : I changed the solution so that the vep-output-format is set through task.ext.vep_output. (As far as I can see, the only advantage of this solution compared to my previous solution is that the input structure for the ENSEMBLVEP-module is left unchanged, i.e. it doesn't include vep_output.)

The vep-output-format could be set through task.ext.args of the ENSEMBLVEP-module, but then I would probably have to change task.ext.args from being a string into a dictionary like the following:

ext.args = {[
    "args_str": <current_definition_of_args>,
    "vep_output": params.vep_output
]}

The thing is that, as far as I can see, I need "access" to the vep-output-format (i.e. $vep_output) inside the ENSEMBLVEP-module. That is, I need the vep-output-format as an argument to vep (that could be in the $args string), but I also need it to define the extension of the output file name, as shown here:

vep \\
        -i $vcf \\
        -o ${prefix}${suffix}.ann.$vep_output \\
        $args \\
        --assembly $genome \\
        --species $species \\
        --cache \\
        --cache_version $cache_version \\
        --dir_cache $dir_cache \\
        --fork $task.cpus \\
        --stats_file ${prefix}.summary.html \\
        --$vep_output

If you or someone else have suggestions for a better solution, I'm all ears. Cheers

  --vcf \\
- --stats_file ${prefix}.summary.html
+ --stats_file ${prefix}.summary.html \\
+ --$vep_output
Contributor

For this parameter I would almost say it should be set with $args. That way the module output can be configured via modules.config and can also easily be overwritten by the user, even if we don't parameterize it. (In sarek I would propose an additional parameter though: --vep_out_format, defaulting to vcf.)

Member

I agree with Rike on that
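
One way to read this suggestion: if the format flag travels inside $args, the module can still recover the file extension by inspecting that string. The shell sketch below only illustrates the idea. In an actual Nextflow module this logic would more likely sit in Groovy ahead of the script block, and the function name vep_extension_from_args is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: infer the VEP output extension from an argument string.
# --json, --tab and --vcf are the output-format flags discussed in this PR.
# The vcf fallback mirrors the proposed --vep_out_format default; it is an
# assumption of this sketch, not VEP's own behaviour.
vep_extension_from_args() {
    # Pad with spaces so that only whole flags match.
    case " $1 " in
        *" --json "*) echo json ;;
        *" --tab "*)  echo tab ;;
        *)            echo vcf ;;
    esac
}

args="--everything --json"
prefix="sample_VEP"
echo "${prefix}.ann.$(vep_extension_from_args "$args")"
```

With this approach the format is stated once, in the argument string, and the output file name follows from it, so the module's input structure does not need an extra value.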

@maxulysse
Member

Love this version, you can go open a PR in modules ;-)

@asp8200
Contributor Author

asp8200 commented Jun 14, 2022

Shot! I just discovered that my addition of the CLI-option --vep_out_format for Sarek results in the following warning:

WARN: Found unexpected parameters:
* --vep_out_format: json
- Ignore this warning: params.schema_ignore_params = "vep_out_format"

Any suggestion on how to handle this?

@FriederikeHanssen
Contributor

yes by running nf-core schema build and adding it to the schema. Then it will also be nicely rendered on the website docs later :)

@asp8200
Contributor Author

asp8200 commented Jun 14, 2022

yes by running nf-core schema build and adding it to the schema. Then it will also be nicely rendered on the website docs later :)

Many thanks. I just discovered the nextflow_schema.json file. I guess that it is generated by nf-core schema build. I'll give it a spin :-)

@FriederikeHanssen
Contributor

Yes, don't try annotating it by hand. The tool opens the file in the web browser, so you can then sink a lot of time into finding the perfect icon for the parameter 😆

@maxulysse
Member

can you install vep through nf-core modules install ensemblvep --force?

@asp8200
Contributor Author

asp8200 commented Jun 15, 2022

nf-core modules install ensemblvep --force

I'll give it a try. @nvnieuwk and I just released a new version of the vep-module in nf-core/modules. The idea is to use the new version here :-)

@maxulysse
Member

@nf-core-bot fix linting

@github-actions

github-actions bot commented Jun 15, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit bc36841

✅ 144 tests passed
❔ 4 tests were ignored
❗ 8 tests had warnings

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • schema_description - No description provided in schema for parameter: umi_read_structure
  • schema_description - No description provided in schema for parameter: group_by_umi_strategy

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.4.1
  • Run at 2022-06-16 13:49:25

withName: 'NFCORE_SAREK:SAREK:ANNOTATE:ANNOTATION_ENSEMBLVEP:ENSEMBLVEP' {
    ext.prefix = {"${meta.id}_VEP"}
}

withName: ".*:ANNOTATION_MERGE:ENSEMBLVEP" {
    ext.prefix = {"${meta.id}_snpEff"}
Member

Suggested change
- ext.prefix = {"${meta.id}_snpEff"}
+ ext.prefix = {"${meta.id}_snpEff_VEP"}

I think we need to add _VEP here too.
Can you check when producing annotated vcf with --tools merge if the _VEP is there?

@maxulysse maxulysse marked this pull request as ready for review June 16, 2022 06:41
@maxulysse
Member

@asp8200 I'm sorry it looks like we merged a PR in dev which is raising some conflicts with your changes.
Can you try to sync your branch and fix that?

asp8200 and others added 2 commits June 16, 2022 12:39
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
@asp8200 asp8200 changed the title from "WIP: Adding option for setting file-output-format for VEP" to "Adding option for setting file-output-format for VEP" Jun 16, 2022
Contributor

@lassefolkersen lassefolkersen left a comment

I looked through all the changes. Nice work @asp8200 ! Seems like they are well-coordinated with the others as well, so here's your approve 👍

@@ -848,6 +848,7 @@ workflow SAREK {
if (params.tools.contains('merge') || params.tools.contains('snpeff') || params.tools.contains('vep')) {

ANNOTATE(vcf_to_annotate,
fasta,
Contributor

Hi! I guess a fasta is required for some of the new output formats? Is it still possible to run the annotation step without a fasta for the default vcf output?

The FASTA is sometimes needed for plugins, but it should be okay for most runs to do it without a FASTA

Contributor

Sure, I just mean, does this step still run in the sarek context :) Did you test this?

Oh no I don't know in the context of Sarek, but it was running without fasta before right?

Member

fasta is fully optional, but fasta is the only mandatory file in sarek, so we can actually always use it.
Do we want not to use it by default?
Should we add a param to include it or not, like --vep_include_fasta or --vep_no_fasta? From that, it'll be easy to populate the fasta channel or not within the sarek workflow.

Contributor

sure for me it is fine to have fasta as fully mandatory, should just communicate it :) Don't need to overcomplicate it now
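
The optional-FASTA behaviour discussed in this thread can be illustrated with a small shell sketch in which the reference flag is only added when a FASTA path is supplied. --fasta is VEP's flag for a reference sequence. The function name, file names and the surrounding command line are hypothetical, and this is not the module's actual code.

```shell
#!/bin/sh
# Hypothetical helper: build the optional reference part of a vep command line.
# Prints "--fasta <path>" when a path is given and nothing otherwise.
vep_reference_args() {
    fasta=$1
    if [ -n "$fasta" ]; then
        printf '%s %s\n' --fasta "$fasta"
    fi
}

# With a FASTA the flag is included; without one it is simply left out.
echo "vep -i input.vcf $(vep_reference_args genome.fasta) --vcf"
echo "vep -i input.vcf $(vep_reference_args "") --vcf"
```

If fasta is treated as always available in sarek, as suggested above, the empty branch is never taken, but the command stays valid either way.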

build_push "GRCm39" "mus_musculus" "104" "104.3"
build_push "CanFam3.1" "canis_lupus_familiaris" "104" "104.3"
build_push "WBcel235" "caenorhabditis_elegans" "104" "104.3"
build_push "GRCh37" "homo_sapiens" "105" "105.0"
Contributor

@maxulysse should the igenomes.config be updated here as well? or separate PR?

Member

Oooh, good point, I forgot about it, yes, it should be done here

Contributor Author

@maxulysse and @FriederikeHanssen : I don't understand what you guys are talking about regarding igenomes.config 😆 Does this PR need to contain updates to conf/igenomes.config ? If so, maybe you could include them? Cheers

Member

@asp8200 no worries then, I noticed that I need to add some fixes to the VEP modules anyway, so I can include that in a future PR

@maxulysse
Member

@asp8200 Can you just update the CHANGELOG and we're good to go

@maxulysse maxulysse merged commit 7ba61bd into nf-core:dev Jun 16, 2022
5 participants