Adding option for setting file-output-format for VEP #582

Merged: 13 commits, Jun 16, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#540](https://github.com/nf-core/sarek/pull/540) - Add modules and subworkflows for `cnvkit` somatic mode
- [#557](https://github.com/nf-core/sarek/pull/557) - Add `Haplotypecaller` single sample mode together with `CNNScoreVariants` and `FilterVariantTranches`
- [#576](https://github.com/nf-core/sarek/pull/576) - Add modules and subworkflows for `cnvkit` germline mode
- [#582](https://github.com/nf-core/sarek/pull/582) - Added option `--vep_out_format` for setting the format of the output-file from VEP to `json`, `tab` or `vcf` (default)

### Changed

22 changes: 17 additions & 5 deletions conf/modules.config
@@ -1044,18 +1044,30 @@ process{
(params.vep_dbnsfp && params.dbnsfp) ? '--plugin dbNSFP,dbNSFP.gz,rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF' : '',
(params.vep_loftee) ? '--plugin LoF,loftee_path:/opt/conda/envs/nf-core-vep-104.3/share/ensembl-vep-104.3-0' : '',
(params.vep_spliceai && params.spliceai_snv && params.spliceai_indel) ? '--plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz' : '',
(params.vep_spliceregion) ? '--plugin SpliceRegion' : ''
(params.vep_spliceregion) ? '--plugin SpliceRegion' : '',
(params.vep_out_format) ? "--${params.vep_out_format}" : '--vcf'
].join(' ').trim()
if (!params.vep_cache) container = { params.vep_genome ? "nfcore/vep:104.3.${params.vep_genome}" : "nfcore/vep:104.3.${params.genome}" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/reports/EnsemblVEP/${meta.id}/${meta.variantcaller}" },
pattern: "*html"
[
mode: params.publish_dir_mode,
path: { "${params.outdir}/reports/EnsemblVEP/${meta.id}/${meta.variantcaller}" },
pattern: "*html"
],
[
mode: params.publish_dir_mode,
path: { "${params.outdir}/annotation/${meta.id}/${meta.variantcaller}" },
pattern: "*{json,tab}"
]
]
}

withName: 'NFCORE_SAREK:SAREK:ANNOTATE:ANNOTATION_ENSEMBLVEP:ENSEMBLVEP' {
ext.prefix = {"${meta.id}_VEP"}
}

withName: ".*:ANNOTATION_MERGE:ENSEMBLVEP" {
// Output file will have format *_snpEff_VEP.ann.vcf
// Output file will have format *_snpEff_VEP.ann.vcf, *_snpEff_VEP.ann.json or *_snpEff_VEP.ann.tab
ext.prefix = { "${vcf.baseName.minus(".ann.vcf")}_VEP" }
}
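
For readers of the `conf/modules.config` hunk above, here is a minimal Groovy sketch of how the new `ext.args` entry appears to resolve. The `buildVepArgs` helper and the plain maps passed to it are hypothetical stand-ins for Nextflow's `params` and are not part of this PR; only the two list entries are taken from the diff.

```groovy
// Illustrative helper (not in the PR): mirrors the last two entries of the ext.args list.
def buildVepArgs(Map params) {
    return [
        (params.vep_spliceregion) ? '--plugin SpliceRegion' : '',
        // Falls back to --vcf when vep_out_format is unset (null or empty)
        (params.vep_out_format) ? "--${params.vep_out_format}" : '--vcf'
    ].join(' ').trim()
}

// Hypothetical parameter maps standing in for the pipeline's params
assert buildVepArgs([vep_out_format: 'json']) == '--json'
assert buildVepArgs([vep_out_format: null]) == '--vcf'
assert buildVepArgs([vep_spliceregion: true, vep_out_format: 'tab']) == '--plugin SpliceRegion --tab'
```

Since `vep_out_format` defaults to `'vcf'` in `nextflow.config`, the `--vcf` fallback in the ternary should only come into play if the parameter is explicitly unset.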

2 changes: 1 addition & 1 deletion modules.json
@@ -67,7 +67,7 @@
"git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d"
},
"ensemblvep": {
"git_sha": "40dd662fd26c3eb3160b7c8cbbe9bff80bbe2c30"
"git_sha": "30f72e24822576c6f90a0bf9db678b403c70eccf"
},
"fastqc": {
"git_sha": "49b18b1639f4f7104187058866a8fab33332bdfe"
4 changes: 2 additions & 2 deletions modules/nf-core/modules/ensemblvep/Dockerfile

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 6 additions & 6 deletions modules/nf-core/modules/ensemblvep/build.sh

4 changes: 2 additions & 2 deletions modules/nf-core/modules/ensemblvep/environment.yml

21 changes: 12 additions & 9 deletions modules/nf-core/modules/ensemblvep/main.nf

19 changes: 17 additions & 2 deletions modules/nf-core/modules/ensemblvep/meta.yml

1 change: 1 addition & 0 deletions nextflow.config
@@ -67,6 +67,7 @@ params {


// Annotation
vep_out_format = 'vcf'
vep_dbnsfp = null // dbnsfp plugin disabled within VEP
dbnsfp = null // No dbnsfp processed file
dbnsfp_tbi = null // No dbnsfp processed file index
8 changes: 8 additions & 0 deletions nextflow_schema.json
@@ -538,6 +538,14 @@
"description": "VEP cache version.",
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately."
},
"vep_out_format": {
"type": "string",
"default": "vcf",
"description": "VEP output-file format.",
"enum": ["json", "tab", "vcf"],
"help_text": "Sets the format of the output-file from VEP. Available formats: json, tab and vcf.",
"fa_icon": "fas fa-table"
},
"save_reference": {
"type": "boolean",
"fa_icon": "fas fa-download",
49 changes: 28 additions & 21 deletions subworkflows/local/annotate.nf
@@ -8,20 +8,23 @@ include { ANNOTATION_ENSEMBLVEP } from '../nf-core/annotatio

workflow ANNOTATE {
take:
vcf // channel: [ val(meta), vcf ]
tools // Mandatory, list of tools to apply
snpeff_db
snpeff_cache
vep_genome
vep_species
vep_cache_version
vep_cache
vep_extra_files

vcf // channel: [ val(meta), vcf ]
fasta
tools // Mandatory, list of tools to apply
snpeff_db
snpeff_cache
vep_genome
vep_species
vep_cache_version
vep_cache
vep_extra_files

main:
ch_reports = Channel.empty()
ch_vcf_ann = Channel.empty()
ch_versions = Channel.empty()
ch_reports = Channel.empty()
ch_vcf_ann = Channel.empty()
ch_tab_ann = Channel.empty()
ch_json_ann = Channel.empty()
ch_versions = Channel.empty()

if (tools.contains('merge') || tools.contains('snpeff')) {
ANNOTATION_SNPEFF(vcf, snpeff_db, snpeff_cache)
@@ -33,23 +36,27 @@ workflow ANNOTATE {

if (tools.contains('merge')) {
vcf_ann_for_merge = ANNOTATION_SNPEFF.out.vcf_tbi.map{ meta, vcf, tbi -> [meta, vcf] }
ANNOTATION_MERGE(vcf_ann_for_merge, vep_genome, vep_species, vep_cache_version, vep_cache, vep_extra_files)
ANNOTATION_MERGE(vcf_ann_for_merge, fasta, vep_genome, vep_species, vep_cache_version, vep_cache, vep_extra_files)

ch_reports = ch_reports.mix(ANNOTATION_MERGE.out.reports)
ch_vcf_ann = ch_vcf_ann.mix(ANNOTATION_MERGE.out.vcf_tbi)
ch_versions = ch_versions.mix(ANNOTATION_MERGE.out.versions.first())
}

if (tools.contains('vep')) {
ANNOTATION_ENSEMBLVEP(vcf, vep_genome, vep_species, vep_cache_version, vep_cache, vep_extra_files)
ANNOTATION_ENSEMBLVEP(vcf, fasta, vep_genome, vep_species, vep_cache_version, vep_cache, vep_extra_files)

ch_reports = ch_reports.mix(ANNOTATION_ENSEMBLVEP.out.reports)
ch_vcf_ann = ch_vcf_ann.mix(ANNOTATION_ENSEMBLVEP.out.vcf_tbi)
ch_versions = ch_versions.mix(ANNOTATION_ENSEMBLVEP.out.versions.first())
ch_reports = ch_reports.mix(ANNOTATION_ENSEMBLVEP.out.reports)
ch_vcf_ann = ch_vcf_ann.mix(ANNOTATION_ENSEMBLVEP.out.vcf_tbi)
ch_tab_ann = ch_tab_ann.mix(ANNOTATION_ENSEMBLVEP.out.tab)
ch_json_ann = ch_json_ann.mix(ANNOTATION_ENSEMBLVEP.out.json)
ch_versions = ch_versions.mix(ANNOTATION_ENSEMBLVEP.out.versions.first())
}

emit:
vcf_ann = ch_vcf_ann // channel: [ val(meta), vcf.gz, vcf.gz.tbi ]
reports = ch_reports // path: *.html
versions = ch_versions // path: versions.yml
vcf_ann = ch_vcf_ann // channel: [ val(meta), vcf.gz, vcf.gz.tbi ]
tab_ann = ch_tab_ann
json_ann = ch_json_ann
reports = ch_reports // path: *.html
versions = ch_versions // path: versions.yml
}
5 changes: 4 additions & 1 deletion subworkflows/nf-core/annotation/ensemblvep/main.nf

1 change: 1 addition & 0 deletions workflows/sarek.nf
@@ -848,6 +848,7 @@ workflow SAREK {
if (params.tools.contains('merge') || params.tools.contains('snpeff') || params.tools.contains('vep')) {

ANNOTATE(vcf_to_annotate,
fasta,

Contributor:
Hi! I guess a fasta is required for some of the new output formats? Is it still possible to run the annotation step without a fasta for the default vcf output?

Reply:
The FASTA is sometimes needed for plugins, but most runs should work fine without one.

Contributor:
Sure, I just mean: does this step still run in the sarek context? :) Did you test this?

Reply:
Oh no, I don't know about the sarek context, but it was running without a fasta before, right?

Member:
fasta is fully optional, but it is the only mandatory input in sarek, so we can actually always use it.
Do we want to not use it by default?
Should we add a param to include it or not, like `--vep_include_fasta` or `--vep_no_fasta`? From that, it would be easy to populate the fasta channel or not within the sarek workflow.

Contributor:
Sure, for me it is fine to have fasta as fully mandatory; we should just communicate it :) No need to overcomplicate it now.

params.tools,
snpeff_db,
snpeff_cache,