Joint Germline subworkflow haplotypecaller -> Vqsr #595

nickhsmith · 2022-06-17T10:52:59Z

PR checklist

Add vqsr subworkflow

maxulysse · 2022-06-17T11:42:03Z

Duplicate of #546, but more advanced, so taking over

maxulysse · 2022-06-23T08:58:26Z

conf/igenomes.config

@@ -18,6 +18,7 @@ params {
            chr_dir               = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/Chromosomes"
            dbsnp                 = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf"
            dbsnp_tbi             = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx"
+            dbsnp_vqsr            = 'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_138.b37.vcf'


I'd prefer to keep the args in modules.config and avoid adding extra files to igenomes.config.

That doesn't fit the nf-core/modules style, since the module expects this as an input value.

maxulysse · 2022-06-23T08:59:37Z

conf/igenomes.config

+            // resources for GATK joint germline variant recalibration
+            RESOURCE_SNP = [
+                [ res_1000g, dbsnp ],
+                [ res_1000g, dbsnp_tbi ], 
+                [ res_1000g_vqsr, dbsnp_vqsr ]
+            ]
+            resource_INDEL = [
+                [ known_indels, dbsnp ],
+                [ known_indels_tbi, dbsnp_tbi ], 
+                [ known_indels_mills_vqsr, known_indels_1000g_vqsr, dbsnp_vqsr ]
+            ]


I like that, but I feel like it should be done in the sarek script or in the joint germline variant calling workflow instead.

I would then have to use less descriptive names, because the files differ slightly between hg19 and hg38, so the naming convention has to match regardless of the genome.

What about something like known_snps? (dbsnp should stay separate because tools like haplotypecaller explicitly want that file.)

FriederikeHanssen · 2022-06-23T10:21:18Z

subworkflows/nf-core/variantcalling/haplotypecaller/main.nf

@@ -34,11 +34,10 @@ workflow RUN_HAPLOTYPECALLER {
        // group by interval
        genotype_gvcf_to_call = HAPLOTYPECALLER.out.vcf.join(HAPLOTYPECALLER.out.tbi).map{
        meta, gvcf, tbi ->
-            interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : meta.id
-            new_meta = [id: "joint_germline", interval_name: interval_name, num_intervals: meta.num_intervals]
+            interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : file(params.intervals).simpleName


I would be very careful here. I am afraid this may lead to the weird resume errors/unmatched meta maps we had before.

Hmm, would it be smarter to keep meta the same and group by another (temporary) key?

The problem is more the retrieval of the file name. I still can't explain this, but sometimes, when retrieving something from the file name like here, the name is incorrectly resolved later on (even though the matched file in the channel element is the correct one). In this case that would lead to some very wrong results. (In the rest of sarek, since we group on patient ID, it only led to file name clashes: a) the bug was easy to find, and b) the actual output results were not impacted.)
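The "group by another (temporary) key" idea can be sketched in plain Groovy, outside of Nextflow. This is only an illustration under stated assumptions: the sample names, file names and interval keys below are hypothetical, and a plain list stands in for the channel. The point is that the grouping key travels alongside each element, so meta is never rewritten and nothing is recovered from a file name.

```groovy
// Hypothetical stand-ins for channel elements: [meta, gvcf name, interval key].
// The interval key is attached upstream; nothing is parsed from the file name.
def items = [
    [[id: 'sampleA', num_intervals: 2], 'sampleA.first.g.vcf.gz',  'chr1_1-1000'],
    [[id: 'sampleB', num_intervals: 2], 'sampleB.first.g.vcf.gz',  'chr1_1-1000'],
    [[id: 'sampleA', num_intervals: 2], 'sampleA.second.g.vcf.gz', 'chr2_1-500'],
    [[id: 'sampleB', num_intervals: 2], 'sampleB.second.g.vcf.gz', 'chr2_1-500'],
]

// Group on the explicit key; each meta map is carried along unchanged.
def grouped = items
    .groupBy { row -> row[2] }
    .collectEntries { entry ->
        [(entry.key): [
            metas: entry.value.collect { it[0] },
            gvcfs: entry.value.collect { it[1] }
        ]]
    }

grouped.each { key, group -> println "${key}: ${group.gvcfs}" }
```

In a real workflow the same shape would presumably be a tuple with the key in first position, grouped with a channel operator, and the key dropped again once grouping is done.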

maxulysse · 2022-07-20T13:53:51Z

conf/modules.config

-        ext.when         = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline}
+    withName: 'GATK4_GENOMICSDBIMPORT' {
+        ext.prefix       = { meta.num_intervals > 1 ? meta.intervals_name : "joint_interval" }
+        ext.when         = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals}


Suggested change (whitespace only)

ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals}

FriederikeHanssen · 2022-07-20T13:54:22Z

subworkflows/local/germline_variant_calling.nf

@@ -118,17 +119,23 @@ workflow GERMLINE_VARIANT_CALLING {
    if (tools.split(',').contains('haplotypecaller')){
        cram_recalibrated_intervals_haplotypecaller = cram_recalibrated_intervals
            .map{ meta, cram, crai, intervals ->
-                [meta, cram, crai, intervals, []]
+
+                intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName


Here we also need a conditional for joint germline. This addition to meta can only happen for haplotypecaller + joint germline; otherwise we need to rewrite a bunch of logic for the single-sample case.
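A minimal sketch of such a conditional, written as plain Groovy rather than as the actual channel code. The helper name `haplotypecallerMeta` and the `jointGermline` flag are hypothetical; in the pipeline the flag would presumably come from `params.joint_germline`.

```groovy
// Hypothetical helper: only joint germline runs get the extra intervals_name key.
def haplotypecallerMeta(Map meta, String intervalsSimpleName, boolean jointGermline) {
    if (!jointGermline) {
        return meta   // single-sample case: meta stays exactly as it was
    }
    def intervalsName = meta.num_intervals == 0 ? 'no_interval' : intervalsSimpleName
    // Map + Map returns a new map, so the incoming meta is not mutated.
    return meta + [intervals_name: intervalsName]
}

def meta = [patient: 'p1', sample: 's1', id: 's1', num_intervals: 2]
println haplotypecallerMeta(meta, 'chr1_1-1000', false)
println haplotypecallerMeta(meta, 'chr1_1-1000', true)
```

Keeping the single-sample branch a pure pass-through means downstream logic that matches on the original meta does not need to change.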

maxulysse · 2022-07-20T13:54:53Z

subworkflows/local/germline_variant_calling.nf

@@ -47,7 +48,7 @@ workflow GERMLINE_VARIANT_CALLING {
        .map{ meta, cram, crai, intervals, num_intervals ->

            //If no interval file provided (0) then add empty list
-            intervals_new = num_intervals == 0 ? [] : intervals
+            intervals_new  = num_intervals == 0 ? [] : intervals


Suggested change (whitespace only)

intervals_new = num_intervals == 0 ? [] : intervals

FriederikeHanssen · 2022-07-20T13:55:53Z

subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf

+    //Merge scatter/gather vcfs & index
+    //Rework meta for variantscalled.csv and annotation tools
+    MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf ->
+         [[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf]


Can you do the same nice formatting here as you did in line 116 ff.?

maxulysse · 2022-07-20T13:56:10Z

subworkflows/local/germline_variant_calling.nf

-                [meta, cram, crai, intervals, []]
+
+                intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName
+                new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name]


Suggested change

-                new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name]
+                new_meta = [
+                    data_type:meta.data_type,
+                    id:meta.sample,
+                    intervals_name:intervals_name,
+                    num_intervals:meta.num_intervals,
+                    patient:meta.patient,
+                    sample:meta.sample,
+                    sex:meta.sex,
+                    status:meta.status
+                ]

FriederikeHanssen · 2022-07-20T13:57:44Z

subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf

-
-    versions       =               ch_versions                                     // channel: [ versions.yml ]
+    versions       = ch_versions                               // channel: [ versions.yml ]
+    genotype_vcf   = Channel.empty().mix(vcfs_sorted_input.no_intervals,


The vcfs_sorted_input.no_intervals channel also needs variantcaller: "haplotypecaller" in its meta map here, to make sure annotation places it in the proper folder.
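As a rough illustration of that request, the following plain Groovy snippet adds the key to each meta map. The list `noIntervals` is a hypothetical stand-in for the `vcfs_sorted_input.no_intervals` channel, and the meta values and file name are made up for the example.

```groovy
// Hypothetical stand-in for the no_intervals branch: [meta, vcf name] pairs.
def noIntervals = [
    [[id: 'joint_variant_calling', num_intervals: 0], 'joint.vcf.gz'],
]

// Add variantcaller to every meta map without mutating the original map.
def tagged = noIntervals.collect { row ->
    def (meta, vcf) = row
    [meta + [variantcaller: 'haplotypecaller'], vcf]
}

println tagged
```

In the subworkflow itself the same transformation would presumably be a `.map { meta, vcf -> ... }` on the channel before it is mixed into `genotype_vcf`.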

maxulysse · 2022-07-20T13:57:49Z

subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf

+    //Merge scatter/gather vcfs & index
+    //Rework meta for variantscalled.csv and annotation tools
+    MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf ->
+         [[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf]


Suggested change

-         [[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf]
+         [[
+             id: "joint_variant_calling",
+             num_intervals: meta.num_intervals,
+             patient: "all_samples",
+             variantcaller: "haplotypecaller"
+         ], vcf]

Gavin.Mackenzie and others added 4 commits May 13, 2022 14:29

add vqsr from Atholl, very early WIP

19e5e0e

added meta.yml for vqsr subworkflow

cd8d767

Merge pull request #9 from GCJMackenzie/add_vqsr_subworkflow

03909c9

Add vqsr subworkflow

joint variant calling updates to gatk best practices

c22a64f

group by interval and exclude sample info

028ad1a

nickhsmith changed the title ~~Vqsr~~ Joint Germline subworkflow haplotypecaller -> Vqsr Jun 20, 2022

Smith Nicholas added 2 commits June 20, 2022 15:34

interval and no_interval grouping

57c2dff

add params and vqsr process

ea8dd93

maxulysse reviewed Jun 23, 2022

FriederikeHanssen reviewed Jun 23, 2022

Smith Nicholas added 3 commits June 23, 2022 14:41

print statements

d6cc403

joint variantcalling

b78ad1c

update

37aa902

FriederikeHanssen reviewed Jun 28, 2022

conf/modules.config (outdated)

FriederikeHanssen reviewed Jun 28, 2022

conf/modules.config (outdated)

FriederikeHanssen reviewed Jun 28, 2022

conf/modules.config (outdated)

Smith Nicholas added 12 commits June 29, 2022 11:08

add interval_names to meta

d49a5ac

Merge remote-tracking branch 'NF-core/dev' into vqsr

d574b0c

prepare vqsr

dd7445f

remove inclusion of local config

06c66ad

lint

8686dfb

force bcftools sort

f9fcb65

fix typo and clearer naming

6e5e8dd

include variant_recalibration param

9a29759

apply vqsr and merge

aa110ab

snp and indels must be present for recalibration

fa41fa0

fix typo

17924d0

typo

50de0de

FriederikeHanssen reviewed Jul 20, 2022

subworkflows/local/germline_variant_calling.nf (outdated)

maxulysse reviewed Jul 20, 2022

FriederikeHanssen reviewed Jul 20, 2022

maxulysse reviewed Jul 20, 2022

FriederikeHanssen reviewed Jul 20, 2022

maxulysse reviewed Jul 20, 2022

subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf (outdated)

FriederikeHanssen reviewed Jul 20, 2022

maxulysse reviewed Jul 20, 2022

nickhsmith and others added 11 commits July 20, 2022 15:58

Update conf/modules.config

b77a116

Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>

Update nextflow_schema.json

468dedb

Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>

Update subworkflows/local/germline_variant_calling.nf

e919d09

Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>

Update subworkflows/local/germline_variant_calling.nf

46bc423

Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>

Update germline_variant_calling.nf

f057709

Update subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf

4d50caf

Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>

meta format

c707f7c

undo publishDir change

ac0a1f5

hardcode joint_variant_calling publish path

a6d6e9a

fix typo

7f5a916

Merge remote-tracking branch 'NF-core/dev' into vqsr

1fc9e2a

This was referenced Jul 20, 2022

Incomplete gVCF files with --target_bed #344

Closed

Do VQSR for HaplotypeCaller calls #89

Closed

Enable joint variant calling for germline samples #75

Closed

[FEATURE] GATK GenotypeGVCFs option? #392

Closed

nickhsmith and others added 4 commits July 20, 2022 21:23

Merge branch 'dev' into vqsr

d842fe0

fix haplotypecaller cram input

73e84db

fix indents, commas etc

9fa1a5c

remove test file

b0c32c7

FriederikeHanssen approved these changes Jul 21, 2022

FriederikeHanssen merged commit 21b9f62 into nf-core:dev Jul 21, 2022
