Merge pull request #352 from JoseEspinosa/updates

Updates
nf-core · Jun 28, 2023 · 3cff26f
2 parents 3182e0f + 2a5f3f4
commit 3cff26f

Showing 18 changed files with 162 additions and 152 deletions.
diff --git a/conf/modules.config b/conf/modules.config
diff --git a/docs/output.md b/docs/output.md
@@ -108,19 +108,19 @@ The library-level alignments associated with the same sample are merged and subs
 <details markdown="1">
     <summary>Output files</summary>
 
-- `<ALIGNER>/mergedLibrary/`
+- `<ALIGNER>/merged_library/`
   - `*.bam`: Merged library-level, coordinate sorted `*.bam` files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be `*.mLb.clN.*`. If you specify the `--save_align_intermeds` parameter then two additional sets of files will be present. These represent the unfiltered alignments with duplicates marked (`*.mLb.mkD.*`), and in the case of paired-end datasets the filtered alignments before the removal of orphan read pairs (`*.mLb.flT.*`).
-- `<ALIGNER>/mergedLibrary/samtools_stats/`
+- `<ALIGNER>/merged_library/samtools_stats/`
   - SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files.
-- `<ALIGNER>/mergedLibrary/picard_metrics/`
+- `<ALIGNER>/merged_library/picard_metrics/`
   - `*_metrics`: Alignment QC files from picard CollectMultipleMetrics.
   - `*.metrics.txt`: Metrics file from MarkDuplicates.
-- `<ALIGNER>/mergedLibrary/picard_metrics/pdf/`
+- `<ALIGNER>/merged_library/picard_metrics/pdf/`
   - `*.pdf`: Alignment QC plot files from picard CollectMultipleMetrics.
-- `<ALIGNER>/mergedLibrary/preseq/`
+- `<ALIGNER>/merged_library/preseq/`
   - `*.lc_extrap.txt`: Preseq expected future yield file.
 
-> **NB:** File names in the resulting directory (i.e. `<ALIGNER>/mergedLibrary/`) will have the '`.mLb.`' suffix.
+> **NB:** File names in the resulting directory (i.e. `<ALIGNER>/merged_library/`) will have the '`.mLb.`' suffix.
 
 </details>
 
@@ -141,7 +141,7 @@ The [Preseq](http://smithlabresearch.org/software/preseq/) package is aimed at p
 <details markdown="1">
     <summary>Output files</summary>
 
-- `<ALIGNER>/mergedLibrary/bigwig/`
+- `<ALIGNER>/merged_library/bigwig/`
   - `*.bigWig`: Normalised bigWig files scaled to 1 million mapped reads.
 
 </details>
@@ -153,12 +153,12 @@ The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in a
 <details markdown="1">
     <summary>Output files</summary>
 
-- `<ALIGNER>/mergedLibrary/phantompeakqualtools/`
+- `<ALIGNER>/merged_library/phantompeakqualtools/`
   - `*.spp.out`, `*.spp.pdf`: phantompeakqualtools output files.
   - `*_mqc.tsv`: MultiQC custom content files.
-- `<ALIGNER>/mergedLibrary/deepTools/plotFingerprint/`
+- `<ALIGNER>/merged_library/deepTools/plotFingerprint/`
   - `*.plotFingerprint.pdf`, `*.plotFingerprint.qcmetrics.txt`, `*.plotFingerprint.raw.txt`: plotFingerprint output files.
-- `<ALIGNER>/mergedLibrary/deepTools/plotProfile/`
+- `<ALIGNER>/merged_library/deepTools/plotProfile/`
   - `*.computeMatrix.mat.gz`, `*.computeMatrix.vals.mat.tab`, `*.plotProfile.pdf`, `*.plotProfile.tab`, `*.plotHeatmap.pdf`, `*.plotHeatmap.mat.tab`: plotProfile output files.
 
 </details>
@@ -188,10 +188,10 @@ The results from deepTools plotProfile gives you a quick visualisation for the g
 <details markdown="1">
     <summary>Output files</summary>
 
-- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/`
+- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/`
   - `*.xls`, `*.broadPeak` or `*.narrowPeak`, `*.gappedPeak`, `*summits.bed`: MACS2 output files - the files generated will depend on whether MACS2 has been run in _narrowPeak_ or _broadPeak_ mode.
   - `*.annotatePeaks.txt`: HOMER peak-to-gene annotation file.
-- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/qc/`
+- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/qc/`
   - `macs2_peak.plots.pdf`: QC plots for MACS2 peaks.
   - `macs2_annotatePeaks.plots.pdf`: QC plots for peak-to-gene feature annotation.
   - `*.FRiP_mqc.tsv`, `*.peak_count_mqc.tsv`, `annotatepeaks.summary_mqc.tsv`: MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios.
@@ -217,7 +217,7 @@ Various QC plots per sample including number of peaks, fold-change distribution,
 <details markdown="1">
     <summary>Output files</summary>
 
-- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/`
+- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/`
   - `*.bed`: Consensus peak-set across all samples in BED format.
   - `*.saf`: Consensus peak-set across all samples in SAF format. Required by featureCounts for read quantification.
   - `*.featureCounts.txt`: Read counts across all samples relative to consensus peak-set.
@@ -245,7 +245,7 @@ The [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) tool is used to co
 <details markdown="1">
     <summary>Output files</summary>
 
-- `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/deseq2/`
+- `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/deseq2/`
   - `*.sample.dists.txt`: Spreadsheet containing sample-to-sample distance across each consensus peak.
   - `*.plots.pdf`: File containing PCA and hierarchical clustering plots.
   - `*.dds.RData`: File containing R `DESeqDataSet` object generated by DESeq2, with either
@@ -254,7 +254,7 @@ The [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) tool is used to co
     `readRDS` to give user control of the eventual object name.
   - `*pca.vals.txt`: Matrix of values for the first 2 principal components.
   - `R_sessionInfo.log`: File containing information about R, the OS and attached or loaded packages.
-  - `<ALIGNER>/mergedLibrary/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/sizeFactors/`
+  - `<ALIGNER>/merged_library/macs2/<PEAK_TYPE>/consensus/<ANTIBODY>/sizeFactors/`
   - `*.txt`, `*.RData`: Files containing DESeq2 sizeFactors per sample.
 
 </details>

diff --git a/modules/local/bam_remove_orphans.nf b/modules/local/bam_remove_orphans.nf
@@ -17,6 +17,9 @@ process BAM_REMOVE_ORPHANS {
     tuple val(meta), path("${prefix}.bam"), emit: bam
     path "versions.yml"                   , emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
     def args = task.ext.args ?: ''
     prefix   = task.ext.prefix ?: "${meta.id}"
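
The `when:` guard added here (and to the other local modules in this merge) lets a process be switched off from configuration instead of from workflow code: the process runs when `ext.when` is unset or evaluates to true. A minimal sketch of a config fragment, assuming it would live in a file such as `conf/modules.config`; the condition shown is illustrative only and is not taken from this pull request:

```groovy
process {
    withName: 'BAM_REMOVE_ORPHANS' {
        // Illustrative condition (an assumption): only run for paired-end samples.
        // The closure is evaluated per task, so `meta` is the sample's meta map.
        // Leaving ext.when unset keeps the process enabled (task.ext.when == null).
        ext.when = { !meta.single_end }
    }
}
```

A plain boolean such as `ext.when = false` would also satisfy the guard and disable the process for every task.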

diff --git a/modules/local/bedtools_genomecov.nf b/modules/local/bedtools_genomecov.nf
@@ -15,11 +15,13 @@ process BEDTOOLS_GENOMECOV {
     tuple val(meta), path("*.txt")     , emit: scale_factor
     path "versions.yml"                , emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
+    def args   = task.ext.args ?: ''
     def prefix = task.ext.prefix ?: "${meta.id}"
-
     def pe     = meta.single_end ? '' : '-pc'
-    def extend = (meta.single_end && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : ''
     """
     SCALE_FACTOR=\$(grep '[0-9] mapped (' $flagstat | awk '{print 1000000/\$1}')
     echo \$SCALE_FACTOR > ${prefix}.scale_factor.txt
@@ -30,7 +32,7 @@ process BEDTOOLS_GENOMECOV {
         -bg \\
         -scale \$SCALE_FACTOR \\
         $pe \\
-        $extend \\
+        $args \\
         | sort -T '.' -k1,1 -k2,2n > ${prefix}.bedGraph
 
     cat <<-END_VERSIONS > versions.yml
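
With `extend` removed from the module, the `-fs` option has to arrive through `task.ext.args`. The `conf/modules.config` diff is not expanded above, so the following is only a sketch of how the equivalent setting might look there, reusing the expression that was deleted from the module:

```groovy
process {
    withName: 'BEDTOOLS_GENOMECOV' {
        // Mirrors the removed `extend` definition. The selector and placement
        // are assumptions, since the config diff is not shown.
        // The closure is evaluated per task, so `meta` is the sample's meta map.
        ext.args = { (meta.single_end && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : '' }
    }
}
```

Keeping the expression in configuration means the module no longer reads `params` directly, which is the same direction taken for `PLOT_MACS2_QC` further down.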

diff --git a/modules/local/deseq2_qc.nf b/modules/local/deseq2_qc.nf
@@ -26,10 +26,12 @@ process DESEQ2_QC {
     path "size_factors"         , optional:true, emit: size_factors
     path "versions.yml"         , emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
-    def args      = task.ext.args ?: ''
-    def peak_type = params.narrow_peak ? 'narrowPeak' : 'broadPeak'
-    def prefix    = task.ext.prefix ?: "${meta.id}"
+    def args   = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
     """
     deseq2_qc.r \\
         --count_file $counts \\

diff --git a/modules/local/frip_score.nf b/modules/local/frip_score.nf
@@ -14,6 +14,9 @@ process FRIP_SCORE {
     tuple val(meta), path("*.txt"), emit: txt
     path "versions.yml"           , emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
     def args   = task.ext.args   ?: ''
     def prefix = task.ext.prefix ?: "${meta.id}"

diff --git a/modules/local/genome_blacklist_regions.nf b/modules/local/genome_blacklist_regions.nf
@@ -17,6 +17,9 @@ process GENOME_BLACKLIST_REGIONS {
     path '*.bed'       , emit: bed
     path "versions.yml", emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
     def file_out = "${sizes.simpleName}.include_regions.bed"
     if (blacklist) {

diff --git a/modules/local/gtf2bed.nf b/modules/local/gtf2bed.nf
@@ -14,6 +14,9 @@ process GTF2BED {
     path '*.bed'       , emit: bed
     path "versions.yml", emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
     """
     gtf2bed \\

diff --git a/modules/local/igv.nf b/modules/local/igv.nf
@@ -20,8 +20,12 @@ process IGV {
     output:
     path "*files.txt"  , emit: txt
     path "*.xml"       , emit: xml
+    path fasta         , emit: fasta
     path "versions.yml", emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script: // scripts are bundled with the pipeline in nf-core/chipseq/bin/
     def consensus_dir = "${aligner_dir}/mergedLibrary/macs2/${peak_dir}/consensus/*"
     """

diff --git a/modules/local/multiqc.nf b/modules/local/multiqc.nf
@@ -56,6 +56,9 @@ process MULTIQC {
     path "*_plots"             , optional:true, emit: plots
     path "versions.yml"        , emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
     def args          = task.ext.args ?: ''
     def custom_config = params.multiqc_config ? "--config $mqc_custom_config" : ''

diff --git a/modules/local/multiqc_custom_peaks.nf b/modules/local/multiqc_custom_peaks.nf
@@ -14,6 +14,9 @@ process MULTIQC_CUSTOM_PEAKS {
     tuple val(meta), path("*.peak_count_mqc.tsv"), emit: count
     tuple val(meta), path("*.FRiP_mqc.tsv")      , emit: frip
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
     def prefix = task.ext.prefix ?: "${meta.id}"
     """

diff --git a/modules/local/multiqc_custom_phantompeakqualtools.nf b/modules/local/multiqc_custom_phantompeakqualtools.nf
@@ -16,6 +16,9 @@ process MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS {
     tuple val(meta), path("*.spp_rsc_mqc.tsv")        , emit: rsc
     tuple val(meta), path("*.spp_correlation_mqc.tsv"), emit: correlation
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
     def prefix = task.ext.prefix ?: "${meta.id}"
     """

diff --git a/modules/local/plot_homer_annotatepeaks.nf b/modules/local/plot_homer_annotatepeaks.nf
@@ -17,6 +17,9 @@ process PLOT_HOMER_ANNOTATEPEAKS {
     path '*.tsv'       , emit: tsv
     path "versions.yml", emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
     def args = task.ext.args ?: ''
     def prefix = task.ext.prefix ?: "annotatepeaks"

diff --git a/modules/local/plot_macs2_qc.nf b/modules/local/plot_macs2_qc.nf
@@ -8,15 +8,19 @@ process PLOT_MACS2_QC {
 
     input:
     path peaks
+    val is_narrow_peak
 
     output:
     path '*.txt'       , emit: txt
     path '*.pdf'       , emit: pdf
     path "versions.yml", emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
     def args      = task.ext.args ?: ''
-    def peak_type = params.narrow_peak ? 'narrowPeak' : 'broadPeak'
+    def peak_type = is_narrow_peak ? 'narrowPeak' : 'broadPeak'
     """
     plot_macs2_qc.r \\
         -i ${peaks.join(',')} \\

diff --git a/modules/local/samplesheet_check.nf b/modules/local/samplesheet_check.nf
@@ -18,10 +18,11 @@ process SAMPLESHEET_CHECK {
     task.ext.when == null || task.ext.when
 
     script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
+    def args = task.ext.args ?: ''
     """
     check_samplesheet.py \\
         $samplesheet \\
-        samplesheet.valid.csv
+        $args
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":

diff --git a/modules/local/star_genomegenerate.nf b/modules/local/star_genomegenerate.nf
@@ -16,6 +16,9 @@ process STAR_GENOMEGENERATE {
     path "star"        , emit: index
     path "versions.yml", emit: versions
 
+    when:
+    task.ext.when == null || task.ext.when
+
     script:
     def args   = (task.ext.args ?: '').tokenize()
     def memory = task.memory ? "--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : ''

diff --git a/subworkflows/local/prepare_genome.nf b/subworkflows/local/prepare_genome.nf
@@ -41,13 +41,7 @@ workflow PREPARE_GENOME {
         ch_fasta    = GUNZIP_FASTA ( [ [:], params.fasta ] ).gunzip.map{ it[1] }
         ch_versions = ch_versions.mix(GUNZIP_FASTA.out.versions)
     } else {
-        ch_fasta = file(params.fasta)
-    }
-
-    // Make fasta file available if reference saved or IGV is run
-    if (params.save_reference || !params.skip_igv) {
-        file("${params.outdir}/genome/").mkdirs()
-        ch_fasta.copyTo("${params.outdir}/genome/")
+        ch_fasta = Channel.value(file(params.fasta))
     }
 
     //
@@ -107,14 +101,15 @@ workflow PREPARE_GENOME {
             ch_gene_bed = GUNZIP_GENE_BED ( [ [:], params.gene_bed ] ).gunzip.map{ it[1] }
             ch_versions = ch_versions.mix(GUNZIP_GENE_BED.out.versions)
         } else {
-            ch_gene_bed = file(params.gene_bed)
+            ch_gene_bed = Channel.value(file(params.gene_bed))
         }
     }
 
     //
     // Create chromosome sizes file
     //
-    ch_chrom_sizes = CUSTOM_GETCHROMSIZES ( [ [:], ch_fasta ] ).sizes.map{ it[1] }
+    CUSTOM_GETCHROMSIZES ( ch_fasta.map { [ [:], it ] } )
+    ch_chrom_sizes = CUSTOM_GETCHROMSIZES.out.sizes.map { it[1] }
     ch_fai         = CUSTOM_GETCHROMSIZES.out.fai.map{ it[1] }
     ch_versions    = ch_versions.mix(CUSTOM_GETCHROMSIZES.out.versions)
 
@@ -144,7 +139,7 @@ workflow PREPARE_GENOME {
                 ch_bwa_index = file(params.bwa_index)
             }
         } else {
-            ch_bwa_index = BWA_INDEX ( [ [:], ch_fasta ] ).index
+            ch_bwa_index = BWA_INDEX ( ch_fasta.map { [ [:], it ] } ).index
             ch_versions  = ch_versions.mix(BWA_INDEX.out.versions)
         }
     }
@@ -162,7 +157,7 @@ workflow PREPARE_GENOME {
                 ch_bowtie2_index = [ [:], file(params.bowtie2_index) ]
             }
         } else {
-            ch_bowtie2_index = BOWTIE2_BUILD ( [ [:], ch_fasta ] ).index
+            ch_bowtie2_index = BOWTIE2_BUILD ( ch_fasta.map { [ [:], it ] } ).index
             ch_versions      = ch_versions.mix(BOWTIE2_BUILD.out.versions)
         }
     }
@@ -180,7 +175,7 @@ workflow PREPARE_GENOME {
                 ch_chromap_index = [ [:], file(params.chromap_index) ]
             }
         } else {
-            ch_chromap_index = CHROMAP_INDEX ( [ [:], ch_fasta ] ).index
+            ch_chromap_index = CHROMAP_INDEX ( ch_fasta.map { [ [:], it ] } ).index
             ch_versions  = ch_versions.mix(CHROMAP_INDEX.out.versions)
         }
     }
@@ -195,7 +190,7 @@ workflow PREPARE_GENOME {
                 ch_star_index = UNTAR_STAR_INDEX ( [ [:], params.star_index ] ).untar.map{ it[1] }
                 ch_versions   = ch_versions.mix(UNTAR_STAR_INDEX.out.versions)
             } else {
-                ch_star_index = file(params.star_index)
+                ch_star_index = Channel.value(file(params.star_index))
             }
         } else {
             ch_star_index = STAR_GENOMEGENERATE ( ch_fasta, ch_gtf ).index
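
The recurring change in this file is that `ch_fasta` becomes a value channel instead of a plain file object, so the `[ meta, fasta ]` tuple is built with `map` rather than with a list literal. A minimal standalone sketch of that pattern, assuming DSL2 and a placeholder path `genome.fa` (not a file from this repository):

```groovy
nextflow.enable.dsl = 2

// Placeholder default for illustration; the file does not need to exist for this sketch.
params.fasta = 'genome.fa'

workflow {
    // A value channel can be read any number of times by downstream processes.
    ch_fasta = Channel.value(file(params.fasta))

    // Wrap the file in a [ meta, file ] tuple with an empty meta map, as in the diff.
    ch_fasta
        .map { fasta -> [ [:], fasta ] }
        .view()
}
```

Because both branches of the `if`/`else` now yield a channel, the downstream calls can treat the gzipped and uncompressed cases identically.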

diff --git a/workflows/chipseq.nf b/workflows/chipseq.nf
@@ -205,7 +205,7 @@ workflow CHIPSEQ {
         ch_samtools_stats    = FASTQ_ALIGN_BOWTIE2.out.stats
         ch_samtools_flagstat = FASTQ_ALIGN_BOWTIE2.out.flagstat
         ch_samtools_idxstats = FASTQ_ALIGN_BOWTIE2.out.idxstats
-        ch_versions = ch_versions.mix(FASTQ_ALIGN_BOWTIE2.out.versions.first())
+        ch_versions = ch_versions.mix(FASTQ_ALIGN_BOWTIE2.out.versions)
     }
 
     //
@@ -229,7 +229,7 @@ workflow CHIPSEQ {
         ch_samtools_stats    = FASTQ_ALIGN_CHROMAP.out.stats
         ch_samtools_flagstat = FASTQ_ALIGN_CHROMAP.out.flagstat
         ch_samtools_idxstats = FASTQ_ALIGN_CHROMAP.out.idxstats
-        ch_versions = ch_versions.mix(FASTQ_ALIGN_CHROMAP.out.versions.first())
+        ch_versions = ch_versions.mix(FASTQ_ALIGN_CHROMAP.out.versions)
     }
 
     //
@@ -274,7 +274,7 @@ workflow CHIPSEQ {
     PICARD_MERGESAMFILES (
         ch_sort_bam
     )
-    ch_versions = ch_versions.mix(PICARD_MERGESAMFILES.out.versions.first().ifEmpty(null))
+    ch_versions = ch_versions.mix(PICARD_MERGESAMFILES.out.versions.first())
 
     //
     // SUBWORKFLOW: Mark duplicates & filter BAM files after merging
@@ -549,7 +549,8 @@ workflow CHIPSEQ {
             // MODULE: MACS2 QC plots with R
             //
             PLOT_MACS2_QC (
-                ch_macs2_peaks.collect{it[1]}
+                ch_macs2_peaks.collect{it[1]},
+                params.narrow_peak
             )
             ch_versions = ch_versions.mix(PLOT_MACS2_QC.out.versions)