Commit

Merge pull request #40 from qbic-pipelines/dev
Release 1.2.0
FriederikeHanssen committed Jan 13, 2022
2 parents 303f6cd + 0f7bf76 commit b1fc825
Showing 12 changed files with 222 additions and 84 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ jobs:
matrix:
# Nextflow versions: check pipeline minimum and current latest
nxf_ver: ['20.04.1', '']
config: ['test_chr','test_bai']
config: ['test_chr','test_bai','test_cram']
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
@@ -33,13 +33,13 @@ jobs:
environment.yml
- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t qbicpipelines/bamtofastq:1.1.0
run: docker build --no-cache . -t qbicpipelines/bamtofastq:1.2.0

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull qbicpipelines/bamtofastq:dev
docker tag qbicpipelines/bamtofastq:dev qbicpipelines/bamtofastq:1.1.0
docker pull qbicpipelines/bamtofastq:1.2.0
docker tag qbicpipelines/bamtofastq:1.2.0 qbicpipelines/bamtofastq:1.2.0
- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
2 changes: 1 addition & 1 deletion .github/workflows/push_dockerhub.yml
@@ -30,7 +30,7 @@ jobs:
echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
docker tag qbicpipelines/bamtofastq:latest qbicpipelines/bamtofastq:dev
docker push qbicpipelines/bamtofastq:dev
- name: Push Docker image to DockerHub (release)
if: ${{ github.event_name == 'release' }}
run: |
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,12 @@
# nf-core/bamtofastq: Changelog

## v1.2.0 - Anna Winlock

- [#36](https://github.com/qbic-pipelines/bamtofastq/pull/36) Add options `--cram_files` and `--reference_fasta` to add support for CRAM files.
- [#31](https://github.com/qbic-pipelines/bamtofastq/pull/31) Add option `--samtools_collate_fast` and improve speed of cat.
- [#32](https://github.com/qbic-pipelines/bamtofastq/pull/32) Added `--samtools_collate_fast` to sortExtractMapped and changed cat command to append.
- [#33](https://github.com/qbic-pipelines/bamtofastq/pull/33) Added flag `--reads_in_memory` to specify how many reads shall be stored in memory.

## v1.1.0 - Katherine Johnson

- [#21](https://github.com/qbic-pipelines/bamtofastq/21) Allows bam indices as additional input files
2 changes: 1 addition & 1 deletion Dockerfile
@@ -4,4 +4,4 @@ LABEL authors="Friederike Hanssen" \

COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/qbic-pipelines-bamtofastq-1.1.0/bin:$PATH
ENV PATH /opt/conda/envs/qbic-pipelines-bamtofastq-1.2.0/bin:$PATH
7 changes: 4 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# ![qbic-pipelines/bamtofastq](docs/images/qbic-pipelines-bamtofastq_logo.png)

> **An open-source pipeline converting (un)mapped single-end or paired-end bam files to fastq.gz**.
> **An open-source pipeline converting (un)mapped single-end or paired-end bam/cram files to fastq.gz**.
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.04.1-brightgreen.svg)](https://www.nextflow.io/)

@@ -14,8 +14,8 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4022137.svg)](https://doi.org/10.5281/zenodo.4022137)
## Introduction

This pipeline converts (un)mapped `.bam` files into `fq.gz` files.
Initially, it auto-detects, whether the input file contains single-end or paired-end reads. Following this step, the reads are sorted using `samtools collate` and extracted with `samtools fastq`. Furthermore, for mapped bam files it is possible to only convert reads mapping to a specific region or chromosome. The obtained FastQ files can then be used to further process with other pipelines.
This pipeline converts (un)mapped `.bam` files (or `.cram` files with the `--cram_files` option) into `fq.gz` files.
Initially, it auto-detects, whether the input file contains single-end or paired-end reads. Following this step, the reads are sorted using `samtools collate` and extracted with `samtools fastq`. Furthermore, for mapped bam/cram files it is possible to only convert reads mapping to a specific region or chromosome. The obtained FastQ files can then be used to further process with other pipelines.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.

@@ -62,6 +62,7 @@ Helpful contributors:
* [Gisela Gabernet](https://github.com/ggabernet)
* [Matilda Åslin](https://github.com/matrulda)
* [Susanne Jodoin](https://github.com/SusiJo)
* [Bruno Grande](https://github.com/BrunoGrandePhd)

### Resources

2 changes: 1 addition & 1 deletion conf/base.config
@@ -36,7 +36,7 @@ process {
}
withLabel:process_high {
cpus = { check_max( 15 * task.attempt, 'cpus' ) }
memory = { check_max( 120.GB * task.attempt, 'memory' ) }
memory = { check_max( 200.GB * task.attempt, 'memory' ) }
time = { check_max( 10.h * task.attempt, 'time' ) }
}
withLabel:process_long {
5 changes: 5 additions & 0 deletions conf/test_bai.config
@@ -15,6 +15,11 @@ params {
max_cpus = 2
max_memory = 6.GB
max_time = 48.h
samtools_collate_fast = true
reads_in_memory = '10000'
no_stats = true
no_read_QC = true


index_files = true
input_paths = [
25 changes: 25 additions & 0 deletions conf/test_cram.config
@@ -0,0 +1,25 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run qbic-pipelines/bamtofastq -profile test_cram
*/


params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on Travis
max_cpus = 2
max_memory = 6.GB
max_time = 48.h

cram_files = true
input = [
'https://github.com/qbic-pipelines/bamtofastq/master/testdata/First_SmallTest_Paired.cram',
'https://github.com/qbic-pipelines/bamtofastq/master/testdata/Second_SmallTest_Paired.cram'
]
reference_fasta = 'ftp://ftp.broadinstitute.org/pub/seq/references/Homo_sapiens_assembly19.fasta'
}
143 changes: 93 additions & 50 deletions docs/usage.md
@@ -4,39 +4,44 @@

<!-- Install Atom plugin markdown-toc-auto for this ToC to auto-update on save -->
<!-- TOC START min:2 max:3 link:true asterisk:true update:true -->
* [Table of contents](#table-of-contents)
* [Introduction](#introduction)
* [Running the pipeline](#running-the-pipeline)
* [Updating the pipeline](#updating-the-pipeline)
* [Reproducibility](#reproducibility)
* [Main arguments](#main-arguments)
* [`-profile`](#-profile)
* [`--input`](#--input)
* [`--index_files`](#--index_files)
* [`--chr`](#--chr)
* [`--no_read_QC`](#--no_read_QC)
* [`--no_stats`](#--no_stats)
* [Job resources](#job-resources)
* [Automatic resubmission](#automatic-resubmission)
* [Custom resource requests](#custom-resource-requests)
* [AWS Batch specific parameters](#aws-batch-specific-parameters)
* [`--awsqueue`](#--awsqueue)
* [`--awsregion`](#--awsregion)
* [Other command line parameters](#other-command-line-parameters)
* [`--outdir`](#--outdir)
* [`--email`](#--email)
* [`--email_on_fail`](#--email_on_fail)
* [`-name`](#-name)
* [`-resume`](#-resume)
* [`-c`](#-c)
* [`--custom_config_version`](#--custom_config_version)
* [`--custom_config_base`](#--custom_config_base)
* [`--max_memory`](#--max_memory)
* [`--max_time`](#--max_time)
* [`--max_cpus`](#--max_cpus)
* [`--plaintext_email`](#--plaintext_email)
* [`--monochrome_logs`](#--monochrome_logs)
* [`--multiqc_config`](#--multiqc_config)
- [qbic-pipelines/bamtofastq: Usage](#qbic-pipelinesbamtofastq-usage)
- [Table of contents](#table-of-contents)
- [Introduction](#introduction)
- [Running the pipeline](#running-the-pipeline)
- [Updating the pipeline](#updating-the-pipeline)
- [Reproducibility](#reproducibility)
- [Main arguments](#main-arguments)
- [`-profile`](#-profile)
- [`--input`](#--input)
- [`--index_files`](#--index_files)
- [`--cram_files`](#--cram_files)
- [`--reference_fasta`](#--reference_fasta)
- [`--chr` (optional)](#--chr-optional)
- [`--no_read_QC` (optional)](#--no_read_qc-optional)
- [`--samtools_collate_fast` (optional)](#--samtools_collate_fast-optional)
- [`--reads_in_memory` (optional)](#--reads_in_memory-optional)
- [`--no_stats` (optional)](#--no_stats-optional)
- [Job resources](#job-resources)
- [Automatic resubmission](#automatic-resubmission)
- [Custom resource requests](#custom-resource-requests)
- [AWS Batch specific parameters](#aws-batch-specific-parameters)
- [`--awsqueue`](#--awsqueue)
- [`--awsregion`](#--awsregion)
- [Other command line parameters](#other-command-line-parameters)
- [`--outdir`](#--outdir)
- [`--email`](#--email)
- [`--email_on_fail`](#--email_on_fail)
- [`-name`](#-name)
- [`-resume`](#-resume)
- [`-c`](#-c)
- [`--custom_config_version`](#--custom_config_version)
- [`--custom_config_base`](#--custom_config_base)
- [`--max_memory`](#--max_memory)
- [`--max_time`](#--max_time)
- [`--max_cpus`](#--max_cpus)
- [`--plaintext_email`](#--plaintext_email)
- [`--monochrome_logs`](#--monochrome_logs)
- [`--multiqc_config`](#--multiqc_config)
<!-- TOC END -->

## Introduction
@@ -92,24 +97,24 @@ Use this parameter to choose a configuration profile. Profiles can give configur

If `-profile` is not specified at all the pipeline will be run locally and expects all software to be installed and available on the `PATH`.

* `awsbatch`
* A generic configuration profile to be used with AWS Batch.
* `conda`
* A generic configuration profile to be used with [conda](https://conda.io/docs/)
* Pulls most software from [Bioconda](https://bioconda.github.io/)
* `docker`
* A generic configuration profile to be used with [Docker](http://docker.com/)
* Pulls software from dockerhub: [`nfcore/bamtofastq`](http://hub.docker.com/r/nfcore/bamtofastq/)
* `singularity`
* A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/)
* Pulls software from DockerHub: [`nfcore/bamtofastq`](http://hub.docker.com/r/nfcore/bamtofastq/)
* `test`
* A profile with a complete configuration for automated testing
* Includes links to test data so needs no other parameters
- `awsbatch`
- A generic configuration profile to be used with AWS Batch.
- `conda`
- A generic configuration profile to be used with [conda](https://conda.io/docs/)
- Pulls most software from [Bioconda](https://bioconda.github.io/)
- `docker`
- A generic configuration profile to be used with [Docker](http://docker.com/)
- Pulls software from dockerhub: [`nfcore/bamtofastq`](http://hub.docker.com/r/nfcore/bamtofastq/)
- `singularity`
- A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/)
- Pulls software from DockerHub: [`nfcore/bamtofastq`](http://hub.docker.com/r/nfcore/bamtofastq/)
- `test`
- A profile with a complete configuration for automated testing
- Includes links to test data so needs no other parameters

### `--input`

Use this to specify the location of your input Bam files. For example:
Use this to specify the location of your input Bam files (or CRAM files if used with [`--cram_files`](#--cram_files)). For example:

```bash
--input 'path/to/data/sample_*.bam'
@@ -118,7 +123,7 @@ Use this to specify the location of your input Bam files. For example:
Please note the following requirements:

1. The path must be enclosed in quotes
2. The path must have at least one `*` wildcard character
2. The path must have at least one `*`/`**` wildcard character

### `--index_files`

@@ -133,10 +138,34 @@ Please note the following requirements:
1. The path must be enclosed in quotes
2. The path must have at least one `*` wildcard character

### `--cram_files`

Use this to indicate that **all** of the files listed in `--input` are CRAM files instead of BAM files. This enables a step at the beginning of the workflow that converts each CRAM file to BAM format on the fly. Note that this option is incompatible with [`--index_files`](#--index_files). For example:

```bash
--cram_files --input 'path/to/data/sample_*.cram'
```

While the above command is valid, it will only work if the reference genome FASTA file listed in the CRAM header is available (_e.g._ via HTTP/FTP or on the local file system). Otherwise, you will need to use the [`--reference_fasta` option](#--reference_fasta). You can check which reference FASTA file is indicated in the CRAM header with the following command:

```bash
samtools view -H path/to/sample.cram | grep '@SQ'
```
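
As a rough sketch of that check, the `@SQ` lines can be reduced to their `UR:` fields, which is where the SAM specification stores the reference URI. The helper name `cram_reference_uris` is hypothetical and not part of the pipeline; it reads header text from standard input so that it can be tried without samtools.

```bash
# Hypothetical helper (not part of the pipeline): print the unique UR: values
# (reference URIs) found on @SQ header lines read from standard input.
cram_reference_uris() {
  awk -F'\t' '/^@SQ/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^UR:/) print substr($i, 4)   # drop the "UR:" prefix
  }' | sort -u
}

# With samtools installed, this could be combined with the command above:
#   samtools view -H path/to/sample.cram | cram_reference_uris
```

If the printed URI is not reachable from the machine running the pipeline, or nothing is printed at all, the `--reference_fasta` option is probably needed.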

Unfortunately, at the time of writing, FastQC [doesn't support](https://github.com/s-andrews/FastQC/issues/54) CRAM files as input. Hence, a benefit of converting CRAM files to BAM format as opposed to converting directly to FASTQ format is that you can perform QC before the final conversion.

### `--reference_fasta`

Use this option to indicate which reference genome FASTA file to use when decompressing CRAM files. This is useful if the FASTA file indicated in the CRAM header is not available (see [`--cram_files`](#--cram_files) for more information). For example:

```bash
--cram_files --input 'path/to/data/sample_*.cram' --reference_fasta 'ftp://ftp.broadinstitute.org/pub/seq/references/Homo_sapiens_assembly19.fasta'
```

### `--chr` (optional)

Use to only obtain reads mapping to a specific chromosome or region.
> It is important to specify the chromsome or region name **exactly** as set in the bam file. Otherwise no reads may be extracted!
> It is important to specify the chromosome or region name **exactly** as set in the bam file. Otherwise no reads may be extracted!
For example:

@@ -154,6 +183,20 @@ Use to skip `FastQC` on obtained reads. This is useful, when the reads are used
--no_read_QC
```

### `--samtools_collate_fast` (optional)

Use to specify the fast mode for the `samtools collate` command in the processes `sortExtractMapped`, `sortExtractUnmapped` and `sortExtractSingleEnd`. This option relies on the samtools command line flags `-f -r INT` and will output primary alignments only. For full documentation of this mode please refer to the [samtools documentation](http://www.htslib.org/doc/samtools-collate.html#OPTIONS).

### `--reads_in_memory` (optional)

Only relevant in combination with `--samtools_collate_fast`. It specifies how many alignment reads are kept in memory [default = '100000']. This is useful for speeding up the processes `sortExtractMapped`, `sortExtractUnmapped` and `sortExtractSingleEnd`.

Example:

```bash
--samtools_collate_fast --reads_in_memory '1000000'
```
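
How these two options might translate into `samtools collate` flags can be sketched as follows. This is only an illustration based on the `-f -r INT` flags mentioned above; the function name `collate_opts` is hypothetical, and the pipeline's actual process scripts may assemble the command differently.

```bash
# Hypothetical sketch: derive extra `samtools collate` flags from the two options.
# $1: value of --samtools_collate_fast ("true" or "false")
# $2: value of --reads_in_memory (falls back to the documented default of 100000)
collate_opts() {
  fast="$1"
  reads="${2:-100000}"
  if [ "$fast" = "true" ]; then
    printf '%s\n' "-f -r $reads"   # fast mode, keeping $reads reads in memory
  else
    printf '\n'                    # no extra flags: --reads_in_memory is ignored
  fi
}

collate_opts true 1000000   # prints: -f -r 1000000
```

The sketch makes the dependency explicit: without fast mode, the read count has no effect.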

### `--no_stats` (optional)

Use to skip `FastQC` on both input bam and output reads, as well as all `samtools flagstat`, `samtools idxstats`, and `samtools stats`. This is useful for large datasets, since the quality metrics processes require a significant amount of time and resources.
2 changes: 1 addition & 1 deletion environment.yml
@@ -1,6 +1,6 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
name: qbic-pipelines-bamtofastq-1.1.0
name: qbic-pipelines-bamtofastq-1.2.0
channels:
- conda-forge
- bioconda