Tamor

Rapid automated Personal Cancer Genome Report (PCGR) generation using Illumina Dragen + Snakemake, handling both genomic and transcriptomic input data. Catalogued as conforming to the Standardized Snakemake Workflow rules for reproducibility.

tl;dr

Data for large-scale tumor analysis projects can be spread over multiple DNA sequencing instrument runs. Tamor simplifies the process of analyzing them.

The user configures tab-delimited files to associate tumor (DNA and/or RNA) and germline sequencing sample IDs with a study subject ID, along with a tissue-of-origin for the tumor. Somatic variants (including small nucleotide variants, structural variants and copy number variants) as well as gene expression reports are generated using:

these tab-delimited files
the Illumina sequencer output (BCL or FASTQ), and
the Illumina Experiment Manager samplesheets (CSV) for the sequencing runs
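As a rough illustration of the kind of association these tab-delimited files encode, the sketch below parses a small table in Python. The column names (subject, tumor_dna, tumor_rna, germline, tissue) and all sample values are hypothetical, chosen only for this example; the layout Tamor actually expects is documented in the config directory.

```python
import csv
import io

# Hypothetical example table: the column names and IDs are assumptions made
# for illustration, not Tamor's actual configuration format.
EXAMPLE_TSV = (
    "subject\ttumor_dna\ttumor_rna\tgermline\ttissue\n"
    "SUBJ1\tT1_DNA\tT1_RNA\tN1_DNA\tBlood\n"
    "SUBJ2\tT2_DNA\t\tN2_DNA\tLung\n"
)

def load_pairings(handle):
    """Return a dict mapping each subject ID to its sample IDs and tissue."""
    reader = csv.DictReader(handle, delimiter="\t")
    pairings = {}
    for row in reader:
        # Treat empty cells (for example, no RNA sample) as missing values.
        pairings[row["subject"]] = {
            key: (value or None) for key, value in row.items() if key != "subject"
        }
    return pairings

pairings = load_pairings(io.StringIO(EXAMPLE_TSV))
print(pairings["SUBJ2"])
```

Keying the result on the subject ID mirrors the idea that DNA, RNA and germline samples are all tied back to one study subject, and a subject without RNA data simply has no value in that field.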

Prerequisites

This workflow is intended for people with an Illumina Dragen hardware-accelerated (FPGA) system for high-throughput genomics analysis.
If you don't have this hardware, this probably isn't for you.

This code has been tested with Dragen version 4.2 only.

If you are the sole user of the Dragen system, lucky you: that's it! If there is potentially more than one user of the Dragen system, you will also need to set up a slurm queue so that jobs running on the Dragen FPGA don't collide with each other. This is recommended by Illumina support, but is not part of the Dragen documentation.

Installation

Install the mamba package manager if you don't already have it on your system.
Create a mamba or conda environment containing the latest Snakemake (version 8.x) and supporting utilities:

mamba create -c conda-forge -c bioconda -n snakemake snakemake git wget
mamba activate snakemake
pip install snakemake-executor-plugin-cluster-generic

Download the Tamor code:

git clone https://github.com/nodrogluap/tamor
cd tamor

Testing

Tamor follows the Snakemake Distribution and Reproducibility guidelines, so files are located in standardized locations. The default config files are pre-configured for running a single test case from the NCBI Sequence Read Archive (SRA). This case of apparent Chronic Lymphocytic Leukemia (CLL) has both tumor DNA (30x coverage) and RNA data (27M) available, both 2x150bp paired-end Illumina.

If you would like to run the test case before reconfiguring Tamor to use your own data, you will need to download and format the CLL SRA records. This requires some additional specialty software not otherwise required by Tamor, so you will need to install a test mamba environment first.

mamba env create -f workflow/envs/test.yaml
mamba activate test
workflow/scripts/download_testdata.py
mamba deactivate

This can take a few hours depending on your Internet connection speed, and requires at least 40GB of RAM to generate matched-pseudonormal FASTQ files from the cancer sample FASTQ files.
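Because the memory requirement is easy to overlook, a quick pre-flight check before starting the download may save time. The sketch below is not part of Tamor; it assumes a Linux host (as Dragen systems are) where os.sysconf exposes the physical page count, and it treats the 40GB figure as 40 * 10**9 bytes.

```python
import os

# Figure quoted in the text; interpreting it as decimal gigabytes is an assumption.
REQUIRED_GB = 40

def total_ram_gb():
    """Total physical memory in GB, read through POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 10**9

def has_enough_ram(total_gb, required_gb=REQUIRED_GB):
    """True when the machine meets the stated memory requirement."""
    return total_gb >= required_gb

total = total_ram_gb()
print(f"{total:.1f} GB RAM detected; sufficient for test data: {has_enough_ram(total)}")
```

This reports installed memory only, not memory currently free, so it is a lower bar than what the script needs at run time.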

Running Tamor

NOTA BENE: The first time you run Tamor, it will automatically run the workflow/scripts/download_resources.py script to download the cancer annotation databases (~22GB) that CPSR and PCGR rely on for annotating your discovered sequence variants. This will likely take several hours. If you interrupt that process, you will need to run that script manually.

Once either the test data or your own data (see Configuration section below) is ready, you can run Snakemake to generate the BAM files, VCF files, and CPSR/PCGR reports.

On a single-user system:

snakemake --use-conda -j 1

Otherwise, on a multi-user system, it is imperative to use a queuing system such as slurm to submit only one job at a time to Dragen v4.x. Once slurm is installed and configured on your Dragen system, Snakemake support for slurm is enabled by invoking Snakemake like so:

snakemake --use-conda -j 1 --executor cluster-generic --cluster-generic-submit-cmd sbatch
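If you launch Tamor from a wrapper script, the two invocations above differ only in the executor options. The following Python sketch builds the argument list for either case; the function name and the multi_user parameter are hypothetical, while the flags themselves are copied from the commands above.

```python
import shlex

def snakemake_command(multi_user):
    """Build the Snakemake argument list for a single- or multi-user Dragen system."""
    # -j 1 keeps a single job at a time on the Dragen FPGA in both cases.
    cmd = ["snakemake", "--use-conda", "-j", "1"]
    if multi_user:
        # Route job submission through slurm's sbatch via the cluster-generic executor.
        cmd += ["--executor", "cluster-generic", "--cluster-generic-submit-cmd", "sbatch"]
    return cmd

print(shlex.join(snakemake_command(multi_user=False)))
print(shlex.join(snakemake_command(multi_user=True)))
```

The returned list can be passed directly to subprocess.run from the tamor directory, which avoids shell quoting issues.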

Regardless of the invocation method used above, the default outputs are in a directory called results/pcgr/projectID/subjectID_tumorSampleID_germlineSampleID. The most relevant document may be the self-contained Web page subjectID.pcgr.grch38.html.
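To locate a report programmatically, the naming pattern above can be assembled from the IDs. This is a minimal sketch: the function name and the example IDs are hypothetical, and it assumes the HTML report sits directly inside the per-case directory.

```python
from pathlib import PurePosixPath

def pcgr_report_path(project_id, subject_id, tumor_id, germline_id,
                     results_root="results/pcgr"):
    """Compose the expected path of the PCGR HTML report for one case."""
    # Per-case directory: subjectID_tumorSampleID_germlineSampleID
    case_dir = PurePosixPath(results_root) / project_id / f"{subject_id}_{tumor_id}_{germline_id}"
    # Assumption: the report file lives directly in the case directory.
    return case_dir / f"{subject_id}.pcgr.grch38.html"

# Hypothetical IDs, for illustration only.
print(pcgr_report_path("PROJ", "SUBJ1", "T1", "N1"))
```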

After a successful workflow run, additional reporting information (including provenance) can be aggregated into a self-contained HTML report using Snakemake's automated workflow reporting tool:

snakemake --report report.html

Configuration

Detailed configuration information for importing your own cancer cases for processing is available in the config directory.

Acknowledgements

This project is being developed in support of the Terry Fox Research Institute's Marathon of Hope Cancer Care Network activities within the Prairie Cancer Research Consortium.
