Skip to content
This repository has been archived by the owner on Jan 27, 2020. It is now read-only.

EACR25 abstract #580

Merged
merged 1 commit into from
May 2, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 59 additions & 0 deletions doc/Abstracts/2018-06-EACR25.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# 25th Biennial Congress Of The European Association For Cancer Research 2018

## Somatic and germline calls from tumour/normal whole genome data: bioinformatics workflow for reproducible research


Szilveszter Juhos 1,
Maxime Garcia 2,
Teresita Díaz de Ståhl 3,
Johanna Sandgren 3,
Markus Mayrhofer 4,
Max Käller 5,
Björn Nystedt 6,
Monica Nistér 3

1. Karolinska Institutet, Department of Oncology Pathology, Stockholm, Sweden.
2. Karolinska Institutet- Science for Life Laboratory, Department of Oncology-Pathology, Stockholm, Sweden.
3. Karolinska Institutet, Department of Oncology-Pathology, Stockholm, Sweden.
4. Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
5. Science for Life Laboratory, Royal Institute of Technology- School of Biotechnology- Division of Gene Technology, Stockholm, Sweden.
6. Science for Life Laboratory, Department of Cell and Molecular Biology- National Bioinformatics Infrastructure Sweden- Uppsala University, Uppsala, Sweden.

### Introduction

Whole-genome sequencing of cancer tumours is more a research tool nowadays, but going to be used in clinical settings in
the near future to facilitate precision medicine. While large institutions have built up in-house bioinformatics
solutions for their own data analysis, robust and portable workflows combining multiple software have been lacking,
making it difficult for individual research groups to utilise the potential of this research field. Here we present
Sarek, a robust, easy-to-install workflow for identification of both somatic and germline mutations from paired
tumour/normal/relapse samples.

### Material and Methods

Sarek is open source and implemented in Nextflow; a domain specific programming language to enable portability and
reproducibility. With the help of docker containers the versions of the underlying software can be maintained.
Furthermore, with Singularity it is possible to run the workflow on protected clusters with no internet connection.

The workflow starts from raw FASTQ files, and follows the GATK best practices to prepare the recalibrated files with
joint realignment around indels for both the tumour and the normal data. Reads are alignment to the GRCh38 human
reference in an ALT-aware settings using BWA, however, it is possible to assign other references. HaplotypeCaller and
Strelka2 germline calls are collected for both the tumour and the normal sample, and Manta provides germline structural
variants. The somatic variations are calculated by running MuTect2, Strelka and FreeBayes (and MuTect1 optionally).
Somatic structural variants are delivered by Manta, and ASCAT estimates ploidy, tumour heterogeneity and CNVs. The
resulting variant call files are annotated by SnpEff and Ensembl-VEP. The annotated calls are further filtered and
prioritised by our custom methods. During running the workflow quality control metrics are also calculated and
aggregated by MultiQC.

### Results and Discussions

Sarek was validated on a real dataset with known golden set of somatic mutations. In a real settings, whole-genome
sequencing (WGS, 45-60x coverage) of patient-matched tumor and blood derived-DNA is being performed on a set of 80
pediatric brain tumor samples of the Swedish Childhood Tumor Biobank. The workflow helps to produce, filter, prioritise
and characterise both germline and somatic variations.

### Conclusion

Sarek is a portable bioinformatics pipeline for WGS normal/tumour matched samples, aiding precision medicine by improved
subtyping and to gain novel functional insights in a reproducible framework.