diff --git a/doc/Abstracts/2018-06-EACR25.md b/doc/Abstracts/2018-06-EACR25.md new file mode 100644 index 0000000000..ca86f2e658 --- /dev/null +++ b/doc/Abstracts/2018-06-EACR25.md @@ -0,0 +1,59 @@ +# 25th Biennial Congress Of The European Association For Cancer Research 2018 + +## Somatic and germline calls from tumour/normal whole genome data: bioinformatics workflow for reproducible research + + +Szilveszter Juhos 1, +Maxime Garcia 2, +Teresita Díaz de Ståhl 3, +Johanna Sandgren 3, +Markus Mayrhofer 4, +Max Käller 5, +Björn Nystedt 6, +Monica Nistér 3 + +1. Karolinska Institutet, Department of Oncology Pathology, Stockholm, Sweden. +2. Karolinska Institutet- Science for Life Laboratory, Department of Oncology-Pathology, Stockholm, Sweden. +3. Karolinska Institutet, Department of Oncology-Pathology, Stockholm, Sweden. +4. Science for Life Laboratory, Uppsala University, Uppsala, Sweden. +5. Science for Life Laboratory, Royal Institute of Technology- School of Biotechnology- Division of Gene Technology, Stockholm, Sweden. +6. Science for Life Laboratory, Department of Cell and Molecular Biology- National Bioinformatics Infrastructure Sweden- Uppsala University, Uppsala, Sweden. + +### Introduction + +Whole-genome sequencing of cancer tumours is more a research tool nowadays, but going to be used in clinical settings in +the near future to facilitate precision medicine. While large institutions have built up in-house bioinformatics +solutions for their own data analysis, robust and portable workflows combining multiple software have been lacking, +making it difficult for individual research groups to utilise the potential of this research field. Here we present +Sarek, a robust, easy-to-install workflow for identification of both somatic and germline mutations from paired +tumour/normal/relapse samples. + +### Material and Methods + +Sarek is open source and implemented in Nextflow; a domain specific programming language to enable portability and +reproducibility. With the help of docker containers the versions of the underlying software can be maintained. +Furthermore, with Singularity it is possible to run the workflow on protected clusters with no internet connection. + +The workflow starts from raw FASTQ files, and follows the GATK best practices to prepare the recalibrated files with +joint realignment around indels for both the tumour and the normal data. Reads are alignment to the GRCh38 human +reference in an ALT-aware settings using BWA, however, it is possible to assign other references. HaplotypeCaller and +Strelka2 germline calls are collected for both the tumour and the normal sample, and Manta provides germline structural +variants. The somatic variations are calculated by running MuTect2, Strelka and FreeBayes (and MuTect1 optionally). +Somatic structural variants are delivered by Manta, and ASCAT estimates ploidy, tumour heterogeneity and CNVs. The +resulting variant call files are annotated by SnpEff and Ensembl-VEP. The annotated calls are further filtered and +prioritised by our custom methods. During running the workflow quality control metrics are also calculated and +aggregated by MultiQC. + +### Results and Discussions + +Sarek was validated on a real dataset with known golden set of somatic mutations. In a real settings, whole-genome +sequencing (WGS, 45-60x coverage) of patient-matched tumor and blood derived-DNA is being performed on a set of 80 +pediatric brain tumor samples of the Swedish Childhood Tumor Biobank. The workflow helps to produce, filter, prioritise +and characterise both germline and somatic variations. + +### Conclusion + +Sarek is a portable bioinformatics pipeline for WGS normal/tumour matched samples, aiding precision medicine by improved +subtyping and to gain novel functional insights in a reproducible framework. + +