
Literature

## Sequencing biases

Meacham et al. (2011): Identification and correction of systematic error in high-throughput sequence data, (doi:10.1186/1471-2105-12-451) - Correcting for systematic base pair errors in deep sequencing; important paper if you want to look at allele specificity or are interested in SNPs

Benjamini & Speed (2012): Summarizing and correcting the GC content bias in high-throughput sequencing, (doi:10.1093/nar/gks001) - GC bias of deeply sequenced samples; very good paper that systematically assesses many possible sources of GC bias for deeply sequenced samples and eventually pinpoints it to the DNA polymerase

A collection of papers on quality controls for various NGS applications: Frontiers in Genetics (2014)
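As a first look at GC content in your own data, it can help to tabulate the GC fraction per read. The following is a minimal sketch of that idea only, not the correction method described by Benjamini & Speed; the function names `gc_fraction` and `gc_histogram` are made up for illustration.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G/C among the unambiguous bases (A, C, G, T).

    Ambiguous bases such as N are ignored; a sequence without any
    unambiguous base yields 0.0.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_histogram(reads, bins=10):
    """Count reads per GC-fraction bin.

    Bin i covers [i/bins, (i+1)/bins); a GC fraction of exactly 1.0
    is placed in the last bin.
    """
    counts = [0] * bins
    for read in reads:
        idx = min(int(gc_fraction(read) * bins), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    example_reads = ["AAAA", "GGCC", "ATGC"]  # toy data, not real reads
    print(gc_histogram(example_reads, bins=2))
```

Comparing such a histogram with the GC distribution of the reference genome gives a rough impression of whether GC-rich or GC-poor fragments are over- or under-represented.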

## Deep sequencing

Zentner and Henikoff (2012): Surveying the epigenomic landscape, one base at a time, (doi:10.1186/gb-2012-13-10-250) - Overview of popular *-seq techniques; very nice description of DNase-seq, MNase-seq, FAIRE-seq etc.

Son and Taylor (2011): Preparing DNA Libraries for Multiplexed Paired-End Deep Sequencing for Illumina GA Sequencers, (doi:10.1002/9780471729259.mc01e04s20) - Paper on multiplexing; describes the individual steps of the Illumina deep sequencing protocols in considerable detail

Illumina's technical report - focuses on Illumina's sequencing technology; nice educational figures

## Mapping of short NGS reads

Informative slides: Mapping of sequencing reads - Introduction to various aspects of NGS read mapping

Fonseca et al. (2012): Tools for mapping high-throughput sequencing data, (doi:10.1093/bioinformatics/bts605) - An excellent starting point despite its "old" age; you will learn a lot about the different philosophies behind the read alignment tools!

Hatem et al. (2013): Benchmarking short sequence mapping tools, (doi:10.1186/1471-2105-14-184) (spoiler alert: bowtie wins)

Engström et al. (2013): Systematic evaluation of spliced alignment programs for RNA-seq data, (doi:10.1038/nmeth.2722)

Genome Mappability

Lee and Schatz (2012): The reliability of short read mapping, (doi:10.1093/bioinformatics/bts330) - Very detailed paper about genome mappability issues that presents a new suite of tools for taking the mappability into account

Mappability maps can be downloaded here

## NGS data formats

  • UCSC has a very good overview with brief descriptions of BED, bedGraph, bigWig etc.: https://genome.ucsc.edu/FAQ/FAQformat.html

  • VCF format (encoding SNPs, indels etc.): Very readable, albeit not exhaustive, description

  • Transcriptomes are often saved in GFF3 format (this is what TopHat needs, for example), but, just to make things more complicated, GTF is another format used for transcriptome information (here is more information on GTF)
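One practical difference between these formats is the coordinate convention: BED uses 0-based starts with the end position excluded, whereas GFF3 and GTF use 1-based coordinates with the end included. A minimal sketch of reading the first three BED columns and converting them is shown below; `BedInterval`, `parse_bed_line` and `bed_to_one_based` are hypothetical names chosen for this example, and only the three mandatory BED columns are handled.

```python
from typing import NamedTuple, Tuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_bed_line(line: str) -> BedInterval:
    """Parse the three mandatory BED columns; extra columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


def bed_to_one_based(interval: BedInterval) -> Tuple[str, int, int]:
    """Convert to 1-based, end-inclusive coordinates (as in GFF3/GTF).

    Only the start shifts by one; the end value stays the same because
    an exclusive 0-based end equals an inclusive 1-based end.
    """
    return interval.chrom, interval.start + 1, interval.end


if __name__ == "__main__":
    bed = parse_bed_line("chr1\t99\t200\tpeak1\n")  # toy record
    print(bed_to_one_based(bed))
    print(bed.end - bed.start)  # interval length in bases
```

Mixing up the two conventions shifts every feature by one base, which is easy to miss until features fail to overlap where they should.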

## Bioinformatic Tools (Linux, R, BEDTools etc.) - Manuals, courses, original papers

Linux Command Line

R

  • Hands on R course - For beginners - R is probably the most widely used open-source statistical software; through our epicenter website you can also access RStudio, which provides a very nice interface for working and plotting with R. In fact, most of the plots generated within Galaxy are generated through R scripts, so if you're not happy with the default formats of the Galaxy graphs, definitely have a look at R yourself. The learning curve is steep, but it is worth it.

BEDTools

  • BEDTools Manual - When working with genomic intervals (e.g. genes, peaks, enriched regions...), BEDTools are invaluable! The manual is a very good read and we refer to it almost daily.
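To illustrate the kind of question BEDTools answers, here is a naive sketch of an interval overlap test in plain Python. It is not how BEDTools is implemented and does not call it; the names `overlaps` and `features_with_overlap` are hypothetical, and intervals are assumed to be `(chrom, start, end)` tuples in BED-style half-open coordinates.

```python
def overlaps(a, b):
    """Return True if two half-open intervals (chrom, start, end) overlap.

    Intervals that merely touch (one ends where the other starts)
    do not count as overlapping.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def features_with_overlap(features, regions):
    """Return every feature that overlaps at least one region.

    This compares all pairs, which is fine for small lists but far too
    slow for genome-scale data; use BEDTools for real analyses.
    """
    return [f for f in features if any(overlaps(f, r) for r in regions)]


if __name__ == "__main__":
    # Toy data, invented for this example
    genes = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
    peaks = [("chr1", 150, 160), ("chr1", 400, 500)]
    print(features_with_overlap(genes, peaks))
```

For real data sets, the `intersect` tool described in the BEDTools manual covers this task and many variations of it.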