Literature

##Sequencing biases

Meacham et al. (2011): Identification and correction of systematic error in high-throughput sequence data, (doi:10.1186/1471-2105-12-451) - Correcting for systematic base pair errors in deep sequencing; important paper if you want to look at any allele-specificy or if you're interested in SNPs

Benjamini & Speed (2012): Summarizing and correcting the GC content bias in high-throughput sequencing, (doi: 10.1093/nar/gks001) - GC bias of deeply sequenced samples; very good paper that systematically assesses many possible sourced of GC bias for deeply sequenced samples and eventually pinpoints it to the DNA polymerase

A collection of papers on quality controls for various NGS applications: Frontiers in Genetics (2014)

##Deep sequencing

Zentner and Henikoff (2012): Surveying the epigenomic landscape, one base at a time, (doi:10.1186/gb-2012-13-10-250) - Overview of popular *-seq techniques; very nice description of DNase-seq, MNase-seq, FAIRE-seq etc.

Son and Taylor (2011): Preparing DNA Libraries for Multiplexed Paired-End Deep Sequencing for Illumina GA Sequencers, (doi:10.1002/9780471729259.mc01e04s20) - Paper on multiplexing; describes the individual steps of the Illumina deep sequencing protocols quite in detail

Illumina's technical report - focuses on Illumina's sequencing technology; nice educative figures

##Mapping of short NGS reads

informative slides Mapping of sequencing reads - Introduction to various aspects of NGS read mapping

Fonseca et al. (2012): Tools for mapping high-throughput sequencing data, (doi:10.1093/bioinformatics/bts605) - An excellent starting point despite its "old" age, you will learn a lot about the different philosophies behind the read alignment tools!

Hatem et al. (2013): Benchmarking short sequence mapping tools, (doi:10.1186/1471-2105-14-184) (spoiler alert: bowtie wins)

Engstrom et al. (2013): Systematic evaluation of spliced alignment programs for RNA-seq data, (doi:10.1038/nmeth.2722)

Genome Mappability

Lee and Schatz (2012): The reliability of short read mapping, (doi:10.1093/bioinformatics/bts330) - Very detailed paper about genome mappability issues that presents a new suite of tools for taking the mappability into account

mappability maps can be downloaded here

##NGS data formats

UCSC has a very good overview with brief descriptions of BED, bedGraph, bigWig etc.: https://genome.ucsc.edu/FAQ/FAQformat.html
VCF format (encoding SNPs, indels etc.): Very readable, albeit not exhausting description
Transcriptomes are often saved in GFF3 format (this is what TopHat needs, for example), but just to make things more complicated, GTF is another format used for transcriptome information, too (here are more information on GTF)

##Bioinformatic Tools (Linux, R, BEDTools etc.) - Manuals, courses, original papers

Why and how is bioinformatics software special? Altschul et a. (2013) The anatomy of successful computational biology software, (doi:10.1038/nbt.2721) (Highly recommended to read!)
Bild et al. (2014) A Field Guide to Genomics Research, (doi:10.1371/journal.pbio.1001744) - Very readable introduction about the different caveats of genomics research (with cute cartoons!)

Linux Command Line

Linux & Perl Primer for Biologists - Very entertaining introduction to command line commands and perl scripts with a focus on bioinformatic application, i.e. handling of DNA sequences
Linux Tutorial for Beginners - Thorough, but concise online tutorial introducing the very basics of handling the Linux command line
Writing Linux shell scripts - Useful for slightly more advanced Linux command line users

R

Hands on R course - For beginners - R is probably the most widely used open-source statistical software; through our epicenter website you can also access RStudio which provides are very nice interface to working and plotting with R. In fact, most of the plots generated within Galaxy are generated through R scripts, so if you're not happy with the default formats of the Galaxy graphs, definitely have a look at R yourself. The learning curve is steep, but it is worth it.

BEDTools

BEDTools Manual - When working with genomic intervals (e.g. genes, peaks, enriched regions...), BEDTools are invaluable! The manual is a very good read and we refer to it almost daily.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

literature.md

literature.md

Literature

Files

literature.md

Latest commit

History

literature.md

File metadata and controls

Literature