## Sequencing biases
Meacham et al. (2011): Identification and correction of systematic error in high-throughput sequence data, (doi:10.1186/1471-2105-12-451) - Correcting for systematic base-calling errors in deep sequencing; an important paper if you want to look at allele specificity or are interested in SNPs
Benjamini & Speed (2012): Summarizing and correcting the GC content bias in high-throughput sequencing, (doi:10.1093/nar/gks001) - GC bias of deeply sequenced samples; a very good paper that systematically assesses many possible sources of GC bias and eventually pinpoints PCR amplification (i.e. the DNA polymerase step) as its most likely cause
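As a rough illustration of how GC bias distorts read counts and how it can be corrected, the sketch below bins genomic windows by GC content and rescales each window's read count by the ratio of the overall mean count to the mean count in its GC bin. This is a simplified stratified-mean correction, not the exact model of Benjamini & Speed (who argue that the GC content of the whole fragment is what matters); the function names, the `(sequence, read_count)` window input and the 10-bin default are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def gc_fraction(seq):
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(seq, n_bins):
    """Map a GC fraction in [0, 1] to a bin index in 0..n_bins-1."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def gc_corrected_counts(windows, n_bins=10):
    """windows: (window_sequence, read_count) pairs (hypothetical input format).

    Each count is multiplied by (overall mean count / mean count of its GC bin),
    so every GC bin with non-zero coverage ends up with the same mean count.
    """
    windows = list(windows)
    counts_by_bin = defaultdict(list)
    for seq, count in windows:
        counts_by_bin[gc_bin(seq, n_bins)].append(count)
    overall = mean(count for _, count in windows)
    factors = {
        b: overall / mean(counts)
        for b, counts in counts_by_bin.items()
        if mean(counts) > 0  # bins without any reads are left uncorrected
    }
    return [count * factors.get(gc_bin(seq, n_bins), 1.0) for seq, count in windows]


# Toy example: AT-rich windows are under-covered relative to GC-rich ones.
windows = [("AAAA", 10), ("AATT", 10), ("GGCC", 40), ("GCGC", 40)]
print(gc_corrected_counts(windows))  # [25.0, 25.0, 25.0, 25.0]
```

In real data the window sequences would come from the reference genome and the counts from aligned reads; the per-window ratio here only shows the idea of stratifying by GC.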
A collection of papers on quality control for various NGS applications: Frontiers in Genetics (2014)
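A common first quality-control step for almost any NGS application is inspecting the per-base quality scores stored in the FASTQ files. The sketch below parses 4-line FASTQ records and computes the mean Phred score at each read position, the kind of summary shown in per-base quality plots. It assumes Phred+33 encoding (used by current Illumina pipelines; older data may use a different offset), and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def phred33_scores(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in qual]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality_per_position(quals: Iterable[str]) -> List[float]:
    """Mean Phred score at each read position (reads may differ in length)."""
    sums: List[int] = []
    counts: List[int] = []
    for qual in quals:
        for i, q in enumerate(phred33_scores(qual)):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


fastq = ["@read1\n", "ACGT\n", "+\n", "IIII\n", "@read2\n", "ACG\n", "+\n", "!!5\n"]
records = list(read_fastq(fastq))
print(mean_quality_per_position(qual for _, _, qual in records))  # [20.0, 20.0, 30.0, 40.0]
```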
## Deep sequencing
Zentner and Henikoff (2012): Surveying the epigenomic landscape, one base at a time, (doi:10.1186/gb-2012-13-10-250) - Overview of popular *-seq techniques; very nice description of DNase-seq, MNase-seq, FAIRE-seq etc.
Son and Taylor (2011): Preparing DNA Libraries for Multiplexed Paired-End Deep Sequencing for Illumina GA Sequencers, (doi:10.1002/9780471729259.mc01e04s20) - Paper on multiplexing; describes the individual steps of the Illumina deep sequencing protocols in considerable detail
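Multiplexing works by attaching a short index (barcode) to each library so that reads from a pooled run can be assigned back to their sample. The sketch below does this assignment by comparing each index read against a table of known barcodes, tolerating up to one mismatch and sending unmatched or ambiguous reads to an "undetermined" bin. The barcodes, sample names and function names are hypothetical; in practice this step is often handled by the sequencer vendor's demultiplexing software.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode is within max_mismatches of the index read,
    or None if no barcode (or more than one barcode) matches."""
    hits = [sample for sample, bc in barcodes.items()
            if hamming(index_read, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """reads: iterable of (read_id, index_read) pairs; returns read IDs per sample."""
    bins = {sample: [] for sample in barcodes}
    bins["undetermined"] = []
    for read_id, index_read in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins["undetermined" if sample is None else sample].append(read_id)
    return bins


barcodes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}  # hypothetical 6-nt indices
reads = [("r1", "ACGTAC"), ("r2", "TGCATC"), ("r3", "NNNNNN")]
print(demultiplex(reads, barcodes))
# {'sampleA': ['r1'], 'sampleB': ['r2'], 'undetermined': ['r3']}
```

Tolerating a mismatch is only safe if the barcodes are well separated: if every pair of barcodes differs at 3 or more positions, an index read within 1 mismatch of one barcode cannot also be within 1 mismatch of another.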
Illumina's technical report - focuses on Illumina's sequencing technology; nice educational figures
## Mapping of short NGS reads
Informative slides: Mapping of sequencing reads - Introduction to various aspects of NGS read mapping
Fonseca et al. (2012): Tools for mapping high-throughput sequencing data, (doi:10.1093/bioinformatics/bts605) - An excellent starting point despite its "old" age; you will learn a lot about the different philosophies behind read alignment tools!
Hatem et al. (2013): Benchmarking short sequence mapping tools, (doi:10.1186/1471-2105-14-184) (spoiler alert: Bowtie wins)
Engstrom et al. (2013): Systematic evaluation of spliced alignment programs for RNA-seq data, (doi:10.1038/nmeth.2722)
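To make the "different philosophies" of read aligners a bit more concrete, here is a toy version of the hash-table, seed-and-verify approach: index all k-mers of the reference, look up the non-overlapping k-mers of a read, and verify each candidate position by counting mismatches. By the pigeonhole principle, if a read is cut into at least m+1 non-overlapping seeds, any placement with at most m mismatches contains at least one exactly matching seed. This is an illustrative sketch with hypothetical function names (forward strand only, no gaps, no base qualities); BWT/FM-index-based mappers such as Bowtie and BWA take a different route.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) hits for an ungapped, forward-strand alignment.

    Seeds are the non-overlapping k-mers of the read; finding every hit with
    <= max_mismatches is only guaranteed if len(read) // k >= max_mismatches + 1.
    """
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # implied start position of the whole read
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "TTTTGATTACAGGGG"
index = build_kmer_index(reference, k=3)
# The first seed ("CAT") has no exact match, but the second seed ("TAC") does.
print(map_read("CATTACA", reference, index, k=3))  # [(4, 1)]
```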
Genome Mappability
Lee and Schatz (2012): The reliability of short read mapping, (doi:10.1093/bioinformatics/bts330) - Very detailed paper about genome mappability issues that presents a new suite of tools for taking the mappability into account
Mappability maps can be downloaded here
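A simple way to see what a mappability map encodes: for every genome position, take the k-mer (read-length window) starting there and count how often it occurs in the genome on either strand; a score of 1/occurrences is 1.0 for unique k-mers and drops for repeats. This sketch counts exact matches only (published maps typically also consider mismatches), treats the read length k as a free parameter, and uses hypothetical function names.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_mappability(genome, k):
    """Score for each start position: 1 / (number of exact occurrences of its
    k-mer on either strand). 1.0 = uniquely mappable; smaller = repetitive."""
    genome = genome.upper()
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    scores = []
    for kmer in kmers:
        rc = reverse_complement(kmer)
        # A palindromic k-mer (rc == kmer) is not counted twice.
        occurrences = counts[kmer] + (counts[rc] if rc != kmer else 0)
        scores.append(1 / occurrences)
    return scores


# "ACG" and "CGT" are reverse complements of each other, so neither is unique.
print(kmer_mappability("ACGTTT", k=3))  # [0.5, 0.5, 1.0, 1.0]
```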
## NGS data formats
- UCSC has a very good overview with brief descriptions of BED, bedGraph, bigWig etc.: https://genome.ucsc.edu/FAQ/FAQformat.html
- VCF format (encoding SNPs, indels etc.): a very readable, albeit not exhaustive, description
- Transcript annotations are often saved in GFF3 format (TopHat, for example, accepts it), but, just to make things more complicated, GTF is another format used for transcriptome information, too (more information on GTF is available here)
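The most common source of off-by-one errors when moving between these formats is the coordinate convention: BED and bedGraph intervals are 0-based and half-open, whereas GTF/GFF and VCF positions are 1-based and closed. The sketch below parses a BED line, converts it to 1-based coordinates, and splits the GTF attribute column (the 9th column, with `key "value";` pairs) into a dictionary. It is a minimal illustration with hypothetical function names: it ignores BED header/track lines, does not handle semicolons inside quoted GTF values, and keeps only the last value for GTF keys that appear more than once (e.g. `tag`). GFF3 uses `key=value` pairs instead and would need a different parser.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BedInterval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: Optional[str] = None


def parse_bed_line(line: str) -> BedInterval:
    """Parse the first three (or four) tab-separated BED columns."""
    fields = line.rstrip("\n").split("\t")
    name = fields[3] if len(fields) > 3 else None
    return BedInterval(fields[0], int(fields[1]), int(fields[2]), name)


def bed_to_one_based(iv: BedInterval) -> Tuple[int, int]:
    """BED [start, end) -> 1-based closed [start + 1, end] (GTF/GFF/VCF style)."""
    return iv.start + 1, iv.end


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Split a GTF attribute column such as 'gene_id "G1"; gene_name "ABC";'."""
    attrs = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


iv = parse_bed_line("chr1\t99\t200\tpeak1\n")
print(bed_to_one_based(iv))  # (100, 200): the same 101 bp in 1-based, closed coordinates
print(parse_gtf_attributes('gene_id "G1"; transcript_id "T1"; gene_name "ABC";'))
# {'gene_id': 'G1', 'transcript_id': 'T1', 'gene_name': 'ABC'}
```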
## Bioinformatics tools (Linux, R, BEDTools etc.) - Manuals, courses, original papers
- Why and how is bioinformatics software special? Altschul et al. (2013) The anatomy of successful computational biology software, (doi:10.1038/nbt.2721) (highly recommended!)
- Bild et al. (2014) A Field Guide to Genomics Research, (doi:10.1371/journal.pbio.1001744) - Very readable introduction to the various caveats of genomics research (with cute cartoons!)
Linux Command Line
- Linux & Perl Primer for Biologists - Very entertaining introduction to command line tools and Perl scripts with a focus on bioinformatics applications, e.g. handling DNA sequences
- Linux Tutorial for Beginners - Thorough but concise online tutorial introducing the very basics of the Linux command line
- Writing Linux shell scripts - Useful for slightly more advanced Linux command line users
R
- Hands-on R course - For beginners - R is probably the most widely used open-source statistical software; through our epicenter website you can also access RStudio, which provides a very nice interface for working and plotting with R. In fact, most of the plots within Galaxy are generated by R scripts, so if you're not happy with the default formats of the Galaxy graphs, definitely have a look at R yourself. The learning curve is steep, but it is worth it.
BEDTools
- BEDTools Manual - When working with genomic intervals (e.g. genes, peaks, enriched regions...), BEDTools are invaluable! The manual is a very good read and we refer to it almost daily.
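To get a feeling for what a basic BEDTools operation does, here is a pure-Python sketch of an interval intersection: for every interval in A, it reports the portion that overlaps each interval in B on the same chromosome, similar in spirit to the default output of `bedtools intersect -a A.bed -b B.bed`. Intervals are BED-style (0-based, half-open) `(chrom, start, end)` tuples, so book-ended intervals (one ends exactly where the next starts) do not overlap. The function name is hypothetical and the scan is deliberately simple rather than fast; for real data, use BEDTools itself.

```python
from collections import defaultdict


def intersect(a_intervals, b_intervals):
    """Return (chrom, start, end) overlaps between A and B intervals, in A order."""
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b_intervals:
        b_by_chrom[chrom].append((start, end))
    for ivs in b_by_chrom.values():
        ivs.sort()  # sort B by start so the scan below can stop early
    out = []
    for chrom, a_start, a_end in a_intervals:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            if b_start >= a_end:
                break  # this and all later B intervals start after A ends
            if b_end > a_start:
                out.append((chrom, max(a_start, b_start), min(a_end, b_end)))
    return out


peaks = [("chr1", 100, 200), ("chr2", 0, 50)]
genes = [("chr1", 150, 300), ("chr1", 0, 120), ("chr1", 200, 250)]
print(intersect(peaks, genes))  # [('chr1', 100, 120), ('chr1', 150, 200)]
```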