
Short Read QC and Mapping DNASeq Lab I

Meg Staton edited this page Sep 19, 2016 · 7 revisions

Goals

  • Learn about sequence read quality and trimming
  • Understand the characteristics of short read mappers
  • Understand the file formats SAM, BAM, CRAM
  • Assess the quality of a read file
  • Trim a read file
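As a rough illustration of the last two goals, the sketch below decodes a FASTQ quality string into Phred scores and trims low-quality bases from the 3' end of a read. It assumes Phred+33 encoding; the function names, the example read, and the quality cutoff of 20 are hypothetical choices made for illustration. Dedicated trimming tools use more elaborate strategies (sliding windows, adapter removal), so treat this as a minimal sketch rather than a recommended method.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Remove bases from the 3' end until one reaches min_quality."""
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: eight high-quality bases followed by two poor ones.
seq = "ACGTACGTAC"
qual = "IIIIIIII#!"

print(phred_scores(qual))        # per-base Phred scores
print(trim_3prime(seq, qual))    # read with the low-quality tail removed
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a cutoff of 20 keeps bases with an estimated error rate of at most 1 in 100.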

Concepts to know

  • What questions are you answering when you do quality checks on sequence data? What are some types of quality checks done by the software FastQC?
  • What types of trimming are you likely to need to do to sequence data? What is a software package that could do this?
  • What characteristics does a short read mapper have that are different from BLAST?
  • What are some good criteria to use to select a read mapping software package?
  • Alignment software often requires an initial step to index the reference - what is the indexing step doing?
  • What does the acronym BWA stand for?
  • What type of data is stored in SAM/CRAM/BAM formatted files?
  • What is binary format? How do SAM and BAM differ? Why convert to binary?
  • How can you distinguish the header of a SAM file from the main content? What type of information is stored in the header?
  • What is a CIGAR string? (No need to decipher one, just know what type of information it stores)
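To make the last two questions more concrete, here is a minimal sketch of separating SAM header lines from alignment records and of breaking a CIGAR string into its (length, operation) pairs. The SAM text is a made-up example, and the function names are hypothetical. It relies on two facts about the format: header lines begin with "@", and the CIGAR string is the sixth tab-separated field of an alignment record.

```python
import re

# Made-up SAM content for illustration: two header lines, one alignment record.
SAM_TEXT = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t60\t5M1I4M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
)


def split_sam(text):
    """Separate header lines (starting with '@') from alignment records."""
    headers, records = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            headers.append(line)
        else:
            records.append(line)
    return headers, records


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


headers, records = split_sam(SAM_TEXT)
cigar = records[0].split("\t")[5]   # sixth field of the alignment record
print(headers)
print(parse_cigar(cigar))
```

Here the CIGAR string 5M1I4M describes five aligned bases, a one-base insertion relative to the reference, then four more aligned bases. In other words, it summarizes how the read lines up against the reference.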

Materials

Readings

This is why Biostars is awesome: scientists are debating the relative merits of trimming right now, while we learn about it. Read for the content and for the blistering snarkiness.

(There's a good bit on there about k-mers. K-mers are super important, and we'll get to those soon. See the k-mer Wikipedia article for now.)

Del Fabbro et al., 2013. An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis.
