waoverholt/snakemake_16S_pipeline

16S Amplicon Initial Processing Pipeline

This is a snakemake pipeline for initial handling of 16S amplicon datasets. It seeks to automate many of the steps that I describe here.

It takes as raw input sequences named in the format SAMPLE-NAME_R1_001.fastq.gz, which our MiSeq produces by default. The final output is an unfiltered OTU table suitable for further work with R, MOTHUR, or QIIME. This repository is unlikely to be maintained: I will probably end up testing QIIME2 and might migrate to newer OTU picking methods. However, feel free to use it as a (crude) reference snakemake pipeline. Also, don't hesitate to contact me at waoverholt@gmail.com if you have specific questions about this pipeline.
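As a rough illustration of the naming convention above, the sketch below groups read files by sample name. It assumes the reverse reads follow the same pattern with `_R2_001.fastq.gz`, which is not stated explicitly here. `pair_reads` is a hypothetical helper and not part of the pipeline.

```python
import re
from collections import defaultdict

# Assumed pattern: SAMPLE-NAME_R1_001.fastq.gz and SAMPLE-NAME_R2_001.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_reads(filenames):
    """Map each sample name to its (R1, R2) files; incomplete pairs are skipped."""
    found = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match:
            found[match.group("sample")][match.group("read")] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(found.items())
        if "1" in reads and "2" in reads
    }
```

Files that do not match the pattern, or samples missing one read direction, are left out of the result.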

Dependencies

This pipeline was written with snakemake and requires python3. It is set up to be configurable with miniconda. Follow instructions to install python and miniconda.

Once you have miniconda installed, you can use the environment.yaml file to install all dependencies in a virtual environment:

conda env create -n snakemake_16S python=3.5 --file environment.yaml

The environment name can be anything you'd like.

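To set up your own environment instead, the pipeline needs its packages and programs on your path, including the tools named in the Pipeline Summary (pear, vsearch, MOTHUR, and swarm) and snakemake itself.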

Config File

The pipeline requires a config.yaml file to run. Please modify the existing config file for your datasets.
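Snakemake loads the file named in a `configfile:` directive into a dictionary called `config`. A small check like the one below can catch a missing entry before any rule runs. The key names here are hypothetical placeholders; use the ones that appear in the repository's config.yaml.

```python
# Hypothetical key names; check the repository's config.yaml for the real ones.
REQUIRED_KEYS = ("input_dir", "output_dir", "chimera_db")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the config dict."""
    return [key for key in required if not config.get(key)]
```

Called near the top of a Snakefile, a non-empty result is a reasonable cue to stop with a clear error message.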

Pipeline Summary

  1. Merge paired reads with pear.
  2. Quality control with vsearch.
  3. Adapter trimming with MOTHUR.
  4. Dereplication with vsearch.
  5. De novo and reference chimera detection with vsearch.
  6. OTU picking with swarm.
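The steps above can be sketched as one command per stage. This is only an illustration of how the tools chain together, not the repository's actual rules. The file names, the per-sample layout, and the thresholds (`--fastq_maxee 1.0`, `-d 1`) are assumptions, and `build_commands` is a hypothetical helper.

```python
def build_commands(sample, r1, r2, oligos, chimera_db):
    """Return one argv list per pipeline stage, in the order listed above."""
    merged = f"{sample}.assembled.fastq"        # pear writes <prefix>.assembled.fastq
    filtered = f"{sample}.filtered.fasta"
    trimmed = f"{sample}.filtered.trim.fasta"   # trim.seqs inserts ".trim" before the extension
    derep = f"{sample}.derep.fasta"
    denovo = f"{sample}.denovo.fasta"
    clean = f"{sample}.nonchimeras.fasta"
    return [
        # 1. Merge paired reads.
        ["pear", "-f", r1, "-r", r2, "-o", sample],
        # 2. Quality filter on expected errors (threshold is an assumption).
        ["vsearch", "--fastq_filter", merged, "--fastq_maxee", "1.0",
         "--fastaout", filtered],
        # 3. Trim primers/adapters listed in a mothur oligos file.
        ["mothur", f"#trim.seqs(fasta={filtered}, oligos={oligos})"],
        # 4. Dereplicate and record abundances as ;size=N annotations.
        ["vsearch", "--derep_fulllength", trimmed, "--sizeout", "--output", derep],
        # 5a. De novo chimera detection.
        ["vsearch", "--uchime_denovo", derep, "--nonchimeras", denovo],
        # 5b. Reference-based chimera detection.
        ["vsearch", "--uchime_ref", denovo, "--db", chimera_db,
         "--nonchimeras", clean],
        # 6. OTU picking; -z tells swarm to read the ;size=N annotations.
        ["swarm", "-d", "1", "-z", "-o", f"{sample}.swarms",
         "-w", f"{sample}.otu_reps.fasta", clean],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, or translated into the `shell:` section of a snakemake rule.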

Chimera Reference Database

The chimera reference database I use is too large to be hosted on GitHub. I am currently using the 97% identity rep_set from the SILVA 128 database.

The resulting OTU table is not abundance filtered. The tab-delimited and biom-format tables produced contain identical data. Further analyses can proceed using QIIME or R. To work with QIIME, you will need to deactivate the conda environment.
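Because the table is not abundance filtered, a common first downstream step is to drop very rare OTUs. The sketch below does this for a tab-delimited table. It assumes a header row, an OTU identifier in the first column, and numeric counts in all remaining columns (no taxonomy column). The `min_count` threshold and the `filter_otus` helper are illustrative choices, not part of the pipeline.

```python
import csv
import io


def filter_otus(table_text, min_count=2):
    """Keep OTUs whose total count across all samples is at least min_count."""
    rows = [
        row
        for row in csv.reader(io.StringIO(table_text), delimiter="\t")
        # Skip blank lines and "# ..." comment lines; a "#OTU ID" header is kept.
        if row and not row[0].startswith("# ")
    ]
    header, body = rows[0], rows[1:]
    kept = [row for row in body if sum(float(v) for v in row[1:]) >= min_count]
    return header, kept
```

With the default threshold this removes singletons; raise `min_count` for stricter filtering.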
