
Workflow Inputs

John Vivian edited this page Nov 10, 2018 · 9 revisions

The Toil RNA-seq workflow requires a set of input files to run. These files are hosted on Synapse and by UC Santa Cruz. They were built using Gencode v23 and HG38 (see Methods).

Building Your Own Indices

Building your own indices requires a FASTA reference genome and an annotation GTF file. Supply both files to toil-rnaseq-inputs.

An example command to generate indices for a mouse genome:

toil-rnaseq-inputs --ref /mnt/mm10.fa --gtf /mnt/mouse-annotation.gtf --star --rsem --kallisto --hera
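
When the same build has to be scripted for several genomes, the command above can be assembled programmatically. The following is a minimal Python sketch and not part of toil-rnaseq: build_inputs_command is a hypothetical helper, and it assumes the four tool flags shown above can each be passed independently, which this page does not confirm.

```python
import shlex

# Tool flags taken from the example command on this page.
TOOLS = ("star", "rsem", "kallisto", "hera")


def build_inputs_command(ref, gtf, tools=TOOLS):
    """Return the toil-rnaseq-inputs argument list for a reference and GTF.

    Hypothetical helper: assumes each tool flag may be given on its own.
    """
    unknown = [t for t in tools if t not in TOOLS]
    if unknown:
        raise ValueError("unknown tool(s): " + ", ".join(unknown))
    cmd = ["toil-rnaseq-inputs", "--ref", ref, "--gtf", gtf]
    cmd += ["--" + t for t in tools]
    return cmd


# Reproduce the mouse example from this page as a shell-quoted string.
cmd = build_inputs_command("/mnt/mm10.fa", "/mnt/mouse-annotation.gtf")
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to launch the build, which avoids shell quoting problems with unusual paths.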

Direct Links

Synapse

  • Register for a Synapse account
  • Download the files either from the website GUI or with the Python API
  • Install the client: pip install synapseclient
  • Start python and run:
    • import synapseclient
    • syn = synapseclient.Synapse()
    • syn.login('foo@bar.com', 'password')
    • Get the RSEM reference (1 GB)
      • syn.get('syn5889216', downloadLocation='.')
    • Get the Kallisto index (2 GB)
      • syn.get('syn5886142', downloadLocation='.')
    • Get the STAR index (25 GB)
      • syn.get('syn5886182', downloadLocation='.')
    • Get the Hera index (2 GB)
      • syn.get('syn11678373', downloadLocation='.')
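
The individual syn.get calls above can be wrapped in a single loop. This is a minimal sketch under stated assumptions: download_inputs and FULL_INPUTS are hypothetical names introduced here, and syn is expected to be a client that has already logged in as shown in the steps above. Only the Synapse IDs and the syn.get(..., downloadLocation=...) call come from this page.

```python
# Synapse IDs listed on this page, keyed by a descriptive label.
FULL_INPUTS = {
    "RSEM reference": "syn5889216",
    "Kallisto index": "syn5886142",
    "STAR index": "syn5886182",
    "Hera index": "syn11678373",
}


def download_inputs(syn, inputs=FULL_INPUTS, location="."):
    """Call syn.get for every ID and return {label: returned entity}.

    Hypothetical helper: `syn` must already be logged in.
    """
    downloaded = {}
    for label, syn_id in inputs.items():
        print("Downloading %s (%s)" % (label, syn_id))
        downloaded[label] = syn.get(syn_id, downloadLocation=location)
    return downloaded
```

Calling download_inputs(syn) with the logged-in client fetches all four files into the current directory. Passing a smaller dictionary as inputs limits the download, which is useful given that the STAR index alone is 25 GB.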

Test Inputs

These inputs are used for a test sample that is run during continuous integration. The test sample was generated from reads mapped to chromosome 6. When running this test sample, enable the ci option in the config so that the appropriate resources are requested.

Direct Links

There are no specific test inputs for Hera and Kallisto; use the standard inputs.

Synapse

  • python (using the logged-in syn client from the Synapse steps above)
    • Get the sample (500 KB)
      • syn.get('syn9924961', downloadLocation='.')
      • syn.get('syn9924962', downloadLocation='.')
    • Get the small RSEM reference (8 MB)
      • syn.get('syn9772189', downloadLocation='.')
    • Get the small STAR reference (2 GB)
      • syn.get('syn9772190', downloadLocation='.')
    • Get the Kallisto reference (1 GB, same as regular input)
      • syn.get('syn5886142', downloadLocation='.')
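
For a test run it can help to record where each downloaded file ended up so the locations can be copied into the config. The sketch below is one possible way to do that: fetch_test_inputs and TEST_INPUTS are hypothetical names, syn is assumed to be an already logged-in client, and the code assumes the object returned by syn.get exposes the local file location as a path attribute.

```python
import os

# Test input IDs listed on this page, in the order given.
TEST_INPUTS = [
    ("sample", "syn9924961"),
    ("sample", "syn9924962"),
    ("small RSEM reference", "syn9772189"),
    ("small STAR reference", "syn9772190"),
    ("Kallisto reference", "syn5886142"),
]


def fetch_test_inputs(syn, location="."):
    """Download every test input and return a list of (label, local path).

    Hypothetical helper: assumes each returned entity has a `path` attribute.
    """
    paths = []
    for label, syn_id in TEST_INPUTS:
        entity = syn.get(syn_id, downloadLocation=location)
        paths.append((label, os.path.abspath(entity.path)))
    return paths
```

Iterating over the result, for example with `for label, path in fetch_test_inputs(syn): print(label, path)`, lists the absolute path of each downloaded file.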

When running the pipeline on these test inputs, set the ci option in the config to true so that the pipeline requests an appropriate amount of memory when running STAR.
