Phil Ewels edited this page Dec 7, 2018 · 3 revisions

Icing

The name

The MinION (Oxford Nanopore) systems are relatively simple to use, and icing is an HLA typing workflow, based on long-range amplicon PCR, that uses the long reads from this sequencing platform. Hence the name: MinION rhymes with mignon, a kind of cake in Hungary (minyon) that usually has some icing on top.

The workflow

Icing is written in Nextflow, a DSL for building a framework around the other tools used in the pipeline. Additional scripts (e.g. for demultiplexing) are written in Python. To run icing you need not only Nextflow but also the tools it calls, such as BWA and canu. Once they are installed and configured correctly, the workflow reads raw 2D FASTQ files and prints out HLA genotypes.

The sequencing chemistry and basecalling changed in recent months (as of July 2016), and with them the process of producing 2D FASTQs, though not significantly. This HLA workflow is independent of that upstream process, but the new chemistry is expected to give significantly better read quality. The workflow also accepts 1D reads, although using them for genotyping is not recommended.
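Since icing depends on several external programs, a quick pre-flight check can save a failed run. The sketch below only checks that the executables can be found on the PATH using Python's standard `shutil.which`; the command names `nextflow`, `bwa` and `canu` and the helper name `missing_tools` are assumptions, and it does not verify versions or configuration.

```python
import shutil

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED_TOOLS = ("nextflow", "bwa", "canu")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```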

Workflow steps:

  1. demultiplex pooled input 2D FASTQ
  2. align the 2D reads with BWA to the genomic sequences from IMGT/HLA only
  3. select candidate alleles (the best-matching ones) from the IMGT/HLA database
  4. extract the reads that map to these candidates
  5. generate consensus allele sequences with canu
  6. do the actual HLA typing based on the final consensus sequences and the whole IMGT/HLA database
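Step 3 is not described in detail on this page. As a rough illustration only, one simple way to pick candidates is to count, for each allele, how many reads have their best alignment to it and keep the most supported alleles. The function name `select_candidates`, the `(read_id, allele)` input format (assumed to hold one best hit per read) and the default `top_n=2` (two alleles per locus in a diploid sample) are all assumptions; this is not necessarily how icing selects its candidates. The allele names in the example are made up.

```python
from collections import Counter


def select_candidates(read_hits, top_n=2):
    """Rank alleles by the number of reads whose best hit is that allele.

    `read_hits` is an iterable of (read_id, allele_name) pairs, one per read
    (hypothetical format, e.g. parsed from BWA output).
    Returns up to `top_n` allele names, most supported first.
    """
    counts = Counter(allele for _read_id, allele in read_hits)
    return [allele for allele, _count in counts.most_common(top_n)]


if __name__ == "__main__":
    # Toy, made-up hits for illustration only.
    toy_hits = [
        ("r1", "A*01:01"), ("r2", "A*02:01"), ("r3", "A*01:01"),
        ("r4", "A*03:01"), ("r5", "A*02:01"), ("r6", "A*01:01"),
    ]
    print(select_candidates(toy_hits))  # ['A*01:01', 'A*02:01']
```

Counting best hits is the simplest possible scoring; a real selection step would likely also weigh alignment quality and coverage.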

Demultiplex pooled input 2D FASTQ

The script responsible for demultiplexing is demultiplexON.py, which expects a single FASTQ file (with 2D reads) and a YAML file containing the "handle" and the "index" sequences. The amplicons are expected to have the following structure:

 handle_prefix-index-handle_postfix-amplicon

So, if the YAML file defines the handles and indexes as

handles:
    prefix: "ACAGTC"
    postfix: "TGATGC"
indexes: [
    "GTCGAT",
    "TGAGTG",
    "GTACTG"
]

the script will look for the following three patterns at the beginning and the end of the reads (also considering reverse complements):

ACAGTC-GTCGAT-TGATGC-...
ACAGTC-TGAGTG-TGATGC-...
ACAGTC-GTACTG-TGATGC-...
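The matching logic described above can be sketched in a few lines. This is a simplified illustration, not the code of demultiplexON.py: it uses exact substring matching only (real nanopore reads contain errors, so a practical tool would likely need approximate matching), and the helper names, the 100-base search `window` and the rule of leaving ambiguous reads unassigned (`None`) are assumptions.

```python
# Complement table for DNA bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def build_barcodes(prefix, postfix, indexes):
    """Map each index to its full handle_prefix-index-handle_postfix barcode."""
    return {index: prefix + index + postfix for index in indexes}


def assign_read(seq, barcodes, window=100):
    """Return the index whose barcode occurs near either end of `seq`.

    Both the barcode and its reverse complement are searched (exact match)
    in the first and last `window` bases, so reads from either strand are
    handled. Returns None if no barcode, or more than one, matches.
    """
    head, tail = seq[:window], seq[-window:]
    hits = set()
    for index, barcode in barcodes.items():
        rc = reverse_complement(barcode)
        if any(pattern in end for pattern in (barcode, rc) for end in (head, tail)):
            hits.add(index)
    return hits.pop() if len(hits) == 1 else None


def demultiplex(reads, barcodes, window=100):
    """Group (read_id, sequence) pairs by assigned index (None = unassigned)."""
    groups = {}
    for read_id, seq in reads:
        groups.setdefault(assign_read(seq, barcodes, window), []).append(read_id)
    return groups


if __name__ == "__main__":
    # Handles and indexes from the YAML example above.
    barcodes = build_barcodes("ACAGTC", "TGATGC", ["GTCGAT", "TGAGTG", "GTACTG"])
    reads = [
        ("read1", barcodes["GTCGAT"] + "A" * 200),                      # forward strand
        ("read2", "A" * 200 + reverse_complement(barcodes["TGAGTG"])),  # reverse strand
        ("read3", "A" * 200),                                           # no barcode
    ]
    print(demultiplex(reads, barcodes))
```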

2D read alignment with BWA to genomic sequences only from IMGT/HLA

select some candidates from the IMGT/HLA database (best matching alleles)

extract reads mappable to these candidates

generate consensus allele sequences with canu

do the actual HLA typing based on the final consensus sequences and the whole IMGT/HLA database