Single Sample SVs

John Garza edited this page Jun 21, 2019 · 7 revisions

Introduction

Tutorials

Command Line

Sample commands:

run_cnvkit -> cnvkit_main (docker: etal/cnvkit) /usr/bin/python /usr/local/bin/cnvkit.py batch -r <cnvkit_reference_cnn> --method wgs

run_cnvkit -> cns_to_vcf (docker: etal/cnvkit) /usr/bin/python /usr/local/bin/cnvkit.py call <cnvkit_main output cns> -o adjusted.tumor.cns && /usr/bin/python /usr/local/bin/cnvkit.py export vcf adjusted.tumor.cns --cnr <cnvkit_main output cnr> -o cnvkit.vcf

run_manta (docker: mgibio/manta_somatic-cwl) /usr/bin/python /usr/bin/manta/bin/configManta.py --referenceFasta --tumorBam --runDir && /usr/bin/python runWorkflow.py -m local -j 12

run_smoove (docker: brentp/smoove) /usr/local/bin/smoove call --processes 4 -F --genotype --name SV --fasta --exclude <smoove_exclude_regions>
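
The cns_to_vcf step above chains two CNVkit invocations: `call` writes adjusted.tumor.cns, and `export vcf` then reads that file. As a rough sketch of how the pair could be assembled outside of CWL, the Python below builds both argument lists from the two placeholder inputs. The function names and the example paths `tumor.cns` and `tumor.cnr` are hypothetical; only the executables, subcommands, and flags are taken from the sample command.

```python
import shlex

# Interpreter and script paths as they appear in the sample command above.
CNVKIT = ["/usr/bin/python", "/usr/local/bin/cnvkit.py"]


def cns_to_vcf_commands(cns_path, cnr_path):
    """Return the two argument lists of the cns_to_vcf step, in run order.

    cns_path and cnr_path stand in for the <cnvkit_main output cns> and
    <cnvkit_main output cnr> placeholders.
    """
    call_cmd = CNVKIT + ["call", cns_path, "-o", "adjusted.tumor.cns"]
    export_cmd = CNVKIT + [
        "export", "vcf", "adjusted.tumor.cns",
        "--cnr", cnr_path,
        "-o", "cnvkit.vcf",
    ]
    return [call_cmd, export_cmd]


def as_shell(commands):
    """Join argument lists into one shell line, chained with &&."""
    return " && ".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(as_shell(cns_to_vcf_commands("tumor.cns", "tumor.cnr")))
```

Keeping each command as a list means it can be passed to `subprocess.run` directly, without shell quoting concerns; the `&&` form is only needed when reproducing the one-line command shown above.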

CWL Workflow

INSERT LINK TO DOCKER IMAGES/REPOS AND CWL

Steps

INSERT PROCESS DIAGRAM

Inputs

| Name | Description | Example | Required |
| --- | --- | --- | --- |
| bam | Aligned sequencing results to be analyzed for SVs | | |
| cnvkit_diagram | Create an ideogram of copy ratios on chromosomes as a pdf | false | |
| cnvkit_drop_low_coverage | Helps avoid false positive deletions in low quality tumor samples | false | |
| cnvkit_male_reference | Use/assume a male reference | false | |
| cnvkit_method | Sequencing protocol used | wgs | |
| cnvkit_reference_cnn | A copy number reference file against which potential copy number variants will be evaluated | /gscmnt/gc2560/core/cnvkit_pon/v1/reference.cnn | |
| cnvkit_scatter_plot | Create a whole genome copy ratio profile as a pdf scatter plot | false | |
| cnvkit_vcf_name | Custom name to use for the cnvkit output vcf | cnvkit_output | |
| manta_call_regions | bgzip-compressed, tabix-indexed BED file specifying regions to which variant analysis will be restricted | | |
| manta_non_wgs | When true, activates settings appropriate for whole exome sequencing | false | |
| manta_output_contigs | If true, outputs assembled contig sequences in final VCF files, in the INFO field CONTIG | true | |
| maximum_sv_pop_freq | Population frequency above which variants will be filtered out | | |
| merge_estimate_sv_distance | When evaluating variants to be merged, estimate distance based on the size of the SV | true | |
| merge_max_distance | Maximum distance of variants to consider for merging | 1000 | |
| merge_min_sv_size | Minimum size of SVs to merge | 1 | |
| merge_min_svs | Minimum number of SV calls needed to be merged | 1 | |
| merge_same_strand | Require merged SVs to be on the same strand | true | |
| merge_same_type | Require merged SVs to be of the same type | true | |
| merge_sv_pop_freq_db | BED file containing allele frequencies for a population | /gscmnt/gc2560/core/cwl/inputs/hall_lab_B38_SV_public_callset/sv.bedpe.gz | |
| reference | Reference sequence | example_data/exome_workflow/chr17_test.fa | |
| smoove_exclude_regions | Regions to be ignored when calling SVs through smoove (a wrapper for lumpy) | | |
| sv_filter_interval_lists | One or more interval lists defining regions to keep in the output vcf, labeled with the source of the intervals | /gscmnt/gc2560/core/model_data/interval-list/db8c25932fd94d2a8a073a2e20449878/a35b64d628b94df194040032d53b5616.interval_list, /gscmnt/gc2560/core/model_data/interval-list/1eea27120d294db49826cef2e79b618c/3a61ffd42f074fe1b8a20742f6dfb32e.interval_list, /gscmnt/gc2560/core/model_data/interval-list/86494a288c3c4d7a89842ed2f1d6e36a/f54639200d364231bd5e1c39266ccfac.interval_list | |
| variants_to_table_fields | One or more of any standard VCF column (CHROM, ID, QUAL) or any binding in the INFO field (e.g., AC=10) to add to the tsv report | | |
| variants_to_table_genotype_fields | One or more of any binding in the FORMAT field (e.g., GQ, PL) to add to the tsv report | | |
| vep_cache_dir | Location of a local ensembl cache to be used by vep | example_data/exome_workflow/ | |
| vep_ensembl_assembly | Which (species) assembly vep should use | GRCh38 | |
| vep_ensembl_species | Which species vep should use | homo_sapiens | |
| vep_ensembl_version | Which ensembl release vep should use | 95 | |
| vep_to_table_fields | VEP CSQ annotation fields to add to the tsv report | | |
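
CWL runners read workflow inputs from a JSON or YAML input object, so the inputs listed above end up as keys in such a file. The sketch below builds a partial input object in Python and serializes it to JSON, using example values from the table. It is illustrative only: the workflow's CWL definition is not shown on this page, so which inputs are `File` objects and which are plain strings or numbers is an assumption here, and `tumor.bam` is a hypothetical path.

```python
import json


def file_input(path):
    """CWL input objects describe a file as a mapping with class 'File'."""
    return {"class": "File", "path": path}


def build_job(bam_path):
    """Return a partial input object using example values from the table.

    Assumption: bam and cnvkit_reference_cnn are File inputs; the
    remaining keys are passed as plain booleans, numbers, or strings.
    """
    return {
        "bam": file_input(bam_path),
        "cnvkit_method": "wgs",
        "cnvkit_reference_cnn": file_input(
            "/gscmnt/gc2560/core/cnvkit_pon/v1/reference.cnn"
        ),
        "cnvkit_diagram": False,
        "merge_max_distance": 1000,
        "merge_min_svs": 1,
        "merge_same_strand": True,
        "merge_same_type": True,
        "vep_ensembl_assembly": "GRCh38",
        "vep_ensembl_species": "homo_sapiens",
    }


if __name__ == "__main__":
    # Hypothetical BAM path; write the result to a file to pass to a runner.
    print(json.dumps(build_job("tumor.bam"), indent=2))
```

The remaining inputs from the table would be added in the same way once their types are confirmed against the workflow definition.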

Outputs

| Name | Source | Description |
| --- | --- | --- |
| annotated_tsvs | GATK VariantsToTable | tsv files containing specified SV fields and annotations |
| cn_diagram | CNVkit | Ideogram of copy ratios on chromosomes |
| cn_scatter_plot | CNVkit | Whole genome copy ratio profile |
| cnvkit_vcf | CNVkit | Final cnvkit output, converted to vcf format |
| filtered_vcfs | Various filters | SV VCF, filtered by variant population frequency and by the interval lists given in sv_filter_interval_lists |
| manta_all_candidates | Manta | Unscored SV and indel candidates |
| manta_diploid_variants | Manta | SVs and indels scored and genotyped under a diploid model |
| manta_small_candidates | Manta | Simple insertion and deletion variants less than the minimum scored variant size (50 by default) |
| manta_somatic_variants | Manta | SVs and indels scored under a somatic variant model |
| manta_tumor_only_variants | Manta | Subset of the candidateSV.vcf.gz file after removing redundant candidates and small indels less than the minimum scored variant size (50 by default) |
| merged_annotated_svs | Survivor, VEP | SV calls from Manta, CNVkit, and Smoove (Lumpy), merged by Survivor and annotated by VEP |
| smoove_output_variants | Smoove (Lumpy) | SV calls from Smoove, a wrapper for Lumpy |
| sv_pop_filtered_vcf | Various filters | SV VCF, filtered by variant population frequency |
| tumor_antitarget_coverage | CNVkit | Coverage in the antitarget regions from bam read depths |
| tumor_bin_level_ratios | CNVkit | Table of copy number ratios |
| tumor_segmented_ratios | CNVkit | Discrete copy number segments derived from the bin-level ratios table (tumor_bin_level_ratios) |
| tumor_target_coverage | CNVkit | Coverage in the target regions from bam read depths |
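
The population frequency filter behind sv_pop_filtered_vcf is described in the inputs as removing variants whose frequency is above maximum_sv_pop_freq. The Python sketch below illustrates that threshold rule on plain dictionaries. It is not the workflow's actual filter: the `pop_freq` key is a hypothetical field name, the records are made up for illustration, and keeping variants that have no frequency annotation is an assumption.

```python
def passes_pop_freq(variant, maximum_sv_pop_freq):
    """Return True if the variant should be kept.

    A variant is dropped only when its population frequency is strictly
    above the threshold. 'pop_freq' is a hypothetical key; variants with
    no frequency annotation are kept (an assumption of this sketch).
    """
    freq = variant.get("pop_freq")
    if freq is None:
        return True
    return freq <= maximum_sv_pop_freq


def filter_by_pop_freq(variants, maximum_sv_pop_freq):
    """Keep only the variants that pass the frequency threshold."""
    return [v for v in variants if passes_pop_freq(v, maximum_sv_pop_freq)]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    calls = [
        {"id": "sv1", "pop_freq": 0.20},
        {"id": "sv2", "pop_freq": 0.01},
        {"id": "sv3"},
    ]
    kept = filter_by_pop_freq(calls, maximum_sv_pop_freq=0.05)
    print([v["id"] for v in kept])
```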