Fix SVTYPE when using IUPAC nucleotide codes #1636

Merged: 4 commits, Apr 8, 2024

nuno-agostinho

Fixes #1631

Motivation

When the SVTYPE tag is defined and the variant's REF/ALT alleles contain IUPAC nucleotide codes other than A, T, C and G (such as N and R in the user's example), VEP 111 tries to parse the ALT allele as an SV type and fails:

WARNING: line 1 skipped (3 90699772 MantaINS:2:34291:34291:1:0:0 ANNNNN...): ATCACAAATAGGTTCTGAGAATTATTCTGTCTAGTTTTTCTAGCGCCGTTTGAGGCCTATGGTAGAAAAGGGAATATCTTCATAGAAAAACGAGACAGAATAATTCTCAGAACCTATTTGTGATTTGTGCTT type is not supported

To avoid this issue, if SVTYPE is defined and ALT does not resemble one of the VCF-supported SV types (i.e., it does not start with INS, DEL, INV, DUP or CN), the SV type is taken from SVTYPE instead of ALT.
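
The fallback described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code changed in this PR: the function name `resolve_sv_type`, the `SV_PREFIXES` constant, and the stripping of angle brackets from symbolic alleles are all assumptions made for the example.

```python
# Hypothetical sketch of the SVTYPE fallback; names are illustrative only.
SV_PREFIXES = ("INS", "DEL", "INV", "DUP", "CN")


def resolve_sv_type(alt: str, info: dict) -> str:
    """Pick the SV type from ALT when it looks like one, else from SVTYPE."""
    # Assumption: symbolic alleles such as <DEL> are compared without brackets.
    symbolic = alt.strip("<>").upper()
    if symbolic.startswith(SV_PREFIXES):
        # ALT resembles a VCF-supported SV type, so it is used directly.
        return symbolic
    if "SVTYPE" in info:
        # ALT is a plain sequence; rely on the INFO tag instead.
        return info["SVTYPE"]
    # No SVTYPE to fall back on: return ALT unchanged and let the caller
    # decide whether to warn about an unsupported type.
    return alt


print(resolve_sv_type("ATCACAAATAGG", {"SVTYPE": "INS"}))
print(resolve_sv_type("<DEL>", {"SVTYPE": "DEL"}))
```

With a sequence ALT and `SVTYPE=INS`, the sketch returns the tag value rather than trying to interpret the sequence as a type.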

The warning message was also changed to be clearer:

WARNING: line 1 skipped (3 90699772 MantaINS:2:34291:34291:1:0:0 ANNNNN...): ANN is not a supported structural variant type

Testing

VEP should run with the following variants without returning any warnings:

3       90699772        MantaINS:2:34291:34291:1:0:0    ANNNNNNNNNNNNNNNNNNNN   ATCACAAATAGGTTCTGAGAATTATTCTGTCTAGTTTTTCTAGCGCCGTTTGAGGCCTATGGTAGAAAAGGGAATATCTTCATAGAAAAACGAGACAGAATAATTCTCAGAACCTATTTGTGATTTGTGCTT    957     PASS    END=90699792;SVTYPE=INS;SVLEN=131;CIGAR=1M131I20D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:62:999,65,0:0,1:0,25                                                                                                                                                                                                                   
4       31835775        MantaINS:70994:0:0:0:1:0        TNNNNNNNNNNNNNNNNNNNN   TTGCAGTGAAGAGAGATCACGACACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATTTAAATTTAAAAAACCCCACATGAACAAGCTAATAAAGCATACTGAGTTTGATGAAATACATTTCTTTTCT       999     PASS   END=31835795;SVTYPE=INS;SVLEN=176;CIGAR=1M176I20D;set=manta      GT:FT:GQ:PL:PR:SR       1/1:PASS:161:999,164,0:0,18:0,61                                                                                                                                                                
10      39254773        MantaINS:109681:2:2:0:0:0       GNNNNNNNNNNNNNNNNNNNN   GCAGTTTCTCTGAAATCTTCTTTCTAGTTTTTATCTGTAGATGTTTCCTATTTCACCATAGGCCTGAAGGCTCACCAAAGTATCCCTATGCAGATTCTACAAAAACAGTGTTACCAAACTGTTGAATGAAAAGAGAGGTTGAACTCTGTAAGATGAATGGAGACATCATGAAATGGTTTCTCAGATAGCTTCCTTCGAGTTTTTATCCTGAAATATTCCCTTTTGCACCATGACCTCAATGAGCTCGCAAATGTCCAC      999     PASS    END=39254793;SVTYPE=INS;SVLEN=257;CIGAR=1M257I20D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:150:999,153,0:0,21:0,38                                                                                
17      21860937        MantaINS:2:27285:27285:2:1:0    GNNNNNNNNNNNNNNNNNNNN   GGAATGGAATCGAATGGAATGTAATCAAATGGAATGGACCAGAATGGAATGGAATGGAAAAGAACGGACATGAATGTAATGGACTGCAATCTAACTGATTCGAAAGAATGGAATCGAAAG        999     PASS    END=21860957;SVTYPE=INS;SVLEN=119;CIGAR=1M119I20D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:119:999,122,0:0,6:0,40                                                                                                                                                                                                                         
17      26820065        MantaDEL:2:27595:27595:2:1:0    CNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      CGA     999    MaxMQ0Frac       END=26820266;SVTYPE=DEL;SVLEN=-201;CIGAR=1M2I201D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:85:999,88,0:1,3:0,30                                                                                                                                                   

The output should report all of these as intergenic variants (rather than no consequence at all).
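
To find other records in a VCF that exercise the same code path, the condition from the Motivation section (SVTYPE defined, REF/ALT containing IUPAC codes other than A, C, G and T) can be screened for with a small script. This is a minimal sketch under stated assumptions: the helper names `parse_info` and `is_affected` are hypothetical, fields are assumed to be whitespace-separated with no spaces inside INFO, and symbolic alleles such as `<DEL>` are skipped.

```python
# Hypothetical screening helper; not part of VEP.
# IUPAC nucleotide ambiguity codes, i.e. everything other than A, C, G, T.
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def parse_info(info_field: str) -> dict:
    """Turn 'END=1;SVTYPE=INS;FLAG' into a dict (flags map to '')."""
    info = {}
    for entry in info_field.split(";"):
        if entry:
            key, _, value = entry.partition("=")
            info[key] = value
    return info


def is_affected(vcf_line: str) -> bool:
    """True if the record has SVTYPE and ambiguity codes in REF or ALT."""
    fields = vcf_line.split()
    if len(fields) < 8:
        return False
    ref, alt, info = fields[3], fields[4], parse_info(fields[7])
    # Assumption: symbolic alleles (e.g. <DEL>) are not sequences; skip them.
    sequences = [s for s in (ref, alt) if not s.startswith("<")]
    has_ambiguity = any(
        base in AMBIGUITY_CODES for seq in sequences for base in seq.upper()
    )
    return "SVTYPE" in info and has_ambiguity


record = "3 90699772 MantaINS:2:34291:34291:1:0:0 ANNNN ATCAC 957 PASS END=90699792;SVTYPE=INS"
print(is_affected(record))
```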


@olaaustine olaaustine left a comment

Tested on GRCh38, works as expected.
Thank you @nuno-agostinho
Waiting for the tests to be fixed before merging.
LGTM.

@olaaustine olaaustine merged commit 5dafe4b into Ensembl:postreleasefix/112 Apr 8, 2024
1 check passed
@olaaustine

Merged into release and main

@nuno-agostinho nuno-agostinho deleted the fix/svtype branch May 22, 2024 08:39