HGVS notation for dup in 109 becomes ins in 110 #1633

Open
barbarian1803 opened this issue Mar 12, 2024 · 7 comments
barbarian1803 commented Mar 12, 2024

Describe the issue

For the variant below:

#CHROM POS ID REF ALT QUAL FILTER INFO
chr21 5233678 . A AATTT . . .

In VEP 109.3, this variant has the HGVS notation: ENST00000623753.1:n.132-758_132-755dup
In VEP 110.1/111.0, this variant has the notation: ENST00000623753.1:n.132-755_132-754insAAAT
I have noticed many similar variants that used to be reported as dup and are now reported as ins in VEP 110 and 111.
The correct notation would be dup.

Another example:

#CHROM POS ID REF ALT QUAL FILTER INFO
chr21 13933439 . C CT . . .
It used to be: ENST00000451663.5:n.2429+398dup
It is now: ENST00000451663.5:n.2429+398_2429+399insA
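
The dup-versus-ins distinction in the two examples above can be illustrated with a minimal sketch. It is not VEP's implementation. It assumes a forward-strand transcript, a hypothetical reference string (the real chr21 sequence is not reproduced here), and a hypothetical helper name `classify_insertion`. The idea is the HGVS convention that an insertion is described as a duplication when, after shifting it as far 3' as possible, the inserted bases are a copy of the bases immediately before the insertion point.

```python
def classify_insertion(reference: str, pos: int, ref: str, alt: str) -> str:
    """Classify a VCF-style anchored insertion as 'dup' or 'ins'.

    reference: hypothetical forward-strand sequence (1-based coordinates).
    pos, ref, alt: VCF POS/REF/ALT, where ALT starts with REF (anchor base).
    Strand handling is deliberately left out of this sketch.
    """
    if not (alt.startswith(ref) and len(alt) > len(ref)):
        raise ValueError("not a simple anchored insertion")

    inserted = alt[len(ref):]
    # 0-based index of the first reference base after the insertion point.
    insert_at = pos - 1 + len(ref)

    # Shift the insertion to the right while an equivalent description exists:
    # if the next reference base equals the first inserted base, rotating the
    # inserted sequence by one and moving one base right gives the same allele.
    while insert_at < len(reference) and reference[insert_at] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        insert_at += 1

    # A duplication copies the bases immediately before the insertion point.
    preceding = reference[max(0, insert_at - len(inserted)):insert_at]
    return "dup" if preceding == inserted else "ins"


# Hypothetical reference chosen so the VCF-style records mimic the reported ones.
toy_reference = "CCAATTTGG"
print(classify_insertion(toy_reference, 3, "A", "AATTT"))  # inserted ATTT copies ATTT
print(classify_insertion(toy_reference, 3, "A", "AGC"))    # inserted GC copies nothing
```

The right shift matters because a left-aligned VCF record places the inserted bases before the copy they duplicate, so a check at the original position alone would miss the duplication.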

Additional information

Run via Docker for version 110.1 and via the VEP web interface for the latest version, v111.

System

  • VEP version: 110.1/111
  • VEP Cache version: 110/111

likhitha-surapaneni self-assigned this Mar 12, 2024

Contributor

likhitha-surapaneni commented Mar 12, 2024

Hi @barbarian1803 ,
Thank you for reporting this to us. A fix has been applied in the upcoming release to address the issue. With it, the HGVSc will be reported as dup instead of ins.

Kind regards,
Likhitha

GSYongWu commented Apr 2, 2024

I have encountered the same problem and hope it can be updated as soon as possible.

aksenia commented Jun 18, 2024

Hi, is this still an issue in v112? Thank you!

@GSYongWu

This issue is resolved, but I have discovered a new problem. The CDS coordinates for some genes are incorrect. For example, the mutation SRGAP2:NM_015326.5, c.85A>T(p.T29S) has been annotated as c.994A>T(p.T332S). I suspect it is a database issue.

davmlaw commented Aug 1, 2024

@GSYongWu - I think it's a RefSeq problem, not a VEP one. Which build are you using? If it is GRCh37, then looking at the RefSeq GFF entry for NM_015326.5, something is a bit strange: the cDNA match starts at 910, not 1.

Maybe you could try GRCh38 rather than GRCh37. If the problem goes away, that shows it's a RefSeq problem, and this issue can be closed as fixed.
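
The numbers in this thread are at least consistent with that reading, which a short sketch can show. The GFF3 line below is hypothetical: only the transcript ID and the `Target` start of 910 come from the comment above, while the sequence name, genomic coordinates, `ID` value, and the helper names `target_start` and `codon_number` are made up for illustration. The `Target=<id> <start> <end> [<strand>]` attribute layout follows the GFF3 convention for alignment features.

```python
def target_start(gff_line: str) -> int:
    """Return the transcript-side start coordinate from a GFF3 Target attribute."""
    attributes = gff_line.rstrip("\n").split("\t")[8]
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key == "Target":
            # Layout: Target=<id> <start> <end> [<strand>]
            return int(value.split()[1])
    raise ValueError("no Target attribute found")


def codon_number(cds_position: int) -> int:
    """Map a 1-based coding position to its 1-based codon number."""
    return (cds_position - 1) // 3 + 1


# Hypothetical cDNA_match line; only "NM_015326.5" and the start 910 are from the thread.
line = "\t".join([
    "chr_placeholder", "RefSeq", "cDNA_match", "1000", "1099", ".", "+", ".",
    "ID=example;Target=NM_015326.5 910 1009 +",
])

offset = target_start(line) - 1      # bases skipped before the alignment starts
expected_c, reported_c = 85, 994     # c.85A>T expected, c.994A>T reported

print(offset, reported_c - expected_c)
print(codon_number(expected_c), codon_number(reported_c))
```

The coordinate shift (994 - 85 = 909) equals the alignment start minus one, and the codon shift (332 - 29 = 303) is that offset divided by three. This is only an arithmetic consistency check under the stated assumptions, not a confirmation of the cause.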

GSYongWu commented Aug 2, 2024

@davmlaw Is it possible that the GFF file VEP is using this time is incorrect, causing the coordinates for certain gene annotations to be inaccurate? This part was always correct in older versions of VEP.

davmlaw commented Aug 2, 2024

Yes. Of course it can be wrong!

The GFF is produced by taking sequences reported by labs around the world over many, many years, then aligning them with automated tools (algorithms built on our understanding of biology) against a pretty arbitrary reference sequence. Something can go wrong at every single step of that process, some decisions are arbitrary with no way to know which choice is right, and it is all done at massive scale.

A quick glance at the differences between RefSeq and Ensembl transcripts (which are trying to do pretty much the same thing) shows you the scale of how imperfect it is.

It's super useful and valuable, though! Not to knock either team.

The transcript sequences differ per version, and the alignments for a given sequence can differ per build.

When working this out, it helps to explicitly list the genome build and, in your examples, the transcript versions for your expected results (e.g. NM_015326.4 has length 6781, NM_015326.5 has length 6884).

It is also better to raise a new issue for a new problem than to add it to an existing, unrelated issue raised by someone else that is now fixed, as this makes it hard for the hardworking VEP people to manage their project and keep track of issues.
