Duphold with Nanopore data? #45
Comments
Hi, yes, duphold should work very well on long read data.
Thanks for your reply @brentp. The coverage should be around 120X, as 3 flowcells were used in the sequencing. Could it be that something is off about the format of the VCF that I used as input for duphold? The calls were run through Sniffles and then VEP; I'll give a few sample lines:
Hi, perhaps you can plot a few of these with samplot?
I managed to look at the data in JBrowse and checked a few of the SVs with high numbers. This is the DUP:
DUP | Chr6 | 165030128 | 165048325 | 18197 | IMPRECISE | 149 | 0/1 | transcript_amplification | ENSSSCG00000003889
And here are some deletions:
DEL | Chr13 | 73763246 | 73770714 | -7468 | PRECISE | 34 | 0/1 | transcript_ablation | ENSSSCG00000045067
DEL | Chr2 | 122165466 | 122173123 | -7657 | PRECISE | 31 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | COMMD10 | ENSSSCG00000014223 | 161.818 | 143.548 | 161.818
The only thing that seems a bit strange to me is that those deletions have higher coverage. I don't see what it is about the duplication that gives it such high values, though.
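As a rough sanity check on why these numbers look wrong, the sketch below converts a fold-change value into the read depth it would imply. It assumes that DHFC is a depth fold-change relative to the rest of the chromosome and that the chromosome-wide depth is close to the 120X mentioned in this thread; both the `implied_depth` helper and the `BASELINE_DEPTH` constant are hypothetical names used only for illustration.

```python
def implied_depth(fold_change: float, baseline_depth: float) -> float:
    """Depth a region would need to have to produce the given fold-change."""
    return fold_change * baseline_depth


# Assumption: chromosome-wide depth is roughly the 120X quoted in the thread.
BASELINE_DEPTH = 120.0

# DHFC reported for the Chr2 deletion at 122165466.
dhfc = 161.818

depth = implied_depth(dhfc, BASELINE_DEPTH)
print(f"DHFC {dhfc} at {BASELINE_DEPTH:.0f}X baseline implies ~{depth:.0f}X depth")
```

Under those assumptions, a heterozygous deletion would need well over ten thousand-fold coverage to yield this value, which is consistent with the suspicion that the baseline or the input is off rather than the region itself.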
Yes, something is wrong. Would be nice to see these in IGV or samplot. If you can share one chromosome (bam,vcf) then I can have a look.
I was having trouble attaching the files here, so I've sent them to you by email. It may be good to know that this alignment was done by mapping long reads against an assembly that was made with the same long reads, in order to find heterozygous SVs. Thanks for all the help!
thanks, can you also send the reference fasta via that service?
Yes, you should have received it
got it. will go through this in next few days.
Hi,
I used duphold on some VCFs containing SVs called with Sniffles, where the alignment was done using minimap2. All of the data is Nanopore long-read data. I was wondering whether duphold is supposed to work properly on long-read data, as quite a few of the results seem off, with very high values. Here's a sample of the results (parsed to keep the info relevant to me):
SVtype | #CHROM | POS | End | SVLength | Precision | RE | Genotype | Consequence | GeneSymbol | EnsemblID | DHFC | DHFFC | DHBFC
DEL | Chr3 | 21648500 | 21946887 | -298387 | IMPRECISE | 10 | 1/1 | coding_sequence_variant&5_prime_UTR_variant&intron_variant&feature_truncation | ARHGAP17 | ENSSSCG00000031488 | 181.481 | 101.031 | 178.182
DEL | Chr5 | 90902052 | 90907562 | -5510 | PRECISE | 39 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | CDK17 | ENSSSCG00000028182 | 0.963636 | 0.386861 | 0.946429
DEL | Chr2 | 122165466 | 122173123 | -7657 | PRECISE | 31 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | COMMD10 | ENSSSCG00000014223 | 161.818 | 143.548 | 161.818
INS | Chr6 | 13229037 | 13229037 | 46 | PRECISE | 21 | 0/1 | coding_sequence_variant&feature_elongation | GLG1 | ENSSSCG00000030420 | 0.814815 | 1 | 0.814815
DEL | Chr10 | 66054310 | 66056761 | -2451 | PRECISE | 62 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | ITIH5 | ENSSSCG00000011129 | 0.909091 | 0.438596 | 0.925926
DEL | Chr2 | 28721000 | 28772243 | -51243 | PRECISE | 69 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | KIAA1549L | ENSSSCG00000013311 | 143.636 | 0.153696 | 143.636
INV | Chr2 | 28705133 | 28707047 | 1914 | PRECISE | 130 | 0/1 | coding_sequence_variant&intron_variant | KIAA1549L | ENSSSCG00000013311 | 112.727 | 0.136564 | 114.815
INV | Chr2 | 28705133 | 28707047 | 1914 | PRECISE | 143 | 0/1 | coding_sequence_variant&intron_variant | KIAA1549L | ENSSSCG00000013311 | 112.727 | 0.136564 | 114.815
DEL | ChrX | 90375498 | 90376075 | -577 | PRECISE | 232 | 1/1 | coding_sequence_variant&intron_variant&feature_truncation | MID2 | ENSSSCG00000012564 | 117.857 | 0.103774 | 117.857
INS | Chr13 | 135062890 | 135063265 | 1039 | IMPRECISE | 20 | 0/1 | coding_sequence_variant&intron_variant&feature_elongation | MUC13 | ENSSSCG00000011862 | 0.75 | 0.976744 | 0.792453
Thanks in advance,
Kind regards,
Anne
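For anyone triaging a similar table, here is a minimal sketch that parses rows in the pipe-delimited layout used above and flags those whose fold-change values look implausible. The four rows are copied from this post. The cutoff `MAX_PLAUSIBLE_FC` is an arbitrary, hypothetical threshold chosen only to separate values near 1 from values above 100, and the function names are illustrative; none of this is part of duphold itself.

```python
HEADER = ("SVtype | #CHROM | POS | End | SVLength | Precision | RE | Genotype | "
          "Consequence | GeneSymbol | EnsemblID | DHFC | DHFFC | DHBFC")

# Rows copied from the table in this post.
ROWS = """
DEL | Chr5 | 90902052 | 90907562 | -5510 | PRECISE | 39 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | CDK17 | ENSSSCG00000028182 | 0.963636 | 0.386861 | 0.946429
DEL | Chr2 | 122165466 | 122173123 | -7657 | PRECISE | 31 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | COMMD10 | ENSSSCG00000014223 | 161.818 | 143.548 | 161.818
INS | Chr6 | 13229037 | 13229037 | 46 | PRECISE | 21 | 0/1 | coding_sequence_variant&feature_elongation | GLG1 | ENSSSCG00000030420 | 0.814815 | 1 | 0.814815
DEL | Chr2 | 28721000 | 28772243 | -51243 | PRECISE | 69 | 0/1 | coding_sequence_variant&intron_variant&feature_truncation | KIAA1549L | ENSSSCG00000013311 | 143.636 | 0.153696 | 143.636
"""

FC_FIELDS = ("DHFC", "DHFFC", "DHBFC")
MAX_PLAUSIBLE_FC = 10.0  # hypothetical cutoff, not a duphold recommendation


def parse_rows(header: str, rows: str) -> list[dict]:
    """Split pipe-delimited text into dicts keyed by the header names."""
    names = [name.strip() for name in header.split("|")]
    records = []
    for line in rows.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        record = dict(zip(names, cells))
        record["POS"] = int(record["POS"])
        for field in FC_FIELDS:
            record[field] = float(record[field])
        records.append(record)
    return records


def is_suspicious(record: dict) -> bool:
    """True if any fold-change value exceeds the (assumed) plausible maximum."""
    return any(record[field] > MAX_PLAUSIBLE_FC for field in FC_FIELDS)


records = parse_rows(HEADER, ROWS)
flagged = [(r["#CHROM"], r["POS"]) for r in records if is_suspicious(r)]
print(flagged)
```

Flagged rows are good candidates to inspect first in IGV, JBrowse, or samplot, since values near 1 (or below 1 for deletions) are what one would normally expect from a depth fold-change.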