Contigs without assigned phaseblocks in their name #38

Open
sivico26 opened this issue Nov 14, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@sivico26

Hi @fenderglass,

Thanks for developing Hapdup. I am trying to phase some loci of an allopolyploid plant into what should be the 2 subgenomes of its parents. After checking the output, I have some questions. I will use one of the assemblies as an example.

For one of my loci, if I look at the hapdup_phased_* assemblies, I see the following contig names and lengths for hap1:

contig_10_phaseblock_0  33510
contig_12       5153
contig_14_phaseblock_0  11146
contig_16_phaseblock_0  9374
contig_20       34460
contig_23_phaseblock_0  35265
contig_7_phaseblock_0   8359
contig_7_phaseblock_1   66335

While for hap2 it is:

contig_10_phaseblock_0  31793
contig_12       5153
contig_14_phaseblock_0  9818
contig_16       13178
contig_20_phaseblock_0  38831
contig_23_phaseblock_0  36404
contig_7_phaseblock_0   7893
contig_7_phaseblock_1   66913

As you can see, most of the contigs have a phased homolog in both haplotypes (contigs 7, 10, 14, and 23). But there are two other categories that confuse me:

  • Contig 12 does not have a phaseblock assigned in either haplotype. I take this to mean that no reads mapped to that contig after filtering, or that reads did map but carried too little variant information (e.g. a very homozygous region). In short, there was insufficient information to phase it.
  • But I struggle to extend that reasoning to contigs 16 and 20. Contig 16 is assigned a phaseblock only in haplotype 1, and contig 20 only in haplotype 2. How should I interpret this?
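To spot these cases across many loci, a minimal Python sketch along the following lines could classify each contig by whether it carries a `_phaseblock_N` suffix in each haplotype. It assumes only the `contig_<id>[_phaseblock_<n>]` naming visible in the listings above; `phased_status` and `classify` are hypothetical helper names, not part of Hapdup.

```python
import re

# Name/length listings copied from the hap1 and hap2 outputs above.
HAP1 = """\
contig_10_phaseblock_0  33510
contig_12       5153
contig_14_phaseblock_0  11146
contig_16_phaseblock_0  9374
contig_20       34460
contig_23_phaseblock_0  35265
contig_7_phaseblock_0   8359
contig_7_phaseblock_1   66335
"""

HAP2 = """\
contig_10_phaseblock_0  31793
contig_12       5153
contig_14_phaseblock_0  9818
contig_16       13178
contig_20_phaseblock_0  38831
contig_23_phaseblock_0  36404
contig_7_phaseblock_0   7893
contig_7_phaseblock_1   66913
"""

# Assumed naming scheme: contig_<id>, optionally followed by _phaseblock_<n>.
NAME_RE = re.compile(r"^(contig_\d+)(?:_phaseblock_(\d+))?$")


def phased_status(listing):
    """Map each base contig to True if any of its records has a phaseblock suffix."""
    status = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, _length = line.split()
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected contig name: {name}")
        base, block = match.groups()
        status[base] = status.get(base, False) or block is not None
    return status


def classify(hap1, hap2):
    """Label each contig by how its phaseblock naming compares between haplotypes."""
    s1, s2 = phased_status(hap1), phased_status(hap2)
    labels = {}
    for contig in sorted(set(s1) | set(s2), key=lambda c: int(c.split("_")[1])):
        a, b = s1.get(contig), s2.get(contig)
        if a is None or b is None:
            labels[contig] = "missing in one haplotype"
        elif a and b:
            labels[contig] = "phased in both"
        elif not a and not b:
            labels[contig] = "unphased in both"
        else:
            labels[contig] = "inconsistent"
    return labels


for contig, label in classify(HAP1, HAP2).items():
    print(contig, label, sep="\t")
```

On the listings above, this flags contigs 16 and 20 as inconsistent and contig 12 as unphased in both haplotypes. The same functions should work on the first two columns of a `.fai` index for each haplotype FASTA, provided the naming scheme holds.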
@mikolmogorov
Collaborator

Hi,

That's unexpected. I think it probably represents an error in hapdup rather than something meaningful. It is likely some kind of edge case, where a phasing block boundary is very close to a contig end, but the coordinates are shifted slightly between the haplotypes. As a result, hapdup split contig_16 in HP1, but not in HP2.

In dual assembly mode this should not happen, but for the phasing mode I'll try to fix it in a future release.

@mikolmogorov added the bug label on Nov 21, 2023
@sivico26
Author

All right, if you need some data to debug this, let me know.

I am wondering: if the assembly has some redundancy, do you think that could cause or facilitate this problem? I am working with Flye assemblies, but I have not checked whether they contain redundancy.

@mikolmogorov
Collaborator

How big is your dataset? If you could send it somehow, that would be helpful! Feel free to email mikolmogorov@gmail.com

I don't think this is specific to the genome, just a borderline case.
