Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to remove the default depth filter in the pileup. Also fix the version #292

Merged
merged 1 commit into from
Apr 6, 2024

Conversation

dpryan79
Copy link

@dpryan79 dpryan79 commented Apr 5, 2024

By default, htslib has a maximum heap size sufficient to store a few thousand reads, resulting in clair3 only using this many reads for calling and reporting. In most use-cases that's not an issue, but in high-coverage amplicon-seq this is undesired, since you're more likely to get wildly incorrect fold-changes and other coverage metrics. I ran into the same issue in a tool I developed using htslib and there the fix turned out to be a simple 1-line change. It'd be good if someone confirmed that this small change fixes that issue :)

I also noticed that clair3 is reporting the wrong version in the headers it produces. I've updated that to 1.0.7, which would presumably be the next release. Ideally this would get updated automatically and though given that this is a mix of shell scripts and python that may be more hassle than it's worth.

@aquaskyline
Copy link
Member

aquaskyline commented Apr 5, 2024

Thank you Devon for the PR. In your amplicon sequencing test case, how high the coverage can go? Using bam_mplp_set_maxcnt is the way to go. We can set it to a reasonably high value so most of the amplicon sequencing cases can be covered. But setting it to INT_MAX might be too high and makes the htslib vulnerable to malformed BAM input.

@dpryan79
Copy link
Author

dpryan79 commented Apr 5, 2024

Hi @aquaskyline ! I have up to ~1,000,000x data (yes, that is absurdly high, I didn't generate it). In practice samtools coverage has identical C code, so I wouldn't be too concerned with malformed BAM input causing issues (the samtools folks would have run into this already).

Cf. https://github.com/samtools/samtools/blob/d92b49e81b89144096dde8b64a96f262fe86edae/coverage.c#L561

@aquaskyline
Copy link
Member

Understood. I will test out bam_mplp_set_maxcnt(mplp, 2^20);, which should be enough for your case. If no problem we will have it in v1.0.7. And thanks for spotting the version mistake in the header!

@dpryan79
Copy link
Author

dpryan79 commented Apr 5, 2024

Sounds great, thanks for the extremely fast reply!

@zhengzhenxian zhengzhenxian merged commit 328532b into HKU-BAL:main Apr 6, 2024
@dpryan79 dpryan79 deleted the infinite_maxcnt branch April 6, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants