Improve the speed of some post-cluster calculations #932

Merged: 30 commits, Jan 18, 2022

@drroe (Contributor) commented Jan 18, 2022

Version 6.2.0.

Certain post-clustering actions (such as the cluster summary, best representative frames calculation, and cluster silhouette calculation) became very expensive for high frame counts when sieving was used, because they relied on a function that scales as O(N). In many instances this has been replaced with a boolean array lookup, greatly improving speed at the cost of some memory.
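The trade-off above can be illustrated with a minimal Python sketch. This is not cpptraj's C++ code: the names `is_sieved_linear` and `build_sieve_mask` and the frame numbers are hypothetical, and the sketch assumes the slow function was a linear search over the sieved-out frames, which the description does not state explicitly.

```python
from typing import List


def is_sieved_linear(sieved_frames: List[int], frame: int) -> bool:
    """O(N) per query: scan the whole list of sieved-out frames."""
    for f in sieved_frames:
        if f == frame:
            return True
    return False


def build_sieve_mask(sieved_frames: List[int], n_frames: int) -> List[bool]:
    """Built once: mask[frame] is True if the frame was sieved out.

    Costs one boolean per frame in memory, but each later query is O(1).
    """
    mask = [False] * n_frames
    for f in sieved_frames:
        mask[f] = True
    return mask


# Hypothetical data: 10 frames, sieve keeps every third frame.
n_frames = 10
sieved = [1, 2, 4, 5, 7, 8]

mask = build_sieve_mask(sieved, n_frames)

# Both approaches identify the same kept frames; only the cost per query differs.
kept_linear = [f for f in range(n_frames) if not is_sieved_linear(sieved, f)]
kept_mask = [f for f in range(n_frames) if not mask[f]]
```

When every frame is queried, the linear scan costs on the order of N times the number of sieved frames, while the mask costs N lookups after a single pass to build it.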

Also fixes best representative frames calculation for the summary by parts (summarysplit).

Also adds a new option to control the index written to the cluster frame silhouette file: [silidx {idx|frm}], where idx indicates the sorted index should be used (this is what had been used) and frm indicates the actual frame number should be used.
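The difference between the two `silidx` modes can be sketched as follows. This is a hedged Python illustration only: the function `silhouette_rows`, the sample values, the 0-based index, and the assumption that "sorted index" means position after sorting a cluster's frames by silhouette value are all hypothetical and not taken from cpptraj.

```python
from typing import List, Tuple


def silhouette_rows(
    frames: List[int], sil: List[float], silidx: str = "idx"
) -> List[Tuple[int, float]]:
    """Return (label, silhouette) rows for one cluster.

    silidx="idx": label is the position in the sorted order (assumed 0-based).
    silidx="frm": label is the actual frame number.
    """
    # Assumption: frames are ordered by ascending silhouette value.
    ordered = sorted(zip(frames, sil), key=lambda pair: pair[1])
    if silidx == "idx":
        return [(i, s) for i, (_, s) in enumerate(ordered)]
    if silidx == "frm":
        return [(f, s) for f, s in ordered]
    raise ValueError("silidx must be 'idx' or 'frm'")


# Hypothetical cluster containing frames 12, 40 and 7.
frames = [12, 40, 7]
sil = [0.5, -0.1, 0.8]

rows_idx = silhouette_rows(frames, sil, "idx")
rows_frm = silhouette_rows(frames, sil, "frm")
```

With `frm`, each row can be traced back to a specific trajectory frame; with `idx`, the rows carry only their rank within the cluster.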

way it only needs to be generated once. It may be better to just have it
generated all the time.
rep found when doing best rep search for cluster by parts and all
frames just happen to be sieved out
best reps for split, fix frame index in silhouette calc.
@drroe drroe added the bugfix label Jan 18, 2022
@drroe drroe self-assigned this Jan 18, 2022
@drroe drroe merged commit 09d075a into Amber-MD:master Jan 18, 2022
@drroe drroe deleted the fix.cluster.summary branch January 18, 2022 23:17