WIP: Optimize pdist's handling of diagonal chunks #84

Open
jakirkham wants to merge 1 commit into master


jakirkham
Owner

@jakirkham jakirkham commented Oct 9, 2017

Follow-up to PR ( #81 ).
Fixes #92

Use cdist to compute the bulk of the results for pdist. Drop all duplicate chunks on one side of the diagonal and keep all chunks on the other side. Chunks that land on the diagonal are broken up into pieces that include only non-duplicated pairs, and those pieces are passed through cdist as well. All of these results are then flattened so that they can be concatenated into one big result.
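The chunk handling described above can be sketched roughly as follows. This is a plain-Python illustration only, not the code in this PR: `cdist` and `pdist_reference` here are small stand-ins written for the example (they are not SciPy's or this project's functions), `chunked_pdist` is a hypothetical name, and the per-row splitting of the diagonal chunk is just one possible way to avoid duplicated pairs.

```python
import math


def cdist(xa, xb):
    """Stand-in for a cdist-style function: distances between every row of xa and xb."""
    return [[math.dist(a, b) for b in xb] for a in xa]


def pdist_reference(x):
    """Brute-force condensed distances in row-major upper-triangle order."""
    n = len(x)
    return [math.dist(x[i], x[j]) for i in range(n) for j in range(i + 1, n)]


def chunked_pdist(x, chunk):
    """Hypothetical sketch: condensed pairwise distances built from chunked cdist calls."""
    n = len(x)
    out = []
    for r0 in range(0, n, chunk):
        r1 = min(r0 + chunk, n)
        rows = x[r0:r1]
        # Chunks on the kept side of the diagonal: one full cdist call each.
        # Chunks on the other side are duplicates and are never computed.
        right = [cdist(rows, x[c0:min(c0 + chunk, n)]) for c0 in range(r1, n, chunk)]
        for k in range(len(rows)):
            # Diagonal chunk: only pair row k with the rows after it,
            # so no pair inside the chunk is computed twice.
            if k + 1 < len(rows):
                out.extend(cdist(rows[k:k + 1], rows[k + 1:])[0])
            # Flatten the matching row of each kept off-diagonal chunk.
            for block in right:
                out.extend(block[k])
    return out
```

Calling `chunked_pdist(points, 2)` on a short list of coordinate tuples should give the same values, in the same order, as `pdist_reference(points)`. Splitting the diagonal chunk into many small `cdist` calls is also where extra overhead can creep in.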

@jakirkham force-pushed the opt_gen_pdist_diag_2 branch 8 times, most recently from 00ea49e to e687cf4 on October 9, 2017 05:08
@jakirkham
Owner Author

After a fair bit of tweaking, this still appears to be a bit slower than doing no optimization for the diagonal at all. It might be possible to improve on this by recursively calling pdist on smaller chunks, which would allow as many computations as possible to be bundled for each chunk. An attempt at this is in PR ( #90 ).
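To illustrate the recursive idea, here is a minimal plain-Python sketch. It is an assumption about what such a recursion could look like, not the code in PR ( #90 ): `recursive_pdist`, its `leaf` parameter, and the stand-in `cdist` are hypothetical names made up for this example. The input is split in two, each half is handled by a recursive call, and the cross block between the halves is computed with a single `cdist` call.

```python
import math


def cdist(xa, xb):
    """Stand-in for a cdist-style function: distances between every row of xa and xb."""
    return [[math.dist(a, b) for b in xb] for a in xa]


def recursive_pdist(x, leaf=2):
    """Hypothetical sketch: condensed pairwise distances via recursive halving."""
    n = len(x)
    if n <= leaf:
        # Small enough: compute the upper-triangle pairs directly.
        return [math.dist(x[i], x[j]) for i in range(n) for j in range(i + 1, n)]
    mid = n // 2
    a, b = x[:mid], x[mid:]
    top = recursive_pdist(a, leaf)  # pairs within the first half
    cross = cdist(a, b)             # all pairs between the halves, one call
    out = []
    pos = 0
    for k in range(mid):
        # Row k of the first half contributes (mid - 1 - k) within-half pairs,
        # followed by its full row of the cross block.
        width = mid - 1 - k
        out.extend(top[pos:pos + width])
        pos += width
        out.extend(cross[k])
    out.extend(recursive_pdist(b, leaf))  # pairs within the second half
    return out
```

In this sketch every pair is computed exactly once, and each level of the recursion bundles its whole cross block into one `cdist` call. Whether that is actually faster in practice would need to be measured.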

@jakirkham changed the title from "Optimize pdist's handling of diagonal chunks" to "WIP: Optimize pdist's handling of diagonal chunks" on Oct 9, 2017
Make use of `cdist` for computing the bulk of the results for `pdist`.
Drop all duplicate chunks on one side of the diagonal and keep all
chunks on the other side. Break any chunks that land on the diagonal
into pieces that include only non-duplicated pairs and pass those
through `cdist` as well. Flatten all of these results so that they can
be concatenated into one big result.
@jakirkham
Owner Author

While this remains the best option for optimizing pdist diagonal chunks, it still performs worse than doing no optimization at all. The approach here is also a bit complicated to follow, so it is hard to justify accepting it. I will leave this open for now in case some improvements can be made in time, but I am not planning to look into it in the near term.
