WIP: Recursively handle pdist's diagonal chunks #90
base: master
This ends up being slower than the non-recursive implementation in PR #84, though this is likely a consequence of |
Make use of `cdist` for computing the bulk of the results for `pdist`. However, drop all duplicate chunks on the opposite side of the diagonal, while keeping all chunks on the right side of it. Break up any chunks that land on the diagonal into pieces that contain only non-duplicated pairs, and pass those through `cdist` as well. Flatten all of these results so that they can be concatenated into one big result.
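The non-recursive strategy can be sketched in plain NumPy. This is only an illustration under stated assumptions: chunks are simulated with row slices of size `chunk` rather than Dask chunks, `cdist_euclidean` is a hypothetical Euclidean stand-in for `cdist`, and the output follows SciPy's condensed pair order (all pairs `(i, j)` with `i < j`, row by row). The strip of chunks right of each diagonal chunk is covered by a single `cdist` call, while the diagonal chunk is split row by row so that only non-duplicated pairs are computed.

```python
import numpy as np


def cdist_euclidean(a, b):
    """Hypothetical stand-in for cdist: Euclidean distances between rows."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def chunked_pdist(x, chunk):
    """Condensed pdist of x (SciPy pair order), built only from cdist calls."""
    if chunk < 1:
        raise ValueError("chunk must be a positive integer")
    n = len(x)
    pieces = []
    for i0 in range(0, n, chunk):
        i1 = min(i0 + chunk, n)
        # All chunks right of this diagonal chunk, in a single cdist call.
        # Chunks left of the diagonal are never computed: they are duplicates.
        right = cdist_euclidean(x[i0:i1], x[i1:])
        for r, i in enumerate(range(i0, i1)):
            # Diagonal chunk: only the pairs (i, j) with i < j < i1.
            pieces.append(cdist_euclidean(x[i:i + 1], x[i + 1:i1]).ravel())
            # Then the rest of row i, taken from the strip to the right.
            pieces.append(right[r])
    return np.concatenate(pieces) if pieces else np.empty(0)


# Usage: 5 points in 2-D give 5 * 4 / 2 = 10 pairwise distances.
pts = np.arange(10.0).reshape(5, 2)
print(chunked_pdist(pts, chunk=2).shape)  # (10,)
```

In this sketch the diagonal piece and the strip piece are interleaved row by row; appending whole blocks one after another would put the pairs out of condensed order.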
Make use of `cdist` for computing the bulk of the results for `pdist`. However, drop all duplicate chunks on the opposite side of the diagonal, while keeping all chunks on the right side of it. As for any chunks that straddle the diagonal, recursively break them up into smaller pieces. Pieces on the right side of the diagonal are trivially handled with `cdist`. Pieces on or beyond the diagonal are trivially dropped. Pieces that still land on the diagonal are handled by calling into `pdist` again, repeating the process until each piece is resolved in one of these two ways. Since `pdist` returns its results in vector form, the recursive portion uses `squareform` to convert them back into square matrices that are easier to work with. The results are then unraveled again according to the constraints of `pdist`, so the brief restructuring with `squareform` is merely a convenience that lets recursive calls to `pdist` proceed without issues.
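A minimal sketch of the recursive variant, again in plain NumPy. It assumes a diagonal block is split in half at each level; the halving rule, the `leaf` cutoff, and the helpers `cdist_euclidean` and `to_square` are hypothetical stand-ins for `cdist` and `squareform`, not the PR's code. The top-right piece lies right of the diagonal and goes through `cdist`, the bottom-left piece is dropped, and the two pieces still on the diagonal recurse; their condensed results are expanded back into square form before the whole block is unraveled again.

```python
import numpy as np


def cdist_euclidean(a, b):
    """Hypothetical stand-in for cdist: Euclidean distances between rows."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def to_square(v, n):
    """Hypothetical stand-in for squareform: condensed -> symmetric n x n."""
    m = np.zeros((n, n))
    m[np.triu_indices(n, k=1)] = v
    return m + m.T


def recursive_pdist(x, leaf=2):
    """Condensed pdist of x, splitting the diagonal block recursively."""
    if leaf < 1:
        raise ValueError("leaf must be at least 1 for the recursion to stop")
    n = len(x)
    if n <= leaf:
        # Small enough: compute the block directly, keep only pairs i < j.
        return cdist_euclidean(x, x)[np.triu_indices(n, k=1)]
    h = n // 2
    full = np.zeros((n, n))
    # Top-left and bottom-right pieces still sit on the diagonal: recurse,
    # then expand the condensed results back into square form.
    full[:h, :h] = to_square(recursive_pdist(x[:h], leaf), h)
    full[h:, h:] = to_square(recursive_pdist(x[h:], leaf), n - h)
    # Top-right piece lies entirely right of the diagonal: plain cdist.
    full[:h, h:] = cdist_euclidean(x[:h], x[h:])
    # Bottom-left piece duplicates top-right, so it is never computed.
    # Unravel the upper triangle back into pdist's condensed order.
    return full[np.triu_indices(n, k=1)]
```

`np.triu_indices` enumerates the upper triangle row by row, which is why it can both build and unravel the condensed vector here.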
In recursive calls to `pdist`, try to rechunk the result to match the original chunking before it was split further. This is done in an effort to ensure that `squareform` handles it well.
This reverts commit 1049849.
Even after a significant boost to |
Have reworked this implementation to not use |
Instead of using `squareform` to restructure the result in `pdist` from each recursive call, adjust the recursive strategy to work directly with the condensed (vector-form) `pdist` result. This should cut a significant amount of overhead out of the recursive `pdist` diagonal optimization strategy.
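Working with the condensed result directly is possible because of how condensed order is laid out. When a diagonal block of `n` rows is split at row `h`, the pairs for each row `i < h` are row `i` of the top half's condensed result followed by row `i` of the `cdist` block, and all pairs among the bottom rows come last. The sketch below assumes NumPy only, a hypothetical `cdist_euclidean` in place of `cdist`, and a split in half at each level; it stitches the pieces together with slicing and `np.concatenate`, with no square matrix involved.

```python
import numpy as np


def cdist_euclidean(a, b):
    """Hypothetical stand-in for cdist: Euclidean distances between rows."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def recursive_pdist_condensed(x, leaf=2):
    """Condensed pdist of x; recursion stays in condensed (vector) form."""
    if leaf < 1:
        raise ValueError("leaf must be at least 1 for the recursion to stop")
    n = len(x)
    if n <= leaf:
        return cdist_euclidean(x, x)[np.triu_indices(n, k=1)]
    h = n // 2
    top = recursive_pdist_condensed(x[:h], leaf)     # pairs i < j < h
    bottom = recursive_pdist_condensed(x[h:], leaf)  # pairs h <= i < j
    cross = cdist_euclidean(x[:h], x[h:])            # pairs i < h <= j
    pieces = []
    start = 0
    for i in range(h):
        # Row i of the top half's condensed vector has h - 1 - i entries.
        stop = start + (h - 1 - i)
        pieces.append(top[start:stop])
        pieces.append(cross[i])
        start = stop
    # Pairs among the bottom rows follow every top row in condensed order.
    pieces.append(bottom)
    return np.concatenate(pieces)
```

The running `start`/`stop` offsets avoid a closed-form index formula, and each recursion level only ever allocates vectors, never an `n x n` matrix.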
Instead of explicitly setting the chunking for recursive calls to `pdist`, simply slice each piece and use `concatenate` to join them back together. This has basically the same effect as rechunking, but appears to be a little faster.
Combine the two calls to `getitem` on the blocks that `pdist` acts on recursively into a single call to `getitem`. This should make things a bit more efficient.
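The exact indexing is not shown here, so purely as an illustration: two chained step-1 slices such as `x[a:b][c:d]` can always be folded into one slice of `x`. In a lazy array library such as Dask, each `__getitem__` call typically adds its own layer to the task graph (graph optimization may fuse some of them), so folding them saves a step. `compose_slices` is a hypothetical helper written for this sketch; it relies on the built-in `slice.indices` to normalize `None` and negative bounds.

```python
def compose_slices(outer, inner, n):
    """Return one slice s such that seq[s] == seq[outer][inner] for len(seq) == n.

    Only step-1 slices are handled; None and negative bounds are normalized
    with the built-in slice.indices().
    """
    o_start, o_stop, o_step = outer.indices(n)
    if o_step != 1:
        raise ValueError("this sketch only handles step-1 slices")
    # Length of the intermediate result seq[outer].
    m = max(0, o_stop - o_start)
    i_start, i_stop, i_step = inner.indices(m)
    if i_step != 1:
        raise ValueError("this sketch only handles step-1 slices")
    # Shift the inner bounds by the outer start; clamp so start <= stop.
    return slice(o_start + i_start, o_start + max(i_start, i_stop))


# Usage: x[2:8][1:-1] becomes a single slice, x[3:7].
print(compose_slices(slice(2, 8), slice(1, -1), 10))  # slice(3, 7, None)
```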
The empty array was really just filler at this point. Moreover, it no longer makes much sense now that we are using flattened results from `pdist`, since the empty array is not being flattened at all.
Follow-up to PR #84.
Fixes #92