Optimize reduceToTopK in ResultUtil by removing pre-filling and reducing peek calls #2146

Merged
merged 2 commits into from
Sep 27, 2024

Conversation

junqiu-lei
Member

@junqiu-lei junqiu-lei commented Sep 24, 2024

Description

This PR optimizes the reduceToTopK method by eliminating unnecessary pre-filling of the priority queue and reducing redundant peek() calls, which improves performance. It also adds null safety checks.
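For readers without the diff at hand, here is a minimal sketch of the general technique the description names: a bounded min-heap that is filled lazily rather than pre-filled, with at most one peek() per candidate. It is not the merged code. The class name, the `ScoredDoc` type, and the `List<Map<Integer, Float>>` input shape (doc ID to score per leaf) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: names and types here are hypothetical, not the plugin's API.
class TopKReduceSketch {

    // Hypothetical (docId, score) pair used for the output.
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Returns the k highest-scoring docs across all leaves, best first.
    static List<ScoredDoc> reduceToTopK(List<Map<Integer, Float>> perLeafResults, int k) {
        List<ScoredDoc> out = new ArrayList<>();
        if (perLeafResults == null || k <= 0) {
            return out; // null safety: nothing to reduce
        }
        // Min-heap by score: the head is the weakest of the current top k.
        // The heap starts empty; no placeholder entries are pre-filled.
        PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>(k, (a, b) -> Float.compare(a.score, b.score));
        for (Map<Integer, Float> leaf : perLeafResults) {
            if (leaf == null) {
                continue; // null safety: skip missing leaves
            }
            for (Map.Entry<Integer, Float> e : leaf.entrySet()) {
                float score = e.getValue();
                if (heap.size() < k) {
                    heap.add(new ScoredDoc(e.getKey(), score));
                } else if (score > heap.peek().score) { // single peek per candidate
                    heap.poll();
                    heap.add(new ScoredDoc(e.getKey(), score));
                }
            }
        }
        // The heap drains weakest-first, so reverse to get best-first order.
        while (!heap.isEmpty()) {
            out.add(heap.poll());
        }
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Float> leafA = new HashMap<>();
        leafA.put(1, 0.9f);
        leafA.put(2, 0.1f);
        Map<Integer, Float> leafB = new HashMap<>();
        leafB.put(3, 0.5f);
        leafB.put(4, 0.7f);
        List<Map<Integer, Float>> leaves = new ArrayList<>();
        leaves.add(leafA);
        leaves.add(leafB);
        for (ScoredDoc d : reduceToTopK(leaves, 2)) {
            System.out.println(d.docId + " " + d.score);
        }
    }
}
```

Because the heap only ever holds real candidates, there is no sentinel handling, and a candidate that cannot enter the top k costs one comparison against the heap head.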

Related Issues

Closes #2145

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ing peek calls

Signed-off-by: Junqiu Lei <junqiu@amazon.com>
@junqiu-lei junqiu-lei changed the title Optimize searchTopK method in ExactSearcher Optimize reduceToTopK in ResultUtil by removing pre-filling and reducing peek calls Sep 25, 2024
@junqiu-lei
Member Author

After syncing offline with @navneet1v and @jmazanec15, we are now focusing on optimizing the reduceToTopK function in ResultUtil. I have updated the PR accordingly.

Contributor

@shatejas shatejas left a comment

@junqiu-lei Did we compare removing the map with this approach to see which is faster, or is that not possible? I feel like the code would be simpler that way, if it yields better results.

We might have to first check whether the KNNResults array is in order, but that way we can do something similar to what Lucene does and just pick the top elements and stop at k in reduceToTopK.

@junqiu-lei
Member Author

@junqiu-lei Did we compare removing the map with this approach to see which is faster, or is that not possible? I feel like the code would be simpler that way, if it yields better results.

We might have to first check whether the KNNResults array is in order, but that way we can do something similar to what Lucene does and just pick the top elements and stop at k in reduceToTopK.

I think this is another place we can optimize. No, we haven't compared the two approaches so far. For this PR, we could use a microbenchmark to measure the improvement, if possible.

Member

@jmazanec15 jmazanec15 left a comment

So, I think the optimal way to do this would be to require that each perLeafResults comes in order from best scores to worst scores. Then a simple algorithm could just scan the top of each set of results and take only the top k. That being said, it would require changing how results are returned in the different methods. Do you think we should do this @navneet1v ?
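A rough sketch of the idea proposed here, assuming each leaf already returns its results as a list sorted from best to worst score: keep a cursor at the head of every leaf, repeatedly take the best head, and stop once k results are collected. The class name, the `ScoredDoc` type, and the list-of-lists input are hypothetical. Using a heap of cursors is one possible way to "scan the top of each set", not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: assumes per-leaf results arrive sorted by descending score.
class SortedLeafMergeSketch {

    // Hypothetical (docId, score) pair.
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Merges leaves that are each sorted best-first and returns the top k, best first.
    static List<ScoredDoc> mergeTopK(List<List<ScoredDoc>> sortedLeaves, int k) {
        List<ScoredDoc> out = new ArrayList<>();
        if (sortedLeaves == null || k <= 0) {
            return out;
        }
        // Each cursor is {leafIndex, position}. The heap is ordered so that the
        // cursor pointing at the highest score comes out first.
        PriorityQueue<int[]> heads = new PriorityQueue<>((x, y) -> Float.compare(
            sortedLeaves.get(y[0]).get(y[1]).score,
            sortedLeaves.get(x[0]).get(x[1]).score));
        for (int i = 0; i < sortedLeaves.size(); i++) {
            List<ScoredDoc> leaf = sortedLeaves.get(i);
            if (leaf != null && !leaf.isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        // Stop as soon as k results are collected; the rest is never visited.
        while (out.size() < k && !heads.isEmpty()) {
            int[] cursor = heads.poll();
            List<ScoredDoc> leaf = sortedLeaves.get(cursor[0]);
            out.add(leaf.get(cursor[1]));
            if (cursor[1] + 1 < leaf.size()) {
                heads.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<ScoredDoc>> leaves = Arrays.asList(
            Arrays.asList(new ScoredDoc(1, 0.9f), new ScoredDoc(2, 0.4f)),
            Arrays.asList(new ScoredDoc(3, 0.8f), new ScoredDoc(4, 0.7f)));
        for (ScoredDoc d : mergeTopK(leaves, 3)) {
            System.out.println(d.docId + " " + d.score);
        }
    }
}
```

With this shape the heap never holds more than one entry per leaf, and at most k entries are read in total. The trade-off raised in the comment still applies: every search path would have to return its results already sorted.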

jmazanec15
jmazanec15 previously approved these changes Sep 27, 2024
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
@junqiu-lei
Member Author

So, I think the optimal way to do this would be to require that each perLeafResults comes in order from best scores to worst scores. Then a simple algorithm could just scan the top of each set of results and take only the top k. That being said, it would require changing how results are returned in the different methods. Do you think we should do this @navneet1v ?

I can raise another PR for this optimization.

@navneet1v
Collaborator

So, I think the optimal way to do this would be to require that each perLeafResults comes in order from best scores to worst scores. Then a simple algorithm could just scan the top of each set of results and take only the top k. That being said, it would require changing how results are returned in the different methods. Do you think we should do this @navneet1v ?

Yes, I think we should change the return types.

@junqiu-lei junqiu-lei merged commit e0c3afe into opensearch-project:main Sep 27, 2024
30 checks passed
@junqiu-lei junqiu-lei deleted the optimize-exact-search branch September 27, 2024 21:47
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 27, 2024
…ing peek calls (#2146)

Signed-off-by: Junqiu Lei <junqiu@amazon.com>
(cherry picked from commit e0c3afe)
junqiu-lei added a commit that referenced this pull request Sep 30, 2024
…ing peek calls (#2146) (#2164)

Signed-off-by: Junqiu Lei <junqiu@amazon.com>
(cherry picked from commit e0c3afe)

Co-authored-by: Junqiu Lei <junqiu@amazon.com>
Development

Successfully merging this pull request may close these issues.

[Enhancement] Optimize searchTopK Method in ExactSearcher for Improved Performance
4 participants