Adjust ToParentBlockJoin[Byte|Float]KnnVectorQuery to return highest score child doc ID by parent id #12510
The change looks good, and I agree that the naming can be confusing.
Here are some possible alternatives:
- DiversifyingKnn(Collector|VectorQuery)
- DiversifyingChildrenKnn...
- CollapsingKnn...
- CollapsingChildren...
Naming is hard.
/** kNN byte vector query that joins matching children vector documents with their parent doc id. */
/**
 * kNN byte vector query that joins matching children vector documents with their parent doc id. The
 * top documents returned are the child document ids and the calculated scores.
Maybe add an example on how to mix with root document queries? Something like:
ToParentBlockJoinByteKnnVectorQuery childQuery = ...
Query query = new ToParentBlockJoinQuery(childQuery, parentsFilter, ..)
...
?
@@ -38,6 +38,7 @@
/**
 * kNN float vector query that joins matching children vector documents with their parent doc id.
 * The top documents returned are the child document ids and the calculated scores.
 */
Same here?
…rn highest score child doc ID by parent id (#12510)
The current query returns parent IDs based on the nearest child's score. However, it's difficult to invert that relationship (that is, to determine which child was actually the nearest during search). So, I changed the new `ToParentBlockJoin[Byte|Float]KnnVectorQuery` to `DiversifyingChildren[Byte|Float]KnnVectorQuery`, and it now returns the nearest child ID instead of just that child's parent ID. The results are still diversified by parent ID. Now it's easy to determine the nearest child vector, as that is what the query returns. Determining its parent is as simple as using the previously provided parent bit set. Related to: #12434
While integrating, I discovered a frustrating bug :(
The current query returns parent IDs based on the nearest child's score. However, it's difficult to invert that relationship (that is, to determine which child was actually the nearest during search).
So, I changed the new `ToParentBlockJoin[Byte|Float]KnnVectorQuery` to return the nearest child ID instead of just that child's parent ID. The results are still diversified by parent ID.
Now it's easy to determine the nearest child vector, as that is what the query returns. Determining its parent is as simple as using the previously provided parent bit set.
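To illustrate the child-to-parent lookup and the per-parent diversification described above, here is a minimal sketch. It is not the code in this PR: it uses `java.util.BitSet` as a stand-in for the Lucene parent bit set, the class and method names (`ChildToParentSketch`, `parentOf`, `bestChildPerParent`, `ChildHit`) are hypothetical, and it relies on the block-join convention that a parent document is indexed after its children.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; java.util.BitSet stands in for the Lucene parent bit set.
final class ChildToParentSketch {

  /** A child hit: its doc ID and its similarity score. */
  static final class ChildHit {
    final int doc;
    final float score;

    ChildHit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * In a block, children come first and the parent comes last, so the parent
   * of a child is the first set bit at or after the child's doc ID.
   */
  static int parentOf(BitSet parentBits, int childDocId) {
    return parentBits.nextSetBit(childDocId);
  }

  /** Keeps only the highest-scoring child per parent (parent doc ID -> best child). */
  static Map<Integer, ChildHit> bestChildPerParent(
      BitSet parentBits, int[] childDocIds, float[] scores) {
    Map<Integer, ChildHit> best = new LinkedHashMap<>();
    for (int i = 0; i < childDocIds.length; i++) {
      int parent = parentOf(parentBits, childDocIds[i]);
      ChildHit current = best.get(parent);
      if (current == null || scores[i] > current.score) {
        best.put(parent, new ChildHit(childDocIds[i], scores[i]));
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Two blocks: children 0..2 with parent 3, children 4..5 with parent 6.
    BitSet parentBits = new BitSet();
    parentBits.set(3);
    parentBits.set(6);

    int[] hits = {1, 2, 4, 5};
    float[] scores = {0.7f, 0.9f, 0.4f, 0.3f};

    bestChildPerParent(parentBits, hits, scores)
        .forEach(
            (parent, child) ->
                System.out.println(
                    "parent=" + parent + " child=" + child.doc + " score=" + child.score));
  }
}
```

Because the query result now carries the child ID, the parent can always be recovered this way, whereas going from a parent ID back to the nearest child would require rescoring every child in the block.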
I realize that this might make the name weird. I am happy to consider a new name. All the "join" names are confusing to me already.
I am happy to change the name.
Since this iterates on an unreleased query and is related to #12434, I am not adding a changelog entry.