Adjust ToParentBlockJoin[Byte|Float]KnnVectorQuery to return highest score child doc ID by parent id #12510

benwtrent · 2023-08-15T21:00:50Z

While integrating, I discovered a frustrating bug :(

The current query returns parent IDs based on the score of the nearest child. However, it's difficult to invert that relationship (that is, to determine exactly which child was the nearest during search).

So, I changed the new ToParentBlockJoin[Byte|Float]KnnVectorQuery to return the nearest child ID instead of just that child's parent ID. The results are still diversified by parent ID: each parent contributes at most its highest-scoring child.
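A minimal sketch of what "diversified by parent ID" means here, written against plain JDK collections rather than Lucene's collectors. The class `DiversifyByParentSketch`, the `ChildHit` type, and the method `bestChildPerParent` are hypothetical names for illustration; this is not the query's actual implementation, only the selection rule it describes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: keeps the highest-scoring child per parent. Not Lucene code. */
class DiversifyByParentSketch {

  /** Hypothetical scored hit: a child doc ID, the ID of its parent, and a similarity score. */
  static final class ChildHit {
    final int childDoc;
    final int parentDoc;
    final float score;

    ChildHit(int childDoc, int parentDoc, float score) {
      this.childDoc = childDoc;
      this.parentDoc = parentDoc;
      this.score = score;
    }
  }

  /** Returns one hit per parent (its best child), ordered by descending score. */
  static List<ChildHit> bestChildPerParent(List<ChildHit> hits) {
    Map<Integer, ChildHit> bestByParent = new HashMap<>();
    for (ChildHit hit : hits) {
      // Keep the existing entry unless the new candidate scores strictly higher.
      bestByParent.merge(
          hit.parentDoc,
          hit,
          (current, candidate) -> candidate.score > current.score ? candidate : current);
    }
    List<ChildHit> result = new ArrayList<>(bestByParent.values());
    result.sort(Comparator.comparingDouble((ChildHit h) -> h.score).reversed());
    return result;
  }
}
```

The returned hits carry child IDs, which matches the behavior this change introduces: the caller sees which child matched, while no parent appears more than once.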

Now it's easy to determine the nearest child vector, since that is what the query returns. Determining its parent is as simple as consulting the previously provided parent bit set.
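A minimal sketch of that parent lookup, assuming the usual block-join layout in which each parent document is indexed immediately after its children. The JDK's `java.util.BitSet` stands in for the parent bit set here, and `ParentLookupSketch` and `parentOf` are hypothetical names, not part of this change.

```java
import java.util.BitSet;

/** Illustration only: maps a child doc ID to its parent via a parent bit set. */
class ParentLookupSketch {

  /**
   * Returns the parent of {@code childDoc}: the first set bit at or after it.
   * Returns -1 if no parent bit follows. If {@code childDoc} is itself a parent,
   * it is returned unchanged.
   */
  static int parentOf(BitSet parentBits, int childDoc) {
    return parentBits.nextSetBit(childDoc);
  }
}
```

For example, with parents at doc IDs 3 and 6, children 0 through 2 resolve to parent 3 and children 4 and 5 resolve to parent 6.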

I realize that this might make the name weird, and I am happy to change it. All the "join" names are confusing to me already.

Since this iterates on an unreleased query and is related to #12434, I am not adding a changelog entry.


jimczi

The change looks good. I agree that the naming can be confusing.
Here are some possible alternatives:

DiversifyingKnn(Collector|VectorQuery)
DiversifyingChildrenKnn...
CollapsingKnn...
CollapsingChildren...
Naming is hard.

jimczi · 2023-08-16T05:14:44Z

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinByteKnnVectorQuery.java

-/** kNN byte vector query that joins matching children vector documents with their parent doc id. */
+/**
+ * kNN byte vector query that joins matching children vector documents with their parent doc id. The
+ * top documents returned are the child document ids and the calculated scores.


Maybe add an example on how to mix with root document queries? Something like:

ToParentBlockJoinByteKnnVectorQuery childQuery = ...
Query query = new ToParentBlockJoinQuery(childQuery, parentsFilter, ..)
...

?

jimczi · 2023-08-16T05:14:59Z

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinFloatKnnVectorQuery.java

@@ -38,6 +38,7 @@

 /**
 * kNN float vector query that joins matching children vector documents with their parent doc id.
+ * The top documents returned are the child document ids and the calculated scores.
 */



…rn highest score child doc ID by parent id (#12510)

The current query returns parent IDs based on the score of the nearest child. However, it's difficult to invert that relationship (that is, to determine exactly which child was the nearest during search).

So, I renamed the new `ToParentBlockJoin[Byte|Float]KnnVectorQuery` to `DiversifyingChildren[Byte|Float]KnnVectorQuery`, and it now returns the nearest child ID instead of just that child's parent ID. The results are still diversified by parent ID.

Now it's easy to determine the nearest child vector, since that is what the query returns. Determining its parent is as simple as consulting the previously provided parent bit set.

Related to: #12434

Adjust ToParentBlockJoin[Byte|Float]KnnVectorQuery to return highest …

0c594cf

…score child doc ID by parent id

benwtrent added the vector-based-search label Aug 15, 2023

jimczi approved these changes Aug 16, 2023


benwtrent added 2 commits August 16, 2023 12:27

Merge remote-tracking branch 'upstream/main' into feature/fix-parent-…

7bf2152

…block-join-query

renaming and adding docs

850f367

benwtrent merged commit 4174b52 into apache:main Aug 16, 2023
4 checks passed

benwtrent deleted the feature/fix-parent-block-join-query branch August 16, 2023 17:44
