Fix function call return type for IN / NOT IN created from SEARCH in the multistage engine #14128

Open
wants to merge 1 commit into
base: master

yashmayya
Collaborator

  • IN / NOT IN should have a return type of BOOLEAN and not the type of the search argument.
  • Pinot doesn't have native support for a SEARCH operator, hence we convert any SEARCH calls generated by the Calcite plan in the v2 engine into an IN, a NOT IN, or a combination (AND / OR) of range predicates (>, >=, <, <=).
  • The return type for the range predicate function call is correctly set to BOOLEAN, but the return type for IN / NOT IN is currently incorrectly set to the type of the search argument.
  • This can cause errors when another function call wraps the converted SEARCH and that scalar function's implementation requires a boolean argument. This patch fixes the return type of the converted IN / NOT IN function calls.
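
To make the points above concrete, here is a minimal sketch of the conversion for the IN / NOT IN case. It is a simplified model under stated assumptions, not Pinot's actual RexExpressionUtils code: the `DataType` enum, the `FunctionCall` class, and the `convertPoints` method are hypothetical stand-ins introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; NOT Pinot's actual RexExpressionUtils code.
class SearchConversionSketch {
  enum DataType { BOOLEAN, INT, STRING }

  // Stand-in for a converted function-call expression.
  static final class FunctionCall {
    final DataType returnType;
    final String functionName;
    final List<String> operands;

    FunctionCall(DataType returnType, String functionName, List<String> operands) {
      this.returnType = returnType;
      this.functionName = functionName;
      this.operands = operands;
    }
  }

  // Converts a SEARCH over a set of discrete points into IN,
  // or into NOT IN when the point set is complemented.
  static FunctionCall convertPoints(String column, DataType searchArgumentType,
      List<String> points, boolean complemented) {
    List<String> operands = new ArrayList<>();
    operands.add(column);
    operands.addAll(points);
    // The bug described above corresponds to passing searchArgumentType here.
    // A predicate evaluates to a boolean, so the return type is BOOLEAN.
    return new FunctionCall(DataType.BOOLEAN, complemented ? "NOT_IN" : "IN", operands);
  }

  public static void main(String[] args) {
    FunctionCall call = convertPoints("description", DataType.STRING,
        Arrays.asList("'Item one'", "'Item two'"), false);
    System.out.println(call.functionName + " -> " + call.returnType); // prints "IN -> BOOLEAN"
  }
}
```

The search argument type is still accepted as a parameter in this sketch to show that it is deliberately not used as the return type of the predicate.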

@yashmayya added the bugfix and multi-stage (Related to the multi-stage query engine) labels on Oct 1, 2024
@codecov-commenter

codecov-commenter commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 64.07%. Comparing base (59551e4) to head (7a1f87d).
Report is 1110 commits behind head on master.

Files with missing lines Patch % Lines
...inot/query/planner/logical/RexExpressionUtils.java 33.33% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14128      +/-   ##
============================================
+ Coverage     61.75%   64.07%   +2.32%     
- Complexity      207     1537    +1330     
============================================
  Files          2436     2600     +164     
  Lines        133233   143505   +10272     
  Branches      20636    21982    +1346     
============================================
+ Hits          82274    91952    +9678     
+ Misses        44911    44781     -130     
- Partials       6048     6772     +724     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 64.04% <33.33%> (+2.33%) ⬆️
java-21 63.96% <33.33%> (+2.33%) ⬆️
skip-bytebuffers-false 64.06% <33.33%> (+2.31%) ⬆️
skip-bytebuffers-true 63.94% <33.33%> (+36.21%) ⬆️
temurin 64.07% <33.33%> (+2.32%) ⬆️
unittests 64.07% <33.33%> (+2.32%) ⬆️
unittests1 55.75% <33.33%> (+8.86%) ⬆️
unittests2 34.53% <0.00%> (+6.80%) ⬆️

Contributor

@gortiz gortiz left a comment

LGTM. I would suggest adding a test that fails against the previous code to make sure we don't make this mistake again.

@yashmayya
Collaborator Author

I would suggest adding a test that fails against the previous code to make sure we don't make this mistake again.

I considered this, but it's a tricky case to test. It doesn't really make sense to test it by calling the RexExpressionUtils methods directly (i.e., by constructing synthetic RexNodes). The failure described in the PR description surfaced in #14125, when I had put a boolean type check on the WHEN arguments of the CASE scalar function. The failing query was:

SELECT primary_key,
       CASE WHEN description IN ('Item one', 'Item two') THEN attribute ELSE description END AS description,
       CASE WHEN description NOT IN ('Item three', 'Item four') THEN attribute ELSE description END AS attribute
FROM (
  SELECT {tbl1}.primary_key, {tbl1}.description, {tbl2}.attribute
  FROM {tbl1} JOIN {tbl2} ON {tbl1}.primary_key = {tbl2}.primary_key
) tmp
WHERE attribute IN ('A','B','C','D')
LIMIT 10

in ResourceBasedQueriesTest.
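
As a rough illustration of why such a check would reject the query, here is a small sketch. Everything in it is hypothetical: `CaseWhenTypeCheckSketch`, `DataType`, and `checkWhenOperand` are made-up names, this is not Pinot's CASE scalar function implementation, and it assumes the description column is a string column.

```java
// Hypothetical sketch; not Pinot's actual CASE scalar function implementation.
class CaseWhenTypeCheckSketch {
  enum DataType { BOOLEAN, STRING }

  // Rejects a WHEN operand whose declared return type is not BOOLEAN.
  static void checkWhenOperand(String expression, DataType declaredReturnType) {
    if (declaredReturnType != DataType.BOOLEAN) {
      throw new IllegalArgumentException(
          "WHEN operand must be BOOLEAN but " + expression + " is declared as " + declaredReturnType);
    }
  }

  public static void main(String[] args) {
    String whenOperand = "IN(description, 'Item one', 'Item two')";

    // With the fix: the converted IN call is declared BOOLEAN and passes the check.
    checkWhenOperand(whenOperand, DataType.BOOLEAN);

    try {
      // Without the fix: the converted IN call carries the search argument's type
      // (assumed to be STRING here), so the check rejects it.
      checkWhenOperand(whenOperand, DataType.STRING);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```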

However, I removed the boolean type check from the variadic CASE function implementation because it wasn't backward compatible (the earlier scalar function implementation didn't have argument type checks). The only other scalar function that takes a boolean argument seems to be NOT, and Calcite automatically converts NOT on a SEARCH into a SEARCH with a modified Sarg. Furthermore, the explain plans give no indication of the return type of converted RexExpressions. Any other ideas on an elegant way to add a test for this?
