Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polymorphic binary arithmetic scalar functions #14089

Conversation

yashmayya
Copy link
Collaborator

@yashmayya yashmayya commented Sep 26, 2024

Exception: java.lang.IllegalArgumentException: Cannot convert value: '1.726807808874E12' to type: LONG
        at org.apache.pinot.spi.data.FieldSpec$DataType.convertInternal(FieldSpec.java:724)
        at org.apache.pinot.core.query.optimizer.filter.MergeRangeFilterOptimizer.getComparable(MergeRangeFilterOptimizer.java:149)
        at org.apache.pinot.core.query.optimizer.filter.MergeRangeFilterOptimizer.getRange(MergeRangeFilterOptimizer.java:135)
        at org.apache.pinot.core.query.optimizer.filter.MergeRangeFilterOptimizer.optimize(MergeRangeFilterOptimizer.java:85)
org.apache.pinot.spi.exception.BadQueryRequestException: java.lang.IllegalArgumentException: Cannot convert value: '1.727344079699E12' to type: LONG
	at org.apache.pinot.core.query.pruner.ValueBasedSegmentPruner.convertValue(ValueBasedSegmentPruner.java:157)
	at org.apache.pinot.core.query.pruner.ColumnValueSegmentPruner.pruneRangePredicate(ColumnValueSegmentPruner.java:176)
	at org.apache.pinot.core.query.pruner.ColumnValueSegmentPruner.pruneSegmentWithPredicate(ColumnValueSegmentPruner.java:88)
	at org.apache.pinot.core.query.pruner.ValueBasedSegmentPruner.pruneSegment(ValueBasedSegmentPruner.java:144)
  • We should also look into fixing the above root cause by using a type specific bounds in the range predicates instead of a string value.

@yashmayya
Copy link
Collaborator Author

I've removed the polymorphic division function implementation because that breaks backward compatibility for the v1 query engine. select 1 / 2 from mytable currently returns 0.5 in v1 and with the polymorphic function changes it would return 0 (which is actually the SQL compliant result, and is also the returned result in the v2 engine with or without these changes due to Calcite based return type inference of the standard division operator).

@yashmayya yashmayya force-pushed the polymorphic-binary-arithmetic-scalar-functions branch from bfc0da0 to ff6a73b Compare September 26, 2024 11:25
@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 75.75758% with 16 lines in your changes missing coverage. Please review.

Project coverage is 64.11%. Comparing base (59551e4) to head (fbd44ce).
Report is 1109 commits behind head on master.

Files with missing lines Patch % Lines
...unction/scalar/arithmetic/MinusScalarFunction.java 73.33% 3 Missing and 1 partial ⚠️
...function/scalar/arithmetic/MultScalarFunction.java 73.33% 3 Missing and 1 partial ⚠️
...function/scalar/arithmetic/PlusScalarFunction.java 73.33% 3 Missing and 1 partial ⚠️
...tic/PolymorphicBinaryArithmeticScalarFunction.java 60.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14089      +/-   ##
============================================
+ Coverage     61.75%   64.11%   +2.36%     
- Complexity      207     1538    +1331     
============================================
  Files          2436     2600     +164     
  Lines        133233   143505   +10272     
  Branches      20636    21982    +1346     
============================================
+ Hits          82274    92014    +9740     
+ Misses        44911    44722     -189     
- Partials       6048     6769     +721     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 64.09% <75.75%> (+2.38%) ⬆️
java-21 63.98% <75.75%> (+2.36%) ⬆️
skip-bytebuffers-false 64.10% <75.75%> (+2.36%) ⬆️
skip-bytebuffers-true 63.96% <75.75%> (+36.23%) ⬆️
temurin 64.11% <75.75%> (+2.36%) ⬆️
unittests 64.11% <75.75%> (+2.36%) ⬆️
unittests1 55.82% <75.75%> (+8.93%) ⬆️
unittests2 34.53% <68.18%> (+6.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yashmayya yashmayya force-pushed the polymorphic-binary-arithmetic-scalar-functions branch from ff6a73b to 42091c8 Compare September 26, 2024 12:58
@yashmayya yashmayya marked this pull request as ready for review September 26, 2024 13:39
@Jackie-Jiang
Copy link
Contributor

Per the PG behavior: https://www.postgresql.org/docs/current/functions-math.html
Should it return INT for operations between INT columns?
Good catch on the division function behavior difference. In order to move towards standard SQL behavior without breaking backward compatibility, we can consider adding a query option to follow standard SQL. This way we can keep v1 behavior and also make v2 standard SQL compatible. V1 user can also migrate to standard SQL behavior.

@yashmayya
Copy link
Collaborator Author

Should it return INT for operations between INT columns?

Ideally yes, but that would break backward compatibility too (just like the division case). In PG, 2000000000 + 2000000000 returns ERROR: integer out of range whereas in Pinot it returns 4000000000.

Good catch on the division function behavior difference. In order to move towards standard SQL behavior without breaking backward compatibility, we can consider adding a query option to follow standard SQL. This way we can keep v1 behavior and also make v2 standard SQL compatible. V1 user can also migrate to standard SQL behavior.

Do you mean a specific query option for arithmetic related operations or just a generic option under which we'd bucket all such compatibility breaking but SQL compliant changes in the future? Also, the v2 behavior should already be SQL compliant because Calcite will make sure that the return type for the division operator is appropriate (and Pinot makes sure to cast internal values based on the expected return type of an operator).

@yashmayya
Copy link
Collaborator Author

Also, if we do end up introducing such a query option in the future, we should probably make sure that the arithmetic transform functions are updated to respect the query option too. For instance, there is an AdditionTransformFunction which also always returns DOUBLE regardless of the input operand types. And this is the function that will get used instead of the scalar function (that is being updated here) if there's a query like SELECT intCol + 5 FROM mytable.

@Jackie-Jiang
Copy link
Contributor

I was thinking adding a general query option to turn v1 engine into standard sql compatible mode, so that v2 engine can enable it and get standard sql behavior. Even if we cast after getting the result, it might already lose precision and give inaccurate result.

Currently, when adding 2 ints, will v2 engine return int? For division, PG always round towards 0, which is not followed if we cast the result after execution.

@Jackie-Jiang
Copy link
Contributor

Basically we want a general way so that we can add standard SQL support without breaking existing v1 queries

@yashmayya
Copy link
Collaborator Author

Even if we cast after getting the result, it might already lose precision and give inaccurate result.

Yes, this is a valid point. Also, an expression like 5 / 3 + 1.5 will return 3.166666666666667 in the v2 engine (since the cast is not done after each function call), as opposed to 2.5 in Postgres.

Currently, when adding 2 ints, will v2 engine return int

Yes, it does, which seems somewhat problematic because something like 2000000000 + 2000000000 results in an integer overflow occurring and a negative integer value being returned.

I was thinking adding a general query option to turn v1 engine into standard sql compatible mode, so that v2 engine can enable it and get standard sql behavior

Basically we want a general way so that we can add standard SQL support without breaking existing v1 queries

I wasn't able to find any common standard SQL definition for such arithmetic operations. For instance, in MySQL 1 / 2 returns 0.5 and 2000000000 + 2000000000 returns 4000000000 (similar to the Pinot v1 engine, unlike Postgres or Pinot v2 engine). Other databases seem to be doing this their own way as well. If there's a consensus that we always want to do things the Postgres way going forward, then the query option should be something like usePostgresSemantics? And coming to this PR itself, I presume we'd want the improvements here with or without such a query option right? And only put some of the compatibility breaking changes (like enforcing int for arithmetic operations on int and the type specific division) behind the new query option?

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss query options in a different thread. That is orthogonal to this PR

}

@Override
@Nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Remove @Nullable. Same for other places

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually maybe we should register them with the standard Calcite operator SqlStdOperatorTable.MINUS?

Copy link
Collaborator Author

@yashmayya yashmayya Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I had pushed this change to fix some test failures with direct usages of ADD, and IIRC operator alias based registering wasn't working for me locally earlier but I checked again and things look good (might've messed something up earlier). I've pushed a commit to use aliases for the standard operators and removed these overrides.

@Jackie-Jiang Jackie-Jiang added the backward-incompat Referenced by PRs that introduce or fix backward compat issues label Sep 30, 2024
…umMap instead of HashMap; cleanup functionInfoForTypes method in PolymorphicBinaryArithmeticScalarFunction
@yashmayya yashmayya force-pushed the polymorphic-binary-arithmetic-scalar-functions branch from bc3f33c to fbd44ce Compare September 30, 2024 18:10
@Jackie-Jiang Jackie-Jiang merged commit 19b79f4 into apache:master Sep 30, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backward-incompat Referenced by PRs that introduce or fix backward compat issues bugfix feature query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants