Refactor function registry for multi-stage engine #13573

Jackie-Jiang · 2024-07-09T22:35:07Z

Here are the main changes:

- Do not register any function in the catalog. PinotCatalog is just a wrapper over table cache to resolve database name and extract table schema.
- Register all function signatures into PinotOperatorTable. The following functions are registered in the PinotOperatorTable:
  - Selected standard function operators from Calcite SqlStdOperatorTable
  - Pinot custom function operators
  - Aggregation function types
  - Transform function types
  - Scalar functions
- Customize function lookup to follow the single-stage engine convention: ignore case and underscore within function names
- Add framework to support scalar function class with polymorphism
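
The lookup convention described above (ignore case and underscores within function names) can be illustrated with a minimal sketch. The class `CanonicalLookupSketch` and its methods are hypothetical names chosen for this illustration and are not Pinot's actual `FunctionRegistry` code; the function names are only sample inputs.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of case- and underscore-insensitive lookup; not Pinot's implementation. */
final class CanonicalLookupSketch {
  // Maps canonical name -> some description of the function (a String here for brevity).
  private final Map<String, String> _functions = new HashMap<>();

  /** Drops underscores and lower-cases the name so that spelling variants map to one key. */
  static String canonicalize(String name) {
    return name.replace("_", "").toLowerCase(Locale.ROOT);
  }

  void register(String name, String description) {
    _functions.put(canonicalize(name), description);
  }

  /** Returns the registered description, or null when no function matches. */
  String lookup(String name) {
    return _functions.get(canonicalize(name));
  }

  public static void main(String[] args) {
    CanonicalLookupSketch registry = new CanonicalLookupSketch();
    registry.register("dateTrunc", "truncates a timestamp");
    // Both spellings resolve to the same entry because they share a canonical form.
    System.out.println(registry.lookup("DATE_TRUNC"));
    System.out.println(registry.lookup("datetrunc"));
  }
}
```

Canonicalizing on both the register and lookup paths means every spelling variant resolves to the same key.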

codecov-commenter · 2024-07-09T23:18:03Z

Codecov Report

Attention: Patch coverage is 0.70053% with 567 lines in your changes missing coverage. Please review.

Project coverage is 27.72%. Comparing base (59551e4) to head (96040f0).
Report is 779 commits behind head on master.

Files	Patch %	Lines
...che/pinot/segment/spi/AggregationFunctionType.java	0.00%	132 Missing ⚠️
...apache/pinot/common/function/FunctionRegistry.java	0.00%	97 Missing ⚠️
...e/pinot/common/function/TransformFunctionType.java	0.00%	86 Missing ⚠️
...ache/pinot/calcite/sql/fun/PinotOperatorTable.java	0.00%	68 Missing ⚠️
...rg/apache/pinot/common/function/FunctionUtils.java	0.00%	26 Missing ⚠️
...ot/calcite/rel/rules/PinotEvaluateLiteralRule.java	0.00%	22 Missing ⚠️
...ery/runtime/operator/operands/FunctionOperand.java	0.00%	19 Missing ⚠️
...el/rules/PinotAggregateExchangeNodeInsertRule.java	0.00%	18 Missing ⚠️
.../java/org/apache/pinot/query/QueryEnvironment.java	0.00%	13 Missing ⚠️
...r/transform/function/TransformFunctionFactory.java	0.00%	12 Missing ⚠️
... and 19 more

❗ There is a different number of reports uploaded between BASE (59551e4) and HEAD (96040f0).

HEAD has 15 uploads less than BASE

Flag	BASE (59551e4)	HEAD (96040f0)
temurin	12	10
java-21	7	6
skip-bytebuffers-true	3	2
skip-bytebuffers-false	7	5
unittests	5	1
unittests1	2	0
java-11	5	4
unittests2	3	1

Additional details and impacted files

@@              Coverage Diff              @@
##             master   #13573       +/-   ##
=============================================
- Coverage     61.75%   27.72%   -34.03%     
+ Complexity      207      198        -9     
=============================================
  Files          2436     2553      +117     
  Lines        133233   140470     +7237     
  Branches      20636    21851     +1215     
=============================================
- Hits          82274    38949    -43325     
- Misses        44911    98533    +53622     
+ Partials       6048     2988     -3060

Flag	Coverage Δ
custom-integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration	`<0.01% <0.00%> (-0.01%)`	⬇️
integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration2	`0.00% <0.00%> (ø)`
java-11	`27.72% <0.70%> (-33.99%)`	⬇️
java-21	`<0.01% <0.00%> (-61.63%)`	⬇️
skip-bytebuffers-false	`27.72% <0.70%> (-34.02%)`	⬇️
skip-bytebuffers-true	`<0.01% <0.00%> (-27.73%)`	⬇️
temurin	`27.72% <0.70%> (-34.03%)`	⬇️
unittests	`27.72% <0.70%> (-34.03%)`	⬇️
unittests1	`?`
unittests2	`27.72% <0.70%> (-0.01%)`	⬇️

yashmayya

Thanks @Jackie-Jiang, this is a really nice improvement with lots of cleanups! I had a few minor comments and questions to better my understanding of these areas.

pinot-common/src/main/java/org/apache/pinot/common/function/PinotScalarFunction.java

pinot-spi/src/main/java/org/apache/pinot/spi/annotations/ScalarFunction.java

pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java

...rc/main/java/org/apache/pinot/core/operator/transform/function/TransformFunctionFactory.java

pinot-core/src/test/java/org/apache/pinot/core/function/FunctionDefinitionRegistryTest.java

yashmayya · 2024-07-10T09:54:47Z

...nner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateReduceFunctionsRule.java

-    return call;
+  public boolean canReduce(AggregateCall call) {
+    SqlKind kind = call.getAggregation().getKind();
+    return kind == SqlKind.SUM || kind == SqlKind.AVG;


Why do we need this customized rule? Which of the original Calcite rule's reductions don't work in Pinot?

Added some javadoc to make it clear. Currently STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, COVAR_POP and COVAR_SAMP break under the original rule. Take a look at the changes in StatisticAggregates.json.

Ah okay, makes sense now, thanks!

Does this rule apply only in the leaf stage, or also in the intermediate stage? And how do we merge data from different workers when the aggregation is not rewritten into the simpler form?

When this rule is applied, there is no leaf-stage concept yet (the leaf stage is determined by PinotAggregateExchangeNodeInsertRule).
We don't really need these rewrites because Pinot can directly handle SUM and AVG with proper null handling (the rule is needed for engines without proper null handling). I didn't remove the rule outright because that is out of the scope of this PR, and null handling support requires some more tweaks.

pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java

yashmayya · 2024-07-11T09:49:21Z

Looks like this test needs to be updated with the new exception type and message due to the changes in FunctionOperand

pinot-common/src/main/java/org/apache/pinot/common/function/scalar/LogicalFunctions.java

pinot-common/src/main/java/org/apache/pinot/common/function/scalar/ArrayFunctions.java

pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java

gortiz

Still need more time to finish the first read and probably have a second one :D

gortiz · 2024-07-11T14:57:28Z

pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java

+  @Deprecated
+  @Nullable
+  public static FunctionInfo getFunctionInfo(String name, int numArguments) {
+    return lookupFunctionInfo(canonicalize(name), numArguments);


I would even recommend creating a class called CanonicalName that contains a String, and then using that class as the input. We may have a static class that transforms Strings into CanonicalNames.

This will let us:

Have type-safe checks, so Java won't let us call lookupFunctionInfo with non-canonical names.

Cache the CanonicalNames, so we don't need to allocate.
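
The wrapper-type suggestion above could look roughly like the following sketch. Only the name `CanonicalName` comes from the review comment; the caching strategy, the `of` factory, and `TypedRegistrySketch` are assumptions made for illustration, not code from this PR.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical wrapper type; the name follows the review suggestion, the body is an assumption. */
final class CanonicalName {
  // Cache keyed by the raw spelling, so a repeated lookup of the same spelling reuses the instance.
  private static final Map<String, CanonicalName> CACHE = new ConcurrentHashMap<>();

  private final String _value;

  private CanonicalName(String value) {
    _value = value;
  }

  /** The only way to obtain an instance, which guarantees the wrapped value is canonical. */
  static CanonicalName of(String rawName) {
    return CACHE.computeIfAbsent(rawName,
        raw -> new CanonicalName(raw.replace("_", "").toLowerCase(Locale.ROOT)));
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof CanonicalName && ((CanonicalName) other)._value.equals(_value);
  }

  @Override
  public int hashCode() {
    return _value.hashCode();
  }

  @Override
  public String toString() {
    return _value;
  }
}

/** Hypothetical registry whose lookup accepts only CanonicalName, never a raw String. */
final class TypedRegistrySketch {
  private final Map<CanonicalName, String> _functions = new HashMap<>();

  void register(CanonicalName name, String info) {
    _functions.put(name, info);
  }

  // Passing a raw String here is a compile error, so canonicalization cannot be skipped.
  String lookup(CanonicalName name) {
    return _functions.get(name);
  }

  public static void main(String[] args) {
    TypedRegistrySketch registry = new TypedRegistrySketch();
    registry.register(CanonicalName.of("arrayLength"), "length of an array");
    System.out.println(registry.lookup(CanonicalName.of("ARRAY_LENGTH")));
  }
}
```

One trade-off in this sketch: the cache is unbounded, so a real implementation would probably need to bound it or restrict it to registered names if function names can come from arbitrary user queries.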

pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java

pinot-common/src/main/java/org/apache/pinot/common/function/PinotScalarFunction.java

gortiz · 2024-07-11T15:45:10Z

pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java

+    public FunctionInfo getFunctionInfo(int numArguments) {
+      FunctionInfo functionInfo = _functionInfoMap.get(numArguments);
+      return functionInfo != null ? functionInfo : _functionInfoMap.get(VAR_ARG_KEY);


I think it would be cool to have documented somewhere how functions can be registered. AFAIU that should be something like:

Using annotated methods: simpler and shorter, but less expressive. For example, you cannot support polymorphism.

Using annotated classes that implement PinotScalarFunction: more expressive.

From @Jackie-Jiang's comment here, it looks like there is a third way that consists of registering the function explicitly in PinotOperatorTable. But then the function won't be usable in V1, am I right?
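
The `getFunctionInfo(int numArguments)` snippet quoted in this thread resolves a function by argument count and falls back to a variable-argument entry. A self-contained sketch of that pattern follows. The class name, the `String` stand-in for `FunctionInfo`, and the sentinel value `-1` for `VAR_ARG_KEY` are assumptions; the quoted diff does not show how the key is defined.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of arity-keyed lookup with a var-arg fallback; not the PR's actual class. */
final class ArityLookupSketch {
  // Assumed sentinel key for the variable-argument variant; the real value is not shown in the diff.
  static final int VAR_ARG_KEY = -1;

  // Maps argument count -> function info (a String stands in for FunctionInfo here).
  private final Map<Integer, String> _functionInfoMap = new HashMap<>();

  void register(int numArguments, String functionInfo) {
    _functionInfoMap.put(numArguments, functionInfo);
  }

  void registerVarArg(String functionInfo) {
    _functionInfoMap.put(VAR_ARG_KEY, functionInfo);
  }

  /** Prefers an exact arity match, then the var-arg variant; returns null if neither exists. */
  String getFunctionInfo(int numArguments) {
    String functionInfo = _functionInfoMap.get(numArguments);
    return functionInfo != null ? functionInfo : _functionInfoMap.get(VAR_ARG_KEY);
  }

  public static void main(String[] args) {
    ArityLookupSketch concat = new ArityLookupSketch();
    concat.register(2, "two-argument variant");
    concat.registerVarArg("var-arg variant");
    System.out.println(concat.getFunctionInfo(2));
    System.out.println(concat.getFunctionInfo(5));
  }
}
```

Keying on argument count alone cannot distinguish two variants with the same arity but different argument types, which is the gap a class-based, polymorphic registration is meant to close.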

pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java

pinot-core/src/test/java/org/apache/pinot/queries/TimestampQueriesTest.java

...r/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java

gortiz · 2024-07-11T16:13:07Z

...nner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateReduceFunctionsRule.java

-    return call;
+  public boolean canReduce(AggregateCall call) {
+    SqlKind kind = call.getAggregation().getKind();
+    return kind == SqlKind.SUM || kind == SqlKind.AVG;


Does this rule apply only in the leaf stage, or also in the intermediate stage? And how do we merge data from different workers when the aggregation is not rewritten into the simpler form?

pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java

gortiz · 2024-07-17T10:09:01Z

...-runtime/src/main/java/org/apache/pinot/query/runtime/operator/operands/FunctionOperand.java

+    if (functionInfo == null) {
+      if (FunctionRegistry.contains(canonicalName)) {
+        throw new IllegalArgumentException(
+            String.format("Unsupported function: %s with argument types: %s", functionName,
+                Arrays.toString(argumentTypes)));
+      } else {
+        throw new IllegalArgumentException(String.format("Unsupported function: %s", functionName));
+      }
+    }


Ideally we should also include the variants that were not selected.

We could do that by adding a method in FunctionRegistry that returns all FunctionInfo for a given name and then showing these options here.

I don't think it has to be added in this PR, but it is something that will be useful for debugging problems.

It's a little bit tricky though, because the matches could be made via type inference. We can probably add some usage info within each PinotScalarFunction, which can be looked up in the FunctionRegistry. Added a TODO to follow up.
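
As a rough illustration of the follow-up idea (listing the registered variants in the error message), here is a sketch. The helper `describe` and the signature strings passed to it are hypothetical. As noted above, variants matched through type inference cannot simply be enumerated this way, so this only covers explicitly registered signatures.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds a more informative "unsupported function" message. */
final class UnsupportedFunctionMessageSketch {

  /**
   * Builds the message. When the name has registered variants, they are appended so the user can
   * see what would have matched; otherwise the short form from the PR's diff is used.
   */
  static String describe(String functionName, String[] argumentTypes, List<String> registeredVariants) {
    if (registeredVariants.isEmpty()) {
      return String.format("Unsupported function: %s", functionName);
    }
    return String.format("Unsupported function: %s with argument types: %s. Registered variants: %s",
        functionName, Arrays.toString(argumentTypes), String.join(", ", registeredVariants));
  }

  public static void main(String[] args) {
    // The variant signatures below are made-up usage strings for illustration.
    List<String> variants = Arrays.asList("substr(STRING, INT)", "substr(STRING, INT, INT)");
    System.out.println(describe("substr", new String[] {"STRING"}, variants));
    System.out.println(describe("noSuchFunction", new String[] {"INT"}, Arrays.asList()));
  }
}
```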

gortiz · 2024-07-17T10:11:19Z

pinot-query-runtime/src/test/resources/queries/UDFAggregates.json

        "sql": "SELECT PERCENTILE_TDIGEST(float_col, 50), PERCENTILE_TDIGEST(double_col, 5), PERCENTILE_TDIGEST(int_col, 75), PERCENTILE_TDIGEST(long_col, 75) FROM {tbl}",
-        "outputs": [[1.75, 1.0, 137, 137]]
+        "outputs": [[1.75, 1.0, 137.75, 137.75]]


Here we are changing the return type, right? Couldn't this cause problems in production? Does it only happen with PERCENTILE_TDIGEST, or are there other aggregations whose type can change?

Only PERCENTILE_TDIGEST, where we registered the wrong return type before. This is actually a bugfix.

Jackie-Jiang added the release-notes, bugfix, cleanup, refactor, and multi-stage labels Jul 9, 2024

Jackie-Jiang requested review from xiangfu0, gortiz and yashmayya July 9, 2024 22:35

Jackie-Jiang force-pushed the function_registry branch from 8b0f391 to 777503c Compare July 10, 2024 00:58

yashmayya reviewed Jul 10, 2024

Jackie-Jiang force-pushed the function_registry branch 3 times, most recently from c6f2303 to 5447423 Compare July 11, 2024 02:34

gortiz reviewed Jul 11, 2024

Jackie-Jiang force-pushed the function_registry branch 6 times, most recently from 36f30a4 to d8e9770 Compare July 17, 2024 00:19

gortiz reviewed Jul 17, 2024

gortiz approved these changes Jul 17, 2024

Jackie-Jiang added 2 commits July 17, 2024 12:15

Refactor function registry for multi-stage engine

d8bb77d

Fix test and address comments

3509bad

Jackie-Jiang added 2 commits July 17, 2024 12:15

Fix the wrong Nullable imports

f73eb4a

Address comments

96040f0

Jackie-Jiang force-pushed the function_registry branch from d8e9770 to 96040f0 Compare July 17, 2024 19:15

Jackie-Jiang merged commit 55f519f into apache:master Jul 17, 2024
21 of 22 checks passed

Jackie-Jiang deleted the function_registry branch July 17, 2024 20:35

yashmayya mentioned this pull request Jul 30, 2024

Support polymorphic scalar comparison functions in the multi-stage query engine #13711

Merged

npawar added the v1v2 label Aug 15, 2024

This was referenced Sep 25, 2024

Remove Calcite return type override for FROM_DATE_TIME scalar function in PinotOperatorTable #14075

Closed

Polymorphic binary arithmetic scalar functions #14089

Merged

[Feature] create FunctionRegistry based on argument type not argument count #8597

Closed
