fix: volatile expressions should not be target of common subexpt elimination #8520

Merged
merged 5 commits into from
Dec 14, 2023

Member

@viirya viirya commented Dec 12, 2023

Which issue does this PR close?

Closes #8518.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Contributor

@alamb alamb left a comment

Thank you @viirya -- this is about the fastest turnaround I have ever seen from an issue being filed to a PR being created 🙏

I think we should be checking volatility slightly differently, and I left comments about why and how inline.

Thanks again for such a fast PR

Member

@waynexia waynexia left a comment

Looks great! I found a document from Postgres that describes the same behavior:

Any function with side-effects must be labeled VOLATILE, so that calls to it cannot be optimized away. Even a function with no side-effects needs to be labeled VOLATILE if its value can change within a single query; some examples are random(), currval(), timeofday().
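
The rule quoted above can be sketched in Rust as a recursive check over an expression tree. This is a minimal illustration, not DataFusion's implementation: the `Expr` type and the `is_volatile` helper are hypothetical simplifications, and only the three volatility categories (immutable, stable, volatile) mirror the ones Postgres and DataFusion use.

```rust
// Simplified stand-ins for illustration only; these are NOT DataFusion's types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    /// Same inputs always produce the same output.
    Immutable,
    /// Output is fixed within one query but may differ between queries.
    Stable,
    /// Output may change on every call, even within one query.
    Volatile,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    /// A scalar function call: name, declared volatility, arguments.
    Call {
        name: String,
        volatility: Volatility,
        args: Vec<Expr>,
    },
}

/// An expression is volatile if it, or anything nested inside it,
/// calls a function declared volatile.
#[allow(dead_code)]
fn is_volatile(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::Call { volatility, args, .. } => {
            *volatility == Volatility::Volatile || args.iter().any(is_volatile)
        }
    }
}
```

Under this sketch, a call such as `abs(random())` counts as volatile even though `abs` itself is immutable, because the check descends into the arguments.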

    ScalarFunctionDefinition::UDF(udf) => {
        udf.signature().volatility == crate::Volatility::Volatile
    }
    ScalarFunctionDefinition::Name(_) => false,
Contributor

Under which condition does the optimizer see an unresolved function name? And: if we don't know how the function behaves (because there's only a name) shouldn't we assume the worst case (i.e. that it is volatile)?

Contributor

I agree we should assume it is volatile given lack of additional information

At this moment, I don't think the optimizer will ever see unresolved function names

The plan is to use them to implement expr_fns like power once we remove the hard-coded list of built-in functions, as described in #8045

In fact @edmondop is working on this as part of #8157

Contributor

The "resolve" pass should probably run before other optimizers then, right? So maybe it's even safe to error out if we hit an unresolved function here.

Member Author

It makes sense to return an error for an unresolved scalar function here. I missed that this variant is the unresolved one (maybe we should rename it to Unresolved 🤔).

Contributor

alamb commented Dec 13, 2023

The "resolve" pass should probably run before other optimizers then, right? So maybe it's even safe to error out if we hit an unresolved function here.

Yes, I agree this is a good idea. The resolution should run before the optimizers (we have it as part of the analysis pass at the moment)
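
The two policies discussed in this thread for a function known only by name can be compared side by side. The sketch below uses a hypothetical `FunctionDef` enum and a plain `String` error as stand-ins; it does not reproduce DataFusion's `ScalarFunctionDefinition` or its error types.

```rust
// Hypothetical, simplified stand-ins; not DataFusion's actual definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

#[allow(dead_code)]
#[derive(Debug)]
enum FunctionDef {
    /// The function has been looked up, so its volatility is known.
    Resolved { name: String, volatility: Volatility },
    /// Only the name is known; nothing can be said about its behavior.
    Unresolved(String),
}

/// Policy 1: assume the worst case when the function is unknown.
#[allow(dead_code)]
fn is_volatile_conservative(def: &FunctionDef) -> bool {
    match def {
        FunctionDef::Resolved { volatility, .. } => *volatility == Volatility::Volatile,
        FunctionDef::Unresolved(_) => true,
    }
}

/// Policy 2: treat an unresolved name as an internal error, on the
/// assumption that resolution always runs before the optimizer.
#[allow(dead_code)]
fn is_volatile(def: &FunctionDef) -> Result<bool, String> {
    match def {
        FunctionDef::Resolved { volatility, .. } => Ok(*volatility == Volatility::Volatile),
        FunctionDef::Unresolved(name) => Err(format!(
            "function `{}` must be resolved before optimization",
            name
        )),
    }
}
```

The second policy surfaces an ordering bug in the pipeline instead of silently disabling an optimization, which is the trade-off the reviewers settle on here.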

Contributor

edmondop commented Dec 13, 2023 via email

Contributor

@alamb alamb left a comment

Thank you very much @viirya for your quick and thorough work on this PR

        Ok(udf.signature().volatility == crate::Volatility::Volatile)
    }
    ScalarFunctionDefinition::Name(func) => {
        internal_err!(
Contributor

👍


# Verify that multiple calls to volatile functions like `random()` are not combined / optimized away
query B
SELECT r FROM (SELECT r1 == r2 r, r1, r2 FROM (SELECT random() r1, random() r2) WHERE r1 > 0 AND r2 > 0)
Contributor

👍

This test fails on main:

❯ SELECT r FROM (SELECT r1 == r2 r, r1, r2 FROM (SELECT random() r1, random() r2) WHERE r1 > 0 AND r2 > 0)
;
+------+
| r    |
+------+
| true |
+------+
1 row in set. Query took 0.037 seconds.
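
The `true` result above is consistent with both `random()` calls being collapsed into one shared subexpression. A toy model of candidate selection for common subexpression elimination shows the difference. The `Expr` enum, the `cse_candidates` function, and its `skip_volatile` flag are all hypothetical; this is not DataFusion's actual optimizer rule.

```rust
use std::collections::HashMap;

// Toy expression model for illustration; not DataFusion's `Expr`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    /// Any expression whose value depends only on its inputs.
    Deterministic(String),
    /// A volatile call such as `random()`.
    Volatile(String),
}

/// Returns the expressions that occur more than once and would therefore be
/// computed once and reused. With `skip_volatile` set, volatile expressions
/// are never candidates, so each occurrence is evaluated separately.
#[allow(dead_code)]
fn cse_candidates(exprs: &[Expr], skip_volatile: bool) -> Vec<Expr> {
    let mut counts: HashMap<Expr, usize> = HashMap::new();
    for e in exprs {
        if skip_volatile && matches!(e, Expr::Volatile(_)) {
            continue;
        }
        *counts.entry(e.clone()).or_insert(0) += 1;
    }

    // Walk the input again so the result keeps first-seen order.
    let mut out: Vec<Expr> = Vec::new();
    for e in exprs {
        let repeated = counts.get(e).copied().unwrap_or(0) > 1;
        if repeated && !out.contains(e) {
            out.push(e.clone());
        }
    }
    out
}
```

With `skip_volatile` set to `false`, two `random()` entries are reported as a single shared candidate, which corresponds to the behavior on main; with `true`, only repeated deterministic expressions are reported.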

@viirya viirya merged commit 5909866 into apache:main Dec 14, 2023
22 checks passed
Member Author

viirya commented Dec 14, 2023

Thanks for the review, @alamb @crepererum @waynexia

appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 14, 2023
…ination (apache#8520)

* fix: volatile expressions should not be target of common subexpt elimination

* Fix clippy

* For review

* Return error for unresolved scalar function

* Improve error message
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Multiple calls to the same volatile function do not produce different answers
5 participants