Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't error in simplify_expressions rule #8957

Merged
merged 3 commits into from
Jan 24, 2024
Merged

Conversation

haohuaijin
Copy link
Contributor

Which issue does this PR close?

Closes #8909

Rationale for this change

see #8910 and #8909 ,

What changes are included in this PR?

before this pr, in simplify_espression we try to fold the expression, but this can result some error, for example

select case 1 when 2 then 4/0 end;

4/0 never reach in runtime.

so in this PR, if we encounter an error during fold expression, return the origin expression, and the error will occur in runtime if it is a real error(not the short-circuit problem).

Note: this pr doesn't solve #8910 , due to #8927

Are these changes tested?

test by existing tests

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jan 23, 2024
@haohuaijin
Copy link
Contributor Author

haohuaijin commented Jan 23, 2024

cc @jackwener @alamb. This is the first step to solving #8910; the main change in this PR is that it throws an error in runtime, not in simplify_expression. I open #8959 to discuss how to deal with the short-circuit function.
before this pr

❯ select 1/0;
Optimizer rule 'simplify_expressions' failed
caused by
Arrow error: Divide by zero error

after this pr

❯ select 1/0;
Arrow error: Divide by zero error

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @haohuaijin -- I think this is looking really good to me.

I think we should avoid the clone if possible, add a few more tests, and update the expected error messages, but all in all this PR is looking really close.

cc @jackwener

@@ -138,28 +138,6 @@ mod tests {
ExprSchemable, JoinType,
};

/// A macro to assert that one string is contained within another with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for reusing the existing one

datafusion/sqllogictest/test_files/arrow_typeof.slt Outdated Show resolved Hide resolved
@@ -281,7 +280,13 @@ impl<'a> TreeNodeRewriter for ConstEvaluator<'a> {

fn mutate(&mut self, expr: Expr) -> Result<Expr> {
match self.can_evaluate.pop() {
Some(true) => Ok(Expr::Literal(self.evaluate_to_scalar(expr)?)),
Some(true) => {
let lit = self.evaluate_to_scalar(expr.clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a real shame that this has to clone the expression even for the common case where there is no error

maybe we could use something like the following:

enum SimplifyResult {
   // expr was simplifed and contains the new expression
  Simplified(Expr),
  // Evalaution encountered an error, contains the original expression
  SimplifyRuntimeError(DataFusionError, Expr), 
}

@haohuaijin
Copy link
Contributor Author

Thanks @alamb for the great reviews; I apply your suggestions, re-added the deleted tests and add some new tests.

SELECT not(1), not(0)

query ?B
SELECT null, not(null)
----
NULL NULL

query error DataFusion error: Optimizer rule 'simplify_expressions' failed\ncaused by\nInternal error: NOT 'Literal \{ value: Utf8\("hi"\) \}' can't be evaluated because the expression's type is Utf8, not boolean or NULL
query error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the detailed error message because this query return internal error, that make ci failed

❯ select not('hi');
Internal error: NOT 'Literal { value: Utf8("hi") }' can't be evaluated because the expression's type is Utf8, not boolean or NULL.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯ select 1 and 2;
Error during planning: Cannot infer common argument type for logical boolean operation Int64 AND Int64

I think the return error message should like And operator, they both only work for boolean value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

track in #8965

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @haohuaijin -- this PR looks great to me. Thank you

FYI @jackwener

#[test]
fn test_simplify_divide_zero_by_zero() {
// because divide by 0 maybe occur in short-circuit expression
// so we should not simplify this, and throw error in runtime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


#[test]
fn test_expression_partial_simplify_2() {
// (1 > 2) and (4 / 0) -> false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very nice job! Thanks @haohuaijin @alamb ❤️

Comment on lines +291 to +294
// Certain expressions such as `CASE` and `COALESCE` are short circuiting
// and may not evalute all their sub expressions. Thus if
// if any error is countered during simplification, return the original
// so that normal evaluation can occur
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 78ca43c into apache:main Jan 24, 2024
22 checks passed
@haohuaijin haohuaijin deleted the fix_expression branch January 24, 2024 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evaluates CASE branches even if their WHEN clause is false
3 participants