Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres #8475

Closed
tv42 opened this issue Dec 8, 2023 · 0 comments · Fixed by #8477
Closed

CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres #8475

tv42 opened this issue Dec 8, 2023 · 0 comments · Fixed by #8477
Labels
bug Something isn't working

Comments

@tv42
Copy link
Contributor

tv42 commented Dec 8, 2023

Describe the bug

I found this in the sqlite sqllogictest repository, random/expr/slt_good_3.slt:

query I rowsort label-676
SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59
----
-337914

Just to confirm with sqlite3:

$ sqlite3
SQLite version 3.43.2 2023-10-10 12:14:04
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
-337914
sqlite>

Datafusion disagrees:

$ datafusion-cli
DataFusion CLI v33.0.0
❯ SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (- CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END) + Int64(93) + (- Int64(-44)) * Int64(91) + CASE Int64(44) WHEN (- Int64(-21)) * Int64(69) - Int64(12) THEN Int64(58) ELSE Int64(-3) END * Int64(23) * Int64(84) * (- Int64(-59)) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -337868                                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.017 seconds.

❯

With SQLite, the expression simplifies to

SELECT - 47 + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;

and then

SELECT - 47 + 93 + - - 44 * 91 + - 3 * + + 23 * + 84 * - - 59;

which Datafusion agrees is -337914. Hence, Datafusion must have evaluated one of the cases differently!

sqlite> SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
47
❯ SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
+----------------------------------------------------------------------------------------------+
| CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END |
+----------------------------------------------------------------------------------------------+
| 1                                                                                            |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.015 seconds.

Trying to isolate that further, SQLite thinks NOT - AVG ( - 41 ) IS NULL is 1, Datafusion thinks it's true which is the same with types, and Datafusion handles that CASE in isolation just fine:

sqlite> SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
47
sqlite> select NOT - AVG ( - 41 ) IS NULL;
1
sqlite> select case when NOT - AVG ( - 41 ) IS NULL then 'yes' else 'no' end;
yes
❯ SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
+----------------------------------------------------------------------------------------------+
| CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END |
+----------------------------------------------------------------------------------------------+
| 1                                                                                            |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.014 seconds.

❯ select NOT - AVG ( - 41 ) IS NULL;
+---------------------------------+
| NOT (- AVG(Int64(-41))) IS NULL |
+---------------------------------+
| true                            |
+---------------------------------+
1 row in set. Query took 0.011 seconds.

❯ select case when NOT - AVG ( - 41 ) IS NULL then 'yes' else 'no' end;
+--------------------------------------------------------------------------------+
| CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Utf8("yes") ELSE Utf8("no") END |
+--------------------------------------------------------------------------------+
| yes                                                                            |
+--------------------------------------------------------------------------------+
1 row in set. Query took 0.014 seconds.

But when it's part of that larger statement something goes wrong.

And this is it:

SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
sqlite and postgres
❯ SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
+---------------------------------------------------------------------------------------------------------+
| CASE WHEN Boolean(true) THEN Utf8("sqlite and postgres") WHEN Boolean(true) THEN Utf8("datafusion") END |
+---------------------------------------------------------------------------------------------------------+
| datafusion                                                                                              |
+---------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.006 seconds.

To Reproduce

❯ SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
+---------------------------------------------------------------------------------------------------------+
| CASE WHEN Boolean(true) THEN Utf8("sqlite and postgres") WHEN Boolean(true) THEN Utf8("datafusion") END |
+---------------------------------------------------------------------------------------------------------+
| datafusion                                                                                              |
+---------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.006 seconds.

Expected behavior

sqlite> SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN true THEN 'datafusion' END;
sqlite and postgres

Additional context

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant