Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result when use volatile function alias in having clause #7976

Open
haohuaijin opened this issue Oct 30, 2023 · 3 comments
Open

Incorrect result when use volatile function alias in having clause #7976

haohuaijin opened this issue Oct 30, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@haohuaijin
Copy link
Contributor

Is your feature request related to a problem or challenge?

related to #7876
when we do the query like

select a, random() as r from test group by a having r > 0.5;

we will get the incorrect result like

+----+---------------------+
| a  | r                   |
+----+---------------------+
| c  | 0.1664909010224056  |
| e  | 0.35292520754684475 |
| b  | 0.26159048688135855 |
+----+---------------------+

this is because when we convert the Statement into a LogicalPlan, we will "dereferences" any aliases in the HAVING clause in the below section
https://github.com/apache/arrow-datafusion/blob/1dd887cdff518ede1d1de457f4b20c22a9c7228f/datafusion/sql/src/select.rs#L110-L122

Describe the solution you'd like

Based on the way we implemented alias in having clause, I think we should disable the use of aliases for volatile functions in the having clause and report a error.
and we can use subqueries to implement the above SQL

select t.a, t.r from (select a, random() as r from test group by a) as t where t.r > 0.5;

Describe alternatives you've considered

support the sql like

select a, random() as r from test group by a having r > 0.5;

but this method is more complicated

Additional context

duchdb also do this query incorrect

D select t.c1, random() as r from agg.csv as t group by t.c1 having r > 0.5;
┌─────────┬─────────────────────┐
│   c1    │          r          │
│ varchar │       double        │
├─────────┼─────────────────────┤
│ b       │  0.5120539779309183 │
│ d       │ 0.20845546829514205 │
└─────────┴─────────────────────┘

mysql can get correct result
postgres disallow use alias in having clause

@haohuaijin haohuaijin added the enhancement New feature or request label Oct 30, 2023
@haohuaijin
Copy link
Contributor Author

cc @alamb @jonahgao

@jonahgao
Copy link
Member

The solution is acceptable to me.

I wonder if it is possible to eliminate this dereference step entirely and directly generate the same logical plan as the subquery.

But it would involve modifying other parts of the code, which may be complicated.

@alamb
Copy link
Contributor

alamb commented Oct 30, 2023

I wonder if it is possible to eliminate this dereference step entirely and directly generate the same logical plan as the subquery.

This would be the most elegant (and correct) solution, in my opinion.

@alamb alamb added bug Something isn't working and removed enhancement New feature or request labels Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants