Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip table qualifiers from schema in UNION ALL for unparser #11082

Merged

Conversation

phillipleblanc
Copy link
Contributor

Which issue does this PR close?

Closes #10706

Rationale for this change

The schema that is the result of a UNION ALL should not have any table qualifiers, as the table information has effectively been erased and is no longer a valid reference.

Consider the following SQL:

SELECT table1.foo FROM table1
UNION ALL
SELECT table2.foo FROM table2

The logical schema from this UNION ALL should be just foo, but it is currently taking the first input's schema directly, resulting in a schema of table1.foo.

This came up as an issue when trying to validate the unparser works correctly for UNION ALL statements, which I added in #10603

Slightly modifying the above example, if I add an ORDER BY clause to the input SQL:

SELECT table1.foo FROM table1
UNION ALL
SELECT table2.foo FROM table2
ORDER BY foo

Then the resulting unparsed SQL will be:

SELECT table1.foo FROM table1
UNION ALL
SELECT table2.foo FROM table2
ORDER BY table1.foo

Because the unparser takes the schema directly from the Union node when writing out the column expressions.

This is my second attempt to fix #10706, my first attempt was to remove the table qualifiers when building the Union plan directly, but that caused issues documented here: #10707 (comment)

This attempt scopes the problem to just removing the table qualifiers during the unparsing.

What changes are included in this PR?

Rewrite the LogicalPlan before unparsing to remove table identifiers from the Union All plan and any sort expressions that take a Union All plan as input.

Are these changes tested?

Yes, I added a test in the Unparser sql_integration test.

Are there any user-facing changes?

No

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @phillipleblanc -- I left some style comments / suggestions, but I think this PR does what it says it does and could be merged as is

Thank you very much for the contribution

datafusion/sql/src/unparser/rewrite.rs Outdated Show resolved Hide resolved
datafusion/sql/src/unparser/rewrite.rs Outdated Show resolved Hide resolved
datafusion/sql/src/unparser/rewrite.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@alamb alamb merged commit ed7c884 into apache:main Jun 24, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 24, 2024

Thanks again @phillipleblanc and nice work!@

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…e#11082)

* Add test cases for issue

* Remove test from logical_plan/builder.rs

* Remove table qualifiers for sorts following a Union

* Use transform_up and add more documentation on why this is needed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UNION ALL should strip table identifiers in its resulting schema
2 participants