Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression by reverting Materialize dictionaries in group keys #8740

Merged
merged 4 commits into from
Jan 8, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 3, 2024

This closes #8738 by reverting #8291

I am not sure if this is the right fix -- I need to write some more tests / understand the problem more fully, but I wanted to get the PR up as I had the code.

To be clear, this change fixes a functional regression (a query that used to run no longer does), as descsribed in #8738

What I would like to do is to reopen #7647 and then we can work on reapply that change, ensuring the LogicalPlan and PhysicalPlan schemas match as well as maybe being more dilligent about performance testing

@alamb
Copy link
Contributor Author

alamb commented Jan 5, 2024

I believe we should proceed with this change, for the reasons explained in #7647 (comment)

@alamb alamb changed the title revert eb8aff7becaf5d4a44c723b29445deb958fbe3b4 / Materialize dictionaries in group keys revert Materialize dictionaries in group keys Jan 5, 2024
@alamb alamb marked this pull request as ready for review January 5, 2024 19:36
@tustvold
Copy link
Contributor

tustvold commented Jan 5, 2024

Do we have any empirical numbers to support this, recomputing dictionaries is extremely expensive and I would have thought it would outweigh any other overheads?

@alamb
Copy link
Contributor Author

alamb commented Jan 5, 2024

Do we have any empirical numbers to support this, recomputing dictionaries is extremely expensive and I would have thought it would outweigh any other overheads?

To be clear, the core rationale to revert this change is fix the functional regression (a query that used to run no longer does), as described in #8738

Once we have figure out how to avoid that functional regression with this change, we can also have a more reasonable discussion on performance. I will file a ticket to make some performance benchmarks

@alamb alamb changed the title revert Materialize dictionaries in group keys Fix regression by reverting Materialize dictionaries in group keys Jan 5, 2024
@alamb alamb requested a review from tustvold January 8, 2024 12:28
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this approach, but don't feel strongly

@alamb
Copy link
Contributor Author

alamb commented Jan 8, 2024

As I mentioned before, regressions in functionality I think take priority over (theorized) performance improvements which is what #8291 is

In retrospect, I should have insisted on some benchmarks showing improvements before I approved (and merged) #8291

To partly make up for this oversight, I plan to at least create said benchmarks so we can have a fact driven conversation rather than speculating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression "RowConverter column schema mismatch, expected Utf8 got Dictionary(Int32, Utf8)" after upgrade
2 participants