Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider table scan filter during analysis of optimize projections #9131

Closed
wants to merge 3 commits into from
Closed

Consider table scan filter during analysis of optimize projections #9131

wants to merge 3 commits into from

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #9109.

Rationale for this change

What changes are included in this PR?

This PR fixes the bug described in the issue

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Feb 5, 2024
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mustafasrepo do you think it is #5436 related?

@mustafasrepo
Copy link
Contributor Author

Thanks @mustafasrepo do you think it is #5436 related?

I think that PR is related to the projection functionality inside FilterExec. But I am not sure though.
I think, currently we can eliminate unnecessary fields in the filter expression by adding projection after FilterExec such as:

ProjectionExec(a)
--FilterExec(a+b=2)

However, in the issue I am not sure this is the desired feature.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember we had this discussion in the past (maybe with @Dandandan ) about if columns referenced only in pushed down projections should also appear in the projection of the scan or if the table scan implementations should have to figure that out

I think the consensus was that the projection should only reflect which columns are actually needed at the output of the TableProvider. This allows for more optimized pushdown in parquet (e.g. can skip materializing filter-only columns if we can rule out the entire page using page pruning)

In fact the documentation of TableProvider says explicitly

In certain cases, a query may only use a certain column in a Filter that has been completely pushed down to the scan. In this case, the projection will not contain all the columns found in the filter expressions.

Which I think is contrary to what this PR does.

Thus, I am not sure if this is a good change.

If we do make this change, we should also update the documentation to reflect what the code does.

@ozankabak
Copy link
Contributor

ozankabak commented Feb 6, 2024

Thus, I am not sure if this is a good change.

I agree with @alamb. I talked to @mustafasrepo and I am not sure if #9109 is a bug.

@mustafasrepo
Copy link
Contributor Author

Thus, I am not sure if this is a good change.

If we do make this change, we should also update the documentation to reflect what the code does.

After these discussions, I agree that this change is not for the better. Hence I am closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong projection after optimize_projections rule
4 participants