Add projection to FilterExec to avoid unnecessary output creation #5436

Closed
Dandandan opened this issue Mar 1, 2023 · 2 comments · Fixed by #12281
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@Dandandan
Contributor

Dandandan commented Mar 1, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, FilterExec filters and outputs all columns, including those present only in the filter expression. This does unnecessary work, as we might not need certain columns later.

Describe the solution you'd like
Add a projection to Filter/FilterExec and apply it before passing the RecordBatch to the arrow kernel.
This saves a bit of computation, especially when there is only one column in the projection (fast case).

Also update the schema based on the projection.
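
To illustrate the idea, here is a minimal sketch in plain Rust. It does not use the real DataFusion or arrow APIs: `Batch` and `filter_projected` are hypothetical stand-ins for a RecordBatch and the filter step, and columns are modeled as `Vec<i64>`. The point is only that the predicate mask is computed on the full input, while the row filtering (and the output schema) covers just the projected columns.

```rust
/// Toy stand-in for a RecordBatch: named columns of equal length.
/// (Hypothetical type for illustration, not the arrow struct.)
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

/// Keep the rows where `mask` is true, but only for the columns listed in
/// `projection`. With `None`, every column is filtered, which mirrors the
/// behavior described above as the current one.
fn filter_projected(batch: &Batch, mask: &[bool], projection: Option<&[usize]>) -> Batch {
    // Default projection: all column indices, in order.
    let all: Vec<usize> = (0..batch.columns.len()).collect();
    let indices: &[usize] = projection.unwrap_or(all.as_slice());

    // The output "schema" follows the projection.
    let names = indices.iter().map(|&i| batch.names[i].clone()).collect();

    // Only the projected columns are copied through the mask.
    let columns = indices
        .iter()
        .map(|&i| {
            batch.columns[i]
                .iter()
                .zip(mask)
                .filter(|(_, keep)| **keep)
                .map(|(v, _)| *v)
                .collect()
        })
        .collect();

    Batch { names, columns }
}
```

For a query shaped like `SELECT a FROM t WHERE b > 15`, the mask would be computed from column `b` and the call would pass `Some(&[0])`, so only `a` is materialized in the output.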

Describe alternatives you've considered

Additional context

@Dandandan Dandandan added enhancement New feature or request performance Make DataFusion faster labels Mar 1, 2023
@jackwener
Member

It's a great idea.

After it's finished, we should also do the corresponding optimization work in EliminateProjection and PushdownProjection.

@Dandandan
Contributor Author

Dandandan commented Mar 1, 2023

Thanks @jackwener, yes, I agree (the suggestion was actually inspired by your recent PR, thanks for that).

Filter should also update the output schema based on projection, so EliminateProjection should be able to use the schema based on the change there.
