[Python][Acero] Provide method to perform aggregations with acero for datasets #44168

Open
sidneymau opened this issue Sep 19, 2024 · 1 comment


sidneymau commented Sep 19, 2024

Describe the enhancement requested

Currently, Dataset provides several methods that run on Acero: sort_by, join, and join_asof. It would be especially helpful to also provide a method for performing aggregations on datasets with Acero, for convenient out-of-core processing.

The implementation can be modeled on the existing Acero-backed Dataset operations as well as the aggregate method of TableGroupBy.

Component(s)

Python

sidneymau (Author) commented

Note that the implementation proposed in the above PR ends up being fairly inefficient because it can't fully leverage Acero nodes for, e.g., projections and filtering. If there is interest, this functionality could be included, which would essentially mean providing a dataframe-like interface for constructing an Acero plan, but that is somewhat larger in scope. I made a first effort at this for my own use: https://github.com/sidneymau/dataplan
