[BUG] in polars-gpu, group_by(maintain_order=True) is not ordered #16893

Open
KazukiOnodera opened this issue Sep 24, 2024 · 0 comments · May be fixed by #16907

Steps/Code to reproduce bug

import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "random": np.random.rand(30_000),
        "groups": np.random.randint(100, size=30_000),
    }
)
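df = df.lazy()

# GPU engine: the groups come back in a different order than the input (the reported bug)
df.group_by("groups", maintain_order=True).agg(pl.col("random").sum()).collect(engine="gpu")

# CPU engine: reference result to compare against
df.group_by("groups", maintain_order=True).agg(pl.col("random").sum()).collect()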


Expected behavior
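The result should be the same as on the CPU engine: with maintain_order=True, the groups should appear in the order in which they first occur in the input.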
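
One way to check the expected ordering without comparing screenshots is a small pure-Python helper. This is only a sketch: `first_appearance_order` and `is_order_maintained` are hypothetical names, not part of polars or cudf-polars, and the key columns are assumed to have been converted to plain Python lists first.

```python
def first_appearance_order(keys):
    """Return the unique keys in the order they first occur in the input."""
    # dict preserves insertion order, so this de-duplicates while keeping order.
    return list(dict.fromkeys(keys))


def is_order_maintained(input_keys, result_keys):
    """True if result_keys lists the groups in first-appearance order."""
    return list(result_keys) == first_appearance_order(input_keys)


# Small illustrative input (not the data from the report).
example_input = [3, 1, 3, 2, 1]
print(is_order_maintained(example_input, [3, 1, 2]))  # order kept
print(is_order_maintained(example_input, [1, 2, 3]))  # order lost
```

With the reproducer above, the inputs could be obtained with something like `Series.to_list()` on the "groups" column of the original frame and of each collected result.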

Environment details

  • cudf-cu12-24.8.3
  • cudf-polars-cu12-24.8.3
  • cupy-cuda12x-13.3.0
  • polars-1.8.1
  • rmm-cu12-24.8.2
@KazukiOnodera KazukiOnodera added the bug Something isn't working label Sep 24, 2024
@wence- wence- added the cudf.polars Issues specific to cudf.polars label Sep 24, 2024
wence- added a commit to wence-/cudf that referenced this issue Sep 25, 2024
When we are requested to maintain order in groupby aggregations we
must post-process the result by computing a permutation between the
wanted order (of the input keys) and the order returned by the groupby
aggregation. To do this, we can perform a join between the two unique
key tables. Previously, we assumed that the gather map returned in
this join for the left (wanted order) table was the identity. However,
this is not guaranteed. In addition to computing the match between the
wanted key order and the key order we have, we must also apply the
permutation between the left gather map order and the identity.

- Closes rapidsai#16893
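
The idea in the commit message can be illustrated with a pure-Python sketch. This is not the cudf-polars implementation: every function here is a hypothetical stand-in, the "group-by" deliberately returns keys in a different order, and the "join" deliberately shuffles its matching pairs to mimic a left gather map that is not the identity.

```python
import random


def unordered_groupby_sum(keys, values):
    """Stand-in group-by: sums per key, output order unrelated to the input."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    out_keys = sorted(totals, reverse=True)  # arbitrary, non-input order
    return out_keys, [totals[k] for k in out_keys]


def inner_join_maps(left_keys, right_keys, seed=0):
    """Stand-in join: matching (left, right) row indices in arbitrary order."""
    pos = {k: i for i, k in enumerate(right_keys)}
    pairs = [(i, pos[k]) for i, k in enumerate(left_keys) if k in pos]
    random.Random(seed).shuffle(pairs)  # left map is generally NOT 0, 1, 2, ...
    return [l for l, _ in pairs], [r for _, r in pairs]


def groupby_sum_maintain_order(keys, values):
    got_keys, got_sums = unordered_groupby_sum(keys, values)
    wanted = list(dict.fromkeys(keys))  # unique keys in first-appearance order
    left_map, right_map = inner_join_maps(wanted, got_keys)
    # Using right_map directly would assume left_map is the identity.
    # Instead, scatter right_map through left_map to undo the join's ordering.
    gather = [0] * len(wanted)
    for l, r in zip(left_map, right_map):
        gather[l] = r
    return [got_keys[i] for i in gather], [got_sums[i] for i in gather]


ordered_keys, ordered_sums = groupby_sum_maintain_order(
    [3, 1, 3, 2, 1], [1, 2, 3, 4, 5]
)
print(ordered_keys, ordered_sums)
```

Scattering through the left map is one way to apply the extra permutation; sorting the pairs by their left index would be an equivalent choice.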
@wence- wence- self-assigned this Sep 26, 2024
wence- added a commit to wence-/cudf that referenced this issue Sep 27, 2024