ScalarValue::List not working as expected in UDAF state #8472
Comments
Maybe @jayzhan211 or @Weijun-H have some thoughts |
Note that ScalarValue::List only accepts a ListArray. In your example, it should be something like ListArray(Float64Array::from(floats)), where the ListArray has length 1. By the way, ScalarValue::Float64 is something like Float64Array::from(floats) where floats has length one. |
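To make the "length = 1" point concrete, here is a minimal std-only sketch. It is a toy model, not the arrow or DataFusion API: FlatArray and ToyListArray are hypothetical stand-ins, and the offsets layout is only assumed to mirror how a list array groups child values into rows. It shows that a flat array of n floats has n rows, while the same floats wrapped as one list form a single row, which is the shape a list-typed scalar needs.

```rust
// Toy model only: these types are hypothetical stand-ins,
// not the arrow or DataFusion API.

/// Stand-in for a flat Float64Array: one row per value.
struct FlatArray {
    values: Vec<f64>,
}

impl FlatArray {
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Stand-in for a ListArray: child values plus row offsets.
/// Row i spans values[offsets[i]..offsets[i + 1]].
struct ToyListArray {
    offsets: Vec<usize>,
    values: Vec<f64>,
}

impl ToyListArray {
    /// Wrap all floats into a single list row.
    fn single_row(values: Vec<f64>) -> Self {
        let offsets = vec![0, values.len()];
        ToyListArray { offsets, values }
    }

    /// Number of rows (lists), not number of child values.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the child values of one row.
    fn row(&self, i: usize) -> &[f64] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

fn main() {
    let floats = vec![1.5, 2.5, 4.0];

    let flat = FlatArray { values: floats.clone() };
    let list = ToyListArray::single_row(floats);

    println!("flat rows: {}", flat.len());
    println!("list rows: {}", list.len());
    println!("row 0: {:?}", list.row(0));
}
```

In the real library the wrapping step would be done with arrow's ListArray rather than a hand-rolled struct; the sketch only illustrates why passing the flat array directly does not match the expected shape.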
The docs say that it accepts an |
Alright, so after constructing a |
Agreed, I had thought about it but did not come up with a nice solution. The problem is that ScalarValue is an enum, not something created through a new() function, so we can only let it panic somewhere... |
Understood. Can the signature of |
I think this would be a very interesting idea to pursue. Thank you for the suggestion, @rspears74. Are you (or @jayzhan211) interested in doing so? If not, that is fine, and I will file another ticket with a more approachable description. However, if one of you who already has this context is going to do so, I won't bother writing it up more carefully :) |
I'll have to familiarize myself with the contribution guide, but I'd certainly like to take a whack at this @alamb |
Describe the bug
I am trying to use ScalarValue::List to store f64 values for the state in a UDAF. It was working well in datafusion version 32, but I upgraded to datafusion 33, only to find that the API had changed for ScalarValue::List to accept an ArrayRef rather than an Option<Vec<ScalarValue>>. I thought I had converted my code correctly, but I'm getting the following error:
First of all, I can't tell exactly where this error is coming from (and RUST_BACKTRACE=1 doesn't do anything), even after adding some log statements at the beginning and end of each Accumulator function (what I mean by this is that this error doesn't seem to be occurring "in the middle" of any of the Accumulator functions I've implemented).
I am serializing my list via ScalarValue::List(Arc::new(Float64Array::from(floats))) where floats: Vec<f64>
And deserializing my list via something like this:
Finally, my state_type in the create_udaf function is:
Via logging, I seem to get successful calls to state and update_batch (update_batch doesn't concern itself with the serialized state and probably isn't part of the issue). If the issue was in the state deserialization I'd expect to see a log statement from the beginning of merge_batch but not one from the end, but I'm not seeing any of my merge_batch log statements.
I'm not sure if I'm making a mistake somewhere, or if this is a bug. But it seems like somewhere in the guts of the Accumulator, something is not working correctly. Happy to help fix the bug if one can be identified.
To Reproduce
Define a UDAF Accumulator that uses ScalarValue::List to serialize its state, and use the aggregate function.
Expected behavior
Successful aggregation of the values into a new dataframe column.
Additional context
No response