Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: data types in array_expressions #7280

Merged
merged 11 commits into from
Aug 16, 2023
Merged

refactor: data types in array_expressions #7280

merged 11 commits into from
Aug 16, 2023

Conversation

izveigor
Copy link
Contributor

@izveigor izveigor commented Aug 14, 2023

Which issue does this PR close?

Part of #7279
Part of #7288

Rationale for this change

We can use the template decorator (call_array_function) to reduce code (DRY method). I use scoping tactic (See examples: https://veykril.github.io/tlborm/decl-macros/minutiae/scoping.html)

What changes are included in this PR?

Are these changes tested?

No

Are there any user-facing changes?

Yes

@github-actions github-actions bot added the physical-expr Physical Expressions label Aug 14, 2023
@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 14, 2023
@izveigor izveigor marked this pull request as draft August 14, 2023 20:54
@izveigor izveigor marked this pull request as ready for review August 15, 2023 08:22
@izveigor
Copy link
Contributor Author

@alamb @jayzhan211 I wonder if you have time to review the cleanup.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM


let mut boolean_builder = BooleanArray::builder(array.len());
for (arr, sub_arr) in array.iter().zip(sub_array.iter()) {
if let (Some(arr), Some(sub_arr)) = (arr, sub_arr) {
let res = match (arr.data_type(), sub_arr.data_type()) {
(DataType::List(_), DataType::List(_)) => {
let res = match arr.data_type() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to only check arr type not sub_arr type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, check_datatypes checks the equality of arr and sub_arr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

"Array_append is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
)))
}
check_datatypes("array_append", &[arr.values(), &args[1]])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use element instead of &args[1]

)))
}
};
check_datatypes("array_positions", &[arr.values(), &args[1]])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use element instead of &args[1]

};
($DATATYPE:expr, $INCLUDE_LIST:expr) => {{
match $DATATYPE {
DataType::List(_) => array_function!(ListArray),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better if we use a flag to merge these two in one?

Suggested change
DataType::List(_) => array_function!(ListArray),
DataType::List(_) => if $INLCUDE_LIST {array_function!(ListArray)} else {
unreachable!()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not work with macro_rules! because every function (even with if) will consider List is one of possible ways to compute function (some functions can not work with List).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor -- very nice 👌

my only concern is about the change to the tests, otherwise this PR is a clear improvement in my mind

I do think we can make things even better (as a follow on PR) but I think we should move forward with this one

))),
macro_rules! array_function {
($ARRAY_TYPE:ident) => {
slice!(list_array, key, extra_key, return_element, $ARRAY_TYPE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code certainly looks nicer -- thank you @izveigor

I think this function still generates an individual function for each potential ListArray type, when it is fundamentally an operation on indices (and thus we don't need specialized functions for Strings, Int8s, Int16s, Int32s, etc)

However this is getting closer to a single implementation so I think it is improvement to what was here previously

# array_append scalar function #1
query ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a regression in functionality (aka something that used to work no longer does)? Is it hard to fix prior to merging this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now working on the problem of empty arrays, so I removed the functionality for a while. But I agree with you that it's better to leave the current null handling for now.


let array = array_append(&args);

assert!(array.is_err());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also want to check that the error is as expected with something like:

Suggested change
assert!(array.is_err());
assert!(array.unwrap_err().to_string(), "Expected error message");

@github-actions github-actions bot removed the core Core DataFusion crate label Aug 15, 2023
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Aug 15, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor

@alamb
Copy link
Contributor

alamb commented Aug 16, 2023

🤔 the CI appears to be failing now

External error: query failed: DataFusion error: Optimizer rule 'simplify_expressions' failed
caused by
Error during planning: array_append received incompatible types: '[Null, Int64]'.
[SQL] select array_append(make_array(), 4);
at test_files/array.slt:868

Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`

Caused by:
  process didn't exit successfully: `/__w/arrow-datafusion/arrow-datafusion/target/debug/deps/sqllogictests-0cd27f6536ed60ad` (exit status: 1)
Error: Process completed with exit code 1.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 16, 2023
@izveigor
Copy link
Contributor Author

@alamb I'm sorry, but we can not avoid regression with array_append and array_prepend right now. I think I can fix it in the next batch of PRs.

@alamb
Copy link
Contributor

alamb commented Aug 16, 2023

I took the liberty of merging up to this branch to resolve conflicts between this and #7293

@alamb alamb merged commit 3702aba into apache:main Aug 16, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants