Add ListBuilder::with_field to support non nullable list fields (#5330) #5331

Merged (3 commits) on Jan 25, 2024

Conversation

tustvold
Contributor

Which issue does this PR close?

Closes #5330

Rationale for this change

This lets users override the Field used by ListBuilder, allowing overriding the nullability, name, and other metadata. It also switches over to using the safer ListArray constructors.

What changes are included in this PR?

Are there any user-facing changes?

github-actions bot added the "arrow" label (Changes to the arrow crate) on Jan 24, 2024
///
/// By default a nullable field is created with the name `item`
///
/// Note: [`Self::finish`] and [`Self::finish_cloned`] will panic if the provided data type does not match `T`
Contributor Author

The ergonomics of this aren't ideal, but aside from adding a data_type method to ArrayBuilder I'm not sure of a way around this.
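To make the trade-off concrete, here is a minimal, self-contained sketch of the behaviour the doc comment describes: the override is stored without validation, and a data type mismatch only surfaces as a panic when the builder is finished. `SketchType`, `SketchField`, `SketchListBuilder`, and `finish_field` are hypothetical stand-ins invented for illustration; this is not the arrow-rs implementation. The sketch also assumes the values type is known up front, which is exactly what the real builder lacks without a `data_type` method on `ArrayBuilder`.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow data type (hypothetical, greatly simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum SketchType {
    Int32,
    Utf8,
}

/// Stand-in for an Arrow field: a name, a data type, and a nullability flag.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchField {
    pub name: String,
    pub data_type: SketchType,
    pub nullable: bool,
}

/// Stand-in for a list builder whose child values all have type `values_type`.
pub struct SketchListBuilder {
    values_type: SketchType,
    field: Option<Arc<SketchField>>,
}

impl SketchListBuilder {
    pub fn new(values_type: SketchType) -> Self {
        Self { values_type, field: None }
    }

    /// Store an override for the list's child field. Nothing is checked here.
    pub fn with_field(mut self, field: SketchField) -> Self {
        self.field = Some(Arc::new(field));
        self
    }

    /// Resolve the child field, falling back to a nullable field named "item".
    /// Panics if an overridden field's data type differs from the values type.
    pub fn finish_field(&self) -> Arc<SketchField> {
        let field = match &self.field {
            Some(f) => f.clone(),
            None => Arc::new(SketchField {
                name: "item".to_string(),
                data_type: self.values_type.clone(),
                nullable: true,
            }),
        };
        assert_eq!(
            field.data_type, self.values_type,
            "field data type does not match the builder's values"
        );
        field
    }
}
```

In this model, `SketchListBuilder::new(SketchType::Int32).finish_field()` yields a nullable field named `item`, while handing a `Utf8` field to an `Int32` builder panics only once `finish_field` runs.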

}
}

/// Override the field passed to [`GenericListArray::new`]
Contributor Author

This avoids duplicating the docs about what the implications of different nullability options are

let array_data = unsafe { array_data_builder.build_unchecked() };
let field = match &self.field {
    Some(f) => f.clone(),
    None => Arc::new(Field::new("item", values.data_type().clone(), true)),
Contributor

@dvic dvic Jan 25, 2024

Suggested change
- None => Arc::new(Field::new("item", values.data_type().clone(), true)),
+ None => Arc::new(Field::new_list_field(values.data_type().clone(), true)),

Minor nitpick, but noticed this function for creating the "default" list item field.

Contributor Author

This would create a field of DataType::List which isn't what we want here

Contributor

Oh, I must have looked at the wrong thing, the docs say:

assert_eq!(
  Field::new("item", DataType::Int32, true),
  Field::new_list_field(DataType::Int32, true)
);

https://docs.rs/arrow-schema/latest/arrow_schema/struct.Field.html#method.new_list_field

Contributor

Ahh, could it be that you've mistaken new_list_field for new_list? https://docs.rs/arrow-schema/latest/arrow_schema/struct.Field.html#method.new_list

Contributor Author

Oh I didn't realise the field version got added in the end 😅. I'm old fashioned and prefer the explicit form, but good suggestion

Contributor

Yeah, I think there are two camps on the question of "do I know / want to know what the name of the anonymous field in the list is".

Specifically,

  1. If you are familiar with the convention, then it is better to have the code explicit.
  2. If you don't know the "item" convention, having a function hide it is easier to understand.
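
The mix-up earlier in this thread is easier to see side by side. The sketch below models the two constructors with hypothetical stand-in types (`ToyType`, `ToyField`), following the behaviour shown in the linked docs: `new_list_field` is shorthand for a child field named `item`, whereas `new_list` builds a field whose own data type is a list. It is an illustration only, not the arrow-schema source.

```rust
/// Stand-in for an Arrow data type (hypothetical, greatly simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyType {
    Int32,
    /// A list whose elements are described by the boxed child field.
    List(Box<ToyField>),
}

/// Stand-in for an Arrow field: a name, a data type, and a nullability flag.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyField {
    pub name: String,
    pub data_type: ToyType,
    pub nullable: bool,
}

impl ToyField {
    /// The explicit form: the caller spells out the name.
    pub fn new(name: &str, data_type: ToyType, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type, nullable }
    }

    /// Models `Field::new_list_field`: hides the conventional "item" name
    /// but leaves the data type untouched.
    pub fn new_list_field(data_type: ToyType, nullable: bool) -> Self {
        Self::new("item", data_type, nullable)
    }

    /// Models `Field::new_list`: the resulting field's own type is a list
    /// wrapping `value`, which is not what a list's child field should be.
    pub fn new_list(name: &str, value: ToyField, nullable: bool) -> Self {
        Self::new(name, ToyType::List(Box::new(value)), nullable)
    }
}
```

Under this model, `ToyField::new_list_field(ToyType::Int32, true)` compares equal to `ToyField::new("item", ToyType::Int32, true)`, matching the `assert_eq!` quoted from the docs, while `new_list` produces a field with a different data type altogether.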

alamb changed the title from "Add ListBuilder::with_field (#5330)" to "Add ListBuilder::with_field to support non nullable list fields (#5330)" on Jan 25, 2024
Contributor

@alamb alamb left a comment

LGTM - thank you @tustvold and @dvic

#[test]
fn test_non_nullable_list() {
let field = Arc::new(Field::new("item", DataType::Int32, false));
Contributor

Would it make sense to also add a test for a field that is not named "item"?

- fn test_non_nullable_list() {
- let field = Arc::new(Field::new("item", DataType::Int32, false));
+ fn test_with_field() {
+ let field = Arc::new(Field::new("bar", DataType::Int32, false));
Contributor

👍

@tustvold tustvold merged commit 8fff5e4 into apache:master Jan 25, 2024
25 checks passed
Labels
arrow Changes to the arrow crate
Development

Successfully merging this pull request may close these issues.

Support Creating Non-Nullable Lists with ListBuilder
3 participants