Support Creating Non-Nullable Lists with ListBuilder #5330

Closed
dvic opened this issue Jan 24, 2024 · 3 comments · Fixed by #5331
Labels
arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)

Comments


dvic commented Jan 24, 2024

Describe the bug
When I try to write a non-null List<non-null i32> to a parquet file, I get an error. Might be related to #385.

To Reproduce

use std::{error::Error, fs::File, sync::Arc};

use ::arrow::record_batch::RecordBatch;
use arrow::{
    array::{Int32Builder, ListBuilder},
    datatypes::{DataType, Field, Schema},
};

use parquet::arrow::arrow_writer::ArrowWriter;

pub fn main() -> Result<(), Box<dyn Error>> {
    let mut builder = ListBuilder::new(Int32Builder::new());

    builder.append_value([Some(1), Some(2), Some(3)]);

    let array = builder.finish();

    let inner_nullable = false;

    let schema = Arc::new(Schema::new(vec![Field::new_list(
        "col_a",
        Field::new_list_field(DataType::Int32, inner_nullable),
        false,
    )]));

    let batch = RecordBatch::try_new(schema, vec![Arc::new(array)])?;

    let buf = File::create("test.parquet")?;
    let mut writer = ArrowWriter::try_new(buf, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;

    Ok(())
}

Expected behavior
I expect no errors, but instead I get:

Error: InvalidArgumentError("column types must match schema types, 

expected
List(Field { name: \"item\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })

but found 
List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })

at column index 0")

dvic added the bug label on Jan 24, 2024

tustvold commented Jan 24, 2024

This is actually a known limitation of ListBuilder, where it can only produce nullable lists. I think there is already an issue about this, but I can't find it.

You will need to use the ListArray constructors, potentially with the components produced by ListBuilder.

Edit: the feature is actually pretty straightforward so bashed it out in #5331

tustvold added the enhancement label and removed the bug label on Jan 24, 2024
tustvold changed the title from "Cannot write parquet file with list of non-nullable item" to "Support Creating Non-Nullable Lists with ListBuilder" on Jan 24, 2024
tustvold added a commit to tustvold/arrow-rs that referenced this issue Jan 24, 2024

dvic commented Jan 25, 2024

> Edit: the feature is actually pretty straightforward so bashed it out in #5331

Thanks! I actually just wanted to propose something like this :) (I saw the todo line at https://docs.rs/arrow-array/50.0.0/src/arrow_array/builder/generic_list_builder.rs.html#313)

tustvold added a commit that referenced this issue Jan 25, 2024
… (#5331)

* Add ListBuilder::with_field (#5330)

* Tweak docs

* Review feedback
tustvold added the arrow label on Mar 1, 2024

tustvold commented Mar 1, 2024

label_issue.py automatically added labels {'arrow'} from #5331
