feat: added system test and sample for dataframe contains array #365

Closed

Conversation

HemangChothani (Contributor)

Fixes #19

@HemangChothani requested review from tswast and a team on November 4, 2020 at 12:28
@HemangChothani requested a review from a team as a code owner on November 4, 2020 at 12:28
The google-cla bot added the "cla: yes" label (This human has signed the Contributor License Agreement) on Nov 4, 2020
snippet-bot bot commented Nov 4, 2020

Here is the summary of changes.

You added 1 region tag.

The product-auto-label bot added the "api: bigquery" label (Issues related to the googleapis/python-bigquery API) on Nov 4, 2020
-    # pyarrow 1.0.0 is required for the use of timestamp_as_object keyword.
-    "pyarrow >= 1.0.0, < 2.0dev",
+    # pyarrow 2.0.0 is required for the use of arrays in dataframe to load the table.
+    "pyarrow >= 2.0.0, < 3.0dev",
Contributor (review comment):

Let's not bump the minimum version here. Most features do work with 1.0, and pyarrow is a core library for which it's very useful to support a wide range of versions.
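
One way to read this suggestion, sketched below as a hypothetical runtime guard rather than anything from this PR: keep the `pyarrow >= 1.0.0` floor in setup.py and only insist on 2.0+ when a DataFrame actually contains array (list-valued) columns.

```python
import pyarrow
from packaging import version

# Hypothetical guard (not part of this PR): the setup.py floor stays at
# pyarrow >= 1.0.0, and the array-loading code path raises a clear error
# when an older pyarrow is installed.
def _check_pyarrow_supports_arrays():
    if version.parse(pyarrow.__version__) < version.parse("2.0.0"):
        raise ValueError(
            "Loading DataFrame columns that contain arrays requires "
            "pyarrow>=2.0.0; found pyarrow " + pyarrow.__version__
        )
```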

None,
(
    bigquery.SchemaField(
        "item", "INTEGER", "NULLABLE", None, (), None
Contributor (review comment):

Hmm... This is a bit of a surprising schema. It appears to match the behavior we were encountering previously. This feature is not supported if we cannot upload directly to a REPEATED INTEGER column.
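
For contrast, here is a minimal sketch (an assumption on my part, not code from this PR) of the schema one would expect if a column of integer arrays could be uploaded directly, assuming the column is named "A" as in the sample below:

```python
from google.cloud import bigquery

# Expected shape (sketch): a single top-level REPEATED INTEGER field,
# rather than the nested "item" record produced by Parquet serialization.
expected_schema = [bigquery.SchemaField("A", "INTEGER", mode="REPEATED")]
```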

from google.cloud import bigquery
import pandas

client = bigquery.Client()
# table_id = "your-project.your_dataset.your_table_name"

dataframe = pandas.DataFrame({"A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
job = client.load_table_from_dataframe(dataframe, table_id)  # Make an API request.
Contributor (review comment):

Without an explicit schema, this sample is no different from the generic load_table_from_dataframe sample.

I was imagining system test XOR sample, as they are testing the same behavior.
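
A minimal sketch of what an explicit-schema variant of the sample could look like; the REPEATED INTEGER schema and the job_config wiring here are my assumptions, not code from this PR, and per the comment below arrays of scalars may still fail to load even with a schema supplied:

```python
from google.cloud import bigquery
import pandas

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table_name"  # TODO(developer): set this.

dataframe = pandas.DataFrame({"A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# An explicit REPEATED INTEGER schema is what would distinguish this from
# the generic load_table_from_dataframe sample.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("A", "INTEGER", mode="REPEATED")]
)
job = client.load_table_from_dataframe(dataframe, table_id, job_config=job_config)
job.result()  # Wait for the load job to complete.
```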

tswast (Contributor) commented on Nov 4, 2020

I've sent #368 to capture just the desired setup.py changes.

It's possible there are some kinds of arrays (such as arrays of records) that are supported, but it appears arrays of scalars still aren't handled correctly.
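
For anyone hitting this, one possible workaround (my assumption, not something proposed in this thread) is to route the rows through load_table_from_json, which accepts REPEATED fields directly and sidesteps the Parquet path:

```python
from google.cloud import bigquery
import pandas

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table_name"  # hypothetical table ID

dataframe = pandas.DataFrame({"A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Convert to row dicts and load as JSON with an explicit REPEATED schema.
rows = dataframe.to_dict(orient="records")
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("A", "INTEGER", mode="REPEATED")]
)
load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
load_job.result()
```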

@tswast closed this on Nov 5, 2020
Labels
api: bigquery (Issues related to the googleapis/python-bigquery API)
cla: yes (This human has signed the Contributor License Agreement)

Successfully merging this pull request may close these issues:
BigQuery: Upload pandas DataFrame containing arrays

2 participants