Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(bigquery/storage/managedwriter/adapt): add schema -> proto support #4375
feat(bigquery/storage/managedwriter/adapt): add schema -> proto support #4375
Changes from 8 commits
d802075
1bf877e
df9faa8
3cbe9c2
03abf14
b353af0
e30b336
c82dec6
de0c252
7ea758f
af9cb2f
66faea3
a170306
b94b29c
c257d01
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a Timestamp type in proto3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/protocolbuffers/protobuf/tree/master/src/google/protobuf, but this comes down to what the backend accepts for each column type. If the backend will convert timestamp protos, we can use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we haven't supported it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we default nullable to wrapper? Should we annotate the field with use_defaults instead?
https://screenshot.googleplex.com/3kHa8QbkhmgFsuj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant to include a bunch of internal type annotations for driving behavior here.
We have no stable source of truth for them. They're published within the zetasql project, but that project doesn't make any guarantees about stability and has no GA release, so the route to a GA launch is...probably problematic.
The reason we're using proto3 semantics is external users live in a proto3 ecosystem. In that world, the wrapper types are how to properly communicate nulls. I don't know that it matters, but type_annotations.proto is still proto2 based, so it's possible it introduces other issues as a side effect. We could consider revisiting this route down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think frontend supports wrapper to nullable field conversion (we probably should). So currently if you convert these to wrapper, they will try map themselves back as a struct. In order to support such converter, the field needs to be annotated as is_wrapper. I understand that zetasql is not usable, but we can open backdoors for your self-defined annotation. Introducing the default behavior of mapping wrapper to nullable field would be a breaking change now.
Note that our API actually accepts proto2 instead of proto3 as the schema descriptor, inside of the converter, we look at the default_value field, if it is set then we set the default value, if not set, then we set null. You can surely set it on the ProtoSchema explicitly without having to introduce this mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, you're making users choose between being unable to send nulls, or being unable to send default values (empty string, 0, etc) when communicating the ProtoSchema?
Do you want me to create an issue for supporting wrapper types on the internal converter, or is there one already? An advantage of using the well known wrapper types is to avoid the need for special annotations, since both the client and backend have the definitions in place as part of the whole protocol buffer ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to support the wrapper, we would need the is_wrapper annotation, otherwise it is a breaking change to existing conversions.
As to your first question, we don't have to make the choice if we are using proto2 (yeah, going back to that question). If it is proto 3 then there seems to be no other choices.
you can create an issue to support the wrapper, you are even welcomed to just go ahead and fix it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created b/193064992 for the wrapper issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the final solution would be for zetasql to be published and everything conform with the current googlesql specification (instead of creating our own wheels). If that is not possible, adding some fields to ProtoSchema would be fine, less ideal since it would be per schema instead of per msg/field option.
Since you are applying this to all nullable fields, it would be better for the backend to first support it before you add the wrapper conversion, otherwise, the library would be unusable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a toggle to suppress generation of wrapper types, so we can revisit this once we have a path forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also looking at the
is_wrapper
annotation you mentioned (googlesql MessageOption extension). I don't think that's going to be the right way to do this, as the expectation is that you "own" the message and can annotate it appropriately. These types are provided as part of the standard protocol buffer definitions, so adding annotations seems dodgy. I could see adding a FieldOption extension where you effectively say "the message in this field is a wrapper", but the "this message is a wrapper" annotation seems mismatched for standard types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the annotation should belong to the field.