Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for BigQuery table and view options #1061

Merged
merged 5 commits into from
Jan 23, 2024

Conversation

iffyio
Copy link
Contributor

@iffyio iffyio commented Dec 3, 2023

Extends the parser with BigQuery support for

  • Table/View level options
  • Column level options
  • Table creation configurations CLUSTER BY, PARTITION BY

Extends the parser with BigQuery support for
- Table/View level options
- Column level options
- Table creation configurations `CLUSTER BY`, `PARTITION BY`
@coveralls
Copy link

coveralls commented Dec 3, 2023

Pull Request Test Coverage Report for Build 7584334675

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 87.607%

Totals Coverage Status
Change from base Build 7575556197: -0.3%
Covered Lines: 19121
Relevant Lines: 21826

💛 - Coveralls

also replace custom trimming with concat
@iffyio iffyio force-pushed the bigquery-create-table-view-options branch from a345ec1 to 3b2dec1 Compare December 7, 2023 08:06
src/ast/mod.rs Outdated
@@ -2713,6 +2752,25 @@ impl fmt::Display for Statement {
if let Some(order_by) = order_by {
write!(f, " ORDER BY ({})", display_comma_separated(order_by))?;
}
if let Some(bigquery_config) = big_query_config {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names bigquery_config and big_query_config are different, is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right that wasn't intentional, updated!

@@ -72,6 +72,7 @@ pub struct CreateTableBuilder {
pub on_commit: Option<OnCommit>,
pub on_cluster: Option<String>,
pub order_by: Option<Vec<Ident>>,
pub big_query_config: Option<Box<BigQueryCreateTableConfiguration>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other dialects that will have similar variations on Create Table?
If yes, might be worth pulling this up to a more generic type - something like CreateTableConfiguration with variants for different dialects.

Might be a premature abstraction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense to do! I've updated to use an enum as a result. There are a couple fields on the struct like the hive_* ones that could use that format so it seems reasonable to introduce yeah

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @iffyio and I apologize for the late reivew (I typically batch the reviews of sqlparser PRs prior to release to maximize efficiency)

I left some suggestions, please let me know what you think

src/ast/ddl.rs Outdated
@@ -527,6 +528,29 @@ impl fmt::Display for ColumnDef {
}
}

/// Column definition for a view.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add some examples to this doc comment showing what is allowed (and not)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the description

src/ast/ddl.rs Outdated
/// ```
/// [1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#view_column_option_list
/// [2]: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#column_option_list
SqlOptions(Vec<SqlOption>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the SQL is OPTIONs what would you think about keeping the AST similar?

Suggested change
SqlOptions(Vec<SqlOption>),
Options(Vec<SqlOption>),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

src/ast/mod.rs Outdated
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub enum CreateTableConfiguration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you considered inlining these into Statement::CreateTable rather than adding a new structure?

Like

/// CREATE TABLE
CreateTable {
    or_replace: bool,
    temporary: bool,
    external: bool,
    /// A partition expression for the table.
    /// <https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#partition_expression>
    partition_by: Option<Expr>,
    /// Table clustering column list.
    /// <https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#table_option_list>
     cluster_by: Option<Vec<Ident>>,
    /// Table options list.
    /// <https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#table_option_list>
     options: Option<Vec<SqlOption>>,
...

I think that might be more consistent with other dialect specific fields as well as order_by and strict.

Having an extra struct here also may be confusing as it doesn't seem to mirror the structure of the SQL -- for example, it seems to imply to me that there is some sort of "CONFIG" keyword / clause that appears in the SQL when https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_statement seems to imply that PARTITION BY CLUSTER BY and OPTIONS are clauses of the CREATE TABLE statement itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I figured to have them inline initially, agree that makes it less confusing. The only issue was with serializing back to SQL text - since the fields in bigquery's case have a defined order (e.g PARTITION BY before CLUSTER BY), the inlined fields might end up not being reusable by other dialects if they show up at different parts of the CREATE statement. I'm not so sure if that's worth optimizing for though, so happy to just inline them if that's preferrable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an inlined version would be better to maintain consistency. If someone needs a different order when serializing for another dialect we can add that as a follow on PR perhaps.

If for some reason we need to keep the separate struct I think that would be acceptable if there were some doc comments explaining the rational (as that is not easy to understand from the code).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good! Will make updates to this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined!

@iffyio
Copy link
Contributor Author

iffyio commented Jan 19, 2024

@alamb sorry for the delay in following up on this - I've addressed the comments and it should be good for another pass when you get the chance to!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @iffyio -- this is looking great.

Also thanks again @tobyhede for your reviews

Some(vec![
SqlOption {
name: Ident::new("partition_expiration_days"),
value: Expr::Value(number("1")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for testing that the value came through as an Expr

@alamb alamb merged commit 3a6d3ec into apache:main Jan 23, 2024
10 checks passed
@iffyio iffyio deleted the bigquery-create-table-view-options branch July 16, 2024 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants