Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake column comments #2321

Merged
merged 16 commits into from
May 5, 2020

Conversation

sfc-gh-aseitz
Copy link
Contributor

@sfc-gh-aseitz sfc-gh-aseitz commented Apr 14, 2020

resolves:

Description

Adding in support for Persist Docs into the Snowflake adapter and laying out a general framework for other adapters to support this feature.

Some caveats:

  1. It only supports table materialization currently. Incremental materializations will not be refreshed with new docs until further logic is added to them. Additionally, views in Snowflake do not support altering column comments and must be created when the view is created.
  2. Snowflake uses $$ string literal $$ to make ingesting strings with quotes easy. Therefore in your docs, if you have a $ we will replace it with [$] such that you never have two $$ next to each other.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot
Copy link

cla-bot bot commented Apr 14, 2020

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @snowflakeseitz

@beckjake
Copy link
Contributor

@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Apr 14, 2020
@cla-bot
Copy link

cla-bot bot commented Apr 14, 2020

The cla-bot has been summoned, and re-checked this pull request!

@beckjake
Copy link
Contributor

Additionally, views in Snowflake do not support altering column comments and must be created when the view is created.

Just out of curiosity, how does one do that? I can see how to set a comment for the whole view, but not for columns. Is the same syntax viable for tables?

@sfc-gh-aseitz
Copy link
Contributor Author

@beckjake exactly, you would have to inject it into the SQL statement itself:
https://support.snowflake.net/s/article/How-To-Add-a-Comment-to-the-Column-of-a-View

@sfc-gh-aseitz
Copy link
Contributor Author

@drewbanin let me know what I need to get this over the finish line :)

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @snowflakeseitz - I just dropped some comments inline here. Let me know what you think :)

It would be really good to add some tests here! Can you think of a good way to test this functionality? I can imagine an integration test which:

  1. Creates a table/view with comments
  2. Runs some information schema queries to ensure the comments have been set successfully


{% endmacro %}

{% macro snowflake__create_view_as(relation, sql) -%}
{%- set secure = config.get('secure', default=false) -%}
{%- set copy_grants = config.get('copy_grants', default=false) -%}
{%- set sql_header = config.get('sql_header', none) -%}
{%- set raw_persist_docs = config.get('persist_docs', {}) -%}
{%- set relation_comment = get_relation_comment(raw_persist_docs, model) -%}
{%- set column_comment = get_relation_column_comments(raw_persist_docs, model) -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it appears that we're not going to be able to set column comments on views, I do think we can set a view-level comment!

Can you add a call to snowflake__alter_relation_comment at the bottom of this macro to render the SQL required to add a view comment?

@@ -156,3 +172,19 @@
alter table {{ relation }} alter {{ adapter.quote(column_name) }} set data type {{ new_column_type }};
{% endcall %}
{% endmacro %}

{% macro snowflake__alter_relation_comment(relation, relation_comment) -%}
comment on table {{ relation }} IS $${{ relation_comment | replace('$', '[$]') }}$$;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use relation.type instead of table here? That will let us use this macro for both tables and views :)

Same for the implementation of snowflake__alter_column_comment below!

@drewbanin
Copy link
Contributor

hey @snowflakeseitz - do you think you'll be able to revisit this PR? I think we can indeed proceed without support for column-level comments on views for now, and there are certainly ways we can add support for them in the future too. Let me know if you're able to respond to the comments I dropped inline in this PR :)

@sfc-gh-aseitz
Copy link
Contributor Author

@drewbanin I fell off the face of the earth last week with work :) I will make these changes and test them by tomorrow

@drewbanin
Copy link
Contributor

@jtcohen6 can you give this a review tomorrow? Please loop in @beckjake if you have any questions :)

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes local testing for me. I left one comment around an argument naming inconsistency, though it doesn't actually raise an error.

Discussion: where should this happen?

Does it make sense that the comment on + alter statements should be called by snowflake__create_table_as and snowflake__create_view_as, as opposed to the Snowflake table and view materializations?

On BigQuery, we can embed the object description within the create statement's options(). If we think that BQ should set the standard, then it makes sense that we'd append comment on + alter where there are here, since it's closest to parity.

If we instead say that the standard should be Postgres/Redshift/Snowflake, where we're effectively adding descriptions by post hooks, I think it would make more sense (and be more modular) to move config.get('persist_docs'), the calls to get_relation_comment/get_relation_column_comment, and alter_relation_comment/alter_column_comment into the materialization instead, and treat them like hooks.

(I know that Snowflake clustering operates as a series of alter statements within the tight bound of snowflake__create_table_as, but I've always viewed that as a bit of a hack.)

I don't think that line of thinking should block this PR, which is just about good to go. Since an ulterior goal is scoping out future contributions that might add this functionality on Postgres and Redshift, I do think we're setting some precedent here.

Future work

@drewbanin We discussed yesterday two possible approaches for implementing column comments for the Snowflake view materialization. I imagine we'll scope that for a future PR, building off the work here.

It also looks like this branch is now 59 commits ahead of (and several thousand line diffs away from) dev/octavius-catto. I just want to confirm that that's a result of those commits having been squashed in the base branch.

@@ -16,3 +16,19 @@
{% endif %}

{% endmacro %}


{% macro get_relation_column_comments(persist_docs, model) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the exception below would return an error, because the macro argument is named persist_docs but the exception expects raw_persist_docs.

As it is, I haven't been able to actually get this exception/error. It's preempted by a parsing error, now that model + project configs have tighter type enforcement:

Invalid config field: "persist_docs" must be a dict

All the same, the argument should be named raw_persist_docs or persist_docs consistently. I'm in favor of raw_persist_docs here, since that's what you pass from create_table_as.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved, I stuck with persist_docs because that what was used before my PR

CHANGELOG.md Outdated Show resolved Hide resolved
@beckjake
Copy link
Contributor

Discussion: where should this happen?

Let me chime in with my $0.02 here, since I'm likely going to be doing the postgres/redshift one. When I first thought about this, I was firmly in favor of this happening in a special macro called by materializations, similar to how hooks work, and letting BigQuery be the weird special case.

But then I realized that create_X_as macros have kind of misleading names. Those macros are always going to act with the current model's config settings, regardless of the relation name you create. If you call create_table_as in bigquery with a Relation that isn't the model being materialized, your created relation will have docs corresponding to the model being materialized, even if it makes no sense! Same goes for labels. In the interest of consistency, maybe it makes a lot of sense to put this stuff into the create_X_as macros.

It also makes me think maybe we should rename these macros, but I would really prefer not to do that, plugins have just seen so much churn lately.

@jtcohen6
Copy link
Contributor

Those macros are always going to act with the current model's config settings, regardless of the relation name you create.

That's a really good point. Insofar as create_x_as macros are pulling directly from the model config, they are more than just boilerplate DDL wrappers.

I recall having discussed the desire for a future state in which only materializations pull directly from model config, and pass those values around to macros as keyword arguments. I don't know if that's still something we want, or if we want it enough to make meaningful steps in that direction—or, in this case, to not make meaningful steps away from that direction.

@beckjake
Copy link
Contributor

I recall having discussed the desire for a future state in which only materializations pull directly from model config

You'd have to pass something suspiciously similar to the full generic model config to a lot of these things if you wanted to handle per-adapter differences nicely while allowing third parties to create their own custom adapter-generic materializations that pass along adapter-specific configs cleanly. I suppose you could have a generic adapter_macro that takes arbitrary keyword arguments and/or a model, returns some sort of object that you should then pass to create_table_as. So materializations would do:

{# get_create_table_opts calls return adapter_macro('get_create_table_opts', model=model, **kwargs) #}
{% set create_table_opts = get_create_table_opts(model=model) %} 
{% do run_query(create_table_as(tmp_relation, sql, create_table_opts) %}

Or, instead of a macro maybe you'd just pass in the global config object and give materialization authors an easy way to make their own "config"s to pass to create_table_as?

I'm not sure what that means for this PR, but I'm inclined to accept its answer of doing what bigquery already does and assuming that create_X_as is only being run for the currently-being-materialized model. I think revisiting that assumption is also totally on the table, we should just do it somewhere else (and before 1.0).

@sfc-gh-aseitz
Copy link
Contributor Author

@jtcohen6 I resolved the rebase issue from above, so the PR should be much cleaner
@beckjake happy to make further modifications if you think necessary or if it is worth waiting for the future

{{ return(adapter_macro('alter_column_comment', relation, column_dict)) }}
{% endmacro %}

{% macro default__alter_column_comment(relation, column_name, new_column_type) -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be relation, column_dict, not relation, column_name, new_column_type!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is fixed now

@drewbanin
Copy link
Contributor

@snowflakeseitz FYI these tests are failing! I can take a deeper look and follow up with some more info about what's going wrong here

@beckjake
Copy link
Contributor

beckjake commented May 1, 2020

@snowflakeseitz I had the same issue you did when I was working on #2333 #2378! What I found was that the incremental model uses this, which doesn't have a type. My solution was to put {% set relation = relation.incorporate(type='view') %} in the create_view_as macro implementation before I called the comment-setting methods, and {% set relation = relation.incorporate(type='table') %} in the create_table_as macro.

Here's the relevant commit

@sfc-gh-aseitz
Copy link
Contributor Author

sfc-gh-aseitz commented May 4, 2020

@beckjake good catch, I just committed your suggestion!

Comment on lines +134 to +156


{% macro alter_column_comment(relation, column_dict) -%}
{{ return(adapter_macro('alter_column_comment', relation, column_dict)) }}
{% endmacro %}

{% macro default__alter_column_comment(relation, column_dict) -%}
{{ exceptions.raise_not_implemented(
'alter_column_comment macro not implemented for adapter '+adapter.type()) }}
{% endmacro %}

{% macro alter_relation_comment(relation, relation_comment) -%}
{{ return(adapter_macro('alter_relation_comment', relation, relation_comment)) }}
{% endmacro %}

{% macro default__alter_relation_comment(relation, relation_comment) -%}
{{ exceptions.raise_not_implemented(
'alter_relation_comment macro not implemented for adapter '+adapter.type()) }}
{% endmacro %}




Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowflakeseitz Can you remove these, or the ones from the bottom of the file and the comment on the other ones? I merged the redshift/postgres PR and I must have messed it up, git didn't detect the duplicate changes as I expected it to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I just rebase to your changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I think you'll have to remove them during the rebase anyway.

{{ return(none) }}
{% endif %}

{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as my above comment applies here - I think these two are the only overlap!

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion above @beckjake. I'm on board with shipping the implementation as-is for 0.17.0. Given your finding in #2378 on the redshift/pg version, I think we should reassess where these comment statements best belong—within create_table_as or as implicit hooks—as part of revisiting materializations more broadly for 0.18.0.

@sfc-gh-aseitz
Copy link
Contributor Author

@drewbanin this has been rebased to @beckjake most recent changes! Hopefully, we can get this merged soon!

@beckjake
Copy link
Contributor

beckjake commented May 5, 2020

I'll kick off the test suite now, assuming it passes I'll merge it!

@beckjake beckjake merged commit 7c916e9 into dbt-labs:dev/octavius-catto May 5, 2020
@drewbanin drewbanin changed the title Column comments Snowflake column comments May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants