Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake query tag #2555

Merged
merged 12 commits into from
Jun 25, 2020
Merged

Conversation

DrMcTaco
Copy link
Contributor

@DrMcTaco DrMcTaco commented Jun 16, 2020

resolves #1030

Description

Add the ability to set snowflake query tags upon creation of a connection or at the start of a materialization. For query tags used during materialization the session parameter that controls query tags will be returned to whatever value it was before the materialization began

Not sure what/how to test for this addition.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 16, 2020
Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @DrMcTaco! This PR looks great, I have one small fix to suggest.

It would also be great if you could add tests, though if tests aren't really reasonable here I understand that. Does snowflake have an API or something we could query that shows this stuff?

One thing I would like to bring up: On errors, it seems to me that this will result in the query tag getting "stuck", because the unset_query_tag never runs, right? Is that a big problem? If it is, I think we could add a dbt feature to make cleanup more feasible (some sort of adapter-level pre/post model hooks, or a finally equivalent in jinja, or something similar).

{% endmacro %}

{% macro unset_query_tag(original_query_tag) -%}
{% set new_query_tag = config.get('query_tag', none) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{% set new_query_tag = config.get('query_tag', none) %}
{% set new_query_tag = config.get('query_tag') %}

There's no reasonable way for you to know this, but config.get doesn't do what you think it does! The second argument isn't a default value, but a "validator". See #2441.

@DrMcTaco
Copy link
Contributor Author

It would also be great if you could add tests, though if tests aren't really reasonable here I understand that. Does snowflake have an API or something we could query that shows this stuff?

Snowflakes query tags are session level parameters. They can be viewed by running the sql SHOW PARAMETERS [ LIKE '<pattern>' ][ IN { SESSION | ACCOUNT | ... }]. The query_tag parameter does not appear to be accessible by introspecting the snowflake connection object and I am not aware of any way to find session parameters for arbitrary open sessions. So any kind of testing would require running the SQL in the get_current_query_tag macro from inside the session at appropriate times or looking at the snowflake query_history view, which has some obnoxious lag. I think given the mentioned complications testing for this feature might be too difficult.

One thing I would like to bring up: On errors, it seems to me that this will result in the query tag getting "stuck", because the unset_query_tag never runs, right? Is that a big problem? If it is, I think we could add a dbt feature to make cleanup more feasible (some sort of adapter-level pre/post model hooks, or a finally equivalent in jinja, or something similar).

In my view query_tags are primarily a tacking and observability tool, not a functional part of the transformation process. To my knowledge they cannot impact the running of SQL against snowflake. So "getting stuck" with an incorrect query_tag will not stop dbt from running properly.

Also I am not familiar enough with how dbt handles SQL failures in the middle of runs with multiple materializations. Will connections/sessions be reused for subsequent models if one model fails?

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will connections/sessions be reused for subsequent models if one model fails?

I am pretty sure we do, yes. I think I agree with your assessment that it's not the end of the world if the tags are wrong, I guess I just have this nagging feeling that if we're going to do it we should do it right. But maybe this is one of those "get it working now, make it great later" situations.

@jtcohen6 what do you think? Is this an acceptable risk? I'm probably not familiar enough with snowflake to make a good call here.

@jtcohen6
Copy link
Contributor

The stakes here feel pretty low. I haven't used query tags in Snowflake, but I could imagine them being most compelling as a mechanism for longitudinal analysis of models across production runs where builds are almost always successful, e.g. to measure performance timing. If I needed to debug a run with failed models from within Snowflake's query history, I'd more likely look to the query-comment or raw SQL as a starting point.

That said, I agree that we should have a construct for end-of-materialization cleanup steps that aren't precluded by model SQL errors.

@beckjake
Copy link
Contributor

Cool, let's just add this as-is then. We can add support for proper cleanups at a later date. I'm going to test it out locally manually before I kick off tests - I will try to get to this today, but it may be next week.

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this and it seems to work to me, thanks for your patience! I have some minor formatting requests regarding the changelog, but that's it.

CHANGELOG.md Outdated
Comment on lines 1 to 2
## Next
- Added support for Snowflake query tags at the connection and model level ([#1030](https://github.com/fishtown-analytics/dbt/issues/1030))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please move this into the dbt 0.18.0 section, put it under a ### Features heading, and make it link to both the issue and this PR?

@beckjake
Copy link
Contributor

This looks ready to go to me! I've merged the changelog updates and once the tests pass I'll merge this in for 0.18.0. Thanks for your contribution, @DrMcTaco!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add query_tag to the model hierarchy
3 participants