Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query_tag to the model hierarchy #1030

Closed
jon-rtr opened this issue Sep 26, 2018 · 4 comments · Fixed by #2555
Closed

Add query_tag to the model hierarchy #1030

jon-rtr opened this issue Sep 26, 2018 · 4 comments · Fixed by #2555
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snowflake

Comments

@jon-rtr
Copy link
Contributor

jon-rtr commented Sep 26, 2018

Feature

Feature description

After starting to analyze our query history in some depth, I had the realization that it would be useful to be able to specify QUERY_TAG (snowflake concept) anywhere in the hierarchy, like the materialization configuration.

This way, you can query snowflake query_history by the query tag. Maybe you set one query tag for your whole dbt project. Maybe you set it per model folder. Maybe you set it on specific models.

https://docs.snowflake.net/manuals/sql-reference/parameters.html#query-tag
https://stackoverflow.com/questions/50965558/snowflake-cli-snowsql-query-tagging

QUERY_TAG is a session parameter, so the solution might be to set it in snowflake's table and view materializations, if the query tag is set?

Who will this benefit?

Anyone using snowflake who would like more granular information in her query history.

@drewbanin drewbanin added enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snowflake labels Nov 8, 2019
@drewbanin drewbanin added this to the Marian Anderson milestone Jun 11, 2020
@DrMcTaco
Copy link
Contributor

I can see this being useful in a few different situations.

  • cost/usage tracking: all queries run by DBT against snowflake being tagged and thus traceable in snowflake query history
  • snowflake optimization/debugging: tags at the model level for trending performance and anomaly detection.

I am considering starting work to resolve this issue and the above use cases make me think there may be at least 2 distinct places one would want to add a tag:

  1. at the connection level
  2. at the model level

The second can easily be achieved with model hooks (this is not to say some centralized supported method should not be developed).

I am however not sure how to go about setting query tag immediately after opening a connection so that all queries run regardless of their source are tagged. Any ideas on how to achieve that or other thoughts on the general problem before I dig in would be appreciated.

@drewbanin
Copy link
Contributor

hey @DrMcTaco - I like the way you're thinking about this feature! I agree - I think we should support query tags both at the model level and at the connection level. We'd very much welcome a PR for this change.

Some rough implementation notes:

  • We can add a parameter to the SnowflakeCredentials dataclass to capture a query_tag string in a profiles.yml target.
  • We'll want to alter the session parameter after opening a connection. Check out an example of where we do something similar in the postgres plugin.
  • Supporting this session parameter in model configs is going to be a tiny bit more involved than setting this field for all connections opened by dbt. We'd need to update each of dbt's Snowflake materializations to check the config and execute a SQL query to set the session parameter before running a query. There might be a better way to do this though.... let me think about that some more.

Beyond these things, we also need to think about what happens if a query_tag is provided for both the target and a model. The easiest approach would just be to run two alter session queries. I think/hope Snowflake will do the right thing there, but it's worth verifying the behavior!

Let me know what you think about all that :)

@DrMcTaco
Copy link
Contributor

The easiest approach would just be to run two alter session queries. I think/hope Snowflake will do the right thing there, but it's worth verifying the behavior!

Looks like you can just run one after another and the second will overwrite the first
alter_tag_cmd
alter_tag_history

In other news I have the profile based query_tags working. I am getting a bit hung up finding the right place and way to do it in the model macros. I am not the most familiar with Jinja.

Can you link to a macro where something similar is done?

@drewbanin
Copy link
Contributor

Nice!

For the model-level configs, you'll want to add the query_tag config here to register the config with the Snowflake adapter plugin:

https://github.com/fishtown-analytics/dbt/blob/74c78ef9279638d14949d4e93ae76166d4fd8022/plugins/snowflake/dbt/adapters/snowflake/impl.py#L20-L27

As far as the jinja code is concerned, I'd grab this config value from the jinja context with config.get('query_tag'). If the config is defined, then I'd issue the alter session statement at the top of the Snowflake materialization macros:

I'd try to run this query as early as possible in the materialization code as possible so that all of the SQL (eg. hooks) are executed with the correct query tag. It might also be a good idea to make a helper macro (like this one) that actually does the heavy lifting to run this alter session statement if a config is defined.

One last thing to consider: dbt uses a connection pool to execute queries. Do we need to "reset" the query_tag after running a given model? I want to make sure that the session parameter for a previous model doesn't leak into a subsequent model executed from the same thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! snowflake
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants