Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve utility of database & schema information present in on-run-end hooks #1924

Closed
1 of 5 tasks
dinedal opened this issue Nov 18, 2019 · 6 comments
Closed
1 of 5 tasks
Labels
enhancement New feature or request

Comments

@dinedal
Copy link

dinedal commented Nov 18, 2019

Describe the bug

When running dbt run with a profile that includes the example macro of https://docs.getdbt.com/docs/on-run-end-context and writes to multiple databases, the grants are only executed on one of the databases, not the correct database of the schema

Steps To Reproduce

Use the macro at https://docs.getdbt.com/docs/on-run-end-context

Use a dbt_project.yml similar to:

models:
  dbt_project:
    staging:
      schema: staging
      database: data_warehouse
    dim_models:
      schema: dim_models
      database: data_warehouse
    data_marts:
      schema: data_marts
      database: data_warehouse
    exports:
      schema: dropbox
      database: exports
on-run-end:
  - "{{ grant_on_schemas('all privileges', schemas, 'accountadmin', true) }}"

When any model in exports is run, confirm that the grant statement fails because it attempts to run on the database data_warehouse instead, where dropbox schema doesn't exist

Expected behavior

The macro should execute on the database of the schema(s) passed in

Screenshots and log output

12:09:17 | Concurrency: 16 threads (target='prod')
12:09:17 |
.
.
.
12:09:40 |
12:09:40 | Running 3 on-run-end hooks
12:09:40 | 1 of 3 START hook: dbt_project.on-run-end.0.............................. [RUN]
Database error while running on-run-end
Encountered an error:
Database Error
  002003 (02000): 01904edd-0042-9085-0000-243d012d02b2: SQL compilation error:
  Schema 'DATA_WAREHOUSE.DROPBOX' does not exist or not authorized.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.14.3
   latest version: 0.14.4

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

The operating system you're using:
OS X 10.14.5

The output of python --version:
Python 3.7.5

Additional context

N/A

@dinedal dinedal added bug Something isn't working triage labels Nov 18, 2019
@drewbanin drewbanin added enhancement New feature or request and removed triage bug Something isn't working labels Nov 20, 2019
@drewbanin drewbanin changed the title Grants in on-run-end context are not mapped to the correct database Improve utility of database & schema information present in on-run-end hooks Nov 20, 2019
@drewbanin
Copy link
Contributor

Thanks for the report @dinedal - I updated the title of this issue to make it a little bit more actionable. I think the big problem here is that dbt renders out schemas in the on-run-end hook, but really, we want to include the database and the schema for use in hook code.

I wouldn't feel great about changing the schemas list to be a set of <database>.<schema> as that would probably break a lot of existing projects. Instead, we can add a new context variable (tbd what that would be called) which contains information about the databases and schemas that dbt objects were created in.

One other thing to note here: it might be worth including information about node types. I know some folks want to grant access to snapshots and models separately - it's not currently possible to programatically differentiate between schemas that dbt built snapshots in vs. schemas the dbt built models in.

Last: I think on-run-end has sort of run it's course. As dbt projects are becoming more complicated, I think we'll need a more flexible way to define project-level hooks based on environment variables, the dbt command that was run, etc. TBD if we tackle this as a part of #1842 or a different issue, but did want to surface it here.

Thanks!

@dinedal
Copy link
Author

dinedal commented Nov 29, 2019

Instead, we can add a new context variable (tbd what that would be called) which contains information about the databases and schemas that dbt objects were created in.

This would work fine for now

Last: I think on-run-end has sort of run it's course.

It seems like #1842 is quite a large leap, would love to see dbt progress there for sure, however to just add the database to the context of the macro seems like a much smaller change?

@drewbanin drewbanin added this to the 0.15.2: Barbara Gittings milestone Dec 2, 2019
@beckjake
Copy link
Contributor

One other thing to note here: it might be worth including information about node types. I know some folks want to grant access to snapshots and models separately - it's not currently possible to programatically differentiate between schemas that dbt built snapshots in vs. schemas the dbt built models in.

That information is actually available, as is the information you want (though it's available in a very clunky way): Each entry in the results list in the context has a node attribute, and the node attribute has a database attribute, a schema attribute, and a resource_type attribute. You could do something horrible like this:

{% set db_schemas = [] %}
{% for result in results %}
  {% if result.error is none and not result.failed and not result.skipped %}
    {% set db_schema = (result.node.database, result.node.schema) %}
    {% if db_schema not in db_schemas %}
      {% do db_schemas.append(db_schema) %}
    {% endif %}
  {% endif %}
{% endfor %}

That's basically what dbt does currently to generate the schemas context value, just with database considered as well.

That said, I think it does make a lot of sense to add a database_schemas set of unique (database, schema) pairs as we can do that much more efficiently.

I think if users want to do complicated things like think about node types, they should probably just implement something like the stuff above. Especially because you can just check the value of results[0].node.resource_type, and if it's 'snapshot', you know you are in dbt snapshot and can react accordingly.

@dinedal
Copy link
Author

dinedal commented Jan 2, 2020

@beckjake
This code seems to return a value of database on the key result.node.database sometimes, not sure what causes it or why, otherwise it does appear to work, so thank you at least for this.

@beckjake
Copy link
Contributor

beckjake commented Jan 2, 2020

Hmm. I'm not sure why that would be. I thought that might be some sort of sneaky old default lingering in dbt, but I don't see anything in the code like that. Is it possible that your profiles.yml config has database: database?

beckjake added a commit that referenced this issue Jan 6, 2020
@drewbanin
Copy link
Contributor

closed by #2031

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants