Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run hooks outside of a transaction #510

Merged
merged 11 commits into from
Aug 29, 2017

Conversation

drewbanin
Copy link
Contributor

@drewbanin drewbanin commented Aug 16, 2017

This is required to get vacuum working in post- hooks. Pre/Post hooks can now be:

  • a string
  • a json string containing a "sql" string key and a "transaction" key

All of the following are valid:

{{ config(
    'post-hook': [
        'select 1 -- inside transaction',
        {"sql": "select 2 -- inside transaction", "transaction": true},
        {"sql": "vacuum {{ this }}", "transaction": false},
        '{{ after_commit("vacuum " ~ adapter.quote_schema_and_table(this.schema, this.name)) }}',
        '{{ vacuum(this) }} -- this is a macro which expands to a post-transaction json string'
    ]
) }}

@jthandy
Copy link
Member

jthandy commented Aug 16, 2017

neat, i hadn't realized we were actually working actively on this.

is it possible to do this?

post-hook:
  - {sql: "grant select on {{ this }} to db_reader", transaction: true}
  - "vacuum {{ this.schema }}.{{ this.name }}"

essentially, the first argument is always the sql and an optional second argument specifies the transaction status. most times it would be nice to just declare the sql syntax and not think about transactions at all, and i'm also nervous about adding additional configuration parameters given that we have a bit of a configuration spaghetti bowl at the moment.

@drewbanin
Copy link
Contributor Author

yeah, i'm with you @jthandy -- doesn't really make sense to add another config arg for this. I like that idea!

@cmcarthur
Copy link
Member

cmcarthur commented Aug 16, 2017

I like the idea of defaulting to transaction = true. One recommendation: name this thing after_commit, it's more clear. Like:

post-hook:
  - "grant select on {{ this }} to db_reader"
  - { after_commit: true, sql: "vacuum {{ this.schema }}.{{ this.name }}" }

@drewbanin
Copy link
Contributor Author

@cmcarthur that makes sense, but we need this to work for pre-hooks too ;)

@cmcarthur
Copy link
Member

cmcarthur commented Aug 16, 2017

before_begin?

@drewbanin
Copy link
Contributor Author

@cmcarthur @jthandy just pushed some more code... all of these are valid ways to define pre- or post-hooks:

    'post-hook': [
        'select 1 -- inside transaction',
        {"sql": "select 2 -- inside transaction", "transaction": true},
        {"sql": "vacuum \"dbt_dbanin\".\"dbt_users\"", "transaction": false},
        '{{ after_commit("vacuum " ~ adapter.quote_schema_and_table(this.schema, this.name)) }}',
        '{{ vacuum(this) }}'
    ]

Some of these macros don't belong here, but I'm using it as a proof-of-concept: https://github.com/fishtown-analytics/dbt/blob/0e6044508def2e9901a1d964861c0b678be28617/dbt/include/global_project/macros/materializations/helpers.sql

@drewbanin drewbanin modified the milestone: 0.9.0 rc-1 Aug 17, 2017
@@ -23,14 +23,14 @@ def exception_handler(cls, profile, sql, model_name=None,
except psycopg2.DatabaseError as e:
logger.debug('Postgres error: {}'.format(str(e)))

cls.rollback(connection)
cls.release_connection(profile, connection_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this automatically roll back open transactions? would want to be sure we don't create a deadlock because of an open tx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, release_connection calls rollback under the hood if the transaction is open. This change is to fix an issue where we'd try to rollback transactions that weren't open

{%- set hook_sql = hook_data.get('sql', hook) -%};

{%- if hook_is_in_transaction == inside_transaction -%}
{% call statement(auto_begin=inside_transaction) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is slick

wrapped_hooks = []
for hook in hooks:
if isinstance(hook, dict):
hook = json.dumps(hook)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only part of this branch that doesn't feel right to me. it seems like we should do the opposite: parse all the hooks to their dictionary version and return those. then you don't need to decode each hook individually in a loop before you can perform filter logic.

that would let you write, for example:

{% macro run_hooks(hooks, inside_transaction=True) %}
  {% for hook in hooks | selectattr('transaction', 'equalto', inside_transaction)  %}
    {% call statement(auto_begin=inside_transaction) %}
      {{ hook.get('sql') }}
    {% endcall %}
  {% endfor %}
{% endmacro %}

(and, in general, lets you do more powerful comprehensions on hooks)

Copy link
Member

@cmcarthur cmcarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great. approved if tests pass

"{} for model {} is not a string!".format(key, name))
hooks = relevant_configs[key]
if not isinstance(hooks, (list, tuple)):
hooks = [hooks]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

fqn_extra = []
tags = coalesce(tags, set())
fqn_extra = coalesce(fqn_extra, [])
macros = coalesce(macros, {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this

Copy link
Contributor Author

@drewbanin drewbanin Aug 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i intend to rewrite dbt-core in sql, and this is step 1 of 10 billion

i think this could equivalently be:

tags = tags or set()

but i always find that confusing, and I think codeclimate would still complain about cyclomatic complexity. 👍

@drewbanin
Copy link
Contributor Author

drewbanin commented Aug 29, 2017

Summary, because this branch kind of morphed from its initial purpose:

This branch formalizes how hooks should work. Now, a hook is either:

  • a sql string
  • a dict in the format {"sql": sql_string, "transaction": True/False}

If a sql string is provided, the hook will run inside of the model transaction. If a dict is provided, and transaction is False, then the hook will either run before the begin or after the commit for the model transaction, depending on if the hook is a pre-hook or post-hook.

This means that commands like analyze and vacuum can now be run in post-hooks.

The dictionary syntax is a little unwieldy. Three new macros, before_begin, in_transaction, and after_commit have been added to make adding hooks easier.

With this syntax, you can define a hook that looks like:

{{ config({
    "post-hook": after_transaction('vacuum ` ~ adapter.quote_schema_and_table(this.schema, this.name))
  })
}}

Finally, on-run-start and on-run-end hooks have been refactored to permit the same hook definition provided above. This means that you can define hook macros and use them either in run hooks or model hooks. Note that, for now, all on-run-* run outside of a transaction.

@drewbanin drewbanin merged commit cda3a68 into development Aug 29, 2017
@drewbanin drewbanin deleted the hooks-outside-of-transactions branch August 29, 2017 20:25
@drewbanin
Copy link
Contributor Author

Fixes #486

@ryantuck
Copy link

I think the code block above should read (as of v0.10.1):

{{ config({
    "post-hook": after_commit('vacuum ' ~ adapter.quote_schema_and_table(this.schema, this.name))
  })
}}

@drewbanin
Copy link
Contributor Author

Thanks @ryantuck - I think you're right :)

iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* make it possible to issue hooks outside of a transaction

* wip

* fix incremental materializations

* hook contract for on-run-start/on-run-end

* make on-run-* hooks work more sanely

* pep8

* make codeclimate happy(-ier)

* typo

* fix for bq commit signature


automatic commit by git-black, original commits:
  cda3a68
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants