Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/snowflake view transactions #940

Merged
merged 8 commits into from
Aug 24, 2018

Conversation

drewbanin
Copy link
Contributor

This PR fixes #930. Additionally, it reconciles this.name and this.table sort of by accident.

The Problem

When Snowflake views are created, the column structure for the view is determined and saved by Snowflake. For example, if a Snowflake view is defined as select * from some_table, then this view will have a structure identical to that of some_table. If in the future, some_table is recreated with a new definition, then the saved structure for the original view might become out of sync with it's new definition. For example:

create table public.base_table as (
  select 1 as id
);

-- dependent_view is created with a single column (id)
create view public.dependent_view as (
  select * from public.base_table
);

-- recreate base_table with two columns (id, name)
drop table public.base_table;
create table public.base_table as (
  select 1 as id, 2 as name
);

-- This will fail: dependent_view's structure is now invalid
select * from public.dependent_view;

> View definition for <relation> declared 1 column(s), but view query produces 2 column(s)

What happened

dbt's behavior around building tables and views changed in 0.10.2. Namely: statements were reordered so that drop statements are run outside of transactions. This change affects Postgres, Redshift, and Snowflake. It did have the positive effect of reducing concurrent transaction errors (more info on this here), but it also caused this unintended consequence with Snowflake views.

Manifestation in dbt

dbt failed with this error during the swap-and-drop part of the view and table materializations. In particular, running:

alter table dbt_schema.some_view rename to some_view__backup

would fail if some_view's definition became inconsistent with its saved schema. This would happen when the definition of an upstream model was changed manually, for instance.

The Fix

The best way to fix this is to avoid the whole create-temp then swap-and-drop workflow. Instead, we can very simply execute create or replace view to atomically overwrite an existing view with its new definition. When this happens, Snowflake does not try to validate the existing view's definition against it's saved structure.

Fallout

With this change, dbt is no longer creating a __dbt_tmp suffixed view, so a special case would need to be made for the this variable. Rather than add increasingly complex logic to prevent this.table from using a __dbt_tmp suffix for snowflake views (as we currently do for incremental models), this PR runs hooks after the new model is renamed to its final identifier. As a result, the __dbt_tmp table (if it exists) is never visible to hooks, so we can make this.schema == this.name.

- obviate difference between this.name and this.table
- use `create or replace view` on snowflake
- avoid doing costly alter tables and drops for snowflake views
@drewbanin
Copy link
Contributor Author

drewbanin commented Aug 20, 2018

@beckjake adding you for review also since you were just deep in the BQ view materialization code

Does all of this look reasonable? There's a lot in here, but I think it fixes our immediate problem with:

View definition for declared 1 column(s), but view query produces 2 column(s)

as well as obviating the need for that terrible __dbt_tmp suffix code

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I guess I'm still not clear on non-destructive vs full_refresh but if you're confident in this logic, looks good to me.

{{ adapter.drop_relation(old_relation) }}
{%- else -%}
{{ exceptions.relation_wrong_type(old_relation, 'view') }}
{%- endif -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about this quite a bit, and I am not sure about the split here. Should snowflake really behave this fundamentally differently from bigquery? Never raise and ignore the full refresh flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decision to raise this error on BigQuery was, I think, a shortsighted one. The error we throw here is very specifically intended to avoid clobbering an ingestion-time partitioned table with a view by accident. This could happen if you switch the type of your materialization from table to view. We don't do this check anywhere else in dbt, and ingestion-time partitioned tables are mostly superseded by column partitioning, so I don't know that too many people are actually being helped out by this error in practice.

I'm in favor of dropping the exception if @cmcarthur is into it as well. I agree that it's a thorny maintenance burden and has little utility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove the exception you also get to remove handle_existing_table entirely, and removing code is always a win.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewbanin I defer totally to you on it. IMO it's totally fine to handle these kinds of errors situationally based on the adapter. We added this error for a really good reason, namely building time partitioned tables was costly, and dropping it unintentionally would cause users a lot of pain. If you believe that this isn't going to cause pain then I'm on board with removing it

@@ -0,0 +1,86 @@
from nose.plugins.attrib import attr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this import

Copy link
Member

@cmcarthur cmcarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewbanin i defer to you totally on the logic here, but this implementation does not look quite right to me. if you'd like to discuss further in person, ping me

{{ adapter.drop_relation(old_relation) }}
{%- else -%}
{{ exceptions.relation_wrong_type(old_relation, 'view') }}
{%- endif -%}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewbanin I defer totally to you on it. IMO it's totally fine to handle these kinds of errors situationally based on the adapter. We added this error for a really good reason, namely building time partitioned tables was costly, and dropping it unintentionally would cause users a lot of pain. If you believe that this isn't going to cause pain then I'm on board with removing it

as dropping inside the transaction will lead to a "table dropped by concurrent transaction"
error. In the future, Snowflake should use `create or replace table` syntax to obviate this code
*/ #}
{% if adapter.type() == 'snowflake' and old_relation.type == 'view' %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤢

This is antithetical to the adapter architecture. Why not implement totally different table and view materializations for Snowflake vs. Bigquery? Did we architect this part incorrectly?

@drewbanin drewbanin merged commit b1e186a into development Aug 24, 2018
@drewbanin drewbanin deleted the fix/snowflake-view-transactions branch August 24, 2018 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants