Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snowflake incremental merge does not detect new columns, fails silently #1870

Closed
4 tasks
sfc-gh-cfraleigh opened this issue Oct 28, 2019 · 1 comment
Closed
4 tasks
Labels
duplicate This issue or pull request already exists

Comments

@sfc-gh-cfraleigh
Copy link

Describe the bug

when doing some dev work, I noticed that the incremental merge in snowflake doesn't fail when a new column is added to the incremental model definition. It seems that DBT builds the select list based on the existing relation, not the temp one, so if a new column is added, it just doesn't get put in the merge statement.

Steps To Reproduce

I was testing using some simple tables, adding and removing columns from one and the other to cause failure

testing_refresh.sql:

{{
    config(materialized= "table", unique_key= "id" )
}}


with test as ( select 1 as id, 'tacos' as food, 'blue' as color, 'xl' as size ),
test2 as (select 2 as id, 'burgers' as food, 'red' as color, 'l' as size ),
test3 as (select 3 as id, 'pizza' as food, 'yellow' as color, 'm' as size )

select id, food, size, color from test
union all
select id, food ,size, color from test2
union all 
select id, food, size, color from test3

testing_refresh2:

{{
    config(materialized="incremental", unique_key= "id" )
}}

select id, color, food, size from {{ref('testing_refresh')}}

Expected behavior

If I were to add a column from testing_refresh to the incremental model testing_refresh2, I would expect the model to fail to merge, since testing_refresh2 does not yet have that column. However, what appears to happen, is the merge operation goes ahead with the columns defined previously, and does not account for new columns.

If I remove a column from testing_refresh, the table model, the incremental model will indeed fail. So the incremental model is aware when a column it is trying to reference fails, but not when it is added.

Screenshots and log output

Log output from snowflake:

  • here the temp table refs all the columns I expect to be added to the incremental model
create or replace temporary table schema.bi.testing_refresh2__dbt_tmp  as
      (

select id, color, food, size from schema.bi.testing_refresh 
      );
  • however, the SIZE column is missing from the merge statement, even though it is explicitly called in the incremental model definition.
merge into schema.bi.testing_refresh2 as DBT_INTERNAL_DEST
    using schema.bi.testing_refresh2__dbt_tmp as DBT_INTERNAL_SOURCE

    
        on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
    

    
    when matched then update set
        ID = DBT_INTERNAL_SOURCE.ID,COLOR = DBT_INTERNAL_SOURCE.COLOR,FOOD = DBT_INTERNAL_SOURCE.FOOD
    

    when not matched then insert
        (ID, COLOR, FOOD)
    values
        (ID, COLOR, FOOD)

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • [X ] snowflake
  • other (specify: ____________)

The output of dbt --version:

this is on a working branch of 15

The operating system you're using:
macOS

The output of python --version:
3.7

Additional context

Add any other context about the problem here.

@drewbanin
Copy link
Contributor

This is, for the moment anyway, the expected behavior. See these issues for alternative approaches we can pursue in the future:

Thanks for the report here though @cfraleig - you're totally right that this behavior is largely suboptimal and not what most users would anticipate happening when columns are added to an incremental model.

@drewbanin drewbanin added the duplicate This issue or pull request already exists label Oct 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

2 participants