
Feature/bq incremental strategy insert_overwrite #2153

Merged
drewbanin merged 2 commits into dev/barbara-gittings from feature/bq-insert-overwrite on Mar 4, 2020

Conversation


@jtcohen6 jtcohen6 commented Feb 24, 2020

This is a small feature that builds on top of the tremendous work from #2140. It shouldn't have any breaking changes, so I think we could ship it in 0.16.1. I'm opening this now so that I can link to it in a forthcoming post about dbt + BigQuery + incremental models.

A common request from the community is an incremental materialization on BigQuery that simply drops and replaces an entire day of data. With incremental_strategy = "insert_overwrite" set in the config, any partition with new data is completely dropped and recreated.

Example usage:

{{ config(
    materialized='incremental',
    partition_by="ts",
    cluster_by="id",
    incremental_strategy = "insert_overwrite"
) }}

with data as (
    select 1 as id, cast('2019-01-01' as date) as ts union all
    select 2 as id, cast('2019-01-02' as date) as ts union all
    select 3 as id, cast('2019-01-02' as date) as ts union all
    select 4 as id, cast('2019-01-03' as date) as ts union all
    select 5 as id, cast('2019-01-04' as date) as ts
)

select *
from data

{% if is_incremental() %}
where ts > _dbt_max_partition
{% endif %}
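
For context on the _dbt_max_partition reference in the filter above, here is a rough sketch. It assumes dbt exposes _dbt_max_partition as a BigQuery scripting variable holding the maximum value of the partition column in the existing target table. The table names are hypothetical, and the statements dbt actually generates may differ.

```sql
-- Sketch only: assumes _dbt_max_partition is a BigQuery scripting variable
-- that is set before the model's select runs. Table names are hypothetical.
declare _dbt_max_partition date;

set _dbt_max_partition = (
    select max(ts)
    from `my_project.my_dataset.my_model`
);

-- The model's incremental filter can then reference the variable directly.
select *
from `my_project.my_dataset.my_source`
where ts > _dbt_max_partition;
```

Under that assumption, only rows in partitions newer than anything already in the target are selected on an incremental run.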

@cla-bot cla-bot bot added the cla:yes label Feb 24, 2020

@drewbanin drewbanin left a comment

@jtcohen6 this is great! And self-contained! I think I just convinced myself that we should try to ship this for 0.16.0... I can't think of any reason at all why we should not do that. Can you?

{%- set predicates = [] if predicates is none else [] + predicates -%}
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

merge into {{ target }} as DBT_INTERNAL_DEST

I could live 1,000 more years and I would still not understand this DML...

I have lived 2 days and I have read the documentation, and I now understand this DML.

The key thing I was missing: the "when not matched by source" condition is always true here because we're using a constant-false predicate. I was mistakenly thinking that we were still merging on a unique_key, which we are not.

From the docs:

If the merge_condition is FALSE, the query optimizer avoids using a JOIN. This optimization is referred to as a constant false predicate. A constant false predicate is useful when you perform an atomic DELETE on the target plus an INSERT from a source (DELETE with INSERT is also known as a REPLACE operation).

Cool!
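
To make the constant false predicate concrete, here is a minimal sketch of the delete-plus-insert merge described above. It is not the exact statement dbt generates: the table names and the partitions_to_replace variable are hypothetical, and the columns (id, ts) are borrowed from the example model in this PR.

```sql
-- Sketch only: table names and the partitions_to_replace variable are
-- hypothetical; the columns (id, ts) follow the example model in this PR.
declare partitions_to_replace array<date>;

-- Collect the distinct partitions present in the newly built rows.
set partitions_to_replace = (
    select array_agg(distinct ts)
    from `my_project.my_dataset.my_model__tmp`
);

merge into `my_project.my_dataset.my_model` as DBT_INTERNAL_DEST
using `my_project.my_dataset.my_model__tmp` as DBT_INTERNAL_SOURCE
on false  -- constant false predicate: no join, so no row is ever "matched"

-- Every target row is "not matched by source"; delete only those that
-- sit in a partition being replaced.
when not matched by source
    and DBT_INTERNAL_DEST.ts in unnest(partitions_to_replace)
    then delete

-- Every source row is "not matched" by the target; insert them all.
when not matched then
    insert (id, ts)
    values (id, ts);
```

Because the delete and the insert happen in a single merge statement, the replacement of each partition is atomic, which is the REPLACE behavior the quoted docs describe.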

jtcohen6 commented Mar 3, 2020

Neat! As part of finalizing my Discourse post about new BQ partitioning + incremental modeling in 0.16.0, I'm going to test this strategy on a (public) dataset of some size.

@drewbanin drewbanin force-pushed the feature/bq-insert-overwrite branch from 273f0e2 to 0656477 Compare March 4, 2020 18:49
@drewbanin

The test failures here appear to be intermittent weirdness on Snowflake's end... Looks like it was returning Arrow data instead of JSON data? Merging this one for 0.16.0!

@drewbanin drewbanin merged commit 3419567 into dev/barbara-gittings Mar 4, 2020
@drewbanin drewbanin deleted the feature/bq-insert-overwrite branch March 4, 2020 21:56
@hui-zheng

@jtcohen6 @drewbanin
It's exciting to see this feature out. We implemented something similar in-house and had long waited for this dbt native feature!
