
[ADAP-974][CT-3230] [Bug] type should not be set to null for dynamic tables and materialized view #8864

Closed
dbeatty10 opened this issue Oct 16, 2023 · 9 comments · Fixed by dbt-labs/dbt-snowflake#818, dbt-labs/dbt-bigquery#996, dbt-labs/dbt-redshift#653 or #8945
Assignees
Labels
backport 1.6.latest backport 1.7.latest bug Something isn't working Impact: CA materialized_views Team:Adapters Issues designated for the adapter area of the code

Comments

@dbeatty10
Contributor

dbeatty10 commented Oct 16, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Summary

A succinct summary by @jtcohen6:

if table be static
type table - pragmatic
if table dynamic
null type - panic!

Current Behavior

This is showing up in catalog.json with materialized views / dynamic tables: there are model nodes that have metadata.type: null.

Expected Behavior

Table type should be a required field that is never null.

We should create some kind of integration test to validate that the value is not null.

Steps To Reproduce

Run dbt docs generate within a dbt project that contains a model materialized as a dynamic table (Snowflake) or materialized view (other adapters). Examine catalog.json and see that node > metadata > type is null.
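The check above can be scripted. Below is a minimal sketch of a validation pass over the catalog artifact that flags the null types; the node IDs are hypothetical, and the dict mirrors the shape of `target/catalog.json` rather than loading a real file:

```python
def find_null_types(catalog: dict) -> list[str]:
    """Return unique IDs of catalog nodes whose metadata.type is null."""
    return [
        unique_id
        for unique_id, node in catalog.get("nodes", {}).items()
        if node.get("metadata", {}).get("type") is None
    ]

# Minimal catalog-shaped dict (in practice, json.load the real target/catalog.json)
catalog = {
    "nodes": {
        "model.my_project.my_dynamic_table": {"metadata": {"type": None}},
        "model.my_project.my_table": {"metadata": {"type": "BASE TABLE"}},
    }
}
print(find_null_types(catalog))  # -> ['model.my_project.my_dynamic_table']
```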

Relevant log output

catalog.json versus the expected JSON schema

catalog.json

(screenshot: catalog.json with node > metadata > type set to null)

expected JSON schema

(screenshot: the expected JSON schema for catalog.json)

Environment

- OS:
- Python:
- dbt:

Which database adapter are you using with dbt?

dbt-snowflake is where this was noticed, but we expect this to affect other adapters as well.

Additional Context

Internal slack thread:
https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1697228694650389

Related issues

@dbeatty10 dbeatty10 added bug Something isn't working Team:Adapters Issues designated for the adapter area of the code Impact: CA materialized_views labels Oct 16, 2023
@github-actions github-actions bot changed the title [Bug] type should not be set to null dynamic tables and materialized view [CT-3230] [Bug] type should not be set to null dynamic tables and materialized view Oct 16, 2023
@dbeatty10 dbeatty10 changed the title [CT-3230] [Bug] type should not be set to null dynamic tables and materialized view [CT-3230] [Bug] type should not be set to null for dynamic tables and materialized view Oct 16, 2023
@graciegoheen
Contributor

Question from refinement:
Which adapters are affected by this? Can we fix this in dbt-core, or is this something that we need to fix in the adapter repos?

@dbeatty10
Contributor Author

If/when we have a fix for this, @eddowh and @amychen1776 requested that it be backported to >=1.6.

@jtcohen6
Contributor

jtcohen6 commented Oct 23, 2023

This is most pressing on Snowflake, but we should review the catalog queries for all adapters:

  • Snowflake: table_type is null for dynamic tables
  • BigQuery: MVs will fall into the else condition and be classified as external
  • Postgres + Redshift: We consider any relkind that isn't 'v' ("view") to be "base table." We should add a line for m (= mat view) on Postgres; IIRC, the same may not work on Redshift.
  • Spark / Databricks: Looks like type will be "table" if Type: View isn't detected in show table extended information

So to summarize: This is (potentially) wrong on all adapters, but dbt-snowflake is wrong in a way that breaks downstream validation/ingestion of catalog.json.
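The Postgres/Redshift point above amounts to a `relkind` lookup with an explicit entry for `m`. The mapping below is an illustrative sketch of that idea in Python, not the adapter's actual catalog SQL; the `UNKNOWN` sentinel is an assumption:

```python
# Illustrative mapping from Postgres pg_class.relkind to a catalog table_type.
# 'm' (materialized view) gets its own entry instead of falling through to
# "BASE TABLE" as described above.
RELKIND_TO_TABLE_TYPE = {
    "r": "BASE TABLE",         # ordinary table
    "v": "VIEW",
    "m": "MATERIALIZED VIEW",  # previously lumped in with "BASE TABLE"
    "f": "FOREIGN TABLE",
    "p": "BASE TABLE",         # partitioned table
}

def table_type_for(relkind: str) -> str:
    # Fall back to a sentinel rather than null, per the expected behavior above.
    return RELKIND_TO_TABLE_TYPE.get(relkind, "UNKNOWN")

print(table_type_for("m"))  # -> MATERIALIZED VIEW
print(table_type_for("S"))  # -> UNKNOWN (sequences have no catalog table_type)
```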

@dbeatty10
Contributor Author

Question from refinement: Which adapters are affected by this? Can we fix this in dbt-core, or is this something that we need to fix in the adapter repos?

The only currently known adapter that triggers this issue is dbt-snowflake. But this being a required field in dbt-core could also be considered the root cause.

Solution in dbt-core

Two options in dbt-core:

  1. allow this to be an optional field and handle null values gracefully during catalog generation
  2. keep this as a required field, but either provide more robust adapter tests or implement a default fall-back value (unknown? other?)
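Option 2's fallback could look roughly like this during catalog generation. This is a hypothetical helper, not dbt-core's actual code, and the default value is one of the candidates floated above:

```python
from typing import Optional

# Hypothetical fallback value; "other" would work equally well.
DEFAULT_TABLE_TYPE = "UNKNOWN"

def normalize_table_type(raw: Optional[str]) -> str:
    """Keep table_type a required, non-null field by substituting a default."""
    return raw if raw is not None else DEFAULT_TABLE_TYPE

print(normalize_table_type("BASE TABLE"))  # -> BASE TABLE
print(normalize_table_type(None))          # -> UNKNOWN
```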

Solution in dbt-snowflake

The workaround provided by @jtcohen6 (which might also be the solution for dbt-snowflake):

{% macro snowflake__get_catalog_tables_sql(information_schema) -%}
    select
        table_catalog as "table_database",
        table_schema as "table_schema",
        table_name as "table_name",
        -- this is the line to change
        coalesce(table_type, 'DYNAMIC TABLE') as "table_type",
        comment as "table_comment",

        -- note: this is the _role_ that owns the table
        table_owner as "table_owner",

        'Clustering Key' as "stats:clustering_key:label",
        clustering_key as "stats:clustering_key:value",
        'The key used to cluster this table' as "stats:clustering_key:description",
        (clustering_key is not null) as "stats:clustering_key:include",

        'Row Count' as "stats:row_count:label",
        row_count as "stats:row_count:value",
        'An approximate count of rows in this table' as "stats:row_count:description",
        (row_count is not null) as "stats:row_count:include",

        'Approximate Size' as "stats:bytes:label",
        bytes as "stats:bytes:value",
        'Approximate size of the table as reported by Snowflake' as "stats:bytes:description",
        (bytes is not null) as "stats:bytes:include",

        'Last Modified' as "stats:last_modified:label",
        to_varchar(convert_timezone('UTC', last_altered), 'yyyy-mm-dd HH24:MI'||'UTC') as "stats:last_modified:value",
        'The timestamp for last update/change' as "stats:last_modified:description",
        (last_altered is not null and table_type='BASE TABLE') as "stats:last_modified:include"
    from {{ information_schema }}.tables
{%- endmacro %}

@dbeatty10
Contributor Author

@jtcohen6 beat me to the reply button! :sonic:

Proposed acceptance criteria

Based on the comment in #8864 (comment), one option could be to add a functional test that confirms the table_type value of each materialization after a dbt run / build.

i.e. a workflow like this:

  1. Define a model that is materialized as a materialized_view
  2. Run that model
  3. Run a catalog query and confirm that the expected table_type is returned
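The acceptance check in step 3 can be sketched as plain Python. The expectations dict and row shape below are illustrative (a real functional test would use dbt's adapter test framework and an actual catalog query), and the Snowflake-only dynamic table entry is an assumption:

```python
# Expected table_type per materialization; values are illustrative.
EXPECTED_TABLE_TYPES = {
    "table": "BASE TABLE",
    "view": "VIEW",
    "materialized_view": "MATERIALIZED VIEW",
    "dynamic_table": "DYNAMIC TABLE",  # Snowflake-only
}

def assert_catalog_types(catalog_rows: list) -> None:
    """catalog_rows: one dict per relation, with 'materialization' and 'table_type'."""
    for row in catalog_rows:
        expected = EXPECTED_TABLE_TYPES[row["materialization"]]
        actual = row["table_type"]
        assert actual == expected, (
            f"{row['materialization']}: got {actual!r}, want {expected!r}"
        )

# Simulated catalog query result for a materialized_view model
assert_catalog_types(
    [{"materialization": "materialized_view", "table_type": "MATERIALIZED VIEW"}]
)
print("ok")
```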

@mikealfare
Contributor

Incorrectly closed due to automation.

@mikealfare
Contributor

Incorrectly closed due to automation...again.

@mikealfare mikealfare reopened this Oct 31, 2023
@mikealfare mikealfare changed the title [CT-3230] [Bug] type should not be set to null for dynamic tables and materialized view [ADAP-974][CT-3230] [Bug] type should not be set to null for dynamic tables and materialized view Oct 31, 2023
@mikealfare
Contributor

I did some things. Please take a look at the four PRs attached to this issue.

It's worth noting that all four adapters that support materialized views (dbt-postgres, dbt-snowflake, dbt-bigquery, and dbt-redshift) were affected. All failed the new integration test that I created, and all now pass it. It's also worth noting that the new integration test covers all supported relation types for each adapter.

I'd like to call out one thing in particular: it looks like we're not consistent across adapters in what table_type returns. Most of them return one of "BASE TABLE", "VIEW", (and now) "MATERIALIZED VIEW"/"DYNAMIC TABLE". However, dbt-bigquery returns "table", "view", (and now) "materialized view". I did not want to change that because I didn't want to break something downstream. My knee-jerk reaction is that these should all return a value in the RelationType enum, but I would consider that beyond the scope of this fix.
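The normalization idea above could look something like the sketch below. The enum values and the adapter-string mapping are hypothetical, loosely modeled on dbt's RelationType, not existing adapter code:

```python
from enum import Enum

class RelationType(str, Enum):
    # Illustrative enum, loosely modeled on dbt's RelationType.
    Table = "table"
    View = "view"
    MaterializedView = "materialized_view"
    DynamicTable = "dynamic_table"

# Adapter-specific strings observed in catalog queries, normalized to one enum.
ADAPTER_TABLE_TYPES = {
    "BASE TABLE": RelationType.Table,          # Snowflake/Postgres/Redshift
    "table": RelationType.Table,               # BigQuery
    "VIEW": RelationType.View,
    "view": RelationType.View,
    "MATERIALIZED VIEW": RelationType.MaterializedView,
    "materialized view": RelationType.MaterializedView,
    "DYNAMIC TABLE": RelationType.DynamicTable,
}

print(ADAPTER_TABLE_TYPES["materialized view"].value)  # -> materialized_view
print(ADAPTER_TABLE_TYPES["BASE TABLE"].value)         # -> table
```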

@dbeatty10
Contributor Author

Thanks for calling this out @mikealfare! 🏆

Created a separate issue to follow up on the piece you called out. The issue description is just an initial stub right now, but we will flesh it out as needed:
