feat: command to sync DBT to Superset #18098

betodealmeida · 2022-01-19T22:50:18Z

SUMMARY

This PR introduces a new command to sync metadata from DBT to Superset. The command reads the profile and manifest files, creating/updating databases and datasets in Superset based on them.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Given this ~/.dbt/profiles.yml:

superset_examples:
  outputs:

    dev:
      type: postgres
      threads: 1
      host: localhost
      port: 5432
      user: beto
      pass: ''
      dbname: examples_dev
      schema: public
      meta:
        superset:
          cache_timeout: 300  # arbitrary metadata for our DB

  target: dev

The file messages_channels.sql:

SELECT
  messages.ts,
  channels.name,
  messages.text
FROM
  {{ source ('public', 'messages') }} messages
  JOIN {{ source ('public', 'channels') }} channels ON messages.channel_id = channels.id

And schema.yaml:

version: 2

sources:
  - name: public
    tables:
      - name: messages
        description: 'Messages in the Slack channel'
      - name: channels
        description: 'Information about Slack channels'

metrics:
  - name: cnt 
    label: ''
    model: ref('messages_channels')
    description: ''
    type: count
    sql: '*'
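The datasets and their descriptions come from dbt's manifest.json. Here is a minimal sketch of collecting them. It assumes the manifest has been loaded with `json.load` and follows the dbt 1.0 layout, with top-level `nodes` and `sources` dicts whose entries carry `resource_type`, `schema`, `name`/`alias`/`identifier` and `description`. `DatasetSpec` and `collect_datasets` are hypothetical names, and the trimmed manifest is illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class DatasetSpec:
    """What is needed to create or update one Superset dataset (hypothetical)."""

    schema: str
    table_name: str
    description: str


def collect_datasets(manifest: Dict[str, Any]) -> List[DatasetSpec]:
    specs: List[DatasetSpec] = []
    # Sources: physical tables declared under `sources:` in schema.yaml.
    for source in manifest.get("sources", {}).values():
        specs.append(
            DatasetSpec(
                schema=source["schema"],
                table_name=source.get("identifier") or source["name"],
                description=source.get("description", ""),
            )
        )
    # Models: skip other node types such as tests and seeds.
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        specs.append(
            DatasetSpec(
                schema=node["schema"],
                table_name=node.get("alias") or node["name"],
                description=node.get("description", ""),
            )
        )
    return specs


# A trimmed, illustrative manifest for the example project.
manifest = {
    "nodes": {
        "model.superset_examples.messages_channels": {
            "resource_type": "model",
            "schema": "public",
            "name": "messages_channels",
            "alias": "messages_channels",
            "description": "",
        },
        "test.superset_examples.some_test": {"resource_type": "test", "schema": "public", "name": "some_test"},
    },
    "sources": {
        "source.superset_examples.public.messages": {
            "schema": "public", "name": "messages", "identifier": "messages",
            "description": "Messages in the Slack channel",
        },
        "source.superset_examples.public.channels": {
            "schema": "public", "name": "channels", "identifier": "channels",
            "description": "Information about Slack channels",
        },
    },
}

print([spec.table_name for spec in collect_datasets(manifest)])
# ['messages', 'channels', 'messages_channels']
```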

We can run:

$ superset sync dbt \
> ~/Projects/dbt-examples/superset_examples/target/manifest.json \
> --project superset_examples \
> --target dev  # not needed, default is already "dev"

This will (1) create (or update) a database connection based on the Postgres profile.

It will also (2) create or update three datasets owned by the admin user: the two sources (messages and channels) and the messages_channels model.

(Note that the dataset description comes from the DBT config.)
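"Create or update" implies that the sync is idempotent: running it twice should not duplicate datasets. Below is a minimal sketch of that upsert logic. It uses an in-memory dict keyed by `(schema, table_name)` as a stand-in for Superset's metadata database, whereas the PR works against Superset's models. `upsert_dataset` and the `Registry` shape are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for Superset's datasets table:
# (schema, table_name) -> dataset attributes.
Registry = Dict[Tuple[str, str], Dict[str, str]]


def upsert_dataset(
    registry: Registry,
    schema: str,
    table_name: str,
    description: str,
    owner: str = "admin",
) -> str:
    """Create the dataset if missing, otherwise update it; return what happened."""
    key = (schema, table_name)
    if key in registry:
        # Update the existing entry instead of inserting a duplicate.
        registry[key]["description"] = description
        return "updated"
    registry[key] = {"description": description, "owner": owner}
    return "created"


registry: Registry = {}
print(upsert_dataset(registry, "public", "messages", "Messages in the Slack channel"))  # created
print(upsert_dataset(registry, "public", "messages", "Messages in the Slack channel"))  # updated
print(len(registry))  # 1
```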

It will also populate metrics (here, cnt on the messages_channels dataset).
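The dbt metric cnt (type: count, sql: '*') has to become a SQL expression on a Superset metric. Here is a minimal sketch of one possible mapping. It assumes each dbt 1.0 metric type corresponds to a SQL aggregate, and the output dict uses the column names of Superset's SqlMetric (`metric_name`, `verbose_name`, `expression`, `description`). `AGGREGATES` and `to_superset_metric` are hypothetical, and the PR's own translation may differ.

```python
from typing import Any, Dict, Optional

# Assumed mapping from dbt 1.0 metric types to SQL aggregate templates.
AGGREGATES = {
    "count": "COUNT({sql})",
    "count_distinct": "COUNT(DISTINCT {sql})",
    "sum": "SUM({sql})",
    "average": "AVG({sql})",
    "min": "MIN({sql})",
    "max": "MAX({sql})",
}


def to_superset_metric(metric: Dict[str, Any]) -> Dict[str, Optional[str]]:
    template = AGGREGATES.get(metric["type"])
    if template is None:
        raise ValueError(f"Unsupported metric type: {metric['type']}")
    return {
        "metric_name": metric["name"],
        # The example metric has an empty label, so fall back to no verbose name.
        "verbose_name": metric.get("label") or None,
        "expression": template.format(sql=metric["sql"]),
        "description": metric.get("description") or None,
    }


cnt = {"name": "cnt", "label": "", "type": "count", "sql": "*", "description": ""}
print(to_superset_metric(cnt)["expression"])  # COUNT(*)
```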

TESTING INSTRUCTIONS

Create a DBT project.
Run superset sync dbt /path/to/project/target/manifest.json --project PROJECT --target TARGET

Check that everything is imported correctly.

Currently, this only works for Postgres, but adding other profile types is straightforward.
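The command in the testing instructions takes the manifest path as a positional argument, plus --project and --target, with --target defaulting to "dev". The sketch below mirrors only that interface with the standard library's `argparse` to make the defaults explicit. It is an assumption for illustration: Superset's CLI is built on Click, and the PR's actual option handling may differ. `parse_sync_dbt_args` is a hypothetical name.

```python
import argparse
from typing import List, Optional


def parse_sync_dbt_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Mirror the options shown in the PR (hypothetical, not Superset's Click command)."""
    parser = argparse.ArgumentParser(prog="superset sync dbt")
    parser.add_argument("manifest", help="path to target/manifest.json")
    # Assumed required here, since it selects the profile in profiles.yml.
    parser.add_argument("--project", required=True, help="profile name in profiles.yml")
    parser.add_argument("--target", default="dev", help="profile target (default: dev)")
    return parser.parse_args(argv)


args = parse_sync_dbt_args(["target/manifest.json", "--project", "superset_examples"])
print(args.manifest, args.project, args.target)
# target/manifest.json superset_examples dev
```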

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2022-01-19T23:05:29Z

Codecov Report

Merging #18098 (4035ccb) into master (9e2bc72) will decrease coverage by 0.12%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #18098      +/-   ##
==========================================
- Coverage   65.85%   65.73%   -0.13%     
==========================================
  Files        1577     1581       +4     
  Lines       61828    61942     +114     
  Branches     6244     6244              
==========================================
  Hits        40719    40719              
- Misses      19509    19623     +114     
  Partials     1600     1600

Flag	Coverage Δ
hive	`51.95% <0.00%> (-0.20%)`	⬇️
mysql	`80.73% <0.00%> (-0.31%)`	⬇️
postgres	`80.78% <0.00%> (-0.31%)`	⬇️
presto	`51.79% <0.00%> (-0.20%)`	⬇️
python	`81.22% <0.00%> (-0.31%)`	⬇️
sqlite	`80.47% <0.00%> (-0.31%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
superset/cli/celery.py	`0.00% <ø> (ø)`
superset/cli/main.py	`0.00% <0.00%> (ø)`
superset/cli/sync/dbt/command.py	`0.00% <0.00%> (ø)`
superset/cli/sync/dbt/databases.py	`0.00% <0.00%> (ø)`
superset/cli/sync/dbt/datasets.py	`0.00% <0.00%> (ø)`
superset/cli/sync/main.py	`0.00% <0.00%> (ø)`
superset/cli/update.py	`0.00% <ø> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9e2bc72...4035ccb.

rumbin · 2022-01-22T07:23:18Z

Beto, are you aware of this project?
https://github.com/slidoapp/dbt-superset-lineage
I haven't tried it, though...

Regarding your approach, I have some doubts about automatically updating the DB connection.
In many cases people may want to use a different user here, or different connection strings for the sake of things like user impersonation, proxy settings, roles, warehouse settings (Snowflake), etc.
I know that you are currently only covering Postgres, but for other databases it may become troublesome.
Maybe this step could be made optional?

betodealmeida · 2022-01-22T15:54:20Z

> Beto, are you aware of this project? https://github.com/slidoapp/dbt-superset-lineage I haven't tried it, though...

Yes, it was the inspiration! I should've mentioned it in the summary; I'll update it. I think both approaches are valid: this one here is simpler because you're using the CLI, while the other uses the API and is valuable for cases where you don't have direct access.

> Regarding your approach, I have some doubts about automatically updating the DB connection. In many cases people may want to use a different user here, or different connection strings for the sake of things like user impersonation, proxy settings, roles, warehouse settings (Snowflake), etc. I know that you are currently only covering Postgres, but for other databases it may become troublesome. Maybe this step could be made optional?

That's a great point! I'll make it optional, allowing the user to reuse an existing DB.
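Making the step optional could look like this: if the user names an existing Superset database, reuse it untouched; otherwise create (or update) one from the dbt profile. This is a minimal sketch with hypothetical names (`resolve_database`, the `databases` mapping and its example URIs), using an in-memory mapping of database names to SQLAlchemy URIs instead of Superset's models.

```python
from typing import Dict, Optional, Tuple


def resolve_database(
    existing: Dict[str, str],
    profile_name: str,
    profile_uri: str,
    reuse: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (database_name, action), where action is 'reused', 'created' or 'updated'."""
    if reuse is not None:
        if reuse not in existing:
            raise KeyError(f"No Superset database named {reuse!r}")
        # Leave the user's connection (impersonation, roles, proxies) untouched.
        return reuse, "reused"
    action = "updated" if profile_name in existing else "created"
    existing[profile_name] = profile_uri
    return profile_name, action


databases = {"analytics": "postgresql://svc_user@proxy:5432/examples_dev"}
uri = "postgresql://beto@localhost:5432/examples_dev"
print(resolve_database(databases, "superset_examples", uri, reuse="analytics"))
# ('analytics', 'reused')
print(resolve_database(databases, "superset_examples", uri))
# ('superset_examples', 'created')
```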

rumbin · 2022-01-22T17:52:43Z

Sounds great.
What I haven't understood so far is how you envision the setup of this solution. Where would superset sync dbt ideally be run? I suppose that this would be part of a CI/CD pipeline. So, what components of Superset need to be installed in the container?
We should also consider which scenarios would suit dbt Cloud users, who would first need to fetch the dbt artifacts via API calls.

BTW, I am not associated with the dbt-superset-lineage project in any way. I was just planning to give it a try in the near future. However, now I am curious and will wait for your solution.

mrshu · 2022-01-22T19:42:51Z

> BTW, I am not associated with the dbt-superset-lineage project in any way. I was just planning to give it a try in the near future. However, now I am curious and will wait for your solution.

@rumbin I'm at least somewhat associated with it, although not one of the authors, so please do not hesitate to reach out with feedback!

betodealmeida · 2022-01-24T19:37:48Z

> Sounds great. What I haven't understood so far is how you envision the setup of this solution. Where would superset sync dbt ideally be run? I suppose that this would be part of a CI/CD pipeline. So, what components of Superset need to be installed in the container? We should also consider which scenarios would suit dbt Cloud users, who would first need to fetch the dbt artifacts via API calls.

To run it in CI/CD you would need to pip install Superset (the apache-superset package), set SUPERSET__SQLALCHEMY_DATABASE_URI so the command can reach Superset's metadata database, and run superset sync dbt.
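The CI/CD step can be sketched with a helper that only assembles the command line and environment for superset sync dbt, leaving the actual run to the pipeline. This assumes the `superset` executable is on PATH after the pip install, and that SUPERSET__SQLALCHEMY_DATABASE_URI is read as described in the comment. `build_sync_invocation` and the metadata URI are hypothetical placeholders.

```python
import os
from typing import Dict, List, Tuple


def build_sync_invocation(
    manifest: str, project: str, target: str, metadata_db_uri: str
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running `superset sync dbt` in a CI job."""
    argv = ["superset", "sync", "dbt", manifest, "--project", project, "--target", target]
    # Inherit the CI environment and point Superset at its metadata database.
    env = {**os.environ, "SUPERSET__SQLALCHEMY_DATABASE_URI": metadata_db_uri}
    return argv, env


argv, env = build_sync_invocation(
    "target/manifest.json", "superset_examples", "dev", "postgresql://superset@db:5432/superset"
)
print(" ".join(argv))
# superset sync dbt target/manifest.json --project superset_examples --target dev
```

In a pipeline, the result would then be passed to `subprocess.run(argv, env=env, check=True)`, so a failed sync fails the job.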

noel · 2022-03-03T21:44:10Z

Based on a conversation in the dbt Slack #tools-superset channel

It was suggested that I add a comment to this PR.
The idea is to create datasets automatically via a configuration in dbt_project.yml:

models:
  project:
    marts:
      +superset_export: true

The above would create datasets for all the models under marts.
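Under that proposal, the sync would consider only models whose config sets superset_export to true. Here is a minimal sketch of the filter. It assumes dbt copies the custom +superset_export setting into each model node's `config` dict in manifest.json; that is an interpretation of the proposal, not something this PR implements. `exported_models` and the model names orders and stg_orders are hypothetical.

```python
from typing import Any, Dict, List


def exported_models(manifest: Dict[str, Any]) -> List[str]:
    """Return names of models flagged with superset_export (hypothetical)."""
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
        and node.get("config", {}).get("superset_export") is True
    )


# Illustrative manifest: one model under marts (flagged), one staging model (not flagged).
manifest = {
    "nodes": {
        "model.project.orders": {
            "resource_type": "model", "name": "orders", "config": {"superset_export": True},
        },
        "model.project.stg_orders": {
            "resource_type": "model", "name": "stg_orders", "config": {},
        },
    }
}

print(exported_models(manifest))  # ['orders']
```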

betodealmeida · 2022-03-23T22:39:35Z

I'm working on a better solution for this.

rumbin · 2022-05-20T07:42:06Z

@betodealmeida Is there any resource for your new approach available?

betodealmeida · 2022-05-20T15:45:27Z

@rumbin take a look at https://github.com/preset-io/backend-sdk

GeorgePearse · 2022-06-15T14:16:35Z

Really interested in any work to make DBT and Superset work better together.

Commit 4035ccb: feat: command to sync DBT to Superset

pull-request-size bot added the size/L label Jan 19, 2022

betodealmeida mentioned this pull request Jan 20, 2022 in Sync exposures #18099 (closed)

rumbin mentioned this pull request Jan 22, 2022 in [SIP-68] A better model for Datasets #14909 (closed)

betodealmeida closed this Mar 23, 2022

skwaugh mentioned this pull request Jun 24, 2022 in Is this project active and alive? preset-io/backend-sdk#28 (closed)
