[CT-3068] [Bug] relative node paths incl. "../../../" in dbt_project.yml completely destroy repository #8541

Open
2 tasks done
jaklan opened this issue Sep 2, 2023 · 4 comments
Assignees
Labels
bug Something isn't working


jaklan commented Sep 2, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Defining node (model, seed, etc.) paths in dbt_project.yml as relative paths can completely destroy your repository, because the compiled nodes don't stay in the target/compiled/ directory. Instead, they are moved up in the file hierarchy:

  • paths incl. ../ would be located in target/<dbt project name>/ (instead of target/compiled/<dbt project name>/)
  • paths incl. ../../ would be located directly in target/
  • paths incl. ../../../ and deeper would be moved outside the target/ directory and become part of the actual git repository!

Expected Behavior

All the compiled nodes should stay in target/compiled directory, no matter if the node paths are relative or not.

Potential solution: instead of recreating the directory structure of the specified node paths, that structure should be reflected in a single directory name. For example, ../../../../common/dbt/models should go to target/compiled/<dbt project name>/..-..-..-..-common-dbt-models after compilation (or something similar, since a name starting with a dot would make the directory hidden on Unix-like operating systems).
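
A minimal Python sketch of that flattening idea. This is not dbt-core code: `flatten_node_path`, `compiled_dir`, the `parent_token` parameter, and the project name `germany` are hypothetical names used only for illustration.

```python
import posixpath


def flatten_node_path(node_path, parent_token=".."):
    """Collapse a relative node path into a single directory name.

    Hypothetical helper: each path segment is joined with "-", and ".."
    segments are replaced by parent_token (pass e.g. "up" to avoid a
    leading dot, which would hide the directory on Unix-like systems).
    """
    parts = [p for p in node_path.split("/") if p not in ("", ".")]
    parts = [parent_token if p == ".." else p for p in parts]
    return "-".join(parts)


def compiled_dir(project_name, node_path):
    """Build a compile location that always stays under target/compiled/."""
    return posixpath.join(
        "target", "compiled", project_name, flatten_node_path(node_path)
    )


hidden_name = flatten_node_path("../../../../common/dbt/models")
visible_name = flatten_node_path("../../common/dbt/models", parent_token="up")
german_models = compiled_dir("germany", "models")
```

Because the flattened name contains no path separators, the result cannot climb out of target/compiled/<dbt project name>/, whatever the depth of the original relative path.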

Steps To Reproduce

We use dbt in a monorepo. Our structure looks like this:

  • common
    • dbt
      • dbt_project.yml
      • models
      • ...
  • projects
    • emea
      • common
        • dbt
          • dbt_project.yml
          • models
          • ...
      • germany
        • dbt
          • dbt_project.yml
          • models
          • ...

The dbt_project.yml for Germany then looks like this:

model-paths:
  - models
  - ../../common/dbt/models
  - ../../../../common/dbt/models

When you run dbt compile, 3 things happen:

  • compiled German models are located in:
    projects/emea/germany/dbt/target/compiled/<German dbt project name>/models
    which is fine
  • compiled EMEA common models are located in:
    projects/emea/germany/dbt/target/common/dbt/models (🚨)
    which is not fine, because the models end up 2 levels higher, outside the compiled sub-directory, although they still stay inside the target one
  • compiled root common models are located in:
    projects/emea/germany/common/dbt/models (🚨🚨🚨)
    which is a complete disaster: the models are moved 4 levels higher, so they leave the target directory and become part of the actual repository!
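
The three locations above can be reproduced with plain path arithmetic. The sketch below is an assumption about the behaviour, not dbt-core's actual implementation: it simply joins each model path onto target/compiled/<project name> and normalizes the result. The project name `germany` and the helpers `compiled_location` and `stays_inside` are hypothetical.

```python
import posixpath

PROJECT_DIR = "projects/emea/germany/dbt"
PROJECT_NAME = "germany"  # hypothetical dbt project name


def compiled_location(node_path):
    """Join a model path onto target/compiled/<project> and resolve any '..'."""
    base = posixpath.join(PROJECT_DIR, "target", "compiled", PROJECT_NAME)
    return posixpath.normpath(posixpath.join(base, node_path))


def stays_inside(directory, candidate):
    """True if candidate is directory itself or located somewhere below it."""
    directory = posixpath.normpath(directory)
    candidate = posixpath.normpath(candidate)
    return candidate == directory or candidate.startswith(directory + "/")


target_dir = posixpath.join(PROJECT_DIR, "target")
compiled_root = posixpath.join(target_dir, "compiled")

german = compiled_location("models")
emea_common = compiled_location("../../common/dbt/models")
root_common = compiled_location("../../../../common/dbt/models")

for path in (german, emea_common, root_common):
    print(path, stays_inside(compiled_root, path), stays_inside(target_dir, path))
```

Only the first path stays inside target/compiled, the second stays inside target only, and the third leaves target entirely, which matches the behaviour described in the steps above.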

Relevant log output

No response

Environment

- OS: macOS Ventura
- Python: 3.10.9
- dbt: tested with both 1.4 and 1.6

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@jaklan jaklan added bug Something isn't working triage labels Sep 2, 2023
@github-actions github-actions bot changed the title [Bug] relative node paths incl. "../../../" in dbt_project.yml completely destroy repository [CT-3068] [Bug] relative node paths incl. "../../../" in dbt_project.yml completely destroy repository Sep 2, 2023

jaklan commented Sep 2, 2023

Before you recommend that we use packages instead of relative paths:

@jtcohen6 jtcohen6 self-assigned this Sep 5, 2023

jtcohen6 commented Sep 5, 2023

@jaklan Thanks for opening.

To better understand the use case, in the example you've provided, are the "common" models:

  1. Models that need to be built once, and then referenced in several downstream country-specific projects (Germany, France, ...)
  2. "Template" models that need to be reconfigured / run with certain vars depending on the country (Germany, France, ...)

That will help me understand whether your primary goal is to (a) split up a single monolithic project, or (b) reuse model templates across multiple regions/instances.


Today, model-paths and other resource path configs in dbt_project.yml don't support relative paths to parent directories (outside the working directory), nor do they support absolute paths (#7373). It's never been a high-priority fix for us, and in the meantime, we've never documented that either pattern is supported:

The alternative pattern, which you mention in your comment, is to use "local" packages:

# projects/emea/germany/packages.yml
packages:
  - local: "../../common/dbt"
  - local: "../../../../common/dbt"
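
For anyone migrating from one pattern to the other, here is a rough Python sketch that flags model-paths entries reaching outside the project and derives the matching `local:` package directories. It assumes each shared models directory sits directly inside a directory that contains its own dbt_project.yml, as in the tree from the issue description. `reaches_outside_project` and `suggest_local_packages` are hypothetical helpers, not part of dbt.

```python
import posixpath


def reaches_outside_project(path):
    """True for absolute paths or paths that climb above the project directory."""
    normalized = posixpath.normpath(path)
    return (
        posixpath.isabs(normalized)
        or normalized == ".."
        or normalized.startswith("../")
    )


def suggest_local_packages(model_paths):
    """Return the parent directory of every model path that leaves the project.

    Assumption: that parent directory is a dbt project of its own and can
    therefore be installed as a local package.
    """
    return [
        posixpath.dirname(posixpath.normpath(p))
        for p in model_paths
        if reaches_outside_project(p)
    ]


model_paths = ["models", "../../common/dbt/models", "../../../../common/dbt/models"]
suggestions = suggest_local_packages(model_paths)

# Print the entries in the shape of a packages.yml file.
print("packages:")
for directory in suggestions:
    print(f'  - local: "{directory}"')
```

The plain models entry is left alone because it already lives inside the project; only the two parent-directory entries turn into package suggestions.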

Here's what I'm thinking: In an ideal world, both of these patterns would work. But we would prioritize and strongly recommend using local packages, rather than model-paths that reach up into parent directories. Why? Each project/package represents a colocated module, rather than a scattered collection of files which could be anywhere in the file system. It's also then much easier to split out each project/package into a separate standalone repo, or to switch between package and project dependencies (= to use cross-project ref in dbt Cloud Enterprise).

Your point about the documentation saying, "Local packages should only be used for specific situations" — I think we should just update our docs! Multi-project monorepos are another legit use case for local packages.

I think the pattern I like the most is running dbt from the root of your monorepo, and installing the appropriate packages based on what you need at a given point in time:

# packages.yml (root)
packages:
  - local: common/dbt
  - local: projects/emea/common/dbt
  - local: projects/emea/germany/dbt

Proposed next steps

In rough order of priority:

@jtcohen6 jtcohen6 removed the triage label Sep 5, 2023

trouze commented Sep 5, 2023

Hi @jtcohen6! @jaklan asked me to answer your query:

To better understand the use case, in the example you've provided, are the "common" models:

  1. Models that need to be built once, and then referenced in several downstream country-specific projects (Germany, France, ...)
  2. "Template" models that need to be reconfigured / run with certain vars depending on the country (Germany, France, ...)

I'd say the latter: "template" models. Some models could technically be built once and then referenced downstream by each affiliate, but that adds maintenance complexity, because the same source system (which is how we organize our staging layers) could then feed both a model that uses a country var macro and one that doesn't. I suppose the answer is technically both (and I'm sure there are others who implement this pattern), but in practice for us it's only the latter.

mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this issue Sep 8, 2023
Context:
dbt-labs/dbt-core#8541 (comment)

## What are you changing in this pull request and why?

There are multiple legitimate use cases for 'local' packages, beyond the
one that's currently documented — especially as we see more
multi-project setups.
@misteliy

I see the fix was merged. Will this be coming in the next release?
