Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cross-project ref + model access + dependencies.yml #3577

Merged
merged 16 commits into from
Jul 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions website/dbt-versions.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@ exports.versions = [
]

exports.versionedPages = [
{
"page": "docs/collaborate/govern/project-dependencies",
"firstVersion": "1.6",
},
{
"page": "reference/resource-properties/deprecation_date",
"firstVersion": "1.6",
Expand Down
4 changes: 2 additions & 2 deletions website/docs/docs/build/packages.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,10 @@ Defining and installing dbt packages is different from [defining and installing
:::

## How do I add a package to my project?
1. Add a `packages.yml` file to your dbt project. This should be at the same level as your `dbt_project.yml` file.
1. Add a file named <VersionBlock firstVersion="1.6"> `dependencies.yml` or </VersionBlock> `packages.yml` to your dbt project. This should be at the same level as your `dbt_project.yml` file.
2. Specify the package(s) you wish to add using one of the supported syntaxes, for example:

<File name='packages.yml'>
<File>

```yaml
packages:
Expand Down
42 changes: 35 additions & 7 deletions website/docs/docs/collaborate/govern/model-access.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,16 +163,44 @@ Of course, dbt can facilitate this by means of [the `grants` config](/reference/

As we continue to develop multi-project collaboration, `access: public` will mean that other teams are allowed to start taking a dependency on that model. This assumes that they've requested, and you've granted them access, to select from the underlying dataset.

### What about referencing models from a package?
### How do I ref a model from another project?

For historical reasons, it is possible to `ref` a protected model from another project, _if that protected model is installed as a package_. This is useful for packages containing models for a common data source; you can install the package as source code, and run the models as if they were your own.
<VersionBlock lastVersion="1.5">

In dbt Core v1.5 (and earlier versions), the only way to reference a model from another project is by installing that project as a package, including its full source code. It is not possible to restrict references across projects based on model `access`.

For more control over per-model access across projects, select v1.6 (or newer) from the version dropdown.

</VersionBlock>

<VersionBlock firstVersion="1.6">

You can `ref` a model from another project in two ways:
1. [Project dependency](/docs/collaborate/govern/project-dependencies): In dbt Cloud Enterprise, you can use project dependencies to `ref` a model. dbt Cloud uses a behind-the-scenes metadata service to resolve the reference, enabling efficient collaboration across teams and at scale.
2. ["Package" dependency](/docs/build/packages): Another way to `ref` a model from another project is to treat the other project as a package dependency. This requires installing the other project as a package, including its full source code, as well as its upstream dependencies.

### How do I restrict access to models defined in a package?

Source code installed from a package becomes part of your runtime environment. You can call macros and run models as if they were macros and models that you had defined in your own project.

For this reason, model access restrictions are "off" by default for models defined in packages. You can reference models from that package regardless of their `access` modifier.

The project being installed as a package can optionally restrict external `ref` access to just its public models. The package maintainer does this by setting a `restrict-access` config to `True` in `dbt_project.yml`.

By default, the value of this config is `False`. This means that:
- Models in the package with `access: protected` may be referenced by models in the root project, as if they were defined in the same project
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
- Models in the package with `access: private` may be referenced by models in the root project, so long as they also have the same `group` config

When `restrict-access: True`:
- Any `ref` from outside the package to a protected or private model in that package will fail.
- Only models with `access: public` can be referenced outside the package.

<File name="dbt_project.yml">

dbt Core v1.6 will introduce a new kind of `project` dependency, distinct from a `package` dependency, defined in `dependencies.yml`:
```yml
projects:
- project: jaffle_finance
restrict-access: True # default is False
```

Unlike installing a package, the models in the `jaffle_finance` project will not be pulled down as source code, or selected to run during `dbt run`. Instead, `dbt-core` will expect stateful input that enables it to resolve references to those public models.
</File>

Models referenced from a `project`-type dependency must use [two-argument `ref`](#two-argument-variant), including the project name. Only public models can be accessed in this way. That holds true even if the `jaffle_finance` project is _also_ installed as a package (pulled down as source code), such as in a coordinated deployment. If `jaffle_finance` is listed under the `projects` in `dependencies.yml`, dbt will raise an error if a protected model is referenced from outside its project.
</VersionBlock>
96 changes: 96 additions & 0 deletions website/docs/docs/collaborate/govern/project-dependencies.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
---
title: "Project dependencies"
id: project-dependencies
sidebar_label: "Project dependencies"
description: "Reference public models across dbt projects"
---

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add a note to this page about project-level cycle detection (dbt-labs/dbt-core#7468)

:::info
"Project" dependencies and cross-project `ref` is currently in closed beta and are features of dbt Cloud Enterprise. To access these features, please contact your account team.
:::

For a long time, dbt has supported code reuse and extension by installing other projects as [packages](/docs/build/packages). When you install another project as a package, you are pulling in its full source code, and adding it to your own. This enables you to call macros and run models defined in that other project.

While this is a great way to reuse code, share utility macros, and establish a starting point for common transformations, it's not a great way to enable collaboration across teams and at scale, especially at larger organizations.

This year, dbt Labs is introducing an expanded notion of `dependencies` across multiple dbt projects:
- **Packages** &mdash; Familiar and pre-existing type of dependency. You take this dependency by installing the package's full source code (like a software library).
- **Projects** &mdash; A _new_ way to take a dependency on another project. Using a metadata service that runs behind the scenes, dbt Cloud resolves references on-the-fly to public models defined in other projects. You don't need to parse or run those upstream models yourself. Instead, you treat your dependency on those models as an API that returns a dataset. The maintainer of the public model is responsible for guaranteeing its quality and stability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sugg adding a header here

Suggested change
## Use case example

## Example

As an example, let's say you work on the Marketing team at the Jaffle Shop. The name of your team's project is `jaffle_marketing`:

<File name="dbt_project.yml">

```yml
name: jaffle_marketing
```

</File>

As part of your modeling of marketing data, you need to take a dependency on two other projects:
- `dbt_utils` as a [package](#packages-use-case): An collection of utility macros that you can use while writing the SQL for your own models. This package is, open-source public, and maintained by dbt Labs.
- `jaffle_finance` as a [project use-case](#projects-use-case): Data models about the Jaffle Shop's revenue. This project is private and maintained by your colleagues on the Finance team. You want to select from some of this project's final models, as a starting point for your own work.

<File name="dependencies.yml">

```yml
packages:
- package: dbt-labs/dbt_utils
version: 1.1.1

projects:
- name: jaffle_finance # matches the 'name' in their 'dbt_project.yml'
```

</File>

What's happening here?

The `dbt_utils` package &mdash; When you run `dbt deps`, dbt will pull down this package's full contents (100+ macros) as source code and add them to your environment. You can then call any macro from the package, just as you can call macros defined in your own project.

The `jaffle_finance` projects &mdash; This is a new scenario. Unlike installing a package, the models in the `jaffle_finance` project will _not_ be pulled down as source code and parsed into your project. Instead, dbt Cloud provides a metadata service that resolves references to [**public models**](/docs/collaborate/govern/model-access) defined in the `jaffle_finance` project.

### Advantages

Copy link
Contributor

@mirnawong1 mirnawong1 Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Projects use case

When you're building on top of another team's work, resolving the references in this way has several advantages:
- You're using an intentional interface designated by the model's maintainer with `access: public`.
- You're keeping the scope of your project narrow, and avoiding unnecessary resources and complexity. This is faster for you and faster for dbt.
- You don't need to mirror any conditional configuration of the upstream project such as `vars`, environment variables, or `target.name`. You can reference them directly wherever the Finance team is building their models in production. Even if the Finance team makes changes like renaming the model, changing the name of its schema, or [bumping its version](/docs/collaborate/govern/model-versions), your `ref` would still resolve successfully.
- You eliminate the risk of accidentally building those models with `dbt run` or `dbt build`. While you can select those models, you can't actually build them. This prevents unexpected warehouse costs and permissions issues. This also ensures proper ownership and cost allocation for each team's models.

### Usage

**Writing `ref`:** Models referenced from a `project`-type dependency must use [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant), including the project name:

<File name="models/marts/roi_by_channel.sql">

```sql
with monthly_revenue as (

select * from {{ ref('jaffle_finance', 'monthly_revenue') }}

),

...

```

</File>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Package use case

**Cycle detection:** Currently, "project" dependencies can only go in one direction, meaning that the `jaffle_finance` project could not add a new model that depends, in turn, on `jaffle_marketing.roi_by_channel`. dbt will check for cycles across projects and raise errors if any are detected. We are considering support for this pattern in the future, whereby dbt would still check for node-level cycles while allowing cycles at the project level.

### Comparison

If you were to instead install the `jaffle_finance` project as a `package` dependency, you would instead be pulling down its full source code and adding it to your runtime environment. This means:
- dbt needs to parse and resolve more inputs (which is slower)
- dbt expects you to configure these models as if they were your own (with `vars`, env vars, etc)
- dbt will run these models as your own unless you explicitly `--exclude` them
- You could be using the project's models in a way that their maintainer (the Finance team) hasn't intended

There are a few cases where installing another internal project as a package can be a useful pattern:
- Unified deployments &mdash; In a production environment, if the central data platform team of Jaffle Shop wanted to schedule the deployment of models across both `jaffle_finance` and `jaffle_marketing`, they could use dbt's [selection syntax](/reference/node-selection/syntax) to create a new "passthrough" project that installed both projects as packages.
- Coordinated changes &mdash; In development, if you wanted to test the effects of a change to a public model in an upstream project (`jaffle_finance.monthly_revenue`) on a downstream model (`jaffle_marketing.roi_by_channel`) _before_ introducing changes to a staging or production environment, you can install the `jaffle_finance` package as a package within `jaffle_marketing`. The installation can point to a specific git branch, however, if you find yourself frequently needing to perform end-to-end testing across both projects, we recommend you re-examine if this represents a stable interface boundary.

These are the exceptions, rather than the rule. Installing another team's project as a package adds complexity, latency, and risk of unnecessary costs. By defining clear interface boundaries across teams, by serving one team's public models as "APIs" to another, and by enabling practitioners to develop with a more narrowly-defined scope, we can enable more people to contribute, with more confidence, while requiring less context upfront.
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,8 @@ dbt Labs is committed to providing backward compatibility for all versions 1.x,

[**Namespacing:**](/faqs/Models/unique-model-names) Model names can be duplicated across different namespaces (packages/projects), so long as they are unique within each package/project. We strongly encourage using [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant) when referencing a model from a different package/project.

[**Project dependencies**](/docs/collaborate/govern/project-dependencies): Introduces `dependencies.yml` and dependent `projects` as a feature of dbt Cloud Enterprise. Allows enforcing model access (public vs. protected/private) across project/package boundaries. Enables cross-project `ref` of public models, without requiring the installation of upstream source code.

### Quick hits

More consistency and flexibility around packages! Resources defined in a package will respect variable and global macro definitions within the scope of that package.
Expand Down
1 change: 1 addition & 0 deletions website/docs/reference/dbt_project.yml.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ vars:
- macro_namespace: packagename
search_order: [packagename]

[restrict-access](/docs/collaborate/govern/model-access): true | false
```

</File>
1 change: 1 addition & 0 deletions website/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -397,6 +397,7 @@ const sidebarSettings = {
"docs/collaborate/govern/model-access",
"docs/collaborate/govern/model-contracts",
"docs/collaborate/govern/model-versions",
"docs/collaborate/govern/project-dependencies",
],
},
],
Expand Down
Loading