Feature/yaml selections #2640

beckjake · 2020-07-22T15:57:01Z

resolves #2172

Description

This PR implements the advanced node selection syntax, mostly as described in #2172. I've made some changes to accommodate YAML syntax, but I tried to keep to the spirit of the proposal.

Here's a selectors.yml example file (taken from tests):

selectors:
  - name: views-or-foos-not-bars
    definition:
      union:
        - method: config.materialized
          value: view
        - tag: foo
        - exclude:
          - tag: bar

It defines a selector that matches all models that are materialized as views or tagged foo, excluding any models tagged bar.
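
To make the semantics concrete, here is a minimal Python sketch of how that definition evaluates against a small project. The MODELS table and its model names are made up for illustration, and this is plain set arithmetic rather than dbt's actual graph selection code.

```python
# Hypothetical project: model name -> (materialization, tags). Not real dbt data.
MODELS = {
    "orders": ("view", set()),
    "customers": ("table", {"foo"}),
    "payments": ("view", {"bar"}),
    "events": ("table", {"foo", "bar"}),
    "sessions": ("table", set()),
}

# One set per criterion in the selector definition.
views = {name for name, (mat, _) in MODELS.items() if mat == "view"}
foos = {name for name, (_, tags) in MODELS.items() if "foo" in tags}
bars = {name for name, (_, tags) in MODELS.items() if "bar" in tags}

# Union of the included criteria, minus everything matched by exclude.
selected = (views | foos) - bars
print(sorted(selected))
```

Here payments is a view and events is tagged foo, but both are dropped because they are also tagged bar.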

I didn't use hologram for parsing here - hologram (well, actually Python's type system) doesn't support recursive type definitions. Hologram also does a poor job of handling ambiguous specifications like this one: it's very hard to get hologram to support both the semi-arbitrary {'method': 'tag', 'value': 'foo'} spec and the fully-arbitrary-but-one-key {'tag': 'foo'} spec.

Selector definitions

There are 3 ways to define a simple selector:

definition:
  tag: foo

This parses to a list entry containing a dict like {'tag': 'foo'}. It's converted, but you can't use modifiers like @ or + here. We could add support for modifiers by examining the keys and values and taking any prefixes/suffixes, or more likely doing ':'.join([key, value]) and passing that in to the string parsing logic. The more I think about this, the more I think it would be good, if only for consistency with the next form.
We could also add support for exclude in this syntax if we wanted, though I think it makes the subtle distinction between tag: foo and tag:foo much more confusing.

definition:
  tag:foo

This parses to a string entry like 'tag:foo', which is then parsed the same way CLI arguments are. You can use modifiers like @ or + here, though YAML will want you to quote them (for example, - "@tag:foo").

definition:
  method: tag
  value: foo
  childrens_parents: true

This parses to a dictionary entry like {'method': 'tag', 'value': 'foo', 'childrens_parents': True}. This is what the string form is converted into, and then it goes down the same conversion route.
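
The relationship between the three forms can be sketched as a small normalizer that maps each one onto the explicit dictionary form. This is an illustration only: the normalize function, the RAW_SPEC pattern, and the fallback to the fqn method for bare values are assumptions made for the example, not the code in this PR. Unlike the current behavior described above, the sketch treats the one-key form as plain key/value with no modifier handling at all.

```python
import re

# Assumed grammar for the CLI-style string form: optional "@" or "+" prefix,
# optional "method:" part, a value, optional "+" suffix. Illustrative only.
RAW_SPEC = re.compile(
    r"^(?P<childrens_parents>@)?(?P<parents>\+)?"
    r"((?P<method>[\w.]+):)?(?P<value>.*?)(?P<children>\+)?$"
)


def normalize(definition):
    """Convert any of the three simple forms to the explicit dict form."""
    if isinstance(definition, str):
        match = RAW_SPEC.match(definition)
        if match is None:
            raise ValueError(f"cannot parse selector string: {definition!r}")
        parts = match.groupdict()
        # Falling back to "fqn" for bare values is an assumption of this sketch.
        result = {"method": parts["method"] or "fqn", "value": parts["value"]}
        for flag in ("childrens_parents", "parents", "children"):
            if parts[flag]:
                result[flag] = True
        return result
    if "method" in definition and "value" in definition:
        # Already in the explicit form; return a copy.
        return dict(definition)
    if len(definition) == 1:
        # The one-key form, e.g. {'tag': 'foo'}.
        (method, value), = definition.items()
        return {"method": method, "value": value}
    raise ValueError(f"ambiguous selector definition: {definition!r}")
```

With this sketch, {'tag': 'foo'} and 'tag:foo' both normalize to {'method': 'tag', 'value': 'foo'}, and '@tag:foo' additionally sets childrens_parents to True.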

Combinations

Internally, this code still uses the same basic ideas introduced in previous PRs: You can combine values as unions, differences, and intersections of sets of selector definitions. The union and intersection combinations are themselves selector definitions, and can be used anywhere. Set differences are discussed below, but basically exclude can exist anywhere a selector definition could, or within a simple selector definition (which makes it... not so simple).

Only the third form of simple selector definition can be used with exclude. For example, to do what dbt run --exclude @tag:foo does:

- method: fqn
  value: "*"
  exclude:
    - "@tag:foo"

Of course, you can define a one-element union with exclusions if you prefer that syntax:

- union:
    - "fqn:*"
    - exclude:
      - "@tag:foo"

Set differences

The 'exclude' key is the only way to specify set differences. It accepts a list of definitions that are then unioned together. I could definitely be convinced that that's wrong and the value should instead be just a definition. My reasoning rests on the assertion that (at least in Python!) difference(a, union(b, c, d)) is the same as difference(a, b, c, d), which I feel reasonably confident about.
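
The identity can be checked directly with Python's built-in sets. The element names below are arbitrary placeholders, not real models.

```python
# Arbitrary placeholder sets standing in for selected nodes.
a = {"m1", "m2", "m3", "m4", "m5"}
b = {"m2"}
c = {"m3", "m9"}
d = {"m2", "m4"}

# Difference against an explicit union of the excluded sets...
nested = a.difference(set().union(b, c, d))
# ...versus passing the excluded sets to difference() one by one.
flat = a.difference(b, c, d)

assert nested == flat == {"m1", "m5"}
```

set.difference accepts any number of iterables and removes the elements found in any of them, which is why the two spellings agree.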

I think it'd be reasonable to add a difference key that acts as an explicit set difference. It would be its own value, as opposed to exclude, which I think of as modifying its parents: {method: 'fqn', value: '*', exclude: ['@tag:foo']} actually becomes difference("fqn:*", "@tag:foo").
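
A rough sketch of that rewriting, assuming the definition has already been loaded from YAML into Python dicts, lists, and strings. The to_expression and wrap_difference helpers are hypothetical names for this example, modifiers such as childrens_parents are ignored, and the output is only a readable string, not an object dbt would evaluate.

```python
def wrap_difference(base, excluded):
    """Wrap base in difference(...) only when there is something to exclude."""
    if not excluded:
        return base
    return f"difference({base}, {', '.join(excluded)})"


def to_expression(definition):
    """Render a selector definition as a nested set-expression string."""
    if isinstance(definition, str):
        # CLI-style strings such as "@tag:foo" are kept verbatim.
        return definition
    for op in ("union", "intersection"):
        if op in definition:
            included, excluded = [], []
            for item in definition[op]:
                # An element holding only "exclude" modifies its parent.
                if isinstance(item, dict) and set(item) == {"exclude"}:
                    excluded.extend(to_expression(e) for e in item["exclude"])
                else:
                    included.append(to_expression(item))
            return wrap_difference(f"{op}({', '.join(included)})", excluded)
    excluded = [to_expression(e) for e in definition.get("exclude", [])]
    rest = {k: v for k, v in definition.items() if k != "exclude"}
    if "method" in rest:
        base = f"{rest['method']}:{rest['value']}"
    else:
        # One-key form, e.g. {'tag': 'foo'}.
        (method, value), = rest.items()
        base = f"{method}:{value}"
    return wrap_difference(base, excluded)


print(to_expression({"method": "fqn", "value": "*", "exclude": ["@tag:foo"]}))
```

Applied to the views-or-foos-not-bars definition from the description, the same function yields difference(union(config.materialized:view, tag:foo), tag:bar).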

Exclude syntax

The syntax isn't immensely satisfying to me, especially around exclude. Currently, a union with exclusions has exclude: as one of its elements. Would it make more sense for exclude: to live on the same level as union, like this?

name: my-selector
definition:
  union:
    - tag:foo
    - config.materialized: view
  exclude:
    - tag: bar

I don't feel like this is really better at all (I think the lack of indentation makes it hard to parse mentally at a glance), but I don't have great taste on this kind of thing.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

jtcohen6

I'm very excited about this! I read through the description yesterday, wrote up some examples for docs last night, and then checked to see if my pseudo-syntax worked in practice. Almost perfect.

I'm fine with using exclude syntax for set differences, for now. I agree that it feels more intuitive to have exclude live as a nested object within union (or single-arg definition). "Give me all of this/these, except this carved-out subset."

I think this syntax will get us a lot of mileage before we need to add any more complexity. I'm confident there are talented community members who will find interesting places to go with these selectors + YML anchors.

mlavoie-sm360 · 2020-08-20T14:52:44Z

Can we make suggestions?

When executing a run using --select, it would be nice if the logs printed the expanded version of that selector. For example, this selector:

selectors:
  - name: my_selector
    definition:
      union:
        - method: tag
          value: something
          parents: true
        - exclude:
            - method: tag
              value: something_else

Would then print out something like:

dbt run --select my_selector
Running with dbt=0.18.0-rc1
Found 562 models, 578 tests, 8 snapshots, 177 analyses, 323 macros, 4 operations, 40 seed files, 198 sources

15:40:52 | Executing dbt run --models +tag:something --exclude tag:something_else
15:40:52 | 
15:40:52 | Running 2 on-run-start hooks
...

jtcohen6 · 2020-08-20T15:01:36Z

@mlavoie-sm360 Suggestions very welcome! There's a separate open issue (#2700) around making YAML selectors user-friendlier, specifically for better logging / errors. Would you mind posting your comment there so we don't lose track of it?

Jacob Beck added 2 commits July 22, 2020 07:58

Parse selectors

a976e54

This doesn't use hologram, as it needs more permissive schemas than what hologram provides. Added some unit tests

Create selector config file, handle selector argument

1e8543e

Added tests. Added RPC and CLI support.

cla-bot bot added the cla:yes label Jul 22, 2020

add changelog

184146b

beckjake marked this pull request as ready for review July 22, 2020 16:22

beckjake requested review from jtcohen6 and kwigley July 22, 2020 16:22

jtcohen6 mentioned this pull request Jul 22, 2020

Slim CI runs, via state comparison and deferred refs #2641

Closed

kwigley approved these changes Jul 23, 2020

jtcohen6 mentioned this pull request Jul 24, 2020

[v0.18.0] New selectors dbt-labs/docs.getdbt.com#302

Merged

2 tasks

jtcohen6 approved these changes Jul 24, 2020

beckjake merged commit 3ec911b into dev/marian-anderson Jul 24, 2020

beckjake deleted the feature/yaml-selections branch July 24, 2020 20:30

jtcohen6 mentioned this pull request Aug 13, 2020

Better error messaging for YAML selectors #2700

Closed
