Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] CT-201 source config WIP #5003

Closed
wants to merge 8 commits into from
Closed

Conversation

emmyoop
Copy link
Member

@emmyoop emmyoop commented Apr 6, 2022

resolves #3662

Description

Draft PR for feature branch.

Some context over in #4960

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have added information about my change to be included in the CHANGELOG.

@cla-bot cla-bot bot added the cla:yes label Apr 6, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2022

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@emmyoop emmyoop changed the title Feature/ct 201 source config [Feature] CT-201 source config WIP Apr 6, 2022
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting close! I left a few comments that may provide helpful (or least historical) context

base=False,
patch_config_dict=target.source.config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need access to both:

  • target.source.config = config defined at the source level
  • target.table.config = config defined at the source table level

I think there are two options here:

  • Merge those configs together (preferring table-level over source-level), before passing into patch_config_dict
  • Modify calculate_node_config to accept multiple "levels" of patches, so that all calculation happens in that same method: source table > source > project-level config

@@ -465,6 +465,8 @@ def parse_project(
else:
dct = block.file.dict_from_yaml
parser.parse_file(block, dct=dct)
# Came out of here with UnpatchedSourceDefinition containing configs at the source level
# and not configs at the table level (as expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment re: patch_config_dict passed into calculate_node_config

default=None,
metadata=CompareBehavior.Exclude.meta(),
)
# TODO what type is this? docs say: "<column_name_or_expression>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String is right.

Food for thought: These dataclass attributes are duplicative with the ones already defined:

Given our aim here is backwards compatibility, that duplication may be unavoidable

@@ -335,6 +335,39 @@ def replace(self, **kwargs):
@dataclass
class SourceConfig(BaseConfig):
enabled: bool = True
quoting: Dict[str, Any] = field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move the addition of these new configs into a separate PR? If the goal is just to support setting enabled as a config on the source/table levels

)

unrendered_config = self._generate_source_config(
fqn=target.fqn,
target=target,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background context on rendered: bool, UnrenderedConfigGenerator, and unrendered_config, since these are pretty confusing!

This exists to limit "false positives" during state comparison, i.e. to power state:modified selection in Slim CI runs ("just build what's changed"): https://docs.getdbt.com/reference/node-selection/state-comparison-caveats#false-positives

The comparison happens in the same_config method here:

def same_config(self, old: T) -> bool:
return self.config.same_contents(
self.unrendered_config,
old.unrendered_config,
)

Idea being, it's not uncommon for users to have environment-based configuration like:

sources:
  - name: my_source
    config:
      database: "{{ 'prod_raw' if target.name == 'prod' else 'sampled_dev' }}"
      schema: "{{ env_var('DBT_SOURCE_SCHEMA') }}"
      # etc

The rendered versions of those configs will be different based on the values of env vars or target values. The thing we actually want to detect changes in is the raw un-rendered Jinja expression. If that's changed, it's because someone edited the thing.

This works for configs set in dbt_project.yml, which are stored as unrendered Jinja expressions, and passed twice into calculate_node_config (once with rendering, once without).

However, we've already rendered all fields in (non-project) yaml files, during initial parsing:

# Render the data (except for tests and descriptions).
# See the SchemaYamlRenderer
entry = self.render_entry(entry)

So we're passing already-rendered Jinja expressions into calculate_node_config, and we never get access to their unrendered forms. We'd need a subtle reordering of parsing/rendering to make it work as intended.

Related: #2744, #3576

@jtcohen6 jtcohen6 mentioned this pull request Apr 8, 2022
4 tasks
@nathaniel-may
Copy link
Contributor

@jtcohen6 I definitely requested your review on the wrong PR. This one will be for the full feature. The enabled flag work is going on in #5008. Thank you for all the context here!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2022

This PR has been marked as Stale because it has been open for 180 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Oct 6, 2022
@emmyoop emmyoop closed this Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes stale Issues that have gone stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-201] Reconcile configs + properties for sources
3 participants