Convert to using mashumaro jsonschema with acceptable performance #8437

gshank · 2023-08-16T22:45:55Z

resolves #8426

Problem

The original conversion was done in #8132 but had performance issues. This PR uses caching to improve performance.

See the comments in #8132 for additional context for code reviews.
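As a rough illustration of the caching idea (the commit list mentions using `lru_cache` on the `json_schema` method), here is a minimal sketch. `build_schema`, `SchemaMixin`, and the two dataclasses are hypothetical stand-ins, not the code in this PR; the point is only that the expensive schema build runs once per class instead of once per call.

```python
from dataclasses import dataclass, fields
from functools import lru_cache

# Counts how often the stand-in schema builder actually runs.
BUILD_CALLS = {"count": 0}


def build_schema(cls):
    """Hypothetical stand-in for an expensive JSON Schema builder."""
    BUILD_CALLS["count"] += 1
    return {
        "type": "object",
        "title": cls.__name__,
        "properties": {f.name: {} for f in fields(cls)},
    }


class SchemaMixin:
    @classmethod
    @lru_cache(maxsize=None)
    def json_schema(cls):
        # The cache key is the class itself, so each class is built once.
        return build_schema(cls)


@dataclass
class Hook(SchemaMixin):
    sql: str = ""


@dataclass
class Credentials(SchemaMixin):
    database: str = ""


# Simulate validation-heavy code that asks for the schema repeatedly.
for _ in range(1000):
    Hook.json_schema()
    Credentials.json_schema()

print(BUILD_CALLS["count"])  # one build per class
```

One caveat of this approach: every caller receives the same cached dict, so callers should treat it as read-only.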

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

codecov · 2023-08-16T22:48:45Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (07372db) 86.34% compared to head (6122517) 86.34%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #8437   +/-   ##
=======================================
  Coverage   86.34%   86.34%           
=======================================
  Files         174      174           
  Lines       25579    25531   -48     
=======================================
- Hits        22087    22046   -41     
+ Misses       3492     3485    -7

Flag	Coverage Δ
integration	`83.14% <100.00%> (+0.01%)`	⬆️
unit	`65.10% <95.04%> (-0.05%)`	⬇️


Files Changed	Coverage Δ
core/dbt/parser/base.py	`93.45% <ø> (ø)`
core/dbt/utils.py	`81.38% <ø> (ø)`
core/setup.py	`0.00% <ø> (ø)`
core/dbt/context/context_config.py	`94.11% <100.00%> (+0.26%)`	⬆️
core/dbt/contracts/connection.py	`96.03% <100.00%> (ø)`
core/dbt/contracts/graph/model_config.py	`92.09% <100.00%> (-1.72%)`	⬇️
core/dbt/contracts/graph/nodes.py	`95.25% <100.00%> (ø)`
core/dbt/contracts/graph/unparsed.py	`93.10% <100.00%> (ø)`
core/dbt/contracts/project.py	`97.68% <100.00%> (+0.06%)`	⬆️
core/dbt/contracts/util.py	`93.83% <100.00%> (+1.47%)`	⬆️
... and 2 more

... and 2 files with indirect coverage changes


emmyoop · 2023-08-17T00:38:45Z

core/dbt/contracts/graph/model_config.py

    )
    pre_hook: List[Hook] = field(
        default_factory=list,
-        metadata=MergeBehavior.Append.meta(),
+        metadata={"merge": MergeBehavior.Append, "alias": "pre-hook"},


Why are aliases needed for Append now? Why does packages not need it on line 466?

The "alias" is for handling the dashes in the names properly. Most of the other field definitions use that kind of hacky metadata=MergeBehavior.DictKeyAppend.meta() thing, which doesn't allow setting additional metadata.
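To make the alias point concrete, here is a small sketch of how a plain metadata dict can carry both a merge behavior and a dashed alias. `MergeBehavior`, `NodeConfigSketch`, and `to_aliased_dict` are simplified stand-ins invented for illustration; the real classes and the way mashumaro consumes the alias are not shown in this thread.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Any, Dict, List


class MergeBehavior(Enum):
    # Stand-in for the real enum; only the member used below is defined.
    Append = 1


@dataclass
class NodeConfigSketch:
    # A plain dict lets one field carry several metadata keys at once.
    pre_hook: List[str] = field(
        default_factory=list,
        metadata={"merge": MergeBehavior.Append, "alias": "pre-hook"},
    )
    post_hook: List[str] = field(
        default_factory=list,
        metadata={"merge": MergeBehavior.Append, "alias": "post-hook"},
    )
    # No alias needed: the Python name and the serialized name are the same.
    tags: List[str] = field(
        default_factory=list,
        metadata={"merge": MergeBehavior.Append},
    )


def to_aliased_dict(obj) -> Dict[str, Any]:
    """Serialize a dataclass, using the alias when a field declares one."""
    return {
        f.metadata.get("alias", f.name): getattr(obj, f.name)
        for f in fields(obj)
    }


config = NodeConfigSketch(pre_hook=["select 1"])
print(to_aliased_dict(config))
```

Fields whose names contain no dashes, such as `tags` here, need no alias, which is presumably why some `Append` fields get one and others do not.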

core/dbt/contracts/project.py

emmyoop · 2023-08-17T00:48:18Z

core/setup.py

@@ -72,12 +72,12 @@
        # ----
        # These are major-version-0 packages also maintained by dbt-labs. Accept patches.
        "dbt-extractor~=0.5.0",
-        "hologram~=0.0.16",  # includes transitive dependencies on python-dateutil and jsonschema


gshank · 2023-08-17T14:47:11Z

@jtcohen6 @graciegoheen I tagged you on this because the output jsonschema is different than what was generated by hologram in a number of ways. Do people actually read it? Do we have any concerns there?

For example, the resource_type shows up as a const, and the use of OneOf vs AnyOf is different.
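For readers less familiar with these keywords, the sketch below shows what they mean in JSON Schema terms: `const` pins a single allowed value, `anyOf` passes when at least one subschema matches, and `oneOf` passes only when exactly one does. The `matches` helper is a toy that understands only `const` and a few `type` values; it is not a real validator, and the schemas are made up rather than taken from either library's output.

```python
def matches(schema, value):
    """Toy matcher supporting only 'const' and a few 'type' values."""
    if "const" in schema:
        return value == schema["const"]
    kind = schema.get("type")
    if kind == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "string":
        return isinstance(value, str)
    return True  # an unconstrained schema accepts anything


def any_of(subschemas, value):
    # Valid when at least one subschema matches.
    return sum(matches(s, value) for s in subschemas) >= 1


def one_of(subschemas, value):
    # Valid only when exactly one subschema matches.
    return sum(matches(s, value) for s in subschemas) == 1


# 'const' pins a single allowed value.
resource_type = {"const": "model"}
print(matches(resource_type, "model"), matches(resource_type, "seed"))

# Overlapping subschemas: the integer 3 is both an integer and a number.
overlapping = [{"type": "integer"}, {"type": "number"}]
print(any_of(overlapping, 3), one_of(overlapping, 3))
```

The two keywords only disagree when subschemas overlap, so a schema that swaps one for the other can still accept the same artifacts in practice.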

jtcohen6 · 2023-08-23T09:30:14Z

@gshank I think that's fine, as long as this is a forward-looking change for new versions of dbt-core, and not a change to the existing published & versioned schemas.

We know that the jsonschemas generated by hologram were not always even technically correct, which could lead to edge cases if used for programmatic validation (e.g. #4657). I am hoping that the ones produced by mashumaro can achieve better correctness!

I just want to clarify that:

The actual contents of our contracted artifacts (manifest.json, run_results.json, catalog.json, sources.json) are not changing
The new JSONSchemas (produced by mashumaro) can be used to successfully validate those artifacts, in the ways that we know some users try to do programmatically (e.g. [CT-2268] [Bug] dbt-core >= 1.4.2 manifests not passing v8 schema validation #7119)
We will still be able to publish these JSONSchemas, and the "human-readable" versions, at schemas.getdbt.com

cc @dbt-labs/cloud-artifacts for visibility

gshank · 2023-08-23T13:49:27Z

That's right, the other schemas will change too. Should I update the other schemas too? Nothing has probably changed as far as validation... Or should we wait for an actual change and just verify that newly generated schemas still work?

jtcohen6 · 2023-08-23T14:10:39Z

@gshank Good point re: artifacts that won't actually be changing their schema in v1.7 (most likely catalog.json + run_results.json + sources.json). Let's verify that the new generated schemas actually work for validating instances of those artifacts, as produced by older versions of dbt-core — we can use our internal-analytics project as a real-world example. That sense-check would make me feel much better about updating the jsonschemas that we have published at schemas.getdbt.com.

mikealfare

I have a few questions, no serious concerns, and multiple nits (take them or leave them, just things I noticed).

mikealfare · 2023-08-23T14:34:03Z

core/dbt/context/context_config.py

+        return updated
+
+    def translate_hook_names(self, project_dict):
+        # This is a kind of kludge because the fix for #6411 specifically allowed misspelling


If this is not the intended input format, should we raise a warning here indicating that? It wouldn't cause anything to fail, but providing some direction would make it easier for us to deprecate the incorrect spelling in the future (likely one less thing for folks to change for 2.0).

That ticket specifically allowed the "incorrect" spellings, so it's now a feature.

There's no :lolsob: emoji, why is there no :lolsob: emoji when I need one so badly.

That being said, we don't intend on ever migrating folks off of the "incorrect" spelling either?

You'd have to ask product and Doug :). If you want to open a ticket, go ahead. Not in scope for this one though...

The misspelling here we mean is, we'll accept either kebab case or snake case for these two configs, in the several places they could be potentially defined:

post-hook or post_hook

pre-hook or pre_hook

The misspelling here we mean is, we'll accept either kebab case or snake case for these two configs

Agreed, I'm asking if we ever want to back out of that ditch, or support that for the foreseeable future.
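The behavior discussed in this thread, accepting either spelling for the two hook configs, can be sketched roughly as follows. This is not the body of the PR's `translate_hook_names`; the direction of the mapping (snake case rewritten to the dashed form) and the precedence when both spellings appear are assumptions made for the example.

```python
from typing import Any, Dict

# Assumed mapping: snake_case spelling -> dashed spelling.
HOOK_NAME_ALIASES = {"pre_hook": "pre-hook", "post_hook": "post-hook"}


def translate_hook_names(project_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with hook keys in their dashed form."""
    translated = dict(project_dict)  # shallow copy; the input is left untouched
    for snake, kebab in HOOK_NAME_ALIASES.items():
        if snake in translated:
            # Assumption: if both spellings are present, the snake_case value wins.
            translated[kebab] = translated.pop(snake)
    return translated


raw = {"pre_hook": ["select 1"], "post-hook": ["select 2"], "materialized": "view"}
print(translate_hook_names(raw))
```

A deprecation warning, as suggested above, would fit naturally inside the `if snake in translated` branch.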

core/dbt/contracts/graph/model_config.py

core/dbt/contracts/util.py

core/dbt/dataclass_schema.py

tests/unit/test_graph.py

tests/unit/utils.py

core/dbt/parser/base.py

mikealfare · 2023-08-30T16:26:32Z

tests/functional/artifacts/test_docs_generate_defer.py

+
+        # Check that catalog validates with jsonschema
+        catalog_dict = catalog.to_dict()
+        try:


I can't explain it, but this feels like an odd flow to me. Would something like this work?

assert catalog.validate(catalog_dict), "Catalog validation failed"

or even

assert catalog.validate(catalog.to_dict()), "Catalog validation failed"
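Whether the one-line `assert` works depends on what `validate` returns, which this thread does not show. The sketch below assumes a `validate` that raises on failure and returns `None` on success; under that assumption the bare `assert` would fail even for valid data, which would explain a `try`/`except` flow. `CatalogSketch` and `ValidationError` are stand-ins, not dbt classes.

```python
class ValidationError(Exception):
    """Stand-in for the error a schema validator would raise."""


class CatalogSketch:
    # Hypothetical required keys, chosen only for the example.
    REQUIRED = ("metadata", "nodes", "sources")

    @classmethod
    def validate(cls, data):
        """Raise on invalid data; return None (implicitly) on success."""
        missing = [key for key in cls.REQUIRED if key not in data]
        if missing:
            raise ValidationError(f"missing keys: {missing}")


def is_valid(data):
    """Wrap the raising validator so it can be used in an assert."""
    try:
        CatalogSketch.validate(data)
    except ValidationError:
        return False
    return True


good = {"metadata": {}, "nodes": {}, "sources": {}}

# With a raising validator, the return value is None even for valid data,
# so `assert CatalogSketch.validate(good)` would fail.
print(CatalogSketch.validate(good))

# The wrapped form gives a boolean that is safe to assert on.
print(is_valid(good), is_valid({"nodes": {}}))
```

If `validate` instead returned a boolean, the suggested one-liner would be the simpler choice.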

mikealfare · 2023-08-30T16:27:10Z

tests/functional/sources/test_source_fresher_state.py

@@ -81,6 +82,10 @@ def _assert_freshness_results(self, path, state):
        with open(path) as fp:
            data = json.load(fp)

+        try:


Same comment as in test_docs_generate_defer.


* Add compiled node properties to run_results.json * Include compiled-node attributes in run_results.json * Fix typo * Bump schema version of run_results * Fix test assertions * Update expected run_results to reflect new attributes * Code review changes * Fix mypy warnings for ManifestLoader.load() (#8443) * revert python version for docker images (#8445) * revert python version for docker images * add comment to not update python version, update changelog * Bumping version to 1.7.0b1 and generate changelog * [CT-3013] Fix parsing of `window_groupings` (#8454) * Update semantic model parsing tests to check measure non_additive_dimension spec * Make `window_groupings` default to empty list if not specified on `non_additive_dimension` * Add changie doc for `window_groupings` parsing fix * update `Number` class to handle integer values (#8306) * add show test for json data * oh changie my changie * revert unecessary cahnge to fixture * keep decimal class for precision methods, but return __int__ value * jerco updates * update integer type * update other tests * Update .changes/unreleased/Fixes-20230803-093502.yaml --------- Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com> * Improve docker image README (#8212) * Improve docker image README - Fix unnecessary/missing newline escapes - Remove double whitespace between parameters - 2-space indent for extra lines in image build commands * Add changelog entry for #8212 * ADAP-814: Refactor prep for MV updates (#8459) * apply reformatting changes only for #8449 * add logging back to get_create_materialized_view_as_sql * changie * swap trigger (#8463) * update the implementation template (#8466) * update the implementation template * add colon * Split tests into classes (#8474) * add flaky decorator * split up tests into classes * revert update agate for int (#8478) * updated typing and methods to meet mypy standards (#8485) * Convert error to conditional warning for unversioned contracted model, fix msg format 
(#8451) * first pass, tests need updates * update proto defn * fixing tests * more test fixes * finish fixing test file * reformat the message * formatting messages * changelog * add event to unit test * feedback on message structure * WIP * fix up event to take in all fields * fix test * Fix ambiguous reference error for duplicate model names across packages with tests (#8488) * Safely remove external nodes from manifest (#8495) * [CT-2840] Improved semantic layer protocol satisfaction tests (#8456) * Test `SemanticModel` satisfies protocol when none of it's `Optionals` are specified * Add tests ensuring SourceFileMetadata and FileSlice satisfiy DSI protocols * Add test asserting Defaults obj satisfies protocol * Add test asserting SemanticModel with optionals specified satisfies protocol * Split dimension protocol satisfaction tests into with and without optionals * Simplify DSI Protocol import strategy in protocol satisfaction tests * Add test asserting DimensionValidtyParams satisfies protocol * Add test asserting DimensionTypeParams satisfies protocol * Split entity protocol satisfaction tests into with and without optionals * Split measure protocol satisfication tests and add measure aggregation params satisficaition test * Split metric protocol satisfaction test into optional specified an unspecified Additionally, create where_filter pytest fixture * Improve protocol satisfaction tests for MetricTypeParams and sub protocols Specifically we added/improved protocol satisfaction tests for - MetricTypeParams - MetricInput - MetricInputMeasure - MetricTimeWindow * Convert to using mashumaro jsonschema with acceptable performance (#8437) * Regenerate run_results schema after merging in changes from main. 
--------- Co-authored-by: Gerda Shank <gerda@dbtlabs.com> Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com> Co-authored-by: Github Build Bot <buildbot@fishtownanalytics.com> Co-authored-by: Quigley Malcolm <QMalcolm@users.noreply.github.com> Co-authored-by: dave-connors-3 <73915542+dave-connors-3@users.noreply.github.com> Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com> Co-authored-by: Jaime Martínez Rincón <jaime@jamezrin.name> Co-authored-by: Mike Alfare <13974384+mikealfare@users.noreply.github.com> Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>

gshank added 2 commits August 16, 2023 15:17

Switch from hologram to mashumaro jsonschema

e5a331d

Use lru_cache on json_schema method

0d0f7d9

gshank requested review from a team as code owners August 16, 2023 22:45

gshank requested review from mikealfare, emmyoop and aranke and removed request for a team August 16, 2023 22:45

cla-bot bot added the cla:yes label Aug 16, 2023

emmyoop reviewed Aug 17, 2023


gshank marked this pull request as draft August 17, 2023 13:27

Add back semver pattern comment

93eb22e

gshank marked this pull request as ready for review August 17, 2023 14:14

gshank requested review from jtcohen6 and graciegoheen August 17, 2023 14:44

mikealfare reviewed Aug 23, 2023


Changes for some code review comments

dde27b6

jtcohen6 reviewed Aug 25, 2023


gshank added 2 commits August 25, 2023 10:07

Merge branch 'main' into ct-3000-mashumaro_jsonschema

e6d02a9

Update schemas/dbt/manifest/v11.json

1a5f65b

gshank requested a review from a team as a code owner August 25, 2023 14:09

gshank requested review from heysweet and removed request for a team August 25, 2023 14:09

Update run_results schema

f5b22cb

heysweet requested review from eddowh and removed request for heysweet August 25, 2023 14:29

gshank added 5 commits August 25, 2023 11:51

Add schemas for catalog.json and sources.json

5149d59

Remove UnparsedNode resource_type type ignore

68a2f96

fix unit test

2161503

Merge branch 'main' into ct-3000-mashumaro_jsonschema

6812aa5

add tests for sources.json and catalog.json validation

6122517

mikealfare reviewed Aug 30, 2023


peterallenwebb approved these changes Aug 30, 2023


gshank merged commit f063e4e into main Aug 30, 2023
51 checks passed

gshank deleted the ct-3000-mashumaro_jsonschema branch August 30, 2023 18:07

peterallenwebb pushed a commit that referenced this pull request Aug 30, 2023

Convert to using mashumaro jsonschema with acceptable performance (#8437)

9097548

gshank mentioned this pull request Aug 30, 2023

[ADAP-864] Make changes necessary to support dbt-core switch away from hologram dbt-labs/dbt-redshift#591

Closed

This was referenced Aug 30, 2023

[ADAP-866] [Regression] Dataclass updates needed to support migration off hologram dbt-labs/dbt-bigquery#906

Closed

[ADAP-867] Dataclass updates needed to support migration off hologram dbt-labs/dbt-spark#881

Closed

dataders mentioned this pull request Oct 2, 2023

unit test error: jsonschema.exceptions.ValidationError: 'database' is a required property databricks/dbt-databricks#469

Closed

tlento mentioned this pull request Nov 6, 2023

Update typing-extensions version to >=4.4 #9012

Merged

5 tasks

tlento mentioned this pull request Feb 26, 2024

Update dbt-semantic-interfaces dependency to compatible range #9671

Merged
