Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendorize simplified jsonschema 4.17.3 #1586

Closed
wants to merge 23 commits into from

Conversation

braingram
Copy link
Contributor

@braingram braingram commented Jul 11, 2023

This PR 'vendorizes' jsonschema 4.17.3 and removes much of it's unused features including:

  • cli
  • all non-draft4 validators, type checkers, etc
  • change pyrsistent pmap usage to dictionaries (performance seems comparable and allows removal of the pyrsistent dependency)

This PR keeps jsonschema as a dependency (but remove the upper pin) and conditionally use the pypi provided jsonschema only for exception classes to allow downstream code to continue catching jsonschema.exceptions.ValidationError until we can add equivalent exceptions in asdf and transition downstream code to catch the asdf provided exceptions.

I have mixed thoughts about these changes.

Some pros are:

  • we no longer have to work around jsonschema API changes (and our previous modifications can be revisited by reaching into the vendorized version to avoid needing to patch classes we were no longer allowed to subclass)
  • we remove a dependency that is undergoing rather major change in a minor version (and with the modifications in this PR also remove the indirect dependency onpyrsistent). Note that jsonschema 4.18 introduced several new dependencies (jsonschema-specifications, referencing, rpds-py), all less than a year old, not yet at version 1.0 (except for jsonschema-specifications which doesn't follow semantic versioning) and introduces rust as a dependency (via rpds-py)
  • we can remove the upper pin on jsonschema

Some cons are:

  • we inherit all of the known and unknown bugs in 4.17.3 and it is now our problem to fix
  • we are now responsible for maintaining the vendorized code to maintain compatibility with new python and dependency versions
  • we no longer benefit from an actively maintained project

Related to the last con, I am hopeful that 4.18 (or perhaps a later version) will be more performant and hopefully compatible with asdf in the future. So hopefully this vendorization is a temporary change.

Related to the last pro, the removal of the upper pin, jsonschema is a popular library (3 million downloads yesterday). Keeping the upper pin risks compatibility issues with asdf and other libraries (which can likely upgrade to 4.18 because they don't rely on complex schemas like asdf) so this seems like a major pro.

@CagtayFabry
Copy link
Contributor

Hi @braingram , just out of curiosity: is this planned to be implemented for 3.0?

@braingram
Copy link
Contributor Author

braingram commented Jul 12, 2023

Hi @CagtayFabry thanks for asking!

At the moment I'm not sure. I would love to hear your opinion on this and how weldx might be impacted by the options below.

In brief, asdf isn't compatible with jsonschema 4.18. The uses of the jsonschema library within asdf are far from simple and there is a history of needing to add upper pins and make rather complex internal changes to deal with new jsonschema versions.

At the moment jsonschema is pinned below 4.18. This doesn't seem sustainable so we're left with probably 3 options:

  1. Switch to a different jsonschema validation library. At the moment no suitable alternatives exist in part because much of asdf schema validation is a very thin wrapper around the jsonschema library and tied closely to the jsonschema api).
  2. Update asdf to be compatible with 4.18. Attempts so far have been unsuccessful but this might eventually work.
  3. Vendorize jsonschema 4.17.3 (this draft PR is to test one approach for accomplishing this).

Vendorizing jsonschema does cause a few complications. One being downstream libraries have been importing exceptions from jsonschema directly but I think the changes in this PR should allow users to transition to importing exceptions from asdf once we have those flushed out. The second issue is one of maintenance. Including an older version of jsonschema in asdf means we may have to fix bugs in the vendorized code (and keep it up to date for python and other dependency changes). This PR attempts to make the maintenance easier by stripping out all but draft 4 related code, removing uses of pyrsistent (which appears to used for a fast dictionary like structure, swapping this out with dictionaries results in an insignificant change to performance with the testing I've done so far).

One benefit of vendorizing jsonschema is that we can remove some of the complicated code that had to be added to deal with jsonschema api changes (mainly the inability to subclass Validator) which may in the end improve performance.

In addition to your opinions on the above options, has the jsonschema pin caused any issues for you?

EDIT: after posting I realized I didn't answer your question :) I am hoping that a fix (either unpinning or vendorizing) should be back-portable to 2.15.

@braingram braingram changed the title Gut jsonschema TEST: Vendorize simplified jsonschema 4.17.3 Jul 12, 2023
many expect asdf to include this so to limit the
number of immediate issues, include jsonschema as
a dependency
prior versions have different expections
about the ValidationError class which conflict
with the conditional imports in the vendorized
version (to ease transition away from jsonschema)
- adds tests
- remove all but draft4 validation
- add jsonschema test suite tests (don't bundle these
  and have them off by default to allow installed
  package to pass tests)
- remove pyrsistent dependency
add coverage report to run logs
add pytest durations for 10 slowest tests
@braingram
Copy link
Contributor Author

Roman regtests run and passed: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FRoman-Developers-Pull-Requests/detail/Roman-Developers-Pull-Requests/279/pipeline/

JWST regtests: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/783/pipeline
showed 5 failures with errors similar to Extra keyword 'R_DFLAT' in b: 'N/A' matching errors in different failing test in other runs:
spacetelescope/stdatamodels#178

I opened an initial attempt at backporting this to 2.15.x as a draft PR (if we decide that's the way to go) #1587
The CI on that branch will need to be updated to remove astropy testing, fix docs, etc. by backporting:
#1467
#1585

@braingram
Copy link
Contributor Author

The asdf-standard CI failure should be fixed by:
asdf-format/asdf-standard#391

Specifically, jsonschema 4.18 does not currently support custom metaschemas (used extensively in asdf). The asdf-standard test suite uses jsonschema directly to test schemas. These tests are not correctly finding errors with 4.18 as the schemas are being checked against the draft 4 meta schema and not the defined custom schema.

@braingram braingram marked this pull request as ready for review July 14, 2023 17:02
@braingram braingram requested a review from a team as a code owner July 14, 2023 17:02
@braingram braingram changed the title TEST: Vendorize simplified jsonschema 4.17.3 Vendorize simplified jsonschema 4.17.3 Jul 14, 2023
@braingram braingram marked this pull request as draft July 17, 2023 16:55
@braingram braingram closed this Jul 17, 2023
@braingram braingram mentioned this pull request Jul 17, 2023
@braingram
Copy link
Contributor Author

I closed this in favor of #1591 which is largely the same except that all existing drafts are preserved in the vendorized jsonschema (in case at some point in the future asdf decides to support post-4 drafts).

@braingram braingram deleted the gut_jsonschema branch September 18, 2023 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants