Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendorize jsonschema #1591

Merged
merged 12 commits into from
Aug 7, 2023

Conversation

braingram
Copy link
Contributor

@braingram braingram commented Jul 17, 2023

This is a follow up to #1586 that is largely the same except for preserving all drafts (4 and others) as removing all but draft 4 resulted in no significant gains.

This PR 'vendorizes' jsonschema 4.17.3 and removes much of it's unused features including:

  • cli
  • removes automatic http/s url fetching
  • change pyrsistent pmap usage to dictionaries (performance seems comparable and allows removal of the pyrsistent dependency)
  • removed the deprecated jsonschema.__version__ (which relied on reading the package metadata)

The tests and test suite from jsonschema are preserved mostly intact (tests related to removed features above were removed). As there are many drafts and many tests (especially when the test suite is used), this PR adds a pytest option jsonschema which when provided will run the jsonschema related tests (these are off by default). This PR also adds checks to the github ci to trigger these tests when a jsonschema label is added to a PR (like with this PR).

This PR keeps jsonschema as a dependency (but remove the upper pin) and conditionally use the pypi provided jsonschema only for exception classes to allow downstream code to continue catching jsonschema.exceptions.ValidationError until we can add equivalent exceptions in asdf and transition downstream code to catch the asdf provided exceptions.

I have mixed thoughts about these changes but overall think they are an improvement over keeping the upper pin for jsonschema.

Some pros are:

  • we no longer have to work around jsonschema API changes (and our previous modifications can be revisited by reaching into the vendorized version to avoid needing to patch classes we were no longer allowed to subclass)
  • we remove a dependency that is undergoing rather major change in a minor version (and with the modifications in this PR also remove the indirect dependency onpyrsistent). Note that jsonschema 4.18 introduced several new dependencies (jsonschema-specifications, referencing, rpds-py), all less than a year old, not yet at version 1.0 (except for jsonschema-specifications which doesn't follow semantic versioning) and introduces rust as a dependency (via rpds-py)
  • we can remove the upper pin on jsonschema

Some cons are:

  • we inherit all of the known and unknown bugs in 4.17.3 and it is now our problem to fix
  • we are now responsible for maintaining the vendorized code to maintain compatibility with new python and dependency versions
  • we no longer benefit from an actively maintained project

Related to the last con, I am hopeful that 4.18 (or perhaps a later version) will be more performant and hopefully compatible with asdf in the future. So hopefully this vendorization is a temporary change.

Related to the last pro, the removal of the upper pin, jsonschema is a popular library (3 million downloads yesterday). Keeping the upper pin risks compatibility issues with asdf and other libraries (which can likely upgrade to 4.18 because they don't rely on complex schemas like asdf) so this seems like a major pro.

The asdf-standard CI failure should be fixed by:
asdf-format/asdf-standard#391

Specifically, jsonschema 4.18 does not currently support custom metaschemas (used extensively in asdf). The asdf-standard test suite uses jsonschema directly to test schemas. These tests are not correctly finding errors with 4.18 as the schemas are being checked against the draft 4 meta schema and not the defined custom schema.

@github-actions github-actions bot added this to the 3.0.0 milestone Jul 17, 2023
@braingram braingram marked this pull request as ready for review July 17, 2023 17:57
@braingram braingram requested a review from a team as a code owner July 17, 2023 17:57
Copy link
Contributor

@eslavich eslavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one question about the license, otherwise this looks great

@@ -0,0 +1,19 @@
Copyright (c) 2013 Julian Berman
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include this in a more obvious location, something along the lines of astropy's top-level licenses directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great question. I never looked at the atropy top-level licenses (thanks for mentioning it!).

I stuck it in the source directory so that it would be packaged with the source (I'm not sure if this is necessary but it does seem like it should satisfy the requirement that the "copyright notice and this permission notice shall be included in all copies or substantial portions of the Software".

It doesn't appear that astropy packages the licenses they include in the top-level folder (looking at the DataTables license it mentions copyright by Allan Jardine and I find no mention of that name in the installed package).

What do you think about mentioning (perhaps in the README) that the vendorized jsonschema is subject to copyright and use restrictions by the license found in the _jsonschema_ submodule (or hopefully something more elegantly worded). I don't see a license notice at the moment so perhaps adding it to the end of the README and also mentioning the broader license in LICENSE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good solution. I suspect people who are concerned about licenses are apt to look in the repo root but may not realize they need to drill down into other directories. A obvious pointer of some kind to nested license file would do the trick (could even check in a soft link?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a top level licenses directory with soft links to the included licenses (good idea!) and a link in the readme.



try:
from jsonschema import ValidationError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's possible that some future version of jsonschema will change ValidationError enough that it will no longer work with this code, but I don't think that's likely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a risk and we should mention that we might not be able to sustain this (even for minor version changes in asdf 3.0). The shadowing of these exceptions was the reason I had to bump the minimum jsonschema version to 4.8 (from 4.0.1, which contained exceptions that didn't behave in the same way).

Perhaps this is a good excuse to start writing the asdf 3.0 release docs (with a section about the vendorized jsonschema). I'll take a stab at this and mention that:

  • we vendorized jsonschema 4.17.3.
  • we are keeping jsonschema as a dependency (as downstream packages often expected this to be installed with asdf).
  • we will attempt to raise exceptions imported from any installed jsonschema (so code that catches exceptions from jsonschema will continue to work) but encourage developers to instead catch asdf.exceptions.ValidationError as we're unsure if we can maintain full compatibility between exceptions in the vendorized and non-vendorized versions of jsonschema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a What's New page with the discussed information:
https://asdf--1591.org.readthedocs.build/en/1591/asdf/whats_new.html#whats-new

Given the recent release of jupyter-events which now requires jsonschema > 4.18 I'm inclined to backport this PR to 2.15 and release a 2.15.2 with jsonschema vendorized. That will need to involve some rewording to make sure we mention that the vendorization took place in 2.15 (not 3.0) and change log entry fixes as well.

@braingram braingram force-pushed the vendorize_jsonschema_w_tests branch from 50d02fa to ef8821f Compare August 1, 2023 21:14
@braingram
Copy link
Contributor Author

braingram commented Aug 4, 2023

jwst devdeps failures are due to use of the released stdatamodels which does not contain the fix:
spacetelescope/stdatamodels#184
for an issue revealed by asdf PR:
#1561
(the same jwst devdeps failures are seen in other unrelated asdf PRs like: #1597)

Copy link
Contributor

@eslavich eslavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@braingram braingram merged commit 09b04a8 into asdf-format:main Aug 7, 2023
45 of 46 checks passed
@braingram braingram deleted the vendorize_jsonschema_w_tests branch August 7, 2023 11:30
braingram added a commit to braingram/asdf that referenced this pull request Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants