-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: Pydantic types #13
Conversation
- created new module for API model types `eark_validator.model` that contains types for: - `Level` (requirements level); - `PackageDetails`; - `Severity` (of test result); - `StructResults` (for structural validation results); - `StructStatus` (for structural validation status); - `TestResult` (for validation test results); and - `ValidationReport` (for final validation report); - refactored structural validation to use new types; - small fix to schematron test for `CSIP11` in `mets_metsHdr_rules.xml`; - new `eark_validator.packages` module for package validation; - introduced a `PackageHandler` type to start abstraction of package parsing (replaces `STRUCT.ArchivePackageHandler`); - introduced a `ValidationReport` type to handle final aggregation of validation results; - added missing `__init__.py` files with appropriate commenting; - removed some defunct types; - removed unused imports; - introduced light use of pydantic, better use to come; and - unit test improvements and fixes.
FIX: Pylint errors.
- converted struct types to Pydantic; and - fixed constructors where necessary.
- let the validator handle package unpacking; - added empty list default for `Checksum`; and - replaced `pprint` with `print` for now.
- added model types for Checkums and ChecksumAlg types; - removed/refactored defunct Checksum types; and - refactored tests to use new Checksum types.
- refactored model code to group related types in modules; - better inclusing of model code in `__init__.py`; - completed addition of type hints to methods; - output of CLI now mostly JSON; - terminate processing of bad input files quickly; and - removed unused imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Carl,
To be able to use the package I needed to explicitly set my venv to Python3.11 and I have detailed why in a previous comment.
Having made that change I still had an issue where the xsd resources are not in the installed package:
lxml.etree.XMLSchemaParseError: Failed to locate the main schema resource at "..."
If I manually copy the files into the correct library folder within venv folder then I can use the cli tool via "ip-check" without errors.
In previous python builds you could have these files built into the package by adding
include_package_data = True
To the setup.cfg file. The process may have changes with the pyproject.toml build configuration file.
But once this has been addressed, I think this will be ready to merge into main branch.
Darren
from eark_validator.model.checksum import Checksum | ||
|
||
class ManifestEntry(BaseModel): | ||
path : Path | str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not valid for Python3.9 so my initial attempts to use the package caused an exception due to this not being a supported use of the | operator.
Installing Python3.10 or above resolves this error. And while Python 3.10 is gaining greater support, Python3.9 is still quite prevalent and is the default Python3.x version on many Linux distributions.
So either a minimum version requirement should be mentioned in the README file, or we make this backwards compatible with Python3.9.
There are other instances of this type hinting usage of the | character, but this was the first for me to raise an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with you, I think backward compatibility is more important here and 3.11 as mandatory is too restrictive. A build matrix with the appropriate versions here would pick this up automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm beginning to wonder, Python 3.9 is EOL in October 2025: https://devguide.python.org/versions/ and the code is fine with 3.10 and beyond: https://github.com/E-ARK-Software/eark-validator/actions/runs/7758461444
Thoughts welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the changes made in this pull request. It's clear that a significant amount of work has gone into this refactor, and I appreciate the effort to improve our code base.
The changes are consistent, the code is cleaner, and the overall structure seems more maintainable. I did not find any issues that would prevent these changes from being merged.
I'm approving these changes and looking forward to seeing them integrated into our main branch, and the continued improvements.
I agree that we should drop support for Python 3.9 and focus on 3.10 and above as the added improvements this brings, especially in light of the fact that 3.9 is heading towards EOL.
pprint
withprint
for now.