JSON Schema validation performance #159
Comments
Within the |
I've been using this as a benchmark:
If you cProfile just the |
I'll play around with it. I'm also curious about what code breaks when JSON schema isn't used for validation. I think some duplicate validation is probably okay: if jsonschema slows things down so much, a little extra won't make a big difference there, and it probably also won't add nearly as much overhead in the cases where jsonschema is turned off. But it's a question of how much redundancy we're talking about... |
Well, it's things like not checking if the dtype is one of the acceptable values, whether the |
I figured for the most part you're on your own if you disable validation, and would only do so on known-good files. But I agree it makes sense to catch some things in the software so we don't just end up with unhelpful uncaught exceptions. |
JSON schema validation currently takes 60% of load time on a benchmark with 10000 arrays.
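One way to arrive at a figure like this is to profile a load with cProfile and sum the self time of the frames that live in the jsonschema package. The following is only a rough sketch: `time_share`, `in_jsonschema`, and `load_benchmark_file` are hypothetical names used for illustration, not part of this project, and the filename match is an assumption about where the package is installed.

```python
import cProfile
import pstats


def time_share(func, predicate):
    """Run func under cProfile and return the fraction of total self time
    spent in functions whose (filename, lineno, funcname) key satisfies
    predicate."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    if stats.total_tt == 0:
        return 0.0
    # Each stats entry is (primitive calls, total calls, self time,
    # cumulative time, callers); index 2 is the self time.
    matched = sum(
        entry[2] for key, entry in stats.stats.items() if predicate(key)
    )
    return matched / stats.total_tt


def in_jsonschema(key):
    # Assumption: the validator's source files have "jsonschema" in their path.
    filename, _lineno, _funcname = key
    return "jsonschema" in filename


# Usage, with load_benchmark_file standing in for the real benchmark:
#     share = time_share(load_benchmark_file, in_jsonschema)
#     print(f"{share:.0%} of self time is in jsonschema")
```

Self time only counts work done inside jsonschema's own functions, so time spent in builtins it calls is not attributed to it; cumulative time of the top-level validate call would give a different, usually larger, number.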
Unlike the YAML parsing, where there was a lot of low-hanging fruit, things are tricky with JSON schema. It's hard to figure out how to improve the performance of jsonschema without obliterating its really clean architecture.
Relatedly, I experimented with adding a flag to turn off JSON schema validation. The problem is that many of the type converters then become more brittle in interesting ways, because they rely on the JSON schema for validation that they don't do themselves. Duplicating that work in the converters seems like a way to only make things slower, so I'm not sure what to do there.
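To make the trade-off concrete, here is a minimal sketch of the kind of cheap check a type converter could run regardless of whether schema validation is enabled, such as confirming that the dtype is one of the acceptable values. Everything here is hypothetical: `ACCEPTED_DTYPES`, `check_ndarray_node`, and the node layout are assumptions for illustration, and the real list of accepted dtypes lives in the schema.

```python
# Hypothetical subset of accepted dtype names, for illustration only.
ACCEPTED_DTYPES = frozenset(
    {"int8", "int16", "int32", "int64", "float32", "float64"}
)


def check_ndarray_node(node):
    """Cheap sanity checks on an array node; returns (dtype, shape).

    Raises TypeError or ValueError with a readable message instead of
    letting a malformed node fail later with an unrelated exception.
    """
    if not isinstance(node, dict):
        raise TypeError(f"expected a mapping, got {type(node).__name__}")

    dtype = node.get("dtype")
    if not isinstance(dtype, str) or dtype not in ACCEPTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype!r}")

    shape = node.get("shape")
    if not isinstance(shape, list) or not all(
        isinstance(n, int) and not isinstance(n, bool) and n >= 0
        for n in shape
    ):
        raise ValueError(f"invalid shape: {shape!r}")

    return dtype, tuple(shape)
```

Checks like these are a set lookup and a short loop per array, so they should cost far less than a full schema walk, but they do duplicate rules that the schema already states.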