Recursive calls leading to an OOM when parsing a valid schema #1016

Closed
dweiss opened this issue Apr 15, 2024 · 6 comments · Fixed by #1018

@dweiss

dweiss commented Apr 15, 2024

Hello!

I've been trying to upgrade our project to the latest version but hit an issue where the validator runs out of memory (OOM) on a schema that previously worked just fine (and that validates against other online validators). This commit in my fork reproduces the problem:

dweiss@cefbf5b

It seems to be somewhere around the loop initializing validators:
(image not preserved)

It's not clear to me why this doesn't end with a stack overflow. Instead, it just leads to an OOM, almost as if it were fanning out somewhere (there are lots of refs in that schema).

I didn't dig deep but I thought you folks would be interested and would have more expertise to tell what's going on. Thank you.

@dweiss
Author

dweiss commented Apr 15, 2024

I attach the problematic schema here too, for convenience.
schema.json

@dweiss
Author

dweiss commented Apr 15, 2024

In case it helps: a bisect shows 7f1ec11 (#931) as the commit after which the code hangs.

@justin-tay
Contributor

Unfortunately this typically means that there's not enough memory to load your schema and you would need to allocate more heap for your JVM.

Previously the default was to lazily load the schema. Now the default is to preload the schema.

You can still configure it to lazily load the schema with the setPreloadJsonSchema option.

    @Test
    void testNoErrorsForEmptyObject() throws IOException {
        // Restore the previous default: load the schema lazily instead of preloading it.
        SchemaValidatorsConfig config = new SchemaValidatorsConfig();
        config.setPreloadJsonSchema(false);
        getJsonSchemaFromClasspath(resource("schema.json"), SpecVersion.VersionFlag.V7, config);
    }

However, you should note that if you have inputs that exercise all the possible evaluation paths of your schema as defined, then it's just going to OOM later, when it validates such an input.
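To illustrate the trade-off between lazy loading and preloading, here is a minimal, self-contained sketch. It is not the library's implementation: `LazySchemaNode`, its counter, and the depth/width numbers are all hypothetical. It only shows that lazy loading materialises the nodes an input actually touches, while preloading materialises every node up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a schema tree; not a class from the validator library.
public class LazySchemaNode {
    private final int depth;
    private final int width;
    private final AtomicInteger created;   // shared counter of materialised nodes
    private List<LazySchemaNode> children; // null until first requested

    LazySchemaNode(int depth, int width, AtomicInteger created) {
        this.depth = depth;
        this.width = width;
        this.created = created;
        created.incrementAndGet();
    }

    /** Creates the children on first access and reuses them afterwards. */
    List<LazySchemaNode> children() {
        if (children == null) {
            children = new ArrayList<>();
            for (int i = 0; depth > 0 && i < width; i++) {
                children.add(new LazySchemaNode(depth - 1, width, created));
            }
        }
        return children;
    }

    /** Materialises the whole subtree up front, analogous to preloading. */
    void preload() {
        for (LazySchemaNode child : children()) {
            child.preload();
        }
    }

    /** Follows a single chain of branch choices, analogous to validating one input lazily. */
    void visit(int[] branches, int position) {
        if (position < branches.length && !children().isEmpty()) {
            children().get(branches[position]).visit(branches, position + 1);
        }
    }

    public static void main(String[] args) {
        AtomicInteger lazy = new AtomicInteger();
        new LazySchemaNode(8, 4, lazy).visit(new int[] {0, 1, 2, 3, 0, 1, 2, 3}, 0);

        AtomicInteger eager = new AtomicInteger();
        new LazySchemaNode(8, 4, eager).preload();

        System.out.println("lazy, one input: " + lazy.get() + " nodes");
        System.out.println("preloaded:       " + eager.get() + " nodes");
    }
}
```

In this toy model the lazy count grows with each new branch an input visits, so a set of inputs that covers every branch ends up at the same total as preloading.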

@dweiss
Author

dweiss commented Apr 15, 2024

That seems... improbable, @justin-tay? I mean, this schema is not that large (or complex). Something seems wrong if it needs gigabytes of memory to load validators for a mere 300 KB schema.

@justin-tay
Contributor

The state of each validator / schema differs along the evaluation path. The code therefore needs to create a validator / schema instance for each evaluation path to store this state. The instances are cached because the state is fixed for a particular schema: throwing it away would mean regenerating the same state on every run and would also increase the number of objects that need to be garbage collected, both of which hurt performance.
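As a rough illustration of why a cache keyed by evaluation path grows faster than the schema itself, consider the following sketch. The class, field, and path names are hypothetical and do not come from the library; the point is only that one schema location reached through several chains of `$ref` produces several cached instances.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache keyed by evaluation path; not the library's actual data structure.
public class EvaluationPathCache {

    /** Stand-in for a validator whose state depends on the path used to reach it. */
    static final class PathValidator {
        final String evaluationPath;
        final String schemaLocation;

        PathValidator(String evaluationPath, String schemaLocation) {
            this.evaluationPath = evaluationPath;
            this.schemaLocation = schemaLocation;
        }
    }

    private final Map<String, PathValidator> cache = new HashMap<>();

    /** Returns the cached instance for this evaluation path, creating it on first use. */
    PathValidator get(String evaluationPath, String schemaLocation) {
        return cache.computeIfAbsent(
                evaluationPath, path -> new PathValidator(path, schemaLocation));
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        EvaluationPathCache cache = new EvaluationPathCache();
        // The same definition reached through three different chains of $ref.
        List<String> paths = List.of(
                "/properties/a/$ref/properties/comment/$ref",
                "/properties/b/$ref/properties/comment/$ref",
                "/properties/c/$ref/properties/comment/$ref");
        for (String path : paths) {
            cache.get(path, "#/definitions/comment");
        }
        // A second run over the first path is a cache hit and creates nothing new.
        cache.get(paths.get(0), "#/definitions/comment");
        System.out.println(cache.size() + " cached instances for 1 schema location");
    }
}
```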

The following is a small sample of the evaluation paths that get loaded for your schema.

/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/2/$ref/properties/entries/oneOf/1/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/2/$ref/properties/comment/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/3/$ref/properties/dictionary/oneOf/1/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/3/$ref/properties/comment/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/4/$ref/properties/comment/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/5/$ref/properties/comment/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/3/$ref/properties/labelAggregator/allOf/0/$ref/oneOf/0/$ref/properties/labelCollector/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/3/$ref/properties/labelAggregator/allOf/0/$ref/oneOf/0/$ref/properties/labelCollector/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/0/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/3/$ref/properties/labelAggregator/allOf/0/$ref/oneOf/0/$ref/properties/labelCollector/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/1/$ref
/properties/components/additionalProperties/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/7/$ref/properties/clusters/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/3/$ref/properties/matrix/allOf/0/$ref/oneOf/6/$ref/properties/vectors/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/3/$ref/properties/labelAggregator/allOf/0/$ref/oneOf/0/$ref/properties/labelCollector/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/1/$ref/properties/operator/oneOf/1/$ref
/properties/stages/additionalProperties/$ref/oneOf/5/$ref/properties/clusters/allOf/0/$ref/oneOf/1/$ref/properties/matrix/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/2/$ref/properties/vectors/properties/columns/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/11/$ref/properties/documentPairs/allOf/0/$ref/oneOf/1/$ref/properties/validation/allOf/0/$ref/properties/pairwiseSimilarity/allOf/0/$ref/oneOf/1/$ref/properties/features/allOf/0/$ref/oneOf/9/$ref/properties/minFeatureCount/oneOf/1/$ref
/properties/stages/additionalProperties/$ref/oneOf/5/$ref/properties/clusters/allOf/0/$ref/oneOf/1/$ref/properties/matrix/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/2/$ref/properties/vectors/properties/columns/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/11/$ref/properties/documentPairs/allOf/0/$ref/oneOf/1/$ref/properties/validation/allOf/0/$ref/properties/pairwiseSimilarity/allOf/0/$ref/oneOf/1/$ref/properties/features/allOf/0/$ref/oneOf/9/$ref/properties/maxFeatureCount/oneOf/0/allOf/0/$ref
/properties/stages/additionalProperties/$ref/oneOf/5/$ref/properties/clusters/allOf/0/$ref/oneOf/1/$ref/properties/matrix/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/2/$ref/properties/vectors/properties/columns/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/11/$ref/properties/documentPairs/allOf/0/$ref/oneOf/1/$ref/properties/validation/allOf/0/$ref/properties/pairwiseSimilarity/allOf/0/$ref/oneOf/1/$ref/properties/features/allOf/0/$ref/oneOf/9/$ref/properties/maxFeatureCount/oneOf/1/$ref
/properties/stages/additionalProperties/$ref/oneOf/5/$ref/properties/clusters/allOf/0/$ref/oneOf/1/$ref/properties/matrix/allOf/0/$ref/oneOf/2/$ref/properties/matrixRows/allOf/0/$ref/oneOf/2/$ref/properties/vectors/properties/columns/allOf/0/$ref/oneOf/1/$ref/properties/labels/allOf/0/$ref/oneOf/1/$ref/properties/labelFilter/allOf/0/$ref/oneOf/7/$ref/properties/exclude/items/$ref/oneOf/5/$ref/properties/query/allOf/0/$ref/oneOf/8/$ref/properties/documents/allOf/0/$ref/oneOf/11/$ref/properties/documentPairs/allOf/0/$ref/oneOf/1/$ref/properties/validation/allOf/0/$ref/properties/pairwiseSimilarity/allOf/0/$ref/oneOf/1/$ref/properties/features/allOf/0/$ref/oneOf/9/$ref/properties/comment/$ref

Looking at this, even if you don't preload you could still hit an OOM during validation if your inputs exercise these evaluation paths and you don't allocate enough memory for all of them, as there is currently no option to disable caching of the loaded schema during the run itself.

This probably needs more memory than the consolidated FHIR schema, which is roughly a 4 MB schema, so it's not really about the size in bytes of the schema itself.

@dweiss
Author

dweiss commented Apr 16, 2024

Right, I see it now. It's the fanout on evaluation of refs then - you can easily make this exponential and crash schema parsing even with a very small input. It's basically what's happening with the example I attached. Thank you for looking for a workaround - I don't think multiple refs are going to be that uncommon in the wild, so it's probably good to have an option to turn off caching.
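A quick way to see the fan-out is to count evaluation paths in a toy reference graph. The sketch below is an assumption-laden model, not the attached schema: `layeredRefs` builds hypothetical definitions named `defN_M`, where each definition in one layer references every definition in the next layer, and `countEvaluationPaths` counts one validator instance per distinct path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of $ref fan-out; definition names and layout are made up for illustration.
public class EvaluationPathFanout {

    /** Builds a layered graph: each definition in layer i references every definition in layer i + 1. */
    static Map<String, List<String>> layeredRefs(int layers, int width) {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        for (int layer = 0; layer < layers; layer++) {
            for (int i = 0; i < width; i++) {
                List<String> targets = new ArrayList<>();
                if (layer + 1 < layers) {
                    for (int j = 0; j < width; j++) {
                        targets.add("def" + (layer + 1) + "_" + j);
                    }
                }
                refs.put("def" + layer + "_" + i, targets);
            }
        }
        return refs;
    }

    /** Counts the evaluation paths starting at a definition, including the path that ends at it. */
    static long countEvaluationPaths(Map<String, List<String>> refs, String definition) {
        long paths = 1;
        for (String target : refs.get(definition)) {
            // No memoisation on purpose: each path is assumed to need its own instance.
            paths += countEvaluationPaths(refs, target);
        }
        return paths;
    }

    public static void main(String[] args) {
        int width = 4;
        for (int layers = 2; layers <= 10; layers += 2) {
            Map<String, List<String>> refs = layeredRefs(layers, width);
            long paths = countEvaluationPaths(refs, "def0_0");
            System.out.println(refs.size() + " definitions -> " + paths + " evaluation paths");
        }
    }
}
```

The number of definitions grows linearly with the number of layers, while the number of paths grows geometrically in this model, which is why a small schema file can still demand a very large number of cached instances.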
