Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete validation for unevaluatedProperties #752

Closed
costas80 opened this issue May 12, 2023 · 5 comments · Fixed by #754
Closed

Incomplete validation for unevaluatedProperties #752

costas80 opened this issue May 12, 2023 · 5 comments · Fixed by #754
Assignees

Comments

@costas80
Copy link
Contributor

When looking into the fix for #750 I noticed an issue in an update made recently by @fdutton . There was a method introduced to convert a Legacy or JSON Path path to a JSON Pointer to make a subsequent lookup.

Specifically in PathType the following was introduced:

    public String convertToJsonPointer(String path) {
        switch (this) {
            case JSON_POINTER: return path;
            default: return fromLegacyOrJsonPath(path);
        }
    }

    static String fromLegacyOrJsonPath(String path) {
        return path
                .replace("\"", "")
                .replace("]", "")
                .replace('[', '/')
                .replace('.', '/')
                .replace("$", "");
    }

That is used in UnEvaluatedPropertiesValidator:

    public Set<ValidationMessage> validate(JsonNode node, JsonNode rootNode, String at) {
       ...
        unevaluatedPaths.forEach(path -> {
            String pointer = getPathType().convertToJsonPointer(path);
            JsonNode property = rootNode.at(pointer);
            if (!schema.validate(property, rootNode, path).isEmpty()) {
                failingPaths.add(path);
            }
        });
       ...
    }

The fromLegacyOrJsonPath method of PathType is oversimplified and assumes that there are no characters with potentially special meaning in the property names being looked up. For example a property including /, ~, ", [, ], . or $ would end up with an invalid JSON pointer that would not throw an error but would never end up matching anything. In addition, as it stands, the default path approach is the legacy "JSON-Path-like" approach which may be invalid as JSON Path to begin with.

What are your thoughts @fdutton ?

@fdutton
Copy link
Contributor

fdutton commented May 13, 2023

@costas80 Good catch! There are four approaches to solving this issue and none are perfect.

  1. Replace the use of String to represent a path with an abstraction that can be converted into the three existing representations: LEGACY, JSON_POINTER and JSON_PATH. This would be the most efficient in terms of performance but breaks the JsonSchemaWalker contract. One possible abstraction would be to use Jackson's JsonPointer since it is already present or create one optimized for this use-case.
  2. Include Jayway's json-path as a dependency (not ideal) and use that for the lookup when appropriate. In other words, we would use Jackson's support for JSON Pointer when PathType is JSON_POINTER and Jayway when PathType is JSON_PATH. LEGACY poses a problem because it is path-like but suffers the same issue with unusual property names.
  3. Write a proper JSON Path parser, which could affect performance and may still suffer from this issue when using LEGACY.
  4. Drop support for LEGACY and JSON_PATH. We would keep these enum definitions but ignore them to maintain API and binary compatibility but this would break clients that parse the output.

I'm going to start by writing a parser and testing its effect on performance. For 2.0, I prefer using an abstraction.

@stevehu What do you think?

@costas80
Copy link
Contributor Author

I'm going to start by writing a parser and testing its effect on performance. For 2.0, I prefer using an abstraction.

I agree. An abstraction would be best but currently the parser approach would be the way to go. I would also drop already LEGACY as it is essentially close to JSON Path but possibly incorrect, and switch the default approach to JSON_PATH which is very close. Wouldn't it be sufficient to simply mention this in release notes for people upgrading? Alternatively we could keep LEGACY for now but mark it as deprecated and treat it simply as an alias to JSON_PATH. In the end what would be important for the parser would be to only have to focus on valid JSON Path.

@stevehu
Copy link
Contributor

stevehu commented May 15, 2023

Sorry. I was away from my desktop for several days. I agree we can drop the LEGACY as it still needs to be fully implemented. We can increase the minor number to indicate this validator might break with the upgrade. Thanks.

stevehu pushed a commit that referenced this issue May 15, 2023
Resolves #752

Co-authored-by: Faron Dutton <faron.dutton@insightglobal.com>
costas80 added a commit to costas80/json-schema-validator that referenced this issue May 15, 2023
@costas80
Copy link
Contributor Author

costas80 commented May 15, 2023

Hi @fdutton , I was checking the PR you made on this and noticed a few problems. First of all there was an obvious issue in the handling of single quotes (a dead store) that I fixed in #755 .

There are however other issues I found in the parser that are more complex and should be rechecked. I'm adding below a series of unit test assertions that fail given the new parsing:

// Gives "/foobar". The '.' is not handled correctly.
assertEquals("/foo/bar", PathType.JSON_PATH.convertToJsonPointer("$.foo.bar"));
// Gives "/foo~bar". The '~' needs to be escaped as '~0'.
assertEquals("/foo~0bar", PathType.JSON_PATH.convertToJsonPointer("$['foo~bar']"));
// Gives "/foo/bar". The '/' needs to be escaped as '~1'.
assertEquals("/foo~1bar", PathType.JSON_PATH.convertToJsonPointer("$['foo/bar']"));
// Gives "/foo'bar']". The '[' is not handled correctly.
assertEquals("/foo/bar", PathType.JSON_PATH.convertToJsonPointer("$.foo['bar']"));

Note that the above don't result from an exhaustive test. I only tested with the obvious special characters from the JSON Pointer spec (~ and /), and the characters that drive the new parsing in PathType ([ and .).

Also, I would again suggest that LEGACY is marked as deprecated and set as an alias for JSON_PATH. Also the DEFAULT should be set to JSON_PATH. As it stands if a user of the library does not specify a type of desired path, she will get LEGACY which may well fail when checking unevaluatedProperties. Otherwise the fromLegacy method would need also a big update if it is to handle everything correctly.

stevehu pushed a commit that referenced this issue May 15, 2023
Co-authored-by: simatosc <costas.simatos@sigmacubed.eu>
@costas80
Copy link
Contributor Author

Hey @fdutton, for information I did an update on PathType (PR #757) to address the handling of additional special characters (see latest comments on #750). In case you're updating this to fix the parsing issue I mentioned you should rebase your local branch to get the latest code (once the PR merged).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants