From a69f70e39c2d1ec62c92b3972c8a9827b1839b41 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 22 May 2024 21:24:54 +1200 Subject: [PATCH 01/12] remove vocabularies from core --- jsonschema-core.md | 543 +++++++++------------------------------------ 1 file changed, 106 insertions(+), 437 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index 29dc745d..b817d9dc 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -29,12 +29,11 @@ and interaction control of JSON data. This specification defines JSON Schema core terminology and mechanisms, including pointing to another JSON Schema by reference, dereferencing a JSON -Schema reference, specifying the dialect being used, specifying a dialect's -vocabulary requirements, and defining terms. +Schema reference, specifying the dialect being used, and defining terms. -Other specifications define the vocabularies that perform assertions about -validation, linking, annotation, navigation, and interaction as well as output -formats. +Other specifications define keywords that perform assertions about validation, +linking, annotation, navigation, interaction, as well as other related concepts +such as output formats. ## Conventions and Terminology @@ -69,23 +68,22 @@ JSON Schema can be extended either by defining additional vocabularies, or less formally by defining additional keywords outside of any vocabulary. Unrecognized individual keywords are not supported. -This document defines a core vocabulary that MUST be supported by any -implementation, and cannot be disabled. Its keywords are each prefixed with a -"$" character to emphasize their required nature. This vocabulary is essential -to the functioning of the `application/schema+json` media type, and is used to -bootstrap the loading of other vocabularies. +This document defines a set of core keywords that MUST be supported by any +implementation, and cannot be disabled. 
These keywords are each prefixed with a +"$" character to emphasize their required nature. These keywords are essential +to the functioning of the `application/schema+json` media type. -Additionally, this document defines a RECOMMENDED vocabulary of keywords for +Additionally, this document defines a RECOMMENDED set of keywords for applying subschemas conditionally, and for applying subschemas to the contents -of objects and arrays. Either this vocabulary or one very much like it is +of objects and arrays. These keywords, or a set very much like them, are required to write schemas for non-trivial JSON instances, whether those schemas are intended for assertion validation, annotation, or both. While not part of -the required core vocabulary, for maximum interoperability this additional -vocabulary is included in this document and its use is strongly encouraged. +the required core set, for maximum interoperability this additional +set is included in this document and its use is strongly encouraged. -Further vocabularies for purposes such as structural validation or hypermedia +Further keywords for purposes such as structural validation or hypermedia annotation are defined in other documents. These other documents each define a -dialect collecting the standard sets of vocabularies needed to write schemas for +dialect collecting the standard sets of keywords needed to write schemas for that document's purpose. ## Definitions @@ -132,20 +130,20 @@ depending on the type: Whitespace and formatting concerns, including different lexical representations of numbers that are equal within the data model, are thus outside the scope of -JSON Schema. JSON Schema [vocabularies](#vocabulary) that wish to work with such -differences in lexical representations SHOULD define keywords to precisely -interpret formatted strings within the data model rather than relying on having -the original JSON representation Unicode characters available. +JSON Schema. 
Extensions to JSON Schema that wish to work with such differences +in lexical representations SHOULD define keywords to precisely interpret +formatted strings within the data model rather than relying on having the +original JSON representation Unicode characters available. Since an object cannot have two properties with the same key, behavior for a JSON document that tries to define two properties with the same key in a single object is undefined. -Note that JSON Schema vocabularies are free to define their own extended type +Note that JSON Schema extensions are free to define their own extended type system. This should not be confused with the core data model types defined here. -As an example, "integer" is a reasonable type for a vocabulary to define as a -value for a keyword, but the data model makes no distinction between integers -and other numbers. +As an example, "integer" is a reasonable type to define as a value for a +keyword, but the data model makes no distinction between integers and other +numbers. #### Instance Equality @@ -175,7 +173,7 @@ where an instance may be outside any of the six JSON data types. In this case, annotations still apply; but most validation keywords will not be useful, as they will always pass or always fail. -A custom vocabulary may define support for a superset of the core data model. +An extension may define support for a superset of the core data model. The schema itself may only be expressible in this superset; for example, to make use of the `const` keyword. @@ -228,8 +226,7 @@ never produce annotation results. These boolean schemas exist to clarify schema author intent and facilitate schema processing optimizations. They behave identically to the following schema -objects (where `not` is part of the subschema application vocabulary defined in -this document). +objects (where `not` is defined [later in this document](#not)).
- `true`: Always passes validation, as if the empty schema `{}` - `false`: Always fails validation, as if the schema `{ "not": {} }` @@ -238,37 +235,11 @@ While the empty schema object is unambiguous, there are many possible equivalents to the `false` schema. Using the boolean values ensures that the intent is clear to both human readers and implementations. -#### Schema Vocabularies - -A schema vocabulary, or simply a vocabulary, is a set of keywords, their syntax, -and their semantics. A vocabulary is generally organized around a particular -purpose. Different uses of JSON Schema, such as validation, hypermedia, or user -interface generation, will involve different sets of vocabularies. - -Vocabularies are the primary unit of re-use in JSON Schema, as schema authors -can indicate what vocabularies are required or optional in order to process the -schema. Since vocabularies are identified by IRIs in the meta-schema, generic -implementations can load extensions to support previously unknown vocabularies. -While keywords can be supported outside of any vocabulary, there is no analogous -mechanism to indicate individual keyword usage. - -A schema vocabulary can be defined by anything from an informal description to a -standards proposal, depending on the audience and interoperability expectations. -In particular, in order to facilitate vocabulary use within non-public -organizations, a vocabulary specification need not be published outside of its -scope of use. - #### Meta-Schemas A schema that itself describes a schema is called a meta-schema. Meta-schemas -are used to validate JSON Schemas and specify which vocabularies they are using. - -Typically, a meta-schema will specify a set of vocabularies, and validate -schemas that conform to the syntax of those vocabularies. However, meta-schemas -and vocabularies are separate in order to allow meta-schemas to validate schema -conformance more strictly or more loosely than the vocabularies' specifications -call for. 
Meta-schemas may also describe and validate additional keywords that -are not part of a formal vocabulary. +are used to validate JSON Schemas and specify the set of keywords those schemas +are using. #### Root Schema and Subschemas and Resources {#root} @@ -415,14 +386,13 @@ neither at the beginning nor at the end. This means, for instance, the pattern ### Extending JSON Schema {#extending} -Additional schema keywords and schema vocabularies MAY be defined by any entity. -Save for explicit agreement, schema authors SHALL NOT expect these additional -keywords and vocabularies to be supported by implementations that do not -explicitly document such support. +Additional schema keywords MAY be defined by any entity. Save for explicit +agreement, schema authors SHALL NOT expect these additional keywords to be +supported by implementations that do not explicitly document such support. Implementations MAY provide the ability to register or load handlers for -vocabularies that they do not support directly. The exact mechanism for -registering and implementing such handlers is implementation-dependent. +keywords that they do not support directly. The exact mechanism for registering +and implementing such handlers is implementation-dependent. #### Implicit annotation keywords {#implicit-annotations} @@ -434,7 +404,7 @@ Implicit annotation keywords MUST NOT affect evaluation of a schema in any way other than annotation collection. Consequently, the "x-" prefix is reserved for this purpose, and extension -vocabularies MUST NOT define any keywords which begin with this prefix. +keywords MUST NOT begin with this prefix. #### Handling of unrecognized or unsupported keywords {#unrecognized} @@ -563,19 +533,19 @@ in this document. Note that when no such alternate approach is possible for a keyword, implementations that do not support annotation collections will not be able to -support those keywords or vocabularies that contain them. +support those keywords. 
### Identifiers Identifiers define IRIs for a schema, or affect how such IRIs are resolved in -[references](#referenced), or both. The Core vocabulary defined in this document -defines several identifying keywords, most notably `$id`. +[references](#referenced), or both. This document defines several identifying +keywords, most notably `$id`. Canonical schema IRIs MUST NOT change while processing an instance, but keywords that affect IRI-reference resolution MAY have behavior that is only fully determined at runtime. -While custom identifier keywords are possible, vocabulary designers should take +While custom identifier keywords are possible, extension designers should take care not to disrupt the functioning of core keywords. For example, the `$dynamicAnchor` keyword in this specification limits its IRI resolution effects to the matching `$dynamicRef` keyword, leaving the behavior of `$ref` @@ -641,16 +611,16 @@ Most assertions only constrain values within a certain primitive type. When the type of the instance is not of the type targeted by the keyword, the instance is considered to conform to the assertion. -For example, the `maxLength` keyword from the companion [validation -vocabulary](#json-schema-validation): will only restrict certain strings (that +For example, the `maxLength` keyword will only restrict certain strings (that are too long) from being valid. If the instance is a number, boolean, null, array, or object, then it is valid against this assertion. This behavior allows keywords to be used more easily with instances that can be -of multiple primitive types. The companion validation vocabulary also includes a -`type` keyword which can independently restrict the instance to one or more -primitive types. This allows for a concise expression of use cases such as a -function that might return either a string of a certain length or a null value: +of multiple primitive types. 
The companion Validation specification also +includes a `type` keyword which can independently restrict the instance to one +or more primitive types. This allows for a concise expression of use cases such +as a function that might return either a string of a certain length or a null +value: ```jsonschema { @@ -824,14 +794,19 @@ assertions. A fourth category of keywords simply reserve a location to hold re-usable components or data of interest to schema authors that is not suitable for re-use. These keywords do not affect validation or annotation results. Their -purpose in the core vocabulary is to ensure that locations are available for -certain purposes and will not be redefined by extension keywords. +purpose is to ensure that locations are available for certain purposes and will +not be redefined by extension keywords. + +While these keywords do not directly affect results, as explained in +{{non-schemas}}, unrecognized extension keywords that reserve locations for +re-usable schemas may have undesirable interactions with references in certain +circumstances. ### Loading Instance Data -While none of the vocabularies defined as part of this or the associated -documents define a keyword which may target and/or load instance data, it is -possible that other vocabularies may wish to do so. +While none of the keywords defined as part of this or the associated +documents target and/or load instance data, it is +possible that extensions may wish to do so. Keywords MAY be defined to use JSON Pointers or Relative JSON Pointers to examine parts of an instance outside the current evaluation location. @@ -839,64 +814,31 @@ examine parts of an instance outside the current evaluation location. Keywords that allow adjusting the location using a Relative JSON Pointer SHOULD default to using the current location if a default is desireable.
-## The JSON Schema Core Vocabulary {#core} - -Keywords declared in this section, which all begin with "$", make up the JSON -Schema Core vocabulary. These keywords are either required in order to process -any schema or meta-schema, including those split across multiple documents, or -exist to reserve keywords for purposes that require guaranteed interoperability. - -The Core vocabulary MUST be considered mandatory at all times, in order to -bootstrap the processing of further vocabularies. Meta-schemas that use the -[`$vocabulary`](#vocabulary) keyword to declare the vocabularies in use MUST -explicitly list the Core vocabulary, which MUST have a value of true indicating -that it is required. - -The behavior of a false value for this vocabulary (and only this vocabulary) is -undefined, as is the behavior when `$vocabulary` is present but the Core -vocabulary is not included. However, it is RECOMMENDED that implementations -detect these cases and raise an error when they occur. It is not meaningful to -declare that a meta-schema optionally uses Core. - -Meta-schemas that do not use `$vocabulary` MUST be considered to require the -Core vocabulary as if its IRI were present with a value of true. - -The current IRI for the Core vocabulary is: -`https://json-schema.org/draft/next/vocab/core`. +## The JSON Schema Core Keywords {#core} -The current IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/core`. +Keywords declared in this section, which all begin with "$", are essential to +processing JSON Schema. These keywords inform implementations how to process any +schema or meta-schema, including those split across multiple documents, or exist +to reserve keywords for purposes that require guaranteed interoperability. -The "$" prefix is reserved for use by the Core vocabulary. Vocabulary extensions -MUST NOT define new keywords that begin with "$". 
+Support for these keywords MUST be considered mandatory at all times in order to +bootstrap the processing of further keywords. -### Meta-Schemas and Vocabularies {#vocabulary} +The "$" prefix is reserved for use by this specification. Extensions MUST NOT +define new keywords that begin with "$". -Two concepts, meta-schemas and vocabularies, are used to inform an -implementation how to interpret a schema. Every schema has a meta-schema, which -can be declared using the `$schema` keyword. +### Meta-Schemas -The meta-schema serves two purposes: +Meta-schemas are used to inform an implementation how to interpret a schema. +Every schema has a meta-schema, which can be explicitly declared using the +`$schema` keyword. -Declaring the vocabularies in use: The `$vocabulary` keyword, when it appears in -a meta-schema, declares which vocabularies are available to be used in schemas -that refer to that meta-schema. Vocabularies define keyword semantics, as well -as their general syntax. By combining various vocabularies, distinct -sets of keywords can be made available for use in a schema. This collection of -vocabularies defines a dialect. - -Describing valid schema syntax: A schema MUST successfully validate against its -meta-schema, which constrains the syntax of the available keywords. The syntax -described is expected to be compatible with the vocabularies declared; while it -is possible to describe an incompatible syntax, such a meta-schema would be -unlikely to be useful. - -Meta-schemas are separate from vocabularies to allow for vocabularies to be -combined in different ways, and for meta-schema authors to impose additional -constraints such as forbidding certain keywords, or performing unusually strict -syntactical validation, as might be done during a development and testing cycle. -Each vocabulary typically identifies a meta-schema consisting only of the -vocabulary's keywords. +The meta-schema serves to describe valid schema syntax. 
A schema MUST +successfully validate against its meta-schema, which constrains the syntax of +the available keywords. The syntax described for a given keyword is expected to +be compatible with the document which defines the keyword; while it is possible +to describe an incompatible syntax, such a meta-schema would be unlikely to be +useful. Meta-schema authoring is an advanced usage of JSON Schema, so the design of meta-schema features emphasizes flexibility over simplicity. @@ -926,7 +868,7 @@ steps. (Note that steps 2 and 3 are mutually exclusive.) If the dialect is not specified through one of these methods, the implementation -MUST refuse to process the schema, as with unsupported required vocabularies. +MUST refuse to process the schema. #### The `$schema` Keyword {#keyword-schema} @@ -950,121 +892,6 @@ keyword appears in a non-resource root schema object, the behavior is undefined. Values for this property are defined elsewhere in this and other documents, and by other parties. -#### The `$vocabulary` Keyword - -The `$vocabulary` keyword is used in meta-schemas to identify the vocabularies -available for use in schemas described by that meta-schema, and whether each -vocabulary is required or optional. Together, this information forms a dialect. - -The value of this keyword MUST be an object. The property names in the object -MUST be IRIs (containing a scheme) and each IRI MUST be normalized. Each IRI -that appears as a property name identifies a specific set of keywords and their -semantics. - -The IRI MAY be a URL, but the nature of the retrievable resource is currently -undefined, and reserved for future use. Vocabulary authors MAY use the URL of -the vocabulary specification, in a human-readable media type such as `text/html` -or `text/plain`, as the vocabulary IRI.[^2] - -[^2]: Vocabulary documents may be added in forthcoming drafts. 
For now, -identifying the keyword set is deemed sufficient as that, along with meta-schema -validation, is how the current "vocabularies" work today. Any future vocabulary -document format will be specified as a JSON document, so using `text/html` or -other non-JSON formats in the meantime will not produce any future ambiguity. - -The values of the object properties MUST be booleans. If the value is true, then -the vocabulary MUST be considered to be required. If the value is false, then -the vocabulary MUST be considered to be optional. - -##### Required, optional, and omitted vocabularies - -A schema is said to use a dialect and its constituent vocabularies if it is -associated with a meta-schema defining the dialect with `$vocabulary`, either -through `$schema`, through appropriately defined media type parameters or link -relation types, or through documented default implementation-defined behavior in -the absence of an explicit meta-schema. If a meta-schema does not contain -`$vocabulary`, the set of vocabularies in use is determined according to -{{default-vocabs}}. - -Any vocabulary in use by a schema and understood by the implementation MUST be -processed in a manner consistent with the semantic definitions contained within -the vocabulary, regardless of whether that vocabulary is required or optional. - -Any vocabulary that is not present in `$vocabulary` MUST NOT be made available -for use in schemas described by that meta-schema, except for the core vocabulary -as specified by the introduction to {{core}}. - -Implementations that do not support a vocabulary required by a schema MUST -refuse to process that schema. - -Implementations that do not support a vocabulary that is optionally used by a -schema SHOULD proceed with processing the schema. The keywords will be -considered to be unrecognized keywords as addressed by {{unrecognized}}. 
- -##### Vocabularies are schema resource-scoped - -The `$vocabulary` keyword SHOULD be used in the root schema of any schema -resource intended for use as a meta-schema. It MUST NOT appear in subschemas. - -The `$vocabulary` keyword MUST be ignored in schema resources that are not being -processed as a meta-schema. This allows validating a meta-schema M against its -own meta-schema M' without requiring the validator to understand the -vocabularies declared by M. - -##### Vocabulary and non-vocabulary keywords - -Keywords from different vocabularies, as well as non-vocabulary extension -keywords, can have identical names. These are not considered to be the same -keyword from the perspective of enabling or disabling them through -`$vocabulary`. - -In particular the keywords defined in this specification and its companion -documents MUST be considered to be vocabulary keywords, with availability -governed by `$vocabulary` even in implementations that do not support any -extension vocabularies. - -Guidance regarding vocabularies with identically-named keywords is provided in -{{vocab-practices}}. - -##### Default vocabularies {#default-vocabs} - -If `$vocabulary` is absent, an implementation MAY determine behavior based on -the meta-schema if it is recognized from the IRI value of the referring schema's -`$schema` keyword. This is how behavior (such as Hyper-Schema usage) has been -recognized prior to the existence of vocabularies. - -If the meta-schema, as referenced by the schema, is not recognized, or is -missing, then the behavior is implementation-defined. If the implementation -proceeds with processing the schema, it MUST assume the use of the core -vocabulary. If the implementation is built for a specific purpose, then it -SHOULD assume the use of all of the most relevant vocabularies for that purpose. 
- -For example, an implementation that is a validator SHOULD assume the use of all -vocabularies in this specification and the companion Validation specification. - -##### Non-inheritability of vocabularies - -Note that the processing restrictions on `$vocabulary` mean that meta-schemas -that reference other meta-schemas using `$ref` or similar keywords do not -automatically inherit the vocabulary declarations of those other meta-schemas. -All such declarations must be repeated in the root of each schema document -intended for use as a meta-schema. This is demonstrated in [the example -meta-schema](#example-meta-schema).[^3] - -[^3]: This requirement allows implementations to find all vocabulary requirement -information in a single place for each meta-schema. As schema extensibility -means that there are endless potential ways to combine more fine-grained -meta-schemas by reference, requiring implementations to anticipate all -possibilities and search for vocabularies in referenced meta-schemas would be -overly burdensome. - -#### Updates to Meta-Schema and Vocabulary IRIs - -Updated vocabulary and meta-schema IRIs MAY be published between specification -drafts in order to correct errors. Implementations SHOULD consider IRIs dated -after this specification draft and before the next to indicate the same syntax -and semantics as those listed here. - ### Base IRI, Anchors, and Dereferencing To differentiate between schemas in a vast ecosystem, schemas are identified by @@ -1244,11 +1071,6 @@ this string to end users. Tools for editing schemas SHOULD support displaying and editing this keyword. The value of this keyword MAY be used in debug or error output which is intended for developers making use of schemas. -Schema vocabularies SHOULD allow `$comment` within any object containing -vocabulary keywords. Implementations MAY assume `$comment` is allowed unless the -vocabulary specifically forbids it. 
Vocabularies MUST NOT specify any effect of -`$comment` beyond what is described in this specification. - Tools that translate other media types or programming languages to and from `application/schema+json` MAY choose to convert that media type or programming language's native comments to or from `$comment` values. The behavior of such @@ -1326,9 +1148,8 @@ processed both ways in the course of one session. Implementations MAY allow a schema to be explicitly passed as a meta-schema, for implementation-specific purposes, such as pre-loading a commonly used -meta-schema and checking its vocabulary support requirements up front. -Meta-schema authors MUST NOT expect such features to be interoperable across -implementations. +meta-schema and checking its requirements up front. Meta-schema authors MUST NOT +expect such features to be interoperable across implementations. ### Dereferencing @@ -1478,7 +1299,7 @@ the same document to ease transportation. Each embedded Schema Resource MUST be treated as an individual Schema Resource, following standard schema loading and processing requirements, including -determining vocabulary support. +determining keyword support. #### Bundling @@ -1560,10 +1381,25 @@ recursive nesting like this; the behavior is undefined. #### References to Possible Non-Schemas {#non-schemas} Subschema objects (or booleans) are recognized by their use with known -applicator keywords or with location-reserving keywords such as [`$defs`](#defs) -that take one or more subschemas as a value. These keywords may be `$defs` and -the standard applicators from this document, or extension keywords from a known -vocabulary, or implementation-specific custom keywords. +applicator keywords or with location-reserving keywords such as +[`$defs`](#defs) that take one or more subschemas as a value. These keywords may +be `$defs` and the standard applicators from this document or +implementation-specific custom keywords. 
+ +Multi-level structures of unknown keywords are capable of introducing nested +subschemas, which would be subject to the processing rules for `$id`. Therefore, +having a reference target in such an unrecognized structure cannot be reliably +implemented, and the resulting behavior is undefined. Similarly, a reference +target under a known keyword, for which the value is known not to be a schema, +results in undefined behavior in order to avoid burdening implementations with +the need to detect such targets.[^10] + +[^10]: These scenarios are analogous to fetching a schema over HTTP but +receiving a response with a Content-Type other than `application/schema+json`. +An implementation can certainly try to interpret it as a schema, but the origin +server offered no guarantee that it actually is any such thing. Therefore, +interpreting it as such has security implications and may produce unpredictable +results. Note that single-level custom keywords with identical syntax and semantics to `$defs` do not allow for any intervening `$id` keywords, and therefore will @@ -1647,27 +1483,17 @@ User-Agent: product-name/5.4.1 so-cool-json-schema/1.0.2 curl/7.43.0 Clients SHOULD be able to make requests with a "From" header so that server operators can contact the owner of a potentially misbehaving script. -## A Vocabulary for Applying Subschemas {#applicatorvocab} - -This section defines a vocabulary of applicator keywords that are RECOMMENDED -for use as the basis of other vocabularies. - -Meta-schemas that do not use `$vocabulary` SHOULD be considered to require this -vocabulary as if its IRI were present with a value of true. +## Keywords for Applying Subschemas -The current IRI for this vocabulary, known as the Applicator vocabulary, is: -`https://json-schema.org/draft/next/vocab/applicator`. - -The current IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/applicator`.
+This section defines a set of keywords that enable schema combinations and +composition. ### Keyword Independence Schema keywords typically operate independently, without affecting each other's outcomes. -For schema author convenience, there are some exceptions among the keywords in -this vocabulary: +For schema author convenience, there are some exceptions among these keywords: - `additionalProperties`, whose behavior is defined in terms of `properties` and `patternProperties` @@ -1839,8 +1665,7 @@ keyword. If the `items` subschema is applied to any positions within the instance array, it produces an annotation result of boolean true, indicating that all remaining array elements have been evaluated against this keyword's subschema. This -annotation affects the behavior of `unevaluatedItems` in the Unevaluated -vocabulary. +annotation affects the behavior of `unevaluatedItems`. Omitting this keyword has the same assertion behavior as an empty schema. @@ -1862,8 +1687,7 @@ validates against the corresponding schema. The annotation result of this keyword is the set of instance property names which are also present under this keyword. This annotation affects the behavior -of `additionalProperties` (in this vocabulary) and `unevaluatedProperties` in -the Unevaluated vocabulary. +of `additionalProperties` and `unevaluatedProperties`. Omitting this keyword has the same assertion behavior as an empty object. @@ -1882,8 +1706,7 @@ not implicitly anchored. The annotation result of this keyword is the set of instance property names matched by at least one property under this keyword. This annotation affects the -behavior of `additionalProperties` (in this vocabulary) and -`unevaluatedProperties` (in the Unevaluated vocabulary). +behavior of `additionalProperties` and `unevaluatedProperties`. Omitting this keyword has the same assertion behavior as an empty object. @@ -1902,7 +1725,7 @@ against the `additionalProperties` schema. 
The annotation result of this keyword is the set of instance property names validated by this keyword's subschema. This annotation affects the behavior of -`unevaluatedProperties` in the Unevaluated vocabulary. +`unevaluatedProperties`. Omitting this keyword has the same assertion behavior as an empty schema. @@ -1986,14 +1809,13 @@ successfully when applied to every index of the instance. The annotation MUST be present if the instance array to which this keyword's schema applies is empty. -This annotation affects the behavior of `unevaluatedItems` in the Unevaluated -vocabulary. +This annotation affects the behavior of `unevaluatedItems`. The subschema MUST be applied to every array element even after the first match has been found, in order to collect annotations for use by other keywords. This is to ensure that all possible annotations are collected. -## A Vocabulary for Unevaluated Locations +## Keywords for Unevaluated Locations The purpose of these keywords is to enable schema authors to apply subschemas to array items or object properties that have not been successfully evaluated @@ -2020,19 +1842,10 @@ subschemas. The behavior of these keywords depend on the annotation results of adjacent keywords that apply to the instance location being validated. -Meta-schemas that do not use `$vocabulary` SHOULD be considered to require this -vocabulary as if its IRI were present with a value of true. - -The current IRI for this vocabulary, known as the Unevaluated Applicator -vocabulary, is: `https://json-schema.org/draft/next/vocab/unevaluated`. - -The current IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/unevaluated`. - ### Keyword Independence Schema keywords typically operate independently, without affecting each other's -outcomes. However, the keywords in this vocabulary are notable exceptions: +outcomes. 
However, these keywords are notable exceptions: - `unevaluatedItems`, whose behavior is defined in terms of annotations from `prefixItems`, `items`, `contains`, and itself @@ -2209,7 +2022,7 @@ Servers MUST ensure that malicious parties cannot change the functionality of existing schemas by uploading a schema with a pre-existing or very similar `$id`. -Individual JSON Schema vocabularies are liable to also have their own security +Individual JSON Schema extensions are liable to also have their own security considerations. Consult the respective specifications for more information. Schema authors should take care with `$comment` contents, as a malicious @@ -2588,150 +2401,6 @@ of the node schema objects were moved under `$defs`. It is the matching `$dynamicAnchor` values which tell us how to resolve the dynamic reference, not any sort of correlation in JSON structure. -## [Appendix] Working with vocabularies - -### Best practices for vocabulary and meta-schema authors {#vocab-practices} - -Vocabulary authors should take care to avoid keyword name collisions if the -vocabulary is intended for broad use, and potentially combined with other -vocabularies. JSON Schema does not provide any formal namespacing system, but -also does not constrain keyword names, allowing for any number of namespacing -approaches. - -Vocabularies may build on each other, such as by defining the behavior of their -keywords with respect to the behavior of keywords from another vocabulary, or by -using a keyword from another vocabulary with a restricted or expanded set of -acceptable values. Not all such vocabulary re-use will result in a new -vocabulary that is compatible with the vocabulary on which it is built. -Vocabulary authors should clearly document what level of compatibility, if any, -is expected. - -Meta-schema authors should not use `$vocabulary` to combine multiple -vocabularies that define conflicting syntax or semantics for the same keyword. 
-As semantic conflicts are not generally detectable through schema validation, -implementations are not expected to detect such conflicts. If conflicting -vocabularies are declared, the resulting behavior is undefined. - -Vocabulary authors SHOULD provide a meta-schema that validates the expected -usage of the vocabulary's keywords on their own. Such meta-schemas SHOULD not -forbid additional keywords, and MUST not forbid any keywords from the Core -vocabulary. - -It is recommended that meta-schema authors reference each vocabulary's -meta-schema using the [`allOf`](#allof) keyword, although other mechanisms for -constructing the meta-schema may be appropriate for certain use cases. - -The recursive nature of meta-schemas makes the `$dynamicAnchor` and -`$dynamicRef` keywords particularly useful for extending existing meta-schemas, -as can be seen in the JSON Hyper-Schema meta-schema which extends the Validation -meta-schema. - -Meta-schemas may impose additional constraints, including describing keywords -not present in any vocabulary, beyond what the meta-schemas associated with the -declared vocabularies describe. This allows for restricting usage to a subset of -a vocabulary, and for validating locally defined keywords not intended for -re-use. - -However, meta-schemas should not contradict any vocabularies that they declare, -such as by requiring a different JSON type than the vocabulary expects. The -resulting behavior is undefined. - -Meta-schemas intended for local use, with no need to test for vocabulary support -in arbitrary implementations, can safely omit `$vocabulary` entirely. - -### Example meta-schema with vocabulary declarations {#example-meta-schema} - -This meta-schema explicitly declares both the Core and Applicator vocabularies, -together with an extension vocabulary, and combines their meta-schemas with an -`allOf`. 
The extension vocabulary's meta-schema, which describes only the -keywords in that vocabulary, is shown after the main example meta-schema. - -The main example meta-schema also restricts the usage of the Unevaluated -vocabulary by forbidding the keywords prefixed with "unevaluated", which are -particularly complex to implement. This does not change the semantics or set of -keywords defined by the other vocabularies. It just ensures that schemas using -this meta-schema that attempt to use the keywords prefixed with "unevaluated" -will fail validation against this meta-schema. - -Finally, this meta-schema describes the syntax of a keyword, "localKeyword", -that is not part of any vocabulary. Presumably, the implementors and users of -this meta-schema will understand the semantics of "localKeyword". JSON Schema -does not define any mechanism for expressing keyword semantics outside of -vocabularies, making them unsuitable for use except in a specific environment in -which they are understood. - -This meta-schema combines several vocabularies for general use. 
- -```jsonschema -{ - "$schema": "https://json-schema.org/draft/next/schema", - "$id": "https://example.com/meta/general-use-example", - "$dynamicAnchor": "meta", - "$vocabulary": { - "https://json-schema.org/draft/next/vocab/core": true, - "https://json-schema.org/draft/next/vocab/applicator": true, - "https://json-schema.org/draft/next/vocab/validation": true, - "https://example.com/vocab/example-vocab": true - }, - "allOf": [ - {"$ref": "https://json-schema.org/draft/next/meta/core"}, - {"$ref": "https://json-schema.org/draft/next/meta/applicator"}, - {"$ref": "https://json-schema.org/draft/next/meta/validation"}, - {"$ref": "https://example.com/meta/example-vocab"}, - ], - "patternProperties": { - "^unevaluated": false - }, - "properties": { - "localKeyword": { - "$comment": "Not in vocabulary, but validated if used", - "type": "string" - } - } -} -``` - -This meta-schema describes only a single extension vocabulary. - -```jsonschema -{ - "$schema": "https://json-schema.org/draft/next/schema", - "$id": "https://example.com/meta/example-vocab", - "$dynamicAnchor": "meta", - "$vocabulary": { - "https://example.com/vocab/example-vocab": true, - }, - "type": ["object", "boolean"], - "properties": { - "minDate": { - "type": "string", - "pattern": "\\d\\d\\d\\d-\\d\\d-\\d\\d", - "format": "date", - } - } -} -``` - -As shown above, even though each of the single-vocabulary meta-schemas -referenced in the general-use meta-schema's `allOf` declares its corresponding -vocabulary, this new meta-schema must re-declare them. - -The standard meta-schemas that combine all vocabularies defined by the Core and -Validation specification, and that combine all vocabularies defined by those -specifications as well as the Hyper-Schema specification, demonstrate additional -complex combinations. These IRIs for these meta-schemas may be found in the -Validation and Hyper-Schema specifications, respectively. 
- -While the general-use meta-schema can validate the syntax of `minDate`, it is -the vocabulary that defines the logic behind the semantic meaning of `minDate`. -Without an understanding of the semantics (in this example, that the instance -value must be a date equal to or after the date provided as the keyword's value -in the schema), an implementation can only validate the syntactic usage. In this -case, that means validating that it is a date-formatted string (using `pattern` -to ensure that it is validated even when `format` functions purely as an -annotation, as explained in the [Validation -specification](#json-schema-validation). - ## [Appendix] References and generative use cases While the presence of references is expected to be transparent to validation From 3cd9d3a3ceca3f62279e471924cd1b962493998c Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Thu, 23 May 2024 09:23:15 +1200 Subject: [PATCH 02/12] remove vocabulary usages from validation --- jsonschema-validation.md | 228 ++++++++++----------------------------- 1 file changed, 54 insertions(+), 174 deletions(-) diff --git a/jsonschema-validation.md b/jsonschema-validation.md index 02d8b8e5..78c53f91 100644 --- a/jsonschema-validation.md +++ b/jsonschema-validation.md @@ -55,11 +55,11 @@ which it applies. This greatly simplifies the implementation requirements for validators by ensuring that they do not need to maintain state across the document-wide validation process. -This specification defines a set of assertion keywords, as well as a small -vocabulary of metadata keywords that can be used to annotate the JSON instance -with useful information. The {{format}} keyword is intended primarily as an -annotation, but can optionally be used as an assertion. The {{content}} keywords -are annotations for working with documents embedded as JSON strings. 
+This specification defines a set of assertion keywords, as well as a number of +metadata keywords that can be used to annotate the JSON instance with useful +information. The {{format}} keyword is intended primarily as an annotation, but +can optionally be used as an assertion. The {{content}} keywords are annotations +for working with documents embedded as JSON strings. ## Interoperability Considerations @@ -87,32 +87,21 @@ regular expressions in the [JSON Schema Core](#json-schema) specification. The current IRI for the default JSON Schema dialect meta-schema is `https://json-schema.org/draft/next/schema`. For schema author convenience, this -meta-schema describes a dialect consisting of all vocabularies defined in this -specification and the JSON Schema Core specification, as well as two former -keywords which are reserved for a transitional period. Individual vocabulary and -vocabulary meta-schema IRIs are given for each section below. Certain -vocabularies are optional to support, which is explained in detail in the -relevant sections. +meta-schema describes a dialect consisting of all keywords defined in this +specification and the JSON Schema Core specification. Certain keywords specify +some functionality which is optional to support and is explained in detail in +the relevant sections. -Updated vocabulary and meta-schema IRIs MAY be published between specification -drafts in order to correct errors. Implementations SHOULD consider IRIs dated -after this specification draft and before the next to indicate the same syntax -and semantics as those listed here. +Updated meta-schema IRIs MAY be published between specification drafts in order +to correct errors. Implementations SHOULD consider IRIs dated after this +specification draft and before the next to indicate the same syntax and +semantics as those listed here. 
-## A Vocabulary for Structural Validation +## Keywords for Structural Validation Validation keywords in a schema impose requirements for successful validation of an instance. These keywords are all assertions without any annotation behavior. -Meta-schemas that do not use `$vocabulary` SHOULD be considered to require this -vocabulary as if its IRI were present with a value of `true`. - -The current IRI for this vocabulary, known as the Validation vocabulary, is: -`https://json-schema.org/draft/next/vocab/validation`. - -The current IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/validation`. - ### Validation Keywords for Any Instance Type {#general} #### `type` @@ -295,7 +284,7 @@ the name of a property in the instance. Omitting this keyword has the same behavior as an empty object. -## Vocabularies for Semantic Content With `format` {#format} +## Semantic Content With `format` {#format} ### Foreword @@ -320,115 +309,57 @@ can be used alongside the `type` keyword with a value of "integer", or could be explicitly defined to always pass if the number is not an integer, which produces essentially the same behavior as only applying to integers. -The current IRI for this vocabulary, known as the Format-Annotation vocabulary, -is: `https://json-schema.org/draft/next/vocab/format-annotation`. The current -IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/format-annotation`. Implementing -support for this vocabulary is REQUIRED. - -In addition to the Format-Annotation vocabulary, a secondary vocabulary is -available for custom meta-schemas that defines `format` as an assertion. The IRI -for the Format-Assertion vocabulary, is: -`https://json-schema.org/draft/next/vocab/format-assertion`. The current IRI for -the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/format-assertion`. Implementing support -for the Format-Assertion vocabulary is OPTIONAL. 
- -Specifying both the Format-Annotation and the Format-Assertion vocabularies is -functionally equivalent to specifying only the Format-Assertion vocabulary since -its requirements are a superset of the Format-Annotation vocabulary. - -### Implementation Requirements - -The `format` keyword functions as defined by the vocabulary which is referenced. - -#### Format-Annotation Vocabulary - -The value of format MUST be collected as an annotation, if the implementation -supports annotation collection. This enables application-level validation when -schema validation is unavailable or inadequate. - -Implementations MAY still treat `format` as an assertion in addition to an -annotation and attempt to validate the value's conformance to the specified -semantics. The implementation MUST provide options to enable and disable such -evaluation and MUST be disabled by default. Implementations SHOULD document -their level of support for such validation.[^2] +Implementing support for `format` as an annotation is REQUIRED (if the +implementation supports annotation collection). -[^2]: Specifying the Format-Annotation vocabulary and enabling validation in an -implementation should not be viewed as being equivalent to specifying the -Format-Assertion vocabulary since implementations are not required to provide -full validation support when the Format-Assertion vocabulary is not specified. - -When the implementation is configured for assertion behavior, it: +Implementing support for `format` as an assertion is OPTIONAL. 
Implementations +which choose to support assertion behavior: +- MUST still collect the keyword's value as an annotation (if the implementation + supports annotation collection), +- MUST provide a configuration option to enable assertion behavior, defaulting to + annotation-only behavior - SHOULD provide an implementation-specific best effort validation for each - format attribute defined below; + format attribute defined below;[^3] - MAY choose to implement validation of any or all format attributes as a no-op - by always producing a validation result of `true`;[^3] + by always producing a validation result of true;[^4] +- SHOULD use a common parsing library for each format, or a well-known regular + expression; +- SHOULD clearly document how and to what degree each format attribute is + validated. + +[^3]: The expectation is that for simple formats such as date-time, syntactic +validation will be thorough. For a complex format such as email addresses, which +are the amalgamation of various standards and numerous adjustments over time, +with obscure and/or obsolete rules that may or may not be restricted by other +applications making use of the value, a minimal validation is sufficient. For +example, an instance string that does not contain an "@" is clearly not a valid +email address, and an "email" or "hostname" containing characters outside of +7-bit ASCII is likewise clearly invalid. -[^3]: This matches the current reality of implementations, which provide widely +[^4]: This matches the current reality of implementations, which provide widely varying levels of validation, including no validation at all, for some or all format attributes. It is also designed to encourage relying only on the annotation behavior and performing semantic validation in the application, which is the recommended best practice. 
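As a rough illustration of the `format` requirements listed above, consider the small schema below. It is a sketch written for this note rather than text from the specification; only the `$schema` IRI is taken from the document, and the instance value discussed afterwards is made up.

```jsonschema
{
  "$schema": "https://json-schema.org/draft/next/schema",
  "type": "string",
  "format": "email"
}
```

Under the default annotation-only behavior, an instance such as `"not an email"` would be expected to pass, with `"email"` collected as the annotation value of `format` if the implementation supports annotation collection. With the assertion option enabled, an implementation that offers best-effort validation would likely reject that instance because it contains no "@". An implementation that treats the check as a no-op would still accept it, which the list above permits.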
-#### Format-Assertion Vocabulary - -When the Format-Assertion vocabulary is declared with a value of `true`, -implementations MUST provide full validation support for all of the formats -defined by this specification. Implementations that cannot provide full -validation support MUST refuse to process the schema. - -An implementation that supports the Format-Assertion vocabulary: - -- MUST still collect `format` as an annotation if the implementation supports - annotation collection; -- MUST evaluate `format` as an assertion; -- MUST implement syntactic validation for all format attributes defined in this - specification, and for any additional format attributes that it recognizes, - such that there exist possible instance values of the correct type that will - fail validation. - The requirement for minimal validation of format attributes is intentionally vague and permissive, due to the complexity involved in many of the attributes. Note in particular that the requirement is limited to syntactic checking; it is not to be expected that an implementation would send an email, attempt to connect to a URL, or otherwise check the existence of an entity -identified by a format instance.[^4] - -[^4]: The expectation is that for simple formats such as date-time, syntactic -validation will be thorough. For a complex format such as email addresses, which -are the amalgamation of various standards and numerous adjustments over time, -with obscure and/or obsolete rules that may or may not be restricted by other -applications making use of the value, a minimal validation is sufficient. For -example, an instance string that does not contain an "@" is clearly not a valid -email address, and an "email" or "hostname" containing characters outside of -7-bit ASCII is likewise clearly invalid. - -It is RECOMMENDED that implementations use a common parsing library for each -format, or a well-known regular expression. 
Implementations SHOULD clearly -document how and to what degree each format attribute is validated. - -The [standard core and validation meta-schema](#meta-schema) includes this -vocabulary in its `$vocabulary` keyword with a value of `false`, since by default -implementations are not required to support this keyword as an assertion. -Supporting the format vocabulary with a value of `true` is understood to greatly -increase code size and in some cases execution time, and will not be appropriate -for all implementations. +identified by a format instance. #### Custom format attributes Implementations MAY support custom format attributes. Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support such -custom format attributes. An implementation MUST NOT fail to collect unknown -formats as annotations. When the Format-Assertion vocabulary is specified, -implementations MUST fail upon encountering unknown formats. +custom format attributes. -Vocabularies do not support specifically declaring different value sets for -keywords. Due to this limitation, and the historically uneven implementation of -this keyword, it is RECOMMENDED to define additional keywords in a custom -vocabulary rather than additional format attributes if interoperability is -desired. +An implementation MUST NOT fail to collect unknown formats as annotations. + +When configured for assertion behavior for `format`, implementations MUST fail +upon encountering unknown formats. ### Defined Formats @@ -560,7 +491,7 @@ Implementations that validate formats MUST accept at least the subset of ECMA-262 defined in {{regexinterop}}), and SHOULD accept all valid ECMA-262 expressions. -## A Vocabulary for the Contents of String-Encoded Data {#content} +## Keywords for the Contents of String-Encoded Data {#content} ### Foreword @@ -573,15 +504,6 @@ encoded, and/or how it may be validated. 
They do not function as validation assertions; a malformed string-encoded document MUST NOT cause the containing instance to be considered invalid. -Meta-schemas that do not use `$vocabulary` SHOULD be considered to require this -vocabulary as if its IRI were present with a value of `true`. - -The current IRI for this vocabulary, known as the Content vocabulary, is: -`https://json-schema.org/draft/next/vocab/content`. - -The current IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/content`. - ### Implementation Requirements Due to security and performance concerns, as well as the open-ended nature of @@ -710,20 +632,12 @@ structures: first the header, and then the payload. Since the JWT media type ensures that the JWT can be represented in a JSON string, there is no need for further encoding or decoding. -## A Vocabulary for Basic Meta-Data Annotations These general-purpose annotation -keywords provide commonly used information for documentation and user interface -display purposes. They are not intended to form a comprehensive set of features. -Rather, additional vocabularies can be defined for more complex annotation-based -applications. - -Meta-schemas that do not use `$vocabulary` SHOULD be considered to require this -vocabulary as if its IRI were present with a value of `true`. +## Keywords for Basic Meta-Data Annotations -The current IRI for this vocabulary, known as the Meta-Data vocabulary, is: -`https://json-schema.org/draft/next/vocab/meta-data`. - -The current IRI for the corresponding meta-schema is: -`https://json-schema.org/draft/next/meta/meta-data`. +These general-purpose annotation keywords provide commonly used information for +documentation and user interface display purposes. They are not intended to form +a comprehensive set of features. Rather, additional keywords can be defined +for more complex annotation-based applications. ### `title` and `description` @@ -816,10 +730,10 @@ example. 
If `examples` is absent, `default` MAY still be used in this manner. ## Security Considerations {#security} -JSON Schema validation defines a vocabulary for JSON Schema core and concerns -all the security considerations listed there. +JSON Schema Validation assumes all the security considerations listed in the +JSON Schema Core specification. -JSON Schema validation allows the use of Regular Expressions, which have +JSON Schema Validation allows the use of Regular Expressions, which have numerous different (often incompatible) implementations. Some implementations allow the embedding of arbitrary code, which is outside the scope of JSON Schema and MUST NOT be permitted. Regular expressions can often also be crafted to be @@ -970,40 +884,6 @@ draft-bhutton-json-schema-01, June 2022, Hoehrmann, B., "Scripting Media Types", RFC 4329, DOI 10.17487/RFC4329, April 2006, <>. -## [Appendix] Keywords Moved from Validation to Core - -Several keywords have been moved from this document into the [Core -Specification](#json-schema) starting with draft 2019-09, in some cases with -re-naming or other changes. This affects the following former validation -keywords: - -- *`definitions`* Renamed to `$defs` to match `$ref` and be shorter to type. - Schema vocabulary authors SHOULD NOT define a `definitions` keyword with - different behavior in order to avoid invalidating schemas that still use the - older name. While `definitions` is absent in the single-vocabulary - meta-schemas referenced by this document, it remains present in the default - meta-schema, and implementations SHOULD assume that `$defs` and `definitions` - have the same behavior when that meta-schema is used. -- *`allOf`, `anyOf`, `oneOf`, `not`, `if`, `then`, `else`, `items`, - `additionalItems`, `contains`, `propertyNames`, `properties`, - `patternProperties`, `additionalProperties`* All of these keywords apply - subschemas to the instance and combine their results, without asserting any - conditions of their own. 
Without assertion keywords, these applicators can - only cause assertion failures by using the `false` boolean schema, or by - inverting the result of the `true` boolean schema (or equivalent schema - objects). For this reason, they are better defined as a generic mechanism on - which validation, hyper-schema, and extension vocabularies can all be based. -- *`maxContains`, `minContains`* These keywords modify the behavior of - `contains`, and are therefore grouped with it in the applicator vocabulary. -- *`dependencies`* This keyword had two different modes of behavior, which made - it relatively challenging to implement and reason about. The schema form has - been moved to Core and renamed to `dependentSchemas`, as part of the - applicator vocabulary. It is analogous to `properties`, except that instead of - applying its subschema to the property value, it applies it to the object - containing the property. The property name array form is retained here and - renamed to `dependentRequired`, as it is an assertion which is a shortcut for - the conditional use of the `required` assertion keyword. 
- ## [Appendix] Acknowledgments Thanks to Gary Court, Francis Galiegue, Kris Zyp, Geraint Luff, and Henry From 3dc1967787b0e4b3aedddf0aad90f38683e0e4c2 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Mon, 10 Jun 2024 13:08:15 +1200 Subject: [PATCH 03/12] start on proposal (with adr) --- proposals/vocabularies-adr.md | 74 +++++++++++++++++++++++++++ proposals/vocabularies.md | 95 +++++++++++++++++++++++++++++++++++ 2 files changed, 169 insertions(+) create mode 100644 proposals/vocabularies-adr.md create mode 100644 proposals/vocabularies.md diff --git a/proposals/vocabularies-adr.md b/proposals/vocabularies-adr.md new file mode 100644 index 00000000..5eab80d7 --- /dev/null +++ b/proposals/vocabularies-adr.md @@ -0,0 +1,74 @@ +# [short title of solved problem and solution] + +* Status: proposed +* Deciders: @gregsdennis, @jdesrosiers +* Date: 2024-06-10 + +Technical Story: + +- Issues discussing feature - https://github.com/json-schema-org/json-schema-spec/issues?q=is%3Aopen+is%3Aissue+label%3Avocabulary +- ADR to extract from the spec and use feature life cycle - https://github.com/json-schema-org/json-schema-spec/pull/1510 + +## Context and Problem Statement + +The current approach to extending JSON Schema by providing custom keywords is +very implementation-specific and therefore not interoperable. + +To address this deficiency, this document proposes vocabularies as a concept +and a new Core keyword, `$vocabulary` to support it. + +## Decision Drivers + +- Language-agnostic +- Ease of use +- Ease of implementation + +## Considered Options + +### Current design as included in 2019-09 and 2020-12. + +Vocabularies are external documents that describe how new keywords function. +They can be in a specification style, or a blog post, or some other format. + +An implementation declares support for a particular vocabulary via +implementation of its keywords and documentation. + +`$vocabulary` keyword is an object with URI keys and boolean values. 
The URIs +identify each vocab, and the values indicate whether the implementation must +"understand" that vocab in order to process the schema. This keyword is only +processed when it is found as part of a meta-schema. + +* Good because it provides a language-agnostic mechanism that's built into JSON + Schema itself +* Bad because unknown keywords are now unsupported, which implies that + [unknown vocabularies are implicitly unsupported](https://github.com/orgs/json-schema-org/discussions/342) + +### [option 2] + +[example | description | pointer to more information | …] + +* Good, because [argument a] +* Good, because [argument b] +* Bad, because [argument c] +* … + +### [option 3] + +[example | description | pointer to more information | …] + +* Good, because [argument a] +* Good, because [argument b] +* Bad, because [argument c] +* … + +## Decision Outcome + +_TBD_ + +### Positive Consequences + +_TBD_ + +### Negative Consequences + +_TBD_ diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md new file mode 100644 index 00000000..758cb250 --- /dev/null +++ b/proposals/vocabularies.md @@ -0,0 +1,95 @@ +# JSON Schema Proposal: + +## Abstract + +The current approach to extending JSON Schema by providing custom keywords is +very implementation-specific and therefore not interoperable. + +To address this deficiency, this document proposes vocabularies as a concept +and a new Core keyword, `$vocabulary` to support it. + +While the Core specification will define and describe vocabularies in general, +the Validation specification will also need to change to incorporate some of +these ideas. This proposal will be updated as necessary to reflect the changes +in both documents. + +## Current Status + +This proposal was originally integrated into both specifications, starting with +the 2019-09 release, and has been extracted as the feature is incomplete. 
The +feature, at best effort, was extracted in such a way as to retain the +functionality present in the 2020-12 release. + +Trying to fit the 2020-12 version into the current specification, however, +raises some problems, and further discussion around the design of +this concept is needed. + +## Note to Readers + +The issues list for this proposal can be found at +. + +For additional information, see . + +To provide feedback, use this issue tracker or any of the communication methods +listed on the homepage. + +## Table of Contents + +## Conventions and Terminology + +All conventions and terms used and defined by the JSON Schema Core specification +also apply to this document. + +## Overview + +### Problem Statement + +The specification allows implementations to support user-defined keywords. However, this vague and open allowance has drawbacks. + +1. This isn't a requirement, it is a permission. An implementation could just as easily choose _not_ to support user-defined keywords. +2. There is no prescribed mechanism by which an implementation should provide this support. As a result, each implementation that _does_ have the feature supports it in different ways. +3. Support for any given user-defined keyword will be limited to that implementation. There is no guarantee that the keyword will be supported in another implementation, and unless the user explicitly configures the other implementation, their keywords likely will not be supported. + +This exposes a need for the specification(s) to define a way for implementations to share knowledge of a keyword or group of keywords. 
+ +### Solution + + + +### Limitations + + + +## Change Details + + + +## [Appendix] Change Log + +* [MMMM YYYY] Created + +## [Appendix] Champions + +| Champion | Company | Email | URI | +|----------------------------|---------|-------------------------|----------------------------------| +| Your Name | | | < GitHub profile page > | From 493d9bf13f788dd2a3eb7ab16c489f356d476a7e Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Mon, 10 Jun 2024 13:10:15 +1200 Subject: [PATCH 04/12] tweaks to problem statement --- proposals/vocabularies.md | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md index 758cb250..f8086cfb 100644 --- a/proposals/vocabularies.md +++ b/proposals/vocabularies.md @@ -45,13 +45,20 @@ also apply to this document. ### Problem Statement -The specification allows implementations to support user-defined keywords. However, this vague and open allowance has drawbacks. - -1. This isn't a requirement, it is a permission. An implementation could just as easily choose _not_ to support user-defined keywords. -2. There is no prescribed mechanism by which an implementation should provide this support. As a result, each implementation that _does_ have the feature supports it in different ways. -3. Support for any given user-defined keyword will be limited to that implementation. There is no guarantee that the keyword will be supported in another implementation, and unless the user explicitly configures the other implementation, their keywords likely will not be supported. - -This exposes a need for the specification(s) to define a way for implementations to share knowledge of a keyword or group of keywords. +The specification allows implementations to support user-defined keywords. +However, this vague and open allowance has drawbacks. + +1. This isn't a requirement, it is a permission. 
An implementation could just as + easily (_more_ easily) choose _not_ to support user-defined keywords. +2. There is no prescribed mechanism by which an implementation should provide + this support. As a result, each implementation that _does_ have the feature + supports it in different ways. +3. Support for any given user-defined keyword will be limited to that + implementation. Unless the user explicitly configures another + implementation, their keywords likely will not be supported. + +This exposes a need for the specification(s) to define a way for implementations +to share knowledge of a keyword or group of keywords. ### Solution From 8dabd03f36cefc7f1e1d165708bdfa0bf198c124 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Tue, 11 Jun 2024 13:16:20 +1200 Subject: [PATCH 05/12] add vocabulary proposal doc --- proposals/vocabularies.md | 199 ++++++++++++++++++++++++++++++++++---- 1 file changed, 182 insertions(+), 17 deletions(-) diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md index f8086cfb..ab0e7ecd 100644 --- a/proposals/vocabularies.md +++ b/proposals/vocabularies.md @@ -6,7 +6,7 @@ The current approach to extending JSON Schema by providing custom keywords is very implementation-specific and therefore not interoperable. To address this deficiency, this document proposes vocabularies as a concept -and a new Core keyword, `$vocabulary` to support it. +and a new Core keyword, `$vocabulary`, to support it. While the Core specification will define and describe vocabularies in general, the Validation specification will also need to change to incorporate some of @@ -16,9 +16,9 @@ in both documents. ## Current Status This proposal was originally integrated into both specifications, starting with -the 2019-09 release, and has been extracted as the feature is incomplete. The -feature, at best effort, was extracted in such a way as to retain the -functionality present in the 2020-12 release. +the 2019-09 release. 
For the upcoming stable release, the feature has been +extracted as it is incomplete. The feature, at best effort, was extracted in +such a way as to retain the functionality present in the 2020-12 release. Trying to fit the 2020-12 version into the current specification, however, raises some problems, and further discussion around the design of @@ -45,28 +45,191 @@ also apply to this document. ### Problem Statement -The specification allows implementations to support user-defined keywords. -However, this vague and open allowance has drawbacks. +To support extensibility, the specification allows implementations to support +keywords that are not defined in the specifications themselves. However, this +vague and open allowance has drawbacks. -1. This isn't a requirement, it is a permission. An implementation could just as - easily (_more_ easily) choose _not_ to support user-defined keywords. +1. Such support is not a requirement; it is a permission. An implementation + could just as easily (_more_ easily) choose _not_ to support extension + keywords. 2. There is no prescribed mechanism by which an implementation should provide this support. As a result, each implementation that _does_ have the feature supports it in different ways. -3. Support for any given user-defined keyword will be limited to that - implementation. Unless the user explicitly configures another - implementation, their keywords likely will not be supported. +3. Support for any given user-defined keyword will be limited to the + implementations which are explicitly configured for that keyword. For a user + defining their own keyword, this becomes difficult and/or impossible + depending on the varying support for extension keywords offered by the + implementations the user is using. -This exposes a need for the specification(s) to define a way for implementations -to share knowledge of a keyword or group of keywords. 
+This exposes a need for an implementation-agnostic approach to +externally-defined keywords as well as a way for implementations to declare +support for them. ### Solution - +Two new concepts, vocabularies and dialects, will be introduced into the Core +specification. + +A vocabulary is identified by an absolute URI and is used to define a set of +keywords. A vocabulary is generally defined in a human-readable _vocabulary +description document_. (The URI for the vocabulary may be the same as the URL of +where this vocabulary description document can be found, but no recommendation +is made either for or against this practice.) + +A new keyword, `$vocabulary`, will be introduced into the Core specification as +well. This keyword's value is an object with vocabulary URIs as keys and +booleans as values. This keyword only has meaning within a meta-schema. A +meta-schema which includes a vocabulary's URI in its `$vocabulary` keyword is +said to "include" that vocabulary. + +```jsonc +{ + "$schema": "https://example.org/draft/next/schema", + "$id": "https://example.org/schema", + "$vocabulary": { + "https://example.org/vocab/vocab1": true, + "https://example.org/vocab/vocab2": true, + "https://example.org/vocab/vocab3": false + }, + // ... +} +``` + +A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral +and carries no identifier. + +_**NOTE** It is possible for two meta-schemas, which would have different `$id` +values, to share a common dialect if they both declare the same set of +vocabularies._ + +A schema that declares a meta-schema (via `$schema`) which contains +`$vocabulary` is declaring that only those keywords defined by the included +vocabularies are to be processed when evaluating the schema. All other keywords +are to be considered "unknown" and handled accordingly. + +The boolean values in `$vocabulary` signify implementation requirements for each +vocabulary. 
+ +- A `true` value indicates that the implementation must recognize the vocabulary + and be able to process each of the keywords defined it. If an implementation + does not recognize the vocabulary or cannot process all of its defined + keywords, the implementation must refuse to process the schema. These + vocabularies are also known as "required" vocabularies. +- A `false` value indicates that the implementation is not required to recognize + the vocabulary or its keywords and may continue processing the schema anyway. + However, keywords that are not recognized or supported must be considered + "unknown" and handled accordingly. These vocabularies are also known as + "optional" vocabularies. + +Typically, though it is not required, a schema will accompany the vocabulary description +document. This _vocabulary schema_ should carry an `$id` value which is distinct +from the vocabulary URI. The purpose of the vocabulary schema is to provide +syntactic validation for the vocabulary's keywords' values when the +schema is being validated by a meta-schema that includes the vocabulary. (A +vocabulary schema is not itself a meta-schema since it does not validate entire +schemas.) To facilitate this extra validation, when a vocabulary schema is +provided, any meta-schema which includes the vocabulary should also contain a +reference (via `$ref`) to the vocabulary schema's `$id` value. + +```jsonc +{ + "$schema": "https://example.org/draft/next/schema", + "$id": "https://example.org/schema", + "$vocabulary": { + "https://example.org/vocab/vocab1": true, + "https://example.org/vocab/vocab2": true, + "https://example.org/vocab/vocab3": false + }, + "allOf": [ + {"$ref": "meta/vocab1"}, // https://example.org/meta/vocab1 + {"$ref": "meta/vocab2"}, // https://example.org/meta/vocab2 + {"$ref": "meta/vocab3"} // https://example.org/meta/vocab3 + ] + // ... 
+} +``` + +Finally, the keywords in both the Core and Validation specifications will be +divided into multiple vocabularies. The keyword definitions will be removed from +the meta-schema and added to vocabulary schemas to which the meta-schema will +contain references. In this way, the meta-schema's functionality remains the same. + +```json +{ + "$schema": "https://json-schema.org/draft/next/schema", + "$id": "https://json-schema.org/draft/next/schema", + "$vocabulary": { + "https://json-schema.org/draft/next/vocab/core": true, + "https://json-schema.org/draft/next/vocab/applicator": true, + "https://json-schema.org/draft/next/vocab/unevaluated": true, + "https://json-schema.org/draft/next/vocab/validation": true, + "https://json-schema.org/draft/next/vocab/meta-data": true, + "https://json-schema.org/draft/next/vocab/format-annotation": true, + "https://json-schema.org/draft/next/vocab/content": true + }, + "$dynamicAnchor": "meta", + + "title": "Core and Validation specifications meta-schema", + "allOf": [ + {"$ref": "meta/core"}, + {"$ref": "meta/applicator"}, + {"$ref": "meta/unevaluated"}, + {"$ref": "meta/validation"}, + {"$ref": "meta/meta-data"}, + {"$ref": "meta/format-annotation"}, + {"$ref": "meta/content"} + ] +} +``` + +The division of keywords among the vocabularies will be in accordance with the +2020-12 specification (for now). ### Limitations - +#### Unknown Keywords and Unsupported Vocabularies + +This proposal, in its current state, seeks to mimic the behavior defined in the +2020-12 specification. However, the current specification's disallowance of +unknown keywords presents a problem for schemas that use keywords from optional +vocabularies. (This is the topic of the discussion at +https://github.com/orgs/json-schema-org/discussions/342.) + +In short, if a schema uses a keyword from an unknown _optional_ vocabulary, the +implementation cannot proceed because unknown keywords are explicitly +disallowed. 
However, not being able to proceed with evaluation is the behavior +prescribed for _required_ vocabularies. Thus, if the behaviors for required and +optional vocabularies is the same, then the boolean value is moot, which +highlights that the structure of `$vocabulary` needs to be reconsidered. + +#### Machine Readability + +The vocabulary URI is an opaque value. There is no data that an implementation +can reference to identify the keywords defined by the vocabulary. The vocabulary +schema _implies_ this, but scanning a `properties` keyword isn't very reliable. +Moreover, such a system cannot provide metadata about the keywords. As such, the +user must explicitly ensure that the implementation recognizes and supports the +vocabulary, which isn't much of an improvement over the current state. + +Having some sort of "vocabulary definition" file could alleviate this. + +One reason for _not_ having such a file is that, at least for functional +keywords, the user generally needs to provide custom code to the implementation +to process the keywords, thus performing that same explicit configuration +anyway. (Such information cannot be gleaned from a vocabulary specification. For +example, an implementation can't know what to do with a hypothetical `minDate` +keyword.) + +#### Implicit Inclusion of Core Vocabulary + +Because the Core keywords (the ones that start with `$`) instruct an +implementation on how a schema should be processed, its inclusion is mandatory +and assumed. As such, while excluding the Core Vocabulary from the `$vocabulary` +keyword has no effect, it is generally advised as common practice to include the +Core Vocabulary explicitly. + +This can be confusing and difficult to use/implement, and we probably need +something better here. ## Change Details @@ -91,12 +254,14 @@ For example ``` --> +_**NOTE** Since the design of vocabularies will be changing anyway, it's not worth the time and effort to fill in this section just yet. 
As such, please read the above sections for loose requirements. For tighter requirements, please assume conformance with the 2020-12 Core and Validation specifications._ + ## [Appendix] Change Log -* [MMMM YYYY] Created +* 2024-06-10 - Created ## [Appendix] Champions | Champion | Company | Email | URI | |----------------------------|---------|-------------------------|----------------------------------| -| Your Name | | | < GitHub profile page > | +| Greg Dennis | | gregsdennis@yahoo.com | https://github.com/gregsennis | From 25faf6c25087261379d2dee9d30ec843a0722b43 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Thu, 20 Jun 2024 09:23:13 +1200 Subject: [PATCH 06/12] Update jsonschema-core.md Co-authored-by: Jason Desrosiers --- jsonschema-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index b817d9dc..18fb2b2e 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -226,7 +226,7 @@ never produce annotation results. These boolean schemas exist to clarify schema author intent and facilitate schema processing optimizations. They behave identically to the following schema -objects (where `not` is defined in [later this document](#not)). +objects (where `not` is defined [later in this document](#not)). - `true`: Always passes validation, as if the empty schema `{}` - `false`: Always fails validation, as if the schema `{ "not": {} }` From f83f28787b076b891080c15b925fbd8c5058e3f7 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Fri, 21 Jun 2024 13:58:21 +1200 Subject: [PATCH 07/12] Update jsonschema-core.md Co-authored-by: Jason Desrosiers --- jsonschema-core.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/jsonschema-core.md b/jsonschema-core.md index 18fb2b2e..d3c7963b 100644 --- a/jsonschema-core.md +++ b/jsonschema-core.md @@ -821,8 +821,8 @@ processing JSON Schema. 
These keywords inform implementations how to process any schema or meta-schema, including those split across multiple documents, or exist to reserve keywords for purposes that require guaranteed interoperability. -Support for these keywords MUST be considered mandatory at all times in order to -bootstrap the processing of further keywords. +Support for these keywords MUST be considered mandatory at all times as they are +necessary to navigate and process any schema. The "$" prefix is reserved for use by this specification. Extensions MUST NOT define new keywords that begin with "$". From 61ae0cdd833090fd9b48e7d5995ef035384aab74 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Fri, 21 Jun 2024 17:46:45 +1200 Subject: [PATCH 08/12] addressing PR comments --- proposals/vocabularies-adr.md | 16 ++++++++++------ proposals/vocabularies.md | 14 ++++---------- 2 files changed, 14 insertions(+), 16 deletions(-) diff --git a/proposals/vocabularies-adr.md b/proposals/vocabularies-adr.md index 5eab80d7..7e8a7ac8 100644 --- a/proposals/vocabularies-adr.md +++ b/proposals/vocabularies-adr.md @@ -27,19 +27,23 @@ and a new Core keyword, `$vocabulary` to support it. ### Current design as included in 2019-09 and 2020-12. -Vocabularies are external documents that describe how new keywords function. -They can be in a specification style, or a blog post, or some other format. +Vocabularies are collections of keywords and are defined by vocabulary document. +For the 2019-09 and 2020-12 vocabularies, the documents are integrated into the +specifications themselves. -An implementation declares support for a particular vocabulary via -implementation of its keywords and documentation. +With vocabularies as the primary method for defining individual keywords, +dialects can be created by combining different vocabularies. + +Users must confirm with an implementation's documentation whether a given +vocabulary is supported. `$vocabulary` keyword is an object with URI keys and boolean values. 
The URIs identify each vocab, and the values indicate whether the implementation must "understand" that vocab in order to process the schema. This keyword is only processed when it is found as part of a meta-schema. -* Good because it provides a language-agnostic mechanism that's built into JSON - Schema itself +* Good because it provides a language-agnostic method of defining extension + keywords that's built into JSON Schema itself * Bad because unknown keywords are now unsupported, which implies that [unknown vocabularies are implicitly unsupported](https://github.com/orgs/json-schema-org/discussions/342) diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md index ab0e7ecd..bc79d0d7 100644 --- a/proposals/vocabularies.md +++ b/proposals/vocabularies.md @@ -67,7 +67,7 @@ support for them. ### Solution -Two new concepts, vocabularies and dialects, will be introduced into the Core +This proposal introduces vocabularies as a new concept to be added to the Core specification. A vocabulary is identified by an absolute URI and is used to define a set of @@ -95,8 +95,9 @@ said to "include" that vocabulary. } ``` -A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral -and carries no identifier. +Whereas in the current specification, a dialect is merely the set of keywords +used by a schema, with this proposal a dialect is defined by the set of +vocabularies listed by a meta-schema. It is ephemeral and carries no identifier. _**NOTE** It is possible for two meta-schemas, which would have different `$id` values, to share a common dialect if they both declare the same set of @@ -195,13 +196,6 @@ unknown keywords presents a problem for schemas that use keywords from optional vocabularies. (This is the topic of the discussion at https://github.com/orgs/json-schema-org/discussions/342.) 
-In short, if a schema uses a keyword from an unknown _optional_ vocabulary, the -implementation cannot proceed because unknown keywords are explicitly -disallowed. However, not being able to proceed with evaluation is the behavior -prescribed for _required_ vocabularies. Thus, if the behaviors for required and -optional vocabularies is the same, then the boolean value is moot, which -highlights that the structure of `$vocabulary` needs to be reconsidered. - #### Machine Readability The vocabulary URI is an opaque value. There is no data that an implementation From 59ea62b39f32e86a5fbd58316d1f6e9308781c02 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Fri, 21 Jun 2024 17:52:59 +1200 Subject: [PATCH 09/12] forgot to save a file before committing --- proposals/vocabularies.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md index bc79d0d7..ebef88e8 100644 --- a/proposals/vocabularies.md +++ b/proposals/vocabularies.md @@ -201,9 +201,7 @@ https://github.com/orgs/json-schema-org/discussions/342.) The vocabulary URI is an opaque value. There is no data that an implementation can reference to identify the keywords defined by the vocabulary. The vocabulary schema _implies_ this, but scanning a `properties` keyword isn't very reliable. -Moreover, such a system cannot provide metadata about the keywords. As such, the -user must explicitly ensure that the implementation recognizes and supports the -vocabulary, which isn't much of an improvement over the current state. +Moreover, such a system cannot provide metadata about the keywords. Having some sort of "vocabulary definition" file could alleviate this. @@ -214,6 +212,12 @@ anyway. (Such information cannot be gleaned from a vocabulary specification. For example, an implementation can't know what to do with a hypothetical `minDate` keyword.) 
+Several ideas have been offeree for this sort of document: + +- https://github.com/json-schema-org/json-schema-spec/issues/1523 +- https://github.com/json-schema-org/json-schema-spec/issues/1423 +- https://github.com/json-schema-org/json-schema-spec/pull/1257 + #### Implicit Inclusion of Core Vocabulary Because the Core keywords (the ones that start with `$`) instruct an From d23ec0517ab3da6515feba5a589f20227d7bcf7f Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Wed, 17 Jul 2024 12:20:52 +1200 Subject: [PATCH 10/12] updated per PR comments --- proposals/vocabularies-adr.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/vocabularies-adr.md b/proposals/vocabularies-adr.md index 7e8a7ac8..d0c475d9 100644 --- a/proposals/vocabularies-adr.md +++ b/proposals/vocabularies-adr.md @@ -27,7 +27,7 @@ and a new Core keyword, `$vocabulary` to support it. ### Current design as included in 2019-09 and 2020-12. -Vocabularies are collections of keywords and are defined by vocabulary document. +A vocabulary is a collection of keywords and is defined by a vocabulary document. For the 2019-09 and 2020-12 vocabularies, the documents are integrated into the specifications themselves. From 8440ee4b968a10c4903a3dfce78765d82ad7ad69 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Thu, 18 Jul 2024 10:10:45 +1200 Subject: [PATCH 11/12] fix typo --- proposals/vocabularies.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md index ebef88e8..d45bedac 100644 --- a/proposals/vocabularies.md +++ b/proposals/vocabularies.md @@ -212,7 +212,7 @@ anyway. (Such information cannot be gleaned from a vocabulary specification. For example, an implementation can't know what to do with a hypothetical `minDate` keyword.) 
-Several ideas have been offeree for this sort of document: +Several ideas have been offered for this sort of document: - https://github.com/json-schema-org/json-schema-spec/issues/1523 - https://github.com/json-schema-org/json-schema-spec/issues/1423 From 009cabf146cf581d5a7e8437d2c28651907fb864 Mon Sep 17 00:00:00 2001 From: Greg Dennis Date: Sat, 17 Aug 2024 11:57:57 +1200 Subject: [PATCH 12/12] Update proposals/vocabularies.md Co-authored-by: Jason Desrosiers --- proposals/vocabularies.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/vocabularies.md b/proposals/vocabularies.md index d45bedac..619688bd 100644 --- a/proposals/vocabularies.md +++ b/proposals/vocabularies.md @@ -112,7 +112,7 @@ The boolean values in `$vocabulary` signify implementation requirements for each vocabulary. - A `true` value indicates that the implementation must recognize the vocabulary - and be able to process each of the keywords defined it. If an implementation + and be able to process each of the keywords defined in it. If an implementation does not recognize the vocabulary or cannot process all of its defined keywords, the implementation must refuse to process the schema. These vocabularies are also known as "required" vocabularies.
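
The required/optional semantics that the proposal attaches to the boolean values in `$vocabulary` could be sketched roughly as follows. This is a minimal illustration, not part of the patch series or of any real implementation: the function `resolve_vocabularies`, the exception `UnsupportedVocabularyError`, and the `supported` set are hypothetical names introduced only for this example.

```python
class UnsupportedVocabularyError(Exception):
    """Hypothetical error: a required vocabulary is not recognized."""


def resolve_vocabularies(meta_schema, supported):
    """Split the vocabularies declared by a meta-schema into two sets.

    `supported` is the set of vocabulary URIs that this hypothetical
    implementation recognizes and can fully process.
    Returns (active, ignored) sets of vocabulary URIs.
    """
    declared = meta_schema.get("$vocabulary", {})
    active, ignored = set(), set()
    for uri, required in declared.items():
        if uri in supported:
            active.add(uri)
        elif required:
            # `true`: an unrecognized vocabulary means the implementation
            # must refuse to process the schema.
            raise UnsupportedVocabularyError(uri)
        else:
            # `false`: processing may continue; the vocabulary's keywords
            # are then treated as "unknown".
            ignored.add(uri)
    return active, ignored


meta = {
    "$vocabulary": {
        "https://example.org/vocab/vocab1": True,
        "https://example.org/vocab/vocab3": False,
    }
}
active, ignored = resolve_vocabularies(meta, {"https://example.org/vocab/vocab1"})
```

In this sketch an optional vocabulary is tolerated rather than rejected. How its keywords should then be handled once they count as "unknown" is the open question the proposal raises under Limitations.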