From 1c35b16d5c849b39c38d4d7b74e2b93b91a3c493 Mon Sep 17 00:00:00 2001 From: Mitchell Shiell Date: Thu, 3 Jul 2025 14:56:12 -0400 Subject: [PATCH 1/5] overview.md --- .../01-lectern.md => 01-overview.md} | 28 +- docs/02-Setup.md | 240 ++++++++++++++++++ .../00-dictionaryReference.md} | 147 ++++++----- .../01-glossaryOfTerms.md} | 9 +- docs/assets/submission-system.svg | 4 + docs/overview/images/submission-system.svg | 4 - .../migration/lectern2changes.md | 1 - .../validation/field-validation.md | 0 {docs => pendingDocs}/validation/index.md | 0 .../validation/record-validation.md | 0 10 files changed, 346 insertions(+), 87 deletions(-) rename docs/{overview/01-lectern.md => 01-overview.md} (69%) create mode 100644 docs/02-Setup.md rename docs/{dictionary-reference.md => Reference/00-dictionaryReference.md} (95%) rename docs/{important-concepts.md => Reference/01-glossaryOfTerms.md} (94%) create mode 100644 docs/assets/submission-system.svg delete mode 100644 docs/overview/images/submission-system.svg rename docs/lectern-2.0-changes.md => pendingDocs/migration/lectern2changes.md (99%) rename {docs => pendingDocs}/validation/field-validation.md (100%) rename {docs => pendingDocs}/validation/index.md (100%) rename {docs => pendingDocs}/validation/record-validation.md (100%) diff --git a/docs/overview/01-lectern.md b/docs/01-overview.md similarity index 69% rename from docs/overview/01-lectern.md rename to docs/01-overview.md index b9f7af79..22d84add 100644 --- a/docs/overview/01-lectern.md +++ b/docs/01-overview.md @@ -1,22 +1,20 @@ -# Lectern - -Lectern is Overture's Data Dictionary Schema Manager, designed to validate, store, and manage collections of data dictionaries. These dictionaries define schemas that specify expected data structure and syntax for tabular (TSV) data submissions. 
With built-in version control capabilities, Lectern can track schema evolution and compute differences between versions, while integrating with the [Lyric](https://docs.overture.bio/docs/under-development/lyric/) data submission service. +# Overview +Data dictionaries are organized collections of schemas that define the structure, constraints, and relationships of data models. Overture's Data Dictionary Manager, Lectern, is designed to manage collections of data dictionaries and can be integrated into any data platform. Lectern typically works with Overture's tabular data submission service, [Lyric](https://docs.overture.bio/docs/under-development/lyric/), to ensure data quality and consistency throughout the submission workflow. ## Key Features -- **Schema Definition:** Define comprehensive schemas specifying structure, constraints, and relationships of data elements. -- **Dictionary Management:** Maintain collections of schemas (data dictionaries) with multiple versions. -- **Version Control:** Track changes and evolution of data structures over time. +- **Schema Definition:** Define schemas that specify the structure, constraints, and relationships of data elements. +- **Version Control:** Track changes to data structures over time. - **Difference Computation:** Compare versions to understand changes in data requirements. -- **Schema Validation:** Validate the structure and syntax of data dictionary schema against Lecterns base meta-schema. +- **Schema Validation:** Validate the structure and syntax of data dictionary schemas against the Lectern base meta-schema. - **Integration:** includes a RESTful API (Swagger) for integration with larger data management systems. ## System Architecture -Lectern operates as a central Dictionary Schema repository within the Overture ecosystem, providing dictionary management and validation services through its RESTful API.
The service maintains schemas in a database, tracking versions and relationships between different schema elements. Lectern's schemas are primarily consumed by Lyric, which stores and uses them to validate incoming tabular data submissions. Through this integration, Lectern plays a crucial role in ensuring data quality and consistency in the Overture submission workflow. +Lectern operates as a central Dictionary Schema repository, providing dictionary management and validation services through its RESTful API. The service maintains schemas in a database (MongoDB), tracking versions and relationships between different schema elements. In the Overture platform, Lectern's schemas are consumed by [Lyric](https://docs.overture.bio/docs/under-development/lyric/), which stores and uses them to validate incoming tabular data submissions. -![Submission System Architecture](./images/submission-system.svg 'Updated Overture Submission System') +![Submission System Architecture](./assets/submission-system.svg "Updated Overture Submission System") ## Repository Structure @@ -25,7 +23,7 @@ The repository is organized with the following directory structure: ``` . ├── apps/ -│ └── server +│ └── server └── packages/ │ ├── client | ├── common @@ -33,14 +31,14 @@ The repository is organized with the following directory structure: | └── validation └── scripts/ ``` -[Click here to view the Lectern repository on GitHub](https://github.com/overture-stack/lectern) +[Click here to view the Lectern repository on GitHub](https://github.com/overture-stack/lectern) The modules in the monorepo are organized into three categories: - - `apps/`: Standalone processes meant to be run. These are published to [ghcr.io](https://ghcr.io) as container images. - - `packages/`: Reusable packages shared between applications and other packages. Packages are published to [NPM](https://npmjs.com). - - `scripts`: Utility scripts for use within this repo. +- `apps/`: Standalone processes meant to be run.
These are published to [ghcr.io](https://ghcr.io) as container images. +- `packages/`: Reusable packages shared between applications and other packages. Packages are published to [NPM](https://npmjs.com). +- `scripts`: Utility scripts for use within this repo. #### Lectern Components @@ -51,4 +49,4 @@ Each component serves a specific purpose within Lectern, providing functionality | [Lectern Server](https://github.com/overture-stack/lectern/blob/develop/apps/server/README.md) | @overture-stack/lectern-server | apps/server/ | [![Lectern GHCR Packages](https://img.shields.io/badge/GHCR-lectern-brightgreen?style=for-the-badge&logo=github)](https://github.com/overture-stack/lectern/pkgs/container/lectern) | Lectern Server web application. | | [Lectern Client](https://github.com/overture-stack/lectern/blob/develop/packages/client/README.md) | @overture-stack/lectern-client | packages/client | [![Lectern Client NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-client?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-client) | TypeScript Client to interact with Lectern Server and Lectern data dictionaries. This library provides a REST client to assist in fetching data from the Lectern server. It also exposes the functionality from the Lectern Validation library to use a Lectern data dictionary to validate data. | | [Lectern Dictionary](https://github.com/overture-stack/lectern/blob/develop/packages/dictionary/README.md) | | @overture-stack/lectern-dictionary | [![Lectern Client NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-dictionary?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-dictionary) | Dictionary meta-schema definition, includes TS types, and Zod schemas. This also exports all utilities for getting the diff of two dictionaries. 
| - | [Lectern Validation](https://github.com/overture-stack/lectern/blob/develop/packages/validation/README.md) | @overture-stack/lectern-validation | packages/validation/ | [![Lectern Validation NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-validation?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-client) | Validate data using Lectern Dictionaries. + | [Lectern Validation](https://github.com/overture-stack/lectern/blob/develop/packages/validation/README.md) | @overture-stack/lectern-validation | packages/validation/ | [![Lectern Validation NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-validation?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-validation) | Validate data using Lectern Dictionaries. | diff --git a/docs/02-Setup.md b/docs/02-Setup.md new file mode 100644 index 00000000..b75c1095 --- /dev/null +++ b/docs/02-Setup.md @@ -0,0 +1,240 @@ +# Setup + +This guide provides instructions for setting up a complete development environment for Lectern, Overture's data dictionary management service. + +## Prerequisites + +Before beginning, ensure you have the following installed on your system: + +- PNPM (package manager - used instead of npm) +- Node.js (v18 or higher) +- Docker (for running containerized services) + +## Developer Setup + +This guide will walk you through setting up a complete development environment for Lectern Server, including its supporting services. + +### Setting up supporting services + +Lectern Server requires a MongoDB database to store dictionaries and metadata. + +1.
**MongoDB Database Setup** + + Use the provided docker-compose configuration to start MongoDB: + + ```bash + # Navigate to the server directory (from the root of your cloned Lectern repository) + cd apps/server + + # Start MongoDB using docker-compose + docker-compose up -d + ``` + + Alternatively, you can start MongoDB manually: + + ```bash + docker run --name lectern-mongo \ + -e MONGO_INITDB_ROOT_USERNAME=admin \ + -e MONGO_INITDB_ROOT_PASSWORD=password \ + -p 27017:27017 \ + -d mongo:latest + ``` + +
+ **Click here for a detailed breakdown** + + These commands set up the database service for Lectern Server development as follows: + + | Service | Port | Description | Purpose in Lectern Server Development | + | --------------- | ------ | ------------------------------ | ------------------------------------ | + | MongoDB | 27017 | NoSQL database for dictionary storage | Stores data dictionaries, versions, and metadata | + + - Ensure port 27017 is free on your system before starting the database. + - The default configuration uses `admin/password` for MongoDB credentials. Make sure `MONGO_USER` and `MONGO_PASS` in your `.env` file match the credentials your MongoDB instance expects. + - You may need to adjust the port in the configuration file if you have conflicts with existing services. +
+ +In the next steps, we will run a Lectern development server against these supporting services. + +### Running the Development Server + +1. Clone Lectern and move into its directory: + + ```bash + git clone https://github.com/overture-stack/lectern.git + cd lectern + ``` + +2. Install all dependencies for the entire monorepo: + + ```bash + pnpm install + ``` + +3. Navigate to the server directory: + + ```bash + cd apps/server + ``` + +4. Configure environment variables: + + ```bash + cp .env.example .env + ``` + + :::info + + This `.env` file is preconfigured as follows for the Lectern Server environment: + + ```env + # Express Configuration + PORT=3000 + + # Swagger Docs Config + OPENAPI_PATH=/api-docs + + # Mongo Configuration + MONGO_HOST=localhost + MONGO_PORT=27017 + MONGO_DB=lectern + MONGO_USER= + MONGO_PASS= + + # Auth Configuration + AUTH_ENABLED=false + EGO_API= + SCOPE= + + # CORS allowed origins can be a comma separated list of the allowed domains. + # Leave empty to not allow any web connections (default). Use * to allow all. + CORS_ALLOWED_ORIGINS= + + # Vault Configuration + VAULT_ENABLED=false + VAULT_URL=http://localhost:8200 + VAULT_SECRETS_PATH=/kv/lectern + VAULT_TOKEN=00000000-0000-0000-0000-000000000000 + VAULT_ROLE= + ``` + +
+ **Click here for an explanation of Lectern Server environment variables** + + - **Express Configuration** + - `PORT`: Port number for the Lectern Server web application (default: 3000) + - `OPENAPI_PATH`: Path to Swagger UI with API documentation (default: /api-docs) + + - **MongoDB Configuration** + - `MONGO_HOST`: MongoDB server hostname (default: localhost) + - `MONGO_PORT`: MongoDB server port (default: 27017) + - `MONGO_DB`: Database name to use (default: lectern) + - `MONGO_USER`: Username for MongoDB connection (optional) + - `MONGO_PASS`: Password for MongoDB connection (optional) + + - **Authorization** (optional) + - `AUTH_ENABLED`: Enable/disable authorization (default: false) + - `EGO_API`: URL to the EGO API for JWT validation + - `SCOPE`: Policy name to look for in JWT scope + + - **CORS Configuration** (optional) + - `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed CORS origins. Leave empty to block all web connections (default), or use `*` to allow all origins + + - **Vault Configuration** (optional) + - `VAULT_ENABLED`: Enable/disable Vault integration (default: false) + - `VAULT_URL`: URL to Vault server + - `VAULT_SECRETS_PATH`: Path to secrets in Vault + - `VAULT_TOKEN`: Access token for Vault + - `VAULT_ROLE`: Role to use for Vault connection + +
+ +5. Build the Lectern Server and its dependencies: + + ```bash + pnpm nx build @overture-stack/lectern-server + # or from the server directory: + # pnpm build + ``` + +6. Start the Lectern Server: + + ```bash + pnpm nx start @overture-stack/lectern-server + # or for development mode with hot reloading: + # pnpm nx debug server + ``` + +### Verification + +After installation and configuration, verify that Lectern Server is functioning correctly: + +1. **Test Dictionary Management** + + - Access the API documentation at `http://localhost:3000/api-docs` + - Try creating a new data dictionary using the REST API + - Expected result: You can create, view, and manage data dictionaries + - Troubleshooting: + - Check MongoDB connection and ensure database is accessible + - Verify API endpoints are responding correctly + - Check server logs for any validation errors + +2. **Test API Endpoints** + - Health check: `curl http://localhost:3000/health` + - API documentation: Navigate to `http://localhost:3000/api-docs` + - Expected result: Health endpoint should return 200 OK, Swagger docs should load + +**Optional: Enabling Authorization** + +For production use, you should enable authorization: + +1. Set `AUTH_ENABLED=true` in your `.env` file +2. Configure `EGO_API` to point to your Ego authorization service +3. Set the appropriate `SCOPE` for your permissions + +**Optional: Vault Integration** + +If you use HashiCorp Vault for secret management: + +1. Set `VAULT_ENABLED=true` in your `.env` file +2. Configure the Vault connection parameters +3. Lectern will read MongoDB credentials from Vault instead of environment variables + +For further assistance, open an issue on [GitHub](https://github.com/overture-stack/lectern/issues). + +:::warning +This guide is meant for development purposes and is not intended for production use.
If you use this in any public or production environment, please implement appropriate security measures and configure your environment variables accordingly. +::: + +--- + +## Additional Development Commands + +### From the workspace root: +```bash +# Build Lectern Server +pnpm nx build @overture-stack/lectern-server + +# Start Lectern Server +pnpm nx start @overture-stack/lectern-server + +# Run in debug mode (with hot reloading) +pnpm nx debug server +``` + +### From the apps/server directory: +```bash +# Build +pnpm build + +# Start +pnpm start + +# Development mode +pnpm debug +``` + +### Building Docker Image: +```bash +# From workspace root +docker build --no-cache -t lectern -f apps/server/Dockerfile . +``` \ No newline at end of file diff --git a/docs/dictionary-reference.md b/docs/Reference/00-dictionaryReference.md similarity index 95% rename from docs/dictionary-reference.md rename to docs/Reference/00-dictionaryReference.md index f192cdf3..ecf745b8 100644 --- a/docs/dictionary-reference.md +++ b/docs/Reference/00-dictionaryReference.md @@ -1,4 +1,4 @@ -# Lectern Dictionary Meta-Schema Refernce +# Dictionary Meta-Schema Reference For a high level description of the component parts of a Lectern Dictionary see [Important Concepts - Dictionary Model](./important-concepts.md#dictionary-model). @@ -9,16 +9,23 @@ A Lectern Dictionary is a collection of Lectern Schemas. Each schema describes t In addition to schemas, a Lectern Dictionary can contain reference values that can be reused throughout the schema definitions to define property restrictions with shared rules.
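The TypeScript sketch below illustrates a few of the top-level rules described in this reference. A dictionary needs a `name` and at least one schema, and schema names may not contain whitespace or `.`. The sketch uses simplified, hypothetical types (`DictionarySketch`, `SchemaSketch`) and a hypothetical `checkDictionaryShape` helper. It is not the `Dictionary` type or the Zod schemas exported by `@overture-stack/lectern-dictionary`, which remain the authoritative definition.

```typescript
// Simplified, hypothetical shapes for illustration only. The authoritative
// definition is the `Dictionary` type exported by @overture-stack/lectern-dictionary.
type MetaSketch = Record<string, unknown>;

interface SchemaSketch {
  name: string; // NameString: no whitespace and no "."
  description?: string;
  meta?: MetaSketch;
  fields: unknown[];
}

interface DictionarySketch {
  name: string;
  version: string;
  description?: string;
  meta?: MetaSketch;
  schemas: SchemaSketch[];
  references?: Record<string, unknown>;
}

// Collects human-readable problems; an empty array means these checks passed.
// Only a few of the documented rules are checked here.
function checkDictionaryShape(dictionary: DictionarySketch): string[] {
  const problems: string[] = [];
  if (dictionary.name.trim() === '') {
    problems.push('dictionary name is required');
  }
  if (dictionary.schemas.length < 1) {
    problems.push('at least one schema is required');
  }
  for (const schema of dictionary.schemas) {
    // Reject any whitespace character or a literal "." in the schema name.
    if (/[\s.]/.test(schema.name)) {
      problems.push(`invalid schema name: "${schema.name}"`);
    }
  }
  return problems;
}

const example: DictionarySketch = {
  name: 'example_dictionary',
  version: '1.0.0',
  schemas: [
    { name: 'example-schema', fields: [] },
    { name: 'bad.name', fields: [] },
  ],
};

// Reports a single problem, for the schema named "bad.name".
console.log(checkDictionaryShape(example));
```

A real dictionary also has to satisfy the field, restriction, and reference rules described below. For actual validation, prefer the package's own schemas over hand-written checks like this one.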
> **Dictionary Structure Example** +> > ```json > { -> "name": "example_dictionary", -> "description": "Collection of schemas to demonstrate Lectern functionality", -> "meta": { /* Custom meta data about the dictionary here */ }, -> -> "version": "1.0", -> -> "schemas": [ /* Schemas Here */ ], -> "references": { /* Reference Variables Here */ } +> "name": "example_dictionary", +> "description": "Collection of schemas to demonstrate Lectern functionality", +> "meta": { +> /* Custom meta data about the dictionary here */ +> }, +> +> "version": "1.0", +> +> "schemas": [ +> /* Schemas Here */ +> ], +> "references": { +> /* Reference Variables Here */ +> } > } > ``` @@ -27,23 +34,28 @@ In addition to schemas, a Lectern Dictionary can contain reference values that c | `name` | `string` | Required | Display name of the dictionary | `"Example Lectern Dictionary"` | | `version` | `string`, as a semantic version number `major`.`minor`.`patch` | Required | Version of the dictionary. | `"1.23.4"` | | `schemas` | `Array<`[`LecternSchema`](#dictionary-schema-structure)`>` | Required | An array containing Lectern Schemas. Minimum of 1 Schema is required. | [Dictionary Schema Structure](#dictionary-schema-structure) | -| `description` | `string` | Optional | Free text description of the schema, for use as a reference for users of the dictionary. This description is not used by Lcetern for dictionary validation. | `"Collection of schemas to demonstrate Lectern functionality"` | +| `description` | `string` | Optional | Free text description of the dictionary, for use as a reference for users of the dictionary. This description is not used by Lectern for dictionary validation. | `"Collection of schemas to demonstrate Lectern functionality"` | | `meta` | [MetaData](#meta-data-structure) object | Optional | Schema implementor defined fields to capture any additional properties not defined in standard Lectern dictionaries.
These properties are not used by Lctern for dictionary validation | `{ "author": "Guy Incognito" }` | | `references` | [References](#references-structure) object | Optional | Reference values that can be referenced throughout the dictionary. | `{ "customRegex": { "ncitIds": "^NCIT:C\d+$" } }` | ### Dictionary Schema Structure + > **Dictionary Schema Example** +> > ```json > { -> "name": "example-schema", -> "description": "Demonstrating structure of Lectern Schema", -> "meta": { /* Custom meta data about the schema here */ }, -> -> "fields": [ /* Fields Here */ ] +> "name": "example-schema", +> "description": "Demonstrating structure of Lectern Schema", +> "meta": { +> /* Custom meta data about the schema here */ +> }, +> +> "fields": [ +> /* Fields Here */ +> ] > } > ``` - | Property | Type | Required | Default | Description | Example | | ------------- | ------------------------------------------------------- | -------- | ------- | :------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------- | | `name` | NameString (no whitespace or `.`) | Required | | Name of the schema. This will be used in paths that reference this schema, and for identifying files containing records for this schema. | `"example-schema"` | @@ -52,18 +64,22 @@ In addition to schemas, a Lectern Dictionary can contain reference values that c | `meta` | [`MetaData`](#meta-data-structure) | Optional | None | Schema implementor defined fields to capture any additional properties not defined in standard Lectern schemas. 
| [Meta Data Structure](#meta-data-structure) | ### Dictionary Field Structure + > **Example Dictionary Field Definition** +> > ```json > { -> "name": "example_field", -> "description": "Shows a string field with a required restriction", -> "meta": { /* Custom meta data abou the field here */ }, -> "isArray": false, -> -> "valueType": "string", -> "restrictions": { -> "required": true -> } +> "name": "example_field", +> "description": "Shows a string field with a required restriction", +> "meta": { +> /* Custom meta data about the field here */ +> }, +> "isArray": false, +> +> "valueType": "string", +> "restrictions": { +> "required": true +> } > } > ``` @@ -78,8 +94,6 @@ In addition to schemas, a Lectern Dictionary can contain reference values that c | `restrictions` | Optional | No Restrictions | `RestrictionsObject` or `Array` | An object containing all validation rules for this field. This can be a single object containing all [restrictions](#field-restrictions) applied to this field or a list of objects whose restrictions will be combined. [Conditional restrictions](#conditional-restrictions) can also be used to apply validation rules based on values of other fields in the record. | `{ "required": true }` | | `unique` | Optional | `false` | `boolean` | Indicates that every record in this schema should have a unique value for this field. This rule is applied when a collection of records are validated together, ensuring that no two records in that collection repeat a value. | `true` | - - #### Field Data Types | valueType | Description | Examples | @@ -87,7 +101,7 @@ In addition to schemas, a Lectern Dictionary can contain reference values that c | `boolean` | Boolean value, either `true` or `false`. Accepts values with any letter casing, for example `true`, `True`, and `TRUE` will all be interpretted as `true` | `true`, `false` | | `integer` | Numeric integer value. Will accept positive and negative values (ex. `21` or `-8`) but will reject any decimals (ex.
`1.23`) | `21`, `-8` | | `number` | Numeric value. Will accept any numeric value, including those with decimals. | `1.23`, `-4.567` | -| `string` | String fields. Value can have any length and use any character, other than the array delimiter for an array field (by default ` \| `) | `asdf`, `Hello World`, `Another longer example of a string` | +| `string` | String fields. Value can have any length and use any character, other than the array delimiter for an array field (by default `\|`) | `asdf`, `Hello World`, `Another longer example of a string` | #### Field Restrictions @@ -113,7 +127,7 @@ Restrictions can be added with conditions so that the validations are only appli A conditional restriction uses an if/then/else style syntax: -The `if` property will be an object containing an array of `conditions` that look at other fields on the same record and apply matching rules to their values. When those field values match the rules in the condition than the condition passes. An optional `case` property can be added to the `if` object that defines how many of the `conditions` have to pass in order for the whole condition block to resolve as `true` - default is `all`, requiring all conditions to be met. +The `if` property is an object containing an array of `conditions` that look at other fields on the same record and apply matching rules to their values. When those field values match the rules in the condition, the condition passes. An optional `case` property can be added to the `if` object that defines how many of the `conditions` have to pass in order for the whole condition block to resolve as `true` - default is `all`, requiring all conditions to be met. The `then` object contains the restrictions that will be applied when the `if` condition is `true`, and the `else` condition contains restrictions to apply when the `if` condition is `false`. The `then` property is required but using an `else` property is optional.
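The TypeScript sketch below makes the if/then/else evaluation concrete. It assumes records whose values have already been parsed. All names in it (`ConditionalRestrictionSketch`, `selectRestrictions`, and the helper functions) are hypothetical and not part of Lectern's packages. The sketch covers only the `exists`, `value`, and `codeList` match rules from the Conditional Match Rules table. It applies each condition's `case` across that condition's `fields`, then applies the `if` block's `case` across its `conditions`; both default to `all`. It also makes two assumptions of its own: the `if` block accepts the same `all`/`any`/`none` values as a condition's `case`, and `codeList` entries are compared exactly.

```typescript
// Hypothetical types mirroring the properties described in this reference;
// they are not the types exported by Lectern's packages.
type Primitive = string | number | boolean;
type FieldValue = Primitive | Primitive[] | undefined; // values assumed already parsed
type RecordSketch = Record<string, FieldValue>;
type MatchCase = 'all' | 'any' | 'none';

interface MatchRuleSketch {
  value?: Primitive | Primitive[];
  codeList?: Primitive[];
  exists?: boolean;
}

interface ConditionSketch {
  fields: string[];
  match: MatchRuleSketch;
  case?: MatchCase; // how many of `fields` must match; defaults to 'all'
}

interface ConditionalRestrictionSketch<R> {
  if: { conditions: ConditionSketch[]; case?: MatchCase }; // how many conditions must pass
  then: R;
  else?: R;
}

// Strings compare case-insensitively; other values use strict equality.
function sameValue(actual: FieldValue, expected: Primitive): boolean {
  if (typeof actual === 'string' && typeof expected === 'string') {
    return actual.toLowerCase() === expected.toLowerCase();
  }
  return actual === expected;
}

// Order-insensitive array comparison using case-folded string keys.
function normalize(item: Primitive): string {
  return typeof item === 'string' ? `string:${item.toLowerCase()}` : `${typeof item}:${String(item)}`;
}
function sameElements(actual: Primitive[], expected: Primitive[]): boolean {
  if (actual.length !== expected.length) return false;
  const actualKeys = actual.map(normalize).sort();
  const expectedKeys = expected.map(normalize).sort();
  return actualKeys.every((key, index) => key === expectedKeys[index]);
}

// An array only counts as having a value when it is non-empty.
function hasValue(value: FieldValue): boolean {
  if (value === undefined) return false;
  return Array.isArray(value) ? value.length > 0 : true;
}

// Only `exists`, `value`, and `codeList` are sketched; every rule present must pass.
function matchesRule(value: FieldValue, rule: MatchRuleSketch): boolean {
  if (rule.exists !== undefined && hasValue(value) !== rule.exists) return false;
  if (rule.value !== undefined) {
    const expected = rule.value;
    if (Array.isArray(expected)) {
      if (!Array.isArray(value) || !sameElements(value, expected)) return false;
    } else if (!sameValue(value, expected)) {
      return false;
    }
  }
  if (rule.codeList !== undefined) {
    // Assumption: code list entries are compared exactly, and only single values are handled.
    if (value === undefined || Array.isArray(value)) return false;
    const single = value;
    if (!rule.codeList.some((code) => code === single)) return false;
  }
  return true;
}

function resolveCase(results: boolean[], matchCase: MatchCase = 'all'): boolean {
  if (matchCase === 'any') return results.some(Boolean);
  if (matchCase === 'none') return !results.some(Boolean);
  return results.every(Boolean);
}

// Returns `then` when the `if` block passes, otherwise `else` (undefined if omitted).
function selectRestrictions<R>(rule: ConditionalRestrictionSketch<R>, record: RecordSketch): R | undefined {
  const conditionResults = rule.if.conditions.map((condition) =>
    resolveCase(
      condition.fields.map((field) => matchesRule(record[field], condition.match)),
      condition.case,
    ),
  );
  return resolveCase(conditionResults, rule.if.case) ? rule.then : rule.else;
}

const shirtRule: ConditionalRestrictionSketch<{ required: boolean }> = {
  if: { conditions: [{ fields: ['shirt_size'], match: { codeList: ['Medium', 'Large'] } }] },
  then: { required: true },
  else: { required: false },
};

console.log(selectRestrictions(shirtRule, { shirt_size: 'Large' })); // { required: true }
```

The `else` property is optional. When the `if` block fails and no `else` is given, this sketch returns `undefined`, which it treats as "no additional restrictions for this record".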
@@ -146,74 +160,81 @@ A requirement condition is defined by providing a field name or list of field na | `case` | Optional | `all` | `all`, `any`, `none` | Defines how many of the listed `fields` must have a value that matches the `match` rules. `all` requires all fields values to have matching values. `any` requires at least one field to have a matching value. `none` requires that there none of the specified fields have values that match. | `any` | > **Example Conditional Restriction**: match single value -> +> > Condition where `shirt_size` is `Small` +> > ```json > { -> "fields": ["shirt_size"], -> "match": { -> "value": "Small" -> } +> "fields": ["shirt_size"], +> "match": { +> "value": "Small" +> } > } > ``` > **Example Conditional Restriction**: match value from list -> +> > Condition where `shirt_size` is any value in a list (`Medium` or `Large`) +> > ```json > { -> "fields": ["shirt_size"], -> "match": { -> "codeList": ["Medium", "Large`"] -> } +> "fields": ["shirt_size"], +> "match": { +> "codeList": ["Medium", "Large`"] +> } > } > ``` ##### Conditional Match Rules -| Property | Used with Field Types | Type | Description | Example | -| ---------- | --------------------- | -------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | -| `codeList` | all | Array of type of specified fields | A list of values that the field could match. This rule passes when the specified field's value can be found in this list. 
| `["value_one", "value_two"]` | -| `count` | Array type fields | Integer, or [RangeRule](#rangerule-data-structure) | Matches the number of values in an array field. This condition can be provided as a number, in which case this condition matches if the array is that exact length. This condition can be provided as a Range object as well, in which case it will match if the number of elements in the array pass the minimum and maximum conditions provided in the condition. | `2` - Field must have exactly 2 elements.
`{ max: 10 }` - Field must have no more than 10 items. | -| `exists` | all | Boolean | This condition requires a field to either have a value or have no value. When the `exists` condition is set to `true`, the field must have a value. When `exists` is sdet to `false`, the field must have no value. For array fields, `exists=false` only matches when the array is completely empty, and `exists=true` passes if the array has 1 or more values - `arrayCase` has no interaction with the `exists` condition. | `true` | -| `range` | `number`, `integer` | [RangeRule](#rangerule-data-structure) | Maximum and minimum value conditions that a numeric field must pass. | `{ min: 5, exclusiveMax: 10 }` Represents an integer from 5-9. | -| `regex` | `string` | String (Regular Expression) | A regular expression pattern that the value must match. | `^NCIT:C\d+$` Value must match an NCI Thesaurus ID. | -| `value` | all | Type of specified fields | Field value matches the value of the specified field. Strings are matched case insensitive. When arrays are matched, the order of their elements is ignored - a field matches this condition if the elements in field are the same elements as in the value match rule. For example, the rule `['abc', 'def']` matches the value `['def', 'abc']` but does not match `['abc', 'def', 'ghi']`. 
| `some_value`, `[1, 2, 3]` | + +| Property | Used with Field Types | Type | Description | Example | +| ---------- | --------------------- | -------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | +| `codeList` | all | Array of type of specified fields | A list of values that the field could match. This rule passes when the specified field's value can be found in this list. | `["value_one", "value_two"]` | +| `count` | Array type fields | Integer, or [RangeRule](#rangerule-data-structure) | Matches the number of values in an array field. This condition can be provided as a number, in which case this condition matches if the array is that exact length. This condition can be provided as a Range object as well, in which case it will match if the number of elements in the array pass the minimum and maximum conditions provided in the condition. | `2` - Field must have exactly 2 elements. `{ max: 10 }` - Field must have no more than 10 items. | +| `exists` | all | Boolean | This condition requires a field to either have a value or have no value. When the `exists` condition is set to `true`, the field must have a value. When `exists` is sdet to `false`, the field must have no value. For array fields, `exists=false` only matches when the array is completely empty, and `exists=true` passes if the array has 1 or more values - `arrayCase` has no interaction with the `exists` condition. 
| `true` | +| `range` | `number`, `integer` | [RangeRule](#rangerule-data-structure) | Maximum and minimum value conditions that a numeric field must pass. | `{ min: 5, exclusiveMax: 10 }` Represents an integer from 5-9. | +| `regex` | `string` | String (Regular Expression) | A regular expression pattern that the value must match. | `^NCIT:C\d+$` Value must match an NCI Thesaurus ID. | +| `value` | all | Type of specified fields | The field's value must match this value. Strings are matched case-insensitively. When arrays are matched, the order of their elements is ignored - a field matches this condition if it contains the same elements as the value match rule. For example, the rule `['abc', 'def']` matches the value `['def', 'abc']` but does not match `['abc', 'def', 'ghi']`. | `some_value`, `[1, 2, 3]` | ### Meta Data Structure > **Meta Example** +> > ```json > { -> "displayName": "Nicely Formatted Name", -> "externalReferenceId": "ABCD:1234", -> "exampleBooleanPropery": true, -> "exampleNumericProperty": 123 +> "displayName": "Nicely Formatted Name", +> "externalReferenceId": "ABCD:1234", +> "exampleBooleanProperty": true, +> "exampleNumericProperty": 123 > } > ``` A `meta` object is available to allow the dictionary creator to add custom properties to the Lectern Dictionary. The `meta` property is available to all Dictionary, Schema, and Field objects. Providing a `meta` value is optional. If provided the `meta` value is a JSON object. There are no restrictions on the field names that can be added to the `meta` object other than they must be valid JSON. The values for properties of the `meta` can either be another nested meta object, or are one of the allowed value types: - - `string` - - `number` - - `boolean` - - `Array` - - `Array` + +- `string` +- `number` +- `boolean` +- `Array` +- `Array` ### References Structure References are defined at the dictionary level so they can be reused across schemas.
References can be used to store values that can be used in `meta` or `restrictions` #### Using References + Reference variables can be used in a `meta` object or a `restrictions` object as either a restriction value or a conditional match value. -To use a reference, replace the value in the value of the meta or restriction property with a string containing a `ReferenceTag`. A `ReferenceTags` +To use a reference, replace the value of the meta or restriction property with a string containing a `ReferenceTag`. A `ReferenceTags` ### RangeRule Data Structure > **RangeRule Example** +> > ```json > { -> "min": 5, -> "exclusiveMax": 10 +> "min": 5, +> "exclusiveMax": 10 > } > ``` @@ -233,11 +254,11 @@ A `RangeRule` cannot include but an inclusive and exclusive version of min, or o ### ComparedFieldsRule Data Structure > **ComparedFieldsRule** Example -> +> > ```json > { -> "fields": "some_field", -> "relation": "equal", +> "fields": "some_field", +> "relation": "equal" > } > ``` @@ -262,4 +283,4 @@ A `RangeRule` cannot include but an inclusive and exclusive version of min, or o ## Source Code Reference -Source code for the Lectern Dictionary meta-schema is made available through the package [@overture-stack/lectern-dictionary](../packages/dictionary/). The meta-schema is formally defined in TypeScript and exported as the type `Dictionary` from the file [`dictionary/src/types/dictionaryTypes.ts`](../packages/dictionary/src/types/dictionaryTypes.ts). This definition is created using [`Zod`] schemas, which are also exported from this package and available for use to confirm a given object is a valid Lectern Dictionary. \ No newline at end of file +Source code for the Lectern Dictionary meta-schema is made available through the package [@overture-stack/lectern-dictionary](../../packages/dictionary/).
The meta-schema is formally defined in TypeScript and exported as the type `Dictionary` from the file [`dictionary/src/types/dictionaryTypes.ts`](../packages/dictionary/src/types/dictionaryTypes.ts). This definition is created using [`Zod`] schemas, which are also exported from this package and available for use to confirm a given object is a valid Lectern Dictionary. diff --git a/docs/important-concepts.md b/docs/Reference/01-glossaryOfTerms.md similarity index 94% rename from docs/important-concepts.md rename to docs/Reference/01-glossaryOfTerms.md index 644b34c1..ab3e834b 100644 --- a/docs/important-concepts.md +++ b/docs/Reference/01-glossaryOfTerms.md @@ -4,11 +4,11 @@ This document is a reference of commonly used terms and definitions. ## Dictionary Model -Lectern provides a "meta-schema" which describes a syntax for creating Data Dictionaries. This meta-schema is a set of rules for a JSON document, and any JSON document that correctly applies these rules represents a valid Lectern Dictionary. The meta-schema is defined through code rules in the [@overture-stack/lectern-dictionary](../packages/dictionary) package. +Lectern provides a "meta-schema" which describes a syntax for creating Data Dictionaries. This meta-schema is a set of rules for a JSON document, and any JSON document that correctly applies these rules represents a valid Lectern Dictionary. The meta-schema is defined through code rules in the [@overture-stack/lectern-dictionary](../packages/dictionary) package. A [JSON-schema version of this meta-schema](../generated/DictionaryMetaSchema.json) has been generated and is included in this code base. -This section describes at a high level the component parts of a Lectern Dictionary and the terms used when discussing those parts. The terms defined here are used throughout the documentation and the type system of the Lectern codebase. 
If you are writing a Lectern Dictionary, you may instead be looking for the [reference documentation for Lectern Dictionaries](). +This section describes at a high level the component parts of a Lectern Dictionary and the terms used when discussing those parts. The terms defined here are used throughout the documentation and the type system of the Lectern codebase. If you are writing a Lectern Dictionary, you may instead be looking for the reference documentation for Lectern Dictionaries. ### Dictionary @@ -49,6 +49,7 @@ Placeholder ## Common Types ### DataRecord and UnprocessedDataRecord + The `DataRecord` type represents a single record from some Schema. They are objects with keys that match the [fields](#field) from a [schema](#schema) and a value that should be one of the valid Lectern data types. There is no guarantee that a `DataRecord` is "valid", it could have values that fail some restrictions from the schema. An `UnprocessedDataRecord` is very similar, but all values are raw string values. These represent a single record as it would be submitted in a test file, for example all the data from a single line in a TSV. These string values will need to be [parsed](#parsing) to be converted to their proper types as defined in a schema. @@ -65,7 +66,7 @@ Example Valid `TestResult`: ```ts { - valid: true + valid: true; } ``` @@ -99,4 +100,4 @@ Placeholder ### Processing -Placeholder \ No newline at end of file +Placeholder diff --git a/docs/assets/submission-system.svg b/docs/assets/submission-system.svg new file mode 100644 index 00000000..780e0ddb --- /dev/null +++ b/docs/assets/submission-system.svg @@ -0,0 +1,4 @@ + + + +
[Figure: submission-system diagram. Labels: Search & Exploration; Data Management & Storage; Arranger Configs (define the structure and formatting of your data); Index Mapping (generated based on Lyric dictionary schema); Lectern Dictionary (define the structure of your tabular data); Tabular Data Submission; File Metadata Submission; File Data Submission]
\ No newline at end of file diff --git a/docs/overview/images/submission-system.svg b/docs/overview/images/submission-system.svg deleted file mode 100644 index a41f4e66..00000000 --- a/docs/overview/images/submission-system.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - - \ No newline at end of file diff --git a/docs/lectern-2.0-changes.md b/pendingDocs/migration/lectern2changes.md similarity index 99% rename from docs/lectern-2.0-changes.md rename to pendingDocs/migration/lectern2changes.md index 9f389539..23361b3e 100644 --- a/docs/lectern-2.0-changes.md +++ b/pendingDocs/migration/lectern2changes.md @@ -22,7 +22,6 @@ The release of Lectern 2.0 brings some important upgrades to the Lectern service - Updated interface for Lectern Server REST client - Exposes dictionary meta-schema validation, data parsing, and data validation functions - ### New Published Lectern TS Packages - [Lectern Dictionary](../packages/dictionary/) diff --git a/docs/validation/field-validation.md b/pendingDocs/validation/field-validation.md similarity index 100% rename from docs/validation/field-validation.md rename to pendingDocs/validation/field-validation.md diff --git a/docs/validation/index.md b/pendingDocs/validation/index.md similarity index 100% rename from docs/validation/index.md rename to pendingDocs/validation/index.md diff --git a/docs/validation/record-validation.md b/pendingDocs/validation/record-validation.md similarity index 100% rename from docs/validation/record-validation.md rename to pendingDocs/validation/record-validation.md From 51c6d60a6539d39e4ea93ad02799b1834b85ce77 Mon Sep 17 00:00:00 2001 From: Mitchell Shiell Date: Thu, 3 Jul 2025 15:06:25 -0400 Subject: [PATCH 2/5] setup.md --- docs/02-Setup.md | 305 ++++++++++++++++++++++++----------------------- 1 file changed, 156 insertions(+), 149 deletions(-) diff --git a/docs/02-Setup.md b/docs/02-Setup.md index b75c1095..e4d63101 100644 --- a/docs/02-Setup.md +++ b/docs/02-Setup.md @@ -6,222 +6,223 @@ This guide provides 
instructions for setting up a complete development environme Before beginning, ensure you have the following installed on your system: -- PNPM (package manager - used instead of npm) -- Node.js (v18 or higher) -- Docker (for running containerized services) +- **PNPM** (package manager - used instead of npm) +- **Node.js** (v18 or higher) +- **Docker** (for running containerized services) -## Developer Setup +## Development Environment Setup -This guide will walk you through setting up a complete development environment for Lectern Server, including its complementary services. +### 1. Database Setup -### Setting up supporting services +Lectern requires a MongoDB database to store dictionaries and metadata. Choose one of the following setup methods: -Lectern Server requires a MongoDB database to store dictionaries and metadata. +**Option A: Using Docker Compose (Recommended)** -1. **MongoDB Database Setup** - - Use the provided docker-compose configuration to start MongoDB: +```bash +# Navigate to the server directory +cd apps/server - ```bash - # Navigate to the server directory - cd apps/server - - # Start MongoDB using docker-compose - docker-compose up -d - ``` +# Start MongoDB using docker-compose +docker-compose up -d +``` - Alternatively, you can start MongoDB manually: +**Option B: Manual Docker Setup** - ```bash - docker run --name lectern-mongo \ - -e MONGO_INITDB_ROOT_USERNAME=admin \ - -e MONGO_INITDB_ROOT_PASSWORD=password \ - -p 27017:27017 \ - -d mongo:latest - ``` +```bash +docker run --name lectern-mongo \ +-e MONGO_INITDB_ROOT_USERNAME=admin \ +-e MONGO_INITDB_ROOT_PASSWORD=password \ +-p 27017:27017 \ +-d mongo:latest +```
- **Click here for a detailed breakdown** - - This command will set up the database service for Lectern Server development as follows: - - | Service | Port | Description | Purpose in Lectern Server Development | - | --------------- | ------ | ------------------------------ | ------------------------------------ | - | MongoDB | 27017 | NoSQL database for dictionary storage | Stores data dictionaries, versions, and metadata | + Database Service Details - - Ensure port 27017 is free on your system before starting the database. - - The default configuration uses `admin/password` for MongoDB credentials. - - You may need to adjust the port in the configuration file if you have conflicts with existing services. + | Service | Port | Description | Purpose | + |---------|-------|---------------------------------------|----------------------------------------------| + | MongoDB | 27017 | NoSQL database for dictionary storage | Stores data dictionaries, versions, and metadata | -
+ **Important Notes:** + - Ensure port 27017 is available on your system + - Default credentials: `admin/password` + - Adjust port configuration if conflicts exist with other services -In the next steps, we will run a Lectern development server against these supporting services. + -### Running the Development Server +### 2. Server Setup -1. Clone Lectern and move into its directory: +1. **Clone the Repository** ```bash git clone https://github.com/overture-stack/lectern.git cd lectern ``` -2. Install all dependencies for the entire monorepo: +2. **Install Dependencies** ```bash + # Install all dependencies for the entire monorepo pnpm install ``` -3. Navigate to the server directory: +3. **Configure Environment** ```bash cd apps/server + cp .env.example .env ``` -4. Configure environment variables: + The `.env` file comes preconfigured with development defaults: - ```bash - cp .env.example .env + ```env + # Express Configuration + PORT=3000 + + # Swagger Documentation + OPENAPI_PATH=/api-docs + + # MongoDB Configuration + MONGO_HOST=localhost + MONGO_PORT=27017 + MONGO_DB=lectern + MONGO_USER= + MONGO_PASS= + + # Authentication (disabled by default) + AUTH_ENABLED=false + EGO_API= + SCOPE= + + # CORS Configuration + CORS_ALLOWED_ORIGINS= + + # Vault Configuration (disabled by default) + VAULT_ENABLED=false + VAULT_URL=http://localhost:8200 + VAULT_SECRETS_PATH=/kv/lectern + VAULT_TOKEN=00000000-0000-0000-0000-000000000000 + VAULT_ROLE= ``` - :::info +
+ Environment Variables Reference - This `.env` file is preconfigured as follows for the Lectern Server environment: - - ```env - # Express Configuration - PORT=3000 - - # Swagger Docs Config - OPENAPI_PATH=/api-docs - - # Mongo Configuration - MONGO_HOST=localhost - MONGO_PORT=27017 - MONGO_DB=lectern - MONGO_USER= - MONGO_PASS= - - # Auth Configuration - AUTH_ENABLED=false - EGO_API= - SCOPE= - - # CORS allowed origins can be a comma separated list of the allowed domains. - # Leave empty to not allow any web connections (default). Use * to allow all. - CORS_ALLOWED_ORIGINS= - - # Vault Configuration - VAULT_ENABLED=false - VAULT_URL=http://localhost:8200 - VAULT_SECRETS_PATH=/kv/lectern - VAULT_TOKEN=00000000-0000-0000-0000-000000000000 - VAULT_ROLE= - ``` - -
- **Click here for an explanation of Lectern Server environment variables** - - - **Express Configuration** - - `PORT`: Port number for the Lectern Server web application (default: 3000) - - `OPENAPI_PATH`: Path to Swagger UI with API documentation (default: /api-docs) - - - **MongoDB Configuration** - - `MONGO_HOST`: MongoDB server hostname (default: localhost) - - `MONGO_PORT`: MongoDB server port (default: 27017) - - `MONGO_DB`: Database name to use (default: lectern) - - `MONGO_USER`: Username for MongoDB connection (optional) - - `MONGO_PASS`: Password for MongoDB connection (optional) - - - **Authorization** (optional) - - `AUTH_ENABLED`: Enable/disable authorization (default: false) - - `EGO_API`: URL to the EGO API for JWT validation - - `SCOPE`: Policy name to look for in JWT scope - - `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed CORS origins - - - **Vault Configuration** (optional) - - `VAULT_ENABLED`: Enable/disable Vault integration (default: false) - - `VAULT_URL`: URL to Vault server - - `VAULT_SECRETS_PATH`: Path to secrets in Vault - - `VAULT_TOKEN`: Access token for Vault - - `VAULT_ROLE`: Role to use for Vault connection - -
- -5. Build the Lectern Server and its dependencies: + **Express Configuration** + - `PORT`: Server port (default: 3000) + - `OPENAPI_PATH`: Swagger UI path (default: /api-docs) + + **MongoDB Configuration** + - `MONGO_HOST`: Database hostname (default: localhost) + - `MONGO_PORT`: Database port (default: 27017) + - `MONGO_DB`: Database name (default: lectern) + - `MONGO_USER`: Database username (optional) + - `MONGO_PASS`: Database password (optional) + + **Authentication (Optional)** + - `AUTH_ENABLED`: Enable JWT-based authorization (default: false) + - `EGO_API`: EGO API URL for JWT validation + - `SCOPE`: Required policy name in JWT scope + - `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed origins; leave empty to block all web connections (default), or use `*` to allow all + + **Vault Integration (Optional)** + - `VAULT_ENABLED`: Enable HashiCorp Vault integration (default: false) + - `VAULT_URL`: Vault server URL + - `VAULT_SECRETS_PATH`: Path to secrets in Vault + - `VAULT_TOKEN`: Vault access token + - `VAULT_ROLE`: Vault role for authentication + +
+ +4. **Build the Application** ```bash + # From workspace root pnpm nx build @overture-stack/lectern-server - # or from the server directory: - # pnpm build + + # Or from apps/server directory + pnpm build ``` -6. Start the Lectern Server development server: +5. **Start the Development Server** ```bash + # Production mode pnpm nx start @overture-stack/lectern-server - # or for development mode with hot reloading: - # pnpm nx debug server + + # Development mode with hot reloading + pnpm nx debug server ``` -### Verification +## Verification & Testing -After installation and configuration, verify that Lectern Server is functioning correctly: +### API Health Check -1. **Test Dictionary Management** +Verify that Lectern is running correctly: - - Access the API documentation at `http://localhost:3000/api-docs` - - Try creating a new data dictionary using the REST API - - Expected result: Should be able to create, view, and manage data dictionaries - - Troubleshooting: - - Check MongoDB connection and ensure database is accessible - - Verify API endpoints are responding correctly - - Check server logs for any validation errors +```bash +# Health endpoint +curl http://localhost:3000/health + +# Expected response: 200 OK +``` -2. **Test API Endpoints** - - Health check: `curl http://localhost:3000/health` - - API documentation: Navigate to `http://localhost:3000/api-docs` - - Expected result: Health endpoint should return 200 OK, Swagger docs should load +### API Documentation -**Optional: Enabling Authorization** +Access the interactive API documentation at: +- **Swagger UI**: `http://localhost:3000/api-docs` + +### Dictionary Management Testing + +1. Navigate to the Swagger UI +2. Test creating a new data dictionary using the REST API +3. 
Verify dictionary creation, retrieval, and management operations + +**Troubleshooting:** +- Ensure MongoDB is running and accessible +- Check server logs for validation errors +- Verify API endpoints are responding correctly + + +:::info Need Help? +If you encounter any issues or have questions about our API, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). +::: -For production use, you should enable authorization: +## Advanced Configuration + +### Enabling Authorization + +For production environments, enable JWT-based authorization: 1. Set `AUTH_ENABLED=true` in your `.env` file 2. Configure `EGO_API` to point to your Ego authorization service 3. Set the appropriate `SCOPE` for your permissions -**Optional: Vault Integration** +### Vault Integration -If you use HashiCorp Vault for secret management: +For secure secret management using HashiCorp Vault: 1. Set `VAULT_ENABLED=true` in your `.env` file -2. Configure the Vault connection parameters -3. Lectern will read MongoDB credentials from Vault instead of environment variables - -For further assistance, open an issue on [GitHub](https://github.com/overture-stack/lectern/issues). - -:::warning -This guide is meant for development purposes and is not intended for production use. If you use this in any public or production environment, please implement appropriate security measures and configure your environment variables accordingly. -::: +2. Configure Vault connection parameters +3. 
Lectern will retrieve MongoDB credentials from Vault instead of environment variables ---- +## Development Commands Reference -## **Additional Development Commands** +### From Workspace Root -### From the workspace root: ```bash -# Build Lectern Server +# Build server pnpm nx build @overture-stack/lectern-server -# Start Lectern Server +# Start server pnpm nx start @overture-stack/lectern-server -# Run in debug mode (with hot reloading) +# Debug mode (hot reloading) pnpm nx debug server ``` -### From the apps/server directory: +### From apps/server Directory + ```bash # Build pnpm build @@ -233,8 +234,14 @@ pnpm start pnpm debug ``` -### Building Docker Image: +### Docker Operations + ```bash -# From workspace root +# Build Docker image docker build --no-cache -t lectern -f apps/server/Dockerfile . -``` \ No newline at end of file +``` + + +:::warning +This guide is intended for development purposes only. For production deployments, implement appropriate security measures, configure authentication, and review all environment variables for your specific use case. 
+::: \ No newline at end of file From fa6e7789977d2479533a8eb2861eccc00173fb1f Mon Sep 17 00:00:00 2001 From: Mitchell Shiell Date: Thu, 3 Jul 2025 18:22:55 -0400 Subject: [PATCH 3/5] patial update to dictionaryReference.md --- docs/03-dictionaryReference.md | 604 ++++++++++++++++++ .../01-glossaryOfTerms.md => 04-glossary.md} | 2 +- docs/Reference/00-dictionaryReference.md | 286 --------- 3 files changed, 605 insertions(+), 287 deletions(-) create mode 100644 docs/03-dictionaryReference.md rename docs/{Reference/01-glossaryOfTerms.md => 04-glossary.md} (99%) delete mode 100644 docs/Reference/00-dictionaryReference.md diff --git a/docs/03-dictionaryReference.md b/docs/03-dictionaryReference.md new file mode 100644 index 00000000..f2800040 --- /dev/null +++ b/docs/03-dictionaryReference.md @@ -0,0 +1,604 @@ +# Dictionary Syntax + +A Lectern Dictionary is a JSON configuration that defines the structure and validation rules for tabular data files. It consists of schemas that describe individual file formats, with each schema containing field definitions and validation constraints. + +## Basic Dictionary Structure + +A Lectern Dictionary is a JSON configuration file that defines the structure and validation rules for your data files. At its core, every dictionary must contain three essential components: a **name**, **version**, and at least one **schema**. Additional optional components like descriptions, metadata, and references can enhance functionality. 
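The three required top-level components (name, version, and at least one schema) can be illustrated with a minimal TypeScript sketch. It assumes the dictionary has already been parsed from JSON into a plain object and that `version` follows the `major.minor.patch` form; the `checkDictionaryShape` helper and its simplified version pattern are hypothetical and are not exports of the Lectern packages. The Zod schemas in `@overture-stack/lectern-dictionary` remain the authoritative validator.

```typescript
// Hypothetical helper: checks only the required top-level shape of a
// Lectern dictionary. Not the official validator.
type DictionaryLike = {
  name?: unknown;
  version?: unknown;
  schemas?: unknown;
};

// Assumed, simplified pattern for a `major.minor.patch` version string.
const SEMVER_PATTERN = /^\d+\.\d+\.\d+$/;

function checkDictionaryShape(dictionary: DictionaryLike): string[] {
  const errors: string[] = [];

  // `name` must be present and non-empty.
  if (typeof dictionary.name !== "string" || dictionary.name.length === 0) {
    errors.push("`name` must be a non-empty string");
  }
  // `version` must look like major.minor.patch.
  if (typeof dictionary.version !== "string" || !SEMVER_PATTERN.test(dictionary.version)) {
    errors.push("`version` must be a major.minor.patch string");
  }
  // `schemas` must be an array with at least one entry.
  if (!Array.isArray(dictionary.schemas) || dictionary.schemas.length === 0) {
    errors.push("`schemas` must contain at least one schema");
  }
  return errors;
}

const ok = checkDictionaryShape({
  name: "clinical_data_dictionary",
  version: "1.2.0",
  schemas: [{ name: "patient", fields: [] }],
});
const bad = checkDictionaryShape({ name: "clinical_data_dictionary", version: "1.2", schemas: [] });
console.log(ok.length, bad.length); // prints: 0 2
```

An empty error list only means the required keys are present; field-level rules such as `valueType` and `restrictions` are not checked here.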
+ +```json showLineNumbers {2-3,9-26} +{ +  "name": "clinical_data_dictionary", +  "version": "1.2.0", +  "description": "Clinical trial data collection schemas", +  "meta": { +    "author": "Clinical Data Team", +    "created": "2024-01-15" +  }, +  "schemas": [ +    { +      "name": "patient", +      "description": "Patient demographic and clinical information", +      "fields": [ +        { +          "name": "patient_id", +          "valueType": "string", +          "restrictions": { +            "required": true, +            "regex": "^PAT-\\d{6}$" +          }, +          "unique": true, +          "description": "Unique patient identifier in format PAT-XXXXXX" +        } +      ] +    } +  ], +  "references": { +    "customRegex": { +      "dateFormat": "^\\d{4}-\\d{2}-\\d{2}$" +    } +  } +} +``` + +### Basic Dictionary Properties + +| Property | Type | Required | Description | Example | +| ------------- | --------------- | -------- | -------------------------------------- | ----------------------------------------------- | +| `name` | `string` | ✓ | Display name of the dictionary | `"clinical_data_dictionary"` | +| `version` | `string` | ✓ | Semantic version (major.minor.patch) | `"1.2.0"` | +| `schemas` | `Array` | ✓ | List of schema definitions (minimum 1) | See [Schema Structure](#basic-schema-structure) | +| `description` | `string` | ✗ | A human-readable description | `"Clinical trial data schemas"` | +| `meta` | `object` | ✗ | Custom metadata fields | `{"author": "Clinical Data Team"}` | +| `references` | `object` | ✗ | Reusable reference values | See [References](#references) | + +## Basic Schema Structure + +Each schema defines the structure of a single tabular data file. Every schema must have a **name** and a **fields** array.
+ +```json showLineNumbers {11,13-26} +{ +  "name": "clinical_data_dictionary", +  "version": "1.2.0", +  "description": "Clinical trial data collection schemas", +  "meta": { +    "author": "Clinical Data Team", +    "created": "2024-01-15" +  }, +  "schemas": [ +    { +      "name": "patient", +      "description": "Patient demographic and clinical information", +      "fields": [ +        { +          "name": "patient_id", +          "valueType": "string", +          "restrictions": { +            "required": true, +            "regex": "^PAT-\\d{6}$" +          }, +          "unique": true, +          "description": "Unique patient identifier in format PAT-XXXXXX" +        } +      ] +    } +  ], +  "references": { +    "customRegex": { +      "dateFormat": "^\\d{4}-\\d{2}-\\d{2}$" +    } +  } +} +``` + +### Schema Properties + +| Property | Type | Required | Description | +| ------------- | -------------- | -------- | ------------------------------------- | +| `name` | `string` | ✓ | Schema identifier (no spaces or dots) | +| `fields` | `Array` | ✓ | List of field definitions | +| `description` | `string` | ✗ | Human-readable description | +| `meta` | `object` | ✗ | Custom metadata | + +## Field Definitions + +Fields define the individual columns in your data files. At minimum, a field object must have a **name** and a **valueType**.
+ +```json showLineNumbers {15,17,25,27} +{ +  "name": "clinical_data_dictionary", +  "version": "1.2.0", +  "description": "Clinical trial data collection schemas", +  "meta": { +    "author": "Clinical Data Team", +    "created": "2024-01-15" +  }, +  "schemas": [ +    { +      "name": "patient", +      "description": "Patient demographic and clinical information", +      "fields": [ +        { +          "name": "patient_id", +          "description": "Unique patient identifier in format PAT-XXXXXX", +          "valueType": "string", +          "restrictions": { +            "required": true, +            "regex": "^PAT-\\d{6}$" +          }, +          "unique": true +        }, +        { +          "name": "diagnosis_date", +          "description": "Date of initial diagnosis in YYYY-MM-DD format", +          "valueType": "string", +          "meta": { +            "displayName": "Diagnosis Date", +            "category": "clinical" +          }, +          "restrictions": { +            "required": true, +            "regex": "#/customRegex/dateFormat" +          } +        } +      ] +    } +  ], +  "references": { +    "customRegex": { +      "dateFormat": "^\\d{4}-\\d{2}-\\d{2}$" +    } +  } +} +``` + +### Field Properties + +| Property | Type | Required | Default | Description | +| -------------- | -------------- | -------- | ------- | --------------------------------------------------------- | +| `name` | `string` | ✓ | - | Field identifier (used as a column header) | +| `description` | `string` | ✗ | `""` | Human-readable description | +| `valueType` | `string` | ✓ | - | Data type: `string`, `integer`, `number`, `boolean` | +| `isArray` | `boolean` | ✗ | `false` | Whether field accepts multiple values | +| `delimiter` | `string` | ✗ | `","` | Separator for array values | +| `unique` | `boolean` | ✗ | `false` | Whether values must be unique across records | +| `restrictions` | `object/array` | ✗ | `{}` | Validation rules for the field | +| `meta` | `object` | ✗ | `{}` | Custom metadata | + +### Field Data Types + +| Type | Description | Valid Examples | Invalid Examples | +| --------- | --------------------------------------------------- | -------------------------------- | ---------------------- |
+| `string` | Text values (any characters except array delimiter) | `"Hello"`, `"PAT-001234"`, `""` | N/A (accepts any text) | +| `integer` | Whole numbers only | `42`, `-17`, `0` | `3.14`, `1.0`, `2.5` | +| `number` | Any numeric value | `42`, `3.14`, `-17.5`, `0` | `"abc"`, `"N/A"` | +| `boolean` | True/false (case-insensitive) | `true`, `True`, `FALSE`, `false` | `yes`, `1`, `0`, `Y` | + +### Field Restrictions + +Field restrictions define validation rules that field values must satisfy to be considered valid. These rules ensure data integrity by enforcing specific constraints on field content. + +The `restrictions` property accepts either: + +- A single restrictions object +- An array of restrictions objects + +When multiple restrictions are provided in an array, they are evaluated sequentially. Data is only considered valid if it passes every restriction in the array. + +Each restrictions object can contain: + +- **Standard Restrictions** (detailed in the sections below) +- **Conditional restrictions** that apply validation logic based on specific conditions + +## Standard Restrictions + +### `required` + +Ensures a field has a value. + +```json showLineNumbers +{ + "name": "patient_id", + "valueType": "string", + "restrictions": { "required": true } +} +``` + +**Validation behavior:** + +- Empty strings `""` are rejected +- Zero `0` is accepted for numbers +- `false` is accepted for booleans +- Arrays must contain at least one element + +### `codeList` + +Restricts values to a predefined list of acceptable options. + +```json showLineNumbers +{ + "name": "gender", + "valueType": "string", + "restrictions": { + "codeList": ["Male", "Female", "Other", "Unknown"] + } +} +``` + +### `range` + +Sets numeric boundaries for `integer` and `number` fields. 
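The bounds of a `range` rule combine as simple comparisons, sketched below in TypeScript. It assumes the inclusive `min`/`max` and exclusive `exclusiveMin`/`exclusiveMax` options this restriction supports, and that every bound present must pass. The `RangeRule` interface and `isWithinRange` function are illustrative only; they are not taken from the Lectern source, and the real implementation may differ.

```typescript
// Illustrative shape of a range rule (assumption, not the Lectern type).
interface RangeRule {
  min?: number;          // inclusive lower bound
  max?: number;          // inclusive upper bound
  exclusiveMin?: number; // value must be strictly greater than this
  exclusiveMax?: number; // value must be strictly less than this
}

// Returns true when `value` satisfies every bound present in `rule`.
// Bounds that are not set are ignored.
function isWithinRange(value: number, rule: RangeRule): boolean {
  if (rule.min !== undefined && value < rule.min) return false;
  if (rule.max !== undefined && value > rule.max) return false;
  if (rule.exclusiveMin !== undefined && value <= rule.exclusiveMin) return false;
  if (rule.exclusiveMax !== undefined && value >= rule.exclusiveMax) return false;
  return true;
}

// "18 or older, but less than 65"
const workingAge: RangeRule = { min: 18, exclusiveMax: 65 };
console.log(isWithinRange(18, workingAge)); // true
console.log(isWithinRange(64.5, workingAge)); // true
console.log(isWithinRange(65, workingAge)); // false
```

For an `integer` field the value would also need to pass the type check before the range is applied. This sketch only covers the numeric bounds.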
+ +```json showLineNumbers +{ + "name": "age", + "valueType": "integer", + "restrictions": { + "range": { "min": 0, "max": 120 } + } +} +``` + +**Range options:** + +- `min` / `max` - Inclusive boundaries +- `exclusiveMin` / `exclusiveMax` - Exclusive boundaries + +```json showLineNumbers +// Age must be 18 or older, but less than 65 +{ "range": { "min": 18, "exclusiveMax": 65 } } +``` + +### `regex` + +Applies pattern matching validation to string fields. + +```json showLineNumbers +{ + "name": "email", + "valueType": "string", + "restrictions": { + "regex": "^[\\w\\.-]+@[\\w\\.-]+\\.[a-zA-Z]{2,}$" + } +} +``` + +### `empty` + +Requires a field to be empty. This is typically used within conditional restrictions. + +```json showLineNumbers +{ + "restrictions": { "empty": true } +} +``` + +## Array-Specific Restrictions + +### `count` + +Controls the number of elements allowed in array fields. + +```json showLineNumbers +{ + "name": "medications", + "valueType": "string", + "isArray": true, + "restrictions": { + "count": { "min": 1, "max": 10 } + } +} +``` + +Uses the same boundary options as `range`: `min`, `max`, `exclusiveMin`, `exclusiveMax`. + +## Field Comparison Restrictions + +### `compare` + +Compares field values with other fields in the same record. + +```json showLineNumbers +{ + "name": "age_at_death", + "valueType": "integer", + "restrictions": { + "compare": { + "fields": ["age_at_diagnosis"], + "relation": "greaterThanOrEqual" + } + } +} +``` + +**Available comparison relations:** + +- `equal` / `notEqual` - Value equality comparison +- `greaterThan` / `greaterThanOrEqual` - Numeric comparison +- `lessThan` / `lessThanOrEqual` - Numeric comparison +- `contains` / `containedIn` - String containment comparison + +## Conditional Restrictions + +Conditional restrictions allow you to apply different validation rules based on values in other fields within the same record. 
This enables dynamic validation where the requirements for one field change depending on the data in other fields. + +Think of conditional restrictions as "if-then-else" logic for your data validation: + +- **IF** certain conditions are met in other fields +- **THEN** apply these validation rules +- **ELSE** apply different validation rules (optional) + +### Basic Structure + +```json showLineNumbers +{ + "if": { + "conditions": [ + /* conditions to check */ + ], + "case": "all" // how many conditions must be true + }, + "then": { + /* validation rules when conditions are true */ + }, + "else": { + /* validation rules when conditions are false */ + } +} +``` + +### Simple Example + +Let's start with a straightforward example: + +```json showLineNumbers +{ + "name": "date_of_death", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "deceased" } } + ] + }, + "then": { "required": true }, + "else": { "empty": true } + } +} +``` + +**What this means:** + +- **IF** the `patient_status` field equals "deceased" +- **THEN** the `date_of_death` field is required +- **ELSE** the `date_of_death` field must be empty + +### Multiple Conditions + +When you need to check multiple conditions, use the `case` property to specify how many must be true: + +```json showLineNumbers +{ + "name": "treatment_details", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "active" } }, + { "fields": ["enrollment_date"], "match": { "exists": true } } + ], + "case": "all" + }, + "then": { "required": true } + } +} +``` + +**What this means:** + +- **IF** `patient_status` equals "active" **AND** `enrollment_date` has a value +- **THEN** `treatment_details` is required + +### Case Options + +| Case Value | Description | Example | +| ----------------- | ----------------------------------- | ------------------------------------------------ | +| 
`"all"` (default) | All conditions must be true | Patient must be active AND enrolled | +| `"any"` | At least one condition must be true | Patient is active OR has enrollment date | +| `"none"` | No conditions can be true | Patient is NOT active AND has NO enrollment date | + +## Building Conditions + +Each condition has three parts: + +1. **`fields`** - Which fields to check +2. **`match`** - What to look for in those fields +3. **`case`** - How many fields must match (when checking multiple fields) + +### Basic Condition Structure + +```json showLineNumbers {6-9} +{ + "name": "treatment_details", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "active" } }, + { "fields": ["enrollment_date"], "match": { "exists": true } } + ], + "case": "all" + }, + "then": { "required": true } + } +} +``` + +The basic condition structure highlighted above is as follows: + +```json showLineNumbers +"conditions": [ + { + "fields": ["field_name"], + "match": { + "value": "specific_value" + } + } +], +``` + +### Checking Multiple Fields + +```json showLineNumbers +{ + "fields": ["field1", "field2", "field3"], + "match": { "value": "active" }, + "case": "any" // at least one field must equal "active" +} +``` + +### Match Criteria + +The `match` object defines what you're looking for. 
Here are the available options: + +#### Exact Value Match + +```json showLineNumbers +{ +  "fields": ["treatment_arm"], +  "match": { "value": "Treatment_A" } +} +``` + +#### Value from List + +```json showLineNumbers +{ +  "fields": ["disease_stage"], +  "match": { "codeList": ["Stage_III", "Stage_IV"] } +} +``` + +#### Numeric Range + +```json showLineNumbers +{ +  "fields": ["age"], +  "match": { "range": { "min": 18, "max": 65 } } +} +``` + +#### Pattern Matching + +```json showLineNumbers +{ +  "fields": ["patient_id"], +  "match": { "regex": "^PAT-\\d{6}$" } +} +``` + +#### Field Has Value + +```json showLineNumbers +{ +  "fields": ["consent_date"], +  "match": { "exists": true } +} +``` + +#### Array Length + +```json showLineNumbers +{ +  "fields": ["medications"], +  "match": { "count": { "min": 1 } } +} +``` + +The following complete field definition combines a `codeList` match with `then` and `else` rules: + +```json showLineNumbers +{ +  "name": "sociodem_question_detail", +  "valueType": "string", +  "restrictions": { +    "if": { +      "conditions": [ +        { +          "fields": ["sociodem_question"], +          "match": { +            "codeList": ["PCGL reference question", "Another question"] +          }, +          "case": "any" +        } +      ] +    }, +    "then": { "required": true }, +    "else": { +      "required": false, +      "empty": true +    } +  } +} +``` + +### Working with Arrays + +When checking array fields, use `arrayFieldCase` to specify how many array elements must match: + +```json showLineNumbers +{ +  "name": "diabetes_medication", +  "valueType": "string", +  "restrictions": { +    "if": { +      "conditions": [ +        { +          "fields": ["medical_history"], +          "match": { "value": "diabetes" }, +          "arrayFieldCase": "any" // any element in the array can match +        } +      ] +    }, +    "then": { "required": true } +  } +} +``` + +### Array Field Case Options + +| Value | Description | +| -------- | ------------------------------------ | +| `"all"` | All elements in the array must match | +| `"any"` | At least one element must match | +| `"none"` | No elements can match | + +### Complex Conditional Logic + +You can combine multiple conditions with different logic:
+```json showLineNumbers +{ + "name": "follow_up_required", + "valueType": "boolean", + "restrictions": { + "if": { + "conditions": [ + { + "fields": ["treatment_response"], + "match": { "codeList": ["partial_response", "stable_disease"] } + }, + { "fields": ["adverse_events"], "match": { "count": { "min": 1 } } } + ], + "case": "any" // either condition can trigger the requirement + }, + "then": { "required": true } + } +} +``` + +## Source Code Reference + +Source code for the Lectern Dictionary meta-schema is available through the package [@overture-stack/lectern-dictionary](../packages/dictionary/). The meta-schema is formally defined in TypeScript and exported as the type `Dictionary` from [`dictionary/src/types/dictionaryTypes.ts`](../packages/dictionary/src/types/dictionaryTypes.ts). This definition uses [Zod](https://zod.dev/) schemas, which are also exported for validation purposes. + +:::info Need Help? +If you encounter any issues or have questions about our API, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). +::: diff --git a/docs/Reference/01-glossaryOfTerms.md b/docs/04-glossary.md similarity index 99% rename from docs/Reference/01-glossaryOfTerms.md rename to docs/04-glossary.md index ab3e834b..d6efbdc4 100644 --- a/docs/Reference/01-glossaryOfTerms.md +++ b/docs/04-glossary.md @@ -1,4 +1,4 @@ -# Important Concepts +# Glossary This document is a reference of commonly used terms and definitions. diff --git a/docs/Reference/00-dictionaryReference.md b/docs/Reference/00-dictionaryReference.md deleted file mode 100644 index ecf745b8..00000000 --- a/docs/Reference/00-dictionaryReference.md +++ /dev/null @@ -1,286 +0,0 @@ -# Dictionary Meta-Schema Refernce - -For a high level description of the component parts of a Lectern Dictionary see [Important Concepts - Dictionary Model](./important-concepts.md#dictionary-model). 
- -## Dictionary Structure - -A Lectern Dictionary is a collection of Lectern Schemas. Each schema describes the structure of a TSV file, providing a list of the columns for that file and the data types and restrictions on the content of those columns. - -In addition to schemas, a Lectern Dictionary can contain reference values that can be reused throughout the schema definitions to define property restrictions with shared rules. - -> **Dictionary Structure Example** -> -> ```json -> { -> "name": "example_dictionary", -> "description": "Collection of schemas to demonstrate Lectern functionality", -> "meta": { -> /* Custom meta data about the dictionary here */ -> }, -> -> "version": "1.0", -> -> "schemas": [ -> /* Schemas Here */ -> ], -> "references": { -> /* Reference Variables Here */ -> } -> } -> ``` - -| Property | Type | Required | Description | Example | -| ------------- | -------------------------------------------------------------- | -------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | -| `name` | `string` | Required | Display name of the dictionary | `"Example Lectern Dictionary"` | -| `version` | `string`, as a semantic version number `major`.`minor`.`patch` | Required | Version of the dictionary. | `"1.23.4"` | -| `schemas` | `Array<`[`LecternSchema`](#dictionary-schema-structure)`>` | Required | An array containing Lectern Schemas. Minimum of 1 Schema is required. | [Dictionary Schema Structure](#dictionary-schema-structure) | -| `description` | `string` | Optional | Free text description of the schema, for use as a reference for users of the dictionary. This description is not used by Lcetern for dictionary validation. 
| `"Collection of schemas to demonstrate Lectern functionality"` | -| `meta` | [MetaData](#meta-data-structure) object | Optional | Schema implementor defined fields to capture any additional properties not defined in standard Lectern dictionaries. These properties are not used by Lctern for dictionary validation | `{ "author": "Guy Incognito" }` | -| `references` | [References](#references-structure) object | Optional | Reference values that can be referenced throughout the dictionary. | `{ "customRegex": { "ncitIds": "^NCIT:C\d+$" } }` | - -### Dictionary Schema Structure - -> **Dictionary Schema Example** -> -> ```json -> { -> "name": "example-schema", -> "description": "Demonstrating structure of Lectern Schema", -> "meta": { -> /* Custom meta data about the schema here */ -> }, -> -> "fields": [ -> /* Fields Here */ -> ] -> } -> ``` - -| Property | Type | Required | Default | Description | Example | -| ------------- | ------------------------------------------------------- | -------- | ------- | :------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------- | -| `name` | NameString (no whitespace or `.`) | Required | | Name of the schema. This will be used in paths that reference this schema, and for identifying files containing records for this schema. | `"example-schema"` | -| `fields` | `Array<`[`LecterField`](#dictionary-field-structure)`>` | Required | | List of fields contained in this Schema. | [Dictionary Field Structure](#dictionary-field-structure) | -| `description` | `string` | Optional | None | Free text description of the schema, for use as a reference for users of the schema. This description is not used in dictionary validation. 
| `"Demonstrating structure of Lectern Schema"` | -| `meta` | [`MetaData`](#meta-data-structure) | Optional | None | Schema implementor defined fields to capture any additional properties not defined in standard Lectern schemas. | [Meta Data Structure](#meta-data-structure) | - -### Dictionary Field Structure - -> **Example Dictionary Field Definition** -> -> ```json -> { -> "name": "example_field", -> "description": "Shows a string field with a required restriction", -> "meta": { -> /* Custom meta data abou the field here */ -> }, -> "isArray": false, -> -> "valueType": "string", -> "restrictions": { -> "required": true -> } -> } -> ``` - -| Property | Required | Default | Type | Description | Example | -| -------------- | -------- | ---------------------- | --------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- | -| `name` | Required | | NameString (no whitespace or `.`) | Name of the field. This will be used as the header in TSV files in this field's schema, and in any paths referencing this field. | `"example_field` | -| `valueType` | Required | | [Field Data Type](#field-data-types) | Type of value stored in this field | `"string"` | -| `delimiter` | Optional | `,` | `string` | Character or string that will be used to split multiple values into an array. The default delimiter is a comma `,`. Any characters can be used as a delimiter. The delimiter value can be one or more characters long, but cannot be an empty string. Note: This property has no effect unless the field has `isArray: true`. 
| `"\|"` | -| `description` | Optional | `""` No value | `string` | Free text description of the field, for use as a reference for users of the schema. This description is not used in dictionary validation. | `"Shows a string field with a required restriction"` | -| `meta` | Optional | Empty object, no value | [`MetaData`](#meta-data-structure) object | Schema implementor defined fields to capture any additional properties not defined in standard Lectern fields. | `{ "displayName": "Example Field" }` | -| `isArray` | Optional | `false` | `boolean` | Type of value stored in this field | | -| `restrictions` | Optional | No Restrictions | `RestrictionsObject` or `Array` | An object containing all validation rules for this field. This can be a single object containing all [restrictions](#field-restrictions) applied to this field or a list of objects whose restrictions will be combined. [Conditional restrictions](#conditional-restrictions) can also be used to apply validation rules based on values of other fields in the record. | `{ "required": true }` | -| `unique` | Optional | `false` | `boolean` | Indicates that every record in this schema should have a unique value for this field. This rule is applied when a collection of records are validated together, ensuring that no two records in that collection repeat a value. | `true` | - -#### Field Data Types - -| valueType | Description | Examples | -| --------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- | -| `boolean` | Boolean value, either `true` or `false`. Accepts values with any letter casing, for example `true`, `True`, and `TRUE` will all be interpretted as `true` | `true`, `false` | -| `integer` | Numeric integer value. Will accept positive and negative values (ex. `21` or `-8`) but will reject any decimals (ex. 
`1.23`) | `21`, `-8` | -| `number` | Numeric value. Will accept any numeric value, including those with decimals. | `1.23`, `-4.567` | -| `string` | String fields. Value can have any length and use any character, other than the array delimiter for an array field (by default `\|`) | `asdf`, `Hello World`, `Another longer example of a string` | - -#### Field Restrictions - -Restrictions on a field are a list of rules that all values for this field must adhere to, these are the list of validations on the contents of each field. Two examples of restrictions are that a value is `required`, and that a value must take a value from a list of available options (`codeList`). The full list of available restrictions are described in the table below. - -The restrictions property of a field can have a value that is either a single restrictions object, or an array with any number of restrictions objects. If an array of restriction objects is provided, each set of restrictions will be applied in turn - for data to be valid, all restrictions in the array must pass. A restrictions object can either contain a set of restrictions from the table below, or be a [conditional restriction](#conditional-restrictions). 
- -The full list of available restrictions are: - -| Restriction | Used with Field Types | Type | Description | Examples | -| ----------- | ----------------------------- | --------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `codeList` | `integer`, `number`, `string` | Array of type of the field | An array of values of the type matching this field. Data provided for this field must have one of the values in this list. | `["Weak", "Average", "Strong"]` | -| `compare` | all | [ComparedFieldsRule](#comparedfieldsrule-data-structure) object | Enforces that this field has a value based on the provided value in another field. Examples would be to ensure that the two values are not equal, or for numeric values ensure one is greater than the other. | `{ "fields": ["age_at_diagnosis"], "relation": "greaterThanOrEqual" }` Ensure that a field such as `age_at_death` is greater than the provided `age_at_diagnosis` | -| `count` | Array fields of all types | `integer` or [`RangeRule`](#rangerule-data-structure) object | Enfroces the number of entries in an array. Can specify an exact array size, or provide range rules that set maximum and minimum counts. | `7` or `{"min": 5, "max": 10}` | -| `empty` | all | | Requires that no value is provided. This is useful when used on a [conditional restriction](#conditional-restrictions) in order to prevent a value from being given when the condition is `true`. For an array field with this restriction, an empty array is a valid value for this restriction. 
| n/a | -| `range` | `integer`, `number` | | Uses a [RangeRule](#rangerule-data-structure) object to define minimum and/or maximum values for this field | `{"min": 5}`, `{"exclusiveMax": 50}`, `{"exclusiveMin": 5, "max": 50}` | -| `regex` | `string` | | A regular expression that all values must match. | `^[a-z0-9]+$` | -| `required` | all | | A value must be provided, missing/undefined values will fail validation. Empty strings will not be accepted, though `0` (for `number` and `int` fields) and `false` (for `boolean` fields) are accepted. An array field with this restriction must have at least one entry. | `true`, `false` | - -#### Conditional Restrictions - -Restrictions can be added with conditions so that the validations are only applied based on the values provided to other fields within a record. - -A conditional restriction uses an if/then/else style syntax: - -The `if` property will be an object containing an array of `conditions` that look at other fields on the same record and apply matching rules to their values. When those field values match the rules in the condition than the condition passes. An optional `case` property can be added to the `if` object that defines how many of the `conditions` have to pass in order for the whole condition block to resolve as `true` - default is `all`, requiring all conditions to be met. - -The `then` object contains the restrictions that will be applied when the `if` condition is `true`, and the `else` condition contains restrictions to apply when the `if` condition is `false`. The `then` property is required but using an `else` property is optional. 
- -| Property | Required | Default | Type | Description | Example | -| -------- | -------- | ----------------------------- | --------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | -| `if` | Required | | `RequirementsConditions` | Contains the conditional cases that will be checked before applying this object's restrictions. This object contains a list of `conditions` and a `case` that indicates how many of the conditions need to be found `true` for the entire conditions block to be considered `true`. The case options are `any`, `all`, and `none`, with `all` being default (if case is not provided). | `{ "conditions": [ { "field": "another_field", "match": { "value": "Some Value" }} ], "case": "all" }` | -| `then` | Required | | `RestrictionsObject` or `Array` | The restriction rules to apply when the `if` condition is found to be `true`. | `{ "required": true}` | -| `else` | Optional | Empty object, no restrictions | `RestrictionsObject` or `Array` | The restriction rules to apply when the `if` condition is found to be `false`. 
| `{ "empty": true}` | - -```json -{ - "if": { - "conditions": [ /* Restriction conditions */ ], - "case": "all" - }, - "then": {/* Restrictons */} OR [ /* Restrictions objects (restriction values or nested conditional restrictions */ ], - "else": {/* Restrictons */} OR [ /* Restrictions objects (restriction values or nested conditional restrictions */ ] -} -``` - -##### Conditions Structure - -A requirement condition is defined by providing a field name or list of field names from this schema, and the matching rules that satisfy this condition. If multiple field names are provided, a `case` property can be added to specify how many of their values must pass the matching rules (`all`, `any`, or `none` of them). - -| Property | Required | Default | Type | Description | Example | -| ---------------- | -------- | ------- | -------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- | -| `fields` | Required | | `Array` | Names of fields from the same schema. This match rule will be applied to all fields listed - see `case` to determine the rules for how many of these fields must match. All specified fields must store values of the same type. | `["some_field"]` | -| `match` | Required | | `MatchRules` object | Matching rules for the values of the `fields`. All rules included in this object will be tested and all must be pass - this is not affected by the `case` property. [Conditional Match Rules](#conditional-match-rules) | `{ "value": "Hello World" }` | -| `arrayFieldCase` | Optional | `all` | `all`, `any`, `none` | When a specified field is an array type, the `arrayFieldCase` dictates how many of the values in the array must pass the matching rules. 
`all` requires all values in the array to pass the matching rule. `any` requires at least one value in the array to match. `non` requires that none of the values in the array match. | `any` | -| `case` | Optional | `all` | `all`, `any`, `none` | Defines how many of the listed `fields` must have a value that matches the `match` rules. `all` requires all fields values to have matching values. `any` requires at least one field to have a matching value. `none` requires that there none of the specified fields have values that match. | `any` | - -> **Example Conditional Restriction**: match single value -> -> Condition where `shirt_size` is `Small` -> -> ```json -> { -> "fields": ["shirt_size"], -> "match": { -> "value": "Small" -> } -> } -> ``` - -> **Example Conditional Restriction**: match value from list -> -> Condition where `shirt_size` is any value in a list (`Medium` or `Large`) -> -> ```json -> { -> "fields": ["shirt_size"], -> "match": { -> "codeList": ["Medium", "Large`"] -> } -> } -> ``` - -##### Conditional Match Rules - -| Property | Used with Field Types | Type | Description | Example | -| ---------- | --------------------- | -------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | -| `codeList` | all | Array of type of specified fields | A list of values that the field could match. This rule passes when the specified field's value can be found in this list. 
| `["value_one", "value_two"]` | -| `count` | Array type fields | Integer, or [RangeRule](#rangerule-data-structure) | Matches the number of values in an array field. This condition can be provided as a number, in which case this condition matches if the array is that exact length. This condition can be provided as a Range object as well, in which case it will match if the number of elements in the array pass the minimum and maximum conditions provided in the condition. | `2` - Field must have exactly 2 elements. `{ max: 10 }` - Field must have no more than 10 items. | -| `exists` | all | Boolean | This condition requires a field to either have a value or have no value. When the `exists` condition is set to `true`, the field must have a value. When `exists` is sdet to `false`, the field must have no value. For array fields, `exists=false` only matches when the array is completely empty, and `exists=true` passes if the array has 1 or more values - `arrayCase` has no interaction with the `exists` condition. | `true` | -| `range` | `number`, `integer` | [RangeRule](#rangerule-data-structure) | Maximum and minimum value conditions that a numeric field must pass. | `{ min: 5, exclusiveMax: 10 }` Represents an integer from 5-9. | -| `regex` | `string` | String (Regular Expression) | A regular expression pattern that the value must match. | `^NCIT:C\d+$` Value must match an NCI Thesaurus ID. | -| `value` | all | Type of specified fields | Field value matches the value of the specified field. Strings are matched case insensitive. When arrays are matched, the order of their elements is ignored - a field matches this condition if the elements in field are the same elements as in the value match rule. For example, the rule `['abc', 'def']` matches the value `['def', 'abc']` but does not match `['abc', 'def', 'ghi']`. 
| `some_value`, `[1, 2, 3]` | - -### Meta Data Structure - -> **Meta Example** -> -> ```json -> { -> "displayName": "Nicely Formatted Name", -> "externalReferenceId": "ABCD:1234", -> "exampleBooleanPropery": true, -> "exampleNumericProperty": 123 -> } -> ``` - -A `meta` object is available to allow the dictionary creator to add custom properties to the Lectern Dictionary. The `meta` property is available to all Dictionary, Schema, and Field objects. Providing a `meta` value is optional. If provided the `meta` value is a JSON object. There are no restrictions on the field names that can be added to the `meta` object other than they must be valid JSON. The values for properties of the `meta` can either be another nested meta object, or are one of the allowed value types: - -- `string` -- `number` -- `boolean` -- `Array` -- `Array` - -### References Structure - -References are defined at the dictionary level so they can be reused across schemas. References can be used to store values that can be used in `meta` or `restrictions` - -#### Using References - -Reference variables can be used in a `meta` object or a `restrictions` object as either a restriction value or a conditional match value. - -To use a reference, replace the value in the value of the meta or restriction property with a string containing a `ReferenceTag`. A `ReferenceTags` - -### RangeRule Data Structure - -> **RangeRule Example** -> -> ```json -> { -> "min": 5, -> "exclusiveMax": 10 -> } -> ``` - -`RangeRule` objects are used to define restrictions and conditions where a numeric minimum or maximum needs to be defined. This object must define at least 1 property (ie. could define a minimum but not maximum, or vice-versa). - -There is an inclusive and an exclusive version of the minimum and maximum properties. `min` and `max` are _inclusive_, and the alternate form `exclusiveMin` and `exclusiveMax` are _exclusive_. 
By example, `{ "min":5 }` allows the value `5` and greater, while `{ "exclusiveMin": 5 }` allows only values greater than `5` but not `5` itself. - -A `RangeRule` cannot include but an inclusive and exclusive version of min, or of max (ie. it cannot have `min` and `exclusiveMin`.) - -| Property | Description | -| -------------- | :---------------------------------------------------------------- | -| `exclusiveMax` | Allows values less than this value, but not this value itself. | -| `exclusiveMin` | Allows values greater than this value, but not this value itself. | -| `max` | Allows this value and values lesser than this value. | -| `min` | Allows this value and values greater than this value. | - -### ComparedFieldsRule Data Structure - -> **ComparedFieldsRule** Example -> -> ```json -> { -> "fields": "some_field", -> "relation": "equal" -> } -> ``` - -| Property | Required | Default | Type | Description | -| ---------- | -------- | -------------------- | ---------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `fields` | Required | | `string` or `Array` | The field(s) that the values of will be compared to. These fields will be refered to throughout this section as _compared to_ fields. All these fields need to be the same type as the field(s) they will be compared to. | -| `relation` | Required | | `equal`, `notEqual`, `contains`, `containedIn`, `greaterThan`, `greaterThanOrEqual`, `lesserThan`, `lesserThanOrEqual` | The relation between the values of the test field and the compared to fields. See [ComparedFieldsRule Relations](#comparedfieldsrule-relations). 
| -| `case` | Optional | `all`, `any`, `none` | MatchCase (RangeRule or one of: `all`, `any`, `none`) | How many of the _compared to_ fields must pass the comparison for this rule to pass. | - -#### ComparedFieldsRule Relations - -| Relation Value | Allowable Field Types | Description | -| ------------------------ | --------------------- | :--------------------------------------------------------------------------------------------------------- | -| **`equal`**: | all | Checks that the current field and the comapred field(s) have the same value | -| **`notEqual`**: | all | Checks that the current field and the comapred field(s) do not have the same value | -| **`contains`** | `string` | Checks that the value of the current field completely contains the value of the compared field(s) | -| **`containedIn`** | `string` | Checks that the value of the current field is completely contained in the value of the compared field(s) | -| **`greaterThan`** | `number`, `integer` | Checks that the value of the current field is greater than (exclusive) the value of the compared field(s). | -| **`greaterThanOrEqual`** | `number`, `integer` | Checks that the value of the current field is greater than or equal to the value of the compared field(s). | -| **`lesserThan`** | `number`, `integer` | Checks that the value of the current field is lesser than (exclusive) the value of the compared field(s). | -| **`lesserThanOrEqual`** | `number`, `integer` | Checks that the value of the current field is lesser than or equal to the value of the compared field(s). | - -## Source Code Reference - -Source code for the Lectern Dictionary meta-schema is made available through the package [@overture-stack/lectern-dictionary](../packages/dictionary/). The meta-schema is formally defined in TypeScript and exported as the type `Dictionary` from the file [`dictionary/src/types/dictionaryTypes.ts`](../packages/dictionary/src/types/dictionaryTypes.ts). 
This definition is created using [`Zod`] schemas, which are also exported from this package and available for use to confirm a given object is a valid Lectern Dictionary. From 1921eb282fb23db8f21389a84353c99c5459e657 Mon Sep 17 00:00:00 2001 From: Mitchell Shiell Date: Fri, 4 Jul 2025 14:06:19 -0400 Subject: [PATCH 4/5] dictionaryReference.md --- docs/03-dictionaryReference.md | 914 ++++++++++++------ .../04-glossary.md => pendingDocs/glossary.md | 0 2 files changed, 643 insertions(+), 271 deletions(-) rename docs/04-glossary.md => pendingDocs/glossary.md (100%) diff --git a/docs/03-dictionaryReference.md b/docs/03-dictionaryReference.md index f2800040..e700d6aa 100644 --- a/docs/03-dictionaryReference.md +++ b/docs/03-dictionaryReference.md @@ -1,12 +1,12 @@ -# Dictionary Syntax +# Building Dictionaries A Lectern Dictionary is a JSON configuration that defines the structure and validation rules for tabular data files. It consists of schemas that describe individual file formats, with each schema containing field definitions and validation constraints. -## Basic Dictionary Structure +## Dictionary Structure -A Lectern Dictionary is a JSON configuration file that defines the structure and validation rules for your data files. At its core, every dictionary must contain three essential components: a **name**, **version**, and at least one **schema**. Additional optional components like descriptions, metadata, and references can enhance functionality. +A Lectern Dictionary is a JSON configuration file that defines the structure and validation rules for your data files. At its core, every dictionary must contain three essential components: a **name**, **version**, and **schemas**. 
-```json showLineNumbers {2-3,9-26} +```json showLineNumbers {2-3,9-11} { "name": "clinical_data_dictionary", "version": "1.2.0", @@ -16,47 +16,34 @@ A Lectern Dictionary is a JSON configuration file that defines the structure and "created": "2024-01-15" }, "schemas": [ - { - "name": "patient", - "description": "Patient demographic and clinical information", - "fields": [ - { - "name": "patient_id", - "valueType": "string", - "restrictions": { - "required": true, - "regex": "^PAT-\\d{6}$" - }, - "unique": true, - "description": "Unique patient identifier in format PAT-XXXXXX" - } - ] - } + { ... } ], "references": { - "customRegex": { - "dateFormat": "^\\d{4}-\\d{2}-\\d{2}$" + "regex": { + "patient_id_format": "^PAT-\\d{6}$" } } } ``` -### Basic Dictionary Properties +Optional components, such as a description, metadata, and references, are also described in the table below. + +#### Dictionary Properties | Property | Type | Required | Description | Example | | ------------- | --------------- | -------- | -------------------------------------- | ----------------------------------------------- | | `name` | `string` | ✓ | Display name of the dictionary | `"clinical_data_dictionary"` | | `version` | `string` | ✓ | Semantic version (major.minor.patch) | `"1.2.0"` | -| `schemas` | `Array` | ✓ | List of schema definitions (minimum 1) | See [Schema Structure](#basic-schema-structure) | +| `schemas` | `Array` | ✓ | List of schema definitions (minimum 1) | See [Schema Structure](#schema-structure) | | `description` | `string` | ✗ | A human-readable description | `"Clinical trial data schemas"` | -| `meta` | `object` | ✗ | Custom metadata fields | `{"author": "Clinical Data Team"}` | +| `meta` | `object` | ✗ | Any custom-defined metadata fields | `{"author": "Clinical Data Team"}` | | `references` | `object` | ✗ | Reusable reference values | See [References](#references) | -## Basic Schema Structure +## Schema Structure -Each schema defines the structure of a
single tabular data file. Every dictionary must have a **name** and **fields** array. +Each **schema** defines the structure of a single tabular data file. Every schema must have a **name** and a **fields** array. -```json showLineNumbers {11,13-26} +```json showLineNumbers {9-24} { "name": "clinical_data_dictionary", "version": "1.2.0", @@ -67,44 +54,39 @@ Each schema defines the structure of a single tabular data file. Every dictionar }, "schemas": [ { - "name": "patient", + "name": "patientSchema", "description": "Patient demographic and clinical information", "fields": [ - { - "name": "patient_id", - "valueType": "string", - "restrictions": { - "required": true, - "regex": "^PAT-\\d{6}$" - }, - "unique": true, - "description": "Unique patient identifier in format PAT-XXXXXX" - } - ] - } - ], - "references": { - "customRegex": { - "dateFormat": "^\\d{4}-\\d{2}-\\d{2}$" + { ... } + ] + }, + { + "name": "sampleSchema", + "description": "Data derived from patient samples", + "fields": [ + { ... } + ] } - } + ] } ``` -### Schema Properties +At the schema level, a description and metadata can also optionally be added. + +#### Schema Properties | Property | Type | Required | Description | | ------------- | -------------- | -------- | ------------------------------------- | -| `name` | `string` | ✓ | Schema identifier (no spaces or dots) | -| `fields` | `Array` | ✓ | List of field definitions | -| `description` | `string` | ✗ | Human-readable description | -| `meta` | `object` | ✗ | Custom metadata | +| `name` | `string` | ✓ | The schema identifier (no spaces or dots) | +| `fields` | `Array` | ✓ | List of field definitions; see [Field Structure](#field-structure) | +| `description` | `string` | ✗ | A human-readable description | +| `meta` | `object` | ✗ | Any custom-defined metadata fields | -## Field Definitions +## Field Structure -Fields define the individual columns in your data files, at minimum a field object must have a **name** and **valueType**.
+**Fields** define the individual columns in your data files. At minimum, a field object must have a **name** and a **valueType**. -```json showLineNumbers {15,17,25,27} +```json showLineNumbers {13-32} { "name": "clinical_data_dictionary", "version": "1.2.0", @@ -122,50 +104,30 @@ Fields define the individual columns in your data files, at minimum a field obje "name": "patient_id", "description": "Unique patient identifier in format PAT-XXXXXX", "valueType": "string", - "restrictions": { - "required": true, - "regex": "^PAT-\\d{6}$" - }, - "unique": true + "restrictions": { ... } + }, + { + "name": "field2", + "valueType": "boolean" }, { - "name": "diagnosis_date", - "description": "Date of initial diagnosis in YYYY-MM-DD format", - "valueType": "string", - "meta": { - "displayName": "Diagnosis Date", - "category": "clinical" - }, - "restrictions": { - "required": true, - "regex": "dateFormat" - } - } + "name": "field3", + "valueType": "integer" + }, + { + "name": "field4", + "valueType": "number" + } ] } ], - "references": { - "customRegex": { - "dateFormat": "^\\d{4}-\\d{2}-\\d{2}$" - } - } } ``` -### Field Properties +#### Field Value Types -| Property | Type | Required | Default | Description | -| -------------- | -------------- | -------- | ------- | --------------------------------------------------------- | -| `name` | `string` | ✓ | - | Field identifier (used as a column header) | -| `description` | `string` | ✗ | `""` | Human-readable description | -| `valueType` | `string` | ✓ | - | Data type: `string`, `integer`, `number`, `boolean` | -| `isArray` | `boolean` | ✗ | `false` | Whether field accepts multiple values | -| `delimiter` | `string` | ✗ | `","` | Separator for array values | -| `unique` | `boolean` | ✗ | `false` | Whether values must be unique across records | -| `restrictions` | `object/array` | ✗ | `{}` | Where the validation rules/logic for the field is defined | -| `meta` | `object` | ✗ | `{}` | Custom metadata | +The allowed values for
`valueType` include: -### Field Data Types | Type | Description | Valid Examples | Invalid Examples | | --------- | --------------------------------------------------- | -------------------------------- | ---------------------- | @@ -174,57 +136,80 @@ Fields define the individual columns in your data files, at minimum a field obje | `number` | Any numeric value | `42`, `3.14`, `-17.5`, `0` | `"abc"`, `"N/A"` | | `boolean` | True/false (case-insensitive) | `true`, `True`, `FALSE`, `false` | `yes`, `1`, `0`, `Y` | -### Field Restrictions -Field restrictions define validation rules that field values must satisfy to be considered valid. These rules ensure data integrity by enforcing specific constraints on field content. +#### Field Properties + +Each field supports the following properties: + +| Property | Type | Required | Default | Description | +| -------------- | -------------- | -------- | ------- | --------------------------------------------------------- | +| `name` | `string` | ✓ | - | Field identifier (used as a column header) | +| `description` | `string` | ✗ | `""` | Human-readable description | +| `valueType` | `string` | ✓ | - | Data type: `string`, `integer`, `number`, `boolean` | +| `isArray` | `boolean` | ✗ | `false` | Whether field accepts multiple values | +| `delimiter` | `string` | ✗ | `","` | Separator for array values | +| `unique` | `boolean` | ✗ | `false` | Whether values must be unique across records | +| `restrictions` | `object/array` | ✗ | `{}` | Where the validation rules/logic for the field is defined, see [Field Restrictions](#field-restrictions) | +| `meta` | `object` | ✗ | `{}` | Any custom defined metadata fields | -The `restrictions` property accepts either: +## Field Restrictions -- A single restrictions object -- An array of restrictions objects +Field restrictions define the rules that field values must satisfy to be considered valid.
Here's an example of a field that defines and validates the age of a patient: + +```json +{ + "name": "patient_age", + "valueType": "integer", + "restrictions": { + "required": true, + "range": { + "min": 0, + "max": 150 + } + } +} +``` + +:::note **What this means:** +Age must be provided and must be between 0 and 150 (inclusive). +::: -When multiple restrictions are provided in an array, they are evaluated sequentially. Data is only considered valid if it passes every restriction in the array. Each restrictions object can contain: +Each restrictions object can contain: - **Standard Restrictions** (detailed in the sections below) -- **Conditional restrictions** that apply validation logic based on specific conditions - -## Standard Restrictions +- **Conditional restrictions** that apply validation logic based on specific [`conditions`](#conditions) ### `required` Ensures a field has a value. ```json showLineNumbers -{ - "name": "patient_id", - "valueType": "string", - "restrictions": { "required": true } -} + { + "name": "patient_id", + "valueType": "string", + "restrictions": { "required": true } + } ``` -**Validation behavior:** - -- Empty strings `""` are rejected -- Zero `0` is accepted for numbers -- `false` is accepted for booleans -- Arrays must contain at least one element - ### `codeList` Restricts values to a predefined list of acceptable options. ```json showLineNumbers { - "name": "gender", + "name": "treatment_response", "valueType": "string", "restrictions": { - "codeList": ["Male", "Female", "Other", "Unknown"] + "codeList": ["Complete Response", "Partial Response", "Stable Disease", "Progressive Disease"] } } ``` +:::note **What this means:** +The `treatment_response` field accepts only values that exactly match one of the four options in the list. Any other value is rejected. +::: + ### `range` Sets numeric boundaries for `integer` and `number` fields. @@ -245,10 +230,19 @@ Sets numeric boundaries for `integer` and `number` fields.
- `exclusiveMin` / `exclusiveMax` - Exclusive boundaries ```json showLineNumbers -// Age must be 18 or older, but less than 65 -{ "range": { "min": 18, "exclusiveMax": 65 } } +{ + "name": "adult_age", + "valueType": "integer", + "restrictions": { + "range": { "min": 18, "exclusiveMax": 65 } + } +} ``` +:::note **What this means:** +Age must be 18 or older (inclusive) but less than 65 (exclusive), so valid values are 18 to 64. +::: + ### `regex` Applies pattern matching validation to string fields. @@ -263,23 +257,36 @@ Applies pattern matching validation to string fields. } ``` -### `empty` - -Requires a field to be empty. This is typically used within conditional restrictions. +For readability, we recommend explaining a regex restriction in plain language in the field's `description` and listing sample values under an `examples` key in the field's `meta` property. ```json showLineNumbers { - "restrictions": { "empty": true } + "name": "email", + "valueType": "string", + "description": "Contact email address for patient communication and records. Valid email format: username@domain.extension (minimum 2-letter extension). Accepts letters, numbers, dots, hyphens, and underscores in username and domain.", + "restrictions": { + "required": true, + "regex": "^[\\w\\.-]+@[\\w\\.-]+\\.[a-zA-Z]{2,}$" + }, + "meta": { + "examples": [ + "patient@example.com", + "john.doe@hospital.org", + "contact123@healthcare.gov" + ] + } } ``` -## Array-Specific Restrictions +:::note **What this means:** +An email address must be provided and must follow the standard format: a username, an @ symbol, a domain name, and an extension of at least two letters. The examples show acceptable values, while the description explains which characters are allowed. +::: ### `count` -Controls the number of elements allowed in array fields. +`count` is an **array-specific** restriction that controls the number of elements allowed in array fields.
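Because `count` accepts the same boundary options as `range` (`min`, `max`, `exclusiveMin`, `exclusiveMax`), a single bounds check can serve both: `range` tests the value itself, while `count` tests the number of array elements. The TypeScript below is a minimal sketch of that reading, not Lectern's own validator; `Bounds`, `withinBounds`, `checkRange`, and `checkCount` are hypothetical names.

```typescript
// Hypothetical shape of a `range` or `count` restriction (mirrors the documented keys).
type Bounds = {
  min?: number;
  max?: number;
  exclusiveMin?: number;
  exclusiveMax?: number;
};

// Returns true when `value` satisfies every boundary that is present.
function withinBounds(value: number, bounds: Bounds): boolean {
  if (bounds.min !== undefined && value < bounds.min) return false; // inclusive lower bound
  if (bounds.max !== undefined && value > bounds.max) return false; // inclusive upper bound
  if (bounds.exclusiveMin !== undefined && value <= bounds.exclusiveMin) return false;
  if (bounds.exclusiveMax !== undefined && value >= bounds.exclusiveMax) return false;
  return true;
}

// `range` checks the numeric value itself.
function checkRange(value: number, range: Bounds): boolean {
  return withinBounds(value, range);
}

// `count` checks how many elements an array field holds.
function checkCount(values: unknown[], count: Bounds): boolean {
  return withinBounds(values.length, count);
}

// The adult_age restriction: { "min": 18, "exclusiveMax": 65 }
const adultAge: Bounds = { min: 18, exclusiveMax: 65 };
console.log(checkRange(18, adultAge)); // true
console.log(checkRange(65, adultAge)); // false

// A medications count that allows 1 to 10 entries
const medicationCount: Bounds = { min: 1, max: 10 };
console.log(checkCount([], medicationCount)); // false
console.log(checkCount(["aspirin"], medicationCount)); // true
```

Keeping the boundary logic in one helper means `range` and `count` cannot drift apart in how they treat inclusive and exclusive limits.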
-```json showLineNumbers +```json showLineNumbers {4,6} { "name": "medications", "valueType": "string", @@ -292,35 +299,55 @@ Controls the number of elements allowed in array fields. Uses the same boundary options as `range`: `min`, `max`, `exclusiveMin`, `exclusiveMax`. -## Field Comparison Restrictions +:::note **What this means:** +The medications array must contain between 1 and 10 medication entries. Empty arrays or arrays with more than 10 items will be rejected. +::: + -### `compare` +### `empty` -Compares field values with other fields in the same record. +Requires a field to have no value. ```json showLineNumbers { - "name": "age_at_death", - "valueType": "integer", - "restrictions": { - "compare": { - "fields": ["age_at_diagnosis"], - "relation": "greaterThanOrEqual" - } - } + "name": "date_of_death", + "valueType": "string", + "restrictions": { "empty": true } } ``` -**Available comparison relations:** +:::note **What this means:** +The field must have no value - only empty strings, null, or undefined values are accepted. Any actual content will be rejected. +::: -- `equal` / `notEqual` - Value equality comparison -- `greaterThan` / `greaterThanOrEqual` - Numeric comparison -- `lessThan` / `lessThanOrEqual` - Numeric comparison -- `contains` / `containedIn` - String containment comparison +
+**Here is a more practical example using conditional logic** -## Conditional Restrictions +The `empty` restriction becomes particularly useful when combined with [conditional restrictions](#conditions) to create logical data rules: -Conditional restrictions allow you to apply different validation rules based on values in other fields within the same record. This enables dynamic validation where the requirements for one field change depending on the data in other fields. + ```json showLineNumbers + { + "name": "date_of_death", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "alive" } } + ] + }, + "then": { "empty": true }, + "else": { "required": true } + } + } + ``` + +**What this means:** If the patient status is "alive", then the date of death field must be empty. If the patient status is anything other than "alive", then the date of death field is required. + +
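The `if`/`then`/`else` flow of the `date_of_death` example can be modelled in a few lines of TypeScript. This is a simplified sketch, not Lectern's validator: it only understands `match.value` conditions and the `required`/`empty` rules, it treats a missing `case` as `"all"` (an assumption of this sketch), and the type and function names are hypothetical.

```typescript
// Hypothetical, simplified types; real Lectern restrictions support many more options.
type DataRecord = Record<string, string | undefined>;

type Condition = { fields: string[]; match: { value: string } };

type Rules = { required?: boolean; empty?: boolean };

type ConditionalRestriction = {
  if: { conditions: Condition[]; case?: "all" | "any" | "none" };
  then: Rules;
  else?: Rules;
};

// In this sketch a condition holds when every listed field equals match.value.
function conditionHolds(record: DataRecord, condition: Condition): boolean {
  return condition.fields.every((field) => record[field] === condition.match.value);
}

// Chooses the `then` or `else` rules according to the restriction's `case`.
function resolveRules(record: DataRecord, restriction: ConditionalRestriction): Rules {
  const results = restriction.if.conditions.map((c) => conditionHolds(record, c));
  const mode = restriction.if.case ?? "all"; // assumption: "all" when case is omitted
  let passed: boolean;
  if (mode === "all") passed = results.every(Boolean);
  else if (mode === "any") passed = results.some(Boolean);
  else passed = !results.some(Boolean); // "none"
  return passed ? restriction.then : (restriction.else ?? {});
}

// Applies the `required` and `empty` rules to a single value.
function validateField(value: string | undefined, rules: Rules): boolean {
  const isEmpty = value === undefined || value === "";
  if (rules.required && isEmpty) return false;
  if (rules.empty && !isEmpty) return false;
  return true;
}

// The date_of_death restriction from the example above.
const dateOfDeathRule: ConditionalRestriction = {
  if: { conditions: [{ fields: ["patient_status"], match: { value: "alive" } }] },
  then: { empty: true },
  else: { required: true },
};

const alive: DataRecord = { patient_status: "alive", date_of_death: "" };
const deceased: DataRecord = { patient_status: "deceased", date_of_death: "" };

console.log(validateField(alive.date_of_death, resolveRules(alive, dateOfDeathRule))); // true
console.log(validateField(deceased.date_of_death, resolveRules(deceased, dateOfDeathRule))); // false
```

Separating "which rules apply" (`resolveRules`) from "does the value satisfy them" (`validateField`) mirrors how the `then` and `else` branches reuse the ordinary restrictions.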
+ +### `conditions` + +Conditional restrictions allow you to apply different validation rules based on the values in other fields. This enables more complex validation rules where the requirements for one field change depending on the data in other fields. Think of conditional restrictions as "if-then-else" logic for your data validation: @@ -328,15 +355,14 @@ Think of conditional restrictions as "if-then-else" logic for your data validati - **THEN** apply these validation rules - **ELSE** apply different validation rules (optional) -### Basic Structure +#### Basic Structure ```json showLineNumbers -{ +"restrictions": { "if": { "conditions": [ /* conditions to check */ ], - "case": "all" // how many conditions must be true }, "then": { /* validation rules when conditions are true */ @@ -347,9 +373,31 @@ Think of conditional restrictions as "if-then-else" logic for your data validati } ``` -### Simple Example +**Match Criteria** -Let's start with a straightforward example: +The `match` object defines what you're looking for: + +```json showLineNumbers +// Exact value +{ "match": { "value": "Treatment_A" } } + +// Value from list +{ "match": { "codeList": ["Stage_III", "Stage_IV"] } } + +// Numeric range +{ "match": { "range": { "min": 18, "max": 65 } } } + +// Pattern matching +{ "match": { "regex": "^PAT-\\d{6}$" } } + +// Field has any value +{ "match": { "exists": true } } + +// Array length +{ "match": { "count": { "min": 1 } } } +``` + +#### Simple Example ```json showLineNumbers { @@ -367,13 +415,11 @@ Let's start with a straightforward example: } ``` -**What this means:** - -- **IF** the `patient_status` field equals "deceased" -- **THEN** the `date_of_death` field is required -- **ELSE** the `date_of_death` field must be empty +:::note **What this means:** +If the `patient_status` field equals "deceased", then the `date_of_death` field is required. Otherwise, the `date_of_death` field must be empty. 
+::: -### Multiple Conditions +#### Case Logic When you need to check multiple conditions, use the `case` property to specify how many must be true: @@ -394,12 +440,11 @@ When you need to check multiple conditions, use the `case` property to specify h } ``` -**What this means:** - -- **IF** `patient_status` equals "active" **AND** `enrollment_date` has a value -- **THEN** `treatment_details` is required +:::note **What this means:** +If `patient_status` equals "active" AND `enrollment_date` has a value, then `treatment_details` is required. +::: -### Case Options +**Case Options:** | Case Value | Description | Example | | ----------------- | ----------------------------------- | ------------------------------------------------ | @@ -407,198 +452,525 @@ When you need to check multiple conditions, use the `case` property to specify h | `"any"` | At least one condition must be true | Patient is active OR has enrollment date | | `"none"` | No conditions can be true | Patient is NOT active AND has NO enrollment date | -## Building Conditions -Each condition has three parts: +**Working with Arrays** -1. **`fields`** - Which fields to check -2. **`match`** - What to look for in those fields -3. **`case`** - How many fields must match (when checking multiple fields) +When checking array fields, use `arrayFieldCase` to specify how many array elements must match. 
Here's a practical example using a medications field: -### Basic Condition Structure - -```json showLineNumbers {6-9} +```json showLineNumbers {10} { - "name": "treatment_details", - "valueType": "string", + "name": "follow_up_required", + "valueType": "boolean", "restrictions": { "if": { "conditions": [ - { "fields": ["patient_status"], "match": { "value": "active" } }, - { "fields": ["enrollment_date"], "match": { "exists": true } } - ], - "case": "all" + { + "fields": ["current_medications"], + "match": { "codeList": ["chemotherapy", "immunotherapy", "targeted_therapy"] }, + "arrayFieldCase": "any" + } + ] }, "then": { "required": true } } } ``` -The basic condition structure highlighted above is as follows: +:::note **What this means:** +If the patient is taking ANY cancer treatment medication (chemotherapy, immunotherapy, or targeted therapy) in their medications array, then follow-up is required. +::: + +**Array Field Case Options:** -```json showLineNumbers -"conditions": [ - { - "fields": ["field_name"], - "match": { - "value": "specific_value" - } - } -], -``` +| Value | Description | Example Use Case | +| -------- | ------------------------------------ | ---------------- | +| `"all"` | All elements in the array must match | All medications must be from an approved list | +| `"any"` | At least one element must match | Patient has at least one high-risk medication | +| `"none"` | No elements can match | Patient cannot have any contraindicated drugs | -### Checking Multiple Fields +**Additional Array Examples:** ```json showLineNumbers +// Example: All medications must be FDA approved { - "fields": ["field1", "field2", "field3"], - "match": { "value": "active" }, - "case": "any" // at least one field must equal "active" + "fields": ["medications"], + "match": { "regex": "^FDA-\\d+" }, + "arrayFieldCase": "all" } -``` - -### Match Criteria - -The `match` object defines what you're looking for. 
Here are the available options: -### Exact Value Match - -```json showLineNumbers +// Example: Patient cannot have any experimental drugs { - "fields": ["treatment_arm"], - "match": { "value": "Treatment_A" } + "fields": ["medications"], + "match": { "codeList": ["experimental_drug_A", "experimental_drug_B"] }, + "arrayFieldCase": "none" } ``` -### Value from List +#### Complex Example + +You can combine multiple conditions with different logic: ```json showLineNumbers { - "fields": ["disease_stage"], - "match": { "codeList": ["Stage_III", "Stage_IV"] } + "name": "follow_up_required", + "valueType": "boolean", + "restrictions": { + "if": { + "conditions": [ + { + "fields": ["treatment_response"], + "match": { "codeList": ["partial_response", "stable_disease"] } + }, + { + "fields": ["adverse_events"], + "match": { "count": { "min": 1 } } + } + ], + "case": "any" // either condition can trigger the requirement + }, + "then": { "required": true } + } } ``` -### Numeric Range +:::note **What this means:** +If the treatment response is "partial_response" OR "stable_disease", OR if there's at least one adverse event, then follow-up is required. +::: -```json showLineNumbers -{ - "fields": ["age"], - "match": { "range": { "min": 18, "max": 65 } } -} -``` +## Schema Restrictions -### Pattern Matching +In addition to field-level restrictions, Lectern Dictionaries support schema-level restrictions: `uniqueKey` enforces uniqueness across the records of a schema, and `foreignKey` establishes relationships between schemas. -```json showLineNumbers +### `uniqueKey` + +Primary keys identify unique records within a schema using the `uniqueKey` restriction applied at the schema level. They ensure that each record can be distinctly identified by one or more field values.
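Conceptually, a `uniqueKey` check builds a composite key from the listed fields for every record and reports any key that appears more than once. The TypeScript below is a minimal sketch of that idea, not Lectern's implementation; `findDuplicateKeys` is a hypothetical helper, and encoding the key tuple with `JSON.stringify` is a choice made for this sketch.

```typescript
// Hypothetical record shape: column name -> cell value.
type DataRecord = Record<string, string | number | undefined>;

// Returns each composite key value that occurs more than once.
function findDuplicateKeys(records: DataRecord[], uniqueKey: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const record of records) {
    // JSON-encode the tuple so ["a-b", "c"] and ["a", "b-c"] stay distinct.
    const key = JSON.stringify(uniqueKey.map((field) => record[field] ?? null));
    if (seen.has(key)) duplicates.add(key);
    else seen.add(key);
  }
  return Array.from(duplicates);
}

// Records for the compound key ["submitter_participant_id", "visit_number"]
const visits: DataRecord[] = [
  { submitter_participant_id: "P1", visit_number: 1 },
  { submitter_participant_id: "P1", visit_number: 2 },
  { submitter_participant_id: "P2", visit_number: 1 },
  { submitter_participant_id: "P1", visit_number: 2 }, // repeats the second record
];

console.log(findDuplicateKeys(visits, ["submitter_participant_id", "visit_number"]));
```

With the compound key, visit number 1 may repeat across participants; only the repeated pair for participant `P1` and visit 2 is reported.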
+ +#### Single Field Primary Key + +```json showLineNumbers {18-20} { - "fields": ["patient_id"], - "match": { "regex": "^PAT-\\d{6}$" } + "schemas": [ + { + "name": "participant", + "description": "The collection of all data related to a specific individual", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } + } + ] } ``` -### Field Has Value +:::note **What this means:** +Each `submitter_participant_id` value must be unique across all records in the participant schema - no two participants can have the same ID. +::: -```json showLineNumbers +#### Compound Primary Key + +For cases where uniqueness requires multiple field combinations: + +```json showLineNumbers {24-26} { - "fields": ["consent_date"], - "match": { "exists": true } + "schemas": [ + { + "name": "patient_visit", + "description": "Patient visits identified by participant and visit number", + "fields": [ + { + "name": "submitter_participant_id", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + }, + { + "name": "visit_number", + "valueType": "integer", + "restrictions": { + "required": true, + "range": { "min": 1 } + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id", "visit_number"] + } + } + ] } ``` -### Array Length +:::note **What this means:** +The combination of `submitter_participant_id` and `visit_number` must be unique. A participant can have multiple visits, and visit numbers can repeat across participants, but the same participant cannot have duplicate visit numbers. 
+::: -```json showLineNumbers +### `foreignKey` + +Foreign keys establish relationships between schemas by referencing primary keys in other schemas. They ensure referential integrity by validating that referenced records actually exist. + +#### Basic Foreign Key + +```json showLineNumbers {37-47} { - "fields": ["medications"], - "match": { "count": { "min": 1 } } + "schemas": [ + { + "name": "participant", + "description": "Study participants", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } + }, + { + "name": "sociodemographic", + "description": "Captures sociodemographic characteristics", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "foreignKey": [ + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } + } + ] } ``` -```json showLineNumbers +:::note **What this means:** +Every `submitter_participant_id` in the sociodemographic schema must match an existing `submitter_participant_id` in the participant schema. You cannot create sociodemographic records for participants that don't exist. 
+::: + + +#### Foreign Key Properties + +| Property | Type | Required | Description | +| ---------- | ----------------------- | -------- | --------------------------------------------- | +| `schema` | `string` | ✓ | Name of the referenced schema | +| `mappings` | `Array` | ✓ | Array of field mappings between schemas | + +#### Mapping Object Properties + +| Property | Type | Required | Description | +| --------- | -------- | -------- | ------------------------------------- | +| `local` | `string` | ✓ | Field name in the current schema | +| `foreign` | `string` | ✓ | Field name in the referenced schema | + +#### Validation Rules + +Foreign key validation enforces these requirements: + +- **Referenced schema** must exist in the same dictionary +- **Referenced fields** must be defined as a `uniqueKey` in the target schema +- **Foreign key values** must match existing primary key values in the referenced schema +- **Local fields** referenced in mappings must exist in the current schema + +#### Multiple Foreign Keys + +A schema can reference multiple other schemas to create complex relationships: + +
+**Click here to view the example schema** +```json showLineNumbers {18,45-56,93-113} { - "name": "sociodem_question_detail", - "valueType": "string", - "restrictions": { - "if": { - "conditions": [ + "schemas": [ + { + "name": "participant", + "description": "Study participants", + "fields": [ { - "fields": ["sociodem_question"], - "match": { - "codeList": ["PCGL reference question", "Another question"] + "name": "submitter_participant_id", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" }, - "case": "any" + "unique": true } - ] + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } }, - "then": { "required": true }, - "else": { - "required": false, - "empty": true + { + "name": "diagnosis", + "description": "Medical diagnoses for participants", + "fields": [ + { + "name": "submitter_diagnosis_id", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + }, + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_diagnosis_id"], + "foreignKey": [ + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } + }, + { + "name": "treatment", + "description": "Medications, procedures, other actions taken for clinical management", + "fields": [ + { + "name": "submitter_treatment_id", + "description": "Unique identifier of the treatment, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + }, + { + "name": "submitter_diagnosis_id", + "description": "Unique identifier of the primary diagnosis event, assigned by the 
data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + }, + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_treatment_id"], + "foreignKey": [ + { + "schema": "diagnosis", + "mappings": [ + { + "local": "submitter_diagnosis_id", + "foreign": "submitter_diagnosis_id" + } + ] + }, + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } } - } + ] } ``` -### Working with Arrays +
-When checking array fields, use `arrayFieldCase` to specify how many array elements must match: +:::note **What this means:** +In the example above each treatment record must reference both an existing diagnosis and an existing participant. This creates a hierarchical relationship: participant → diagnosis → treatment. +::: + +#### Complete Schema Relationship Example + +Here's how primary and foreign keys work together to create a complete data model: ```json showLineNumbers { - "name": "diabetes_medication", - "valueType": "string", - "restrictions": { - "if": { - "conditions": [ + "name": "clinical_dictionary", + "version": "1.0.0", + "schemas": [ + { + "name": "participant", + "description": "Study participants", + "fields": [ { - "fields": ["medical_history"], - "match": { "value": "diabetes" }, - "arrayFieldCase": "any" // any element in the array can match + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true } - ] + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } }, - "then": { "required": true } - } + { + "name": "diagnosis", + "description": "Medical diagnoses for participants", + "fields": [ + { + "name": "submitter_diagnosis_id", + "description": "Unique identifier of the primary diagnosis event", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + }, + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_diagnosis_id"], + "foreignKey": [ + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": 
"submitter_participant_id" + } + ] + } + ] + } + } + ] } ``` -### Array Field Case Options +:::note **What this relationship creates:** +- **Participants** have unique IDs (primary key) +- **Diagnoses** have unique IDs (primary key) +- **Each diagnosis** must belong to an existing participant (foreign key) +- **Result**: A one-to-many relationship where one participant can have multiple diagnoses +::: -| Value | Description | -| -------- | ------------------------------------ | -| `"all"` | All elements in the array must match | -| `"any"` | At least one element must match | -| `"none"` | No elements can match | +## References -### Complex Conditional Logic +The `references` section is a **dictionary-level** property for defining reusable values once and referencing them from any schema in the dictionary. This is particularly useful for common regular expressions, shared code lists, or other values that appear in multiple places across different schemas.
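Fields point at a reference using a `#/`-prefixed path into the `references` object, for example `#/regex/date` or `#/list/geo_loc_name_country`. The TypeScript below sketches how such a path could be resolved. It is not Lectern's resolver, and `References` and `resolveReference` are hypothetical names.

```typescript
// Hypothetical shape of the dictionary-level `references` object.
type ReferenceValue = string | string[] | References;
interface References {
  [key: string]: ReferenceValue;
}

// Resolves a "#/section/name" path; any other value is returned unchanged.
function resolveReference(value: string | string[], references: References): string | string[] {
  if (typeof value !== "string" || !value.startsWith("#/")) return value;
  let current: ReferenceValue = references;
  for (const segment of value.slice(2).split("/")) {
    // Stop if we hit a leaf too early or the next key does not exist.
    if (
      typeof current === "string" ||
      Array.isArray(current) ||
      !Object.prototype.hasOwnProperty.call(current, segment)
    ) {
      throw new Error(`Unresolved reference: ${value}`);
    }
    current = current[segment];
  }
  // A path must end at a string (e.g. a regex) or a list (e.g. a code list).
  if (typeof current !== "string" && !Array.isArray(current)) {
    throw new Error(`Reference does not point to a value: ${value}`);
  }
  return current;
}

const references: References = {
  regex: {
    BioProject_accession: "^PRJN[A-Z0-9]+$",
    date: "^\\d{4}-\\d{2}-\\d{2}$",
  },
  list: { geo_loc_name_country: ["Canada", "United States", "Mexico"] },
};

const datePattern = resolveReference("#/regex/date", references);
if (typeof datePattern === "string") {
  console.log(new RegExp(datePattern).test("2024-01-15")); // true
}
```

Failing loudly on an unresolved path, rather than treating it as a literal pattern, keeps a typo in a reference from silently accepting or rejecting data.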
-You can combine multiple conditions with different logic: -```json showLineNumbers +```json showLineNumbers {39-47} { - "name": "follow_up_required", - "valueType": "boolean", - "restrictions": { - "if": { - "conditions": [ + "name": "clinical_data_dictionary", + "version": "1.2.0", + "description": "Clinical trial data collection schemas", + "meta": { + "author": "Clinical Data Team", + "created": "2024-01-15" + }, + "schemas": [ + { + "name": "patient", + "fields": [ { - "fields": ["treatment_response"], - "match": { "codeList": ["partial_response", "stable_disease"] } + "name": "bioproject_accession", + "valueType": "string", + "restrictions": { + "regex": "#/regex/BioProject_accession" + } }, - { "fields": ["adverse_events"], "match": { "count": { "min": 1 } } } - ], - "case": "any" // either condition can trigger the requirement + { + "name": "diagnosis_date", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "#/regex/date" + } + }, + { + "name": "country", + "valueType": "string", + "restrictions": { + "required": true, + "codeList": "#/list/geo_loc_name_country" + } + } + ] + } + ], + "references": { + "regex": { + "BioProject_accession": "^PRJN[A-Z0-9]+$", + "date": "^\\d{4}-\\d{2}-\\d{2}$" }, - "then": { "required": true } + "list": { + "geo_loc_name_country": ["Canada", "United States", "Mexico", "..."] + } } } ``` -## Source Code Reference - -Source code for the Lectern Dictionary meta-schema is available through the package [@overture-stack/lectern-dictionary](../packages/dictionary/). The meta-schema is formally defined in TypeScript and exported as the type `Dictionary` from [`dictionary/src/types/dictionaryTypes.ts`](../packages/dictionary/src/types/dictionaryTypes.ts). This definition uses [Zod](https://zod.dev/) schemas, which are also exported for validation purposes. - -:::info Need Help? 
-If you encounter any issues or have questions about our API, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). -::: +:::info **Need Help?** +If you encounter any issues or have questions, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). +::: \ No newline at end of file diff --git a/docs/04-glossary.md b/pendingDocs/glossary.md similarity index 100% rename from docs/04-glossary.md rename to pendingDocs/glossary.md From e88019fc0ef9f702795821d78aa71420e0e1646d Mon Sep 17 00:00:00 2001 From: Mitchell Shiell Date: Tue, 8 Jul 2025 11:03:09 -0400 Subject: [PATCH 5/5] url + minor updates --- docs/02-Setup.md | 41 +++++---- docs/03-dictionaryReference.md | 163 +++++++++++++++++---------------- 2 files changed, 105 insertions(+), 99 deletions(-) diff --git a/docs/02-Setup.md b/docs/02-Setup.md index e4d63101..ed1fc8a8 100644 --- a/docs/02-Setup.md +++ b/docs/02-Setup.md @@ -39,16 +39,17 @@ docker run --name lectern-mongo \
Database Service Details - | Service | Port | Description | Purpose | - |---------|-------|---------------------------------------|----------------------------------------------| - | MongoDB | 27017 | NoSQL database for dictionary storage | Stores data dictionaries, versions, and metadata | +| Service | Port | Description | Purpose | +| ------- | ----- | ------------------------------------- | ------------------------------------------------ | +| MongoDB | 27017 | NoSQL database for dictionary storage | Stores data dictionaries, versions, and metadata | - **Important Notes:** - - Ensure port 27017 is available on your system - - Default credentials: `admin/password` - - Adjust port configuration if conflicts exist with other services +**Important Notes:** -
+- Ensure port 27017 is available on your system +- Default credentials: `admin/password` +- Adjust port configuration if conflicts exist with other services + + ### 2. Server Setup @@ -78,25 +79,25 @@ docker run --name lectern-mongo \ ```env # Express Configuration PORT=3000 - + # Swagger Documentation OPENAPI_PATH=/api-docs - + # MongoDB Configuration MONGO_HOST=localhost MONGO_PORT=27017 MONGO_DB=lectern MONGO_USER= MONGO_PASS= - + # Authentication (disabled by default) AUTH_ENABLED=false EGO_API= SCOPE= - + # CORS Configuration CORS_ALLOWED_ORIGINS= - + # Vault Configuration (disabled by default) VAULT_ENABLED=false VAULT_URL=http://localhost:8200 @@ -109,10 +110,12 @@ docker run --name lectern-mongo \ Environment Variables Reference **Express Configuration** + - `PORT`: Server port (default: 3000) - `OPENAPI_PATH`: Swagger UI path (default: /api-docs) **MongoDB Configuration** + - `MONGO_HOST`: Database hostname (default: localhost) - `MONGO_PORT`: Database port (default: 27017) - `MONGO_DB`: Database name (default: lectern) @@ -120,12 +123,14 @@ docker run --name lectern-mongo \ - `MONGO_PASS`: Database password (optional) **Authentication (Optional)** + - `AUTH_ENABLED`: Enable JWT-based authorization (default: false) - `EGO_API`: EGO API URL for JWT validation - `SCOPE`: Required policy name in JWT scope - `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed origins **Vault Integration (Optional)** + - `VAULT_ENABLED`: Enable HashiCorp Vault integration (default: false) - `VAULT_URL`: Vault server URL - `VAULT_SECRETS_PATH`: Path to secrets in Vault @@ -139,7 +144,7 @@ docker run --name lectern-mongo \ ```bash # From workspace root pnpm nx build @overture-stack/lectern-server - + # Or from apps/server directory pnpm build ``` @@ -149,7 +154,7 @@ docker run --name lectern-mongo \ ```bash # Production mode pnpm nx start @overture-stack/lectern-server - + # Development mode with hot reloading pnpm nx debug server ``` @@ -170,6 +175,7 @@ curl 
http://localhost:3000/health ### API Documentation Access the interactive API documentation at: + - **Swagger UI**: `http://localhost:3000/api-docs` ### Dictionary Management Testing @@ -179,11 +185,11 @@ Access the interactive API documentation at: 3. Verify dictionary creation, retrieval, and management operations **Troubleshooting:** + - Ensure MongoDB is running and accessible - Check server logs for validation errors - Verify API endpoints are responding correctly - :::info Need Help? If you encounter any issues or have questions about our API, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). ::: @@ -241,7 +247,6 @@ pnpm debug docker build --no-cache -t lectern -f apps/server/Dockerfile . ``` - :::warning This guide is intended for development purposes only. For production deployments, implement appropriate security measures, configure authentication, and review all environment variables for your specific use case. -::: \ No newline at end of file +::: diff --git a/docs/03-dictionaryReference.md b/docs/03-dictionaryReference.md index e700d6aa..3bf40fb2 100644 --- a/docs/03-dictionaryReference.md +++ b/docs/03-dictionaryReference.md @@ -4,7 +4,7 @@ A Lectern Dictionary is a JSON configuration that defines the structure and vali ## Dictionary Structure -A Lectern Dictionary is a JSON configuration file that defines the structure and validation rules for your data files. At its core, every dictionary must contain three essential components: a **name**, **version**, and **schemas**. +A Lectern Dictionary is a JSON configuration file that defines the structure and validation rules for your data files. At its core, every dictionary must contain three essential components: a **name**, **version**, and **schemas**. 
```json showLineNumbers {2-3,9-11} { @@ -30,14 +30,14 @@ Optional components like descriptions, metadata, and references which can also b #### Dictionary Properties -| Property | Type | Required | Description | Example | -| ------------- | --------------- | -------- | -------------------------------------- | ----------------------------------------------- | -| `name` | `string` | ✓ | Display name of the dictionary | `"clinical_data_dictionary"` | -| `version` | `string` | ✓ | Semantic version (major.minor.patch) | `"1.2.0"` | -| `schemas` | `Array` | ✓ | List of schema definitions (minimum 1) | See [Schema Structure](#schema-structure) | -| `description` | `string` | ✗ | A human-readable description | `"Clinical trial data schemas"` | -| `meta` | `object` | ✗ | Any custom defined metadata fields | `{"author": "Clinical Data Team"}` | -| `references` | `object` | ✗ | Reusable reference values | See [References](#references) | +| Property | Type | Required | Description | Example | +| ------------- | -------- | -------- | -------------------------------------- | ----------------------------------------- | +| `name` | `string` | ✓ | Display name of the dictionary | `"clinical_data_dictionary"` | +| `version` | `string` | ✓ | Semantic version (major.minor.patch) | `"1.2.0"` | +| `schemas` | `Array` | ✓ | List of schema definitions (minimum 1) | See [Schema Structure](#schema-structure) | +| `description` | `string` | ✗ | A human-readable description | `"Clinical trial data schemas"` | +| `meta` | `object` | ✗ | Any custom defined metadata fields | `{"author": "Clinical Data Team"}` | +| `references` | `object` | ✗ | Reusable reference values | See [References](#references) | ## Schema Structure @@ -75,12 +75,12 @@ At the schema-level descriptions and metadata can also be optionally added. 
#### Schema Properties -| Property | Type | Required | Description | -| ------------- | -------------- | -------- | ------------------------------------- | -| `name` | `string` | ✓ | The schema identifier (no spaces or dots) | -| `fields` | `Array` | ✓ | List of field definitions, see [Field Structure](#field-structure) | -| `description` | `string` | ✗ | A human-readable description | -| `meta` | `object` | ✗ | Any custom defined metadata fields | +| Property | Type | Required | Description | +| ------------- | -------- | -------- | ------------------------------------------------------------------ | +| `name` | `string` | ✓ | The schema identifier (no spaces or dots) | +| `fields` | `Array` | ✓ | List of field definitions, see [Field Structure](#field-structure) | +| `description` | `string` | ✗ | A human-readable description | +| `meta` | `object` | ✗ | Any custom defined metadata fields | ## Field Structure @@ -128,7 +128,6 @@ At the schema-level descriptions and metadata can also be optionally added. 
The allowed values for `valueType` include: - | Type | Description | Valid Examples | Invalid Examples | | --------- | --------------------------------------------------- | -------------------------------- | ---------------------- | | `string` | Text values (any characters except array delimiter) | `"Hello"`, `"PAT-001234"`, `""` | N/A (accepts any text) | @@ -136,21 +135,20 @@ The allowed values for `valueType` include: | `number` | Any numeric value | `42`, `3.14`, `-17.5`, `0` | `"abc"`, `"N/A"` | | `boolean` | True/false (case-insensitive) | `true`, `True`, `FALSE`, `false` | `yes`, `1`, `0`, `Y` | - #### Field Properties At the field-level the following properties can also be included: -| Property | Type | Required | Default | Description | -| -------------- | -------------- | -------- | ------- | --------------------------------------------------------- | -| `name` | `string` | ✓ | - | Field identifier (used as a column header) | -| `description` | `string` | ✗ | `""` | Human-readable description | -| `valueType` | `string` | ✓ | - | Data type: `string`, `integer`, `number`, `boolean` | -| `isArray` | `boolean` | ✗ | `false` | Whether field accepts multiple values | -| `delimiter` | `string` | ✗ | `","` | Separator for array values | -| `unique` | `boolean` | ✗ | `false` | Whether values must be unique across records | +| Property | Type | Required | Default | Description | +| -------------- | -------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------- | +| `name` | `string` | ✓ | - | Field identifier (used as a column header) | +| `description` | `string` | ✗ | `""` | Human-readable description | +| `valueType` | `string` | ✓ | - | Data type: `string`, `integer`, `number`, `boolean` | +| `isArray` | `boolean` | ✗ | `false` | Whether field accepts multiple values | +| `delimiter` | `string` | ✗ | `","` | Separator for array values | +| `unique` | `boolean` | ✗ | `false` | 
Whether values must be unique across records | | `restrictions` | `object/array` | ✗ | `{}` | Where the validation rules/logic for the field is defined, see [Field Restrictions](#field-restrictions) | -| `meta` | `object` | ✗ | `{}` | Any custom defined metadata fields | +| `meta` | `object` | ✗ | `{}` | Any custom defined metadata fields | ## Field Restrictions @@ -170,11 +168,10 @@ Field restrictions define the rules that field values must satisfy to be conside } ``` -:::note What this means: -Age must be provided and must be between 0-150 years old. +:::note What this means: +Age must be provided and must be between 0-150 years old. ::: - Each restrictions object can contain: - **Standard Restrictions** (detailed in the sections below) @@ -185,11 +182,11 @@ Each restrictions object can contain: Ensures a field has a value. ```json showLineNumbers - { - "name": "patient_id", - "valueType": "string", - "restrictions": { "required": true } - } +{ + "name": "patient_id", + "valueType": "string", + "restrictions": { "required": true } +} ``` ### `codeList` @@ -201,7 +198,12 @@ Restricts values to a predefined list of acceptable options. "name": "treatment_response", "valueType": "string", "restrictions": { - "codeList": ["Complete Response", "Partial Response", "Stable Disease", "Progressive Disease"] + "codeList": [ + "Complete Response", + "Partial Response", + "Stable Disease", + "Progressive Disease" + ] } } ``` @@ -270,8 +272,8 @@ For human readability we recommend using the `description` property and creating }, "meta": { "examples": [ - "patient@example.com", - "john.doe@hospital.org", + "patient@example.com", + "john.doe@hospital.org", "contact123@healthcare.gov" ] } @@ -303,10 +305,9 @@ Uses the same boundary options as `range`: `min`, `max`, `exclusiveMin`, `exclus The medications array must contain between 1 and 10 medication entries. Empty arrays or arrays with more than 10 items will be rejected. ::: - ### `empty` -Requires a field to have no value. 
+Requires a field to have no value. ```json showLineNumbers { @@ -323,23 +324,23 @@ The field must have no value - only empty strings, null, or undefined values are
**Here is a more practical example using conditional logic** -The `empty` restriction becomes particularly useful when combined with [conditional restrictions](#conditions) to create logical data rules: +The `empty` restriction becomes particularly useful when combined with [conditional restrictions](#conditions) to create logical data rules: - ```json showLineNumbers - { - "name": "date_of_death", - "valueType": "string", - "restrictions": { - "if": { - "conditions": [ - { "fields": ["patient_status"], "match": { "value": "alive" } } - ] - }, - "then": { "empty": true }, - "else": { "required": true } - } +```json showLineNumbers +{ + "name": "date_of_death", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "alive" } } + ] + }, + "then": { "empty": true }, + "else": { "required": true } } - ``` +} +``` **What this means:** If the patient status is "alive", then the date of death field must be empty. If the patient status is anything other than "alive", then the date of death field is required. @@ -452,7 +453,6 @@ If `patient_status` equals "active" AND `enrollment_date` has a value, then `tre | `"any"` | At least one condition must be true | Patient is active OR has enrollment date | | `"none"` | No conditions can be true | Patient is NOT active AND has NO enrollment date | - **Working with Arrays** When checking array fields, use `arrayFieldCase` to specify how many array elements must match. 
Here's a practical example using a medications field: @@ -460,13 +460,15 @@ When checking array fields, use `arrayFieldCase` to specify how many array eleme ```json showLineNumbers {10} { "name": "follow_up_required", - "valueType": "boolean", + "valueType": "boolean", "restrictions": { "if": { "conditions": [ { "fields": ["current_medications"], - "match": { "codeList": ["chemotherapy", "immunotherapy", "targeted_therapy"] }, + "match": { + "codeList": ["chemotherapy", "immunotherapy", "targeted_therapy"] + }, "arrayFieldCase": "any" } ] @@ -482,8 +484,8 @@ If the patient is taking ANY cancer treatment medication (chemotherapy, immunoth **Array Field Case Options:** -| Value | Description | Example Use Case | -| -------- | ------------------------------------ | ---------------- | +| Value | Description | Example Use Case | +| -------- | ------------------------------------ | --------------------------------------------- | | `"all"` | All elements in the array must match | All medications must be from an approved list | | `"any"` | At least one element must match | Patient has at least one high-risk medication | | `"none"` | No elements can match | Patient cannot have any contraindicated drugs | @@ -500,7 +502,7 @@ If the patient is taking ANY cancer treatment medication (chemotherapy, immunoth // Example: Patient cannot have any experimental drugs { - "fields": ["medications"], + "fields": ["medications"], "match": { "codeList": ["experimental_drug_A", "experimental_drug_B"] }, "arrayFieldCase": "none" } @@ -521,9 +523,9 @@ You can combine multiple conditions with different logic: "fields": ["treatment_response"], "match": { "codeList": ["partial_response", "stable_disease"] } }, - { - "fields": ["adverse_events"], - "match": { "count": { "min": 1 } } + { + "fields": ["adverse_events"], + "match": { "count": { "min": 1 } } } ], "case": "any" // either condition can trigger the requirement @@ -541,7 +543,7 @@ If the treatment response is "partial_response" OR 
"stable_disease", OR if there In addition to field-level restrictions, Lectern Dictionaries support schema-level restrictions that establish relationships between schemas. -### `uniqueKey` +### `uniqueKey` Primary keys identify unique records within a schema using the `uniqueKey` restriction applied at the schema level. They ensure that each record can be distinctly identified by one or more field values. @@ -597,7 +599,7 @@ For cases where uniqueness requires multiple field combinations: } }, { - "name": "visit_number", + "name": "visit_number", "valueType": "integer", "restrictions": { "required": true, @@ -646,7 +648,7 @@ Foreign keys establish relationships between schemas by referencing primary keys } }, { - "name": "sociodemographic", + "name": "sociodemographic", "description": "Captures sociodemographic characteristics", "fields": [ { @@ -681,20 +683,19 @@ Foreign keys establish relationships between schemas by referencing primary keys Every `submitter_participant_id` in the sociodemographic schema must match an existing `submitter_participant_id` in the participant schema. You cannot create sociodemographic records for participants that don't exist. 
::: - #### Foreign Key Properties -| Property | Type | Required | Description | -| ---------- | ----------------------- | -------- | --------------------------------------------- | -| `schema` | `string` | ✓ | Name of the referenced schema | -| `mappings` | `Array` | ✓ | Array of field mappings between schemas | +| Property | Type | Required | Description | +| ---------- | ---------------------- | -------- | --------------------------------------- | +| `schema` | `string` | ✓ | Name of the referenced schema | +| `mappings` | `Array` | ✓ | Array of field mappings between schemas | #### Mapping Object Properties -| Property | Type | Required | Description | -| --------- | -------- | -------- | ------------------------------------- | -| `local` | `string` | ✓ | Field name in the current schema | -| `foreign` | `string` | ✓ | Field name in the referenced schema | +| Property | Type | Required | Description | +| --------- | -------- | -------- | ----------------------------------- | +| `local` | `string` | ✓ | Field name in the current schema | +| `foreign` | `string` | ✓ | Field name in the referenced schema | #### Validation Rules @@ -866,7 +867,7 @@ Here's how primary and foreign keys work together to create a complete data mode } }, { - "name": "diagnosis", + "name": "diagnosis", "description": "Medical diagnoses for participants", "fields": [ { @@ -909,16 +910,16 @@ Here's how primary and foreign keys work together to create a complete data mode ``` :::note **What this relationship creates:** + - **Participants** have unique IDs (primary key) -- **Diagnoses** have unique IDs (primary key) +- **Diagnoses** have unique IDs (primary key) - **Each diagnosis** must belong to an existing participant (foreign key) - **Result**: A one-to-many relationship where one participant can have multiple diagnoses ::: ## References -The `references` section is a **dictionary-level** property that allows you to define reusable values that can be referenced throughout your
entire dictionary within all schemas. This is particularly useful for common regular expressions, shared code lists, or other values that appear in multiple places across different schemas. `references` are defined once at the dictionary level and can be used in any schema within the dictionary. - +The `references` section is a **dictionary-level** property that allows you to define reusable values that can be referenced throughout your entire dictionary within all schemas. This is particularly useful for common regular expressions, shared code lists, or other values that appear in multiple places across different schemas. `references` are defined once at the dictionary level and can be used in any schema within the dictionary. ```json showLineNumbers {39-47} { @@ -973,4 +974,4 @@ The `references` section is a **dictionary-level** property that allows you to d :::info **Need Help?** If you encounter any issues or have questions, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). -::: \ No newline at end of file +:::