diff --git a/docs/overview/01-lectern.md b/docs/01-overview.md similarity index 69% rename from docs/overview/01-lectern.md rename to docs/01-overview.md index b9f7af79..22d84add 100644 --- a/docs/overview/01-lectern.md +++ b/docs/01-overview.md @@ -1,22 +1,20 @@ -# Lectern - -Lectern is Overture's Data Dictionary Schema Manager, designed to validate, store, and manage collections of data dictionaries. These dictionaries define schemas that specify expected data structure and syntax for tabular (TSV) data submissions. With built-in version control capabilities, Lectern can track schema evolution and compute differences between versions, while integrating with the [Lyric](https://docs.overture.bio/docs/under-development/lyric/) data submission service. +# Overview +Data dictionaries are organized collections of schemas that define the structure, constraints, and relationships of data models. Overture's Data Dictionary Manager, Lectern, is designed to manage collections of data dictionaries and can be integrated into any data platform. Lectern typically works with Overture's tabular data submission service, [Lyric](https://docs.overture.bio/docs/under-development/lyric/), to ensure data quality and consistency throughout the submission workflow. ## Key Features -- **Schema Definition:** Define comprehensive schemas specifying structure, constraints, and relationships of data elements. -- **Dictionary Management:** Maintain collections of schemas (data dictionaries) with multiple versions. -- **Version Control:** Track changes and evolution of data structures over time. +- **Schema Definition:** Define schemas that specify the structure, constraints, and relationships of data elements. +- **Version Control:** Track changes to data structures over time. - **Difference Computation:** Compare versions to understand changes in data requirements. -- **Schema Validation:** Validate the structure and syntax of data dictionary schema against Lecterns base meta-schema. 
+- **Schema Validation:** Validate the structure and syntax of data dictionary schemas against the Lectern base meta-schema. - **Integration:** includes a RESTful API (Swagger) for integration with larger data management systems. ## System Architecture -Lectern operates as a central Dictionary Schema repository within the Overture ecosystem, providing dictionary management and validation services through its RESTful API. The service maintains schemas in a database, tracking versions and relationships between different schema elements. Lectern's schemas are primarily consumed by Lyric, which stores and uses them to validate incoming tabular data submissions. Through this integration, Lectern plays a crucial role in ensuring data quality and consistency in the Overture submission workflow. +Lectern operates as a central Dictionary Schema repository, providing dictionary management and validation services through its RESTful API. The service maintains schemas in a database (MongoDB), tracking versions and relationships between different schema elements. In the Overture platform, Lectern's schemas are consumed by [Lyric](https://docs.overture.bio/docs/under-development/lyric/), which stores and uses them to validate incoming tabular data submissions. -![Submission System Architecture](./images/submission-system.svg 'Updated Overture Submission System') +![Submission System Architecture](./assets/submission-system.svg "Updated Overture Submission System") ## Repository Structure @@ -25,7 +23,7 @@ The repository is organized with the following directory structure: ``` . 
├── apps/ -│ └── server +│ └── server └── packages/ │ ├── client | ├── common @@ -33,14 +31,14 @@ The repository is organized with the following directory structure: | └── validation └── scripts/ ``` -[Click here to view the Lectern repository on GitHub](https://github.com/overture-stack/lectern) +[Click here to view the Lectern repository on GitHub](https://github.com/overture-stack/lectern) The modules in the monorepo are organized into three categories: - - `apps/`: Standalone processes meant to be run. These are published to [ghcr.io](https://ghcr.io) as container images. - - `packages/`: Reusable packages shared between applications and other packages. Packages are published to [NPM](https://npmjs.com). - - `scripts`: Utility scripts for use within this repo. +- `apps/`: Standalone processes meant to be run. These are published to [ghcr.io](https://ghcr.io) as container images. +- `packages/`: Reusable packages shared between applications and other packages. Packages are published to [NPM](https://npmjs.com). +- `scripts`: Utility scripts for use within this repo. #### Lectern Components @@ -51,4 +49,4 @@ Each component serves a specific purpose within Lectern, providing functionality | [Lectern Server](https://github.com/overture-stack/lectern/blob/develop/apps/server/README.md) | @overture-stack/lectern-server | apps/server/ | [![Lectern GHCR Packages](https://img.shields.io/badge/GHCR-lectern-brightgreen?style=for-the-badge&logo=github)](https://github.com/overture-stack/lectern/pkgs/container/lectern) | Lectern Server web application. 
| | [Lectern Client](https://github.com/overture-stack/lectern/blob/develop/packages/client/README.md) | @overture-stack/lectern-client | packages/client | [![Lectern Client NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-client?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-client) | TypeScript Client to interact with Lectern Server and Lectern data dictionaries. This library provides a REST client to assist in fetching data from the Lectern server. It also exposes the functionality from the Lectern Validation library to use a Lectern data dictionary to validate data. | | [Lectern Dictionary](https://github.com/overture-stack/lectern/blob/develop/packages/dictionary/README.md) | | @overture-stack/lectern-dictionary | [![Lectern Client NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-dictionary?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-dictionary) | Dictionary meta-schema definition, includes TS types, and Zod schemas. This also exports all utilities for getting the diff of two dictionaries. | - | [Lectern Validation](https://github.com/overture-stack/lectern/blob/develop/packages/validation/README.md) | @overture-stack/lectern-validation | packages/validation/ | [![Lectern Validation NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-validation?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-client) | Validate data using Lectern Dictionaries. 
+ | [Lectern Validation](https://github.com/overture-stack/lectern/blob/develop/packages/validation/README.md) | @overture-stack/lectern-validation | packages/validation/ | [![Lectern Validation NPM Package](https://img.shields.io/npm/v/@overture-stack/lectern-validation?color=%23cb3837&style=for-the-badge&logo=npm)](https://www.npmjs.com/package/@overture-stack/lectern-validation) | Validate data using Lectern Dictionaries. diff --git a/docs/02-Setup.md b/docs/02-Setup.md new file mode 100644 index 00000000..ed1fc8a8 --- /dev/null +++ b/docs/02-Setup.md @@ -0,0 +1,252 @@ +# Setup + +This guide provides instructions for setting up a complete development environment for Lectern, Overture's data dictionary management service. + +## Prerequisites + +Before beginning, ensure you have the following installed on your system: + +- **PNPM** (package manager - used instead of npm) +- **Node.js** (v18 or higher) +- **Docker** (for running containerized services) + +## Development Environment Setup + +### 1. Database Setup + +Lectern requires a MongoDB database to store dictionaries and metadata. Choose one of the following setup methods: + +**Option A: Using Docker Compose (Recommended)** + +```bash +# Navigate to the server directory +cd apps/server + +# Start MongoDB using docker-compose +docker-compose up -d +``` + +**Option B: Manual Docker Setup** + +```bash +docker run --name lectern-mongo \ +-e MONGO_INITDB_ROOT_USERNAME=admin \ +-e MONGO_INITDB_ROOT_PASSWORD=password \ +-p 27017:27017 \ +-d mongo:latest +``` + +
+ Database Service Details + +| Service | Port | Description | Purpose | +| ------- | ----- | ------------------------------------- | ------------------------------------------------ | +| MongoDB | 27017 | NoSQL database for dictionary storage | Stores data dictionaries, versions, and metadata | + +**Important Notes:** + +- Ensure port 27017 is available on your system +- Default credentials: `admin/password` +- Adjust port configuration if conflicts exist with other services + +
+ +### 2. Server Setup + +1. **Clone the Repository** + + ```bash + git clone https://github.com/overture-stack/lectern.git + cd lectern + ``` + +2. **Install Dependencies** + + ```bash + # Install all dependencies for the entire monorepo + pnpm install + ``` + +3. **Configure Environment** + + ```bash + cd apps/server + cp .env.example .env + ``` + + The `.env` file comes preconfigured with development defaults: + + ```env + # Express Configuration + PORT=3000 + + # Swagger Documentation + OPENAPI_PATH=/api-docs + + # MongoDB Configuration + MONGO_HOST=localhost + MONGO_PORT=27017 + MONGO_DB=lectern + MONGO_USER= + MONGO_PASS= + + # Authentication (disabled by default) + AUTH_ENABLED=false + EGO_API= + SCOPE= + + # CORS Configuration + CORS_ALLOWED_ORIGINS= + + # Vault Configuration (disabled by default) + VAULT_ENABLED=false + VAULT_URL=http://localhost:8200 + VAULT_SECRETS_PATH=/kv/lectern + VAULT_TOKEN=00000000-0000-0000-0000-000000000000 + VAULT_ROLE= + ``` + +
+ Environment Variables Reference + + **Express Configuration** + + - `PORT`: Server port (default: 3000) + - `OPENAPI_PATH`: Swagger UI path (default: /api-docs) + + **MongoDB Configuration** + + - `MONGO_HOST`: Database hostname (default: localhost) + - `MONGO_PORT`: Database port (default: 27017) + - `MONGO_DB`: Database name (default: lectern) + - `MONGO_USER`: Database username (optional) + - `MONGO_PASS`: Database password (optional) + + **Authentication (Optional)** + + - `AUTH_ENABLED`: Enable JWT-based authorization (default: false) + - `EGO_API`: EGO API URL for JWT validation + - `SCOPE`: Required policy name in JWT scope + - `CORS_ALLOWED_ORIGINS`: Comma-separated list of allowed origins + + **Vault Integration (Optional)** + + - `VAULT_ENABLED`: Enable HashiCorp Vault integration (default: false) + - `VAULT_URL`: Vault server URL + - `VAULT_SECRETS_PATH`: Path to secrets in Vault + - `VAULT_TOKEN`: Vault access token + - `VAULT_ROLE`: Vault role for authentication + +
+ +4. **Build the Application** + + ```bash + # From workspace root + pnpm nx build @overture-stack/lectern-server + + # Or from apps/server directory + pnpm build + ``` + +5. **Start the Development Server** + + ```bash + # Production mode + pnpm nx start @overture-stack/lectern-server + + # Development mode with hot reloading + pnpm nx debug server + ``` + +## Verification & Testing + +### API Health Check + +Verify that Lectern is running correctly: + +```bash +# Health endpoint +curl http://localhost:3000/health + +# Expected response: 200 OK +``` + +### API Documentation + +Access the interactive API documentation at: + +- **Swagger UI**: `http://localhost:3000/api-docs` + +### Dictionary Management Testing + +1. Navigate to the Swagger UI +2. Test creating a new data dictionary using the REST API +3. Verify dictionary creation, retrieval, and management operations + +**Troubleshooting:** + +- Ensure MongoDB is running and accessible +- Check server logs for validation errors +- Verify API endpoints are responding correctly + +:::info Need Help? +If you encounter any issues or have questions about our API, please don't hesitate to reach out through our relevant [**community support channels**](https://docs.overture.bio/community/support). +::: + +## Advanced Configuration + +### Enabling Authorization + +For production environments, enable JWT-based authorization: + +1. Set `AUTH_ENABLED=true` in your `.env` file +2. Configure `EGO_API` to point to your Ego authorization service +3. Set the appropriate `SCOPE` for your permissions + +### Vault Integration + +For secure secret management using HashiCorp Vault: + +1. Set `VAULT_ENABLED=true` in your `.env` file +2. Configure Vault connection parameters +3. 
Lectern will retrieve MongoDB credentials from Vault instead of environment variables + +## Development Commands Reference + +### From Workspace Root + +```bash +# Build server +pnpm nx build @overture-stack/lectern-server + +# Start server +pnpm nx start @overture-stack/lectern-server + +# Debug mode (hot reloading) +pnpm nx debug server +``` + +### From apps/server Directory + +```bash +# Build +pnpm build + +# Start +pnpm start + +# Development mode +pnpm debug +``` + +### Docker Operations + +```bash +# Build Docker image +docker build --no-cache -t lectern -f apps/server/Dockerfile . +``` + +:::warning +This guide is intended for development purposes only. For production deployments, implement appropriate security measures, configure authentication, and review all environment variables for your specific use case. +::: diff --git a/docs/03-dictionaryReference.md b/docs/03-dictionaryReference.md new file mode 100644 index 00000000..3bf40fb2 --- /dev/null +++ b/docs/03-dictionaryReference.md @@ -0,0 +1,977 @@ +# Building Dictionaries + +A Lectern Dictionary is a JSON configuration that defines the structure and validation rules for tabular data files. It consists of schemas that describe individual file formats, with each schema containing field definitions and validation constraints. + +## Dictionary Structure + +A Lectern Dictionary is a JSON configuration file that defines the structure and validation rules for your data files. At its core, every dictionary must contain three essential components: a **name**, **version**, and **schemas**. + +```json showLineNumbers {2-3,9-11} +{ + "name": "clinical_data_dictionary", + "version": "1.2.0", + "description": "Clinical trial data collection schemas", + "meta": { + "author": "Clinical Data Team", + "created": "2024-01-15" + }, + "schemas": [ + { ... 
} + ], + "references": { + "regex": { + "patient_id_format": "^PAT-\\d{6}$" + } + } +} +``` + +Optional components such as descriptions, metadata, and references are described in the table below. + +#### Dictionary Properties + +| Property | Type | Required | Description | Example | +| ------------- | -------- | -------- | -------------------------------------- | ----------------------------------------- | +| `name` | `string` | ✓ | Display name of the dictionary | `"clinical_data_dictionary"` | +| `version` | `string` | ✓ | Semantic version (major.minor.patch) | `"1.2.0"` | +| `schemas` | `Array` | ✓ | List of schema definitions (minimum 1) | See [Schema Structure](#schema-structure) | +| `description` | `string` | ✗ | A human-readable description | `"Clinical trial data schemas"` | +| `meta` | `object` | ✗ | Any custom-defined metadata fields | `{"author": "Clinical Data Team"}` | +| `references` | `object` | ✗ | Reusable reference values | See [References](#references) | + +## Schema Structure + +Each **schema** defines the structure of a single tabular data file. Every schema must have a **name** and a **fields** array. + +```json showLineNumbers {9-24} +{ + "name": "clinical_data_dictionary", + "version": "1.2.0", + "description": "Clinical trial data collection schemas", + "meta": { + "author": "Clinical Data Team", + "created": "2024-01-15" + }, + "schemas": [ + { + "name": "patientSchema", + "description": "Patient demographic and clinical information", + "fields": [ + { ... } + ] + }, + { + "name": "sampleSchema", + "description": "Data derived from patient samples", + "fields": [ + { ... } + ] + } + ] +} +``` + +At the schema level, descriptions and metadata can optionally be added. 
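
The required properties described above can be checked mechanically before a dictionary is submitted. The following is a minimal Python sketch, not Lectern's own validation: the function name `check_dictionary_shape` is hypothetical, and the version pattern assumes that "major.minor.patch" means three dot-separated integers.

```python
import re

# Assumption: "major.minor.patch" means three dot-separated integers.
SEMVER_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def check_dictionary_shape(dictionary):
    """Return a list of problems; an empty list means the required properties are present."""
    problems = []

    # Dictionary level: name, version, and schemas are required.
    if not isinstance(dictionary.get("name"), str):
        problems.append("dictionary: 'name' must be a string")
    version = dictionary.get("version")
    if not isinstance(version, str) or not SEMVER_PATTERN.match(version):
        problems.append("dictionary: 'version' must look like major.minor.patch")
    schemas = dictionary.get("schemas")
    if not isinstance(schemas, list) or len(schemas) < 1:
        problems.append("dictionary: 'schemas' must list at least one schema")
        return problems

    # Schema level: each schema needs a name and a fields array.
    for index, schema in enumerate(schemas):
        if not isinstance(schema, dict):
            problems.append(f"schemas[{index}]: must be an object")
            continue
        if not isinstance(schema.get("name"), str):
            problems.append(f"schemas[{index}]: 'name' must be a string")
        if not isinstance(schema.get("fields"), list):
            problems.append(f"schemas[{index}]: 'fields' must be an array")
    return problems
```

A check like this only covers the presence and basic type of the required properties. The Lectern meta-schema remains the authoritative definition of a valid dictionary.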
+ +#### Schema Properties + +| Property | Type | Required | Description | +| ------------- | -------- | -------- | ------------------------------------------------------------------ | +| `name` | `string` | ✓ | The schema identifier (no spaces or dots) | +| `fields` | `Array` | ✓ | List of field definitions, see [Field Structure](#field-structure) | +| `description` | `string` | ✗ | A human-readable description | +| `meta` | `object` | ✗ | Any custom defined metadata fields | + +## Field Structure + +**Fields** define the individual columns in your data files, at minimum a field object must have a **name** and **valueType**. + +```json showLineNumbers {13-32} +{ + "name": "clinical_data_dictionary", + "version": "1.2.0", + "description": "Clinical trial data collection schemas", + "meta": { + "author": "Clinical Data Team", + "created": "2024-01-15" + }, + "schemas": [ + { + "name": "patient", + "description": "Patient demographic and clinical information", + "fields": [ + { + "name": "patient_id", + "description": "Unique patient identifier in format PAT-XXXXXX", + "valueType": "string", + "restrictions": { ... 
} + }, + { + "name": "field2", + "valueType": "boolean" + }, + { + "name": "field3", + "valueType": "integer" + }, + { + "name": "field4", + "valueType": "number" + } + ] + } + ] +} +``` + +#### Field Value Types + +The allowed values for `valueType` are: + +| Type | Description | Valid Examples | Invalid Examples | +| --------- | --------------------------------------------------- | -------------------------------- | ---------------------- | +| `string` | Text values (any characters except array delimiter) | `"Hello"`, `"PAT-001234"`, `""` | N/A (accepts any text) | +| `integer` | Whole numbers only | `42`, `-17`, `0` | `3.14`, `1.0`, `2.5` | +| `number` | Any numeric value | `42`, `3.14`, `-17.5`, `0` | `"abc"`, `"N/A"` | +| `boolean` | True/false (case-insensitive) | `true`, `True`, `FALSE`, `false` | `yes`, `1`, `0`, `Y` | + +#### Field Properties + +At the field level, the following properties are available: + +| Property | Type | Required | Default | Description | +| -------------- | -------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------- | +| `name` | `string` | ✓ | - | Field identifier (used as a column header) | +| `description` | `string` | ✗ | `""` | Human-readable description | +| `valueType` | `string` | ✓ | - | Data type: `string`, `integer`, `number`, `boolean` | +| `isArray` | `boolean` | ✗ | `false` | Whether field accepts multiple values | +| `delimiter` | `string` | ✗ | `","` | Separator for array values | +| `unique` | `boolean` | ✗ | `false` | Whether values must be unique across records | +| `restrictions` | `object/array` | ✗ | `{}` | Validation rules for the field, see [Field Restrictions](#field-restrictions) | +| `meta` | `object` | ✗ | `{}` | Any custom-defined metadata fields | + +## Field Restrictions + +Field restrictions define the rules that field values must satisfy to be considered valid. 
Here's an example of a field that defines and validates the age of a patient: + +```json +{ + "name": "patient_age", + "valueType": "integer", + "restrictions": { + "required": true, + "range": { + "min": 0, + "max": 150 + } + } +} +``` + +:::note What this means: +Age must be provided and must be between 0-150 years old. +::: + +Each restrictions object can contain: + +- **Standard Restrictions** (detailed in the sections below) +- **Conditional restrictions** that apply validation logic based on specific [`conditions`](#conditions) + +### `required` + +Ensures a field has a value. + +```json showLineNumbers +{ + "name": "patient_id", + "valueType": "string", + "restrictions": { "required": true } +} +``` + +### `codeList` + +Restricts values to a predefined list of acceptable options. + +```json showLineNumbers +{ + "name": "treatment_response", + "valueType": "string", + "restrictions": { + "codeList": [ + "Complete Response", + "Partial Response", + "Stable Disease", + "Progressive Disease" + ] + } +} +``` + +:::note **What this means:** +The `treatment_response` field will only accept values that exactly match one of the four options in the list. Any other value will be rejected. +::: + +### `range` + +Sets numeric boundaries for `integer` and `number` fields. + +```json showLineNumbers +{ + "name": "age", + "valueType": "integer", + "restrictions": { + "range": { "min": 0, "max": 120 } + } +} +``` + +**Range options:** + +- `min` / `max` - Inclusive boundaries +- `exclusiveMin` / `exclusiveMax` - Exclusive boundaries + +```json showLineNumbers +{ + "name": "adult_age", + "valueType": "integer", + "restrictions": { + "range": { "min": 18, "exclusiveMax": 65 } + } +} +``` + +:::note **What this means:** +Age must be 18 or older (inclusive), but less than 65 (exclusive). So valid values are 18-64. +::: + +### `regex` + +Applies pattern matching validation to string fields. 
+ +```json showLineNumbers +{ + "name": "email", + "valueType": "string", + "restrictions": { + "regex": "^[\\w\\.-]+@[\\w\\.-]+\\.[a-zA-Z]{2,}$" + } +} +``` + +For human readability, we recommend using the `description` property and adding an `examples` entry under `meta` to document the regex restriction clearly. + +```json showLineNumbers +{ + "name": "email", + "valueType": "string", + "description": "Contact email address for patient communication and records. Valid email format: username@domain.extension (minimum 2-letter extension). Accepts letters, numbers, dots, hyphens, and underscores in username and domain.", + "restrictions": { + "required": true, + "regex": "^[\\w\\.-]+@[\\w\\.-]+\\.[a-zA-Z]{2,}$" + }, + "meta": { + "examples": [ + "patient@example.com", + "john.doe@hospital.org", + "contact123@healthcare.gov" + ] + } +} +``` + +:::note **What this means:** +The email address must be provided and must follow the standard email format with a username, @ symbol, domain name, and valid extension. The examples show acceptable formats while the pattern description explains what characters are allowed. +::: + +### `count` + +`count` is an **array-specific** restriction that controls the number of elements allowed in array fields. + +```json showLineNumbers {4,6} +{ + "name": "medications", + "valueType": "string", + "isArray": true, + "restrictions": { + "count": { "min": 1, "max": 10 } + } +} +``` + +It uses the same boundary options as `range`: `min`, `max`, `exclusiveMin`, `exclusiveMax`. + +:::note **What this means:** +The medications array must contain between 1 and 10 medication entries. Empty arrays or arrays with more than 10 items will be rejected. +::: + +### `empty` + +Requires a field to have no value. + +```json showLineNumbers +{ + "name": "date_of_death", + "valueType": "string", + "restrictions": { "empty": true } +} +``` + +:::note **What this means:** +The field must have no value: only empty strings, null, or undefined values are accepted. 
Any actual content will be rejected. +::: + +
+**Here is a more practical example using conditional logic** + +The `empty` restriction becomes particularly useful when combined with [conditional restrictions](#conditions) to create logical data rules: + +```json showLineNumbers +{ + "name": "date_of_death", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "alive" } } + ] + }, + "then": { "empty": true }, + "else": { "required": true } + } +} +``` + +**What this means:** If the patient status is "alive", then the date of death field must be empty. If the patient status is anything other than "alive", then the date of death field is required. + +
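
To make the if/then/else behaviour of the `date_of_death` example concrete, here is a small Python sketch of the same rule. It is an illustration only, not Lectern's implementation: the helper names are hypothetical, and whether whitespace-only strings count as empty is left open here.

```python
def is_empty(value):
    # Assumption: "no value" means None (null/undefined) or an empty string.
    return value is None or value == ""


def check_date_of_death(record):
    """Apply the conditional rule from the example to a single record (a dict)."""
    status = record.get("patient_status")
    date_of_death = record.get("date_of_death")

    if status == "alive":
        # "then": { "empty": true }
        if not is_empty(date_of_death):
            return ["date_of_death must be empty when patient_status is 'alive'"]
    else:
        # "else": { "required": true }
        if is_empty(date_of_death):
            return ["date_of_death is required when patient_status is not 'alive'"]
    return []
```

For example, a record with `patient_status` set to `"alive"` and an empty `date_of_death` passes, while a record with any other status and no `date_of_death` is reported as invalid.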
+ +### `conditions` + +Conditional restrictions allow you to apply different validation rules based on the values in other fields. This enables more complex validation rules where the requirements for one field change depending on the data in other fields. + +Think of conditional restrictions as "if-then-else" logic for your data validation: + +- **IF** certain conditions are met in other fields +- **THEN** apply these validation rules +- **ELSE** apply different validation rules (optional) + +#### Basic Structure + +```json showLineNumbers +"restrictions": { + "if": { + "conditions": [ + /* conditions to check */ + ], + }, + "then": { + /* validation rules when conditions are true */ + }, + "else": { + /* validation rules when conditions are false */ + } +} +``` + +**Match Criteria** + +The `match` object defines what you're looking for: + +```json showLineNumbers +// Exact value +{ "match": { "value": "Treatment_A" } } + +// Value from list +{ "match": { "codeList": ["Stage_III", "Stage_IV"] } } + +// Numeric range +{ "match": { "range": { "min": 18, "max": 65 } } } + +// Pattern matching +{ "match": { "regex": "^PAT-\\d{6}$" } } + +// Field has any value +{ "match": { "exists": true } } + +// Array length +{ "match": { "count": { "min": 1 } } } +``` + +#### Simple Example + +```json showLineNumbers +{ + "name": "date_of_death", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "deceased" } } + ] + }, + "then": { "required": true }, + "else": { "empty": true } + } +} +``` + +:::note **What this means:** +If the `patient_status` field equals "deceased", then the `date_of_death` field is required. Otherwise, the `date_of_death` field must be empty. 
+::: + +#### Case Logic + +When you need to check multiple conditions, use the `case` property to specify how many must be true: + +```json showLineNumbers +{ + "name": "treatment_details", + "valueType": "string", + "restrictions": { + "if": { + "conditions": [ + { "fields": ["patient_status"], "match": { "value": "active" } }, + { "fields": ["enrollment_date"], "match": { "exists": true } } + ], + "case": "all" + }, + "then": { "required": true } + } +} +``` + +:::note **What this means:** +If `patient_status` equals "active" AND `enrollment_date` has a value, then `treatment_details` is required. +::: + +**Case Options:** + +| Case Value | Description | Example | +| ----------------- | ----------------------------------- | ------------------------------------------------ | +| `"all"` (default) | All conditions must be true | Patient must be active AND enrolled | +| `"any"` | At least one condition must be true | Patient is active OR has enrollment date | +| `"none"` | No conditions can be true | Patient is NOT active AND has NO enrollment date | + +**Working with Arrays** + +When checking array fields, use `arrayFieldCase` to specify how many array elements must match. Here's a practical example using a medications field: + +```json showLineNumbers {10} +{ + "name": "follow_up_required", + "valueType": "boolean", + "restrictions": { + "if": { + "conditions": [ + { + "fields": ["current_medications"], + "match": { + "codeList": ["chemotherapy", "immunotherapy", "targeted_therapy"] + }, + "arrayFieldCase": "any" + } + ] + }, + "then": { "required": true } + } +} +``` + +:::note **What this means:** +If the patient is taking ANY cancer treatment medication (chemotherapy, immunotherapy, or targeted therapy) in their medications array, then follow-up is required. 
+::: + +**Array Field Case Options:** + +| Value | Description | Example Use Case | +| -------- | ------------------------------------ | --------------------------------------------- | +| `"all"` | All elements in the array must match | All medications must be from an approved list | +| `"any"` | At least one element must match | Patient has at least one high-risk medication | +| `"none"` | No elements can match | Patient cannot have any contraindicated drugs | + +**Additional Array Examples:** + +```json showLineNumbers +// Example: All medications must be FDA approved +{ + "fields": ["medications"], + "match": { "regex": "^FDA-\\d+" }, + "arrayFieldCase": "all" +} + +// Example: Patient cannot have any experimental drugs +{ + "fields": ["medications"], + "match": { "codeList": ["experimental_drug_A", "experimental_drug_B"] }, + "arrayFieldCase": "none" +} +``` + +#### Complex Example + +You can combine multiple conditions with different logic: + +```json showLineNumbers +{ + "name": "follow_up_required", + "valueType": "boolean", + "restrictions": { + "if": { + "conditions": [ + { + "fields": ["treatment_response"], + "match": { "codeList": ["partial_response", "stable_disease"] } + }, + { + "fields": ["adverse_events"], + "match": { "count": { "min": 1 } } + } + ], + "case": "any" // either condition can trigger the requirement + }, + "then": { "required": true } + } +} +``` + +:::note **What this means:** +If the treatment response is "partial_response" OR "stable_disease", OR if there's at least one adverse event, then follow-up is required. +::: + +## Schema Restrictions + +In addition to field-level restrictions, Lectern Dictionaries support schema-level restrictions that establish relationships between schemas. + +### `uniqueKey` + +Primary keys identify unique records within a schema using the `uniqueKey` restriction applied at the schema level. They ensure that each record can be distinctly identified by one or more field values. 
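
As a rough illustration of what a `uniqueKey` check involves, the Python sketch below counts how often each key combination occurs in a set of records. The function name `find_unique_key_violations` and the sample field names are illustrative assumptions, and records are modelled as plain dictionaries rather than Lectern's internal types.

```python
from collections import Counter


def find_unique_key_violations(records, unique_key):
    """Return the key tuples that occur more than once, in first-seen order."""
    # Build one tuple per record from the fields named in unique_key.
    counts = Counter(
        tuple(record.get(field) for field in unique_key) for record in records
    )
    return [key for key, count in counts.items() if count > 1]


# Illustrative data: the last record repeats the first record's key.
visits = [
    {"submitter_participant_id": "P1", "visit_number": 1},
    {"submitter_participant_id": "P1", "visit_number": 2},
    {"submitter_participant_id": "P2", "visit_number": 1},
    {"submitter_participant_id": "P1", "visit_number": 1},
]
duplicates = find_unique_key_violations(
    visits, ["submitter_participant_id", "visit_number"]
)
```

The same function covers single-field and compound keys, because a single-field key is simply a one-element list.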
+ +#### Single Field Primary Key + +```json showLineNumbers {18-20} +{ + "schemas": [ + { + "name": "participant", + "description": "The collection of all data related to a specific individual", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } + } + ] +} +``` + +:::note **What this means:** +Each `submitter_participant_id` value must be unique across all records in the participant schema - no two participants can have the same ID. +::: + +#### Compound Primary Key + +For cases where uniqueness requires multiple field combinations: + +```json showLineNumbers {24-26} +{ + "schemas": [ + { + "name": "patient_visit", + "description": "Patient visits identified by participant and visit number", + "fields": [ + { + "name": "submitter_participant_id", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + }, + { + "name": "visit_number", + "valueType": "integer", + "restrictions": { + "required": true, + "range": { "min": 1 } + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id", "visit_number"] + } + } + ] +} +``` + +:::note **What this means:** +The combination of `submitter_participant_id` and `visit_number` must be unique. A participant can have multiple visits, and visit numbers can repeat across participants, but the same participant cannot have duplicate visit numbers. +::: + +### `foreignKey` + +Foreign keys establish relationships between schemas by referencing primary keys in other schemas. They ensure referential integrity by validating that referenced records actually exist. 
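
The referential-integrity idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Lectern's validation code: the function name `find_foreign_key_violations` is hypothetical, records are plain dictionaries, and each mapping is assumed to be a `{"local": ..., "foreign": ...}` pair of field names.

```python
def find_foreign_key_violations(local_records, foreign_records, mappings):
    """Return local records whose mapped values match no record in the referenced schema."""
    local_fields = [mapping["local"] for mapping in mappings]
    foreign_fields = [mapping["foreign"] for mapping in mappings]

    # Collect every key combination that exists in the referenced schema.
    known_keys = {
        tuple(record.get(field) for field in foreign_fields)
        for record in foreign_records
    }
    return [
        record
        for record in local_records
        if tuple(record.get(field) for field in local_fields) not in known_keys
    ]


# Illustrative data: P9 does not exist in the participant records.
participants = [
    {"submitter_participant_id": "P1"},
    {"submitter_participant_id": "P2"},
]
sociodemographic = [
    {"submitter_participant_id": "P1"},
    {"submitter_participant_id": "P9"},
]
orphans = find_foreign_key_violations(
    sociodemographic,
    participants,
    [{"local": "submitter_participant_id", "foreign": "submitter_participant_id"}],
)
```

Building the set of known keys once keeps the check linear in the number of records, which matters when submissions contain many rows.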
+ +#### Basic Foreign Key + +```json showLineNumbers {37-47} +{ + "schemas": [ + { + "name": "participant", + "description": "Study participants", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } + }, + { + "name": "sociodemographic", + "description": "Captures sociodemographic characteristics", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "foreignKey": [ + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } + } + ] +} +``` + +:::note **What this means:** +Every `submitter_participant_id` in the sociodemographic schema must match an existing `submitter_participant_id` in the participant schema. You cannot create sociodemographic records for participants that don't exist. 
+::: + +#### Foreign Key Properties + +| Property | Type | Required | Description | +| ---------- | ---------------------- | -------- | --------------------------------------- | +| `schema` | `string` | ✓ | Name of the referenced schema | +| `mappings` | `Array` | ✓ | Array of field mappings between schemas | + +#### Mapping Object Properties + +| Property | Type | Required | Description | +| --------- | -------- | -------- | ----------------------------------- | +| `local` | `string` | ✓ | Field name in the current schema | +| `foreign` | `string` | ✓ | Field name in the referenced schema | + +#### Validation Rules + +Foreign key validation enforces these requirements: + +- **Referenced schema** must exist in the same dictionary +- **Referenced fields** must be defined as a `uniqueKey` in the target schema +- **Foreign key values** must match existing primary key values in the referenced schema +- **Local fields** referenced in mappings must exist in the current schema + +#### Multiple Foreign Keys + +A schema can reference multiple other schemas to create complex relationships: + +
+**Click here to view the example schema** +```json showLineNumbers {18,45-56,93-113} +{ + "schemas": [ + { + "name": "participant", + "description": "Study participants", + "fields": [ + { + "name": "submitter_participant_id", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } + }, + { + "name": "diagnosis", + "description": "Medical diagnoses for participants", + "fields": [ + { + "name": "submitter_diagnosis_id", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + }, + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_diagnosis_id"], + "foreignKey": [ + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } + }, + { + "name": "treatment", + "description": "Medications, procedures, other actions taken for clinical management", + "fields": [ + { + "name": "submitter_treatment_id", + "description": "Unique identifier of the treatment, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + }, + { + "name": "submitter_diagnosis_id", + "description": "Unique identifier of the primary diagnosis event, assigned by the data provider", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + }, + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study, assigned by the data provider", + "valueType": "string", + 
"restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_treatment_id"], + "foreignKey": [ + { + "schema": "diagnosis", + "mappings": [ + { + "local": "submitter_diagnosis_id", + "foreign": "submitter_diagnosis_id" + } + ] + }, + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } + } + ] +} +``` + +
+ +:::note **What this means:** +In the example above each treatment record must reference both an existing diagnosis and an existing participant. This creates a hierarchical relationship: participant → diagnosis → treatment. +::: + +#### Complete Schema Relationship Example + +Here's how primary and foreign keys work together to create a complete data model: + +```json showLineNumbers +{ + "name": "clinical_dictionary", + "version": "1.0.0", + "schemas": [ + { + "name": "participant", + "description": "Study participants", + "fields": [ + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + } + ], + "restrictions": { + "uniqueKey": ["submitter_participant_id"] + } + }, + { + "name": "diagnosis", + "description": "Medical diagnoses for participants", + "fields": [ + { + "name": "submitter_diagnosis_id", + "description": "Unique identifier of the primary diagnosis event", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + }, + "unique": true + }, + { + "name": "submitter_participant_id", + "description": "Unique identifier of the participant within the study", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "^[A-Za-z0-9\\-\\._]{1,64}$" + } + } + ], + "restrictions": { + "uniqueKey": ["submitter_diagnosis_id"], + "foreignKey": [ + { + "schema": "participant", + "mappings": [ + { + "local": "submitter_participant_id", + "foreign": "submitter_participant_id" + } + ] + } + ] + } + } + ] +} +``` + +:::note **What this relationship creates:** + +- **Participants** have unique IDs (primary key) +- **Diagnoses** have unique IDs (primary key) +- **Each diagnosis** must belong to an existing participant (foreign key) +- **Result**: A one-to-many relationship where one participant can have multiple diagnoses 
+ ::: + +## References + +The `references` section is a **dictionary-level** property for defining reusable values once and referencing them from any schema in the dictionary. This is particularly useful for common regular expressions, shared code lists, or other values that appear in multiple places across different schemas. In the example below, fields point at a reference with a `#/`-prefixed path such as `#/regex/date`. + +```json showLineNumbers {39-47} +{ + "name": "clinical_data_dictionary", + "version": "1.2.0", + "description": "Clinical trial data collection schemas", + "meta": { + "author": "Clinical Data Team", + "created": "2024-01-15" + }, + "schemas": [ + { + "name": "patient", + "fields": [ + { + "name": "bioproject_accession", + "valueType": "string", + "restrictions": { + "regex": "#/regex/BioProject_accession" + } + }, + { + "name": "diagnosis_date", + "valueType": "string", + "restrictions": { + "required": true, + "regex": "#/regex/date" + } + }, + { + "name": "country", + "valueType": "string", + "restrictions": { + "required": true, + "codeList": "#/list/geo_loc_name_country" + } + } + ] + } + ], + "references": { + "regex": { + "BioProject_accession": "^PRJN[A-Z0-9]+$", + "date": "^\\d{4}-\\d{2}-\\d{2}$" + }, + "list": { + "geo_loc_name_country": ["Canada", "United States", "Mexico", "..."] + } + } +} +``` + +:::info **Need Help?** +If you encounter any issues or have questions, please don't hesitate to reach out through our [**community support channels**](https://docs.overture.bio/community/support). +::: diff --git a/docs/assets/submission-system.svg b/docs/assets/submission-system.svg new file mode 100644 index 00000000..780e0ddb --- /dev/null +++ b/docs/assets/submission-system.svg @@ -0,0 +1,4 @@ + + + +
[Figure text recovered from submission-system.svg: Search & Exploration; Data Management & Storage; Arranger Configs (Define the structure and formatting of your data); Index Mapping (Generated based on Lyric dictionary Schema); Lectern Dictionary (Define the structure of your tabular data); Tabular Data Submission; File Metadata Submission; File Data Submission]
\ No newline at end of file diff --git a/docs/dictionary-reference.md b/docs/dictionary-reference.md deleted file mode 100644 index f192cdf3..00000000 --- a/docs/dictionary-reference.md +++ /dev/null @@ -1,265 +0,0 @@ -# Lectern Dictionary Meta-Schema Refernce - -For a high level description of the component parts of a Lectern Dictionary see [Important Concepts - Dictionary Model](./important-concepts.md#dictionary-model). - -## Dictionary Structure - -A Lectern Dictionary is a collection of Lectern Schemas. Each schema describes the structure of a TSV file, providing a list of the columns for that file and the data types and restrictions on the content of those columns. - -In addition to schemas, a Lectern Dictionary can contain reference values that can be reused throughout the schema definitions to define property restrictions with shared rules. - -> **Dictionary Structure Example** -> ```json -> { -> "name": "example_dictionary", -> "description": "Collection of schemas to demonstrate Lectern functionality", -> "meta": { /* Custom meta data about the dictionary here */ }, -> -> "version": "1.0", -> -> "schemas": [ /* Schemas Here */ ], -> "references": { /* Reference Variables Here */ } -> } -> ``` - -| Property | Type | Required | Description | Example | -| ------------- | -------------------------------------------------------------- | -------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | -| `name` | `string` | Required | Display name of the dictionary | `"Example Lectern Dictionary"` | -| `version` | `string`, as a semantic version number `major`.`minor`.`patch` | Required | Version of the dictionary. | `"1.23.4"` | -| `schemas` | `Array<`[`LecternSchema`](#dictionary-schema-structure)`>` | Required | An array containing Lectern Schemas. 
Minimum of 1 Schema is required. | [Dictionary Schema Structure](#dictionary-schema-structure) | -| `description` | `string` | Optional | Free text description of the schema, for use as a reference for users of the dictionary. This description is not used by Lcetern for dictionary validation. | `"Collection of schemas to demonstrate Lectern functionality"` | -| `meta` | [MetaData](#meta-data-structure) object | Optional | Schema implementor defined fields to capture any additional properties not defined in standard Lectern dictionaries. These properties are not used by Lctern for dictionary validation | `{ "author": "Guy Incognito" }` | -| `references` | [References](#references-structure) object | Optional | Reference values that can be referenced throughout the dictionary. | `{ "customRegex": { "ncitIds": "^NCIT:C\d+$" } }` | - -### Dictionary Schema Structure -> **Dictionary Schema Example** -> ```json -> { -> "name": "example-schema", -> "description": "Demonstrating structure of Lectern Schema", -> "meta": { /* Custom meta data about the schema here */ }, -> -> "fields": [ /* Fields Here */ ] -> } -> ``` - - -| Property | Type | Required | Default | Description | Example | -| ------------- | ------------------------------------------------------- | -------- | ------- | :------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------- | -| `name` | NameString (no whitespace or `.`) | Required | | Name of the schema. This will be used in paths that reference this schema, and for identifying files containing records for this schema. | `"example-schema"` | -| `fields` | `Array<`[`LecterField`](#dictionary-field-structure)`>` | Required | | List of fields contained in this Schema. 
| [Dictionary Field Structure](#dictionary-field-structure) | -| `description` | `string` | Optional | None | Free text description of the schema, for use as a reference for users of the schema. This description is not used in dictionary validation. | `"Demonstrating structure of Lectern Schema"` | -| `meta` | [`MetaData`](#meta-data-structure) | Optional | None | Schema implementor defined fields to capture any additional properties not defined in standard Lectern schemas. | [Meta Data Structure](#meta-data-structure) | - -### Dictionary Field Structure -> **Example Dictionary Field Definition** -> ```json -> { -> "name": "example_field", -> "description": "Shows a string field with a required restriction", -> "meta": { /* Custom meta data abou the field here */ }, -> "isArray": false, -> -> "valueType": "string", -> "restrictions": { -> "required": true -> } -> } -> ``` - -| Property | Required | Default | Type | Description | Example | -| -------------- | -------- | ---------------------- | --------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- | -| `name` | Required | | NameString (no whitespace or `.`) | Name of the field. This will be used as the header in TSV files in this field's schema, and in any paths referencing this field. | `"example_field` | -| `valueType` | Required | | [Field Data Type](#field-data-types) | Type of value stored in this field | `"string"` | -| `delimiter` | Optional | `,` | `string` | Character or string that will be used to split multiple values into an array. The default delimiter is a comma `,`. 
Any characters can be used as a delimiter. The delimiter value can be one or more characters long, but cannot be an empty string. Note: This property has no effect unless the field has `isArray: true`. | `"\|"` | -| `description` | Optional | `""` No value | `string` | Free text description of the field, for use as a reference for users of the schema. This description is not used in dictionary validation. | `"Shows a string field with a required restriction"` | -| `meta` | Optional | Empty object, no value | [`MetaData`](#meta-data-structure) object | Schema implementor defined fields to capture any additional properties not defined in standard Lectern fields. | `{ "displayName": "Example Field" }` | -| `isArray` | Optional | `false` | `boolean` | Type of value stored in this field | | -| `restrictions` | Optional | No Restrictions | `RestrictionsObject` or `Array` | An object containing all validation rules for this field. This can be a single object containing all [restrictions](#field-restrictions) applied to this field or a list of objects whose restrictions will be combined. [Conditional restrictions](#conditional-restrictions) can also be used to apply validation rules based on values of other fields in the record. | `{ "required": true }` | -| `unique` | Optional | `false` | `boolean` | Indicates that every record in this schema should have a unique value for this field. This rule is applied when a collection of records are validated together, ensuring that no two records in that collection repeat a value. | `true` | - - - -#### Field Data Types - -| valueType | Description | Examples | -| --------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- | -| `boolean` | Boolean value, either `true` or `false`. 
Accepts values with any letter casing, for example `true`, `True`, and `TRUE` will all be interpretted as `true` | `true`, `false` | -| `integer` | Numeric integer value. Will accept positive and negative values (ex. `21` or `-8`) but will reject any decimals (ex. `1.23`) | `21`, `-8` | -| `number` | Numeric value. Will accept any numeric value, including those with decimals. | `1.23`, `-4.567` | -| `string` | String fields. Value can have any length and use any character, other than the array delimiter for an array field (by default ` \| `) | `asdf`, `Hello World`, `Another longer example of a string` | - -#### Field Restrictions - -Restrictions on a field are a list of rules that all values for this field must adhere to, these are the list of validations on the contents of each field. Two examples of restrictions are that a value is `required`, and that a value must take a value from a list of available options (`codeList`). The full list of available restrictions are described in the table below. - -The restrictions property of a field can have a value that is either a single restrictions object, or an array with any number of restrictions objects. If an array of restriction objects is provided, each set of restrictions will be applied in turn - for data to be valid, all restrictions in the array must pass. A restrictions object can either contain a set of restrictions from the table below, or be a [conditional restriction](#conditional-restrictions). 
- -The full list of available restrictions are: - -| Restriction | Used with Field Types | Type | Description | Examples | -| ----------- | ----------------------------- | --------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `codeList` | `integer`, `number`, `string` | Array of type of the field | An array of values of the type matching this field. Data provided for this field must have one of the values in this list. | `["Weak", "Average", "Strong"]` | -| `compare` | all | [ComparedFieldsRule](#comparedfieldsrule-data-structure) object | Enforces that this field has a value based on the provided value in another field. Examples would be to ensure that the two values are not equal, or for numeric values ensure one is greater than the other. | `{ "fields": ["age_at_diagnosis"], "relation": "greaterThanOrEqual" }` Ensure that a field such as `age_at_death` is greater than the provided `age_at_diagnosis` | -| `count` | Array fields of all types | `integer` or [`RangeRule`](#rangerule-data-structure) object | Enfroces the number of entries in an array. Can specify an exact array size, or provide range rules that set maximum and minimum counts. | `7` or `{"min": 5, "max": 10}` | -| `empty` | all | | Requires that no value is provided. This is useful when used on a [conditional restriction](#conditional-restrictions) in order to prevent a value from being given when the condition is `true`. For an array field with this restriction, an empty array is a valid value for this restriction. 
| n/a | -| `range` | `integer`, `number` | | Uses a [RangeRule](#rangerule-data-structure) object to define minimum and/or maximum values for this field | `{"min": 5}`, `{"exclusiveMax": 50}`, `{"exclusiveMin": 5, "max": 50}` | -| `regex` | `string` | | A regular expression that all values must match. | `^[a-z0-9]+$` | -| `required` | all | | A value must be provided, missing/undefined values will fail validation. Empty strings will not be accepted, though `0` (for `number` and `int` fields) and `false` (for `boolean` fields) are accepted. An array field with this restriction must have at least one entry. | `true`, `false` | - -#### Conditional Restrictions - -Restrictions can be added with conditions so that the validations are only applied based on the values provided to other fields within a record. - -A conditional restriction uses an if/then/else style syntax: - -The `if` property will be an object containing an array of `conditions` that look at other fields on the same record and apply matching rules to their values. When those field values match the rules in the condition than the condition passes. An optional `case` property can be added to the `if` object that defines how many of the `conditions` have to pass in order for the whole condition block to resolve as `true` - default is `all`, requiring all conditions to be met. - -The `then` object contains the restrictions that will be applied when the `if` condition is `true`, and the `else` condition contains restrictions to apply when the `if` condition is `false`. The `then` property is required but using an `else` property is optional. 
- -| Property | Required | Default | Type | Description | Example | -| -------- | -------- | ----------------------------- | --------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | -| `if` | Required | | `RequirementsConditions` | Contains the conditional cases that will be checked before applying this object's restrictions. This object contains a list of `conditions` and a `case` that indicates how many of the conditions need to be found `true` for the entire conditions block to be considered `true`. The case options are `any`, `all`, and `none`, with `all` being default (if case is not provided). | `{ "conditions": [ { "field": "another_field", "match": { "value": "Some Value" }} ], "case": "all" }` | -| `then` | Required | | `RestrictionsObject` or `Array` | The restriction rules to apply when the `if` condition is found to be `true`. | `{ "required": true}` | -| `else` | Optional | Empty object, no restrictions | `RestrictionsObject` or `Array` | The restriction rules to apply when the `if` condition is found to be `false`. 
| `{ "empty": true}` | - -```json -{ - "if": { - "conditions": [ /* Restriction conditions */ ], - "case": "all" - }, - "then": {/* Restrictons */} OR [ /* Restrictions objects (restriction values or nested conditional restrictions */ ], - "else": {/* Restrictons */} OR [ /* Restrictions objects (restriction values or nested conditional restrictions */ ] -} -``` - -##### Conditions Structure - -A requirement condition is defined by providing a field name or list of field names from this schema, and the matching rules that satisfy this condition. If multiple field names are provided, a `case` property can be added to specify how many of their values must pass the matching rules (`all`, `any`, or `none` of them). - -| Property | Required | Default | Type | Description | Example | -| ---------------- | -------- | ------- | -------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- | -| `fields` | Required | | `Array` | Names of fields from the same schema. This match rule will be applied to all fields listed - see `case` to determine the rules for how many of these fields must match. All specified fields must store values of the same type. | `["some_field"]` | -| `match` | Required | | `MatchRules` object | Matching rules for the values of the `fields`. All rules included in this object will be tested and all must be pass - this is not affected by the `case` property. [Conditional Match Rules](#conditional-match-rules) | `{ "value": "Hello World" }` | -| `arrayFieldCase` | Optional | `all` | `all`, `any`, `none` | When a specified field is an array type, the `arrayFieldCase` dictates how many of the values in the array must pass the matching rules. 
`all` requires all values in the array to pass the matching rule. `any` requires at least one value in the array to match. `non` requires that none of the values in the array match. | `any` | -| `case` | Optional | `all` | `all`, `any`, `none` | Defines how many of the listed `fields` must have a value that matches the `match` rules. `all` requires all fields values to have matching values. `any` requires at least one field to have a matching value. `none` requires that there none of the specified fields have values that match. | `any` | - -> **Example Conditional Restriction**: match single value -> -> Condition where `shirt_size` is `Small` -> ```json -> { -> "fields": ["shirt_size"], -> "match": { -> "value": "Small" -> } -> } -> ``` - -> **Example Conditional Restriction**: match value from list -> -> Condition where `shirt_size` is any value in a list (`Medium` or `Large`) -> ```json -> { -> "fields": ["shirt_size"], -> "match": { -> "codeList": ["Medium", "Large`"] -> } -> } -> ``` - -##### Conditional Match Rules -| Property | Used with Field Types | Type | Description | Example | -| ---------- | --------------------- | -------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | -| `codeList` | all | Array of type of specified fields | A list of values that the field could match. This rule passes when the specified field's value can be found in this list. 
| `["value_one", "value_two"]` | -| `count` | Array type fields | Integer, or [RangeRule](#rangerule-data-structure) | Matches the number of values in an array field. This condition can be provided as a number, in which case this condition matches if the array is that exact length. This condition can be provided as a Range object as well, in which case it will match if the number of elements in the array pass the minimum and maximum conditions provided in the condition. | `2` - Field must have exactly 2 elements.
`{ max: 10 }` - Field must have no more than 10 items. | -| `exists` | all | Boolean | This condition requires a field to either have a value or have no value. When the `exists` condition is set to `true`, the field must have a value. When `exists` is sdet to `false`, the field must have no value. For array fields, `exists=false` only matches when the array is completely empty, and `exists=true` passes if the array has 1 or more values - `arrayCase` has no interaction with the `exists` condition. | `true` | -| `range` | `number`, `integer` | [RangeRule](#rangerule-data-structure) | Maximum and minimum value conditions that a numeric field must pass. | `{ min: 5, exclusiveMax: 10 }` Represents an integer from 5-9. | -| `regex` | `string` | String (Regular Expression) | A regular expression pattern that the value must match. | `^NCIT:C\d+$` Value must match an NCI Thesaurus ID. | -| `value` | all | Type of specified fields | Field value matches the value of the specified field. Strings are matched case insensitive. When arrays are matched, the order of their elements is ignored - a field matches this condition if the elements in field are the same elements as in the value match rule. For example, the rule `['abc', 'def']` matches the value `['def', 'abc']` but does not match `['abc', 'def', 'ghi']`. | `some_value`, `[1, 2, 3]` | - -### Meta Data Structure - -> **Meta Example** -> ```json -> { -> "displayName": "Nicely Formatted Name", -> "externalReferenceId": "ABCD:1234", -> "exampleBooleanPropery": true, -> "exampleNumericProperty": 123 -> } -> ``` - -A `meta` object is available to allow the dictionary creator to add custom properties to the Lectern Dictionary. The `meta` property is available to all Dictionary, Schema, and Field objects. Providing a `meta` value is optional. If provided the `meta` value is a JSON object. There are no restrictions on the field names that can be added to the `meta` object other than they must be valid JSON. 
The values for properties of the `meta` can either be another nested meta object, or are one of the allowed value types: - - `string` - - `number` - - `boolean` - - `Array` - - `Array` - -### References Structure - -References are defined at the dictionary level so they can be reused across schemas. References can be used to store values that can be used in `meta` or `restrictions` - -#### Using References -Reference variables can be used in a `meta` object or a `restrictions` object as either a restriction value or a conditional match value. - -To use a reference, replace the value in the value of the meta or restriction property with a string containing a `ReferenceTag`. A `ReferenceTags` - -### RangeRule Data Structure - -> **RangeRule Example** -> ```json -> { -> "min": 5, -> "exclusiveMax": 10 -> } -> ``` - -`RangeRule` objects are used to define restrictions and conditions where a numeric minimum or maximum needs to be defined. This object must define at least 1 property (ie. could define a minimum but not maximum, or vice-versa). - -There is an inclusive and an exclusive version of the minimum and maximum properties. `min` and `max` are _inclusive_, and the alternate form `exclusiveMin` and `exclusiveMax` are _exclusive_. By example, `{ "min":5 }` allows the value `5` and greater, while `{ "exclusiveMin": 5 }` allows only values greater than `5` but not `5` itself. - -A `RangeRule` cannot include but an inclusive and exclusive version of min, or of max (ie. it cannot have `min` and `exclusiveMin`.) - -| Property | Description | -| -------------- | :---------------------------------------------------------------- | -| `exclusiveMax` | Allows values less than this value, but not this value itself. | -| `exclusiveMin` | Allows values greater than this value, but not this value itself. | -| `max` | Allows this value and values lesser than this value. | -| `min` | Allows this value and values greater than this value. 
| - -### ComparedFieldsRule Data Structure - -> **ComparedFieldsRule** Example -> -> ```json -> { -> "fields": "some_field", -> "relation": "equal", -> } -> ``` - -| Property | Required | Default | Type | Description | -| ---------- | -------- | -------------------- | ---------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `fields` | Required | | `string` or `Array` | The field(s) that the values of will be compared to. These fields will be refered to throughout this section as _compared to_ fields. All these fields need to be the same type as the field(s) they will be compared to. | -| `relation` | Required | | `equal`, `notEqual`, `contains`, `containedIn`, `greaterThan`, `greaterThanOrEqual`, `lesserThan`, `lesserThanOrEqual` | The relation between the values of the test field and the compared to fields. See [ComparedFieldsRule Relations](#comparedfieldsrule-relations). | -| `case` | Optional | `all`, `any`, `none` | MatchCase (RangeRule or one of: `all`, `any`, `none`) | How many of the _compared to_ fields must pass the comparison for this rule to pass. 
| - -#### ComparedFieldsRule Relations - -| Relation Value | Allowable Field Types | Description | -| ------------------------ | --------------------- | :--------------------------------------------------------------------------------------------------------- | -| **`equal`**: | all | Checks that the current field and the comapred field(s) have the same value | -| **`notEqual`**: | all | Checks that the current field and the comapred field(s) do not have the same value | -| **`contains`** | `string` | Checks that the value of the current field completely contains the value of the compared field(s) | -| **`containedIn`** | `string` | Checks that the value of the current field is completely contained in the value of the compared field(s) | -| **`greaterThan`** | `number`, `integer` | Checks that the value of the current field is greater than (exclusive) the value of the compared field(s). | -| **`greaterThanOrEqual`** | `number`, `integer` | Checks that the value of the current field is greater than or equal to the value of the compared field(s). | -| **`lesserThan`** | `number`, `integer` | Checks that the value of the current field is lesser than (exclusive) the value of the compared field(s). | -| **`lesserThanOrEqual`** | `number`, `integer` | Checks that the value of the current field is lesser than or equal to the value of the compared field(s). | - -## Source Code Reference - -Source code for the Lectern Dictionary meta-schema is made available through the package [@overture-stack/lectern-dictionary](../packages/dictionary/). The meta-schema is formally defined in TypeScript and exported as the type `Dictionary` from the file [`dictionary/src/types/dictionaryTypes.ts`](../packages/dictionary/src/types/dictionaryTypes.ts). This definition is created using [`Zod`] schemas, which are also exported from this package and available for use to confirm a given object is a valid Lectern Dictionary. 
\ No newline at end of file diff --git a/docs/overview/images/submission-system.svg b/docs/overview/images/submission-system.svg deleted file mode 100644 index a41f4e66..00000000 --- a/docs/overview/images/submission-system.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - - \ No newline at end of file diff --git a/docs/important-concepts.md b/pendingDocs/glossary.md similarity index 93% rename from docs/important-concepts.md rename to pendingDocs/glossary.md index 644b34c1..d6efbdc4 100644 --- a/docs/important-concepts.md +++ b/pendingDocs/glossary.md @@ -1,14 +1,14 @@ -# Important Concepts +# Glossary This document is a reference of commonly used terms and definitions. ## Dictionary Model -Lectern provides a "meta-schema" which describes a syntax for creating Data Dictionaries. This meta-schema is a set of rules for a JSON document, and any JSON document that correctly applies these rules represents a valid Lectern Dictionary. The meta-schema is defined through code rules in the [@overture-stack/lectern-dictionary](../packages/dictionary) package. +Lectern provides a "meta-schema" which describes a syntax for creating Data Dictionaries. This meta-schema is a set of rules for a JSON document, and any JSON document that correctly applies these rules represents a valid Lectern Dictionary. The meta-schema is defined through code rules in the [@overture-stack/lectern-dictionary](../packages/dictionary) package. A [JSON-schema version of this meta-schema](../generated/DictionaryMetaSchema.json) has been generated and is included in this code base. -This section describes at a high level the component parts of a Lectern Dictionary and the terms used when discussing those parts. The terms defined here are used throughout the documentation and the type system of the Lectern codebase. If you are writing a Lectern Dictionary, you may instead be looking for the [reference documentation for Lectern Dictionaries](). 
+This section describes at a high level the component parts of a Lectern Dictionary and the terms used when discussing those parts. The terms defined here are used throughout the documentation and the type system of the Lectern codebase. If you are writing a Lectern Dictionary, you may instead be looking for the reference documentation for Lectern Dictionaries. ### Dictionary @@ -49,6 +49,7 @@ Placeholder ## Common Types ### DataRecord and UnprocessedDataRecord + The `DataRecord` type represents a single record from some Schema. They are objects with keys that match the [fields](#field) from a [schema](#schema) and a value that should be one of the valid Lectern data types. There is no guarantee that a `DataRecord` is "valid", it could have values that fail some restrictions from the schema. An `UnprocessedDataRecord` is very similar, but all values are raw string values. These represent a single record as it would be submitted in a test file, for example all the data from a single line in a TSV. These string values will need to be [parsed](#parsing) to be converted to their proper types as defined in a schema. 
@@ -65,7 +66,7 @@ Example Valid `TestResult`: ```ts { - valid: true + valid: true; } ``` @@ -99,4 +100,4 @@ Placeholder ### Processing -Placeholder \ No newline at end of file +Placeholder diff --git a/docs/lectern-2.0-changes.md b/pendingDocs/migration/lectern2changes.md similarity index 99% rename from docs/lectern-2.0-changes.md rename to pendingDocs/migration/lectern2changes.md index 9f389539..23361b3e 100644 --- a/docs/lectern-2.0-changes.md +++ b/pendingDocs/migration/lectern2changes.md @@ -22,7 +22,6 @@ The release of Lectern 2.0 brings some important upgrades to the Lectern service - Updated interface for Lectern Server REST client - Exposes dictionary meta-schema validation, data parsing, and data validation functions - ### New Published Lectern TS Packages - [Lectern Dictionary](../packages/dictionary/) diff --git a/docs/validation/field-validation.md b/pendingDocs/validation/field-validation.md similarity index 100% rename from docs/validation/field-validation.md rename to pendingDocs/validation/field-validation.md diff --git a/docs/validation/index.md b/pendingDocs/validation/index.md similarity index 100% rename from docs/validation/index.md rename to pendingDocs/validation/index.md diff --git a/docs/validation/record-validation.md b/pendingDocs/validation/record-validation.md similarity index 100% rename from docs/validation/record-validation.md rename to pendingDocs/validation/record-validation.md