Dagster Dags

Dagster definitions for OCF's archival datasets.


Ubiquitous language

The following terms are used throughout the codebase and documentation. They are defined here to avoid ambiguity.

  • InitTime - The time at which a forecast is initialized. For example, a forecast initialized at 12:00 on 1st January.
  • TargetTime - The time at which a predicted value is valid. For example, a forecast with InitTime 12:00 on 1st January predicts that the temperature at TargetTime 12:00 on 2nd January at position x will be 10 degrees.
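The relationship between the two terms can be sketched in a few lines of Python. This is an illustration only: the year, the UTC timezone, and the variable names (`init_time`, `step`, `target_time`) are assumptions chosen for the example and are not taken from the codebase.

```python
import datetime as dt

# InitTime: when the forecast was initialized (12:00 on 1st January).
# The year and the UTC timezone are assumed purely for illustration.
init_time = dt.datetime(2023, 1, 1, 12, 0, tzinfo=dt.timezone.utc)

# The forecast horizon: how far ahead of InitTime the prediction is valid.
step = dt.timedelta(hours=24)

# TargetTime: when the predicted value is valid (12:00 on 2nd January).
target_time = init_time + step

print(f"InitTime={init_time.isoformat()} TargetTime={target_time.isoformat()}")
```

A single InitTime typically has many TargetTimes, one per forecast step.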

Repository structure

Produced by eza:

eza --tree --git-ignore -F -I "*init*|test*.*|build"
./
├── cloud_archives/ # Dagster definitions for cloud-stored archival datasets
│  └── nwp/ # Specifications for Numerical Weather Prediction data sources
│     └── icon/ 
├── constants.py # Values used across the project
├── dags_tests/ # Tests for the project
├── local_archives/ # Dagster definitions for locally-stored archival datasets
│  ├── nwp/ # Specifications for Numerical Weather Prediction data sources
│  │  ├── cams/
│  │  └── ecmwf/
│  └── sat/ # Specifications for Satellite image data sources
├── managers/ # IO Managers for use across the project
├── pyproject.toml # The build configuration for the service
└── README.md

Conventions

Data is stored automatically in locations determined by features of the data itself. The only configurable part of the storage is the Base Path: the root from which Dagster derives the subpaths. The full storage path takes the following features into account:

  • The flavor of the data (NWP, Satellite etc)
  • The Provider of the data (CEDA, ECMWF etc)
  • The Region the data covers (UK, EU etc)
  • The InitTime the data refers to

Paths are then generated via base/flavor/provider/region/inittime. See managers for an example implementation. For this to work, each asset must have an asset key prefix conforming to the structure [flavor, provider, region]. The Base Paths are defined in constants.py.
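As a rough sketch of this convention (not the implementation in managers), the path could be assembled as below. The function name `build_storage_path`, the example Base Path, and the InitTime format `%Y%m%dT%H%M` are all hypothetical; the real IO managers may name and format things differently.

```python
import datetime as dt
import pathlib


def build_storage_path(
    base: str,
    key_prefix: list[str],
    init_time: dt.datetime,
) -> pathlib.PurePosixPath:
    """Join base/flavor/provider/region/inittime into a single path.

    Hypothetical helper: the InitTime format used here is an assumption.
    """
    if len(key_prefix) != 3:
        raise ValueError(
            f"Expected key prefix [flavor, provider, region], got {key_prefix}"
        )
    flavor, provider, region = key_prefix
    # Assumed format for the final path component, e.g. 20230101T1200.
    init_part = init_time.strftime("%Y%m%dT%H%M")
    return pathlib.PurePosixPath(base) / flavor / provider / region / init_part


# Example with a made-up Base Path and asset key prefix.
path = build_storage_path(
    base="/mnt/archive",
    key_prefix=["nwp", "ecmwf", "uk"],
    init_time=dt.datetime(2023, 1, 1, 12, 0),
)
print(path)  # /mnt/archive/nwp/ecmwf/uk/20230101T1200
```

Rejecting prefixes that do not have exactly three parts makes a badly keyed asset fail early rather than write to an unexpected location.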

Local Development

First, install your Dagster code location as a Python package. By using the --editable flag, pip will install your Python package in "editable mode" so that as you develop, local code changes will automatically apply.

pip install -e ".[dev]"

Then, start the Dagster UI web server:

dagster dev --module-name=local_archives

Open http://localhost:3000 with your browser to see the project.

Add your assets to the relevant code location. See Repository Structure for details.

Useful links