
Ptype-Physical

This package contains code for generating machine learning estimates of winter precipitation type from vertical profiles of the atmosphere.

Installation

  1. Clone the repository from GitHub: git clone git@github.com:ai2es/ptype-physical.git
  2. Within the terminal, go to the top-level directory with cd ptype-physical.
  3. Install miniconda, then create a ptype environment with the following command: conda env create -f environment.yml. Alternatively, add the dependencies to an existing environment with conda env update -f environment.yml.
  4. Activate the environment by running conda activate ptype or source activate ptype.
  5. Install the package directly with pip install ., or install it in editable mode with pip install -e . if you are debugging.
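
To verify the installation, try importing the package (assuming it is importable as ptype; adjust the name if your install differs):

python -c "import ptype"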

Train a classifier model to predict ptypes

Train a multi-layer perceptron model via

python applications/train_mlp.py -c config/ptype.yml

Upon completion, the script saves the model's predictions on the training, validation, and test splits to file. Next, compute metrics on the splits and case studies via

python applications/evaluate_mlp.py -c config/ptype.yml

Two approaches are supported: (1) a standard MLP trained with cross-entropy loss, and (2) an evidential MLP trained with a Dirichlet loss.

  • Option (1) may be selected by setting the loss to "categorical_crossentropy" and the output_activation to "softmax" in ptype.yml.

  • Option (2) may be selected by setting the loss to "dirichlet" and the output_activation to "linear" in ptype.yml.
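
For example, the relevant ptype.yml entries for each option might look like the following (the loss and output_activation keys come from this README; their exact placement within the file is an assumption):

# Option (1): standard MLP
# loss: "categorical_crossentropy"
# output_activation: "softmax"

# Option (2): evidential MLP
loss: "dirichlet"
output_activation: "linear"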

For more details, run

python applications/train_mlp.py --help
python applications/evaluate_mlp.py --help

Active training

Active training can be called with Option (1) via

python applications/active_training.py -c config/ptype.yml -i 20 -p "random"

which will perform 20 iterations with a random policy. The number of cross-validation steps at each iteration is controlled using "n_splits" in the config file.
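
For example, to use five cross-validation splits per iteration, add the following to ptype.yml (the value is illustrative):

n_splits: 5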

One may also select "mc-dropout" for the policy and use the command-line option -s to set the number of Monte Carlo iterations:

python applications/active_training.py -c config/ptype.yml -i 20 -p "mc-dropout" -s 100

For option (2), the supported policies are

"evidential", "aleatoric", "epistemic", "random", "mc-dropout"

The script can also be run on multiple GPU nodes for either option via

python applications/active_training.py -c config/ptype.yml -i 20 -p "random" -l 1 -n 20

which will launch 20 PBS jobs to GPU nodes.

Once all iterations have completed, the results can currently be viewed using notebooks/compare_active_training.ipynb.

For more details, run

python applications/active_training.py --help

Config file

Training, evaluation, and active-learning options are set in config/ptype.yml; see the examples above, and run the scripts with --help for a full list of options.

Inference

The P-type models can currently be run historically using the High-Resolution Rapid Refresh (HRRR), Rapid Refresh (RAP), or Global Forecast System (GFS) models. All data is downloaded in GRIB format using the Herbie (herbie-data) package and is automatically deleted from the user's space. Data can be processed in parallel using either standard Python multiprocessing or Dask (better suited for large historical runs).
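
For reference, a minimal sketch of retrieving one HRRR file with Herbie (the date, product, and search string are illustrative; the actual download logic lives in the inference scripts):

from herbie import Herbie

# One HRRR pressure-level file: 00Z run on 2023-02-01, forecast hour 6 (illustrative).
H = Herbie("2023-02-01 00:00", model="hrrr", product="prs", fxx=6)

# Load the 2 m temperature field into an xarray Dataset.
ds = H.xarray(":TMP:2 m above ground")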

The output format for the prediction files is:

/glade/scratch/username/ptype_output/{nwp_model}/{init_day}/{init_hour}/ptype_predictions_{nwp_model}{init_hour}z_fh{forecast_hour}.nc
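
For example, an HRRR run initialized at 00Z on 2023-02-01 with forecast hour 6 (illustrative values; the exact date and zero-padding formats are set by the inference script) would be written to something like:

/glade/scratch/username/ptype_output/hrrr/2023-02-01/00/ptype_predictions_hrrr00z_fh6.nc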

Each file contains the following (this can be modified slightly in the configuration file):

  • Probability of precipitation for rain, snow, sleet, and freezing rain.
  • ML categorical precipitation type (max probability)
  • NWP categorical precipitation type (from model used for input)
  • Orography
  • 2m temperature and dewpoint, 10m winds (other variables can be added in configuration file)
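
As a quick sketch, a prediction file can be inspected with xarray (the file name is illustrative; check an actual output file for the exact variable names):

import xarray as xr

# Open one prediction file (illustrative path).
ds = xr.open_dataset("ptype_predictions_hrrr00z_fh6.nc")

# List the available variables, e.g. the four precipitation-type
# probabilities and the ML/NWP categorical fields described above.
print(ds.data_vars)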

Configuration file for inference: /config/inference.yml. Its options are listed below; a sketch of the full file follows the list.

  • model
    • supports "hrrr", "gfs", "rap"
  • ML_model_path
    • Path to ML model
  • out_path
    • Base save path
  • drop_input_data
    • Boolean (whether to drop the vertical profile data used for model input from the output files)
    • Setting this to False greatly increases file size
  • n_processors
    • Number of processors to use with standard Python multiprocessing (ignored if use_dask=True)
  • use_dask
    • Boolean (True uses Dask for parallelization, False uses standard multiprocessing)
  • dates
    • Dictionary for starting and ending model initialization times (inclusive)
  • forecast_range
    • Dictionary for forecast hours (inclusive) for each model initialization time
  • height_levels
    • Dictionary for model height levels needed for input into the ML model
  • variables
    • Dictionary of variables and input needed to load data for each of the NWP models
    • The NWP models do not all provide the same variables, so model-specific lists are needed (in particular for the variables used to derive dewpoint)
    • "product" comes from the herbie API
    • Other variables can be added
  • dask_params
    • Parameters to be used if use_dask=True
    • Specific to NCAR's Casper cluster
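
Putting these together, a sketch of /config/inference.yml might look like the following (the key names come from the list above; all values and the nesting of the dictionaries are assumptions):

# Sketch of inference.yml; values and nested structure are illustrative.
model: "hrrr"                 # one of "hrrr", "gfs", "rap"
ML_model_path: "/path/to/ml/model"
out_path: "/glade/scratch/username/ptype_output"
drop_input_data: True         # drop vertical-profile inputs from the output files
n_processors: 8               # used with multiprocessing; ignored if use_dask is True
use_dask: False
dates:                        # inclusive initialization-time range
  start: "2023-02-01 00:00"
  end: "2023-02-02 00:00"
forecast_range:               # inclusive forecast hours per initialization
  start: 1
  end: 18
# height_levels, variables, and dask_params are omitted here; see the
# repository's config/inference.yml for their exact structure.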
