This is a template for a PyTorch project covering training, testing, inference demos, and FastAPI serving, with Docker support.
Use a Python venv or a conda env to install requirements:
- Install full train requirements:
pip install -r requirements/train.txt
- Install minimal inference requirements:
pip install -r requirements/inference.txt
Example training for mnist digit classification:
python train.py --cfg configs/mnist_config.yaml
Place training data inside the data directory in the following format:
data
├── SOURCE_DATASET
│   ├── CLASS 1
│   │   ├── img1
│   │   ├── img2
│   │   └── ...
│   └── CLASS 2
│       ├── img1
│       ├── img2
│       └── ...
Note: ImageNet-style nesting (class_dir -> subdirs -> ... -> images) is also supported.
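The layout above can be sanity-checked before training. A minimal stdlib-only sketch (the `images_per_class` helper and the extension list are illustrative, not part of this repo):

```python
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def images_per_class(source_dataset: str) -> dict:
    """Count image files under each top-level class directory,
    recursing so ImageNet-style nested layouts are also counted."""
    counts = {}
    for class_dir in Path(source_dataset).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for p in class_dir.rglob("*") if p.suffix.lower() in IMG_EXTS
            )
    return counts
```

A quick check that every class has a similar count helps catch badly extracted datasets early.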
# generate an id to name classmap
python scripts/generate_classmap_from_dataset.py --sd data/SOURCE_DATASET --mp data/ID_2_CLASSNAME_MAP_TXT_FILE
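Conceptually, the classmap step enumerates class directories and assigns each a stable integer id. A hedged sketch (`write_classmap` and the `id classname` line format are assumptions about the script's output, not its actual implementation):

```python
from pathlib import Path

def write_classmap(source_dataset: str, map_path: str) -> dict:
    """Assign each class directory an integer id in sorted order and
    write one 'id classname' pair per line to the map file."""
    classes = sorted(p.name for p in Path(source_dataset).iterdir() if p.is_dir())
    classmap = {i: name for i, name in enumerate(classes)}
    with open(map_path, "w") as f:
        for i, name in classmap.items():
            f.write(f"{i} {name}\n")
    return classmap
```

Sorting before assigning ids keeps the mapping deterministic across runs.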
# create train val test split, also creates an index to classname mapping txt file
python scripts/train_val_test_split.py --rd data/SOURCE_DATASET --td data/SOURCE_DATASET_SPLIT --vs VAL_SPLIT_FRAC -ts TEST_SPLIT_FRAC
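The split step can be pictured as shuffling each class's files and copying them into train/val/test subtrees. A simplified sketch (`split_dataset` is illustrative; the real script may differ in rounding, flags, and mapping-file output):

```python
import random
import shutil
from pathlib import Path

def split_dataset(root_dir, target_dir, val_frac, test_frac, seed=42):
    """Shuffle each class's files with a fixed seed and copy them into
    target_dir/{train,val,test}/CLASS subdirectories."""
    rng = random.Random(seed)
    for class_dir in Path(root_dir).iterdir():
        if not class_dir.is_dir():
            continue
        files = sorted(p for p in class_dir.rglob("*") if p.is_file())
        rng.shuffle(files)
        n_val = int(len(files) * val_frac)
        n_test = int(len(files) * test_frac)
        splits = {
            "val": files[:n_val],
            "test": files[n_val:n_val + n_test],
            "train": files[n_val + n_test:],
        }
        for split, items in splits.items():
            out = Path(target_dir) / split / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in items:
                shutil.copy2(f, out / f.name)
```

Splitting per class rather than globally keeps class proportions roughly equal across the three splits.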
# OPTIONAL duplicate train data if necessary
python scripts/duplicate_data.py --rd data/SOURCE_DATASET_SPLIT/train --td data/SOURCE_DATASET_SPLIT/train -n TARGET_NUMBER
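Duplication can be sketched as cycling over a class directory's existing files and writing numbered copies until a target count is reached (per-class semantics and the copy-naming scheme here are assumptions about duplicate_data.py, not its actual behavior):

```python
import shutil
from pathlib import Path

def duplicate_class_files(class_dir: str, target_number: int) -> None:
    """Copy existing files round-robin, with uniquely prefixed names,
    until class_dir holds target_number files."""
    files = sorted(p for p in Path(class_dir).iterdir() if p.is_file())
    i = 0
    while len(files) < target_number:
        src = files[i % len(files)]
        dst = src.with_name(f"dup{len(files)}_{src.name}")
        shutil.copy2(src, dst)
        files.append(dst)
        i += 1
```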
# create a custom config file based on configs/classifier_cpu_config.yaml and modify train parameters
cp configs/classifier_cpu_config.yaml configs/custom_classifier_cpu_config.yaml
Sample data used in the custom image classification training downloaded from https://www.kaggle.com/datasets/umairshahpirzada/birds-20-species-image-classification.
# train on custom data with custom config
python train.py --cfg configs/custom_classifier_cpu_config.yaml
Convert an existing dataset to the tar archive format used by WebDataset. The data directory must match the structure above.
# ID_2_CLASSNAME_MAP_TXT_FILE is generated using the scripts/train_val_test_split.py file
# convert train/val/test splits into tar archives
python scripts/convert_dataset_to_tar.py --sd data/SOURCE_DATA_SPLIT --td data/TARGET_TAR_SPLIT.tar --mp ID_2_CLASSNAME_MAP_TXT_FILE
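WebDataset tar archives group each sample's files under a shared key (e.g. 000000.jpg for the image plus 000000.cls for the label). A stdlib-only sketch of the packing step (the function name and the .jpg/.cls extension choices are illustrative, not the script's actual interface):

```python
import io
import tarfile
from pathlib import Path

def dataset_to_tar(split_dir: str, tar_path: str, class_to_id: dict) -> None:
    """Pack a CLASS_DIR/img layout into a tar where each sample is
    stored as KEY.jpg (image) and KEY.cls (integer label)."""
    with tarfile.open(tar_path, "w") as tar:
        for idx, img in enumerate(sorted(Path(split_dir).rglob("*.jpg"))):
            key = f"{idx:06d}"
            tar.add(img, arcname=f"{key}.jpg")
            label = str(class_to_id[img.parent.name]).encode()
            info = tarfile.TarInfo(name=f"{key}.cls")
            info.size = len(label)
            tar.addfile(info, io.BytesIO(label))
```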
An example configuration for training with the WebDataset format is provided in configs/classifier_webdataset_cpu_config.yaml.
# example training with webdataset tar data format
python train.py --cfg configs/classifier_webdataset_cpu_config.yaml
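At load time, WebDataset reconstructs samples by pairing tar members that share a key. A stdlib-only illustration of that grouping (not how the webdataset library is implemented internally):

```python
import tarfile
from collections import defaultdict

def iter_samples(tar_path):
    """Group tar members by key (the filename part before the first dot)
    and yield (image_bytes, label) pairs, mirroring how WebDataset
    pairs KEY.jpg with KEY.cls."""
    samples = defaultdict(dict)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    for key in sorted(samples):
        yield samples[key]["jpg"], int(samples[key]["cls"])
```

Because samples are read sequentially from the archive, this format suits streaming from disk or object storage.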
Test based on CONFIG_FILE. By default, testing is done for MNIST classification.
python test.py --cfg CONFIG_FILE
Export a trained model checkpoint:
python export.py --cfg CONFIG_FILE -r MODEL_PATH --mode <"ONNX_TS"/"ONNX_DYNAMO"/"TS_TRACE"/"TS_SCRIPT">
All TensorBoard logs are saved under the tensorboard_log_dir setting in the config file. Logs include train/val epoch accuracy/loss, the model graph, and preprocessed images per epoch.
To start a TensorBoard server reading logs from the experiment dir, exposed on localhost port 6006:
tensorboard --logdir=TF_LOG_DIR --port=6006
Install Docker on the system first:
bash scripts/build_docker.sh # builds the docker image
bash scripts/run_docker.sh # runs the previously built docker image, creating a shared volume checkpoint_docker outside the container
# inside the docker container
python train.py
To use GPUs inside Docker for training/testing, pass --gpus device=0,1 (or --gpus all) to docker run.
Build and run the FastAPI model server:
bash server/build_server_docker.sh -m pytorch/onnx
bash server/run_server_docker.sh -h/--http 8080
Clean cached builds, pycache, .DS_Store files, etc:
bash scripts/cleanup.sh
Count the number of files in sub-directories of PATH:
bash scripts/count_files.sh PATH
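In Python, the same per-subdirectory count can be sketched as follows (illustrative, not a replacement for the script):

```python
from pathlib import Path

def count_files(path: str) -> dict:
    """Count files (recursively) under each immediate subdirectory of
    path, mirroring what scripts/count_files.sh reports."""
    return {
        d.name: sum(1 for p in d.rglob("*") if p.is_file())
        for d in Path(path).iterdir() if d.is_dir()
    }
```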
- Line-by-line GPU memory usage profiling: pytorch_memlab
- Line-by-line execution time profiling: line_profiler
- https://github.com/victoresque/pytorch-template
- WebDataset https://modelzoo.co/model/webdataset
- PyTorch Ecosystem Tools https://pytorch.org/ecosystem/