This is the official repository for the paper "Optimized Vision Transformer Training using GPU and Multi-threading," published at the IEEE Conference on Artificial Intelligence 2024 (IEEE CAI 2024). This repository contains optimized implementations of Convolutional Neural Network (CNN), Transformer, and Vision Transformer (ViT) models.

Authors:
- Jonathan Ledet (@jonledet)
- Ashok Kumar
- Dominick Rizk
- Rodrigue Rizk
- KC Santosh
This project focuses on optimizing Vision Transformer training using GPU acceleration and multi-threading techniques. It provides implementations of popular deep learning models, including Convolutional Neural Networks (CNN), Transformer, and a customized version of Vision Transformer (ViT) tailored for improved performance.
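The two optimizations named above — running the model on the GPU and loading data with multiple workers — can be sketched in PyTorch. This is an illustrative sketch only: the stand-in linear classifier, batch size, and worker count are assumptions, not taken from the repository's scripts.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# GPU acceleration: move computation to the GPU when available, else fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Multi-threaded data loading: num_workers > 0 spawns workers that prepare
# batches in parallel with training; pin_memory speeds host-to-device copies.
data = TensorDataset(torch.randn(256, 3 * 32 * 32), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=64, shuffle=True, num_workers=2, pin_memory=True)

# A stand-in classifier (the actual models live in cnn.py / transformer.py / vit.py).
model = nn.Linear(3 * 32 * 32, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for x, y in loader:
    # non_blocking=True overlaps copies with compute when pin_memory=True
    x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```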
- CNN: Implementation of Convolutional Neural Networks.
- Transformer: Implementation of the Transformer model.
- ViT: Customized version of the Vision Transformer (ViT) model, based on the vision-transformers-cifar10 repository.

Requirements:
- Python (>=3.6)
- Anaconda 3
- PyTorch
- CUDA-enabled GPU (for GPU acceleration)

Installation:
- Clone this repository:

  ```shell
  git clone https://github.com/jonledet/vision-transformer.git
  ```
- Create and activate a new Anaconda environment:

  ```shell
  conda create --name your-env-name python=3.6
  conda activate your-env-name
  ```
- Install the dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- To run a model, execute the corresponding Python script:

  ```shell
  python cnn.py
  python transformer.py
  python vit.py
  ```
- The Vision Transformer (ViT) model is based on the work from the vision-transformers-cifar10 repository.
This project is licensed under the MIT License.