readme-updates #874

Merged 1 commit on Sep 13, 2024
30 changes: 21 additions & 9 deletions torchao/sparsity/prototype/superblock/README.md
@@ -47,9 +47,8 @@ At least one GPU:
Baseline:
```
python benchmark.py \
- --model vit_b_16 \
+ --model vit_h_14 \
--batch-size 256 \
- > /dev/null
```
Result:
```
@@ -59,19 +58,27 @@ Result:

80% sparsity, block size 64 (random weights):
```
- python benchmark.py --model vit_b_16 \
+ python benchmark.py \
+ --model vit_h_14 \
--batch-size 256 \
--sparsity-linear 0.8 \
--sp-linear-tile-size 64 \
- --sparsify-weights \
- --bsr 64 \
- > /dev/null
+ --sparsity bsr
```
Result:
```
393.864453125 ms
```
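To make the sparsity settings above concrete, here is a minimal, hypothetical sketch (not superblock's code) of what "80% sparsity, block size 64" means: zero out 80% of the 64x64 weight tiles, then store the weight in PyTorch's block-sparse (BSR) layout so that only the surviving tiles are kept. The matrix size and the random mask are illustrative assumptions; superblock derives the mask from trained weights.
```python
import torch

out_features, in_features, tile = 1024, 1024, 64   # illustrative sizes
weight = torch.randn(out_features, in_features)

# Keep ~20% of the 64x64 tiles and zero the rest (random mask for illustration only).
keep = torch.rand(out_features // tile, in_features // tile) >= 0.8
mask = keep.repeat_interleave(tile, dim=0).repeat_interleave(tile, dim=1)

# Convert the pruned weight to BSR storage: only nonzero tiles are materialized.
weight_bsr = (weight * mask).to_sparse_bsr(blocksize=(tile, tile))
print(weight_bsr.values().shape)   # (num_kept_tiles, 64, 64)
```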

Semi-structured sparsity
```
python benchmark.py \
--model vit_h_14 \
--batch-size 256 \
--sparsity semi_structured
```
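The semi-structured benchmark above relies on PyTorch's 2:4 sparsity support. Below is a minimal sketch of that mechanism, assuming a recent PyTorch build and a CUDA GPU with sparse tensor cores (e.g. A100); the layer sizes are illustrative and this is not superblock's benchmark code.
```python
import torch
from torch.sparse import to_sparse_semi_structured

# Illustrative layer; 2:4 sparsity currently targets fp16/bf16/int8 weights on CUDA.
linear = torch.nn.Linear(128, 128).half().cuda().eval()

# Zero two out of every four contiguous weights, then compress to the 2:4 format.
mask = torch.tensor([0, 0, 1, 1], device="cuda", dtype=torch.bool).tile((128, 32))
linear.weight = torch.nn.Parameter(
    to_sparse_semi_structured(linear.weight.masked_fill(~mask, 0))
)

x = torch.rand(128, 128, device="cuda").half()
with torch.inference_mode():
    y = linear(x)   # dispatches to the sparse kernel via the compressed weight
```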


## Training
Please refer to [TRAINING.md](TRAINING.md) for training from scratch. We use [Torchvision](https://github.com/pytorch/vision/tree/main/references/classification) as our framework for training. Supermask can be applied during training.
@@ -102,11 +109,11 @@ To apply supermask, we have the following arguments at our disposal,
For example, if you would like to train a `vit_b_16` from scratch using Supermask, you can use the respective torchvision command found in [TRAINING.md](TRAINING.md) and append the supermask arguments:
```
torchrun --nproc_per_node=8 train.py\
- --model vit_b_16 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
+ --model vit_h_14 --epochs 3 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
--lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
- --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
- --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema\
- --sparsity-linear 0.9 --sp-linear-tile-size 32
+ --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 \
+ --clip-grad-norm 1 --cutmix-alpha 1.0 --model-ema\
+ --sparsity semi_structured --data-path $IMAGENET_PATH
```
Through this command, we are training a `vit_h_14` with 2:4 semi-structured sparsity applied to its linear layers.
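For background on the Supermask approach referenced in this section: roughly speaking, each sparsified linear layer learns a score per weight tile and keeps only the highest-scoring tiles at forward time. The sketch below is a rough, hypothetical illustration of that idea; the class name, tile size, sparsity level, and straight-through trick are assumptions, not the superblock implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupermaskLinearSketch(nn.Linear):
    """Linear layer with a learned tile-wise mask (illustration only)."""

    def __init__(self, in_features, out_features, sparsity=0.9, tile=32):
        super().__init__(in_features, out_features)
        self.sparsity, self.tile = sparsity, tile
        # One learnable score per (tile x tile) block of the weight matrix.
        self.scores = nn.Parameter(torch.randn(out_features // tile, in_features // tile))

    def forward(self, x):
        # Keep the top (1 - sparsity) fraction of tiles by score.
        k = max(1, int(round(self.scores.numel() * (1 - self.sparsity))))
        threshold = self.scores.flatten().topk(k).values.min()
        hard = (self.scores >= threshold).to(self.weight.dtype)
        # Straight-through trick: the forward value equals the hard 0/1 mask,
        # while gradients reach the scores through the sigmoid term.
        soft = torch.sigmoid(self.scores)
        mask = hard + soft - soft.detach()
        mask = mask.repeat_interleave(self.tile, 0).repeat_interleave(self.tile, 1)
        return F.linear(x, self.weight * mask, self.bias)

layer = SupermaskLinearSketch(256, 256)
out = layer(torch.randn(4, 256))
```
After training, a mask learned this way can be folded into the weights and the result converted to BSR, which is what the evaluation section below calls offline sparsification.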

@@ -134,6 +141,11 @@ NGPUS=1 # put number of available GPUS here
```
This is similar to the previous command, but it does not apply offline sparsification or BSR conversion. Instead, the sparsity is applied on-the-fly during evaluation; a short sketch contrasting the two approaches follows this list.

* Semi-structured sparsity
```
python evaluate.py --model vit_b_16 --batch-size 256 --data-path $IMAGENET_PATH --weights-path checkpoints/2x4_sparse_ft_1_epoch.pth --sparsity semi_structured --skip-last-layer-sparsity
```
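The difference between the two evaluation modes can be sketched as follows, under illustrative assumptions (a random block mask stands in for the trained one, and the sizes and helper name are hypothetical):
```python
import torch

def random_block_mask(weight, sparsity=0.8, tile=64):
    # Placeholder mask for illustration; superblock derives it from the trained model.
    keep = torch.rand(weight.shape[0] // tile, weight.shape[1] // tile) >= sparsity
    return keep.repeat_interleave(tile, 0).repeat_interleave(tile, 1).to(weight.dtype)

linear = torch.nn.Linear(256, 256).eval()
x = torch.randn(8, 256)
mask = random_block_mask(linear.weight)

# Offline: bake the mask into the weight and convert it to BSR once, before evaluation.
offline_weight = (linear.weight.detach() * mask).to_sparse_bsr(blocksize=(64, 64))

# On-the-fly: keep the dense weight in place and apply the mask at call time instead.
y = torch.nn.functional.linear(x, linear.weight * mask, linear.bias)
```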

Please run `python evaluate.py --help` for a full list of available arguments.

Results (1x A100):