readme-updates (#874)
jcaip authored and jainapurva committed Sep 22, 2024
1 parent 704a7c6 commit 00b3220
30 changes: 21 additions & 9 deletions torchao/sparsity/prototype/superblock/README.md
@@ -47,9 +47,8 @@ At least one GPU:
Baseline:
```
python benchmark.py \
- --model vit_b_16 \
+ --model vit_h_14 \
--batch-size 256 \
- > /dev/null
```
Result:
```
@@ -59,19 +58,27 @@ Result:

80% sparsity, block size 64 (random weights):
```
- python benchmark.py --model vit_b_16 \
+ python benchmark.py \
+ --model vit_h_14 \
--batch-size 256 \
--sparsity-linear 0.8 \
--sp-linear-tile-size 64 \
- --sparsify-weights \
- --bsr 64 \
- > /dev/null
+ --sparsity bsr
```
Result:
```
393.864453125 ms
```
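For context, a minimal PyTorch sketch of the block-sparse (BSR) conversion the `--sparsity bsr` path relies on; the shapes, sparsity level, and variable names below are illustrative, not superblock's actual code:
```python
import torch

# Illustrative only: zero out 64x64 blocks of a weight matrix, then
# store it in PyTorch's block-sparse (BSR) layout.
tile = 64
weight = torch.randn(1024, 1024)

# Keep roughly 20% of the 64x64 blocks, mirroring 80% block sparsity.
block_mask = torch.rand(1024 // tile, 1024 // tile) > 0.8
mask = block_mask.repeat_interleave(tile, 0).repeat_interleave(tile, 1)
weight = weight * mask

# BSR stores only the surviving blocks plus their block indices.
weight_bsr = weight.to_sparse_bsr(blocksize=tile)
print(weight_bsr.layout, weight_bsr.values().shape)
```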

+ Semi-structured sparsity:
+ ```
+ python benchmark.py \
+ --model vit_h_14 \
+ --batch-size 256 \
+ --sparsity semi_structured
+ ```
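For context, the `--sparsity semi_structured` path leans on PyTorch's 2:4 semi-structured sparse kernels. A minimal sketch of that conversion, adapted from PyTorch's semi-structured sparsity tutorial (illustrative only, not superblock's code; needs fp16 and a CUDA GPU with sparse tensor cores, e.g. an A100):
```python
import torch
from torch.sparse import to_sparse_semi_structured

# Illustrative only: 2:4 semi-structured sparsity keeps 2 of every 4
# weights in each row, letting sparse tensor cores skip the zeros.
linear = torch.nn.Linear(1024, 1024, bias=False).half().cuda()
mask = torch.Tensor([0, 0, 1, 1]).tile((1024, 256)).bool().cuda()
linear.weight = torch.nn.Parameter(
    to_sparse_semi_structured(linear.weight.masked_fill(~mask, 0))
)

x = torch.rand(64, 1024).half().cuda()
with torch.inference_mode():
    y = linear(x)  # dispatches to the sparse kernel instead of dense matmul
```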


## Training
Please refer to [TRAINING.md](TRAINING.md) for training from scratch. We use [Torchvision](https://github.com/pytorch/vision/tree/main/references/classification) as our framework for training. Supermask can be applied during training.
@@ -102,11 +109,11 @@ To apply supermask, we have the following arguments at our disposal,
For example, if you would like to train a `vit_b_16` from scratch using Supermask, you can use the respective torchvision command found in [TRAINING.md](TRAINING.md) and append the supermask arguments:
```
torchrun --nproc_per_node=8 train.py\
- --model vit_b_16 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
+ --model vit_h_14 --epochs 3 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
--lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
- --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
- --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema\
- --sparsity-linear 0.9 --sp-linear-tile-size 32
+ --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 \
+ --clip-grad-norm 1 --cutmix-alpha 1.0 --model-ema\
+ --sparsity semi_structured --data-path $IMAGENET_PATH
```
Through this command, we are training a `vit_h_14` with semi-structured sparsity applied to its linear layers.

@@ -134,6 +141,11 @@ NGPUS=1 # put number of available GPUS here
```
This is similar to the previous command, but it does not apply offline sparsification or BSR conversion. Instead, the sparsity is applied on-the-fly during evaluation.
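As a mental model, on-the-fly sparsification just zeroes masked weights while the parameters stay dense; nothing is converted to a sparse storage layout. A toy sketch (the helper below is hypothetical, not superblock's API):
```python
import torch

def apply_mask_(linear: torch.nn.Linear, mask: torch.Tensor) -> None:
    """Hypothetical helper: zero masked weights in place, keeping them dense."""
    with torch.no_grad():
        linear.weight.mul_(mask)

layer = torch.nn.Linear(128, 128)
mask = (torch.rand_like(layer.weight) > 0.9).float()  # ~90% zeros
apply_mask_(layer, mask)
# The layout is unchanged, so evaluation runs through the usual dense kernels.
```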

+ * Semi-structured sparsity
+ ```
+ python evaluate.py --model vit_b_16 --batch-size 256 --data-path $IMAGENET_PATH --weights-path checkpoints/2x4_sparse_ft_1_epoch.pth --sparsity semi_structured --skip-last-layer-sparsity
+ ```

Please run `python evaluate.py --help` for a full list of available arguments.

Results (1x A100):
