From 06983e35f300c135749bf7a0d897ed86303dc1d9 Mon Sep 17 00:00:00 2001
From: Jesse Cai
Date: Wed, 11 Sep 2024 15:13:46 -0700
Subject: [PATCH] readme-updates

---
 .../sparsity/prototype/superblock/README.md | 30 +++++++++++++------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/torchao/sparsity/prototype/superblock/README.md b/torchao/sparsity/prototype/superblock/README.md
index 4c92f2881..54a6964b1 100644
--- a/torchao/sparsity/prototype/superblock/README.md
+++ b/torchao/sparsity/prototype/superblock/README.md
@@ -47,9 +47,8 @@ At least one GPU:
 Baseline:
 ```
 python benchmark.py \
-  --model vit_b_16 \
+  --model vit_h_14 \
   --batch-size 256 \
-  > /dev/null
 ```
 Result:
 ```
@@ -59,19 +58,27 @@ Result:
 
 80% sparsity, block size 64 (random weights):
 ```
-python benchmark.py --model vit_b_16 \
+python benchmark.py \
+  --model vit_h_14 \
   --batch-size 256 \
   --sparsity-linear 0.8 \
   --sp-linear-tile-size 64 \
-  --sparsify-weights \
   --bsr 64 \
-  > /dev/null
+  --sparsity bsr
 ```
 Result:
 ```
 393.864453125 ms
 ```
 
+Semi-structured sparsity:
+```
+python benchmark.py \
+  --model vit_h_14 \
+  --batch-size 256 \
+  --sparsity semi_structured
+```
+
 ## Training
 Please refer to [TRAINING.md](TRAINING.md) for training from scratch. We use [Torchvision](https://github.com/pytorch/vision/tree/main/references/classification) as our framework for training. Supermask can be applied during training.
@@ -102,11 +109,11 @@ To apply supermask, we have the following arguments at our disposal,
 For example, if you would like to train a `vit_b_16` from scratch using Supermask, you can use the respective torchvision command found in [TRAINING.md](TRAINING.md) and append the supermask arguments:
 ```
 torchrun --nproc_per_node=8 train.py\
-    --model vit_b_16 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
+    --model vit_h_14 --epochs 3 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
     --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
-    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
-    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema\
-    --sparsity-linear 0.9 --sp-linear-tile-size 32
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 \
+    --clip-grad-norm 1 --cutmix-alpha 1.0 --model-ema\
+    --sparsity semi_structured --data-path $IMAGENET_PATH
 ```
-Through this command, we are training a `vit_b_16` with 90% sparsity to linear layers using 32x32 tiles.
+Through this command, we are training a `vit_h_14` with semi-structured (2:4) sparsity applied to its linear layers.
@@ -134,6 +141,11 @@ NGPUS=1 # put number of available GPUS here
 ```
 This is similar to the previous command, but it does not apply offline sparsification or BSR conversion. Instead, the sparsity is applied on-the-fly during evaluation.
+* Semi-structured sparsity
+  ```
+  python evaluate.py --model vit_b_16 --batch-size 256 --data-path $IMAGENET_PATH --weights-path checkpoints/2x4_sparse_ft_1_epoch.pth --sparsity semi_structured --skip-last-layer-sparsity
+  ```
+
 Please run `python evaluate.py --help` for a full list of available arguments.
 
 Results (1x A100):
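Note: the `--bsr 64` flag used in the updated benchmark commands stores weights in a block sparse row (BSR) layout, where the matrix is tiled and only nonzero tiles are kept. The following is a minimal pure-Python sketch of that layout for intuition only; the `to_bsr` helper and the tiny block size are made up for the example and are not part of torchao or PyTorch.

```python
def to_bsr(dense, bs):
    """Convert a dense matrix (list of lists) to BSR triplets
    (crow_indices, col_indices, values), keeping only nonzero bs x bs blocks."""
    nrows, ncols = len(dense), len(dense[0])
    crow, cols, vals = [0], [], []
    for bi in range(nrows // bs):
        for bj in range(ncols // bs):
            # extract the bs x bs block at block position (bi, bj)
            block = [row[bj * bs:(bj + 1) * bs] for row in dense[bi * bs:(bi + 1) * bs]]
            if any(x != 0 for row in block for x in row):
                cols.append(bj)   # column index of this nonzero block
                vals.append(block)
        crow.append(len(cols))    # running count of blocks per block-row
    return crow, cols, vals

# A 4x4 matrix with two nonzero 2x2 blocks: only those blocks are stored,
# which is why larger tile sizes (e.g. 64) let whole zero regions be skipped.
crow, cols, vals = to_bsr([[1, 2, 0, 0],
                           [3, 4, 0, 0],
                           [0, 0, 0, 0],
                           [0, 0, 5, 6]], 2)
print(crow, cols)
```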
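Note: the `--sparsity semi_structured` option this patch introduces refers to 2:4 semi-structured sparsity, in which every group of four consecutive weights keeps at most two nonzero entries. A rough pure-Python illustration of the magnitude-based pruning rule, for intuition only (the `prune_2_to_4` helper is invented for this sketch and is not torchao's implementation):

```python
def prune_2_to_4(row):
    """Zero out the two smallest-magnitude values in each group of four."""
    out = list(row)
    for i in range(0, len(out) - len(out) % 4, 4):
        group = out[i:i + 4]
        # indices of the two smallest |values| within this group of four
        drop = sorted(range(4), key=lambda j: abs(group[j]))[:2]
        for j in drop:
            out[i + j] = 0.0
    return out

# Each group of four retains its two largest-magnitude weights.
print(prune_2_to_4([0.9, -0.1, 0.5, 0.05, 2.0, -3.0, 0.2, 0.1]))
# → [0.9, 0.0, 0.5, 0.0, 2.0, -3.0, 0.0, 0.0]
```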