Add support for hpu_backend and Resnet50 compile example

pytorch · Jun 10, 2024 · 6dff71c
1 parent 36049cb
commit 6dff71c
Showing 6 changed files with 110 additions and 0 deletions.
diff --git a/docs/README.md b/docs/README.md
@@ -41,6 +41,7 @@ TorchServe is a performant, flexible and easy to use tool for serving PyTorch ea
      - [TorchServe Integrations](../examples/README.md#torchserve-integrations)
      - [TorchServe UseCases](../examples/README.md#usecases)
 * [Workflow Examples](https://github.com/pytorch/serve/tree/master/examples/Workflows) - Examples of how to compose models in a workflow with TorchServe
+* [Resnet50 HPU compile](../examples/pt2/torch_compile_hpu/README.md) - An example of how to run the model in compile mode with the HPU device
 
 ## Advanced Features
 

diff --git a/examples/pt2/torch_compile_hpu/README.md b/examples/pt2/torch_compile_hpu/README.md
@@ -0,0 +1,81 @@
+
+# TorchServe Inference of a ResNet50 Model Using torch.compile with the HPU Backend
+
+This guide shows how to optimize a ResNet50 model using `torch.compile` with the [HPU backend](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Getting_Started_with_Inference.html), with the aim of improving inference performance when the model is deployed through TorchServe. `torch.compile` compiles PyTorch models just in time (JIT), when the model is first called.
+
+### Prerequisites
+- `Intel® Gaudi® AI accelerator software for PyTorch` - See the [Installation Guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html), which covers installation, software verification, and subsequent software management.
+
+## Workflow
+1. Configure torch.compile.
+2. Create model archive.
+3. Start TorchServe.
+4. Run Inference.
+5. Stop TorchServe.
+
+First, navigate to `examples/pt2/torch_compile_hpu`
+```bash
+cd examples/pt2/torch_compile_hpu
+```
+
+### 1. Configure torch.compile
+
+`torch.compile` accepts several options that can affect performance. See the [official PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.compile.html) for the available options.
+
+
+In this example, we use the following config, which is provided in the `model-config.yaml` file:
+
+```yaml
+minWorkers: 1
+maxWorkers: 1
+pt2: {backend: "hpu_backend"}
+```
+The line `pt2: {backend: "hpu_backend"}` enables compile mode. If you remove it from the config file, the model runs in eager mode.
+
+### 2. Create model archive
+
+Download the pre-trained model and prepare the model archive:
+```bash
+wget https://download.pytorch.org/models/resnet50-11ad3fa6.pth
+mkdir model_store
+PT_HPU_LAZY_MODE=0 torch-model-archiver --model-name resnet-50 --version 1.0 --model-file model.py \
+  --serialized-file resnet50-11ad3fa6.pth --export-path model_store \
+  --extra-files ../../image_classifier/index_to_name.json --handler hpu_image_classifier.py \
+  --config-file model-config.yaml
+```
+
+### 3. Start TorchServe
+
+Start the TorchServe server using the following command:
+```bash
+PT_HPU_LAZY_MODE=0 torchserve --start --ncs --model-store model_store --models resnet-50.mar
+```
+
+### 4. Run Inference
+
+**Note:** `torch.compile` requires a warm-up phase to reach optimal performance. Ensure you run at least as many inferences as the `maxWorkers` specified before measuring performance.
+
+```bash
+# Open a new terminal
+cd examples/pt2/torch_compile_hpu
+curl http://127.0.0.1:8080/predictions/resnet-50 -T ../../image_classifier/kitten.jpg
+```
+
+The expected output will be JSON-formatted classification probabilities, such as:
+
+```json
+{
+  "tabby": 0.2724992632865906,
+  "tiger_cat": 0.1374046504497528,
+  "Egyptian_cat": 0.046274710446596146,
+  "lynx": 0.003206699388101697,
+  "lens_cap": 0.002257900545373559
+}
+```
+
+### 5. Stop TorchServe
+Stop TorchServe with the following command:
+
+```bash
+torchserve --stop
+```
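The curl request and warm-up note in step 4 of the README above can also be sketched in Python. This is a minimal illustration using only the standard library, not part of the commit. The helper names `prediction_url`, `send_image` and `warm_up` are hypothetical, and the host and model name are taken from the README commands.

```python
import urllib.request
from typing import Callable, List


def prediction_url(model_name: str, host: str = "http://127.0.0.1:8080") -> str:
    """Build the inference endpoint used by the curl command in the README."""
    return f"{host}/predictions/{model_name}"


def send_image(url: str, image_path: str, timeout: float = 60.0) -> bytes:
    """Upload the raw image bytes with PUT, mirroring `curl -T`."""
    with open(image_path, "rb") as f:
        data = f.read()
    request = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def warm_up(send: Callable[[], bytes], num_requests: int) -> List[bytes]:
    """Call `send` num_requests times so compilation happens before timing."""
    return [send() for _ in range(num_requests)]


# Example usage against a running server (assumes step 3 has been completed):
#   url = prediction_url("resnet-50")
#   warm_up(lambda: send_image(url, "../../image_classifier/kitten.jpg"), 1)
#   print(send_image(url, "../../image_classifier/kitten.jpg").decode())
```

Passing the sender as a callable keeps the warm-up loop independent of the transport, so it can be exercised without a running server.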
diff --git a/examples/pt2/torch_compile_hpu/hpu_image_classifier.py b/examples/pt2/torch_compile_hpu/hpu_image_classifier.py
@@ -0,0 +1,18 @@
+import habana_frameworks.torch.core as htcore  # nopycln: import
+import torch
+
+from ts.torch_handler.image_classifier import ImageClassifier
+
+
+class HPUImageClassifier(ImageClassifier):
+    def set_hpu(self):
+        self.map_location = "hpu"
+        self.device = torch.device(self.map_location)
+
+    def _load_pickled_model(self, model_dir, model_file, model_pt_path):
+        """
+        Override that sets the device to HPU after the model is loaded, so the default base_handler can be used without modification.
+        """
+        model = super()._load_pickled_model(model_dir, model_file, model_pt_path)
+        self.set_hpu()
+        return model
diff --git a/examples/pt2/torch_compile_hpu/model-config.yaml b/examples/pt2/torch_compile_hpu/model-config.yaml
@@ -0,0 +1,3 @@
+minWorkers: 1
+maxWorkers: 1
+pt2: {backend: "hpu_backend"}
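As a rough illustration of how the `pt2` entry above switches between compile and eager mode, the sketch below turns a parsed config into keyword arguments for `torch.compile`. It is a simplified assumption, not TorchServe's actual handler logic. The helper `compile_kwargs` is hypothetical, and the config is written as a Python dict to avoid a YAML dependency.

```python
from typing import Any, Dict, Optional


def compile_kwargs(model_config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return kwargs for torch.compile, or None when the model should stay eager.

    Hypothetical helper: it only mirrors the behavior described in the README
    (a `pt2` entry enables compile mode, and its absence means eager mode).
    """
    pt2 = model_config.get("pt2")
    if not pt2:
        return None  # no pt2 entry: run in eager mode
    return dict(pt2)


# Dict equivalent of model-config.yaml from this commit.
config = {"minWorkers": 1, "maxWorkers": 1, "pt2": {"backend": "hpu_backend"}}
kwargs = compile_kwargs(config)

# With PyTorch and the Gaudi software installed, one could then call:
#   model = torch.compile(model, **kwargs) if kwargs else model
```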
diff --git a/examples/pt2/torch_compile_hpu/model.py b/examples/pt2/torch_compile_hpu/model.py
@@ -0,0 +1,6 @@
+from torchvision.models.resnet import Bottleneck, ResNet
+
+
+class ImageClassifier(ResNet):
+    def __init__(self):
+        super(ImageClassifier, self).__init__(Bottleneck, [3, 4, 6, 3])
diff --git a/ts/utils/util.py b/ts/utils/util.py
@@ -27,6 +27,7 @@ class PT2Backend(str, enum.Enum):
     IPEX = "ipex"
     TORCHXLA_TRACE_ONCE = "torchxla_trace_once"
     OPENVINO = "openvino"
+    HPU_BACKEND = "hpu_backend"
 
 
 logger = logging.getLogger(__name__)
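To show why the enum entry matters, here is a small sketch of validating a backend name against `PT2Backend`. The enum below is a reduced copy containing only the members visible in the hunk above, and `parse_backend` is a hypothetical helper rather than a function in `ts/utils/util.py`.

```python
import enum
from typing import Optional


class PT2Backend(str, enum.Enum):
    # Only the members visible in the diff hunk; the real enum may define more.
    IPEX = "ipex"
    TORCHXLA_TRACE_ONCE = "torchxla_trace_once"
    OPENVINO = "openvino"
    HPU_BACKEND = "hpu_backend"


def parse_backend(name: str) -> Optional[PT2Backend]:
    """Return the matching enum member, or None for an unknown backend name."""
    try:
        return PT2Backend(name)
    except ValueError:
        return None


backend = parse_backend("hpu_backend")
unknown = parse_backend("not_a_backend")
```

Because the enum mixes in `str`, a member compares equal to its string value, so the value from `model-config.yaml` can be matched directly.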