diff --git a/sdks/python/apache_beam/examples/inference/README.md b/sdks/python/apache_beam/examples/inference/README.md
new file mode 100644
index 0000000000000..e98b3e8882559
--- /dev/null
+++ b/sdks/python/apache_beam/examples/inference/README.md
@@ -0,0 +1,114 @@
+
+
+# Example RunInference API Pipelines
+
+This module contains example pipelines that use the Beam RunInference
+API.
+
+## Prerequisites
+
+You must have `apache-beam>=2.40.0` installed in order to run these pipelines,
+because the `apache_beam.examples.inference` module was added in that release.
+```
+pip install apache-beam==2.40.0
+```
+
+### PyTorch dependencies
+The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
+```
+pip install torch==1.11.0
+```
+
+To install the `torch` dependency on a distributed runner, such as Dataflow, refer to these
+[instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+
+
+
+### Datasets and Models for RunInference
+Data related to RunInference has been staged in
+`gs://apache-beam-ml/` for use with these example pipelines. You can view the data [here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can also list it with the [gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
+```
+gsutil ls gs://apache-beam-ml
+```
+
+---
+## Image Classification with ImageNet dataset
+
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
+an implementation of a RunInference pipeline that performs image classification
+on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+architecture.
+
+The pipeline reads the images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file (locally or in GCS).
+
+### Dataset and model for Image Classification
+
+
+- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
+  a text file containing the GCS paths of a 15-image subset of the ImageNet
+  validation data. Use the following command to view the contents of the file:
+```
+$ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
+gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
+...
+gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
+```
+
+- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
+  JPEG images for the entire validation dataset.
+
+- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: the saved
+  `state_dict` of the pretrained `mobilenet_v2` model from the
+  `torchvision.models` subpackage.
+
+### Running `pytorch_image_classification.py`
+
+To run the image classification pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --output predictions.csv \
+  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+```
+
+This writes the output to `predictions.csv`, with contents like:
+```
+gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
+gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
+gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+...
+```
+where the second item in each line is the integer representing the predicted class of the
+image.
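
The output format described above (an image path, a comma, then an integer class ID) can be read back with a few lines of standard-library Python. The following is a minimal sketch and not part of the example pipeline: the helper name `parse_predictions` is hypothetical, and the inline sample rows are copied from the example output shown above so that the snippet runs without any files.

```python
import io
from collections import Counter


def parse_predictions(lines):
    """Yield (image_path, class_id) pairs from 'path,class_id' lines.

    Hypothetical helper for illustration; not part of the Beam example.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so a comma inside the path cannot break parsing.
        path, class_id = line.rsplit(",", 1)
        yield path, int(class_id)


# Sample rows copied from the example output above.
SAMPLE = """\
gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
"""

predictions = list(parse_predictions(io.StringIO(SAMPLE)))

# Count how many images were assigned to each predicted class ID.
class_counts = Counter(class_id for _, class_id in predictions)
```

To parse a real output file instead of the inline sample, pass an open file object, for example `parse_predictions(open("predictions.csv"))`.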