Address PR comments; Fix formatting
yeandy committed Jun 15, 2022
1 parent 076ca93 commit 6c5bd4f
72 changes: 21 additions & 51 deletions sdks/python/apache_beam/examples/inference/README.md

If you are using pretrained models from Hugging Face's `transformers` package, install it with:
```
pip install transformers
```

For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
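
For example, one common approach from those instructions is to list the dependency in a `requirements.txt` file and pass it to the pipeline. The following is a minimal sketch, not a complete Dataflow invocation; the remaining flags come from the per-example sections below and your own Dataflow project settings:
```sh
# Hypothetical minimal requirements file listing the extra worker dependency:
echo "torch" > requirements.txt

# Stage the file so Dataflow workers install torch at startup:
python -m apache_beam.examples.inference.pytorch_image_classification \
  --runner DataflowRunner \
  --requirements_file requirements.txt
```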

<!---
TODO: Add link to full documentation on Beam website when it's published.
--->
---
## Image classification

[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.

The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.

### Dataset and model for image classification

You will need to create or download images, and place them into your `IMAGES_DIR` directory.
- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images on which you want to run image classification. For example:
```
/absolute/path/to/image1.jpg
/absolute/path/to/image2.jpg
```
- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the mobilenet_v2 model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from PyTorch's repository of pretrained models. Note that this requires the `torchvision` library.
```
import torch
from torchvision.models import mobilenet_v2
model = mobilenet_v2(pretrained=True)
# Save the pretrained weights so the pipeline can load them later.
torch.save(model.state_dict(), 'mobilenet_v2.pth')
```
- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.

### Running `pytorch_image_classification.py`
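
A minimal local invocation might look like the following. This is a sketch: the flag names mirror the dataset items above and are assumptions, not verified against the script's argument definitions.
```sh
python -m apache_beam.examples.inference.pytorch_image_classification \
  --input_file IMAGE_FILE_NAMES \
  --images_dir IMAGES_DIR \
  --output predictions.csv \
  --model_state_dict_path MODEL_STATE_DICT
```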
This writes the output to the `predictions.csv` file.
---
## Image segmentation

[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segmentation using the maskrcnn_resnet50_fpn architecture.

The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.

### Dataset and model for image segmentation
You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images on which you want to run image segmentation. For example:
```
/absolute/path/to/image1.jpg
/absolute/path/to/image2.jpg
```
- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from PyTorch's repository of pretrained models. Note that this requires the `torchvision` library.
```
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
model = maskrcnn_resnet50_fpn(pretrained=True)
# Save the pretrained weights so the pipeline can load them later.
torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
```
- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
### Running `pytorch_image_segmentation.py`

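A minimal local invocation might look like the following. As with the classification example, this is a sketch: the flag names mirror the dataset items above and are assumptions.
```sh
python -m apache_beam.examples.inference.pytorch_image_segmentation \
  --input_file IMAGE_FILE_NAMES \
  --images_dir IMAGES_DIR \
  --output predictions.csv \
  --model_state_dict_path MODEL_STATE_DICT
```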
This writes the output to `predictions.csv` with contents like:
```
/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
...
```
Each line has data separated by a semicolon ";". The first item is the file name. The second item is a list of predicted instances.

---
## Language modeling

[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.

The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
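
Conceptually, the masking step does something like the following. This is an illustrative sketch, not the pipeline's actual code; `add_mask_to_last_word` is a hypothetical helper name.
```
def add_mask_to_last_word(text: str) -> str:
    # Replace the last word with BERT's [MASK] token, keeping any
    # trailing punctuation token (the example sentences end with ' .').
    words = text.split()
    idx = -2 if words[-1] in {'.', '!', '?'} else -1
    words[idx] = '[MASK]'
    return ' '.join(words)

print(add_mask_to_last_word('The capital of France is Paris .'))
# The capital of France is [MASK] .
```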

### Dataset and model for language modeling

- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
```
import torch
from transformers import BertForMaskedLM
model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
# Save the pretrained weights so the pipeline can load them later.
torch.save(model.state_dict(), 'BertForMaskedLM.pth')
```
- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
- **Optional**: A path to a file called `SENTENCES` that contains sentences to feed into the model. It should look something like this:
```
The capital of France is Paris .
He looked up and saw the sun and stars .
```

### Running `pytorch_language_modeling.py`
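
A minimal invocation sketch (the flag names mirror the dataset items above and are assumptions, not verified against the script):
```sh
python -m apache_beam.examples.inference.pytorch_language_modeling \
  --input_file SENTENCES \
  --output OUTPUT \
  --model_state_dict_path MODEL_STATE_DICT
```
Each line of the output pairs an input sentence with a prediction; the second item is the word that the model predicts for the mask.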

---
## MNIST digit classification
[`sklearn_mnist_classification.py`](./sklearn_mnist_classification.py) contains an implementation for a RunInference pipeline that performs image classification on handwritten digits from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) database.

The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.

### Dataset and model for MNIST digit classification
- **Required**: A path to a file called `INPUT` that contains labels and pixels to feed into the model. Each row should have comma-separated elements. The first element is the label. All subsequent elements are pixel values. It should look something like this:
```
1,0,0,0...
0,0,0,0...
1,0,0,0...
4,0,0,0...
...
```
- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
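
For instance, a model satisfying this requirement could be produced with a sketch like the one below. This is an illustration using scikit-learn's bundled digits dataset rather than the full MNIST CSV; the file name `mnist_model_svm.pickle` matches the run example that follows.
```
import pickle
from sklearn import datasets, svm

# Train a small SVM classifier on scikit-learn's bundled digits dataset.
# (For real MNIST, load your labels/pixels CSV instead.)
digits = datasets.load_digits()
model = svm.SVC()
model.fit(digits.data, digits.target)

# Pickle the trained model for the pipeline to load.
with open('mnist_model_svm.pickle', 'wb') as f:
    pickle.dump(model, f)
```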


### Running `sklearn_mnist_classification.py`

For example:
```sh
python -m apache_beam.examples.inference.sklearn_mnist_classification \
--input_file mnist_data.csv \
--output predictions.txt \
--model_path mnist_model_svm.pickle
```

This writes the output to `predictions.txt` with contents like:
```
1,1
4,9
7,1
0,0
...
```
Each line has data separated by a comma ",". The first item is the actual label of the digit. The second item is the predicted label of the digit.
