Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add README for image classification example #21758

Merged
merged 4 commits into from
Jun 10, 2022

Conversation

yeandy
Copy link
Contributor

@yeandy yeandy commented Jun 8, 2022

Instructions on how to run a Pytorch RunInference example.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Add a link to the appropriate issue in your description, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@asf-ci
Copy link

asf-ci commented Jun 8, 2022

Can one of the admins verify this patch?

1 similar comment
@asf-ci
Copy link

asf-ci commented Jun 8, 2022

Can one of the admins verify this patch?

@yeandy
Copy link
Contributor Author

yeandy commented Jun 8, 2022

R: @ryanthompson591 @AnandInguva

Copy link
Contributor

@ryanthompson591 ryanthompson591 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a couple small comments.

sdks/python/apache_beam/examples/inference/README.md Outdated Show resolved Hide resolved
sdks/python/apache_beam/examples/inference/README.md Outdated Show resolved Hide resolved
sdks/python/apache_beam/examples/inference/README.md Outdated Show resolved Hide resolved

### Data
Data related to RunInference has been staged in
`gs://apache-beam-ml/` for use with these example pipelines:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know gs://apache-beam-ml will really work as a link or is right.

maybe "staged in apache-beam-testing" will work

Feel free to keep it this way if I'm just wrong or misunderstanding.

Maybe this will link right if the users cloud account is set to apache-beam-testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not intended to be a link, but rather to show the location of datasets, models, etc. related to RunInference. The apache-beam-ml bucket is public; outside users should be able to view (with command gsutil ls gs://apache-beam-ml) and read these models / datasets with no issue?

sdks/python/apache_beam/examples/inference/README.md Outdated Show resolved Hide resolved
sdks/python/apache_beam/examples/inference/README.md Outdated Show resolved Hide resolved
gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005012.JPEG,573
```
where the second item in each line is the integer representing the predicted class of the
image.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be cool if one of the ptransforms in the example joined to integer prediction to the actual name of the image.

for example:
gs://apache-beam-ml/datasets/.....5102.jpeg, horse
gs://apache-beam-ml/datasets/.....5102.jpeg, cheese

etc.

But that is outside of the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have that for a different example #21766

@yeandy
Copy link
Contributor Author

yeandy commented Jun 10, 2022

I have another PR #21766, and once that gets merged I can add instructions for it in the README.md as well.

@yeandy
Copy link
Contributor Author

yeandy commented Jun 10, 2022

R: @tvalentyn @TheNeuralBit

@codecov
Copy link

codecov bot commented Jun 10, 2022

Codecov Report

Merging #21758 (80a17cf) into master (cd966ec) will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #21758      +/-   ##
==========================================
+ Coverage   74.02%   74.06%   +0.03%     
==========================================
  Files         698      698              
  Lines       92192    92333     +141     
==========================================
+ Hits        68248    68383     +135     
- Misses      22693    22699       +6     
  Partials     1251     1251              
Flag Coverage Δ
python 83.63% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...eam/transforms/py_dataflow_distribution_counter.py 91.42% <0.00%> (-4.87%) ⬇️
sdks/python/apache_beam/dataframe/io.py 88.78% <0.00%> (-3.26%) ⬇️
sdks/python/apache_beam/utils/counters.py 85.39% <0.00%> (-1.36%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.42% <0.00%> (-0.13%) ⬇️
sdks/python/apache_beam/transforms/window.py 87.20% <0.00%> (+0.15%) ⬆️
sdks/python/apache_beam/utils/windowed_value.py 92.52% <0.00%> (+0.17%) ⬆️
sdks/python/apache_beam/transforms/core.py 92.38% <0.00%> (+0.23%) ⬆️
...ks/python/apache_beam/runners/worker/operations.py 74.32% <0.00%> (+0.29%) ⬆️
...eam/runners/interactive/interactive_environment.py 92.02% <0.00%> (+0.30%) ⬆️
...ks/python/apache_beam/runners/worker/opcounters.py 85.13% <0.00%> (+0.30%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd966ec...80a17cf. Read the comment docs.

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a couple minor suggestions


### Datasets and Models for RunInference
Data related to RunInference has been staged in
`gs://apache-beam-ml/` for use with these example pipelines. You can see this by using the [gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) maybe link to the cloud console here: https://pantheon.corp.google.com/storage/browser/apache-beam-ml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that link accessible to only Google employees though? would https://console.cloud.google.com/ be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, yes it would. I copied the wrong thing :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- ...
- gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
-->
- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt/`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt/`:
- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

validation data
- gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
- ...
- gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) It might be nice to clarify that these sub-bullets are the file contents with something like:

$ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
...
gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more thing I forgot to include in the last review.

Comment on lines 39 to 40
For installation of the `torch` dependency for Dataflow pipelines, refer to these
[instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For installation of the `torch` dependency for Dataflow pipelines, refer to these
[instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these
[instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Doesn't need to be exactly that text, just in general Beam docs should mention Dataflow as a distributed runner and be clear that there are others)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

TODO: Add link to full documentation on Beam website when it's published.

i.e. "See the
[documentation](https://beam.apache.org/documentation/dsls/dataframes/overview/#pre-requisites)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a leftover?

@TheNeuralBit
Copy link
Member

Merging despite the failed test since this is just a docs change.

@TheNeuralBit TheNeuralBit merged commit a079e97 into apache:master Jun 10, 2022
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
* Add README for image classification example

* Fix typos and input name changes

* Fix typos and clarify inputs text

* Add link to GCP console; Add clarifying comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants