Add example of real time Anomaly Detection using RunInference #23497

shub-kris
Contributor

This PR illustrates how to set up a real-time text anomaly detection pipeline using PubSub and RunInference.

The entire implementation is divided into two different pipelines:

  1. write_data_to_pubsub_pipeline pushes data to PubSub.
  2. anomaly_detection_pipeline reads data from PubSub and applies a trained HDBSCAN model to detect anomalies.
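
The two-stage shape described above can be sketched in plain Python. This is only an illustration and not the code in this PR: an in-memory `queue.Queue` stands in for the PubSub topic, and `toy_predict` is a hypothetical stand-in for the trained HDBSCAN model served through RunInference. The one convention borrowed from HDBSCAN is that a label of -1 marks a noise point, which is treated here as an anomaly. All function names, the message layout, and the length-based rule are assumptions made for the sketch.

```python
import json
import queue
from typing import Callable, List, Tuple


def write_data_to_topic(texts: List[str], topic: "queue.Queue[bytes]") -> None:
    """Stage 1 (stand-in for write_data_to_pubsub_pipeline): publish each text."""
    for text in texts:
        # Messages are sent as UTF-8 encoded JSON bytes (assumed layout).
        topic.put(json.dumps({"text": text}).encode("utf-8"))


def toy_predict(text: str, typical_len: int = 20, tolerance: int = 15) -> int:
    """Hypothetical model: -1 (noise) for texts of unusual length, else cluster 0."""
    return -1 if abs(len(text) - typical_len) > tolerance else 0


def detect_anomalies(
    topic: "queue.Queue[bytes]", predict: Callable[[str], int]
) -> List[Tuple[str, bool]]:
    """Stage 2 (stand-in for anomaly_detection_pipeline): read, predict, flag."""
    results = []
    while not topic.empty():
        message = json.loads(topic.get().decode("utf-8"))
        label = predict(message["text"])
        # A noise label (-1) is reported as an anomaly.
        results.append((message["text"], label == -1))
    return results


topic: "queue.Queue[bytes]" = queue.Queue()
write_data_to_topic(["i feel happy today", "x" * 200], topic)
flags = detect_anomalies(topic, toy_predict)
```

Keeping the writer and the detector as separate functions that share only the topic mirrors the split into two pipelines: either side can be replaced (for example, by a real streaming source or a real model) without touching the other.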


@shub-kris
Contributor Author

@damccorm @andyxiexu please have a look and give your comments.

@codecov

codecov bot commented Oct 5, 2022

Codecov Report

Merging #23497 (046227b) into master (b9aa159) will decrease coverage by 0.11%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #23497      +/-   ##
==========================================
- Coverage   73.45%   73.33%   -0.12%     
==========================================
  Files         718      719       +1     
  Lines       95884    95798      -86     
==========================================
- Hits        70427    70249     -178     
- Misses      24146    24238      +92     
  Partials     1311     1311              
Flag Coverage Δ
python 83.04% <ø> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/gcp/bigquery_tools.py 73.34% <0.00%> (-12.39%) ⬇️
sdks/python/apache_beam/io/gcp/bigquery.py 71.31% <0.00%> (-2.94%) ⬇️
sdks/python/apache_beam/internal/gcp/json_value.py 85.50% <0.00%> (-2.90%) ⬇️
...on/apache_beam/runners/dataflow/dataflow_runner.py 80.80% <0.00%> (-2.12%) ⬇️
sdks/python/apache_beam/runners/runner.py 72.39% <0.00%> (-1.85%) ⬇️
...apache_beam/typehints/native_type_compatibility.py 85.52% <0.00%> (-1.06%) ⬇️
...thon/apache_beam/runners/worker/sdk_worker_main.py 77.71% <0.00%> (-0.78%) ⬇️
sdks/python/apache_beam/pipeline.py 92.25% <0.00%> (-0.60%) ⬇️
sdks/python/apache_beam/io/iobase.py 85.92% <0.00%> (-0.50%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 93.05% <0.00%> (-0.39%) ⬇️
... and 18 more

Contributor

@damccorm damccorm left a comment

Overall looks good to me, had a number of smaller comments (mostly wording)

Contributor

@damccorm damccorm left a comment

Just a few lingering comments, should be good to go after those though. Thanks!

@github-actions
Contributor

github-actions bot commented Oct 6, 2022

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @pabloem for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@@ -59,3 +59,4 @@ In order to automate and track the AI/ML workflows throughout your project, you

You can find examples of end-to-end AI/ML pipelines for several use cases:
Contributor

Please add more description to this list: for each item, a short paragraph explaining what it does, with a high-level example of the use case shown.

Contributor Author

@Juta please have a look and let's discuss this

Contributor

I agree it would be good to add a little bit of detail on these, we can also probably defer that to a future PR though where we clean up that page more generally. @shub-kris @Juta thoughts?

Contributor Author

@damccorm I would also prefer to do this in a separate PR. @Juta what do you think?

Contributor

SGTM, I'll go ahead and merge and we can circle back to this

Contributor

Yes, OK, let's pick this up in a follow-up PR.

@shub-kris
Contributor Author

retest this please

@shub-kris shub-kris requested review from damccorm and yeandy and removed request for damccorm October 14, 2022 06:07
Contributor

@damccorm damccorm left a comment

Added one small comment, and it looks like there are two other lingering comments from Reza (the PROJECT_ID one and the overview.md one).

@damccorm
Contributor

damccorm commented Oct 19, 2022

Thanks! I'll merge once tests complete

@damccorm damccorm merged commit 8e8e89e into apache:master Oct 19, 2022