AD model performance benchmark (#729) #734

kaituo · 2022-11-23T19:57:19Z

Description

This PR adds an AD model performance benchmark so that we can compare model performance across versions.

For benchmark data, we randomly generate synthetic data with known anomalies inserted throughout the signal. Specifically, the data sets are one-, two-, or four-dimensional, and each dimension is a noisy cosine wave. Anomalies are inserted into one dimension with probability 0.003. Anomalies across dimensions can be independent or dependent. Each data set has approximately 5000 observations. Every data set is generated from the same random seed, so results are comparable across versions.
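For readers who want a feel for the data described above, here is a minimal Python sketch of that kind of generator. It is not the code in this PR (the plugin's tests are written in Java and the actual generator is not shown here). The function name `generate_benchmark_data` and the parameters `period`, `noise_std`, `anomaly_shift`, and `seed` are hypothetical values chosen for illustration; only the 0.003 anomaly probability, the roughly 5000 observations, the dimension counts, and the fixed seed come from the description. The sketch covers only the case where an anomaly hits a single, randomly chosen dimension.

```python
import math
import random


def generate_benchmark_data(num_points=5000, dims=2, anomaly_prob=0.003,
                            period=100, noise_std=0.1, anomaly_shift=5.0,
                            seed=42):
    """Return (rows, labels): noisy cosine rows and a 0/1 anomaly label per row.

    period, noise_std, anomaly_shift and seed are illustrative assumptions,
    not values taken from the PR.
    """
    rng = random.Random(seed)  # private generator, so output is reproducible
    rows, labels = [], []
    for t in range(num_points):
        base = math.cos(2 * math.pi * t / period)
        # Every dimension is the same cosine wave plus independent Gaussian noise.
        row = [base + rng.gauss(0.0, noise_std) for _ in range(dims)]
        is_anomaly = rng.random() < anomaly_prob
        if is_anomaly:
            # Perturb exactly one randomly chosen dimension, up or down.
            d = rng.randrange(dims)
            row[d] += anomaly_shift if rng.random() < 0.5 else -anomaly_shift
        rows.append(row)
        labels.append(1 if is_anomaly else 0)
    return rows, labels
```

Calling `generate_benchmark_data(dims=4)` twice returns identical rows and labels, which is the property that makes results comparable between runs and versions. The labels give the ground truth that a detector's output could be scored against.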

We also backported #600 so that we can capture the performance data in CI output.

Testing done:

Added unit tests to run the benchmark.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


kaituo · 2022-11-23T20:02:45Z

The CI build failed due to:

> Can't get https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.5.0/latest/linux/x64/tar/builds/opensearch/plugins/opensearch-job-scheduler-2.5.0.0.zip to D:\a\anomaly-detection\anomaly-detection\src\test\resources\job-scheduler\opensearch-job-scheduler-2.5.0.0.zip
See https://docs.gradle.org/7.4.2/userguide/command_line_interface.html#sec:command_line_warnings

Tested locally using

(22-11-23 11:54:46) <0> [~/code/github/opensearch-ad]
dev-dsk-kaituo-2b-bf84c4db % ./gradlew build -Dopensearch.version=2.4.0-SNAPSHOT

...
r implicit dependency. This can lead to incorrect results being produced, depending on what order the tasks are executed. Please refer to https://docs.gradle.org/7.4.2/userguide/validation_problems.html#implicit_dependency for more details about this problem.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.4.2/userguide/command_line_interface.html#sec:command_line_warnings

Execution optimizations have been disabled for 1 invalid unit(s) of work during this build to ensure correctness.
Please consult deprecation warnings for more details.

BUILD SUCCESSFUL in 15m 6s
27 actionable tasks: 27 executed

kaituo requested review from a team, amitgalitz and ohltyler November 23, 2022 19:57

ohltyler approved these changes Nov 23, 2022


amitgalitz approved these changes Nov 28, 2022


kaituo merged commit 867c1b3 into opensearch-project:2.x Dec 1, 2022
