Expose model accuracy metrics in tests #600
Conversation
LGTM, thanks for adding this!
Can you add this under 'Enhancements' in the 2.1 release notes?
This PR adds an option flag to print logs during tests and turns the flag on in the CI workflow. The flag is disabled by default. This lets us record model accuracy metrics in GitHub workflows and later retrieve them for analysis.
Testing done:
1. We can turn logging on and off during tests.
2. The accuracy logs are recorded.
Signed-off-by: Kaituo Li <kaituo@amazon.com>
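As a rough illustration of how such a flag could be wired, here is a minimal sketch in Java. The system property name (`test.logs`) and helper class are hypothetical; the plugin's actual flag and wiring may differ.

```java
// Minimal sketch (not the plugin's actual implementation): accuracy logging is
// gated behind a system property that a CI workflow could pass to the test JVM,
// for example with -Dtest.logs=true. The property name and class are hypothetical.
public final class TestAccuracyLog {
    // Disabled by default; CI turns it on explicitly.
    private static final boolean ENABLED =
        Boolean.parseBoolean(System.getProperty("test.logs", "false"));

    private TestAccuracyLog() {}

    public static void record(String model, double precision, double recall) {
        if (ENABLED) {
            // Printing to standard output makes the metrics appear in the CI job log,
            // where they can be retrieved later for analysis.
            System.out.printf("model=%s precision=%.3f recall=%.3f%n", model, precision, recall);
        }
    }
}
```

A test would then call something like `TestAccuracyLog.record("hcad", precision, recall)` after evaluating a detector, and the CI workflow would set the property for its test run.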
Added.
Codecov Report
@@             Coverage Diff              @@
##               main     #600      +/-   ##
============================================
- Coverage     79.21%   79.02%   -0.20%
+ Complexity     4222     4207      -15
============================================
  Files           296      296
  Lines         17686    17686
  Branches       1880     1880
============================================
- Hits          14010    13976      -34
- Misses         2783     2811      +28
- Partials        893      899       +6
Flags with carried forward coverage won't be shown.
LGTM. You might want to check other info-level logs just to make sure you don't print too much verbose output during testing.
* Expose model accuracy metrics in tests: This PR adds an option flag to print logs during tests and turns the flag on in the CI workflow. The flag is disabled by default. This lets us record model accuracy metrics in GitHub workflows and later retrieve them for analysis. Testing done: 1. We can turn logging on and off during tests. 2. The accuracy logs are recorded. Signed-off-by: Kaituo Li <kaituo@amazon.com> (cherry picked from commit f630c8f)
This PR adds an HCAD model performance benchmark so that we can compare model performance across versions. For benchmark data, we randomly generated synthetic data with known anomalies inserted throughout the signal. In particular, these are one-, two-, and four-dimensional data sets where each dimension is a noisy cosine wave. Anomalies are inserted into one dimension with 0.003 probability, and anomalies across dimensions can be independent or dependent. Each data set has approximately 5,000 observations. The data sets are generated with the same random seed so that results are comparable across versions. We also backported opensearch-project#600 so that we can capture the performance data in CI output. Testing done: 1. Added unit tests to run the benchmark. Signed-off-by: Kaituo Li <kaituo@amazon.com>
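As a rough sketch of the kind of generator described above: each dimension is a noisy cosine wave, anomalies are injected with probability 0.003 into one dimension, and a fixed seed keeps runs comparable. The noise level, period, and spike magnitude below are assumptions; the repository's actual generator may differ.

```java
import java.util.Random;

// Illustrative generator for the synthetic benchmark data described above.
// Noise level, period, and spike size are assumptions made for this sketch.
public class SyntheticCosine {
    public static double[][] generate(int points, int dimensions, long seed) {
        Random rng = new Random(seed); // fixed seed keeps results comparable across versions
        double[][] data = new double[points][dimensions];
        for (int t = 0; t < points; t++) {
            boolean anomalous = rng.nextDouble() < 0.003; // rare anomaly at this timestamp
            for (int d = 0; d < dimensions; d++) {
                double base = Math.cos(2.0 * Math.PI * t / 100.0); // periodic cosine signal
                double noise = 0.1 * rng.nextGaussian();           // additive Gaussian noise
                double spike = (anomalous && d == 0) ? 5.0 : 0.0;  // inject into one dimension
                data[t][d] = base + noise + spike;
            }
        }
        return data;
    }

    public static void main(String[] args) {
        double[][] series = generate(5000, 2, 42L); // ~5,000 observations per data set
        System.out.println("generated " + series.length + " points");
    }
}
```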
This PR adds an AD model performance benchmark so that we can compare model performance across versions. For the single-stream detector, we refactored the tests in DetectionResultEvalutationIT and moved them to SingleStreamModelPerfIT. For the HCAD detector, we use the randomly generated synthetic data described above (noisy cosine waves with known anomalies inserted throughout the signal). We also backported opensearch-project#600 so that we can capture the performance data in CI output, and fixed opensearch-project#712 by revising the client setup code. Testing done: added unit tests to run the benchmark. Signed-off-by: Kaituo Li <kaituo@amazon.com>
This PR adds an AD model performance benchmark so that we can compare model performance across versions. We run the benchmark in separate GitHub workflows since it can be time consuming; for example, running the HCAD benchmark alone takes 25+ minutes in 1.1. We also print the benchmarking results to standard output for recording purposes. For HCAD, we use the randomly generated synthetic data described above, generated with the same random seed so that results are comparable across versions. For single-stream detectors, we use a curated data set with known anomaly windows. We also backported opensearch-project#600 so that we can capture the performance data in CI output. Testing done: 1. Added unit tests to run the benchmark. Signed-off-by: Kaituo Li <kaituo@amazon.com>
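For the curated single-stream data with known anomaly windows, window-level precision and recall are one plausible way to summarize accuracy. The sketch below shows that scoring scheme; it is not necessarily the exact evaluation used by SingleStreamModelPerfIT.

```java
import java.util.List;

// Sketch of window-level precision/recall: a detection is a true positive if it
// falls inside any labeled anomaly window, and a window is "caught" if at least
// one detection lands inside it. This is one plausible scoring scheme, not
// necessarily the one the benchmark uses.
public class WindowMetrics {
    public record Window(long start, long end) {
        boolean contains(long t) { return t >= start && t <= end; }
    }

    public static double[] precisionRecall(List<Long> detections, List<Window> windows) {
        long truePositives = detections.stream()
            .filter(t -> windows.stream().anyMatch(w -> w.contains(t)))
            .count();
        long caughtWindows = windows.stream()
            .filter(w -> detections.stream().anyMatch(w::contains))
            .count();
        double precision = detections.isEmpty() ? 0.0 : (double) truePositives / detections.size();
        double recall = windows.isEmpty() ? 0.0 : (double) caughtWindows / windows.size();
        return new double[] { precision, recall };
    }
}
```

Precision here counts the fraction of detections that land in some labeled window; recall counts the fraction of windows that receive at least one detection.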
Description
This PR adds an option flag to print logs during tests and turns the flag on in the CI workflow. The flag is disabled by default. This lets us record model accuracy metrics in GitHub workflows and later retrieve them for analysis.
Testing done:
1. We can turn logging on and off during tests.
2. The accuracy logs are recorded.
Signed-off-by: Kaituo Li <kaituo@amazon.com>
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.