Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inference logic and standardize config index mapping #1284

Merged
merged 1 commit into from
Aug 23, 2024

Conversation

kaituo
Copy link
Collaborator

@kaituo kaituo commented Aug 23, 2024

Description

This commit addresses several issues and improvements in the inference logic and config index mapping:

  1. Fixes in RealTimeInferencer:
  • Previously, we checked if the last update time of the model state was within the current interval and skipped inference if it was. However, this led to excessive skipping of inference because the last update time was updated when retrieving the model state from the cache.
  • Introduced lastSeenExecutionEndTime in the model state, which specifically tracks the last time a sample was processed during inference (not training). This ensures more accurate control over when inference should be skipped.
  1. Consistent Naming in Config Index Mapping:
  • To maintain consistency across the codebase, changed defaultFill to default_fill in the Config index mapping, following the underscore naming convention used elsewhere.
  1. Additional Null Checks:
  • Added more null checks for the defaultFill field in the Config constructor to improve robustness.

Testing:

  • Added a smoke test to allow the job scheduler to trigger anomaly detection inferencing, successfully reproducing and verifying the fix for item 1.
  • added unit tests for item 3.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

This commit addresses several issues and improvements in the inference logic and config index mapping:

1. Fixes in RealTimeInferencer:

* Previously, we checked if the last update time of the model state was within the current interval and skipped inference if it was. However, this led to excessive skipping of inference because the last update time was updated when retrieving the model state from the cache.
* Introduced lastSeenExecutionEndTime in the model state, which specifically tracks the last time a sample was processed during inference (not training). This ensures more accurate control over when inference should be skipped.

2. Consistent Naming in Config Index Mapping:

* To maintain consistency across the codebase, changed defaultFill to default_fill in the Config index mapping, following the underscore naming convention used elsewhere.

3. Additional Null Checks:

* Added more null checks for the defaultFill field in the Config constructor to improve robustness.

Testing:
* Added a smoke test to allow the job scheduler to trigger anomaly detection inferencing, successfully reproducing and verifying the fix for item #1.* added unit tests for item #3.

Signed-off-by: Kaituo Li <kaituo@amazon.com>
Copy link

codecov bot commented Aug 23, 2024

Codecov Report

Attention: Patch coverage is 47.05882% with 9 lines in your changes missing coverage. Please review.

Project coverage is 77.56%. Comparing base (3f0fc8c) to head (815f30f).
Report is 8 commits behind head on main.

Files Patch % Lines
...n/java/org/opensearch/timeseries/model/Config.java 11.11% 4 Missing and 4 partials ⚠️
...g/opensearch/timeseries/ml/RealTimeInferencer.java 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #1284      +/-   ##
============================================
+ Coverage     71.83%   77.56%   +5.73%     
- Complexity     4898     5428     +530     
============================================
  Files           518      532      +14     
  Lines         22879    23260     +381     
  Branches       2245     2302      +57     
============================================
+ Hits          16434    18041    +1607     
+ Misses         5410     4169    -1241     
- Partials       1035     1050      +15     
Flag Coverage Δ
plugin 77.56% <47.05%> (+5.73%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...rch/timeseries/dataprocessor/ImputationOption.java 88.57% <100.00%> (-2.06%) ⬇️
...ava/org/opensearch/timeseries/ml/ModelManager.java 91.42% <100.00%> (-0.76%) ⬇️
.../java/org/opensearch/timeseries/ml/ModelState.java 95.08% <100.00%> (+3.85%) ⬆️
...g/opensearch/timeseries/ml/RealTimeInferencer.java 87.83% <50.00%> (ø)
...n/java/org/opensearch/timeseries/model/Config.java 86.02% <11.11%> (-1.39%) ⬇️

... and 104 files with indirect coverage changes

@kaituo
Copy link
Collaborator Author

kaituo commented Aug 23, 2024

bwc test failed due to search failure. Looks like core issue. Updated opensearch-project/OpenSearch#15234

? ERROR][o.o.t.t.TaskManager      ] [adBwcCluster0-2] Failed to search task for config duayfJEBmA_ocOTd5FGK
?  org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
?  	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:770) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:810) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:548) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:316) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:766) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1729) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1515) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.NativeMessageHandler.lambda$handleException$5(NativeMessageHandler.java:454) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.NativeMessageHandler.handleException(NativeMessageHandler.java:452) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.NativeMessageHandler.handlerResponseError(NativeMessageHandler.java:444) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:172) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:126) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:121) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:113) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:800) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.forwardFragments(NativeInboundBytesHandler.java:157) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.doHandleBytes(NativeInboundBytesHandler.java:94) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:143) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:119) [opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.17.0.jar:2.17.0]
?  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) [netty-common-4.1.112.Final.jar:4.1.112.Final]
?  	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.112.Final.jar:4.1.112.Final]
?  	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
?  Caused by: org.opensearch.index.query.QueryShardException: failed to create query: For input string: ""
?  	at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:519) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:502) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService.parseSource(SearchService.java:1324) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService.createContext(SearchService.java:1063) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:661) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:634) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:982) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.17.0.jar:2.17.0]
?  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
?  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
?  	at java.lang.Thread.run(Thread.java:1583) ~[?:?]
?  Caused by: java.lang.NumberFormatException: For input string: ""
?  	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) ~[?:?]
?  	at java.lang.Integer.parseInt(Integer.java:672) ~[?:?]
?  	at java.lang.Integer.parseInt(Integer.java:778) ~[?:?]
?  	at org.opensearch.common.lucene.search.Queries.calculateMinShouldMatch(Queries.java:203) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.common.lucene.search.Queries.applyMinimumShouldMatch(Queries.java:145) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.index.query.BoolQueryBuilder.doToQuery(BoolQueryBuilder.java:335) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:117) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.index.query.QueryShardContext.lambda$toQuery$3(QueryShardContext.java:503) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:515) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:502) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService.parseSource(SearchService.java:1324) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService.createContext(SearchService.java:1063) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:661) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:634) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:982) ~[opensearch-2.17.0.jar:2.17.0]
?  	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.17.0.jar:2.17.0]
?  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
?  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
?  	at java.lang.Thread.run(Thread.java:1583) ~[?:?]

@kaituo
Copy link
Collaborator Author

kaituo commented Aug 23, 2024

WhiteSource security check failed as usual.

import com.google.gson.JsonObject;

/**
* Test that is meant to run with job scheduler to test if we have at least consecutive results generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice addition

@kaituo kaituo merged commit 2922bbd into opensearch-project:main Aug 23, 2024
21 of 23 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 23, 2024
This commit addresses several issues and improvements in the inference logic and config index mapping:

1. Fixes in RealTimeInferencer:

* Previously, we checked if the last update time of the model state was within the current interval and skipped inference if it was. However, this led to excessive skipping of inference because the last update time was updated when retrieving the model state from the cache.
* Introduced lastSeenExecutionEndTime in the model state, which specifically tracks the last time a sample was processed during inference (not training). This ensures more accurate control over when inference should be skipped.

2. Consistent Naming in Config Index Mapping:

* To maintain consistency across the codebase, changed defaultFill to default_fill in the Config index mapping, following the underscore naming convention used elsewhere.

3. Additional Null Checks:

* Added more null checks for the defaultFill field in the Config constructor to improve robustness.

Testing:
* Added a smoke test to allow the job scheduler to trigger anomaly detection inferencing, successfully reproducing and verifying the fix for item #1.* added unit tests for item #3.

Signed-off-by: Kaituo Li <kaituo@amazon.com>
(cherry picked from commit 2922bbd)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@kaituo kaituo added bug Something isn't working and removed infra Changes to infrastructure, testing, CI/CD, pipelines, etc. labels Aug 23, 2024
kaituo pushed a commit that referenced this pull request Aug 23, 2024
This commit addresses several issues and improvements in the inference logic and config index mapping:

1. Fixes in RealTimeInferencer:

* Previously, we checked if the last update time of the model state was within the current interval and skipped inference if it was. However, this led to excessive skipping of inference because the last update time was updated when retrieving the model state from the cache.
* Introduced lastSeenExecutionEndTime in the model state, which specifically tracks the last time a sample was processed during inference (not training). This ensures more accurate control over when inference should be skipped.

2. Consistent Naming in Config Index Mapping:

* To maintain consistency across the codebase, changed defaultFill to default_fill in the Config index mapping, following the underscore naming convention used elsewhere.

3. Additional Null Checks:

* Added more null checks for the defaultFill field in the Config constructor to improve robustness.

Testing:
* Added a smoke test to allow the job scheduler to trigger anomaly detection inferencing, successfully reproducing and verifying the fix for item #1.* added unit tests for item #3.

Signed-off-by: Kaituo Li <kaituo@amazon.com>
(cherry picked from commit 2922bbd)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kaituo pushed a commit that referenced this pull request Aug 26, 2024
This commit addresses several issues and improvements in the inference logic and config index mapping:

1. Fixes in RealTimeInferencer:

* Previously, we checked if the last update time of the model state was within the current interval and skipped inference if it was. However, this led to excessive skipping of inference because the last update time was updated when retrieving the model state from the cache.
* Introduced lastSeenExecutionEndTime in the model state, which specifically tracks the last time a sample was processed during inference (not training). This ensures more accurate control over when inference should be skipped.

2. Consistent Naming in Config Index Mapping:

* To maintain consistency across the codebase, changed defaultFill to default_fill in the Config index mapping, following the underscore naming convention used elsewhere.

3. Additional Null Checks:

* Added more null checks for the defaultFill field in the Config constructor to improve robustness.

Testing:
* Added a smoke test to allow the job scheduler to trigger anomaly detection inferencing, successfully reproducing and verifying the fix for item #1.* added unit tests for item #3.


(cherry picked from commit 2922bbd)

Signed-off-by: Kaituo Li <kaituo@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants