
[BUG] Neural search: ArrayIndexOutOfBoundsException: Index 495884 out of bounds for length 1 #666

Closed
lihuimingxs opened this issue Apr 2, 2024 · 4 comments
Labels: bug

Comments

@lihuimingxs

Describe the bug

In OpenSearch 2.12.0:

When running on GPU and issuing concurrent neural search requests, an ArrayIndexOutOfBoundsException is thrown.

I'm not sure whether this is a concurrency issue, but what I do know is that a single request succeeds; the exception occurs only under concurrent requests.

Number of concurrent requests: more than 5

Request:

GET irp_index_vec/_search
{
      "size": 100, 
      "query": {
          "bool": {
              "filter": [
                  {
                      "bool": {
                          "must": [
                              {
                                  "terms": {
                                      "stat": [
                                          1
                                      ]
                                  }
                              }
                          ]
                      }
                  }
              ],
              "must": [
                  {
                      "neural": {
                          "embeddingCnVector1": {
                              "query_text": "some content",
                              "k": 100
                          }
                      }
                  }
              ]
          }
      }
}

Exception:

[2024-03-28T19:30:27,355][WARN ][r.suppressed             ] [opensearch-cluster_manager] path: /irp_index_vec/_search, params: {typed_keys=true, index=irp_index_vec}
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:722) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:298) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:138) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.CountedCollector.countDown(CountedCollector.java:66) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.CountedCollector.onFailure(CountedCollector.java:85) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.FetchSearchPhase$2.onFailure(FetchSearchPhase.java:257) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:766) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1725) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:404) [opensearch-security-2.12.0.0.jar:2.12.0.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1511) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundHandler.lambda$handleException$5(InboundHandler.java:447) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundHandler.handleException(InboundHandler.java:445) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:437) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:170) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:127) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.12.0.jar:2.12.0]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) [netty-codec-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) [netty-codec-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.106.Final.jar:4.1.106.Final]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: org.opensearch.OpenSearchException$3: Index 495884 out of bounds for length 1
        at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.12.0.jar:2.12.0]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.12.0.jar:2.12.0]
        ... 49 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 495884 out of bounds for length 1
        at org.apache.lucene.util.SparseFixedBitSet.get(SparseFixedBitSet.java:129) ~[lucene-core-9.9.2.jar:9.9.2 a2939784c4ca60bc28bf488b5479c02fc2e5e22c - 2024-01-25 09:51:09]
        at org.opensearch.search.fetch.FetchPhase.findRootDocumentIfNested(FetchPhase.java:283) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:299) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.search.fetch.FetchPhase.execute(FetchPhase.java:172) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.search.SearchService.lambda$executeFetchPhase$3(SearchService.java:782) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) ~[opensearch-2.12.0.jar:2.12.0]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        ... 1 more

Related component

Search

To Reproduce

  1. Send concurrent neural search requests (see the sketch after this list)
  2. Check the cluster logs
  3. Observe the error logs
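
A minimal reproduction sketch, not taken from the report: a Python script assuming a local cluster at https://localhost:9200 with basic auth and a self-signed certificate. The endpoint, credentials, and worker count are placeholders; the index, field, and query body are copied from the request above.

# Fire more than 5 concurrent neural searches against the index from the report.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://localhost:9200/irp_index_vec/_search"  # placeholder endpoint
AUTH = ("admin", "admin")                             # placeholder credentials

QUERY = {
    "size": 100,
    "query": {
        "bool": {
            "filter": [{"bool": {"must": [{"terms": {"stat": [1]}}]}}],
            "must": [
                {"neural": {"embeddingCnVector1": {"query_text": "some content", "k": 100}}}
            ],
        }
    },
}

def search(i):
    # GET with a JSON body, as in the report; verify=False for the self-signed cert.
    r = requests.get(URL, json=QUERY, auth=AUTH, verify=False)
    return i, r.status_code

if __name__ == "__main__":
    # More than 5 concurrent requests reportedly triggers the exception.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for i, status in pool.map(search, range(8)):
            print(f"request {i}: HTTP {status}")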

Expected behavior

Neural search should work correctly when using GPU.

Additional Details

Host/Environment (please complete the following information):

  • OS: Linux CentOS 7.9
  • Version: 2.12.0
  • Plugins: ml_commons

lihuimingxs added the bug and untriaged labels on Apr 2, 2024
@peternied
Member

[Triage]
@opensearch-project/admin could you please transfer this to the neural search repository?

bbarani transferred this issue from opensearch-project/OpenSearch on Apr 3, 2024
@navneet1v
Collaborator

@lihuimingxs can you try removing the neural query clause from the query and running it with 5 concurrent requests (a stripped-down sketch follows below)? From what I can see, the stack trace is from the fetch phase, so this could be an issue in OpenSearch core: the neural query doesn't touch the fetch phase of search.
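
For reference, a sketch of what the suggested test query might look like: the original request with the neural clause removed (the index and filter are copied from the report).

GET irp_index_vec/_search
{
      "size": 100,
      "query": {
          "bool": {
              "filter": [
                  {
                      "bool": {
                          "must": [
                              {
                                  "terms": {
                                      "stat": [
                                          1
                                      ]
                                  }
                              }
                          ]
                      }
                  }
              ]
          }
      }
}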

@jmazanec15
Member

@lihuimingxs are you still facing the issue?

@jmazanec15
Member

closing - no activity
