POC on remote store integration #4536

Closed
Tracked by #4448
dreamer-89 opened this issue Sep 16, 2022 · 3 comments
Labels
distributed framework, enhancement


@dreamer-89
Member

dreamer-89 commented Sep 16, 2022

As part of this issue, we need to perform a POC on integrating segment replication with the remote store. The high-level design for the POC is captured in #4555.

@Poojita-Raj Poojita-Raj reopened this Sep 16, 2022
@dreamer-89 dreamer-89 changed the title from "POC on integration" to "POC on remote store integration" Sep 16, 2022
@tlfeng tlfeng added the distributed framework, feature, and enhancement labels and removed the feature label Sep 20, 2022
@dreamer-89 dreamer-89 self-assigned this Oct 7, 2022
@dreamer-89
Member Author

I am starting by writing a simple integration test and then building the changes needed to make it pass. I will use this comment as the single place to list all issues identified on the remote store side (this list will keep being updated):

  1. Blob repository operations are expected to run on the GENERIC or SNAPSHOT thread pool. This does not hold for the remote store repository when it is invoked from a cluster state update task (ClusterStateUpdate#task), which results in test failures.
  2. Cluster state updates fail with the exception below. The failure happens during the index cleanup step.
java.lang.IllegalStateException: Future got interrupted
	at org.opensearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:77) ~[main/:?]
	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:343) [main/:?]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:321) [main/:?]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:196) [main/:?]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:176) [main/:?]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:214) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [main/:?]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1048) ~[?:?]
	at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:272) ~[main/:?]
	at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:104) ~[main/:?]
	at org.opensearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:74) ~[main/:?]
	... 11 more
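Both issues above can be illustrated with a small standalone sketch. It assumes, as in OpenSearch, that the thread-pool name appears in brackets inside the thread name (e.g. `[generic]`); `onSnapshotOrGenericThread`, `getOrThrow`, and `runOnNamedThread` are hypothetical stand-ins, not the actual blob repository or `FutureUtils` code, and the thread names are illustrative. The first part shows why a repository call made directly on a cluster state update thread trips a GENERIC/SNAPSHOT thread check; the second shows how an interrupt while waiting on a future surfaces as `IllegalStateException: Future got interrupted`, matching the top frame of the trace.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class RemoteStoreThreadingSketch {

    // Hypothetical check modelled on the GENERIC/SNAPSHOT requirement: thread names are
    // assumed to embed the pool name in brackets, e.g. "opensearch[node_t0][generic][T#1]".
    static boolean onSnapshotOrGenericThread() {
        String name = Thread.currentThread().getName();
        return name.contains("[snapshot]") || name.contains("[generic]");
    }

    // Hypothetical helper mirroring the wrapping seen in the stack trace: an interrupt while
    // waiting on a future is rethrown as IllegalStateException("Future got interrupted").
    static <T> T getOrThrow(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag for callers
            throw new IllegalStateException("Future got interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Future failed", e.getCause());
        }
    }

    // Runs the thread check on a single thread with the given name and returns the result.
    static boolean runOnNamedThread(String threadName) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> new Thread(r, threadName));
        try {
            Callable<Boolean> check = RemoteStoreThreadingSketch::onSnapshotOrGenericThread;
            return getOrThrow(executor.submit(check));
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // A cluster state update thread fails the check; a generic pool thread passes it.
        System.out.println(runOnNamedThread("opensearch[node_t0][masterService#updateTask][T#1]"));
        System.out.println(runOnNamedThread("opensearch[node_t0][generic][T#1]"));

        // Waiting on a never-completed future from an interrupted thread fails immediately.
        FutureTask<String> neverCompleted = new FutureTask<>(() -> "published");
        Thread.currentThread().interrupt();
        try {
            getOrThrow(neverCompleted);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getClass().getSimpleName() + ")");
        } finally {
            Thread.interrupted(); // clear the restored flag again
        }
    }
}
```

Forking the repository work onto the generic (or snapshot) executor, as the second `runOnNamedThread` call simulates, is one way to satisfy such a check; whether the POC should do that or relax the check is a design decision this sketch does not settle.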

@dreamer-89
Member Author

The existing changes have been pushed to the remote_store_poc branch. This work depends on #4628 and #4793; the actual integration cannot be tested until those are done.

https://github.com/dreamer-89/OpenSearch/tree/remote_store_poc

@ankitkala
Member

I've been working on a POC for CCR (cross-cluster replication) using segment replication (SegRep) and the remote store. As part of that, I made the bare-minimum changes needed to enable segment replication via the remote store: https://github.com/opensearch-project/OpenSearch/pull/7028/files

As discussed during design, this is a pull-based implementation: the primary notifies the replicas after segments have been uploaded to the remote store, and each replica then pulls the segment diff from the remote store and applies it to its reader.
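The pull-based flow just described can be sketched with in-memory maps standing in for the remote store and the replica's local directory. Everything here (`Primary`, `Replica`, `refreshAndPublish`, `onNewCheckpoint`) is hypothetical and not taken from the linked PR; the diff is by file name only, which assumes segment files are write-once (as Lucene files are) and never rewritten under the same name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PullBasedSegRepSketch {

    // Hypothetical replica: keeps its own copy of segment files and the set its reader uses.
    static final class Replica {
        final Map<String, byte[]> remoteStore;
        final Map<String, byte[]> localFiles = new HashMap<>();
        Set<String> readerFiles = new TreeSet<>();
        List<String> lastDownloaded = new ArrayList<>();

        Replica(Map<String, byte[]> remoteStore) {
            this.remoteStore = remoteStore;
        }

        // Invoked by the primary after an upload; the replica pulls only what it lacks.
        void onNewCheckpoint(Set<String> checkpointFiles) {
            List<String> downloaded = new ArrayList<>();
            for (String file : checkpointFiles) {
                // Name-based diff: valid only because segment files are never rewritten in place.
                if (!localFiles.containsKey(file)) {
                    localFiles.put(file, remoteStore.get(file));
                    downloaded.add(file);
                }
            }
            lastDownloaded = downloaded;
            readerFiles = new TreeSet<>(checkpointFiles); // stands in for refreshing the reader
        }
    }

    // Hypothetical primary: uploads first, notifies afterwards (the pull-based contract).
    static final class Primary {
        final Map<String, byte[]> remoteStore;
        final List<Replica> replicas = new ArrayList<>();

        Primary(Map<String, byte[]> remoteStore) {
            this.remoteStore = remoteStore;
        }

        // currentFiles: every file referenced by the latest refresh, old and new.
        void refreshAndPublish(Map<String, byte[]> currentFiles) {
            currentFiles.forEach(remoteStore::putIfAbsent); // upload files not yet in the store
            Set<String> checkpoint = new TreeSet<>(currentFiles.keySet());
            for (Replica replica : replicas) {
                replica.onNewCheckpoint(checkpoint);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> remoteStore = new HashMap<>();
        Primary primary = new Primary(remoteStore);
        Replica replica = new Replica(remoteStore);
        primary.replicas.add(replica);

        Map<String, byte[]> files = new HashMap<>();
        files.put("_0.cfs", new byte[] {1});
        primary.refreshAndPublish(files);
        files.put("_1.cfs", new byte[] {2});
        primary.refreshAndPublish(files);

        System.out.println(replica.lastDownloaded + " " + replica.readerFiles);
    }
}
```

On the second publish the replica downloads only the new file, which is the point of pulling a diff rather than a full copy. File cleanup on either side is deliberately left out; that corresponds to the deletion-policy gap in the list that follows.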

What's missing:

  • SegRep relies on StoreFileMetadata to compute the diff, whereas the remote store has its own metadata file for computing the diff and downloading the segments. We'll need to converge both approaches on common constructs.
  • [Segment Replication] Remote store integration high level design #4555 also discussed enabling fsync-less commits instead of refreshes. While that change is good to have, it is not immediately required to make the integration work, since we're already uploading the in-memory segments (after refresh) to the remote store as well.
  • A deletion policy for segments in the remote store (this should probably be covered as part of remote store GA).
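The first gap in the list above amounts to agreeing on one diff construct. A minimal sketch, assuming both StoreFileMetadata and the remote store metadata file can be reduced to a map from file name to (length, checksum); `FileInfo`, `Diff`, `diff`, and `toFetch` are hypothetical names, not existing OpenSearch types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentDiffSketch {

    // Hypothetical common per-file view both metadata sources could be converted into.
    record FileInfo(long length, String checksum) {}

    // Hypothetical result: files to keep as-is, to replace, and to download.
    record Diff(List<String> identical, List<String> different, List<String> missing) {
        List<String> toFetch() {
            List<String> fetch = new ArrayList<>(different);
            fetch.addAll(missing);
            return fetch;
        }
    }

    // Compares the source view (primary checkpoint or remote metadata file) with local files.
    static Diff diff(Map<String, FileInfo> source, Map<String, FileInfo> local) {
        List<String> identical = new ArrayList<>();
        List<String> different = new ArrayList<>();
        List<String> missing = new ArrayList<>();
        // Iterate in name order so the output lists are deterministic.
        for (Map.Entry<String, FileInfo> entry : new TreeMap<>(source).entrySet()) {
            FileInfo localInfo = local.get(entry.getKey());
            if (localInfo == null) {
                missing.add(entry.getKey());
            } else if (localInfo.equals(entry.getValue())) {
                identical.add(entry.getKey());
            } else {
                different.add(entry.getKey());
            }
        }
        return new Diff(identical, different, missing);
    }

    public static void main(String[] args) {
        Map<String, FileInfo> remote = Map.of(
            "_0.cfs", new FileInfo(10, "a1"),
            "_1.cfs", new FileInfo(20, "b2"),
            "segments_2", new FileInfo(5, "c3"));
        Map<String, FileInfo> local = Map.of(
            "_0.cfs", new FileInfo(10, "a1"),
            "segments_2", new FileInfo(5, "old"));
        System.out.println(diff(remote, local).toFetch());
    }
}
```

With a shared construct like this, the replica would fetch `toFetch()` from whichever source produced the map, so the download path no longer depends on whether the metadata came from the primary or from the remote store.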


5 participants