[BUG] org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.testLocalRecoveryFlowWithReplicas is flaky #13977

Closed
reta opened this issue Jun 4, 2024 · 3 comments
Labels: bug, flaky-test, Storage:Remote

Comments

@reta
Collaborator

reta commented Jun 4, 2024

Describe the bug

The test case org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.testLocalRecoveryFlowWithReplicas is flaky:

java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([3C4E96578F397B58]:0)
	at org.opensearch.cluster.coordination.PreVoteCollector$PreVotingRound.close(PreVoteCollector.java:298)
	at org.opensearch.cluster.coordination.Coordinator.closePrevoting(Coordinator.java:482)
	at org.opensearch.cluster.coordination.Coordinator.closePrevotingAndElectionScheduler(Coordinator.java:476)
	at org.opensearch.cluster.coordination.Coordinator$1.onSuccess(Coordinator.java:389)
	at org.opensearch.cluster.service.ClusterApplierService$SafeClusterApplyListener.onSuccess(ClusterApplierService.java:682)
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:507)
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:199)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Related component

Storage:Remote

To Reproduce

 ./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.testLocalRecoveryFlowWithReplicas" -Dtests.seed=3C4E96578F397B58

Expected behavior

The test must always pass.

Additional Details

Plugins
Please list all plugins currently enabled.

Screenshots
Standard

Host/Environment:

  • CI

@reta added the bug, untriaged, and flaky-test labels on Jun 4, 2024

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7]
@reta Thanks for creating this issue

@bowenlan-amzn
Member

Not the exact same failure, but it is in the same class. Using this issue to track
org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.testLocalRecoveryRollingRestartAndNodeFailure:

https://build.ci.opensearch.org/job/gradle-check/40770/testReport/junit/org.opensearch.remotemigration/RemotePrimaryLocalRecoveryIT/testLocalRecoveryRollingRestartAndNodeFailure/

java.lang.AssertionError: unexpected
	at __randomizedtesting.SeedInfo.seed([1D9A886514D3BEA3:F9BAA0BD17AD249E]:0)
	at org.opensearch.test.InternalTestCluster.removeExclusions(InternalTestCluster.java:2101)
	at org.opensearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:2037)
	at org.opensearch.test.InternalTestCluster.rollingRestart(InternalTestCluster.java:2018)
	at org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.triggerRollingRestartForRemoteMigration(RemotePrimaryLocalRecoveryIT.java:133)
	at org.opensearch.remotemigration.RemotePrimaryLocalRecoveryIT.testLocalRecoveryRollingRestartAndNodeFailure(RemotePrimaryLocalRecoveryIT.java:54)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.util.concurrent.ExecutionException: ClusterManagerNotDiscoveredException[null]
	at org.opensearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:286)
	at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:273)
	at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:104)
	at org.opensearch.test.InternalTestCluster.removeExclusions(InternalTestCluster.java:2099)
	... 43 more
Caused by: ClusterManagerNotDiscoveredException[null]
	at app//org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$1.onTimeout(TransportClusterManagerNodeAction.java:356)
	at app//org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:394)
	at app//org.opensearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:294)
	at app//org.opensearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:737)
	at app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:882)
	at java.base@21.0.3/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base@21.0.3/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	... 1 more

@reta
Collaborator Author

reta commented Jun 19, 2024

Closing in favour of #14314

@reta closed this as completed on Jun 19, 2024