Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More diagnostics for FileDescriptorLimitTest flake #486

Merged
merged 1 commit into from
Aug 8, 2023

Conversation

jglick
Copy link
Member

@jglick jglick commented Aug 8, 2023

Continuing #483. Observed a flake again

java.lang.AssertionError: 
worked on agent
Expected: a string containing "All open files"
     but: was "slave0
======

N/A: Either no connection to node or no cached result
"
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at com.cloudbees.jenkins.support.impl.FileDescriptorLimitTest.agentContentFilter(FileDescriptorLimitTest.java:99)

No obvious explanation in controller logs other than that the channel seems to be closed by this point:

…
   1.363 [id=134]	INFO	jenkins.InitReactorRunner$1#onAttained: Completed initialization
   1.417 [id=115]	INFO	o.jvnet.hudson.test.JenkinsRule#waitOnline: Launching slave0…
   2.479 [id=67]	INFO	o.j.h.test.SimpleCommandLauncher#launch: agent launched for slave0
   2.480 [id=115]	INFO	o.jvnet.hudson.test.JenkinsRule#waitOnline: …finished launching slave0.
   2.529 [id=115]	FINER	c.c.j.support.AsyncResultCache#get: com.cloudbees.jenkins.support.impl.FileDescriptorLimit$GetUlimit@3699b57f on master succeeded
   3.052 [id=115]	FINER	c.c.j.support.AsyncResultCache#get: Could not retrieve file descriptor info from slave0
java.util.concurrent.TimeoutException
	at hudson.remoting.Request$1.get(Request.java:322)
	at hudson.remoting.Request$1.get(Request.java:240)
	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
	at com.cloudbees.jenkins.support.AsyncResultCache.get(AsyncResultCache.java:66)
	at com.cloudbees.jenkins.support.AsyncResultCache.get(AsyncResultCache.java:33)
	at com.cloudbees.jenkins.support.impl.FileDescriptorLimit$1.printTo(FileDescriptorLimit.java:98)
	at com.cloudbees.jenkins.support.api.PrefilteredPrintedContent.writeTo(PrefilteredPrintedContent.java:63)
	at com.cloudbees.jenkins.support.SupportPlugin.writeBundle(SupportPlugin.java:416)
	at com.cloudbees.jenkins.support.SupportPlugin.writeBundle(SupportPlugin.java:353)
	at com.cloudbees.jenkins.support.impl.FileDescriptorLimitTest.agentContentFilter(FileDescriptorLimitTest.java:92)
	at …
   3.082 [id=111]	INFO	hudson.remoting.Request$2#run: Failed to send back a reply to the request RPCRequest:hudson.remoting.RemoteClassLoader$IClassLoader.fetch3[java.lang.String](2): hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@240c5840:slave0": channel is already closed
   3.071 [id=161]	FINE	c.c.j.support.AsyncResultCache#run: Could not retrieve file descriptor info from slave0 for caching
Command Close created at
	at hudson.remoting.Command.<init>(Command.java:70)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1306)
	at hudson.remoting.Channel.close(Channel.java:1480)
	at hudson.remoting.Channel.close(Channel.java:1447)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1312)
Caused: hudson.remoting.Channel$OrderlyShutdown
Caused: hudson.remoting.RequestAbortedException
	at hudson.remoting.Request.abort(Request.java:346)
	at hudson.remoting.Channel.terminate(Channel.java:1080)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1313)
	at hudson.remoting.Channel$1.handle(Channel.java:606)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:81)
Caused: java.util.concurrent.ExecutionException
	at hudson.remoting.Request$1.get(Request.java:325)
	at hudson.remoting.Request$1.get(Request.java:240)
	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
	at com.cloudbees.jenkins.support.AsyncResultCache.run(AsyncResultCache.java:107)
	at …
   3.084 [id=115]	INFO	hudson.lifecycle.Lifecycle#onStatusUpdate: Stopping Jenkins
…

(The test output also shows normal agent logs, but these are only printed during createOnlineSlave and stop thereafter.)

Was able to produce a similar error from

diff --git a/src/main/java/com/cloudbees/jenkins/support/AsyncResultCache.java b/src/main/java/com/cloudbees/jenkins/support/AsyncResultCache.java
index 69c5401..3d7f205 100644
--- a/src/main/java/com/cloudbees/jenkins/support/AsyncResultCache.java
+++ b/src/main/java/com/cloudbees/jenkins/support/AsyncResultCache.java
@@ -3,6 +3,7 @@ package com.cloudbees.jenkins.support;
 import com.cloudbees.jenkins.support.util.CallAsyncWrapper;
 import hudson.model.Computer;
 import hudson.model.Node;
+import hudson.model.Slave;
 import hudson.remoting.Callable;
 import hudson.remoting.VirtualChannel;
 import jenkins.model.Jenkins;
(1/2) Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? n
@@ -63,6 +64,9 @@ public class AsyncResultCache<T> implements Runnable {
             future = CallAsyncWrapper.callAsync(channel, operation);
         }
         try {
+            if (node instanceof Slave) {
+                node.getChannel().close();
+            }
             final V result = future.get(SupportPlugin.REMOTE_OPERATION_TIMEOUT_MS, TimeUnit.MILLISECONDS);
             synchronized (cache) {
                 cache.put(node, result);

So, enhancing failure message to show complete agent logs up to that point, in the hope that this will give some clue. (Perhaps the CI machine runs out of memory and kills the agent JVM?!)

@jglick jglick requested a review from a team as a code owner August 8, 2023 16:54
Copy link
Contributor

@Dohbedoh Dohbedoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the help narrowing this down!

@Dohbedoh Dohbedoh merged commit 0d62227 into jenkinsci:master Aug 8, 2023
16 checks passed
@jglick jglick deleted the FileDescriptorLimitTest-2 branch August 9, 2023 11:57
@jglick
Copy link
Member Author

jglick commented Aug 15, 2023

Did not help unfortunately:

$ "…/java"  -Djava.awt.headless=true -Xmx512m -XX:+PrintCommandLineFlags -jar "…/support-core-plugin/target/jenkins-for-test/WEB-INF/lib/remoting-….jar"
Picked up JAVA_TOOL_OPTIONS: -Xmx1g …
<===[JENKINS REMOTING CAPACITY]===>&#0;&#0;&#0;channel started
Remoting version: …
Launcher: SimpleCommandLauncher
Communication Protocol: Standard in/out
This is a Unix agent
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by jenkins.slaves.StandardOutputSwapper$ChannelSwapper to constructor java.io.FileDescriptor(int)
WARNING: Please consider reporting this to the maintainers of jenkins.slaves.StandardOutputSwapper$ChannelSwapper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Evacuated stdout
Agent successfully connected and online

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants