Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THREAD_DUMPS_ON_SLOW_REQUEST: adding the base of the development of t… #558

Merged

Conversation

ironcerocloudbees
Copy link
Contributor

…his feature

Testing done

I tested this feature installing the snapshot version into a testing environment (in the last 2.452.2 LTS version) and run the following tests:

  1. The way to trigger the slow request was to run sleep(60000) groovy code in the Manage Jenkins -> Script Console in multiples tabs at the same time (around 10).
  2. I ran the above trigger a few times consecutively to confirm that we don't trigger the collecting operation while the RECURRENCE_PERIOD_MIN is working.
  3. I run the above trigger each 12-15 minutes for more than 2 hours to confirm that the SLOW_REQUEST_THREAD_DUMPS_TO_RETAIN function works propertly.
  4. I changed the parameters (RECURRENCE_PERIOD_MIN, MINIMAL_SLOW_REQUEST_COUNT, etc) in the Jenkins startup to confirm that we can modify any of those parameters and the change affected the behavior of the new feature in the correct way.
  5. I generated a new support bundle and confirmed that the new files were included correctly.

Proof of the feature working:

image

UI changes:

image

JIRA:

https://issues.jenkins.io/browse/JENKINS-73431

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@ironcerocloudbees ironcerocloudbees requested a review from a team as a code owner July 10, 2024 10:52
@jglick
Copy link
Member

jglick commented Jul 10, 2024

The slow request records component already includes thread dumps, at least for the thread handling the HTTP request (and IIRC for any thread it is blocked on as well); what additional value comes from this? I did not follow the motivation in Jira.

@ironcerocloudbees
Copy link
Contributor Author

Hi @jglick, thank you so much for your comment.

As you said, the SlowRequest indeed captures the request thread. However, under my experience, making a performance analysis could be very tricky if we only have that information (the threads of the requests that hit the Slow Request feature).

If the performance problem is locally located at the level of the feature that we are consuming with this request (for example if we are running a script in the Script Console that is consuming a lot of time), the thread of the request could be enough.
However, when Jenkins faces performance degradation caused by reasons other than a specific request, the threads of the requests are not enough to perform a performance analysis. In that kind of case, complete thread dumps (and, furthermore, some of them taken periodically) could be very useful.

I know that we have the complete thread dump in the High CPU checker. However, sometimes (and this is very common indeed), the performance degradation is not link to a high CPU usage scenario and this collection is not triggered.

In this moment, you could ask why we would need to record this information if we can take the set of thread dumps manually during the slowness episode. The answer is simple: Jenkins is usually used to run periodic tasks and automatic tasks when nobody is on the site at the moment of the performance degradation. Furthermore, with the introduction of Jenkins in K8s and the fact that the readiness and liveness probes restart the pod under this kind of situation, it is very complex to perform a "post-morten" analysis once the pod is restarted.

Kind regards,

@jglick
Copy link
Member

jglick commented Jul 10, 2024

Just curious. Leaving review to @Dohbedoh.

* Provide a means to disable the slow request thread dump checker. This is a volatile non-final field as if you run into
* issues in a running Jenkins you may need to disable without restarting Jenkins.
*/
public static volatile boolean DISABLED =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to make it volatile here ? Seems to me that only a single instance of this threqad shall be running but maybe I am missing something 🤔 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are totally right. I just revisited the code, and as you said, we don't need the volatile attribute here. The expectation is that only one thread will be running simultaneously. I will apply the change in my next commit.

Comment on lines 94 to 107
for (int i = 0; i < TOTAL_ITERATIONS; i++) {
File threadDumpFile = logs.file(format.format(new Date(fileNameDate)) + ".txt");
try (FileOutputStream fileOutputStream = new FileOutputStream(threadDumpFile)) {
ThreadDumps.threadDump(fileOutputStream);
logs.add(threadDumpFile);
} catch (IOException ioe) {
LOGGER.log(
WARNING,
"Support Core plugin can't generate automatically thread dumps under SlowRequest scenario",
ioe);
} finally {
fileNameDate += FREQUENCY_SEC * 1000;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm maybe I am missing something here, but we never actually sleep or reschedule anything here, so we are actually generating TOTAL_ITERATIONS thread dumps with different file names in a loop, not every FREQUENCY_SEC * 1000 seconds ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely generating dump without the 5 sec interval:

% ll /tmp/je-1-home/slow-request-threaddumps
total 436
drwxr-xr-x  6 jenkins wheel    192 Jul 19 19:58 .
drwxr-xr-x 53 jenkins wheel   1696 Jul 19 19:56 ..
-rw-r--r--  1 jenkins wheel 120211 Jul 19 19:58 20240719-195806.303.txt
-rw-r--r--  1 jenkins wheel 105633 Jul 19 19:58 20240719-195811.303.txt
-rw-r--r--  1 jenkins wheel 106723 Jul 19 19:58 20240719-195816.303.txt
-rw-r--r--  1 jenkins wheel 105673 Jul 19 19:58 20240719-195821.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195806.303.txt
16777233 9837190 -rw-r--r-- 1 jenkins wheel 0 120211 "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" 4096 240 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195806.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195811.303.txt
16777233 9837191 -rw-r--r-- 1 jenkins wheel 0 105633 "Jul 19 19:58:07 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" 4096 208 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195811.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195816.303.txt
16777233 9837194 -rw-r--r-- 1 jenkins wheel 0 106723 "Jul 19 19:58:07 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" "Jul 19 19:58:06 2024" 4096 216 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195816.303.txt
% stat /tmp/je-1-home/slow-request-threaddumps/20240719-195821.303.txt
16777233 9837195 -rw-r--r-- 1 jenkins wheel 0 105673 "Jul 19 19:58:08 2024" "Jul 19 19:58:07 2024" "Jul 19 19:58:07 2024" "Jul 19 19:58:06 2024" 4096 208 0 /tmp/je-1-home/slow-request-threaddumps/20240719-195821.303.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holy sh.. You are right. I forgot to add the sleeping call to wait for this FREQUENCY_SEC * 1000 seconds. As you said, currently, we are creating all those files at the same time.

I will add the sleep instruction in the finally block.

Copy link
Contributor

@Dohbedoh Dohbedoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some need to be addressed:

  • the SlowRequestThreadDumpsGenerator actually generates thread dumps consecutively without intervals and give a misleading name to the files.
  • store the data under the $JENKINS_HOME/support/ directory.

Another thing that bugs me is that if we tweak the settings, we could theoretically have several SlowRequestThreadDumpsGenerator threads running (when iterations x frequency > recurrence period) and they would generate dumps. redundantly. And that make me feel like the generator should probably be a singleton that manage and launch the task.

ironcerocloudbees and others added 2 commits July 23, 2024 08:18
…questThreadDumpsGenerator.java


That is ok with me. I agree that we would need to move the slow requests and CPU folders into the support folder as well.

Co-authored-by: Allan Burdajewicz <allan.burdajewicz@gmail.com>
…questThreadDumpsGenerator.java

Co-authored-by: Allan Burdajewicz <allan.burdajewicz@gmail.com>
@ironcerocloudbees
Copy link
Contributor Author

Another thing that bugs me is that if we tweak the settings, we could theoretically have several SlowRequestThreadDumpsGenerator threads running (when iterations x frequency > recurrence period) and they would generate dumps. redundantly. And that make me feel like the generator should probably be a singleton that manage and launch the task.

First of all, thank @Dohbedoh so much for your review. Regarding this last question, I trust your criteria better than mine. However, I would like to talk about the matter briefly before making the change.

As you said, we could tune the variables to have two threads running simultaneously, and we should try to avoid it. However, I am afraid of making the class singleton. We are saving files during the thread execution and we are using the Java tool to do it. So, in case other threads affect the performance of those libraries (or it was blocking part of that code), would it not cause any kind of deadlock (or at least a performance degradation)?

Knowing that the only benefit of the singleton here is to avoid the execution of two threads due to an incongruent variables setup, it would not be better if we check the assigned values of the variables and establish minimum values.

Kind regards,

@Dohbedoh
Copy link
Contributor

@ironcerocloudbees What I mean by singleton is a some kind of "Holder" for the generator Thread. To avoid creating new a SlowRequestThreadDumpsGenerator threads if one of them is actually still running. The condition whether we want to launch a generator thread (i.e. checkThreadDumpsTrigger) should not just be based on recurrence period but also on whether a dump generator is already running.

@ironcerocloudbees
Copy link
Contributor Author

Hi @Dohbedoh,

Thank you again for your time here. I pushed a new version using a semaphore to avoid concurrencies. I have been reviewing different implementations of the Singleton for threads, and the semaphore looks a good option for us. If you could review it it would be great.

Kind regards,

Copy link
Contributor

@Dohbedoh Dohbedoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That looks good. Maybe an AtomicBoolean is better suited for the running. That'd be my only suggestion here.

Comment on lines 136 to 148
private static boolean isRunning() {
boolean runningLocal;
synchronized (SlowRequestThreadDumpsGenerator.class) {
runningLocal = running;
}
return runningLocal;
}

private static void setRunningStatus(boolean runningLocal) {
synchronized (SlowRequestThreadDumpsGenerator.class) {
running = runningLocal;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified using AtomicBoolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Dohbedoh,

Thank you for the tip. I have never used AtomicBoolean, so I read the documentation about the topic, and I performed some changes to the PR. Could you please revisit it and confirm if the usage of AtomicBoolean is correct?

Kind regards,

@Dohbedoh Dohbedoh added this pull request to the merge queue Sep 3, 2024
Merged via the queue into jenkinsci:master with commit f4bb758 Sep 3, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants