Jenkins log and artifact cleanup schedule #1017

Closed
AdamBrousseau opened this issue Jan 24, 2018 · 38 comments

Comments

@AdamBrousseau
Contributor

AdamBrousseau commented Jan 24, 2018

Today we took down Eclipse's hipp5, which is where our OpenJ9 Jenkins master lives. According to the Bugzilla, we were consuming 264G of space, and we have been asked to reduce it.

What we store:

  • Build logs (console output)
  • SDKs from Builds and PRs

Jenkins has the following options, which are configured on a per-job basis:

  • Days to keep builds: if not empty, build records are only kept up to this number of days
  • Max # of builds to keep: if not empty, only up to this number of build records are kept
  • Days to keep artifacts: if not empty, artifacts from builds older than this number of days will be deleted, but the logs, history, reports, etc. for the build will be kept
  • Max # of builds to keep with artifacts: if not empty, only up to this number of builds have their artifacts retained
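To make the interaction between these four fields concrete, here is a simplified Python model. It is a sketch, not Jenkins' actual implementation. It assumes builds are listed newest first and that hitting either the count limit or the age limit is enough to discard a build (or just its artifacts), so effectively the stricter limit wins. Real Jenkins also spares some builds, such as the last successful build and builds marked "keep forever", which this sketch ignores. In pipeline code the same fields appear as the `daysToKeepStr`, `numToKeepStr`, `artifactDaysToKeepStr` and `artifactNumToKeepStr` parameters of `logRotator`; the Python names below are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Build:
    number: int
    age_days: float
    has_artifacts: bool = True
    deleted: bool = False


def apply_discarder(builds, days_to_keep=None, num_to_keep=None,
                    artifact_days_to_keep=None, artifact_num_to_keep=None):
    """Simplified model of Jenkins' four per-job discard fields.

    `builds` is ordered newest first; None means the field was left empty.
    Hitting either the count limit or the age limit is enough to discard,
    so the stricter of the two wins. Jenkins' protection of the last
    successful build and "keep forever" builds is deliberately ignored.
    """
    for index, build in enumerate(builds):
        over_count = num_to_keep is not None and index >= num_to_keep
        over_age = days_to_keep is not None and build.age_days > days_to_keep
        if over_count or over_age:
            build.deleted = True
            build.has_artifacts = False  # artifacts disappear with the build record
            continue
        over_art_count = artifact_num_to_keep is not None and index >= artifact_num_to_keep
        over_art_age = (artifact_days_to_keep is not None
                        and build.age_days > artifact_days_to_keep)
        if over_art_count or over_art_age:
            build.has_artifacts = False  # log and history stay, the SDK goes
    return builds


# One build per day for 120 days, newest first, with the current nightly settings.
nightly = [Build(number=120 - i, age_days=i) for i in range(120)]
apply_discarder(nightly, days_to_keep=60, num_to_keep=100, artifact_num_to_keep=10)
print(sum(not b.deleted for b in nightly))    # 61 build records kept
print(sum(b.has_artifacts for b in nightly))  # 10 of them still have SDKs
```

With one build per day, the 60-day limit bites before the 100-build limit, so 61 records survive, and only the newest 10 keep their artifacts.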

Currently, our nightly (and OMR acceptance) builds are set up with the following:

  • Days to keep builds: 60
  • Max # of builds to keep: 100
  • Days to keep artifacts:
  • Max # of builds to keep with artifacts: 10

Our PR builds are not configured to discard anything (they keep everything).

Propose:

Nightlies and OMR Acceptance:

  • Days to keep builds: 60
  • Max # of builds to keep: 100
  • Days to keep artifacts: 7
  • Max # of builds to keep with artifacts: 10

PRs:

  • Days to keep builds: 60
  • Max # of builds to keep: 100
  • Days to keep artifacts: 14
  • Max # of builds to keep with artifacts: 25

Please review @pshipton

Also, I have a script that checks disk space. I could add a job that runs it and sends a Slack notification if we're over X. I could open this as a separate issue if we're interested.
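A minimal sketch of what such a check could look like, assuming the threshold X is expressed as a fraction of the filesystem and that notifications go through a Slack incoming webhook (the function names and the webhook URL are hypothetical). Note that `shutil.disk_usage` reports the whole filesystem, which on a shared host like hipp5 includes other projects; per-job directory sizes, as in the `du` listing later in this thread, are a different measurement.

```python
import json
import shutil
import urllib.request


def disk_alert(path, max_used_fraction):
    """Return an alert string if the filesystem holding `path` is fuller than
    `max_used_fraction` (0.0 to 1.0), otherwise None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total
    if used_fraction > max_used_fraction:
        return (f"Disk holding {path} is {used_fraction:.0%} full "
                f"({usage.free / 1024**3:.1f}G free), over the "
                f"{max_used_fraction:.0%} threshold")
    return None


def notify_slack(webhook_url, text):
    """Post `text` to a Slack incoming webhook (the URL is a per-workspace secret)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A scheduled Jenkins job could call `disk_alert` on its JENKINS_HOME with the chosen X and pass any returned message to `notify_slack`.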

@pshipton
Member

pshipton commented Jan 24, 2018

I'm ok with this, or with smaller numbers even. Do you know about how many builds consumed the 264G?

Perhaps others will have comments.

@pshipton
Member

I checked a full Java 9 build and it was 4G uncompressed. This includes the git repos, all the source, build artifacts, and images. I assume the SDK we keep around is smaller than this. The JDK image itself is 323M, and zipped it's 202M, so around 20G for 100 builds.
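The estimate is simply compressed SDK size times the number of builds whose artifacts are retained, per job. A tiny helper (the name is hypothetical) makes it easy to replay with other retention counts, assuming 1G = 1000M as the estimate does:

```python
def retained_sdk_gb(sdk_mb, builds_with_artifacts):
    """Approximate archived-SDK storage for one job, in G (1G = 1000M)."""
    return sdk_mb * builds_with_artifacts / 1000


print(retained_sdk_gb(202, 100))  # 20.2, i.e. "around 20G for 100 builds"
print(retained_sdk_gb(202, 10))   # 2.02 with "Max # of builds to keep with artifacts" at 10
```

This is per job; the total scales with the number of jobs that archive an SDK.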

@pshipton
Member

We probably want a bigger number for "Max # of builds to keep with artifacts" for PRs, say 100, to match "Max # of builds to keep".

@AdamBrousseau
Contributor Author

The biggest offenders currently are the PR Sanity builds (no cleanup yet).
688 SDKs are being stored across all jobs.
Java 9 SDKs are ~200M and Java 8 SDKs are ~100M (tar.gz).

@AdamBrousseau
Contributor Author

Based on how much space we've consumed in ~3 months and suggestions in comments, I have updated the numbers in the description. Propose we keep 2 months/100 builds?

@DanHeidinga
Member

@AdamBrousseau Are there any download stats on how often these artifacts have been downloaded? If they aren't being downloaded, then storing them serves no purpose.

@pshipton
Member

I guess a larger number of days to keep builds doesn't much matter if they are limited to 100 builds.

What does it mean not to put a limit on the artifacts? Won't the artifacts eventually consume too much space, or do they disappear with the builds?

@smlambert
Contributor

smlambert commented Jan 25, 2018

Artifacts should disappear with the builds (from Jenkins master).

Based on typical use cases, do we realistically need to keep the SDK for 100 builds for PR builds? I had thought the use case was, "as a developer I want to download the SDK from my PR build, immediately after I build it, so I can do some additional local testing. After a few days (x number of builds later), I want to rebuild, as I want to keep current with all of the other changes coming into the repo)." As a developer, do I ever want to use an SDK that is 100 builds old?

Remind me: does Jenkins compare "days to keep" vs. "# of builds" and use whichever is smaller? If so, pshipton's question above would matter more for slow projects that do not have much build activity over the course of y days.

@AdamBrousseau
Contributor Author

What does it mean not to put a limit on the artifacts?

The artifacts would get cleaned up with the build logs.

download stats

I don't see anything built into Jenkins for this. Initial searching only turns up website monitoring services.

@AdamBrousseau
Contributor Author

Can we close this issue?
With more JDK versions coming online, maybe we should change the config to:

  • Days to keep builds: 60
  • Max # of builds to keep: 50 (down from 100)

@pshipton
Member

pshipton commented Feb 2, 2018

Are you saying we get 50 builds per job? i.e. 50 for Java 9 pLinux, 50 for Java 9 zLinux, etc.

@pshipton
Member

pshipton commented Feb 2, 2018

I'd say make the change, and then we can adjust later as necessary.

pshipton closed this as completed Feb 2, 2018
pshipton reopened this Feb 2, 2018
@AdamBrousseau
Contributor Author

I thought about it a bit more. It makes more sense to me to do something like this:

  • Days to keep builds:
  • Max # of builds to keep: 50
  • Days to keep artifacts: 60
  • Max # of builds to keep with artifacts:

Modified all build configs.

@AdamBrousseau
Contributor Author

AdamBrousseau commented Apr 19, 2018

Reopening this for discussion, as an Eclipse Bugzilla has been opened against us for consuming 365G of space. They would like us to stay under 100G.
https://bugs.eclipse.org/bugs/show_bug.cgi?id=533823
I propose we set "Max # of builds to keep with artifacts" to X for PR jobs and Y for Build jobs.
Currently, an example PR job keeps 50 builds and 60 days of artifacts. All 50 builds are less than 60 days old, so all 50 still have artifacts, and the job is consuming ~13G of space (the 50 SDKs at ~205M each account for ~10G of that).

Some napkin math:

  • JDK8, 9, 10: 100M + 200M + 200M = 500M
  • x 4 jobs (Build job + PR Compile, PR Sanity, PR Extended) = 2G per platform for 1 of each build
    (this ignores PRs from the extensions repos, as we've had fewer than 10 builds total per PR build; otherwise this multiplier would be 7)
  • Saving 20 SDKs for PRs and 10 for Build jobs = 35G per platform
  • Currently we have 3 platforms = ~105G
    When we add Windows next week, we would be up to 140G.
  • If we reduce to the last 10 PRs and the last 10 builds = ~60G, + ~20G when we add Windows
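The napkin math above can be replayed as a small Python calculation. It hard-codes the same assumptions: SDK sizes of 100M/200M/200M, one Build job plus three PR job types per platform, and 1G = 1000M. The names are illustrative, not taken from any build script.

```python
SDK_MB = {"JDK8": 100, "JDK9": 200, "JDK10": 200}  # compressed SDK sizes from the thread
PR_JOB_TYPES = 3     # PR Compile, PR Sanity, PR Extended
BUILD_JOB_TYPES = 1  # Build job


def per_platform_gb(pr_sdks_kept, build_sdks_kept):
    """SDK storage for one platform across all JDK levels and job types."""
    one_of_each_mb = sum(SDK_MB.values())  # 500M for one JDK8 + JDK9 + JDK10 set
    sdk_sets = PR_JOB_TYPES * pr_sdks_kept + BUILD_JOB_TYPES * build_sdks_kept
    return one_of_each_mb * sdk_sets / 1000


def total_gb(platforms, pr_sdks_kept, build_sdks_kept):
    return platforms * per_platform_gb(pr_sdks_kept, build_sdks_kept)


print(per_platform_gb(1, 1))  # 2.0 (one of each build)
print(total_gb(3, 20, 10))    # 105.0
print(total_gb(4, 20, 10))    # 140.0 with Windows
print(total_gb(3, 10, 10))    # 60.0
print(total_gb(4, 10, 10))    # 80.0 with Windows
```

Keeping the job counts as named constants makes it easy to rerun the numbers if the extension-repo PR jobs (the "7" case) start producing builds.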

To summarize, I propose we keep the artifacts of the last 10 builds of every job (PullRequest* or Build*).

Disclaimer: this does not account for build logs. Roughly:

  • ~20M per Build job or PR Compile * 18 jobs
  • ~100M per PR Sanity * 9 jobs
  • ~30M per PR Extended * 9 jobs
  • ~70M per Test Sanity * 9 jobs
  • ~15M per Test Extended * 9 jobs
  • = 2.2G * 50 (builds to keep) = ~115G
    I think these numbers are a worst-case scenario. We should worry about the artifacts for now and see where usage stabilizes over the next week or two.
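The log estimate works the same way: log size per build times number of jobs, summed, then times the number of builds kept. One round of builds comes to 2295M (about 2.2G when counted in 1024M units), and 50 rounds to roughly 115G. A sketch with the thread's per-job sizes and job counts hard-coded:

```python
# (approx. log size per build in M, number of jobs), as estimated above
LOG_ESTIMATES = {
    "Build / PR Compile": (20, 18),
    "PR Sanity": (100, 9),
    "PR Extended": (30, 9),
    "Test Sanity": (70, 9),
    "Test Extended": (15, 9),
}


def log_storage_mb(builds_kept):
    """Log storage if every job keeps `builds_kept` builds of the sizes above."""
    one_round_mb = sum(size_mb * jobs for size_mb, jobs in LOG_ESTIMATES.values())
    return one_round_mb * builds_kept


print(log_storage_mb(1))   # 2295 (M) for one build of every job
print(log_storage_mb(50))  # 114750 (M), i.e. ~115G
```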

I also have a job I can set up to monitor the space we are consuming and list the top 10 "Pig" jobs, with optional Slack notifications.
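A minimal sketch of what a "top 10 Pig jobs" report could compute, similar in spirit to the `du` listing later in this thread. It assumes each job is a directory directly under one jobs folder (the caller supplies the path) and sums apparent file sizes, so its numbers will differ somewhat from `du`, which counts disk blocks. The function names are hypothetical, and Slack delivery is left out.

```python
import heapq
import os


def dir_size(path):
    """Total apparent size in bytes of regular files under `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(file_path)
    return total


def top_pig_jobs(jobs_dir, n=10):
    """Return the n largest job directories as (bytes, name), largest first."""
    sizes = [(dir_size(entry.path), entry.name)
             for entry in os.scandir(jobs_dir)
             if entry.is_dir(follow_symlinks=False)]
    return heapq.nlargest(n, sizes)


def format_report(rows):
    """Render rows as 'size<TAB>job', sizes in G (1024**3 bytes)."""
    return "\n".join(f"{size / 1024**3:.1f}G\t{name}" for size, name in rows)
```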

AdamBrousseau reopened this Apr 19, 2018
@pshipton
Member

pshipton commented Apr 19, 2018

Sounds good. If it's still too big with the logs, we can reduce builds for the PRs.

@DanHeidinga
Member

Agreed. Though I'm fine with decreasing the number of builds kept until there is evidence someone is actually downloading them.

@AdamBrousseau
Contributor Author

Updated all the build and PR jobs to only keep last 10 artifacts. I will keep an eye on the disk space over the next few days to see if it reduces.

@AdamBrousseau
Contributor Author

https://bugs.eclipse.org/bugs/show_bug.cgi?id=535057
Email from Eclipse:

Hi,
We've been facing a serious outage on the machine which runs your Jenkins instance today.
There was no more space left on the disk. It appears that OpenJ9 is using 252GB of disk space for its
jobs (out of the 1TB available which is shared by all 17 projects which have a Jenkins on the
same machine). Could you please do some housekeeping?
Thanks.

--
Mikaël Barbero - Eclipse Foundation
IT Services - Release Engineering

jdekonin added a commit to jdekonin/openj9 that referenced this issue May 24, 2018
* keep the artifacts on master to a minimum; no archiving on pr build
* Issue eclipse-openj9#1017
* [skip ci]

Signed-off-by: Joe deKoning <joe_dekoning@ca.ibm.com>
@AdamBrousseau
Contributor Author

AdamBrousseau commented May 25, 2018

Joe delivered a change to stop archiving SDKs in PR builds (#1988)
Our usage has gone from 229G to 175G.
As discussed in Slack, I will reconfigure the other jobs to only store 5 SDKs instead of 10. I will post back once that change has propagated through.

175G    total
4.4G    /jobs/genie.openj9/Test-Sanity-JDK9-linux_390-64_cmprssptrs
4.2G    /jobs/genie.openj9/PullRequest-Sanity-JDK9-linux_390-64_cmprssptrs-OpenJ9
4.2G    /jobs/genie.openj9/PullRequest-Sanity-JDK10-linux_390-64_cmprssptrs-OpenJ9
4.0G    /jobs/genie.openj9/Test-Sanity-JDK8-linux_390-64_cmprssptrs
3.9G    /jobs/genie.openj9/PullRequest-Sanity-JDK10-linux_ppc-64_cmprssptrs_le-OpenJ9
3.8G    /jobs/genie.openj9/Test-Sanity-JDK10-linux_390-64_cmprssptrs
3.7G    /jobs/genie.openj9/PullRequest-Sanity-JDK9-aix_ppc-64_cmprssptrs-OpenJ9
3.7G    /jobs/genie.openj9/PullRequest-Sanity-JDK8-linux_390-64_cmprssptrs-OpenJ9
3.6G    /jobs/genie.openj9/Test-Sanity-JDK9-aix_ppc-64_cmprssptrs
3.5G    /jobs/genie.openj9/Test-Sanity-JDK10-linux_x86-64_cmprssptrs
3.5G    /jobs/genie.openj9/PullRequest-Sanity-JDK9-linux_ppc-64_cmprssptrs_le-OpenJ9
3.5G    /jobs/genie.openj9/PullRequest-Sanity-JDK8-aix_ppc-64_cmprssptrs-OpenJ9
3.5G    /jobs/genie.openj9/PullRequest-Sanity-JDK10-aix_ppc-64_cmprssptrs-OpenJ9
3.4G    /jobs/genie.openj9/Test-Sanity-JDK8-aix_ppc-64_cmprssptrs
3.3G    /jobs/genie.openj9/Test-Sanity-JDK9-linux_ppc-64_cmprssptrs_le
3.3G    /jobs/genie.openj9/PullRequest-Sanity-JDK8-linux_ppc-64_cmprssptrs_le-OpenJ9
3.2G    /jobs/genie.openj9/Test-Sanity-JDK8-linux_x86-64_cmprssptrs
3.1G    /jobs/genie.openj9/Test-Sanity-JDK8-linux_ppc-64_cmprssptrs_le
3.1G    /jobs/genie.openj9/Test-Sanity-JDK10-linux_ppc-64_cmprssptrs_le
2.9G    /jobs/genie.openj9/Build-JDK9-linux_ppc-64_cmprssptrs_le
2.8G    /jobs/genie.openj9/Adam_PullRequest-Sanity-JDK9-linux_ppc-64_cmprssptrs_le-OpenJ9
2.6G    /jobs/genie.openj9/PullRequest-Sanity-JDK8-linux_x86-64_cmprssptrs-OpenJ9
2.5G    /jobs/genie.openj9/PullRequest-Sanity-JDK10-linux_x86-64_cmprssptrs-OpenJ9
2.5G    /jobs/genie.openj9/Build-JDK9-linux_390-64_cmprssptrs
2.5G    /jobs/genie.openj9/Build-JDK9-aix_ppc-64_cmprssptrs
2.5G    /jobs/genie.openj9/Build-JDK10-linux_x86-64_cmprssptrs
2.2G    /jobs/genie.openj9/Test-Sanity-JDK10-aix_ppc-64_cmprssptrs
2.2G    /jobs/genie.openj9/Build-JDK10-linux_390-64_cmprssptrs
2.1G    /jobs/genie.openj9/PullRequest-Sanity-JDK9-linux_x86-64_cmprssptrs-OpenJ9
2.1G    /jobs/genie.openj9/Build-JDK10-linux_ppc-64_cmprssptrs_le

AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue May 29, 2018
[skip ci]

Issue eclipse-openj9#1017

Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
@AdamBrousseau
Contributor Author

We're only storing 5 SDKs per job now. Usage went from 180G to 173G.
I also noticed that the workspaces on master are consuming 35G. These are git repos from the initial checkouts we do before we decide which node to run on. It should be easy to ensure these are cleaned up when we leave the node. I can also add a reference repo on master to save clone time. I'd like to block this change on #1897.

AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue Aug 29, 2018
- Add Build Discarder
  - Retrieve values from Variable file
- Add job Parameters
  - Default values for Vendor Params must be set globally
    even if they are blank
- No Trigger for Build and Pipeline jobs
- TODO add trigger for PR builds

Issue eclipse-openj9#1017
[skip ci]
Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue Sep 7, 2018
- Add Build Discarder
  - Retrieve values from Variable file
- Add job Parameters
  - Default values for Vendor Params must be set globally
    even if they are blank
- No Trigger for Build and Pipeline jobs
- TODO add trigger for PR builds

Issue eclipse-openj9#1017
[skip ci]
Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue Oct 3, 2018
- Minimally we need to pass ARTIFACTORY_SERVER
  to the test jobs in order for them to upload
  test artifacts to Artifactory.
- Write the rest of the Artifactory variables
  to the build env for future use.
- We have the option to pass more of these
  variables in the future if we wish.

See
- adoptium/aqa-tests#591
- adoptium/aqa-tests#561
- eclipse-openj9#1017
[skip ci]

Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue Oct 13, 2018
- Delete all Build artifacts after the overall pipeline has completed
- Always remove artifacts even if tests fail

Related eclipse-openj9#1017
[skip ci]

Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
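The "always remove artifacts even if tests fail" rule in the commit above is the classic try/finally pattern; in a declarative Jenkinsfile it would typically live in a `post { always { ... } }` block. A language-neutral sketch in Python, where `build`, `run_tests` and `delete_artifacts` are hypothetical stand-ins for the real pipeline stages:

```python
def run_pipeline(build, run_tests, delete_artifacts):
    """Build an SDK, test it, and always delete its artifacts afterwards.

    The three callables are hypothetical placeholders for pipeline stages.
    """
    artifacts = build()
    try:
        return run_tests(artifacts)
    finally:
        # Runs whether the tests pass, fail, or raise, so artifacts never pile up.
        delete_artifacts(artifacts)


def failing_tests(artifacts):
    raise RuntimeError(f"tests failed for {artifacts}")


deleted = []
try:
    run_pipeline(lambda: "sdk.tar.gz", failing_tests, deleted.append)
except RuntimeError:
    pass
print(deleted)  # ['sdk.tar.gz'] even though the tests raised
```

The test failure still propagates to the caller, so the overall result stays red while the cleanup happens regardless.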
@AdamBrousseau
Contributor Author

Can this be closed and reopened later if needed?

Since the last comment, we've added Artifactory, which moved all our binaries off Master. We've also changed the way our builds are arranged (#2836 and #5182) and how many we store of each (#6021).

6 participants