Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) #16309

Closed
wants to merge 5 commits into from

Conversation

nromriell
Copy link
Contributor

@nromriell nromriell commented Nov 13, 2023

Fixes #14725

Overview

There are a couple issues with using multi-source and ref based sources in ArgoCD right now. At scale these start to create enormous amounts of requests to the git server making this pattern unsustainable.

This broke down to three issues:

  1. The ls-remote calls coming from resolveReferencedSources and runManifestGenAsync were not using the cache.
  2. Any ls-remote calls that happen at the same time would all miss the cache if it didn't exist
  3. ref only sources were still following the generate manifest path. This causes them to continually miss the manifest cache causing excess fetch calls.

The related fixes in this PR:

  1. Updated the ls-remote calls from resolveReferencedSources and runManifestGenAsync to have access to the cache
  2. Added a cache based lock per key(repo) for git references. During calls where many Applications are trying to get the same repo data these now only get the references once. Note This does not apply to when the Application is first loaded or it is manually refreshed, in those cases the cache is skipped so the same number of calls occur. The fix for that should be an easy follow up but for now focused on removing the high number of sustained calls.
  3. During calls to manifest generation if the source is a ref only source (no path, or chart, Application contains multiple sources, and Ref is set) then manifest generation is skipped altogether and only the resolved revisions are calculated.

Additional changes:

reposerver/cache/cache

  • The top level cache is now a twolevelclient to take advantage of the in-memory cache on top of the redis cache. Options are used to skip the in-memory cache for most items, this results in no change for most items in their cache.
    These items were changed to use both in-memory and redis from the two level cache:
    - git refs
    - git directories
    - revision metadata
    - helm indexes
    - chart details

I moved everything that would have a relatively low memory footprint to in-memory and kept the others using redis to prevent any large changes to repo server memory requirements

util/cache/cache

  • Added the ability for top level entry AddCacheFlagsToCmd to return a two level cache instead of just the redis cache
  • Updated the interface for SetItem replacing delete bool with an options object
  • Small change to consolidate all the cache key generations into a single function

util/cache/client

  • Reconfigured the CacheClient interface to just accept a single Item parameter for Get calls, interface now matches Set. Options are passed as part of that item.

util/cache/twolevelclient

  • Added some logic to Get and Set to allow options to be passed in by the Item to skip either the local or remote cache. This allows us to swap the cache in place in parts of the application without forcing the move for everything to use both in-memory and redis

Comparison of 200 multi-source apps with the second source being just a values ref and with revision cache invalidation every 30s:

Before:

Screenshot from 2023-11-18 18-24-30

After:

Screenshot from 2023-11-18 19-33-39

Transitioning from v2.9.0 to the custom image

Screenshot from 2023-11-18 19-34-22

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

Copy link

codecov bot commented Nov 13, 2023

Codecov Report

Attention: 39 lines in your changes are missing coverage. Please review.

Comparison is base (6fa9b17) 49.55% compared to head (33808ec) 49.86%.

Files Patch % Lines
util/git/client.go 11.76% 15 Missing ⚠️
util/cache/cache.go 70.73% 9 Missing and 3 partials ⚠️
util/cache/twolevelclient.go 55.00% 9 Missing ⚠️
reposerver/cache/cache.go 97.81% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #16309      +/-   ##
==========================================
+ Coverage   49.55%   49.86%   +0.31%     
==========================================
  Files         269      269              
  Lines       47039    47215     +176     
==========================================
+ Hits        23309    23545     +236     
+ Misses      21444    21373      -71     
- Partials     2286     2297      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nromriell nromriell requested a review from a team as a code owner November 13, 2023 09:40
@nromriell nromriell force-pushed the lsremote branch 2 times, most recently from f7d8914 to 973dd4d Compare November 13, 2023 09:44
@nromriell nromriell changed the title fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) Draft: fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) Nov 13, 2023
@nromriell nromriell changed the title Draft: fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) Nov 13, 2023
Signed-off-by: nromriell <nateromriell@gmail.com>
Signed-off-by: nromriell <nateromriell@gmail.com>
Signed-off-by: nromriell <nateromriell@gmail.com>
Signed-off-by: nromriell <nateromriell@gmail.com>
@nromriell nromriell force-pushed the lsremote branch 2 times, most recently from 8458366 to e833653 Compare November 19, 2023 00:03
@nromriell nromriell changed the title fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) Draft: fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) Nov 19, 2023
@nromriell
Copy link
Contributor Author

Moving this back to draft as I work through some changes after the discussion on Thursday

… consistent behavior

Signed-off-by: nromriell <nateromriell@gmail.com>
@nromriell nromriell changed the title Draft: fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) fix(repo-server): excess git request when using multi-source and ref sources (Issue #14725) Nov 19, 2023
@nromriell
Copy link
Contributor Author

Added taking advantage of the two level cache and removing the second method from the util/cache interface to avoid confusion so this should be ready to review now. The actual changes are fairly small but added quite a few lines in mocks and tests

@crenshaw-dev
Copy link
Member

@nromriell is it possible to break fixes 1, 2, and 3 into separate PRs? I think it's a bit too big to reason about right now.

Copy link
Collaborator

@leoluz leoluz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an initial comment to align on the strategy behind this PR. A proper review is still required.

Comment on lines 33 to 37
type Item struct {
Key string
Object interface{}
CacheActionOpts
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any strong reason to implement this struct as part of this PR? It changes the signature of the CacheClient interface. It seems an interesting refactoring but I think it would be better to keep this PR as small as possible. How about provide code refactoring only in a separate PR? The reason is that it allows us to easily rollback changes if required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary reason is for (2) in my description here. The need for DisableOverwrite (redis NX or memcache add) otherwise depending on timing each caller could think they were the lock holder, of course in practice that likely wouldn't be the case but it would cause inconsistent behavior.

To add that as an option I'd have to modify the caller anyway to add another option.

This struct already exists as well it's in line 13 of the diff. I just add CacheActionOpts to it.

@nromriell
Copy link
Contributor Author

Thanks @crenshaw-dev @leoluz for taking a look, I'll spend some time later today and split these pieces out into separate PRs

@nromriell
Copy link
Contributor Author

I've split this PR into four parts. Each piece is on it's own a decent improvement to the overall git requests. We can just start with the first one that's open in this repo and then I'll retarget the others once the one's before it have been merge. The first two are fairly standalone, it's only the testing mocks that are shared. The last one has a hard dependency on the one before it.

  1. fix(repo-server): excess git requests, resolveReferencedSources and runManifestGenAsync not using cache (Issue #14725) #16410
  2. fix(repo-server): excess git requests, short-circuit GenerateManifests ref only (Issue #14725) nromriell/argo-cd#1
  3. fix(repo-server): excess git requests, cache lock on revisions nromriell/argo-cd#2
  4. fix(repo-server): excess git requests, improve cache performance by adding two level cache nromriell/argo-cd#3

@nromriell
Copy link
Contributor Author

I'll close out this PR in favor of those

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Excessive requests to git repository on commit to monorepo w/ multi-source apps
3 participants