-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ SSA: Implement request caching #8207
Conversation
/test pull-cluster-api-e2e-full-main |
kudos for collecting those data! |
internal/controllers/topology/cluster/structuredmerge/dryrun.go
Outdated
Show resolved
Hide resolved
8b117a4
to
a890355
Compare
/retest |
Refactored based on suggestions and discussions from/with Fabrizio. Also adde unit tests and godoc. Should be ~ ready. |
@fabriziopandini @ykakarap PTAL :) |
/hold Let's merge #8213 before this PR. I want to have the metrics on main before this PR |
/test pull-cluster-api-e2e-full-main |
a890355
to
a8b18dd
Compare
a8b18dd
to
888c7d2
Compare
/test pull-cluster-api-e2e-full-main |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall implementation looks great!
Thanks for doing this 🙂.
Just small nits.
internal/controllers/topology/cluster/structuredmerge/dryrun.go
Outdated
Show resolved
Hide resolved
lgtm from my side, ping me when the last round of nits is fixed |
888c7d2
to
539760a
Compare
/test pull-cluster-api-e2e-full-main /hold Want to have full e2e test results before merging |
/lgtm |
LGTM label has been added. Git tree hash: 0020cbb8f4de2d8a3cabfd8d4e10d14faf8a2512
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fabriziopandini The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I'll merge once ci is green and we're done with the new v1.4 beta release |
I'll wait a bit longer. I would like to have at least 1-2 green runs of the periodics with metrics + propagation fully implemented so I can compare before/after this optimization |
/lgtm |
/hold cancel |
Took a look at the data before/after this PR merged: Cluster topology controller:
MD controller:
MS controller:
Machine controller: (various workload clusters)
=> Before: 19663 => After: 3892 KCP controller:
tl;dr 5x reduction of calls for everything except KCP and MD controller. I'll take a look at those, maybe the caching doesn't work there for some reason |
What this PR does / why we need it:
In recent weeks we added in-place propagation across a few controllers and we introduced an additional SSA dryrun in our Cluster topology controller. Basically on every reconcile we run SSA against the apiserver.
Now it's time to optimize the amounts of SSA calls to avoid overloading the apiserver.
This PR implements caching for SSA requests. The high-level idea is that once we know for a given original and modified object that the request doesn't lead to any changes, we cache this information for 10 minutes.
SSA will only be executed again once either original or modified objects change or the 10 minutes are over.
Some data:
The first two are ~ quickstart cluster creations with ClusterClass & caching => ~ 148 Patch calls
The last one is the same without caching => ~ 310 Patch calls until the cluster exists and then just continuous calls.
This PR currently contains caching for:
The difference should be even bigger once we implemented KCP reconciliation with SSA.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #8146