Decrease memory usage by K8S Clients #2463

Merged on Jul 11, 2023 (1 commit)

@jdn5126 jdn5126 commented Jul 11, 2023

What type of PR is this?
bug

Which issue does this PR fix:
#2436

What does this PR do / Why do we need it:
This PR modifies the schemes used by the raw and cached Kubernetes clients in the IPAMD and Metrics Helper agents. These clients only need to register the corev1 scheme (https://github.com/kubernetes/api/blob/v0.26.5/core/v1/register.go#L45), because they only make GET and LIST calls for objects added by that scheme.

Loading fewer objects reduces the memory these clients consume. For IPAMD, a pod selector is added so that only pods on the node where IPAMD is deployed are cached. Metrics Helper does not add this selector, because it needs to query aws-node pods on all nodes; its pod watcher already uses a label selector.
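To illustrate why scoping the pod cache to one node bounds its size, here is a minimal, standard-library-only sketch. It is a toy model, not the code in this PR: `Pod`, `NodeScopedCache`, and `Observe` are hypothetical names. In a real client the selection would be expressed as a selector on the cache, so pods on other nodes would never be delivered at all; the sketch filters on receipt only to keep the example self-contained.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes pod object.
type Pod struct {
	Name     string
	NodeName string // mirrors the pod's spec.nodeName field
}

// NodeScopedCache is a toy cache that stores only pods scheduled to one node.
type NodeScopedCache struct {
	nodeName string
	pods     map[string]Pod
}

// NewNodeScopedCache returns an empty cache scoped to nodeName.
func NewNodeScopedCache(nodeName string) *NodeScopedCache {
	return &NodeScopedCache{nodeName: nodeName, pods: make(map[string]Pod)}
}

// Observe stores the pod if it runs on the cache's node and reports
// whether it was stored. Pods on other nodes are dropped.
func (c *NodeScopedCache) Observe(p Pod) bool {
	if p.NodeName != c.nodeName {
		return false
	}
	c.pods[p.Name] = p
	return true
}

// Len returns the number of cached pods.
func (c *NodeScopedCache) Len() int { return len(c.pods) }

func main() {
	clusterPods := []Pod{
		{Name: "web-1", NodeName: "node-a"},
		{Name: "web-2", NodeName: "node-b"},
		{Name: "web-3", NodeName: "node-c"},
		{Name: "db-1", NodeName: "node-a"},
	}

	cache := NewNodeScopedCache("node-a")
	for _, p := range clusterPods {
		cache.Observe(p)
	}
	fmt.Printf("cached %d of %d pods\n", cache.Len(), len(clusterPods))
}
```

With this shape, the cache grows with the number of pods on one node rather than with the number of pods in the cluster.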

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:
Running the performance tests yielded the following memory decrease:
v1.13.2:

  • Base memory usage of aws-node pod: 44Mi
  • After deploying 5000 pods: 107Mi
  • After scaling down to 0 pods: 70Mi

This PR:

  • Base memory usage of aws-node pod: 30Mi
  • After deploying 5000 pods: 45Mi
  • After scaling down to 0 pods: 45Mi
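For readers who want the relative savings, the figures above can be converted to percentage reductions. The snippet below is only arithmetic on the reported numbers; `reductionPercent` is a helper name introduced here for illustration.

```go
package main

import "fmt"

// reductionPercent returns how much smaller after is than before,
// expressed as a percentage of before.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Memory figures in Mi, copied from the test results above:
	// before = v1.13.2, after = this PR.
	rows := []struct {
		stage         string
		before, after float64
	}{
		{"base memory usage", 44, 30},
		{"after deploying 5000 pods", 107, 45},
		{"after scaling down to 0 pods", 70, 45},
	}

	for _, r := range rows {
		fmt.Printf("%-30s %3.0fMi -> %2.0fMi (%.1f%% lower)\n",
			r.stage, r.before, r.after, reductionPercent(r.before, r.after))
	}
}
```

This works out to roughly 32% lower at baseline, 58% lower with 5000 pods deployed, and 36% lower after scaling back down to 0 pods.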

I also validated that all integration tests pass. I will schedule a manual run against this PR as well.

Automation added to e2e:
N/A

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No, this will not break upgrades or downgrades. Yes, updating a running cluster has been tested.

Does this change require updates to the CNI daemonset config files to work?:
No

Does this PR introduce any user-facing change?:
Yes

Decrease `aws-node` and `cni-metrics-helper` memory usage.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jdn5126 jdn5126 requested a review from a team as a code owner July 11, 2023 16:36
@jdn5126 jdn5126 changed the title from "k8s clients only need to access corev1; add pod selector" to "Decrease memory usage by K8S Clients" Jul 11, 2023
jdn5126 commented Jul 11, 2023

@haouc haouc left a comment

lgtm

@jdn5126 jdn5126 merged commit 1d88b8f into aws:master Jul 11, 2023
4 checks passed
@jdn5126 jdn5126 deleted the cache_scale branch July 11, 2023 21:56
jdn5126 added a commit that referenced this pull request Jul 11, 2023
* refactor canary test to access images from AWS registries (#2398)

* upgrade client-go and controller-runtime modules (#2396)

* updates for v1.13.0 release (#2400)

* chore: Added dependabot (#2403)

* dependency updates (#2412)

* deprecate ENABLE_NFTABLES and set iptables mode using iptables-wrapper script (#2402)

* update networking test agent to go1.20 and latest sys module (#2413)

* skip delete test cluster to debug (#2414)

* Revert "skip delete test cluster to debug (#2414)" (#2415)

This reverts commit 7c30943.

* authenticate to test image registry (#2417)

* update test agent image (#2419)

* update test agent hash in go.mod (#2422)

* fix hard-coded nitro instances (#2428)

* move authentication step from test canary script (#2429)

* node initialization must come after primary ENI's security groups are synced to cache (#2427)

* Add 1.27 to Rec Version Table (#2404)

* revise rec version table

* make DOCKER_ARGS a passable var from CLI builds (#2434)

Signed-off-by: jonahjon <jonahjones094@gmail.com>

* Update Kops cluster to latest and add parameter for kops version (#2435)

* Updates instance limits including c7gn (#2438)

* Update Kops cluster to latest and add parameter for kops version (#2440)

* update image tag to v1.13.2 (#2432)

* update docs and CNI logging (#2433)

* remove default canary test run from integration tests (#2443)

* Silences nightly cron jobs for forks (#2444)

* Silences weekly cron jobs for forks (#2459)

* refactor performance tests (#2455)

* add custom-networking test covering ENIConfig objects with no security groups (#2445)

* k8s clients only need to access corev1; add pod selector (#2463)

---------

Signed-off-by: jonahjon <jonahjones094@gmail.com>
Co-authored-by: Olivia Song <sonyingy@amazon.com>
Co-authored-by: Ellis Tarn <ellistarn@gmail.com>
Co-authored-by: Geoffrey Cline <geoffreyc@outlook.com>
Co-authored-by: Jonah Jones <jonahjones094@gmail.com>
Co-authored-by: Jay Deokar <23660509+jaydeokar@users.noreply.github.com>
Co-authored-by: Matt <matt.merkes@gmail.com>
Co-authored-by: Matt <merkes@amazon.com>