10x memory usage in 1.13.2 compared to 1.12.x #2436
Comments
Mind you, in a smaller cluster (19 nodes), memory usage is much more reasonable (still a bit higher).
The main example (in the description) is at about 100 nodes, which is why I suspect node count is somehow correlated with the memory increase in the latest release.
@alam0rt Also, I see that your pod names are …
We have some Ruby which pulls down the manifest and sets container["name"] = "aws-cni". We also build the image ourselves.
My bad, fixed the typo.
Tested using an unmodified (though self-built) aws-cni container and saw the same behaviour. I can probably get a pprof going.
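(For anyone following along, a minimal sketch of one way to get that pprof: rebuild the agent with the standard net/http/pprof handlers enabled on a local port. The port and the idea of wiring this into aws-cni are assumptions for illustration, not something the stock image exposes.)

```go
// Sketch: expose the standard Go pprof endpoints so a heap profile can be
// pulled from the running pod. Port 6060 is an arbitrary choice.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	go func() {
		// Serve profiles on localhost only; reach it via kubectl exec or port-forward.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the agent would run here ...
	select {}
}
```

The heap profile can then be pulled with, for example, go tool pprof http://localhost:6060/debug/pprof/heap.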
@adammw and I suspect it's due to the caching client: amazon-vpc-cni-k8s/pkg/k8sapi/k8sutils.go, lines 61 to 92 at commit 0ac4b39.
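(For context, here is a simplified sketch of the general pattern a cache-backed controller-runtime client follows, which helps explain why memory can scale with cluster size. It assumes the pre-v0.15 controller-runtime API and is illustrative, not a verbatim copy of the linked lines.)

```go
// Illustrative sketch of a cache-backed client: reads are served from an
// informer cache that lists and watches each requested kind cluster-wide,
// which is why memory can grow with cluster size. Not a copy of k8sutils.go;
// API shown is controller-runtime <= v0.14.
package k8sclient

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func newCachedClient(ctx context.Context) (client.Client, error) {
	restCfg, err := config.GetConfig()
	if err != nil {
		return nil, err
	}

	// Unfiltered cache: no field/label selectors, so every object of a kind
	// that gets read ends up held in memory.
	informerCache, err := cache.New(restCfg, cache.Options{})
	if err != nil {
		return nil, err
	}
	go func() { _ = informerCache.Start(ctx) }() // start the shared informers
	informerCache.WaitForCacheSync(ctx)

	directClient, err := client.New(restCfg, client.Options{})
	if err != nil {
		return nil, err
	}

	// Reads go to the cache, writes go straight to the API server.
	return client.NewDelegatingClient(client.NewDelegatingClientInput{
		CacheReader: informerCache,
		Client:      directClient,
	})
}
```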
Am going to try adding a selector to scope the cache to the node only. Probably will use …
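(A minimal sketch of what a node-scoped cache could look like, assuming a controller-runtime release that still exposes cache.Options.SelectorsByObject, i.e. v0.14 or earlier. The function and parameter names here are illustrative, not the code from the eventual fix.)

```go
// Sketch: build a controller-runtime cache that only watches Pods scheduled
// to this node, instead of every Pod in the cluster.
// Assumes controller-runtime <= v0.14 (SelectorsByObject); newer releases
// moved this configuration to cache.Options.ByObject.
package k8sclient

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func newNodeScopedCache(nodeName string) (cache.Cache, error) {
	restCfg, err := config.GetConfig()
	if err != nil {
		return nil, err
	}
	return cache.New(restCfg, cache.Options{
		SelectorsByObject: cache.SelectorsByObject{
			// Cache only the Pods running on this node.
			&corev1.Pod{}: {
				Field: fields.OneTermEqualSelector("spec.nodeName", nodeName),
			},
		},
	})
}
```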
This seems to do the trick.
Thanks for the excellent debugging @alam0rt and @adammw! Sorry, I'm just getting back from vacation, but I assumed that the k8s client cache or the EC2 metadata cache would be the only components in IPAMD where an issue could cause memory to scale with the number of nodes/pods. There is also a big delta in … I'd like to do some further digging and testing on #2439 this week, and then I can approve.
We just ran into this rolling …
@jdn5126 We tried bumping up our limits to 1Gi and we still got OOMs... so we are on 1.12.6 until this is out. Thanks.
Looks like the PR was merged and released. Does anyone know when this version will be available in …
Hi @sam-som, that pipeline is in progress and should be completed by the end of this week.
Closing as fixed by #2463 and released in v1.13.3.
Deploying 1.13.2 revealed that memory usage, both at startup and after running for some time, has drastically increased.
At startup, we saw memory usage climb to around 450Mi before settling at about 400Mi.
This appears to increase with the number of nodes in the cluster.
The red bar marks OOM kills; pre-spike is running 1.12.0, post-spike is 1.13.2.
[Graphs: memory usage while running 1.12.0 vs 1.13.2]
What happened:
Updated to 1.13.2
Attach logs
What you expected to happen:
For aws-cni not to use 10 times the memory.
How to reproduce it (as minimally and precisely as possible):
Deploy 1.13.2
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.25.10
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):