
OOM keda operator and metricServer #4687

Closed
yuvalweber opened this issue Jun 14, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@yuvalweber
Contributor

Report

For some reason, after deploying only one ScaledObject in my cluster (a very large cluster), the keda-operator started crashing due to OOM (before that it was using only about 20Mi).
I am using the default KEDA spec, which means a memory request of 100Mi and a limit of 1000Mi.
Because of the OOM I raised the limit to 2Gi, and now the pod survives at around 600Mi of memory.
After that the metrics server started crashing due to OOM as well; once I raised its limit too, it managed to run, but it also jumped to a similar amount of memory.
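
For reference, the change described above amounts to a memory override on the operator container along these lines (a minimal sketch using the values from this report; the exact field layout depends on how KEDA is installed, e.g. raw manifests vs. Helm values):

```yaml
# Sketch of the resources override described above; 100Mi/1000Mi are the stock
# defaults mentioned in the report, 2Gi is the raised limit.
resources:
  requests:
    memory: 100Mi
  limits:
    memory: 2Gi
```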

My question is: how can I investigate what is causing this memory burst? Even with debug logs I can't see anything that seems related.

Expected Behavior

Memory consumption shouldn't jump to roughly 30 times its previous level because of a single ScaledObject.

Actual Behavior

Memory usage jumps to a much larger amount.

Steps to Reproduce the Problem

  1. Have a large cluster
  2. Deploy a ScaledObject with a Prometheus scaler (an illustrative example follows this list)
  3. Observe the memory burst
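
For illustration only (the author's actual manifest is attached later in this thread as keda-scaled-object.log), a ScaledObject with a Prometheus trigger generally looks like the sketch below; every name, namespace, and address is a placeholder:

```yaml
# Illustrative ScaledObject with a Prometheus trigger; names and URLs are placeholders,
# not taken from the attached manifest.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaledobject
  namespace: example-ns
spec:
  scaleTargetRef:
    name: example-deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"
```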

Logs from KEDA operator

example

KEDA Version

2.8.1

Kubernetes Version

1.23

Platform

Amazon Web Services

Scaler Details

Prometheus

Anything else?

No response

@yuvalweber added the bug label on Jun 14, 2023
@JorTurFer
Member

Could you share the logs please?

@yuvalweber
Contributor Author

@JorTurFer
Member

I see that you are registering 106 custom CAs, is that correct?
Could you share the KEDA operator/metrics server deployment YAML? Are you using Helm?

@yuvalweber
Contributor Author

How can I check this CA thing? I don't see that I'm registering 106 CAs.

These are the deployments:
keda-operator-deplopyment.log
keda-metrics-server-deployment.log

@JorTurFer
Member

oh f**k,
I was wrong, those CAs are internal, ignore my previous comment xD

@JorTurFer
Member

Could you share the ScaledObject that you are deploying?

@yuvalweber
Contributor Author

Of course
keda-scaled-object.log

@zroubalik
Member

I see you are using 2.8.1; could you please update to a newer version? I recall there were some critical issues fixed since then.

@yuvalweber
Contributor Author

Hey, I can't try this right now, but I have a memory profile.
Maybe you can help me understand it?
keda_memory_map.pdf

@yuvalweber
Contributor Author

Found out what the problem was.
It turned out that because I was using an older version of KEDA, I was also using an older version of controller-runtime (0.12.3 instead of 0.15.0).
In KEDA version 2.11.0 there was an upgrade to a newer version of controller-runtime.
That version of controller-runtime no longer uses createStructuredListWatch, which has problems with caching, and many settings regarding the caching of the informers the controller uses were changed.

When I looked at the differences between the requests going to the Kubernetes API server, I could see that version 2.8.1 queries "/api/v1/secrets?limit=500" (all secrets, cluster-wide),
while version 2.11.0 queries the namespaced path "/api/v1/namespaces/<namespace_name>/secrets?limit=500",
which returns far fewer secrets and doesn't fill all the memory we gave KEDA.

I'm just adding the explanation here because I thought it would be useful to other people as well.
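
As a rough illustration of that behavioral difference (not KEDA's actual code), newer controller-runtime versions let the manager's cache be scoped to specific namespaces, so informers issue namespaced list/watch requests instead of cluster-wide ones. The sketch below assumes controller-runtime v0.16+ where cache.Options.DefaultNamespaces exists; the namespace name is a placeholder:

```go
// Minimal sketch (not KEDA's actual setup): scope the controller-runtime cache to a
// single namespace so cached informers list/watch namespaced resources rather than
// the whole cluster. Assumes controller-runtime v0.16+ (cache.Options.DefaultNamespaces).
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			// Only objects in this namespace are cached, so list/watch calls hit
			// /api/v1/namespaces/<namespace_name>/secrets instead of /api/v1/secrets.
			DefaultNamespaces: map[string]cache.Config{
				"keda": {}, // placeholder namespace
			},
		},
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```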

@zroubalik
Member

@yuvalweber Thanks, appreciate that!
