Possible memory leak in Keda-operator with Kafka scaler #814

Closed
jeli8-cor opened this issue May 12, 2020 · 15 comments
Labels: bug (Something isn't working)

@jeli8-cor

jeli8-cor commented May 12, 2020

Hi everyone,
I started to use KEDA with the Kafka scaler, defining it quite simply following the example, and after deploying it to production I noticed that every 3 days the pod reaches its Kubernetes memory limit and gets OOM-killed. The memory increases constantly and I'm not really sure why.
This is the deployment description (I added a few parameters such as limits, priorityClass and others):

Name:                   keda-operator
Namespace:              keda
CreationTimestamp:      Mon, 27 Apr 2020 12:56:42 +0300
Labels:                 app=keda-operator
                        app.kubernetes.io/component=operator
                        app.kubernetes.io/name=keda-operator
                        app.kubernetes.io/part-of=keda-operator
                        app.kubernetes.io/version=1.4.1
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=keda-operator
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=keda-operator
  Service Account:  keda-operator
  Containers:
   keda-operator:
    Image:      docker.io/kedacore/keda:1.4.1
    Port:       <none>
    Host Port:  <none>
    Command:
      keda
    Args:
      --zap-level=info
    Limits:
      cpu:     100m
      memory:  200Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      WATCH_NAMESPACE:
      POD_NAME:          (v1:metadata.name)
      OPERATOR_NAME:    keda-operator
    Mounts:             <none>
  Volumes:              <none>
  Priority Class Name:  line-of-business-service
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   keda-operator-fd678455c (1/1 replicas created)
Events:          <none>

The heap starts at around 30-40 MB, rises to almost 200 MB, jumps to 240 MB and above, and the pod then gets OOM-killed and restarted by the kubelet.

Steps to Reproduce the Problem

  1. Create a ScaledObject that reads from a Kafka topic (a minimal sketch is shown after this list).
  2. After hours of running, the memory keeps increasing until it reaches the pod's memory limit.
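
For reference, a minimal ScaledObject of the kind described in step 1 looks roughly like the sketch below (KEDA v1 syntax; the target deployment, broker address, consumer group, and topic names are placeholders, not the actual production values):

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler          # placeholder name
  labels:
    deploymentName: kafka-consumer     # KEDA v1 also expects the target deployment name as a label
spec:
  scaleTargetRef:
    deploymentName: kafka-consumer     # deployment whose replicas get scaled
  pollingInterval: 30                  # seconds between lag checks
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        brokerList: my-kafka.kafka.svc.cluster.local:9092   # placeholder broker list
        consumerGroup: my-consumer-group                    # placeholder consumer group
        topic: my-topic                                      # placeholder topic
        lagThreshold: "50"                                   # target lag per replica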

Specifications

  • KEDA Version: 1.4.1
  • Platform & Version: Kafka 2.3.0
  • Kubernetes Version: 1.15
  • Scaler(s): Kafka
@jeli8-cor jeli8-cor added the bug Something isn't working label May 12, 2020
@zroubalik
Member

zroubalik commented May 12, 2020

@jeli8-cor thanks for submitting the issue. Are you willing to help a little bit with tracking down the bug? Could you please ping me on Slack so we can sync?

@jeli8-cor
Author

Sure, I would love to help with that. How can I find you on Slack?

@zroubalik
Member

@jeli8-cor great, you can find me on the Kubernetes Slack, in the #keda channel.

@jeli8-cor
Author

Following a few tests and debugging sessions with @zroubalik, here is an update on the issue:
The issue exists in versions 1.3.0, 1.4.0, and 1.4.1 (these are the ones I checked).
The issue appears to be resolved in v2, which is still in alpha; the pod has now been running for over 3 hours without even a minor jump in memory.
The scaler I used was Kafka, and I tested it by increasing the lag, increasing the throughput, and triggering the scaler's scale up and down.
I only tested the Kafka and Prometheus scalers. I'm not 100% sure, but I think the leak affects the Prometheus scaler too.

@zroubalik
Member

@jeli8-cor thanks a lot for the testing!

We should speed up the development (and release) of v2, in case we are not able to find the cause and provide a fix for this issue in v1.

@zroubalik zroubalik added this to the v2.0 milestone Jul 8, 2020
@zroubalik
Member

Adding a note that this should be included in the changelog, so we don't forget.

@eexwhyzee

Also experienced the same memory leak issue using KEDA v1.4.1 with the Redis list scaler, but I upgraded to v1.5.0 and it looks like that resolved it.
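
For context, a Redis list trigger in a KEDA ScaledObject looks roughly like the sketch below (v2-style syntax with a literal address; in 1.x the address is typically resolved from an environment variable on the target deployment; the address, list name, and threshold here are placeholders, not values taken from this report):

triggers:
  - type: redis
    metadata:
      address: redis-master.default.svc:6379   # placeholder host:port (assumed literal, v2-style)
      listName: my-work-queue                  # placeholder list being monitored
      listLength: "10"                         # target list length per replica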

@lallinger-tech

The memory leak still exists in v2.0.0-beta, but it seems to be better; memory and CPU usage rise slowly.
[graph attached]

@zroubalik
Member

@lallinger-arbeit would you mind sharing which scalers (and how many of them) you are using? Thanks

@lallinger-tech

@zroubalik Yeah, we have 23 ScaledObjects, of which 17 use a Kafka trigger and 6 a Redis trigger.

@mariusgrigoriu

+1 to CPU and memory usage rising over time. 1000 ScaledObjects, exclusively using the Kafka scaler:

[graph attached]

@tomkerkhove tomkerkhove removed this from the v2.0 milestone Jan 6, 2021
@MaxWinterstein

> Also experienced the same memory leak issue using KEDA v1.4.1 with the Redis list scaler, but I upgraded to v1.5.0 and it looks like that resolved it.

Having only Redis scalers and running on 2.0, I can confirm it still exists for me.

[graph attached]

I will try 2.1, and if it is still a thing I will create a separate issue for it, as I don't see a direct reference to Redis in this one.

@ahmelsayed ahmelsayed self-assigned this Feb 19, 2021
@stale

stale bot commented Oct 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Oct 13, 2021
@lallinger-tech

AFAIK this has not been resolved yet, but I will try to confirm with the latest 2.4.0 version in the next few weeks, as I don't have the time at the moment.

@stale stale bot removed the stale All issues that are marked as stale due to inactivity label Oct 14, 2021
@ahmelsayed
Contributor

@lallinger-arbeit I think this had the same root cause as #1565, which was fixed in #1572.

SpiritZhou pushed a commit to SpiritZhou/keda that referenced this issue Jul 18, 2023