keda-operator v2.13.0 leaks go routines #5448

Closed
FrancoisPoinsot opened this issue Jan 31, 2024 · 8 comments · Fixed by #5452
Labels
bug Something isn't working

Comments

@FrancoisPoinsot

FrancoisPoinsot commented Jan 31, 2024

Report

After upgrading KEDA to 2.13.0 there seems to be a memory leak.
Looking at the go_goroutines metric, I see the number growing indefinitely; confirmed to reach at least 30k.
Here is a graph of go_goroutines:
[screenshot: go_goroutines climbing after the upgrade]

I have deployed KEDA with several cloud vendors, but I only see this issue in GCP.
It might be related to the pubsub scalers, which I use only in GCP clusters.

Expected Behavior

Memory and goroutine count remain roughly constant.
You can see very clearly on the graph above when the upgrade to v2.13.0 happened.

Actual Behavior

Memory increases indefinitely.

Steps to Reproduce the Problem

Logs from KEDA operator

example

KEDA Version

2.13.0

Kubernetes Version

1.26

Platform

Google Cloud

Scaler Details

prometheus, gcp-pubsub

Anything else?

No response

@FrancoisPoinsot added the bug (Something isn't working) label Jan 31, 2024
@zroubalik
Member

zroubalik commented Jan 31, 2024

Thanks for reporting. Could you please check whether you also see the leak when you use only the Prometheus scaler on GCP, so we can narrow down the possible causes? Thanks!

@FrancoisPoinsot
Author

With only Prometheus scalers, the goroutine count is stable at 178.

@FrancoisPoinsot
Author

Here are the goroutines from pprof.

goroutines.txt
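
(For reference, a goroutine dump like this can be pulled from the operator's pprof endpoint. Below is a minimal sketch, assuming pprof is enabled on the operator and port-forwarded to localhost:8080; the address and port are assumptions, not documented KEDA defaults.)

```go
// Sketch: fetch a full goroutine dump from a net/http/pprof endpoint.
// Assumes the operator's pprof server has been port-forwarded to
// localhost:8080 (an assumption, not a documented KEDA default).
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// debug=2 returns human-readable stack traces for every goroutine.
	resp, err := http.Get("http://localhost:8080/debug/pprof/goroutine?debug=2")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutines.txt")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```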

@zroubalik
Member

Cool, thanks for the confirmation. And to clarify, this doesn't happen with versions < 2.13.0? If it is a regression, then we should be able to track down the changes in the GCP Pub/Sub scaler.

@FrancoisPoinsot
Author

I confirm this does not happen in v2.12.1.

@JorTurFer
Member

Maybe it's something related to the changes in the GCP client?

@JorTurFer
Member

Do you see errors in the KEDA operator logs? Maybe we are not closing the client properly on failures? Could this be related to #5429 (since the scaler cache is refreshed on each error)?

@JorTurFer
Member

JorTurFer commented Feb 1, 2024

Yeah, the new queryClient isn't closed, so if the scaler is being refreshed due to #5429, the connections aren't properly closed. I guess that could be the root cause (I'll update my PR).
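
(Editor's note: a minimal sketch of the leak pattern being described, not KEDA's actual scaler code. It assumes the "new queryClient" is a GCP Cloud Monitoring query client; the concrete client type, names, and structure here are illustrative assumptions. The point is that Close() on the scaler must release the client, otherwise each scaler-cache refresh leaves the client's gRPC goroutines running.)

```go
// Minimal sketch, not KEDA's actual scaler code: the scaler owns a GCP
// Cloud Monitoring query client and must close it when the scaler itself
// is closed, otherwise every scaler-cache refresh leaks the client's
// underlying connection goroutines.
package example

import (
	"context"

	monitoring "cloud.google.com/go/monitoring/apiv3/v2"
)

// pubsubScaler is a hypothetical stand-in for the real scaler type.
type pubsubScaler struct {
	queryClient *monitoring.QueryClient
}

func newPubsubScaler(ctx context.Context) (*pubsubScaler, error) {
	c, err := monitoring.NewQueryClient(ctx)
	if err != nil {
		return nil, err
	}
	return &pubsubScaler{queryClient: c}, nil
}

// Close is invoked when the scaler cache drops or refreshes the scaler.
// Closing the query client here releases its connections and goroutines;
// omitting this call is the leak pattern described above.
func (s *pubsubScaler) Close(ctx context.Context) error {
	if s.queryClient != nil {
		return s.queryClient.Close()
	}
	return nil
}
```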
