keda-operator v2.13.0 leaks go routines #5448

Closed
FrancoisPoinsot opened this issue Jan 31, 2024 · 8 comments · Fixed by #5452
Labels
bug Something isn't working

Comments

@FrancoisPoinsot

FrancoisPoinsot commented Jan 31, 2024

Report

After upgrading KEDA to 2.13.0 there seems to be a memory leak.
Looking at the go_goroutines metric, I see the number growing indefinitely; confirmed to reach at least 30k.
Here is a graph of go_goroutines:
[screenshot: go_goroutines climbing after the upgrade]

I have deployed KEDA with several cloud vendors, but I only see this issue in GCP.
It might be related to the pubsub scalers, which I use only in GCP clusters.

Expected Behavior

Memory and goroutine count remain roughly constant.
You can see very clearly on the graph above when the upgrade to v2.13.0 happened.

Actual Behavior

Memory increases indefinitely.

Steps to Reproduce the Problem

Logs from KEDA operator

example

KEDA Version

2.13.0

Kubernetes Version

1.26

Platform

Google Cloud

Scaler Details

prometheus, gcp-pubsub

Anything else?

No response

@FrancoisPoinsot added the bug (Something isn't working) label Jan 31, 2024
@zroubalik
Member

zroubalik commented Jan 31, 2024

Thanks for reporting. Could you please check whether you also see the leak when you use only the Prometheus scaler on GCP, so we can narrow down the possible causes? Thanks!

@FrancoisPoinsot
Author

With only Prometheus scalers, the goroutine count is stable at 178.

@FrancoisPoinsot
Author

Here are the goroutines from pprof.

goroutines.txt
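
(For reference, a goroutine dump like this can be pulled from the operator's pprof endpoint. Below is a minimal sketch, assuming pprof is enabled on the operator and port-forwarded to localhost:8080; the address and port are assumptions, not documented KEDA defaults.)

```go
// Sketch: fetch a full goroutine dump from a net/http/pprof endpoint.
// Assumes the operator's pprof server has been port-forwarded to
// localhost:8080 (an assumption, not a documented KEDA default).
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// debug=2 returns human-readable stack traces for every goroutine.
	resp, err := http.Get("http://localhost:8080/debug/pprof/goroutine?debug=2")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutines.txt")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```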

@zroubalik
Member

Cool, thanks for the confirmation. And to clarify, this doesn't happen with versions < 2.13.0? If it is a regression, then we should be able to track down the changes in the GCP Pub/Sub scaler.

@FrancoisPoinsot
Author

I confirm this does not happen in v2.12.1.

@JorTurFer
Member

Maybe it's something related to the changes in the GCP client?

@JorTurFer
Member

Do you see errors in the KEDA operator logs? Maybe we are not closing the client properly on failures? Could this be related to #5429 (since the scaler cache is refreshed on each error)?

@JorTurFer
Member

JorTurFer commented Feb 1, 2024

Yeah, the new queryClient isn't closed, so if the scaler is being refreshed due to #5429, the connections aren't properly closed. I guess that could be the root cause (I'll update my PR).
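
(Editor's note: a minimal sketch of the leak pattern being described, not KEDA's actual scaler code. It assumes the "new queryClient" is a GCP Cloud Monitoring query client; the concrete client type, names, and structure here are illustrative assumptions. The point is that Close() on the scaler must release the client, otherwise each scaler-cache refresh leaves the client's gRPC goroutines running.)

```go
// Minimal sketch, not KEDA's actual scaler code: the scaler owns a GCP
// Cloud Monitoring query client and must close it when the scaler itself
// is closed, otherwise every scaler-cache refresh leaks the client's
// underlying connection goroutines.
package example

import (
	"context"

	monitoring "cloud.google.com/go/monitoring/apiv3/v2"
)

// pubsubScaler is a hypothetical stand-in for the real scaler type.
type pubsubScaler struct {
	queryClient *monitoring.QueryClient
}

func newPubsubScaler(ctx context.Context) (*pubsubScaler, error) {
	c, err := monitoring.NewQueryClient(ctx)
	if err != nil {
		return nil, err
	}
	return &pubsubScaler{queryClient: c}, nil
}

// Close is invoked when the scaler cache drops or refreshes the scaler.
// Closing the query client here releases its connections and goroutines;
// omitting this call is the leak pattern described above.
func (s *pubsubScaler) Close(ctx context.Context) error {
	if s.queryClient != nil {
		return s.queryClient.Close()
	}
	return nil
}
```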
