Keda Operator - invalid memory address or nil pointer dereference during large scaler failures #6176

Open
jark-AB opened this issue Sep 18, 2024 · 3 comments
Labels
bug Something isn't working

Comments


jark-AB commented Sep 18, 2024

Report

KEDA Version: 2.15.1
Running on Kubernetes 1.29+ {"version": "v1.29.7-eks-a18cd3a"}

I am running into a transient issue where I have a large number of scaledObjects (464) across many namespaces. We use the metricsAPI scaler to poll an API pod in each namespace, so that workloads scale dynamically off that endpoint.

This works extremely well at scale. However, we noticed that if many scaledObjects fail at once (for example, due to a networking or control-plane issue), the Operator hits a nil pointer dereference after logging many variations of:

 keda-operator-548d6df695-qfzgn keda-operator ERROR scale_handler	error getting scale decision	{"scaledObject.Namespace": "se-mcondon", "scaledObject.Name": "worker-powerbi-scaler", "scaler": "metricsAPIScaler", "error": "error requesting metrics endpoint: Get \"http://api.se-mcondon.svc.cluster.local:9001/api/v1/metrics\": dial tcp 10.100.28.160:9001: connect: connection refused"}

The namespace can be any of the ~150 namespaces. This results in the KEDA Operator restarting; just before the restart, it emits the nil pointer dereference logs shown below under "Logs from KEDA operator".
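
For context, the endpoint each scaler polls is roughly equivalent to the stub below (a minimal sketch; the real API is internal to our clusters, so the path, port, payload shape, and field name here are assumptions):

// metrics_stub.go — hypothetical stand-in for the per-namespace metrics API
// that the metrics-api trigger polls. Not our real service; field names and
// path are placeholders.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/api/v1/metrics", func(w http.ResponseWriter, r *http.Request) {
		// The trigger's valueLocation would point at this field, e.g. "queueLength".
		json.NewEncoder(w).Encode(map[string]int{"queueLength": 42})
	})
	// Port 9001 matches the endpoint in the error log above; killing this
	// process reproduces the "connection refused" errors.
	log.Fatal(http.ListenAndServe(":9001", nil))
}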

Expected Behavior

I'd expect the Operator not to hit nil pointer dereferences when a large number of scalers fail or run into transient issues.

Actual Behavior

The Operator cannot get a scaling decision for any of the ~500 scaledObjects, runs into a nil pointer panic, and restarts.

Steps to Reproduce the Problem

  1. Deploy the KEDA Operator
  2. Launch ~500 scaledObjects
  3. Create a situation where the API endpoint being scaled against refuses connections or returns errors while getting a scaling decision (for example, by mass-deleting the API pods)

Logs from KEDA operator

Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator error panic: runtime error: invalid memory address or nil pointer dereference
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1efa392]
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator goroutine 3144 [running]:
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).updateScaleOnScaleTarget(0xc000c3c190?, {0x4a92e40, 0xc00a599c20}, 0xc00aa2d8c8, 0xc01b37fd36?, 0x1?)
	/workspace/pkg/scaling/executor/scale_scaledobjects.go:349 +0xb2
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).doFallbackScaling(0xc000c3c190, {0x4a92e40, 0xc00a599c20}, 0xc00aa2d8c8, 0x4a92e40?, {{0x4a9cdf8?, 0xc00ac34ae0?}, 0x43ec18c?}, 0x1)
	/workspace/pkg/scaling/executor/scale_scaledobjects.go:234 +0x76
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestScale(0xc000c3c190, {0x4a92e40, 0xc00a599c20}, 0xc00aa2d8c8, 0x0, 0x1, 0xc00a9f4090)
	/workspace/pkg/scaling/executor/scale_scaledobjects.go:169 +0xee5
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers(0xc000661340, {0x4a92e40, 0xc00a599c20}, {0x4337560, 0xc00aa2d8c8}, {0x4a67b08, 0xc00aa467c0})
	/workspace/pkg/scaling/scale_handler.go:249 +0x455
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop(0xc000661340, {0x4a92e40, 0xc00a599c20}, 0xc00aa243c0, {0x4337560, 0xc00aa2d8c8}, {0x4a67b08, 0xc00aa467c0}, 0x1)
	/workspace/pkg/scaling/scale_handler.go:182 +0x3eb
Sep 18 14:25:26 keda-operator-548d6df695-qfzgn keda-operator created by github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).HandleScalableObject in goroutine 413
	/workspace/pkg/scaling/scale_handler.go:128 +0x4ce
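
For what it's worth, the trace suggests the fallback path ends up dereferencing a scale object that was never successfully fetched. The snippet below is a minimal, self-contained illustration of that pattern; it is not KEDA's actual code, and the names and control flow are simplified assumptions:

// nil_scale_repro.go — hypothetical illustration of the panic pattern above.
// Names mirror the stack frames for readability only; this is not KEDA source.
package main

import (
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
)

// getScale stands in for reading the /scale subresource; it returns nil on error.
func getScale(fail bool) (*autoscalingv1.Scale, error) {
	if fail {
		return nil, fmt.Errorf("connection refused")
	}
	return &autoscalingv1.Scale{Spec: autoscalingv1.ScaleSpec{Replicas: 3}}, nil
}

// updateScaleOnScaleTarget dereferences the scale object without a nil check,
// analogous to the frame at scale_scaledobjects.go:349.
func updateScaleOnScaleTarget(scale *autoscalingv1.Scale, replicas int32) {
	scale.Spec.Replicas = replicas // panics if scale is nil
}

func main() {
	scale, err := getScale(true) // simulate the mass outage / connection refused
	if err != nil {
		fmt.Println("scaler error:", err)
	}
	// A fallback path that proceeds despite the error would panic here with
	// "invalid memory address or nil pointer dereference".
	updateScaleOnScaleTarget(scale, 1)
}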

KEDA Version

2.15.1

Kubernetes Version

1.29

Platform

Amazon Web Services

Scaler Details

MetricsAPI

Anything else?

No response

@jark-AB jark-AB added the bug Something isn't working label Sep 18, 2024
@wozniakjan
Member

Today is the bi-weekly community call; I will add this to the agenda. You are welcome to attend as well :)

@tonylee-shopback

I encountered a similar issue in 2.13.1:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14a5b20]

goroutine 1133 [running]:
github.com/kedacore/keda/v2/pkg/scaling/executor.(*scaleExecutor).RequestScale(0x4000a3b400, {0x41d8d78, 0x401c3c0cd0}, 0x401c390e00, 0x1, 0x0)
	/workspace/pkg/scaling/executor/scale_scaledobjects.go:39 +0x140
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers(0x40009705b0, {0x41d8d78, 0x401c3c0cd0}, {0x3ab0040?, 0x401c390e00?}, {0x41ae5d0, 0x401c7ce110})
	/workspace/pkg/scaling/scale_handler.go:249 +0x334
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop(0x4015aaecc0?, {0x41d8d78, 0x401c3c0cd0}, 0x401c7de000, {0x3ab0040, 0x401c390e00}, {0x41ae5d0, 0x401c7ce110}, 0xa0?)
	/workspace/pkg/scaling/scale_handler.go:182 +0x374
created by github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).HandleScalableObject in goroutine 341
	/workspace/pkg/scaling/scale_handler.go:128 +0x468

@wozniakjan
Member

This was briefly discussed last week; it is possibly an issue with trigger cache invalidation. It may take some time before it gets fixed.
