keda-metrics-apiserver failing with connection refused and invalid memory address #6201

Open
zoechou opened this issue Oct 1, 2024 · 4 comments
Labels
bug Something isn't working

Comments

zoechou commented Oct 1, 2024

Report

We are seeing two errors with the KEDA operator. We have to constantly restart the pods to temporarily fix them, but after a short time the errors reappear.

The HPA was unable to compute the replica count: unable to get external metric 
example/p0-prometheus/$LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: 
prometheus-scaleobject-batch},MatchExpressions:[]LabelSelectorRequirement{}}: unable to fetch metrics from external metrics 
API: the server is currently unable to handle the request (get 
s0-prometheus.external.metrics.k8s.io)
I0930 08:05:49.140779       1 client.go:88] keda_metrics_adapter/provider "msg"="Waiting for establishing a gRPC connection to KEDA Metrics Server" 
E0930 08:05:49.220586       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.keda.svc.cluster.local:9666"
I0930 08:05:49.220625       1 trace.go:236] Trace[806696940]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:ecf19ae7-293c-42ed-a09c-033333468d8c,client:172.16.123.246,protocol:HTTP/2.0,resource:s0-prometheus,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus,user-agent:kube-controller-manager/v1.28.12 (linux/amd64) kubernetes/18fffbd/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (30-Sep-2024 08:04:49.136) (total time: 60084ms):
Trace[806696940]: [1m0.084304287s] [1m0.084304287s] END
E0930 08:05:49.220727       1 timeout.go:142] post-timeout activity - time-elapsed: 85.323659ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus" result: runtime error: invalid memory address or nil pointer dereference
goroutine 9641 [running]:
...

Expected Behavior

  • A gRPC connection to the KEDA Metrics Service (keda-operator) can be established
  • No runtime error (invalid memory address or nil pointer dereference) occurs

Actual Behavior

Errors occur with the KEDA operator / KEDA metrics server. We have to constantly restart the pods to temporarily fix them, but after a short time the errors reappear.

Steps to Reproduce the Problem

No special steps; it just happens occasionally.

Logs from KEDA operator

There are no errors on the KEDA operator, but the KEDA metrics server logs show:

I0930 08:05:49.140779       1 client.go:88] keda_metrics_adapter/provider "msg"="Waiting for establishing a gRPC connection to KEDA Metrics Server" 
E0930 08:05:49.220586       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.keda.svc.cluster.local:9666"
I0930 08:05:49.220625       1 trace.go:236] Trace[806696940]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:ecf19ae7-293c-42ed-a09c-033333468d8c,client:172.16.123.246,protocol:HTTP/2.0,resource:s0-prometheus,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus,user-agent:kube-controller-manager/v1.28.12 (linux/amd64) kubernetes/18fffbd/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (30-Sep-2024 08:04:49.136) (total time: 60084ms):
Trace[806696940]: [1m0.084304287s] [1m0.084304287s] END
E0930 08:05:49.220727       1 timeout.go:142] post-timeout activity - time-elapsed: 85.323659ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus" result: runtime error: invalid memory address or nil pointer dereference
goroutine 9641 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
    /workspace/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:110 +0x9c
panic({0x216ba20, 0x432cd40})
    /usr/local/go/src/runtime/panic.go:884 +0x1f4
...

KEDA Version

2.12.1

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

Prometheus, AWS SQS

Anything else?

No response

zoechou added the bug label on Oct 1, 2024
JorTurFer (Member) commented

Hello
I think the problem is related to the operator pod rather than the metrics server pod. Do you see any errors in the KEDA operator pod? The metrics server is just a proxy between the KEDA operator and the Kubernetes API server.
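
For context, the "timeout while waiting to establish gRPC connection" message in the adapter logs corresponds to a blocking gRPC dial from the metrics adapter to the operator's Metrics Service endpoint. Below is a minimal sketch of that pattern, not KEDA's actual client code; the timeout value and the plaintext credentials are illustrative assumptions (KEDA secures this connection):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Give the dial a hard deadline; the adapter's timeout error surfaces
	// when the connection is not ready before the deadline expires.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// grpc.WithBlock makes DialContext wait for the connection
	// instead of connecting lazily in the background.
	conn, err := grpc.DialContext(ctx,
		"keda-operator.keda.svc.cluster.local:9666",
		grpc.WithTransportCredentials(insecure.NewCredentials()), // illustrative only
		grpc.WithBlock(),
	)
	if err != nil {
		// An unreachable or unhealthy operator endpoint ends up here,
		// analogous to the "timeout" error in the adapter logs.
		log.Fatalf("timeout while waiting for gRPC connection: %v", err)
	}
	defer conn.Close()
	log.Println("gRPC connection established")
}
```

So if the operator is not reachable on port 9666, the adapter keeps hitting this timeout even though the adapter pod itself looks healthy.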


zoechou commented Oct 1, 2024

At that moment, there were no errors on keda-operator, or the logs were being flushed by Reconciling ScaledObject events.

We launched two metrics-api-server pods and only one encountered the issue; it works on the other pod. Should it still be related to keda-operator?

Thanks

JorTurFer (Member) commented

Do you see whether the CPU is being throttled?


zoechou commented Oct 3, 2024

No, the usage is quite low. Attached are the pod and related node resource requests and utilization.

Here are the keda-operator logs collected when we encountered the issue on keda-metrics-server:

2024/10/03 08:00:31 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
I1003 08:00:31.536914       1 leaderelection.go:250] attempting to acquire leader lease keda/operator.keda.sh...
I1003 08:01:40.152807       1 leaderelection.go:260] successfully acquired lease keda/operator.keda.sh
I1003 08:01:45.070930       1 request.go:697] Waited for 1.03611623s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/ms-algorithm-staging/scaledobjects/remote-fashionstyleoccasion-2-shared/status
I1003 08:01:55.071071       1 request.go:697] Waited for 1.255528513s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/remote-furnitureattributes-11-shared/status
I1003 08:02:05.077328       1 request.go:697] Waited for 1.452206135s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-product.item.11.17.openvino/status
I1003 08:02:16.258312       1 request.go:697] Waited for 1.038585252s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-eyewear.rotate.openvino/status
I1003 08:02:32.063725       1 request.go:697] Waited for 1.041897597s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-fabric.pattern.openvino/status
I1003 08:02:45.929433       1 request.go:697] Waited for 1.035127477s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-fabric.pattern.openvino/status
I1003 08:03:01.166719       1 request.go:697] Waited for 1.026740222s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-product.item.12.05.openvino/status
I1003 08:03:21.398165       1 request.go:697] Waited for 1.002024811s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/ms-algorithm-staging/scaledobjects/remote-fashionattributes-349-shared/status
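
As a side note, the "Waited for ... due to client-side throttling" lines above come from client-go's built-in client-side rate limiter, which paces requests according to the QPS/Burst settings of the client configuration. Below is a minimal sketch of where those settings live; this is not KEDA's code, and the namespace and values are illustrative assumptions (the operator may expose its own configuration for these limits, so check the KEDA docs rather than this sketch):

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	// Raising QPS/Burst relaxes client-side throttling for a busy controller;
	// the values below are illustrative, not KEDA's defaults.
	cfg.QPS = 50
	cfg.Burst = 100

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// Every request made through this client is paced by the limiter above;
	// when it waits, client-go logs the "client-side throttling" message.
	pods, err := client.CoreV1().Pods("keda").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("listed %d pods", len(pods.Items))
}
```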

keda-metrics-server:

E1003 09:28:33.992579       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.keda.svc.cluster.local:9666"
I1003 09:28:33.992711       1 trace.go:236] Trace[1770246557]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:5511ba79-f070-46b0-a270-7fb36f256a07,client:172.16.123.246,protocol:HTTP/2.0,resource:s1-prometheus,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/feature-insert-v13-19-0-v14-21-0/s1-prometheus,user-agent:kube-controller-manager/v1.28.12 (linux/amd64) kubernetes/18fffbd/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (03-Oct-2024 09:27:33.911) (total time: 60081ms):
Trace[1770246557]: [1m0.081078558s] [1m0.081078558s] END
E1003 09:28:33.992884       1 timeout.go:142] post-timeout activity - time-elapsed: 81.104291ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/feature-insert-v13-19-0-v14-21-0/s1-prometheus" result: runtime error: invalid memory address or nil pointer dereference
goroutine 22475 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
    /workspace/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:110 +0x9c
panic({0x216ba20, 0x432cd40})

[Screenshot: pod and node resource requests and utilization (Screenshot 2024-10-03 at 5 27 45 PM)]
