keda-metrics-apiserver failing with connection refused and invalid memory address #6201

Open
zoechou opened this issue Oct 1, 2024 · 4 comments
Labels
bug Something isn't working

Comments

zoechou commented Oct 1, 2024

Report

We are seeing two errors with the KEDA operator. We have to constantly restart the pods to temporarily fix them, but after a short time the errors reappear.

The HPA was unable to compute the replica count: unable to get external metric 
example/p0-prometheus/$LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: 
prometheus-scaleobject-batch},MatchExpressions:[]LabelSelectorRequirement{}}: unable to fetch metrics from external metrics 
API: the server is currently unable to handle the request (get 
s0-prometheus.external.metrics.k8s.io)
I0930 08:05:49.140779       1 client.go:88] keda_metrics_adapter/provider "msg"="Waiting for establishing a gRPC connection to KEDA Metrics Server" 
E0930 08:05:49.220586       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.keda.svc.cluster.local:9666"
I0930 08:05:49.220625       1 trace.go:236] Trace[806696940]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:ecf19ae7-293c-42ed-a09c-033333468d8c,client:172.16.123.246,protocol:HTTP/2.0,resource:s0-prometheus,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus,user-agent:kube-controller-manager/v1.28.12 (linux/amd64) kubernetes/18fffbd/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (30-Sep-2024 08:04:49.136) (total time: 60084ms):
Trace[806696940]: [1m0.084304287s] [1m0.084304287s] END
E0930 08:05:49.220727       1 timeout.go:142] post-timeout activity - time-elapsed: 85.323659ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus" result: runtime error: invalid memory address or nil pointer dereference
goroutine 9641 [running]:
...

Expected Behavior

  • A gRPC connection to the KEDA Metrics Service (keda-operator) can be established
  • No runtime error (invalid memory address or nil pointer dereference) occurs

Actual Behavior

Errors occur with the KEDA operator / KEDA metrics server. We have to constantly restart the pods to temporarily fix them, but after a short time the errors reappear.

Steps to Reproduce the Problem

No special steps; it just happens occasionally.

Logs from KEDA operator

There are no errors on the KEDA operator, but the KEDA metrics server logs show:

I0930 08:05:49.140779       1 client.go:88] keda_metrics_adapter/provider "msg"="Waiting for establishing a gRPC connection to KEDA Metrics Server" 
E0930 08:05:49.220586       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.keda.svc.cluster.local:9666"
I0930 08:05:49.220625       1 trace.go:236] Trace[806696940]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:ecf19ae7-293c-42ed-a09c-033333468d8c,client:172.16.123.246,protocol:HTTP/2.0,resource:s0-prometheus,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus,user-agent:kube-controller-manager/v1.28.12 (linux/amd64) kubernetes/18fffbd/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (30-Sep-2024 08:04:49.136) (total time: 60084ms):
Trace[806696940]: [1m0.084304287s] [1m0.084304287s] END
E0930 08:05:49.220727       1 timeout.go:142] post-timeout activity - time-elapsed: 85.323659ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/ms-algorithm-production/s0-prometheus" result: runtime error: invalid memory address or nil pointer dereference
goroutine 9641 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
    /workspace/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:110 +0x9c
panic({0x216ba20, 0x432cd40})
    /usr/local/go/src/runtime/panic.go:884 +0x1f4
...

KEDA Version

2.12.1

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

Prometheus, AWS SQS

Anything else?

No response

zoechou added the bug label on Oct 1, 2024
JorTurFer (Member) commented

Hello
I think the problem is related to the operator pod rather than the metrics server pod. Do you see any errors in the KEDA operator pod? The metrics server is just a proxy between the KEDA operator and the Kubernetes API server.
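
For context, the "timeout while waiting to establish gRPC connection" message in the adapter logs corresponds to a blocking gRPC dial from the metrics adapter to the operator's Metrics Service endpoint. Below is a minimal sketch of that pattern, not KEDA's actual client code; the timeout value and the plaintext credentials are illustrative assumptions (KEDA secures this connection):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Give the dial a hard deadline; the adapter's timeout error surfaces
	// when the connection is not ready before the deadline expires.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// grpc.WithBlock makes DialContext wait for the connection
	// instead of connecting lazily in the background.
	conn, err := grpc.DialContext(ctx,
		"keda-operator.keda.svc.cluster.local:9666",
		grpc.WithTransportCredentials(insecure.NewCredentials()), // illustrative only
		grpc.WithBlock(),
	)
	if err != nil {
		// An unreachable or unhealthy operator endpoint ends up here,
		// analogous to the "timeout" error in the adapter logs.
		log.Fatalf("timeout while waiting for gRPC connection: %v", err)
	}
	defer conn.Close()
	log.Println("gRPC connection established")
}
```

So if the operator is not reachable on port 9666, the adapter keeps hitting this timeout even though the adapter pod itself looks healthy.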


zoechou commented Oct 1, 2024

At that moment, there were no errors on keda-operator, or the logs were being flushed by Reconciling ScaledObject events.

We launched two metrics-api-server pods and only one encountered the issue; it works on the other pod. Should it still be related to keda-operator?

Thanks

JorTurFer (Member) commented

Do you see whether the CPU is being throttled?


zoechou commented Oct 3, 2024

No, the usage is quite low. Attached are the pod and related node resource requests and utilization.

Here are the keda-operator logs collected when we encountered the issue on keda-metrics-server:

2024/10/03 08:00:31 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
I1003 08:00:31.536914       1 leaderelection.go:250] attempting to acquire leader lease keda/operator.keda.sh...
I1003 08:01:40.152807       1 leaderelection.go:260] successfully acquired lease keda/operator.keda.sh
I1003 08:01:45.070930       1 request.go:697] Waited for 1.03611623s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/ms-algorithm-staging/scaledobjects/remote-fashionstyleoccasion-2-shared/status
I1003 08:01:55.071071       1 request.go:697] Waited for 1.255528513s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/remote-furnitureattributes-11-shared/status
I1003 08:02:05.077328       1 request.go:697] Waited for 1.452206135s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-product.item.11.17.openvino/status
I1003 08:02:16.258312       1 request.go:697] Waited for 1.038585252s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-eyewear.rotate.openvino/status
I1003 08:02:32.063725       1 request.go:697] Waited for 1.041897597s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-fabric.pattern.openvino/status
I1003 08:02:45.929433       1 request.go:697] Waited for 1.035127477s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-fabric.pattern.openvino/status
I1003 08:03:01.166719       1 request.go:697] Waited for 1.026740222s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/feature-insert-v13-19-0-v14-21-0/scaledobjects/cpu-product.item.12.05.openvino/status
I1003 08:03:21.398165       1 request.go:697] Waited for 1.002024811s due to client-side throttling, not priority and fairness, request: PATCH:https://172.20.0.1:443/apis/keda.sh/v1alpha1/namespaces/ms-algorithm-staging/scaledobjects/remote-fashionattributes-349-shared/status
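
As a side note, the "Waited for ... due to client-side throttling" lines above come from client-go's built-in client-side rate limiter, which paces requests according to the QPS/Burst settings of the client configuration. Below is a minimal sketch of where those settings live; this is not KEDA's code, and the namespace and values are illustrative assumptions (the operator may expose its own configuration for these limits, so check the KEDA docs rather than this sketch):

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	// Raising QPS/Burst relaxes client-side throttling for a busy controller;
	// the values below are illustrative, not KEDA's defaults.
	cfg.QPS = 50
	cfg.Burst = 100

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// Every request made through this client is paced by the limiter above;
	// when it waits, client-go logs the "client-side throttling" message.
	pods, err := client.CoreV1().Pods("keda").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("listed %d pods", len(pods.Items))
}
```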

keda-metrics-server:

E1003 09:28:33.992579       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.keda.svc.cluster.local:9666"
I1003 09:28:33.992711       1 trace.go:236] Trace[1770246557]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:5511ba79-f070-46b0-a270-7fb36f256a07,client:172.16.123.246,protocol:HTTP/2.0,resource:s1-prometheus,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/feature-insert-v13-19-0-v14-21-0/s1-prometheus,user-agent:kube-controller-manager/v1.28.12 (linux/amd64) kubernetes/18fffbd/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (03-Oct-2024 09:27:33.911) (total time: 60081ms):
Trace[1770246557]: [1m0.081078558s] [1m0.081078558s] END
E1003 09:28:33.992884       1 timeout.go:142] post-timeout activity - time-elapsed: 81.104291ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/feature-insert-v13-19-0-v14-21-0/s1-prometheus" result: runtime error: invalid memory address or nil pointer dereference
goroutine 22475 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
    /workspace/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:110 +0x9c
panic({0x216ba20, 0x432cd40})

[Screenshot: pod and node resource requests and utilization (Screenshot 2024-10-03 at 5 27 45 PM)]
