Break down Pending vs Terminating status #3048

Open
rgaiacs opened this issue Aug 2, 2024 · 1 comment
Collaborator

rgaiacs commented Aug 2, 2024

When I look at Grafana, I see

Screenshot 2024-08-02 at 16-14-01 Pod Activity - Dashbo

I checked

sum(label_replace(kube_pod_status_phase{phase="Pending",pod=~"jupyter-.*"}, "repo", "$1", "pod", "jupyter-(.+)-[^-]+")) by (repo)

on Prometheus and I got

Screenshot 2024-08-02 at 16-16-40 Prometheus Time Serie

Based on the Prometheus result, the number shown in Grafana is wrong: Grafana shows 25 pending pods, but Prometheus reports only 6.
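For reference, the `label_replace` call in the query above derives a `repo` label from the pod name by capturing everything between the `jupyter-` prefix and the final hyphen-separated suffix. A minimal Python sketch of that extraction follows; the helper name `repo_label` and the sample pod names are hypothetical, and `fullmatch` is used because Prometheus anchors the regex at both ends.

```python
import re

# Same pattern as in the label_replace call. Prometheus anchors the
# regex at both ends, so fullmatch() mirrors that behaviour.
POD_PATTERN = re.compile(r"jupyter-(.+)-[^-]+")


def repo_label(pod_name):
    """Return the value label_replace would write to `repo`, or None.

    None stands for "no match": in that case label_replace returns
    the series unchanged, without a `repo` label.
    """
    match = POD_PATTERN.fullmatch(pod_name)
    return match.group(1) if match else None


# Hypothetical pod names, for illustration only.
print(repo_label("jupyter-example-2drepo-abc123"))
print(repo_label("hub-7d9f"))
```

The greedy `(.+)` keeps every hyphen except the last one, so only the trailing suffix of the pod name is dropped.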

I checked Kubernetes directly, and the number of pending pods is indeed 6, matching what Prometheus reports.

Screenshot 2024-08-02 161434

But we have 19 "Terminating" pods:

Screenshot 2024-08-02 161506

My understanding is that Grafana is merging "Pending" and "Terminating". I looked at the expression used by Grafana:

sum(kube_pod_status_phase{pod=~"^jupyter-.*", kubernetes_namespace!="jhub-ns"}) by (phase)

This expression looks good to me, which suggests that the problem is in how the metric is exported.

In kubernetes/kube-state-metrics#1013, someone said

I have a pod in status Terminating but with kube-state-metrics:v2.7.0 can not see kube_pod_status_phase{phase="Terminating"}

@sgibson91 and @manics, can you help me get the pod "Terminating" state exported? Thanks!

@rgaiacs rgaiacs self-assigned this Aug 2, 2024
Member

manics commented Aug 5, 2024

https://github.com/kubernetes/kube-state-metrics/blob/b1c2e0c1cf897202fa10da7b622e883df8a7a66e/docs/metrics/workload/pod-metrics.md#useful-metrics-queries
suggests
count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)
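
A minimal sketch of the idea behind this query, in Python rather than PromQL. Kubernetes has no "Terminating" pod phase: `kubectl` shows that status for a pod whose deletion timestamp is set, which is why `kube_pod_status_phase` never carries `phase="Terminating"`. The function name, the dict layout, and the sample pods below are hypothetical. They only mirror the 6 + 19 split described above, under the unconfirmed assumption that the terminating pods still report the Pending phase.

```python
from collections import Counter


def phase_breakdown(pods):
    """Count pods per phase, reporting pods being deleted separately.

    `pods` is an iterable of dicts with a "phase" key and an optional
    "deletion_timestamp" key (hypothetical layout for this sketch).
    """
    counts = Counter()
    for pod in pods:
        if pod.get("deletion_timestamp") is not None:
            # A set deletion timestamp is what kubectl renders as
            # "Terminating", whatever the underlying phase is.
            counts["Terminating"] += 1
        else:
            counts[pod["phase"]] += 1
    return dict(counts)


# Hypothetical sample: 6 pods that are simply Pending, plus 19 pods
# still in the Pending phase but already marked for deletion.
sample = [{"phase": "Pending"} for _ in range(6)]
sample += [
    {"phase": "Pending", "deletion_timestamp": 1722608000}
    for _ in range(19)
]

print(phase_breakdown(sample))
```

Summing by phase alone would report all 25 sample pods as Pending. Splitting on the deletion timestamp first is the step that `kube_pod_deletion_timestamp` makes possible on the Prometheus side.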
