
prometheus client: disable metric consistency #194

Merged: brancz merged 1 commit into master from fix-metric-consistency on Aug 1, 2017

Conversation

@brancz (Member) commented Aug 1, 2017

This reverts the client_golang library back to the state that we need, which was previously released with v0.5.0. #185 updated the deps, and I missed that we were overriding this manual patch.

Fixes #192

@fabxc @andyxning @matthiasr

@cofyc as you reported this, could you quickly try this out and report whether it fixes the behavior you were seeing?
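
For context, here is a minimal sketch of the shape of output at issue, assuming the current client_golang API; the label names are invented for illustration. It builds two samples of the same metric family with different label dimensions, which is what the stock library's consistency check reports as an error at gather time (the failure seen in #192) and what this revert relaxes again.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

func main() {
	// Two samples of the same metric family, but with different label sets --
	// the shape kube-state-metrics produces for kube_service_labels, where
	// every Kubernetes object contributes its own label keys.
	m1 := prometheus.MustNewConstMetric(
		prometheus.NewDesc("kube_service_labels", "Kubernetes labels converted to Prometheus labels.",
			[]string{"namespace", "service", "label_app"}, nil),
		prometheus.GaugeValue, 1, "default", "frontend", "web")
	m2 := prometheus.MustNewConstMetric(
		prometheus.NewDesc("kube_service_labels", "Kubernetes labels converted to Prometheus labels.",
			[]string{"namespace", "service", "label_tier"}, nil),
		prometheus.GaugeValue, 1, "default", "backend", "db")

	// Writing the metrics out makes the differing label dimensions visible;
	// an unpatched registry refuses to gather such a family.
	for _, m := range []prometheus.Metric{m1, m2} {
		var out dto.Metric
		if err := m.Write(&out); err != nil {
			panic(err)
		}
		fmt.Println(out.String())
	}
}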


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 1, 2017
@cofyc (Member) commented Aug 1, 2017

Hi, I rebuilt kube-state-metrics on the fix-metric-consistency branch; /metrics no longer reports errors. All good!

Here are the logs for review:

# git fetch upstream # `upstream` points to https://github.com/kubernetes/kube-state-metrics.
# git checkout fix-metric-consistency
# go install
# kube-state-metrics --apiserver=localhost:443 --in-cluster=false --port=8081 --kubeconfig=/var/run/kubernetes/admin.kubeconfig
...
# curl -s http://localhost:8081/metrics
# HELP kube_daemonset_metadata_generation Sequence number representing a specific generation of the desired state.
# TYPE kube_daemonset_metadata_generation gauge
kube_daemonset_metadata_generation{daemonset="npd-v0.4.1",namespace="kube-system"} 1
# HELP kube_daemonset_status_current_number_scheduled The number of nodes running at least one daemon pod and are supposed to.
# TYPE kube_daemonset_status_current_number_scheduled gauge
kube_daemonset_status_current_number_scheduled{daemonset="npd-v0.4.1",namespace="kube-system"} 1
# HELP kube_daemonset_status_desired_number_scheduled The number of nodes that should be running the daemon pod.
# TYPE kube_daemonset_status_desired_number_scheduled gauge
kube_daemonset_status_desired_number_scheduled{daemonset="npd-v0.4.1",namespace="kube-system"} 1
# HELP kube_daemonset_status_number_misscheduled The number of nodes running a daemon pod but are not supposed to.
...

@andyxning (Member) commented Aug 1, 2017

@brancz This reminds me of PR #130, which added the pod/node labels metrics. It's a pity that GitHub does not render registry.go by default, and I had not noticed that we had manually changed the vendored code. :(

I have compared the diff to registry.go against the one from #130; SGTM.

@cofyc Thanks for your report and test. 👍

@brancz (Member Author) commented Aug 1, 2017

Yep, I actually just checked out the version from that PR 😄. To be clear, what we're doing is pretty bad in the first place, but we haven't been able to get more relaxed behavior into the Prometheus client library (the constraint is only a logical one, not a technical one), so this is our only option right now.

@brancz (Member Author) commented Aug 1, 2017

Thanks for reviewing and testing!

@brancz brancz merged commit 804a73c into master Aug 1, 2017
@brancz brancz deleted the fix-metric-consistency branch August 1, 2017 10:10
@matthiasr commented

There must be a less hacky way to do this. @beorn7?

@andyxning (Member) commented

@matthiasr Is there a knob, or would it be possible to add one, to control client-go's check for inconsistent metric labels?

@brancz (Member Author) commented Aug 1, 2017

I had started a discussion around this on IRC and got enough push-back that I didn't even start implementing it in the client-go library. If @beorn7 agrees, I can still go ahead, implement this, and open a PR, but given that we released this without any problems in v0.5.0, I'd say it's an improvement for the future, as the client-go lib is also released relatively rarely.

To be clear, I still think this is terrible, but I don't think we should let this be a blocker.

@beorn7 (Contributor) commented Aug 7, 2017

I'm not sure I understand the exact issue here.

Is this about giving prometheus/client_golang a knob to allow inconsistent exposition? So far, the explicit goal has always been to make it outright impossible to create an inconsistent exposition using a client library. Older versions of client_golang fell short of that, but that was a bug and not a feature. Keep in mind that Prometheus follows the robustness principle, i.e. it is liberal in what it accepts from others. However, the robustness principle has two sides, the other being to be conservative in what you do. That other side is (or should be) obeyed by the Prometheus client libraries, i.e. they should never expose anything that is not a strictly valid exposition of Prometheus metrics.

@beorn7 (Contributor) commented Aug 7, 2017

Corollary of the above: Especially if we advertise the Prometheus exposition format as something other producers and consumers can use to mutual benefit, we have to expect consumers that would choke on inconsistent metrics in an exposition.

@andyxning (Member) commented Aug 8, 2017

Especially if we advertise the Prometheus exposition format as something other producers and consumers can use to mutual benefit, we have to expect consumers that would choke on inconsistent metrics in an exposition.

@beorn7 Does Prometheus itself have any limitations on consuming inconsistent metrics? For now, it seems Prometheus works properly with inconsistent metrics, but we need to be sure about that.

@beorn7 (Contributor) commented Aug 8, 2017

Currently, Prometheus deals gracefully with label inconsistencies (missing labels are treated as if they existed with an empty label value). Illegal characters in metric names and label names cause a failed scrape, as do invalid UTF-8 characters in a label value.
I'm not sure what duplicate label names with different types will do; it might be an error, or the type might be ignored.

However, none of this is guaranteed. In principle, Prometheus could reject inconsistent metrics tomorrow. You can only be sure about what's in the spec.

@brancz (Member Author) commented Aug 8, 2017

I understand your concern, but realistically: if you redeploy an application and the target ends up being the same, but your new version adds a new label to a metric, Prometheus will reject it? That seems extreme. This is the same use case: the label set can only change when you "re-deploy" a Kubernetes object.

To be clear, when we did this initially, @fabxc and I discussed this use case with @brian-brazil and concluded that it is valid here. I would argue for the "normal" registry in Prometheus' client_golang being more relaxed about this behavior and for the "pedantic" registry performing this check. I agree that this kind of behavior/usage is usually incorrect, but I can see exporters in particular requiring it.
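
As a point of reference, here is a minimal sketch, assuming the current client_golang API, of the two registry flavours mentioned above and of how gather errors reach /metrics; this is not necessarily how kube-state-metrics itself is wired up, and exactly which checks each registry performs depends on the library version.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry runs its consistency checks at Gather() time.
	reg := prometheus.NewRegistry()

	// The stricter flavour layers additional checks on top (e.g. that collected
	// metrics match their registered Desc). The split argued for above --
	// relaxed default, checks only in the pedantic registry -- is not how the
	// library behaves today.
	// reg := prometheus.NewPedanticRegistry()

	reg.MustRegister(prometheus.NewGoCollector())

	// HTTPErrorOnError (the default) fails the whole scrape with a 500 when
	// Gather() returns an error such as a consistency violation;
	// ContinueOnError serves whatever could still be gathered.
	handler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		ErrorHandling: promhttp.HTTPErrorOnError,
	})
	http.Handle("/metrics", handler)
	log.Fatal(http.ListenAndServe(":8081", nil))
}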

@brian-brazil commented

Given that this is usually a mistake on the part of the user, we could take the stance that users with a valid use case can ensure all the metrics they expose are consistent themselves. That is, expose all of the time series in that metric with the superset of all used label names.
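
A minimal sketch of that recommendation, assuming the current client_golang API and invented label names: one Desc carries the superset of label names, and every series fills the dimensions it does not use with empty values, so the family stays consistent without touching the library.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Superset of label names used by any service in this (made-up) example.
	desc := prometheus.NewDesc(
		"kube_service_labels",
		"Kubernetes labels converted to Prometheus labels.",
		[]string{"namespace", "service", "label_app", "label_tier"},
		nil,
	)

	// "frontend" only has an app label and "backend" only a tier label, but
	// both series expose all four dimensions, with "" for the unused ones.
	series := []prometheus.Metric{
		prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1, "default", "frontend", "web", ""),
		prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1, "default", "backend", "", "db"),
	}
	fmt.Printf("%d consistent series in the kube_service_labels family\n", len(series))
}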

@beorn7 (Contributor) commented Aug 8, 2017

I understand your concern, but realistically: if you redeploy an application and the target ends up being the same, but your new version adds a new label to a metric, Prometheus will reject it? That seems extreme. This is the same use case.

Not quite. The inconsistencies would only appear if looking at multiple scrapes. Each scrape on its own will be consistent.

@beorn7 (Contributor) commented Aug 8, 2017

That is, expose all of the time series in that metric with the superset of all used label names.

Yes, that would be my recommendation with the current state of affairs.

What would you all think about a registry that would do this automatically? It would be an option, autoFillLabels or something, in which case missing label dimensions would be filled with empty-valued labels instead of erroring out.
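
As a rough sketch of what such an autoFillLabels option might automate (the option does not exist in client_golang; the names here are hypothetical), a collector could pad each series' label values out to the family's superset before constructing the metric, exactly as in the manual recommendation above:

package main

import "fmt"

// padLabelValues returns the label values for one series in superset order,
// filling dimensions the series does not use with the empty label value "".
// The result can then be passed to prometheus.MustNewConstMetric.
func padLabelValues(superset []string, got map[string]string) []string {
	values := make([]string, len(superset))
	for i, name := range superset {
		values[i] = got[name] // missing keys yield ""
	}
	return values
}

func main() {
	superset := []string{"namespace", "service", "label_app", "label_tier"}
	fmt.Println(padLabelValues(superset, map[string]string{
		"namespace": "default", "service": "frontend", "label_app": "web",
	}))
	// Prints: [default frontend web ]
}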

@matthiasr commented

Could that just be the default?

@matthiasr commented

this is usually a mistake on the part of the user

It is not; having different services with different label sets is a completely valid thing to do, just as is having different containers with different label sets (google/cadvisor#1704). We should not be talking up the "great fit" and "flexibility" of the label/value system if we also enforce a single global set of labels for everything and everyone. In the end, if we don't find a way to make this easier, it will only become unattractive to use client_golang, which is exactly the opposite of what we want.

@brian-brazil commented

In metrics terms, varying labels is a major anti-pattern and I don't think we should make it any easier for users to do. One of the big reasons the old Java client got rewritten was that it allowed this.

If users want to do this, they should take on the complexity of it in their own code.

@beorn7 (Contributor) commented Aug 28, 2017

Could that just be the default?

I'd say the default should be to tell you that your labels are inconsistent.

mfournier pushed a commit to mfournier/cadvisor that referenced this pull request Sep 7, 2017
This is a workaround for google#1704, restoring 0.25.0 /metrics behaviour for
0.27.1.

This is fundamentally the same patch as:
kubernetes/kube-state-metrics#194
while1malloc0 pushed a commit to while1malloc0/kube-state-metrics that referenced this pull request Jul 2, 2018
prometheus client: disable metric consistency
Successfully merging this pull request may close these issues: kube_service_labels metric has inconsistent label dimensions (#192)