
prometheus client: disable metric consistency #194

Merged: brancz merged 1 commit into master from fix-metric-consistency on Aug 1, 2017

Conversation

@brancz (Member) commented Aug 1, 2017

This reverts the client_golang library back to the state that we need, which was previously released with v0.5.0. #185 updated the deps, and I missed that we were overriding this manual patch.

Fixes #192

@fabxc @andyxning @matthiasr

@cofyc as you reported this, could you quickly try this out and report whether it fixes the behavior you were seeing?
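
For context, here is a minimal sketch of the shape of output at issue, assuming the current client_golang API; the label names are invented for illustration. It builds two samples of the same metric family with different label dimensions, which is what the stock library's consistency check reports as an error at gather time (the failure seen in #192) and what this revert relaxes again.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

func main() {
	// Two samples of the same metric family, but with different label sets --
	// the shape kube-state-metrics produces for kube_service_labels, where
	// every Kubernetes object contributes its own label keys.
	m1 := prometheus.MustNewConstMetric(
		prometheus.NewDesc("kube_service_labels", "Kubernetes labels converted to Prometheus labels.",
			[]string{"namespace", "service", "label_app"}, nil),
		prometheus.GaugeValue, 1, "default", "frontend", "web")
	m2 := prometheus.MustNewConstMetric(
		prometheus.NewDesc("kube_service_labels", "Kubernetes labels converted to Prometheus labels.",
			[]string{"namespace", "service", "label_tier"}, nil),
		prometheus.GaugeValue, 1, "default", "backend", "db")

	// Writing the metrics out makes the differing label dimensions visible;
	// an unpatched registry refuses to gather such a family.
	for _, m := range []prometheus.Metric{m1, m2} {
		var out dto.Metric
		if err := m.Write(&out); err != nil {
			panic(err)
		}
		fmt.Println(out.String())
	}
}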


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 1, 2017
@cofyc (Member) commented Aug 1, 2017

Hi, I rebuilt kube-state-metrics on the fix-metric-consistency branch; /metrics no longer reports errors. All good!

Here are the logs for review:

# git fetch upstream # `upstream` points to https://github.com/kubernetes/kube-state-metrics.
# git checkout fix-metric-consistency
# go install
# kube-state-metrics --apiserver=localhost:443 --in-cluster=false --port=8081 --kubeconfig=/var/run/kubernetes/admin.kubeconfig
...
# curl -s http://localhost:8081/metrics
# HELP kube_daemonset_metadata_generation Sequence number representing a specific generation of the desired state.
# TYPE kube_daemonset_metadata_generation gauge
kube_daemonset_metadata_generation{daemonset="npd-v0.4.1",namespace="kube-system"} 1
# HELP kube_daemonset_status_current_number_scheduled The number of nodes running at least one daemon pod and are supposed to.
# TYPE kube_daemonset_status_current_number_scheduled gauge
kube_daemonset_status_current_number_scheduled{daemonset="npd-v0.4.1",namespace="kube-system"} 1
# HELP kube_daemonset_status_desired_number_scheduled The number of nodes that should be running the daemon pod.
# TYPE kube_daemonset_status_desired_number_scheduled gauge
kube_daemonset_status_desired_number_scheduled{daemonset="npd-v0.4.1",namespace="kube-system"} 1
# HELP kube_daemonset_status_number_misscheduled The number of nodes running a daemon pod but are not supposed to.
...

@andyxning (Member) commented Aug 1, 2017

@brancz This reminds me of PR #130, which added the pod/node labels metrics. It's a pity that GitHub does not render registry.go by default, and I had not noticed that we had manually changed the vendored code. :(

I have compared the diff to registry.go against the one from #130; SGTM.

@cofyc Thanks for your report and test. 👍

@brancz (Member Author) commented Aug 1, 2017

Yep, I actually just checked out the version from that PR 😄. To be clear, what we're doing is pretty bad in the first place, but we haven't been able to get more relaxed behavior into the Prometheus client library (the constraint is only a logical one, not a technical one), so this is our only option right now.

@brancz (Member Author) commented Aug 1, 2017

Thanks for reviewing and testing!

@brancz brancz merged commit 804a73c into master Aug 1, 2017
@brancz brancz deleted the fix-metric-consistency branch August 1, 2017 10:10
@matthiasr commented

There must be a less hacky way to do this. @beorn7?

@andyxning (Member) commented

@matthiasr Is there a knob, or would it be possible to add one, to control client-go's check for inconsistent metric labels?

@brancz (Member Author) commented Aug 1, 2017

I had started a discussion around this on IRC and got enough push-back that I didn't even start implementing it in the client-go library. If @beorn7 agrees, I can still go ahead, implement this, and open a PR, but given that we released this without any problems in v0.5.0, I'd say it's an improvement for the future, as the client-go lib is also released relatively rarely.

To be clear, I still think this is terrible, but I don't think we should let this be a blocker.

@beorn7 (Contributor) commented Aug 7, 2017

I'm not sure I understand the exact issue here.

Is this about giving prometheus/client_golang a knob to allow inconsistent exposition? So far, the explicit goal has always been to make it outright impossible to create an inconsistent exposition using a client library. Older versions of client_golang fell short of that, but that was a bug and not a feature. Keep in mind that Prometheus follows the robustness principle, i.e. it is liberal in what it accepts from others. However, the robustness principle has two sides, the other being to be conservative in what you do. That other side is (or should be) obeyed by the Prometheus client libraries, i.e. they should never expose anything that is not a strictly valid exposition of Prometheus metrics.

@beorn7 (Contributor) commented Aug 7, 2017

Corollary of the above: Especially if we advertise the Prometheus exposition format as something other producers and consumers can use to mutual benefit, we have to expect consumers that would choke on inconsistent metrics in an exposition.

@andyxning (Member) commented Aug 8, 2017

Especially if we advertise the Prometheus exposition format as something other producers and consumers can use to mutual benefit, we have to expect consumers that would choke on inconsistent metrics in an exposition.

@beorn7 Does Prometheus itself have any limitations on consuming inconsistent metrics? For now, it seems Prometheus works properly with inconsistent metrics, but we need to be sure about that.

@beorn7 (Contributor) commented Aug 8, 2017

Currently, Prometheus deals gracefully with label inconsistencies (missing labels are treated as if they existed with an empty label value). Illegal characters in metric names and label names cause a failed scrape, as do invalid UTF-8 characters in a label value.
I'm not sure what duplicate label names with different types will do; it might be an error, or the type might be ignored.

However, none of this is guaranteed. In principle, Prometheus could reject inconsistent metrics tomorrow. You can only be sure about what's in the spec.

@brancz (Member Author) commented Aug 8, 2017

I understand your concern, but realistically: if you redeploy an application and the target ends up being the same, but your new version adds a new label to a metric, Prometheus will reject it? That seems extreme. This is the same use case: the label set can only change when you "re-deploy" a Kubernetes object.

To be clear, when we did this initially, @fabxc and I discussed this use case with @brian-brazil and concluded that it is valid here. I would argue for the "normal" registry in Prometheus' client_golang being more relaxed about this behavior and for the "pedantic" registry performing this check. I agree that this kind of behavior/usage is usually incorrect, but I can see exporters in particular requiring it.
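
As a point of reference, here is a minimal sketch, assuming the current client_golang API, of the two registry flavours mentioned above and of how gather errors reach /metrics; this is not necessarily how kube-state-metrics itself is wired up, and exactly which checks each registry performs depends on the library version.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry runs its consistency checks at Gather() time.
	reg := prometheus.NewRegistry()

	// The stricter flavour layers additional checks on top (e.g. that collected
	// metrics match their registered Desc). The split argued for above --
	// relaxed default, checks only in the pedantic registry -- is not how the
	// library behaves today.
	// reg := prometheus.NewPedanticRegistry()

	reg.MustRegister(prometheus.NewGoCollector())

	// HTTPErrorOnError (the default) fails the whole scrape with a 500 when
	// Gather() returns an error such as a consistency violation;
	// ContinueOnError serves whatever could still be gathered.
	handler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		ErrorHandling: promhttp.HTTPErrorOnError,
	})
	http.Handle("/metrics", handler)
	log.Fatal(http.ListenAndServe(":8081", nil))
}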

@brian-brazil commented

Given that this is usually a mistake on the part of the user, we could take the stance that users with a valid use case can ensure all the metrics they expose are consistent themselves. That is, expose all of the time series in that metric with the superset of all used label names.
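
A minimal sketch of that recommendation, assuming the current client_golang API and invented label names: one Desc carries the superset of label names, and every series fills the dimensions it does not use with empty values, so the family stays consistent without touching the library.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Superset of label names used by any service in this (made-up) example.
	desc := prometheus.NewDesc(
		"kube_service_labels",
		"Kubernetes labels converted to Prometheus labels.",
		[]string{"namespace", "service", "label_app", "label_tier"},
		nil,
	)

	// "frontend" only has an app label and "backend" only a tier label, but
	// both series expose all four dimensions, with "" for the unused ones.
	series := []prometheus.Metric{
		prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1, "default", "frontend", "web", ""),
		prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1, "default", "backend", "", "db"),
	}
	fmt.Printf("%d consistent series in the kube_service_labels family\n", len(series))
}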

@beorn7 (Contributor) commented Aug 8, 2017

I understand your concern, but realistically: if you redeploy an application and the target ends up being the same, but your new version adds a new label to a metric, Prometheus will reject it? That seems extreme. This is the same use case.

Not quite. The inconsistencies would only appear if looking at multiple scrapes. Each scrape on its own will be consistent.

@beorn7 (Contributor) commented Aug 8, 2017

That is, expose all of the time series in that metric with the superset of all used label names.

Yes, that would be my recommendation with the current state of affairs.

What would you all think about a registry that would do this automatically? It would be an option, autoFillLabels or something, in which case missing label dimensions would be filled with empty-valued labels instead of erroring out.
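
As a rough sketch of what such an autoFillLabels option might automate (the option does not exist in client_golang; the names here are hypothetical), a collector could pad each series' label values out to the family's superset before constructing the metric, exactly as in the manual recommendation above:

package main

import "fmt"

// padLabelValues returns the label values for one series in superset order,
// filling dimensions the series does not use with the empty label value "".
// The result can then be passed to prometheus.MustNewConstMetric.
func padLabelValues(superset []string, got map[string]string) []string {
	values := make([]string, len(superset))
	for i, name := range superset {
		values[i] = got[name] // missing keys yield ""
	}
	return values
}

func main() {
	superset := []string{"namespace", "service", "label_app", "label_tier"}
	fmt.Println(padLabelValues(superset, map[string]string{
		"namespace": "default", "service": "frontend", "label_app": "web",
	}))
	// Prints: [default frontend web ]
}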

@matthiasr commented

Could that just be the default?

@matthiasr commented

this is usually a mistake on the part of the user

It is not; having different services with different label sets is a completely valid thing to do, just as is having different containers with different label sets (google/cadvisor#1704). We should not be talking up the "great fit" and "flexibility" of the label/value system if we also enforce a single global set of labels for everything and everyone. In the end, if we don't find a way to make this easier, it will only become unattractive to use client_golang, which is exactly the opposite of what we want.

@brian-brazil commented

In metrics terms, varying labels is a major anti-pattern and I don't think we should make it any easier for users to do. One of the big reasons the old Java client got rewritten was that it allowed this.

If users want to do this, they should take on the complexity of it in their own code.

@beorn7 (Contributor) commented Aug 28, 2017

Could that just be the default?

I'd say the default should be to tell you that your labels are inconsistent.

mfournier pushed a commit to mfournier/cadvisor that referenced this pull request Sep 7, 2017
This is a workaround for google#1704, restoring 0.25.0 /metrics behaviour for
0.27.1.

This is fundamentally the same patch as:
kubernetes/kube-state-metrics#194
while1malloc0 pushed a commit to while1malloc0/kube-state-metrics that referenced this pull request Jul 2, 2018
prometheus client: disable metric consistency
Successfully merging this pull request may close these issues: kube_service_labels metric has inconsistent label dimensions (#192)