Avoid dupe labels in prom metrics #2194
Conversation
Since the number of containers on a single machine shouldn't be massive, this is probably fine for memory allocation.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here and we'll verify it.
What to do if you already signed the CLA: see the instructions for individual signers or corporate signers.
ℹ️ Googlers: Go here for more info.
Hi @blakebarnett. Thanks for your PR. I'm waiting for a google or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I signed it!
CLAs look good, thanks! ℹ️ Googlers: Go here for more info.
/ok-to-test
The following files are not properly formatted:
metrics/prometheus.go
sl := sanitizeLabelName(l)
for _, x := range labels {
	if sl != x {
		duplicate = true
I think this will stay permanently true, and we will skip all subsequent labels
oops, fixing
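For reference, a minimal sketch of the corrected check being discussed here; the helper name containsLabel is hypothetical and only for illustration (the PR keeps this logic inline in the collection loop):

```go
// containsLabel is a hypothetical helper illustrating the fixed comparison:
// a label counts as a duplicate only when its sanitized name equals one that
// was already collected, and we can stop scanning at the first match.
func containsLabel(labels []string, sanitized string) bool {
	for _, existing := range labels {
		if sanitized == existing {
			return true
		}
	}
	return false
}
```

With a helper like this, the duplicate flag from the diff would simply be containsLabel(labels, sl).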
metrics/prometheus.go
			break
		}
	}
	if duplicate != true {
s/duplicate != true/!duplicate
metrics/prometheus.go
@@ -1155,8 +1155,19 @@ func (c *PrometheusCollector) collectContainersInfo(ch chan<- prometheus.Metric)
	values := make([]string, 0, len(rawLabels))
	labels := make([]string, 0, len(rawLabels))
	containerLabels := c.containerLabelsFunc(cont)
	duplicate := false
Maybe just declare duplicate inside the for l := range rawLabels block? Then you don't need to reset it to false each loop.
Yeah, confused myself by doing this a different way before submitting the PR. Thanks :)
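A rough sketch of the restructuring suggested above, with simplified types for rawLabels and the sanitizer assumed purely for illustration; because duplicate is declared inside the outer loop, it starts out false on every iteration and never needs a manual reset:

```go
// dedupeLabels is a simplified, hypothetical version of the label-collection
// loop: duplicate is scoped to each iteration of the outer loop, so it does
// not have to be reset to false by hand.
func dedupeLabels(rawLabels map[string]string, sanitize func(string) string) []string {
	labels := make([]string, 0, len(rawLabels))
	for l := range rawLabels {
		duplicate := false
		sl := sanitize(l)
		for _, x := range labels {
			if sl == x {
				duplicate = true
				break
			}
		}
		if !duplicate {
			labels = append(labels, sl)
		}
	}
	return labels
}
```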
		}
	}
	if !duplicate {
		labels = append(labels, sl)
Can we end up with fewer labels than values? Should we move the values = append(values, containerLabels[l]) statement inside here as well?
I was wondering that after the test failure, but this shouldn't change that behavior, right? It will only exclude a label if a duplicate label already exists, and the value will still get set for that label.
I would be surprised if prometheus didn't yell at us if we tried to, for example, use a description with 3 labels, but then provide 4 label values when creating the metric. What you are implicitly doing here is just using the first occurrence of a given sanitized label, and ignoring subsequent ones.
Hmm, yeah I see. It could happen if someone provides multiple permutations of a label that all normalize to the same thing. Should we just throw them out in that case? I can't think of a great default behavior there.
^ on the same container, it should hopefully be a rare edge-case...
I think just picking the first is a fine behavior.
No, #2181 was node-wide. Two separate containers (in separate pods) with annotations that normalize to the same thing caused the panic.
Ok, I get the difference now. I think we would still always get more values than labels, since if the label isn't present, we still add it with the value "".
That's true. In fact, I noticed that when running without --store_container_labels=false, all labels present on any of the containers on the host showed up with empty values on all container metrics in Prometheus. That was what made me look into the cgroup whitelisting initially, and then I noticed this crash behavior.
Yeah... Prometheus requires that all metric streams in a given scrape have the same set of labels. So our workaround is just to add empty values for all labels we don't have.
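Tying the two points together, here is a hedged sketch (simplified types and a hypothetical function name, not the committed code) of a loop that keeps labels and values the same length by appending both inside the duplicate check, while a label the container lacks still contributes an empty value because indexing a Go map with an absent key yields the zero value "":

```go
// buildLabelPairs is a hypothetical illustration: for each raw label it keeps
// the first sanitized occurrence together with its value, so
// len(labels) == len(values) always holds. containerLabels[l] evaluates to ""
// when the container does not carry that label, which is how every metric
// stream ends up exposing the same label set.
func buildLabelPairs(rawLabels, containerLabels map[string]string, sanitize func(string) string) (labels, values []string) {
	labels = make([]string, 0, len(rawLabels))
	values = make([]string, 0, len(rawLabels))
	for l := range rawLabels {
		duplicate := false
		sl := sanitize(l)
		for _, x := range labels {
			if sl == x {
				duplicate = true
				break
			}
		}
		if !duplicate {
			labels = append(labels, sl)
			values = append(values, containerLabels[l]) // "" for labels this container lacks
		}
	}
	return labels, values
}
```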
We noticed something similar to this (#2183). This is without any changes and without this PR applied; I discovered it when validating this change.
Feel free to close if you are no longer working on this. IIRC, this re-introduces #1704 in its current form. Let me know if that is not the case.
Sorry, lost track of this one. This should be fine to go in. The issue I mentioned above about ...
lgtm
Fixes #2181