Prometheus metrics: Use ServiceMonitor instead of deprecated annotation mechanism #2290
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.
What to do if you already signed the CLA: Individual signers · Corporate signers
ℹ️ Googlers: Go here for more info.
Build Failed 😱 Build Id: 16edd22f-321f-4009-81f1-54bd46fae67b To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Failed 😱 Build Id: 83ec6eb0-5533-494a-a4f7-cf126be1fb34 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
To get past testing, you will need to run
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.
Build Failed 😱 Build Id: cbfc03c4-d1a3-45a9-b18a-4b58534c960d To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Failed 😱 Build Id: fde7d232-06ba-4cff-860f-524fbac1d56c To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Failed 😱 Build Id: 74a24e33-269a-4bd5-8363-55bc861d7213 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Failed 😱 Build Id: c3e79aa5-308b-4c25-99cb-7b5a60e9ba31 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Failed 😱 Build Id: 0e9ec9c2-b528-46bb-9180-df9b0cbfa27c To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Failed 😱 Build Id: 373ad062-67ad-400d-ad61-f59d0675a2c9 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
@markmandel can you help me?
I'm not sure whether it is connected with my changes, because I see the same failure in this PR: #2288
I've rolled back the helm install. It should be ready for e2e testing again.
Build Failed 😱 Build Id: 3e7d5be4-f175-4e35-adc9-c6aa8a5aa6a6 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Build Succeeded 👏 Build Id: 39ddf0b1-a491-4189-aaf8-86598600d2dc The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Oh, the tests are definitely unstable :(
Hi @aLekSer @roberthbailey!
Hi @zifter - I (and maybe others) was waiting until after the feature freeze (which ends with the release cut scheduled for today) to review new PRs. So you should get some feedback in the next day or two.
There are some problems in the CI build which are not connected with my changes.
Build Succeeded 👏 Build Id: 6d5cb97d-5cf2-4ed8-a942-e4e5ff5615ca The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
install/helm/agones/values.yaml (outdated)
@@ -163,6 +166,13 @@ agones:
      port: 443
      portName: grpc
      targetPort: 8443
    serviceInternal:
nit: I wonder if this name should convey that this is used for scraping metrics? "internal service" is a bit vague, and might be confused as a way to allocate game servers from within the cluster.
I don't want to tie the naming of that service to metrics. Yes, at the moment it is used only for metrics scraping, but in the future I suppose it will be used for other features too. In that case the name would be confusing: it would require a rename, which becomes a backward-compatibility problem.
I'd like to make it the opposite of the service that is available outside the cluster (the service section).
But, of course, I don't insist.
How would you prefer to name it?
From what I can tell, the allocator service doesn't expose much on port 8080 - just the health handlers (for liveness / readiness probes) and the metrics endpoint. So you can't actually use this service to do anything other than scrape metrics -- you can't, for instance, call this service to allocate a game server. So I don't see a problem making it obvious that this is strictly an internal metric gathering service.
One other question that occurred to me as I was thinking about this - should prometheus be scraping all allocator pods instead of using a service to pull metrics from one pod at a time (and likely different pods each time a new request is made)? If there was only one pod behind the service then using a service gives a stable name to find the pod, but when there are multiple pods and each one will have different stats, it seems like we should pull from all of them to get things like total aggregated allocations (which is the sum of allocations from all pods in the deployment).
Agree!
I will rename it to serviceMetrics, ok?
The ServiceMonitor will scrape metrics from all pods that are discovered via the Service's label selector.
So don't worry, all replicas of the allocator/controller will be scraped :)
For more information on how this works, refer to this doc:
https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#servicemonitor
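To illustrate the mechanism being discussed, here is a minimal sketch of a ServiceMonitor that targets a metrics Service by label. The metadata names, label key, and port name below are assumptions for illustration, not the exact values used in the Agones chart:

```yaml
# Sketch of a ServiceMonitor for the allocator metrics service.
# prometheus-operator discovers all Services matching spec.selector,
# then scrapes every pod behind each matched Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: agones-allocator-metrics   # hypothetical name
  namespace: agones-system
spec:
  selector:
    matchLabels:
      agones.dev/role: allocator   # assumed label on the metrics Service
  endpoints:
    - port: http                   # assumed port name on the Service
      path: /metrics
      interval: 30s
```

Because selection happens on the Service's labels and scraping happens per pod endpoint, every replica behind the Service is scraped individually rather than one pod at a time through the Service's virtual IP.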
I was going to update your branch, but it looks like you need to resolve a conflict.
Build Succeeded 👏 Build Id: 2d2fb5be-0829-4076-9972-8f50674de3c3 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
fix some review notes
Build Succeeded 👏 Build Id: 5089ea86-61ae-4224-94c2-9d45cb43459d The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Build Succeeded 👏 Build Id: 57cc7617-30a8-4652-a135-7b36f43f547b The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
@@ -63,7 +63,54 @@ spec:
{{ toYaml .Values.agones.allocator.service.loadBalancerSourceRanges | indent 4 }}
{{- end }}
{{- end }}
{{- if .Values.agones.allocator.serviceInternal.enabled }}
I don't see .Values.agones.allocator.serviceInternal.name in the values file (there is an http.enabled though). I'm wondering if this should be on or off by default. There isn't much overhead to having an unused internal service in k8s (no new cloud resources need to be created), so I'm ok with leaving it on by default.
Yes, that was an untested change left over from the previous review notes, sorry.
I agree with your point, and I think it's better to remove the enabled flag for that service altogether.
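Putting the two review outcomes together (rename serviceInternal to serviceMetrics, drop the enabled flag), the allocator values might end up looking something like the sketch below. The field names and port numbers are assumptions based on this discussion, not the final chart contents:

```yaml
# Illustrative values.yaml fragment: a metrics-only service that is
# always created (no enabled flag), named so its purpose is obvious.
agones:
  allocator:
    serviceMetrics:                              # renamed from serviceInternal
      name: agones-allocator-metrics-service     # hypothetical name
      annotations: {}
      http:
        port: 8080          # assumed: health handlers + /metrics endpoint
        portName: http
        targetPort: 8080
```

Removing the flag keeps the chart simpler; since the service creates no cloud resources, leaving it always on costs essentially nothing.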
Just a few more comments - this is getting close!
Build Failed 😱 Build Id: f06e5e98-ec4f-4dc4-a124-a2136dc369cc To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Review notes fixed
Build Succeeded 👏 Build Id: 1ab3d785-48fd-4600-bfaa-e6a74d8f1dba The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Thanks for sticking with this change over the many review cycles!
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: roberthbailey, zifter. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Build Succeeded 👏 Build Id: be1a445e-d671-4f13-90b7-a4d2780d61b5 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
What type of PR is this?
/kind feature
What this PR does / Why we need it:
It will allow collecting Prometheus metrics with prometheus-operator out of the box.
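For context, the deprecated mechanism this PR replaces relies on well-known scrape-hint annotations, which only work if the Prometheus scrape config is written to interpret them. A sketch of that older style follows; the exact annotations and port in the old chart may differ:

```yaml
# Deprecated annotation-based discovery: these annotations are a
# convention, not an API, and require a matching kubernetes_sd_configs
# relabeling setup in prometheus.yml. Values shown are illustrative.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
```

A ServiceMonitor moves this configuration into a typed, operator-managed resource, so no custom scrape config is needed when running prometheus-operator.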
Which issue(s) this PR fixes:
Closes #2262
Special notes for your reviewer: