Prometheus metrics: Use ServiceMonitor instead of deprecated annotation mechanism #2290

Merged (11 commits) on Oct 20, 2021
24 changes: 23 additions & 1 deletion build/Makefile
@@ -319,10 +319,11 @@ install: CRD_CLEANUP := true
install: LOG_LEVEL := "debug"
install: EXTERNAL_IP ?= $(shell $(DOCKER_RUN) kubectl get services agones-allocator -n agones-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
install: FEATURE_GATES ?= $(ALPHA_FEATURE_GATES)
install: SERVICE_MONITOR := false
install: HELM_ARGS ?=
install: $(ensure-build-image) install-custom-pull-secret
$(DOCKER_RUN) \
-	helm upgrade --install --atomic --namespace=agones-system \
+	helm upgrade --install --atomic --wait --namespace=agones-system \
--create-namespace \
--set agones.image.tag=$(VERSION),agones.image.registry=$(REGISTRY) \
--set agones.image.controller.pullPolicy=$(IMAGE_PULL_POLICY),agones.image.sdk.alwaysPull=$(ALWAYS_PULL_SIDECAR) \
@@ -333,6 +334,7 @@ install: $(ensure-build-image) install-custom-pull-secret
--set agones.crds.cleanupOnDelete=$(CRD_CLEANUP) \
--set agones.featureGates=$(FEATURE_GATES) \
--set agones.allocator.service.loadBalancerIP=$(EXTERNAL_IP) \
--set agones.metrics.serviceMonitor.enabled=$(SERVICE_MONITOR) \
$(HELM_ARGS) \
agones $(mount_path)/install/helm/agones/

@@ -553,6 +555,26 @@ uninstall-grafana: $(ensure-build-image)
$(DOCKER_RUN) helm uninstall grafana --namespace=metrics
$(DOCKER_RUN) kubectl delete -f $(mount_path)/build/grafana/

# Setup prometheus-stack in the current cluster. By default, Persistent Volume Claims are requested.
setup-prometheus-stack: PVC ?= true
setup-prometheus-stack: PV_SIZE ?= 64Gi
setup-prometheus-stack: SCRAPE_INTERVAL=30s
setup-prometheus-stack: PASSWORD ?= admin
setup-prometheus-stack:
$(DOCKER_RUN) helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$(DOCKER_RUN) helm repo update
$(DOCKER_RUN) helm upgrade prometheus-stack prometheus-community/kube-prometheus-stack --install --wait --version 19.0.2 \
--namespace monitoring --create-namespace \
--set prometheus.server.global.scrape_interval=$(SCRAPE_INTERVAL) \
--set prometheus.server.persistentVolume.enabled=$(PVC) \
--set prometheus.server.persistentVolume.size=$(PV_SIZE) \
--set grafana.adminPassword=$(PASSWORD) \
$(HELM_ARGS) -f $(mount_path)/build/prometheus-stack.yaml

uninstall-prometheus-stack: $(ensure-build-image)
$(DOCKER_RUN) \
helm uninstall prometheus-stack --namespace=monitoring

helm-repo-update:
$(DOCKER_RUN) helm repo update

31 changes: 27 additions & 4 deletions build/README.md
@@ -46,6 +46,7 @@ Table of Contents
* [make test-e2e-failure](#make-test-e2e-failure)
* [make setup-prometheus](#make-setup-prometheus)
* [make setup-grafana](#make-setup-grafana)
* [make setup-prometheus-stack](#make-setup-prometheus-stack)
* [make prometheus-portforward](#make-prometheus-portforward)
* [make grafana-portforward](#make-grafana-portforward)
* [make controller-portforward](#make-controller-portforward)
@@ -76,6 +77,7 @@ Table of Contents
* [make minikube-install](#make-minikube-install)
* [make minikube-setup-prometheus](#make-minikube-setup-prometheus)
* [make minikube-setup-grafana](#make-minikube-setup-grafana)
* [make minikube-setup-prometheus-stack](#make-minikube-setup-prometheus-stack)
* [make minikube-prometheus-portforward](#make-minikube-prometheus-portforward)
* [make minikube-grafana-portforward](#make-minikube-grafana-portforward)
* [make minikube-test-e2e](#make-minikube-test-e2e)
@@ -88,6 +90,7 @@ Table of Contents
* [make kind-install](#make-kind-install)
* [make kind-setup-prometheus](#make-kind-setup-prometheus)
* [make kind-setup-grafana](#make-kind-setup-grafana)
* [make kind-setup-prometheus-stack](#make-kind-setup-prometheus-stack)
* [make kind-prometheus-portforward](#make-kind-prometheus-portforward)
* [make kind-grafana-portforward](#make-kind-grafana-portforward)
* [make kind-test-e2e](#make-kind-test-e2e)
@@ -485,7 +488,7 @@ Run controller failure portion of the end-to-end tests.

#### `make setup-prometheus`

Install Prometheus server using [Prometheus Community](https://prometheus-community.github.io/helm-charts)
chart into the current cluster.

By default, all exporters and Alertmanager are disabled.
@@ -500,13 +503,23 @@ Run helm repo update to get the most recent charts.

#### `make setup-grafana`

Install Grafana server using [grafana community](https://grafana.github.io/helm-charts) chart into
the current cluster and setup [Agones dashboards with Prometheus datasource](./grafana/).

You can set your own password using the `PASSWORD` environment variable.

See [`make minikube-setup-grafana`](#make-minikube-setup-grafana) and [`make kind-setup-grafana`](#make-kind-setup-grafana) to run the installation on Minikube or Kind.

#### `make setup-prometheus-stack`

Install the Prometheus stack using the [Prometheus Community](https://prometheus-community.github.io/helm-charts) `kube-prometheus-stack` chart into the current cluster.

By default, only Prometheus and Grafana are installed; all exporters and Alertmanager are disabled.

You can use this to collect Agones [metrics](../site/content/en/docs/Guides/metrics.md) using a ServiceMonitor.

See [`make minikube-setup-prometheus-stack`](#make-minikube-setup-prometheus-stack) and [`make kind-setup-prometheus-stack`](#make-kind-setup-prometheus-stack) to run the installation on Minikube or Kind.
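
For illustration, a typical invocation might look like the sketch below; the variables come from the Makefile target above, and the password value is just a placeholder.

```bash
# Install kube-prometheus-stack into the "monitoring" namespace of the current
# cluster, with persistent volumes and a custom Grafana admin password.
make setup-prometheus-stack PASSWORD=supersecret PV_SIZE=64Gi

# Remove it again when finished.
make uninstall-prometheus-stack
```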

#### `make prometheus-portforward`

Sets up port forwarding to the
@@ -671,6 +684,11 @@ Use this instead of `make setup-prometheus`, as it disables Persistent Volume Claim.
Installs grafana into the Kubernetes cluster.
Use this instead of `make setup-grafana`, as it disables Persistent Volume Claim.

#### `make minikube-setup-prometheus-stack`

Installs prometheus-stack into the Kubernetes cluster.
Use this instead of `make setup-prometheus-stack`, as it disables Persistent Volume Claim.

#### `make minikube-prometheus-portforward`

The minikube version of [`make prometheus-portforward`](#make-prometheus-portforward) to setup
@@ -722,14 +740,19 @@ Use this instead of `make setup-prometheus`, as it disables Persistent Volume Claim.
Installs grafana into the Kubernetes cluster.
Use this instead of `make setup-grafana`, as it disables Persistent Volume Claim.

#### `make kind-setup-prometheus-stack`

Installs prometheus-stack into the Kubernetes cluster.
Use this instead of `make setup-prometheus-stack`, as it disables Persistent Volume Claim.

#### `make kind-prometheus-portforward`

-The minikube version of [`make prometheus-portforward`](#make-prometheus-portforward) to setup
+The kind version of [`make prometheus-portforward`](#make-prometheus-portforward) to setup
port forwarding to the prometheus deployment.

#### `make kind-grafana-portforward`

-The minikube version of [`make grafana-portforward`](#make-grafana-portforward) to setup
+The kind version of [`make grafana-portforward`](#make-grafana-portforward) to setup
port forwarding to the grafana deployment.


4 changes: 4 additions & 0 deletions build/includes/kind.mk
@@ -66,6 +66,10 @@ kind-setup-prometheus:
kind-setup-grafana:
$(MAKE) setup-grafana DOCKER_RUN_ARGS="--network=host" PVC=false

kind-setup-prometheus-stack:
$(MAKE) setup-prometheus-stack DOCKER_RUN_ARGS="--network=host" PVC=false \
HELM_ARGS="--set prometheus.server.resources.requests.cpu=0,prometheus.server.resources.requests.memory=0"

# kind port forwarding controller web
kind-controller-portforward:
$(MAKE) controller-portforward DOCKER_RUN_ARGS="--network=host"
8 changes: 7 additions & 1 deletion build/includes/minikube.mk
@@ -67,13 +67,19 @@ minikube-setup-prometheus:
DOCKER_RUN_ARGS="--network=host -v $(minikube_cert_mount)" \
PVC=false HELM_ARGS="--set server.resources.requests.cpu=0,server.resources.requests.memory=0"


# grafana on minikube with dashboards and prometheus datasource installed.
# we have to disable PVC as it's not supported on minikube.
minikube-setup-grafana:
$(MAKE) setup-grafana \
DOCKER_RUN_ARGS="--network=host -v $(minikube_cert_mount)"

# prometheus-stack on minikube
# we have to disable PVC as it's not supported on minikube.
minikube-setup-prometheus-stack:
$(MAKE) setup-prometheus-stack \
DOCKER_RUN_ARGS="--network=host -v $(minikube_cert_mount)" \
PVC=false HELM_ARGS="--set prometheus.server.resources.requests.cpu=0,prometheus.server.resources.requests.memory=0"

# minikube port forwarding
minikube-controller-portforward:
$(MAKE) controller-portforward DOCKER_RUN_ARGS="--network=host -v $(minikube_cert_mount)"
37 changes: 37 additions & 0 deletions build/prometheus-stack.yaml
@@ -0,0 +1,37 @@
# disable all chart components that are not necessary
defaultRules:
enabled: false
alertmanager:
enabled: false
kubeApiServer:
enabled: false
kubelet:
enabled: false
kubeControllerManager:
enabled: false
coreDns:
enabled: false
kubeDns:
enabled: false
kubeEtcd:
enabled: false
kubeScheduler:
enabled: false
kubeProxy:
enabled: false
kubeStateMetrics:
enabled: false
nodeExporter:
enabled: false

prometheusOperator:
enabled: true
prometheus:
enabled: true
prometheusSpec:
  # allow discovery of third-party ServiceMonitors
serviceMonitorSelectorNilUsesHelmValues: false
grafana:
enabled: true
adminUser: admin
adminPassword: admin
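
Because `serviceMonitorSelectorNilUsesHelmValues` is set to `false`, the resulting Prometheus object is given an empty `serviceMonitorSelector`, so it also picks up ServiceMonitors created outside the kube-prometheus-stack release, such as the Agones ones added in this PR. A rough way to confirm this after installation (assuming the stack is installed into the `monitoring` namespace, as in the Makefile target above):

```bash
# An empty selector ({}) means Prometheus watches ServiceMonitors from any release.
kubectl -n monitoring get prometheus -o jsonpath='{.items[0].spec.serviceMonitorSelector}'

# Once Agones is installed with agones.metrics.serviceMonitor.enabled=true,
# its monitors should show up in the agones-system namespace:
kubectl -n agones-system get servicemonitors
```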
9 changes: 7 additions & 2 deletions install/helm/agones/templates/controller.yaml
@@ -42,7 +42,7 @@ spec:
{{- end }}
{{- if and (.Values.agones.metrics.prometheusServiceDiscovery) (.Values.agones.metrics.prometheusEnabled) }}
prometheus.io/scrape: "true"
-        prometheus.io/port: {{ .Values.agones.controller.http.port | quote }}
+        prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
{{- end }}
{{- if .Values.agones.controller.annotations }}
@@ -129,10 +129,15 @@ spec:
fieldPath: metadata.namespace
- name: CONTAINER_NAME
value: "agones-controller"
ports:
- name: webhooks
containerPort: 8081
- name: http
containerPort: 8080
livenessProbe:
httpGet:
path: /live
-            port: {{ .Values.agones.controller.http.port }}
+            port: http
initialDelaySeconds: {{ .Values.agones.controller.healthCheck.initialDelaySeconds }}
periodSeconds: {{ .Values.agones.controller.healthCheck.periodSeconds }}
failureThreshold: {{ .Values.agones.controller.healthCheck.failureThreshold }}
56 changes: 56 additions & 0 deletions install/helm/agones/templates/service-monitor.yaml
@@ -0,0 +1,56 @@
{{- if and (.Values.agones.metrics.prometheusEnabled) (.Values.agones.metrics.serviceMonitor.enabled) }}
# Copyright 2018 Google LLC All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: agones-controller-monitor
namespace: {{ .Release.Namespace }}
labels:
agones.dev/role: controller
app: {{ template "agones.name" . }}
chart: {{ template "agones.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
selector:
matchLabels:
agones.dev/role: controller
endpoints:
- port: web
path: /metrics
interval: {{ .Values.agones.metrics.serviceMonitor.interval }}
relabelings:
- sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: "true"
- sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
targetLabel: __metrics_path__
regex: (.+)
- sourceLabels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
targetLabel: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- sourceLabels: [__meta_kubernetes_namespace]
action: replace
targetLabel: kubernetes_namespace
- sourceLabels: [__meta_kubernetes_pod_name]
action: replace
targetLabel: kubernetes_pod_name
{{- end }}
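
A quick smoke test for this ServiceMonitor might look like the sketch below. It assumes Agones is installed into `agones-system` with `agones.metrics.serviceMonitor.enabled=true`, and uses the `prometheus-operated` Service created by the Prometheus Operator for port-forwarding; adjust the names to your setup.

```bash
# Confirm the ServiceMonitor was rendered and created.
kubectl -n agones-system get servicemonitor agones-controller-monitor

# Check that the controller shows up as an active scrape target in Prometheus.
kubectl -n monitoring port-forward svc/prometheus-operated 9090 &
sleep 2
curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep -o 'agones-controller[^"]*' | sort -u
```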
5 changes: 3 additions & 2 deletions install/helm/agones/templates/service.yaml
@@ -29,6 +29,7 @@ spec:
ports:
- name: webhooks
port: 443
-      targetPort: 8081
+      targetPort: webhooks
- name: web
-      port: {{ .Values.agones.controller.http.port }}
+      port: {{ .Values.agones.controller.http.port }}
+      targetPort: http
49 changes: 48 additions & 1 deletion install/helm/agones/templates/service/allocation.yaml
@@ -63,7 +63,54 @@ spec:
{{ toYaml .Values.agones.allocator.service.loadBalancerSourceRanges | indent 4 }}
{{- end }}
{{- end }}

{{- if .Values.agones.allocator.serviceInternal.enabled }}
roberthbailey (Member):
I don't see .Values.agones.allocator.serviceInternal.name in the values file (there is an http.enabled though). I'm wondering if this should be on or off by default. There isn't much overhead to having an unused internal service in k8s (no new cloud resources need to be created), so I'm ok with leaving it on by default.

Contributor (Author):
Yes, that was an untested change made after the previous review notes, sorry. I agree with your point, and I think it's better to remove the enabled flag for that service entirely.

---
apiVersion: v1
kind: Service
metadata:
name: {{ .Values.agones.allocator.serviceInternal.name }}
namespace: {{ .Release.Namespace }}
labels:
multicluster.agones.dev/role: allocator
app: {{ template "agones.name" . }}
chart: {{ template "agones.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
{{- if .Values.agones.allocator.serviceInternal.annotations }}
annotations:
{{ toYaml .Values.agones.allocator.serviceInternal.annotations | indent 4 }}
{{- end }}
spec:
selector:
multicluster.agones.dev/role: allocator
ports:
- port: {{ .Values.agones.allocator.serviceInternal.http.port }}
name: {{ .Values.agones.allocator.serviceInternal.http.portName }}
targetPort: 8080
protocol: TCP
{{- end }}
{{- if and (.Values.agones.metrics.prometheusEnabled) (.Values.agones.metrics.serviceMonitor.enabled) }}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: agones-allocator-monitor
namespace: {{ .Release.Namespace }}
labels:
multicluster.agones.dev/role: allocator
app: {{ template "agones.name" . }}
chart: {{ template "agones.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
selector:
matchLabels:
multicluster.agones.dev/role: allocator
endpoints:
- port: {{ .Values.agones.allocator.serviceInternal.http.portName }}
path: /metrics
interval: {{ .Values.agones.metrics.serviceMonitor.interval }}
{{- end }}
---
# Deploy pods to run the agones-allocator code
apiVersion: apps/v1
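
The new internal Service above exposes the allocator's HTTP port (8080) inside the cluster so that the ServiceMonitor defined next to it can scrape `/metrics` from the allocator pods. A rough spot-check, assuming the default service name from the values file (`agones-allocator-service`, which may be renamed in later revisions) and the `agones-system` namespace:

```bash
kubectl -n agones-system port-forward service/agones-allocator-service 8080 &
sleep 2
# Only health and metrics endpoints are served on this port; /metrics should
# return the allocator's Prometheus metrics.
curl -s http://localhost:8080/metrics | head -n 20
```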
10 changes: 10 additions & 0 deletions install/helm/agones/values.yaml
@@ -22,6 +22,9 @@ agones:
stackdriverEnabled: false
stackdriverProjectID: ""
stackdriverLabels: ""
serviceMonitor:
enabled: false
interval: 30s
rbacEnabled: true
registerServiceAccounts: true
registerWebhooks: true
@@ -164,6 +167,13 @@ agones:
port: 443
portName: grpc
targetPort: 8443
serviceInternal:
roberthbailey (Member):
nit: I wonder if this name should convey that it is used for scraping metrics? "Internal service" is a bit vague, and might be confused with a way to allocate game servers from within the cluster.

Contributor (Author):
I'd rather not tie the naming of that service to metrics. Yes, at the moment it is only used for metrics scraping, but in the future I suppose it will be used for other features too; in that case the name would be confusing, require a change, and become a backward-compatibility problem. I'd like it to be the counterpart of the service that is available outside the cluster (the service section). But of course I don't insist - how would you prefer to name it?

roberthbailey (Member):
From what I can tell, the allocator service doesn't expose much on port 8080 - just the health handlers (for liveness / readiness probes) and the metrics endpoint. So you can't actually use this service to do anything other than scrape metrics -- you can't, for instance, call this service to allocate a game server. So I don't see a problem making it obvious that this is strictly an internal metric-gathering service.

One other question that occurred to me as I was thinking about this - should Prometheus be scraping all allocator pods, instead of using a service to pull metrics from one pod at a time (and likely a different pod each time a new request is made)? If there were only one pod behind the service then the service gives a stable name to find the pod, but when there are multiple pods and each one has different stats, it seems like we should pull from all of them to get things like total aggregated allocations (which is the sum of allocations from all pods in the deployment).

Contributor (Author):
Agreed! I will rename it to serviceMetrics, OK?

The ServiceMonitor will scrape metrics from all pods that are discovered by the Service's labels, so don't worry - all replicas of the allocator/controller will be scraped :) For more information on how this works, refer to https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#servicemonitor

name: agones-allocator-service
annotations: {}
http:
enabled: true
port: 8080
portName: http
disableSecretCreation: false
generateTLS: true
generateClientTLS: true
Expand Down