Added the docs for all the grafana dashboards. #21795
base: main
Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement Learn more about why HashiCorp requires a CLA and what the CLA includes 1 out of 2 committers have signed the CLA.
Lorin Lorin Kaygalak seems not to be a GitHub user. Have you signed the CLA already but the status is still pending? Recheck it.
@YasminLorinKaygalak Here is a preliminary review that outlines the repeated problems to correct.
For each of the reference pages, please implement the following three changes to each of the metrics and their descriptions:
- Sentence case in headings
- Line break between heading and list
- `Grafana query` instead of `Metric`
Then, remove the colons from the headings and ensure that there are sentences between each heading. Feel free to use the suggestions in this review as templates for each of the pages.
Don't worry about rewriting all of the descriptions at this time. Let's get these repeated formatting issues fixed first!
website/content/docs/connect/observability/grafanadashboards/consuldataplanedashboard.mdx
website/content/docs/connect/observability/grafanadashboards/index.mdx
# Consul DataPlane Dashboard

The **Consul DataPlane Dashboard** provides a comprehensive view of the service health, performance, and resource utilization within the Consul service mesh. It enables operators to monitor key metrics at both the cluster and service levels, helping ensure service reliability and performance.

Suggested change:
- The **Consul DataPlane Dashboard** provides a comprehensive view of the service health, performance, and resource utilization within the Consul service mesh. It enables operators to monitor key metrics at both the cluster and service levels, helping ensure service reliability and performance.
+ This page provides reference information about the Grafana dashboard configuration included in [this GitHub repository](https://github.com/YasminLorinKaygalak/GrafanaDemo/tree/main). The Consul dataplane dashboard provides a comprehensive view of the service health, performance, and resource utilization within the Consul service mesh.
+ You can monitor key metrics at both the cluster and service levels with this dashboard. It can help you ensure service reliability and performance.
Besides including a link to what I think is the repo this doc references, these suggestions:
- Make style guide edits for capitalization and formatting
- Follow our desired formatting for the beginning of reference pages
- Speak directly to the reader ("you") instead of referring to them as an "operator" or "user"
Need to fix this link once the PR is complete for the dashboards.
website/content/docs/connect/observability/grafanadashboards/consuldataplanedashboard.mdx
## Enabling Observability

The following script is the configuration needed to enable the observability tools.

Suggested change:
- The following script is the configuration needed to enable the observability tools.
+ Add the following configurations to your Consul Helm chart to enable the observability tools in [the sample repo](https://github.com/YasminLorinKaygalak/GrafanaDemo/tree/main).
<CodeTabs tabs={[ "Kubernetes YAML"]}>

Suggested change:
- <CodeTabs tabs={[ "Kubernetes YAML"]}>
Code tabs are unnecessary since there aren't other tabs. `<CodeBlockConfig>` could be used if you want to highlight specific lines in the example configuration.
For the configuration - are all of these values required?
One thing about these docs is that we can really only enable Prometheus in our Helm chart. To actually see the dashboards in Grafana, users need to deploy their own Grafana. I feel like that may be more of a tutorial thing? But we can certainly limit the example to the values that apply to enabling Prometheus.
See above comment about this section: https://github.com/hashicorp/consul/pull/21795/files#r1791854125
website/content/docs/connect/observability/grafanadashboards/consuldataplanedashboard.mdx
…onsuldataplanedashboard.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
…ndex.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
…onsuldataplanedashboard.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
…onsuldataplanedashboard.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
…onsuldataplanedashboard.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
- **Consul Server Dashboard**: Provides detailed monitoring of Consul servers, tracking key metrics like server health, CPU and memory usage, disk I/O, and network performance. This dashboard is critical for ensuring the stability and performance of Consul servers within the service mesh.

## Enabling Observability

Suggested change:
- ## Enabling Observability
+ ## Enabling Prometheus
+ The Helm chart provides configuration to enable a demo Prometheus server. https://developer.hashicorp.com/consul/docs/k8s/helm#prometheus
@boruszak I think the above may be all that we can say here? Basically we can install a demo Prometheus server, but it is really on the user to deploy Prometheus/Loki/Grafana and upload our dashboards into Grafana.
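For reference, here is a minimal sketch of how a Prometheus-only version of the section could look in the MDX page. It assumes the chart's `prometheus.enabled` value together with the `global.metrics` and `connectInject.metrics` values; confirm each key against the Helm chart reference linked above before relying on it. Grafana and Loki are deliberately left out because the chart does not deploy them.

````mdx
## Enabling Prometheus

Add the following configurations to your Consul Helm chart to enable the demo Prometheus server.

```yaml
# Sketch only: confirm each key against the Helm chart reference.
global:
  metrics:
    enabled: true            # expose metrics from the chart's components
    enableAgentMetrics: true # include Consul agent metrics
connectInject:
  metrics:
    defaultEnabled: true     # sidecar proxies expose metrics by default
prometheus:
  enabled: true              # installs a demo Prometheus server, not for production
```
````

Keeping the example to these values matches the point above: the page documents what the chart can enable, and deploying Grafana stays with the reader.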
Do we care about ordering these alphabetically in the sidebar? @boruszak
- **Consul service dashboard**: Tracks key metrics for Envoy proxies at the cluster and service levels, ensuring the performance and reliability of individual services within the mesh.

- **Consul dataPlane dashboard**: Offers a comprehensive overview of service health and performance, including request success rates, resource utilization (CPU and memory), active connections, and cluster health. It helps operators maintain service reliability and optimize resource usage.

Suggested change:
- - **Consul dataPlane dashboard**: Offers a comprehensive overview of service health and performance, including request success rates, resource utilization (CPU and memory), active connections, and cluster health. It helps operators maintain service reliability and optimize resource usage.
+ - **Consul dataplane dashboard**: Offers a comprehensive overview of service health and performance, including request success rates, resource utilization (CPU and memory), active connections, and cluster health. It helps operators maintain service reliability and optimize resource usage.
Also I don't know what the standard is for this, but is there a way to make these look a bit more readable?
- **Grafana query:** `sum(envoy_server_live{app=~"$service"})`
- **Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.

Suggested change:
- - **Grafana query:** `sum(envoy_server_live{app=~"$service"})`
- - **Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.
+ **Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.
+ <CodeBlockConfig heading="Grafana query">
+ ```
+ sum(envoy_server_live{app=~"$service"})
+ ```
+ </CodeBlockConfig>
To meet the request from @missylbytes to make the Grafana query easy to copy-and-paste, I'd suggest making these formatting changes to each of the sections:
- Remove unordered list
- Move Description above the Grafana query
- Use the `<CodeBlockConfig>` component with the heading set to "Grafana query" to render the code block
This code block should also specify the language to enable syntax highlighting.
<CodeBlockConfig heading="Grafana query" language="promql">
Alternatively, you can place `promql` directly after the ``` that signifies the start of the code block.
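Putting the two suggestions together, each metric section could look roughly like the sketch below. It uses the fence-level `promql` tag rather than the `language` prop, and it reuses the description and query from the suggestion above. Whether the docs platform highlights `promql` is an assumption to verify in a preview build.

````mdx
**Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.

<CodeBlockConfig heading="Grafana query">

```promql
sum(envoy_server_live{app=~"$service"})
```

</CodeBlockConfig>
````

The query sits alone in its own block, so it can be copied without picking up the surrounding list markup.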
You can monitor key metrics at both the cluster and service levels with this dashboard. It can help you ensure service reliability and performance.

![Preview of the Consul dataplane dashboard](../../../../public/img/grafana/consul-dataplane-dashboard.png)

Suggested change:
- ![Preview of the Consul dataplane dashboard](../../../../public/img/grafana/consul-dataplane-dashboard.png)
+ ![Preview of the Consul dataplane dashboard](/public/img/grafana/consul-dataplane-dashboard.png)
This should be an absolute path.
# Consul dataplane monitoring dashboard

This page provides reference information about the Grafana dashboard configuration included in [this GitHub repository](https://github.com/YasminLorinKaygalak/GrafanaDemo/tree/main). The Consul dataplane dashboard provides a comprehensive view of the service health, performance, and resource utilization within the Consul service mesh.
Can this be modified to reference code that exists in the Consul repo?
Yes, the PR just merged; we will update it.
layout: docs
page_title: Dashboard for Consul k8s control plane metrics
description: >-
This documentation provides an overview of the Consul K8s Dashboard

Suggested change:
- This documentation provides an overview of the Consul K8s Dashboard
+ This documentation provides an overview of the Consul Kubernetes Dashboard
This documentation provides an overview of the Consul K8s Dashboard
---

# Consul k8s monitoring (Control Plane) dashboard

Suggested change:
- # Consul k8s monitoring (Control Plane) dashboard
+ # Consul Kubernetes monitoring (Control Plane) dashboard
- **Grafana query:** `rate(container_cpu_usage_seconds_total{pod=~".*-connect-injector-.*",
container="sidecar-injector"}[5m])`
- **Description:** Tracks the CPU usage of the Connect Injector, which is responsible for injecting Envoy sidecars. Monitoring this helps ensure that Connect Injector has adequate CPU resources.

Suggested change:
- - **Description:** Tracks the CPU usage of the Connect Injector, which is responsible for injecting Envoy sidecars. Monitoring this helps ensure that Connect Injector has adequate CPU resources.
+ - **Description:** Tracks the CPU usage of the Connect Injector, which is responsible for injecting Envoy sidecars and other operations within the mesh. Monitoring this helps ensure that Connect Injector has adequate CPU resources.

The `connect-injector` process also acts as the controller for API Gateway.
### Transaction apply time

- **Grafana query:** `consul_txn_apply`
- **Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transactional workloads.

Suggested change:
- - **Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transactional workloads.
+ - **Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transaction operations.
### Catalog operation time

- **Grafana query:** `consul_catalog_register`, `consul_catalog_deregister`
- **Description:** Measures the time taken to complete catalog register or deregister operations. Spikes in this metric can indicate performance issues within the catalog.

Suggested change:
- - **Description:** Measures the time taken to complete catalog register or deregister operations. Spikes in this metric can indicate performance issues within the catalog.
+ - **Description:** Measures the time taken to complete catalog register or deregister operations.
Spikes in these values just mean that a large number of services were registered or deregistered. They do not necessarily mean that there is a performance issue.
### Total logs

- **Grafana query:** `sum(count_over_time(({container="consul-dataplane",namespace=~"$namespace"})[$__interval]))`
- **Description:** This metric counts the total number of log lines produced by Consul DataPlane containers. It provides an overview of the volume of logs being generated for a specific namespace.

Suggested change:
- - **Description:** This metric counts the total number of log lines produced by Consul DataPlane containers. It provides an overview of the volume of logs being generated for a specific namespace.
+ - **Description:** This metric counts the total number of log lines produced by Consul dataplane containers. It provides an overview of the volume of logs being generated for a specific namespace.
- `p50`: `histogram_quantile(0.50, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))`
- `p75`: `histogram_quantile(0.75, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))`
- `p90`: `histogram_quantile(0.90, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))`
- `p99.9`: `histogram_quantile(0.999, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))`
Try rendering this in a CodeTabs block. It might display a little better than multiple code stanzas in an unordered list.
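A sketch of that layout, reusing the `<CodeTabs>` syntax already present in this PR and the four percentile queries from the list above. It assumes each tab label maps to one fenced block in order, and that the `promql` fence tag is accepted inside tabs; neither has been checked against a preview build.

````mdx
<CodeTabs tabs={[ "p50", "p75", "p90", "p99.9" ]}>

```promql
histogram_quantile(0.50, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))
```

```promql
histogram_quantile(0.75, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))
```

```promql
histogram_quantile(0.90, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))
```

```promql
histogram_quantile(0.999, sum by(le) (rate(envoy_cluster_upstream_rq_time_bucket{kubernetes_namespace=~"$namespace", local_cluster=~"$service"}[5m])))
```

</CodeTabs>
````

The four queries differ only in the quantile argument, so tabs keep the page short while each query stays separately copyable.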
Description
NET-11158
Added the docs for the Grafana dashboards.
PR Checklist