# Added the docs for all the grafana dashboards. #21795

**Status:** Open. Wants to merge 33 commits into base: main.

## Commits
- `1f08879` Added the docs for all the grafana dashboards. (Sep 28, 2024)
- `0b0a4aa` Seperated the dashboards (YasminLorinKaygalak, Oct 3, 2024)
- `40e36df` Delete grafana-dashboards.mdx (YasminLorinKaygalak, Oct 3, 2024)
- `95af0be` Seperated the dashboards and added descriptions (YasminLorinKaygalak, Oct 3, 2024)
- `71bbf44` Seperated the dashboards and added descriptions (YasminLorinKaygalak, Oct 3, 2024)
- `3d9e426` Seperated the dashboards and added descriptions (YasminLorinKaygalak, Oct 3, 2024)
- `f4ec240` Seperated the dashboards and added descriptions (YasminLorinKaygalak, Oct 3, 2024)
- `1d10678` Typo edit (YasminLorinKaygalak, Oct 4, 2024)
- `5c695c8` added changelog for docs PR (YasminLorinKaygalak, Oct 4, 2024)
- `b3b54f9` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 7, 2024)
- `412521d` Update website/content/docs/connect/observability/grafanadashboards/i… (YasminLorinKaygalak, Oct 7, 2024)
- `7f9dcce` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 7, 2024)
- `50b62f4` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 7, 2024)
- `00bca69` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 7, 2024)
- `e0d401c` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 7, 2024)
- `c44a4a0` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 7, 2024)
- `0781b55` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 8, 2024)
- `e55756d` Testing the revised dashboard (YasminLorinKaygalak, Oct 9, 2024)
- `299a67c` Update website/content/docs/connect/observability/grafanadashboards/i… (YasminLorinKaygalak, Oct 9, 2024)
- `0fc0cf5` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 9, 2024)
- `f15688d` Update website/content/docs/connect/observability/grafanadashboards/c… (YasminLorinKaygalak, Oct 9, 2024)
- `d4fa4a0` Adding the PR revised docs (YasminLorinKaygalak, Oct 9, 2024)
- `18237bf` Adding the PR revised docs consul k8s (YasminLorinKaygalak, Oct 9, 2024)
- `c661a67` Adding the PR revised docs service dashboard (YasminLorinKaygalak, Oct 9, 2024)
- `f870516` Adding the PR revised docs consul server dashboard (YasminLorinKaygalak, Oct 9, 2024)
- `9417c1a` Adding the PR revised docs consul server dashboard insertions (YasminLorinKaygalak, Oct 9, 2024)
- `dbecc92` Adding the PR revised docs consul k8s docs edit (YasminLorinKaygalak, Oct 9, 2024)
- `ff03d17` Adding the PR revised docs service dashboard (YasminLorinKaygalak, Oct 9, 2024)
- `d8e2cbe` Added the final edits for the PR feedback (YasminLorinKaygalak, Oct 9, 2024)
- `d1cd934` Adding consul dataplane dashboard screenshoots (YasminLorinKaygalak, Oct 9, 2024)
- `6974e83` Minor edit in service to service dashboard (YasminLorinKaygalak, Oct 9, 2024)
- `eeffe3f` Minor edit in overview (YasminLorinKaygalak, Oct 9, 2024)
- `2284483` Minor edit in overview page (YasminLorinKaygalak, Oct 9, 2024)
3 changes: 3 additions & 0 deletions .changelog/21795.txt
@@ -0,0 +1,3 @@
```release-note:feature
docs: added the docs for the grafana dashboards
```
@@ -0,0 +1,91 @@
---
layout: docs
page_title: Dashboard for Consul dataplane metrics
description: >-
  This Grafana dashboard provides Consul dataplane metrics on Kubernetes deployments. Learn about the Grafana queries that produce the metrics and visualizations in this dashboard.
---

# Consul dataplane monitoring dashboard

This page provides reference information about the Grafana dashboard configuration included in [this GitHub repository](https://github.com/YasminLorinKaygalak/GrafanaDemo/tree/main). The Consul dataplane dashboard provides a comprehensive view of service health, performance, and resource utilization within the Consul service mesh.

**Member comment:**

Can this be modified to reference code that exists in the Consul repo?


**Contributor comment:**

Yes, the PR just merged; we will update it.


You can monitor key metrics at both the cluster and service levels with this dashboard. It can help you ensure service reliability and performance.

![Preview of the Consul dataplane dashboard](../../../../public/img/grafana/consul-dataplane-dashboard.png)

**Member comment:**

Suggested change
![Preview of the Consul dataplane dashboard](../../../../public/img/grafana/consul-dataplane-dashboard.png)
![Preview of the Consul dataplane dashboard](/public/img/grafana/consul-dataplane-dashboard.png)

This should be an absolute path.


## Consul dataplane metrics

The Consul dataplane dashboard provides the following information about service mesh operations.

### Live service count

- **Grafana query:** `sum(envoy_server_live{app=~"$service"})`
- **Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.

**Contributor comment on lines +22 to +23:**

Suggested change
- **Grafana query:** `sum(envoy_server_live{app=~"$service"})`
- **Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.
**Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.
<CodeBlockConfig heading="Grafana query">
```
sum(envoy_server_live{app=~"$service"})
```
</CodeBlockConfig>

To meet the request from @missylbytes to make the Grafana query easy to copy-and-paste, I'd suggest making these formatting changes to each of the sections:

  1. Remove unordered list
  2. Move Description above the Grafana query
  3. Use the component with the heading set to "Grafana query" to render the code block


**Member comment:**

This code block should also specify the language to enable syntax highlighting.

<CodeBlockConfig heading="Grafana query" language="promql">

Alternatively you can place the promql directly after the ``` that signifies the start of the code block.


### Total request success rate

- **Grafana query:** `sum(irate(envoy_cluster_upstream_rq_xx{...}[10m]))`
- **Description:** Tracks the percentage of successful requests across the service mesh. It excludes 4xx and 5xx response codes to focus on operational success. Use it to monitor the overall reliability of your services.

**Member comment:**

Suggested change
- **Description:** Tracks the percentage of successful requests across the service mesh. It excludes 4xx and 5xx response codes to focus on operational success. Use it to monitor the overall reliability of your services.
- **Description:** Tracks the percentage of successful requests across the service mesh. Use it to monitor the overall reliability of your services.

I think the mention of error response codes can be omitted because by definition a metric that tracks successful responses should only apply to successful HTTP response codes.
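
As an illustrative sketch only (the panel's actual selector is elided as `{...}` above), a success-rate expression could combine the standard `envoy_response_code_class` label with the `$service` variable used elsewhere on this page:

```promql
# Hypothetical example: ratio of non-4xx/5xx upstream requests to all upstream
# requests over a 10 minute window. The real dashboard query may differ.
sum(irate(envoy_cluster_upstream_rq_xx{envoy_response_code_class!~"4|5", app=~"$service"}[10m]))
/
sum(irate(envoy_cluster_upstream_rq_xx{app=~"$service"}[10m]))
```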


### Total failed requests

- **Grafana query:** `sum(increase(envoy_cluster_upstream_rq_xx{...}[10m]))`
- **Description:** This pie chart shows the total number of failed requests within the service mesh, categorized by service. It provides a visual breakdown of where failures are occurring, allowing operators to focus on problematic services.

**Member comment:**

Suggested change
- **Description:** This pie chart shows the total number of failed requests within the service mesh, categorized by service. It provides a visual breakdown of where failures are occurring, allowing operators to focus on problematic services.
- **Description:** This pie chart shows the total number of failed requests within the service mesh, categorized by service. It provides a visual breakdown of where failures are occurring, allowing operators to identify problematic services.
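
As an illustrative sketch only (the selector is elided as `{...}` above), a failed-request breakdown could reuse the label conventions of the other panels:

```promql
# Hypothetical example: 4xx/5xx upstream responses over a 10 minute window,
# broken down by service. The real dashboard query may differ.
sum by (app) (increase(envoy_cluster_upstream_rq_xx{envoy_response_code_class=~"4|5", app=~"$service"}[10m]))
```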


### Requests per second

- **Grafana query:** `sum(rate(envoy_http_downstream_rq_total{...}[5m]))`
- **Description:** This metric shows the rate of incoming HTTP requests per second to the selected services. It helps operators understand the current load on services and how much traffic they are processing.

**Member comment:**

Suggested change
- **Description:** This metric shows the rate of incoming HTTP requests per second to the selected services. It helps operators understand the current load on services and how much traffic they are processing.
- **Description:** This metric shows the rate of incoming HTTP requests per second to the selected services over a 5 minute period. It helps operators understand the current load on services and how much traffic they are processing.


**Member comment:**

Suggested change
- **Description:** This metric shows the rate of incoming HTTP requests per second to the selected services. It helps operators understand the current load on services and how much traffic they are processing.
- **Description:** This metric shows the rate of incoming HTTP requests per second to the selected services. It helps with understanding the current load on services and how much traffic they are processing.
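
As an illustrative sketch only (the panel's selector is elided as `{...}` above), the request-rate expression might look like the following, assuming the `$service` variable used by the other panels:

```promql
# Hypothetical example: per-second rate of downstream HTTP requests over a
# 5 minute window for the selected services.
sum(rate(envoy_http_downstream_rq_total{app=~"$service"}[5m]))
```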


### Unhealthy clusters

- **Grafana query:** `(sum(envoy_cluster_membership_healthy{...}) - sum(envoy_cluster_membership_total{...}))`
- **Description:** This metric tracks the number of unhealthy clusters in the mesh, helping operators identify services that are experiencing issues and need attention to ensure operational health.

**Member comment:**

Suggested change
- **Description:** This metric tracks the number of unhealthy clusters in the mesh, helping operators identify services that are experiencing issues and need attention to ensure operational health.
- **Description:** This metric tracks the number of unhealthy clusters in the mesh, helping to identify services that are experiencing issues and need attention to ensure operational health.


### Heap size

- **Grafana query:** `SUM(envoy_server_memory_heap_size{app=~"$service"})`
- **Description:** This metric displays the total memory heap size of the Envoy proxies. Monitoring heap size is essential to detect memory issues and ensure that services are operating efficiently.

**Member comment:**

Suggested change
- **Description:** This metric displays the total memory heap size of the Envoy proxies. Monitoring heap size is essential to detect memory issues and ensure that services are operating efficiently.
- **Description:** This metric displays the total memory heap size of the Envoy proxies.

I think this last part should be omitted because the heap size is really dependent on how the proxy is configured, the number of connections it is processing, etc. Proxies that are processing high volumes of traffic may have larger heap sizes than proxies that are fronting lower-traffic services. The mere presence of a large heap size alone is not indicative of a problem. Operators would need to evaluate other metrics to determine if the heap allocation is unusual given the traffic load and configuration profile of a given proxy.
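
As a hedged illustration of that point, heap size could be read alongside a traffic metric that already appears on this page, for example heap bytes per active downstream connection:

```promql
# Illustrative context query, not part of the dashboard: approximate heap bytes
# per active downstream connection for the selected services.
sum(envoy_server_memory_heap_size{app=~"$service"})
/
sum(envoy_http_downstream_cx_active{app=~"$service"})
```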


### Allocated memory

- **Grafana query:** `SUM(envoy_server_memory_allocated{app=~"$service"})`
- **Description:** This metric shows the amount of memory allocated by the Envoy proxies. It helps operators monitor the resource usage of services to prevent memory overuse and optimize performance.

**Member comment:**

Suggested change
- **Description:** This metric shows the amount of memory allocated by the Envoy proxies. It helps operators monitor the resource usage of services to prevent memory overuse and optimize performance.
- **Description:** This metric shows the amount of memory allocated by the Envoy proxies.


### Avg uptime per node

- **Grafana query:** `avg(envoy_server_uptime{app=~"$service"})`
- **Description:** This metric calculates the average uptime of Envoy proxies across all nodes. It helps operators monitor the stability of services and detect potential issues with service restarts or crashes.

**Member comment:**

Suggested change
- **Description:** This metric calculates the average uptime of Envoy proxies across all nodes. It helps operators monitor the stability of services and detect potential issues with service restarts or crashes.
- **Description:** This metric calculates the average uptime of Envoy proxies across all nodes. Use it to monitor the overall stability of services and detect potential issues with service restarts or crashes.


### Cluster state

- **Grafana query:** `(sum(envoy_cluster_membership_total{...}) - sum(envoy_cluster_membership_healthy{...})) == bool 0`
- **Description:** This metric indicates whether all clusters are healthy. It provides a quick overview of the cluster state to ensure that there are no issues affecting service performance.

**Member comment:**

Suggested change
- **Description:** This metric indicates whether all clusters are healthy. It provides a quick overview of the cluster state to ensure that there are no issues affecting service performance.
- **Description:** This metric indicates whether all clusters are healthy. It provides a quick overview of the cluster state to ensure that there are no issues affecting service performance.

As far as I can tell, this tracks the number of members in a given cluster. This number is going to vary per logical service based on the number of provisioned upstream instances. I don't know that it makes sense to track this other than to see how many service instances are online at a given time.
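
If the goal is simply to see how many instances of each logical service are online, an illustrative per-upstream breakdown could look like this, assuming the standard `envoy_cluster_name` label from Envoy's Prometheus stats:

```promql
# Hypothetical example: healthy member count per upstream cluster.
sum by (envoy_cluster_name) (envoy_cluster_membership_healthy{app=~"$service"})
```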


### CPU throttled seconds by namespace

- **Grafana query:** `rate(container_cpu_cfs_throttled_seconds_total{namespace=~"$namespace"}[5m])`
- **Description:** This metric tracks the number of seconds during which CPU usage was throttled. Monitoring CPU throttling helps operators identify when services are exceeding their allocated CPU limits and may need optimization.

### Memory usage by pod limits

- **Grafana query:** `100 * max(container_memory_working_set_bytes{namespace=~"$namespace"}
/ kube_pod_container_resource_limits{resource="memory"})`
- **Description:** This metric shows memory usage as a percentage of the memory limit set for each pod. It helps operators ensure that services are staying within their allocated memory limits to avoid performance degradation.

### CPU usage by pod limits

- **Grafana query:** `100 * max(container_cpu_usage{namespace=~"$namespace"} / kube_pod_container_resource_limits{resource="cpu"})`
- **Description:** This metric displays CPU usage as a percentage of the CPU limit set for each pod. Monitoring CPU usage helps operators optimize service performance and prevent CPU exhaustion.

### Total active upstream connections

- **Grafana query:** `sum(envoy_cluster_upstream_cx_active{app=~"$service"})`
- **Description:** This metric tracks the total number of active upstream connections to other services in the mesh. It provides insight into service dependencies and network load.

### Total active downstream connections

- **Grafana query:** `sum(envoy_http_downstream_cx_active{app=~"$service"})`
- **Description:** This metric tracks the total number of active downstream connections from services to clients. It helps operators monitor service load and ensure that services are able to handle the traffic effectively.

**Member comment:**

Suggested change
- **Description:** This metric tracks the total number of active downstream connections from services to clients. It helps operators monitor service load and ensure that services are able to handle the traffic effectively.
- **Description:** This metric tracks the total number of active downstream connections to a given service. It helps operators monitor service load and ensure that services are able to handle the traffic effectively.



@@ -0,0 +1,80 @@
---
layout: docs
page_title: Dashboard for Consul k8s control plane metrics
description: >-
This documentation provides an overview of the Consul K8s Dashboard

**Member comment:**

Suggested change
This documentation provides an overview of the Consul K8s Dashboard
This documentation provides an overview of the Consul Kubernetes Dashboard

---

# Consul k8s monitoring (Control Plane) dashboard

**Member comment:**

Suggested change
# Consul k8s monitoring (Control Plane) dashboard
# Consul Kubernetes monitoring (Control Plane) dashboard


### Number of Consul servers

- **Grafana query:** `consul_consul_server_0_consul_members_servers{pod="consul-server-0"}`
- **Description:** Displays the number of Consul servers currently active. This metric provides insight into the cluster's health and the number of Consul nodes running in the environment.

**Member comment on lines +12 to +13:**

Suggested change
- **Grafana query:** `consul_consul_server_0_consul_members_servers{pod="consul-server-0"}`
- **Description:** Displays the number of Consul servers currently active. This metric provides insight into the cluster's health and the number of Consul nodes running in the environment.
- **Grafana query:** `consul_consul_server_0_consul_members_servers{pod="consul-server-0"}`
- **Description:** Displays the number of Consul servers currently active. This metric provides insight into the cluster's health and the number of Consul nodes running in the environment.

This only retrieves the metric from the pod named consul-server-0. Is it possible to modify this so that the value is retrieved from any available server, or the Raft leader?
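
One possible direction, assuming every server pod exports the gauge under its own per-pod metric name, is to match on the metric name and aggregate so the panel does not depend on `consul-server-0` being up. This is a hypothetical sketch, not a confirmed query:

```promql
# Hypothetical: take the max across whichever server pods are currently reporting.
max({__name__=~"consul_consul_server_.*_consul_members_servers"})
```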


### Number of connected Consul dataplanes

- **Grafana query:** `count(consul_dataplane_envoy_connected)`
- **Description:** Tracks the number of connected Consul dataplanes. This metric helps operators understand how many Envoy sidecars are actively connected to the mesh.

### CPU usage in seconds (Consul servers)

- **Grafana query:** `rate(container_cpu_usage_seconds_total{container="consul", pod=~"consul-server-.*"}[5m])`
- **Description:** This metric shows the CPU usage of the Consul servers over time, helping operators monitor resource consumption.

**Member comment:**

Suggested change
- **Description:** This metric shows the CPU usage of the Consul servers over time, helping operators monitor resource consumption.
- **Description:** This metric shows the CPU usage of the Consul servers over time, helping monitor resource consumption.


### Memory usage (Consul servers)

- **Grafana query:** `container_memory_working_set_bytes{container="consul", pod=~"consul-server-.*"}`
- **Description:** Displays the memory usage of the Consul servers. This metric helps ensure that the servers have sufficient memory resources for proper operation.

### Disk read/write total per 5 minutes (Consul servers)

- **Grafana query:** `sum(rate(container_fs_writes_bytes_total{pod=~"consul-server-.*",
container="consul"}[5m])) by (pod, device)`
- **Grafana query:** `sum(rate(container_fs_reads_bytes_total{pod=~"consul-server-.*", container="consul"}[5m])) by (pod, device)`
- **Description:** Monitors disk read and write operations over 5-minute intervals for Consul servers. This helps identify potential disk bottlenecks or issues.

**Member comment:**

Suggested change
- **Description:** Monitors disk read and write operations over 5-minute intervals for Consul servers. This helps identify potential disk bottlenecks or issues.
- **Description:** Monitors disk read and write operations over 5-minute intervals for Consul servers. Use this metric to identify potential disk I/O bottlenecks or throughput issues.


### Received bytes total per 5 minutes (Consul servers)

- **Grafana query:** `sum(rate(container_network_receive_bytes_total{pod=~"consul-server-.*"}[5m])) by (pod)`
- **Description:** Tracks the total network bytes received by Consul servers within a 5-minute window. This metric helps assess the network load on Consul nodes.

### Memory limit (Consul servers)

- **Grafana query:** `kube_pod_container_resource_limits{resource="memory", pod="consul-server-0"}`
- **Description:** Displays the memory limit for Consul servers. This metric ensures that memory usage stays within the defined limits for each Consul server.

### CPU limit in seconds (Consul servers)

- **Grafana query:** `kube_pod_container_resource_limits{resource="cpu", pod="consul-server-0"}`
- **Description:** Displays the CPU limit for Consul servers. Monitoring CPU limits helps operators ensure that the services are not constrained by resource limitations.

### Disk usage (Consul servers)

- **Grafana query:** `sum(container_fs_usage_bytes{}) by (pod)`
- **Grafana query:** `sum(container_fs_usage_bytes{pod="consul-server-0"})`
- **Description:** Shows the amount of filesystem storage used by Consul servers. This metric helps operators track disk usage and plan for capacity.

### CPU usage in seconds (Connect injector)

- **Grafana query:** `rate(container_cpu_usage_seconds_total{pod=~".*-connect-injector-.*",
container="sidecar-injector"}[5m])`
- **Description:** Tracks the CPU usage of the Connect Injector, which is responsible for injecting Envoy sidecars. Monitoring this helps ensure that Connect Injector has adequate CPU resources.

**Member comment:**

Suggested change
- **Description:** Tracks the CPU usage of the Connect Injector, which is responsible for injecting Envoy sidecars. Monitoring this helps ensure that Connect Injector has adequate CPU resources.
- **Description:** Tracks the CPU usage of the Connect Injector, which is responsible for injecting Envoy sidecars and other operations within the mesh. Monitoring this helps ensure that Connect Injector has adequate CPU resources.

The connect-injector process also acts as the controller for API Gateway.


### CPU limit in seconds (Connect injector)

- **Grafana query:** `max(kube_pod_container_resource_limits{resource="cpu", container="sidecar-injector"})`
- **Description:** Displays the CPU limit for the Connect Injector. Monitoring the CPU limits ensures that Connect Injector is not constrained by resource limitations.

### Memory usage (Connect injector)

- **Grafana query:** `container_memory_working_set_bytes{pod=~".*-connect-injector-.*",
container="sidecar-injector"}`
- **Description:** Tracks the memory usage of the Connect Injector. Monitoring this helps ensure the Connect Injector has sufficient memory resources.

### Memory limit (Connect injector)

- **Grafana query:** `max(kube_pod_container_resource_limits{resource="memory", container="sidecar-injector"})`
- **Description:** Displays the memory limit for the Connect Injector, helping to monitor if the service is nearing its resource limits.


@@ -0,0 +1,96 @@
---
layout: docs
page_title: Dashboard for Consul server metrics
description: >-
This documentation provides an overview of the Consul Server Dashboard
---

# Consul server monitoring dashboard

### Raft commit time

- **Grafana query:** `consul_raft_commitTime`
- **Description:** This metric measures the time it takes to commit Raft log entries. Stable values are expected for a healthy cluster. High values can indicate issues with resources such as memory, CPU, or disk space.

### Raft commits per 5 minutes

- **Grafana query:** `rate(consul_raft_apply[5m])`
- **Description:** This metric tracks the rate of Raft log commits emitted by the leader, showing how quickly changes are being applied across the cluster.

### Last contacted leader

- **Grafana query:** `consul_raft_leader_lastContact != 0`
- **Description:** Measures the duration since the last contact with the Raft leader. Spikes in this metric can indicate network issues or an unavailable leader, which may affect cluster stability.

### Election events

- **Grafana query:** `rate(consul_raft_state_candidate[1m])`, `rate(consul_raft_state_leader[1m])`
- **Description:** Tracks Raft state transitions, indicating leadership elections. Frequent transitions might suggest cluster instability and require investigation.

### Autopilot health

- **Grafana query:** `consul_autopilot_healthy`
- **Description:** A boolean metric that shows a value of 1 when Autopilot is healthy and 0 when issues are detected. Use it to confirm that the cluster has sufficient resources and an operational leader.

### DNS queries per 5 minutes

- **Grafana query:** `rate(consul_dns_domain_query_count[5m])`
- **Description:** This metric tracks the rate of DNS queries per node, bucketed into 5-minute intervals. It helps monitor the query load on Consul’s DNS service.

### DNS domain query time

- **Grafana query:** `consul_dns_domain_query`
- **Description:** Measures the time spent handling DNS domain queries. Spikes in this metric may indicate high contention in the catalog or too many concurrent queries.

### DNS reverse query time

- **Grafana query:** `consul_dns_ptr_query`
- **Description:** Tracks the time spent processing reverse DNS queries. Spikes in query time may indicate performance bottlenecks or increased workload.

### KV applies per 5 minutes

- **Grafana query:** `rate(consul_kvs_apply_count[5m])`
- **Description:** This metric tracks the rate of Key-Value store applies over 5-minute intervals, indicating the operational load on Consul’s KV store.

### KV apply time

- **Grafana query:** `consul_kvs_apply`
- **Description:** Measures the time taken to apply updates to the Key-Value store. Spikes in this metric might suggest resource contention or client overload.

### Transaction apply time

- **Grafana query:** `consul_txn_apply`
- **Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transactional workloads.

**Member comment:**

Suggested change
- **Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transactional workloads.
- **Description:** Tracks the time spent applying transaction operations in Consul, providing insights into potential bottlenecks in transaction operations.


### ACL resolves per 5 minutes

- **Grafana query:** `rate(consul_acl_ResolveToken_count[5m])`
- **Description:** This metric tracks the rate of ACL token resolutions per 5-minute intervals. It provides insights into the activity related to ACL tokens within the cluster.

### ACL resolve token time

- **Grafana query:** `consul_acl_ResolveToken`
- **Description:** Measures the time taken to resolve ACL tokens into their associated policies. Spikes in this metric might indicate resource issues or configuration problems.

### ACL updates per 5 minutes

- **Grafana query:** `rate(consul_acl_apply_count[5m])`
- **Description:** Tracks the rate of ACL updates per 5-minute intervals. This metric helps monitor changes in ACL configurations over time.

### ACL apply time

- **Grafana query:** `consul_acl_apply`
- **Description:** Measures the time spent applying ACL changes. Spikes in apply time might suggest resource constraints or high operational load.

### Catalog operations per 5 minutes

- **Grafana query:** `rate(consul_catalog_register_count[5m])`, `rate(consul_catalog_deregister_count[5m])`
- **Description:** Tracks the rate of register and deregister operations in the Consul catalog, providing insights into the churn of services within the cluster.

### Catalog operation time

- **Grafana query:** `consul_catalog_register`, `consul_catalog_deregister`
- **Description:** Measures the time taken to complete catalog register or deregister operations. Spikes in this metric can indicate performance issues within the catalog.

**Member comment:**

Suggested change
- **Description:** Measures the time taken to complete catalog register or deregister operations. Spikes in this metric can indicate performance issues within the catalog.
- **Description:** Measures the time taken to complete catalog register or deregister operations.

Spikes in these values just mean that a large number of services were registered or deregistered. They do not necessarily mean that there is a performance issue.




@@ -0,0 +1,115 @@
---
layout: docs
page_title: Service Mesh Observability - Dashboards
description: >-
This documentation provides an overview of several dashboards designed for monitoring and managing services within a Consul-managed Envoy service mesh. Learn how to enable access logs and configure key performance and operational metrics to ensure the reliability and performance of services in the service mesh.
---

# Dashboards for service mesh observability

This topic describes the configuration and usage of dashboards for monitoring and managing services within a Consul-managed Envoy service mesh. These dashboards provide critical insights into the health, performance, and resource utilization of services. The dashboards described here are essential tools for ensuring the stability, efficiency, and reliability of your service mesh environment.

## Dashboards overview

The repository includes the following dashboards:

- **Consul service-to-service dashboard**: Provides a detailed view of service-to-service communications, monitoring key metrics like access logs, HTTP requests, error counts, response code distributions, and request success rates. The dashboard includes customizable filters for focusing on specific services and namespaces.

- **Consul service dashboard**: Tracks key metrics for Envoy proxies at the cluster and service levels, ensuring the performance and reliability of individual services within the mesh.

- **Consul dataPlane dashboard**: Offers a comprehensive overview of service health and performance, including request success rates, resource utilization (CPU and memory), active connections, and cluster health. It helps operators maintain service reliability and optimize resource usage.

**Contributor comment:**

Suggested change
- **Consul dataPlane dashboard**: Offers a comprehensive overview of service health and performance, including request success rates, resource utilization (CPU and memory), active connections, and cluster health. It helps operators maintain service reliability and optimize resource usage.
- **Consul dataplane dashboard**: Offers a comprehensive overview of service health and performance, including request success rates, resource utilization (CPU and memory), active connections, and cluster health. It helps operators maintain service reliability and optimize resource usage.


- **Consul k8s dashboard**: Focuses on monitoring the health and resource usage of the Consul control plane within a Kubernetes environment, ensuring the stability of the control plane.

- **Consul server dashboard**: Provides detailed monitoring of Consul servers, tracking key metrics like server health, CPU and memory usage, disk I/O, and network performance. This dashboard is critical for ensuring the stability and performance of Consul servers within the service mesh.

## Enabling observability

Add the following configurations to your Consul Helm chart to enable the observability tools in [the sample repo](https://github.com/YasminLorinKaygalak/GrafanaDemo/tree/main).

<CodeTabs tabs={[ "Kubernetes YAML"]}>


**Contributor comment on lines +30 to +31:**

Suggested change
<CodeTabs tabs={[ "Kubernetes YAML"]}>

Code tabs are unnecessary since there aren't other tabs. `<CodeBlockConfig>` could be used if you want to highlight specific lines in the example configuration.

For the configuration - are all of these values required?


**Contributor comment:**

One thing about these docs is that we can really only enable Prometheus in our Helm chart. So to actually see the dashboards in Grafana, the user needs to deploy their own Grafana. I feel like that may be more of a tutorial thing? But we can for sure only include the values that apply to enabling Prometheus.


**Contributor comment:**

```yaml
global:
  logLevel: trace
  name: consul
  datacenter: dc1
  tls:
    enabled: true
    enableAutoEncrypt: true
    httpsOnly: false
  acls:
    manageSystemACLs: true
  metrics:
    enabled: true
    provider: "prometheus"
    enableAgentMetrics: true
    agentMetricsRetentionTime: "10m"

prometheus:
  enabled: true

server:
  logLevel: trace
  replicas: 1
  annotations: |
    "prometheus.io/scheme": "https"
    "prometheus.io/port": "8501"

ui:
  enabled: true
  service:
    type: NodePort
  metrics:
    enabled: true
    provider: "prometheus"
    baseURL: http://prometheus-server.consul

connectInject:
  enabled: true
  metrics:
    defaultEnabled: true
  apiGateway:
    managedGatewayClass:
      serviceType: LoadBalancer
```

</CodeTabs>

## Enable access logs

Access logs configurations are defined globally in the [`proxy-defaults`](/consul/docs/connect/config-entries/proxy-defaults#accesslogs) configuration entry.

The following example is a minimal configuration for enabling access logs:

<CodeTabs tabs={[ "HCL", "Kubernetes YAML", "JSON" ]}>

```hcl
Kind = "proxy-defaults"
Name = "global"
AccessLogs {
  Enabled = true
}
```

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  accessLogs:
    enabled: true
```

```json
{
  "Kind": "proxy-defaults",
  "Name": "global",
  "AccessLogs": {
    "Enabled": true
  }
}
```

</CodeTabs>