A comprehensive Terraform module for deploying a production-ready monitoring and observability stack on Kubernetes clusters. This module provides a complete solution for metrics, logs, and traces collection with long-term storage capabilities.
- Complete Observability Stack: Deploy Prometheus, Grafana, Loki, Alloy, and OpenTelemetry in a single module
- Long-term Metrics Storage: Grafana Mimir as a scalable, multi-tenant Prometheus backend
- Log Aggregation: Distributed Loki deployment for centralized log management
- Log Collection: Grafana Alloy for efficient log collection and forwarding to Loki
- Trace Collection: OpenTelemetry Operator for distributed tracing
- Certificate Management: Automated TLS certificate handling with cert-manager
- DNS Management: External-DNS for automatic DNS record creation
- Modular Design: Enable/disable individual components based on your needs
- Production Ready: Persistent storage, high availability configurations, and proper resource limits
- Pre-configured Integration: Components are automatically integrated with proper data sources
Basic usage:
module "monitoring" {
  source = "path/to/terraform-module-monitoring"

  # Deploy the full monitoring stack
  prometheus = {
    enabled = true
  }
  grafana = {
    enabled = true
  }
  loki = {
    enabled = true
  }
  alloy = {
    enabled = true
  }

  namespace = "monitoring"
}
Advanced usage:
module "monitoring" {
  source = "path/to/terraform-module-monitoring"

  # Kubernetes configuration
  kube_context = "my-cluster"
  kubeconfig   = "~/.kube/config"
  namespace    = "observability"

  # Enable all components with custom configuration
  external_dns = {
    enabled   = true
    version   = "1.17.0"
    namespace = "external-dns"
  }
  cert_manager = {
    enabled   = true
    version   = "v1.18.2"
    namespace = "cert-manager"
  }
  prometheus = {
    enabled = true
    version = "75.9.0"
    name    = "prometheus"
    chart   = "kube-prometheus-stack"
  }
  grafana_mimir = {
    enabled   = true
    version   = "5.7.0"
    namespace = "monitoring"
  }
  loki = {
    enabled   = true
    version   = "0.79.3"
    namespace = "monitoring"
  }
  alloy = {
    enabled   = true
    version   = "0.10.0"
    namespace = "monitoring"
  }
  opentelemetry = {
    enabled = true
    version = "0.90.4"
  }
  grafana = {
    enabled = true
    version = "12.0.2"
  }
}
The module includes a K3s-based local development environment:
# Start the local K3s cluster
cd k3s
./start.sh
# Apply the monitoring stack
terraform init
terraform apply -var-file="k3s/terraform.tfvars"
# Access services (using nip.io for local DNS)
# Prometheus: http://prometheus.127.0.0.1.nip.io
# Grafana: http://grafana.127.0.0.1.nip.io
# Cleanup
./destroy.sh
Requirements:
Name | Version |
---|---|
terraform | >= 1.0.0 |
helm | >= 2.0.0 |
kubernetes | >= 2.0.0 |
Providers:
Name | Version |
---|---|
helm | >= 2.0.0 |
kubernetes | >= 2.0.0 |
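A root configuration that calls this module could pin the same minimum versions as the tables above. This is a minimal sketch: the version constraints come from the tables, while the `hashicorp/helm` and `hashicorp/kubernetes` registry sources are an assumption about which providers the module uses.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    # Assumed registry sources; the tables above list only the provider names.
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.0.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.0.0"
    }
  }
}
```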
Name | Type | Default | Description |
---|---|---|---|
kube_context | string | null | The Kubernetes context to use |
kubeconfig | string | "~/.kube/config" | Path to the Kubernetes config file |
namespace | string | "monitoring" | The default Kubernetes namespace where resources will be installed |
external_dns | any | See below | External-DNS Helm chart configuration |
cert_manager | any | See below | Cert-Manager Helm chart configuration with self-signed cluster issuer |
prometheus | any | See below | Prometheus Helm chart configuration with ingress enabled |
grafana_mimir | any | See below | Grafana Mimir (distributed Prometheus backend) Helm chart configuration |
loki | any | See below | Loki distributed Helm chart configuration for log aggregation |
alloy | any | See below | Grafana Alloy Helm chart configuration for log collection and forwarding |
opentelemetry | any | See below | OpenTelemetry Helm chart configuration |
grafana | any | See below | Grafana Helm chart configuration with pre-configured data sources |
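Because each component is a separate object, a caller can switch off the pieces a cluster already provides. The sketch below assumes a cluster that already runs its own External-DNS and cert-manager. It uses only the inputs listed in the table, and the `source` path is the same placeholder used in the usage examples.

```hcl
module "monitoring" {
  source = "path/to/terraform-module-monitoring"

  namespace = "monitoring"

  # Assumption: the cluster already runs these, so skip them here.
  external_dns = {
    enabled = false
  }
  cert_manager = {
    enabled = false
  }

  # Grafana is disabled by default, so enable it explicitly.
  grafana = {
    enabled = true
  }
}
```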
Each component can be configured with the following structure:
{
  enabled    = bool   # Whether to install this component
  version    = string # Helm chart version
  name       = string # Helm release name
  chart      = string # Helm chart name
  namespace  = string # Kubernetes namespace (optional, uses module namespace if not specified)
  repository = string # Helm chart repository URL
}
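As an illustration of that structure, a single component with every field set might look like the following. The version, release name, and chart name are taken from the larger usage example earlier in this document. The repository URL is the public prometheus-community Helm repository, which is an assumption about what the module uses by default.

```hcl
module "monitoring" {
  source = "path/to/terraform-module-monitoring"

  prometheus = {
    enabled    = true
    version    = "75.9.0"
    name       = "prometheus"
    chart      = "kube-prometheus-stack"
    namespace  = "monitoring" # optional; falls back to the module-level namespace
    repository = "https://prometheus-community.github.io/helm-charts" # assumed default
  }
}
```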
Default configurations:
- external_dns: enabled = true, version = "1.17.0"
- cert_manager: enabled = true, version = "v1.18.2"
- prometheus: enabled = true, version = "75.9.0", using kube-prometheus-stack
- grafana_mimir: enabled = true, version = "5.7.0", using mimir-distributed
- loki: enabled = true, version = "0.79.3", using loki-distributed
- alloy: enabled = true, version = "0.10.0"
- opentelemetry: enabled = true, version = "0.90.4"
- grafana: enabled = false, version = "12.0.2"
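One common way to implement per-component defaults with `any`-typed inputs is to merge the caller's object over a defaults map. The sketch below is not taken from this module's source: `prometheus_defaults` and the other local names are hypothetical. It only shows how a partial object such as `{ enabled = true }` can still pick up a default version and chart.

```hcl
variable "prometheus" {
  type    = any
  default = {}
}

locals {
  # Hypothetical defaults, mirroring the values listed above.
  prometheus_defaults = {
    enabled = true
    version = "75.9.0"
    chart   = "kube-prometheus-stack"
  }

  # Keys supplied by the caller override the defaults; missing keys fall back.
  prometheus = merge(local.prometheus_defaults, var.prometheus)
}

output "prometheus_effective" {
  value = local.prometheus
}
```

With this pattern, a caller who sets only `version` keeps `enabled = true` and the default chart name.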
Currently, this module does not expose any outputs. Future versions may include:
- Service URLs for Prometheus, Grafana, and other components
- Installation status for each component
- Generated passwords and credentials
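Until outputs exist, a caller can look up a deployed service directly from the root configuration. This is a sketch under assumptions: the service name `grafana` and the namespace `monitoring` are hypothetical and depend on the Helm release name actually used.

```hcl
# Assumed names; adjust to the service the Grafana release actually creates.
data "kubernetes_service" "grafana" {
  metadata {
    name      = "grafana"
    namespace = "monitoring"
  }

  # Read the service only after the module has been applied.
  depends_on = [module.monitoring]
}

output "grafana_cluster_url" {
  description = "In-cluster URL of the Grafana service"
  value       = "http://${data.kubernetes_service.grafana.spec[0].cluster_ip}:${data.kubernetes_service.grafana.spec[0].port[0].port}"
}
```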
The module deploys the following architecture:
┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────┐     │
│  │ Prometheus  │  │   Grafana    │  │  External-DNS   │     │
│  │    Stack    │  │              │  │                 │     │
│  └──────┬──────┘  └──────┬───────┘  └─────────────────┘     │
│         │                │                                  │
│         ▼                ▼                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────┐     │
│  │   Grafana   │  │     Loki     │  │  Cert-Manager   │     │
│  │    Mimir    │  │ (Distributed)│  │                 │     │
│  └─────────────┘  └──────┬───────┘  └─────────────────┘     │
│                          │                                  │
│                   ┌──────▼───────┐                          │
│                   │    Alloy     │                          │
│                   │ (DaemonSet)  │                          │
│                   └──────────────┘                          │
│                                                             │
│  ┌───────────────────────┐  ┌───────────────────────┐       │
│  │    OpenTelemetry      │  │   Jaeger Operator     │       │
│  │      Operator         │  │                       │       │
│  └───────────────────────┘  └───────────────────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Data flow:
- Metrics: Prometheus scrapes metrics → remote-writes them to Grafana Mimir for long-term storage
- Logs: Applications → Alloy (DaemonSet) → Loki → Grafana for visualization
- Traces: Applications → OpenTelemetry Collector → Storage backend
- Visualization: Grafana provides unified dashboards for all telemetry data
Prometheus Stack:
- Full Prometheus Operator deployment
- Pre-configured ServiceMonitors for Kubernetes components
- AlertManager for alert routing
- Prometheus server with remote write to Mimir
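The remote write link from Prometheus to Mimir mentioned in the items above can be expressed as Helm values on a kube-prometheus-stack release. The standalone sketch below is not this module's code: the release is declared directly with the `helm_release` resource, and the hostname `mimir-nginx.monitoring.svc` is an assumed service name. The `/api/v1/push` path is Mimir's remote write endpoint.

```hcl
resource "helm_release" "prometheus" {
  name       = "prometheus"
  namespace  = "monitoring"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  version    = "75.9.0"

  values = [yamlencode({
    prometheus = {
      prometheusSpec = {
        remoteWrite = [{
          # Assumed in-cluster address of the Mimir gateway.
          url = "http://mimir-nginx.monitoring.svc:80/api/v1/push"
        }]
      }
    }
  })]
}
```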
Grafana Mimir:
- Horizontally scalable, multi-tenant Prometheus backend
- Long-term metrics storage
- Compatible with Prometheus remote write API
- Includes MinIO for object storage (optional)
Loki:
- Distributed deployment for high availability
- Efficient log aggregation and storage
- Integrated with Grafana for log exploration
- Configured with appropriate storage classes
OpenTelemetry Operator:
- Operator pattern for managing OpenTelemetry collectors
- Support for traces, metrics, and logs
- Auto-instrumentation capabilities
- Kubernetes-native resource management
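With the OpenTelemetry Operator installed, collectors are declared as `OpenTelemetryCollector` custom resources. A minimal sketch using the `kubernetes_manifest` resource follows. The collector name, the namespace, and the `debug` exporter are illustrative assumptions, and a real pipeline would export to a trace backend instead. Note that `kubernetes_manifest` needs the operator's CRDs to exist in the cluster at plan time, so this would typically live in a configuration applied after the module.

```hcl
resource "kubernetes_manifest" "otel_collector" {
  manifest = {
    apiVersion = "opentelemetry.io/v1beta1"
    kind       = "OpenTelemetryCollector"
    metadata = {
      name      = "traces" # hypothetical name
      namespace = "monitoring"
    }
    spec = {
      mode = "deployment"
      config = {
        # Accept OTLP over gRPC and HTTP from instrumented applications.
        receivers = {
          otlp = {
            protocols = {
              grpc = { endpoint = "0.0.0.0:4317" }
              http = { endpoint = "0.0.0.0:4318" }
            }
          }
        }
        # Placeholder exporter that only logs received telemetry.
        exporters = {
          debug = {}
        }
        service = {
          pipelines = {
            traces = {
              receivers = ["otlp"]
              exporters = ["debug"]
            }
          }
        }
      }
    }
  }
}
```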
Grafana:
- Pre-configured data sources for Prometheus, Mimir, and Loki
- Dashboard provisioning for common use cases
- RBAC and authentication support
- Ingress configuration for web access
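In the upstream Grafana Helm chart, data source provisioning is driven by a `datasources` value. The sketch below shows what such values could look like for Mimir and Loki. It is not the module's actual configuration, and both service hostnames are assumptions. Mimir serves its Prometheus-compatible query API under the `/prometheus` prefix by default.

```hcl
locals {
  grafana_values = yamlencode({
    datasources = {
      "datasources.yaml" = {
        apiVersion = 1
        datasources = [
          {
            name      = "Mimir"
            type      = "prometheus"
            access    = "proxy"
            url       = "http://mimir-nginx.monitoring.svc/prometheus" # assumed host
            isDefault = true
          },
          {
            name   = "Loki"
            type   = "loki"
            access = "proxy"
            url    = "http://loki-gateway.monitoring.svc" # assumed host
          }
        ]
      }
    }
  })
}
```

The resulting string could be passed to a `helm_release` through its `values` list, for example `values = [local.grafana_values]`.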
Grafana Alloy:
- DaemonSet deployment for log collection from all nodes
- Kubernetes service discovery for automatic pod log collection
- Pre-configured pipeline to forward logs to Loki
- Efficient log processing with static labels and filtering
- Container runtime agnostic (Docker, containerd, CRI-O)
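The log pipeline described in the items above (discover pods, read their logs, forward to Loki) can be sketched as an Alloy configuration passed through the Alloy chart's `alloy.configMap.content` value. This is an illustration, not the module's shipped pipeline: the Loki URL and the static `cluster` label are assumptions, and it reads logs through the Kubernetes API with `loki.source.kubernetes` rather than from node log files.

```hcl
locals {
  # Alloy configuration; it contains no Terraform interpolation sequences.
  alloy_config = <<-EOT
    discovery.kubernetes "pods" {
      role = "pod"
    }

    loki.source.kubernetes "pods" {
      targets    = discovery.kubernetes.pods.targets
      forward_to = [loki.write.default.receiver]
    }

    loki.write "default" {
      endpoint {
        url = "http://loki-gateway.monitoring.svc/loki/api/v1/push"
      }
      external_labels = { cluster = "local" }
    }
  EOT

  # Helm values that hand the configuration to the Alloy chart.
  alloy_values = yamlencode({
    alloy = {
      configMap = {
        content = local.alloy_config
      }
    }
  })
}
```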
Contributions are welcome! Please feel free to submit a Pull Request.
At Digitalis, our mission is to make the adoption of cloud-native and distributed data technologies as easy and seamless as possible for enterprises—on any Kubernetes, any cloud, and any data center. We focus on the technology stack that powers modern businesses, knowing this area can create a significant impact for our customers. If your organization is considering these technologies to drive transformation, we're here to guide you every step of the way.
Contact our team for a free consultation to discuss how we can tailor our approach to your specific needs and challenges.
This module is licensed under the MIT License - see the LICENSE file for details.