15. Monitoring, Logging & Observability

15.1 Observability Stack

graph TD
  subgraph COLLECT["📡 Data Collection"]
    direction LR
    M["📊 Metrics\nPrometheus + Thanos"] ~~~ L["📝 Logs\nLoki + Fluent Bit"] ~~~ T["🔗 Traces\nJaeger / Tempo"]
  end

  COLLECT --> VIZ["📈 Visualization: Grafana (unified dashboards)"]
  VIZ --> ALERT["🚨 Alerting: Alertmanager → Telegram / Email / PagerDuty"]

Note: Elasticsearch is retained solely for the Wazuh SIEM (security event monitoring), and Meilisearch handles app-level search. All application and infrastructure logging uses Loki.
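
To make the shape of the log data concrete, the following is a minimal sketch of the JSON body that Loki's push endpoint (`/loki/api/v1/push`) accepts. In this stack Fluent Bit would normally do the shipping, so the sketch only illustrates the payload, not a recommended ingestion path. The function name `build_loki_push_body` and the `app` and `env` labels are hypothetical.

```python
import json
import time


def build_loki_push_body(labels, lines, now_ns=None):
    """Build a JSON body for Loki's /loki/api/v1/push endpoint."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    # Offsetting by i keeps the entries in order within the stream.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": dict(labels), "values": values}]})


body = build_loki_push_body(
    {"app": "checkout", "env": "staging"},  # hypothetical labels
    ["order created", "payment authorized"],
    now_ns=1_700_000_000_000_000_000,  # fixed timestamp for a reproducible example
)
print(body)
```

Labels identify the stream, so keeping them low-cardinality (service, environment) and putting variable data in the log line itself is the usual choice.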

15.2 Tool BreakdownΒΆ

| Tool | Purpose |
|------|---------|
| Prometheus | Metrics collection (CPU, memory, request rates, SLAs) |
| Grafana | Unified dashboards for metrics, logs, traces |
| Loki | Log aggregation for all application and infrastructure logs |
| Fluent Bit | Log shipping from containers and servers |
| Jaeger / Tempo | Distributed tracing across microservices |
| Thanos | Long-term Prometheus storage, multi-cluster |
| Uptime Kuma | HTTP/TCP/DNS uptime monitoring |
| Alertmanager | Alert routing to Telegram groups, email, SMS |
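
As an illustration of what Prometheus collects from a scrape target, here is a minimal sketch that renders one metric family in the Prometheus text exposition format using only the standard library. A real service would more likely use an official client library. The helper names `render_metric` and `escape_label_value`, the metric `http_requests_total`, and its sample values are assumptions made for the example.

```python
def escape_label_value(value):
    # Label values escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, numeric_value) pairs.
    help_text is assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render_metric(
    "http_requests_total",  # hypothetical metric name
    "counter",
    "Total HTTP requests handled.",
    [({"method": "GET", "status": "200"}, 1027),
     ({"method": "POST", "status": "500"}, 3)],
)
print(text)
```

Each sample line pairs a label set with a value. Prometheus stores one time series per unique label combination, which is what the request-rate dashboards in Grafana would query.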