RED Metrics
Request-level health for any service
- Rate · Requests per second across endpoints and methods.
- Errors · Failed request rate — HTTP 5xx plus business errors.
- Duration · Request latency at the p50, p95, and p99 percentiles.
Real-time metrics, SLO dashboards, and alerting for Spring Boot and Node.js. RED and USE signals wired end-to-end — from HTTP handler to LLM pipeline.
Two families. RED for request health, USE for resource health. Ship both.
- RED · Request-level health for any service
- USE · System-level saturation signals
How observability held 99.5% query accuracy for an AI-powered NL-to-SQL system
Define and track Service Level Objectives. Know when you're burning error budget before SLA violations occur.
Correlate metrics across services during incidents. Identify root causes faster with historical data and trend analysis.
Use historical trends to predict resource needs. Scale proactively based on data, not guesswork.
Questions engineers ask before wiring their first production metrics stack.
Monitoring answers predefined questions ("is CPU above 80%?"). Observability lets you ask any question about system state by combining metrics, logs, and traces with high-cardinality labels. In practice: monitoring gives you a fixed dashboard; observability gives you a query language (PromQL, LogQL, or equivalent) to explore behavior you did not predict.
RED — Rate, Errors, Duration — describes request-level health for any service (best fit for REST APIs, Kafka consumers, gRPC). USE — Utilization, Saturation, Errors — describes resource-level health (CPU, memory, disk, connection pools). Ship both. RED catches user-visible degradation; USE catches the root cause underneath.
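As a rough sketch of what shipping both families can look like with Micrometer (the class, metric names, and connection-pool gauge below are illustrative, not from any particular stack):

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative: one class emitting both RED (request health) and
// USE (resource health) signals through a single MeterRegistry.
public class QueryMetrics {

    private final Timer requestTimer;   // RED: rate + duration in one meter
    private final Counter errorCounter; // RED: errors
    private final AtomicInteger poolInUse = new AtomicInteger();

    public QueryMetrics(MeterRegistry registry, int poolSize) {
        this.requestTimer = Timer.builder("http.server.requests.custom")
                .tag("endpoint", "/api/query")
                .publishPercentiles(0.5, 0.95, 0.99) // the p50/p95/p99 from the RED list
                .register(registry);

        this.errorCounter = Counter.builder("http.server.errors")
                .description("HTTP 5xx plus business errors")
                .tag("endpoint", "/api/query")
                .register(registry);

        // USE: utilization of a (hypothetical) DB connection pool, 0.0-1.0.
        // Saturation would be a second gauge over the pool's wait queue.
        Gauge.builder("db.pool.utilization", poolInUse,
                        in -> in.doubleValue() / poolSize)
                .register(registry);
    }

    public void recordRequest(Duration latency, boolean failed) {
        requestTimer.record(latency); // counts the request and times it
        if (failed) {
            errorCounter.increment();
        }
    }

    public void connectionCheckedOut() { poolInUse.incrementAndGet(); }
    public void connectionReturned()   { poolInUse.decrementAndGet(); }
}
```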
Add micrometer-registry-prometheus to your dependencies. Spring Boot Actuator exposes /actuator/prometheus out of the box. Use @Timed and @Counted annotations for method-level timing, or inject MeterRegistry to emit custom metrics. Scrape the endpoint with Prometheus, visualize in Grafana. Total setup: about 15 minutes for a greenfield service.
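One caveat: @Timed and @Counted on arbitrary methods only fire once the corresponding aspect beans are registered (TimedAspect/CountedAspect, with spring-boot-starter-aop on the classpath); Spring Boot does not add them for you. A minimal sketch, with illustrative metric and endpoint names:

```java
import io.micrometer.core.annotation.Timed;
import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Register the aspect so @Timed is actually processed.
@Configuration
class MetricsConfig {
    @Bean
    TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}

@RestController
class QueryController {

    private final MeterRegistry registry;

    QueryController(MeterRegistry registry) {
        this.registry = registry;
    }

    @Timed(value = "query.translate", percentiles = {0.5, 0.95, 0.99})
    @GetMapping("/api/query")
    public String translate() {
        // Custom business metric emitted directly through the registry.
        registry.counter("query.requests", "source", "web").increment();
        return "ok";
    }
}
```

Both meters then show up on /actuator/prometheus alongside the autoconfigured JVM and HTTP metrics.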
A Service Level Objective is a target for a Service Level Indicator (SLI) measured over a rolling window. Example SLI: "99.5% of /api/query requests complete in under 500ms." Example SLO: "maintain 99.5% over any 28-day window." The gap between SLO and 100% is the error budget. When you burn through it, you freeze feature work and fix reliability.
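To make the budget arithmetic concrete, here is a small sketch using the 99.5% target from the example above (the request volumes are invented):

```java
// Minimal error-budget arithmetic for a 99.5% SLO over a 28-day window.
public class ErrorBudget {

    public static void main(String[] args) {
        double slo = 0.995;                 // target from the example SLO
        long windowRequests = 10_000_000L;  // illustrative volume over 28 days
        long failedRequests = 32_000L;      // illustrative failures so far

        // Budget = the share of requests allowed to miss the SLI.
        long budget = Math.round(windowRequests * (1.0 - slo)); // 50,000 here
        double burned = (double) failedRequests / budget;

        System.out.printf("Budget: %d failures allowed%n", budget);
        System.out.printf("Burned: %.1f%% of budget%n", burned * 100); // 64.0%
        // Past 100%, the SLO is violated: freeze features, fix reliability.
    }
}
```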
Instrument four signals: token usage per request (input + output), LLM response latency (p95 per provider), cache hit rate for prompts and retrieval, and fallback/retry frequency. I documented the Text2SQL case in the use case above — the same pattern works for any RAG or LLM-orchestration stack. Label metrics by model and provider so you can A/B route and compare cost/quality.
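A sketch of those four signals in Micrometer terms; the meter names and the record(...) signature are hypothetical, but the model/provider tagging is the pattern to copy:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.time.Duration;

// Hypothetical helper recording the four LLM signals, tagged by
// model and provider so routes can be compared for cost/quality.
public class LlmMetrics {

    private final MeterRegistry registry;

    public LlmMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    public void record(String provider, String model, long inputTokens, long outputTokens,
                       Duration latency, boolean cacheHit, boolean fellBack) {
        // 1. Token usage per request, input and output separately.
        registry.counter("llm.tokens", "provider", provider, "model", model,
                "direction", "input").increment(inputTokens);
        registry.counter("llm.tokens", "provider", provider, "model", model,
                "direction", "output").increment(outputTokens);

        // 2. Response latency; the published percentile gives p95 per provider.
        Timer.builder("llm.latency")
                .tag("provider", provider).tag("model", model)
                .publishPercentiles(0.95)
                .register(registry)
                .record(latency);

        // 3. Cache hit rate: hits / (hits + misses), computed at query time.
        registry.counter("llm.cache", "result", cacheHit ? "hit" : "miss").increment();

        // 4. Fallback/retry frequency.
        if (fellBack) {
            registry.counter("llm.fallbacks", "provider", provider).increment();
        }
    }
}
```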
MTTD (mean time to detect) is the time between when a problem starts and when a human notices it. Without observability, MTTD is often measured in hours (someone reports the issue). With RED alerts on error rate and latency, it drops to minutes. The Text2SQL case above cut MTTD from hours to minutes by alerting when LLM latency breached the p95 threshold, before users noticed timeouts.