📊 Observability

Production Observability with Prometheus + Grafana

Real-time metrics, SLO dashboards, and alerting for Spring Boot and Node.js. RED and USE signals wired end-to-end — from HTTP handler to LLM pipeline.

By Rohit Raj · Founding Engineer · Published Jan 31, 2026 · Updated Apr 24, 2026

Live Prometheus Metrics

  • Request Rate · 1,247 req/s
  • Error Rate · 0.12%
  • P95 Latency · 23.4 ms
  • P99 Latency · 45.2 ms
  • Active Connections · 342
  • CPU Usage · 42.3%

What Metrics Should Every Production Service Emit?

Two families. RED for request health, USE for resource health. Ship both; a minimal instrumentation sketch follows the two lists below.

RED Metrics

Request-level health for any service

  • Rate · Requests per second across endpoints and methods.
  • Errors · Failed request rate — HTTP 5xx plus business errors.
  • Duration · Request latency at p50, p95, and p99 buckets.

USE Metrics

System-level saturation signals

  • Utilization · CPU, memory, and thread-pool usage relative to capacity.
  • Saturation · Queue depth, backlog, and connection pool waits.
  • Errors · System-level failures (GC stalls, OOM, disk full).
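
A minimal sketch of what wiring both families looks like with Micrometer, the metrics facade Spring Boot ships with. The meter names, the /api/query tag, the worker-pool gauge, and the RedUseMetrics class are illustrative assumptions, not names from any real service:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.Supplier;

public class RedUseMetrics {

    private final Counter requests;   // RED: Rate (Prometheus derives req/s with rate())
    private final Counter errors;     // RED: Errors (5xx plus business failures)
    private final Timer duration;     // RED: Duration (histogram buckets for p50/p95/p99)

    public RedUseMetrics(MeterRegistry registry, ThreadPoolExecutor workerPool) {
        this.requests = Counter.builder("app.requests")
                .tag("endpoint", "/api/query")
                .register(registry);
        this.errors = Counter.builder("app.request.errors")
                .tag("endpoint", "/api/query")
                .register(registry);
        this.duration = Timer.builder("app.request.duration")
                .tag("endpoint", "/api/query")
                .publishPercentileHistogram() // emit buckets so Prometheus can compute quantiles
                .register(registry);

        // USE: Saturation, exposing the worker pool's queue depth as a gauge
        Gauge.builder("app.worker.queue.depth", workerPool, p -> p.getQueue().size())
                .register(registry);
    }

    // Wrap a request handler so every call feeds Rate, Errors, and Duration.
    public <T> T record(Supplier<T> handler) {
        return duration.record(() -> {
            requests.increment();
            try {
                return handler.get();
            } catch (RuntimeException e) {
                errors.increment();
                throw e;
            }
        });
    }
}
```

Prometheus turns the counters into per-second rates with rate(), and the percentile histogram supplies the buckets behind the p95/p99 panels in Grafana.
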
🚀 Real Use Case: Text2SQL Query Engine SLOs

How observability held 99.5% query accuracy for an AI-powered NL-to-SQL system

Challenge
Maintain sub-500ms p95 latency while translating natural language → SQL with high accuracy. Without visibility into the LLM pipeline, debugging query failures would require parsing logs across API, schema resolver, and LLM layers.
Solution
Prometheus metrics across the full pipeline, split into two dashboards.

🤖 LLM Pipeline Metrics

  • Token usage per request
  • LLM response latency (p50, p95, p99)
  • Schema context cache hit rate

📊 Query Accuracy Metrics

  • SQL syntax validation rate
  • Query execution success rate
  • Fallback/retry frequency
Impact
Mean time to detection (MTTD) fell from hours to minutes. When LLM latency spiked on context overflow, alerts fired before users reported timeouts — enabling proactive token optimization.
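
The pipeline metrics listed above map almost one-to-one onto custom meters. A hypothetical sketch with Micrometer; the meter names (llm.tokens.per.request, llm.response.latency, schema.cache.requests) and the LlmPipelineMetrics class are illustrative, not the Text2SQL service's actual names:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.time.Duration;

public class LlmPipelineMetrics {

    private final DistributionSummary tokensPerRequest; // input + output tokens per call
    private final Timer llmLatency;                     // p50/p95/p99, labeled by model and provider
    private final Counter cacheHits;                    // schema-context cache hits
    private final Counter cacheMisses;                  // schema-context cache misses

    public LlmPipelineMetrics(MeterRegistry registry, String model, String provider) {
        this.tokensPerRequest = DistributionSummary.builder("llm.tokens.per.request")
                .tag("model", model).tag("provider", provider)
                .register(registry);
        this.llmLatency = Timer.builder("llm.response.latency")
                .tag("model", model).tag("provider", provider)
                .publishPercentileHistogram()
                .register(registry);
        this.cacheHits = Counter.builder("schema.cache.requests")
                .tag("result", "hit").register(registry);
        this.cacheMisses = Counter.builder("schema.cache.requests")
                .tag("result", "miss").register(registry);
    }

    // Call once per LLM round trip; the cache hit rate falls out of the two counters.
    public void recordCall(long inputTokens, long outputTokens,
                           Duration latency, boolean schemaCacheHit) {
        tokensPerRequest.record(inputTokens + outputTokens);
        llmLatency.record(latency);
        (schemaCacheHit ? cacheHits : cacheMisses).increment();
    }
}
```

Labeling by model and provider is what later makes A/B routing and cost comparisons possible from the same dashboards.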

How Does Production Observability Pay for Itself?

🎯 SLO Tracking

Define and track Service Level Objectives. Know when you're burning error budget before SLA violations occur.

🔍 Incident Response

Correlate metrics across services during incidents. Identify root causes faster with historical data and trend analysis.

📊 Capacity Planning

Use historical trends to predict resource needs. Scale proactively based on data, not guesswork.

Frequently Asked Questions

Questions engineers ask before wiring their first production metrics stack.

What is observability and how is it different from monitoring?

Monitoring answers predefined questions ("is CPU above 80%?"). Observability lets you ask any question about system state by combining metrics, logs, and traces with high-cardinality labels. In practice: monitoring gives you a fixed dashboard, observability gives you a query language (PromQL, LogQL, or equivalent) to explore behavior you did not predict.

What are RED and USE metrics?

RED — Rate, Errors, Duration — describes request-level health for any service (best fit for REST APIs, Kafka consumers, gRPC). USE — Utilization, Saturation, Errors — describes resource-level health (CPU, memory, disk, connection pools). Ship both. RED catches user-visible degradation; USE catches the root cause underneath.

How do I add Prometheus metrics to a Spring Boot application?

Add micrometer-registry-prometheus to your dependencies alongside Spring Boot Actuator. Actuator then serves /actuator/prometheus once you expose the endpoint (management.endpoints.web.exposure.include=prometheus). Use @Timed and @Counted annotations for method-level timing, or inject MeterRegistry to emit custom metrics. Scrape the endpoint with Prometheus, visualize in Grafana. Total setup: about 15 minutes for a greenfield service.
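
For illustration, a sketch of both styles in one service class. QueryService, the meter names, and the percentile choices are hypothetical; the real requirements are spring-boot-starter-actuator plus micrometer-registry-prometheus on the classpath, and a TimedAspect bean if you want @Timed on plain (non-controller) methods:

```java
import io.micrometer.core.annotation.Timed;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class QueryService {

    private final Counter fallbacks;

    public QueryService(MeterRegistry registry) {
        // Custom business metric; appears at /actuator/prometheus once the endpoint is exposed
        this.fallbacks = Counter.builder("query.fallbacks.total")
                .description("Queries that fell back to a retry path")
                .register(registry);
    }

    // Method-level timing; @Timed on service methods needs a TimedAspect bean registered
    @Timed(value = "query.execution", percentiles = {0.5, 0.95, 0.99})
    public String execute(String naturalLanguageQuery) {
        // ... translate natural language to SQL and run it ...
        return "SELECT 1";
    }

    public void recordFallback() {
        fallbacks.increment();
    }
}
```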

What is an SLO and how do I define one?

A Service Level Objective is a target for a Service Level Indicator (SLI) measured over a rolling window. Example SLI: "99.5% of /api/query requests complete in under 500ms." Example SLO: "maintain 99.5% over any 28-day window." The gap between SLO and 100% is the error budget. When you burn through it, you freeze feature work and fix reliability.
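
One convenient trick is to publish the SLO boundary as an explicit histogram bucket, so the "good requests" ratio is a single bucket division in PromQL. A sketch with Micrometer, assuming the hypothetical meter name api.query.latency:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.time.Duration;

public class SloTimers {

    // The le="0.5" bucket divided by the total count gives the SLI over any window.
    public static Timer queryLatency(MeterRegistry registry) {
        return Timer.builder("api.query.latency")
                .publishPercentileHistogram()
                .serviceLevelObjectives(Duration.ofMillis(500)) // explicit 500 ms bucket
                .register(registry);
    }
}
```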

How do I monitor LLM pipelines with Prometheus?

Instrument four signals: token usage per request (input + output), LLM response latency (p95 per provider), cache hit rate for prompts and retrieval, and fallback/retry frequency. I documented the Text2SQL case in the use case above — the same pattern works for any RAG or LLM-orchestration stack. Label metrics by model and provider so you can A/B route and compare cost/quality.

What is mean time to detection (MTTD) and how does observability reduce it?

MTTD is the time between when a problem starts and when a human notices. Without observability, MTTD is often hours (someone reports it). With RED alerts on error rate and latency, MTTD drops to minutes. The Text2SQL case above took MTTD from hours to minutes by alerting on LLM latency breaking a p95 threshold before users noticed timeouts.

Related Reading

  • Kafka Consumer Testing with Embedded Kafka →
  • Load Testing REST APIs with k6 →
  • API Contract Testing with Postman + Newman →
  • Spring Boot + MCP: Tool-Using AI Agents →
  • Multi-Tenant SaaS on Spring Boot + Java 21 →
  • Hire Rohit: SRE & Observability Consulting →
← Back to Reliability Overview

Rohit Raj — Backend & AI Systems Engineer
