Validate capacity under real traffic. Ramping, soak, and spike tests for REST and Kafka-backed APIs — with thresholds that fail the CI build when performance regresses.
Ramping, soak, and spike — each catches different classes of production failure.
Ramping Tests
Gradually increase load to find breaking points. Identify at what VU count latency degrades or errors spike.
`stages: [{ duration: "5m", target: 500 }]`

Soak Tests
Sustained load for hours. Catches memory leaks, connection pool exhaustion, and slow degradation over time.
`duration: "2h", vus: 100`

Spike Tests
Sudden traffic bursts to test auto-scaling. Verifies the system recovers gracefully after load spikes.
`stages: [{ duration: "30s", target: 1000 }]`

How load testing revealed hidden bottlenecks in the AI query pipeline
```javascript
import http from "k6/http";

// Target endpoint supplied at runtime: k6 run -e API_URL=https://... script.js
const API_URL = __ENV.API_URL;

export const options = {
  stages: [
    { duration: "2m", target: 100 }, // ramp up to 100 VUs
    { duration: "5m", target: 500 }, // climb to 500 VUs and hold
    { duration: "2m", target: 0 },   // ramp back down
  ],
  thresholds: {
    http_req_duration: ["p(95)<800"], // fail the run if p95 latency exceeds 800ms
    http_req_failed: ["rate<0.005"],  // fail the run if more than 0.5% of requests error
  },
};

export default function () {
  http.post(API_URL, JSON.stringify({ query: "Show sales by region last quarter" }), {
    headers: { "Content-Type": "application/json" },
  });
}
```

Know your system's limits before you hit them. Make data-driven scaling decisions based on actual performance curves.
Pinpoint exactly where performance degrades. Database queries? Thread pools? Network I/O? Load testing reveals the answer.
Prove that your system meets latency and throughput SLAs under realistic load. Ship with confidence.
k6 questions engineers ask before their first production load test.
What's the difference between load, stress, and soak testing?
Load testing validates performance at expected production traffic (baseline behavior). Stress testing pushes load past the breaking point to find failure modes. Soak testing runs sustained load for hours to catch memory leaks, pool exhaustion, and slow degradation. Ship all three before production — each catches different bugs.
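A minimal sketch of the three shapes as k6 options blocks. The VU counts, durations, and the API_URL environment variable are illustrative assumptions, not prescriptions.

```javascript
import http from "k6/http";

// Three illustrative test shapes; the numbers are placeholders to tune.
const load = { vus: 100, duration: "10m" }; // baseline at expected traffic
const stress = {
  stages: [
    { duration: "5m", target: 500 },  // past expected peak
    { duration: "5m", target: 2000 }, // keep climbing until something breaks
  ],
};
const soak = { vus: 100, duration: "4h" }; // sustained load to surface slow leaks

// k6 reads a single exported `options` object; export the shape you want to run.
export const options = load;

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```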
Which tool: k6, JMeter, Gatling, or Locust?
k6 wins for most modern teams: JavaScript test scripts, native CI/CD integration, built-in cloud runner, Prometheus/Grafana output. JMeter is legacy XML — avoid for greenfield. Gatling is strong for JVM shops that want the Scala DSL. Locust is Python-native but weaker at high VU counts. I default to k6 unless the team has a hard reason otherwise.
What latency thresholds should I set?
For user-facing APIs, target p95 < 300ms at expected load. For backend-to-backend APIs, p95 < 800ms is usually fine. p99 should be < 2s. Set these as k6 thresholds so the test fails CI when they regress: `thresholds: { http_req_duration: ["p(95)<300"] }`. Anything tighter is aspirational; prove you can hit it before you commit to an SLA.
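A runnable sketch of that CI gate, assuming a user-facing API and an API_URL passed in via -e; the 1% error budget is an added assumption.

```javascript
import http from "k6/http";

export const options = {
  thresholds: {
    // Fail the run (and the CI build) if these regress.
    http_req_duration: ["p(95)<300", "p(99)<2000"], // user-facing: p95 < 300ms, p99 < 2s
    http_req_failed: ["rate<0.01"],                 // assumed error budget: < 1% failures
  },
};

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```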
How many virtual users (VUs) do I need?
Start at 2-3x your observed production peak concurrent connections. For most APIs that is 50-200 VUs for normal load tests and 500-2000 VUs for capacity tests. Run a ramping test first to find your breaking point, then anchor your baseline 30% below that number.
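A sketch of such a breaking-point ramp, assuming an observed production peak of roughly 200 concurrent connections; swap in your own numbers.

```javascript
import http from "k6/http";

// Ramp well past the assumed ~200-connection production peak.
// Watch where p95 latency or error rate degrades, then anchor
// the baseline load test ~30% below that VU count.
export const options = {
  stages: [
    { duration: "2m", target: 200 },  // observed production peak
    { duration: "2m", target: 600 },  // 3x peak
    { duration: "2m", target: 1200 },
    { duration: "2m", target: 2000 }, // capacity-test territory
    { duration: "1m", target: 0 },    // ramp down
  ],
};

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```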
How should load tests run in CI/CD?
Split into two passes: a fast smoke test (30s, 5 VUs, blocks deploy) on every PR, and a full capacity test (10-15min) that runs nightly or on release-candidate tags. Use k6 thresholds to fail the build on regression. Export results to Prometheus and overlay against the production baseline in Grafana.
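A sketch of the PR-blocking smoke pass; the threshold values are assumptions to adapt, and the nightly capacity test would use the full ramping profile shown earlier.

```javascript
import http from "k6/http";

// Fast smoke test: 30s at 5 VUs, enough to catch gross regressions
// without slowing the PR pipeline.
export const options = {
  vus: 5,
  duration: "30s",
  thresholds: {
    http_req_duration: ["p(95)<800"], // assumed PR gate; stricter limits run nightly
    http_req_failed: ["rate<0.01"],
  },
};

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```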
What changes for Kafka-backed or LLM-backed APIs?
Two considerations: (1) rate-limit your test against the upstream (LLM APIs have strict RPM caps, and Kafka partition count bounds consumer parallelism); (2) test the queuing and backpressure path explicitly: produce 10x peak load and verify that your system queues and retries rather than failing in a cascade. I cover the LLM case in the Text2SQL use case above.
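On the rate-limit side, k6's constant-arrival-rate executor holds a fixed request rate regardless of response latency, which keeps the test inside an upstream cap. A sketch, assuming a 3,000 RPM (50 req/s) limit and the same hypothetical API_URL:

```javascript
import http from "k6/http";

// Cap the request rate so the test respects the upstream's assumed RPM limit.
export const options = {
  scenarios: {
    rate_limited: {
      executor: "constant-arrival-rate",
      rate: 50,             // 50 iterations/s, independent of response latency
      timeUnit: "1s",
      duration: "10m",
      preAllocatedVUs: 100, // VU pool available to sustain the rate
    },
  },
};

export default function () {
  http.post(__ENV.API_URL, JSON.stringify({ query: "Show sales by region last quarter" }), {
    headers: { "Content-Type": "application/json" },
  });
}
```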