Validate capacity under real traffic. Ramping, soak, and spike tests for REST and Kafka-backed APIs — with thresholds that fail the CI build when performance regresses.
Ramping, soak, and spike — each catches different classes of production failure.
Ramping Tests
Gradually increase load to find breaking points. Identify at what VU count latency degrades or errors spike.
`stages: [{ duration: "5m", target: 500 }]`

Soak Tests
Sustained load for hours. Catches memory leaks, connection pool exhaustion, and slow degradation over time.
`duration: "2h", vus: 100`

Spike Tests
Sudden traffic bursts to test auto-scaling. Verifies the system recovers gracefully after load spikes.
`stages: [{ duration: "30s", target: 1000 }]`

How load testing revealed hidden bottlenecks in the AI query pipeline
```javascript
import http from "k6/http";

// Target endpoint supplied at runtime: k6 run -e API_URL=https://... script.js
const API_URL = __ENV.API_URL;

export const options = {
  stages: [
    { duration: "2m", target: 100 }, // ramp up to 100 VUs
    { duration: "5m", target: 500 }, // climb to 500 VUs and hold
    { duration: "2m", target: 0 },   // ramp back down
  ],
  thresholds: {
    http_req_duration: ["p(95)<800"], // fail the run if p95 latency exceeds 800ms
    http_req_failed: ["rate<0.005"],  // fail the run if more than 0.5% of requests error
  },
};

export default function () {
  http.post(API_URL, JSON.stringify({ query: "Show sales by region last quarter" }), {
    headers: { "Content-Type": "application/json" },
  });
}
```

Know your system's limits before you hit them. Make data-driven scaling decisions based on actual performance curves.
Pinpoint exactly where performance degrades. Database queries? Thread pools? Network I/O? Load testing reveals the answer.
Prove that your system meets latency and throughput SLAs under realistic load. Ship with confidence.
k6 questions engineers ask before their first production load test.
What's the difference between load, stress, and soak testing?
Load testing validates performance at expected production traffic (baseline behavior). Stress testing pushes load past the breaking point to find failure modes. Soak testing runs sustained load for hours to catch memory leaks, pool exhaustion, and slow degradation. Ship all three before production — each catches different bugs.
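A minimal sketch of the three shapes as k6 options blocks. The VU counts, durations, and the API_URL environment variable are illustrative assumptions, not prescriptions.

```javascript
import http from "k6/http";

// Three illustrative test shapes; the numbers are placeholders to tune.
const load = { vus: 100, duration: "10m" }; // baseline at expected traffic
const stress = {
  stages: [
    { duration: "5m", target: 500 },  // past expected peak
    { duration: "5m", target: 2000 }, // keep climbing until something breaks
  ],
};
const soak = { vus: 100, duration: "4h" }; // sustained load to surface slow leaks

// k6 reads a single exported `options` object; export the shape you want to run.
export const options = load;

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```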
Which tool: k6, JMeter, Gatling, or Locust?
k6 wins for most modern teams: JavaScript test scripts, native CI/CD integration, built-in cloud runner, Prometheus/Grafana output. JMeter is legacy XML — avoid for greenfield. Gatling is strong for JVM shops that want the Scala DSL. Locust is Python-native but weaker at high VU counts. I default to k6 unless the team has a hard reason otherwise.
What latency thresholds should I set?
For user-facing APIs, target p95 < 300ms at expected load. For backend-to-backend APIs, p95 < 800ms is usually fine. p99 should be < 2s. Set these as k6 thresholds so the test fails CI when they regress: `thresholds: { http_req_duration: ["p(95)<300"] }`. Anything tighter is aspirational; prove you can hit it before you commit to an SLA.
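A runnable sketch of that CI gate, assuming a user-facing API and an API_URL passed in via -e; the 1% error budget is an added assumption.

```javascript
import http from "k6/http";

export const options = {
  thresholds: {
    // Fail the run (and the CI build) if these regress.
    http_req_duration: ["p(95)<300", "p(99)<2000"], // user-facing: p95 < 300ms, p99 < 2s
    http_req_failed: ["rate<0.01"],                 // assumed error budget: < 1% failures
  },
};

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```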
How many virtual users (VUs) do I need?
Start at 2-3x your observed production peak concurrent connections. For most APIs that is 50-200 VUs for normal load tests and 500-2000 VUs for capacity tests. Run a ramping test first to find your breaking point, then anchor your baseline 30% below that number.
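A sketch of such a breaking-point ramp, assuming an observed production peak of roughly 200 concurrent connections; swap in your own numbers.

```javascript
import http from "k6/http";

// Ramp well past the assumed ~200-connection production peak.
// Watch where p95 latency or error rate degrades, then anchor
// the baseline load test ~30% below that VU count.
export const options = {
  stages: [
    { duration: "2m", target: 200 },  // observed production peak
    { duration: "2m", target: 600 },  // 3x peak
    { duration: "2m", target: 1200 },
    { duration: "2m", target: 2000 }, // capacity-test territory
    { duration: "1m", target: 0 },    // ramp down
  ],
};

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```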
How should load tests run in CI/CD?
Split into two passes: a fast smoke test (30s, 5 VUs, blocks deploy) on every PR, and a full capacity test (10-15min) that runs nightly or on release-candidate tags. Use k6 thresholds to fail the build on regression. Export results to Prometheus and overlay against the production baseline in Grafana.
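A sketch of the PR-blocking smoke pass; the threshold values are assumptions to adapt, and the nightly capacity test would use the full ramping profile shown earlier.

```javascript
import http from "k6/http";

// Fast smoke test: 30s at 5 VUs, enough to catch gross regressions
// without slowing the PR pipeline.
export const options = {
  vus: 5,
  duration: "30s",
  thresholds: {
    http_req_duration: ["p(95)<800"], // assumed PR gate; stricter limits run nightly
    http_req_failed: ["rate<0.01"],
  },
};

export default function () {
  http.get(__ENV.API_URL); // hypothetical target, passed via -e API_URL=...
}
```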
What changes for Kafka-backed or LLM-backed APIs?
Two considerations: (1) rate-limit your test against the upstream (LLM APIs have strict RPM caps, and Kafka partition count bounds consumer parallelism); (2) test the queuing and backpressure path explicitly: produce 10x peak load and verify that your system queues and retries rather than failing in a cascade. I cover the LLM case in the Text2SQL use case above.
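On the rate-limit side, k6's constant-arrival-rate executor holds a fixed request rate regardless of response latency, which keeps the test inside an upstream cap. A sketch, assuming a 3,000 RPM (50 req/s) limit and the same hypothetical API_URL:

```javascript
import http from "k6/http";

// Cap the request rate so the test respects the upstream's assumed RPM limit.
export const options = {
  scenarios: {
    rate_limited: {
      executor: "constant-arrival-rate",
      rate: 50,             // 50 iterations/s, independent of response latency
      timeUnit: "1s",
      duration: "10m",
      preAllocatedVUs: 100, // VU pool available to sustain the rate
    },
  },
};

export default function () {
  http.post(__ENV.API_URL, JSON.stringify({ query: "Show sales by region last quarter" }), {
    headers: { "Content-Type": "application/json" },
  });
}
```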