⚡ Load Testing

Performance Validation with k6

Validate capacity under real traffic. Ramping, soak, and spike tests for REST and Kafka-backed APIs — with thresholds that fail the CI build when performance regresses.

By Rohit Raj · Founding Engineer · Published Jan 31, 2026 · Updated Apr 24, 2026

Live k6 Test Execution (sample run)

Virtual Users: 150 · Requests/sec: 1,247 · P95 Latency: 23.4ms · P99 Latency: 45.2ms · Error Rate: 0.12%

Test Scenarios & Results

Virtual Users: 500 (peak concurrency) · Throughput: 2,500 requests/sec · P95 Latency: 24.3ms (95th percentile) · Error Rate: 0.08% (failed requests)

What Are the Three Core k6 Test Types?

Ramping, soak, and spike — each catches different classes of production failure.

Ramping Tests

Gradually increase load to find breaking points. Identify at what VU count latency degrades or errors spike.

stages: [{ duration: "5m", target: 500 }]

Soak Tests

Sustained load for hours. Catches memory leaks, connection pool exhaustion, and slow degradation over time.

duration: "2h", vus: 100
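The one-liner above expands into a staged soak profile. A sketch of the full k6 options object — the ramp durations and threshold budgets here are illustrative placeholders, not measured values:

```javascript
// k6 options for a soak test: short ramp in, long hold, short ramp out.
export const options = {
  stages: [
    { duration: "5m", target: 100 }, // ramp up gently to avoid a thundering start
    { duration: "2h", target: 100 }, // hold -- leaks and pool exhaustion surface here
    { duration: "5m", target: 0 },   // ramp down
  ],
  thresholds: {
    // Placeholder budgets; tune to your SLA. A soak that starts green and
    // trips these late in the run is the classic slow-degradation signature.
    http_req_duration: ["p(95)<500"],
    http_req_failed: ["rate<0.01"],
  },
};
```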

Spike Tests

Sudden traffic bursts to test auto-scaling. Verifies the system recovers gracefully after load spikes.

stages: [{ duration: "30s", target: 1000 }]
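A fuller spike profile also needs a baseline before and after the burst, so the test proves recovery rather than just survival. A sketch with illustrative numbers:

```javascript
// k6 options for a spike test: jump from baseline to 10x, then drop back.
export const options = {
  stages: [
    { duration: "1m", target: 100 },   // steady baseline
    { duration: "30s", target: 1000 }, // sudden burst
    { duration: "2m", target: 1000 },  // hold while autoscaling reacts
    { duration: "30s", target: 100 },  // drop back
    { duration: "2m", target: 100 },   // verify latency returns to baseline
  ],
};
```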

Real Use Case: 500 Concurrent Queries on a Text2SQL API

How load testing revealed hidden bottlenecks in the AI query pipeline

Challenge
The Text2SQL Query Engine needed to handle 500 concurrent natural language queries with p95 latency under 800ms. Business dashboards refresh on a schedule, causing predictable traffic spikes.
Solution
Created k6 scripts simulating diverse query patterns across 20 query templates with realistic payload sizes.
k6 script
import http from "k6/http";
import { check } from "k6";

// Endpoint supplied at run time: k6 run -e API_URL=https://... script.js
const API_URL = __ENV.API_URL;

export const options = {
  stages: [
    { duration: "2m", target: 100 },
    { duration: "5m", target: 500 },
    { duration: "2m", target: 0 },
  ],
  thresholds: {
    http_req_duration: ["p(95)<800"],
    http_req_failed: ["rate<0.005"],
  },
};

export default function () {
  const res = http.post(
    API_URL,
    JSON.stringify({ query: "Show sales by region last quarter" }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(res, { "status is 200": (r) => r.status === 200 });
}
Discovery
At 350 concurrent users, p95 latency jumped from 620ms to 3.2 seconds. Root cause: LLM provider rate limiting combined with database connection pool saturation.
Fix
Request queuing with exponential backoff for LLM calls. DB pool raised from 25 → 100. Re-tested at 600 concurrent — p95 stayed at 750ms.
Impact
Prevented timeouts during peak dashboard refresh periods. Load testing caught the cascading failures before going live.

How Does Load Testing Help in 2026?

📊

Capacity Planning

Know your system's limits before you hit them. Make data-driven scaling decisions based on actual performance curves.

🔍

Bottleneck Identification

Pinpoint exactly where performance degrades. Database queries? Thread pools? Network I/O? Load testing reveals the answer.

✅

SLA Validation

Prove that your system meets latency and throughput SLAs under realistic load. Ship with confidence.

Frequently Asked Questions

k6 questions engineers ask before their first production load test.

What is the difference between load testing, stress testing, and soak testing?

Load testing validates performance at expected production traffic (baseline behavior). Stress testing pushes load past the breaking point to find failure modes. Soak testing runs sustained load for hours to catch memory leaks, pool exhaustion, and slow degradation. Ship all three before production — each catches different bugs.

How do I choose between k6, JMeter, Gatling, and Locust in 2026?

k6 wins for most modern teams: JavaScript test scripts, native CI/CD integration, built-in cloud runner, Prometheus/Grafana output. JMeter is legacy XML — avoid for greenfield. Gatling is strong for JVM shops that want Scala DSL. Locust is Python-native but weaker on high VU counts. I default to k6 unless the team has a hard reason otherwise.

What p95 latency threshold is acceptable for a REST API?

For user-facing APIs, target p95 < 300ms at expected load. For backend-to-backend APIs, p95 < 800ms is usually fine. p99 should be < 2s. Set these as k6 thresholds so the test fails CI when they regress: `thresholds: { http_req_duration: ["p(95)<300"] }`. Tighter targets than these are aspirational — prove you hit them under load before you commit to an SLA.
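For intuition about what a `p(95)` threshold actually gates on, here is the common nearest-rank definition of a percentile (a sketch for illustration; k6's internal estimator may differ in detail):

```javascript
// Nearest-rank percentile: sort the samples, take the value at rank ceil(p/100 * n).
function percentile(samplesMs, p) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

With latencies of 1ms through 100ms, `percentile(latencies, 95)` returns 95 — i.e., 95% of requests completed at or below that value.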

How many virtual users (VUs) should I simulate?

Start at 2-3x your observed production peak concurrent connections. For most APIs that is 50-200 VUs for normal load tests and 500-2000 VUs for capacity tests. Run a ramping test first to find your breaking point, then anchor your baseline 30% below that number.
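Those sizing rules reduce to a small helper — a sketch whose 2-3x and 30% figures come straight from the guidance above:

```javascript
// VU planning from an observed production peak and a measured breaking point.
function planVus(peakConcurrent, breakingPointVus) {
  return {
    normalLoadMin: peakConcurrent * 2,            // 2x peak: lower bound for load tests
    normalLoadMax: peakConcurrent * 3,            // 3x peak: upper bound for load tests
    baseline: Math.floor(breakingPointVus * 0.7), // anchor 30% below the breaking point
  };
}
```

For example, a service peaking at 100 concurrent connections with a ramping-test breaking point of 500 VUs gets load tests at 200-300 VUs and a 350-VU baseline.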

How do I run k6 in CI/CD without slowing down pipelines?

Split into two passes: a fast smoke test (30s, 5 VUs, blocks deploy) on every PR, and a full capacity test (10-15min) that runs nightly or on release-candidate tags. Use k6 thresholds to fail the build on regression. Export results to Prometheus and overlay against production baseline in Grafana.
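The PR-blocking smoke pass can be expressed as a minimal k6 options object — a sketch assuming a placeholder latency budget that should mirror your real SLA:

```javascript
// Fast smoke profile for every PR: tiny load, strict budget, blocks deploy on failure.
export const options = {
  vus: 5,
  duration: "30s",
  thresholds: {
    http_req_duration: ["p(95)<300"], // placeholder budget; align with your SLA
    http_req_failed: ["rate<0.01"],
  },
};
```

Because thresholds make k6 exit non-zero on breach, no extra CI glue is needed to fail the build.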

How do I load test an API backed by Kafka or LLM providers?

Two considerations: (1) rate-limit your test against the upstream (LLM APIs have strict RPM caps, and Kafka partition count bounds consumer parallelism); (2) test the queuing and backpressure path explicitly — produce 10x peak load and verify that your system queues and retries rather than cascading into failure. I cover the LLM case in the Text2SQL use case above.
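One client-side pattern for consideration (1) is a token bucket in front of the upstream call — a sketch where the rate and burst values are placeholders for your provider's actual RPM cap:

```javascript
// Token bucket: allows short bursts up to `burst`, then refills at `ratePerSec`.
// Calls that fail tryRemove() should go to a retry queue with backoff,
// instead of hammering the provider's rate limiter.
function createTokenBucket(ratePerSec, burst) {
  let tokens = burst;
  let last = Date.now();
  return {
    tryRemove() {
      const now = Date.now();
      tokens = Math.min(burst, tokens + ((now - last) / 1000) * ratePerSec);
      last = now;
      if (tokens >= 1) {
        tokens -= 1;
        return true; // caller may proceed with the upstream request
      }
      return false; // over budget -- queue and retry later
    },
  };
}
```

With a burst of 2, the first two immediate calls pass and the third is rejected until the bucket refills.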

Related Reading

  • Kafka Consumer Testing with Embedded Kafka →
  • Observability: Prometheus + Grafana for SLOs →
  • API Contract Testing with Postman + Newman →
  • Multi-Tenant SaaS on Spring Boot + Java 21 →
  • Hire Rohit: Performance & Reliability Consulting →
← Back to Reliability Overview
