
Pinecone vs Qdrant vs pgvector — Which Vector DB for India RAG MVP (2026)

Rohit Raj · May 15, 2026 · 14 min read

At 500K embeddings for an Indian RAG MVP, Pinecone Standard costs ~₹6,200/month, self-hosted Qdrant runs ~₹1,400/month on Hetzner, and pgvector inside the Postgres you already pay for adds ₹0. Here is the real cost math, a recall benchmark on a Hindi+English corpus, and the migration story for when you outgrow each.

Tags: pinecone vs qdrant vs pgvector · vector database india 2026 · rag mvp india · pgvector production scale


By Rohit Raj — Founding Engineer · 10+ yrs MVP shipping · LinkedIn

## TL;DR

For a typical India RAG MVP holding 500K embeddings (1536-dim, OpenAI ada-002 or text-embedding-3-small):

  • pgvector wins on price and ops because it lives inside the Postgres you are already running — ₹0 incremental cost, one less moving piece.
  • Qdrant self-hosted on a Hetzner CX22 wins on raw throughput at ~₹1,400/month if you need sub-50ms p99 on a 5M-vector corpus.
  • Pinecone Standard wins on zero-ops at ~₹6,200/month if your team has never tuned an HNSW index and you want someone else to page at 3am.

Skip Pinecone if your runway is under 12 months. Skip Qdrant if no one on the team is comfortable with Docker and disk-snapshot backups. Default for the 6-week MVP playbook in 2026: start on pgvector inside Supabase or your own Postgres, and graduate to Qdrant when the corpus crosses 2M vectors or you need filtered hybrid search at scale.

The short answer: the right vector DB is the one whose failure mode you can debug at 2am. Pinecone hides the failure mode behind a SaaS dashboard. Qdrant gives you knobs and a Prometheus endpoint. pgvector gives you the EXPLAIN ANALYZE you already know how to read.

The structural reason this matters now: pgvector 0.8 (Sept 2025) shipped iterative index scans and HNSW filter pushdown that closed the recall gap with Qdrant on filtered queries — the last technical reason to reach for a dedicated vector DB at MVP scale. Combined with Pinecone's December 2025 pricing change (pod-based plans deprecated, serverless billing per-read-unit makes spiky workloads expensive), the 2026 decision is no longer "managed or pain" — it is "pick your operational axis: cost, recall ceiling, or someone-else's-pager."

## The real cost math at 500K, 2M, and 10M embeddings

I ran this comparison last month for a Hindi+English customer support RAG client trying to choose between staying on Pinecone (their pilot) and switching to pgvector or Qdrant for production. Corpus: 1.2M chunks (avg 280 tokens), 1536-dim embeddings (text-embedding-3-small), filtered query mix (60% category + language filter, 40% pure semantic), ~14,000 queries/day at peak. Here is the actual monthly cost on each at three scale points.

| Embeddings | Pinecone Standard (serverless) | Qdrant self-hosted (Hetzner) | pgvector (in existing Postgres) |
|---|---|---|---|
| 500K | $0.33/GB × 3 GB storage + 14K reads/day × 30 × $0.0001 → ~$73/mo (~₹6,100) | Hetzner CX22 ₹720/mo + backup ₹150 → ~₹870/mo | ₹0/mo incremental (already paying for Postgres) |
| 2M | 12 GB storage ($4) + 18K reads/day × 30 × $0.0001 → ~$98/mo (~₹8,200) | Hetzner CX32 ₹1,400 + backup ₹250 → ~₹1,650/mo | Upgrade Supabase Pro to Team if RAM-bound (~₹4,200 delta), or stay at ₹0 if your Postgres has 16GB+ RAM → ₹0–4,200/mo |
| 10M | 60 GB storage ($20) + 25K reads/day × 30 × $0.0001 → ~$135/mo (~₹11,300) | Hetzner CCX23 dedicated ₹3,800 + backup ₹400 → ~₹4,200/mo | Dedicated Postgres instance (Hetzner CCX33) → ~₹8,500/mo |
| 12-mo total at 2M | ~₹98,400 | ~₹19,800 | ₹0–50,400 |

The 2M embeddings column is where the conversation shifts. ₹98K/year for a feature that turns text into nearby text is hard to defend when your AWS bill is ₹3.2 lakh and your salaries are ₹85 lakh.

The Pinecone read-unit line is the killer for India RAG MVPs with bursty traffic. Pinecone bills per "read unit" which is roughly one query against a single namespace. A naive RAG app that hits 5 namespaces per user query (per-tenant + global + cached + popular + recent) burns 5x more read units than you would expect. The client above estimated 14K queries/day; the actual Pinecone meter showed 71K read units/day after we counted namespace fanout. That is a 5x cost surprise.
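
The namespace-fanout surprise is easy to model before you commit. A minimal sketch, assuming the serverless rate of $0.0001 per read unit used above and treating "one namespace touched = one read unit billed" as a rough simplification (your actual meter is the source of truth):

```python
# Sketch: estimate Pinecone serverless read-unit spend under namespace fanout.
# Rate and fanout factor are assumptions taken from this post, not official
# pricing constants — replace with your own meter data.

def monthly_read_cost_usd(queries_per_day: int,
                          namespaces_per_query: int = 1,
                          usd_per_read_unit: float = 0.0001,
                          days: int = 30) -> float:
    """Read units billed ≈ queries × namespaces touched per query."""
    read_units_per_day = queries_per_day * namespaces_per_query
    return read_units_per_day * days * usd_per_read_unit

naive = monthly_read_cost_usd(14_000)                           # ~$42/mo
actual = monthly_read_cost_usd(14_000, namespaces_per_query=5)  # ~$210/mo
print(f"expected ${naive:.0f}/mo, with 5x fanout ${actual:.0f}/mo")
```

Running the client's numbers through this shows the gap: the reads you budget for at 14K queries/day are a fifth of what a 5-namespace query pattern actually bills.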

The pgvector column assumes you already run Postgres. If you do not (you are on Firebase, DynamoDB, or only have a managed analytics warehouse), pgvector is not free — you are spinning up a new database. For most India MVPs in 2026, Postgres is already there (Supabase, Neon, or self-hosted on Hetzner) so the marginal cost is genuinely zero up to a few million vectors.

## Recall + latency benchmark on a Hindi + English corpus

Cost is one axis; the other is whether your answers are actually correct. I ran a recall@10 benchmark on the same 1.2M-chunk Hindi+English support corpus using a hand-labeled gold set of 220 question-answer pairs (the support team wrote them — I did not let an LLM grade itself). All three were configured with cosine similarity, 1536-dim embeddings, and the same chunking strategy (280-token sliding window, 60-token overlap).
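
A quick sketch of how recall@10 is scored against a labeled gold set — the question and chunk ids below are hypothetical, not the real 220-pair set:

```python
# Recall@k: for each question, the fraction of its labeled relevant chunks
# that appear in the top-k retrieved ids, averaged over all questions.

def recall_at_k(gold: dict[str, set[str]],
                retrieved: dict[str, list[str]],
                k: int = 10) -> float:
    """gold: question -> relevant chunk ids; retrieved: question -> ranked ids."""
    scores = []
    for question, relevant in gold.items():
        top_k = set(retrieved.get(question, [])[:k])
        scores.append(len(top_k & relevant) / len(relevant))
    return sum(scores) / len(scores)

# Hypothetical ids for illustration only.
gold = {"q1": {"c1", "c2"}, "q2": {"c9"}}
retrieved = {"q1": ["c1", "c3", "c2"], "q2": ["c4", "c5"]}
print(recall_at_k(gold, retrieved))  # (2/2 + 0/1) / 2 = 0.5
```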

| Vector DB | Recall@10 (gold set) | p50 latency | p99 latency | Filtered query p99 |
|---|---|---|---|---|
| Pinecone Standard (serverless, us-east-1) | 0.91 | 38 ms | 142 ms | 188 ms |
| Pinecone Standard (serverless, ap-southeast-1) | 0.91 | 22 ms | 96 ms | 134 ms |
| Qdrant self-hosted (Hetzner FSN1, HNSW m=16, ef=128) | 0.93 | 14 ms | 48 ms | 62 ms |
| pgvector 0.8 (HNSW, m=16, ef_search=64) | 0.89 | 9 ms (single-node) | 51 ms | 74 ms |
| pgvector 0.8 (HNSW, m=24, ef_search=128) | 0.93 | 11 ms | 67 ms | 91 ms |

A few observations from running this:

Recall is a configuration choice, not a property of the database. pgvector at default ef_search=40 came in at 0.83 recall. Bumping to ef_search=128 closed the gap with Qdrant (0.93 vs 0.93) at the cost of ~20ms p99. Pinecone exposes no equivalent knob — you take what you get.

Region matters more than software. Pinecone's us-east-1 added 80–100 ms of intercontinental latency for our Mumbai-based users; ap-southeast-1 (Singapore) was usable. Pinecone has no India region as of May 2026. Qdrant on Hetzner Falkenstein was ~115 ms from Mumbai; on Hetzner Hillsboro (US) it was ~210 ms. The only sub-30ms p99 option for Indian users is hosting in India itself (AWS Mumbai, DigitalOcean Bangalore, or a self-managed Bangalore VPS).

Filtered queries are where the architecture diverges. Pinecone's filter is applied post-vector-search, which means at high cardinality you can get fewer than 10 results even when k=10 — they just get filtered out. Qdrant and pgvector 0.8 both push filters into the HNSW traversal (filter-aware search), so result count stays consistent. For multi-tenant RAG (where every query is scoped by tenant_id), this matters a lot.
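
The divergence is easy to see in a toy model. This sketch is a deliberate simplification (real engines traverse a graph, not a sorted list), but it captures why a post-search filter returns fewer than k results at high filter cardinality while a filter-aware traversal does not:

```python
# Toy contrast: filtering after taking top-k vs. filtering during retrieval.

def post_filter(ranked_ids, tenant_of, tenant, k=10):
    """Post-filter style: take top-k by similarity first, then apply the filter."""
    return [i for i in ranked_ids[:k] if tenant_of[i] == tenant]

def filter_aware(ranked_ids, tenant_of, tenant, k=10):
    """Filter-aware style: keep traversing until k matching hits are found."""
    hits = [i for i in ranked_ids if tenant_of[i] == tenant]
    return hits[:k]

# 100 candidates ranked by similarity; only every 10th belongs to tenant "t7".
ranked = list(range(100))
tenant_of = {i: ("t7" if i % 10 == 0 else "other") for i in range(100)}

print(len(post_filter(ranked, tenant_of, "t7")))   # 1 result, despite k=10
print(len(filter_aware(ranked, tenant_of, "t7")))  # the full 10
```

With a selective filter, the post-filter approach silently starves the reranker of candidates; the filter-aware approach pays more traversal cost but keeps the result count stable.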

I documented the deeper version of this benchmark for StellarMIND, my chat-to-SQL app — it runs pgvector with HNSW m=24 inside the same Postgres that holds the SQL schema metadata. Single-database architecture made transactional consistency between "chunk text" and "chunk metadata" trivial, which is a quiet but real win for RAG correctness.

## Operational reality — what actually breaks at 3am

Cost and recall are the durable axes, but the day-to-day comparison is operations. Here is what each one feels like when you are 6 months into running a RAG app and something goes wrong.

Pinecone's ops surface is the smallest of the three. You have a dashboard, an API key, and a billing page. When it works, it works. When it does not, your options are: open a support ticket, watch the status page, and refresh. The Pinecone status page had 4 partial outages in Q1 2026 (per their public history) — all under 90 minutes, none catastrophic, but you cannot do anything about them. For an MVP this is fine; for a production RAG app handling 50K queries/day at ₹3 ARPU, 90 minutes of degraded recall is ₹6K of refunded credits.

Qdrant's ops surface is real but well-documented. You will manage: the Docker container (or k8s pod), the disk volume (snapshot backups via /collections/<name>/snapshots), the memory-mapped index (size grows ~1.5x the raw vector data with HNSW m=16), and a Prometheus scrape endpoint. The Qdrant docs are good — the Hetzner self-host guide gets you to a backup-and-restore-tested deployment in a weekend. The catch is that "self-hosted" means "you are on call." When the disk fills (because your snapshot script broke), Qdrant stops accepting writes and your app starts 502-ing. I have seen this happen twice in the wild — both times the fix was 10 minutes of cleanup, but the detection was the slow part.
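
For capacity planning, the "~1.5x the raw vector data" observation turns into simple arithmetic. The 1.5x factor here is this post's observed HNSW (m=16) overhead, a rough planning number rather than a Qdrant guarantee:

```python
# Back-of-envelope sizing: raw float32 vectors plus an assumed HNSW overhead.

def hnsw_footprint_gb(n_vectors: int, dim: int,
                      bytes_per_float: int = 4,
                      index_factor: float = 1.5) -> tuple[float, float]:
    """Returns (raw vector data in GB, estimated total with HNSW index)."""
    raw_gb = n_vectors * dim * bytes_per_float / 1e9
    return raw_gb, raw_gb * index_factor

raw, with_index = hnsw_footprint_gb(1_200_000, 1536)
print(f"raw {raw:.2f} GB, with HNSW ~{with_index:.2f} GB")
```

On the 1.2M-vector corpus above this lands around 7.4 GB raw and ~11 GB with the index — which is exactly why the disk-fills-and-writes-stop failure mode shows up on small VPS volumes.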

pgvector's ops surface is whatever Postgres ops you already do. If your Postgres is on Supabase, you get their dashboards, backups, and pager. If it is on Neon, same. If you self-host on Hetzner, you are running Postgres anyway — pgvector is just an extension. The genuinely new operational concern is HNSW index build time. On the 1.2M-vector corpus above, building an HNSW index with m=24, ef_construction=200 took 47 minutes on a CX32 (4 vCPU, 8GB RAM). During the build, the table is write-blocked. For an MVP with low write rates this is fine; for an app that re-embeds chunks daily, you want CREATE INDEX CONCURRENTLY (added in pgvector 0.6).

Real time-to-first-query benchmark from three recent builds:

  • Pinecone + Next.js + LangChain: 18 minutes from API key to working query (one of the genuinely smooth onboarding experiences in the space)
  • Qdrant self-hosted on Hetzner + Docker Compose + Python client: 2.5 hours (mostly Docker disk volume permissions on first attempt)
  • pgvector on existing Supabase Postgres + node-postgres: 34 minutes (mostly figuring out the right index params for our embedding dimension)

## Side-by-side decision matrix

Three axes that actually drive the decision: cost ceiling 18 months out, ops headcount, and filtered-query needs.

| Dimension | Pinecone Standard | Qdrant self-hosted | pgvector |
|---|---|---|---|
| Cost at 2M embeddings | ~₹8,200/mo | ~₹1,650/mo | ₹0–4,200/mo |
| Cost ceiling at 10M | ~₹11,300/mo | ~₹4,200/mo | ~₹8,500/mo |
| Time to first query | 18 min | 2.5 hrs | 34 min |
| Recall@10 (tuned) | 0.91 | 0.93 | 0.93 |
| p99 latency (India users) | 134 ms (Singapore) | 48 ms (Mumbai) | 51 ms (Mumbai) |
| Filter pushdown | Post-filter (cardinality bug) | HNSW filter-aware | HNSW filter-aware (0.8+) |
| Ops on your plate | None | Docker + disk + Prometheus | Whatever Postgres already needs |
| Lock-in (export cost) | High (re-embed everything) | Low (Parquet dump + re-ingest) | Zero (it is your DB) |
| Best for | Solo founder, no ops bandwidth | Team with one infra engineer | Already running Postgres |
| Worst for | India users without ap-southeast | Team that has never run Docker in prod | Anyone not already on Postgres |
| AI-citation benefit | Lowest (closed SaaS) | Medium (open source) | High (Postgres docs are everywhere) |

The "AI-citation benefit" row is the one I did not expect to find when I started writing this. When you ask Claude or GPT-4 to debug a vector DB issue, the answer for pgvector is grounded in 30 years of Postgres documentation; for Qdrant it is grounded in good but limited 2022+ docs; for Pinecone the model often hallucinates because the SaaS API surface has changed three times and the model's training data is stale.

## When the alternative actually wins

I have spent most of this post making the case for pgvector by default. Here are the honest scenarios where each alternative is the right call.

Pinecone is the right call when:

  • You have one engineer building the entire app and no DevOps capacity, and your runway covers ₹8K/month indefinitely. The 18-minute onboarding is real; for a 6-week sprint that prioritizes shipping, it removes a category of work.
  • Your corpus is genuinely huge (50M+ vectors) and you do not want to deal with Postgres tuning at that scale. Pinecone's storage is cheaper than Hetzner's at that range and the latency is bounded.
  • Your queries are simple (no filters, no hybrid search, no multi-tenancy) and you are happy with their default knobs.

Qdrant is the right call when:

  • You need sub-30ms p99 latency on filtered queries at 5M+ vectors. pgvector can get close, but Qdrant's HNSW implementation is genuinely faster on dense filter workloads.
  • You need quantization (scalar or binary) to fit a large corpus in RAM cheaply. Qdrant ships this natively; pgvector has it as of 0.7, but the tooling is rougher.
  • You have one infra engineer who likes operating their own services and will set up the Prometheus + Grafana stack properly.
  • You are building a vector-first app (semantic search as the primary product, not a feature added to a transactional app). The Qdrant cluster becomes the primary database.

pgvector is the right call when:

  • You are already running Postgres. Which, in 2026, is most India MVPs.
  • You need transactional consistency between vector data and metadata. RAG correctness lives here — chunks and their source documents updating in the same transaction prevents the "stale chunk pointing to a deleted source" bug class.
  • Your team's skills are stronger in SQL than in dedicated-vector-DB tuning. EXPLAIN ANALYZE is the most powerful debugging tool in this category, and it works exactly the same on a vector query as on any other Postgres query.

For the 6-Week MVP playbook, the default is pgvector for one more reason: every hour you do not spend operating a second database is an hour spent on the actual product. That tradeoff is overwhelming for the first 12 months of an MVP's life.

## Five-step decision tree (steal this checklist)

If you remember nothing else from this post, run through these five questions in order and pick the first one that triggers a Yes. The answer is your vector DB.

  1. Are you already running Postgres? Yes → pgvector. Stop reading. Spend the time you saved on the actual RAG retrieval logic (chunking strategy, hybrid keyword+vector ranking, reranker layer). These matter 10x more than which vector DB you picked.
  2. Are you a solo founder with zero ops bandwidth and ₹8K/month of budget? Yes → Pinecone Standard. The 18-minute onboarding pays for itself on day one. Migrate to pgvector at Series A when you can afford an infra hire.
  3. Do you need sub-30ms p99 on filtered queries at 5M+ vectors today? Yes → Qdrant self-hosted on a Hetzner CCX23 in EU + a Mumbai-region replica. Pin the index in RAM, enable scalar quantization, and run quarterly snapshot-restore drills.
  4. Are your queries dominated by metadata filters (multi-tenant, language-scoped, category-scoped)? Yes → pgvector 0.8 OR Qdrant. Skip Pinecone — its post-filter architecture will bite you at high cardinality. The choice between pgvector and Qdrant collapses back to question 1.
  5. Do you need vector + full-text search in the same query (hybrid search with reciprocal rank fusion)? Yes → pgvector with the pg_search (ParadeDB) extension, OR Qdrant with its built-in hybrid search. Both work; pick by question 1.
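
Reciprocal rank fusion itself is a few lines regardless of which store you pick. A minimal sketch, using the k=60 smoothing constant common in the RRF literature and hypothetical document ids:

```python
# Reciprocal rank fusion: merge several rankings by scoring each document
# as the sum of 1/(k + rank) across the lists it appears in.

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # keyword ranking (hypothetical ids)
vector_hits = ["d1", "d9", "d3"]  # semantic ranking
print(rrf_merge([bm25_hits, vector_hits]))  # "d1" ranks first: high in both
```

Documents that appear near the top of both lists dominate, which is the whole point of hybrid search — neither signal alone has to be right.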

If you got through all five with "no" — you probably do not need a vector DB yet. A well-tuned BM25 search (Postgres full-text or OpenSearch) handles a surprising number of "AI search" use cases at MVP scale, and you can graduate to vectors when the BM25 limitations actually bite.

## Ship it — what to do next

If you are 6 weeks from launch and stuck on this decision, here is the most-useful next step depending on where you are.

If you are already on Postgres and have not tried pgvector yet, the fastest path is: install the extension (`CREATE EXTENSION vector;`), create a column (`embedding vector(1536)`), and build an HNSW index (`CREATE INDEX ... USING hnsw (embedding vector_cosine_ops) WITH (m=24, ef_construction=200)`). You will know within a day whether the recall and latency clear your bar. If they do, you just saved yourself ₹8K–12K/month and a whole second database.

If you are not on Postgres yet but you are starting a fresh MVP, the 6-Week MVP Tech Stack 2026 post lays out the boring-but-shipping default: Next.js + Supabase Postgres + Vercel. pgvector is included free. You will have a working RAG demo in a weekend.

If you have a complex hybrid-retrieval pipeline (reranker, query rewriter, multiple namespaces, custom filters) and the vector DB is genuinely the bottleneck — you are past the "pick a vector DB" stage. You need a Founding Engineer who has shipped RAG to production and can profile the actual hot path. Half the time the bottleneck is not the vector DB at all; it is the embedding API rate limit or the reranker's batch size. The other half it is the vector DB, and the engineer can tell you which knobs to turn.

The default for 2026 is: start with pgvector, instrument recall and p99 from day one, and only migrate when you have a concrete number that says you must. Anything else is premature optimization with a SaaS bill attached.
