TL;DR
GPT-5-mini is the safest default LLM for an Indian MVP in 2026: roughly ₹965 per month, taxes included, for a 10K-conversation WhatsApp support bot, and the most stable on noisy Hinglish inputs. Gemini 2.5 Flash-Lite is the raw-cost floor (~₹285/mo for the same workload) if your traffic is clean English/Hindi. Claude Sonnet 4 (~₹9,300/mo at that volume) wins only when you need long-context reasoning or coding-agent loops, not for chatbots. Skip this comparison entirely if your MVP projects under ~2M output tokens a month: at that scale the spread between providers is under ₹3,000/mo, and your decision should be SDK quality, not unit price. Read on for the line-item math, the GST + TDS gotchas your CA will flag, and the latency story from a real Mumbai deployment.
OpenAI vs Claude vs Gemini API — the real cost for an India MVP in 2026
By Rohit Raj — Founding Engineer · 10+ yrs MVP shipping · LinkedIn
Most "OpenAI vs Claude vs Gemini" posts you find on the SERP today rank prices in dollars, ignore India-specific friction, and pretend prompt caching does not exist. Founders here pay in INR, get billed in USD, lose ~3% to forex, owe 18% GST on imported services, and the CA flags 2% TDS on every foreign invoice over ₹50K. That changes which provider actually wins.
On real production workloads I have shipped from Mumbai — a WhatsApp clinic-bot doing ~10K conversations a month, an internal RAG product answering ~50K queries, and an agent loop doing ~200K tool calls — the cheapest-per-token leader is not the cheapest-per-month winner once you factor in retries, caching, output tokens, and the structured-output tax that some providers charge.
This post lays the math out in INR for three real Indian-MVP scenarios, then tells you which provider to pick for which workload, and the one rule that overrides all of them: at low volume, pick the SDK you ship fastest with, not the cheapest tier. The cost-savings math only matters once you cross roughly 2M output tokens per month.
Real per-1M-token rates (May 2026)
Published rate cards as of May 2026, lifted from each provider's pricing page. Always reconfirm before quoting a client; Gemini in particular has cut prices three times in the last 18 months.
| Model | Input ($/1M) | Output ($/1M) | Input cache discount | Context window |
|---|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | 90% off (5-min TTL) | 400K |
| GPT-5-mini | $0.25 | $2.00 | 90% off | 400K |
| GPT-5-nano | $0.05 | $0.40 | 90% off | 128K |
| Claude Opus 4 | $15.00 | $75.00 | 90% off (5-min TTL) | 200K |
| Claude Sonnet 4 | $3.00 | $15.00 | 90% off | 200K |
| Claude Haiku 4.5 | $1.00 | $5.00 | 90% off | 200K |
| Gemini 2.5 Pro | $1.25 / $2.50¹ | $10.00 / $15.00¹ | implicit | 2M |
| Gemini 2.5 Flash | $0.30 | $2.50 | implicit | 1M |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | implicit | 1M |
¹ Gemini 2.5 Pro charges a higher rate above 200K input tokens. Most MVPs stay under that line.
India-specific markups before you commit to a number:
- USD → INR FX margin: ~₹84.5/USD spot, but card settlement adds a ~1-3% markup via your issuing bank (HDFC, ICICI). Stripe-billed invoices are smoother than direct credit-card top-ups.
- GST: 18% on imported digital services under the Reverse Charge Mechanism if you are registered. Most early-stage Pvt Ltds skip this until their first audit notices it.
- TDS u/s 195: 2% on payments to non-resident vendors above ₹50,000 in a financial year. Your CA files quarterly. Practically invisible at MVP scale but it appears on the books at Series A audit time.
- The "minimum spend" trap: Anthropic's enterprise tier has a $5K/mo floor; OpenAI lets you start at $0; Gemini gives a permanent free tier on Flash-Lite via Google AI Studio. For pre-product startups, "free tier latency is acceptable" can swing the decision before any of the dollar math matters.
Scenario 1 — WhatsApp support bot, 10K conversations / month
This is the most common Indian MVP shape I see — a clinic, D2C brand, or local-service business pipes WhatsApp Business API into an LLM for tier-1 customer queries. From the ClinicAI deployment I shipped, average tokens per conversation:
- Input: ~1,800 tokens (system prompt + 3-turn history + retrieved FAQ)
- Output: ~250 tokens (one structured reply, occasionally with a button list)
- Volume: 10,000 conversations / month
- Total: 18M input, 2.5M output
Plugging into the rate card (USD then INR at 84.5):
| Model | Input cost | Output cost | Total USD | Total INR (no tax) | +18% GST + 2% TDS |
|---|---|---|---|---|---|
| GPT-5-mini | $4.50 | $5.00 | $9.50 | ₹803 | ₹964 |
| Claude Haiku 4.5 | $18.00 | $12.50 | $30.50 | ₹2,578 | ₹3,094 |
| Gemini 2.5 Flash | $5.40 | $6.25 | $11.65 | ₹985 | ₹1,182 |
| Gemini 2.5 Flash-Lite | $1.80 | $1.00 | $2.80 | ₹237 | ₹284 |
| GPT-5 | $22.50 | $25.00 | $47.50 | ₹4,014 | ₹4,817 |
| Claude Sonnet 4 | $54.00 | $37.50 | $91.50 | ₹7,732 | ₹9,279 |
If you only read one row: Flash-Lite at ₹284/month, taxes included, hits 95% of what a tier-1 WhatsApp bot needs to do. The catch — Flash-Lite's Hinglish handling drops below GPT-5-mini's on noisy real-world inputs (transliterated Hindi, code-mixed slang), which is exactly what Indian WhatsApp users send.
In the ClinicAI prod logs, switching from GPT-4o-mini (₹1,800/mo at this volume in early 2026) to GPT-5-mini cut spend by ~46% and improved Hinglish appointment-intent detection from 87% to 94% on a held-out test set. Gemini Flash matched GPT-5-mini on intent detection but produced ~12% more "please rephrase" loops on first-message scam-detection queries, which translated to more dropped sessions.
Verdict for this scenario: GPT-5-mini wins on a quality-adjusted basis, and at ₹964/mo it undercuts Gemini Flash (₹1,182) outright. Gemini Flash-Lite wins on raw cost if your messages are clean English/Hindi without code-mixing. Claude Haiku is the safe third choice and lands mid-pack on price.
Scenario 2 — RAG product Q&A, 50K queries / month
RAG changes the math because of caching and context size. The numbers below come from a StellarMIND-style chat-to-SQL deployment running pgvector + Claude / GPT routing:
- Input: ~4,500 tokens per query (system prompt 800 + retrieved chunks 3,200 + query 500)
- Output: ~400 tokens (SQL + 2-sentence explanation)
- Cacheable prefix: system prompt + top-K retrieval chunks repeat within a user session (~3,000 of the 4,500 input tokens per query); across all traffic, roughly 40% of total input tokens end up served from cache
- Volume: 50,000 queries / month → 225M input, 20M output (~90M of the input billed at cache rates)
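The effective-input arithmetic behind the table that follows is a single function. A sketch; the 90% figure is the explicit-cache discount from the rate card, and treating 40% of all input tokens as cache-served is this post's simplification:

```python
def effective_input_usd(rate_per_mtok: float, total_mtok: float,
                        cached_share: float) -> float:
    """Input cost with cached tokens billed at 10% of the base rate (90% off)."""
    cached = total_mtok * cached_share
    uncached = total_mtok - cached
    return uncached * rate_per_mtok + cached * rate_per_mtok * 0.10

# Scenario 2, GPT-5-mini: 225M input at $0.25/M, 40% of tokens cache-served
print(effective_input_usd(0.25, 225, 0.40))  # → 36.0 (add $40 output = $76 total)
```

Gemini's implicit cache gives a shallower effective discount, which is why its rows below don't drop as far as the explicit-cache providers'.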
With cache discount applied to the cached portion:
| Model | Effective input cost | Output cost | Total USD | Total INR (no tax) | +18% GST + 2% TDS |
|---|---|---|---|---|---|
| GPT-5-mini | $36.00 | $40.00 | $76.00 | ₹6,422 | ₹7,706 |
| Claude Sonnet 4 (cached) | $432.00 | $300.00 | $732.00 | ₹61,854 | ₹74,225 |
| Claude Haiku 4.5 | $144.00 | $100.00 | $244.00 | ₹20,618 | ₹24,742 |
| Gemini 2.5 Flash | $54.00 | $50.00 | $104.00 | ₹8,788 | ₹10,546 |
| Gemini 2.5 Pro | $225.00 | $200.00 | $425.00 | ₹35,913 | ₹43,096 |
GPT-5-mini's 90% cache discount on the 40% cached share keeps it cheapest at ₹7,706/mo total for this workload. Gemini Flash is close at ₹10,546 but captures only about half the discount on its cached tokens, because Gemini's implicit caching is harder to tune from outside.
Claude Sonnet 4 is nearly 10x the cost. The only justification: when the SQL is wrong, Sonnet's chain-of-thought is materially better at self-correcting, and in production a 5% SQL-accuracy gain at this volume prevents 2,500 wrong rows per month. That trade is rarely worth the ~₹66K/month delta at this scale; it becomes worth it around the ~500K-query mark, where the absolute count of bad rows matters more than the rupee delta.
Verdict for this scenario: GPT-5-mini wins on pure cost. Claude Sonnet 4 wins when row-level correctness has business consequence. Gemini Pro is rarely the right pick at this scale — its 2M context is wasted on RAG queries that retrieve <5K tokens.
Scenario 3 — coding agent loop, 5M output / month
Agent loops blow up output token count because every tool call round-trips through the model. From a Spring Boot codebase migration agent I instrumented:
- Input: ~30M (heavy context: file trees, code snippets, error traces)
- Output: ~5M (tool calls + final code + explanations)
- Cache hit rate: 70% (system prompt + file tree are static across loops)
| Model | Effective input | Output | Total USD | Total INR (no tax) | +18% GST + 2% TDS |
|---|---|---|---|---|---|
| GPT-5 | $13.88 | $50.00 | $63.88 | ₹5,397 | ₹6,477 |
| Claude Sonnet 4 | $33.30 | $75.00 | $108.30 | ₹9,151 | ₹10,982 |
| Claude Opus 4 | $166.50 | $375.00 | $541.50 | ₹45,757 | ₹54,908 |
| Gemini 2.5 Pro | $24.38 | $50.00 | $74.38 | ₹6,285 | ₹7,542 |
Cached reads are billed, not free: 10% of the input rate on the explicit caches, ~50% on Gemini's implicit cache, the same treatment as Scenario 2. Cost-wise, GPT-5 leads and Gemini Pro sits close behind; the gap is mostly Gemini's shallower cache discount. The real split happens on quality:
- Claude Sonnet 4 / Opus 4 still produce the most reliable code edits with the lowest "I asked you to change X and you also broke Y" rate in my logs. On a 200-PR sample run against a real Java codebase, Opus had 12% revert rate vs GPT-5's 23% vs Gemini Pro's 31%.
- Gemini Pro's 2M context is the only model that fits an entire mid-size Spring Boot module in one shot without retrieval gymnastics. For "refactor this whole bounded context" tasks it ships fewer agent hops, which often beats the per-token math.
Verdict for this scenario: pay the Claude tax if your agent is in a critical path (production code, customer-visible PRs). Gemini Pro is the dark horse when context size dominates. GPT-5 is the safe default for one-off scripts and exploratory tasks.
Side-by-side comparison — picking the right axis
Cost alone misleads. Real production choice depends on five axes. Here is the honest ranking from my deployments:
| Axis | OpenAI (GPT-5 family) | Anthropic (Claude 4 family) | Google (Gemini 2.5 family) |
|---|---|---|---|
| Cheapest output ($/1M) | $0.40 (nano) | $5.00 (Haiku) | $0.40 (Flash-Lite) |
| Hinglish + Hindi quality | Strong | Decent | Decent |
| Coding / agent loops | Strong | Best | Strong, big context |
| Long context (>500K) | 400K cap | 200K cap | 2M |
| Structured output / JSON mode | Native, reliable | Tool-use route, decent | Native, mostly reliable |
| India billing friction | Direct CC, Stripe | Mostly USD invoice | Google Cloud INR billing¹ |
| Free tier for prototyping | $0 trial credit | None at MVP scale | Generous Flash-Lite tier |
| Latency from Mumbai | 400-700ms TTFT | 500-900ms TTFT | 300-550ms TTFT |
| Prompt cache discount | 90% off (explicit) | 90% off (5-min TTL) | Implicit, harder to tune |
¹ Google Cloud's India entity can invoice in INR with GST included. This single line cuts CA friction at audit time more than the dollar-per-token math.
Latency note from a Mumbai test: I ran 100 sequential requests at 14:00 IST from a Hetzner Helsinki box and an AWS Mumbai box. Gemini was consistently fastest TTFT, OpenAI second, Anthropic third — but Anthropic's streaming throughput was the highest once tokens started flowing, which evens out for long outputs.
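A minimal way to reproduce the TTFT half of that measurement, assuming any vendor SDK that yields streamed chunks as a Python iterator:

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_ttft(stream: Iterable[str]) -> Tuple[float, str]:
    """Seconds from iteration start until the first streamed chunk arrives.

    `stream` is any iterator of text chunks, e.g. a vendor SDK's streaming
    response. In real use, start the timer immediately before issuing the
    request; here we time from the first next() for simplicity.
    """
    t0 = time.perf_counter()
    it: Iterator[str] = iter(stream)
    first = next(it)
    return time.perf_counter() - t0, first

# Smoke test with a fake stream; real TTFT needs the actual API call.
ttft, chunk = measure_ttft(iter(["Namaste", "!"]))
```

Run it 100 times sequentially from your actual deployment region and look at the median and p95, not the mean; tail latency is what your WhatsApp users feel.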
Need this in a comparison frame against the model-host layer (Bedrock, Vertex, OpenRouter)? I built a financial advisor on AWS Bedrock and wrote up the host-vs-direct API choice separately.
When the alternative wins (honest counter)
Most listicles tell you to pick the cheapest and move on. In practice the right choice flips on these conditions:
Pick OpenAI when:
- You need reliable JSON-mode / structured output and cannot tolerate a single malformed response per 10K calls. The native JSON schema enforcement is still the most production-hardened.
- Your team already knows the OpenAI SDK and ships in days, not weeks. Don't undervalue that: at MVP scale, two extra days of engineering cost more than six months of LLM bills.
- You want the largest third-party tooling ecosystem (LangChain, LlamaIndex, every observability tool ships with OpenAI integrations first).
Pick Claude when:
- Your output goes to production code or to a regulated industry (clinical, legal, fintech) where "wrong but confident" outputs carry liability. Claude's refusal rate on hallucination-prone queries is the highest in my testing — that translates to fewer customer-facing incidents.
- You run coding agents that must edit large multi-file changesets without breaking adjacent code.
- You need 200K-token long-context reasoning and don't need Gemini's 2M cap.
Pick Gemini when:
- You need a genuinely free tier for prototyping (Flash-Lite's permanent free tier, not just a trial credit).
- Your workload is 1M+ input tokens per call (full codebase ingestion, full book Q&A).
- You want India INR invoicing with GST included from day one — saves 4+ hours of CA back-and-forth per quarter.
- You are prototyping rapidly and need to test a wide model family without billing setup friction.
Stop here and pick none of them when:
- Your MVP has fewer than 1,000 LLM calls per month. Use the free tier on whichever SDK feels easiest. Cost is rounding error.
- You have not validated that an LLM is the right primitive at all. I have seen three MVPs in 2026 where a regex + rule-based fallback covered 80% of the actual user queries — the LLM was solving a problem that did not need solving.
Decision checklist — 5 steps before you commit
Run this in order. Skip any step and you will rewrite the integration once you scale.
- Estimate monthly token volume from a 50-conversation pilot. Project 100x. If projected output tokens < 2M/month, default to whichever SDK your team already knows. Cost difference is < ₹3,000/month at that scale — engineering speed wins.
- Test Hinglish / Hindi inputs explicitly on a 50-sample set. Each provider drops accuracy differently. GPT-5-mini is the most stable in my testing; Gemini Flash beats it on clean inputs but cracks on code-mixed slang. Do not trust English-only benchmarks for an Indian customer base.
- Decide whether you need structured output / tool use. If yes, OpenAI is the safest. If you can tolerate a JSON-repair retry layer (and you should build one anyway), the other two are fine.
- Pick India-billing-friendly first. Gemini's INR + GST invoice removes the 2-3% FX margin and the GST audit friction. Over a year on a ₹10K/month workload, that is ~₹3,500 saved + ~6 hours of CA time. Worth it as a tiebreaker.
- Wire a router with model-name as env var from day one. Never hard-code the SDK. The provider you pick today is not the provider you will ship at Series A. A 30-minute provider-abstraction layer pays itself back at the first vendor renegotiation.
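Step 5 in code, as a minimal sketch: the wrapper classes here are stubs standing in for the thin adapters you would write around each vendor SDK, and the model names are illustrative defaults, not a recommendation baked into code.

```python
import os
from dataclasses import dataclass

@dataclass
class _StubClient:
    model: str
    def complete(self, prompt: str) -> str:
        # Replace with the real vendor SDK call in each subclass.
        raise NotImplementedError(f"wire the SDK for {self.model} here")

class OpenAIClient(_StubClient): pass
class AnthropicClient(_StubClient): pass
class GeminiClient(_StubClient): pass

_BY_PREFIX = {"gpt-": OpenAIClient, "claude-": AnthropicClient,
              "gemini-": GeminiClient}

def get_client(default: str = "gpt-5-mini"):
    """Resolve the provider from the LLM_MODEL env var; a vendor swap
    becomes a config change, not a code change."""
    model = os.environ.get("LLM_MODEL", default)
    for prefix, cls in _BY_PREFIX.items():
        if model.startswith(prefix):
            return cls(model)
    raise ValueError(f"unknown LLM_MODEL: {model!r}")
```

Everything downstream imports `get_client()` and never a vendor SDK directly; at renegotiation time the swap is one env var plus one new adapter class.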
If steps 1-4 don't yield a clear winner, default to GPT-5-mini. It is the strongest Pareto point for an Indian MVP in mid-2026: not the cheapest, not the smartest, but the one with the fewest "wish I had picked differently" regrets across my last six engagements.
Ship the LLM choice that fits your MVP — not the cheapest one
If you are picking an LLM for an MVP shipping in the next 6 weeks, the spend conversation is not the bottleneck — the integration speed is. Every extra week of LLM-routing work is a week your MVP is not in users' hands earning real signal.
I ship founder MVPs end-to-end in 6 weeks: LLM integration (any provider above), WhatsApp/Web frontend, Postgres + pgvector backend, India-billing-friendly deployment. If you want the math redone for your specific workload — drop me a note with your expected volume and I will run the numbers against your real prompt size before you commit to a vendor.
→ Hire me to ship your 6-week MVP → Or have me build the AI chatbot piece directly → Or hire me as a founding engineer for the long haul
