Cheapest LLM API in 2025: Full Pricing Comparison
Compare the lowest-cost AI APIs — Claude Haiku, GPT-4o mini, Gemini Flash, Mistral, and more. Find the best model for your budget and use case.
LLM API Pricing Comparison, 2025 (USD per 1M tokens)
| Model | Input / 1M | Output / 1M | Cache Read / 1M | Context |
|---|---|---|---|---|
| Gemini 1.5 Flash (cheapest raw) | $0.075 | $0.30 | $0.019 | 1M |
| GPT-4o mini | $0.15 | $0.60 | $0.075 | 128K |
| Claude Haiku 3.5 (best cached) | $0.80 | $4.00 | $0.08 | 200K |
| Mistral Small | $0.20 | $0.60 | — | 128K |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 | 200K |
| GPT-4o | $2.50 | $10.00 | $1.25 | 128K |
| Gemini 1.5 Pro | $1.25 | $5.00 | $0.31 | 2M |
| Claude Opus (latest) | $15.00 | $75.00 | $1.50 | 200K |
Which Cheap Model Should You Use?
The "cheapest" API depends on your workload. Here's a quick decision guide:
Frequently Asked Questions
What is the cheapest LLM API in 2025?
On a pure per-token basis, Gemini 1.5 Flash ($0.075/M input) and GPT-4o mini ($0.15/M input) have the lowest input prices in the table above. However, for workloads with large repeated system prompts (like Claude Code), Claude Haiku's prompt-cache read price of $0.08/M makes it effectively much cheaper than its $0.80/M list price after the first call. There is no single "cheapest": it depends on your token ratio and caching patterns.
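To make the dependence on token ratio and caching concrete, here is a minimal sketch that prices one workload against four of the cheaper rows of the table above. The dictionary keys are hypothetical labels, not official API model identifiers, and the function assumes that cached input is billed at the table's cache-read rate (or at the normal input rate where the table lists none). It ignores cache-write surcharges, which the table does not list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, copied from the comparison table."""
    input: float
    output: float
    cache_read: Optional[float]  # None where the table lists no cached rate


# Hypothetical labels for illustration; not official API model names.
PRICES = {
    "gemini-1.5-flash": Price(0.075, 0.30, 0.019),
    "gpt-4o-mini": Price(0.15, 0.60, 0.075),
    "claude-haiku-3.5": Price(0.80, 4.00, 0.08),
    "mistral-small": Price(0.20, 0.60, None),
}


def cost_usd(model: str, fresh_input: int, cached_input: int, output: int) -> float:
    """Estimate the cost of a workload from raw token counts.

    Assumption: cache-write surcharges are ignored because the table
    does not list them.
    """
    p = PRICES[model]
    # Without a cached rate, repeated tokens are billed as normal input.
    cached_rate = p.input if p.cache_read is None else p.cache_read
    total = fresh_input * p.input + cached_input * cached_rate + output * p.output
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 100K fresh input, 2M cached input, 50K output tokens.
    for name in PRICES:
        print(f"{name}: ${cost_usd(name, 100_000, 2_000_000, 50_000):.4f}")
```

Changing the three token counts shifts the ranking, which is the point: the same table can favor different models for a chat workload heavy on output and an agent workload heavy on cached input.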
Is Claude Haiku cheaper than GPT-4o mini?
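GPT-4o mini is cheaper on raw input price ($0.15/M vs $0.80/M). But Claude Haiku closes the gap when prompt caching is active: cache reads at $0.08/M cost about 1.9× less than GPT-4o mini's standard input price of $0.15/M. For Claude Code sessions where the same large system prompt is reused across many calls, Haiku's input cost can drop below that of an uncached GPT-4o mini setup. Note that GPT-4o mini's own cached rate in the table above ($0.075/M) is slightly lower still, and Haiku's output price ($4.00/M vs $0.60/M) remains higher, so the overall result depends on how much of the session is cache reads versus output.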
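As a rough worked example of this comparison, the sketch below computes the blended input price when a fraction of input tokens are cache reads, and solves for the fraction at which that blended price matches a target. The function names are hypothetical, the prices come from the table above, and cache-write costs and output tokens are left out.

```python
def blended_input_price(list_price: float, cache_read_price: float,
                        cached_fraction: float) -> float:
    """Average input price (USD per 1M tokens) when `cached_fraction`
    of input tokens are billed at the cache-read rate."""
    return cached_fraction * cache_read_price + (1 - cached_fraction) * list_price


def break_even_cached_fraction(list_price: float, cache_read_price: float,
                               target_price: float) -> float:
    """Solve f * c + (1 - f) * p = t for f, giving f = (p - t) / (p - c).

    A result above 1 means the target cannot be reached by caching alone;
    a result at or below 0 means the list price already meets the target.
    """
    if list_price <= cache_read_price:
        raise ValueError("cache reads must be cheaper than the list price")
    return (list_price - target_price) / (list_price - cache_read_price)


# Claude Haiku 3.5 input ($0.80, cache read $0.08) against
# GPT-4o mini's uncached input price ($0.15).
f_uncached = break_even_cached_fraction(0.80, 0.08, 0.15)

# The same comparison against GPT-4o mini's cached rate ($0.075).
f_cached = break_even_cached_fraction(0.80, 0.08, 0.075)
```

Under these assumptions `f_uncached` is about 0.90, so roughly nine in ten of Haiku's input tokens must be cache reads before its blended input price falls to $0.15/M. `f_cached` comes out slightly above 1, meaning caching alone never brings Haiku's input price down to GPT-4o mini's cached rate.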
Which cheap LLM has the best quality for coding tasks?
Claude Haiku 3.5 leads the budget tier for coding. It outperforms GPT-4o mini on multi-file code understanding, tool use, and agentic task completion — the capabilities that matter most in code-generation pipelines. If budget is the hard constraint, Haiku gives the best coding quality per dollar among sub-$1/M input models.
How do I find out which LLM is actually cheapest for my workload?
The best approach is to run a real session and analyze the token breakdown. If you use Claude Code, paste your session log (~/.claude/projects/<project>/*.jsonl) into the Claude Code Cost Calculator. It breaks down costs by model (input, output, cache write, cache read), which lets you directly see where your money goes and model what a cheaper routing strategy would cost.
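If you would rather inspect the token breakdown yourself, a minimal sketch along these lines could total the usage per model from a session log. It assumes each JSONL line is an object with a `message` containing a `model` string and a `usage` object with the four counters named below. That layout is an assumption about the log format, not a documented schema, so check it against your own files; `summarize_usage` is a hypothetical helper name.

```python
import json
from collections import defaultdict

# Assumed counter names inside message.usage; verify against your own logs.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize_usage(lines):
    """Sum token counters per model from an iterable of JSONL lines.

    Lines that are blank, not valid JSON, or lack a usage object are skipped.
    """
    totals = defaultdict(lambda: dict.fromkeys(USAGE_FIELDS, 0))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = record.get("message") if isinstance(record, dict) else None
        if not isinstance(message, dict):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue
        model = message.get("model", "unknown")
        for field in USAGE_FIELDS:
            value = usage.get(field, 0)
            if isinstance(value, int):
                totals[model][field] += value
    return {model: dict(counts) for model, counts in totals.items()}
```

To use it, open one of your session files and pass the file object directly, for example `with open(path) as fh: print(summarize_usage(fh))`, where `path` points at a log you choose. Multiplying each counter by the matching per-token price then gives a cost breakdown by category.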