Cheapest LLM API in 2025: Full Pricing Comparison

Compare the lowest-cost AI APIs — Claude Haiku, GPT-4o mini, Gemini Flash, Mistral, and more. Find the best model for your budget and use case.

LLM API Pricing Comparison — 2025 (per 1M tokens)

| Model | Input (USD / 1M) | Output (USD / 1M) | Cache read (USD / 1M) | Context window |
|---|---|---|---|---|
| Gemini 1.5 Flash (cheapest raw) | $0.075 | $0.30 | $0.019 | 1M |
| GPT-4o mini | $0.15 | $0.60 | $0.075 | 128K |
| Claude Haiku 3.5 (best cached) | $0.80 | $4.00 | $0.08 | 200K |
| Mistral Small | $0.20 | $0.60 | not listed | 128K |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 | 200K |
| GPT-4o | $2.50 | $10.00 | $1.25 | 128K |
| Gemini 1.5 Pro | $1.25 | $5.00 | $0.31 | 2M |
| Claude Opus (latest) | $15.00 | $75.00 | $1.50 | 200K |
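To turn the per-1M prices into the cost of a single request, multiply each token count by its rate and divide by one million. The sketch below copies the table's list prices; the model keys are hypothetical labels (not official API model IDs), and it ignores cache-write charges and any long-context price tiers.

```python
# USD per 1M tokens, copied from the table above. Keys are illustrative labels,
# not official API model IDs. None means no cache-read price is listed.
PRICES = {
    "gemini-1.5-flash": {"input": 0.075, "output": 0.30, "cache_read": 0.019},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60, "cache_read": 0.075},
    "claude-haiku-3.5": {"input": 0.80, "output": 4.00, "cache_read": 0.08},
    "mistral-small": {"input": 0.20, "output": 0.60, "cache_read": None},
    "claude-sonnet-4.5": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "gpt-4o": {"input": 2.50, "output": 10.00, "cache_read": 1.25},
    "gemini-1.5-pro": {"input": 1.25, "output": 5.00, "cache_read": 0.31},
    "claude-opus": {"input": 15.00, "output": 75.00, "cache_read": 1.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """USD cost of one request; `cached_tokens` is the part of the input billed at the cache-read rate."""
    prices = PRICES[model]
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cache_read = prices["cache_read"]
    if cached_tokens and cache_read is None:
        raise ValueError(f"no cache-read price listed for {model}")
    uncached = input_tokens - cached_tokens
    total = (uncached * prices["input"]
             + cached_tokens * (cache_read or 0.0)
             + output_tokens * prices["output"])
    return total / 1_000_000


# 10K input + 1K output tokens on Gemini 1.5 Flash: about $0.00105
print(f"${request_cost('gemini-1.5-flash', 10_000, 1_000):.5f}")
```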

Which Cheap Model Should You Use?

The "cheapest" API depends on your workload. Here's a quick decision guide:

High-volume classification / routing
Best: Gemini 1.5 Flash
Lowest raw price ($0.075/M input, $0.30/M output), 1M-token context, and low latency.
Repeated long-prompt workloads
Best: Claude Haiku with prompt caching
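Cache reads at $0.08/M are a 90% discount on Haiku's $0.80/M input price, putting long repeated prompts on par with GPT-4o mini's cached rate ($0.075/M). On raw price alone, Gemini Flash's cached rate ($0.019/M) is still lower.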
Code generation (budget tier)
Best: Claude Haiku 3.5
Stronger code understanding than GPT-4o mini, despite higher per-token cost.
Chat / dialogue at scale
Best: GPT-4o mini or Mistral Small
Low output prices ($0.60/M for both), good conversational quality, and wide support in existing tooling.
Claude Code sub-tasks
Best: Claude Haiku with caching
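Native tool-call support, 200K context, and sub-agent routing from Sonnet → Haiku. Every Haiku 3.5 rate is about 27% of the matching Sonnet 4.5 rate, so moving sub-agent calls to Haiku can cut their cost by up to roughly 73%.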
Multi-step agentic pipelines
Best: Mix Haiku + Sonnet
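Route simple steps to Haiku and hard reasoning to Sonnet. Because Haiku 3.5 costs about 27% of Sonnet 4.5 per token, a mixed pipeline can be at most about 3.75× cheaper than Sonnet-only; routing 80% of tokens to Haiku, for example, makes it about 2.4× cheaper.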
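The Haiku/Sonnet routing savings in the guide follow from simple weighted arithmetic. A minimal sketch, assuming the Haiku 3.5 and Sonnet 4.5 list prices from the table and that routed steps have the same input/output mix as the rest of the pipeline (caching is left out; Haiku's cache-read price is also about 27% of Sonnet's, so the ceiling does not change):

```python
# Claude list prices in USD per 1M tokens, from the table above.
HAIKU = {"input": 0.80, "output": 4.00}
SONNET = {"input": 3.00, "output": 15.00}


def pipeline_cost(prices, input_tokens, output_tokens):
    """USD for running the given token volume on one model."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


def savings_factor(input_tokens, output_tokens, haiku_share):
    """How many times cheaper a Haiku/Sonnet mix is than Sonnet-only.

    Assumes `haiku_share` (0 to 1) of both input and output tokens go to Haiku.
    """
    sonnet_only = pipeline_cost(SONNET, input_tokens, output_tokens)
    mixed = (haiku_share * pipeline_cost(HAIKU, input_tokens, output_tokens)
             + (1 - haiku_share) * sonnet_only)
    return sonnet_only / mixed


for share in (0.5, 0.8, 1.0):
    factor = savings_factor(2_000_000, 200_000, share)
    print(f"{share:.0%} of tokens on Haiku -> {factor:.2f}x cheaper")
# Prints 1.58x, 2.42x and 3.75x for the three shares.
```

Because every Haiku rate is about 27% of the matching Sonnet rate, the factor approaches 1 / 0.267 ≈ 3.75 as the Haiku share approaches 100%, whatever the token volume.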

Frequently Asked Questions

What is the cheapest LLM API in 2025?

On a pure per-token basis, Gemini 1.5 Flash ($0.075/M input) and GPT-4o mini ($0.15/M input) are the cheapest models in this comparison. For workloads with large, repeated system prompts (like Claude Code), prompt caching changes the math: Claude Haiku's cache-read price of $0.08/M is a tenth of its standard input price, so repeated context becomes much cheaper after the first call, although its output tokens still cost $4.00/M. There is no single "cheapest" model; it depends on your input-to-output token ratio and caching patterns.
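Since the answer depends on token ratio and caching, one way to compare is to price a whole workload under each model. The sketch below assumes the table's list prices, a hypothetical monthly volume, and a single cache-hit rate applied to input tokens; it ignores cache-write and cache-storage charges (which vary by provider) and bills models with no listed cache price at the standard input rate.

```python
# USD per 1M tokens from the table above; labels are illustrative, not API model IDs.
PRICES = {
    "gemini-1.5-flash": {"input": 0.075, "output": 0.30, "cache_read": 0.019},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60, "cache_read": 0.075},
    "claude-haiku-3.5": {"input": 0.80, "output": 4.00, "cache_read": 0.08},
    "mistral-small": {"input": 0.20, "output": 0.60, "cache_read": None},
}


def workload_cost(prices, input_tokens, output_tokens, cache_hit_rate):
    """USD for a workload where `cache_hit_rate` of input tokens are billed as cache reads."""
    if prices["cache_read"] is None:
        cache_hit_rate = 0.0  # no listed cache price: bill all input at the standard rate
    cached = input_tokens * cache_hit_rate
    total = ((input_tokens - cached) * prices["input"]
             + cached * (prices["cache_read"] or 0.0)
             + output_tokens * prices["output"])
    return total / 1_000_000


def rank_models(input_tokens, output_tokens, cache_hit_rate):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {name: workload_cost(p, input_tokens, output_tokens, cache_hit_rate)
             for name, p in PRICES.items()}
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical month: 500M input tokens (90% cache hits) and 20M output tokens.
for name, usd in rank_models(500_000_000, 20_000_000, 0.9):
    print(f"{name:18} ${usd:,.2f}")
```

With this particular mix the order is Gemini Flash, GPT-4o mini, Mistral Small, then Haiku; plugging in your own volumes, output share, and hit rate is what settles the question for a given workload.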

Is Claude Haiku cheaper than GPT-4o mini?

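Not on list prices: GPT-4o mini is cheaper on both input ($0.15/M vs $0.80/M) and output ($0.60/M vs $4.00/M). Prompt caching narrows the input gap: Haiku's cache reads at $0.08/M are 10× cheaper than its own standard input and about 1.9× cheaper than GPT-4o mini's standard input, though GPT-4o mini's own cached-input rate ($0.075/M) is slightly lower. For Claude Code sessions, where the same large system prompt is reused across many calls, caching keeps Haiku's input costs low; whether it costs less overall depends mainly on how many output tokens the session produces.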
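To see how caching shifts the Haiku vs GPT-4o mini comparison over a session, the sketch below prices N calls that share one long prompt prefix. It assumes Anthropic's 1.25× cache-write premium for 5-minute caching (so $1.00/M for Haiku 3.5), that GPT-4o mini caches the prefix with no write premium, and that the cache stays warm for the whole session; the token counts in the example are hypothetical.

```python
# USD per 1M tokens. Input/output/cache-read rates come from the table above;
# Haiku's $1.00/M cache write is the assumed 1.25x premium on its $0.80/M input.
HAIKU = {"input": 0.80, "output": 4.00, "cache_write": 1.00, "cache_read": 0.08}
# Assumption: GPT-4o mini's first call pays the normal input rate for the prefix.
GPT4O_MINI = {"input": 0.15, "output": 0.60, "cache_write": 0.15, "cache_read": 0.075}


def session_cost(prices, calls, prompt_tokens, fresh_tokens, output_tokens, use_cache=True):
    """USD for `calls` requests that share one `prompt_tokens`-long prefix.

    With caching, the first call writes the prefix and later calls read it;
    the cache is assumed never to expire mid-session.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    if use_cache:
        prefix = prompt_tokens * (prices["cache_write"] + (calls - 1) * prices["cache_read"])
    else:
        prefix = prompt_tokens * calls * prices["input"]
    fresh = calls * fresh_tokens * prices["input"]
    out = calls * output_tokens * prices["output"]
    return (prefix + fresh + out) / 1_000_000


# Hypothetical session: 200 calls, 20K-token shared prompt, 2K fresh input, 500 output each.
for label, prices, cache in [("Haiku, no cache", HAIKU, False),
                             ("Haiku, cached", HAIKU, True),
                             ("GPT-4o mini, cached", GPT4O_MINI, True)]:
    print(f"{label:20} ${session_cost(prices, 200, 20_000, 2_000, 500, use_cache=cache):.2f}")
```

With these numbers caching cuts Haiku's session from about $3.92 to about $1.06, while GPT-4o mini with caching comes to about $0.42, because Haiku's output and uncached input still cost several times more per token. On a single call, caching costs slightly more than not caching, since the write premium is only recovered from the second call onward.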

Which cheap LLM has the best quality for coding tasks?

Claude Haiku 3.5 leads the budget tier for coding. It outperforms GPT-4o mini on multi-file code understanding, tool use, and agentic task completion — the capabilities that matter most in code-generation pipelines. If budget is the hard constraint, Haiku gives the best coding quality per dollar among sub-$1/M input models.

How do I find out which LLM is actually cheapest for my workload?

The best approach is to run a real session and analyze the token breakdown. If you use Claude Code, paste your session log (~/.claude/projects/<project>/*.jsonl) into the Claude Code Cost Calculator. It breaks down costs by model (input, output, cache write, cache read), which lets you directly see where your money goes and model what a cheaper routing strategy would cost.
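If you want to inspect a session log yourself, the sketch below totals token usage per model. It assumes, without confirmation from the Claude Code documentation, that each assistant line in the JSONL file is a JSON object whose `message` carries `model` and a `usage` object with the Anthropic Messages API field names (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); lines without that shape are skipped. Multiplying each total by the matching per-1M rate gives the cost breakdown.

```python
import json
from collections import defaultdict
from pathlib import Path

# Assumed field names, taken from the Anthropic Messages API `usage` object.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def tokens_by_model(lines):
    """Sum usage fields per model over an iterable of JSONL lines."""
    totals = defaultdict(lambda: dict.fromkeys(USAGE_FIELDS, 0))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # skip lines without a usage object (assumed: user turns, metadata)
        model = message.get("model", "unknown")
        for field in USAGE_FIELDS:
            totals[model][field] += usage.get(field) or 0  # missing or null counts as 0
    return dict(totals)


def tokens_for_log(path):
    """Read one session file and return per-model token totals."""
    with Path(path).expanduser().open(encoding="utf-8") as f:
        return tokens_by_model(f)
```

Call `tokens_for_log` with the path of one `.jsonl` file from the project directory above, then multiply each field by the corresponding input, output, cache-write, or cache-read price.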