Open Calculator →

Gemini 2.0 Flash Pricing 2026
vs Claude Haiku Cost

Full Gemini 2.0 Flash and Flash-Lite pricing — tokens, context caching, 1M context window, and a direct comparison with Claude Haiku 4.5 to help you choose the cheapest model for your use case.

Flash Input

$0.10
per million tokens

Flash Output

$0.40
per million tokens

Flash-Lite Input

$0.025
per million tokens

Context Window

1M
tokens (5× Claude)

Gemini Model Pricing Table

Model Input ($/M) Output ($/M) Image input Context Free tier
Gemini 2.0 Flash $0.10 $0.40 Yes 1M Yes (15 RPM)
Gemini 2.0 Flash-Lite $0.025 $0.10 Yes 1M Yes (30 RPM)
Gemini 2.5 Pro $1.25 (<200K) $10.00 Yes 1M Limited
Free tier note: Gemini 2.0 Flash has a genuinely useful free tier (15 RPM, 1M TPM) — unlike Claude which has no sustained free API tier. For development and low-volume testing, Gemini Flash is the cheapest way to prototype a multimodal application before committing to paid API usage.

Gemini 2.0 Flash vs Claude Haiku 4.5

Feature Gemini 2.0 Flash Gemini Flash-Lite Claude Haiku 4.5
Input price/M (uncached) $0.10 $0.025 $0.80
Output price/M $0.40 $0.10 $4.00
Cache read/M ~$0.05 (50% off) ~$0.0125 $0.08 (90% off)
Context window 1M tokens 1M tokens 200K tokens
Image / vision Yes Yes Yes
Tool use quality Good Basic Excellent
Free tier Yes (15 RPM) Yes (30 RPM) No
Data residency Google (US/EU) Google (US/EU) Anthropic (US)
Prompt caching depth 50% off (implicit) 50% off (implicit) 90% off (explicit)

Monthly Cost Examples

Workload (per month) Gemini 2.0 Flash Gemini Flash-Lite Claude Haiku 4.5 Haiku + 80% cache
10M in / 2M out $1.80 $0.45 $16 $3.40
100M in / 20M out $18 $4.50 $160 $34
1B in / 100M out $140 $35 $1,200 $256
Cache math: At 80% cache hit rate, Claude Haiku drops to an effective $0.16/M input. That's still 60% higher than Gemini Flash's uncached rate. At very high volumes without caching, Gemini Flash-Lite is 32× cheaper than uncached Haiku. The crossover depends on whether your pipeline benefits from Claude's 90% cache discount.

When to Use Gemini 2.0 Flash vs Claude Haiku

Use case Best choice Why
Ultra-high-volume text classification Gemini Flash-Lite Lowest uncached price at $0.025/M in
Very long document processing (>200K tokens) Gemini 2.0 Flash 1M context vs Claude's 200K limit
Development / prototyping (no budget) Gemini 2.0 Flash Free tier at 15 RPM
Agentic tool-use pipelines Claude Haiku 4.5 Superior function calling reliability
Claude Code cost optimization Claude Haiku 4.5 90% cache discount, native Claude Code integration
Google Cloud / Vertex AI ecosystem Gemini 2.0 Flash Native GCP integration, unified billing

Frequently Asked Questions

How much does Gemini 2.0 Flash cost per token?

Gemini 2.0 Flash costs $0.10 per million input tokens and $0.40 per million output tokens via the Google AI API. Gemini 2.0 Flash-Lite is even cheaper at $0.025/M input and $0.10/M output. Both models have a 1M token context window and support image input at the same rate as text tokens.

Is Gemini 2.0 Flash cheaper than Claude Haiku 4.5?

At sticker price, yes — Gemini Flash ($0.10/M) is 8× cheaper than Haiku ($0.80/M). However, Claude's 90% prompt caching drops Haiku's effective input cost to $0.08/M on cache hits. For pipelines with high cache hit rates (agent loops with fixed system prompts), Haiku can match or beat Gemini Flash on cost while offering superior tool use reliability.

Does Gemini 2.0 Flash have a free tier?

Yes. The Google AI API (not Vertex AI) includes a free tier for Gemini 2.0 Flash at 15 requests per minute and 1M tokens per minute. Flash-Lite has a 30 RPM free tier. This free tier is available for development and low-volume production use. The Claude API has no comparable sustained free tier — only initial trial credits upon signup.

What is the difference between Gemini 2.0 Flash and Flash-Lite?

Gemini 2.0 Flash is the full-capability model ($0.10/M input): better reasoning, more reliable instruction following, stronger tool use, and higher quality on complex tasks. Flash-Lite ($0.025/M input) is a smaller, faster model optimized for simple tasks where quality is less critical. For classification, routing, and summarization of straightforward content, Flash-Lite is adequate. For multi-step reasoning, structured output, or complex instructions, use Flash or a stronger model.

How does Gemini context caching compare to Claude prompt caching?

Gemini uses implicit context caching — Google automatically reuses matching context prefixes. Cache reads cost roughly 50% of input price. Claude uses explicit prompt caching with cache_control markers — cache reads cost 10% of input price (90% discount). For most use cases, Claude's 90% cache discount is more aggressive. Gemini's implicit system is easier to implement (no cache markers needed) but less cost-effective at high cache hit rates.

How do I calculate Gemini 2.0 Flash monthly costs?

Monthly cost = (input_tokens × $0.0000001) + (output_tokens × $0.0000004). At 100M input + 20M output tokens/month: (100M × $0.10/M) + (20M × $0.40/M) = $10 + $8 = $18/month. Compare this with your Claude workload using our free cost calculator — paste your Claude Code session logs to get exact token counts, then apply Gemini's pricing to see the side-by-side comparison.