
Llama 4 API Pricing 2026
Scout & Maverick vs Claude

Meta Llama 4 API pricing across providers — Together AI, Fireworks, Groq — with a direct comparison against Claude Haiku 4.5 and Sonnet 4.6 to help you find the cheapest model for your workload.

Scout Input (Together AI): $0.08 per million tokens
Scout Output (Together AI): $0.30 per million tokens
Maverick Input (Together AI): $0.22 per million tokens
Context Window (Scout): 10M tokens (max spec)

Llama 4 Pricing by Provider

| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | SLA / Uptime | SOC 2 |
|---|---|---|---|---|---|
| Together AI | Llama 4 Scout | $0.08 | $0.30 | 99.9% | Yes |
| Together AI | Llama 4 Maverick | $0.22 | $0.88 | 99.9% | Yes |
| Fireworks AI | Llama 4 Scout | $0.10 | $0.30 | 99.5% | Yes |
| Groq | Llama 4 Scout | $0.05 | $0.10 | Best effort | Limited |
| OpenRouter | Llama 4 Scout | $0.08–0.12 | $0.30–0.40 | Varies | No |

Open-weights advantage: Llama 4 is released under the Llama 4 Community License, so you can run it on your own GPU cluster with no per-token fee; the cost shifts to hardware and operations. At 1B+ tokens/month, self-hosting can pay off versus API providers, depending on what the GPUs cost you. A single H100 80GB handles Scout (with Int4 quantization) at ~8K tokens/second; 4× H100s cover Maverick at 2-3K tokens/second.
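Whether self-hosting pays off depends on what the GPUs cost, which this page does not quote. The sketch below estimates the break-even volume under stated assumptions: the GPU hourly rate is a hypothetical placeholder, the API prices are the Together AI Scout rates from the table, and the 0.2 output-to-input ratio mirrors the 100M in / 20M out workload. All function names are illustrative.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours

def self_host_monthly_usd(gpu_count: int, gpu_hourly_usd: float) -> float:
    """Fixed monthly cost of running gpu_count GPUs around the clock."""
    return gpu_count * gpu_hourly_usd * HOURS_PER_MONTH

def break_even_input_tokens(fixed_monthly_usd: float,
                            input_price_per_m: float,
                            output_price_per_m: float,
                            output_per_input: float) -> float:
    """Monthly input tokens at which API spend equals the fixed self-hosting cost."""
    # API cost per million input tokens, including the output they generate.
    usd_per_m_input = input_price_per_m + output_per_input * output_price_per_m
    return fixed_monthly_usd / usd_per_m_input * 1_000_000

def monthly_capacity_tokens(tokens_per_second: float) -> float:
    """Upper bound on tokens served per month at 100% utilization."""
    return tokens_per_second * HOURS_PER_MONTH * 3600

# ASSUMPTION: hypothetical GPU rate, not a quoted price. Substitute your own.
ASSUMED_GPU_HOURLY_USD = 2.00

fixed = self_host_monthly_usd(gpu_count=1, gpu_hourly_usd=ASSUMED_GPU_HOURLY_USD)
tokens = break_even_input_tokens(fixed, input_price_per_m=0.08,
                                 output_price_per_m=0.30, output_per_input=0.2)
print(f"Fixed self-hosting cost: ${fixed:,.2f}/month")
print(f"Break-even volume: {tokens / 1e9:.1f}B input tokens/month")
print(f"Capacity at ~8K tokens/s: {monthly_capacity_tokens(8000) / 1e9:.1f}B tokens/month")
```

The break-even point moves linearly with the GPU rate, so rerun it with your actual hardware and staffing costs, and check that the resulting volume stays below the capacity figure.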

Llama 4 vs Claude — Full Comparison

| Feature | Llama 4 Scout | Llama 4 Maverick | Claude Haiku 4.5 | Claude Sonnet 4.6 |
|---|---|---|---|---|
| Input price/M (API) | $0.08 | $0.22 | $0.80 | $3.00 |
| Output price/M (API) | $0.30 | $0.88 | $4.00 | $15.00 |
| Prompt caching | Not available | Not available | 90% off (explicit) | 90% off (explicit) |
| Context window (max spec) | 10M tokens | 1M tokens | 200K tokens | 200K tokens |
| Vision / image input | Yes | Yes | Yes | Yes |
| Tool use quality | Adequate | Good | Excellent | Excellent |
| Self-hosting possible | Yes (open weights) | Yes (open weights) | No | No |
| Enterprise SLA | Via provider | Via provider | Anthropic direct | Anthropic direct |

Monthly Cost Examples

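| Workload (per month) | Llama 4 Scout | Llama 4 Maverick | Claude Haiku 4.5 | Haiku 4.5 + 80% cache hits |
|---|---|---|---|---|
| 10M in / 2M out | $1.40 | $3.96 | $16 | $10.24 |
| 100M in / 20M out | $14 | $39.60 | $160 | $102.40 |
| 1B in / 100M out | $110 | $308 | $1,200 | $624 |

The cached column assumes 80% of input tokens are cache hits billed at 10% of the list price, with output tokens billed in full.

Caching crossover: Claude Haiku with an 80% cache hit rate costs about $102/month at 100M in / 20M out, versus $14/month for uncached Llama 4 Scout. Haiku is about 7.3× more expensive here, down from 11.4× without caching. The more input-heavy and cache-heavy your pipeline is (system prompts, RAG context), the closer Haiku's effective cost gets to Scout's, but output tokens are never discounted. For pipelines without caching opportunities, Scout is the clear cost leader.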
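The effect of a cache hit rate on the monthly bill can be sketched as follows. This is a minimal model, not Anthropic's exact billing: it assumes cached input tokens are billed at 10% of the list input price (the 90% discount above), output tokens are billed in full, and any cache-write surcharge is ignored. The function name and parameters are illustrative.

```python
def cached_monthly_usd(input_tokens: float,
                       output_tokens: float,
                       input_price_per_m: float,
                       output_price_per_m: float,
                       cache_hit_rate: float,
                       cache_discount: float = 0.90) -> float:
    """Monthly cost when a share of input tokens is served from the prompt cache."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    # Cached tokens are billed at (1 - discount) of the list input price.
    billable_input = uncached + cached * (1 - cache_discount)
    input_cost = billable_input / 1_000_000 * input_price_per_m
    # ASSUMPTION: output is never discounted and cache writes cost nothing extra.
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Claude Haiku 4.5 list prices from the comparison table, 100M in / 20M out.
for hit_rate in (0.0, 0.5, 0.8, 1.0):
    cost = cached_monthly_usd(100e6, 20e6, 0.80, 4.00, hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: ${cost:,.2f}/month")
```

Sweeping the hit rate this way shows how much of the bill is input that caching can reduce and how much is output that it cannot.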

When to Use Llama 4 vs Claude

| Use case | Best choice | Why |
|---|---|---|
| Absolute lowest cost inference | Llama 4 Scout (Groq) | $0.05/M input via Groq LPU |
| Self-hosted / air-gapped | Llama 4 (any) | Open weights, no API dependency |
| Agentic tool-use pipelines | Claude Haiku / Sonnet | Superior function calling reliability |
| Cache-heavy agent loops | Claude Haiku 4.5 | 90% cache discount narrows the cost gap |
| Very long context (200K+) | Llama 4 Scout/Maverick | Up to 10M (Scout) or 1M (Maverick) context vs Claude's 200K |
| European data sovereignty | Llama 4 (self-hosted) | Run on EU infrastructure with full control |

Frequently Asked Questions

How much does Llama 4 cost per token via API?

Llama 4 Scout costs $0.08/M input and $0.30/M output via Together AI — or as low as $0.05/M input via Groq. Llama 4 Maverick costs $0.22/M input and $0.88/M output on Together AI. Prices vary by provider since Meta distributes Llama 4 as open-weights and doesn't operate an official API — third-party inference providers host it at different rates.

Can I use Llama 4 for free?

Llama 4 is open-weights (free to download and run locally), but third-party APIs charge per token. Groq offers a limited free tier. Together AI and Fireworks offer trial credits upon signup. For self-hosting, you can run Llama 4 Scout on a single H100 80GB (with Int4 quantization) or equivalent for near-zero variable cost, which suits high-volume workloads where capital expenditure on hardware makes sense.

Is Llama 4 Maverick better than Claude Sonnet 4.6?

Llama 4 Maverick is competitive with Claude Sonnet on general reasoning and coding tasks at a fraction of the price ($0.22/M vs $3.00/M input, $0.88/M vs $15.00/M output). However, Sonnet consistently outperforms Maverick on complex multi-step tool use, structured output reliability, instruction-following precision, and long-context fidelity. For agentic applications where reliability matters, the cost difference often justifies Claude Sonnet.

Does Llama 4 support prompt caching?

Most third-party providers don't offer Claude-style explicit prompt caching for Llama 4. This is a meaningful cost disadvantage for pipelines with repeated long system prompts or RAG context. Claude Haiku's 90% cache discount on repeated context can narrow the gap with Llama 4 Scout on cache-heavy, input-heavy workloads, despite Haiku's higher sticker price; output tokens are still billed at the full $4.00/M. If your workload has less than 50% cache hit rate, Llama 4 Scout will typically be much cheaper.

Which is the best provider for Llama 4 API?

For production: Together AI offers the best reliability (99.9% SLA, SOC 2, BAA for healthcare) at competitive $0.08/M Scout pricing. For speed/cost: Groq's LPU delivers 800+ tokens/second at $0.05/M input, ideal for latency-sensitive applications. For variety: OpenRouter aggregates multiple providers and routes to the cheapest available. For self-hosting: build on vLLM or Together's open-source inference stack. Note that Llama 4 ships under the Llama 4 Community License, not MIT, so review its terms before deploying.

How do I calculate Llama 4 API monthly costs?

Monthly cost = (input_tokens × $0.00000008) + (output_tokens × $0.0000003) for Scout via Together AI. At 100M input + 20M output tokens/month: $8 + $6 = $14/month. Compare this with your Claude workload using our free cost calculator — paste your Claude Code session logs to get exact token counts, then apply Llama 4's pricing to see which model is cheaper for your actual usage pattern.
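The formula above extends to any provider in the pricing table. The sketch below is one way to compare them; the prices are copied from that table, while the `PRICES` dictionary and function names are illustrative and not part of any provider's SDK.

```python
# (input $/M tokens, output $/M tokens), copied from the provider table above.
PRICES = {
    ("Together AI", "Llama 4 Scout"): (0.08, 0.30),
    ("Together AI", "Llama 4 Maverick"): (0.22, 0.88),
    ("Fireworks AI", "Llama 4 Scout"): (0.10, 0.30),
    ("Groq", "Llama 4 Scout"): (0.05, 0.10),
}

def monthly_cost(provider: str, model: str,
                 input_tokens: float, output_tokens: float) -> float:
    """Monthly USD cost for a token volume at a provider's list price."""
    input_price, output_price = PRICES[(provider, model)]
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)

def cheapest(model: str, input_tokens: float, output_tokens: float):
    """Return (provider, cost) for the lowest-cost listed provider of a model."""
    offers = [
        (provider, monthly_cost(provider, m, input_tokens, output_tokens))
        for (provider, m) in PRICES
        if m == model
    ]
    return min(offers, key=lambda offer: offer[1])

# 100M input + 20M output tokens per month, as in the worked example.
for (provider, model) in PRICES:
    cost = monthly_cost(provider, model, 100e6, 20e6)
    print(f"{provider:<13} {model:<17} ${cost:,.2f}/month")

print(cheapest("Llama 4 Scout", 100e6, 20e6))
```

List price is only one input to the choice: the SLA and SOC 2 columns in the provider table may matter more than a few dollars per month.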