Claude API Batch Pricing: 50% Off for Async Workloads
Anthropic's Batch Messages API cuts your token bill in half for any workload that doesn't need a real-time response. If you have thousands of documents to process, a nightly pipeline to run, or a large evaluation to score, batch is the right tool — and the savings are immediate.
Batch vs Standard Pricing: All Claude 4 Models
| Model | Standard Input | Batch Input | Standard Output | Batch Output | Savings |
|---|---|---|---|---|---|
| Claude Haiku 4.5 (cheapest) | $0.80/M | $0.40/M | $4.00/M | $2.00/M | 50% |
| Claude Sonnet 4.6 (default) | $3.00/M | $1.50/M | $15.00/M | $7.50/M | 50% |
| Claude Opus 4.7 (most capable) | $15.00/M | $7.50/M | $75.00/M | $37.50/M | 50% |
Real-World Batch Cost Examples
Concrete dollar amounts for typical async workloads. All figures use 2026 Anthropic published rates.
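As a rough illustration, the rates in the pricing table above can be turned into a small calculator. This is a minimal sketch: the workload (10,000 documents at 2,000 input and 300 output tokens each) is a hypothetical example, and the dictionary keys are shorthand labels, not API model IDs.

```python
# Standard per-million-token rates (USD), copied from the pricing table above.
# The keys are shorthand labels for this sketch, not API model IDs.
RATES = {
    "haiku-4.5": {"input": 0.80, "output": 4.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.7": {"input": 15.00, "output": 75.00},
}
BATCH_DISCOUNT = 0.5  # batch bills input and output tokens at half the standard rate


def job_cost(model, n_requests, input_tokens_each, output_tokens_each, batch=False):
    """Estimate the cost in USD of a job made of identical-sized requests."""
    rate = RATES[model]
    input_millions = n_requests * input_tokens_each / 1_000_000
    output_millions = n_requests * output_tokens_each / 1_000_000
    cost = input_millions * rate["input"] + output_millions * rate["output"]
    return cost * BATCH_DISCOUNT if batch else cost


# Hypothetical workload: 10,000 documents, 2,000 input and 300 output tokens each.
standard = job_cost("sonnet-4.6", 10_000, 2_000, 300)
batched = job_cost("sonnet-4.6", 10_000, 2_000, 300, batch=True)
print(f"standard ${standard:.2f}, batch ${batched:.2f}")
```

For this hypothetical job the estimate is $105.00 at standard rates and $52.50 through batch. Swap in your own request counts and token sizes to estimate a real workload.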
When to Use Batch vs Real-Time API
| Use Case | Real-Time API | Batch API |
|---|---|---|
| User-facing chatbots | Yes — required | No |
| Interactive coding tools | Yes — required | No |
| CI/CD quality checks | Yes (if blocking) | Yes (if async is OK) |
| Bulk document labeling | Wasteful | Yes — ideal |
| Nightly reports & summaries | Wasteful | Yes — ideal |
| Model evaluation pipelines | Wasteful | Yes — ideal |
| Bulk document summarization | Wasteful | Yes — ideal |
| Training data generation | Wasteful | Yes — ideal |
| Real-time sentiment scoring | Yes — required | No |
| Offline content moderation | Wasteful | Yes — ideal |
How to Submit a Batch Job (Python)
    import time

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Placeholder input: replace with your own documents.
    docs = ["First document text...", "Second document text..."]

    # Build up to 10,000 requests per batch
    requests = [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
            },
        }
        for i, doc in enumerate(docs)
    ]

    # Submit the batch
    batch = client.beta.messages.batches.create(requests=requests)
    print(f"Submitted batch {batch.id}, status: {batch.processing_status}")

    # Poll until processing has ended (batches finish within 24 hours)
    while batch.processing_status != "ended":
        time.sleep(60)
        batch = client.beta.messages.batches.retrieve(batch.id)

    # Stream results; only succeeded entries carry a message
    for result in client.beta.messages.batches.results(batch.id):
        if result.result.type == "succeeded":
            print(result.custom_id, result.result.message.content[0].text)
        else:
            print(result.custom_id, "not completed:", result.result.type)
Set custom_id to your own document or record identifier. Each result carries the custom_id of the request that produced it, so you can process results in any order without relying on position.
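The custom_id mapping can be sketched without calling the API at all. In the example below, the record keys, texts, and the simulated `results` list are all hypothetical stand-ins. Real results come from the SDK as objects rather than plain dicts, but the matching logic is the same.

```python
# Hypothetical records keyed by your own identifiers.
records = {
    "invoice-17": "text of invoice 17",
    "invoice-42": "text of invoice 42",
    "invoice-99": "text of invoice 99",
}

# Each request reuses the record key as its custom_id.
requests = [
    {
        "custom_id": key,
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
        },
    }
    for key, text in records.items()
]

# Simulated results, deliberately in a different order than the requests.
results = [
    {"custom_id": "invoice-99", "text": "summary of invoice 99"},
    {"custom_id": "invoice-17", "text": "summary of invoice 17"},
    {"custom_id": "invoice-42", "text": "summary of invoice 42"},
]

# Join on custom_id instead of list position.
summaries = {r["custom_id"]: r["text"] for r in results}

# Any record without a result can be retried in a later batch.
missing = [key for key in records if key not in summaries]
print(summaries["invoice-17"], "| missing:", missing)
```

Keying on your own identifiers also makes it easy to write each result straight back to the database row or file it came from.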
Frequently Asked Questions
What discount does the Claude Batch API offer?
The Claude Batch Messages API offers a 50% discount on all input and output tokens compared to the standard real-time API. There is no discount on prompt cache write or cache read tokens when using the batch endpoint.
What are Claude Batch API pricing rates in 2026?
Haiku 4.5 batch: $0.40/M input, $2.00/M output.
Sonnet 4.6 batch: $1.50/M input, $7.50/M output.
Opus 4.7 batch: $7.50/M input, $37.50/M output.
Standard (non-batch) rates are exactly 2× these amounts.
When should I use the Claude Batch API vs real-time API?
Use batch when results don't need to arrive in real time: nightly data pipelines, bulk document processing, offline evaluations, training data labeling, or any task where a 24-hour processing window is acceptable.
Batch is NOT appropriate for user-facing chatbots, interactive tools, or any latency-sensitive path. If a human is waiting for the response, use the standard real-time API.
Is there a limit on batch request size?
Each Batch API request can contain up to 10,000 individual message requests. Each individual request within the batch follows the standard token limits (200K context window). Batches typically complete within 24 hours; you poll or use webhooks to retrieve results.
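If a job has more requests than fit in one batch, one straightforward approach is to split the request list into chunks and submit each chunk as its own batch. The helper below is a minimal sketch: `chunk_requests` is a hypothetical name, and its default cap is the 10,000 figure stated above, passed as a parameter so it can be adjusted.

```python
def chunk_requests(requests, max_per_batch=10_000):
    """Split a list of requests into consecutive chunks of at most max_per_batch."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [
        requests[start:start + max_per_batch]
        for start in range(0, len(requests), max_per_batch)
    ]


# Hypothetical job with 25,000 requests (only custom_id shown for brevity).
requests = [{"custom_id": f"doc-{i}"} for i in range(25_000)]
batches = chunk_requests(requests)
print([len(b) for b in batches])  # [10000, 10000, 5000]
```

Each chunk would then be submitted separately, and the custom_id values keep results traceable across batches.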
Does the Batch API support prompt caching?
Yes, prompt caching is compatible with the Batch API. Cache write and cache read tokens are billed at the same rates as on the standard API (not discounted). The 50% discount applies only to uncached input tokens and to all output tokens.
If you're processing many documents that share a large common system prompt, combining prompt caching with the Batch API can yield savings well above 50% on the input side.
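To make the input-side arithmetic concrete, here is a rough sketch under the billing rule stated above (cached tokens at standard cache rates, uncached input at the 50% batch rate). Several values are assumptions, not figures from this page: the cache write and read multipliers (1.25× and 0.10× of the standard input rate), the hypothetical workload of 1,000 requests sharing an 8,000-token system prompt, and the simplification that every request after the first hits the cache.

```python
INPUT_RATE = 3.00         # USD per million input tokens (Sonnet 4.6 standard rate from the table)
BATCH_DISCOUNT = 0.5      # uncached batch input is billed at half the standard rate
CACHE_WRITE_MULT = 1.25   # assumption: cache writes cost 1.25x the standard input rate
CACHE_READ_MULT = 0.10    # assumption: cache reads cost 0.10x the standard input rate


def batch_input_cost(n_requests, shared_tokens, unique_tokens, use_cache):
    """Input-side cost in USD for a batch whose requests share a common prefix."""
    per_token = INPUT_RATE / 1_000_000
    # Tokens unique to each request are never cached, so they get the batch discount.
    unique = n_requests * unique_tokens * per_token * BATCH_DISCOUNT
    if not use_cache:
        shared = n_requests * shared_tokens * per_token * BATCH_DISCOUNT
    else:
        # Simplification: one cache write, then every later request is a cache read.
        write = shared_tokens * per_token * CACHE_WRITE_MULT
        reads = (n_requests - 1) * shared_tokens * per_token * CACHE_READ_MULT
        shared = write + reads
    return unique + shared


# Hypothetical workload: 1,000 requests, 8,000 shared prompt tokens, 500 unique tokens each.
standard_input = 1_000 * (8_000 + 500) * INPUT_RATE / 1_000_000
batch_only = batch_input_cost(1_000, 8_000, 500, use_cache=False)
batch_cached = batch_input_cost(1_000, 8_000, 500, use_cache=True)
print(f"standard ${standard_input:.2f}, batch ${batch_only:.2f}, batch + cache ${batch_cached:.2f}")
```

Under these assumptions the input cost falls from about $25.50 (standard) to $12.75 (batch only) to roughly $3.18 (batch with caching). Real cache hit rates inside a batch may be lower than the every-request-hits simplification used here, so treat the last figure as a best case.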