50% off standard rates | All Claude 4 models | Up to 10,000 requests per batch

Claude API Batch Pricing: 50% Off for Async Workloads

Anthropic's Batch Messages API cuts your token bill in half for any workload that doesn't need a real-time response. If you have thousands of documents to process, a nightly pipeline to run, or a large evaluation to score, batch is the right tool — and the savings are immediate.


Batch vs Standard Pricing — All Claude 4 Models

Model                            Standard input   Batch input   Standard output   Batch output   Savings
Claude Haiku 4.5 (cheapest)      $0.80/M          $0.40/M       $4.00/M           $2.00/M        50%
Claude Sonnet 4.6 (default)      $3.00/M          $1.50/M       $15.00/M          $7.50/M        50%
Claude Opus 4.7 (most capable)   $15.00/M         $7.50/M       $75.00/M          $37.50/M       50%
Prompt caching note: The 50% batch discount applies to uncached input tokens and all output tokens. Cache write and cache read tokens are billed at the same rate as the standard real-time API — the batch discount does not apply to those token types.
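The pricing rule above can be expressed as a small cost function. This is a sketch, not an official calculator: the per-million rates are copied from the table, the dictionary keys are illustrative labels rather than API model identifiers, and the cache multipliers (1.25× the base input rate for cache writes, 0.1× for cache reads, the defaults for Anthropic's 5-minute cache) are assumptions not stated on this page. Following the note, only uncached input and output receive the 50% batch discount.

```python
# Standard (real-time) rates in USD per million tokens, copied from the table above.
# The keys are illustrative labels, not API model identifiers.
RATES = {
    "haiku-4.5": {"input": 0.80, "output": 4.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.7": {"input": 15.00, "output": 75.00},
}

BATCH_DISCOUNT = 0.5           # batch pays half the standard rate
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: 5-minute cache write = 1.25x base input
CACHE_READ_MULTIPLIER = 0.10   # assumption: cache read = 0.1x base input
PER_MILLION = 1_000_000


def cost_usd(model, input_tokens, output_tokens,
             cache_write_tokens=0, cache_read_tokens=0, batch=True):
    """Estimate a job's cost in USD.

    input_tokens counts uncached input only; cache tokens are passed separately.
    """
    rate = RATES[model]
    factor = BATCH_DISCOUNT if batch else 1.0
    # Uncached input and output: discounted in batch mode
    discounted = (input_tokens * rate["input"]
                  + output_tokens * rate["output"]) / PER_MILLION * factor
    # Cache writes/reads: standard cache rates in both modes (per the note above)
    cached = (cache_write_tokens * rate["input"] * CACHE_WRITE_MULTIPLIER
              + cache_read_tokens * rate["input"] * CACHE_READ_MULTIPLIER) / PER_MILLION
    return discounted + cached


print(f"${cost_usd('sonnet-4.6', 1_000_000, 1_000_000):.2f} batch")
print(f"${cost_usd('sonnet-4.6', 1_000_000, 1_000_000, batch=False):.2f} standard")
```

For one million uncached input tokens and one million output tokens on Sonnet 4.6, this gives $9.00 in batch mode versus $18.00 on the real-time API.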

Real-World Batch Cost Examples

Concrete dollar amounts for typical async workloads. All figures use 2026 Anthropic published rates.

Label 100,000 short documents
200 tokens input + 50 tokens output each — Claude Haiku 4.5 — 20M input / 5M output total
Standard API cost $36.00  (20M × $0.80 + 5M × $4.00)
Batch API cost $18.00  (20M × $0.40 + 5M × $2.00)
You save $18.00  (50%)
Generate 50,000 product descriptions
500 tokens input + 300 tokens output each — Claude Sonnet 4.6 — 25M input / 15M output total
Standard API cost $300.00  (25M × $3.00 + 15M × $15.00)
Batch API cost $150.00  (25M × $1.50 + 15M × $7.50)
You save $150.00  (50%)
Summarize 10,000 legal contracts
8,000 tokens input + 500 tokens output each — Claude Sonnet 4.6 — 80M input / 5M output total
Standard API cost $315.00  (80M × $3.00 + 5M × $15.00)
Batch API cost $157.50  (80M × $1.50 + 5M × $7.50)
You save $157.50  (50%)
Evaluate 5,000 long-form essays
3,000 tokens input + 200 tokens output each — Claude Opus 4.7 — 15M input / 1M output total
Standard API cost $300.00  (15M × $15.00 + 1M × $75.00)
Batch API cost $150.00  (15M × $7.50 + 1M × $37.50)
You save $150.00  (50%)
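The four examples above follow one formula: total tokens (requests × tokens per request) times the per-million rate, with batch at half of standard. A minimal script that reproduces them, using only the rates and token counts given on this page:

```python
# Standard rates in USD per million tokens (input, output), from the pricing table
RATES = {
    "Haiku 4.5": (0.80, 4.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (15.00, 75.00),
}

# (label, model, requests, input tokens per request, output tokens per request)
WORKLOADS = [
    ("Label 100,000 short documents", "Haiku 4.5", 100_000, 200, 50),
    ("Generate 50,000 product descriptions", "Sonnet 4.6", 50_000, 500, 300),
    ("Summarize 10,000 legal contracts", "Sonnet 4.6", 10_000, 8_000, 500),
    ("Evaluate 5,000 long-form essays", "Opus 4.7", 5_000, 3_000, 200),
]


def workload_costs(model, n, in_tok, out_tok):
    """Return (standard_cost, batch_cost) in USD for n identical requests."""
    in_rate, out_rate = RATES[model]
    total_in_m = n * in_tok / 1_000_000    # total input, millions of tokens
    total_out_m = n * out_tok / 1_000_000  # total output, millions of tokens
    standard = total_in_m * in_rate + total_out_m * out_rate
    return standard, standard * 0.5        # batch is half of standard


results = {}
for label, model, n, in_tok, out_tok in WORKLOADS:
    standard, batch = workload_costs(model, n, in_tok, out_tok)
    results[label] = (round(standard, 2), round(batch, 2))
    print(f"{label}: standard ${standard:,.2f}, batch ${batch:,.2f}, "
          f"save ${standard - batch:,.2f}")
```

Swapping in your own request counts and per-request token estimates gives a quick first-order budget before submitting a batch.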

When to Use Batch vs Real-Time API

Use case                      Real-time API       Batch API
User-facing chatbots          Yes, required       No
Interactive coding tools      Yes, required       No
CI/CD quality checks          Yes (if blocking)   Yes (if async is OK)
Bulk document labeling        Wasteful            Yes, ideal
Nightly reports & summaries   Wasteful            Yes, ideal
Model evaluation pipelines    Wasteful            Yes, ideal
Bulk document summarization   Wasteful            Yes, ideal
Training data generation      Wasteful            Yes, ideal
Real-time sentiment scoring   Yes, required       No
Offline content moderation    Wasteful            Yes, ideal
Key rule of thumb: If a human is waiting in front of a screen for your response, use the real-time API. If the result will be stored, processed, or read later, the Batch API will cut your cost in half with zero change to output quality.

How to Submit a Batch Job (Python)

Python — anthropic SDK — submit + poll
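import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder input: replace with your own list of document strings
docs = ["First document text", "Second document text"]

# Build one request per document (up to 10,000 requests per batch)
requests = [
    {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
        },
    }
    for i, doc in enumerate(docs)
]

# Submit the batch
batch = client.messages.batches.create(requests=requests)
print(f"Submitted batch {batch.id}, status: {batch.processing_status}")

# Poll until processing has ended ("in_progress" and "canceling" mean not done yet);
# requests not processed within 24 hours expire
while batch.processing_status != "ended":
    time.sleep(60)
    batch = client.messages.batches.retrieve(batch.id)

# Stream results; only succeeded requests carry a message
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(result.custom_id, result.result.message.content[0].text)
    else:
        # "errored", "canceled", or "expired"
        print(result.custom_id, "failed:", result.result.type)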
Tip: Set custom_id to your own document or record identifier; it must be unique within the batch. Results are not guaranteed to come back in submission order, so match each result to its original request by custom_id rather than by position.
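Joining results back to your records by custom_id can be sketched offline. The `fake_result` helper below is hypothetical: it builds stand-in objects with the same attribute names the script above reads (`custom_id`, `result.type`, `result.message.content[0].text`) so the join logic can be tested without calling the API.

```python
from types import SimpleNamespace


def index_results(results):
    """Map custom_id -> output text, or None if the request did not succeed."""
    by_id = {}
    for r in results:
        if r.result.type == "succeeded":
            by_id[r.custom_id] = r.result.message.content[0].text
        else:
            by_id[r.custom_id] = None  # errored, canceled, or expired
    return by_id


def fake_result(custom_id, text=None):
    """Hypothetical stand-in for one SDK result object, for offline testing."""
    if text is None:
        return SimpleNamespace(custom_id=custom_id,
                               result=SimpleNamespace(type="errored"))
    message = SimpleNamespace(content=[SimpleNamespace(text=text)])
    return SimpleNamespace(custom_id=custom_id,
                           result=SimpleNamespace(type="succeeded", message=message))


# Results arrive out of order; the custom_id join does not depend on position
records = {"doc-0": "contract A", "doc-1": "contract B", "doc-2": "contract C"}
summaries = index_results([
    fake_result("doc-2", "Summary of C"),
    fake_result("doc-0", "Summary of A"),
    fake_result("doc-1"),
])
for custom_id, source in records.items():
    print(custom_id, source, "->", summaries.get(custom_id))
```

In real use, pass the iterator from `client.messages.batches.results(batch.id)` to `index_results` instead of the fake list; failed IDs (value `None`) can then be collected and resubmitted.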

Frequently Asked Questions

What discount does the Claude Batch API offer?

The Claude Batch Messages API offers a 50% discount on uncached input tokens and all output tokens compared to the standard real-time API. Prompt cache write and cache read tokens are not discounted when using the batch endpoint.

What are Claude Batch API pricing rates in 2026?

Haiku 4.5 batch: $0.40/M input, $2.00/M output.

Sonnet 4.6 batch: $1.50/M input, $7.50/M output.

Opus 4.7 batch: $7.50/M input, $37.50/M output.

Standard (non-batch) rates are exactly 2× these amounts.

When should I use the Claude Batch API vs real-time API?

Use batch when results don't need to arrive in real time: nightly data pipelines, bulk document processing, offline evaluations, training data labeling, or any task where a 24-hour processing window is acceptable.

Batch is NOT appropriate for user-facing chatbots, interactive tools, or any latency-sensitive path. If a human is waiting for the response, use the standard real-time API.

Is there a limit on batch request size?

Each batch can contain up to 10,000 individual message requests. Each request within the batch follows the standard per-request limits (200K-token context window). Batches complete within 24 hours, and any request not processed by then expires; you poll the batch status until processing has ended, then retrieve the results.
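For workloads larger than one batch, the request list has to be split before submission. A minimal sketch, assuming the 10,000-request limit stated above; the request dicts are placeholders, and the sketch checks only the request count, not the total payload size of a batch, which the API also caps.

```python
def chunk_requests(requests, max_per_batch=10_000):
    """Split a list of batch requests into lists of at most max_per_batch items."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be positive")
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]


# 25,000 placeholder requests (real ones also need a "params" entry)
reqs = [{"custom_id": f"doc-{i}"} for i in range(25_000)]
batches = chunk_requests(reqs)
print([len(b) for b in batches])  # [10000, 10000, 5000]
```

Each chunk would then be submitted as its own batch; because custom_id values are assigned before splitting, results from all batches can still be joined back to the same records.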

Does the Batch API support prompt caching?

Yes — prompt caching is compatible with the Batch API. Cache write tokens cost the same rate as the standard API (not discounted). Cache read tokens also cost the same rate. The 50% discount applies only to uncached input tokens and all output tokens.
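To reuse a shared system prompt across batch requests, the prompt is sent as a system content block marked with `cache_control`. The sketch below only builds the request dicts (the list can be passed to `client.messages.batches.create(requests=...)`). The prompt text is a placeholder; real prompts below a model-specific minimum length are not cached, and cache hits inside a batch are not guaranteed.

```python
import json

# Placeholder: in practice this is a long instruction block shared by every request
SHARED_SYSTEM_PROMPT = "You are a contract analyst. Follow the house summary format."


def build_request(i, document):
    """One batch request whose shared system prompt is marked for caching."""
    return {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 512,
            # Identical prefix in every request; cache_control marks the cacheable block
            "system": [{
                "type": "text",
                "text": SHARED_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }],
            "messages": [{"role": "user", "content": f"Summarize: {document}"}],
        },
    }


requests = [build_request(i, d) for i, d in enumerate(["Doc A text", "Doc B text"])]
print(json.dumps(requests[0], indent=2))
```

Keeping the cached block byte-for-byte identical across requests matters: any change to the prefix produces a different cache entry.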

If you're processing many documents that share a large common system prompt, combining prompt caching with the Batch API can yield savings well above 50% on the input side, because cache reads are billed at 10% of the standard input rate.
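A rough worked example of that claim, for a hypothetical workload of 10,000 Sonnet 4.6 requests, each with a 5,000-token shared system prompt and 1,000 unique tokens. It follows the billing rules stated on this page and assumes a 1.25× cache write rate and a 0.1× cache read rate; it is also optimistic, assuming the first request writes the cache and every later request reads it, which a real batch may not achieve.

```python
# Hypothetical workload: 10,000 Sonnet 4.6 requests, each with a 5,000-token
# shared system prompt plus 1,000 tokens of unique document text (input side only).
N, SHARED, UNIQUE = 10_000, 5_000, 1_000
BASE_INPUT = 3.00  # Sonnet 4.6 standard input, USD per million tokens
M = 1_000_000

# Standard API, no caching: every input token at the base rate
standard = N * (SHARED + UNIQUE) * BASE_INPUT / M

# Batch API, no caching: every input token at half the base rate
batch_only = standard * 0.5

# Batch + caching under this page's rules, assuming request 1 writes the cache
# and all other requests read it
cache_write = SHARED * BASE_INPUT * 1.25 / M           # assumed 1.25x write rate
cache_read = (N - 1) * SHARED * BASE_INPUT * 0.10 / M  # assumed 0.1x read rate
uncached = N * UNIQUE * BASE_INPUT * 0.5 / M           # batch-discounted unique text
batch_cached = cache_write + cache_read + uncached

for name, cost in [("standard", standard), ("batch", batch_only),
                   ("batch + cache", batch_cached)]:
    print(f"{name:>13}: ${cost:8.2f}  ({1 - cost / standard:.1%} saved)")
```

Under these assumptions the input bill falls from $180.00 (standard) to $90.00 (batch only) and about $30.02 (batch plus caching), roughly 83% below standard. Every cache miss moves tokens from the 0.1× read rate to the 0.5× batch rate, so real savings will land somewhere between the last two figures.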