50% Off Standard Rates | All Claude 4 Models | Up to 10,000 Requests / Batch

Claude API Batch Pricing: 50% Off for Async Workloads

Anthropic's Batch Messages API cuts your token bill in half for any workload that doesn't need a real-time response. If you have thousands of documents to process, a nightly pipeline to run, or a large evaluation to score, batch is the right tool — and the savings are immediate.

Batch vs Standard Pricing — All Claude 4 Models

Model	Standard Input	Batch Input	Standard Output	Batch Output	Savings
Claude Haiku 4.5 (cheapest)	$0.80/M	$0.40/M	$4.00/M	$2.00/M	50%
Claude Sonnet 4.6 (default)	$3.00/M	$1.50/M	$15.00/M	$7.50/M	50%
Claude Opus 4.7 (most capable)	$15.00/M	$7.50/M	$75.00/M	$37.50/M	50%

Prompt caching note: The 50% batch discount applies to uncached input tokens and all output tokens. Cache write and cache read tokens are billed at the same rate as the standard real-time API — the batch discount does not apply to those token types.

Real-World Batch Cost Examples

Concrete dollar amounts for typical async workloads. All figures use 2026 Anthropic published rates.

Label 100,000 short documents

200 tokens input + 50 tokens output each — Claude Haiku 4.5 — 20M input / 5M output total

Standard API cost $36.00 (20M × $0.80 + 5M × $4.00)

Batch API cost $18.00 (20M × $0.40 + 5M × $2.00)

You save $18.00 (50%)

Generate 50,000 product descriptions

500 tokens input + 300 tokens output each — Claude Sonnet 4.6 — 25M input / 15M output total

Standard API cost $300.00 (25M × $3.00 + 15M × $15.00)

Batch API cost $150.00 (25M × $1.50 + 15M × $7.50)

You save $150.00 (50%)

Summarize 10,000 legal contracts

8,000 tokens input + 500 tokens output each — Claude Sonnet 4.6 — 80M input / 5M output total

Standard API cost $315.00 (80M × $3.00 + 5M × $15.00)

Batch API cost $157.50 (80M × $1.50 + 5M × $7.50)

You save $157.50 (50%)

Evaluate 5,000 long-form essays

3,000 tokens input + 200 tokens output each — Claude Opus 4.7 — 15M input / 1M output total

Standard API cost $300.00 (15M × $15.00 + 1M × $75.00)

Batch API cost $150.00 (15M × $7.50 + 1M × $37.50)

You save $150.00 (50%)
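
The arithmetic in the examples above can be reproduced with a small helper. This is a minimal sketch, not an official calculator: the rates are copied from the pricing table in this article, and the model keys (`haiku-4.5`, `sonnet-4.6`, `opus-4.7`) and the function name `job_cost` are illustrative labels rather than API identifiers. It ignores prompt caching entirely.

```python
from decimal import Decimal

# Standard per-million-token rates in USD, copied from the table in this article.
# The dictionary keys are illustrative labels, not API model IDs.
RATES = {
    "haiku-4.5": {"input": Decimal("0.80"), "output": Decimal("4.00")},
    "sonnet-4.6": {"input": Decimal("3.00"), "output": Decimal("15.00")},
    "opus-4.7": {"input": Decimal("15.00"), "output": Decimal("75.00")},
}
BATCH_DISCOUNT = Decimal("0.5")  # batch is billed at 50% of the standard rate
MILLION = Decimal(1_000_000)


def job_cost(model, n_requests, input_tokens, output_tokens, batch=False):
    """Return the USD cost of a job with uniform per-request token counts."""
    rate = RATES[model]
    total_in = Decimal(n_requests * input_tokens)
    total_out = Decimal(n_requests * output_tokens)
    cost = (total_in * rate["input"] + total_out * rate["output"]) / MILLION
    if batch:
        cost *= BATCH_DISCOUNT
    return cost.quantize(Decimal("0.01"))


# The legal-contract example: 10,000 requests, 8,000 input + 500 output tokens each.
standard = job_cost("sonnet-4.6", 10_000, 8_000, 500)
batched = job_cost("sonnet-4.6", 10_000, 8_000, 500, batch=True)
print(standard, batched, standard - batched)  # 315.00 157.50 157.50
```

Using `Decimal` rather than floats keeps the cents exact, which matters if you feed the output into a budget report.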

When to Use Batch vs Real-Time API

Use Case	Real-Time API	Batch API
User-facing chatbots	Yes — required	No
Interactive coding tools	Yes — required	No
CI/CD quality checks	Yes (if blocking)	Yes (if async is OK)
Bulk document labeling	Wasteful	Yes — ideal
Nightly reports & summaries	Wasteful	Yes — ideal
Model evaluation pipelines	Wasteful	Yes — ideal
Bulk document summarization	Wasteful	Yes — ideal
Training data generation	Wasteful	Yes — ideal
Real-time sentiment scoring	Yes — required	No
Offline content moderation	Wasteful	Yes — ideal

Key rule of thumb: If a human is waiting in front of a screen for your response, use the real-time API. If the result will be stored, processed, or read later, the Batch API will cut your cost in half with zero change to output quality.

How to Submit a Batch Job (Python)

Python — anthropic SDK — submit + poll

import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder input: replace with your own documents
docs = ["Text of the first document", "Text of the second document"]

# Build up to 10,000 requests per batch
requests = [
    {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
        },
    }
    for i, doc in enumerate(docs)
]

# Submit the batch
batch = client.beta.messages.batches.create(requests=requests)
print(f"Submitted batch {batch.id}, status: {batch.processing_status}")

# Poll until processing has ended (batches typically finish within 24 hours)
while batch.processing_status != "ended":
    time.sleep(60)
    batch = client.beta.messages.batches.retrieve(batch.id)

# Stream results; only succeeded requests carry a message
for result in client.beta.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(result.custom_id, result.result.message.content[0].text)
    else:
        print(result.custom_id, "did not succeed:", result.result.type)

Tip: Set custom_id to your own document or record identifier. Each result carries the custom_id of its original request, so you can match results to inputs by ID rather than by position, whatever order they arrive in.
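
One way to apply the tip is to join results back onto your own records by ID. The sketch below uses stand-in data instead of a live batch: the `records` dictionary, the `(custom_id, result_type, text)` tuples, and the helper `attach_summaries` are all hypothetical, chosen only to show the join. With the SDK you would pull those three values from each streamed result.

```python
# Stand-in source records, keyed by the same IDs that were sent as custom_id.
records = {
    "doc-0": {"title": "Lease"},
    "doc-1": {"title": "NDA"},
    "doc-2": {"title": "MSA"},
}

# Stand-in results, deliberately out of submission order. Each tuple mimics
# (custom_id, result type, summary text) taken from one batch result.
results = [
    ("doc-2", "succeeded", "Summary of MSA"),
    ("doc-0", "succeeded", "Summary of Lease"),
    ("doc-1", "errored", None),
]


def attach_summaries(records, results):
    """Join results to records by custom_id and return the IDs that need a retry."""
    failed = []
    for custom_id, result_type, text in results:
        if result_type == "succeeded":
            records[custom_id]["summary"] = text
        else:
            failed.append(custom_id)
    return failed


failed = attach_summaries(records, results)
print(records["doc-0"]["summary"])  # Summary of Lease
print(failed)  # ['doc-1']
```

Collecting the failed IDs separately makes it straightforward to resubmit only those requests in a follow-up batch.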

Frequently Asked Questions

What discount does the Claude Batch API offer?

The Claude Batch Messages API offers a 50% discount on uncached input tokens and all output tokens compared to the standard real-time API. Prompt cache write and cache read tokens are not discounted on the batch endpoint; they are billed at the same rate as on the standard API.

What are Claude Batch API pricing rates in 2026?

Haiku 4.5 batch: $0.40/M input, $2.00/M output.

Sonnet 4.6 batch: $1.50/M input, $7.50/M output.

Opus 4.7 batch: $7.50/M input, $37.50/M output.

Standard (non-batch) rates are exactly 2× these amounts.

When should I use the Claude Batch API vs real-time API?

Use batch when results don't need to arrive in real time: nightly data pipelines, bulk document processing, offline evaluations, training data labeling, or any task where a 24-hour processing window is acceptable.

Batch is NOT appropriate for user-facing chatbots, interactive tools, or any latency-sensitive path. If a human is waiting for the response, use the standard real-time API.

Is there a limit on batch request size?

Each Batch API request can contain up to 10,000 individual message requests. Each individual request within the batch follows the standard token limits (200K context window). Batches typically complete within 24 hours; you poll or use webhooks to retrieve results.
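
If a job has more requests than fit in one batch, it has to be split before submission. A minimal sketch, assuming the 10,000-request limit stated above and a hypothetical helper name `chunk_requests`; it only slices the list and does not call the API.

```python
def chunk_requests(requests, max_per_batch=10_000):
    """Split a list of request dicts into consecutive chunks of at most max_per_batch."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [
        requests[start:start + max_per_batch]
        for start in range(0, len(requests), max_per_batch)
    ]


# 25,000 placeholder requests split into three batches.
all_requests = [{"custom_id": f"doc-{i}"} for i in range(25_000)]
chunks = chunk_requests(all_requests)
print([len(chunk) for chunk in chunks])  # [10000, 10000, 5000]
```

Each chunk can then be submitted as its own batch, and because every request keeps its custom_id, results from different batches can be merged afterwards.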

Does the Batch API support prompt caching?

Yes — prompt caching is compatible with the Batch API. Cache write tokens cost the same rate as the standard API (not discounted). Cache read tokens also cost the same rate. The 50% discount applies only to uncached input tokens and all output tokens.

If you're processing many documents that share a large common system prompt, combining prompt caching with the Batch API can yield savings well above 50% on the input side.
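
To put a rough number on that claim, the sketch below estimates input-side cost for a job with a shared system prompt. Several things here are assumptions, not figures from this article: the cache write and cache read multipliers (1.25× and 0.10× of the standard input rate), the idea that exactly one request writes the cache and every other request reads it, and the function name `input_costs`. Real batches process requests concurrently, so actual cache hit rates may be lower. Following this article, cache tokens are billed at standard cache rates with no batch discount.

```python
BASE_INPUT = 3.00   # Sonnet 4.6 standard input, $/M tokens (from the table in this article)
BATCH_INPUT = 1.50  # Sonnet 4.6 batch input, $/M tokens (from the table in this article)

# ASSUMPTIONS: cache multipliers relative to the standard input rate.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10

MILLION = 1_000_000


def input_costs(n_requests, shared_tokens, unique_tokens):
    """Return (standard, batch_only, batch_plus_cache) input cost in USD."""
    per_request = shared_tokens + unique_tokens
    standard = n_requests * per_request * BASE_INPUT / MILLION
    batch_only = n_requests * per_request * BATCH_INPUT / MILLION

    # ASSUMPTION: the first request writes the shared prefix to the cache
    # and every later request reads it (a best case).
    cache_write = shared_tokens * BASE_INPUT * CACHE_WRITE_MULT / MILLION
    cache_read = (n_requests - 1) * shared_tokens * BASE_INPUT * CACHE_READ_MULT / MILLION
    unique = n_requests * unique_tokens * BATCH_INPUT / MILLION
    return standard, batch_only, cache_write + cache_read + unique


# 10,000 requests, each with a 5,000-token shared prompt and 500 unique tokens.
standard, batch_only, batch_cached = input_costs(10_000, 5_000, 500)
print(f"standard ${standard:.2f}, batch ${batch_only:.2f}, batch + cache ${batch_cached:.2f}")
print(f"input-side saving vs standard: {1 - batch_cached / standard:.0%}")
```

Under these best-case assumptions the input bill falls from $165.00 to about $22.52, compared with $82.50 for batch alone. Treat the result as an upper bound on the saving and check the current cache rates before budgeting.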