OpenAI Batch API: Can It Really Save You Money?

TL;DR — Yes. OpenAI Batch API cuts per-token price by 50% (GPT-4o: $2.50 → $1.25/M input). Tradeoff: results in up to 24 hours, not real-time. Best for: nightly data processing, bulk classification, async document analysis. Not for: user-facing chat, real-time search.

Key Takeaways

Batch API gives a flat 50% discount: GPT-4o drops from $2.50 to $1.25/M input tokens (OpenAI, 2026)

Tradeoff: results within 24 hours, not real-time — you submit a JSONL file and download results later

Best for: nightly processing, bulk classification, document analysis, content generation pipelines

Not for: user-facing chat, real-time search, or anything requiring sub-second response times

Yes. According to OpenAI's official API documentation (2026), the Batch API gives you a flat 50% discount on per-token pricing. GPT-4o input tokens drop from $2.50 to $1.25 per million. GPT-4o-mini drops from $0.15 to $0.075. Same models, same quality, half the price.

The tradeoff: you give up real-time responses. Batch requests are queued and processed within a 24-hour window. You submit a file of requests, OpenAI processes them when capacity is available, and you download the results later. If your workload doesn't need instant responses, this is the easiest cost optimization available.

How does the Batch API work?

Instead of sending individual API calls, you upload a JSONL file containing multiple requests. Each line is a complete chat completion request:

{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this article..."}]}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Extract entities from..."}]}}

Then you create a batch:

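import openai

# Upload the JSONL file of requests
with open("requests.jsonl", "rb") as f:
    batch_file = openai.files.create(file=f, purpose="batch")

# Create the batch
batch = openai.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# Check status later (typically in a separate run, hours after submission)
status = openai.batches.retrieve(batch.id)
print(status.status)  # "validating" or "in_progress" at first, "completed" when done

# Download results; output_file_id is only set once the batch has completed
if status.status == "completed":
    results = openai.files.content(status.output_file_id)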

OpenAI processes the batch and returns a JSONL file of responses. The output order is not guaranteed to match the input order, so use each line's custom_id to map responses back to your requests.
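
Mapping responses back by custom_id can be sketched as below. It assumes each output line carries a `custom_id` plus a `response` object holding `status_code` and the usual chat completion `body`; check that shape against a real output file before relying on it. The function name is illustrative.

```python
import json

def index_by_custom_id(jsonl_text):
    """Map each custom_id to the assistant text of its successful response."""
    answers = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed record shape: {"custom_id": ..., "response": {"status_code": ..., "body": {...}}}
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        if response.get("status_code") == 200 and choices:
            answers[record["custom_id"]] = choices[0]["message"]["content"]
    return answers
```

With the snippet above, this would be called on the downloaded file's text (for example `index_by_custom_id(results.text)`, assuming the SDK response object exposes the body as `.text`).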

What are the exact savings?

Model	Standard input	Batch input	Standard output	Batch output	Savings
GPT-4o	$2.50/1M	$1.25/1M	$10.00/1M	$5.00/1M	50%
GPT-4o-mini	$0.15/1M	$0.075/1M	$0.60/1M	$0.30/1M	50%
o1	$15.00/1M	$7.50/1M	$60.00/1M	$30.00/1M	50%

The 50% discount applies to both input and output tokens for every model listed here. For the full pricing table on standard rates, see our LLM API pricing guide.

Real savings example: A team processing 10,000 product descriptions daily through GPT-4o for categorization:

Average input: 500 tokens per request
Average output: 100 tokens per request
Daily tokens: 5M input + 1M output

Pricing	Daily cost	Monthly cost
Standard	$12.50 + $10.00 = $22.50	$675
Batch	$6.25 + $5.00 = $11.25	$337.50
Savings	$11.25/day	$337.50/month

That's $337.50/month saved by changing how you submit requests — no prompt changes, no model downgrade, no quality loss.
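
The arithmetic behind that table is easy to reproduce for your own volumes. This is a minimal sketch using the prices from the table above; the function and constant names are illustrative, and a 30-day month is assumed.

```python
# Standard (input, output) prices in USD per 1M tokens, copied from the table above
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
}
BATCH_DISCOUNT = 0.5  # batch pricing is half of standard

def daily_cost(model, requests_per_day, input_tokens, output_tokens, batch=False):
    """Daily cost in USD for a workload with fixed average token counts per request."""
    in_price, out_price = PRICES_PER_MILLION[model]
    total_in = requests_per_day * input_tokens
    total_out = requests_per_day * output_tokens
    cost = (total_in * in_price + total_out * out_price) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost

standard = daily_cost("gpt-4o", 10_000, 500, 100)
batched = daily_cost("gpt-4o", 10_000, 500, 100, batch=True)
print(f"standard ${standard:.2f}/day, batch ${batched:.2f}/day, "
      f"saved ${(standard - batched) * 30:.2f}/month")
```

Swapping in your own request counts and token averages gives a first estimate; real workloads vary per request, so treat the result as approximate.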

When does the Batch API make sense?

A Gartner (2024) forecast projected that enterprises will spend $644 billion on AI by 2027, with batch inference workloads representing a growing share of that expenditure.

The Batch API works for any workload where you don't need an instant response. Common use cases:

Content generation at scale. Generating product descriptions, marketing copy, or blog drafts. You prepare a batch of prompts, submit overnight, and review results in the morning.

Data processing and extraction. Parsing invoices, extracting entities from documents, classifying support tickets. These are typically queued jobs anyway — the batch API just makes them cheaper.

Evaluation and testing. Running your prompt suite against a new model version. Instead of firing 1,000 API calls in real time, batch them and get results in a few hours at half the cost.

Embeddings generation. Processing a large corpus for RAG indexing. Embedding 100,000 documents doesn't need to happen in real time.

Dataset labeling. Using GPT-4o to label training data for a fine-tuned model. Label quality doesn't change whether you process the request at 2pm or 2am.

When doesn't the Batch API work?

User-facing features. Chatbots, search, real-time recommendations — anything where a user is waiting for a response. A 24-hour completion window is obviously not acceptable.

Interactive workflows. Multi-turn conversations, agent loops, or any flow where the next step depends on the previous response. Batch can't handle sequential dependencies.

Time-sensitive processing. Fraud detection, real-time content moderation, or alerts that need to fire immediately.

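Small volumes. If you're making 100 calls/day, the Batch API saves you a few cents: about $0.11 a day at GPT-4o rates with the 500-input, 100-output token requests from the example above. The engineering effort to restructure your code for batch processing isn't worth it below roughly 1,000 requests/day.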

How do you combine batch and real-time calls?

McKinsey (2024) found that organizations splitting AI workloads between real-time and batch processing reduced their overall inference costs by 25-35%. The highest-impact approach is splitting your workload:

Real-time (standard pricing):
├── User-facing chatbot responses
├── Search and recommendations
└── Interactive agent steps

Batch (50% discount):
├── Nightly content generation
├── Daily report summarization
├── Weekly data classification
└── Embedding index updates

Most teams have both workload types. A SaaS app might use real-time API calls for the chatbot feature (user is waiting) and batch processing for the nightly analytics digest (nobody is waiting). Running the batch portion through the Batch API cuts that segment's cost in half.
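
One way to make the split explicit is a small routing rule per workload. The sketch below is an assumption about how you might encode the criteria from this article (a waiting user, sequential dependencies, tolerance for delay); the `Workload` fields and `choose_channel` are hypothetical names, not part of any API.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    user_is_waiting: bool       # someone is blocked on the response
    depends_on_previous: bool   # the next step needs the previous response
    max_delay_hours: float      # how long a result may take and still be useful

def choose_channel(workload, batch_window_hours=24):
    """Return "batch" only when the workload can tolerate the full batch window."""
    if workload.user_is_waiting or workload.depends_on_previous:
        return "realtime"
    if workload.max_delay_hours < batch_window_hours:
        return "realtime"
    return "batch"

workloads = [
    Workload("chatbot", True, True, 0.001),
    Workload("nightly digest", False, False, 24),
    Workload("fraud alerts", False, False, 0.01),
]
for w in workloads:
    print(w.name, "->", choose_channel(w))
```

Routing on the full 24-hour window is deliberately conservative, since typical completion times are not something you can count on.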

To understand which parts of your app could move to batch, start by tracking costs per feature. Once you see the cost breakdown, you can identify which features are batch-eligible and calculate the potential savings.

What Batch API limitations should you know?

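24-hour window, not instant. OpenAI's Batch API documentation (2026) specifies a 24-hour completion window, not a guaranteed finish time. Most batches finish in 1-4 hours, but a batch that does not finish inside the window expires instead of continuing. You can't rely on a specific completion time, so don't build workflows that assume results arrive within an hour.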
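
Because completion time is unpredictable, a polling loop should wait for any terminal status, not just success. This is a minimal sketch: `retrieve` is any callable that returns an object with a `.status` attribute (with the OpenAI Python SDK that would be `openai.batches.retrieve`), and the poll interval and timeout are arbitrary assumptions.

```python
import time

# Statuses after which a batch will not change any further
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(retrieve, batch_id, poll_seconds=60,
                   max_wait_seconds=24 * 3600, sleep=time.sleep):
    """Poll until the batch reaches a terminal status, then return it."""
    waited = 0
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if waited >= max_wait_seconds:
            raise TimeoutError(f"batch {batch_id} still {batch.status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The caller still has to check the returned status: only "completed" means there is a full output file to download.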

File size limits. Maximum 50,000 requests per batch, and the input file can be at most 200 MB. For larger workloads, split into multiple batch submissions.
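
Splitting a large workload is a simple chunking step over the request lines. A minimal sketch, with `split_requests` as an illustrative helper name; it only enforces the request-count limit stated above, so a per-file size check would need to be added separately.

```python
def split_requests(lines, max_requests=50_000):
    """Split a list of JSONL request lines into chunks that fit one batch each."""
    if max_requests < 1:
        raise ValueError("max_requests must be at least 1")
    return [lines[i:i + max_requests] for i in range(0, len(lines), max_requests)]

# Example: 120,001 requests become three batch files
chunks = split_requests([f"request {n}" for n in range(120_001)])
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own JSONL file and submitted as a separate batch.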

No streaming. Batch responses are returned as complete JSON objects. If you normally rely on streaming for progress indicators, you'll need a different UX pattern for batch results.

Error handling is different. In real time, you catch a 429 or 500 error and retry immediately. In batch, failed requests are written to a separate error file, referenced by the batch's error_file_id, with an error field per line keyed by custom_id. You need post-processing logic to identify and resubmit failures.
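
That post-processing step might look like the sketch below. It scans result lines for a non-null `error` field or a non-200 `status_code`, whichever file they arrive in, and selects the matching original requests for a follow-up batch. The record shape and both function names are assumptions to verify against your own result files.

```python
import json

def failed_ids(results_jsonl):
    """Collect custom_ids of requests that errored or returned a non-200 status."""
    failed = set()
    for line in results_jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            failed.add(record["custom_id"])
    return failed

def requests_to_resubmit(original_lines, failed):
    """Return the original request lines whose custom_id is in `failed`."""
    return [line for line in original_lines
            if json.loads(line)["custom_id"] in failed]
```

The returned lines can be written to a new JSONL file and submitted as a smaller retry batch.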

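Not all endpoints are supported. The Batch API covers text endpoints such as chat completions and embeddings. It does not support image generation, audio, or fine-tuning, so confirm an endpoint is on the supported list before planning a migration around it.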

How do you track batch vs real-time costs?

When you split workloads between batch and real-time, you need to track costs for both channels. Provider dashboards don't always separate batch and standard usage clearly.

With Tokonomics, you can tag batch requests differently from real-time ones:

# Real-time calls
headers = {"X-Metering-Tags": '{"channel":"realtime","feature":"chatbot"}'}

# Before submitting to batch, log the expected cost
headers = {"X-Metering-Tags": '{"channel":"batch","feature":"digest"}'}

This lets you see in your cost dashboard exactly how much you're spending on each channel, verify the 50% savings is materializing, and catch any drift.

Frequently Asked Questions

How much does the OpenAI Batch API save compared to real-time?

Exactly 50% off standard per-token pricing (OpenAI, 2026). GPT-4o drops from $2.50 to $1.25 per million input tokens. GPT-4o-mini drops from $0.15 to $0.075. Same models, same quality. You just trade real-time responses for a 24-hour completion window.

How long do Batch API requests take to finish?

OpenAI targets completion within a 24-hour window, and most batches finish in 1-4 hours depending on size and server load. A batch that runs past the window expires rather than continuing. You poll the batch ID to check status, then download the JSONL output file. There's no streaming or real-time progress indicator during processing.

What workloads work best with the Batch API?

Nightly document classification, bulk content extraction, scheduled report generation, and email digest processing are ideal candidates. Any task where results are needed within hours rather than seconds qualifies. If you're unsure, audit your current spending to identify which calls don't need instant responses.

Can I use batch and real-time API calls together?

Absolutely, and that's the recommended approach. Route user-facing chat and search to real-time endpoints, then shift background processing to batch. Tag each channel separately in your cost dashboard so you can verify the 50% savings is actually showing up on batch workloads.

What is the bottom line on Batch API savings?

The Batch API is the simplest cost optimization OpenAI offers. No prompt engineering, no model switching, no quality tradeoff. You restructure how you submit requests and save 50%.

Action items:

Audit your workloads. Which features don't need real-time responses?
Calculate potential savings. Multiply your batch-eligible token volume by your current per-token rates; half of that total is what you save.
Start with one workload. Pick your highest-volume non-real-time task and migrate it to batch.
Track the savings. Use budget monitoring to verify the cost reduction shows up.

If you're spending more than $500/month on OpenAI and any portion of your workload is non-real-time, the Batch API should be your first optimization — before prompt optimization, before model switching, before anything else. It's free money.

Last updated June 2026. All sources retrieved June 2026.