TL;DR — Agent cost = (avg calls per task) × (avg tokens per call) × (model rate). A customer-support agent on Claude Haiku at 5 calls/task ≈ $0.04/conversation. A research agent on GPT-4o at 40 calls/task ≈ $3.50/task. Cap your loops, pick the cheapest model that passes quality, and monitor per-task cost — not just total spend.
AI agents are not chatbots. A chatbot handles one question, returns one answer, and stops. An agent runs loops — it reasons, calls tools, reads results, reasons again, and keeps going until the task is done. Each loop iteration is a separate LLM API call, and each call bills for tokens.
This is why agent costs surprise people. A workload that costs $0.01 per conversation as a chatbot can cost $0.50-$5.00 per task as an agent, because the agent might make 10-50 LLM calls to complete a single task. Multiply that by hundreds of tasks per day and you're looking at real money.
This article gives you the actual costs for four common agent types, explains why agents are fundamentally more expensive than simple LLM calls, and shows you how to estimate and control your agent spending.
Why agents cost more than you think
A standard LLM API call has a predictable cost: input tokens × input price + output tokens × output price. You can estimate it before you run it.
Agents break this model because of three cost multipliers:
1. Loop iterations. An agent that uses ReAct (Reason + Act) might call the LLM 5-30 times per task. Each call includes the full conversation history — system prompt, previous reasoning steps, tool results — so input tokens grow with every iteration.
2. Tool call overhead. To let the agent call tools (search, database query, API call), every request includes the tool definitions as input tokens. A schema with 10 tools can add 2,000-3,000 input tokens to every single LLM call, even when most tools aren't used.
3. Context accumulation. Unlike a single-turn chatbot, agents accumulate context across iterations. By iteration 10, the agent is sending the system prompt + all 9 previous reasoning steps + all 9 tool results as input. This means iteration 10 costs 5-10x more than iteration 1.
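To see why later iterations cost more, here is a minimal sketch that counts the input tokens sent on each call when the full history is resent. The sizes are illustrative assumptions, not measurements: a 1,200-token system prompt, 1,500 tokens of tool schemas, a 300-token user message, and about 1,500 tokens of reasoning plus tool results appended per step.

```python
def input_tokens_per_iteration(
    iterations: int,
    system_prompt: int = 1_200,
    tool_schemas: int = 1_500,
    user_message: int = 300,
    added_per_step: int = 1_500,
) -> list[int]:
    """Input tokens sent on each call when the full history is resent.

    Call k (1-indexed) carries the fixed overhead plus the reasoning
    and tool results produced by the k - 1 earlier steps.
    """
    fixed = system_prompt + tool_schemas + user_message
    return [fixed + (k - 1) * added_per_step for k in range(1, iterations + 1)]


per_call = input_tokens_per_iteration(10)
print(per_call[0], per_call[-1])             # 3000 16500
print(round(per_call[-1] / per_call[0], 1))  # 5.5
print(sum(per_call))                         # 97500
```

Under these assumptions the tenth call carries 5.5x the input of the first, and total input per task grows roughly quadratically with the number of iterations, since each step is resent on every later call.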
Real cost breakdowns by agent type
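All costs below use June 2026 pricing, per 1M tokens: GPT-4o at $2.50 input / $10.00 output, GPT-4o-mini at $0.15 / $0.60, and Claude Sonnet 4 at $3.00 / $15.00. See our full pricing guide for current rates.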
Customer support agent
What it does: Answers customer questions by searching a knowledge base, looking up order status, and composing a response. Escalates to a human when confidence is low.
Typical flow: 3-5 LLM calls per ticket (classify → retrieve → draft → refine → respond)
| Component | Tokens per task | Cost (GPT-4o) | Cost (GPT-4o-mini) |
|---|---|---|---|
| System prompt (per call, 5 calls) | 5 × 1,200 = 6,000 | $0.015 | $0.0009 |
| Tool schemas (per call) | 5 × 1,500 = 7,500 | $0.019 | $0.0011 |
| Accumulated context | ~8,000 | $0.020 | $0.0012 |
| User messages + KB results | ~4,000 | $0.010 | $0.0006 |
| Agent reasoning output | ~2,500 | $0.025 | $0.0015 |
| Final response output | ~400 | $0.004 | $0.0002 |
| Total per ticket | ~28,400 | $0.093 | $0.0055 |
At 200 tickets/day:
- GPT-4o: $18.60/day → $558/month
- GPT-4o-mini: $1.10/day → $33/month
The 17x cost difference is why model selection matters enormously for agents. For most customer support queries, GPT-4o-mini handles the task well — the knowledge base does the heavy lifting, not the model's reasoning ability.
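The table arithmetic can be reproduced with a small helper that prices input and output tokens separately. The per-1M rates below are the ones the table implies; treat them as assumptions to check against current pricing before reusing the numbers.

```python
# Per-1M-token (input, output) rates implied by the support-agent table.
RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

# Token breakdown from the table, as (input_tokens, output_tokens) per ticket.
SUPPORT_TICKET = {
    "system_prompt": (6_000, 0),
    "tool_schemas": (7_500, 0),
    "accumulated_context": (8_000, 0),
    "user_messages_and_kb": (4_000, 0),
    "reasoning_output": (0, 2_500),
    "final_response": (0, 400),
}


def task_cost(breakdown: dict[str, tuple[int, int]], model: str) -> float:
    """Dollar cost of one task, pricing input and output tokens separately."""
    input_rate, output_rate = RATES[model]
    input_tokens = sum(i for i, _ in breakdown.values())
    output_tokens = sum(o for _, o in breakdown.values())
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


for model in RATES:
    per_ticket = task_cost(SUPPORT_TICKET, model)
    monthly = per_ticket * 200 * 30  # 200 tickets/day over a 30-day month
    print(f"{model}: ${per_ticket:.4f}/ticket, ${monthly:,.2f}/month")
```

Unrounded, this gives $0.09275 per ticket on GPT-4o (about $556.50/month) and $0.005565 on GPT-4o-mini (about $33.39/month). The table's $558 comes from multiplying the rounded $0.093.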
Coding agent
What it does: Takes a task description, reads relevant files, writes code, runs tests, fixes errors, and iterates until tests pass.
Typical flow: 8-20 LLM calls per task (understand → plan → read files → write code → run tests → fix errors → repeat)
| Component | Tokens per task | Cost (Claude Sonnet 4) |
|---|---|---|
| System prompt (per call, 12 calls avg) | 12 × 2,000 = 24,000 | $0.072 |
| Code file contents (input) | ~30,000 | $0.090 |
| Tool schemas + accumulated context | ~20,000 | $0.060 |
| Reasoning + code output | ~15,000 | $0.225 |
| Total per task | ~89,000 | $0.447 |
At 50 tasks/day:
- Claude Sonnet 4: $22.35/day → $670/month
Coding agents are expensive because code files are large (high input tokens) and generated code is verbose (high output tokens). Output tokens cost 5x as much as input tokens on Claude Sonnet 4 ($15 vs. $3 per 1M), so output alone accounts for about half of the per-task cost ($0.225 of $0.447) despite being only 15,000 of the 89,000 tokens.
Research agent
What it does: Takes a research question, searches the web, reads multiple pages, synthesizes findings, and produces a report.
Typical flow: 10-25 LLM calls (search → read page → evaluate → search again → synthesize)
| Component | Tokens per task | Cost (GPT-4o) |
|---|---|---|
| System prompt (per call, 15 calls avg) | 15 × 1,000 = 15,000 | $0.038 |
| Web page contents (input) | ~60,000 | $0.150 |
| Accumulated reasoning | ~25,000 | $0.063 |
| Intermediate summaries (output) | ~8,000 | $0.080 |
| Final report (output) | ~3,000 | $0.030 |
| Total per task | ~111,000 | $0.361 |
At 30 research tasks/day:
- GPT-4o: $10.83/day → $325/month
Research agents are input-heavy because they consume entire web pages. The single biggest optimization is summarizing pages before adding them to context, rather than stuffing raw HTML into the prompt.
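A minimal sketch of that optimization, assuming a hypothetical summarize callable (for example, a wrapper around a cheap-model call) and a rough 4-characters-per-token estimate. Long pages are condensed before they enter the agent's context, so later iterations resend the summaries instead of the raw pages.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def add_pages_to_context(
    context: list[str],
    pages: list[str],
    summarize: Callable[[str], str],
    max_page_tokens: int = 500,
) -> list[str]:
    """Return a new context with each page appended, summarizing long pages.

    summarize is a hypothetical stand-in for a cheap-model summary call.
    Pages at or under max_page_tokens are kept verbatim; longer ones are
    replaced by their summary so later iterations resend fewer tokens.
    """
    updated = list(context)
    for page in pages:
        if estimate_tokens(page) > max_page_tokens:
            updated.append(summarize(page))
        else:
            updated.append(page)
    return updated


# Usage with a toy summarizer that keeps the first 200 characters.
context = add_pages_to_context([], ["A short page.", "x" * 20_000], lambda p: p[:200])
print([estimate_tokens(c) for c in context])  # [3, 50]
```

The summarization calls have their own cost, so this pays off mainly when a page would otherwise be resent across several later iterations.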
Sales outreach agent
What it does: Researches a prospect (LinkedIn, company website), drafts a personalized email, and suggests talking points.
Typical flow: 4-6 LLM calls (research prospect → research company → draft email → refine)
| Component | Tokens per task | Cost (GPT-4o-mini) |
|---|---|---|
| System prompt + persona (per call, 5 calls) | 5 × 800 = 4,000 | $0.0006 |
| Prospect/company data (input) | ~6,000 | $0.0009 |
| Context accumulation | ~3,000 | $0.0005 |
| Email drafts (output) | ~1,500 | $0.0009 |
| Total per prospect | ~14,500 | $0.0029 |
At 500 prospects/day:
- GPT-4o-mini: $1.45/day → $43.50/month
Sales agents are the cheapest because individual tasks are small and GPT-4o-mini handles personalized writing well enough. This is a clear case where the cheaper model wins.
The agent cost formula
For any agent, estimate monthly cost with this formula:
Monthly cost = (input_tokens_per_task × input_price_per_token + output_tokens_per_task × output_price_per_token) × tasks_per_day × 30
Where both token counts include all iterations:
input_tokens_per_task = avg_iterations × (system_prompt_tokens + tool_schema_tokens + avg_accumulated_context)
output_tokens_per_task = avg_iterations × avg_output_per_iteration
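The variable that matters most is avg_iterations. Under this formula, an agent that averages 5 iterations per task costs roughly half as much as one that averages 10, and in practice usually less than half, because avg_accumulated_context also grows with the number of iterations. Capping maximum iterations is the single most effective cost control.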
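A version of this formula as a Python function, pricing input and output tokens separately as the tables do. The example inputs are assumptions chosen to reproduce the support-agent table: 5 calls, 1,200 system and 1,500 schema tokens per call, an average of 2,400 context tokens per call (the table's 8,000 + 4,000 spread over 5 calls), and 580 output tokens per call.

```python
def monthly_agent_cost(
    avg_iterations: float,
    system_prompt_tokens: int,
    tool_schema_tokens: int,
    avg_accumulated_context: int,
    avg_output_per_iteration: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
    tasks_per_day: int,
    days: int = 30,
) -> float:
    """Estimated monthly spend for one agent, in dollars."""
    input_per_task = avg_iterations * (
        system_prompt_tokens + tool_schema_tokens + avg_accumulated_context
    )
    output_per_task = avg_iterations * avg_output_per_iteration
    cost_per_task = (
        input_per_task * input_price_per_1m
        + output_per_task * output_price_per_1m
    ) / 1_000_000
    return cost_per_task * tasks_per_day * days


# Hypothetical support agent at GPT-4o-mini rates ($0.15 / $0.60 per 1M tokens).
estimate = monthly_agent_cost(
    avg_iterations=5,
    system_prompt_tokens=1_200,
    tool_schema_tokens=1_500,
    avg_accumulated_context=2_400,
    avg_output_per_iteration=580,
    input_price_per_1m=0.15,
    output_price_per_1m=0.60,
    tasks_per_day=200,
)
print(f"${estimate:.2f}/month")  # $33.39/month
```

Because avg_accumulated_context is held fixed here, doubling avg_iterations exactly doubles the estimate; a real agent's context grows with iterations, so treat this as a lower bound for long-running tasks.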
How to reduce agent costs
1. Use cheaper models for simple steps
Not every agent step needs GPT-4o or Claude Sonnet. The classification step ("is this a billing question or a technical question?") can use GPT-4o-mini. The final formatting step can use a cheap model. Only the core reasoning step needs the expensive model.
```python
# Route by step complexity
def pick_model(step: str) -> str:
    if step == "classify":
        return "gpt-4o-mini"  # $0.15/1M input
    elif step == "reason":
        return "gpt-4o"  # $2.50/1M input
    elif step == "format_response":
        return "gpt-4o-mini"  # $0.15/1M input
    raise ValueError(f"unknown step: {step}")
```
This hybrid approach can cut agent costs by 40-60% because classification and formatting are typically 3-4 of the 5+ iterations.
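To sanity-check the savings for your own flow, a rough sketch: price each step on the model it is routed to and compare against running every step on GPT-4o. It assumes every step consumes the same token volume, which real agents rarely do (later steps carry more context), so treat the result as a first approximation. The step names, plan, and token counts are hypothetical.

```python
# Per-1M-token (input, output) rates, matching the comments in the routing example.
RATES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}


def step_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call on the given model."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def routing_savings(plan: dict[str, str], input_tokens: int, output_tokens: int) -> float:
    """Fraction saved by a step->model plan versus running every step on gpt-4o."""
    routed = sum(step_cost(m, input_tokens, output_tokens) for m in plan.values())
    baseline = len(plan) * step_cost("gpt-4o", input_tokens, output_tokens)
    return 1 - routed / baseline


# Hypothetical five-step support flow: only the reasoning-heavy steps use gpt-4o.
plan = {
    "classify": "gpt-4o-mini",
    "retrieve": "gpt-4o-mini",
    "draft": "gpt-4o",
    "refine": "gpt-4o",
    "format_response": "gpt-4o-mini",
}
print(f"{routing_savings(plan, 5_000, 500):.0%}")  # 56%
```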
2. Cap iterations
Set a hard limit on how many loops the agent can run. Most tasks that aren't solved in 10 iterations won't be solved in 20 — the agent is stuck, and more iterations just burn tokens.
```python
MAX_ITERATIONS = 10

for i in range(MAX_ITERATIONS):
    result = agent.step()
    if result.is_complete:
        break
else:
    # Runs only if the loop never hit break: the task is still unfinished.
    escalate_to_human(task)
```
3. Summarize context instead of accumulating it
Instead of passing the full history of every iteration to the next call, summarize previous steps into a compressed context. This keeps input tokens roughly constant across iterations instead of growing linearly.
```python
if len(history) > 5:
    # Summarize older history to save tokens
    summary = llm.summarize(history[:-3])
    context = [summary] + history[-3:]  # keep last 3 in full
else:
    context = history
```
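To estimate what this saves, the sketch below compares the history tokens resent over one task when every step is kept versus when older steps are collapsed into a single summary once more than 5 steps exist, mirroring the keep-last-3 window in the snippet above. The 1,500 tokens per step and the 400-token summary are assumptions, and the cost of the summarization calls themselves is not counted.

```python
def history_tokens_resent(
    iterations: int,
    tokens_per_step: int = 1_500,
    keep_recent: int = 3,
    summary_tokens: int = 400,
    summarize_after: int = 5,
) -> tuple[int, int]:
    """Total history tokens resent across a task: (full history, summarized).

    Before call k the agent has k - 1 prior steps. Full history resends all
    of them. The summarized variant, once more than summarize_after steps
    exist, resends one summary plus the keep_recent most recent steps.
    """
    full = summarized = 0
    for k in range(1, iterations + 1):
        prior = k - 1
        full += prior * tokens_per_step
        if prior > summarize_after:
            summarized += summary_tokens + keep_recent * tokens_per_step
        else:
            summarized += prior * tokens_per_step
    return full, summarized


full, summarized = history_tokens_resent(10)
print(full, summarized)  # 67500 42100
```

The gap widens with longer tasks: full history grows quadratically with iterations, while the summarized variant grows roughly linearly once the window is full.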
4. Reduce tool schemas
Only include tools relevant to the current step. If the agent is in a "search" phase, don't include the "send email" tool definition. Each unused tool definition wastes 200-500 tokens per call.
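One way to do this is a phase-to-tools allowlist applied before each call. The phase names, tool names, and schema shape (a list of dicts with a "name" key, loosely modeled on common function-calling formats) are all hypothetical.

```python
# Hypothetical tool definitions; real schemas also carry descriptions and parameters.
ALL_TOOLS = [
    {"name": "web_search"},
    {"name": "fetch_page"},
    {"name": "query_orders"},
    {"name": "send_email"},
]

# Which tools each phase of the agent may call.
TOOLS_BY_PHASE = {
    "search": {"web_search", "fetch_page"},
    "lookup": {"query_orders"},
    "respond": {"send_email"},
}


def tools_for_phase(phase: str, tools: list[dict]) -> list[dict]:
    """Return only the tool definitions the current phase is allowed to use."""
    allowed = TOOLS_BY_PHASE.get(phase, set())
    return [tool for tool in tools if tool["name"] in allowed]


print([t["name"] for t in tools_for_phase("search", ALL_TOOLS)])  # ['web_search', 'fetch_page']
```

An unknown phase gets no tools in this sketch; falling back to the full list instead trades token savings for robustness.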
5. Use prompt caching
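If your agent sends the same system prompt and tool definitions on every call, as most do, prompt caching can reduce the cost of those repeated tokens by 50-90%. OpenAI and Anthropic both support this: OpenAI caches long repeated prompt prefixes automatically, while Anthropic requires you to mark the cacheable prefix with cache_control.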
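The actual saving depends on how the provider bills cache writes and reads. The sketch below estimates the input cost of the repeated prefix over one task. The default multipliers (1.25x on the first, cache-writing call and 0.1x on later cache reads) reflect Anthropic's published rates for its default cache duration at the time of writing; OpenAI instead discounts cached input (50% on GPT-4o) with no write surcharge. Verify both against current pricing docs. It assumes the cache stays warm for the whole task.

```python
def prefix_input_cost(
    prefix_tokens: int,
    calls: int,
    input_price_per_1m: float,
    write_multiplier: float = 1.25,
    read_multiplier: float = 0.10,
) -> tuple[float, float]:
    """Input cost of a repeated prompt prefix over one task: (uncached, cached).

    Assumes the first call writes the cache and every later call reads it.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    base = prefix_tokens * input_price_per_1m / 1_000_000
    uncached = calls * base
    cached = base * write_multiplier + (calls - 1) * base * read_multiplier
    return uncached, cached


# 1,200-token system prompt + 1,500 tokens of tool schemas, resent on 5 calls,
# at an assumed $3.00 per 1M input tokens.
uncached, cached = prefix_input_cost(2_700, 5, 3.00)
print(f"{1 - cached / uncached:.0%} saved on the repeated prefix")  # 67% saved on the repeated prefix
```

With GPT-4o-style pricing (write_multiplier=1.0, read_multiplier=0.5), the same 5-call prefix saves 40%. Savings approach the read discount as the number of calls per task grows.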
Tracking agent costs in production
Agent costs are harder to track than simple API calls because one user action triggers multiple LLM calls. You need to group related calls into a single "task" to understand the true cost per task.
With Tokonomics, you can tag all calls from a single agent task with the same identifier using the X-Metering-Tags header:
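```python
import json

# One tag set per LLM call; task and i come from the surrounding agent loop.
headers = {
    "X-Metering-Tags": json.dumps({
        "agent": "support",
        "task_id": task.id,
        "iteration": str(i),
    })
}
```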
This lets you see in the analytics dashboard:
- Average cost per agent task (not just per API call)
- How many iterations tasks actually take
- Which agent type costs the most
- Cost trends as you optimize
Without per-task tracking, you only see aggregate API spend — you can't tell if your agent optimization actually worked. Per-feature cost tracking is how you close that feedback loop.
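If you also log calls yourself, the same per-task grouping can be sketched locally. The record fields below (agent, task_id, cost_usd) are hypothetical and would come from your own logging, not from any particular API.

```python
from collections import defaultdict
from statistics import mean


def per_task_summary(calls: list[dict]) -> dict[str, dict[str, float]]:
    """Group logged LLM calls by task, then report per-agent averages."""
    # Collect each task's individual call costs.
    tasks: dict[tuple[str, str], list[float]] = defaultdict(list)
    for call in calls:
        tasks[(call["agent"], call["task_id"])].append(call["cost_usd"])

    # Regroup the tasks under the agent that ran them.
    by_agent: dict[str, list[list[float]]] = defaultdict(list)
    for (agent, _task_id), costs in tasks.items():
        by_agent[agent].append(costs)

    return {
        agent: {
            "avg_cost_per_task": mean(sum(c) for c in task_costs),
            "avg_iterations": mean(len(c) for c in task_costs),
        }
        for agent, task_costs in by_agent.items()
    }


calls = [
    {"agent": "support", "task_id": "t1", "cost_usd": 0.002},
    {"agent": "support", "task_id": "t1", "cost_usd": 0.003},
    {"agent": "support", "task_id": "t2", "cost_usd": 0.004},
]
print(per_task_summary(calls))
```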
Set budget alerts specifically for agent workloads. Agents can spiral — a bug in the loop logic can cause infinite iterations. A hard spending cap prevents a runaway agent from burning through your entire monthly budget overnight.
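Alongside account-level caps, a per-task guard inside the loop can stop a single runaway task early. A minimal sketch; the BudgetExceeded exception, the $0.50 limit, and the per-call cost (which you would derive from each response's token usage) are assumptions.

```python
class BudgetExceeded(RuntimeError):
    """Raised when one task's cumulative LLM spend passes its limit."""


class TaskBudget:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one call's cost and stop the task once the limit is crossed."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f}, limit ${self.limit_usd:.4f}"
            )


budget = TaskBudget(limit_usd=0.50)
try:
    for _ in range(100):       # simulates a loop that never reports completion
        budget.charge(0.12)    # hypothetical cost of one call
except BudgetExceeded as err:
    print("stopping task:", err)
```

The check runs after each call is billed, so a task can overshoot its limit by at most one call.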
What to expect at different scales
| Scale | Tasks/day | Model | Estimated monthly cost |
|---|---|---|---|
| Side project | 10 | GPT-4o-mini | $1-5 |
| Small SaaS | 100 | GPT-4o-mini | $15-50 |
| Growing startup | 500 | Mixed (4o + mini) | $200-800 |
| Scaling company | 2,000 | Mixed | $800-3,000 |
| Enterprise | 10,000+ | Mixed | $4,000-15,000+ |
These ranges assume average-complexity agents (5-10 iterations per task). Simple agents (classify + respond) cost 3-5x less. Complex agents (research + multi-step reasoning) cost 2-3x more.
The key insight: agent costs scale linearly with tasks but can be reduced per-task through the optimizations above. A team that optimizes model selection, caps iterations, and uses prompt caching typically runs agents at 30-40% of the naive cost.
Start by estimating your costs with the formula above, deploy with monitoring, and audit monthly to catch drift. Agent costs compound quietly — the teams that control them are the ones that measure them.
Last updated June 2026. All sources retrieved June 2026.