llm-cost-estimation llm-api-pricing token-counting June 6, 2026 9 min read

How to Estimate LLM API Costs Before You Deploy


TL;DR. Formula: cost = (input_tokens × input_rate) + (output_tokens × output_rate). A 500-token prompt + 300-token response on GPT-4o = $0.00425 per call. At 10,000 calls/day = $42.50/day = $1,275/month. Tokenize with tiktoken before deploying, not after.

Most developers don't estimate LLM costs before deploying. They build the feature, ship it, and find out what it costs when the invoice arrives. That's how a chatbot that "should cost nothing" ends up at $3,400/month.

The math isn't hard. You need three numbers: how many tokens your prompt uses, how many tokens the response generates, and what your provider charges per token. Multiply, add, and scale by your expected volume. This article gives you the exact formulas, tools, and worked examples to estimate costs before you spend a dollar.

The cost formula every developer needs

Every LLM API call has two billable components: input tokens (your prompt) and output tokens (the model's response). The formula:

Cost per call = (input tokens x input price) + (output tokens x output price)

Prices are quoted per million tokens, so divide each product by 1,000,000 to get dollars. Here are the current rates for the most popular models as of June 2026:

Model | Input (per 1M tokens) | Output (per 1M tokens)
GPT-4o | $2.50 | $10.00
GPT-4o-mini | $0.15 | $0.60
Claude Sonnet 4 | $3.00 | $15.00
Claude Haiku 3.5 | $0.80 | $4.00
DeepSeek V3 | $0.27 | $1.10
Gemini 2.5 Flash | $0.15 | $0.60
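To check the formula against the table, here is a minimal Python sketch. The `PRICES_PER_MILLION` keys are informal labels chosen for this example, not official API model IDs, and `cost_per_call` is a hypothetical helper; the rates are the ones listed above. It prices the 500-token prompt / 300-token response example on GPT-4o.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
# Keys are informal labels for this sketch, not official API model IDs.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "deepseek-v3": (0.27, 1.10),
    "gemini-2.5-flash": (0.15, 0.60),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: dollars for one call.

    Tokens x rate, divided by 1M because the rates are quoted per million tokens.
    """
    input_rate, output_rate = PRICES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


per_call = cost_per_call("gpt-4o", 500, 300)
print(f"${per_call:.5f} per call")                 # $0.00425 per call
print(f"${per_call * 10_000:.2f} per day")          # $42.50 per day
print(f"${per_call * 10_000 * 30:.2f} per month")   # $1275.00 per month
```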

For the full pricing breakdown across all providers, see our LLM API Pricing Guide 2026.

How to count tokens before you send the request

Tokens are not words. A token is roughly 3/4 of a word in English. A 500-word prompt is approximately 670 tokens. A 1,000-word response is about 1,330 tokens.

Three ways to count tokens before running a prompt:

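1. Use OpenAI's tiktoken library. It's the tokenizer OpenAI's GPT models actually use, so it gives exact counts for GPT prompt text. Install it with pip install tiktoken, then:

```python
import tiktoken

# gpt-4o maps to the o200k_base encoding (needs tiktoken 0.7.0 or later)
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Your prompt text here")
# Exact count for this text; chat requests add a few tokens of per-message formatting
print(len(tokens))
```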

2. Use the rough ratio. For quick estimates, divide your word count by 0.75. A 400-word prompt is roughly 530 tokens. This is accurate within 10-15% for English text. Code and non-English text use more tokens per word.
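The ratio rule is easy to script when you don't have a tokenizer handy. A minimal sketch: `estimate_tokens_from_words` is a hypothetical helper, and splitting on whitespace is only a crude word count. Expect it to undercount for code and non-English text.

```python
def estimate_tokens_from_words(text: str) -> int:
    """Hypothetical helper: rough token estimate for English prose.

    Uses the rule of thumb word_count / 0.75 (about 4 tokens per 3 words).
    Whitespace splitting is a crude word count, so treat the result as a ballpark.
    """
    words = len(text.split())
    return round(words / 0.75)


prompt = "word " * 400  # stand-in for a 400-word prompt
print(estimate_tokens_from_words(prompt))  # 533
```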

3. Use OpenAI's tokenizer playground. Paste your prompt at platform.openai.com/tokenizer to see the exact count. This works for quick checks but doesn't scale for automation.

For Anthropic models, the tokenizer differs slightly but the 0.75 ratio holds close enough for estimation. The exact count only matters at scale — a 5% difference on a single call is fractions of a cent.

Worked example: customer support chatbot

Let's estimate costs for a real use case. You're building a customer support chatbot that handles 500 conversations per day. Each conversation averages 3 back-and-forth exchanges.

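Your inputs:

System prompt: 800 tokens
Average user message: 150 tokens
Average model response: 400 tokens
Volume: 500 conversations per day, 3 turns each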

Per-conversation token math (3 turns):

Turn | Input tokens | Output tokens
Turn 1 | 800 + 150 = 950 | 400
Turn 2 | 800 + 150 + 400 + 150 = 1,500 | 400
Turn 3 | 800 + 150 + 400 + 150 + 400 + 150 = 2,050 | 400
Total | 4,500 | 1,200

Notice how input tokens grow with each turn because you're resending the conversation history. This is the cost multiplier most developers miss.
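A short sketch of that growth, assuming (as in the table) a fixed system prompt, a fixed user message size and a fixed reply size, with the full history resent on every turn. `conversation_tokens` is a hypothetical helper:

```python
def conversation_tokens(system_tokens: int, user_tokens: int,
                        reply_tokens: int, turns: int) -> tuple[list[int], list[int]]:
    """Hypothetical helper: per-turn input/output tokens when history is resent.

    Turn k sends the system prompt, k user messages and the (k - 1) earlier replies.
    """
    inputs = [
        system_tokens + k * user_tokens + (k - 1) * reply_tokens
        for k in range(1, turns + 1)
    ]
    outputs = [reply_tokens] * turns
    return inputs, outputs


inputs, outputs = conversation_tokens(800, 150, 400, turns=3)
print(inputs, sum(inputs), sum(outputs))  # [950, 1500, 2050] 4500 1200
```

Running it with `turns=10` gives 34,250 input tokens, versus 9,500 if each of the ten turns were sent without history (10 × 950), roughly 3.6x.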

Daily cost at different model tiers:

Model | Daily cost (500 conversations)
GPT-4o | (4,500 x $2.50 + 1,200 x $10.00) / 1M x 500 = $11.63
GPT-4o-mini | (4,500 x $0.15 + 1,200 x $0.60) / 1M x 500 = $0.70
Claude Sonnet 4 | (4,500 x $3.00 + 1,200 x $15.00) / 1M x 500 = $15.75
DeepSeek V3 | (4,500 x $0.27 + 1,200 x $1.10) / 1M x 500 = $1.27

The difference between GPT-4o and GPT-4o-mini is $10.93 per day, roughly $328/month. For a customer support chatbot where GPT-4o-mini handles 90% of queries adequately, that's about $328/month saved with a single model switch, or roughly $295/month if you keep GPT-4o for the remaining 10%.
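The daily figures above can be reproduced with Python's `decimal` module. `Decimal` with `ROUND_HALF_UP` is used so half-cent results such as $11.625 round up the way the table does; Python's float formatting rounds exact ties to even instead. `daily_cost` and the constants are hypothetical names for this sketch:

```python
from decimal import Decimal, ROUND_HALF_UP

# USD per 1M tokens (input, output), from the pricing table in this article.
RATES = {
    "GPT-4o": (Decimal("2.50"), Decimal("10.00")),
    "GPT-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "Claude Sonnet 4": (Decimal("3.00"), Decimal("15.00")),
    "DeepSeek V3": (Decimal("0.27"), Decimal("1.10")),
}

INPUT_PER_CONVERSATION = 4_500    # totals from the 3-turn table above
OUTPUT_PER_CONVERSATION = 1_200
CONVERSATIONS_PER_DAY = 500
CENT = Decimal("0.01")


def daily_cost(input_rate: Decimal, output_rate: Decimal) -> Decimal:
    """Hypothetical helper: exact daily cost in dollars, rounded half-up to the cent."""
    per_conversation = (INPUT_PER_CONVERSATION * input_rate
                        + OUTPUT_PER_CONVERSATION * output_rate) / 1_000_000
    return (per_conversation * CONVERSATIONS_PER_DAY).quantize(CENT, rounding=ROUND_HALF_UP)


for model, (inp, out) in RATES.items():
    print(f"{model:16} ${daily_cost(inp, out)}/day")
```

It prints $11.63, $0.70, $15.75 and $1.27 per day, matching the table.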

The hidden costs that break your estimate

The formula above covers the base case. Real production usage has cost multipliers that inflate your estimate:

Conversation history accumulation. As shown above, multi-turn conversations resend all prior messages. A 10-turn conversation sends your system prompt 10 times. At scale, this is often 3-5x more expensive than single-turn estimates suggest.

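Retry logic. If your app retries failed requests, each retry is a full billable call. With a 5% error rate and 2 retries per failed request, you make up to 10% more calls (0.05 × 2), so budget for roughly 10% extra; if most retries succeed on the first attempt, the overhead is closer to 5%.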
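A minimal sketch of both ways to read that number, assuming failures are independent, every attempt fails with the same probability, and every attempt is billed as a full call (whether a failed request is actually billed depends on the provider and the failure type). Both functions are hypothetical helpers:

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Hypothetical helper: expected billed attempts per request.

    Attempt k + 1 only happens if the first k attempts all failed, so the
    expectation is 1 + p + p^2 + ... + p^max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def worst_case_attempts(failure_rate: float, max_retries: int) -> float:
    """Hypothetical helper: upper bound if every failed request uses all its retries."""
    return 1 + failure_rate * max_retries


print(expected_attempts(0.05, 2))    # ~1.0525, about 5% more calls
print(worst_case_attempts(0.05, 2))  # about 1.10, about 10% more calls
```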

System prompt size. Some teams stuff entire knowledge bases into the system prompt. A 4,000-token system prompt costs $0.01 per call on GPT-4o. At 10,000 calls/day, that's $100/day just for the system prompt — before the user says anything.
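The same arithmetic as a hypothetical helper, `system_prompt_overhead`, using GPT-4o's $2.50/1M input rate from the pricing table:

```python
def system_prompt_overhead(prompt_tokens: int, input_rate_per_million: float,
                           calls_per_day: int) -> tuple[float, float]:
    """Hypothetical helper: dollars spent resending a fixed system prompt.

    Returns (cost per call, cost per day).
    """
    per_call = prompt_tokens * input_rate_per_million / 1_000_000
    return per_call, per_call * calls_per_day


per_call, per_day = system_prompt_overhead(4_000, 2.50, 10_000)  # GPT-4o input rate
print(f"${per_call:.4f} per call, ${per_day:.2f} per day")  # $0.0100 per call, $100.00 per day
```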

Output variability. You control your prompt length. You don't control output length. Setting max_tokens caps the length, but below that cap the model may generate anywhere from 50 to 2,000 tokens per response. Estimate using your average observed output, not the max.
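One way to turn sample runs into an estimate, sketched with Python's `statistics` module. The 20 token counts are made-up placeholders, not measurements; substitute the output token counts your provider reports for your own test calls. The 95th percentile is shown next to the average as a rough "bad but typical" case:

```python
import statistics

# Hypothetical output token counts from 20 test calls (replace with your own data).
observed_output_tokens = [
    220, 310, 180, 405, 260, 290, 350, 150, 275, 330,
    240, 195, 310, 420, 265, 230, 300, 285, 210, 345,
]
MAX_TOKENS = 2_000  # the cap you set: a ceiling, not an estimate

avg = statistics.mean(observed_output_tokens)
# quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
p95 = statistics.quantiles(observed_output_tokens, n=20)[-1]
print(f"average {avg:.0f}, p95 {p95:.0f}, cap {MAX_TOKENS}")
```

Price your estimate on the average; use the percentile to sanity-check your budget headroom.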

Function calling and tool use. Tool definitions are injected into the request and billed as input tokens. A complex schema with 10 functions can add 2,000-3,000 tokens to every request. These tokens appear in your input count even when no function is called.
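If you want a ballpark for your own tool definitions before sending them, a crude approach is to serialize them and apply the common rule of thumb of about 4 characters per token. This is an assumption-heavy sketch: providers reformat tool definitions internally, so the real count will differ, and `lookup_order` and `rough_schema_tokens` are hypothetical names. For GPT models, encoding the JSON with tiktoken would likely get closer.

```python
import json

# A hypothetical tool definition in the OpenAI "tools" request shape.
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",  # hypothetical tool
            "description": "Look up an order by its ID and return status and shipping details.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID."},
                },
                "required": ["order_id"],
            },
        },
    },
]


def rough_schema_tokens(tool_list: list) -> int:
    """Hypothetical helper: ~4 characters per token on the serialized JSON.

    Providers format tool definitions differently internally, so this is a ballpark only.
    """
    return len(json.dumps(tool_list)) // 4


print(rough_schema_tokens(tools), "tokens (rough) for one tool, added to every request")
```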

How to estimate costs for different use cases

The numbers vary dramatically by use case. Here are estimates for common patterns based on the pricing above:

Content generation (blog posts, marketing copy)

Content generation is cheap because volume is low. Even with GPT-4o, it's pennies per article.
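A quick sketch of why, assuming a hypothetical 200-word brief and a 1,500-word article, converted to tokens with the 0.75 words-per-token rule and priced at GPT-4o's rates from the table:

```python
GPT4O_INPUT, GPT4O_OUTPUT = 2.50, 10.00  # USD per 1M tokens, from the pricing table

prompt_words, article_words = 200, 1_500       # hypothetical brief and article length
prompt_tokens = round(prompt_words / 0.75)      # ~267 tokens (0.75 words-per-token rule)
output_tokens = round(article_words / 0.75)     # 2,000 tokens

cost = (prompt_tokens * GPT4O_INPUT + output_tokens * GPT4O_OUTPUT) / 1_000_000
print(f"${cost:.4f} per article")  # $0.0207 per article
```

About two cents, and almost all of it is output tokens.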

RAG (retrieval-augmented generation)

RAG gets expensive fast because retrieved context inflates input tokens. Using GPT-4o-mini drops that to $9/day ($270/month) — a 94% reduction. For most RAG applications, the smaller model performs comparably since the answer is already in the retrieved context. See our cheapest LLM for each use case guide for specific benchmarks.
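The 94% figure follows from the rates themselves: GPT-4o-mini's input and output prices are both 6% of GPT-4o's ($0.15 vs $2.50, $0.60 vs $10.00), so the saving is the same whatever your input/output mix. A small sketch with hypothetical token mixes (`call_cost` is a hypothetical helper):

```python
def call_cost(input_tokens: int, output_tokens: int, rates: tuple[float, float]) -> float:
    """Hypothetical helper: dollars per call, with rates in USD per 1M tokens."""
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


GPT_4O = (2.50, 10.00)      # USD per 1M tokens, from the pricing table
GPT_4O_MINI = (0.15, 0.60)

# Hypothetical mixes: RAG-style (input-heavy), balanced, and output-heavy.
for inp, out in [(8_000, 300), (2_000, 2_000), (500, 5_000)]:
    ratio = call_cost(inp, out, GPT_4O_MINI) / call_cost(inp, out, GPT_4O)
    print(f"{inp:>6} in / {out:>5} out -> {1 - ratio:.0%} cheaper")
```

Each line reports 94% cheaper, because both of GPT-4o-mini's rates scale by the same factor.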

Code review / analysis

Data extraction / classification

Classification is the strongest case for small models. Output is structured and short, so the output cost is negligible.
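To see how little the output side matters, a sketch with hypothetical numbers: a 400-token input to classify and a 5-token label, priced at GPT-4o-mini's rates from the table:

```python
GPT4O_MINI_INPUT, GPT4O_MINI_OUTPUT = 0.15, 0.60  # USD per 1M tokens

input_tokens, output_tokens = 400, 5  # hypothetical: text to classify + a short label

input_cost = input_tokens * GPT4O_MINI_INPUT / 1_000_000
output_cost = output_tokens * GPT4O_MINI_OUTPUT / 1_000_000
share = output_cost / (input_cost + output_cost)
print(f"output is {share:.1%} of the cost per call")  # output is 4.8% of the cost per call
```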

Three strategies to reduce estimated costs before you ship

Once you have your estimate, you can optimize before writing any production code:

1. Pick the right model tier for each task

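Not every call needs GPT-4o or Claude Sonnet. Route simple tasks (classification, extraction, summarization) to cheaper models and reserve expensive models for tasks that need reasoning. Because GPT-4o-mini's rates are 6% of GPT-4o's, a model router that sends 70% of calls to GPT-4o-mini and 30% to GPT-4o brings your average cost to about 34% of an all-GPT-4o setup (0.7 × 0.06 + 0.3 = 0.342), a cut of roughly 66%, assuming similar token counts per call.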
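The blended-cost arithmetic behind routing, as a hypothetical helper. It assumes a routed call uses the same number of tokens on either model and uses the 6% price ratio between GPT-4o-mini and GPT-4o from the pricing table:

```python
def blended_relative_cost(share_to_cheap: float, cheap_to_expensive_ratio: float) -> float:
    """Hypothetical helper: average cost per call relative to all-expensive routing.

    Assumes a call costs the same number of tokens on either model.
    """
    return share_to_cheap * cheap_to_expensive_ratio + (1 - share_to_cheap) * 1.0


# GPT-4o-mini's rates are 6% of GPT-4o's on both input and output.
relative = blended_relative_cost(0.70, 0.06)
print(f"{relative:.3f} of the all-GPT-4o cost -> {1 - relative:.0%} saved")  # 0.342 ... 66% saved
```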

For a detailed breakdown of which model fits each use case, see our model comparison guide.

2. Shrink your system prompt

Every token in your system prompt is billed on every single call. Audit it ruthlessly:

Cutting your system prompt from 3,000 tokens to 800 tokens saves $0.0055 per GPT-4o call. At 10,000 calls/day, that's $55/day saved — $1,650/month from a one-time optimization.

3. Cap output tokens

Set max_tokens to a reasonable ceiling for each use case. If your classification task should return 50 tokens, don't leave it unbounded. A runaway response that generates 4,000 tokens costs 80x more than the expected 50 tokens on the output side.

How to track actual costs against your estimates

Estimates break. User behavior differs from assumptions. Prompts get modified. Models get updated. The only way to know your real costs is to measure every call in production.

This is what Tokonomics does. It sits between your app and any LLM provider — OpenAI, Anthropic, DeepSeek, or any OpenAI-compatible API — and records every call with exact token counts and costs. No SDK required. Change your base URL and add a Tokonomics API key. Your cost estimate becomes a living dashboard instead of a spreadsheet.

You can set budget alerts that fire at 50%, 80%, and 100% of your monthly budget. You can set hard spending caps that automatically block requests when you hit your limit. And you can break down costs by feature using custom tags so you know exactly which part of your app is spending what.
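The alert and cap logic itself is simple to reason about. The sketch below is a hypothetical illustration of threshold checks, not Tokonomics' implementation or API; `check_budget` and its arguments are invented for this example:

```python
ALERT_THRESHOLDS = (0.50, 0.80, 1.00)  # fractions of the monthly budget


def check_budget(spent: float, monthly_budget: float,
                 already_fired: set[float]) -> tuple[list[float], bool]:
    """Hypothetical helper: return (newly crossed thresholds, whether to block requests).

    Each threshold fires once; already_fired is updated in place.
    """
    fraction = spent / monthly_budget
    newly_crossed = [t for t in ALERT_THRESHOLDS if fraction >= t and t not in already_fired]
    already_fired.update(newly_crossed)
    block = fraction >= 1.0  # hard cap: stop sending requests once the budget is spent
    return newly_crossed, block


fired: set[float] = set()
print(check_budget(260.0, 500.0, fired))  # ([0.5], False)
print(check_budget(410.0, 500.0, fired))  # ([0.8], False)
print(check_budget(505.0, 500.0, fired))  # ([1.0], True)
```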

The 60-second cost estimation checklist

Before you deploy any LLM feature to production:

  1. Count your prompt tokens — use tiktoken or the 0.75 ratio
  2. Estimate output tokens — test 20 sample calls, take the average
  3. Multiply by volume — daily calls x 30 for monthly cost
  4. Add the hidden multipliers — conversation history, retries, tool schemas
  5. Compare at least 2 model tiers — the cheaper one is often good enough
  6. Set a budget cap — decide your max monthly spend before you ship

The developers who get surprised by LLM costs are the ones who skip this step. Five minutes of estimation saves thousands in unexpected charges.

For strategies beyond estimation — optimizing costs on features already in production — read our LLM cost optimization strategies guide.

Last updated June 2026. All sources retrieved June 2026.

About the author
Zouhair is the founder of Tokonomics. He built the platform after receiving a $47,000 LLM invoice that his team didn't see coming. He tracks LLM pricing changes weekly across all major providers.