How do I calculate LLM API cost before deploying?

Formula: cost = (input_tokens × input_rate) + (output_tokens × output_rate). Count tokens using tiktoken (OpenAI's library) or estimate with the 0.75 rule (1 word ≈ 1.33 tokens). Test 20 sample prompts to get average output length. Multiply by expected daily volume and add a 3× safety margin for retries and conversation history.

How many tokens is 1000 words in an LLM prompt?

Approximately 1,333 tokens (1 word ≈ 1.33 tokens for English text). For code, the ratio is higher — about 1.5–2 tokens per word due to special characters and whitespace. Use tiktoken for exact counts: import tiktoken; enc = tiktoken.encoding_for_model('gpt-4o'); len(enc.encode(text)).

Why is my LLM API bill higher than my estimate?

Common causes: conversation history accumulates (each message adds all prior turns to input tokens), retries from 429 errors add extra calls, system prompts are longer than counted, and output tokens vary more than estimated. Always add a 2–3× buffer to your estimates and use a proxy like Tokonomics to track actual vs estimated costs in production.

How to Estimate LLM API Costs Before You Deploy

TL;DR: cost = (input_tokens × input_rate) + (output_tokens × output_rate). A 500-token prompt + 300-token response on GPT-4o costs (500 × $2.50 + 300 × $10.00) / 1M = $0.00425 per call. At 10,000 calls/day that's $42.50/day, or $1,275/month. Tokenize with tiktoken before deploying, not after.

Key Takeaways

Cost per call = (input tokens × input price) + (output tokens × output price) — the universal formula for every LLM provider

A 500-token prompt + 300-token response on GPT-4o costs $0.00425; at 10K calls/day that's $1,275/month

Use tiktoken (OpenAI) or Anthropic's token-counting API to count tokens before deploying, not after the invoice arrives

System prompts are billed on every call: a 2,000-token system prompt at GPT-4o rates costs $0.005 per call, or $500/month at 100K requests per month

Most developers don't estimate LLM costs before deploying. They build the feature, ship it, and find out what it costs when the invoice arrives. That's how a chatbot that "should cost nothing" ends up at $3,400/month.

The math isn't hard. You need three numbers: how many tokens your prompt uses, how many tokens the response generates, and what your provider charges per token. Multiply, add, and scale by your expected volume. This article gives you the exact formulas, tools, and worked examples to estimate costs before you spend a dollar.

What is the cost formula every developer needs?

Every LLM API call has two billable components: input tokens (your prompt) and output tokens (the model's response). The formula:

Cost per call = (input tokens x input price) + (output tokens x output price)

Prices are quoted per million tokens. Here are the current rates for the most popular models as of June 2026:

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-4o	$2.50	$10.00
GPT-4o-mini	$0.15	$0.60
Claude Sonnet 4	$3.00	$15.00
Claude Haiku 3.5	$0.80	$4.00
DeepSeek V3	$0.27	$1.10
Gemini 2.5 Flash	$0.15	$0.60
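To run the formula against the table above, here's a minimal Python sketch. The dictionary keys are informal labels chosen for this example, not official API model IDs, and the rates are the June 2026 figures from the table, so update them when prices change.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
# Keys are informal labels for this sketch, not official API model IDs.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "deepseek-v3": (0.27, 1.10),
    "gemini-2.5-flash": (0.15, 0.60),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in USD: (input x input_rate + output x output_rate) / 1M."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


per_call = cost_per_call("gpt-4o", input_tokens=500, output_tokens=300)
print(f"per call: ${per_call:.5f}")  # per call: $0.00425
print(f"per month at 10K calls/day: ${per_call * 10_000 * 30:,.2f}")
```

Plug in your own token counts (from tiktoken or the ratio rule) and the same function covers every row in the table.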

For the full pricing breakdown across all providers, see our LLM API Pricing Guide 2026.

How do you count tokens before sending the request?

Tokens are not words. A token is roughly 3/4 of a word in English. A 500-word prompt is approximately 670 tokens. A 1,000-word response is about 1,330 tokens.

Three ways to count tokens before running a prompt:

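1. Use OpenAI's tiktoken library. It implements the tokenizers OpenAI's models use, so it gives exact counts for the text you pass in (chat formatting adds a few extra tokens per message). Install it with pip install tiktoken, then: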

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Your prompt text here")
print(len(tokens))  # exact token count

2. Use the rough ratio. For quick estimates, divide your word count by 0.75. A 400-word prompt is roughly 530 tokens. This is accurate within 10-15% for English text. Code and non-English text use more tokens per word.
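If you only have a word count, the ratio rule is a one-liner. This sketch splits on whitespace, which is a simplification that undercounts punctuation-heavy text and code, so treat the result as a ballpark rather than a tokenizer replacement.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token count: words / 0.75, i.e. about 1.33 tokens per English word.

    Splitting on whitespace is a simplification; code and non-English text
    usually need more tokens than this predicts.
    """
    return round(len(text.split()) / words_per_token)


prompt = " ".join(["word"] * 400)  # stand-in for a 400-word prompt
print(estimate_tokens(prompt))  # 533
```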

3. Use a tokenizer playground. Paste your prompt into our free LLM token counter or OpenAI's tokenizer to see the exact count. This works for quick checks but doesn't scale for automation.

For Anthropic models, the tokenizer is different, so tiktoken counts are only approximate; Anthropic's token-counting API returns exact counts, but the 0.75 ratio is close enough for estimation. The exact count only matters at scale: a 5% difference on a single call is fractions of a cent.

What does a customer support chatbot actually cost?

Let's estimate costs for a real use case. You're building a customer support chatbot that handles 500 conversations per day. Each conversation averages 3 back-and-forth exchanges.

Your inputs:

System prompt: 800 tokens (fixed, sent every call)
Average user message: 150 tokens
Average assistant response: 400 tokens
Conversation history grows with each turn

Per-conversation token math (3 turns):

Turn	Input tokens	Output tokens
Turn 1	800 + 150 = 950	400
Turn 2	800 + 150 + 400 + 150 = 1,500	400
Turn 3	800 + 150 + 400 + 150 + 400 + 150 = 2,050	400
Total	4,500	1,200

Notice how input tokens grow with each turn because you're resending the conversation history. This is the cost multiplier most developers miss.
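The turn table follows a simple pattern: turn k resends the system prompt, all k user messages so far, and the k-1 earlier replies. A short sketch, assuming every message is exactly its average length:

```python
def per_turn_input_tokens(system: int, user: int, assistant: int, turns: int) -> list[int]:
    """Input tokens billed on each turn when the full history is resent.

    Turn k carries the system prompt, the k user messages so far, and the
    k-1 earlier assistant replies (every message assumed to be average length).
    """
    return [system + k * user + (k - 1) * assistant for k in range(1, turns + 1)]


inputs = per_turn_input_tokens(system=800, user=150, assistant=400, turns=3)
print(inputs, sum(inputs))  # [950, 1500, 2050] 4500
```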

Daily cost at different model tiers:

Model	Daily cost (500 conversations)
GPT-4o	(4,500 x $2.50 + 1,200 x $10.00) / 1M x 500 = $11.63
GPT-4o-mini	(4,500 x $0.15 + 1,200 x $0.60) / 1M x 500 = $0.70
Claude Sonnet 4	(4,500 x $3.00 + 1,200 x $15.00) / 1M x 500 = $15.75
DeepSeek V3	(4,500 x $0.27 + 1,200 x $1.10) / 1M x 500 = $1.27

The difference between GPT-4o and GPT-4o-mini is $10.93 per day — roughly $328/month. For a customer support chatbot where GPT-4o-mini handles 90% of queries adequately, that's $328/month saved with a single model switch.
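To rerun the daily comparison with your own volume or prices, here's a sketch that hard-codes the per-conversation totals from the turn table and the rates from the pricing table. Both are assumptions you should replace with measured numbers.

```python
# USD per 1M tokens (input, output), from the pricing table in this article.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
    "DeepSeek V3": (0.27, 1.10),
}

INPUT_PER_CONVERSATION = 4_500   # 950 + 1,500 + 2,050 from the turn table
OUTPUT_PER_CONVERSATION = 1_200  # 3 replies x 400 tokens
CONVERSATIONS_PER_DAY = 500


def daily_cost(input_rate: float, output_rate: float) -> float:
    """Daily USD cost for the chatbot workload at the given per-1M-token rates."""
    per_conversation = (INPUT_PER_CONVERSATION * input_rate
                        + OUTPUT_PER_CONVERSATION * output_rate) / 1_000_000
    return per_conversation * CONVERSATIONS_PER_DAY


for model, (input_rate, output_rate) in PRICES.items():
    day = daily_cost(input_rate, output_rate)
    print(f"{model:16} ${day:6.2f}/day  ${day * 30:8.2f}/month")
```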

What hidden costs break your estimate?

The formula above covers the base case. Real production usage has cost multipliers that inflate your estimate:

Conversation history accumulation. As shown above, multi-turn conversations resend all prior messages. A 10-turn conversation sends your system prompt 10 times. At scale, this is often 3-5x more expensive than single-turn estimates suggest.
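The 3-5x range depends on how many turns your users actually take. Summing the per-turn inputs gives a closed form, N·S + U·N(N+1)/2 + A·N(N-1)/2 for system prompt S, user message U and reply A, which this sketch compares against a naive estimate that prices every turn like the first. With the chatbot numbers above (800/150/400), a 10-turn conversation bills 34,250 input tokens against a naive 9,500, about 3.6x. It assumes every message is exactly average length.

```python
def total_input_tokens(system: int, user: int, assistant: int, turns: int) -> int:
    """Input tokens billed across an N-turn conversation with full history resent.

    Turn k sends S + k*U + (k-1)*A, so summing k = 1..N gives
    N*S + U*N(N+1)/2 + A*N(N-1)/2.
    """
    n = turns
    return n * system + user * n * (n + 1) // 2 + assistant * n * (n - 1) // 2


def history_multiplier(system: int, user: int, assistant: int, turns: int) -> float:
    """Billed input tokens divided by a naive estimate that prices every turn like turn 1."""
    naive = turns * (system + user)
    return total_input_tokens(system, user, assistant, turns) / naive


for n in (1, 3, 10, 20):
    print(f"{n:>2} turns: {history_multiplier(800, 150, 400, n):.2f}x")
```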

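Retry logic. If your app retries failed requests, each retry is a full billable call. If 5% of requests fail and each failure is retried twice, that's 0.05 × 2 = 10% more calls in the worst case; if retries stop at the first success, the overhead is closer to 5%.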
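How much retries add depends on your retry policy. This sketch assumes, as the paragraph above does, that every attempt is billed as a full call and that failures are independent; both are simplifications, since a request rejected before the model runs may not be billed at all.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected billable attempts per request when retries stop at the first success.

    Assumes every attempt is billed as a full call and failures are independent:
    attempt k+1 only happens if the first k failed, so E = 1 + p + p^2 + ... + p^r.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def worst_case_attempts(failure_rate: float, max_retries: int) -> float:
    """Upper bound: every request that fails once goes on to use all its retries."""
    return 1 + failure_rate * max_retries


p, r = 0.05, 2
print(f"expected overhead:   {expected_attempts(p, r) - 1:.2%}")
print(f"worst-case overhead: {worst_case_attempts(p, r) - 1:.2%}")
```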

System prompt size. Some teams stuff entire knowledge bases into the system prompt. A 4,000-token system prompt costs $0.01 per call on GPT-4o. At 10,000 calls/day, that's $100/day just for the system prompt — before the user says anything.

Output variability. You control your prompt length. You don't control output length. Setting max_tokens caps the ceiling but the model may generate anywhere from 50 to 2,000 tokens per response. Estimate using your average observed output, not the max.
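A quick way to get that average, plus a sanity check for max_tokens, is to log output token counts from a batch of test calls. The 20 counts below are made-up placeholders; in practice you'd collect them from each response's usage data, for example usage.completion_tokens in OpenAI's Chat Completions responses.

```python
import statistics

# Placeholder output token counts from 20 hypothetical test calls.
observed_output_tokens = [
    310, 275, 420, 390, 180, 505, 330, 298, 610, 350,
    265, 440, 372, 295, 520, 340, 410, 288, 360, 305,
]

mean_tokens = statistics.mean(observed_output_tokens)
# quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
p95_tokens = statistics.quantiles(observed_output_tokens, n=20)[-1]

print(f"mean: {mean_tokens:.0f} tokens (use for the cost estimate)")
print(f"p95:  {p95_tokens:.0f} tokens (use to sanity-check max_tokens)")
```

Price your estimate on the mean; the high percentile tells you whether your max_tokens setting would truncate normal responses.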

Function calling and tool use. Tool definitions are injected as tokens. A complex schema with 10 functions can add 2,000-3,000 tokens to every request. These tokens appear in your input count even when no function is called.
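System prompts and tool schemas behave the same way: a fixed block of input tokens billed on every call. This sketch prices that overhead per day; the 2,500-token tool schema is an assumed midpoint of the 2,000-3,000 range above.

```python
def fixed_overhead_per_day(overhead_tokens: int, calls_per_day: int,
                           input_rate_per_m: float) -> float:
    """Daily USD spent on tokens resent with every call (system prompt, tool schemas)."""
    return overhead_tokens * input_rate_per_m / 1_000_000 * calls_per_day


GPT_4O_INPUT = 2.50  # USD per 1M input tokens, from the pricing table

system_prompt = fixed_overhead_per_day(4_000, 10_000, GPT_4O_INPUT)
tool_schemas = fixed_overhead_per_day(2_500, 10_000, GPT_4O_INPUT)  # assumed midpoint
print(f"system prompt: ${system_prompt:.2f}/day, tool schemas: ${tool_schemas:.2f}/day")
```

The same function covers the prompt-trimming savings in strategy 2 later in this article: fixed_overhead_per_day(3_000 - 800, 10_000, 2.50) comes to $55/day.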

How do you estimate costs for different use cases?

The numbers vary dramatically by use case. Here are estimates for common patterns based on the pricing above:

Content generation (blog posts, marketing copy)

Input: ~2,000 tokens (instructions + context)
Output: ~3,000 tokens (a ~2,000-word article)
Cost per piece with GPT-4o: $0.035
At 100 articles/month: $3.50/month

Content generation is cheap because volume is low. Even with GPT-4o, it's pennies per article.

RAG (retrieval-augmented generation)

Input: ~4,000 tokens (query + 3-5 retrieved chunks + system prompt)
Output: ~500 tokens
Cost per query with GPT-4o: $0.015
At 10,000 queries/day: $150/day ($4,500/month)

RAG gets expensive fast because retrieved context inflates input tokens. Using GPT-4o-mini drops that to $9/day ($270/month) — a 94% reduction. For most RAG applications, the smaller model performs comparably since the answer is already in the retrieved context. See our cheapest LLM for each use case guide for specific benchmarks.

Code review / analysis

Input: ~6,000 tokens (code file + instructions)
Output: ~1,000 tokens (review comments)
Cost per review with Claude Sonnet 4: $0.033
At 200 reviews/day: $6.60/day ($198/month)

Data extraction / classification

Input: ~1,000 tokens (document snippet + schema)
Output: ~100 tokens (structured JSON)
Cost per extraction with GPT-4o-mini: $0.00021
At 50,000 extractions/day: $10.50/day ($315/month)

Classification is the strongest case for small models. Output is structured and short, so the output side stays small: 100 output tokens add $0.00006 to a $0.00021 call.
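The four estimates above differ only in their inputs, so they fit one small table-driven script. This is a sketch: token counts and volumes are the rough figures from this section, content generation's 100 articles/month is spread over a 30-day month, and the RAG row is repeated on GPT-4o-mini to show the switch.

```python
from dataclasses import dataclass

# USD per 1M tokens (input, output), from the pricing table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
}


@dataclass
class UseCase:
    name: str
    model: str
    input_tokens: int
    output_tokens: int
    calls_per_day: float

    def cost_per_call(self) -> float:
        input_rate, output_rate = PRICES[self.model]
        return (self.input_tokens * input_rate
                + self.output_tokens * output_rate) / 1_000_000

    def monthly_cost(self, days: int = 30) -> float:
        return self.cost_per_call() * self.calls_per_day * days


USE_CASES = [
    UseCase("Content generation", "GPT-4o", 2_000, 3_000, 100 / 30),  # 100/month
    UseCase("RAG", "GPT-4o", 4_000, 500, 10_000),
    UseCase("RAG (mini)", "GPT-4o-mini", 4_000, 500, 10_000),
    UseCase("Code review", "Claude Sonnet 4", 6_000, 1_000, 200),
    UseCase("Extraction", "GPT-4o-mini", 1_000, 100, 50_000),
]

for uc in USE_CASES:
    print(f"{uc.name:20} ${uc.cost_per_call():.5f}/call  ${uc.monthly_cost():,.2f}/month")
```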

What are three strategies to cut costs before launch?

Once you have your estimate, you can optimize before writing any production code:

1. Pick the right model tier for each task

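Not every call needs GPT-4o or Claude Sonnet. Route simple tasks (classification, extraction, summarization) to cheaper models and reserve expensive models for tasks that need reasoning. A model router that sends 70% of calls to GPT-4o-mini and 30% to GPT-4o cuts your average cost by about 66% compared with sending everything to GPT-4o, because GPT-4o-mini costs 6% of GPT-4o's rate on both input and output (0.7 × 0.06 + 0.3 = 0.342 of the original cost).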
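Routing math is a weighted average of the two per-call costs. A minimal sketch, assuming a 500-token prompt, a 300-token response and a 70/30 split; in practice the split is whatever your router actually achieves.

```python
# USD per 1M tokens (input, output), from the pricing table in this article.
GPT_4O = (2.50, 10.00)
GPT_4O_MINI = (0.15, 0.60)


def call_cost(rates: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the given per-1M-token rates."""
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def blended_cost(share_small: float, input_tokens: int, output_tokens: int) -> float:
    """Average cost per call when `share_small` of traffic goes to GPT-4o-mini."""
    return (share_small * call_cost(GPT_4O_MINI, input_tokens, output_tokens)
            + (1 - share_small) * call_cost(GPT_4O, input_tokens, output_tokens))


baseline = call_cost(GPT_4O, 500, 300)
routed = blended_cost(0.70, 500, 300)
print(f"savings vs all-GPT-4o: {1 - routed / baseline:.1%}")
```

Because GPT-4o-mini's rates in the pricing table are exactly 6% of GPT-4o's on both input and output, this split saves 65.8% whatever token counts you plug in.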

For a detailed breakdown of which model fits each use case, see our model comparison guide.

2. Shrink your system prompt

Every token in your system prompt is billed on every single call. Audit it ruthlessly:

Remove example outputs that the model doesn't need
Replace verbose instructions with concise ones
Move static reference data to RAG retrieval instead of stuffing it in the prompt
Use prompt caching if your provider supports it

Cutting your system prompt from 3,000 tokens to 800 tokens saves $0.0055 per GPT-4o call. At 10,000 calls/day, that's $55/day saved — $1,650/month from a one-time optimization.

3. Cap output tokens

Set max_tokens to a reasonable ceiling for each use case. If your classification task should return 50 tokens, don't leave it unbounded. A runaway response that generates 4,000 tokens costs 80x more than the expected 50 tokens on the output side.
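A cap also gives you a hard ceiling on spend, because no call can bill more output than its max_tokens allowance. This sketch uses GPT-4o-mini rates and a hypothetical classification workload (1,000-token prompt, 50,000 calls/day, an assumed cap of 100 tokens) to compare the output-side blowup and the worst-case daily bill against every call running to 4,000 tokens.

```python
def worst_case_daily_cost(input_tokens: int, max_output_tokens: int, calls_per_day: int,
                          input_rate: float, output_rate: float) -> float:
    """Upper bound on daily spend if every call used its full output allowance.

    Rates are USD per 1M tokens.
    """
    per_call_ceiling = (input_tokens * input_rate
                        + max_output_tokens * output_rate) / 1_000_000
    return per_call_ceiling * calls_per_day


INPUT_RATE, OUTPUT_RATE = 0.15, 0.60  # GPT-4o-mini, from the pricing table

# Output side only: a 4,000-token runaway versus the expected 50 tokens.
runaway_vs_expected = (4_000 * OUTPUT_RATE) / (50 * OUTPUT_RATE)

# Hypothetical workload: 1,000-token prompt, 50,000 calls/day.
capped = worst_case_daily_cost(1_000, 100, 50_000, INPUT_RATE, OUTPUT_RATE)
runaway = worst_case_daily_cost(1_000, 4_000, 50_000, INPUT_RATE, OUTPUT_RATE)

print(f"runaway output costs {runaway_vs_expected:.0f}x the expected 50 tokens")
print(f"worst case with max_tokens=100: ${capped:,.2f}/day")
print(f"if every call ran to 4,000 tokens: ${runaway:,.2f}/day")
```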

How do you track actual costs against estimates?

Estimates break. User behavior differs from assumptions. Prompts get modified. Models get updated. The only way to know your real costs is to measure every call in production.

This is what Tokonomics does. It sits between your app and any LLM provider — OpenAI, Anthropic, DeepSeek, or any OpenAI-compatible API — and records every call with exact token counts and costs. No SDK required. Change your base URL and add a Tokonomics API key. Your cost estimate becomes a living dashboard instead of a spreadsheet.

You can set budget alerts that fire at 50%, 80%, and 100% of your monthly budget. You can set hard spending caps that automatically block requests when you hit your limit. And you can break down costs by feature using custom tags so you know exactly which part of your app is spending what.
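If you want an in-process guard before adding a proxy, the same alert-and-cap idea fits in a small class. This is a hypothetical sketch, not Tokonomics' API: it keeps spend in memory for one process, so it won't survive restarts or aggregate across servers.

```python
class BudgetTracker:
    """Hypothetical in-process spend tracker (not the Tokonomics API).

    Fires each alert threshold once and refuses new calls once the cap is reached.
    """

    def __init__(self, monthly_budget_usd: float, thresholds=(0.5, 0.8, 1.0)):
        self.budget = monthly_budget_usd
        self.thresholds = sorted(thresholds)
        self.spent = 0.0
        self.fired: set[float] = set()

    def can_spend(self) -> bool:
        """Hard cap: block new requests once recorded spend reaches the budget."""
        return self.spent < self.budget

    def record(self, cost_usd: float) -> list[float]:
        """Add one call's cost and return thresholds crossed for the first time."""
        self.spent += cost_usd
        newly_crossed = [t for t in self.thresholds
                         if t not in self.fired and self.spent >= t * self.budget]
        self.fired.update(newly_crossed)
        return newly_crossed


tracker = BudgetTracker(monthly_budget_usd=100.0)
for cost in (60.0, 25.0, 20.0):
    if not tracker.can_spend():
        print("blocked: monthly cap reached")
        break
    for t in tracker.record(cost):
        print(f"alert: {t:.0%} of budget used (${tracker.spent:.2f})")
print("can spend:", tracker.can_spend())
```

The check happens before each call, so the final call can overshoot the cap by its own cost; a proxy that enforces caps server-side has the same issue unless it reserves budget up front.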

Frequently Asked Questions

How accurate are LLM cost estimates before production?

Initial estimates are typically 2-3x lower than actual production costs. Conversation history accumulation, retries from rate limits, and variable output lengths all inflate the real bill. Always apply a 3x safety margin to your estimate. Teams that set budget alerts catch the gap before it becomes a surprise invoice.

What's the cheapest way to estimate token counts without calling the API?

Use OpenAI's tiktoken library (free, open-source) for exact counts, or apply the 0.75 rule: 1 token is roughly 0.75 English words. For a 1,000-word prompt, estimate about 1,333 tokens. Code is denser, closer to 1.5-2 tokens per word. Both methods cost nothing and take seconds.

Should I estimate costs per API call or per user session?

Per user session gives you the number that matters for unit economics. A single chatbot session with 10 conversation turns can accumulate 15,000+ input tokens from history alone. According to OpenAI's pricing page, GPT-4o charges $2.50 per million input tokens, making a 10-turn session cost roughly $0.04 in input alone.

What is the 60-second cost estimation checklist?

Before you deploy any LLM feature to production:

Count your prompt tokens — use tiktoken or the 0.75 ratio
Estimate output tokens — test 20 sample calls, take the average
Multiply by volume — daily calls x 30 for monthly cost
Add the hidden multipliers — conversation history, retries, tool schemas
Compare at least 2 model tiers — the cheaper one is often good enough
Set a budget cap — decide your max monthly spend before you ship
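The checklist reduces to one function you can run before shipping. A sketch, assuming a flat 3x safety margin (the upper end of the 2-3x buffer suggested earlier), a 30-day month, and a hypothetical feature with 500-token prompts, 300-token average outputs and 10,000 calls/day, priced on two tiers side by side.

```python
# USD per 1M tokens (input, output) for two tiers, from the pricing table above.
TIERS = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
}


def monthly_estimate(prompt_tokens: int, avg_output_tokens: float, calls_per_day: int,
                     rates: tuple[float, float], safety_margin: float = 3.0,
                     days: int = 30) -> float:
    """Base monthly cost (per-call cost x daily calls x days) times a safety margin.

    The margin stands in for history growth, retries and tool schemas; 3.0 is
    an assumption, so replace it with a measured ratio once you have production data.
    """
    input_rate, output_rate = rates
    per_call = (prompt_tokens * input_rate + avg_output_tokens * output_rate) / 1_000_000
    return per_call * calls_per_day * days * safety_margin


for name, rates in TIERS.items():
    estimate = monthly_estimate(500, 300, 10_000, rates)
    print(f"{name:<12} ~${estimate:,.2f}/month including a 3x margin")
```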

The developers who get surprised by LLM costs are the ones who skip this step. Five minutes of estimation saves thousands in unexpected charges.

For strategies beyond estimation — optimizing costs on features already in production — read our LLM cost optimization strategies guide.

Last updated June 2026. All sources retrieved June 2026.