TL;DR — Run a 30-minute monthly audit: pull raw numbers, compare month-over-month, break down by model and feature, flag zombie endpoints, check for model mismatches. Tools: OpenAI usage dashboard, Anthropic console, or Tokonomics. Most teams find 20–40% of spend is wasted on the first audit.
Teams that don't audit their LLM spending monthly have no idea where their money goes. They know the total — OpenAI charged $2,800 last month — but they can't tell you which feature consumed 60% of that, which model was used for tasks a cheaper model handles fine, or why spending jumped 40% from April to May.
A monthly audit takes 30 minutes. It catches three things: cost leaks (requests you didn't know were happening), model inefficiencies (expensive models used for cheap tasks), and trend shifts (gradual cost increases that compound into budget problems). Skip it, and you're managing AI costs by invoice surprise.
Here's the exact process.
Step 1: Pull the raw numbers (5 minutes)
Start with the basics. You need four numbers for the current month and the previous month:
- Total spend — what you were billed
- Total requests — how many API calls
- Average cost per request — total spend / total requests
- Total tokens — input and output, separately
Where to find them:
- OpenAI: platform.openai.com/usage — shows daily spend, model breakdown, and token counts
- Anthropic: console.anthropic.com — usage tab with per-model breakdown
- DeepSeek: API usage page in your account dashboard
If you're routing through a proxy like Tokonomics, all of this is in one dashboard regardless of how many providers you use. One view, all providers, already broken down by model, API key, and custom tags.
Write these numbers down. You'll compare them against last month.
Step 2: Month-over-month comparison (5 minutes)
Calculate the change for each metric:
| Metric | Last month | This month | Change |
|---|---|---|---|
| Total spend | $2,100 | $2,800 | +33% |
| Total requests | 185,000 | 210,000 | +14% |
| Avg cost/request | $0.0114 | $0.0133 | +17% |
| Input tokens | 92M | 118M | +28% |
| Output tokens | 31M | 38M | +23% |
The numbers that matter most are the ratios, not the totals.
Requests grew 14% but spend grew 33%. That means each request got more expensive. Why? Either you're using more expensive models, your prompts got longer, or output is longer. Dig into which.
Input tokens grew faster than requests. This usually means system prompts grew, conversation history is accumulating more turns, or RAG is retrieving more chunks per query. Each of these is fixable — see our cost optimization strategies guide.
A healthy baseline. If your spend growth matches your request growth within 5%, your cost per request is stable. That's the goal. If spend is growing faster than requests, something changed — find it.
Step 3: Model mix analysis (5 minutes)
Break down spend by model. This is where the most money hides.
| Model | Requests | Spend | Avg cost/req |
|---|---|---|---|
| GPT-4o | 45,000 | $1,890 | $0.042 |
| GPT-4o-mini | 140,000 | $546 | $0.0039 |
| Claude Sonnet 4 | 25,000 | $364 | $0.0146 |
The audit question: For each model, does the task require that model's capability?
Look at your GPT-4o usage. Those 45,000 calls cost $1,890. If even 50% of them could be handled by GPT-4o-mini (classification, simple summarization, data extraction), switching saves $870/month.
This is the single highest-ROI finding in most audits. Teams default to their best model for everything and never revisit. A model-mix audit done once saves money every month going forward.
For a detailed guide on which model fits which task, see our cheapest LLM for each use case breakdown.
Step 4: Find the zombie endpoints (5 minutes)
Zombie endpoints are API calls that still run but no longer serve a purpose. They're surprisingly common:
- Deprecated features. You removed the chatbot from the pricing page but the backend endpoint still runs, processing bot traffic.
- Test environments. A staging server is making real API calls with a production key.
- Monitoring scripts. Someone wrote a health check that sends a real prompt to GPT-4o every minute. That's 43,800 calls/month.
- Retry storms. A misconfigured retry loop sends 5 retries for every failed request, multiplying your call count by 6x.
How to find them: sort your API calls by endpoint or by API key. Look for keys with high volume but no corresponding product feature. Look for consistent, robotic patterns — exactly 1 request per minute is a cron job, not a user.
If you're tagging requests by feature (X-Metering-Tags: {"feature":"chatbot"}), zombies show up immediately as features with spend but no product value.
Step 5: Check your prompt efficiency (5 minutes)
Pull your average input token count per call. If it's climbing month over month, one of these is happening:
System prompt creep. Someone added "be more detailed" instructions, few-shot examples, or reference data to the system prompt. System prompts get longer over time because adding feels free — but every token is billed on every call. A 4,000-token system prompt costs $0.01 per GPT-4o call. Audit it. Cut what doesn't measurably improve output quality.
Conversation history bloat. Multi-turn conversations resend all prior messages. If your average conversation is growing from 3 turns to 5 turns, your input tokens per conversation roughly double. Solutions: summarize old turns, limit history window to the last N messages, or use prompt caching to reduce rebilling of repeated context.
RAG chunk inflation. Your retrieval pipeline is returning more chunks per query — either because the index grew or because similarity thresholds were lowered. More retrieved context means more input tokens. Check if output quality actually improves with the extra chunks.
Step 6: Set targets for next month (5 minutes)
Based on what you found, set 1-3 concrete targets:
- "Move classification calls from GPT-4o to GPT-4o-mini — expected savings: $400/month"
- "Trim system prompt from 3,200 to 1,500 tokens — expected savings: $180/month"
- "Revoke staging API key — expected savings: $95/month"
Write them down. Review them in next month's audit. This creates accountability. Without specific targets, audits become "interesting but not actionable."
The audit checklist
Here's the complete checklist in one view. Bookmark this for your monthly review:
Numbers pull (5 min)
- [ ] Total spend, requests, avg cost/request, tokens (this month)
- [ ] Same numbers for last month
- [ ] Calculate month-over-month % change
Trend analysis (5 min)
- [ ] Is spend growing faster than requests? If yes, investigate.
- [ ] Is average cost per request increasing? Check model mix.
- [ ] Are input tokens growing faster than output? Check prompt sizes.
Model audit (5 min)
- [ ] List spend by model
- [ ] For each expensive model: could a cheaper model handle these tasks?
- [ ] Identify top model-switch opportunity and estimate savings
Zombie hunt (5 min)
- [ ] Sort calls by API key — any keys with unexpected volume?
- [ ] Sort calls by feature tag — any deprecated features still spending?
- [ ] Check for robotic patterns (fixed intervals, identical payloads)
Prompt check (5 min)
- [ ] Compare average input tokens vs last month
- [ ] Review system prompt length — can it be shortened?
- [ ] Check conversation history settings — is context window capped?
Action items (5 min)
- [ ] Set 1-3 specific targets with dollar estimates
- [ ] Assign owners for each target
- [ ] Schedule next month's audit
Automating the boring parts
The audit steps above are manual. They work, but they rely on someone actually doing them every month. The more you can automate, the more consistently it happens.
Budget alerts handle the biggest risk — spending more than you planned. Set alerts at 50%, 80%, and 100% of your monthly budget. If your budget is $3,000/month, an 80% alert fires at $2,400 and gives you a week to investigate before you hit the cap.
Hard spending caps handle the catastrophic case — a bug that sends 100x normal volume. Caps automatically block requests when your budget is exceeded. No human intervention needed.
For the model-mix and per-feature analysis, Tokonomics does this automatically. Every API call is logged with the model, cost, tokens, and any custom tags you attach. The dashboard shows spend by model, by API key, and by feature — the same breakdowns you'd compute manually, updated in real time. The analytics endpoints also support programmatic access, so you can build your own alerting on top.
What a healthy audit looks like
After three months of auditing, you should see:
- Cost per request is stable or declining. You're optimizing model selection and prompt efficiency.
- No zombie endpoints. Every API key and feature tag maps to an active product feature.
- Model mix is intentional. You know why each model is used and what it would cost to switch.
- Budget alerts are set. You hear about cost problems before the invoice, not after.
- Targets are tracked. Last month's savings targets were implemented and measured.
The teams that do this well treat LLM spending like any other infrastructure cost — measured, budgeted, and optimized continuously. The teams that don't are the ones who post on Reddit asking why their AI bill surprised them.
Thirty minutes a month. That's the difference.
Last updated June 2026. All sources retrieved June 2026.