What is a zombie endpoint in LLM API auditing?

A zombie endpoint is an API key or integration that keeps making LLM calls for a feature that was deprecated, never launched, or is no longer used. They're common in fast-moving teams and can represent 10–30% of total API spend.

What tools can help automate LLM spend auditing?

OpenAI's usage dashboard and Anthropic's console show total spend but not per-feature breakdowns. For per-model, per-feature, and per-team analytics in one place, a proxy like Tokonomics logs every call with cost, model, and custom tags automatically.

How to Audit Your LLM API Spending Monthly

TL;DR — Run a 30-minute monthly audit: pull raw numbers, compare month-over-month, break down by model and feature, flag zombie endpoints, check for model mismatches. Tools: OpenAI usage dashboard, Anthropic console, or Tokonomics. Most teams find 20–40% of spend is wasted on the first audit.

Key Takeaways

72% of organizations use AI but most lack per-feature cost controls (McKinsey, 2024)

A monthly audit catches 3 things: cost leaks, model inefficiencies, and trend shifts before they compound

Most teams find 20–40% of AI spend is wasted on their first audit — zombie endpoints and model mismatches

Track 4 numbers monthly: total spend, total requests, average cost per request, and total tokens (input vs output)

Teams that don't audit their LLM spending monthly have no idea where their money goes. They know the total — OpenAI charged $2,800 last month — but they can't tell you which feature consumed 60% of that, which model was used for tasks a cheaper model handles fine, or why spending jumped 40% from April to May. According to McKinsey's State of AI 2024 report, 72% of organizations have now adopted AI in at least one business function — yet most lack the financial controls to manage what that adoption costs at the feature or team level.

A monthly audit takes 30 minutes. It catches three things: cost leaks (requests you didn't know were happening), model inefficiencies (expensive models used for cheap tasks), and trend shifts (gradual cost increases that compound into budget problems). Skip it, and you're managing AI costs by invoice surprise.

Here's the exact process.

How do you pull the raw spending numbers?

Start with the basics. You need four numbers for the current month and the previous month:

Total spend — what you were billed
Total requests — how many API calls
Average cost per request — total spend / total requests
Total tokens — input and output, separately

Where to find them:

OpenAI: platform.openai.com/usage — shows daily spend, model breakdown, and token counts
Anthropic: console.anthropic.com — usage tab with per-model breakdown
DeepSeek: API usage page in your account dashboard

If you're routing through a proxy like Tokonomics, all of this is in one dashboard regardless of how many providers you use. One view, all providers, already broken down by model, API key, and custom tags.

Write these numbers down. You'll compare them against last month.

How do you compare spending month over month?

Calculate the change for each metric:

Metric	Last month	This month	Change
Total spend	$2,100	$2,800	+33%
Total requests	185,000	210,000	+14%
Avg cost/request	$0.0114	$0.0133	+17%
Input tokens	92M	118M	+28%
Output tokens	31M	38M	+23%

The numbers that matter most are the ratios, not the totals.

Requests grew 14% but spend grew 33%. That means each request got more expensive. Why? Either you're using more expensive models, your prompts got longer, or output is longer. Dig into which.

Input tokens grew faster than requests. This usually means system prompts grew, conversation history is accumulating more turns, or RAG is retrieving more chunks per query. Each of these is fixable — see our cost optimization strategies guide.

A healthy baseline. If your spend growth matches your request growth within 5%, your cost per request is stable. That's the goal. If spend is growing faster than requests, something changed — find it.

How do you analyze your model mix?

Break down spend by model. This is where the most money hides.

Model	Requests	Spend	Avg cost/req
GPT-4o	45,000	$1,890	$0.042
GPT-4o-mini	140,000	$546	$0.0039
Claude Sonnet 4	25,000	$364	$0.0146

The audit question: For each model, does the task require that model's capability?

Look at your GPT-4o usage. Those 45,000 calls cost $1,890. If even 50% of them could be handled by GPT-4o-mini (classification, simple summarization, data extraction), switching saves $870/month.

This is the single highest-ROI finding in most audits. Teams default to their best model for everything and never revisit. A model-mix audit done once saves money every month going forward. Gartner (2024) reports that 30% of generative AI projects will be abandoned after initial proof-of-concept, with uncontrolled inference costs cited as a primary driver — a fate avoidable with regular model-mix reviews.

For a detailed guide on which model fits which task, see our cheapest LLM for each use case breakdown.

How do you find zombie endpoints wasting money?

Zombie endpoints are API calls that still run but no longer serve a purpose. They're surprisingly common:

Deprecated features. You removed the chatbot from the pricing page but the backend endpoint still runs, processing bot traffic.
Test environments. A staging server is making real API calls with a production key. If you haven't separated prod vs dev spending, these calls inflate your production cost numbers.
Monitoring scripts. Someone wrote a health check that sends a real prompt to GPT-4o every minute. That's 43,800 calls/month.
Retry storms. A misconfigured retry loop sends 5 retries for every failed request, multiplying your call count by 6x.

How to find them: sort your API calls by endpoint or by API key. Look for keys with high volume but no corresponding product feature. Look for consistent, robotic patterns — exactly 1 request per minute is a cron job, not a user.

If you're tagging requests by feature (X-Metering-Tags: {"feature":"chatbot"}), zombies show up immediately as features with spend but no product value.

How do you check prompt efficiency?

Pull your average input token count per call. If it's climbing month over month, one of these is happening:

System prompt creep. Someone added "be more detailed" instructions, few-shot examples, or reference data to the system prompt. System prompts get longer over time because adding feels free — but every token is billed on every call. The Flexera 2025 State of the Cloud Report found that 82% of enterprises cite cloud cost management as their top challenge, with uncontrolled resource growth — including AI token consumption — as the leading cause of budget overruns. A 4,000-token system prompt costs $0.01 per GPT-4o call. Audit it. Cut what doesn't measurably improve output quality.

Conversation history bloat. Multi-turn conversations resend all prior messages. If your average conversation is growing from 3 turns to 5 turns, your input tokens per conversation roughly double. Solutions: summarize old turns, limit history window to the last N messages, or use prompt caching to reduce rebilling of repeated context.

RAG chunk inflation. Your retrieval pipeline is returning more chunks per query — either because the index grew or because similarity thresholds were lowered. More retrieved context means more input tokens. Check if output quality actually improves with the extra chunks.

How do you set cost targets for next month?

Based on what you found, set 1-3 concrete targets:

"Move classification calls from GPT-4o to GPT-4o-mini — expected savings: $400/month"
"Trim system prompt from 3,200 to 1,500 tokens — expected savings: $180/month"
"Revoke staging API key — expected savings: $95/month"

Write them down. Review them in next month's audit. This creates accountability. Without specific targets, audits become "interesting but not actionable."

What should your audit checklist include?

Here's the complete checklist in one view. Bookmark this for your monthly review:

Numbers pull (5 min)

[ ] Total spend, requests, avg cost/request, tokens (this month)
[ ] Same numbers for last month
[ ] Calculate month-over-month % change

Trend analysis (5 min)

[ ] Is spend growing faster than requests? If yes, investigate.
[ ] Is average cost per request increasing? Check model mix.
[ ] Are input tokens growing faster than output? Check prompt sizes.

Model audit (5 min)

[ ] List spend by model
[ ] For each expensive model: could a cheaper model handle these tasks?
[ ] Identify top model-switch opportunity and estimate savings

Zombie hunt (5 min)

[ ] Sort calls by API key — any keys with unexpected volume?
[ ] Sort calls by feature tag — any deprecated features still spending?
[ ] Check for robotic patterns (fixed intervals, identical payloads)

Prompt check (5 min)

[ ] Compare average input tokens vs last month
[ ] Review system prompt length — can it be shortened?
[ ] Check conversation history settings — is context window capped?

Action items (5 min)

[ ] Set 1-3 specific targets with dollar estimates
[ ] Assign owners for each target
[ ] Schedule next month's audit

How do you automate the boring parts?

The audit steps above are manual. They work, but they rely on someone actually doing them every month. The more you can automate, the more consistently it happens.

Budget alerts handle the biggest risk — spending more than you planned. Set alerts at 50%, 80%, and 100% of your monthly budget. If your budget is $3,000/month, an 80% alert fires at $2,400 and gives you a week to investigate before you hit the cap.

Hard spending caps handle the catastrophic case — a bug that sends 100x normal volume. Caps automatically block requests when your budget is exceeded. No human intervention needed.

For the model-mix and per-feature analysis, Tokonomics does this automatically. Every API call is logged with the model, cost, tokens, and any custom tags you attach. The dashboard shows spend by model, by API key, and by feature — the same breakdowns you'd compute manually, updated in real time. The analytics endpoints also support programmatic access, so you can build your own alerting on top.

What does a healthy audit look like?

After three months of auditing, you should see:

Cost per request is stable or declining. You're optimizing model selection and prompt efficiency.
No zombie endpoints. Every API key and feature tag maps to an active product feature.
Model mix is intentional. You know why each model is used and what it would cost to switch.
Budget alerts are set. You hear about cost problems before the invoice, not after.
Targets are tracked. Last month's savings targets were implemented and measured.

The teams that do this well treat LLM spending like any other infrastructure cost — measured, budgeted, and optimized continuously. The teams that don't are the ones who post on Reddit asking why their AI bill surprised them.

Thirty minutes a month. That's the difference.

Frequently Asked Questions

How often should I audit my LLM API spending?

Monthly is the minimum for most teams. If you're spending over $5,000/month, consider weekly reviews. A 30-minute monthly audit catches cost leaks, zombie endpoints, and model inefficiencies before they compound. According to a16z, AI infrastructure costs represent 20-40% of revenue for AI-heavy startups, making regular audits essential.

What are zombie endpoints in LLM API auditing?

Zombie endpoints are API keys or integrations that keep making LLM calls for deprecated or unused features. They're surprisingly common in fast-moving teams. We've seen cases where zombie endpoints represent 10-30% of total API spend. Check your per-feature cost breakdown to spot them quickly.

Can I automate LLM spend auditing?

Yes, partially. Provider dashboards show total spend, but they don't break costs down by feature or team. A proxy-based approach logs every call with model, cost, and custom tags automatically. You still need a human to interpret the data and make optimization decisions, but data collection can be fully automated.

What does a healthy LLM audit result look like?

After three months of auditing, you should see stable or declining cost per request, zero zombie endpoints, intentional model selection for each feature, and active budget alerts. If your month-over-month cost is growing slower than your user growth, your unit economics are improving.

Last updated June 2026. All sources retrieved June 2026. Key external sources: McKinsey State of AI 2024 | Flexera 2025 State of the Cloud Report | Gartner GenAI Predictions 2024.