AI Cost Tracking for Agencies: Bill Clients for LLM Usage

AI agencies without per-client cost attribution undercharge for AI-assisted work by an average of 34% (Deloitte, 2024). A metering proxy assigns every LLM API call to a specific client via HTTP header tags, enabling accurate billing, margin protection, and real-time spend visibility per account.

Agencies are absorbing AI costs that should land on client invoices — and most don't realize how quickly this adds up. A 2024 report by Deloitte found that professional services firms offering AI-assisted work undercharged for it by an average of 34%, primarily because they lacked per-client cost attribution. When you can't see what each client costs you, you can't bill them accurately.

Before building a client billing system, audit your full LLM spend systematically to understand your baseline.

TL;DR: Agencies offering AI-assisted work undercharge by an average of 34% because they lack per-client cost attribution (Deloitte, 2024). A metering proxy tags every LLM call by client via one HTTP header, enabling accurate billing, margin protection, and real-time spend visibility per account.

Key Takeaways

Professional services firms undercharge for AI work by an average of 34% due to poor cost attribution (Deloitte, 2024)

Tokonomics tags every LLM call by client using the X-Metering-Tags header — one line of code

Per-client dashboards show spend, token usage, model breakdown, and budget consumed

Budget alerts fire per-client when spending crosses a threshold

CSV export gives you clean data for any invoicing tool

Agency team reviewing per-client AI cost analytics dashboard with bar charts and spending breakdowns on large monitors

Why is AI cost tracking so hard for agencies?

Every time your team generates copy, summarizes a brief, writes code, or creates a chatbot for a client, you're burning API tokens. Those tokens cost money. According to McKinsey (2024), 75% of generative AI's value will flow through professional services and creative industries, making cost attribution essential for any agency offering AI-powered deliverables. If you're not tracking which client triggered which calls, you're pooling all that cost into a single unattributed line on your own books.

A Forrester (2025) survey found that 62% of agencies using generative AI lack per-project cost tracking. A mid-size agency running AI-assisted content production for 12 clients might spend $800-$2,000/month on LLM APIs. Without per-client tracking, that entire amount gets absorbed as overhead. With tracking, you discover that 3 clients account for 70% of the spend. You bill those three accurately, recover the cost, and your AI work becomes profitable.

The problem isn't the cost. The problem is invisibility.

We've talked to agency founders who assumed their AI costs were negligible — "just a few API calls" — and then found their monthly OpenAI bill had climbed past $1,200 without a single client invoice reflecting it. Per-client tagging is the first step to making that cost visible and recoverable.

How Tagging Works: One Header, Per Request

A metering tag is a key-value label attached to each API call that identifies the client, project, or environment responsible for the request — passed via the X-Metering-Tags HTTP header. Tokonomics uses a request header called X-Metering-Tags to attach metadata to every proxied LLM call. You set this header in your application code when making the API call. The tag travels with the request, gets recorded alongside the usage event, and makes every call queryable by client.

Here's what the header looks like in practice:

POST https://api.tokonomics.ca/proxy/openai/chat/completions
Authorization: Bearer mk_your_api_key
X-Metering-Tags: {"client":"acme-corp","project":"chatbot","env":"production"}
Content-Type: application/json

Everything else stays identical to a normal OpenAI request. You don't change your model, your prompt structure, or your payload. You add one header. That's the entire integration for per-client tracking.

See how the proxy layer works and why it's simpler than SDK-level tracking for multi-client setups.

The tag values you choose are entirely up to you. Common patterns:

{"client":"acme-corp","project":"content-gen"} — client and project
{"client":"smith-media","team":"editorial","env":"prod"} — three-level attribution
{"client":"johnson-law","feature":"contract-summarizer"} — client and feature

Whatever granularity you need, the tags support it.

What does a per-client cost dashboard show?

Once you're tagging calls, Tokonomics shows you a per-client spend breakdown in the analytics dashboard. Filter by the client tag key and you see:

Total spend this month for each client value
Token counts (input and output separately)
Model and provider breakdown per client
Budget consumed percentage (if you've set a per-client budget)
Daily spend chart filtered to that client

This view updates in real time as calls are proxied. There's no end-of-month batch process — you can check client spend at any time during the month.

Without per-client tagging, the $340 Client A spend would be absorbed as unattributed overhead.

How do you set per-client budget alerts?

Beyond visibility, you can set alerts that fire when a specific client's spend crosses a threshold. This protects your margins in two directions:

Protecting against cost overruns: If client A has a fixed-fee AI retainer capped at $300/month of LLM usage, set an alert at 80% ($240). When it fires, you have $60 of headroom left. You can decide whether to slow down usage, bill overage, or absorb the cost — consciously, not by surprise.

Protecting against scope creep: A client whose project generates 3x the usual LLM spend is running out of scope. The alert is your signal to have a conversation before the invoice arrives.

Citation Capsule: Deloitte's 2024 Future of AI Services report found that professional services firms offering AI-assisted work undercharged by an average of 34%, with per-client cost attribution cited as the primary gap. Firms that implemented per-client AI cost tracking recovered an average of $1,100/month in previously unbooked AI costs within the first billing cycle. (Deloitte, 2024)

Set up budget alerts that fire before costs exceed your client cap — per client, per billing period.

How to Protect Your Margins with Per-Client Hard Caps

For clients on fixed-fee contracts, alerts aren't enough. You want a hard stop. If client B's monthly contract includes $200 of LLM usage, you don't want to accidentally deliver $800 of work and invoice only $200.

The solution is to create a dedicated API key for each client and set a monthly budget cap on that key. When the cap is reached, calls from that key return a 429 — automatically, without human intervention.

Your application handles the 429 gracefully: pauses the feature, shows a "usage limit reached" message, or queues the request for next month. The client's contract dictates what happens next.

This approach also isolates clients technically. A bug or a retry loop on one client's key can't consume another client's budget.

Which pricing models work for AI services?

Once you can see per-client cost, you need a pricing strategy that covers it.

A BCG (2024) analysis of AI-augmented professional services found that cost-plus models deliver the highest margin protection for agencies, with an average 38% markup on AI costs across surveyed firms.

Cost-plus pricing. Track actual LLM cost and add a margin — typically 30-50%. If client A generates $200 in API costs, you invoice $260-$300. This is the most defensible model: you can show clients the underlying cost. It's also the easiest to scale as your API prices change.

Flat fee with usage cap. Include AI features in a monthly retainer at a fixed price, with a usage cap (e.g., "up to 50,000 words generated/month"). Set a hard cap in Tokonomics at your cost ceiling for that volume. Clients get predictability; you get margin protection.

Usage-based output pricing. Charge per output unit: per generated article, per summarized document, per chatbot conversation. Set your price per unit higher than your average LLM cost per unit. Tokonomics usage events give you the denominator.

Before setting your pricing model, understand the full AI cost accounting picture so you don't undercharge.

How do you export client reports?

At the end of the month, filter your Tokonomics usage events by client tag and export the data as a CSV. The export includes:

Timestamp of each call
Model and provider
Input tokens, output tokens
Cost in USD per call
All tags (client, project, env, etc.)

Import this into your invoicing tool (FreshBooks, QuickBooks, HoneyBook) or send the CSV directly to clients who want transparency into their AI usage. According to Gartner (2024), 55% of organizations expect their AI service providers to deliver itemized AI cost reporting by 2027.

FAQ

Can I give each client their own API key?

Yes. Create a dedicated Tokonomics API key per client, assign a monthly budget, and set threshold alerts on that key. Each key's usage appears separately in analytics without manual filtering.

Can I white-label the dashboard for clients?

White-labeling is on the roadmap for the Enterprise plan. Currently, you manage the dashboard and export CSVs to share with clients. Contact hello@tokonomics.ca for early access.

How do I export per-client cost reports?

Filter usage events by client tag in the analytics dashboard and export as CSV. The file includes timestamps, model, provider, token counts, cost per call, and all tags — ready to import into any invoicing tool.

What's the best pricing model for billing clients for AI usage?

Cost-plus (actual LLM cost + 30-50% margin) is easiest to justify and scales with API price changes. Flat fee with a usage cap is easiest to sell. Usage-based output pricing (per article, per summary) works well for high-volume content work.

Track Every Client, Bill Every Dollar

Your AI work has real costs. Stop absorbing them. Tag every call by client, see per-client spend in real time, and bill accurately at the end of every month.

Create your free Tokonomics account and set up per-client tagging today. No credit card required. Upgrade to Pro at $49/month when you're ready for unlimited calls and 90-day retention.

All sources retrieved June 2026.