multi-tenant-llm llm-billing per-tenant-ai-cost June 2, 2026 3 min read

Multi-Tenant LLM Cost Isolation: Bill Your Users for AI


When you build AI features for a SaaS product, you don't have one LLM cost problem — you have N problems, one per customer. Some customers run one AI query per day. Others run ten thousand. Without per-tenant tracking, you're subsidizing your heaviest users invisibly while pricing based on your average.

This guide covers the three patterns for multi-tenant LLM cost isolation, how to implement them, and how to use per-tenant cost data to build usage-based billing if your pricing model needs it.

Key Takeaways

  • In a typical SaaS product, 5–20% of customers account for 60–80% of total AI spend, and without per-tenant tracking you cannot see which ones
  • Per-tenant cost isolation is the foundation of usage-based billing, fair-use caps, and accurate margin analysis
  • The proxy-layer approach works with any language and any LLM provider, without modifying feature code
  • On a $49/month SaaS plan with an average AI cost of $2 per customer, the bottom 90% of customers fund the top 10%

This post is part of our SaaS AI Features Cost Guide.


Why Per-Tenant Isolation Matters

Aggregate monthly invoices hide a distribution problem. In any SaaS with AI features, usage follows a power law: a small number of high-usage customers generate most of the AI cost.

Figure: Typical AI cost distribution across SaaS customers (power law pattern). The top 5% of customers generate 62% of costs, the next 15% generate 25%, and the bottom 80% generate 13%. Without per-tenant tracking, you price on the average while your lighter customers subsidize the heaviest ones. Illustrative data based on production patterns across SaaS teams.

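Without per-tenant visibility, you cannot tell which accounts drive that cost, so you cannot price usage fairly, set fair-use caps, or measure margin per customer.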


Three Patterns for Multi-Tenant Cost Isolation

Pattern 1: Per-Tenant API Keys (Simple, Limited)

Issue each tenant a unique API key. Route every request made with that key through your LLM proxy. Associate all costs with the tenant that owns the key.

Pros: Simple mental model. Easy to revoke per tenant.

Cons: Doesn't work if you're using a shared system prompt across tenants. Doesn't aggregate costs across features per tenant. Hard to enforce cross-feature budget caps.

When to use: Single-feature apps, early stage, when tenants each have genuinely isolated workloads.
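
As a rough illustration of Pattern 1, the sketch below maps issued keys to tenants so a proxy could attribute each call and revoke a tenant's access. The class name `TenantKeyRegistry`, the `tk_` prefix, and the in-memory dictionary are assumptions made for this example; a production system would persist keys, and would normally store hashes of them instead of the raw values.

```python
import secrets


class TenantKeyRegistry:
    """Hypothetical in-memory registry mapping proxy API keys to tenants."""

    def __init__(self):
        self._tenant_by_key = {}

    def issue_key(self, tenant_id):
        """Create a unique key for a tenant and remember who owns it."""
        key = "tk_" + secrets.token_urlsafe(24)
        self._tenant_by_key[key] = tenant_id
        return key

    def resolve(self, key):
        """Return the owning tenant_id, or None for unknown or revoked keys."""
        return self._tenant_by_key.get(key)

    def revoke_tenant(self, tenant_id):
        """Revoke every key issued to one tenant; returns how many were removed."""
        doomed = [k for k, t in self._tenant_by_key.items() if t == tenant_id]
        for k in doomed:
            del self._tenant_by_key[k]
        return len(doomed)
```

The proxy would call `resolve()` on each incoming key and log the call's cost against the returned tenant.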

Pattern 2: Request Metadata Tagging (Recommended)

Tag every outbound LLM request with tenant metadata via request headers or a proxy-layer configuration. The proxy logs tenant_id, cost, model, and feature on every call. A database or Redis counter aggregates totals per tenant per billing period.

# Example: HTTP headers passed to proxy
X-Tenant-ID: tenant_abc123
X-Feature-Name: support-bot
X-Environment: production
X-User-Tier: pro

Pros: Works in any language, any framework. One proxy config handles all features. Enables per-feature and per-tenant breakdowns simultaneously.

Cons: Requires discipline, because every API call needs the headers. A proxy layer can enforce this automatically.

When to use: Most SaaS products. This is the pattern Tokonomics implements out of the box.
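
To make the tagging pattern concrete, here is a minimal sketch of the two halves: building the headers on the application side and aggregating cost per tenant on the proxy side. The header names come from the example above; `tenant_headers`, `CostLedger`, and the in-memory dictionary are hypothetical stand-ins for whatever your proxy and database actually use.

```python
from collections import defaultdict
from datetime import datetime, timezone

REQUIRED_HEADERS = ("X-Tenant-ID", "X-Feature-Name", "X-Environment", "X-User-Tier")


def tenant_headers(tenant_id, feature, environment="production", user_tier="pro"):
    """Build the metadata headers attached to every outbound LLM request."""
    return {
        "X-Tenant-ID": tenant_id,
        "X-Feature-Name": feature,
        "X-Environment": environment,
        "X-User-Tier": user_tier,
    }


class CostLedger:
    """In-memory stand-in for the database or Redis counter behind the proxy."""

    def __init__(self):
        # (tenant_id, feature, "YYYY-MM") -> accumulated cost in USD
        self.totals = defaultdict(float)

    def record(self, headers, cost_usd, when=None):
        """Reject untagged requests, then add the cost to the tenant's total."""
        missing = [h for h in REQUIRED_HEADERS if h not in headers]
        if missing:
            raise ValueError(f"untagged request, missing headers: {missing}")
        when = when or datetime.now(timezone.utc)
        key = (headers["X-Tenant-ID"], headers["X-Feature-Name"], when.strftime("%Y-%m"))
        self.totals[key] += cost_usd

    def tenant_total(self, tenant_id, period):
        """Sum spend across all features for one tenant and billing period."""
        return sum(
            cost for (tenant, _feature, month), cost in self.totals.items()
            if tenant == tenant_id and month == period
        )
```

Keying the totals by tenant, feature, and month is what allows the per-feature and per-tenant breakdowns to come from the same log.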

Pattern 3: Tenant-Scoped LLM Budgets (Advanced)

Extend Pattern 2 with per-tenant Redis counters that enforce spending limits in real time. Each tenant has a monthly budget. Every request adds its cost to the tenant's counter. At 90% of budget, downgrade to a cheaper model. At 100%, block with a graceful error.

-- Redis Lua script: check and decrement tenant budget
local key = "budget:" .. KEYS[1] .. ":" .. KEYS[2]  -- tenant_id:month
local cost = tonumber(ARGV[1])
local cap = tonumber(ARGV[2])
local current = tonumber(redis.call('GET', key) or "0")
if current + cost > cap then return "DENY" end
redis.call('INCRBYFLOAT', key, cost)
redis.call('EXPIRE', key, 2592000)  -- 30-day TTL
return "ALLOW"

Pros: Prevents any single tenant from consuming unlimited resources. Enables tiered service quality (Pro tenants get higher caps).

Cons: Requires Redis in your stack. Slightly more complex logic.

When to use: Any SaaS with a freemium tier, plan-based AI limits, or customers who have shown usage variance that threatens margins.
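
The Lua script above covers the allow or deny check; the 90% downgrade threshold can be layered on as plain decision logic. The sketch below shows one way to express it, assuming the caller has already read the tenant's current spend. The function name, the return labels, and the cap values in `PLAN_CAPS_USD` are illustrative assumptions, and this function alone is not atomic, so the actual counter update should still go through Redis.

```python
# Hypothetical monthly caps in USD per plan tier; real values would come from your database.
PLAN_CAPS_USD = {"free": 1.00, "pro": 10.00}


def budget_decision(spent_usd, request_cost_usd, cap_usd, downgrade_at=0.90):
    """Return "ALLOW", "DOWNGRADE", or "DENY" for one request.

    Blocks a request that would push spend past the cap, and signals a
    switch to a cheaper model once projected spend reaches 90% of the cap.
    """
    projected = spent_usd + request_cost_usd
    if projected > cap_usd:
        return "DENY"
    if projected >= downgrade_at * cap_usd:
        return "DOWNGRADE"
    return "ALLOW"
```

For example, a Pro tenant at $9.00 of a $10.00 cap would be routed to the cheaper model for a $0.50 request, and blocked once a request would take the total over $10.00.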


Building Usage-Based Billing on Top of Cost Tracking

Once you have per-tenant cost data, you can build three billing models:

Included credits + overage:

if tenant.monthly_ai_cost <= plan.included_credits:
    no_charge()
else:
    charge_stripe(tenant.monthly_ai_cost - plan.included_credits, markup=1.5)

The markup covers your infrastructure overhead and LLM cost fluctuations. A 1.5× markup on already-optimized AI costs adds roughly $0.03–0.15 per user per month at typical usage levels.
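
The overage arithmetic can be sketched as a small pure function that you call before handing the amount to your billing provider. The function name and the sample figures are assumptions for illustration only; a real implementation would likely use integer cents or `Decimal` instead of floats.

```python
def overage_charge_usd(monthly_ai_cost, included_credits, markup=1.5):
    """Amount to bill for AI usage above the included credits, in USD."""
    overage = max(0.0, monthly_ai_cost - included_credits)
    return round(overage * markup, 2)


# A tenant with $2.00 of included credits who used $3.20 of AI this month
# has $1.20 of overage, which the 1.5x markup turns into a $1.80 charge.
charge = overage_charge_usd(3.20, 2.00)

# A tenant who stayed inside the included credits is charged nothing.
no_charge = overage_charge_usd(1.50, 2.00)
```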

Usage-based subscription (Stripe Metered Billing): Report usage events to Stripe's metered billing API at the end of each month. Stripe automatically invoices the overage amount based on your per-unit price. This works well for developer-tier customers whose usage is variable but reasonably predictable.

Hard fair-use with upgrade prompt: This is the simplest model. Include a fixed number of AI queries per plan tier. When a tenant hits the limit, show an in-app upgrade prompt. There is no overage billing complexity. It works when query counts are more intuitive to customers than token counts.


What to Track Per Tenant

Every entry in your per-tenant cost log should include:

| Field | Example | Purpose |
| --- | --- | --- |
| tenant_id | tenant_abc123 | Isolation key |
| feature_name | support-bot | Feature-level attribution |
| model | claude-haiku-4-5 | Model cost validation |
| provider | anthropic | Provider-level reconciliation |
| input_tokens | 523 | Cost calculation |
| output_tokens | 387 | Cost calculation |
| cost_usd | 0.002438 | Aggregation |
| environment | production | Exclude staging from billing |
| user_tier | pro | For plan-based cap enforcement |
| created_at | 2026-06-02T14:23:11Z | Billing period queries |
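
One way to represent such a log entry is sketched below, together with a cost calculation from token counts. The field names follow the list above; the model name `example-model` and its per-million-token rates are made-up placeholders, not real provider prices, so look up current rates before using anything like this.

```python
from dataclasses import dataclass, asdict

# Hypothetical rates in USD per million tokens; replace with the provider's current prices.
RATES_PER_MTOK = {"example-model": {"input": 1.00, "output": 5.00}}


def cost_usd(model, input_tokens, output_tokens):
    """Compute the cost of one call from token counts and per-million-token rates."""
    rate = RATES_PER_MTOK[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return round(total / 1_000_000, 6)


@dataclass(frozen=True)
class CostLogEntry:
    tenant_id: str
    feature_name: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    environment: str
    user_tier: str
    created_at: str  # ISO 8601 timestamp in UTC


entry = CostLogEntry(
    tenant_id="tenant_abc123",
    feature_name="support-bot",
    model="example-model",
    provider="example-provider",
    input_tokens=1000,
    output_tokens=200,
    cost_usd=cost_usd("example-model", 1000, 200),
    environment="production",
    user_tier="pro",
    created_at="2026-06-02T14:23:11Z",
)
row = asdict(entry)  # plain dict, ready to insert into a database or log
```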

Frequently Asked Questions

Do I need to build this myself or use a tool?

You can build it, but a homemade per-tenant tracking system typically takes 3–4 weeks to reach feature parity with an off-the-shelf proxy: per-request logging, aggregation, alerting, hard caps, and a usable dashboard. A proxy tool like Tokonomics provides this at $49/month with no build time. For teams spending over $500/month on AI, the ROI is immediate.

How do I handle tenants on different plan tiers with different AI limits?

Store per-tenant budget caps in your database, keyed to the plan tier. Pass the budget cap to your proxy layer's Redis enforcement check on each request. When a tenant upgrades, update the cap immediately — no billing period restart needed. The Redis counter tracks cumulative monthly spend regardless of when the cap changes.

What if a tenant shares their API key with multiple team members?

Tenant-level tracking covers all usage under that tenant ID regardless of which team member generated it. If you need user-level attribution within a tenant, add a user_id field to your request metadata and log it alongside the tenant_id. For most SaaS, tenant-level granularity is sufficient.

How do I reconcile my per-tenant cost logs with provider invoices?

Sum all cost_usd values for the billing period and compare the total to the provider invoice. Small discrepancies are normal due to rounding. Large ones indicate a pricing mismatch, so verify that your per-token rates are current. Provider pricing changes over time; update your cost calculator whenever a model's rates change.
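
A reconciliation check along these lines could look like the sketch below. It assumes log entries are dictionaries with the fields listed earlier and that `created_at` is an ISO 8601 string; the function name and the 1% tolerance are arbitrary choices for illustration. Staging traffic is deliberately included here, since the provider's invoice covers every call regardless of environment.

```python
def reconcile(entries, invoice_total_usd, period, provider, tolerance=0.01):
    """Compare logged cost for one provider and billing period with the invoice.

    period is a "YYYY-MM" prefix matched against each entry's created_at.
    """
    logged = sum(
        e["cost_usd"] for e in entries
        if e["provider"] == provider and e["created_at"].startswith(period)
    )
    diff = invoice_total_usd - logged
    # Guard against division by zero when the invoice is empty.
    relative = abs(diff) / max(abs(invoice_total_usd), 1e-9)
    return {
        "logged_usd": round(logged, 6),
        "diff_usd": round(diff, 6),
        "within_tolerance": relative <= tolerance,
    }
```

A result outside the tolerance is the cue to check whether the per-token rates in your cost calculator still match the provider's price list.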


The Bottom Line

Multi-tenant LLM cost isolation isn't optional once you're past a few hundred customers. Without it, you're pricing in the dark, subsidizing your heaviest users, and unable to make the margin calculations that determine whether your AI features are profitable.

The implementation is straightforward: tag every request with tenant metadata, log costs to a per-tenant accumulator, and add Redis counters for real-time enforcement. Or use a proxy that does it for you.

Read next: Per-Feature AI Cost Tracking | SaaS AI Features Cost Guide


About the authors: Written by the engineers behind Tokonomics.
