← Blog
how-tokonomics-works llm-cost-metering llm-proxy June 2, 2026 6 min read

How Tokonomics Works: LLM Cost Metering Explained

Glowing blue circuit board brain representing AI cost metering infrastructure and LLM request routing

Tokonomics is a drop-in API proxy. You change one URL in your app, and every LLM call is automatically tracked, costs calculated, and budget alerts configured — without storing your prompts or responses.

This post explains the full request flow, what data is captured, what isn't, and how the cost metering integrates with your existing stack.

Quick version: Your app → Tokonomics proxy → LLM provider. Same response, same latency. We intercept the usage metadata, calculate cost, check budgets, and fire alerts. Your prompt content never leaves the proxy.

This is the hub for Tokonomics' product-led content. Related: Getting Started with Tokonomics


The Request Flow

1. Your app sends a normal LLM API call to Tokonomics proxy URL
2. Proxy reads your API key → identifies your account
3. Proxy reads feature/tenant headers → routes to correct budget counter
4. Proxy checks Redis budget counter → ALLOW or DENY
5. On ALLOW: proxy forwards request to LLM provider (OpenAI, Anthropic, etc.)
6. Provider streams response back through proxy to your app
7. Proxy intercepts usage object from response → calculates cost
8. Proxy stores cost record (no prompt content, no response content)
9. Proxy updates Redis counter with actual cost
10. Proxy checks alert thresholds → fires webhook/email if triggered

Steps 2–4 and 7–10 happen transparently. Your app sends a request and receives a response — identical to calling the provider directly. The proxy adds sub-millisecond overhead.


What Gets Tracked

Every LLM call through Tokonomics logs exactly this:

Field Example Stored?
Account ID acc_abc123
Feature name support-bot
Tenant ID tenant_xyz789
Provider anthropic
Model claude-haiku-4-5
Input tokens 523
Output tokens 387
Cost (USD) $0.001046
Latency (ms) 842
Environment production
Timestamp 2026-06-02T14:23:11Z
Prompt content your actual prompt ❌ Never stored
Response content the model's response ❌ Never stored

Your prompts and completions are never stored. The proxy reads the usage object from the API response (which contains token counts, not content) and discards the rest. This is both a privacy feature and a design choice — storing response content at scale is expensive and creates compliance liability.


Supported Providers

Tokonomics works with any provider that has an OpenAI-compatible API:

Provider Base URL Models
OpenAI openai.com/v1 GPT-4o, GPT-4.1, GPT-4o-mini, o1, o3
Anthropic api.anthropic.com Claude Sonnet, Haiku, Opus
DeepSeek api.deepseek.com/v1 V4-Flash, V4-Pro
Google (via OpenAI compat) generativelanguage.googleapis.com Gemini 2.5 Flash, Pro
Mistral api.mistral.ai/v1 Large, Small
Any OpenAI-compatible Custom base URL Any model
Groq, Together, Perplexity, etc. Custom base URL Provider-specific

The Budget Enforcement Layer

Tokonomics uses Redis atomic counters for real-time budget enforcement:

Before the LLM call:

After the LLM call:

The three budget scopes:

  1. Global account budget — total monthly spend cap for your Tokonomics account
  2. Per-tenant budget — individual budget per customer in your SaaS
  3. Per-feature budget — individual budget for a specific feature (e.g., support-bot)

All three operate independently. A single call checks all applicable scopes.


Pricing Tiers and What's Included

Plan Price Key features
Starter $49/mo 1 seat, 3 budget alerts, 30-day data retention, up to 1M requests/mo
Pro $99/mo 5 seats, unlimited alerts, 90-day retention, up to 10M requests/mo
Enterprise $299/mo Unlimited seats, 12-month retention, SSO, white-label, unlimited requests

All plans include: all LLM providers, hard spending caps, per-tenant isolation, model routing, webhook alerts.


Frequently Asked Questions

Does Tokonomics add latency to my LLM calls?

Sub-millisecond. The proxy adds a DNS lookup (if uncached) and the Redis counter check. In practice, Tokonomics adds 0.5–2ms to each call. For calls that take 300ms–5,000ms, this is not user-detectable.

Does it work with streaming responses?

Yes. Tokonomics streams response chunks back to your app in real time. The usage metadata (token counts) appears at the end of the stream in most providers' streaming format. Tokonomics reads this and logs the cost without buffering the full response.

What happens if Tokonomics has an outage?

Tokonomics is designed to fail open: if the proxy is unavailable, your requests fall through to the provider directly (with no cost tracking that period). We maintain 99.9% uptime SLA on Pro and Enterprise plans.

Can I self-host Tokonomics?

Not currently. Tokonomics is a managed cloud service. If self-hosting is a hard requirement, LiteLLM is an open-source alternative with similar proxy functionality (though less polished monitoring).

Is my data GDPR-compliant?

Yes. Tokonomics stores only billing metadata — no prompt content, no user data. Our servers are in the US (Pro) and EU (Enterprise add-on). A GDPR Data Processing Agreement is available on Enterprise plans.


The Bottom Line

Tokonomics is a proxy. Point it at any LLM provider. Add three headers. Get real-time cost visibility, budget alerts, hard caps, and per-tenant isolation — without changing your feature code and without storing your customers' data.

Start free for 14 days →


Built by the team that hit a $47,000 LLM invoice. About → | Contact us →

About the author
Written by the engineers who built Tokonomics.
← Back to Blog