Tokonomics is a drop-in API proxy. You change one URL in your app, and every LLM call is automatically tracked, costs calculated, and budget alerts configured — without storing your prompts or responses.
This post explains the full request flow, what data is captured, what isn't, and how the cost metering integrates with your existing stack.
Quick version: Your app → Tokonomics proxy → LLM provider. Same response, same latency. We intercept the usage metadata, calculate cost, check budgets, and fire alerts. Your prompt content never leaves the proxy.
This is the hub for Tokonomics' product-led content. Related: Getting Started with Tokonomics
The Request Flow
1. Your app sends a normal LLM API call to Tokonomics proxy URL
2. Proxy reads your API key → identifies your account
3. Proxy reads feature/tenant headers → routes to correct budget counter
4. Proxy checks Redis budget counter → ALLOW or DENY
5. On ALLOW: proxy forwards request to LLM provider (OpenAI, Anthropic, etc.)
6. Provider streams response back through proxy to your app
7. Proxy intercepts usage object from response → calculates cost
8. Proxy stores cost record (no prompt content, no response content)
9. Proxy updates Redis counter with actual cost
10. Proxy checks alert thresholds → fires webhook/email if triggered
Steps 2–4 and 7–10 happen transparently. Your app sends a request and receives a response — identical to calling the provider directly. The proxy adds sub-millisecond overhead.
What Gets Tracked
Every LLM call through Tokonomics logs exactly this:
| Field | Example | Stored? |
|---|---|---|
| Account ID | acc_abc123 |
✅ |
| Feature name | support-bot |
✅ |
| Tenant ID | tenant_xyz789 |
✅ |
| Provider | anthropic |
✅ |
| Model | claude-haiku-4-5 |
✅ |
| Input tokens | 523 |
✅ |
| Output tokens | 387 |
✅ |
| Cost (USD) | $0.001046 |
✅ |
| Latency (ms) | 842 |
✅ |
| Environment | production |
✅ |
| Timestamp | 2026-06-02T14:23:11Z |
✅ |
| Prompt content | your actual prompt | ❌ Never stored |
| Response content | the model's response | ❌ Never stored |
Your prompts and completions are never stored. The proxy reads the usage object from the API response (which contains token counts, not content) and discards the rest. This is both a privacy feature and a design choice — storing response content at scale is expensive and creates compliance liability.
Supported Providers
Tokonomics works with any provider that has an OpenAI-compatible API:
| Provider | Base URL | Models |
|---|---|---|
| OpenAI | openai.com/v1 |
GPT-4o, GPT-4.1, GPT-4o-mini, o1, o3 |
| Anthropic | api.anthropic.com |
Claude Sonnet, Haiku, Opus |
| DeepSeek | api.deepseek.com/v1 |
V4-Flash, V4-Pro |
| Google (via OpenAI compat) | generativelanguage.googleapis.com |
Gemini 2.5 Flash, Pro |
| Mistral | api.mistral.ai/v1 |
Large, Small |
| Any OpenAI-compatible | Custom base URL | Any model |
| Groq, Together, Perplexity, etc. | Custom base URL | Provider-specific |
The Budget Enforcement Layer
Tokonomics uses Redis atomic counters for real-time budget enforcement:
Before the LLM call:
- Estimate cost based on request metadata (model + estimated token count)
- Check Redis counter for the relevant budget scope (global / per-tenant / per-feature)
- If estimated cost would exceed the budget cap: return 429 (no LLM call made, no tokens consumed)
After the LLM call:
- Read actual token counts from the provider's response
- Update Redis counter with the real cost (replacing the estimate)
- Check alert thresholds
- Fire webhook or email if a threshold is crossed
The three budget scopes:
- Global account budget — total monthly spend cap for your Tokonomics account
- Per-tenant budget — individual budget per customer in your SaaS
- Per-feature budget — individual budget for a specific feature (e.g.,
support-bot)
All three operate independently. A single call checks all applicable scopes.
Pricing Tiers and What's Included
| Plan | Price | Key features |
|---|---|---|
| Starter | $49/mo | 1 seat, 3 budget alerts, 30-day data retention, up to 1M requests/mo |
| Pro | $99/mo | 5 seats, unlimited alerts, 90-day retention, up to 10M requests/mo |
| Enterprise | $299/mo | Unlimited seats, 12-month retention, SSO, white-label, unlimited requests |
All plans include: all LLM providers, hard spending caps, per-tenant isolation, model routing, webhook alerts.
Frequently Asked Questions
Does Tokonomics add latency to my LLM calls?
Sub-millisecond. The proxy adds a DNS lookup (if uncached) and the Redis counter check. In practice, Tokonomics adds 0.5–2ms to each call. For calls that take 300ms–5,000ms, this is not user-detectable.
Does it work with streaming responses?
Yes. Tokonomics streams response chunks back to your app in real time. The usage metadata (token counts) appears at the end of the stream in most providers' streaming format. Tokonomics reads this and logs the cost without buffering the full response.
What happens if Tokonomics has an outage?
Tokonomics is designed to fail open: if the proxy is unavailable, your requests fall through to the provider directly (with no cost tracking that period). We maintain 99.9% uptime SLA on Pro and Enterprise plans.
Can I self-host Tokonomics?
Not currently. Tokonomics is a managed cloud service. If self-hosting is a hard requirement, LiteLLM is an open-source alternative with similar proxy functionality (though less polished monitoring).
Is my data GDPR-compliant?
Yes. Tokonomics stores only billing metadata — no prompt content, no user data. Our servers are in the US (Pro) and EU (Enterprise add-on). A GDPR Data Processing Agreement is available on Enterprise plans.
The Bottom Line
Tokonomics is a proxy. Point it at any LLM provider. Add three headers. Get real-time cost visibility, budget alerts, hard caps, and per-tenant isolation — without changing your feature code and without storing your customers' data.
Built by the team that hit a $47,000 LLM invoice. About → | Contact us →