Per-Feature AI Cost Tracking: Tag Every LLM Call

"Which feature is driving our AI bill?" is one of the most common questions SaaS CTOs ask after seeing an unexpected AI invoice. The answer is almost always: "We don't know — we only see the total."

That gap has a real cost. According to CloudZero (2024), 68% of engineering teams cannot attribute AI spend to specific features. Meanwhile, Flexera's 2023 State of the Cloud Report found that 82% of enterprises cite managing cloud spend as their top cloud challenge. You're not alone, but you can fix it in an afternoon.

This guide explains the tagging pattern, which metadata fields matter most, and how to implement feature-level cost attribution in any stack.

TL;DR: 68% of engineering teams can't attribute AI costs to specific features (CloudZero, 2024). Adding one metadata tag — feature_name — to every LLM call transforms a lump-sum AI bill into a feature-by-feature breakdown. Document summarization uses 5-10x more tokens than simple Q&A, but most teams don't discover this without attribution data.

Key Takeaways

68% of engineering teams can't attribute AI costs to features (CloudZero, 2024)

Tagging one field, feature_name, transforms a lump-sum AI bill into a feature-by-feature cost breakdown

Document summarization uses 5-10x more tokens per call than simple Q&A, but teams rarely discover this without attribution data

GPT-4o costs $2.50/1M input tokens and $10/1M output tokens (OpenAI, 2024) - output token growth is the silent budget killer

Most teams find a single feature consumes 60-70% of total AI spend, and it's rarely the one they expected

What Per-Feature Attribution Actually Looks Like

Without feature tags, your cost dashboard shows a single growing number. Benchmarkit/Mavvrik's 2025 AI Cost Management study found that only 34% of companies can produce a per-feature breakdown like the one below, and those that can fix cost spikes in minutes, not weeks.

Without feature-level tracking, your monitoring looks like this:

Month	Total AI Cost
April	$6,200
May	$7,800
June	$8,500

With per-feature tagging, it looks like this:

Feature	June Cost	MoM Change	Avg Input Tokens
`support-bot`	$5,200	+42%	2,847
`chat-assistant`	$2,100	+8%	891
`code-reviewer`	$1,200	-3%	1,203

Now you know the support-bot prompt grew (token count up about 40%), and that single fact accounts for the bulk of the June increase. One prompt audit solves the problem. Without attribution, you could spend days investigating everything.
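A breakdown like the one above is a group-by over tagged cost records. Here is a minimal sketch, assuming each record is a dict with hypothetical `feature`, `month`, and `cost` keys (the real log schema will differ by tool) and using made-up sample values:

```python
from collections import defaultdict

def cost_by_feature(records, month, prev_month):
    """Return {feature: (cost, mom_change_pct)} for `month`, highest cost first."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        totals[r["feature"]][r["month"]] += r["cost"]

    rows = {}
    for feature, by_month in totals.items():
        current = by_month.get(month, 0.0)
        previous = by_month.get(prev_month, 0.0)
        # Month-over-month change is undefined when there was no prior spend.
        change = (current - previous) / previous * 100 if previous else None
        rows[feature] = (round(current, 2), change)
    return dict(sorted(rows.items(), key=lambda kv: kv[1][0], reverse=True))


# Hypothetical sample records; real ones would come from your cost log.
sample_records = [
    {"feature": "support-bot", "month": "2024-05", "cost": 100.0},
    {"feature": "support-bot", "month": "2024-06", "cost": 142.0},
    {"feature": "chat-assistant", "month": "2024-05", "cost": 50.0},
    {"feature": "chat-assistant", "month": "2024-06", "cost": 54.0},
]
breakdown = cost_by_feature(sample_records, "2024-06", "2024-05")
for feature, (cost, change) in breakdown.items():
    change_text = f"{change:+.0f}%" if change is not None else "n/a"
    print(f"{feature}: ${cost:.2f} ({change_text} MoM)")
```

Sorting by current cost puts the dominant feature at the top, which is usually the first place to look when the total jumps.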

In our experience running Tokonomics, teams using feature tags discover the root cause of a cost spike within 15 minutes of it appearing. Untagged teams typically take 2-3 days to reach the same conclusion, often after a stressful all-hands review of every API call in the logs.

Figure: feature cost distribution pattern observed across SaaS products with three active AI features. One feature consistently accounts for 50-70% of total AI spend. Note: percentages are representative of patterns we've observed in production; your distribution will vary with call volume and prompt design.

Citation capsule: A 2024 CloudZero study found that 68% of engineering teams cannot attribute AI spend to specific product features, making it impossible to perform targeted cost optimization or justify per-feature ROI. (CloudZero, 2024)

Why Do Chatbots and Summarizers Have Such Different Cost Profiles?

The hidden asymmetry in LLM cost comes down to token counts per call. GPT-4o is priced at $2.50 per million input tokens and $10 per million output tokens (OpenAI pricing, 2024), so each output token costs 4x as much as an input token. Features that generate long responses pay disproportionately.

According to OpenAI's token counting documentation and tiktoken benchmarks, document summarization tasks typically consume 5-10x more tokens per call than simple Q&A interactions. A chatbot answering "What are your hours?" might use 150 input and 40 output tokens. A summarizer processing a 5-page support ticket might use 2,000 input and 600 output tokens on the same model. At GPT-4o prices that is about $0.0008 versus $0.011 per call, a roughly 14x cost difference.

Chatbots look cheap per call but run at high volume, often thousands of requests per day. Summarizers are expensive per call but run less frequently. Without tagging, you'd never know which pattern dominates your bill.
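To see how per-call cost and volume interact, you can work through the arithmetic directly. This sketch uses the GPT-4o prices and the per-call token counts quoted above; the daily call volumes are hypothetical and chosen only for illustration.

```python
# GPT-4o prices quoted in this article (USD per 1M tokens).
INPUT_PRICE = 2.50
OUTPUT_PRICE = 10.00

def call_cost(input_tokens, output_tokens):
    """Cost in USD of a single call at the prices above."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

chatbot = call_cost(150, 40)        # "What are your hours?"
summarizer = call_cost(2_000, 600)  # 5-page support ticket

print(f"chatbot:    ${chatbot:.6f} per call")
print(f"summarizer: ${summarizer:.6f} per call")
print(f"ratio:      {summarizer / chatbot:.1f}x")  # 14.2x

# Hypothetical daily volumes: high volume can outweigh a low per-call cost.
chatbot_daily = chatbot * 20_000
summarizer_daily = summarizer * 500
print(f"daily: chatbot ${chatbot_daily:.2f}, summarizer ${summarizer_daily:.2f}")
```

With these assumed volumes, the chatbot that is cheap per call costs more per day than the summarizer. Which pattern holds for your product is exactly what tagged data tells you.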

Here's what typical token profiles look like across common AI features:

Feature Type	Avg Input Tokens	Avg Output Tokens	Relative Cost/Call
Simple Q&A / chatbot	200-400	50-150	1x (baseline)
Classification	100-300	10-30	0.5x
Code review	800-2,000	200-600	6x
Document summarizer	1,500-4,000	400-1,000	12-15x
Report generation	2,000-6,000	800-2,500	20-30x

Citation capsule: OpenAI charges $2.50 per million input tokens and $10 per million output tokens for GPT-4o as of 2024. Document summarization tasks generate 5-10x more tokens per call than simple Q&A, making them 12-15x more expensive per request despite appearing similar in code. (OpenAI Pricing, 2024; OpenAI Token Counting Docs, 2024)

How Does the Tagging Pattern Work?

The implementation has two parts: tagging outbound requests, and aggregating by tag in your cost store. The proxy-layer approach works in any language with zero changes to feature logic — just add headers to your HTTP client config.

Part 1: Tag Every LLM Request

The cleanest approach uses custom HTTP headers passed through your proxy layer. These headers don't reach the LLM API. Your proxy strips them before forwarding, but logs them with every cost record.

HTTP header approach (language-agnostic):

POST /proxy/openai/chat/completions
Authorization: Bearer mk_your_token
X-Feature-Name: support-bot
X-Tenant-ID: tenant_abc123
X-Environment: production
X-User-Tier: pro
Content-Type: application/json

{ ...your normal OpenAI request body... }
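On the proxy side, the strip-and-log step described above amounts to partitioning the incoming headers. Here is a minimal sketch, assuming the four header names from the example request. The function name and the snake_case field names are illustrative, not the actual implementation of any particular proxy.

```python
# Tag headers (lowercased) mapped to the field names used in cost records.
TAG_HEADERS = {
    "x-feature-name": "feature_name",
    "x-tenant-id": "tenant_id",
    "x-environment": "environment",
    "x-user-tier": "user_tier",
}

def split_tag_headers(headers):
    """Separate attribution tags from the headers that get forwarded upstream."""
    tags, forwarded = {}, {}
    for name, value in headers.items():
        field = TAG_HEADERS.get(name.lower())  # header names are case-insensitive
        if field:
            tags[field] = value
        else:
            forwarded[name] = value
    return tags, forwarded


incoming = {
    "Authorization": "Bearer mk_your_token",
    "X-Feature-Name": "support-bot",
    "X-Tenant-ID": "tenant_abc123",
    "X-Environment": "production",
    "X-User-Tier": "pro",
    "Content-Type": "application/json",
}
tags, forwarded = split_tag_headers(incoming)
print(tags)             # stored alongside the cost record
print(list(forwarded))  # what continues on toward the provider
```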

Alternative: SDK wrapper (for teams not using a proxy):

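from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(prompt, feature_name, tenant_id, model="gpt-4o-mini"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # calculate_cost and log_cost are your own helpers: the first maps token
    # counts to dollars for the model, the second writes the tagged record.
    cost = calculate_cost(
        response.usage.prompt_tokens,
        response.usage.completion_tokens,
        model,
    )
    log_cost(feature=feature_name, tenant=tenant_id, cost=cost)
    return response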

The header approach is preferred because it's language-agnostic. It requires no changes to your existing feature code — just update the base URL and add three headers.
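The SDK wrapper above relies on two helpers, calculate_cost and log_cost, that the snippet leaves undefined. One possible shape is sketched below. The price table uses the GPT-4o and gpt-4o-mini input prices cited in this article, the gpt-4o-mini output price is OpenAI's 2024 list price, and the JSON Lines file is a hypothetical stand-in for whatever cost store you use.

```python
import json
import time

# USD per 1M tokens as (input, output). Verify against current provider pricing.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),  # output price is not quoted in this article
}

def calculate_cost(prompt_tokens, completion_tokens, model):
    """Convert token usage into USD using the price table above."""
    input_price, output_price = PRICES[model]  # KeyError flags an unpriced model
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000

def log_cost(feature, tenant, cost, path="llm_costs.jsonl"):
    """Append one tagged cost record to a JSON Lines file and return it."""
    record = {"ts": time.time(), "feature": feature, "tenant": tenant, "cost": cost}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# 1,000 input tokens and 200 output tokens on gpt-4o.
print(f"${calculate_cost(1_000, 200, 'gpt-4o'):.4f}")  # $0.0045
```

Raising on an unknown model is deliberate. A silent zero would hide spend on any model you forgot to price.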

Part 2: The Metadata Fields That Matter

Field	Type	Required	Example	Why
`feature_name`	string	Yes	`support-bot`	Core attribution key
`tenant_id`	string	Yes	`tenant_abc123`	Multi-tenant cost isolation
`environment`	enum	Yes	`production`	Excludes staging from billing
`model`	string	Auto	`gpt-4o-mini`	Model cost validation
`user_tier`	string	Recommended	`pro`	Plan-based cap enforcement
`request_type`	string	Optional	`classification`	Task-level optimization
`version`	string	Optional	`v2.1`	A/B test cost comparison

The minimum viable tag set is feature_name + tenant_id + environment. The rest add analytical depth over time.
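A cost store is only as useful as its tags are consistent, so it can help to reject or flag incomplete tag sets at ingestion. The sketch below checks the three-field minimum. The function name and the list of allowed environment values are assumptions to adapt to your own setup.

```python
REQUIRED_FIELDS = ("feature_name", "tenant_id", "environment")
ENVIRONMENTS = {"production", "staging", "development"}  # assumed enum values

def validate_tags(tags):
    """Return a list of problems; an empty list means the tag set is usable."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not tags.get(field)]
    env = tags.get("environment")
    if env and env not in ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")
    return problems


good = {"feature_name": "support-bot", "tenant_id": "tenant_abc123", "environment": "production"}
bad = {"feature_name": "support-bot", "environment": "prod"}
print(validate_tags(good))  # []
print(validate_tags(bad))   # ['missing tenant_id', 'unknown environment: prod']
```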

What Can You Do With Per-Feature Data?

Once you have feature-level attribution, four use cases open up immediately. Flexera's 2023 State of the Cloud Report found that 82% of enterprises cite managing cloud spend as their top cloud challenge, and all four of these use cases directly address that problem.

1. Cost spike diagnosis. When your monthly bill jumps, you know which feature caused it. You often know why within minutes: token count growth, a usage spike, or a new deployment. Investigation time drops from days to minutes.

2. Feature-level optimization decisions. You can calculate the ROI of optimizing each feature separately. "The support-bot costs $5,200/month. Prompt caching is projected to save $3,640/month (70%) and takes four hours to implement. That's a clear business case."

3. Pricing model validation. If your Pro plan includes "unlimited AI," feature tags tell you the actual cost per Pro user per feature. If the summarizer costs $12/month per Pro user and you charge $29/month total, your margin math needs attention.

4. A/B test cost comparison. Tag requests with version: v1 vs version: v2. Compare cost per equivalent outcome. Run both prompt versions in parallel and measure cost per task, not just quality scores.
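The A/B comparison in item 4 reduces to dividing total cost by completed tasks for each version tag. Here is a sketch, assuming hypothetical records that carry a `version` tag, a `cost`, and a `success` flag set by your own quality check. The sample values are invented.

```python
def cost_per_successful_task(records):
    """Return {version: cost per successful task}; None if a version had no successes."""
    spend, wins = {}, {}
    for r in records:
        version = r["version"]
        spend[version] = spend.get(version, 0.0) + r["cost"]
        wins[version] = wins.get(version, 0) + (1 if r["success"] else 0)
    return {v: (spend[v] / wins[v] if wins[v] else None) for v in spend}


# Hypothetical data: v1 is pricier per call and fails half the time.
ab_records = [
    {"version": "v1", "cost": 0.010, "success": True},
    {"version": "v1", "cost": 0.010, "success": False},
    {"version": "v2", "cost": 0.006, "success": True},
    {"version": "v2", "cost": 0.006, "success": True},
]
ab_result = cost_per_successful_task(ab_records)
for version, cost in ab_result.items():
    print(version, "n/a" if cost is None else f"${cost:.4f} per successful task")
```

Counting failed calls in the numerator matters. A cheaper prompt that needs more retries can still cost more per finished task.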

Citation capsule: Flexera's 2023 State of the Cloud Report found that 82% of enterprises cite managing cloud spend as their number-one cloud challenge. Per-feature attribution is the first step toward actionable cost governance because it transforms a single aggregate number into feature-specific data that engineers and product managers can act on. (Flexera, 2023)

How Do You Implement This in Specific Stacks?

The pattern is a few lines of code in any language. Add the headers to your HTTP client configuration once, and every call through that client gets tagged automatically. Here are examples for three common setups.

Python (OpenAI SDK):

import os

from openai import OpenAI

# Route the SDK through the proxy instead of calling the provider directly.
client = OpenAI(
    api_key=os.environ["TOKONOMICS_KEY"],
    base_url=os.environ["TOKONOMICS_URL"] + "/proxy/openai",
)

# messages and tenant_id come from your request-handling code.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={
        "X-Feature-Name": "support-bot",
        "X-Tenant-ID": str(tenant_id),
        "X-Environment": os.environ.get("APP_ENV", "production"),
    },
)

Node.js:

const response = await fetch(`${PROXY_URL}/proxy/openai/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TOKONOMICS_KEY}`,
    'X-Feature-Name': 'support-bot',
    'X-Tenant-ID': req.user.tenantId,
    'X-Environment': process.env.NODE_ENV,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(payload)
});

Python (httpx, Anthropic endpoint):

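import os

import httpx

# PROXY_URL, TOKONOMICS_KEY, payload, and current_tenant come from your app.
response = httpx.post(
    f"{PROXY_URL}/proxy/anthropic/messages",
    headers={
        "Authorization": f"Bearer {TOKONOMICS_KEY}",
        "X-Feature-Name": "code-reviewer",
        "X-Tenant-ID": str(current_tenant.id),
        "X-Environment": os.environ.get("ENV", "production"),
    },
    json=payload,
    timeout=60.0,  # httpx defaults to 5 s, often too short for LLM calls
)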

The base URL changes. Three headers get added. Everything else stays the same. That's the full implementation for most teams.
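Because the tags are plain headers, one way to keep call sites tidy is to build them in a single helper. The helper below is a hypothetical sketch, not part of any SDK. It also rejects values containing line breaks, which are not valid in HTTP header values.

```python
import os

def build_tag_headers(feature_name, tenant_id=None, environment=None):
    """Build the attribution headers used throughout this guide."""
    headers = {
        "X-Feature-Name": feature_name,
        "X-Environment": environment or os.environ.get("APP_ENV", "production"),
    }
    if tenant_id is not None:
        headers["X-Tenant-ID"] = str(tenant_id)
    for name, value in headers.items():
        if not value or "\r" in value or "\n" in value:
            raise ValueError(f"invalid value for {name}: {value!r}")
    return headers


client_headers = build_tag_headers("support-bot", environment="staging")
request_headers = build_tag_headers("support-bot", tenant_id=42, environment="staging")
print(client_headers)
print(request_headers)
```

Headers that are fixed per feature can be passed as `default_headers` when constructing the OpenAI Python client, or as `headers` to `httpx.Client`. Values that change per request, such as the tenant ID, are better supplied per call, for example through `extra_headers`.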

Frequently Asked Questions

What's the minimum tagging setup that gives useful insights?

Tag feature_name and environment on every request. According to OpenAI's production best practices guide, even basic call metadata reduces debugging time significantly. Those two fields immediately show cost per feature and filter out staging noise. Add tenant_id when you need per-customer attribution. Start minimal and expand.

Does tagging add latency to my API calls?

Not in any way you'd notice. Headers are processed by the proxy in under 1ms, and they are stripped before forwarding, so the request the LLM provider sees is unchanged. A handful of short headers adds on the order of a hundred bytes to each request, which is negligible next to model response times.

How do I retroactively add tags to existing features?

Update each feature to pass the new headers to your HTTP client. This is typically a one-line change per feature. If you're instrumenting dozens of endpoints, start with your highest-cost features first. Pull 30 days of raw logs, sort by token count descending, and start at the top.

Can I use tags for automated routing and optimization?

Yes. Your routing config can use feature tags to select models: if feature == "classification" → use gpt-4o-mini instead of gpt-4o. That single rule can cut classification costs by more than 90%, since gpt-4o-mini costs $0.15/1M input tokens vs $2.50/1M for gpt-4o (OpenAI, 2024). Tags become the routing key for every downstream optimization.
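A tag-based routing rule can be as small as a lookup table. This sketch assumes a hypothetical `ROUTES` mapping and default model, and it checks the input-price saving using the two prices quoted above.

```python
# Feature -> model overrides; anything not listed uses the default model.
ROUTES = {"classification": "gpt-4o-mini"}
DEFAULT_MODEL = "gpt-4o"

def select_model(feature_name):
    """Pick the model for a request based on its feature tag."""
    return ROUTES.get(feature_name, DEFAULT_MODEL)

# Input prices quoted in this article, USD per 1M tokens.
INPUT_PRICES = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}
saving = 1 - INPUT_PRICES["gpt-4o-mini"] / INPUT_PRICES["gpt-4o"]

print(select_model("classification"))  # gpt-4o-mini
print(select_model("support-bot"))     # gpt-4o
print(f"input-price saving: {saving:.0%}")  # 94%
```

Price is only half of the decision. A smaller model should be checked against your own quality bar for the task before the rule ships.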

What's the difference between feature tags and tenant tags?

Feature tags answer "which product capability is this?" Tenant tags answer "which customer is this?" You need both. Feature tags drive optimization decisions. Tenant tags drive billing and per-customer budget caps. They're complementary, not interchangeable. See our multi-tenant cost isolation guide for the full tenant tagging pattern.

From Feature Flags to Cost Clarity

Per-feature cost attribution is the single most impactful monitoring change most SaaS teams can make. It costs one afternoon to implement. It makes every subsequent optimization decision faster and more precise.

Tag every LLM call. Know which feature owns every dollar of your AI spend. The Benchmarkit/Mavvrik 2025 research found that companies with mature AI cost tracking - the 34% who've done this work - miss forecasts far less often and spend less time in reactive cost-cutting mode. The other 66% are still guessing.

The jump from "we see total spend" to "we see spend per feature" takes one afternoon. Start with your top three features. Add X-Feature-Name to each HTTP client config. Let the data show you where to focus next. See how Tokonomics works for the architecture behind tag-based cost attribution, or get started in 5 minutes.

About the author: Zouhair Ait Oukhrib is the founder of Tokonomics and a software engineer with over a decade of experience building SaaS infrastructure. He writes about AI cost management, LLM observability, and the practical side of scaling AI features in production.

All sources retrieved June 2026.