Which LLM Provider Is Quietly Getting More Expensive? - Tokonomics

You checked your LLM API pricing last month. Maybe two months ago. You picked a model, budgeted around it, and moved on.

Here's the problem: the price you budgeted for might not be the price you're paying anymore.

Between January and June 2026, OpenAI, Anthropic, and Google made 14 combined pricing changes across their model lineups. Some prices dropped. Some crept up. A few disappeared entirely when models got deprecated and replaced by pricier successors.

None of them sent you an email about it.

The changes nobody talks about

Let's start with what actually moved.

OpenAI retired GPT-4 Turbo in Q1 2026. If your code still pointed at gpt-4-turbo, it silently rerouted to GPT-4o. Same name in your logs, different price. GPT-4o is cheaper per token than the old Turbo: the output rate dropped from $0.03 to $0.01 per 1K tokens ($30/M to $10/M). Sounds like a pure win until you realize your prompts were tuned for Turbo's behavior, and GPT-4o generates 30-40% more output tokens on the same prompt. Your tokens per call went up while the per-token price went down, so the saving per call is smaller than the rate cut suggests, and a budget built on the headline rate alone is off.
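Whether a cheaper rate actually lowers the cost of a call depends on how the rate change compares with the change in token volume. A minimal sketch of that arithmetic, using the output rates and growth range quoted above; the function names are hypothetical, and the two rates only need to share a unit:

```python
def output_cost_ratio(old_rate: float, new_rate: float, token_growth: float) -> float:
    """Return the new output cost per call divided by the old one.

    old_rate and new_rate are output prices in the same unit;
    token_growth is the fractional increase in output tokens (0.35 = 35% more).
    """
    return (new_rate / old_rate) * (1.0 + token_growth)


def break_even_growth(old_rate: float, new_rate: float) -> float:
    """Token growth at which a rate cut is fully cancelled out."""
    return old_rate / new_rate - 1.0


if __name__ == "__main__":
    # Rates and growth range taken from the paragraph above.
    for growth in (0.30, 0.40):
        ratio = output_cost_ratio(0.03, 0.01, growth)
        print(f"{growth:.0%} more tokens -> {ratio:.2f}x the old output cost")
    print(f"break-even growth: {break_even_growth(0.03, 0.01):.0%}")
```

Run it against your own measured token growth rather than the rate card: the ratio is the share of the old output cost you still pay per call, and the break-even value is the growth at which the rate cut disappears.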

Anthropic launched Claude Sonnet 4 in May 2026 at $3.00/M input. Claude Sonnet 3.5 was $3.00/M too, so same price, right? Not quite. Sonnet 4 uses extended thinking by default on complex queries, and thinking tokens are billed at the regular output rate. A prompt that cost $0.04 on Sonnet 3.5 can cost $0.12 on Sonnet 4 because of the invisible thinking overhead. That's three times the cost, and nothing changed in your code.
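To see how hidden thinking tokens can triple the cost of a call, here is a rough sketch. The input rate is the one quoted above; the output rate and all token counts are assumptions chosen for illustration, and the `Usage` type is hypothetical rather than any SDK's response object:

```python
from dataclasses import dataclass

# Rates in USD per million tokens. The output rate is an assumption.
INPUT_RATE_PER_M = 3.00
OUTPUT_RATE_PER_M = 15.00


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int        # tokens visible in the response
    thinking_tokens: int = 0  # hidden reasoning tokens, billed as output


def call_cost(usage: Usage) -> float:
    """Cost of one call in USD, billing thinking tokens at the output rate."""
    billed_output = usage.output_tokens + usage.thinking_tokens
    return (
        usage.input_tokens * INPUT_RATE_PER_M
        + billed_output * OUTPUT_RATE_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Same prompt and same visible answer; only the thinking tokens differ.
    plain = Usage(input_tokens=5_000, output_tokens=1_000)
    thinking = Usage(input_tokens=5_000, output_tokens=1_000, thinking_tokens=4_000)
    base, total = call_cost(plain), call_cost(thinking)
    print(f"no thinking: ${base:.3f}  with thinking: ${total:.3f}  ({total / base:.1f}x)")
```

The visible response is identical in both cases, which is why the increase only shows up if you log the token breakdown for each call.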

Google kept Gemini 2.5 Flash at $0.15/M input. Great price. But they added a context length surcharge most teams missed: anything over 128K tokens of context doubles the input rate to $0.30/M. If you're doing RAG with long documents, the input cost of those long requests is 2x what the pricing page headline says.
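A context surcharge like this is a tiered rate. The sketch below uses the two rates quoted above and makes two assumptions that you should check against the provider's pricing page: that "128K" means 128,000 tokens, and that the higher rate applies to the whole prompt once it crosses the threshold, not just to the excess.

```python
SURCHARGE_THRESHOLD_TOKENS = 128_000  # assumed meaning of "128K"
BASE_INPUT_RATE_PER_M = 0.15          # USD per million input tokens
LONG_INPUT_RATE_PER_M = 0.30          # doubled rate above the threshold


def input_cost(prompt_tokens: int) -> float:
    """Input cost in USD, assuming the higher rate covers the whole prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    rate = (
        LONG_INPUT_RATE_PER_M
        if prompt_tokens > SURCHARGE_THRESHOLD_TOKENS
        else BASE_INPUT_RATE_PER_M
    )
    return prompt_tokens * rate / 1_000_000


if __name__ == "__main__":
    for tokens in (100_000, 128_000, 128_001, 200_000):
        print(f"{tokens:>7} tokens -> ${input_cost(tokens):.4f}")
```

Under these assumptions the cost jumps at the threshold, so trimming a RAG prompt that sits just above it roughly halves the input cost of that call.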

Why your bill doesn't match the pricing page

The pricing page shows you per-token rates. Your bill reflects what actually happened — and there's a gap between the two that grows every month.

Three things cause the gap:

Model deprecation roulette. When a provider sunsets a model, your API calls don't fail. They silently redirect to the successor. The successor might cost more, generate more tokens, or behave differently enough that your prompts produce longer outputs. You won't see it unless you're tracking cost per call, not just total spend.

Hidden token categories. Thinking tokens, cached tokens, system prompt tokens: these didn't exist two years ago. Now they each have their own rate. Anthropic charges the full output rate for thinking tokens. Google gives you 75% off cached tokens but charges 2x for long context. The headline price is just one number in a matrix of five or six.

Quiet feature changes. OpenAI's structured output mode, Anthropic's extended thinking, Google's code execution — these features alter how many tokens a response contains. When a provider enables a feature by default on a new model version, your token count changes without you doing anything.
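Taken together, these three causes mean the cost of a call depends on a per-category token breakdown, not on one headline rate. A minimal sketch of pricing a call from such a matrix follows. The model names, categories and rates are placeholders, not any provider's real price list or response format.

```python
from typing import Dict

# Hypothetical rate matrix: USD per million tokens, by model and token category.
RATES: Dict[str, Dict[str, float]] = {
    "model-a": {"input": 3.00, "cached_input": 0.30, "output": 15.00, "thinking": 15.00},
    "model-b": {"input": 0.15, "cached_input": 0.0375, "output": 0.60, "thinking": 0.60},
}


def price_call(model: str, tokens: Dict[str, int]) -> Dict[str, float]:
    """Return the USD cost per token category plus a 'total' entry.

    Raises KeyError for an unknown model or category, so a new billing
    category cannot slip through unpriced.
    """
    rates = RATES[model]
    breakdown = {cat: count * rates[cat] / 1_000_000 for cat, count in tokens.items()}
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    usage = {"input": 2_000, "cached_input": 8_000, "output": 500, "thinking": 3_000}
    for category, cost in price_call("model-a", usage).items():
        print(f"{category:>12}: ${cost:.4f}")
```

Failing loudly on an unknown category is a deliberate choice here: when a provider adds a new billing category, you want an error in your cost code, not a silent zero.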

Who actually got more expensive

If you froze your code in January 2026 and checked your June bill, here's what likely happened:

You're paying more if you use Claude for complex reasoning (thinking token overhead), send long documents to Gemini (context surcharge), or relied on a deprecated model that got rerouted.

You're paying less if you switched to Gemini 2.5 Flash for simple tasks (genuinely cheap at $0.15/M input), or you're using DeepSeek V3, which hasn't changed pricing since launch.

You have no idea if you're not tracking cost per call. And that's most teams. A 2026 survey by a16z found that 71% of companies using LLM APIs don't track spending at the individual call level (a16z, 2026). They see one line item on a monthly invoice and hope it looks reasonable.
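Tracking at the call level does not need much machinery. The sketch below assumes you already log, for every call, the model you asked for, the model name reported back in the response, and a computed cost. The `CallRecord` type and its field names are hypothetical, and whether a redirect is visible in the response depends on the provider.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class CallRecord:
    requested_model: str  # the model name your code sent
    served_model: str     # the model name reported back in the response
    cost_usd: float       # cost computed for this single call


def cost_per_call(records: Iterable[CallRecord]) -> Dict[Tuple[str, str], float]:
    """Mean cost per call, grouped by (requested, served) model pair."""
    totals: Dict[Tuple[str, str], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        bucket = totals[(r.requested_model, r.served_model)]
        bucket[0] += r.cost_usd  # running cost
        bucket[1] += 1           # running call count
    return {pair: cost / count for pair, (cost, count) in totals.items()}


def redirected_pairs(records: Iterable[CallRecord]) -> List[Tuple[str, str]]:
    """Distinct pairs where the served model differs from the requested one."""
    return sorted({(r.requested_model, r.served_model)
                   for r in records if r.requested_model != r.served_model})


if __name__ == "__main__":
    log = [
        CallRecord("old-model", "old-model", 0.010),
        CallRecord("old-model", "new-model", 0.020),
        CallRecord("old-model", "new-model", 0.040),
    ]
    print(cost_per_call(log))
    print(redirected_pairs(log))
```

Grouping by the pair rather than by a single model name keeps rerouted traffic in its own bucket, so a change in cost per call is not averaged away inside the monthly total.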

The problem isn't that providers are being sneaky. They publish every price change. The problem is that nobody is watching — and by the time you check, three months of drift have already hit your budget.
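Watching for drift can be as simple as comparing the mean cost per call in a recent window against a baseline window. A minimal sketch, assuming you have per-call costs for both windows; the function names and the 10% threshold are arbitrary illustrative choices:

```python
from statistics import mean
from typing import Optional, Sequence


def cost_drift(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Fractional change in mean cost per call (0.25 means 25% higher)."""
    if not baseline or not current:
        raise ValueError("both windows need at least one call")
    base = mean(baseline)
    if base <= 0:
        raise ValueError("baseline mean cost must be positive")
    return mean(current) / base - 1.0


def drift_alert(baseline: Sequence[float], current: Sequence[float],
                threshold: float = 0.10) -> Optional[str]:
    """Return an alert message if drift exceeds the threshold, else None."""
    drift = cost_drift(baseline, current)
    if drift > threshold:
        return f"cost per call up {drift:.0%} vs baseline"
    return None


if __name__ == "__main__":
    january = [0.04, 0.04, 0.04]  # per-call costs in the baseline window
    june = [0.12, 0.12, 0.12]     # per-call costs in the current window
    print(drift_alert(january, june))
```

Running a check like this daily per model is what turns three months of unnoticed drift into a same-week alert.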

If your AI bill surprised you this month, you're not alone. Tokonomics tracks every API call by model, feature, and cost — with alerts before the invoice arrives, not after.

Pricing data current as of June 28, 2026. Check provider pricing pages for latest rates.