TL;DR — PAYG is cheaper when volume is unpredictable or under 50M tokens/month. Committed use saves 20–50% when your volume is consistent and high. Rule of thumb: if your last 3 months of spend vary by less than 20%, negotiating committed pricing is worth the conversation.
LLM providers offer two pricing models: pay-as-you-go (PAYG) and committed use. With PAYG, you pay per token with no upfront commitment. With committed use, you prepay or guarantee a minimum spend in exchange for lower per-token rates.
The difference is significant. OpenAI's enterprise agreements can reduce costs by 20-30%. Google Cloud's Vertex AI offers up to 50% off with committed use discounts. But lock in too early or at the wrong volume, and you end up paying for capacity you don't use.
This article breaks down how each pricing model works across major providers, gives you the math to calculate your break-even point, and tells you when each option makes financial sense.
How pay-as-you-go works
PAYG is the default for every LLM provider. You sign up, get an API key, and pay for exactly what you use. No minimums, no commitments, no risk of paying for unused capacity.
Current PAYG rates for popular models (June 2026):
| Provider | Model | Input (per 1M) | Output (per 1M) |
|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 |
| Anthropic | Claude Sonnet 4 | $3.00 | $15.00 |
| Anthropic | Claude Haiku 3.5 | $0.80 | $4.00 |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 |
| Google | Gemini 2.5 Flash | $0.15 | $0.60 |
| DeepSeek | DeepSeek V3 | $0.27 | $1.10 |
For the complete pricing across all models and providers, see our LLM API pricing guide 2026.
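To turn the table into a monthly estimate, multiply token counts by the per-1M rates. The sketch below is a minimal illustration, assuming the rates listed above; the dictionary keys are labels chosen for this example, not official API model identifiers, and the token volumes are made up.

```python
# PAYG rates in USD per 1M tokens (input, output), copied from the table above.
# Keys are illustrative labels, not official API model IDs.
PAYG_RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.15, 0.60),
    "deepseek-v3": (0.27, 1.10),
}


def payg_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the PAYG cost in USD for the given token counts."""
    input_rate, output_rate = PAYG_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical month: 40M input tokens and 10M output tokens on GPT-4o.
monthly = payg_cost("gpt-4o", 40_000_000, 10_000_000)
print(f"${monthly:,.2f}")
```

Output tokens cost several times more than input tokens on every model in the table, so an estimate based on total tokens alone can be well off.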
Advantages of PAYG:
- Zero risk — pay only for what you use
- Switch models or providers instantly
- No minimum commitment
- Scale up or down without penalty
Disadvantages:
- Highest per-token price
- No volume discounts (usage tiers raise rate limits but do not lower prices)
- Costs are unpredictable if usage varies
How committed use works
Committed use plans vary by provider. Here's what each offers:
OpenAI: Usage tiers + prepaid credits
OpenAI doesn't offer traditional committed use contracts for most customers. Instead, they use a tier system based on cumulative spend:
| Tier | Spend threshold | Benefit |
|---|---|---|
| Tier 1 | $5+ | Base rate limits |
| Tier 2 | $50+ | 2x rate limits |
| Tier 3 | $100+ | 3x rate limits |
| Tier 4 | $250+ | Higher limits |
| Tier 5 | $1,000+ | Highest limits |
The tiers increase your rate limits (requests per minute, tokens per minute) but don't reduce per-token pricing. For actual price reductions, OpenAI offers enterprise agreements with custom pricing for high-volume customers — typically $10,000+/month spend.
OpenAI also sells prepaid credit packages. Buying $1,000+ in credits upfront doesn't currently give a discount, but it does prevent billing surprises and makes budgeting easier.
Anthropic: Volume commitments
Anthropic offers volume-based pricing for teams spending $5,000+/month. The structure is negotiated directly with their sales team. Typical discounts range from 10-25% depending on committed monthly volume and contract length (annual vs month-to-month).
Anthropic's Build and Scale tiers increase rate limits automatically based on usage, similar to OpenAI's tier system.
Google Cloud (Vertex AI): Committed use discounts
Google offers the most structured committed use pricing through Vertex AI:
- 1-year commitment: up to 20% discount on inference costs
- 3-year commitment: up to 40-50% discount
- Provisioned throughput: guaranteed capacity at a fixed monthly price, independent of actual usage
Google's model is closest to traditional cloud committed use (like EC2 Reserved Instances). You commit to a minimum monthly spend, and in return, you get a lower per-token rate for the duration of the contract.
Amazon Bedrock: Provisioned throughput
AWS offers provisioned throughput for models on Bedrock. You buy a fixed number of model units (processing capacity) at a monthly rate, regardless of how many tokens you process:
- No per-token charges — you pay for capacity, not usage
- Guaranteed latency and throughput
- Best for predictable, high-volume workloads
The break-even point is roughly 50-60% utilization. Below that, PAYG is cheaper. Above that, provisioned throughput saves money.
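The utilization threshold can be estimated by comparing the fixed monthly fee with what the same capacity would cost at PAYG rates. The sketch below assumes a single blended per-token rate. All three input numbers are hypothetical, chosen only to land in the 50-60% range mentioned above; they are not AWS prices.

```python
def provisioned_break_even(
    monthly_fee: float,
    capacity_mtok: float,
    payg_rate_per_mtok: float,
) -> float:
    """Utilization (0..1) at which provisioned capacity costs the same as PAYG.

    monthly_fee: fixed monthly price of the provisioned capacity (USD)
    capacity_mtok: tokens the capacity can process per month, in millions
    payg_rate_per_mtok: blended PAYG price per 1M tokens (USD)
    """
    full_capacity_payg_cost = capacity_mtok * payg_rate_per_mtok
    return monthly_fee / full_capacity_payg_cost


# Hypothetical figures, not real Bedrock pricing.
threshold = provisioned_break_even(
    monthly_fee=22_000, capacity_mtok=10_000, payg_rate_per_mtok=4.0
)
print(f"Break-even utilization: {threshold:.0%}")
```

If your measured utilization sits below the computed threshold, PAYG is the cheaper option for that workload.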
The break-even calculation
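The decision to commit comes down to one number: the utilization at which committed pricing starts to beat PAYG. How consistent your monthly spend is determines whether you stay above it.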
Formula:
Break-even utilization = committed price / PAYG price
Example with Google Vertex AI:
- PAYG rate for Gemini 2.5 Pro: $1.25/1M input tokens
- 1-year committed rate (20% off): $1.00/1M input tokens
- Committed monthly minimum: $5,000
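If your actual monthly usage, valued at PAYG rates, is (assuming you are billed the greater of the $5,000 minimum and your usage at the committed rate):
- $6,000/month → Usage at the committed rate comes to $4,800, so you pay the $5,000 minimum and save $1,000/month. Worth it.
- $4,000/month → You pay $5,000 for $4,000 worth of usage. You lose $1,000/month. Not worth it.
- $5,500/month → Usage at the committed rate comes to $4,400, so you pay the $5,000 minimum. Net savings: $500/month. Modest.
The full 20% saving only applies once usage passes $6,250/month at PAYG rates, the point where usage at the committed rate reaches the minimum.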
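The rule of thumb: Commit only when your lowest expected monthly usage exceeds 80% of the volume the commitment buys, which is the break-even utilization from the formula above ($1.00 / $1.25). For the $5,000 minimum in this example, that means at least $5,000/month of usage at PAYG rates. If your usage fluctuates between $3,000 and $8,000/month, a $5,000 commitment will cost you money in low months.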
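The break-even arithmetic can be checked with a few lines of code. This is a sketch under one stated assumption: each month you are billed the greater of the commitment minimum and your usage priced at the committed rate. Real contracts may define the minimum differently, so treat the function names and billing rule as illustrative.

```python
def break_even_utilization(committed_rate: float, payg_rate: float) -> float:
    """Share of committed volume you must use for the commitment to match PAYG."""
    return committed_rate / payg_rate


def committed_bill(payg_value: float, discount: float, monthly_minimum: float) -> float:
    """Monthly bill under a commitment (assumed rule: max of minimum and discounted usage)."""
    return max(monthly_minimum, payg_value * (1 - discount))


def monthly_savings(
    payg_value: float, discount: float = 0.20, monthly_minimum: float = 5_000
) -> float:
    """Positive when the commitment is cheaper than PAYG, negative when it costs more."""
    return payg_value - committed_bill(payg_value, discount, monthly_minimum)


print(f"Break-even utilization: {break_even_utilization(1.00, 1.25):.0%}")
for usage in (4_000, 5_500, 6_000, 8_000):
    print(f"${usage:,} of PAYG usage -> savings ${monthly_savings(usage):,.0f}/month")
```

Under this billing rule the commitment breaks even at exactly $5,000 of PAYG-valued usage and reaches the full 20% saving from $6,250 upward.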
When PAYG makes sense
You're in the first 6 months of production. Your usage patterns haven't stabilized. You don't know which models you'll use in 3 months. Committing now locks you into pricing for models you might not use.
Your usage varies more than 30% month to month. Seasonal products, campaign-driven traffic, or features still being tested. A commitment sized for your peak month wastes money in every other month.
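One way to put a number on month-to-month variation is the spread between the highest and lowest month, relative to the average. That definition is an assumption made for this sketch; the article does not prescribe a formula, and the spend figures are invented.

```python
from statistics import mean


def spend_variation(monthly_spend: list[float]) -> float:
    """Spread between the highest and lowest month, as a fraction of the mean."""
    return (max(monthly_spend) - min(monthly_spend)) / mean(monthly_spend)


stable = [5_200, 4_900, 5_500]    # hypothetical last three months (USD)
volatile = [3_000, 8_000, 5_000]  # hypothetical campaign-driven product

for name, history in (("stable", stable), ("volatile", volatile)):
    variation = spend_variation(history)
    verdict = "consider committing" if variation < 0.30 else "stay on PAYG"
    print(f"{name}: {variation:.0%} variation -> {verdict}")
```

A standard-deviation-based measure would work as well; what matters is applying the same definition every time you review the numbers.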
You're actively optimizing. If you're mid-way through switching from GPT-4o to GPT-4o-mini for 60% of your calls, your costs are about to drop significantly. Don't commit at today's spend level when next month's spend will be 40% lower.
You use multiple providers. Committing to one provider reduces your flexibility to route requests to the cheapest option. If you're using a multi-provider strategy (GPT-4o for complex tasks, DeepSeek for simple ones), PAYG across all providers gives you maximum flexibility.
Your total LLM spend is under $2,000/month. The administrative overhead of negotiating and managing a committed use contract isn't worth the savings at low volumes. At $2,000/month with a 20% discount, you save $400/month — less than the cost of the engineering time to set it up and manage it.
When committed use makes sense
Your monthly spend is stable and above $5,000. Three months of consistent $5,000+ spend means your usage patterns are established. A 20% discount saves $1,000+/month.
You're locked into one provider. If your entire app is built on OpenAI's function calling format and you're not planning to switch, the flexibility argument for PAYG doesn't apply.
You need guaranteed throughput. Provisioned throughput (Google, AWS) guarantees capacity during peak hours. PAYG can throttle you with rate limits during high-demand periods. If your app can't tolerate rate limits, provisioned throughput is both a performance and cost decision.
You have predictable growth. If your usage grows 10-15% per month consistently, you can commit to a level that matches your 3-month-forward projection and still save on every month in between.
The hybrid approach
Most teams that optimize well use a hybrid model:
- Base load on committed pricing. Your minimum monthly usage goes on a committed plan at discounted rates. If you always use at least 50M tokens/month of GPT-4o, commit to that floor.
- Burst capacity on PAYG. Usage above your committed floor stays on PAYG. You pay full price for the overflow, but your base is discounted.
- Alternative models on PAYG. Your secondary models (GPT-4o-mini for simple tasks, DeepSeek for cost-sensitive batch jobs) stay on PAYG because their usage is less predictable.
This approach captures 60-80% of the savings from full commitment while maintaining flexibility for the variable portion of your usage.
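The hybrid split can be sketched as a committed floor billed at the discounted rate plus overflow billed at PAYG. The example assumes a 50M-token floor on GPT-4o input tokens at $2.50 per 1M, a 20% discount, and a hypothetical 70M-token month; the discount and usage figures are illustrative.

```python
def hybrid_cost(
    usage_mtok: float, floor_mtok: float, payg_rate: float, discount: float
) -> float:
    """Monthly cost with a committed floor plus PAYG overflow (rates in USD per 1M tokens)."""
    committed_part = floor_mtok * payg_rate * (1 - discount)  # paid even if usage dips below the floor
    overflow_mtok = max(0.0, usage_mtok - floor_mtok)
    return committed_part + overflow_mtok * payg_rate


usage, floor, rate, discount = 70, 50, 2.50, 0.20  # hypothetical month

payg_only = usage * rate
full_commit = usage * rate * (1 - discount)  # idealized: every token at the discounted rate
hybrid = hybrid_cost(usage, floor, rate, discount)

captured = (payg_only - hybrid) / (payg_only - full_commit)
print(f"PAYG ${payg_only:.2f}, hybrid ${hybrid:.2f}, full commitment ${full_commit:.2f}")
print(f"Share of full-commitment savings captured: {captured:.0%}")
```

The share captured depends on how much of your usage sits above the floor: the larger the overflow, the smaller the share.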
How to prepare for a committed use negotiation
If you decide to commit, come to the negotiation with data:
- Three months of usage history. Total tokens, broken down by model. If you don't have this, run a monthly audit for three months first.
- Growth projection. Where will your usage be in 6 and 12 months? Base it on product roadmap, not optimism.
- Model migration plans. If you're planning to switch models, don't commit to the old one. Wait until after the migration.
- Usage floor, not average. Commit to your lowest expected month, not your average. Overshoot is pure waste.
- Exit clauses. Models improve and pricing changes fast. A 3-year commitment on 2026 pricing may look bad when 2027 models are 3x cheaper. Negotiate annual price reviews or early termination clauses.
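The "floor, not average" point is easy to see once the usage history is laid out per model. The figures below are hypothetical monthly token counts (in millions), and the dictionary keys are illustrative labels.

```python
from statistics import mean

# Hypothetical three-month history, in millions of tokens per month.
history = {
    "gpt-4o": [52, 48, 55],
    "gpt-4o-mini": [30, 12, 21],
}


def usage_floor(history: dict[str, list[float]]) -> dict[str, float]:
    """Lowest observed month per model: the candidate commitment level."""
    return {model: min(months) for model, months in history.items()}


floors = usage_floor(history)
for model, months in history.items():
    print(f"{model}: floor {floors[model]}M, average {mean(months):.1f}M")
```

In this made-up data the GPT-4o floor is close to its average, which makes it a reasonable commitment candidate, while the GPT-4o-mini floor is far below its average and is better left on PAYG.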
Tracking costs across pricing tiers
Whether you're on PAYG, committed use, or a hybrid, you need cost visibility. Provider dashboards show total spend but don't break it down by feature, team, or customer — the dimensions that actually drive decisions.
Tokonomics tracks every API call with exact token counts and costs, regardless of which provider or pricing tier you're on. It works as a proxy between your app and any LLM provider, so your cost data is centralized even if you're using OpenAI on committed pricing and DeepSeek on PAYG simultaneously.
The per-feature cost tracking is especially useful for committed use decisions. If you can see that your chatbot feature consistently uses 40M GPT-4o tokens/month and your summarizer uses 8M, you know exactly what to commit to and what to leave on PAYG.
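Whatever tool collects the data, the underlying aggregation is a group-by over call records. The sketch below uses a made-up record format (`feature`, `model`, `input_tokens`, `output_tokens`); it is not the Tokonomics API or export format, only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical call log; the field names are assumptions for this example.
calls = [
    {"feature": "chatbot", "model": "gpt-4o", "input_tokens": 1_200, "output_tokens": 400},
    {"feature": "chatbot", "model": "gpt-4o", "input_tokens": 900, "output_tokens": 350},
    {"feature": "summarizer", "model": "gpt-4o", "input_tokens": 5_000, "output_tokens": 300},
    {"feature": "summarizer", "model": "deepseek-v3", "input_tokens": 4_000, "output_tokens": 250},
]


def tokens_by_feature(calls: list[dict]) -> dict[tuple[str, str], int]:
    """Total tokens per (feature, model) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for call in calls:
        key = (call["feature"], call["model"])
        totals[key] += call["input_tokens"] + call["output_tokens"]
    return dict(totals)


for (feature, model), total in sorted(tokens_by_feature(calls).items()):
    print(f"{feature} / {model}: {total:,} tokens")
```

Grouping by both feature and model matters because a commitment is made per provider, while the stability of usage is a property of the feature.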
Budget alerts and hard spending caps work independently of your provider's pricing model. They track your actual costs in real time, so you know when you're approaching your budget whether you're on PAYG or committed rates.
The decision checklist
Before choosing a pricing model:
- [ ] Do I have 3+ months of stable usage data?
- [ ] Is my monthly LLM spend above $5,000?
- [ ] Does my usage vary less than 30% month to month?
- [ ] Am I committed to a single provider for the next 12 months?
- [ ] Does the discount exceed the risk of overcommitting?
If you answered yes to all five, committed use will save you money. If you answered no to any of them, stay on PAYG and revisit in 3 months.
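The checklist reduces to an all-or-nothing test, which can be written down directly. The class and field names below are invented for this sketch, and the last item remains a judgment call that you have to answer yourself.

```python
from dataclasses import astuple, dataclass


@dataclass
class CommitChecklist:
    has_three_months_stable_data: bool
    spend_above_5000: bool
    variation_below_30_percent: bool
    single_provider_next_12_months: bool
    discount_exceeds_overcommit_risk: bool


def recommend(checklist: CommitChecklist) -> str:
    """Committed use only if every answer is yes; otherwise PAYG."""
    return "committed use" if all(astuple(checklist)) else "PAYG, revisit in 3 months"


answers = CommitChecklist(
    has_three_months_stable_data=True,
    spend_above_5000=True,
    variation_below_30_percent=True,
    single_provider_next_12_months=False,  # hypothetical: still routing across providers
    discount_exceeds_overcommit_risk=True,
)
print(recommend(answers))
```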
LLM pricing is dropping 2-3x per year. Locking in today's price for three years almost never makes sense. Annual commitments with price review clauses are the sweet spot — you capture this year's discount without betting against next year's price drops.
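A rough comparison shows why the annual option tends to come out ahead. The sketch below rests on strong simplifying assumptions: token volume stays constant, PAYG prices step down by a fixed factor at each year boundary, the 3-year lock fixes a discount on today's price, and the annual commitment is renewed each year at a discount on the then-current PAYG price. The spend level and discount values are illustrative.

```python
def three_year_costs(
    annual_payg_cost: float,
    yearly_price_drop: float,
    lock_discount: float,
    annual_discount: float,
) -> tuple[float, float, float]:
    """Return total 3-year cost for (pure PAYG, 3-year lock, annually renewed commitment)."""
    payg = locked = annual = 0.0
    for year in range(3):
        current_year_payg = annual_payg_cost / yearly_price_drop**year
        payg += current_year_payg
        locked += annual_payg_cost * (1 - lock_discount)   # stuck at today's price
        annual += current_year_payg * (1 - annual_discount)  # repriced every year
    return payg, locked, annual


# Hypothetical: $60,000/year at today's PAYG prices, 50% off for 3 years vs 20% off annually.
for drop in (2, 3):
    payg, locked, annual = three_year_costs(60_000, drop, 0.50, 0.20)
    print(f"{drop}x/year: PAYG ${payg:,.0f}, 3-year lock ${locked:,.0f}, annual ${annual:,.0f}")
```

Under these assumptions the annually renewed commitment is cheapest at both rates of decline, and at 3x per year even plain PAYG beats the 3-year lock.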
For teams still figuring out their usage patterns, start with PAYG, estimate your costs before deploying new features, and audit monthly. The data you collect now is what makes a committed use negotiation possible later.
Last updated June 2026. All sources retrieved June 2026.