June 6, 2026 · 9 min read

How to Avoid Vendor Lock-In with AI API Services


TL;DR — Wrap every LLM call behind an abstraction layer (adapter pattern). Never call openai.chat.completions.create() directly from business logic. Use an LLM proxy (like Tokonomics) that routes by model name — then switching providers is a config change, not a rewrite.

Vendor lock-in with LLM APIs is more dangerous than with traditional cloud services. Providers change pricing quarterly. A provider that's cheapest today may double its prices tomorrow. A model that's state-of-the-art this month gets surpassed next month. If switching providers requires rewriting your application, you're stuck paying whatever the current vendor charges.

The LLM market moves too fast to bet on one provider. OpenAI dropped GPT-4o pricing by 50% in a single announcement. DeepSeek launched with prices 18x lower than GPT-4o for comparable quality. If your architecture only works with OpenAI, you can't capitalize on these shifts.

This article covers five patterns that keep you provider-independent, ranked from simplest to most robust.

How lock-in happens with LLM APIs

Lock-in sneaks in through four doors:

1. Provider-specific SDKs. Using openai.chat.completions.create() directly means your code is coupled to OpenAI's SDK. Switching to Anthropic means rewriting every call to use anthropic.messages.create() — different parameters, different response format, different error handling.

2. Provider-specific features. OpenAI's function calling format differs from Anthropic's tool use format. Features such as OpenAI's JSON mode, response format constraints, and logprobs are not available on every provider, or work differently where they are. Build on these, and switching means rebuilding those features.

3. Fine-tuned models. A model fine-tuned on OpenAI can't be exported or run on Anthropic. Your training data and the resulting model are locked to that provider.

4. Prompt engineering for one model. Prompts optimized for GPT-4o may produce worse results on Claude or DeepSeek. Different models respond differently to the same instructions. Heavy prompt engineering for one model creates soft lock-in.
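
Door 2 is easiest to see with tool definitions. The sketch below converts an OpenAI-style function tool into the flat shape Anthropic's tool use expects. The helper name `openai_tool_to_anthropic` and the `get_weather` tool are illustrative assumptions, not part of either SDK.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert one OpenAI-style function tool into an Anthropic-style tool.

    OpenAI nests the definition under "function" and names the JSON Schema
    "parameters"; Anthropic uses a flat object and names it "input_schema".
    """
    if tool.get("type") != "function":
        raise ValueError(f"Unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Fall back to an empty object schema when no parameters are declared
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


# Hypothetical tool, used only for illustration
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(weather_tool)
```

The definition is the easy half. The tool-call responses also differ between providers, so a full adapter would need to translate those as well.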

Pattern 1: Abstraction layer (simplest)

Wrap your LLM calls in a provider-agnostic function that hides the SDK details:

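# llm_client.py: your abstraction layer
import os

import anthropic
import openai


class LLMClient:
    def __init__(self, provider="openai", model="gpt-4o-mini"):
        self.provider = provider
        self.model = model

    def complete(self, messages, max_tokens=1000, temperature=0.7):
        if self.provider == "openai":
            return self._openai_complete(messages, max_tokens, temperature)
        elif self.provider == "anthropic":
            return self._anthropic_complete(messages, max_tokens, temperature)
        elif self.provider == "deepseek":
            return self._deepseek_complete(messages, max_tokens, temperature)
        raise ValueError(f"Unsupported provider: {self.provider}")

    def _openai_complete(self, messages, max_tokens, temperature, client=None):
        # The default client reads OPENAI_API_KEY from the environment
        client = client or openai.OpenAI()
        response = client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
        )
        return {
            "content": response.choices[0].message.content,
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens,
            "model": self.model,
        }

    def _deepseek_complete(self, messages, max_tokens, temperature):
        # DeepSeek accepts the OpenAI request format, so reuse the OpenAI path.
        # DEEPSEEK_API_KEY is an assumed environment variable name.
        client = openai.OpenAI(
            base_url="https://api.deepseek.com/v1",
            api_key=os.environ["DEEPSEEK_API_KEY"],
        )
        return self._openai_complete(messages, max_tokens, temperature, client=client)

    def _anthropic_complete(self, messages, max_tokens, temperature):
        # Convert OpenAI message format to Anthropic format:
        # the system prompt is a top-level parameter, not a message
        system = next((m["content"] for m in messages if m["role"] == "system"), None)
        user_messages = [m for m in messages if m["role"] != "system"]
        extra = {"system": system} if system else {}

        # The default client reads ANTHROPIC_API_KEY from the environment
        response = anthropic.Anthropic().messages.create(
            model=self.model,
            messages=user_messages,
            max_tokens=max_tokens,
            temperature=temperature,
            **extra,
        )
        return {
            "content": response.content[0].text,
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
            "model": self.model,
        }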

Your app code never touches a provider SDK directly:

# Anywhere in your app
client = LLMClient(provider="openai", model="gpt-4o-mini")
result = client.complete(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this article..."}
])

# Switching to Anthropic? Change one line:
client = LLMClient(provider="anthropic", model="claude-3-5-haiku-20241022")

Implementation effort: 1-2 days for a basic wrapper. Handles 80% of use cases.

Limitation: Doesn't handle provider-specific features (function calling, vision, structured outputs). If you use these heavily, you'll need Pattern 2.

Pattern 2: OpenAI-compatible API standard

Most LLM providers now support the OpenAI API format. DeepSeek, Mistral, Groq, Together AI, and many others accept the same request format as OpenAI — you just change the base URL.

from openai import OpenAI

# OpenAI (reads OPENAI_API_KEY from the environment)
openai_client = OpenAI()

# DeepSeek: same SDK, different base URL
deepseek_client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key="your-deepseek-key"
)

# Groq: same SDK, different base URL
groq_client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="your-groq-key"
)

# Your code stays identical; pick a client and a model that provider serves
client = deepseek_client
response = client.chat.completions.create(
    model="deepseek-chat",  # only the model name changes
    messages=[{"role": "user", "content": "Hello"}]
)

This is the lowest-effort path to provider independence. If you standardize on the OpenAI API format, you can switch between 10+ providers by changing two settings: base_url (with the matching API key) and model.

Anthropic is the notable exception — their API format differs significantly. For Anthropic support, you need an abstraction layer (Pattern 1) or a proxy (Pattern 4).
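
The base-URL swap can live in one small lookup table instead of being scattered through the code. A minimal sketch, assuming the two base URLs shown above: the `PROVIDERS` table, the `client_kwargs` helper, and the environment variable names `DEEPSEEK_API_KEY` and `GROQ_API_KEY` are assumptions for illustration.

```python
import os

# Provider name -> connection settings. A base_url of None means the SDK default.
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com/v1", "key_env": "DEEPSEEK_API_KEY"},
    "groq": {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY"},
}


def client_kwargs(provider: str, env=os.environ) -> dict:
    """Build the keyword arguments for OpenAI(**kwargs) for one provider."""
    try:
        entry = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    kwargs = {"api_key": env[entry["key_env"]]}
    if entry["base_url"]:
        kwargs["base_url"] = entry["base_url"]
    return kwargs


# A fake environment keeps the example runnable offline
fake_env = {"DEEPSEEK_API_KEY": "sk-example", "OPENAI_API_KEY": "sk-other"}
deepseek_kwargs = client_kwargs("deepseek", env=fake_env)
openai_kwargs = client_kwargs("openai", env=fake_env)
# In real code: client = OpenAI(**deepseek_kwargs)
```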

Pattern 3: Configuration-driven model routing

Store your model choices in configuration, not in code. This lets you switch models per feature, per environment, or per customer without deploying new code:

# config/models.yml
features:
  chatbot:
    model: gpt-4o-mini
    provider: openai
    fallback_model: deepseek-chat
    fallback_provider: deepseek
  
  classification:
    model: claude-3-5-haiku-20241022
    provider: anthropic
    fallback_model: gpt-4o-mini
    fallback_provider: openai
  
  content_generation:
    model: gpt-4o
    provider: openai
    fallback_model: claude-sonnet-4-20250514
    fallback_provider: anthropic
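
# routing.py (hypothetical file name)
import yaml  # PyYAML; assumes the config is loaded as plain YAML
from llm_client import LLMClient

def load_config(path):
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config("config/models.yml")

def get_client(feature):
    settings = config["features"][feature]
    return LLMClient(
        provider=settings["provider"],
        model=settings["model"]
    )

# In your chatbot feature
client = get_client("chatbot")
messages = [{"role": "user", "content": "Hello"}]
response = client.complete(messages)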

Why this matters for cost control: When a provider drops prices or a cheaper model launches, you update a config file instead of refactoring code. When GPT-4o-mini dropped from $0.30 to $0.15 per million tokens, teams with config-driven routing captured the savings in minutes. Teams with hardcoded models needed a code change, review, and deploy.

This is also the foundation of a multi-model cost optimization strategy — routing each feature to the cheapest model that meets quality requirements.
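
The fallback_model and fallback_provider keys in the config are not consumed by get_client. One possible way to honor them is sketched below. The `complete_with_fallback` helper and its `make_client` factory argument are hypothetical, and `FakeClient` stands in for LLMClient so the example runs without network access.

```python
from typing import Callable


def complete_with_fallback(settings: dict, messages: list,
                           make_client: Callable[[str, str], object]) -> dict:
    """Try the primary model first; on any exception, retry once on the fallback."""
    primary = make_client(settings["provider"], settings["model"])
    try:
        return primary.complete(messages)
    except Exception:
        # A production version would likely catch narrower errors
        # (timeouts, rate limits, server errors) and log the failure.
        fallback = make_client(settings["fallback_provider"], settings["fallback_model"])
        return fallback.complete(messages)


class FakeClient:
    """Offline stand-in for LLMClient; the "openai" provider always fails."""

    def __init__(self, provider, model):
        self.provider, self.model = provider, model

    def complete(self, messages):
        if self.provider == "openai":
            raise RuntimeError("simulated outage")
        return {"content": "ok", "model": self.model}


# Same shape as the "chatbot" entry in config/models.yml
chatbot = {
    "provider": "openai", "model": "gpt-4o-mini",
    "fallback_provider": "deepseek", "fallback_model": "deepseek-chat",
}
result = complete_with_fallback(chatbot, [{"role": "user", "content": "Hi"}], FakeClient)
```

Passing the client factory as an argument keeps the routing logic testable without real API keys.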

Pattern 4: Proxy layer

Route all LLM traffic through a proxy that handles provider differences. Your app talks to one endpoint; the proxy translates to whichever provider you configure.

# Your app always hits one URL
client = OpenAI(
    base_url="https://proxy.yourcompany.com/v1",
    api_key="your-proxy-key"
)

# The proxy decides which provider to forward to
response = client.chat.completions.create(
    model="gpt-4o-mini",  # proxy resolves this to the right provider
    messages=[{"role": "user", "content": "Hello"}]
)

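The proxy approach gives you a single integration point: provider differences are handled in the proxy rather than in your application code, and usage can be recorded in one place.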

This is exactly what Tokonomics does. It sits between your app and any LLM provider, forwarding requests to the right upstream API while recording usage. You change one base URL and get provider independence plus cost visibility for free.

The getting started guide shows the integration in 5 minutes — regardless of your programming language or framework.
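
How a proxy might resolve a model name to an upstream can be sketched in a few lines. This illustrates the routing idea only and does not describe how Tokonomics or any particular proxy is implemented. The `ROUTES` prefix table and `resolve_upstream` are assumptions.

```python
# Model-name prefix -> upstream base URL (illustrative routing table)
ROUTES = [
    ("gpt-", "https://api.openai.com/v1"),
    ("deepseek-", "https://api.deepseek.com/v1"),
    # For Claude models, a real proxy would also have to translate the
    # OpenAI-format request into Anthropic's format before forwarding.
    ("claude-", "https://api.anthropic.com/v1"),
]


def resolve_upstream(model: str) -> str:
    """Return the upstream base URL for a model name, matched by prefix."""
    for prefix, base_url in ROUTES:
        if model.startswith(prefix):
            return base_url
    raise ValueError(f"No route configured for model {model!r}")


upstream = resolve_upstream("gpt-4o-mini")
```

Because the table lives in the proxy, moving a model to a different upstream does not touch application code.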

Pattern 5: Prompt portability

Even with a perfect abstraction layer, lock-in happens at the prompt level if your prompts only work well on one model. Build portable prompts:

Keep prompts simple and clear. Complex, model-specific prompt tricks (specific token patterns, unusual formatting that GPT responds to) break on other models. Clear, straightforward instructions work across all models.

Test prompts on multiple models. Before deploying, test your prompts on at least 2-3 models. If a prompt only works on GPT-4o, rewrite it until it works on Claude and DeepSeek too. This keeps the cost of switching low.

Separate prompt logic from model logic. Store prompts as templates. The model is a deployment decision, not a prompt decision:

# prompts/classify_intent.txt
Classify the following customer message into one of these categories:
- billing
- technical
- feature_request
- complaint
- other

Message: {user_message}

Category:

This prompt relies on no model-specific tricks, so it carries over to GPT-4o-mini, Claude Haiku, DeepSeek V3, and Llama 3.3 without changes. Pure clarity.
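
A template like this can be rendered with plain string formatting and checked against several models before deployment. A sketch under stated assumptions: `render`, `check_portability`, and the fake model functions are hypothetical stand-ins, and in practice each callable would wrap a real client call.

```python
CLASSIFY_INTENT = """Classify the following customer message into one of these categories:
- billing
- technical
- feature_request
- complaint
- other

Message: {user_message}

Category:"""

CATEGORIES = {"billing", "technical", "feature_request", "complaint", "other"}


def render(template: str, **values: str) -> str:
    """Fill the template placeholders; the model is chosen elsewhere."""
    return template.format(**values)


def check_portability(prompt: str, models: dict) -> dict:
    """Run one prompt through each model callable and normalize the answers."""
    return {name: call(prompt).strip().lower() for name, call in models.items()}


# Fake models so the example runs offline; note the differing raw formatting
fake_models = {
    "model-a": lambda prompt: " Billing\n",
    "model-b": lambda prompt: "billing",
}

prompt = render(CLASSIFY_INTENT, user_message="I was charged twice this month.")
answers = check_portability(prompt, fake_models)

# Portable here means: every model agrees, and the answer is a known category
portable = len(set(answers.values())) == 1 and set(answers.values()) <= CATEGORIES
```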

Avoid fine-tuning unless necessary. A fine-tuned model is the strongest form of lock-in. The fine-tuned weights exist only on that provider. If you fine-tune on OpenAI and later want to switch, you need to retrain from scratch on the new platform. Fine-tune open-source models (Llama, Mistral) if portability matters — you can run them anywhere.

The cost of not being portable

Lock-in has a measurable cost. Here's a real scenario:

Locked-in team: Using GPT-4o for everything, deeply integrated with OpenAI's function calling format, fine-tuned model for one feature.

Portable team: Abstraction layer + config-driven routing, no fine-tuned models.

The abstraction layer cost 2 days to build. It saves $51,600/year in this example. That's the ROI of portability.

What to do right now

  1. Audit your integration depth. How many places in your code directly import a provider's SDK? Each one is a switching cost.
  2. Build an abstraction layer. Even a simple wrapper (Pattern 1) reduces switching from weeks to hours.
  3. Use the OpenAI format as standard. Most providers support it. Build against it by default.
  4. Store model choices in config. Never hardcode a model name in business logic.
  5. Track costs per provider. Use Tokonomics to see exactly what you're spending on each provider — that data drives switching decisions.
  6. Test prompts on 2+ models. Before deploying any prompt, verify it works on at least one alternative provider.
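
Step 1 can be partly automated. The sketch below lists Python files that import a provider SDK directly, using the standard-library ast module. The `PROVIDER_MODULES` set and the function names are assumptions, and the scan only sees static import statements.

```python
import ast
from pathlib import Path

# Top-level package names treated as provider SDKs (extend as needed)
PROVIDER_MODULES = {"openai", "anthropic"}


def sdk_imports(source: str) -> set[str]:
    """Return the provider SDK packages imported by one Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in PROVIDER_MODULES:
                found.add(top_level)
    return found


def audit(root: str) -> dict[str, set[str]]:
    """Map each .py file under root to the provider SDKs it imports directly."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            hits = sdk_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        if hits:
            report[str(path)] = hits
    return report


sample = "from openai import OpenAI\nimport json\nimport anthropic.types\n"
found = sdk_imports(sample)
```

Running `audit(".")` from a project root gives a rough count of switching-cost hotspots. Ideally only the abstraction layer itself shows up.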

The LLM market is young and changing quarterly. Betting your architecture on a single provider is betting that today's market leader will still be the best and cheapest option in 12 months. History says that's a losing bet. Build for portability, and you'll always be able to move to the best option, whether that's cheaper, faster, or higher quality.

Last updated June 2026. All sources retrieved June 2026.

About the author
Zouhair is the founder of Tokonomics. He built the platform after receiving a $47,000 LLM invoice that his team didn't see coming. He tracks LLM pricing changes weekly across all major providers.