June 15, 2026

Vendor lock-in is something most technology leaders understand. Build too heavily on one cloud provider and migration becomes expensive. AWS and Azure do broadly the same things under different names. Amazon S3 and Azure Blob Storage, for example, are functionally equivalent object stores. Moving between them involves cost and effort, but a like-for-like path exists.
AI model lock-in does not work like that. There is no like-for-like. And most organisations building AI capability have not fully mapped what that means for the systems they are building today.
This is not a theoretical concern. It is a practical architectural problem that tends to become visible at the worst possible moment: when a provider changes its pricing or deprecates a model, or when a competitor pulls ahead and you want to move.
When you build on a specific AI model, you are building on a specific set of weights. Those weights are the product of a unique training and post-training process that strongly shapes the model’s capabilities, learned patterns, response style, ambiguity-handling, and failure modes. In production, the final behaviour is also affected by system prompts, tool use, retrieval, decoding settings, and the surrounding application architecture. No two providers train the same model. No provider can give you another provider's model.
Give the same task to GPT-4 and Claude Opus and you will get meaningfully different outputs. Not always better or worse in an absolute sense, but different in ways that matter if your system depends on consistency. When you tune a workflow, a prompt, or an evaluation process to work well with one model's specific inference characteristics, you are making an implicit commitment. That commitment is not easily transferred.
This is the core of the problem. With cloud infrastructure, switching requires effort but a functional equivalent exists. With AI models, an exact functional equivalent does not exist. Another model may be good enough for the same business task, but it will not behave identically and should be treated as a migration, not a like-for-like swap. You cannot get GPT-4 from Anthropic. You cannot get Claude from OpenAI. What each model is, at a fundamental level, is the result of a training process that belongs to one organisation and cannot be replicated.
We encountered this directly while building the Edaro platform. Edaro is an AI-enabled spend visibility and control system for Multi-Academy Trusts, and we were building on commercially available AI models throughout the development process.
While we were building, the provider released new model versions. With those releases came pricing changes and rate limits on older models, the standard mechanism for encouraging migration to newer versions. We tested the newer models and found performance differences that required us to revisit work we had already completed. Some of what we had built was tuned to the specific behaviour of an earlier model. The newer version did not behave identically.
That was within a single provider. A migration to a different provider entirely would have been a substantially more significant problem. The output quality, the response characteristics, the way the model handles edge cases: all of these would shift. A system built and tested against one model's behaviour is not ready to run on a different model without meaningful rework. That rework is not always scoped or budgeted for when the original build decision is made.
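One way to make that rework visible before a migration is to keep a small set of fixed test cases and run any candidate model against them. The sketch below is illustrative only: the test cases, the category labels, and the two stand-in model functions are hypothetical, and a real harness would call the provider's API instead of returning canned strings.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical golden cases: (prompt, expected label).
GOLDEN_CASES = [
    ("Invoice: 30 reams of A4 paper", "stationery"),
    ("Invoice: annual boiler service", "maintenance"),
    ("Invoice: minibus hire for trip", "transport"),
]


def pass_rate(model: Callable[[str], str],
              cases: Sequence[Tuple[str, str]]) -> float:
    """Fraction of cases where the model's output matches the expected label."""
    passed = sum(
        1 for prompt, expected in cases
        if model(prompt).strip().lower() == expected
    )
    return passed / len(cases)


def current_model(prompt: str) -> str:
    """Stand-in for the model the system was originally tuned against."""
    if "paper" in prompt:
        return "stationery"
    if "boiler" in prompt:
        return "maintenance"
    return "transport"


def candidate_model(prompt: str) -> str:
    """Stand-in for a newer model: same answers, one phrased differently."""
    if "paper" in prompt:
        return "stationery"
    if "boiler" in prompt:
        return "Category: maintenance"  # breaks downstream exact-match parsing
    return "transport"


print(f"current:   {pass_rate(current_model, GOLDEN_CASES):.0%}")
print(f"candidate: {pass_rate(candidate_model, GOLDEN_CASES):.0%}")
```

The candidate here is not less capable. It simply formats one answer differently, which is enough to fail a system that was tuned to the earlier model's output.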
The lesson from that experience is that model selection carries more long-term weight than it appears at the point of decision. You are not just choosing a capability level. You are choosing a set of inference characteristics that your system will be built around, and those characteristics are not portable.
One of the technical dimensions of model selection that organisations often underweight is the context window. The context window determines the maximum amount of input and output, measured in tokens, that the model can process in a single request. A larger window lets the model attend over more material, but it does not guarantee that every detail will be used accurately or that reasoning across the whole context will be reliable. Different models have substantially different context windows, and for some use cases that difference is not marginal. It is architectural.
For example, some current Opus and Sonnet models support context windows up to one million tokens, while a compact model such as Haiku 4.5 may support a smaller 200,000-token window, depending on platform and deployment. That difference determines whether a system can process an entire document, a complete codebase, or a long conversation history in a single request, rather than chunking, retrieving, or summarising it first. If your use case depends on large context, you cannot simply switch to a model with a smaller window and expect the system to behave equivalently. The architecture changes.
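A minimal sketch of how that architectural fork shows up in code is below. It assumes a rough four-characters-per-token heuristic, which is only an approximation, since real counts come from each provider's tokenizer. The model names, the `plan_request` helper, and the reserved output budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str            # hypothetical label, not a real model identifier
    context_window: int  # total tokens (input + output) per request


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough heuristic only; use the provider's tokenizer for real counts."""
    return int(len(text) / chars_per_token) + 1


def plan_request(text: str, model: ModelProfile,
                 reserved_output: int = 4_000) -> str:
    """Decide whether the text fits in one request or needs another strategy."""
    input_budget = model.context_window - reserved_output
    if estimate_tokens(text) <= input_budget:
        return "single_request"
    return "chunk_or_retrieve"


large = ModelProfile("large-context-model", 1_000_000)
small = ModelProfile("compact-model", 200_000)

# Roughly 500,000 tokens under the heuristic above.
document = "x" * 2_000_000

print(large.name, plan_request(document, large))
print(small.name, plan_request(document, small))
```

The same document takes a different code path depending on the model. The chunking or retrieval branch is a separate subsystem that has to be designed, built, and evaluated, which is why the switch is not a configuration change.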
This matters for lock-in because it means the model you choose shapes what becomes technically possible. A system designed around a large context window is not just dependent on a model's quality. It is dependent on a specific technical capability that may not exist in the same form elsewhere. When you later want to move, or are forced to move, the context window may be a constraint that is more significant than the quality difference.
Understanding lock-in matters primarily because of the circumstances that create pressure to change. Three are worth thinking about clearly: cost, capability, and model deprecation.
The first is cost. Current token pricing is, in part, subsidised by investment capital. The major AI providers are not pricing at full cost recovery. That subsidy exists to drive adoption and establish market position. It will not last indefinitely. As providers mature commercially, whether through an IPO or the natural evolution of their funding structure, the economics of token pricing will change.
Nobody outside those organisations knows the timing or the magnitude of that change. But organisations building commercial models on the current token cost are making an assumption about future pricing that may prove incorrect. If you are running high volumes of inference, a significant price increase is not an abstract risk. It is a commercial scenario that is worth modelling before you are committed to an architecture.
The second is capability. The relative positions of AI providers are not fixed. An organisation that builds around what one provider can do today is making an implicit bet that this provider remains competitive for the lifetime of the system. If a competitor advances in the areas that matter for a specific use case, the desire to move will increase. The lock-in problem means that desire may not be easily acted on.
The third is model deprecation. Providers retire models. When they do, organisations that have built on those models face a forced migration. This is not a hypothetical. It is an ongoing reality for anyone building on commercially available AI. The provider controls the upgrade cadence. You follow it, or you are left running on an unsupported model. That forced migration carries the same rework cost as any other model change, with the added pressure of an externally imposed deadline.
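One modest way to avoid being surprised by a retirement is to track announced dates alongside the models a system depends on. The sketch below is an assumption about how that might look: the model identifiers, the retirement date, and the 180-day lead time are all hypothetical, and real dates would come from the provider's published deprecation notices.

```python
from datetime import date
from typing import Optional

# Hypothetical registry: model identifier -> announced retirement date.
# None means no retirement has been announced yet.
RETIREMENT_DATES = {
    "model-v1": date(2026, 9, 30),
    "model-v2": None,
}


def days_until_retirement(model_id: str, today: date) -> Optional[int]:
    """Days remaining before retirement, or None if no date is announced."""
    retirement = RETIREMENT_DATES[model_id]
    if retirement is None:
        return None
    return (retirement - today).days


def needs_migration_plan(model_id: str, today: date,
                         lead_time_days: int = 180) -> bool:
    """True when retirement falls inside the lead time needed for rework."""
    remaining = days_until_retirement(model_id, today)
    return remaining is not None and remaining <= lead_time_days


today = date(2026, 6, 15)
for model_id in RETIREMENT_DATES:
    print(model_id, needs_migration_plan(model_id, today))
```

The lead time is the important parameter. It should reflect how long re-tuning and re-testing actually took the last time the model changed, not how long the provider's notice period happens to be.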
This dynamic has some resemblance to how Apple manages device support. They retire older devices, which simplifies their engineering and creates commercial pressure on customers to upgrade. The economics are not identical for AI providers, but the pattern is recognisable. You follow their cadence, or you manage the consequences of not doing so.
The practical response to this is not to avoid building on commercially available models. For most organisations and most use cases, they remain the most accessible route to AI capability. The response is to build with explicit awareness of the commitments being made and to reduce unnecessary lock-in where doing so does not add significant cost or complexity.
The most important question to ask early is how tightly the system is being tuned to a specific model's behaviour. The more a system relies on one model's particular inference characteristics to function correctly, the more expensive any future migration becomes. Where it is possible to abstract the model layer, maintaining that abstraction is worth the discipline it requires.
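As a rough illustration of abstracting the model layer, the sketch below routes application code through a small interface instead of a provider SDK. Everything in it is hypothetical: the `TextModel` interface, the two stand-in clients, and the `categorise_spend` function. The clients return canned strings so the example runs offline, where a real adapter would wrap a provider's API.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only surface the application is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class ProviderAClient:
    """Stand-in adapter; a real one would wrap provider A's SDK."""

    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"


class ProviderBClient:
    """Stand-in adapter; a real one would wrap provider B's SDK."""

    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"


def categorise_spend(model: TextModel, description: str) -> str:
    """Application logic: knows about TextModel, not about any provider."""
    prompt = f"Categorise this spend line: {description}"
    return model.complete(prompt)


print(categorise_spend(ProviderAClient(), "30 reams of A4 paper"))
print(categorise_spend(ProviderBClient(), "30 reams of A4 paper"))
```

An interface like this limits how many call sites change during a migration. It does not make behaviour portable: prompts, output parsing, and evaluation thresholds tuned to one model still need to be revisited for another.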
It's also worth building a realistic picture of token cost exposure. If the system involves high volumes of inference, modelling what a significant price increase would mean for its commercial viability is a sensible part of the architecture conversation, not something to be deferred to later.
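A simple version of that modelling can be sketched in a few lines. All figures below are hypothetical placeholders chosen for round arithmetic, not real provider prices or real volumes, and prices are assumed to be quoted per million tokens.

```python
def monthly_inference_cost(requests: int, input_tokens: int, output_tokens: int,
                           input_price: float, output_price: float) -> float:
    """Monthly spend; prices are per million tokens."""
    tokens_cost = input_tokens * input_price + output_tokens * output_price
    return requests * tokens_cost / 1_000_000


# Hypothetical workload and pricing assumptions.
REQUESTS_PER_MONTH = 500_000
INPUT_TOKENS = 3_000      # average per request
OUTPUT_TOKENS = 500       # average per request
INPUT_PRICE = 3.00        # per million input tokens
OUTPUT_PRICE = 15.00      # per million output tokens
MONTHLY_REVENUE = 20_000.00

baseline = monthly_inference_cost(
    REQUESTS_PER_MONTH, INPUT_TOKENS, OUTPUT_TOKENS, INPUT_PRICE, OUTPUT_PRICE
)

# Stress-test the margin against uniform price increases.
for multiplier in (1.0, 1.5, 2.0, 3.0):
    cost = baseline * multiplier
    margin = MONTHLY_REVENUE - cost
    print(f"{multiplier:.1f}x price: cost {cost:,.0f}, margin {margin:,.0f}")

# The price multiple at which inference cost consumes all revenue.
break_even_multiplier = MONTHLY_REVENUE / baseline
print(f"break-even at {break_even_multiplier:.2f}x current pricing")
```

Under these assumptions the system stays viable at double the current price but not at triple. The useful output is the break-even multiple, because it turns an open-ended worry about pricing into a threshold that can be monitored.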
For some organisations, particularly those processing sensitive data or operating under strict data governance requirements, the vendor lock-in question connects to a broader question about whether third-party AI providers are the right approach at all. That question is explored in the companion article on sovereign AI.
If you are in the early stages of building AI capability and want to make architectural decisions that reduce long-term exposure, our AI Discovery service maps the landscape, identifies where dependencies are concentrated, and helps you make more deliberate decisions about how you build.
For organisations with existing AI systems that want to understand their current lock-in position and what a migration or diversification strategy would involve, our dedicated consultancy team works through exactly these problems.
The question is not whether vendor lock-in is something worth thinking about. It is whether the architecture decisions being made today reflect a clear understanding of what they commit the organisation to. Most do not. Getting that clarity early is considerably less expensive than gaining it after the system is in production.