The Battle for AI Inference Stack Control Intensifies as Independent Platforms Consolidate
By Larbi Belkhit | 04 Mar 2026 | IN-8059
NEWS: Serverless Inference Platforms Become a Key Competitive Battleground
As enterprise Artificial Intelligence (AI) adoption grows, the inference market is developing rapidly across all layers of the stack. Inference compute, whether custom or general-purpose, continues to evolve to reduce the cost of inference, while cloud providers let developers and enterprises access model inference via Application Programming Interfaces (APIs) through serverless inference platforms, which abstract away the management and provisioning of the underlying cloud infrastructure. This contrasts with AI training, which typically requires greater control over that infrastructure. Furthermore, as different types of inference workloads proliferate, such as code generation, video generation, and agentic systems, having a platform that abstracts this complexity is becoming a key battleground.
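To make the abstraction concrete, the sketch below shows how a developer might consume a model through such a platform. It is a minimal illustration rather than any specific vendor's API: the base URL, model identifier, and environment variable are placeholders, though many serverless inference platforms expose endpoints of roughly this OpenAI-compatible shape.

```python
# Minimal sketch of consuming serverless inference via an HTTP API.
# The endpoint, model ID, and key variable below are hypothetical
# placeholders; substitute your provider's values. Note that no GPU
# provisioning or scaling is visible to the caller -- the platform
# handles all infrastructure behind the API.
import os
import requests

API_BASE = "https://api.example-inference.com/v1"  # hypothetical endpoint
API_KEY = os.environ["INFERENCE_API_KEY"]          # provider-issued key

response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "example/llm-8b-instruct",        # hypothetical model ID
        "messages": [
            {"role": "user", "content": "Summarize serverless inference."}
        ],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```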
Hyperscalers such as Amazon Web Services (AWS), Microsoft, and Google Cloud have offered serverless inference through their respective platforms (Amazon Bedrock, Azure AI Foundry, and Vertex AI) for the last couple of years. Neoclouds are now entering this space to capture the growing inference opportunity and to address both training and inference workloads. Examples of neoclouds moving up the stack include Crusoe Cloud (Managed Inference), Together AI (Serverless Inference), and Nebius (Token Factory).
Now, some foundation model developers are making a more concerted effort to drive adoption of their models by moving down the stack. In February 2026, Mistral AI agreed to acquire Koyeb, an independent serverless AI inference platform provider, in its first acquisition. Independent serverless AI inference platforms are cloud-agnostic, horizontal platforms that enable developers to deploy and serve AI models via APIs, abstracting infrastructure management and optimizing for cost and latency across different compute environments. Koyeb’s 13-person team, including its founders, is set to join Mistral AI, and the acquisition should accelerate the launch of Mistral Compute, announced in June 2025. The deal follows Mistral’s US$1.4 billion data center investment in Sweden, as the company continues to evolve beyond a pure foundation model developer.
IMPACT: Independent Platform Consolidation Signals a Battle for Inference Stack Control
The acquisition of Koyeb is part of a wider, recent pattern of consolidation among independent inference platforms. Cloudflare acquired Replicate to supplement its Workers platform at the beginning of 2026, NVIDIA acquired OctoML in 2024 and Lepton AI in 2025, and Red Hat acquired Neural Magic at the beginning of 2025. This trend reflects how the inference value chain is becoming vertically integrated from multiple directions:
- Model developers are moving down the stack to have more control over infrastructure and to optimize the cost efficiency of their models (e.g., Mistral).
- Hardware vendors are moving up the stack to optimize the cost of inference on top of their compute (e.g., NVIDIA).
- Cloud service providers are developing their own optimization layers to move beyond bare-metal provisioning to support adoption and build their moats (e.g., AWS, Microsoft).
The reason behind this trend is relatively simple: inference stack control matters to all stakeholders in the AI space. Because inference is continuous and revenue-generating, controlling the inference stack lets industry players better manage their margins and their end-customer relationships. Cloud service providers already have relationships with enterprises, but the developer-centric positioning of independent inference players means that acquiring one almost guarantees buy-in from the developer community. For AI hardware vendors such as NVIDIA or AMD, acquiring such players helps optimize these platforms for their compute, expands their Total Addressable Market (TAM), and fosters an ecosystem around their silicon.
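A rough calculation illustrates why stack control translates into margin. All figures below are hypothetical assumptions for illustration only, not ABI Research estimates or vendor data; they simply show how serving cost and API price combine into gross margin per million tokens, and how a serving-stack optimization that raises throughput flows directly to the bottom line.

```python
# Illustrative inference-margin arithmetic with hypothetical numbers.
# None of these figures come from the vendors discussed above.
PRICE_PER_M_TOKENS = 0.60   # assumed API price, US$ per 1M tokens served
GPU_HOUR_COST = 2.00        # assumed all-in accelerator cost per hour, US$

def gross_margin(tokens_per_second: float) -> float:
    """Gross margin given aggregate serving throughput per accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    cost_per_m = GPU_HOUR_COST / (tokens_per_hour / 1_000_000)
    return 1 - cost_per_m / PRICE_PER_M_TOKENS

# A stack-level optimization (better batching, quantization, faster
# kernels) that doubles throughput sharply raises the gross margin:
for tps in (1500, 3000):
    print(f"{tps} tok/s -> gross margin {gross_margin(tps):.0%}")
# 1500 tok/s -> gross margin 38%
# 3000 tok/s -> gross margin 69%
```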
For independent inference platform players such as Baseten, Modal, and Modular, two trends will continue: 1) acquisition by incumbent players; or 2) mergers that strengthen the value proposition while remaining independent of cloud or compute players. On the latter, Modular merged with BentoML in February 2026 to offer a single, cohesive stack for AI workflow performance optimization.
RECOMMENDATIONS: Securing Position in the Inference Value Chain
The battle for control of AI inference is set to be fiercer and more consequential than the battle for AI training. The intersection of AI hardware, cloud providers, and the platforms end users engage with is where the industry expects to realize significant Returns on Investment (ROI). The next few years will determine which compute and cloud providers shape the competitive landscape of the AI inference market, and ABI Research expects Mergers & Acquisitions (M&A) activity around independent serverless inference platforms to continue at an accelerated pace over the next 3 to 5 years, as it has in recent months.
Independent inference platform providers must differentiate by specializing in specific inference workloads to continue embedding themselves in the developer community. They can supplement this by forming strategic partnerships with, or merging with, other independent players to complement their value propositions, becoming durable players in this market and challenging cloud providers for end users. The alternative strategy is to position the company for acquisition: forming relationships with a wide variety of inference chip providers ensures the platform can give developers access to a range of compute, making it attractive to cloud providers seeking hardware flexibility.
For cloud providers, specifically neoclouds, moving up the stack is critical to longevity as the market shifts from training workloads to inference. Capturing the enterprise opportunity requires a full-stack approach; otherwise, neoclouds will be confined to selling bare metal, or simply to hosting the model development and internal workloads of hyperscalers and/or foundation model developers. For further information, see ABI Research’s Neoclouds Aim to Build Commercial Moats by Investing in Software and Managed Services (AN-6514) report.
For hyperscalers, Mistral AI’s move down the stack is a new competitive challenge, and they must closely watch others such as OpenAI and Anthropic. For example, OpenAI is set to have Broadcom-developed custom AI chips coming online from 2H 2026. Combined with potential acquisitions of independent inference platforms, foundation model developers could increasingly decouple from hyperscaler infrastructure entirely, owning more of the stack from silicon to API and reducing their dependency on specific cloud providers for serving their models. Hyperscalers must continue to develop their custom inference platforms; locking in customers is key to negating this longer-term challenge.
AI hardware vendors must ensure that their inference stacks are deeply integrated into the serverless platforms winning developer adoption. Especially as more inference-optimized AI chips become available, ensuring that the platforms developers access are fully optimized for a vendor’s own compute is key to mitigating this threat, and continuing to acquire independent serverless platforms as they move up the stack helps ensure customer stickiness. For example, NVIDIA’s acquisitions of OctoML and Lepton AI have supported the move toward its DGX Cloud offering, and its latest US$20 billion non-exclusive licensing agreement with Groq could see Language Processing Units (LPUs) added to DGX Cloud, or even Groq’s team (which joined NVIDIA as part of the agreement) augment NVIDIA’s Graphics Processing Unit (GPU) inference competitiveness from a cost perspective.
Written by Larbi Belkhit