The Battle for AI Inference Stack Control Intensifies as Independent Platforms Consolidate
By Larbi Belkhit | 04 Mar 2026 | IN-8059
NEWS: Serverless Inference Platforms Become a Key Competitive Battleground
As enterprise Artificial Intelligence (AI) adoption grows, the inference market is developing rapidly across all layers of the stack. Inference compute, whether custom or general-purpose, continues to evolve to reduce the cost of inference, while cloud providers let developers and enterprises access model inference via Application Programming Interfaces (APIs) through serverless inference platforms, which abstract away the management and provisioning of the underlying cloud infrastructure. This contrasts with AI training, which typically requires greater control over that infrastructure. Furthermore, as different types of inference workloads proliferate, such as code generation, video generation, and agentic systems, having a platform that abstracts this complexity is becoming a key battleground.
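To make the abstraction concrete, the sketch below shows how a developer might consume a model through such a platform. It is a minimal illustration rather than any specific vendor's API: the base URL, model identifier, and environment variable are placeholders, though many serverless inference platforms expose endpoints of roughly this OpenAI-compatible shape.

```python
# Minimal sketch of consuming serverless inference via an HTTP API.
# The endpoint, model ID, and key variable below are hypothetical
# placeholders; substitute your provider's values. Note that no GPU
# provisioning or scaling is visible to the caller -- the platform
# handles all infrastructure behind the API.
import os
import requests

API_BASE = "https://api.example-inference.com/v1"  # hypothetical endpoint
API_KEY = os.environ["INFERENCE_API_KEY"]          # provider-issued key

response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "example/llm-8b-instruct",        # hypothetical model ID
        "messages": [
            {"role": "user", "content": "Summarize serverless inference."}
        ],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```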
Hyperscalers such as Amazon Web Services (AWS), Microsoft, and Google Cloud have offered serverless inference through their respective platforms (Amazon Bedrock, Azure AI Foundry, and Vertex AI) for the last couple of years. Neoclouds are now entering this space to capture the growing inference opportunity and to address both training and inference workloads. Examples of neoclouds moving up the stack include Crusoe Cloud (Managed Inference), Together AI (Serverless Inference), and Nebius (Token Factory).
Now, some foundation model developers are making a more concerted effort to drive adoption of their models by moving down the stack. In February 2026, Mistral AI agreed to acquire Koyeb, an independent serverless AI inference platform provider, in its first acquisition. Independent serverless AI inference platforms are cloud-agnostic, horizontal platforms that enable developers to deploy and serve AI models via APIs, abstracting infrastructure management and optimizing for cost and latency across different compute environments. Koyeb’s 13-person team, including its founders, is set to join Mistral AI, and the acquisition should accelerate the launch of Mistral Compute, announced in June 2025. The deal follows Mistral’s US$1.4 billion data center investment in Sweden, as the company continues to evolve beyond a pure foundation model developer.
IMPACT: Independent Platform Consolidation Signals a Battle for Inference Stack Control
The acquisition of Koyeb is part of a wider, recent pattern of consolidation among independent inference platforms. Cloudflare acquired Replicate to supplement its Workers platform at the beginning of 2026, NVIDIA acquired OctoML in 2024 and Lepton AI in 2025, and Red Hat acquired Neural Magic at the beginning of 2025. This trend reflects how the inference value chain is becoming vertically integrated from multiple directions:
- Model developers are moving down the stack to have more control over infrastructure and to optimize the cost efficiency of their models (e.g., Mistral).
- Hardware vendors are moving up the stack to optimize the cost of inference on top of their compute (e.g., NVIDIA).
- Cloud service providers are developing their own optimization layers to move beyond bare-metal provisioning to support adoption and build their moats (e.g., AWS, Microsoft).
The reason behind this trend is relatively simple: inference stack control matters to all stakeholders in the AI space. Because inference is continuous and revenue-generating, controlling the inference stack lets industry players better manage their margins and their end-customer relationships. Cloud service providers already have relationships with enterprises, but the developer-centric positioning of independent inference players means that acquiring one almost guarantees buy-in from the developer community. For AI hardware vendors such as NVIDIA or AMD, acquiring such players helps optimize these platforms for their compute, expands their Total Addressable Market (TAM), and fosters an ecosystem around their silicon.
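A rough calculation illustrates why stack control translates into margin. All figures below are hypothetical assumptions for illustration only, not ABI Research estimates or vendor data; they simply show how serving cost and API price combine into gross margin per million tokens, and how a serving-stack optimization that raises throughput flows directly to the bottom line.

```python
# Illustrative inference-margin arithmetic with hypothetical numbers.
# None of these figures come from the vendors discussed above.
PRICE_PER_M_TOKENS = 0.60   # assumed API price, US$ per 1M tokens served
GPU_HOUR_COST = 2.00        # assumed all-in accelerator cost per hour, US$

def gross_margin(tokens_per_second: float) -> float:
    """Gross margin given aggregate serving throughput per accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    cost_per_m = GPU_HOUR_COST / (tokens_per_hour / 1_000_000)
    return 1 - cost_per_m / PRICE_PER_M_TOKENS

# A stack-level optimization (better batching, quantization, faster
# kernels) that doubles throughput sharply raises the gross margin:
for tps in (1500, 3000):
    print(f"{tps} tok/s -> gross margin {gross_margin(tps):.0%}")
# 1500 tok/s -> gross margin 38%
# 3000 tok/s -> gross margin 69%
```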
For independent inference platform players such as Baseten, Modal, and Modular, two trends will continue: 1) acquisition by incumbent players; or 2) mergers that strengthen the value proposition while remaining independent of cloud or compute players. On the latter, Modular merged with BentoML in February 2026 to offer a single, cohesive stack for AI workflow performance optimization.
RECOMMENDATIONS: Securing Position in the Inference Value Chain
The battle for control of AI inference is set to be fiercer and more consequential than the battle for AI training. The intersection of AI hardware, cloud providers, and the platforms end users engage with is where the industry expects to realize significant Returns on Investment (ROI). The next few years will determine which compute and cloud providers shape the competitive landscape of the AI inference market, and ABI Research expects Mergers & Acquisitions (M&A) activity around independent serverless inference platforms to continue at an accelerated pace over the next 3 to 5 years, as it has in recent months.
Independent inference platform providers must differentiate by specializing in specific inference workloads to continue embedding themselves in the developer community. They can supplement this by forming strategic partnerships with, or merging with, other independent players to complement their value propositions, becoming durable players in this market and challenging cloud providers for end users. The alternative strategy is to position the company for acquisition: forming relationships with a wide variety of inference chip providers ensures the platform can give developers access to a range of compute, making it attractive to cloud providers seeking hardware flexibility.
For cloud providers, specifically neoclouds, moving up the stack is critical to longevity as the market shifts from training workloads to inference. Capturing the enterprise opportunity requires a full-stack approach; otherwise, neoclouds will be confined to selling bare metal, or simply to hosting the model development and internal workloads of hyperscalers and/or foundation model developers. For further information, see ABI Research’s Neoclouds Aim to Build Commercial Moats by Investing in Software and Managed Services (AN-6514) report.
For hyperscalers, Mistral AI’s move down the stack is a new competitive challenge, and they must closely watch others such as OpenAI and Anthropic. For example, OpenAI is set to have Broadcom-developed custom AI chips coming online from 2H 2026. Combined with potential acquisitions of independent inference platforms, foundation model developers could increasingly decouple from hyperscaler infrastructure entirely, owning more of the stack from silicon to API and reducing their dependency on specific cloud providers for serving their models. Hyperscalers must continue to develop their custom inference platforms; locking in customers is key to negating this longer-term challenge.
AI hardware vendors must ensure that their inference stacks are deeply integrated into the serverless platforms winning developer adoption. Especially as more inference-optimized AI chips become available, ensuring that the platforms developers access are fully optimized for a vendor’s own compute is key to mitigating this threat, and continuing to acquire independent serverless platforms as they move up the stack helps ensure customer stickiness. For example, NVIDIA’s acquisitions of OctoML and Lepton AI have supported the move toward its DGX Cloud offering, and its latest US$20 billion non-exclusive licensing agreement with Groq could see Language Processing Units (LPUs) added to DGX Cloud, or even Groq’s team (which joined NVIDIA as part of the agreement) augment NVIDIA’s Graphics Processing Unit (GPU) inference competitiveness from a cost perspective.
Written by Larbi Belkhit