Multimodal Learning: Rewriting the Rules of AI Data Collection

514 million devices with multimodal learning applications will ship in 2023
09 Oct 2019

The total installed base of devices with Artificial Intelligence (AI) will grow from 2.7 billion in 2019 to 4.5 billion in 2024, forecasts global tech market advisory firm ABI Research. Billions of petabytes of data flow through these AI devices every day; the challenge now facing both technology companies and implementers is getting all these devices to learn, think, and work together. According to a recent ABI Research whitepaper, Artificial Intelligence Meets Business Intelligence, multimodal learning is the key to making this happen, and it is fast becoming one of the most exciting, and potentially transformative, fields of artificial intelligence.

“Multimodal learning consolidates disconnected, heterogeneous data from various sensors and data inputs into a single model,” explains Stuart Carlaw, Chief Research Officer at ABI Research. “Learning-based methods that combine signals from different modalities can generate more robust inference, or even new insights, which would be impossible in a unimodal system.”
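
As a rough illustration of what combining signals from different modalities into a single model can mean in practice, the sketch below concatenates per-sensor feature vectors into one input and scores it with a single linear model (often called early fusion). The sensor names, feature values, and weights are all hypothetical and chosen only for illustration; this is not ABI Research's method or any vendor's implementation.

```python
from typing import Dict, List

# Hypothetical modality order; a fixed order keeps the fused vector layout stable.
MODALITIES: List[str] = ["camera", "microphone", "accelerometer"]


def fuse(features: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed order (early fusion)."""
    fused: List[float] = []
    for name in order:
        fused.extend(features[name])
    return fused


def linear_score(x: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Score the fused vector with one linear model shared across all modalities."""
    if len(x) != len(weights):
        raise ValueError("fused vector and weight vector must have the same length")
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


# Made-up features from three sensors observing the same moment.
sample = {
    "camera": [0.2, 0.7],
    "microphone": [0.9],
    "accelerometer": [0.1, 0.4, 0.3],
}

# Made-up weights; a real system would learn these from data.
weights = [0.5, -0.2, 1.0, 0.3, 0.3, 0.3]

fused = fuse(sample, MODALITIES)
score = linear_score(fused, weights)
print(len(fused), round(score, 2))
```

A unimodal system would see only one of the three vectors; the single fused input is what lets one model weigh evidence across sensors.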

Multimodal learning is well placed to scale, because its underlying technologies, such as Deep Neural Networks (DNNs), a giant leap forward over rules-based software, have already scaled in unimodal applications such as image recognition in camera surveillance or voice recognition and Natural Language Processing (NLP) in virtual assistants like Amazon’s Alexa. At the same time, organizations are recognizing the need for multimodal learning to manage and automate processes that span the entirety of their operations. Given these factors, ABI Research estimates that the total number of devices shipped with multimodal learning applications will grow from 3.9 million in 2017 to 514 million in 2023.
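
To put the forecasts quoted in this release in perspective, the sketch below derives the compound annual growth rate (CAGR) implied by each pair of figures, using the standard formula (end / start)^(1 / years) - 1. The growth rates are computed here from the quoted endpoints and are not figures published by ABI Research; the function and variable names are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Devices shipped with multimodal learning applications: 3.9 million (2017) to 514 million (2023).
shipments_cagr = cagr(3.9e6, 514e6, 2023 - 2017)

# Installed base of AI devices: 2.7 billion (2019) to 4.5 billion (2024).
installed_base_cagr = cagr(2.7e9, 4.5e9, 2024 - 2019)

print(f"Multimodal shipments, implied CAGR: {shipments_cagr:.1%}")
print(f"AI installed base, implied CAGR:    {installed_base_cagr:.1%}")
```

Under these assumptions, multimodal shipments would more than double each year over the period, while the overall AI installed base would grow at a much slower annual rate of roughly one tenth.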

“There is impressive momentum driving multimodal applications into devices, with five key end-market verticals most aggressively adopting multimodal learning: automotive, robotics, consumer, healthcare, and media and entertainment,” Carlaw points out.

In the automotive space, multimodal learning is being introduced to Advanced Driver Assistance Systems (ADAS), in-vehicle Human Machine Interface (HMI) assistants, and Driver Monitoring Systems (DMSs) for real-time inferencing and prediction.

Robotics vendors are incorporating multimodal learning systems into robotics HMIs and movement automation to broaden consumer appeal and provide greater collaboration between workers and robots in the industrial space.

Consumer device companies, particularly those in the smartphone and smart home markets, are competing intensely to demonstrate the value of their products over those of competitors. New features and refined systems are critical to generating a marketing edge, making consumer electronics companies good candidates for adopting multimodal learning-enabled systems in their products. Growing use cases include security and payment authentication, recommendation and personalization engines, and personal assistants.

Medical companies and hospitals are still relatively early in their exploration of multimodal learning techniques, but there are already some promising emerging applications in medical imaging. The value of multimodal learning to patients and doctors will be difficult for health services to resist, even if adoption is initially slow.

Media and entertainment companies are already using multimodal learning to help with structuring their content into labeled metadata, so they can improve content recommendation systems, personalized advertising, and automated compliance marking. So far, deployments of metadata tagging systems have been limited, as the technology has only recently been made available to the industry.

“The most extensive application of multimodal learning today is for behavior and language modeling in smartphones. Classification, decision-making, and HMI systems are going to play a significant role in driving adoption of multimodal learning, providing a catalyst to refine and standardize some of the technical approaches,” Carlaw concludes.

These findings are from ABI Research’s Artificial Intelligence Meets Business Intelligence whitepaper. This whitepaper is part of the company’s AI & Machine Learning research service, which includes research, data, and ABI Insights.

To learn more about the growing commercial demand for Artificial Intelligence multimodal learning and to get an in-depth look at the end-market opportunities being created in key verticals, download our free whitepaper, Artificial Intelligence Meets Business Intelligence.



About ABI Research

ABI Research provides strategic guidance to visionaries, delivering actionable intelligence on the transformative technologies that are dramatically reshaping industries, economies, and workforces across the world. ABI Research’s global team of analysts publishes groundbreaking studies often years ahead of other technology advisory firms, empowering our clients to stay ahead of their markets and their competitors.

For more information about ABI Research’s services, contact us at +1.516.624.2500 in the Americas, +44.203.326.0140 in Europe, +65.6592.0290 in Asia-Pacific or visit

ABI Research is a global technology intelligence firm uniquely positioned at the intersection of technology solution providers and end-market companies. We serve as the bridge that seamlessly connects these two segments by providing exclusive research and expert guidance to drive successful technology implementations and deliver strategies proven to attract and retain customers.

Contact ABI Research

Media Contacts

Americas: +1.516.624.2542
Europe: +44.(0).203.326.0142
Asia: +65 6950.5670
