The primary objective of multimodal learning is to consolidate the learning process from heterogeneous data streamed from various sensors and other data inputs into a single model, either for prediction or inference. Multimodal learning systems can improve on unimodal ones because modalities can carry complementary information about each other, which will only become evident when they are both included in the learning process. Therefore, learning-based methods that combine signals from different modalities can generate more robust inference, or even new insights impossible in a unimodal system. Multimodal learning has been a research topic in computer science since the mid-1970s, but recent improvements in Deep Learning reignited interest in the field. In the initial phase of multimodal learning, rules-based approaches dominated implementations. However, increasingly, a hybrid mixture of rules-based and deep learning based multimodal learning is becoming the most popular style of software implementation, creating specific implementation requirements for multimodal learning systems.
The market is currently experiencing the first wave of multimodal learning applications and products that draw on Deep Learning techniques to both interrupt sensor data and increasingly inform the multimodal learning process itself. Multimodal learning exploits complementary aspects of modality data streams, making it a powerful technology and enabling new business applications that fall into three categories: classification, decision making, and HMI. Shipments of devices using multimodal learning will increase from 24.74 million in 2018 to 514.12 million in 2023. The market sectors most aggressively introducing multimodal learning systems include automotive, robotics, consumer devices, media and entertainment, and healthcare.
At present, several applications are driving the uptake of multimodal learning, creating demand for systems which can support it. Implementing multimodal learning is still challenging, as open source software efforts remain limited, while capable hardware platforms that bring multimodal learning inference to devices at the edge are only just starting to emerge. The inference of hybrid multimodal learning software has compute requirements that are best served by heterogeneous computing architectures. Consequently, some companies are now building specialized chips based on heterogeneous architectures.