Co-Inference: An Artificial Intelligence Technique that 5G Will Unlock


4Q 2018 | IN-5327


What Is Co-Inference, and How Can It Improve AI Inference?

NEWS


A team of researchers from Sun Yat-sen University in China has developed a new technique for Artificial Intelligence (AI) inference that spreads inference across both the edge and the cloud—the researchers call this technique “co-inference.” A combination of 5G and co-inference could massively improve flexibility in how inference is managed on devices. The researchers present an approach that marries the edge and the cloud together in a framework called Edgenet, a deep-learning co-inference model. Co-inference relies on an idea called Deep Neural Network (DNN) partitioning—a process of adaptively splitting DNN layers between the edge and the cloud according to the bandwidth and compute available at each. In practice, this means segmenting inference between the edge device and the cloud by splitting the layers of the DNN and assigning each layer either to the edge or to the cloud. The critical task is identifying the most computationally intensive layers of a DNN and having the inference of those layers take place in the cloud. Done correctly, this can reduce latency while sending as little data to the cloud as possible.
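
To illustrate the idea, the following minimal Python sketch shows the kind of partition-point search that DNN partitioning implies: given per-layer run times on the device and in the cloud, the size of the data that would have to be uploaded at each possible split, and the current bandwidth, pick the split with the lowest end-to-end latency. All of the per-layer figures are illustrative assumptions, not numbers from the Edgenet work.

    # Minimal sketch of a partition-point search for DNN partitioning.
    # The per-layer timings and upload sizes are illustrative placeholders,
    # not figures from the Edgenet research.

    def best_partition(edge_ms, cloud_ms, upload_kb, bandwidth_kbps):
        """Return (split_index, latency_ms) minimizing end-to-end latency.

        Layers [0, split) run on the edge device; layers [split, N) run in
        the cloud. split = 0 is pure cloud inference (raw input uploaded);
        split = N is pure on-device inference (nothing uploaded).
        """
        n = len(edge_ms)
        best = None
        for split in range(n + 1):
            edge_time = sum(edge_ms[:split])
            upload_time = upload_kb[split] * 8 / bandwidth_kbps * 1000  # KB -> ms
            cloud_time = sum(cloud_ms[split:])
            total = edge_time + upload_time + cloud_time
            if best is None or total < best[1]:
                best = (split, total)
        return best

    # Hypothetical five-layer network: early layers are cheap on the device
    # but produce large feature maps; later layers are compute-heavy but
    # produce small outputs.
    edge_ms = [5, 8, 40, 90, 120]            # per-layer latency on the device
    cloud_ms = [1, 1, 4, 6, 8]               # per-layer latency in the cloud
    upload_kb = [600, 300, 150, 40, 10, 0]   # upload size at each split point

    for bw in (50, 1_000, 10_000):           # Kbps: congested, 1 Mbps, 10 Mbps
        split, latency = best_partition(edge_ms, cloud_ms, upload_kb, bw)
        print(f"{bw:>6} Kbps -> split after layer {split}, ~{latency:.0f} ms")

As bandwidth shrinks, the optimal split moves toward keeping more layers on the device; as it grows, more layers are worth shipping to the cloud, which is the adaptive behavior the researchers describe.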

DNNs—the underlying technology that supports several AI use cases, such as computer vision, speech recognition, and natural language processing—require a tremendous amount of computation. DNNs are made up of neural layers. The most sophisticated modern DNNs can be 100-plus layers deep and may not even be organized in a linear fashion. The ones commonly implemented on mobile devices tend to be tens of layers deep with hundreds of nodes per layer, meaning that the number of parameters can easily reach the millions. There is also a large disparity in inference run time between the different layers of a neural network. This has made inference a difficult challenge for anyone implementing DNNs, as there are a great many calculations to perform. Right now, there are two main approaches to performing inference.
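
A quick back-of-the-envelope calculation shows why parameter counts reach the millions so quickly for a network of the size described above; the layer widths used here are hypothetical, chosen only to illustrate the scale.

    # Rough parameter count for a fully connected network that is "tens of
    # layers deep with hundreds of nodes per layer" (hypothetical widths).

    layer_widths = [784] + [512] * 20 + [10]   # input, 20 hidden layers, output

    params = sum(w_in * w_out + w_out          # weights plus biases per layer
                 for w_in, w_out in zip(layer_widths, layer_widths[1:]))

    print(f"{len(layer_widths) - 1} layers, ~{params / 1e6:.1f} million parameters")
    # -> 21 layers, ~5.4 million parameters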

The first approach is to leverage the cloud to process the DNN in its entirety. All data is sent to the cloud for processing, and the results are then sent back to the device. Because high volumes of data (common in video and image AI processing) must be sent across the network, this cloud-centric approach incurs a large end-to-end latency penalty. The researchers found that this latency was heavily dependent on bandwidth. In one experiment, when the bandwidth connecting their device, which was running an image recognition model, dropped from 1 Megabit per second (Mbps) to 50 Kilobits per second (Kbps), the latency of offloaded inference rose from 0.123 seconds to 2.317 seconds—on par with localized processing on a device with nonspecialized hardware.
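
Those two data points are consistent with a simple model in which end-to-end latency is a fixed compute term plus a transfer term that scales with the inverse of bandwidth. The sketch below back-solves that model for the implied payload and compute time; the result is an inference from the reported figures, not a measurement from the paper.

    # Sanity check on the reported latencies, assuming
    #   latency = payload_bits / bandwidth + compute
    # The payload and compute values are inferred from the two data points
    # above, not taken from the researchers' paper.

    lat_hi_bw, lat_lo_bw = 0.123, 2.317      # seconds, as reported
    bw_hi, bw_lo = 1_000_000, 50_000         # bits per second

    payload = (lat_lo_bw - lat_hi_bw) / (1 / bw_lo - 1 / bw_hi)   # bits
    compute = lat_hi_bw - payload / bw_hi                         # seconds

    print(f"implied payload ~{payload / 8 / 1024:.0f} KB, "
          f"implied remote compute ~{compute * 1000:.0f} ms")
    # -> roughly 14 KB of payload and 8 ms of compute: once the link degrades,
    #    transfer time dwarfs compute time, which is the latency penalty above.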

The second approach involves moving processing to the edge device, an approach often advocated by those building edge AI hardware platforms. Given the high volume of calculations required, there can still be a significant latency cost even with custom hardware, and including such hardware on the device incurs a power consumption penalty. A device-centric approach may also be costly in terms of model accuracy, as the device cannot match the compute resources available in the cloud. Only a limited number of devices currently use edge processing for inference.

Concentrating AI inference entirely on either the edge or the cloud therefore has pitfalls that can result in higher latency, lower accuracy, or greater power consumption. The researchers were able to show that Edgenet could outperform both a cloud-centric and a device-centric approach, particularly when bandwidth on the local network became limited.

Why Is Co-Inference Going to Be Significant for AI and 5G?

IMPACT


Some vendors, like Gemalto, have been promoting 5G as a solution for AI. With 5G, latencies as low as 1 millisecond can be achieved, and inference times can be lowered through fast back-and-forth handover of inference workloads between the edge and the cloud—in effect enabling co-inference. A combination of 5G and co-inference will give flexibility to devices performing inference at the edge, allowing them to hand over some of that inference to the cloud. This will be particularly helpful for devices with power constraints, like mobile battery-powered machines, as some level of dynamic handover between the edge and the cloud could help devices save power while maintaining low latency. This kind of power-saving co-inference handover could even be offered as a service.
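
A minimal sketch of such a power-aware handover policy is shown below, assuming the device already knows the total latency and on-device energy cost associated with each possible split; all of the thresholds and per-split figures are hypothetical.

    # Sketch of a power-aware handover policy: prefer the fastest split while
    # the battery is healthy, and the most frugal split that still meets the
    # latency budget once the battery runs low. All constants are assumptions.

    def choose_split(latency_ms, device_energy_mj, battery_pct,
                     latency_budget_ms=200, low_battery_pct=20):
        """Pick a partition point from per-split latency/energy estimates."""
        candidates = [i for i, lat in enumerate(latency_ms)
                      if lat <= latency_budget_ms]
        if not candidates:                    # nothing meets the budget
            return min(range(len(latency_ms)), key=lambda i: latency_ms[i])
        if battery_pct < low_battery_pct:     # save power: offload more layers
            return min(candidates, key=lambda i: device_energy_mj[i])
        return min(candidates, key=lambda i: latency_ms[i])

    # Per-split totals for a hypothetical model (index = layers kept on device).
    latency_ms = [500, 380, 190, 160, 230, 260]
    device_energy_mj = [5, 30, 90, 150, 260, 300]

    print(choose_split(latency_ms, device_energy_mj, battery_pct=80))  # -> 3
    print(choose_split(latency_ms, device_energy_mj, battery_pct=10))  # -> 2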

The combination of 5G and co-inference could also enable more sophisticated models to be inferred at low latency on a wider range of devices. This will be relevant where only a limited amount of processing power can be added at the edge. 5G will enable the parts of a model that are too computationally heavy to infer at the edge to be passed to the cloud, allowing the device to handle only those layers it is best suited to process. Given that DNN models are becoming more sophisticated as the range of applications in which they are used expands, co-inference on a 5G platform could become an important hybrid solution.

Another merit of co-inference is that a device does not have to send all of its data directly to the cloud. Instead, it sends only the intermediate outputs of the partially processed DNN. This is good from a data protection, storage, and security perspective. Sending less data across the network and doing less processing in the cloud will also be cheaper than performing all inference in the cloud.
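
The data saving depends on where the model is split: feature maps deep in a network are typically far smaller (and far less interpretable) than the raw input, whereas very early feature maps can actually be larger. The arithmetic below uses hypothetical tensor shapes to illustrate the comparison.

    # Illustrative comparison of what crosses the network under co-inference,
    # using hypothetical tensor shapes (not taken from the paper).

    raw_kb = 224 * 224 * 3 * 1 / 1024          # 224x224 RGB frame, 1 byte/pixel
    deep_feature_kb = 7 * 7 * 512 * 2 / 1024   # 7x7x512 float16 map at the split

    print(f"raw input:        ~{raw_kb:.0f} KB")
    print(f"split-point data: ~{deep_feature_kb:.0f} KB")
    # -> ~147 KB versus ~49 KB: the deep split point sends about a third of
    #    the data, and the feature map is not human-readable, which underpins
    #    the data-protection argument above.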

There is already a trend toward placing dedicated hardware for DNN processing at the edge, as seen in Huawei's and Apple's smartphone platforms, and this approach looks set to proliferate. These devices will also be fitted with 5G chipsets, providing a platform for co-inference experimentation.

Who Should Be Exploring Co-Inferencing?

RECOMMENDATIONS


Co-inferencing will become important for applications where inference must be performed at low latency and where it may not be possible to put all of the inferencing hardware on the device itself. Co-inferencing and 5G will also grow in importance as algorithms become more sophisticated, deeper, and harder to process. Companies that are attempting to automate machines but are constrained in the amount of hardware and total power consumption they can fit on a device should certainly explore co-inference models, especially if they plan to use sophisticated DNNs and require a high degree of accuracy and low latency. Vendors in the smartphone, transportation, robotics, and industrial sectors should begin exploring in detail how they can leverage a combination of 5G and co-inference.

The Edgenet framework is still very nascent, and significant investment in its development will be required to bring it to a state where it can be integrated into other technologies. Players in the verticals mentioned above are trying to automate as many processes in their machines as possible and are exploring the use of DNNs in their decision-making systems. Decision making and planning in autonomous vehicle systems are still largely left to heuristic systems, but DNNs will play an increasing role, given the rise and potential of reinforcement learning (a self-learning approach that uses DNNs) as a more robust alternative. The DNNs used in these decision-making systems will have layers that are much more difficult to compute than those employed for more straightforward tasks such as image recognition, segmentation, and tracking. The computational stress these DNNs create will provide the catalyst for the exploration of co-inference and, by extension, 5G.

Co-inference could also create a powerful service proposition. Using co-inference, autonomous vehicles running low on fuel could dynamically allow the cloud to take over some of the inference tasks normally performed on the vehicle itself. This would only be possible with 5G's low-latency connectivity, but given the high power demand of AI-capable hardware, it could prove a powerful tool in autonomous vehicle cloud management.
