DeGirum Leveraging Pruning to Deliver Efficient Edge AI Solution

4Q 2021 | IN-6397

DeGirum offers a unique solution for both dense and sparse neural networks that need pruning.


DeGirum Introduced ORCA-NNX


In Q4 2021, Artificial Intelligence (AI) chipset startup DeGirum emerged from stealth mode and announced the launch of its ORCA-NNX Edge Inference Accelerator. Coupled with its Deep Neural Network (DNN) pruning support, the company offers a full-stack, hardware-to-software technology suite to address low-power, high-accuracy Machine Learning (ML) inference at the edge.

Founded in 2017, the California-based startup focuses on ML hardware and software solutions based on its support for pruned DNN models. The company’s first ML inference processor is called ORCA-NNX, a flexible Application Specific Integrated Circuit (ASIC) that accelerates data processing through high bandwidth and compute efficiency. The processor comes with a network compiler that identifies and generates the most memory-efficient code for a particular pruned DNN. The company has also developed a set of software libraries and tools targeted at developers.

A Combination of Hardware and Software for Pruning


Pruning has long drawn significant interest among developers. According to ABI Research’s report on the Edge AI Ecosystem (AN-5334), there are six main approaches to model compression: Neural Network Architectural Search (NAS), filter decomposition, knowledge distillation, pruning, quantization, and compilation. Among them, pruning is a popular process that reduces the size of neural networks primarily through the reduction of parameters and the removal of redundant neural connections, as well as the efficient storage of model weights.
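To make the idea concrete, the following is a minimal sketch of one common pruning scheme, magnitude pruning, followed by compact storage of the surviving weights. It is an illustration only and does not represent DeGirum's method; the function names `magnitude_prune` and `to_sparse`, the toy weight matrix, and the 50% sparsity target are all assumptions made for this example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.

    Hypothetical helper for illustration. Ties at the threshold are all
    removed, so slightly more than the requested fraction may be pruned.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]  # largest magnitude that still gets removed
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def to_sparse(weights):
    """Store only the non-zero weights as (row, column, value) triples."""
    return [
        (i, j, w)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
        if w != 0.0
    ]


# Toy 2x4 layer; the values are made up for the example.
weights = [
    [0.80, -0.05, 0.30, 0.01],
    [-0.02, 0.60, -0.04, 0.45],
]

pruned = magnitude_prune(weights, sparsity=0.5)
sparse = to_sparse(pruned)

print(pruned)
print(f"stored weights: {len(sparse)} of {sum(len(r) for r in weights)}")
```

In this toy case the four smallest weights are removed and only four triples need to be stored, which is the kind of parameter reduction and compact weight storage the paragraph above describes.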

As the size of current DNNs has reached several million parameters, inference can be inefficient on resource-constrained devices. Through pruning, a dense DNN, like a Convolutional Neural Network (CNN), can be thinned with negligible quality degradation, while a sparse DNN for speech recognition can be reduced significantly in size. Therefore, a pruned network leads to higher efficiency in power and memory usage. However, without hardware that can exploit the sparsity in the networks, most current pruning techniques do not necessarily yield significant computational reduction.
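The last point can be illustrated with a small sketch that counts multiply-accumulate (MAC) operations. A dense kernel multiplies every weight, zero or not, so pruning alone does not reduce its work; only a sparsity-aware kernel skips the zeros. Both functions and the toy data below are hypothetical and stand in for whatever a real accelerator or library would do.

```python
def dense_matvec(weights, x):
    """Matrix-vector product that multiplies every weight, including zeros."""
    macs = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            macs += 1
        out.append(acc)
    return out, macs


def sparse_aware_matvec(weights, x):
    """Same product, but zero weights are skipped and cost no MAC."""
    macs = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w != 0.0:
                acc += w * xi
                macs += 1
        out.append(acc)
    return out, macs


# A layer that has already been pruned to 50% sparsity (made-up values).
pruned_weights = [
    [0.80, 0.0, 0.30, 0.0],
    [0.0, 0.60, 0.0, 0.45],
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out, dense_macs = dense_matvec(pruned_weights, x)
sparse_out, sparse_macs = sparse_aware_matvec(pruned_weights, x)

print(f"dense MACs: {dense_macs}, sparsity-aware MACs: {sparse_macs}")
```

Both functions return the same output vector, but the dense version performs eight MACs while the sparsity-aware version performs four. In software, the per-weight zero check can cost as much as the multiply it avoids, which is one reason dedicated hardware support matters.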

DeGirum aims to change this by offering hardware designed specifically to support pruned DNNs. ORCA-NNX is optimized for both structured and random pruning, thereby giving model developers high flexibility when designing DNNs. Since pruned DNNs are smaller in size, this hardware-software combination can reduce both compute and bandwidth requirements, thereby multiplying workload capacity without increasing resource requirements or energy consumption. At the same time, the company is developing software to support model porting from popular ML frameworks, such as TensorFlow, PyTorch, and ONNX. Such support reduces development time and complexity, allowing for easy migration, update, and maintenance.
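The difference between the two pruning styles can be sketched as follows, reading "random" pruning as unstructured pruning, where individual weights are removed in an arbitrary pattern (an interpretation, not something the source spells out). Structured pruning here removes whole rows, standing in for whole neurons or filters. The helper names, the L1-norm ranking criterion, and the toy matrix are assumptions for illustration and say nothing about how ORCA-NNX handles either case internally.

```python
def unstructured_prune(weights, keep):
    """Keep the `keep` largest-magnitude individual weights; zero the rest."""
    ranked = sorted(
        ((abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)),
        reverse=True,
    )
    kept = {(i, j) for _, i, j in ranked[:keep]}
    return [
        [w if (i, j) in kept else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


def structured_prune(weights, keep_rows):
    """Keep the `keep_rows` rows with the largest L1 norm; drop the others."""
    by_norm = sorted(
        range(len(weights)),
        key=lambda i: sum(abs(w) for w in weights[i]),
        reverse=True,
    )
    kept = sorted(by_norm[:keep_rows])  # preserve the original row order
    return [weights[i][:] for i in kept]


# Toy 4x3 layer; the values are made up for the example.
layer = [
    [0.90, 0.10, -0.80],
    [0.05, -0.70, 0.02],
    [0.20, 0.10, 0.15],
    [-0.60, 0.03, 0.50],
]

unstructured = unstructured_prune(layer, keep=6)
structured = structured_prune(layer, keep_rows=2)

print("unstructured:", unstructured)
print("structured:  ", structured)
```

Both results retain six weights. The structured result is simply a smaller dense matrix that any hardware can run, whereas the unstructured result keeps the original shape with scattered zeros, which typically preserves more of the important weights but only pays off on hardware that can skip them.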

Designed for High Accuracy Dense and Sparse Models


DeGirum’s support for pruned DNNs and full-stack approach to edge ML is rather unique in the industry, as it enables the company to tackle a wide range of edge ML applications. Rather than being designed for power/performance optimization of a single ML model, DeGirum hardware supports a wide range of ML models while still offering high frames-per-second (FPS) performance. This contrasts with most, if not all, other accelerators, which are optimized to deliver high FPS performance on one or two models that can fit completely inside the limited on-chip memory. This breadth enables edge AI developers to create rich, sophisticated, and highly functional applications.

Additionally, most ML accelerators in the market today are mainly designed to tackle dense DNNs. DeGirum is capable of supporting both dense and sparse networks. The company has demonstrated high power efficiency in running complex DNNs such as YOLO V5, ResNet50, and PoseNet.

Currently, the company is targeting researchers who require highly accurate edge ML models. Moving forward, the company is looking to address edge ML use cases that require running multiple DNNs with high accuracy. Some potential use cases include autonomous navigation in robotics, machine vision in healthcare diagnostics, and defect inspection and anomaly detection in industrial sectors.

An area where DeGirum can play a key role is ambient sound and natural language processing at the edge. At the moment, most implementations focus on simple tasks, such as wake word detection, scene recognition, and voice biometrics. More complex applications are still heavily reliant on cloud computing. However, language and speech DNNs have been shown to retain their accuracy despite being heavily pruned. This means AI-enabled devices will feature more complex audio and voice processing applications in the future. ABI Research’s recent report on Deep Learning-Based Ambient Sound and Language Processing: Cloud to Edge (AN-5031) predicted that over a billion end devices will be shipped with a dedicated chipset for ambient sound or natural language processing by 2026. As the company can connect multiple ORCA-NNX accelerators together through a high-throughput, low-latency fabric using standard interfaces, it can also target the edge AI gateway and server markets, which are becoming increasingly important in enabling brownfield infrastructure.


