NVIDIA Strengthens Software AI Offering for GPU


2Q 2018 | IN-5099

NVIDIA’s significant improvements to TensorFlow’s performance bode well for developers and implementers looking to deploy AI applications, with NVIDIA promising dramatically faster AI processing and compatibility across multiple platforms.



TensorRT for TensorFlow and Kubernetes Integration for GPU Announced at NVIDIA GTC


NVIDIA is doing some very important heavy lifting when it comes to AI. Much of this is self-fulfilling: for both AI training and inference, its GPUs dramatically outperform CPUs on most fronts, so they are the obvious choice for any implementer looking to build and apply AI algorithms. Selling hardware is NVIDIA’s main goal, so the company is committed to making life as easy as possible for potential developers and implementers, at whatever scale they work. At its annual GPU Technology Conference (GTC) in Silicon Valley, CEO Jensen Huang made significant announcements regarding NVIDIA’s AI software stack. NVIDIA’s AI implementation layer, TensorRT, is a software compiler and runtime optimized for deep learning inference on GPUs. TensorRT will now be integrated with TensorFlow, and Kubernetes will gain support for NVIDIA GPUs.

Announcements Should Increase Performance and Developer Options


NVIDIA will be integrating TensorFlow with its TensorRT optimization tool. TensorRT speeds up deep learning inference through optimization and high-performance run-times, in some cases synthesizing deep learning models up to 4X their previous size. TensorRT will also redistribute a model across a hardware stack in a manner that greatly improves efficiency and inference speed. ABI Research has been in direct discussion with many developers who train models and run inference using TensorFlow, and one of the main criticisms of the framework has been the speed at which it operates; some have even taken to nicknaming it “TensorSlow.” Developers should take note of this announcement.

To illustrate the improvement in performance, NVIDIA describes how TensorFlow executes an optimized graph. As an example, assume a graph trained in TensorFlow has three segments: A, B, and C. Segment B is optimized by TensorRT and replaced by a single node. During inference, TensorFlow executes A, then calls TensorRT to execute B, which is now the single node, and then TensorFlow executes C.

In practice, NVIDIA reports this will increase AI processing speeds dramatically. ResNet-50 is a popular image recognition model for TensorFlow. When NVIDIA ran the model with TensorFlow-TensorRT integration using NVIDIA Volta Tensor Cores, performance was 8X faster than running TensorFlow only. TensorRT integration will become available once TensorFlow 1.7 is released.
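The A, B, C example can be sketched in a few lines of plain Python. This is a toy illustration only: it does not use TensorFlow or TensorRT, the names `run_segment`, `fuse`, and `node_b` are hypothetical, and the fused function merely stands in for the single optimized node without being any faster. It shows the control flow described by NVIDIA, in which the framework runs A, hands B to one replacement node, and then runs C, with the same result as the unoptimized graph.

```python
from functools import reduce
from typing import Callable, List

Op = Callable[[float], float]


def run_segment(ops: List[Op], x: float) -> float:
    """Execute a segment op by op, as the framework would without optimization."""
    for op in ops:
        x = op(x)
    return x


def fuse(ops: List[Op]) -> Op:
    """Collapse a segment into one callable (hypothetical stand-in for an optimized node)."""
    def fused(x: float) -> float:
        return reduce(lambda acc, op: op(acc), ops, x)
    return fused


# Three segments of a toy graph; the arithmetic is arbitrary.
segment_a: List[Op] = [lambda x: x + 1.0]
segment_b: List[Op] = [lambda x: x * 2.0, lambda x: x - 3.0, lambda x: max(x, 0.0)]
segment_c: List[Op] = [lambda x: x / 2.0]


def infer_unoptimized(x: float) -> float:
    """Run A, then B, then C, every op executed individually."""
    x = run_segment(segment_a, x)
    x = run_segment(segment_b, x)
    return run_segment(segment_c, x)


# Segment B is replaced by a single node before inference starts.
node_b = fuse(segment_b)


def infer_optimized(x: float) -> float:
    """Run A, call the single replacement node for B, then run C."""
    x = run_segment(segment_a, x)
    x = node_b(x)
    return run_segment(segment_c, x)


print(infer_unoptimized(4.0), infer_optimized(4.0))
```

Both functions return the same value for any input; in the real integration the benefit would come from the replacement node running as an optimized engine rather than from the restructuring itself.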

In addition, NVIDIA is bringing GPU support to Kubernetes, the open-source software for deploying and managing containerized applications. GPU-aware Kubernetes should mean improved performance and effective system scaling for customers. Kubernetes is popular among enterprise users that adopt distributed computing architectures, and the ability to perform distributed training and inference will bring cost efficiency, redundancy, and scalability. During a demonstration, NVIDIA showed how it could scale an image classification model for flower types from four flower images per second on a CPU to 873 per second on a single GPU, and then to NVIDIA’s own Saturn V GPU cluster. Using the Kubernetes Load Balancer, it could easily add replicas of the container, scaling the system to run at nearly 7,000 flower images per second.
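The demonstration figures can be checked with simple arithmetic. The sketch below assumes throughput scales linearly with the number of replicas, which is an idealization, and the replica count of 8 is an assumption chosen because it lands close to the quoted "nearly 7,000" figure; the number of replicas used in the demonstration is not stated here.

```python
# Figures quoted from the demonstration (items classified per second).
CPU_RATE = 4
GPU_RATE = 873

# Assumption, not from the source: number of single-GPU replicas.
ASSUMED_REPLICAS = 8


def speedup(new_rate: float, old_rate: float) -> float:
    """Ratio of new throughput to old throughput."""
    return new_rate / old_rate


def linear_throughput(per_replica_rate: float, replicas: int) -> float:
    """Aggregate throughput if every replica contributes equally (idealized)."""
    return per_replica_rate * replicas


gpu_vs_cpu = speedup(GPU_RATE, CPU_RATE)
cluster_rate = linear_throughput(GPU_RATE, ASSUMED_REPLICAS)

print(f"Single GPU vs. CPU: {gpu_vs_cpu:.2f}x")
print(f"{ASSUMED_REPLICAS} replicas at {GPU_RATE}/s each: {cluster_rate}/s")
```

Under these assumptions a single GPU is roughly 218 times faster than the CPU, and eight replicas would deliver 6,984 per second, in line with the quoted cluster figure.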

GPU Software Story Is One Area Challengers Will Struggle to Match


Last year saw the emergence of several AI startups looking to challenge the current dominance of NVIDIA’s GPUs in AI applications. Venture capitalists invested more than US$1.5 billion in AI chip startups over that period, nearly double the amount invested two years ago, according to the research firm CB Insights. Cerebras, Graphcore, Wave Computing, and two Beijing companies, Horizon Robotics and Cambricon, have each raised around US$100 million in funding. These companies all promise chip architectures that will outperform the GPU in neural network processing. However, few have a compelling story when it comes to the surrounding software ecosystem and integration in general. They have much work to do to challenge NVIDIA in this respect, especially as it continues to announce improvements like those mentioned above. Cultivating a software stack, centered on TensorRT, that is functional, compatible across multiple platforms, and easy for developers to use will contribute significantly to NVIDIA’s continued dominance as the lead vendor in AI hardware.

The elephant in the room is, of course, Google, with its custom-built tensor processing unit (TPU), which it has been marketing as the lead solution for TensorFlow and offering through beta-grade Google Cloud services. However, when ABI Research discussed the penetration of GPUs relative to TPUs in the Google Cloud AI offering with Google’s team, Google mentioned that the TPU was still a niche piece of hardware and that it was unlikely to stop acquiring large quantities of GPUs. Google is also the main proponent of Kubernetes. NVIDIA’s support for Kubernetes, which follows its earlier support for Docker, illustrates its willingness to partner with Google. The support for open-source software in general by the two largest AI players shows that the best way to move forward is through openness and interoperability.

