GTC '22: NVIDIA’s Holistic Approach Marries Technology Components to Create Powerful Union

Author: Lian Jye Su, Principal Analyst, ABI Research

This is part 2 of a series of blogs covering the NVIDIA GTC '22 event. You can read part 1 here and part 3 here.

Achieving full-stack status as a company is important if NVIDIA is to help democratize HPC and AI/ML workloads and technologies. Heterogeneous computing is driving the convergence of those workloads by pushing the boundaries of what is possible with the available technology. By achieving more with less investment, the benefits of these high-end workloads can be enjoyed by all tiers of enterprise, not just those with the resources to buy expensive specialist systems. By owning the full stack, from hardware through to application, NVIDIA can optimize, tune, and improve every component in the stack, further increasing productivity and driving down the effective cost of the investment on which business outcomes rely.

As the boundaries of what we can achieve are pushed, so too are expectations, and new workloads present fresh challenges to be solved. Before the revolution made possible by deploying acceleration technologies in heterogeneous systems, workload sizes were limited largely by the time it would take to perform the calculations. As capabilities improved and that time dropped, workload sizes increased and bottlenecks formed in new locations: workloads moved from being compute-bound to being memory-bound, and workload scaling started to be limited by the transfer of workload elements between these heterogeneous components.

Examples of these extreme workloads are becoming commonplace, such as natural language models in which 500 billion or even one trillion parameters are no longer considered extraordinary. Transformer deep learning models can consume huge data sets and learn without human supervision, and as a result model and training sizes have grown enormously. These models are not served well by traditional CPU and GPU systems; they place huge demands on memory capacity and bandwidth.
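The memory pressure these models create can be illustrated with some back-of-the-envelope arithmetic. The bytes-per-parameter figures below are common rules of thumb (FP16 weights, mixed-precision Adam training state), not figures from the announcement; real frameworks vary.

```python
# Rough memory estimates for very large transformer models.
# Assumptions (illustrative only): FP16 weights at 2 bytes/parameter,
# and ~16 bytes/parameter for mixed-precision Adam training
# (weights + gradients + optimizer state).

def inference_memory_gb(params, bytes_per_param=2):
    """Memory just to hold the weights, in GB."""
    return params * bytes_per_param / 1e9

def training_memory_gb(params, bytes_per_param=16):
    """Rough training footprint (weights + grads + optimizer state), in GB."""
    return params * bytes_per_param / 1e9

for n in (500e9, 1e12):  # 500 billion and 1 trillion parameters
    print(f"{n:.0e} params: ~{inference_memory_gb(n):,.0f} GB of weights, "
          f"~{training_memory_gb(n):,.0f} GB to train")
```

Even just holding the weights of a trillion-parameter model runs to terabytes, which is why no single device suffices and why memory capacity and bandwidth, rather than raw compute, become the constraint.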

This is where the full-stack approach can start to have an impact, because every element in the stack is driven by the same common goal. Solutions can be tailored to specific scenarios, with the controller of the stack able to make the programming models more efficient at the application layer and relevant to the hardware they are intended to run on, at both the system and the component level. An enterprise looking to enter the AI arena, assuming it understands its workload, can make the simple choice to partner with NVIDIA on its AI journey. And the enterprise that does not yet fully understand its workload? It can work with NVIDIA via LaunchPad, which has nine facilities globally where customers get immediate access to the full portfolio of NVIDIA AI software and hardware components. This allows them to test and prototype AI workflows via cloud-based interfaces into private compute, storage, and network infrastructure. All of NVIDIA's development tools and application frameworks are also available via LaunchPad, which means every layer of the stack can be tested before the enterprise selects a technology partner.

The heavy bias toward hardware announcements at this GTC event reflects the importance of hardware integration and optimization in servicing this new wave of AI workloads. One of the headline announcements was the new H100 GPU, built on the new Hopper architecture that replaces the current Ampere generation and is expected to outperform it by six times. In addition to this huge jump in performance, the Hopper architecture introduces per-instance partition isolation for confidential computing and a new DPX instruction set to accelerate dynamic programming algorithms. NVIDIA gave details on how the new H100 will be packaged with high-bandwidth memory (HBM) to form the SXM superchip module. Eight of these modules will be connected by NVLink onto the HGX system board, with two CPUs and either InfiniBand or Ethernet networking modules, to make one giant GPU: the DGX H100. Up to 32 of these DGX servers (256 GPUs) can be connected via the NVLink Switch System to create the hugely capable DGX POD, which will really push the boundaries of GPU technologies and the workloads that run on them.

In addition, to bring the benefits of the H100 to the mainstream market, NVIDIA has removed the bottlenecks caused by moving data between the GPU, CPU, and network interfaces by attaching the network directly to the GPU. The company announced the H100 CNX, which comes in PCIe form. The H100 CNX combines an advanced networking processor, the ConnectX-7 SmartNIC, with the H100 GPU in one module, meaning that data from the network is piped through Direct Memory Access (DMA) straight into the GPU at 50 GB per second. This eliminates bottlenecks at the CPU and system memory, avoids multiple passes across the PCIe bus, and frees those resources to be more productive processing other workloads.
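The benefit of DMA straight into the GPU can be sketched with simple arithmetic: staging network data through host memory crosses the PCIe bus twice (NIC to host RAM, then host RAM to GPU), while a direct path crosses it once. The link rates below are assumptions for illustration, not figures from the announcement (only the 50 GB/s CNX figure comes from the text).

```python
# Illustrative sketch: time for network data to land in GPU memory,
# staged via host memory (two bus traversals) vs. direct DMA (one).
# All bandwidth figures are assumed for illustration.

DATA_GB = 100
PCIE_GB_S = 50      # assumed per-traversal rate on the staged path
CNX_DMA_GB_S = 50   # direct network-to-GPU rate quoted in the text

staged = DATA_GB / PCIE_GB_S * 2   # NIC -> host RAM -> GPU
direct = DATA_GB / CNX_DMA_GB_S    # NIC -> GPU in one hop

print(f"via host memory: {staged:.1f} s, direct DMA: {direct:.1f} s")
```

Even with identical link rates, removing the intermediate host-memory copy halves the transfer time, and it also leaves the CPU and system memory free for other work.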

NVIDIA also announced the Grace Hopper module, which combines the NVIDIA Grace CPU announced at the last GTC event with the newly announced Hopper GPU. Grace Hopper will connect the CPU and the GPU via the memory-coherent, direct chip-to-chip NVLink interconnect, which has 900 GB per second of bandwidth. This NVLink chip-to-chip interconnect is also critical to the next product announced, the Grace Superchip. The Grace CPU is expected in H1 2023, but this announcement provides more color around NVIDIA's plans for its first data center CPU. The Grace Superchip will see two Grace chips, with a total of 144 cores and 1 TB of low-power memory delivering 1 TB per second of memory bandwidth, connected coherently over the NVLink chip-to-chip interconnect. The Grace Superchip is expected to be used for HPC, AI, data analytics, scientific computing, and hyperscale workloads, and it will be compatible with all of NVIDIA's software platforms.

The enabling technology for Grace Hopper and the Grace Superchip is the NVLink chip-to-chip interconnect, which provides energy-efficient, very low-latency, memory-coherent connectivity between major components. It is designed to scale from die-to-die to chip-to-chip and system-to-system, and it enables Grace and Hopper to be configured to address a very large diversity of customer workloads. The 900 GB per second of bandwidth provided by the interconnect, together with the coherent nature of the memory, means that very large data sets can be shared between the resources in the system, and the overall latencies associated with moving large data sets between system components are drastically reduced. This bandwidth, and the interconnect's ability to scale at the chip and the system level, means that NVIDIA can create a very diverse set of systems to meet the growing range of customer use cases: from the dual-CPU Grace Superchip targeting traditional HPC workloads through to a system with two Grace Superchips augmented by eight Hopper GPUs targeting deep learning and advanced acceleration-optimized workloads.
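The practical effect of the interconnect's bandwidth on moving large data sets can be sketched as follows. The 900 GB/s figure is the NVLink chip-to-chip number quoted above; the PCIe comparison point (~64 GB/s for a Gen5 x16 link) is an assumption for illustration, not from the announcement.

```python
# Rough transfer-time comparison for moving a large working set between
# system components over different links. PCIe figure is assumed.

def transfer_seconds(dataset_gb, bandwidth_gb_s):
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return dataset_gb / bandwidth_gb_s

dataset_gb = 512  # e.g. a large model's working set
for name, bw_gb_s in (("NVLink chip-to-chip", 900),
                      ("PCIe Gen5 x16 (assumed)", 64)):
    t = transfer_seconds(dataset_gb, bw_gb_s)
    print(f"{name}: {t:.2f} s to move {dataset_gb} GB")
```

The order-of-magnitude gap, repeated across every exchange between CPU and GPU, is why coherent high-bandwidth links matter more as workloads become memory-bound rather than compute-bound.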

NVIDIA's plans for NVLink do not stop with Grace and Hopper. NVIDIA plans to extend NVLink across all of its processor technologies, expanding the capabilities of the resulting, yet-to-be-announced chips and systems to include the benefits brought by NIC, SoC, and DPU functionalities. Furthermore, NVIDIA plans to make the NVLink interconnect and its underlying Serializer/Deserializer technology available to customers and partners who want to implement custom chips that connect to NVIDIA platforms.

Another data center technology announced was the NVIDIA Spectrum-4 Ethernet switch, which the company describes as the most advanced switch built to date. With features such as fair bandwidth distribution, adaptive routing, and congestion control, it promises to be an industrial-scale switch that will boost overall network throughput. It is compatible with the DOCA data center infrastructure software first announced in conjunction with the NVIDIA BlueField DPU at the last GTC event. Paired with ConnectX-7 and BlueField-3 adapters, it promises to be the world's first 400 Gb per second end-to-end networking platform. Importantly for edge and hyperscaler workloads, it can achieve timing precision to within a few nanoseconds, versus the several milliseconds of jitter seen in today's data centers. For hyperscalers and the enterprise factories of the future, the NVIDIA Spectrum-4 switch will offer improved throughput, quality of service, and security, as well as reduced operating costs and power consumption. It will enable a new class of computing for cloud and edge workloads as well as digital twins.

Individually these announcements are impressive, but collectively they represent a big step forward for current and future workloads. Innovating across the full stack and building in multi-component integration wherever possible makes for a very strong product portfolio at the component level and cements the relationships between stack layers and verticals. This whole-stack integration makes a strong argument for itself, both to enterprise customers and to the hyperscalers and service providers who will deliver services to the enterprise. The role of the original equipment manufacturer (OEM) is also recognized and catered for by NVIDIA's strategy; there is still plenty of scope for OEMs to use the technology to build their own solutions, or to pair their technology with NVIDIA's to create value-add appliances and specialized processing hardware.

Part 1: Full Stack, Layered Technology will Democratize AI
Part 3: Full Stack Approach Accelerates Democratization of Technology