NVIDIA Brings the CPU Back to Center Stage — While Pushing Intel to the Margins

NVIDIA yesterday unveiled the full architecture of its Vera Rubin platform at GTC — the company’s next-generation AI infrastructure designed for the era of agentic AI. Unlike previous generations, which were built around discrete servers, Rubin is presented as a complete rack-scale system in which multiple specialized racks operate together as what NVIDIA describes as an “AI factory.”

The architecture consists of several types of racks, each responsible for a different layer of the system. GPU racks, powered by Rubin processors, handle the most compute-intensive workloads — training large models and running real-time inference. CPU racks manage environments, agents and orchestration logic. Storage racks handle memory and large-scale context management, while networking racks connect the system through high-speed interconnects. In addition, dedicated inference accelerators are integrated to optimize response generation.

The CPU Returns to Center Stage

One of the most notable innovations in the architecture is the Vera CPU rack — a dedicated system designed to address the emerging workloads of autonomous AI agents.

In traditional AI systems, GPUs handled most of the heavy lifting, while CPUs played a supporting role. In the agentic era, however, a growing share of the workload shifts to the CPU: running code, invoking tools, orchestrating workflows, validating outputs and managing simulations.

According to NVIDIA, each CPU rack can include up to 256 Vera processors, capable of running tens of thousands of independent CPU environments in parallel. The processor itself features 88 custom-designed cores, with a strong emphasis on single-thread performance and high memory bandwidth — reaching up to 1.2 TB/s — alongside significant gains in energy efficiency.
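As a sanity check on those figures, multiplying the per-rack processor count by the per-chip core count shows where the “tens of thousands” claim plausibly comes from. The assumption of roughly one environment per core is purely illustrative, not something NVIDIA has specified:

```python
# Figures from NVIDIA's published specs, as cited above.
cpus_per_rack = 256   # Vera processors per CPU rack
cores_per_cpu = 88    # custom cores per Vera processor

# Illustrative assumption (not from NVIDIA): roughly one
# independent CPU environment per core.
total_cores = cpus_per_rack * cores_per_cpu
print(total_cores)  # 22528 -> on the order of "tens of thousands"
```

At one environment per core, a full rack lands at roughly 22,500 environments, consistent with the order of magnitude NVIDIA cites.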

A key architectural element is the direct connection between CPU and GPU via NVLink, enabling far faster data exchange than traditional server interconnects. As a result, the CPU is no longer just coordinating GPU workloads — it becomes an integral part of the computation pipeline itself.

Meanwhile, Intel Remains — But in a Reduced Role

At the same time, Intel announced that its Xeon 6 processors have once again been selected as the host CPU for NVIDIA’s DGX Rubin NVL8 systems — servers equipped with eight GPUs that serve as a fundamental building block of the platform.

In these systems, Xeon continues to perform its traditional role: managing GPUs, scheduling workloads and handling data movement. Its strengths remain in large memory capacity, high I/O bandwidth and broad compatibility with existing infrastructure.

However, within the broader Rubin architecture, this role is becoming increasingly limited.

From Blackwell to Rubin: A Shift in Power

The transition from Blackwell to Rubin clearly illustrates the shift. In previous generations, AI infrastructure was largely built around DGX or HGX servers, each combining NVIDIA GPUs with CPUs — typically from Intel. In practice, Xeon processors were present in nearly every system.

With Rubin, the architecture evolves. Instead of uniform servers, the system becomes a heterogeneous, rack-scale infrastructure, where each component is purpose-built. The CPU takes on a more central role — but not necessarily through Intel.

The Vera CPU rack now serves as the layer responsible for running agents and orchestrating the system, while Xeon processors are largely confined to host roles within NVL8 systems. In other words, Intel remains inside the system — but no longer defines its foundation.

The Big Picture: Vertical Integration of AI Infrastructure

The broader picture is clear: NVIDIA is steadily moving toward full-stack control of AI infrastructure — from GPUs to CPUs, networking and storage.

While its partnership with Intel continues, NVIDIA is simultaneously building an internal alternative that strengthens its long-term position. Where the CPU once represented a critical external dependency, it is now becoming another layer under NVIDIA’s own control.

For the data center market, this marks a fundamental shift: from general-purpose servers to AI-native infrastructure — where even the CPU is purpose-built for the agentic era. The result could redefine the balance of power among chipmakers in the years ahead.

Deci Launches New Models for Enhancing Deep Learning on CPUs

Photo above: Deci’s founders (from left to right): Jonathan Elial, COO; Yonatan Geifman, CEO; and Ran El-Yaniv, Chief Scientist. Credit: Deci

Deci, the deep learning company harnessing Artificial Intelligence (AI) to build AI, announced a new set of image classification models, dubbed DeciNets, for Intel Cascade Lake CPUs. According to Deci, its proprietary Automated Neural Architecture Construction (AutoNAC) technology automatically generated the new image classification models, which significantly outperform all published models and deliver more than a 2x improvement in runtime, coupled with improved accuracy, compared to the most powerful publicly available models, such as Google’s EfficientNets.

While GPUs have traditionally been the hardware of choice for running convolutional neural networks (CNNs), CPUs, which are already widely deployed for general computing tasks, would serve as a much cheaper alternative. Although it is possible to run deep learning inference on CPUs, they are generally far less powerful than GPUs for this workload; as a result, deep learning models typically run 3-10x slower on a CPU than on a GPU.

As Deci explains, DeciNets significantly close the gap between GPU and CPU performance for CNNs. With DeciNets, tasks that were previously too resource-intensive to run on a CPU become feasible, and those that could run see a marked performance improvement: by leveraging DeciNets, the gap between a model’s inference performance on a GPU and on a CPU is cut in half, without sacrificing the model’s accuracy.
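To make the “cut in half” claim concrete, here is a hypothetical worked example, not Deci’s own numbers, applying the halving to the 3-10x slowdown range cited earlier. Halving the slowdown multiplier is only one straightforward reading of the claim:

```python
# Hypothetical arithmetic, not measured data: the article cites
# CPU inference as typically 3-10x slower than GPU inference.
slow_low, slow_high = 3.0, 10.0

# One plain reading of "gap cut in half": halve the slowdown multiplier.
halved_low, halved_high = slow_low / 2, slow_high / 2
print(halved_low, halved_high)  # 1.5 5.0
```

Under that reading, a model that was 10x slower on CPU would become roughly 5x slower, and one that was 3x slower would come within 1.5x of GPU performance.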

“As deep learning practitioners, our goal is not only to find the most accurate models, but also to uncover the most resource-efficient models that work seamlessly in production – this combination of effectiveness and accuracy constitutes the ‘holy grail’ of deep learning,” said Yonatan Geifman, co-founder and CEO of Deci. “AutoNAC creates the best computer vision models to date, and now the new class of DeciNets can be applied to run AI applications effectively on CPUs.”

All networks were compiled and quantized using OpenVINO, with latency measured on an AWS c5.4xlarge instance with a Cascade Lake CPU (16 vCPUs, batch size = 1).
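A measurement setup of this kind can be sketched generically. The harness below is a minimal single-request latency benchmark in plain Python, not Deci’s or OpenVINO’s actual tooling; `measure_latency` and the dummy workload are illustrative stand-ins for a compiled model’s inference call at batch size 1:

```python
import statistics
import time

def measure_latency(infer, n_warmup=10, n_runs=100):
    """Return the median single-request latency of `infer`, in milliseconds."""
    for _ in range(n_warmup):       # warm caches before timing
        infer()
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        infer()                     # one batch-size-1 inference call
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; in practice this would be a compiled model's
# inference request (e.g. one produced by an OpenVINO compile step).
latency_ms = measure_latency(lambda: sum(i * i for i in range(10_000)))
print(f"{latency_ms:.3f} ms")
```

Reporting the median over many timed runs, after a warmup pass, is standard practice for batch-size-1 latency numbers because it discards one-off startup costs and outlier stalls.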

“There is a commercial, as well as academic, desire to tackle increasingly difficult AI challenges. The result is a rapid increase in the complexity and size of the deep neural models capable of handling those challenges,” said Prof. Ran El-Yaniv, co-founder and Chief Scientist of Deci and Professor of Computer Science at the Technion – Israel Institute of Technology. “The hardware industry is in a race to develop dedicated AI chips that will provide sufficient compute to run such models; however, with model complexity increasing at a staggering pace, we are approaching the limit of what hardware can support using current chip technology. Deci’s AutoNAC creates powerful models automatically, giving users superior accuracy and inference speed even on low-cost devices, including traditional CPUs.”

In March 2021, Deci and Intel announced a broad strategic collaboration to optimize deep learning inference on Intel Architecture (IA) CPUs. Prior to this, the two companies collaborated on MLPerf submissions, where, on several popular Intel CPUs, Deci’s AutoNAC technology accelerated inference for the well-known ResNet-50 neural network, reducing the submitted models’ latency by a factor of up to 11.8x and increasing throughput by up to 11x.

Deci enables deep learning to live up to its true potential by using AI to build better AI. With the company’s end-to-end deep learning development platform, AI developers can build, optimize, and deploy faster and more accurate models for any environment, including cloud, edge, and mobile, allowing them to revolutionize industries with innovative products. The platform is powered by Deci’s proprietary Automated Neural Architecture Construction (AutoNAC) technology, which automatically generates and optimizes deep learning model architectures, allowing teams to accelerate inference performance, enable new use cases on limited hardware, shorten development cycles, and reduce computing costs. Founded by Yonatan Geifman, Jonathan Elial, and Professor Ran El-Yaniv, Deci’s team of deep learning engineers and scientists is dedicated to eliminating production-related bottlenecks across the AI lifecycle.