News Summary 

The NVIDIA Vera Rubin platform leads the next era of AI with five integrated rack-scale systems:  

  • Vera Rubin NVL72 GPU racks  
  • Vera CPU racks (servers with central processing units for handling calculations)  
  • NVIDIA Groq 3 LPX inference accelerator racks (systems designed to speed up AI model predictions)  
  • NVIDIA BlueField-4 STX storage racks (high-speed storage and networking hardware)  
  • NVIDIA Spectrum-6 SPX Ethernet racks (advanced switches for fast data networking)  

Following the introduction of the platform’s key features, NVIDIA made a significant announcement at GTC: the NVIDIA Vera Rubin platform is opening a new chapter in agentic AI, with seven new chips now in full production to help scale the world’s largest AI factories.  

The platform features the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and Groq 3 LPU. These work as a unified AI supercomputer, powering every stage of AI, from pre-training to real-time agentic inference.  

“Vera Rubin represents a leap with seven breakthrough chips, five racks, and one supercomputer powering every AI phase,” said Jensen Huang, founder and CEO of NVIDIA. “The agentic AI inflection point has arrived, and Vera Rubin is driving historic infrastructure growth.”  

“Enterprises and developers are using the cloud for increasingly intricate reasoning, agentic workflows, and mission‑critical decisions that demand infrastructure that can keep pace,” said Dario Amodei, CEO and co‑founder of Anthropic. “NVIDIA’s platform provides the compute, the network, and the system design to keep delivering while improving the safety and reliability our customers depend on.”  

“NVIDIA infrastructure is the foundation that lets us keep advancing the frontier of AI,” said Sam Altman, CEO of OpenAI. “With NVIDIA, Vera Rubin will run more powerful models and agents at massive scale, delivering faster, more reliable systems to hundreds of millions of people.”

AI infrastructure is changing quickly, moving from separate chips and standalone servers to fully integrated rack-scale systems, POD-scale deployments, AI factories, and sovereign AI. These shifts are delivering major gains in performance and cost efficiency for organizations of all sizes and industries, from startups and mid-sized businesses to public and private institutions and enterprises. They also make AI easier to adopt and improve energy efficiency for the world’s most challenging workloads.  

By integrating compute, networking, and storage, with support from more than 80 NVIDIA MGX partners, Vera Rubin offers a unified, extensive POD-scale platform comprising multiple AI racks working together as one system.  

NVIDIA Vera Rubin NVL72 Rack  

The Vera Rubin NVL72 connects 72 Rubin GPUs and 36 Vera CPUs for efficient large-model training, requiring only a quarter of the GPUs used by Blackwell and delivering up to 10x higher inference throughput per watt at a lower cost per token. Built for hyperscale AI factories, it reduces both training time and cost.  
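To make the relationship between throughput per watt and cost per token concrete, here is a small sketch with made-up numbers (the rack power, token rates, and electricity price are illustrative assumptions, not NVIDIA benchmark figures):

```python
# Illustrative (hypothetical) numbers showing how higher inference
# throughput per watt translates into a lower energy cost per token.

def cost_per_million_tokens(tokens_per_sec: float, power_kw: float,
                            price_per_kwh: float) -> float:
    """Energy cost (in dollars) to generate one million tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    energy_cost_per_hour = power_kw * price_per_kwh  # kW * $/kWh = $/h
    return energy_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical baseline rack vs. a rack with 10x throughput at the same power.
baseline = cost_per_million_tokens(tokens_per_sec=50_000, power_kw=120,
                                   price_per_kwh=0.10)
improved = cost_per_million_tokens(tokens_per_sec=500_000, power_kw=120,
                                   price_per_kwh=0.10)
print(round(baseline / improved, 1))  # 10.0 -> 10x lower energy cost per token
```

At fixed power and electricity price, cost per token is inversely proportional to throughput, which is why "throughput per watt" is the figure of merit the announcement emphasizes.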

NVIDIA Vera CPU Rack 

Reinforcement learning and agentic AI workloads rely on many CPU-based environments that test and validate the model’s outputs, running alongside the GPU systems that host the models.  

The NVIDIA Vera CPU rack offers a dense, liquid-cooled setup based on NVIDIA MGX. It includes 256 Vera CPUs, providing scalable, power-efficient capacity with top single-core performance, enabling large-scale agentic AI.  

Integrated with Spectrum-X networking, Vera CPU racks keep environments synchronized across the AI factory. Paired with GPU racks, they form the CPU foundation for large-scale agentic AI and reinforcement learning, delivering results twice as efficiently and 50% faster than traditional CPUs.  
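The division of labor described above can be sketched in a minimal reinforcement-learning loop (an assumption for illustration, not NVIDIA code): the GPU side proposes actions, many parallel CPU-hosted environments execute and score them, and the averaged rewards feed back into training. The environment count of 256 mirrors the rack's CPU count purely for illustration.

```python
# Minimal sketch of why agentic RL needs many CPU environments alongside GPUs.
import random

def gpu_policy(observation: int) -> str:
    # Stand-in for a model running on the GPU racks.
    return random.choice(["tool_call", "answer"])

def cpu_environment(action: str) -> float:
    # Stand-in for a CPU-hosted environment that tests/validates the action.
    return 1.0 if action == "answer" else 0.0

def rollout(num_envs: int = 256) -> float:
    # One step across many parallel CPU environments; returns mean reward.
    rewards = [cpu_environment(gpu_policy(obs)) for obs in range(num_envs)]
    return sum(rewards) / num_envs

print(0.0 <= rollout() <= 1.0)  # True: mean reward is always in [0, 1]
```

Because each environment step is cheap, sequential CPU work, scaling the number of environments, rather than GPU count, is what keeps the GPUs fed with validated experience.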

NVIDIA Groq 3 LPX Rack 

The NVIDIA Groq 3 LPX is a major step forward in accelerated computing, built for the fast, large-scale needs of AI and agentic systems. Combined with Vera Rubin, LPX delivers up to 35 times more inference throughput per megawatt and up to 10 times more revenue potential for trillion-parameter models.  

When scaled up, many LPUs can work together as a single large processor, speeding up inference. The LPX rack includes 256 LPU processors, 128 GB of on-chip SRAM, and 60 TB/s of bandwidth when used with Vera Rubin NVL72. Rubin GPUs and LPUs cooperate to process every layer of the AI model and every output token.  

The LPX architecture is built for trillion-parameter models and million-token contexts, working with Vera Rubin to maximize power, memory, and computing resources. Its higher throughput per watt and better token performance open up new possibilities for advanced inference, which also means more revenue for AI providers. With full liquid cooling and MGX infrastructure, LPX will fit easily into the next generation of Vera Rubin AI factories and will be available later this year.  

NVIDIA BlueField-4 STX Storage Rack 

The NVIDIA BlueField-4 STX rack-scale system is an AI storage solution. It extends GPU memory across the POD, combining the Vera CPU and ConnectX-9 SuperNIC for high-bandwidth storage and retrieval of key-value (KV) cache data.  

A new NVIDIA DOCA-based memory system improves BlueField-4 storage by enabling dedicated KV-cache storage processing, boosting inference throughput by up to 5x and making power use far more efficient than general-purpose storage. The result is faster multi-turn interactions with AI agents, more scalable AI services, and more effective use of infrastructure across the POD.  
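The idea behind a dedicated KV-cache storage tier can be sketched as follows (a toy model, not the DOCA or BlueField API; the class and method names are hypothetical): between turns of a multi-turn agent session, the attention key/value cache is evicted from GPU memory to a storage tier and restored on the next turn, instead of being recomputed from the full conversation history.

```python
# Toy sketch of a context-memory tier for multi-turn agent sessions.

class KVCacheStore:
    """Stand-in for a rack-scale KV-cache storage tier (hypothetical API)."""

    def __init__(self):
        self._store = {}  # session_id -> offloaded KV cache

    def offload(self, session_id: str, kv_cache: dict) -> None:
        # GPU memory -> storage tier, freeing the GPU between turns.
        self._store[session_id] = kv_cache

    def restore(self, session_id: str):
        # Storage tier -> GPU memory; None if nothing was offloaded.
        return self._store.pop(session_id, None)

store = KVCacheStore()
# End of turn 1: evict the session's cache from GPU memory.
store.offload("session-42", {"keys": [0.1, 0.2], "values": [0.3, 0.4]})
# Turn 2: restore the cache and resume without re-running the prefill.
cache = store.restore("session-42")
print(cache is not None)  # True
```

The performance win comes from skipping the prefill recomputation: restoring a stored cache is a bandwidth problem, which is what the high-bandwidth storage path described above targets.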

“The NVIDIA BlueField-4 STX rack-scale context-memory storage system will enable a critical performance boost needed to exponentially scale our agentic AI efforts,” said Timothee Lacroix, co-founder and chief technology officer of Mistral AI. “By delivering a new storage tier purpose-built for an AI agent’s memory, STX is ideally positioned to ensure our models can retain logic and speed when reasoning across large datasets.”  

NVIDIA Spectrum-6 SPX Ethernet Rack  

Spectrum-6 SPX Ethernet accelerates data movement across the AI factory using Spectrum-X Ethernet, while Quantum-X800 InfiniBand switches ensure high-speed rack-to-rack connections at scale.  

Spectrum-X Ethernet Photonics uses co-packaged optics, offering up to five times better optical power efficiency and ten times greater resiliency than traditional pluggable transceivers.  

Improved Resiliency and Energy Efficiency 

NVIDIA, along with more than 200 data center partners, has announced the NVIDIA DSX platform for Vera Rubin. DSX Max Q dynamically manages power across the entire AI factory, enabling data centers to deploy 30% more AI infrastructure without increasing power consumption. The new DSX Flex software also helps AI factories use grid power more flexibly, unlocking 100 gigawatts of unused grid power. NVIDIA also released the Vera Rubin DSX AI factory reference design, a blueprint for co-designed AI infrastructure that maximizes tokens per watt and overall goodput, improving system resiliency and accelerating time to first production.  

By integrating compute, networking, storage, power, and cooling, Vera Rubin’s architecture boosts energy efficiency, scales reliably under heavy workloads, and maintains high uptime for AI factories.  

Broad Ecosystem Support 

Partners will start offering Vera Rubin–based products in the second half of this year. These products will be available through major cloud providers like Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, as well as NVIDIA cloud partners such as CoreWeave, Crusoe, Lambda, Nebius, NScale, and Together AI.  

Within this broad ecosystem, global system manufacturers such as Cisco, Dell Technologies, HPE, Lenovo, and Supermicro are expected to offer a variety of servers built with Vera Rubin products. In addition, other companies such as Aivres, Asus, Foxconn, Gigabyte, Inventec, Pegatron, Quanta Cloud Technology (QCT), Wistron, and Wiwynn will also provide these servers.  

On this foundation, leading AI labs and developers, including Anthropic, Meta, Mistral AI, and OpenAI, are embracing Vera Rubin to train larger models and accelerate long-context multimodal systems, aiming for greater speed and efficiency. 

Source: NVIDIA Vera Rubin Opens Agentic AI Frontier 

