Blackwell Is Already Old: NVIDIA’s Rubin Platform and the New Economy of Inference Scaling 

NVIDIA introduced the Rubin architecture at CES 2026, marking a major shift from Blackwell's focus on training power to a six-chip platform built for large-scale inference.

Rubin is designed to meet the growing needs of AI agents and mixture-of-experts (MoE) models. NVIDIA says it delivers 5x better inference performance and cuts inference cost per token by 10x compared with the previous generation.

The Announcement Of The Rubin Architecture (2026) 

At the keynote, NVIDIA CEO Jensen Huang said Rubin is in full production for 2026, with deployments starting in the second half of the year through partners such as AWS, Google Cloud, Microsoft, and CoreWeave.

  • Six-chip architecture: rather than just a GPU, Rubin is a complete system-level redesign. It combines the Rubin GPU (R200), the Vera CPU (designed for agentic reasoning), the NVLink 6 switch, the ConnectX-9 SuperNIC, the BlueField-4 DPU, and the Spectrum-6 Ethernet switch.
  • Performance measures: Rubin delivers 50 petaflops of NVFP4 compute for inference, compared with 10 on Blackwell, and trains up to 3.5 times faster.
  • Key focus: Rubin is built for agentic AI and long-context retrieval-augmented generation (RAG), supporting smarter, more independent agents that can work over longer periods.

Why Inference Economics Is the New Metric for 2026 

From 2023 to 2025, the industry focused on training. But 2026 is the inflection point at which global spending on AI inference will exceed spending on training.

As companies shift from AI research to production, the cost of running inference rather than training becomes the main factor in return on investment.  

The 15-20x cost multiplier: for every $1B spent training an AI model, organizations face $15-20B in inference costs over the model's production lifetime. This recurring, always-on cost crushes budgets, making it the dominant driver of AI spending.
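
As a quick sanity check, a few lines of Python turn that multiplier into yearly figures. The dollar amounts and the three-year production lifetime are illustrative assumptions, not NVIDIA data:

```python
# Back-of-the-envelope check of the 15-20x inference cost multiplier.
# All figures are illustrative assumptions, not NVIDIA data.
training_cost = 1e9                       # $1B one-time training spend
multiplier_low, multiplier_high = 15, 20  # the quoted 15-20x range

lifetime_low = training_cost * multiplier_low    # $15B
lifetime_high = training_cost * multiplier_high  # $20B

years = 3  # assumed production lifetime of the model
print(f"Lifetime inference spend: ${lifetime_low / 1e9:.0f}B-"
      f"${lifetime_high / 1e9:.0f}B, or ${lifetime_low / years / 1e9:.1f}B-"
      f"${lifetime_high / years / 1e9:.1f}B per year")
```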

  1. The shift to agentic workflows: AI is moving from simple one-shot queries to complex agents that engage in multi-step reasoning, tool use, and long-horizon tasks. This dramatically increases token consumption, turning inference economics into a critical operational expense.
  2. From tokens per second to cost per resolved task: the industry is moving away from focusing solely on speed toward cost per resolved task (CPT) as the primary measure of productivity; if the total cost of completing a complex task is too high, the service is not economically viable (a sketch of this metric follows the list).
  3. Efficiency over raw power: Blackwell was built for top training performance, but Rubin aims to make inference ten times cheaper. The goal is to make AI agents affordable for everyday business use.
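
Here is a minimal sketch of what a cost-per-resolved-task calculation could look like. The formula and every input (tokens per attempt, price, retry count, success rate) are assumptions for illustration; the source does not define the metric precisely:

```python
# Hypothetical cost-per-resolved-task (CPT) metric: token spend per attempt,
# scaled by retries and divided by the fraction of tasks actually resolved.
def cost_per_resolved_task(tokens_per_attempt: float,
                           price_per_million_tokens: float,
                           attempts_per_task: float,
                           success_rate: float) -> float:
    cost_per_attempt = tokens_per_attempt / 1e6 * price_per_million_tokens
    return cost_per_attempt * attempts_per_task / success_rate

# An agentic workflow burns far more tokens than a one-shot query:
one_shot = cost_per_resolved_task(2_000, 5.0, 1, 0.95)   # assumed inputs
agentic = cost_per_resolved_task(150_000, 5.0, 3, 0.80)  # assumed inputs
print(f"one-shot: ${one_shot:.4f} per task, agentic: ${agentic:.2f} per task")
```

Under these assumptions, a resolved agentic task costs hundreds of times more than a one-shot query, which is why per-token price cuts dominate the economics.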

Rubin's extreme co-design addresses this by improving memory bandwidth and capacity (288 GB of HBM4 per GPU) and by adding specialized inference hardware. This helps ensure the huge increase in AI inference demand stays within what firms can afford.
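
To see why 288 GB of HBM4 matters for long-context inference, here is a rough capacity estimate. The model shape and FP8 KV-cache precision are hypothetical, and the calculation ignores the memory occupied by model weights:

```python
# Rough estimate of how many tokens of KV cache fit in 288 GB of HBM4.
# Model dimensions and FP8 precision are illustrative assumptions.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_value = 1  # FP8

# Per token we store one K and one V vector per layer.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value

hbm_bytes = 288e9
tokens_in_hbm = hbm_bytes / kv_bytes_per_token
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token -> "
      f"~{tokens_in_hbm / 1e6:.1f}M tokens of KV cache per GPU")
```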

NVIDIA has launched the NVIDIA Rubin platform, which includes six new chips built to power a state-of-the-art AI supercomputer. Rubin aims to make it easier and more affordable to build, deploy, and secure advanced AI systems, helping more people and businesses use AI.  

The Rubin platform brings together six chips:  

  1. NVIDIA Vera CPU
  2. NVIDIA Rubin GPU
  3. NVLink 6 switch
  4. ConnectX-9 SuperNIC
  5. BlueField-4 DPU
  6. Spectrum-6 Ethernet switch

Together, these chips are designed to reduce training time and lower the cost of running AI models.

“Rubin arrives at exactly the right moment as AI computing demand for both training and inference is going through the roof,” said Jensen Huang, founder and CEO of NVIDIA. “With our annual pace of launching new AI supercomputers and the combined design of six new chips, Rubin is a big step forward for AI.”

The Rubin platform is named after Vera Florence Cooper Rubin, the American astronomer whose discoveries transformed our understanding of the universe. It includes the NVIDIA Vera Rubin NVL72 rack-scale solution and the NVIDIA HGX Rubin NVL8 system.

The Rubin platform brings five new features:  

  1. The latest NVLink interconnect
  2. Transformer Engine
  3. Confidential Computing
  4. RAS Engine
  5. NVIDIA Vera CPU

These advances accelerate agentic AI, advanced reasoning, and large-scale mixture-of-experts (MoE) inference, delivering up to 10x lower cost per token than the NVIDIA Blackwell platform. Rubin can also train MoE models with 4x fewer GPUs than its predecessor, enabling faster AI adoption.
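
Taken at face value, those two headline numbers translate into the following back-of-the-envelope comparison; the baseline values are normalized placeholders, not measured figures:

```python
# Normalized comparison of the quoted Rubin-vs-Blackwell claims.
blackwell_cost_per_mtok = 1.00                      # normalized baseline
rubin_cost_per_mtok = blackwell_cost_per_mtok / 10  # "10x lower cost per token"

blackwell_train_gpus = 4096                   # assumed MoE training fleet size
rubin_train_gpus = blackwell_train_gpus // 4  # "4x fewer GPUs"

million_tokens = 1e6  # one trillion tokens served
print(f"Serving 1T tokens: {million_tokens * rubin_cost_per_mtok:,.0f} vs "
      f"{million_tokens * blackwell_cost_per_mtok:,.0f} normalized cost units")
print(f"MoE training fleet: {rubin_train_gpus} vs {blackwell_train_gpus} GPUs")
```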

Wide Industry Support 

Many top AI labs, cloud providers, computer makers, and startups plan to use Rubin. These include Amazon Web Services (AWS), Anthropic, Black Forest Labs, Cisco, Cohere, CoreWeave, Cursor, Dell Technologies, Google, Harvey, HPE, Lambda, Lenovo, Meta, Microsoft, Mistral AI, Nebius, Nscale, OpenAI, OpenEvidence, Oracle Cloud Infrastructure (OCI), Perplexity, Runway, Supermicro, Thinking Machines Lab, and xAI.

Built To Scale Intelligence 

Agentic AI reasoning models and advanced video generation are pushing the limits of computing. Solving complex problems requires models to process, reason, and act over long sequences. The Rubin platform meets these needs with five key technologies.

  • Sixth-generation NVIDIA NVLink: provides fast, seamless GPU-to-GPU communication for large MoE models. Each GPU has 3.6 TB/s of bandwidth, and the Vera Rubin NVL72 rack offers 260 TB/s in aggregate, exceeding the total bandwidth of the internet (a quick arithmetic check follows this list). In-network compute speeds up collective operations, and new features improve serviceability and reliability. The NVLink 6 switch makes AI training and inference faster and more efficient at scale.
  • NVIDIA Vera CPU: built for agentic reasoning. NVIDIA Vera is billed as the most power-efficient CPU for large-scale AI operations. It uses 88 custom Olympus cores, supports Arm v9.2, and features ultrafast NVLink-C2C connectivity. Vera offers strong performance, high bandwidth, and top efficiency for modern data centers.
  • NVIDIA Rubin GPU: with a third-generation Transformer Engine and hardware-accelerated adaptive compression, the Rubin GPU delivers 50 petaflops of NVFP4 compute for AI inference.
  • Third-generation NVIDIA Confidential Computing: Vera Rubin is the first rack-scale platform to offer it. It keeps data secure across CPU, GPU, and NVLink domains, protecting training and inference workloads for large proprietary models.
  • Second-generation RAS engine: the platform's reliability, availability, and serviceability engine spans GPUs, CPUs, and NVLink, and includes instantaneous health checks, fault tolerance, and pre-emptive maintenance to boost productivity. Its configurable, cable-free tray design allows assembly and servicing up to 18 times faster than with Blackwell.
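
The arithmetic check referenced in the NVLink bullet above: the rack-level figure follows directly from the per-GPU bandwidth.

```python
# Verify that the per-GPU and rack-level NVLink 6 figures are consistent.
gpus_per_rack = 72   # Vera Rubin NVL72
per_gpu_tbps = 3.6   # TB/s of NVLink bandwidth per GPU

rack_tbps = gpus_per_rack * per_gpu_tbps
print(f"NVL72 aggregate NVLink bandwidth: {rack_tbps:.1f} TB/s")  # ~259.2, i.e. ~260
```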

AI-Native Storage and Secure Software-Defined Infrastructure 

Alongside Rubin, NVIDIA has launched the NVIDIA Inference Context Memory Storage Platform, a new AI-native storage platform designed to handle inference context at massive scale.

Built on NVIDIA BlueField-4, the platform lets AI systems share and reuse key-value cache data more efficiently. This boosts responsiveness and throughput, helping agentic AI scale in a predictable, energy-efficient way.
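
As a conceptual illustration only (this is not NVIDIA's API), the sketch below shows the core idea behind shared key-value cache reuse: sessions that start from the same prompt prefix fetch its precomputed cache instead of redoing the prefill work. All names and values are hypothetical:

```python
# Toy model of a shared key-value cache tier, keyed by a hash of the
# token prefix. Real systems operate on GPU tensors, not byte blobs.
import hashlib

class SharedKVStore:
    """Maps a hash of a token prefix to its precomputed KV-cache blob."""
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def _key(self, prefix_tokens: list[int]) -> str:
        return hashlib.sha256(str(prefix_tokens).encode()).hexdigest()

    def get(self, prefix_tokens: list[int]) -> bytes | None:
        return self._store.get(self._key(prefix_tokens))

    def put(self, prefix_tokens: list[int], kv_blob: bytes) -> None:
        self._store[self._key(prefix_tokens)] = kv_blob

store = SharedKVStore()
system_prompt = [101, 2023, 2003]              # hypothetical token IDs
store.put(system_prompt, b"...kv tensors...")  # first session pays the prefill
assert store.get(system_prompt) is not None    # later sessions reuse the cache
```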

As more AI factories adopt bare-metal and multi-tenant approaches, it is important to maintain strong control and isolation in the infrastructure.  

BlueField-4 also introduces the Advanced Secure Trusted Resource Architecture (ASTRA). This system-level trust framework gives AI infrastructure teams a single secure control point to set up, isolate, and run large AI environments without sacrificing performance.

As AI applications advance, AI-focused organizations need to handle and share much larger amounts of inference context across users, sessions, and services.  

Different Forms for Different Workloads 

The NVIDIA Vera Rubin NVL72 is a secure all-in-one system with:  

  • 72 NVIDIA Rubin GPUs  
  • 36 NVIDIA Vera CPUs  
  • NVIDIA NVLink 6 switches
  • NVIDIA ConnectX-9 SuperNICs
  • NVIDIA BlueField-4 DPUs

NVIDIA is also releasing the HGX Rubin NVL8 platform, a server board that connects eight Rubin GPUs with NVLink to support x86-based generative AI systems. This platform accelerates training, inference, and scientific computing for AI and high-performance computing.  

NVIDIA’s DGX SuperPOD is a reference design for large-scale Rubin-based platforms. It brings together either the DGX Vera Rubin NVL72 or the DGX Rubin NVL8, along with BlueField-4 DPUs, ConnectX-9 SuperNICs, InfiniBand networking, and NVIDIA Mission Control software.

Next-Generation Ethernet Networking 

Advanced Ethernet networking and storage are key parts of AI infrastructure. They help data centers run at peak performance, boost productivity, and reduce costs.  

NVIDIA Spectrum-6 Ethernet is the next step in Ethernet for AI networking. It is designed to help Rubin-based AI factories scale more efficiently and with greater resilience, using 200G SerDes, co-packaged optics, and AI-optimized fabrics.

Spectrum-X Ethernet Photonics, built on the Spectrum-6 architecture, uses co-packaged optical switches to deliver 10x greater reliability, 5x longer uptime, and 5x better power efficiency.

Spectrum-XGS Ethernet technology, part of the Spectrum-X platform, enables facilities hundreds of kilometers apart to work together as a single AI environment.

Together, these advances make up the next generation of the NVIDIA Spectrum-X Ethernet platform, designed specifically for Rubin to support large-scale AI factories and prepare for future environments with millions of GPUs.

Rubin Readiness 

NVIDIA’s Rubin is now in full production, and products based on it will be available from partners in the second half of 2026.  

AWS, Google Cloud, Microsoft, and OCI will be among the first cloud providers to offer Vera Rubin-based instances in 2026. NVIDIA cloud partners such as CoreWeave, Lambda, Nebius, and Nscale will also deploy these instances.

Microsoft plans to use NVIDIA Vera Rubin NVL72 rack-scale systems in its next-generation AI data centers, including upcoming Fairwater AI superfactory locations.

The Rubin platform is built to offer high efficiency and performance for training and inference tasks. It will support Microsoft’s next-generation cloud AI features. Microsoft Azure will provide an optimized platform to help customers speed up innovation in enterprise research and consumer applications.  

Starting in the second half of 2026, CoreWeave will add NVIDIA Rubin-based systems to its AI cloud platform. CoreWeave supports multiple architectures, enabling customers to run Rubin in their own environments for training, inference, and agent workloads.  

CoreWeave and NVIDIA will work together to help AI innovators use Rubin’s new capabilities for reasoning and MoE models. With CoreWeave Mission Control, they will continue to provide the performance, reliability, and scale needed for production AI throughout its lifecycle.

Cisco, Dell, HPE, Lenovo, and Supermicro are also expected to offer a variety of Rubin-based servers.

AI labs such as Anthropic, Black Forest Labs, Cohere, Cursor, Harvey, Meta, Mistral AI, OpenAI, OpenEvidence, Perplexity, Runway, Thinking Machines Lab, and xAI plan to use the NVIDIA Rubin platform to train bigger and more advanced models. They also aim to run long-running, multi-modal systems with lower latency and cost than earlier GPU generations.

Infrastructure, software, and storage partners such as AIC, Canonical, Cloudian, DDN, Dell, HPE, Hitachi Vantara, IBM, NetApp, Nutanix, Pure Storage, Supermicro, SUSE, VAST Data, and Weka are working with NVIDIA to create next-generation platforms for Rubin infrastructure.

The Rubin platform is NVIDIA’s third-generation rack-scale architecture and includes over 80 NVIDIA MGX ecosystem partners.  

Alongside this ecosystem, Red Hat has announced a wider partnership with NVIDIA to deliver a full AI stack optimized for the NVIDIA Rubin platform. This will use Red Hat’s hybrid cloud products, including Red Hat Enterprise Linux, Red Hat OpenShift, and Red Hat AI, solutions used by most Fortune Global 500 companies.

Source: https://nvidianews.nvidia.com/news/rubin-platform-ai-supercomputer 
