The NVIDIA Rubin CPX GPU is purpose-built to handle million-token coding and generative video applications.
- The NVIDIA Vera Rubin NVL144 CPX platform packs 8 exaflops of AI performance and 100 TB of fast memory into a single rack.
- NVIDIA projects that companies can achieve $5B in token revenue for every $100M invested.
- AI innovators such as Cursor, Runway, and Magic are evaluating how Rubin CPX can accelerate their applications.
NVIDIA introduced its next-generation Rubin AI platform at CES 2026, initiating an annual release cycle to keep pace with the rapid expansion of agentic AI and reasoning models. Marketed as "six chips that make one AI supercomputer," Rubin is designed for high efficiency, offering up to a 10x reduction in inference token costs and requiring 4x fewer GPUs for training than the Blackwell platform.
The platform is now in full production with partner products expected in the second half of 2026.
Core Components of the Rubin Platform
Rubin emphasizes extreme co-design, integrating computing, networking, and software into a unified rack-scale system, the Vera Rubin NVL72:
- Rubin GPU: The flagship AI accelerator, featuring a third-generation Transformer Engine, HBM4 memory, and 50 petaflops of NVFP4 compute.
- Vera CPU: An Arm-based processor with 88 custom cores, optimized for agentic reasoning and data movement.
- NVLink 6 switch: Sixth-generation interconnect offering 3.6 TB/s per GPU, allowing 72 GPUs to operate as a single unified accelerator.
- ConnectX-9 SuperNIC: Delivers 1.6 Tb/s of networking bandwidth to each GPU.
- BlueField-4 DPU: Offloads networking, security, and storage tasks, and introduces the new Advanced Secure Trusted Resource Architecture (ASTRA) for secure multi-tenant operation.
- Spectrum-6 Ethernet switch: Supports 102.4 Tb/s per switch chip, with co-packaged optics for large-scale AI fabrics.
Key Performance and Efficiency Breakthroughs
- Inference efficiency: Rubin delivers a 10x reduction in inference token cost and 5x greater power efficiency than Blackwell.
- Training capabilities: The platform trains mixture-of-experts (MoE) models with only one-quarter the number of GPUs required by the previous generation.
- Memory and bandwidth: Rubin uses HBM4 memory, doubling interface width and delivering 3x faster attention than earlier systems.
- Rack-scale design: The Vera Rubin NVL72 system combines 72 GPUs and 36 Vera CPUs in a cable-free, liquid-cooled design that can be assembled and serviced 18x faster.
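The emphasis on faster attention makes sense given how attention cost scales with context length; a back-of-envelope FLOP count illustrates it (this is generic dense-attention arithmetic, not NVIDIA's methodology, and the model width and depth below are illustrative assumptions):

```python
def attention_flops(context_len: int, d_model: int = 4096, n_layers: int = 64) -> float:
    """Rough FLOPs for the attention matmuls of one forward pass.

    Dense self-attention performs the QK^T and attention-times-V matmuls
    per layer, each pairing an (n x d) with an (n x n)-shaped operand,
    so the cost grows quadratically in the context length n.
    """
    per_layer = 2 * 2 * context_len**2 * d_model  # 2 matmuls, 2 FLOPs per MAC
    return float(n_layers * per_layer)

ratio = attention_flops(1_000_000) / attention_flops(8_000)
print(f"1M-token attention is {ratio:,.0f}x the work of an 8K context")
```

Since width and depth cancel in the ratio, the quadratic term alone gives (1,000,000 / 8,000)² = 15,625x more attention work for a million-token context than for a typical 8K one, which is why long-context inference benefits from hardware specialized for the attention-heavy prefill phase.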
New Rubin CPX Variant
NVIDIA also introduced Rubin CPX, a GPU purpose-built for massive-context workloads such as long-horizon reasoning, large-scale software engineering, and 1-million-token-context video generation.
- Vera Rubin NVL144 CPX: A rack-scale platform delivering 8 exaflops of AI performance and 100 TB of high-speed memory.
- Availability: Rubin CPX is expected by the end of 2026.
Ecosystem and Adoption
Major industry partners are already adopting the Rubin platform for 2026, including:
- Cloud providers: Microsoft (Fairwater AI superfactories), CoreWeave, AWS, Google, and Oracle.
- AI Labs: OpenAI, Anthropic, Meta, and xAI.
- Infrastructure Partners: Dell, HPE, Lenovo, Supermicro, and Cisco.
The Rubin platform tackles increasing power density and cooling requirements in data centers while reducing the total cost of ownership for AI-native organizations.
At the AI Infra Summit, NVIDIA announced NVIDIA Rubin CPX, a new GPU designed for massive-context processing, allowing AI systems to handle million-token software coding and generative video with significant gains in speed and operational efficiency.
Rubin CPX operates alongside NVIDIA Vera CPUs and Rubin GPUs within the new NVIDIA Vera Rubin NVL144 CPX platform. The integrated NVIDIA MGX system delivers eight exaflops of AI compute, 7.5x the AI performance of NVIDIA GB300 NVL72 systems, along with 100 TB of fast memory and 1.7 PB/s of memory bandwidth in a single rack. A dedicated Rubin CPX compute tray will be available for customers looking to reuse existing Vera Rubin NVL144 systems.
"The Vera Rubin platform will mark another leap in the frontier of AI computing, introducing both the next-generation Rubin GPU and a new category of processors called CPX," said Jensen Huang, founder and CEO of NVIDIA. "Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once."
NVIDIA Rubin CPX delivers high performance and token revenue for long-context processing, exceeding the capabilities of current systems. This advancement enables AI coding assistants to evolve from basic code-generation tools into sophisticated systems capable of understanding and optimizing large-scale software projects.
Processing video can require AI models to handle up to 1 million tokens per hour of content, pushing traditional GPU capabilities to the limit. Rubin CPX integrates video encoders, decoders, and long-context inference processing into a single chip, enabling advanced long-form applications such as video search and high-quality generative video.
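The 1-million-token figure is easy to sanity-check with a rough estimate (the frame rate and tokens-per-frame values below are illustrative assumptions, not NVIDIA figures):

```python
fps = 24                 # assumed frame rate of the source video
seconds_per_hour = 3600
tokens_per_frame = 12    # illustrative visual-tokenizer output per frame

tokens_per_hour = fps * seconds_per_hour * tokens_per_frame
print(f"{tokens_per_hour:,} tokens per hour of video")  # → 1,036,800
```

Even at this modest tokens-per-frame rate, an hour of footage lands on the order of a million tokens, so the entire clip must fit in the model's context for coherent long-form generation or search.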
Based on the NVIDIA Rubin architecture, the Rubin CPX GPU features a cost-efficient monolithic-die design with advanced NVFP4 compute resources. It is optimized to deliver high performance and energy efficiency for AI inference tasks.
Advancements Offered by Rubin CPX
Rubin CPX provides up to 30 petaflops of NVFP4 compute for high performance and accuracy. It includes 128 GB of cost-efficient GDDR7 memory to support demanding context-based workloads, and it offers three times the attention speed of NVIDIA GB300 NVL72 systems, enabling AI models to process longer context sequences without sacrificing speed.
Rubin CPX is available in multiple configurations, including the Vera Rubin NVL144 CPX, which can be integrated with the NVIDIA Quantum-X800 InfiniBand scale-out compute fabric or the NVIDIA Spectrum-X Ethernet networking platform featuring Spectrum-XGS Ethernet technology and ConnectX-9 SuperNICs. The Vera Rubin NVL144 CPX supports large-scale monetization, with $5B in token revenue projected for every $100M invested.
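The monetization claim reduces to a simple ratio; a quick sketch makes the scale concrete (the per-token price below is a placeholder assumption, since the announcement does not break out pricing or volume):

```python
capex = 100e6                # $100M of infrastructure investment (from the claim)
token_revenue = 5e9          # $5B in projected token revenue (from the claim)
price_per_m_tokens = 5.0     # assumed $ per million output tokens; real pricing varies

revenue_multiple = token_revenue / capex
implied_tokens = token_revenue / price_per_m_tokens * 1e6

print(f"revenue multiple: {revenue_multiple:.0f}x")   # → 50x
print(f"implied volume: {implied_tokens:.0e} tokens served over the system's life")
```

At the assumed price point, the projected 50x revenue multiple implies serving on the order of 10^15 tokens, which underscores why per-token inference cost and throughput dominate the economics.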
Software Support
NVIDIA Rubin CPX will be supported by the full NVIDIA AI stack, from accelerated infrastructure to enterprise-ready software. The NVIDIA Dynamo platform efficiently scales AI inference, increasing throughput while lowering response times and model-serving costs.
Rubin CPX will support the latest NVIDIA Nemotron family of multimodal models, which offer advanced reasoning for enterprise AI agents and production-grade AI. Nemotron models are available through NVIDIA AI Enterprise, a software platform that includes NIM microservices as well as AI frameworks, libraries, and tools for deployment on NVIDIA-accelerated clouds, data centers, and workstations.
The Rubin platform builds on years of innovation and expands NVIDIA's developer ecosystem, which includes CUDA-X libraries, a community of over 6 million developers, and nearly 6,000 CUDA applications.
Availability
NVIDIA Rubin CPX is expected to be available at the end of 2026.