Santa Clara, California  

Every enterprise technology officer has felt the same bottleneck: a GPU cluster powerful enough to run large language models at scale, throttled not by compute but by the network traffic choking between servers. That friction is not a minor inconvenience. It is a billion-dollar drag on AI ambitions. AMD’s Advancing AI summit in June 2025 offered what looked, on paper, like a direct answer to that problem, and the scale of the bet AMD is placing makes the proposal worth taking seriously.

AMD Advancing AI: From Chips to Systems 

At the Advancing AI 2025 event on June 12, AMD CEO Dr. Lisa Su introduced more than just a new processor. She announced a change in approach. Instead of focusing on having the best GPU, AMD now aims to have the best overall rack system. 

This change is important. For years, discussions about AI infrastructure focused on individual accelerator benchmarks like FLOPS, memory bandwidth, and chip size. AMD now argues that a chip’s performance does not matter if the system around it cannot move data quickly enough.

At Advancing AI 2025, AMD showed a complete, open-standards rack-scale AI infrastructure. The design combines AMD Instinct MI350 Series GPUs, 5th Gen AMD EPYC processors, and AMD Pensando Pollara NICs, and it is already running in large deployments such as Oracle Cloud Infrastructure. These are not just future plans; the systems are in production today.

The Instinct GPU Arrays Powering the Next Phase 

The hardware centerpiece is the MI350 Series. The Instinct MI355X GPU, built on AMD’s CDNA 4 architecture, offers up to 20 PFLOPS of FP4 performance, 288GB of HBM3E memory, and 8 TB/s of memory bandwidth. These systems can scale to 128 GPUs per rack with liquid cooling, reaching 2.6 exaFLOPS of AI compute and supporting models with more than 500 billion parameters.
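The rack-level figure follows from simple multiplication of the per-GPU numbers quoted above. The short Python sketch below is only a back-of-the-envelope check: the function name is hypothetical, and it assumes every GPU in the rack reaches its peak quoted figure, which real workloads rarely do.

```python
# Per-GPU figures as quoted in the article for the Instinct MI355X.
FP4_PFLOPS_PER_GPU = 20   # peak FP4 throughput, petaFLOPS
HBM_GB_PER_GPU = 288      # HBM3E capacity, gigabytes
HBM_TBPS_PER_GPU = 8      # memory bandwidth, terabytes per second
GPUS_PER_RACK = 128       # liquid-cooled configuration


def rack_totals(gpus=GPUS_PER_RACK):
    """Sum peak per-GPU figures across a rack (hypothetical helper)."""
    return {
        "fp4_exaflops": gpus * FP4_PFLOPS_PER_GPU / 1000,  # PFLOPS -> EFLOPS
        "hbm_tb": gpus * HBM_GB_PER_GPU / 1000,            # GB -> TB
        "hbm_bandwidth_tbps": gpus * HBM_TBPS_PER_GPU,
    }


totals = rack_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

With 128 GPUs, the compute total comes to 2.56 exaFLOPS, which rounds to the 2.6 exaFLOPS cited, alongside roughly 36.9 TB of aggregate HBM.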

In practical terms, a single liquid-cooled rack with 128 MI355X GPUs has enough aggregate memory and compute to work on a 500-billion-parameter model without sending data to another rack. For enterprise teams with strict compliance needs, such as financial institutions, healthcare providers, or defense contractors, keeping training within a single secure, isolated rack is often a requirement rather than a convenience.

AMD’s Instinct GPU clusters also have a strong competitive angle. The Helios rack-scale solution will use 72 MI400 Series GPUs, next-generation EPYC Venice CPUs, and Pensando Vulcano network adapters. Compared to the prerelease specs of NVIDIA’s Vera Rubin NVL72, Helios is expected to offer the same scale-up bandwidth and similar FP4 and FP8 performance, but with 50% more HBM4 memory capacity, memory bandwidth, and scale-out bandwidth. 

Having more memory in each rack changes how models are served. Operators do not have to split models across many nodes as much, which lowers the delays users notice during inference. 
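To make the sharding point concrete, the sketch below estimates the minimum number of GPUs needed just to hold a model's weights. It is a rough illustration, not a sizing tool: the function name, the 80% usable-memory fraction (leaving room for KV cache and activations), and the 192 GB comparison card are all assumptions introduced here, not figures from AMD.

```python
import math


def min_gpus_for_weights(params_billion, bytes_per_param, hbm_gb_per_gpu,
                         usable_fraction=0.8):
    """Estimate how many GPUs are needed to hold the weights alone.

    usable_fraction is an assumed share of HBM available for weights;
    the remainder is reserved for KV cache, activations, and runtime.
    """
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    usable_gb = hbm_gb_per_gpu * usable_fraction
    return math.ceil(weights_gb / usable_gb)


# 500B parameters stored in a 16-bit format (2 bytes per parameter).
gpus_288 = min_gpus_for_weights(500, 2, 288)  # 288 GB per GPU (MI355X)
gpus_192 = min_gpus_for_weights(500, 2, 192)  # hypothetical 192 GB card
print(gpus_288, gpus_192)
```

Under these assumptions the 288 GB part needs 5 GPUs where the hypothetical 192 GB card needs 7. Fewer shards generally means fewer cross-GPU hops per generated token.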

Rethinking the Data Center Ecosystem 

The main challenge in the past was building faster GPUs. Now, the focus is on creating a well-integrated data center ecosystem around those GPUs. AMD is tackling this with a multi-layered, open-standards strategy that spans the rack, software, and networking layers. 

AMD plays a leading role in open-standards efforts such as the Open Compute Project (OCP), Ultra Accelerator Link (UALink), and the Ultra Ethernet Consortium (UEC). This involvement helps the industry scale through collaboration, enabling the development of open, high-performance systems for both scale-up and scale-out AI clusters.

AMD’s data center ecosystem is built to avoid the vendor lock-in seen with NVIDIA’s GB200 NVL72 systems. While NVIDIA’s NVLink fabric keeps customers tied to one vendor, AMD’s approach lets operators choose networking, cooling, and power equipment from different suppliers. For large-scale operators spending billions, this flexibility has real financial benefits.

The market has responded quickly. Oracle plans to launch a public AI supercluster with 50,000 Instinct MI450 Series GPUs in Q3 2026, using the Helios rack design, next-gen EPYC Venice CPUs, and Pensando Vulcano networking. Vultr is also building a 50 MW AI supercluster in Ohio with 24,000 Instinct MI355X GPUs. These are full-scale projects, not just tests. 

Solving Network Transport: The Hidden Bottleneck 

The most important technical announcement from the June summit received little attention in mainstream coverage. Network transport, or how data moves between accelerators during distributed training, is now the main limit on cluster efficiency. AMD is addressing this issue directly. 

AMD helped start the UALink Consortium, which is creating an open standard for GPU-to-GPU communication across servers and racks. In AMD’s rack design, UALink is slated to provide 260 TB/s of scale-up bandwidth within a rack, and AMD positions it as more scalable and open than proprietary options like NVLink. AMD also plans to support UALink over Ultra Ethernet, combining high performance with Ethernet’s flexibility.

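The network transport issue is very real. When training large models on hundreds of GPUs, the interconnect determines how much of each step is spent waiting for gradient and parameter exchanges rather than computing. How much a faster interconnect helps depends on how much of the step is exposed to communication: cutting communication time by 10% shortens a fully communication-bound job by close to 10%, but a job that spends 30% of its time waiting on the network gains only about 3%. Even so, for training runs that last weeks, a few percentage points translate into considerable cost savings.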
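One rough way to reason about the payoff is an Amdahl-style model in which each training step splits into compute time and non-overlapped communication time. The sketch below assumes exactly that split; the function name and the example fractions are illustrative, not measurements from any AMD system.

```python
def training_time_reduction(comm_fraction, comm_cut_pct):
    """Fraction of total step time saved when exposed communication shrinks.

    comm_fraction: share of step time spent waiting on the interconnect (0..1).
    comm_cut_pct:  percentage reduction in that communication time.
    Assumes compute time is unchanged and communication is not overlapped.
    """
    if not 0 <= comm_fraction <= 1:
        raise ValueError("comm_fraction must be between 0 and 1")
    return comm_fraction * comm_cut_pct / 100


# A 10% cut in communication time at different levels of network exposure.
for fraction in (0.1, 0.3, 0.6, 1.0):
    saved = training_time_reduction(fraction, 10)
    print(f"comm share {fraction:.0%}: total time saved {saved:.1%}")
```

The saving equals the full 10% only when the job is entirely communication-bound. That is why the benefit of a faster fabric grows with cluster size, where synchronization tends to take a larger share of each step.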

With the Helios reference design, performance scales smoothly across 72 GPUs using UALink. UALink connects the GPUs and scale-out NICs, and when used over Ethernet, links all the GPUs in the rack, so they work together as a single system. 

AMD Accelerates Rack Scale Infrastructure for Enterprise AI Training 

The most consequential implication of all this activity is the enterprise angle. AMD is pitching rack-scale infrastructure for AI training not only to hyperscalers but also to thousands of mid-sized organizations, such as regional banks, pharmaceutical companies, and national labs. These groups cannot build massive data centers but still need to handle demanding training jobs.

AMD is well-positioned to support every part of the AI stack, from Instinct GPUs and EPYC CPUs to Pensando DPUs and scale-out networking. All of this is built on open, flexible, and programmable infrastructure made for today’s AI needs. 

AMD’s open-ecosystem approach is especially valuable in the enterprise market. For example, a hospital using AI for medical imaging cannot spend months integrating a proprietary system. A financial institution training credit-risk models on sensitive data needs full control over its deployment. AMD’s open-standard solutions, like OCP-compliant racks, UEC-compliant NICs, and ROCm open-source software, help remove the barriers that have kept enterprise AI from moving beyond the testing phase. 

The $10 Billion Question 

In the first quarter of 2026, AMD reported strong results. Demand for AI infrastructure drove data center revenue up 57% from the previous year. Total revenue hit $10.3 billion, thanks to hyperscalers and enterprise customers expanding their AI capacity. Wall Street analysts expect AMD’s AI GPU revenue for the year to be between $10 billion and $12 billion. 

AMD and Meta have signed a multi-year deal to power Meta’s AI infrastructure with up to 6GW of AMD Instinct GPUs. Shipments will start in the second half of 2026, using the Helios rack-scale architecture. 

These developments do not guarantee that AMD will catch up to NVIDIA. NVIDIA’s software ecosystem, including CUDA’s long head start, developer tools, and optimization libraries, remains a major advantage. However, AMD is now competing on more than just chip performance. The company argues that the future of AI infrastructure lies in open systems and that enterprises seeking flexibility should have a supply chain that supports it.

The real test will come when Helios systems ship in large numbers in late 2026. AMD’s ability to deliver the combination of system quality and software reliability that enterprise customers expect will decide whether this strategy leads to lasting market share or just a memorable keynote.

Source: AMD Newsroom 
