Intel set a new standard in AI performance by fine-tuning Llama 2 70B with low-rank adapters and training the MLPerf GPT-3 model on more than 1,000 Gaudi 2 accelerators in the Intel Tiber Developer Cloud, according to MLCommons’ latest benchmark results.  

What’s new: MLCommons has released the results of its MLPerf Training v4.0 benchmark (an industry-standard suite of tests that measure machine learning training performance). Intel’s results highlight the options that Gaudi 2 AI accelerators (specialized hardware designed to accelerate AI tasks) offer businesses: community-driven software (improvements and tools created by open-source contributors) makes generative AI development easier, and standard Ethernet networking (the common network technology used to connect computers and devices) enables flexible scaling. For the first time, Intel submitted results from a large Gaudi 2 system with 1,024 accelerators on the Intel Tiber Developer Cloud, demonstrating Gaudi 2’s performance and scalability, as well as the cloud’s ability to train the MLPerf GPT-3 175B-parameter model (a benchmark using a very large AI language model with 175 billion parameters).  

“The industry needs better generative AI solutions with high performance and efficiency. The latest MLPerf results from MLCommons highlight the unique value of Intel Gaudi as businesses seek more affordable, scalable systems with standard networking and open software. This makes generative AI more accessible to more customers.” – Zane Ball, Intel Corporate Vice President and General Manager, DCAI Product Management.  

Why it matters: Many customers want to use generative AI but face challenges with cost, scale, and development. Last year, only 10% of enterprises successfully launched GenAI projects. Intel’s AI solutions help businesses overcome these barriers. Gaudi is a scalable, accessible option for training large language models with 70 to 175 billion parameters. The upcoming Gaudi 3 accelerator will offer even better performance, openness, and choice for enterprise GenAI.  

How Intel Gaudi 2 MLPerf Results Show Transparency 

The MLPerf results confirm that Gaudi 2 remains the only MLPerf-benchmarked alternative to Nvidia’s H100 for generative AI training. Training GPT-3 on the Tiber Developer Cloud, Intel achieved a time-to-train of 66.9 minutes with 1,024 Gaudi accelerators, highlighting strong scaling performance for very large language models in a cloud environment.  

The benchmark suite introduced a new test: fine-tuning the Llama 2 70B-parameter model with low-rank adapters (LoRA). Fine-tuning large language models is a common need for many customers and AI practitioners, making this a practical benchmark. Intel’s submission reached a time-to-train of 78.1 minutes on eight Gaudi 2 accelerators. For this, Intel used open-source software from Optimum Habana (a toolkit for optimizing AI models on Habana accelerators), ZeRO-3 from DeepSpeed (a technique for memory-efficient distributed training), and FlashAttention-2 (a method that speeds up attention in transformer models). The benchmark task force, led by engineers from Intel’s Habana Labs (developers of the Gaudi accelerators) and Hugging Face (a provider of open-source AI tools), created the reference code and rules.  
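To illustrate why low-rank adapters make fine-tuning so much cheaper than full training, here is a minimal plain-Python sketch of the LoRA idea. All names, dimensions, and hyperparameters below are toy examples for explanation only, not Intel’s or the benchmark’s actual code:

```python
# Illustrative sketch of low-rank adaptation (LoRA), the fine-tuning
# technique benchmarked on Llama 2 70B. Sizes here are toy values.

def matmul(X, Y):
    """Plain-Python matrix multiply of row-major nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha=16, r=4):
    """Effective weight W + (alpha / r) * (B @ A).

    W (d_out x d_in) stays frozen; only the small matrices
    A (r x d_in) and B (d_out x r) are trained, shrinking the
    trainable parameter count from d_out * d_in to r * (d_in + d_out).
    """
    delta = matmul(B, A)
    s = alpha / r
    return [[w + s * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, delta)]

# Toy comparison: a 64x64 layer with rank-4 adapters.
d_out, d_in, r = 64, 64, 4
full_params = d_out * d_in          # 4096 trained in full fine-tuning
lora_params = r * (d_in + d_out)    # 512 trained with LoRA
print(f"trainable params: {lora_params}/{full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
# -> trainable params: 512/4096 (12.5%)
```

At Llama 2 70B scale the same arithmetic is what makes the benchmark tractable on eight accelerators: only the adapter matrices receive gradients, while the frozen base weights can be sharded and streamed with techniques like DeepSpeed ZeRO-3.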

How Intel Gaudi Delivers Value In AI 

High costs have kept many businesses out of the AI market, but Gaudi (Intel’s specialized AI hardware accelerator) is changing that. At Computex (an annual computer expo), Intel announced that a standard AI kit with eight Gaudi accelerators and a universal baseboard costs $65,000, about one-third the cost of similar platforms. A kit with eight Gaudi 3 accelerators (the next generation of Intel’s AI hardware) and a baseboard is listed at $125,000, about two-thirds the cost of comparable options.  

Growing momentum shows Gaudi’s value. Customers are choosing Gaudi for its price-performance benefits and accessibility. For example:  

  • Naver, a major South Korean cloud provider and search engine with over 600 million users, is building a new AI ecosystem. They are making it easier for customers to adopt large language models (advanced AI systems that understand and generate text) by reducing development costs and project timelines.  
  • AI Sweden, a partnership between the Swedish government and private companies, uses Gaudi (Intel’s AI accelerator hardware) to fine-tune models with municipal content (data from local governments). This helps improve efficiency and public services for people in Sweden.  

How Intel Tiber Developer Cloud Helps Customers Use Gaudi 

The Tiber Developer Cloud (Intel’s managed cloud platform) offers a managed, cost-effective environment for developing and deploying AI models, from single nodes to large clusters. In the Tiber Developer Cloud, Intel provides access to its accelerators (specialized AI processors), CPUs, GPUs, open AI software, and other services. Intel customer Seekr recently launched SeekrFlow, an AI development platform that uses the Tiber Developer Cloud to serve its clients.  

According to cio.com, Seekr cited cost savings of 40% to 400% on the Tiber Developer Cloud for select AI workloads compared with an on-premises system using another vendor’s GPUs and another cloud service provider, along with 20% faster AI training and 50% faster AI inference than other on-premises systems.  

What’s next: Intel plans to submit MLPerf results for the Gaudi 3 AI accelerator in the upcoming inference benchmark. Gaudi 3 is expected to deliver stronger AI training and inference performance on key models and will be available from equipment manufacturers in fall 2024.

Source: Intel Gaudi Enables a Lower Cost Alternative for AI Compute and GenAI