AMD provided a deep-dive look at its latest AI accelerator arsenal for data centers and supercomputers, as well as consumer client devices, but software support, optimization and developer adoption will be key.

Advanced Micro Devices held its Advancing AI event in San Jose this week, and in addition to launching new AI accelerators for the data center, supercomputing and client laptops, the company laid out its software and ecosystem enablement strategy with an emphasis on open-source accessibility. Market demand for AI compute resources currently outstrips supply from incumbents like Nvidia, so AMD is racing to provide compelling alternatives. Underscoring this emphatically, AMD CEO Dr. Lisa Su noted that the company is raising its total addressable market (TAM) forecast for AI accelerators from the $150 billion it projected at this time a year ago to $400 billion by 2027, with a 70% compound annual growth rate. Artificial intelligence is obviously a massive opportunity for the major chip players, but the true size of that market is anybody's guess; AI promises to be transformational enough to touch virtually every industry in one way or another. Regardless, the market will likely welcome these new AI silicon engines and tools from AMD eagerly.
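
For a sense of what that growth rate implies, here's a quick back-of-the-envelope check. It assumes, purely for illustration, that the 70% CAGR compounds over the four years from 2023 to 2027; AMD's exact base year isn't stated above. Under that assumption, the $400 billion figure back-solves to a 2023 market of roughly $48 billion.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Back-solve the starting value from an end value and a compound annual growth rate."""
    return final_value / (1 + cagr) ** years


def project(start_value: float, cagr: float, years: int) -> float:
    """Grow a starting value forward at a compound annual growth rate."""
    return start_value * (1 + cagr) ** years


# Assumption (not from the article): 70% CAGR applied over 2023 -> 2027, i.e. four compounding years.
base_2023 = implied_base(400.0, 0.70, 4)
print(f"Implied 2023 market: ${base_2023:.1f}B")
print(f"Check, projected 2027 market: ${project(base_2023, 0.70, 4):.0f}B")
```

Since 1.7 to the fourth power is about 8.35, a 70% CAGR means the market grows more than eightfold over that four-year window.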

Instinct MI300X And MI300A: Tip Of The AMD AI Spear

AMD's data center group formally launched two major product families this week, the MI300X and MI300A, for the enterprise/cloud AI and supercomputing markets, respectively. The two products are purpose-built for their respective applications but share a similar chiplet-based architecture, with advanced 3D packaging and a mix of 5nm and 6nm fab processes. AMD's high-performance computing (HPC) AI accelerator is the Instinct MI300A, which combines the company's CDNA 3 data center GPU architecture with Zen 4 CPU core chiplets (24 EPYC Genoa-class cores) and 128GB of shared, unified HBM3 memory that both the GPU and the CPU cores can access, along with 256MB of Infinity Cache. The chip packs a whopping 146 billion transistors and offers up to 5.3 TB/s of peak memory bandwidth, with its CPU, GPU and I/O interconnected via AMD's high-speed Infinity Fabric.

The MI300A can also run as either a PCIe-connected add-in device or a root-complex host CPU. All told, the company is making bold claims for MI300A in HPC: up to a 4X performance lift versus Nvidia's H100 accelerator in applications like OpenFOAM for computational fluid dynamics, and up to a 2X performance-per-watt uplift over Nvidia's GH200 Grace Hopper Superchip. MI300A will also power HPE's El Capitan at Lawrence Livermore National Laboratory, which is expected to become the world's first two-exaflop supercomputer, overtaking Frontier (also powered by AMD) as the fastest, most powerful supercomputer in the world.

MI300X is a different sort of beast, targeted squarely at cloud data centers and enterprise AI workloads like large language models (LLMs), natural language processing and generative AI. MI300X has no Zen 4 CPU chiplets on board (what AMD calls CCDs); instead, it packs more CDNA 3 Accelerator Complex Die chiplets (XCDs) in an all-GPU design. MI300X carries eight XCDs, totaling 304 GPU compute units (the six-XCD MI300A has 228). MI300X also has a larger memory capacity, with 192GB of HBM3. Like the MI300A, it offers about 5.3TB/s of aggregate memory bandwidth, plus a massive 17TB/s of peak bandwidth from its 256MB of AMD Infinity Cache.
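
Why does 192GB matter so much for LLMs? A common rule of thumb is that a model's weights alone need roughly parameter count times bytes per parameter. The sketch below applies that to a 70-billion-parameter model (the largest Llama 2 variant) at a few precisions. It deliberately ignores the KV cache, activations and framework overhead, so treat the results as a lower bound; the helper names are illustrative, not from any AMD tool.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_one_device(num_params: float, dtype: str, device_memory_gb: float) -> bool:
    # Weights only: KV cache and activations are ignored, so this is optimistic.
    return weight_footprint_gb(num_params, dtype) <= device_memory_gb


params = 70e9  # Llama 2 70B
for dtype in ("fp32", "fp16", "int8"):
    gb = weight_footprint_gb(params, dtype)
    print(f"{dtype}: {gb:.0f} GB of weights, fits in 192 GB: {fits_on_one_device(params, dtype, 192)}")
```

In 16-bit precision the 70B model's weights come to about 140GB, which fits within a single MI300X's 192GB of HBM3 with some room left over for the KV cache.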

Once again, AMD's performance claims are bold, with Su citing anywhere from a 1.4X performance lift (latency reduction) in Llama 2 (Meta's assistant-like large language model) to a 1.6X uplift in BLOOM, a transformer-based LLM alternative to GPT-3, versus competing offerings from Nvidia. In inferencing workloads like these, AMD is claiming performance leadership over Nvidia, though MI300X will reportedly offer rough performance parity with H100 in AI training workloads. Of course, Nvidia just released an update to its optimized software for Llama 2, so AMD likely did not factor this into the benchmark results above. In addition, Nvidia's H200 Hopper GPU is waiting in the wings and should bring further gains in Nvidia's inferencing performance.
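
Since "performance lift" and "latency reduction" are easy to conflate, it helps to convert between them: a speedup of S means the new latency is 1/S of the old one, so the fractional reduction is 1 - 1/S. The sketch below simply restates AMD's 1.4X and 1.6X figures in those terms; it says nothing about how the benchmarks were run.

```python
def latency_reduction(speedup: float) -> float:
    """Fractional latency reduction implied by a speedup factor (new latency = old / speedup)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


for model, s in (("Llama 2", 1.4), ("BLOOM", 1.6)):
    print(f"{model}: {s}X speedup -> {latency_reduction(s):.1%} lower latency")
```

By this math, a 1.4X speedup works out to roughly 29% lower latency and a 1.6X speedup to 37.5% lower latency, not 40% and 60%.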

AMD Ryzen 8040 Series To Bring An AI Lift For Laptops

From a hardware standpoint, rounding out AMD's Advancing AI day offerings were Ryzen AI and a new line of Ryzen 8040 series mobile processors for laptops. Code-named Hawk Point, these APUs are similar to AMD's current-generation Ryzen 7040 series, with up to eight Zen 4 CPU cores and up to twelve RDNA 3 graphics compute units, which also run at goosed-up clock speeds. However, Hawk Point's Neural Processing Unit (NPU) has been optimized in both hardware and firmware, and AMD says its new XDNA NPU delivers up to 16 trillion operations per second (TOPS) of throughput for AI workloads, a 60% performance lift over the previous-generation 7040 series.

AMD claims this will raise real-world AI application performance in this new class of laptops by as much as 40%, in workloads such as Llama 2 and machine vision applications. Since the Ryzen 8040's XDNA NPU is derived from Xilinx's AI Engine technology, the gains likely came from tuning this block of circuitry for better performance and efficiency. AMD notes that Ryzen 8040-series AI-enabled PCs will be available in Q1 2024, and that it is sampling them to OEM partners now.

Software Enablement Is Key: Enter ROCm 6 And Ryzen AI Software

All of this powerful new silicon will need heavy-duty software enablement from AMD, and in that regard the company announced two new installments in its developer software suite: ROCm 6, which will work in concert with its Xilinx Vitis AI development and deployment tools, and Ryzen AI Software for client machines. AMD notes that a second installment of ROCm 6 for training workloads is also on the way. ROCm is AMD's open-source software development platform, and it supports many of the leading AI frameworks, such as ONNX, TensorFlow and PyTorch. AMD also says data center AI developers coming from Nvidia's CUDA platform can easily port and optimize their existing models and applications with ROCm. Su also brought a show of support on stage, with representatives from Lamini, Databricks and Essential AI extolling the virtues of working with ROCm; Lamini CEO Sharon Zhou specifically underscored that Lamini has reached feature and performance parity with CUDA.
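
One concrete reason the porting story holds up at the framework level: PyTorch's ROCm builds expose AMD GPUs through the same `torch.cuda` API and `"cuda"` device string that CUDA builds use, and they populate `torch.version.hip` rather than `torch.version.cuda`. The minimal sketch below reports which backend an installed PyTorch was built for, assuming nothing beyond a standard PyTorch install and degrading gracefully if PyTorch is absent. (CUDA C++ kernel code is a separate case and is ported through AMD's HIP tooling, such as HIPIFY.)

```python
def describe_torch_backend() -> str:
    """Report whether the installed PyTorch targets ROCm (HIP), CUDA, or CPU only."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    hip_version = getattr(torch.version, "hip", None)    # set on ROCm builds, else None
    cuda_version = getattr(torch.version, "cuda", None)  # set on CUDA builds, else None
    if hip_version:
        backend = f"ROCm/HIP {hip_version}"
    elif cuda_version:
        backend = f"CUDA {cuda_version}"
    else:
        backend = "CPU-only build"

    # On ROCm builds, AMD GPUs are reached through the torch.cuda namespace,
    # so this availability check works unchanged on either vendor's hardware.
    return f"{backend}; GPU available: {torch.cuda.is_available()}"


print(describe_torch_backend())
```

In model code, the familiar `device = "cuda" if torch.cuda.is_available() else "cpu"` idiom therefore selects an AMD Instinct GPU on a ROCm build without modification.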

On client machines, Ryzen AI Software will take pre-trained models, then quantize and optimize them to run on AMD's silicon for easy deployment. In conversations with AMD, I was told that the goal is a simple one-click interface for developers, with support for ONNX, TensorFlow and PyTorch live now in the first installment of Ryzen AI Software. The folks in Redmond are readying Windows support as well, but AMD will ultimately be at Microsoft's mercy in this regard.
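
To make "quantize" a bit more concrete: post-training quantization maps a model's floating-point weights onto a small integer range plus a scale factor, shrinking the model and letting NPUs lean on fast integer math. The sketch below shows generic symmetric, per-tensor INT8 quantization in plain Python. It illustrates the general technique only; it is not the Ryzen AI toolchain, whose actual quantization scheme and APIs aren't covered here, and the function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest-magnitude weight to +/-127; use scale 1.0 for all-zero input to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floating-point weights from INT8 codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.02, 0.333]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.4f}", f"max error={max_err:.5f}")
```

Each weight now takes one byte instead of four (FP32), and the rounding error per weight is bounded by half the scale; production toolchains typically add calibration data and finer-grained (per-channel) scales to keep accuracy loss small.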

Wrapping up this quick-take Advancing AI Day digest, I would offer that AMD's success will rely heavily on its software enablement effort, which will have to be a long-standing, continual investment in ease of use, performance and efficiency optimization, and ultimately developer adoption. It appears the company has the hardware muscle at the ready to take on its primary rivals Nvidia and Intel. With AMD President Victor Peng heading up its AI strategy, and with the long lineage of software enablement he fostered at Xilinx before AMD acquired it, it appears that AMD has the leadership and resources in place to execute on this side of the equation as well. It's going to be a dogfight with Nvidia, no question about it. With the heavy optimization and tuning of models going on right now, the AI performance landscape can and will change on a dime. And let's face it, AI is still very much in its infancy.
