NVIDIA, AMD Win Contract for Supercomputer "Testbed" Ahead of Intel’s Aurora
NVIDIA and AMD are helping the US Department of Energy (DoE) build a new supercomputer that will allow scientists and researchers to dial in software to run on the upcoming Intel-based Aurora exascale system.
The supercomputer, called Polaris, will be housed at the agency’s Argonne National Laboratory in Illinois. It is designed to deliver up to 44,000 trillion double-precision operations per second, or 44 petaflops, which would place it in the top 10 of the world's 500 fastest supercomputers. It will be assembled by HPE using AMD's central processing units (CPUs) paired with NVIDIA’s graphics processing units (GPUs).
The national lab said Polaris will serve as a “testbed” to start prepping software that researchers plan to run on the $500 million Aurora system, which Intel and HPE are building under contract for the DoE. The Aurora supercomputer, which will be used for a wide range of artificial intelligence (AI), engineering, and scientific projects, will also be installed at Argonne and provide exascale-level performance when it debuts in 2022.
The announcement comes as Intel faces delays in supplying the chips at the heart of Aurora: its latest Xeon data center CPUs (code-named Sapphire Rapids) and Xe GPUs (code-named Ponte Vecchio). Both are on pace to enter production by early 2022. Together, they are designed to deliver a million trillion (1,000,000,000,000,000,000) operations per second, or one exaflop, which would make Aurora far faster than any of Argonne’s current systems.
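The article's performance figures jump between several unit scales. The minimal Python sketch below converts the quoted numbers using the standard SI prefixes (tera = 10^12, peta = 10^15, exa = 10^18). The helper names are illustrative only, and the final ratio assumes that Polaris's 44-petaflop double-precision peak and Aurora's one-exaflop target are comparable peak figures, which the announcements do not explicitly state.

```python
# SI prefixes used for supercomputer performance figures.
TERA = 10**12  # one trillion
PETA = 10**15  # a thousand trillion
EXA = 10**18   # a million trillion


def to_petaflops(ops_per_second: int) -> float:
    """Convert raw operations per second to petaflops."""
    return ops_per_second / PETA


def to_exaflops(ops_per_second: int) -> float:
    """Convert raw operations per second to exaflops."""
    return ops_per_second / EXA


# Polaris: "44,000 trillion operations per second" (double precision).
polaris_fp64 = 44_000 * TERA
# Aurora: "a million trillion operations a second".
aurora = 1_000_000 * TERA

print(to_petaflops(polaris_fp64))       # 44.0
print(to_exaflops(aurora))              # 1.0
# Assumption: both figures are comparable peaks.
print(round(aurora / polaris_fp64, 1))  # 22.7
```

Under that assumption, Aurora's target is a little more than 20 times Polaris's double-precision peak, which is why Polaris serves as a development testbed rather than a stand-in for Aurora itself.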
AMD and NVIDIA are moving aggressively to pry into Intel’s stronghold in the market for data center chips. Winning a federal contract to help build a supercomputer is more about prestige than revenue, but many of the technologies engineered for these colossal computers filter down into the wider market. The Polaris win underlines the growing threat that Intel faces from AMD’s CPUs and its uphill battle against NVIDIA’s GPUs.
The race for a faster supercomputer is critical in scientific circles. These massive systems are vital for research into the biology of diseases, including cancer and viruses, as well as advanced materials and weapons development. They can handle heavy-duty computations used to predict the impacts of climate change, breach encryption codes, and study the physics of particle collisions in the search for dark matter.
Polaris will be built from 280 nodes, each containing a pair of AMD EPYC processors. AMD said Polaris will run on its second-generation EPYC CPUs, code-named Rome, and be upgraded to its new third-generation Milan CPUs in the future. Each CPU will be paired with four NVIDIA A100 GPUs, for 2,240 GPUs in total, to handle AI tasks. When running AI workloads at mixed precision, Argonne said, Polaris’s performance could reach up to 1.4 exaflops.
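The configuration figures are easy to check. This short Python sketch multiplies out the node layout quoted above and compares the 1.4-exaflop mixed-precision AI peak with the 44-petaflop double-precision peak. The constant names are hypothetical labels chosen for illustration, not identifiers from Argonne, HPE, or NVIDIA.

```python
# Polaris configuration as quoted (constant names are illustrative).
NODES = 280
CPUS_PER_NODE = 2   # "a pair of AMD EPYC processors" per node
GPUS_PER_CPU = 4    # four NVIDIA A100 GPUs attached to each CPU

cpus = NODES * CPUS_PER_NODE                  # total EPYC CPUs
gpus = cpus * GPUS_PER_CPU                    # total A100 GPUs
gpus_per_node = CPUS_PER_NODE * GPUS_PER_CPU  # GPUs in each node

# Peak figures, both expressed in petaflops.
FP64_PEAK_PF = 44    # double-precision peak
AI_PEAK_PF = 1_400   # 1.4 exaflops at mixed precision

print(cpus, gpus, gpus_per_node)             # 560 2240 8
print(round(AI_PEAK_PF / FP64_PEAK_PF, 1))  # 31.8
```

The roughly 32-fold gap compares two theoretical peaks at different precisions, not measured application performance.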
The server nodes are lashed together with Slingshot, a high-performance Ethernet fabric designed by HPE and targeted at the high-performance computing (HPC) market. Slingshot will also be featured in Aurora.
NVIDIA, which has overtaken Intel as the most valuable US chip company, has been expanding its footprint in the HPC market. In July, it unveiled Cambridge-1, the fastest supercomputer in the UK, which is built from 80 of its DGX A100 servers. The company’s hardware also powers Perlmutter, which is billed as the world’s fastest supercomputer specifically for AI workloads, and Selene, NVIDIA's own in-house system.
AMD is also raising its profile in the market. Last year, AMD was picked to supply the processors for a $600 million supercomputer that HPE is building for the DoE to support the nation’s nuclear arsenal. The system, called El Capitan, will be housed at Lawrence Livermore National Laboratory when it is completed in 2023. It is designed to reach peak speeds of 2 exaflops, which should make it the fastest system in the world.
El Capitan will also be faster than another system AMD and HPE are building for Oak Ridge National Laboratory. Expected to debut this year, the Frontier supercomputer is designed to run at up to 1.5 exaflops.
The Polaris supercomputer is already being installed at Argonne and should be available in early 2022. Neither NVIDIA nor AMD revealed the dollar amount of the contract.
“Polaris is a powerful platform that will allow our users to enter the era of exascale AI,” said Michael Papka, who leads the supercomputing arm of Argonne National Laboratory, in a statement.