U.S. Rolls Out Mini Supercomputer to Test Software for AMD's Frontier
The U.S. Department of Energy introduced a small computing cluster that will serve as a test platform for its upcoming Frontier supercomputer, which is on pace to become the nation’s first exascale machine.
Oak Ridge National Laboratory (ORNL) said researchers are using the “Crusher” system to test software that they plan to run on Frontier, which is set to become one of the world’s leading supercomputers. Frontier is designed to deliver up to 1.5 million trillion (1,500,000,000,000,000,000) operations per second, or 1.5 exaFLOPS (EFLOPS). That’s more than three times the peak performance of Japan's Fugaku, the world’s current No. 1.
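For readers who want to check the units, the arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for the example, and the Fugaku value is an upper bound implied by the "more than three times" comparison, not a measured figure.

```python
# Unit check for the peak-performance figures quoted in the article.
# Integer arithmetic is used so that the 19-digit number stays exact.
PETA = 10**15  # 1 petaFLOPS = 10^15 operations per second
EXA = 10**18   # 1 exaFLOPS  = 10^18 operations per second

# Frontier's design target: 1.5 EFLOPS.
frontier_peak_flops = 15 * EXA // 10

# "1.5 million trillion" is the same quantity: 1.5 * 10^6 * 10^12.
million_trillion = 10**6 * 10**12
assert frontier_peak_flops == 15 * million_trillion // 10

# If Frontier is "more than three times" Fugaku, then Fugaku must sit
# below one third of Frontier's peak. This is a derived bound only.
fugaku_upper_bound_pflops = frontier_peak_flops // 3 // PETA

print(f"Frontier peak: {frontier_peak_flops:,} operations per second")
print(f"Implied Fugaku ceiling: under {fugaku_upper_bound_pflops} PFLOPS")
```

The integer form avoids the rounding that floating-point values would introduce at this scale.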
Crusher is a miniature version of Frontier, which had been expected to become the world's top supercomputer upon its planned launch last year. The final $600 million system, however, is still undergoing integration and testing at ORNL.
Crusher is based on the same architectural building blocks as the Frontier supercomputer, which will consist of 100 cabinets of servers designed by HPE. Crusher, however, occupies only 1.5 cabinets holding 192 server nodes connected by HPE’s Slingshot interconnect. Each node contains one third-generation 64-core AMD EPYC CPU paired with 512 GB of DDR4 memory and four AMD Instinct MI250X GPUs that are fed by 512 GB of HBM2e.
The Crusher system packs a vast amount of computing power even though it is designed only for troubleshooting and testing software that will later run on Frontier. ORNL said Crusher delivers more performance than the lab's now-decommissioned Titan supercomputer, the world’s fastest when it was introduced a decade ago, while occupying about 1% of the floor space. Crusher takes up 44 square feet, compared to the 4,352-square-foot Titan.
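The per-node and floor-space figures above can be totaled with a short Python sketch. It assumes the 512 GB of HBM2e is the combined amount across the four GPUs in a node, which is one reading of the description. The constant names are invented for this illustration.

```python
# System-wide totals for Crusher, computed from the per-node figures
# quoted in the article. All constant names are illustrative.
NODES = 192
CPU_CORES_PER_NODE = 64   # one 64-core AMD EPYC CPU per node
GPUS_PER_NODE = 4         # four AMD Instinct GPUs per node
DDR4_GB_PER_NODE = 512
HBM2E_GB_PER_NODE = 512   # assumption: total across the node's four GPUs

total_cpu_cores = NODES * CPU_CORES_PER_NODE
total_gpus = NODES * GPUS_PER_NODE
total_ddr4_gb = NODES * DDR4_GB_PER_NODE
total_hbm2e_gb = NODES * HBM2E_GB_PER_NODE
hbm2e_gb_per_gpu = HBM2E_GB_PER_NODE // GPUS_PER_NODE

# Floor-space comparison with Titan, in square feet.
CRUSHER_SQFT = 44
TITAN_SQFT = 4352
floor_space_pct = 100 * CRUSHER_SQFT / TITAN_SQFT

print(f"CPU cores: {total_cpu_cores:,}")
print(f"GPUs: {total_gpus:,}")
print(f"DDR4: {total_ddr4_gb:,} GB, HBM2e: {total_hbm2e_gb:,} GB")
print(f"HBM2e per GPU (under the stated assumption): {hbm2e_gb_per_gpu} GB")
print(f"Crusher floor space relative to Titan: {floor_space_pct:.2f}%")
```

On these numbers, the floor-space ratio comes out just above 1%, consistent with the figure ORNL cites.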
Processing Power-Up
The U.S. is racing against China and other nations to take the lead in the supercomputer realm. These systems are key for research in areas ranging from the development of advanced materials, medicines, and weapons to the design of automobiles and consumer goods. They can handle the hefty computations used to model the implications of climate change and simulate galaxies, providing insights into how they form and evolve.
ORNL said researchers are already seeing promising results on the Crusher testbed. It is helping them prepare software for Frontier, which is due to be completed by late 2022 and made available to outside researchers next year.
Astrophysical hydrodynamics software used to simulate the dynamics of galaxies runs up to 15 times faster on Crusher than on the lab’s current top supercomputer, Summit, an IBM-built machine that combines IBM’s Power CPUs with NVIDIA’s Volta GPU accelerators. Summit is the fastest system in the U.S. today and No. 2 on the Top 500 ranking of the fastest supercomputers in the world.
Researchers are also running nuclear physics software used to calculate the properties of atoms. The code runs up to eight times faster on one of the AMD Instinct MI250X GPUs that power Frontier than on Summit’s NVIDIA V100 GPUs.
Bragging Rights
AMD, Intel, and NVIDIA are fighting for market share in high-performance server chips used in data centers. While producing chips for one-off supercomputers is a relatively small slice of the business, the advanced technologies at the heart of these systems tend to trickle down into corporate data centers. That gives a competitive edge to chip firms that win contracts from the likes of the U.S. Department of Energy (DoE).
AMD has also landed a second $600 million contract with HPE to build an exascale supercomputer for the DoE to support the nation’s nuclear arsenal. The new supercomputer, called El Capitan, will be housed within the closely guarded grounds of Lawrence Livermore National Lab when it is completed early next year.
The U.S. is building another exascale supercomputer called Aurora, which promises even more performance than Frontier when it opens at Argonne National Laboratory in late 2022.
Intel, which is supplying Aurora’s CPUs and GPUs, said last year the new supercomputer will pump out more than 2 EFLOPS of peak performance and deploy the same Slingshot interconnects as Crusher and Frontier.