Server Processors Stack Up to 1.1 GB of 3D Cache

AMD is using one of its latest families of EPYC server CPUs, code-named Genoa X, in-house to run the electronic design automation (EDA) tools it relies on for product development. The Santa Clara, California-based company said the new server chips, which are built on TSMC's 5-nm process node and top out at 96 cores and 192 threads, are tailor-made for technical computing.

Although the chips rely on the same Zen 4 microarchitecture at the heart of AMD's regular general-purpose Genoa CPUs, they come with a massive on-chip cache of more than 1 GB, up to 3X more than the base processors. AMD's advanced 3D chip-stacking technology, called V-Cache, makes the gains possible.

The large amount of shared L3 cache in Genoa X is what makes it especially well suited to EDA and other computationally heavy workloads. These include structural analysis to determine the structural integrity of a bridge or building, finite element analysis (FEA) to replicate the physics of an automotive crash test, and computational fluid dynamics (CFD) to simulate air currents gliding over an airplane wing.

“From aircraft engines to the most advanced semiconductors, the rapid design and simulation of new products is imperative in today’s market,” said Dan McNamara, senior vice president of AMD’s server unit, at the company’s recent data center and AI technology event, where it rolled out Genoa X.

AMD worked with many of the leading vendors of engineering simulation and chip design software, including Altair, Ansys, Cadence, Dassault Systèmes, Siemens, and Synopsys, to make sure their software can harness the additional cache.

Microsoft is also plugging Genoa X into a new series of cloud services from its Azure unit.

3D Chip Stacking

AMD touted Genoa as the fastest general-purpose server CPU on the market. But Genoa X brings a major uplift in performance specifically for technical computing, where large caches make a difference.

Genoa X is based on the same Zen 4 microarchitecture as AMD’s more general-purpose EPYC server CPUs.

The new server CPU is disaggregated into up to 12 core complex dies (CCDs), also called chiplets, each containing up to eight Zen 4 cores. While every chiplet in the more general-purpose Genoa chips has 32 MB of L3 cache plus 1 MB of L2 cache per core, Genoa X uses 3D V-Cache to stack another 64 MB of L3 cache on each chiplet, for a total of up to 96 cores and 1.15 GB of L3 cache per server processor.
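The headline cache figure follows from simple per-chiplet arithmetic. The short Python sketch below multiplies the per-CCD numbers given above across 12 chiplets; the function and constant names are illustrative, and it counts only L3 (not the 1 MB of L2 per core). The result, 1,152 MB, is what the article rounds to 1.15 GB, and it is 3X the 384 MB of a 12-chiplet Genoa part.

```python
# Per-chiplet figures from the article; capacities in megabytes (MB).
CCDS = 12                 # maximum core complex dies (chiplets) per package
CORES_PER_CCD = 8         # Zen 4 cores per CCD
BASE_L3_PER_CCD_MB = 32   # on-die L3 in each CCD (Genoa and Genoa X)
VCACHE_PER_CCD_MB = 64    # extra L3 stacked on each CCD by 3D V-Cache


def l3_totals(ccds: int = CCDS) -> dict:
    """Return core count and total L3 for a part with `ccds` chiplets."""
    genoa_l3 = ccds * BASE_L3_PER_CCD_MB
    genoa_x_l3 = ccds * (BASE_L3_PER_CCD_MB + VCACHE_PER_CCD_MB)
    return {
        "cores": ccds * CORES_PER_CCD,
        "genoa_l3_mb": genoa_l3,
        "genoa_x_l3_mb": genoa_x_l3,
        "ratio": genoa_x_l3 / genoa_l3,
    }


if __name__ == "__main__":
    print(l3_totals())
    # → {'cores': 96, 'genoa_l3_mb': 384, 'genoa_x_l3_mb': 1152, 'ratio': 3.0}
```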

This is the second generation of AMD's V-Cache technology. The company introduced the 3D stacking technology to the data-center market with Milan X, based on its Zen 3 architecture, in 2022.

To mount the extra cache on top of the CPU chiplets, AMD uses an advanced die-stacking technology from TSMC called "hybrid bonding." It relies on tiny copper-to-copper interconnects to supply a total of 2 TB/s of communications bandwidth between the chiplets connected face-to-face.

Sam Naffziger, SVP of product technology architecture at AMD, said that this pays dividends in power efficiency and interconnect density, at least compared with the 2D on-package chiplet packaging the company uses in its more general-purpose silicon.

The L3 cache runs through the middle of each CCD and serves as a sort of central repository where data is stored for fast, repeated access by the Zen 4 cores. The V-Cache sits farther from the cores than this on-die L3, but AMD said the performance penalty is relatively limited: it takes less time for the cores to access the 3D cache stacked above them than to leave the CPU, access system memory, and return.
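Why a slightly slower but much larger cache can still come out ahead is captured by average memory access time: every request pays the cache lookup, and the fraction that misses also pays for the trip to DRAM. The Python sketch below illustrates this with made-up latencies and hit rates; none of the numbers are AMD measurements, and a real cache hierarchy is more complicated than this single-level model.

```python
def avg_access_cycles(l3_hit_cycles: float, l3_hit_rate: float, dram_cycles: float) -> float:
    """Average cycles for a request that reaches L3.

    Every request pays the L3 lookup; the fraction that misses
    (1 - hit rate) additionally pays the round trip to system memory.
    """
    return l3_hit_cycles + (1.0 - l3_hit_rate) * dram_cycles


# Hypothetical values in core clock cycles, chosen only for illustration:
# the stacked cache is assumed to be a few cycles slower but to miss less often.
base = avg_access_cycles(l3_hit_cycles=50, l3_hit_rate=0.60, dram_cycles=400)
stacked = avg_access_cycles(l3_hit_cycles=54, l3_hit_rate=0.80, dram_cycles=400)

print(f"base: {base:.0f} cycles, with V-Cache: {stacked:.0f} cycles")
# → base: 210 cycles, with V-Cache: 134 cycles
```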

The chiplets are co-packaged with the same central I/O die as the general-purpose Genoa CPU family. The I/O die uses AMD's Infinity Fabric to coordinate data traveling between itself and the CPU chiplets surrounding it. Built on 6-nm technology, it supports up to 12 channels of DDR5-4800, up to 128 lanes of PCIe Gen 5, and other connectivity features, including the emerging CXL 1.1 cache-coherent interconnect standard.
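The memory spec translates into a theoretical peak bandwidth with a single multiplication. This Python sketch assumes the standard 64 data bits (8 bytes) per DDR5 channel, excluding ECC bits; the function name is illustrative, and sustained bandwidth in real workloads is lower than this peak.

```python
CHANNELS = 12              # DDR5 channels supported by the I/O die
MEGATRANSFERS = 4800       # DDR5-4800: 4,800 million transfers per second
BYTES_PER_TRANSFER = 8     # 64 data bits per channel, ECC bits not counted


def peak_dram_bandwidth_gbs(channels: int = CHANNELS,
                            mts: int = MEGATRANSFERS,
                            width_bytes: int = BYTES_PER_TRANSFER) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s (10**9 bytes/s)."""
    return channels * mts * 1e6 * width_bytes / 1e9


print(f"{peak_dram_bandwidth_gbs():.1f} GB/s")  # → 460.8 GB/s
```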

The lineup's flagship 96-core CPU has a base clock speed of 2.55 GHz, while the 16-core CPU has a base clock of 3.55 GHz. Maximum boost frequencies range from 3.7 to 4.2 GHz across the family.

The Genoa X family tops out at 96 cores, 192 threads, and 1.15 GB of L3 cache.

The chips also fit into the same thermal and power envelope as Genoa, with a thermal design power (TDP) ranging from 320 to 400 W.

Climbing to the Cloud

As a chip company itself, AMD is also using Genoa X CPUs to boost its chip design efforts.

Ram Peddibhotla, AMD's corporate VP of product management, said the company is using Genoa X internally not only to benchmark it against the competition but also to speed up the steps in its chip design process that rely on EDA, including verification. He added that a 16-core Genoa X CPU completes functional verification runs for one of AMD's graphics chip designs over 70% faster than a 16-core Genoa EPYC CPU without 3D V-Cache.

It's not feasible for even the most skilled engineers to test every detail of a semiconductor design by hand. Instead, they must run thousands of simulations to verify the performance of the design before sending the final blueprint to the fab to be manufactured. To save time, companies run many simulations at once on separate CPU cores. But when all those cores compete for limited cache memory, performance takes a hit.
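A rough way to see the contention is to divide each chiplet's shared L3 among its eight cores. The Python sketch below assumes an even split and a hypothetical 8 MB working set per simulation job; in practice the L3 is shared dynamically rather than partitioned, so this only indicates the scale of the difference.

```python
CORES_PER_CCD = 8
WORKING_SET_MB = 8.0  # hypothetical per-job working set, for illustration only


def l3_share_per_core_mb(l3_per_ccd_mb: float, cores: int = CORES_PER_CCD) -> float:
    """L3 available to each core if the chiplet's cache were split evenly."""
    return l3_per_ccd_mb / cores


# Per-CCD L3: 32 MB on Genoa, 32 MB + 64 MB of stacked V-Cache on Genoa X.
for name, l3_mb in (("Genoa", 32), ("Genoa X", 96)):
    share = l3_share_per_core_mb(l3_mb)
    fits = share >= WORKING_SET_MB
    print(f"{name}: {share:.0f} MB of L3 per core; {WORKING_SET_MB:.0f} MB job fits: {fits}")

# → Genoa: 4 MB of L3 per core; 8 MB job fits: False
# → Genoa X: 12 MB of L3 per core; 8 MB job fits: True
```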

AMD said Genoa X can ease these bottlenecks by keeping more data in on-chip cache, which cuts the latency of trips to system memory. Aside from the company's own engineering teams, other semiconductor makers are taking notice.

Microsoft said STMicroelectronics is using one of the Azure cloud services powered by Genoa X to run register-transfer-level (RTL) simulations of its next-generation silicon. RTL is the abstract blueprint of the digital portions of a chip design.

Nidhi Chappell, head of high-performance computing (HPC) and AI at Azure, said STMicroelectronics was able to reduce simulation times by 30%. "What that means is that their engineers can look at a lot more design possibilities and they can improve product quality because they are doing a lot more validation."
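A 30% cut in run time yields more than 30% extra throughput, because the number of runs that fit in a fixed window scales with the inverse of the run time. This Python sketch assumes independent simulation jobs and a fully occupied cluster; the function name is illustrative.

```python
def throughput_gain(time_reduction: float) -> float:
    """Runs completed in a fixed window, relative to before, when each run
    now takes (1 - time_reduction) of its original duration."""
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - time_reduction)


gain = throughput_gain(0.30)
print(f"{gain:.2f}x as many simulations in the same time")  # → 1.43x as many simulations in the same time
```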

She added, “But ultimately, it means they can bring products to market faster. They can do all of this in cloud.”