What you’ll learn
- What's included in Arm’s Total Compute Solution 2023.
- Why the Cortex-X4 is important.
- How Arm is streamlining the SoC development process.
Arm’s big announcement for 2023 was its Total Compute Solution (TCS) that brings together the Cortex-X4, Cortex-A720, and Cortex-A520 cores, along with the Immortalis-G720 GPU (Fig. 1). The CPU cores are part of the DynamIQ Shared Unit 120 (DSU-120), which is a Cortex cluster that can be replicated up to eight times in a system-on-chip (SoC) design. It targets TSMC’s N3E 3-nm process node.
These CPU and GPU resources are tied together with a combination of Arm’s CoreLink CI-700 coherent interconnect, the NI-700 network interconnect, and the MMU-700 memory-management units. The components are designed to work together, thereby significantly reducing a designer’s work.
Arm says the Cortex-X4 delivers 15% more performance than its predecessor, or up to 40% lower power at the same performance level. This is made possible by a wider, out-of-order (OOO) execution unit with up to eight ALUs (Fig. 2). It can decode 10 instructions at a time. The OOO execution window holds up to 768 micro-operations (microOPs), counted as two fused microOPs per entry. Arm also redesigned the instruction fetch unit to keep the pipeline fed and reduce stalls.
These days, performance increases tend to be smaller, but cutting power is a significant benefit since most of the applications for this platform are battery-powered mobile devices. The Cortex-X4 can be paired with up to 2 MB of L2 cache per core. The Cortex-X4 utilizes a new temporal data prefetcher for its L1 cache.
The trick to low-power operation is striking a balance between performance and power requirements. That’s why the big.LITTLE approach of mixing cores of different sizes, each with its own performance and power specs, makes sense. They all run the same instruction set; therefore, applications don’t care which core they run on. It’s just a matter of how quickly they operate and how much power they use.
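The trade-off described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real scheduler works: the `CoreType` class, the `pick_core` function, and the relative performance and power figures are all hypothetical placeholders, not Arm data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    name: str
    relative_performance: float  # placeholder units, not Arm figures
    relative_power: float        # placeholder units, not Arm figures


# Illustrative values chosen only to show the ordering big > mid > little.
CORES = [
    CoreType("Cortex-X4", 3.0, 6.0),
    CoreType("Cortex-A720", 2.0, 3.0),
    CoreType("Cortex-A520", 1.0, 1.0),
]


def pick_core(required_performance, cores=CORES):
    """Pick the lowest-power core type that meets the demand.

    If no core is fast enough, fall back to the fastest one available.
    """
    capable = [c for c in cores if c.relative_performance >= required_performance]
    if not capable:
        return max(cores, key=lambda c: c.relative_performance)
    return min(capable, key=lambda c: c.relative_power)


if __name__ == "__main__":
    for demand in (0.5, 1.5, 2.5):
        print(demand, "->", pick_core(demand).name)
```

Because every core runs the same instruction set, the choice here depends only on speed and power, which is why the selection logic never has to ask what kind of code the task contains.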
The Cortex-A720 is 20% more power-efficient than its predecessor, while the Cortex-A520 is 22% more power-efficient (Fig. 3). All of the cores can be used together for maximum performance, while the Cortex-A520 cores alone can handle light workloads while sipping the minimum amount of power.
The DSU-120 supports up to 14 cores per cluster (Fig. 4). The cores, which can share up to 32 MB of L3 cache, implement the latest Armv9.2 architecture. The architecture adds support for the QARMA3 algorithm for the Pointer Authentication (PAC) and Branch Target Identification (BTI) extensions introduced in Armv8.1-M.
In addition, the cores support the Memory Tagging Extension (MTE) for enhanced security and the Scalable Vector Extension 2 (SVE2) SIMD instructions, which also support machine-learning (ML) acceleration.
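As a rough illustration of the DSU-120 limits quoted above (up to 14 cores per cluster and up to 32 MB of shared L3), the following Python sketch checks a proposed cluster layout against them. The `validate_cluster` function and the example core mix are hypothetical, and a real DSU-120 configuration is subject to additional rules not covered in this article.

```python
MAX_CORES_PER_CLUSTER = 14  # limit quoted in the article
MAX_L3_MB = 32              # limit quoted in the article


def validate_cluster(core_counts, l3_mb):
    """Return a list of problems; an empty list means the layout fits.

    core_counts maps a core name to how many of that core the cluster holds.
    Only the two limits quoted in the article are checked.
    """
    problems = []
    if any(count < 0 for count in core_counts.values()):
        problems.append("core counts must not be negative")
    total = sum(core_counts.values())
    if total < 1:
        problems.append("cluster needs at least one core")
    if total > MAX_CORES_PER_CLUSTER:
        problems.append(f"{total} cores exceeds the limit of {MAX_CORES_PER_CLUSTER}")
    if not 0 <= l3_mb <= MAX_L3_MB:
        problems.append(f"{l3_mb} MB of L3 is outside 0..{MAX_L3_MB} MB")
    return problems


if __name__ == "__main__":
    # Hypothetical mix, chosen only as an example.
    layout = {"Cortex-X4": 1, "Cortex-A720": 5, "Cortex-A520": 2}
    print(validate_cluster(layout, l3_mb=16) or "layout fits the quoted limits")
```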
The Immortalis-G720 GPU is built on Arm’s fifth-generation GPU architecture (Fig. 5). It improves power and system performance efficiency by 14%. It also adds a deferred vertex shading (DVS) unit to the mix, and memory bandwidth requirements have been cut by 40%. The GPU is designed to handle ray tracing, 3D apps, and high-fidelity gaming applications.
The GPU supports tiles of up to 64 by 64 pixels, and performance for Multisample Anti-Aliasing (MSAA) processing has been increased. Performance is also improved for 64-bit/pixel content. Drivers for OpenGL, Vulkan, and OpenCL are available.
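To give a sense of what tile size means in practice, the Python sketch below counts how many tiles cover a frame and how much raw color data one tile holds. The `tile_grid` and `tile_bytes` helpers are hypothetical, the 1920 x 1080 frame is an arbitrary example, and the arithmetic says nothing about how the Immortalis-G720 actually organizes its tile memory.

```python
import math


def tile_grid(width, height, tile_size):
    """Return (columns, rows) of square tiles needed to cover the frame."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)


def tile_bytes(tile_size, bits_per_pixel):
    """Raw color data for one full tile, in bytes."""
    return tile_size * tile_size * bits_per_pixel // 8


if __name__ == "__main__":
    width, height = 1920, 1080  # arbitrary example frame
    for size in (16, 32, 64):
        cols, rows = tile_grid(width, height, size)
        print(f"{size}x{size}: {cols * rows} tiles, "
              f"{tile_bytes(size, 64)} bytes per tile at 64 bits/pixel")
```

Larger tiles mean fewer tiles per frame, at the cost of more data per tile; for this example frame, 64 by 64 tiles give a 30 by 17 grid.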
TCS23 is the latest incarnation of the family (Fig. 6). We can expect similar upgrades next year, but right now the Cortex-X4 rules the roost.