Arm's Once-in-a-Decade Architectural Upgrade to Take on Intel
Arm overhauled the architecture of its central processing chips for the first time in a decade, giving its customers new capabilities to take on Intel in artificial intelligence and data centers.
Arm said that it revamped its core instruction set architecture, or ISA, which serves as a sort of contract between the hardware and software and determines the software that a CPU can run. Arm sells the blueprints for chips based on the ISA to chip vendors and technology giants that have transformed it into the industry standard in mobile processors over the last decade.
Arm said the improvements in its instruction set architecture, Armv9, are all about bringing its central processing chips into the future. Arm said that Armv9 offers performance gains for AI and digital signal processing (DSP) workloads, ranging from audio and video to voice. The company is also bringing a wide range of new hardware security capabilities into the fold.
While it is not competing directly against Intel, Arm said the architectural overhaul will allow its customers to roll out more advanced chips for energy-efficient devices like smartphones, where preserving battery life is key. It is also giving them a fresh set of tools to compete on more level ground in Intel's strongholds, such as in personal computers and data centers.
Arm is one of the major players behind the scenes of the global chip business. Arm licenses its chip designs to more than 500 partners that have shipped nearly 200 billion chips to date with Arm inside. Top customers include Qualcomm, NXP, Renesas, Marvell, Nvidia, Apple and others that are set to take advantage of Armv9's enhancements in the years ahead.
Arm said it expects the second generation of CPU products based on the Armv9 architecture, code named "Makalu," to deliver 30% higher performance than its most advanced Armv8 CPU.
Once-in-a-Decade Upgrade
Arm said it expects chips based on the Armv9 architecture to be on the market by early 2022. It is on pace to introduce its first Armv9 CPU, code named "Matterhorn," this year.
Arm rolled out its current architecture, called Armv8, which introduced Arm’s 64-bit instruction set, AArch64, a decade ago. Since then, the company has been upgrading it with a wide range of extensions and other enhancements that it used to wring more and more performance per watt from its CPU designs. Nvidia announced last year that it plans to buy Arm for $40 billion.
Arm said its customers have shipped more than 100 billion chips in the last half decade as its mobile processors outstripped Intel, which controls the x86 architecture at the heart of most chips for PCs and data centers, and lured in Silicon Valley giants from Apple to Microsoft. The Armv9 architecture shares the same base features as v8 along with its unique improvements.
While Armv9 brings a generational leap in performance, Arm plans to upgrade the underlying architecture once a year for the next 10 years, rolling out versions v9.1, v9.2, and so on. Arm said Armv9 would underpin all of its architectural "profiles" for different classes of devices—"A" for apps processors for mobile and servers, "R" for real-time processors, and "M" for microcontrollers—and power the next 300 billion chips with Arm inside.
Arm CEO Simon Segars said these improvements are vital as the CPU transforms from more of a one-dimensional compute block to one that can handle heterogeneous workloads, such as AI and 5G. "As we look toward a future that will be defined by AI, we must lay a foundation of leading-edge compute that will be ready to address the unique challenges to come ahead."
Arm featured supporting comments from silicon partners including Nvidia, Renesas, TSMC, NXP Semiconductors, Ampere Computing, Marvell, MediaTek, Samsung, and others.
Machine Learning Boost
Arm said it would pump out more performance by upgrading its Scalable Vector Extension (SVE) technology, which adds vector processing capability to speed up AI and DSP chores, and incorporating it into the Armv9 architecture. SVE, introduced as an extension to the Armv8 architecture, is currently at the heart of Fugaku, the world's fastest supercomputer, which was designed by Fujitsu.
Arm frequently adds new groups of instructions, or extensions, to its underlying architecture to handle specific chores, and its silicon partners can enable or disable them as needed. SVE and Intel's AVX technology are so-called SIMD (single instruction, multiple data) instruction sets, which apply one operation to several data elements at once. They are used to boost parallel processing and are fundamental to high-performance and power-efficient computing.
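To make the SIMD idea concrete, the sketch below contrasts one-element-at-a-time addition with a lane-wise version in plain Python. It is only an illustration: the function names and the `lanes` parameter are invented for this example, Python still processes the lanes one after another rather than in parallel, and nothing here is actual SVE or AVX code.

```python
def scalar_add(a, b):
    """Add two equal-length lists, one element per step."""
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out


def simd_style_add(a, b, lanes=4):
    """Add two equal-length lists in chunks of `lanes` elements.

    Each loop iteration stands in for a single vector instruction that
    would operate on up to `lanes` elements at once in real hardware.
    `lanes` is a hypothetical parameter for this sketch.
    Returns the result list and the number of chunked steps taken.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    steps = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # The final chunk may be shorter than `lanes`; zip handles that.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return out, steps


a = list(range(10))
b = [10] * 10
result, steps = simd_style_add(a, b, lanes=4)
```

With ten elements and four lanes, the chunked version needs three steps where the scalar loop needs ten, which is the basic source of the throughput and efficiency gains that SIMD extensions aim for.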
Arm is incorporating its latest SIMD instruction set, SVE2, as a standard feature of Armv9, upgrading from the Neon SIMD technology in its Armv8 architecture, which is used to speed up audio and voice processing and image and voice recognition. SVE2 promises to improve the performance and power efficiency of AI and DSP workloads on future Armv9 CPUs.
Arm said SVE2 promises to bring performance gains of SVE to a broader range of chips, from Arm's Cortex-M microcontrollers in the IoT, to Cortex-A processors in smartphones, to general purpose chips for data centers. Arm said that SVE2 also improves the ability of the CPU to run 5G baseband and other chores locally instead of offloading them to accelerators.
Arm also promised to further extend its SVE2 technology with enhancements in matrix math computations in the CPU, in addition to ongoing innovations in its Mali GPUs and Ethos NPUs.
"Addressing the demand for more complex AI-based workloads is driving the need for more secure and specialized processing," said Richard Grisenthwaite, chief architect at Arm, adding that this "will be the key to unlocking new markets and opportunities." He said Armv9 would also "help our partners balance faster time-to-market and cost control alongside the ability to create their own unique solutions."
Computing "Confidentially"
The other major improvement in the Armv9 architecture is related to hardware security.
Arm introduced what it calls the Confidential Compute Architecture (CCA) to its Armv9 roadmap. Even though it is early in development, the technology will be used to protect portions of software and data while they are being processed, even from the operating system (OS), by carrying out computations in secure regions of memory isolated on the CPU.
Usually, the operating system in the device is trusted unconditionally. But in the event the OS—or the hypervisor the software is running on—is compromised or hacked, it becomes a threat.
Arm also introduced the concept of "realms," which resemble software containers but are implemented in hardware in the Armv9 architecture. Arm said applications or other workloads can be transferred to a "realm," a region completely isolated from the secure and non-secure worlds in the CPU, where they run independently from other code. If software running on the other side of a partition is hijacked, the realm denies access and cancels the operation.
The goal is to protect the content of the realm (the data and the operations that process it) from software running in other zones of the chip, whether that is an application, service or firmware. These realms can only be accessed by software with specific permission to do so. They remain completely invisible to software in a smartphone or other device that lacks that permission.
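The access rule described above can be pictured with a toy model in Python. This is a loose sketch, not Arm's design: real realms are enforced by the hardware rather than by a software check, and the class, method and owner names below are all hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a requester does not own the region it asks for."""


class ToyMemory:
    """Toy model of memory regions that only their owner may read."""

    def __init__(self):
        # Maps region name -> (owner, data). Purely illustrative.
        self._regions = {}

    def create_region(self, name, owner, data):
        self._regions[name] = (owner, data)

    def read(self, name, requester):
        owner, data = self._regions[name]
        # Even privileged software (OS, hypervisor) is refused
        # unless it is the owner of the region.
        if requester != owner:
            raise AccessDenied(f"{requester} may not read {name}")
        return data


mem = ToyMemory()
mem.create_region("secrets", owner="realm-1", data=b"key material")

own_read = mem.read("secrets", requester="realm-1")

try:
    mem.read("secrets", requester="hypervisor")
    hypervisor_denied = False
except AccessDenied:
    hypervisor_denied = True
```

The point of the sketch is the direction of trust: the check does not grant the hypervisor or OS any special standing, so a compromise of either does not by itself expose the realm's data.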
The realms expand on Arm's TrustZone technology, which creates "secure" and "non-secure" worlds in the CPU that operate independently to reduce the threat of malware or viruses. Arm said TrustZone is used today to defend billions of smartphones and other devices by establishing secure zones in the CPU where the software cannot be accessed.
The realms could also better secure the cloud. Today, cloud services vendors rent servers and use software to divide the chips inside into virtual machines or VMs that can run applications from different customers at the same time. Arm said a realm could be used to isolate the VM and protect it from harm in case the OS or other privileged software in the CPU is hacked.
The focus on hardware security fits with Arm's estimate that 100% of the world's data will be processed by Arm CPUs at some stage of its life, in the device, network, or the cloud.
The "Total Compute" Package
For the last half decade, Arm has improved the performance of its Cortex-A CPUs annually at a rate that has outpaced the industry, giving it a foothold in areas dominated by Intel.
Last year, Apple began replacing Intel CPUs in its Mac lineup with its internally designed Arm-based M1 system-on-a-chip (SoC). The move came after more than a decade of using its A-series chips in its iPhone and iPad. Amazon is also deploying its Arm-based Graviton CPUs in data centers, which it said offer a 40% performance uplift for cloud services over Intel CPUs.
Rene Haas, who leads Arm’s intellectual property (IP) business, said the architectural advances will translate to a performance uplift of 15% this year and another 15% next year. But as the semiconductor industry at large shifts from general- to more special-purpose computing, Arm said annual double-digit CPU performance gains are not enough.
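As a quick check on how two annual gains add up, the snippet below compounds them. It assumes the second 15% applies on top of the first, which the company's remarks as reported here do not spell out, and `compound_uplift` is a helper name made up for this example.

```python
def compound_uplift(*yearly_gains):
    """Return the total fractional gain from successive yearly gains.

    Assumes each gain multiplies the previous year's performance.
    """
    total = 1.0
    for gain in yearly_gains:
        total *= 1.0 + gain
    return total - 1.0


two_year_gain = compound_uplift(0.15, 0.15)
print(f"{two_year_gain:.1%}")
```

Under that assumption, two successive 15% gains work out to roughly 32% rather than a flat 30%.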
The Armv9 architecture also includes what the company calls its "Total Compute" strategy, wringing out more performance via a series of improvements at system and software levels. Arm said it plans to apply the principles of its "Total Compute" philosophy to its entire portfolio of IP for automobiles, Internet of Things, consumer electronics, and data centers.
Arm is also developing technologies to raise frequency, bandwidth and cache size and to cut memory latency, all to boost the performance of future CPUs based on Armv9.