(Image courtesy of Dreamstime).

Arm Challenges Intel and AMD's Lead in Laptops With Cortex-X2 CPU

June 17, 2021
Arm is trying to plunder more market share in personal computers with the Cortex-X2 core, which it hopes will allow vendors to match Apple's accomplishments with the M1 processor at the heart of the latest Mac laptops.

Arm is looking to challenge Intel and AMD’s leadership in the personal computer (PC) market with its latest Cortex-X2 CPU core, which is designed to deliver the levels of performance required by laptops.

The company said the X2 core, part of its Cortex-X family of processors, is its latest flagship CPU for high-end smartphones and laptops. The CPU core delivers gains of 16% in instructions per clock (IPC) over the previous Cortex-X1 core at the same process node and frequency but with double the cache memory, Arm said. The company called the X2 "its most powerful CPU to date" for consumer devices.
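To put the IPC figure in context, the short Python sketch below models single-threaded performance as IPC multiplied by clock frequency, a common first-order approximation rather than anything Arm has published. The function name and the 10% frequency uplift in the second call are hypothetical and used only for illustration.

```python
def relative_performance(ipc_gain: float, freq_ratio: float = 1.0) -> float:
    """Return new/old performance, modelling performance as IPC x clock frequency.

    ipc_gain   -- fractional IPC improvement (0.16 for the 16% cited by Arm)
    freq_ratio -- new clock / old clock (1.0 means the same frequency)
    """
    return (1.0 + ipc_gain) * freq_ratio


# Same process node and frequency, as in Arm's comparison with the X1.
same_clock = relative_performance(0.16)

# Hypothetical case: the same IPC gain plus a 10% higher clock (assumed value).
faster_clock = relative_performance(0.16, freq_ratio=1.10)

print(f"same clock:   {same_clock:.3f}x")
print(f"10% higher f: {faster_clock:.3f}x")
```

Under this simple model, the IPC gain translates directly into speedup when the clock is unchanged, and any frequency increase multiplies on top of it.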

The X2 is the first in its family of client CPUs based on the latest Armv9 architecture that brings major boosts in performance, power efficiency, and security. As part of a long-term strategy to phase out the 32-bit instruction set from its mobile chips by 2023, Arm said the X2 only has 64-bit support. The server-grade Neoverse N2 core, which uses the same underlying Armv9 architecture, was introduced in April.

Arm said the X2 also supports its second-generation Scalable Vector Extension (SVE2) technology. The core contains 128-bit SIMD processing pipelines based on SVE2, giving it up to double the performance for machine learning over the X1. Performance gains apply to other workloads as well, including 5G. As part of the Armv9 architecture, the X2 supports INT8 and BFloat16 data formats to speed up AI chores.
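For readers unfamiliar with BFloat16, the format keeps the sign bit and 8-bit exponent of a 32-bit float and shortens the mantissa to 7 bits, so it covers the same range with less precision. The Python sketch below illustrates the bit layout by truncating a float32 to its upper 16 bits. It is a simplification for illustration: the function names are hypothetical, and real hardware typically rounds rather than truncates.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern for x by truncating its float32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BFloat16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def roundtrip(x: float) -> float:
    """Show the value that survives a trip through BFloat16 (truncating)."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))


print(hex(float32_to_bfloat16_bits(1.0)))
print(roundtrip(3.14159265))
```

The second value printed shows how much precision is lost, which is usually acceptable for neural-network weights and activations.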

Arm's latest product rollout comes as the company looks to take advantage of the growing momentum for laptops with its chip designs inside. Last year, Apple replaced the Intel CPUs in its Mac line of laptops with its internally designed M1 system-on-chip (SoC), which it has also started shipping in its Mac desktop computers. The move came after a decade of Apple using its custom-designed A-series chips in the iPhone, culminating in the A14 chip in the iPhone 12.

Apple has a world-class chip engineering department that spent years building the CPU cores in the M1 from the ground up, giving it 16 computing cores that are manufactured on the 5-nm node from TSMC. According to Apple, the M1—which is also used in the latest iPad Pro tablet—offers better performance than Intel CPUs that powered its Mac laptop and desktop computers for more than a decade and a half.

Apple does not license a predesigned Arm core in the M1. Instead, it leverages a so-called “architecture license” to design its own. Apple has been able to roll out chips in recent years that can rival Intel and AMD in single-threaded performance in PCs, pushing the envelope in a way that has not been possible for vendors using pre-made Arm cores.

Arm is trying to plunder more market share in personal computers with the X2 CPU, which it hopes will allow more vendors to match—or even outdo—Apple's accomplishments with its A- and M-series chips.

Arm said the X2 core is targeted at the world's most advanced 5-nm and smaller nodes from TSMC and Samsung. When combined with the right components at the system level of the SoC, the X2 brings up to 30% gains in single-threaded performance over chips used in the latest flagship Android smartphones, said Paul Williamson, senior vice president and general manager of the client business at Arm, in a blog post.

The CPU’s improvements will intensify pressure on Intel in the personal computer area. Qualcomm has rolled out a family of Arm-based PC chips as part of a long-term play to challenge Intel’s lead in laptops that run on the Windows operating system. Microsoft placed Arm-based CPUs that it co-developed with Qualcomm in its Surface Pro X laptop, including the "SQ1" that debuted in 2019 and the "SQ2" last year.

When it was introduced to complement the A78 last year, the X1 represented a completely new class of Arm CPUs based on a philosophy of performance at all costs. While the Cortex-A series of CPUs used in most of the world's smartphones continued Arm's strategy of striking the best balance of performance and power in a constrained area, the X1 sacrifices some area and power efficiency to get faster speeds.

The X1 core was used previously by Qualcomm in its Snapdragon 888 chip for high-end smartphones and Samsung Electronics in the Exynos 2100 processor at the heart of its Galaxy S21 5G smartphone.

While it does not compete directly against Intel and AMD, Arm said the X2 core will allow its customers to craft more advanced SoCs for smartphones and laptops. Last year, the company launched its Cortex-X Custom program, where its engineering department agrees to work closely with silicon partners such as Qualcomm to create a CPU based on Cortex-X cores that are tailored to their specific requirements.

Most of the smartphones with Arm CPUs inside today have cores arranged in a big.LITTLE architecture. They contain groups of large, high-performance (but power-hungry) CPU cores and clans of smaller, less powerful (but more energy-efficient) cores to bolster battery life. The operating system engages the right CPU in the cluster to run user applications, balancing the need for computing power against long battery life.
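The trade-off the operating system makes can be illustrated with a toy Python sketch. It is not how any real scheduler is implemented, and the core names, capacity numbers, and power numbers are hypothetical. The policy shown simply picks the lowest-power core that can handle a task's demand and falls back to the most capable core otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    capacity: int  # hypothetical relative throughput
    power: int     # hypothetical relative power cost


def pick_core(cores: list[Core], demand: int) -> Core:
    """Choose the lowest-power core whose capacity covers the demand.

    If no core is capable enough, fall back to the highest-capacity core.
    """
    capable = [c for c in cores if c.capacity >= demand]
    if capable:
        return min(capable, key=lambda c: c.power)
    return max(cores, key=lambda c: c.capacity)


# Assumed figures for one "big" and one "LITTLE" core.
cluster = [
    Core("big", capacity=100, power=10),
    Core("little", capacity=35, power=2),
]

for demand in (20, 80, 150):
    print(demand, "->", pick_core(cluster, demand).name)
```

Light tasks land on the small core to save battery, while heavy or bursty tasks go to the big core.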

Many of the smartphone chips now in production use Arm's Cortex-A78 core as the powerhouse of the CPU, and the Cortex-A55 as the “little” cores. But late last month, alongside the Cortex-X2, Arm introduced the Cortex-A710 to replace the A78 and the Cortex-A510 to replace the A55. Binding them all together on the silicon die are the CoreLink CI-700 coherent interconnect and CoreLink NI-700 network-on-chip interconnect, also launched last month.

The A710 and A510 are based on the Armv9 architecture, giving them the same security improvements as the X2 CPU core, including internal cryptography acceleration and memory tagging extensions (MTE).

Akash Jani, a semiconductor analyst with The Linley Group, said that the Cortex-X2, Cortex-A710, and Cortex-A510 deliver “impressive double-digit performance gains” at the expense of some added power consumption and die area. He said that Arm started supplying the blueprints to lead customers before the end of last year and he anticipates chips based on the CPU cores to enter production in early 2022.

Even though its latest generation of CPUs, code-named “Matterhorn,” supplies faster speeds and higher power efficiency, Arm cannot deliver those gains without improvements to the interconnects that bind the cores together on a silicon die. Using its latest DynamIQ Shared Unit (DSU), the DSU-110, and CoreLink interconnects, Arm's clients can roll out different configurations of Armv9-A CPUs for different markets.

According to Arm, its suite of interconnects allows its customers to build CPUs with a maximum of eight Cortex-X2 cores, 1 MB of L2 cache, and 16 MB of L3 cache on a single slab of silicon. That arrangement closes the gap with other chips used in personal computers, promising up to 40% more single-threaded performance compared with Intel's i5-1135G7 clocked at 3.5 GHz used in laptops released in 2020.

The DSU-110 supports a wide range of different CPU cluster configurations for different end markets. Other possible combinations include four X2 cores and four A710s for laptops; a single X2, three A710s, and four A510s for premium 5G smartphones; two A710s and six A510s to be used in smart speakers and televisions; and four A510 cores for chips used in smartwatches and other wearable devices.
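The combinations listed above can be written down as a small table and checked programmatically. The Python sketch below is only an illustration: the dictionary layout and function names are hypothetical, and it assumes the eight-core maximum cited earlier applies to any mix of cores in a cluster, which the article states only for the all-X2 case.

```python
# Assumption: the eight-core ceiling applies to mixed clusters as well.
MAX_CORES_PER_CLUSTER = 8

# Cluster configurations named in the article, keyed by target market.
CONFIGS = {
    "laptop": {"X2": 4, "A710": 4},
    "premium 5G smartphone": {"X2": 1, "A710": 3, "A510": 4},
    "smart speaker / TV": {"A710": 2, "A510": 6},
    "wearable": {"A510": 4},
}


def total_cores(config: dict[str, int]) -> int:
    """Count every core in a cluster configuration."""
    return sum(config.values())


def is_valid(config: dict[str, int]) -> bool:
    """A configuration is valid if it has at least one core and fits the limit."""
    return 0 < total_cores(config) <= MAX_CORES_PER_CLUSTER


for market, config in CONFIGS.items():
    print(f"{market}: {total_cores(config)} cores, valid={is_valid(config)}")
```

Every configuration mentioned fits within eight cores, with the wearable option using only half of that budget.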

“The Cortex-X series is designed to maximize performance on single-threaded and ‘bursty’ workloads,” explained Aditya Bedi, director of product management at Arm. “The pipeline in the microarchitecture is structured and provisioned to push IPC performance improvements.” He added, “The Cortex-A700 series is prioritized for sustained processor workloads, with the best balance of efficiency and performance.”

Arm improved the branch prediction unit in the X2, one of the fundamental building blocks of modern CPUs. These blocks predict the most likely outcome of a branch in the code before it is resolved, so the CPU can keep fetching instructions instead of waiting. Arm also decoupled the branch prediction unit from the instruction fetcher in the CPU so it can run ahead faster. The improvements make the X2 less likely to guess incorrectly, bringing better performance on a wide range of workloads.
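As a rough illustration of what a branch predictor does, the Python sketch below implements a two-bit saturating counter, a classic textbook scheme. It is not Arm's design, which is far more sophisticated and not described in detail here. The class name, function name, and the loop pattern used as input are all hypothetical.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter for a single branch.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self) -> None:
        self.state = 2  # start in the "weakly taken" state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes: list[bool]) -> float:
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical loop branch: taken nine times, then not taken on exit, repeated.
loop_pattern = ([True] * 9 + [False]) * 10
print(f"accuracy: {accuracy(loop_pattern):.0%}")
```

Because a single loop exit only moves the counter one step, the predictor keeps guessing "taken" when the loop starts again, so it mispredicts only the exit of each pass.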

Arm also upgraded the instruction pipeline in the CPU, making it possible to execute more instructions in a shorter time span, giving it better performance and power efficiency. According to Arm, it reduced the number of clock cycles an instruction needs to pass through the pipeline from 11 to 10. The savings on a single instruction add up across the millions of operations the CPU runs through every second.

Moreover, Arm said it improved the prefetcher in the X2, which loads instructions and other data into memory caches before they are needed. The company also enlarged the out-of-order execution window, allowing the CPU to carry out instructions as soon as they are ready in order to reduce stalls that can hamper performance. The reorder buffer in the CPU was also increased by 30%.

The X2 core can support 512 kB or 1 MB of L2 memory cache depending on the demands of specific vendors. Arm said that it can be scaled up to clusters of eight CPU cores and up to 16 MB of L3 cache.

About the Author

James Morra | Senior Editor

James Morra is a senior editor for Electronic Design, covering the semiconductor industry and new technology trends, with a focus on power management. He also reports on the business behind electrical engineering, including the electronics supply chain. He joined Electronic Design in 2015 and is based in Chicago, Illinois.
