Scalable Vector Extensions Expand ARMv8-A's Scope


ARM’s Scalable Vector Extensions

ARM’s Scalable Vector Extensions (SVE) for the ARMv8-A architecture (see figure) expand ARM’s scope to supercomputing, but they will also have a significant impact on high-performance embedded computing (HPEC) systems. It has been half a dozen years since Intel released its SSEx (Streaming SIMD Extensions). In the meantime, vendors with chips based on the Power architecture let support for AltiVec SIMD (single instruction, multiple data) instructions flounder. This, along with other changes, allowed Intel’s Xeon to gain the high ground in HPEC.

AltiVec has made a resurgence, but Power is no longer the platform of choice for high-end embedded systems. It was once dominant there, but Xeon has taken the lead. ARM’s general rise has included adoption in HPEC as well as in military and avionics embedded systems. ARMv8-A SVE will help ARM improve its market share, but it will take many years for this change to make a dent in Intel’s dominance. Simply getting SVE into production chips will take years, and those environments require significant investments in time, money, and certifications. Fujitsu plans to use ARMv8-A SVE in silicon for the RIKEN Post-K supercomputer project, which is scheduled for deployment in 2020.

ARM has included its NEON SIMD support in the ARMv8-A architecture, but NEON is akin to Intel’s SSEx because it operates on fixed 128-bit registers. While useful, these extensions are not in the same category as SVE, which supports vector lengths from 128 to 2,048 bits in 128-bit increments. Intel’s AVX initially handled 256-bit data and was designed to scale to 512- and 1,024-bit registers. AVX-512 is supported by the latest Intel Skylake Xeon and Xeon Phi processors. The vector length of each ARM implementation will depend on the chip vendor, since ARM does not create its own chips.

SVE supports a vector-length agnostic (VLA) programming model. Code is not written for a fixed vector length. Instead, it discovers the hardware’s vector length at run time and the instructions adapt to it. As a result, the same binary runs on implementations with different vector lengths without being rewritten or recompiled.
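The minimal C sketch below makes the VLA idea concrete. It models an SVE-style loop in portable scalar code so that it compiles on any machine. It rests on a few assumptions:

* A hypothetical vl_bits parameter replaces the hardware vector length.
* A hand-written whilelt() helper loosely mimics SVE's WHILELT predicate instruction.
* On real SVE hardware, ACLE intrinsics such as svcntw() and svwhilelt_b32() play roughly these roles.

The loop body is written once and never names a specific width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of an SVE-style vector-length-agnostic (VLA) loop.
 * vl_bits is a hypothetical stand-in for the hardware vector length;
 * real SVE code reads the length at run time instead of taking a parameter. */
enum { MAX_LANES = 2048 / 32 };  /* at most 64 lanes of 32-bit data */

/* Model of a WHILELT-style predicate: lane i is active while base + i < n. */
static void whilelt(bool pred[], size_t lanes, size_t base, size_t n) {
    for (size_t i = 0; i < lanes; i++)
        pred[i] = base + i < n;
}

/* c[i] = a[i] + b[i], written once for every legal vector length. */
void vla_add(const float *a, const float *b, float *c, size_t n,
             unsigned vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    size_t lanes = vl_bits / 32;  /* 32-bit float lanes per vector */
    bool pred[MAX_LANES];

    /* No scalar tail loop: the last pass simply has fewer active lanes. */
    for (size_t base = 0; base < n; base += lanes) {
        whilelt(pred, lanes, base, n);
        for (size_t i = 0; i < lanes; i++)  /* one modelled "vector" add */
            if (pred[i])                     /* inactive lanes are untouched */
                c[base + i] = a[base + i] + b[base + i];
    }
}
```

Calling vla_add with vl_bits set to 128, 512, or 2,048 should produce the same c[]. Only the number of loop passes changes, and elements past n are never written.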

SVE supports a range of optimizations. Gather-load and scatter-store instructions help vectorize code that walks non-linear data structures, a common occurrence in high-performance computing (HPC). Per-lane predication allows vectorization of nested control code containing side effects, because inactive lanes perform no memory accesses. Predication also eliminates separate loop heads and tails, since a final partial vector is handled by the same loop body with fewer active lanes. Predicate-driven loop control and management further reduce vectorization overhead relative to scalar code.
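The next sketch uses the same assumptions: portable C, a hypothetical vl_bits parameter, and hypothetical function names. It models how predication and a gather-load let a conditional loop with a memory side effect vectorize. A compare produces a per-lane predicate, and only the active lanes perform the indexed load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_LANES = 2048 / 32 };

/* Scalar model of SVE per-lane predication plus a gather-load,
 * vectorizing this loop (names are hypothetical):
 *     for (i = 0; i < n; i++)
 *         if (key[i] >= 0) out[i] = table[key[i]];
 * vl_bits again stands in for the hardware vector length. */
void predicated_gather(const int *key, const int *table, int *out,
                       size_t n, unsigned vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    size_t lanes = vl_bits / 32;
    bool loop_p[MAX_LANES], cond_p[MAX_LANES];

    for (size_t base = 0; base < n; base += lanes) {
        for (size_t i = 0; i < lanes; i++) {
            loop_p[i] = base + i < n;  /* loop-control predicate */
            /* Compare under the loop predicate; && skips lanes past n. */
            cond_p[i] = loop_p[i] && key[base + i] >= 0;
        }
        /* Gather: each active lane loads from its own computed address.
         * Inactive lanes make no memory access, so a negative key is never
         * used as an index, and out[] keeps its old value in those lanes. */
        for (size_t i = 0; i < lanes; i++)
            if (cond_p[i])
                out[base + i] = table[key[base + i]];
    }
}
```

A compiler targeting real SVE would express the same steps as a predicated compare followed by a predicated gather-load. The model only shows why the if-body is safe to vectorize.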

The new technology also supports vector partitioning and software-managed speculation. These allow vectorization of uncounted loops with data-dependent exits, where processing must stop partway through a vector. Extended integer and floating-point horizontal reductions let vectorization handle more types of reducible loop-carried dependencies. Finally, scalarized intra-vector sub-loops allow vectorization of loops containing complex loop-carried dependencies.
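A rough model of partitioning and speculation follows, again in portable C with hypothetical names. It loosely mirrors three SVE features:

* first-faulting loads (LDFF1) and the first-fault register (FFR);
* a BRKB-style break that partitions the vector at the exit condition;
* an add-reduction such as SADDV.

The mapped parameter is an assumption that stands in for the boundary where a real load would fault. The code is a sketch of the control flow, not of the code an SVE compiler would emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_LANES = 2048 / 32 };

/* Scalar model of an uncounted loop with a data-dependent exit:
 *     for (i = 0; buf[i] != 0; i++) sum += buf[i];
 * mapped is a hypothetical stand-in for how many elements can be read
 * without faulting. Returns false if no terminator lies within mapped. */
bool sum_until_zero(const int *buf, size_t mapped, unsigned vl_bits,
                    long *sum_out) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    size_t lanes = vl_bits / 32;
    int v[MAX_LANES];
    bool ffr[MAX_LANES], part[MAX_LANES];
    long sum = 0;
    size_t base = 0;

    for (;;) {
        /* Speculative first-faulting load: lanes that would fault are not
         * loaded and are marked false in this FFR model. */
        for (size_t i = 0; i < lanes; i++) {
            ffr[i] = base + i < mapped;
            if (ffr[i])
                v[i] = buf[base + i];
        }
        if (!ffr[0])
            return false;  /* first lane faults: a real fault, so give up */

        /* Partition (BRKB-style): keep lanes strictly before the first zero
         * or before the first lane that was not loaded. */
        bool brk = false;
        for (size_t i = 0; i < lanes; i++) {
            brk = brk || !ffr[i] || v[i] == 0;
            part[i] = !brk;
        }

        /* Horizontal add-reduction over the active partition. */
        size_t count = 0;
        for (size_t i = 0; i < lanes; i++)
            if (part[i]) {
                sum += v[i];
                count++;
            }

        /* Exit if the terminator was loaded; otherwise resume right after
         * the last processed lane (possibly mid-vector). */
        if (count < lanes && ffr[count]) {
            *sum_out = sum;
            return true;
        }
        base += count;
    }
}
```

Lanes beyond the terminator may be loaded speculatively, but their values never reach the sum. A lane that would fault only ends the current pass early, and it becomes a genuine fault only when it is the first lane of a pass.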

ARM’s SVE targets only the 64-bit instruction set (A64), which uses fixed-length 32-bit instructions. SVE takes up 25% of the A64 encoding space that remained unallocated. Three-sixteenths of the encoding space remain free for future A64 enhancements.

SVE will have a major impact on exascale computing in application areas such as pharmaceutical research, quantum physics, and fluid dynamics, as well as in areas like weather and geological simulation and analysis. It will also be used for applications such as machine vision and machine learning.


What's alt.embedded?

Blogs focusing on embedded, software and systems


William Wong

Bill Wong covers Digital, Embedded, Systems and Software topics at Electronic Design. He writes a number of columns, including Lab Bench and alt.embedded, plus Bill's Workbench hands-on column.