AMD’s Heterogeneous System Architecture (HSA) is the blueprint for its next generation of accelerated processing units (APUs). These APUs place the CPU and GPU in a common memory environment by means of cache-coherent shared virtual memory (CC-SVM) (see “Unified CPU/GPU Memory Architecture Raises The Performance Bar”). The original APU combined CPU and GPU cores but maintained distinct memory for each type of core (see “APU Blends Quad Core x86 With 384-Core GPU”).
Related Articles
- Unified CPU/GPU Memory Architecture Raises The Performance Bar
- APU Blends Quad Core x86 With 384-Core GPU
- Embedded GizmoSphere APU Delivers Over 52 GFLOPS
- Low Power, Single-Chip APU Delivers High Performance
OpenCL 2.0 addresses CPU and GPU parallel processing environments (see “OpenCL 2.0, OpenGL 4.4 Officially Released”). Normally, such an environment gives the GPU its own address space, separate from the CPU’s. This model is also used for some FPGA-based OpenCL environments (see “How To Put OpenCL Into An FPGA”), but HSA is different because it has a unified memory environment. The Heterogeneous System Architecture Intermediate Language (HSAIL) and associated design environment were developed to take advantage of HSA. Frameworks like OpenCL can generate HSAIL, which runs on a virtual machine that targets CPU/GPU cores.
HSAIL divides work into a grid hierarchy (Fig. 1). As in OpenCL, programmers define kernels that can be run in parallel on data. The big difference is that HSAIL essentially maps to an HSA-based virtual machine. The HSA finalizer, which translates HSAIL into the native instruction set of the target hardware, is akin to the just-in-time (JIT) compiler of a Java virtual machine (JVM).
The HSAIL virtual machine consists of at least one host CPU and an HSA component. The Architected Queuing Language (AQL) links the two. The host generates and enqueues AQL packets. The packets incorporate kernels that are executed by the HSA component. A kernel defines a grid of up to three dimensions with one work-item per grid point. The grid is partitioned into work-groups, and jobs are dispatched as work-groups, each of which requires all of its data to be available.
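The relationship between the grid, its work-groups, and individual work-items can be sketched in plain Java. This is only an illustrative model under stated assumptions: the `GridSketch` and `DispatchPacket` names, the two-dimensional launch, and the sequential `drain` loop are hypothetical and are not part of the HSA specification or any AMD API. A real HSA component would run the work-items of a group in parallel rather than in nested loops.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative model only; every name here is hypothetical, not an HSA API. */
final class GridSketch {

    /** Hypothetical stand-in for a dispatch packet describing a 2D launch. */
    static final class DispatchPacket {
        final int gridX, gridY;   // grid size in work-items
        final int groupX, groupY; // work-group size in work-items

        DispatchPacket(int gridX, int gridY, int groupX, int groupY) {
            this.gridX = gridX;
            this.gridY = gridY;
            this.groupX = groupX;
            this.groupY = groupY;
        }
    }

    /** Work-groups needed along one dimension; the last group may be partial. */
    static int groupCount(int gridSize, int groupSize) {
        return (gridSize + groupSize - 1) / groupSize;
    }

    /** Work-group index of an absolute work-item ID along one dimension. */
    static int workGroupId(int absoluteId, int groupSize) {
        return absoluteId / groupSize;
    }

    /** Position of a work-item inside its work-group along one dimension. */
    static int localId(int absoluteId, int groupSize) {
        return absoluteId % groupSize;
    }

    /** Drains the queue one work-group at a time; returns the work-items executed. */
    static int drain(Queue<DispatchPacket> queue, int[] out) {
        int executed = 0;
        while (!queue.isEmpty()) {
            DispatchPacket p = queue.remove();
            for (int gy = 0; gy < groupCount(p.gridY, p.groupY); gy++) {
                for (int gx = 0; gx < groupCount(p.gridX, p.groupX); gx++) {
                    // Everything inside these two inner loops is one work-group.
                    for (int ly = 0; ly < p.groupY; ly++) {
                        for (int lx = 0; lx < p.groupX; lx++) {
                            int x = gx * p.groupX + lx; // absolute ID in x
                            int y = gy * p.groupY + ly; // absolute ID in y
                            if (x >= p.gridX || y >= p.gridY) {
                                continue; // outside the grid in a partial group
                            }
                            out[y * p.gridX + x] = x + y; // toy kernel body
                            executed++;
                        }
                    }
                }
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        Queue<DispatchPacket> queue = new ArrayDeque<>();
        queue.add(new DispatchPacket(10, 6, 4, 4)); // host enqueues one packet
        int[] out = new int[10 * 6];
        System.out.println(drain(queue, out)); // prints 60
    }
}
```

With a 10-by-6 grid and 4-by-4 work-groups, the sketch forms 3 by 2 work-groups, two of which along each edge are partial, and still executes exactly one work-item per grid point.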
AMD would like HSA to be a standard, so HSAIL is open, but for now it will take advantage only of AMD’s HSA-based hardware. It may be wishful thinking that Intel would incorporate it, although an integrated CPU/GPU/memory environment has advantages, and AMD and Intel have at least agreed upon the x86 instruction set. HSAIL could be applied to an ARM environment. It is interesting to note that ARM is one of the founding members of the HSA Foundation along with AMD, Samsung, Qualcomm, MediaTek, Imagination, and Texas Instruments.
Another aspect of HSAIL and HSA is Java support (Fig. 2). APARAPI (A PARallel API) is one way for Java to support parallel programming environments. It typically maps to OpenCL, but it could target an HSAIL finalizer.
APARAPI eventually may be replaced by OpenJDK’s Project Sumatra, which brings native parallel programming support to Java. Project Sumatra also could target HSAIL directly. Oracle and AMD are involved with Project Sumatra, so this combination may wind up in production. Support is targeted for Java 9.
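The kind of data-parallel Java code that such approaches aim to offload can be sketched with the standard `java.util.stream` API. This is a minimal illustration, not APARAPI or Project Sumatra code: the `VectorAddSketch` class is hypothetical, and on a stock JVM the loop simply runs on CPU threads. Whether a given runtime would hand it to a GPU through an HSAIL finalizer is not something the sketch assumes.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical example class; uses only the standard Java library. */
final class VectorAddSketch {

    /** Element-wise add, written as one independent lambda call per index. */
    static float[] add(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float[] c = new float[a.length];
        // Each index writes a distinct element and reads no other result,
        // so the iterations may run in any order or concurrently.
        IntStream.range(0, a.length).parallel().forEach(i -> c[i] = a[i] + b[i]);
        return c;
    }

    public static void main(String[] args) {
        float[] c = add(new float[] {1f, 2f, 3f}, new float[] {10f, 20f, 30f});
        System.out.println(Arrays.toString(c)); // prints [11.0, 22.0, 33.0]
    }
}
```

The lambda body plays the same role as a kernel: it describes the work for a single index, and the runtime decides how the index range is spread across the available cores.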
Developers can leverage HSA using the BOLT library from AMD. BOLT is a C++ template library inspired by the C++ Standard Template Library (STL), and it has also been targeted at OpenCL and C++ AMP.
OpenCL and Project Sumatra will remain the primary programming environments for developers, but HSA can provide a better infrastructure beneath them. The unified memory architecture eliminates unnecessary copy operations, since pointers can be shared between CPU and GPU cores. The approach also has lower dispatch overhead.
For now, AMD’s hardware will be driving HSAIL development and the software that runs on top of it. In the future, it could be much more.