Easing Design of Heterogeneous Clusters
The increasingly super-fast, super-complex, single-core solution has given way to multicore architectures as chip designers have bumped into electrical and power limitations associated with increased clock rates. A single core may be easier to program, but many applications benefit from multiple cores—such as video processing in applications like advanced driver assistance systems (ADAS) in automotive settings. In this case, it is better to have more cores available.
Imagination Technologies' new 64-bit I6500 Warrior core will be finding a home in many multicore solutions, ranging from compact SoCs to very large, heterogeneous clusters (Fig. 1). These clusters can include IO coherent units (IOCU) and up to six I6500 CPU cores. All cores support up to 31 secure execution domains that implement Imagination's OmniShield IO virtualization, which allows software reconfiguration of a domain, including support for a virtualized global interrupt controller (GIC).
The cluster architecture is based around an eight-core cluster node with a common coherency manager, the latter of which includes a low-latency L2 cache with up to 8 Mbytes of storage and four non-coherent AXI ports designed for low-latency peripherals. Up to six of a node's cores can be CPU cores. The IOCUs can be linked to application-specific features.
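The limits on a cluster node can be written down as a small validity check. The following is only an illustrative Python sketch: the `ClusterNode` class and its field names are hypothetical, and it assumes that CPU cores and IOCUs share the node's eight positions, which is one reading of the description above rather than a documented rule.

```python
from dataclasses import dataclass

MAX_NODE_SLOTS = 8  # "eight-core cluster node" (from the article)
MAX_CPU_CORES = 6   # "up to six CPU cores" (from the article)


@dataclass(frozen=True)
class ClusterNode:
    """Hypothetical model of one cluster node's core mix."""
    cpu_cores: int
    iocus: int

    def validate(self) -> None:
        """Raise ValueError if the mix exceeds the stated limits."""
        if self.cpu_cores < 0 or self.iocus < 0:
            raise ValueError("core counts must be non-negative")
        if self.cpu_cores > MAX_CPU_CORES:
            raise ValueError(f"at most {MAX_CPU_CORES} CPU cores per node")
        # Assumption: CPU cores and IOCUs occupy the same eight positions.
        if self.cpu_cores + self.iocus > MAX_NODE_SLOTS:
            raise ValueError(f"at most {MAX_NODE_SLOTS} cores per node")


if __name__ == "__main__":
    ClusterNode(cpu_cores=6, iocus=2).validate()  # fully populated node
    ClusterNode(cpu_cores=2, iocus=4).validate()  # IO-heavy mix
```

Under these assumptions, a node with six CPU cores and two IOCUs passes, while a node with seven CPU cores is rejected.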
The CPU supports Simultaneous Multi-Threading (SMT) with up to four threads per CPU core (Fig. 2). The threads feed a pair of execution pipelines. The L1 CPU cache can be utilized as scratch pad RAM (SPRAM) for deterministic application operation. The system utilizes a 256-bit memory bus.
2. The I6500 supports Simultaneous Multi-Threading (SMT) with up to four threads per CPU core.
Designers can customize the CPU core by selecting features like the maximum number of threads, the L1 cache and SPRAM sizes, and support for SIMD and floating point instructions. The operating frequency and voltages can be adjusted at runtime.
The cluster nodes are linked using the ACE coherent fabric. This fabric also supports Imagination's PowerVR GPUs. These GPUs can share the same memory space as the CPUs, reducing copying and coherency issues. This approach allows designers to provide a heterogeneous configuration outside the cluster node as well as inside it.
The OmniShield IO virtualization mentioned earlier provides significant advantages when it comes to security. The isolation of a domain is managed in hardware down to the peripheral level, even though a core might be running up to four threads from different domains. When four threads are sufficient, this incurs no additional switching overhead, unlike a time-sliced approach where a hypervisor handles the switching (Fig. 3).
The I6500 is designed for high-performance applications. Although it is possible to have a single-core CPU, it is more likely to show up in platforms that incorporate at least a single cluster that can run 24 threads simultaneously. A 64-cluster system can support up to 1536 threads and at least 128 IOCUs. Of course, configurations for applications like ADAS may be optimized with a different mix of CPU cores.
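The thread and IOCU totals follow from the per-core and per-node limits given earlier. The short Python sketch below reproduces that arithmetic; the function and constant names are hypothetical, and the IOCU figure assumes that each fully populated eight-core node fills the positions not used by CPU cores with IOCUs.

```python
THREADS_PER_CORE = 4       # SMT threads per CPU core
CPU_CORES_PER_CLUSTER = 6  # maximum CPU cores per cluster node
NODE_SLOTS = 8             # cores per cluster node
MAX_CLUSTERS = 64          # largest system size quoted in the article


def threads_per_cluster(cpu_cores: int = CPU_CORES_PER_CLUSTER) -> int:
    """Hardware threads available in one cluster."""
    return cpu_cores * THREADS_PER_CORE


def system_threads(clusters: int = MAX_CLUSTERS) -> int:
    """Hardware threads across a system of fully populated clusters."""
    return clusters * threads_per_cluster()


def system_iocus(clusters: int = MAX_CLUSTERS) -> int:
    """IOCUs across the system, assuming non-CPU positions hold IOCUs."""
    return clusters * (NODE_SLOTS - CPU_CORES_PER_CLUSTER)


if __name__ == "__main__":
    print(threads_per_cluster())  # 6 * 4 = 24
    print(system_threads())       # 64 * 24 = 1536
    print(system_iocus())         # 64 * 2 = 128
```

Reducing the number of CPU cores per cluster lowers the thread count while freeing positions for more IOCUs, which is the kind of trade-off an ADAS design might make.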