The demands of multimedia are pushing hardware to extremes, requiring advanced architectures and support for single-instruction, multiple-data (SIMD) instructions. DSP and graphics support are also part of the mix. Even with this added complexity, ARM's Cortex-A9 and the MIPS32 74K, both 32-bit cores, break the 1-GHz barrier.
Chips based on these architectures will wind up in high-volume applications such as residential gateways with Voice over IP (VoIP) support, digital TV equipment like set-top boxes, gaming, and automotive infotainment.
ARM Cortex-A9 Core
The Cortex-A9 targets the top end of ARM's product line (Fig. 1). It can use ARM's multicore architecture, which was previously used with the ARM11 (see "ARMv7 Makes A Move To Multicore" at www.electronicdesign.com, ED Online 16156). Up to four Cortex-A9 cores can be combined with this approach.
The core supports the Thumb-2 instruction set and Jazelle hardware acceleration for Java. As with most core designs, custom instructions can be added, as can standard options such as SIMD support. Here, ARM's Neon advanced SIMD extension takes advantage of the Cortex-A9's DSP enhancements, and the core is often combined with ARM's Mali graphics processing unit.
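To make the SIMD idea concrete, here is a minimal sketch in portable C rather than actual Neon intrinsics. It models one SIMD operation as a lane-wise saturating add across four signed 16-bit lanes packed into a 64-bit value. The type vec4_i16 and the function names are hypothetical, chosen only for illustration. Saturation (clamping instead of wrapping on overflow) is typical of multimedia and DSP arithmetic.

```c
#include <stdint.h>

/* Hypothetical 64-bit "vector" holding four signed 16-bit lanes. */
typedef struct {
    int16_t lane[4];
} vec4_i16;

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Lane-wise saturating add. A SIMD unit computes all four lanes in a
 * single instruction; this scalar loop only models the result. */
static vec4_i16 vec4_qadd_i16(vec4_i16 a, vec4_i16 b)
{
    vec4_i16 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = sat16((int32_t)a.lane[i] + (int32_t)b.lane[i]);
    return r;
}
```

Adding {1000, 32000, -32000, -5} and {24, 1000, -1000, 5} this way yields {1024, 32767, -32768, 0}: the two middle lanes clamp at the 16-bit limits instead of wrapping around.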
The Cortex-A9 architecture is designed to maximize parallel instruction execution. Its eight-stage pipeline handles out-of-order instruction flow using a six-entry queue, and the instruction dispatch stage can forward up to four instructions per clock cycle. The pipelines of the individual processing units execute independently.
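The dispatch behavior can be pictured with a toy model. The sketch below assumes a six-entry queue held in program order and an issue width of four. Each cycle it dispatches up to four entries whose operands are ready, letting younger instructions bypass stalled older ones. The ready_cycle field and all names are hypothetical. A real out-of-order core also tracks register dependencies, execution-unit availability, and in-order retirement, none of which is modeled here.

```c
#include <stdbool.h>

#define QUEUE_SIZE 6   /* queue entries, per the description above */
#define ISSUE_WIDTH 4  /* maximum instructions dispatched per clock */

typedef struct {
    int id;           /* program-order tag */
    int ready_cycle;  /* hypothetical: cycle at which operands are available */
    bool valid;       /* entry currently holds an instruction */
} iq_entry;

/* Dispatch up to ISSUE_WIDTH ready entries this cycle, scanning oldest
 * first (index order = program order) and skipping entries still waiting
 * on operands. Dispatched ids go to out[]; returns the count. */
static int dispatch_cycle(iq_entry q[QUEUE_SIZE], int cycle, int out[ISSUE_WIDTH])
{
    int n = 0;
    for (int i = 0; i < QUEUE_SIZE && n < ISSUE_WIDTH; i++) {
        if (q[i].valid && q[i].ready_cycle <= cycle) {
            out[n++] = q[i].id;
            q[i].valid = false;  /* entry is freed for a new instruction */
        }
    }
    return n;
}
```

If the oldest instruction (id 0) is waiting until cycle 3, the model still dispatches ids 1, 2, 4, and 5 in cycle 0, which is the essence of out-of-order issue.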
Two AMBA 3 AXI external bus interfaces handle this level of throughput. The use of ECC RAM reflects the need for high reliability in the Cortex-A9's target applications.
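The article doesn't say which error-correcting code is used, so the following is only a textbook illustration: a Hamming(7,4) code in C that protects four data bits with three parity bits and corrects any single flipped bit. Production memory ECC typically uses wider single-error-correct, double-error-detect (SECDED) codes, such as 8 check bits per 64 data bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Bit (i - 1) of the returned byte holds codeword position i (1..7).
 * Parity bits sit at positions 1, 2, and 4; data bits d0..d3 at 3, 5, 6, 7. */
static uint8_t hamming74_encode(uint8_t data)
{
    uint8_t d0 = data & 1, d1 = (data >> 1) & 1;
    uint8_t d2 = (data >> 2) & 1, d3 = (data >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;  /* checks positions 1, 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;  /* checks positions 2, 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;  /* checks positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | p2 << 1 | d0 << 2 | p4 << 3 |
                     d1 << 4 | d2 << 5 | d3 << 6);
}

/* XOR the positions of all set bits: a valid codeword gives 0, and a
 * single flipped bit gives that bit's position. Correct it, then
 * extract the four data bits. */
static uint8_t hamming74_decode(uint8_t code)
{
    int syndrome = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((code >> (pos - 1)) & 1)
            syndrome ^= pos;
    if (syndrome != 0)
        code ^= (uint8_t)(1u << (syndrome - 1));
    return (uint8_t)(((code >> 2) & 1) | ((code >> 4) & 1) << 1 |
                     ((code >> 5) & 1) << 2 | ((code >> 6) & 1) << 3);
}
```

Every 4-bit value survives a round trip through encode and decode even if any one of the seven stored bits is flipped in between.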
Debugging support is critical in a multicore solution. ARM addresses this with its CoreSight debug and trace capability, which spans the entire system-on-a-chip, including multiple ARM processors, DSPs, and intelligent peripherals. ARM also offers a range of PrimeCell components, such as interrupt and cache controllers, that can be combined to form a complete system.
MIPS32 74K Core
MIPS offers a similar complement of peripherals that can be tied to its MIPS32 74K core, along with instruction extension through its CorExtend support (Fig. 2). The MIPS32 74K uses a 17-stage pipeline, also with out-of-order dispatch, and its multiple execution units likewise operate in parallel. Its stall-free ALU is linked to the DSP-style support in the multiply/divide unit rather than relying on a completely separate execution unit.
An advanced prediction unit makes all of this parallel, out-of-order processing possible. Most 32-bit solutions in this range, like the Cortex-A9, implement this approach in some fashion. In MIPS's case, though, the 74K uses dual, independent eight-entry instruction queues. It keeps three branch history tables to handle prediction more efficiently and maintains a return stack in hardware.
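Branch prediction and a hardware return stack are easy to sketch in software. The C model below uses a generic single table of 2-bit saturating counters indexed by branch address, plus a small return-address stack that pushes on calls and pops on returns. It is not the 74K's three-table arrangement, whose details aren't given here; the table sizes, indexing scheme, and names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define BHT_ENTRIES 256
#define RAS_DEPTH 8

/* Branch history table of 2-bit saturating counters:
 * values 0-1 predict not taken, 2-3 predict taken. */
typedef struct {
    uint8_t counter[BHT_ENTRIES];
} bht;

static bool bht_predict(const bht *t, uint32_t pc)
{
    return t->counter[(pc >> 2) % BHT_ENTRIES] >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
static void bht_update(bht *t, uint32_t pc, bool taken)
{
    uint8_t *c = &t->counter[(pc >> 2) % BHT_ENTRIES];
    if (taken && *c < 3) (*c)++;
    else if (!taken && *c > 0) (*c)--;
}

/* Return-address stack: push on call, pop on return to predict the
 * return target. When full, a push overwrites the oldest entry. */
typedef struct {
    uint32_t addr[RAS_DEPTH];
    unsigned top;    /* next write slot, taken modulo RAS_DEPTH */
    unsigned count;  /* valid entries, at most RAS_DEPTH */
} ras;

static void ras_push(ras *s, uint32_t return_addr)
{
    s->addr[s->top % RAS_DEPTH] = return_addr;
    s->top++;
    if (s->count < RAS_DEPTH) s->count++;
}

/* Returns false (no prediction) when the stack is empty. */
static bool ras_pop(ras *s, uint32_t *predicted)
{
    if (s->count == 0) return false;
    s->top--;
    s->count--;
    *predicted = s->addr[s->top % RAS_DEPTH];
    return true;
}
```

The 2-bit counters add hysteresis: a branch that is usually taken keeps predicting taken after a single not-taken outcome, such as a loop exit.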
The MIPS architecture's shadow registers permit zero-overhead context switching. The DSP support adds three more pairs of accumulator registers. Low-power operation is achieved through a range of approaches, including fine-grain clock gating. Each major block can also be clocked independently. For example, the dual-pipeline, asymmetric dual-issue floating-point unit can run at its own clock frequency.
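The register-level features above can be pictured with a toy C model of processor state. Shadow registers are modeled as several complete general-purpose register files, so a context switch only changes the index of the active set instead of saving and restoring registers through memory. The DSP accumulators are modeled as four 64-bit values (the original HI/LO pair plus the three added pairs) that a multiply-accumulate operation targets. The number of shadow sets and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sizes: the number of shadow sets is implementation-specific.
 * Four accumulators = the original HI/LO pair plus three added pairs. */
#define NUM_GPR 32
#define NUM_SHADOW_SETS 4
#define NUM_ACC 4

typedef struct {
    uint32_t gpr[NUM_SHADOW_SETS][NUM_GPR];  /* one full register file per set */
    unsigned current_set;                    /* set visible to instructions */
    int64_t acc[NUM_ACC];                    /* 64-bit HI/LO accumulator pairs */
} cpu_state;

/* A context switch is just a change of index: nothing is copied to or
 * from memory, which is the idea behind zero-overhead switching. */
static void switch_register_set(cpu_state *cpu, unsigned set)
{
    cpu->current_set = set % NUM_SHADOW_SETS;
}

/* Access general-purpose register r in the currently active set. */
static uint32_t *reg(cpu_state *cpu, unsigned r)
{
    return &cpu->gpr[cpu->current_set][r % NUM_GPR];
}

/* Multiply-accumulate in the style of a DSP "madd": a 32 x 32 -> 64-bit
 * product added to the running sum in the selected accumulator. */
static void mac(cpu_state *cpu, unsigned ac, int32_t a, int32_t b)
{
    cpu->acc[ac % NUM_ACC] += (int64_t)a * (int64_t)b;
}
```

Because several accumulators are available, independent filter or dot-product computations can keep separate running sums without spilling them to memory.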
The MIPS JTAG debugging architecture provides cross-CPU breakpoint support, and the debug controller is chainable for multi-CPU management. Virtualization support is key to running virtual-machine managers (VMMs) efficiently, and VMM support isn't exclusive to high-end 64-bit x86 platforms.