Start with a standard processor core design. Highlight
performance and power bottlenecks. Replace
key logic with advanced clock-gated logic. Significantly
cut power requirements. Incorporate into
popular multimedia devices. Profit.
That’s Intrinsity’s plan. The company started with ARM’s
Cortex A8 architecture with a 13-stage, in-order, dual-issue,
superscalar microprocessor core and a global history-based
branch prediction system (Fig. 1). It incorporates a 10-stage
Neon media pipeline designed to accelerate media codecs
such as H.264 and MP3.
The core is ARMv7-compliant, including Thumb-2 support
and Jazelle RCT (runtime compilation target) Java-acceleration
technology designed to optimize Just In Time (JIT) and
Dynamic Adaptive Compilation (DAC) support. It also supports
ARM’s TrustZone technology for secure transactions
and Digital Rights Management (DRM).
The Cortex A8 is already designed to be a low-power platform.
By identifying critical spots in the design and replacing
them with dynamic logic, though, it was possible for Intrinsity’s
designers to increase performance while reducing power
requirements (Fig. 2).
Gate delays are 1/4 clock cycle, and overlapping clocks
allow delays to be borrowed from adjacent phases. Intrinsity
used its proprietary Fast14 1-of-n domino logic (NDL)
technology. It’s possible to employ NDL in non-ARM designs
as well. Domino logic usually uses less space than conventional
CMOS logic. The parasitic capacitance is also lower.
Each dynamic gate is followed by an inverting circuit before the next
stage. There is no fanout on the inverter, so it can be small
and fast.
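As a rough illustration of what a 1-of-n signal looks like, the sketch below encodes a value so that exactly one of n wires is asserted, and compares that with a dual-rail binary encoding of the same value. The article does not say which n Intrinsity uses, so n = 4 here is an assumption, and the function names are hypothetical. This is not Intrinsity's implementation, only a model of the encoding idea.

```python
def encode_1_of_n(value, n=4):
    """Return n wires with exactly one asserted; n=4 is an assumed width."""
    if not 0 <= value < n:
        raise ValueError("value must be in range(n)")
    return [1 if i == value else 0 for i in range(n)]


def decode_1_of_n(wires):
    """Recover the value from a 1-of-n wire group."""
    if sum(wires) != 1:
        raise ValueError("exactly one wire must be asserted")
    return wires.index(1)


def encode_dual_rail(value, bits=2):
    """Dual-rail binary for comparison: a (true, false) wire pair per bit."""
    wires = []
    for i in range(bits):
        bit = (value >> i) & 1
        wires += [bit, 1 - bit]
    return wires


if __name__ == "__main__":
    for v in range(4):
        one_hot = encode_1_of_n(v)
        dual = encode_dual_rail(v)
        # Both forms use four wires for two bits of information, but the
        # 1-of-4 group asserts one wire where dual-rail asserts two.
        print(v, one_hot, sum(one_hot), dual, sum(dual))
```

In this model both encodings spend four wires on two bits, but only one wire is asserted per 1-of-4 group, which gives a feel for why fewer signals need to switch on each cycle.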
PRECHARGE AND EVALUATION PHASES
The system operates in two phases: precharge and evaluation.
Each dynamic gate effectively operates like a latch between stages.
Multiple states within the overall circuit increase overall bandwidth.
Timing is critical because the precharge/evaluation cycle
operates more like dynamic memory than like latched stages.
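The two phases can be sketched with a small behavioral model of a generic domino gate. The class and method names are hypothetical, and the model ignores real circuit effects such as leakage and charge sharing. It only shows that the output is forced low during precharge and that, once the dynamic node discharges during evaluation, it holds that state until the next precharge.

```python
class DominoGate:
    """Behavioral sketch of one domino stage: dynamic node plus inverter."""

    def __init__(self, pulldown):
        # pulldown(*inputs) is truthy when the pull-down network conducts.
        self.pulldown = pulldown
        self.dynamic_node = 1  # start in the precharged state

    def precharge(self):
        """Precharge phase: the dynamic node is charged high."""
        self.dynamic_node = 1

    def evaluate(self, *inputs):
        """Evaluation phase: the node can only discharge, never recharge."""
        if self.pulldown(*inputs):
            self.dynamic_node = 0

    @property
    def output(self):
        """The inverter output is the complement of the dynamic node."""
        return 1 - self.dynamic_node


if __name__ == "__main__":
    and_gate = DominoGate(lambda a, b: a and b)
    and_gate.precharge()
    print(and_gate.output)   # output is low after precharge
    and_gate.evaluate(1, 1)
    print(and_gate.output)   # node discharged, output went high
    and_gate.evaluate(0, 1)
    print(and_gate.output)   # still high: state is held like a latch
    and_gate.precharge()
    print(and_gate.output)   # low again, ready for the next cycle
```

The third step is the reason timing matters in this model: an input that arrives late or glitches during evaluation cannot be undone until the next precharge.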
NDL is just one of a number of optimizations that Intrinsity
employs in the design. Power gating, custom static logic
and memory, and short wire floor planning are a few others.
These design approaches are built into Intrinsity’s design flow
toolset, allowing the approach to be applied to almost any
design. For example, highly automated Vt and cell selection
flows permit selection of the best gates for speed while balancing
power usage.
The Cortex A8 is available in 65 nm with a low-power (LP),
low-leakage version that runs up to 650 MHz. The general-purpose
(GP) version cracks the 1-GHz/2000 DMIPS mark,
but Intrinsity’s Hummingbird achieves this using the LP
process while drawing under 0.75 mW/MHz. A multi-VDD and
multi-frequency design methodology enables the chip to run
at high speed even at the minimum supply voltage of 1.0 V.
Samsung chips based on the design will use this platform in
a range of portable multimedia applications.
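The headline figures can be turned into a few back-of-the-envelope numbers. The sketch below assumes the 0.75-mW/MHz ceiling applies at the full 1-GHz clock and that the quarter-cycle gate delay is measured against that same clock. Neither assumption is stated outright in the figures quoted here, and the function names are hypothetical.

```python
CLOCK_MHZ = 1000.0          # 1 GHz, from the article
POWER_MW_PER_MHZ = 0.75     # quoted as an upper bound ("under")
DMIPS = 2000.0              # from the article


def power_bound_mw(clock_mhz, mw_per_mhz):
    """Upper bound on power, assuming the mW/MHz figure holds at clock_mhz."""
    return clock_mhz * mw_per_mhz


def dmips_per_mhz(dmips, clock_mhz):
    """Throughput normalized to clock frequency."""
    return dmips / clock_mhz


def phase_delay_ps(clock_mhz, fraction=0.25):
    """Gate delay budget in picoseconds for a given fraction of the cycle."""
    period_ps = 1e6 / clock_mhz   # 1 MHz corresponds to a 1,000,000 ps period
    return period_ps * fraction


if __name__ == "__main__":
    print(power_bound_mw(CLOCK_MHZ, POWER_MW_PER_MHZ))  # 750.0 (mW, upper bound)
    print(dmips_per_mhz(DMIPS, CLOCK_MHZ))              # 2.0 (DMIPS/MHz)
    print(phase_delay_ps(CLOCK_MHZ))                    # 250.0 (ps per gate delay)
```

Under those assumptions the core would draw less than 750 mW at 1 GHz, deliver 2.0 DMIPS/MHz, and leave about 250 ps for each quarter-cycle gate delay.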
BILL WONG
ARM
www.arm.com
INTRINSITY
www.intrinsity.com
SAMSUNG
www.samsung.com