Keeping it cool is imperative in all
kinds of applications. That's why
designers are increasingly turning
to the 64-bit Power architecture, which often sits near the top of the list in
power/performance ratios. Following that
trend, PA Semi's single- or dual-core
PWRficient PA6T-1682M takes the ratio
more than a few steps further, consuming a
mere 13 W on average for a 2-GHz processor.
The PA6T core's very fine-grain clock gating significantly reduces power requirements. It takes more power to switch
a flip-flop than it does to keep it stable. Yet only a small percentage of the flip-flops in a
processor need to change state at any one
time. Gating the clock to the rest eliminates the unnecessary switching and
reduces power requirements.
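
The reasoning above follows the standard CMOS switching-power relation, P = a * C * V^2 * f, where a is the fraction of the load that switches each cycle. The Python sketch below only illustrates that arithmetic. The capacitance, voltage, and enabled fraction are hypothetical values picked for easy math, not PA Semi figures, and the function names are invented for this example.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Standard CMOS switching-power estimate: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def clock_power(enabled_fraction, clock_cap_f, voltage_v, freq_hz):
    """Clock-network power when only part of the clock load is enabled.

    An ungated clock net switches every cycle (activity 1.0). With
    gating, only the enabled fraction of the load sees clock edges.
    """
    if not 0.0 <= enabled_fraction <= 1.0:
        raise ValueError("enabled_fraction must be between 0 and 1")
    return dynamic_power(enabled_fraction, clock_cap_f, voltage_v, freq_hz)


if __name__ == "__main__":
    # Hypothetical values, chosen only to keep the arithmetic simple.
    CLOCK_CAP_F = 1e-9   # assumed total clock load, in farads
    VOLTAGE_V = 1.0      # assumed supply voltage
    FREQ_HZ = 2e9        # 2 GHz, the clock rate quoted in the article

    ungated = clock_power(1.0, CLOCK_CAP_F, VOLTAGE_V, FREQ_HZ)
    gated = clock_power(0.25, CLOCK_CAP_F, VOLTAGE_V, FREQ_HZ)
    print(f"ungated clock load: {ungated:.2f} W")
    print(f"25% of load enabled: {gated:.2f} W")
```

In this simplified model, clock power scales linearly with the fraction of the clock load that is enabled, which is why gating off idle flip-flops pays off so directly.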
Most advanced processors already use clock gating at a high functional level. This improves efficiency,
but even more efficiency is achievable by splitting each block into smaller
collections of transistors, each with its own gated clock. PA Semi takes this almost to the extreme with over
25,000 blocks.
This fine-grain approach increases the
number of gates needed to support clock gating,
though the increase is relatively small compared to the overall processor design.
Now that the number of transistors is relatively unimportant, power-management and diagnostic features can
afford techniques that add such overhead to
improve overall system efficiency.
In this case, the payoff is
considerable. Coarse clock gating often can
reduce power requirements by 40%.
The fine-grain design does
better than that even at its maximum power draw, and its average
savings are greater still.
The amount of power the system requires
will vary depending on the program being
executed, which is why it's important to know
the limits. PA Semi's designers put together
worst-case tests, quaintly named a "thermal virus," to see how
well or how poorly the new design would work. Even here,
power consumption was significantly lower than with a coarse-grain approach.
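
To make the comparison concrete, here is a toy Python model of gating granularity. It is a sketch under loud assumptions, not PA Semi's methodology: activity is random and independent per unit, the "thermal virus" is modeled simply as a high activity level, and every number and name is hypothetical. A block receives clock edges whenever any unit inside it must update, so larger blocks clock more idle logic.

```python
import random


def clocked_fraction(active_units, units_per_block):
    """Fraction of units that receive a clock edge in one cycle.

    active_units is a list of booleans, one per smallest gateable unit.
    A block is clocked in full if any unit inside it is active.
    """
    total = len(active_units)
    clocked = 0
    for start in range(0, total, units_per_block):
        block = active_units[start:start + units_per_block]
        if any(block):
            clocked += len(block)
    return clocked / total


def simulate(activity, units_per_block, n_units=4096, cycles=200, seed=1):
    """Average clocked fraction over several cycles of random activity."""
    rng = random.Random(seed)  # fixed seed so every run sees the same workload
    total = 0.0
    for _ in range(cycles):
        active = [rng.random() < activity for _ in range(n_units)]
        total += clocked_fraction(active, units_per_block)
    return total / cycles


if __name__ == "__main__":
    # Hypothetical workloads: a light "typical" load and a heavy worst case.
    for label, activity in (("typical", 0.10), ("thermal virus", 0.60)):
        coarse = simulate(activity, units_per_block=64)
        fine = simulate(activity, units_per_block=1)
        print(f"{label:14s} coarse={coarse:.2f} fine={fine:.2f}")
```

In this model the fine-grain result can never exceed the coarse-grain one for the same workload, and the gap is widest when activity is sparse. Real activity is clustered rather than random, so the absolute numbers mean little.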
The fine-grain approach
isn't easily applied to existing
designs. PA Semi used a number of techniques to generate
gated clock blocks, such as
augmented register and logic
definitions that incorporate
the gated clock architecture.
The process also required
routing a larger number of
clock and control signals
throughout the chip.
Rather than significant alterations to the design process, the work required a greater
awareness of the design approach. The clock distribution is partitioned so each region can be
gated independently. It isn't just a matter of using new design blocks.
PERFORMANCE STILL MATTERS
PA Semi didn't slow down the clocks or
skimp on peripherals for its first chip.
The level 1 and 2 caches, as well
as the clock speed, are on par with other Power architecture chips.
The dual DDR2 memory controllers
provide access to off-chip memory and
deliver data across the Conexium Interchange, a high-speed on-chip switch with
a 64-Gbyte/s peak data rate and up to 1
Gtransaction/s. The controllers use
active and pre-charge standby to reduce
power consumption.
The processor is compatible with the
Power architecture, including support
for virtualization. It's a superscalar, out-of-order design that follows the
Power architecture's weakly ordered memory model and minimizes the use of
content-addressable memories (CAMs).
It supports a host of power-down modes
as well.
The SMBus and UART interfaces are
low-speed compared to the 24 high-speed serializer-deserializer (SERDES)
units in the Envio intelligent I/O subsystem. These SERDES serve eight PCI
Express ports supporting one to 16 lanes, dual 10-Gbit Ethernet interfaces, and quad 1-Gbit
Ethernet interfaces. The SERDES can't
handle all of these interfaces at once, but they
will support various combinations, each interface
using one or more SERDES.
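
A quick way to see why only some combinations fit is to add up the SERDES each interface would need. The Python sketch below does that under assumptions the article doesn't state: one SERDES per PCI Express lane, four per 10-Gbit Ethernet interface (as in a XAUI-style link), and one per 1-Gbit Ethernet interface (as in an SGMII-style link). The function names are hypothetical.

```python
TOTAL_SERDES = 24     # from the article
MAX_PCIE_PORTS = 8    # from the article
MAX_10GBE = 2         # from the article
MAX_1GBE = 4          # from the article

# Assumed SERDES cost per interface; not stated in the article.
SERDES_PER_PCIE_LANE = 1
SERDES_PER_10GBE = 4  # XAUI-style, four lanes per port (assumption)
SERDES_PER_1GBE = 1   # SGMII-style, one lane per port (assumption)


def serdes_needed(pcie_port_widths, n_10gbe, n_1gbe):
    """Total SERDES required by a proposed I/O configuration."""
    if len(pcie_port_widths) > MAX_PCIE_PORTS:
        raise ValueError("too many PCI Express ports")
    if any(not 1 <= width <= 16 for width in pcie_port_widths):
        raise ValueError("PCI Express port width must be 1 to 16 lanes")
    if not 0 <= n_10gbe <= MAX_10GBE or not 0 <= n_1gbe <= MAX_1GBE:
        raise ValueError("Ethernet interface count out of range")
    return (sum(pcie_port_widths) * SERDES_PER_PCIE_LANE
            + n_10gbe * SERDES_PER_10GBE
            + n_1gbe * SERDES_PER_1GBE)


def fits(pcie_port_widths, n_10gbe, n_1gbe):
    """True if the configuration fits within the 24 available SERDES."""
    return serdes_needed(pcie_port_widths, n_10gbe, n_1gbe) <= TOTAL_SERDES


if __name__ == "__main__":
    # One x8 and one x4 PCIe port plus every Ethernet interface.
    print(serdes_needed([8, 4], 2, 4), fits([8, 4], 2, 4))
    # A x16 and a x8 PCIe port alone already use every SERDES.
    print(serdes_needed([16, 8], 2, 4), fits([16, 8], 2, 4))
```

Under these assumptions, the full Ethernet complement takes 12 SERDES and leaves 12 for PCI Express.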
The subsystem also incorporates
offload engines that support RAID,
TCP/IP, and encryption. The ciphers include AES,
DES, 3DES, ARC4, and Kasumi, plus SHA-1,
SHA-256, and MD5 hashing. Also, on-chip trace support augments the JTAG
debugging. There are trace buffers for
transactions on the Conexium Interchange and the Envio peripherals.