[Leapfrog: First Look]
Judicious Clocking Subdues Power-Architecture Cooling Needs
This 2-GHz, dual-core processor uses thousands of gated clocks to cut power requirements by more than a factor of three.
Keeping it cool is imperative in all kinds of applications. That's why designers are increasingly turning to the 64-bit Power architecture, which consistently sits near the top of the list in power/performance ratios. Following that trend, PA Semi's single- or dual-core PWRficient PA6T-1682M takes the ratio more than a few steps further, consuming a mere 13 W on average at 2 GHz (Fig. 1).
The PA6T core's very fine-grain clock gating significantly reduces power requirements (Fig. 2). It takes more power to switch a flip-flop than it does to keep it stable. Yet only a small percentage of the flip-flops in a processor need to change state at any given time. Gating the clock to the rest eliminates unnecessary switching and reduces power requirements.
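The reasoning above matches the standard first-order model of CMOS dynamic power. That model is a textbook relation, not something taken from PA Semi's documentation, and the symbols are assumptions introduced here: \alpha is the fraction of nodes that switch per cycle, C is the switched capacitance, V_{DD} is the supply voltage, and f is the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Under this model, gating the clock to idle flip-flops lowers the effective \alpha and the clock-network capacitance that actually toggles, while leaving V_{DD} and f untouched.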
Most advanced processors already use clock gating at a high functional level. This improves efficiency, but even more efficiency is achievable by splitting blocks with a dedicated clock into smaller collections of transistors. PA Semi takes this almost to the extreme with over 25,000 blocks.
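To see why smaller gated blocks help, consider a deliberately simplified sketch. It assumes each flip-flop independently needs a clock edge with a fixed probability per cycle, and that a block's clock is enabled whenever any flip-flop in it is active. Real activity is correlated and the 5% figure is hypothetical, so the numbers only illustrate the trend, not PA Semi's results.

```python
def clocked_fraction(activity: float, block_size: int) -> float:
    """Expected fraction of flip-flops whose clock toggles in one cycle.

    A block's clock is enabled whenever at least one of its flip-flops
    must capture new data; every flip-flop in an enabled block is clocked.
    Assumes flip-flop activity is independent (a simplification).
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be a probability in [0, 1]")
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    # P(block enabled) = 1 - P(every flip-flop in the block is idle)
    return 1.0 - (1.0 - activity) ** block_size


if __name__ == "__main__":
    ACTIVITY = 0.05  # hypothetical: 5% of flip-flops need updating per cycle
    for size in (1, 8, 64, 512):
        fraction = clocked_fraction(ACTIVITY, size)
        print(f"block of {size:>3} flip-flops: {fraction:.1%} clocked on average")
```

In this toy model, the fraction of flip-flops receiving a clock edge climbs quickly as blocks grow, which is the motivation for dividing the chip into many small gated regions.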
This fine-grain approach increases the number of gates needed to support clock gating, though the increase is relatively small compared to the overall processor. Now that transistor count is relatively unimportant, designers can afford power and diagnostic techniques that add such overhead to improve overall system efficiency.
In this case, the payoff is considerable. Coarse clock gating often can reduce power requirements by 40% (Fig. 3). The fine-grain design beats that figure even at its maximum power draw, and its average power is lower still.
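The article quotes savings in two different forms: a 40% reduction for coarse gating and "more than a factor of three" for the fine-grain design. The small helper below converts between the two so they can be compared. It assumes both figures are measured against the same ungated baseline, which the article does not state explicitly; the function names are illustrative.

```python
def reduction_to_factor(reduction: float) -> float:
    """Convert a fractional power reduction (0.40 means 40%) to a divide-by factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def factor_to_reduction(factor: float) -> float:
    """Convert a divide-by factor (3.0 means one-third the power) to a fractional reduction."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    print(f"40% reduction  -> factor of {reduction_to_factor(0.40):.2f}")
    print(f"factor of 3    -> {factor_to_reduction(3.0):.0%} reduction")
```

On that assumption, a 40% cut corresponds to a factor of about 1.67, while a factor of three corresponds to a cut of roughly 67%.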
The amount of power the system requires varies with the program being executed, which is why it's important to know the limits. PA Semi's designers put together a worst-case test, quaintly named a "thermal virus," to see how well or how poorly the new design would work. Even here, power consumption was significantly lower than with a coarse-grain approach.
The fine-grain approach isn't easily applied to existing designs. PA Semi used a number of techniques to generate gated clock blocks, such as augmented register and logic definitions that incorporate the gated clock architecture. The process also required routing a larger number of clock and control signals throughout the chip.
Rather than requiring significant alterations to the design process, fine-grain gating demanded a greater awareness of the design approach. The approach partitions the power plane so voltages can be optimized per region. It isn't just a matter of using new design blocks.
PERFORMANCE STILL MATTERS
PA Semi didn't slow down the clocks or skimp on peripherals for its first chip (Fig. 4). The level 1 and 2 caches, as well as the speed, are on par with other Power architecture chips.
The dual DDR2 memory controllers provide access to off-chip memory and deliver data across the Conexium Interchange, a high-speed on-chip switch with a 64-Gbyte/s peak data rate and up to 1 Gtransaction/s. The controllers use active and pre-charge standby modes to reduce power consumption.
The processor is compatible with the Power architecture, including support for virtualization. It's a superscalar, out-of-order design with a strongly ordered memory model and minimized use of content-addressable memories (CAMs). It supports a host of power-down modes as well.
The SMBus and UART interfaces are low-speed compared to the 24 high-speed serializer-deserializer (SERDES) units in the Envio intelligent I/O subsystem. These SERDES serve eight PCI Express ports supporting one to 16 lanes, dual 10-Gbit Ethernet interfaces, and quad 1-Gbit Ethernet interfaces. The SERDES can't handle all of these interfaces at once, but they will support various combinations in which each interface uses one or more SERDES.
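The trade-off among interfaces can be pictured as a lane budget. The sketch below checks whether a chosen mix fits within the 24 SERDES. The per-interface lane counts for Ethernet (four lanes per 10-Gbit port, as in XAUI, and one per 1-Gbit port, as in SGMII) are assumptions made for illustration, as are the function names; the article does not say how Envio maps interfaces to SERDES.

```python
TOTAL_SERDES = 24      # from the article
MAX_PCIE_PORTS = 8     # from the article
MAX_10GBE_PORTS = 2    # from the article
MAX_1GBE_PORTS = 4     # from the article

LANES_PER_10GBE = 4    # assumption: XAUI-style, four lanes per port
LANES_PER_1GBE = 1     # assumption: SGMII-style, one lane per port


def serdes_needed(pcie_port_widths, ten_gbe_ports=0, one_gbe_ports=0):
    """Return the number of SERDES a hypothetical configuration would use."""
    widths = list(pcie_port_widths)
    if len(widths) > MAX_PCIE_PORTS:
        raise ValueError("too many PCI Express ports")
    if any(not 1 <= w <= 16 for w in widths):
        raise ValueError("each PCI Express port must use 1 to 16 lanes")
    if not 0 <= ten_gbe_ports <= MAX_10GBE_PORTS:
        raise ValueError("invalid number of 10-Gbit Ethernet ports")
    if not 0 <= one_gbe_ports <= MAX_1GBE_PORTS:
        raise ValueError("invalid number of 1-Gbit Ethernet ports")
    return (sum(widths)
            + ten_gbe_ports * LANES_PER_10GBE
            + one_gbe_ports * LANES_PER_1GBE)


def fits(pcie_port_widths, ten_gbe_ports=0, one_gbe_ports=0):
    """True if the configuration stays within the 24-SERDES budget."""
    return serdes_needed(pcie_port_widths, ten_gbe_ports, one_gbe_ports) <= TOTAL_SERDES


if __name__ == "__main__":
    print(fits([8, 4, 4], ten_gbe_ports=2))                    # 16 + 8 lanes
    print(fits([16, 8], ten_gbe_ports=2, one_gbe_ports=4))     # 24 + 8 + 4 lanes
```

Under these assumed lane counts, enabling every interface at full width would need far more than 24 lanes, which is consistent with the article's point that only certain combinations are possible.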
The subsystem also incorporates offload engines that support RAID, TCP/IP, and encryption, including AES, DES, DES3, ARC4, and Kasumi, plus SHA-1, SHA-256, and MD5 hashing. Also, on-chip trace support augments the JTAG debugging. There are trace buffers for transactions on the Conexium Interchange and the Envio peripherals.