[Design View / Design Solution]
Innovate For Low Power In A High-Performance FPGA
Overcome static and dynamic power consumption challenges by employing novel power-reduction techniques.
Paul Ekas
ED Online ID #16249
August 16, 2007
Copyright © 2006 Penton Media, Inc., All rights reserved. Printing of this document is for personal use only.
Traditionally, digital logic has not consumed significant static
power, but this has changed dramatically as process nodes shrink.
Leakage current in digital logic is
now the primary challenge for
FPGAs as process geometries
decrease. If power-reduction
strategies are not employed, power consumption becomes a critical
issue as static power can increase
dramatically at the 65-nm process
node. Static power consumption
rises largely because of increases
in various sources of leakage current (Fig. 1).
Power consumption is composed of static and dynamic power. Static power is the power consumed
by an FPGA when it's programmed with a
programmer object file (.pof), but no
clocks are operating. Both digital and
analog logic consume static power. In an
analog system, static power primarily
consists of the quiescent current of the
analog circuit based on its interface (Fig.
2 and the table).
Dynamic power is the added power
consumed when the device is operating,
which is caused by toggling signals and
charging and discharging capacitive
loads. The main variables affecting
dynamic power are capacitance charging, the supply voltage, and the clock frequency (Fig. 3).
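The relationship among these variables can be sketched with the commonly used first-order switching-power model, P = activity x C x V^2 x f. The function name and every numeric value below are hypothetical and serve only to show how the terms interact; they are not Stratix III data.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating point (illustrative values only).
base = dynamic_power_w(activity=0.125, capacitance_f=1e-9,
                       voltage_v=1.1, frequency_hz=200e6)

# Doubling the clock frequency doubles dynamic power.
doubled_clock = dynamic_power_w(0.125, 1e-9, 1.1, 400e6)

# Lowering the supply voltage reduces power with the square of the ratio.
lower_voltage = dynamic_power_w(0.125, 1e-9, 0.9, 200e6)

print(f"base: {base:.5f} W")
print(f"2x clock: {doubled_clock / base:.2f}x base")
print(f"0.9 V vs 1.1 V: {lower_voltage / base:.2f}x base")
```

Because voltage enters as a square, a modest supply reduction has a larger effect than the same percentage change in capacitance or frequency.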
Dynamic power decreases with Moore's Law, because each process node
shrink reduces capacitance and voltage. The challenge is that more
circuits are implemented with each process shrink and the maximum
clock frequency increases. While power declines for an equivalent
circuit from process node to process node, the FPGA capacity keeps
doubling and the maximum clock frequency keeps increasing.
FPGA ARCHITECTURE
Advances in
architecture, process technology, and circuit techniques help attack these power
challenges. One such example is Altera's
Stratix III FPGA.
The company's Programmable Power Technology helps reduce power in
high-end FPGAs. Traditionally, high-performance FPGAs have been
implemented with a uniformly high-performance fabric, where every logic
element (LE) provides the maximum performance with correspondingly
high leakage power.
Programmable Power Technology takes advantage of the
fact that most circuits in a
design have excess slack and
therefore don't require the highest performance logic. Figure 4 shows a typical excess slack
histogram, where the majority
of the paths (on the left) have
slack and only a few critical
paths (on the right) need the
highest performance logic to
meet timing requirements.
With Programmable Power Technology, the logic fabric of Stratix III can be
programmed at the logic-array-block (LAB) level to provide either
high-speed logic or low-power logic, depending on which is required by
the specific logic path (Fig. 5). In this way, the small percentage of
timing-critical circuits is set to the high-speed mode, with the
remainder using the low-power mode, resulting in a 70% decrease in
leakage power for the low-power logic. Placing unused logic, as well as
DSP blocks and TriMatrix memory, into the low-power mode further
decreases power.
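The slack-based selection can be illustrated with a minimal sketch. It is not the vendor tool's algorithm. It assumes a hypothetical fixed delay penalty for low-power mode, equal leakage per LAB, and invented slack values; only the 70% leakage figure comes from the article.

```python
LOW_POWER_LEAKAGE_SAVING = 0.70  # from the article: 70% lower leakage in low-power mode


def assign_lab_modes(lab_slack_ns, low_power_penalty_ns):
    """Pick a mode per LAB: low-power only if slack absorbs the added delay."""
    return {
        lab: "low_power" if slack >= low_power_penalty_ns else "high_speed"
        for lab, slack in lab_slack_ns.items()
    }


def relative_leakage(modes):
    """Leakage relative to an all-high-speed fabric (1.0 means no saving).

    Assumes every LAB leaks the same amount in high-speed mode.
    """
    if not modes:
        return 1.0
    low = sum(1 for mode in modes.values() if mode == "low_power")
    high = len(modes) - low
    return (high + low * (1.0 - LOW_POWER_LEAKAGE_SAVING)) / len(modes)


# Hypothetical slack per LAB in nanoseconds; only lab0 is timing-critical.
slack = {"lab0": 0.1, "lab1": 1.8, "lab2": 2.5, "lab3": 0.9, "lab4": 3.0}
modes = assign_lab_modes(slack, low_power_penalty_ns=0.5)

print(modes)
print(f"relative leakage: {relative_leakage(modes):.2f}")
```

With one LAB in five left in high-speed mode, the sketch reports leakage at 44% of the all-high-speed case under these assumptions.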
SELECTABLE CORE VOLTAGE
Selectable core voltage lets designers
use a 0.9- or 1.1-V core voltage based on
performance requirements. The 0.9-V
core voltage provides the overall minimum dynamic and leakage power, while
the 1.1-V core voltage delivers the overall highest performance. Dynamic power
scales with the square of core voltage, while static power scales with core
voltage raised to the power of 2.5.
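As a worked example of these scaling rules, the short sketch below applies the stated exponents to the two core voltages. The helper name is hypothetical, and the result ignores any other difference between the two operating points.

```python
def scale_factor(v_new, v_old, exponent):
    """Power ratio when moving from v_old to v_new under a V**exponent law."""
    return (v_new / v_old) ** exponent


dynamic_ratio = scale_factor(0.9, 1.1, 2.0)  # about 0.67
static_ratio = scale_factor(0.9, 1.1, 2.5)   # about 0.61

print(f"dynamic power at 0.9 V: {dynamic_ratio:.2f}x of 1.1 V")
print(f"static power at 0.9 V:  {static_ratio:.2f}x of 1.1 V")
```

Under these exponents alone, moving from 1.1 V to 0.9 V cuts dynamic power by roughly a third and static power by roughly 39%.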
The selectable core voltage input can be set to 0.9 V or 1.1 V during board
design. This core voltage supplies all of
the LABs, memories, and DSP functions
in the core fabric. The selectable core
voltage affects the fabric performance,
so when a device and speed grade are
selected in the software, a core voltage
selection is also required. The software
uses timing and power models specific
to the selected core voltage to implement all timing-dependent and power-dependent analysis and optimization.
When choosing which core voltage to use, a designer must weigh the
system performance requirements against the results reported by
timing analysis. If a system's performance requirements can be met
at 0.9 V, that setting always consumes less power than 1.1 V.
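This decision rule can be written as a small sketch: pick the lowest core voltage whose worst-case slack is non-negative. The function name and the slack figures are hypothetical; in practice the slack values would come from the timing analysis run at each voltage setting.

```python
def choose_core_voltage(worst_slack_ns_by_voltage):
    """Return the lowest voltage that meets timing, or None if none does."""
    passing = [
        voltage
        for voltage, slack in worst_slack_ns_by_voltage.items()
        if slack >= 0.0
    ]
    return min(passing) if passing else None


# Hypothetical worst-case slack (ns) from timing analysis at each setting.
print(choose_core_voltage({0.9: 0.12, 1.1: 0.85}))   # timing met at both
print(choose_core_voltage({0.9: -0.30, 1.1: 0.20}))  # only 1.1 V meets timing
```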
MERGING TECHNOLOGIES
Combining Programmable Power Technology
and selectable core voltage delivers various performance and power operating
points that achieve over 50% power
reduction at 1.1 V (Fig. 6). Static power
varies considerably depending on the utilization of the various resources, such as
DSP blocks and TriMatrix memory blocks.
The combined static and dynamic power varies across combinations of
core voltage and percentage of high-speed versus low-power logic. In most
designs, where the maximum performance of the FPGA is not required, the
total power of a design can be reduced by 50% or more.
PROCESS AND CIRCUIT TECHNOLOGY
The semiconductor industry constantly battles the evolving challenges of small process dimensions
through huge investments in equipment,
process technologies, design tools, and
circuit techniques. In particular, the challenge of increasing leakage power with
small process geometries is felt across
the industry. Thus, many well-known
technologies at the 65-nm process node
(and prior) are used to maintain or
increase performance while managing
leakage power:
- Copper routing
- Low-k dielectric
- Multi-threshold transistors
- Variable gate-length transistors
- Triple gate oxide
- Super-thin gate oxide
- Strained silicon
LOWEST POWER, HIGHEST PERFORMANCE
To attain high efficiency and performance, Stratix III FPGAs
leverage an adaptive-logic-module
(ALM) logic architecture and a MultiTrack interconnect fabric. This combination allows more logic to be packed with
less routing.
ALM technology, which is said to implement 80% more logic functions than other architectures, includes an eight-input
fracturable lookup table (LUT), two 2-bit
adders, and two registers.
Interconnect efficiency can be measured by the number of "hops"
required to get from one LAB to another. Adding interconnect hops
increases capacitance; the fewer the hops, the less high-speed logic is
required to meet performance. MultiTrack interconnect provides one-hop
connectivity between LABs, which yields the lowest possible power (Fig. 7).
Hierarchical clocking is used in the
Stratix III FPGAs to support up to 360
unique clocks. The propagation of every
clock network can be controlled down to
a LAB level. Logic with common clocks is
grouped into LABs. Clocks are only propagated where the logic uses that clock.
All other clock signals are shut down to
minimize power consumption.
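The idea of propagating each clock only to the LABs that use it can be sketched as a simple mapping. The LAB and clock names are hypothetical, and the "gated fraction" is only a count of clock-to-LAB connections that can be shut down, not a power figure.

```python
from collections import defaultdict


def clock_enable_map(lab_clocks):
    """Map each clock to the set of LABs it must reach."""
    enables = defaultdict(set)
    for lab, clocks in lab_clocks.items():
        for clock in clocks:
            enables[clock].add(lab)
    return dict(enables)


def gated_fraction(lab_clocks, all_clocks):
    """Fraction of (clock, LAB) connections that can be shut down."""
    total = len(all_clocks) * len(lab_clocks)
    if total == 0:
        return 0.0
    used = sum(len(set(clocks)) for clocks in lab_clocks.values())
    return 1.0 - used / total


# Hypothetical design: logic sharing a clock is grouped into the same LABs.
lab_clocks = {
    "lab0": {"clk_a"},
    "lab1": {"clk_a"},
    "lab2": {"clk_b"},
    "lab3": set(),  # unused LAB: no clock needs to reach it
}

print(clock_enable_map(lab_clocks))
print(f"gated connections: {gated_fraction(lab_clocks, ['clk_a', 'clk_b']):.3f}")
```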
MEMORY INTERFACE POWER SAVINGS
Double-data-rate (DDR) memory interfaces are one of the most
common I/O interfaces in designs today,
and they can be fairly power-hungry. To
combat those power issues, designers
can turn to dynamic on-chip termination
and DDR3.
When reading from and writing to external memory, it's vital to have an
impedance-matched buffer with both series and parallel termination. If
there's a 50-Ω transmission line when writing to memory, a matched buffer
with a series impedance of 50 Ω is needed. When receiving data from the
memory, a 50-Ω parallel termination resistor pulled to a termination
voltage is desired. This applies not only to DDR-type interfaces, but
also to RLDRAM and QDR SRAM.
By supporting dynamic on-chip termination, FPGA designers can turn the parallel
termination resistor to an on or off (open
circuit) state, depending on whether a
read or write is being executed. During a
write, the FPGA output driver impedance
must be matched to the transmission
line. However, the parallel resistor to VTT wastes energy and reduces signal swing. To avoid this, the resistor can be turned
off (Fig. 8).
During a read, the parallel resistor is
on to terminate the transmission line
to reduce reflections that degrade signal integrity and the ability to reliably
read data.
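The switching rule described here reduces to a simple state check: the parallel termination is enabled only while the FPGA is reading. The sketch below models that behavior in software for illustration. The enum and function names are hypothetical and do not correspond to any device register or tool setting.

```python
from enum import Enum


class BusState(Enum):
    IDLE = "idle"
    WRITE = "write"
    READ = "read"


def parallel_termination_on(state):
    """Parallel termination is needed only when the FPGA receives data."""
    return state is BusState.READ


# Hypothetical bus activity trace.
trace = [BusState.IDLE, BusState.WRITE, BusState.READ, BusState.READ, BusState.IDLE]
for state in trace:
    setting = "on" if parallel_termination_on(state) else "off"
    print(f"{state.value:5s} -> parallel termination {setting}")
```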
The significant benefits of dynamic on-chip termination are realized
whenever the FPGA is writing to the bus or the bus is idle. First, power
is greatly reduced: 1.6 W of static power can be saved on a 72-bit DDR2
bus. In addition, a pure series line termination is achieved when
writing. Finally, the need for a large number of board termination
resistors is removed, saving board cost and complexity.
DDR3 provides 30% lower power than DDR2 because it runs at a lower
voltage: 1.5 V versus 1.8 V. For example, a system with a 72-pin, 200-MHz
(400-Mbit/s) memory interface with on-chip termination would dissipate
3.9 W for just one memory interface. Using dynamic on-chip termination
(wherein the parallel termination resistor is turned off when idle or
when performing a write) can save 1.6 W. If both DDR3 and dynamic on-chip
termination are used, power drops to 1.6 W, saving a total of 2.3 W.
These savings are on a per-interface basis (i.e., four memory interfaces
in an FPGA would save 9.2 W).
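The arithmetic behind these figures can be checked with a short sketch. It reads the 3.9-W figure as a DDR2 interface with always-on termination and assumes the power remaining after the termination saving scales with the square of the I/O voltage. That scaling is an assumption made here to reproduce the quoted 30% figure, not something the article states for the memory interface.

```python
BASELINE_W = 3.9            # one interface, always-on termination (from the article)
DYNAMIC_OCT_SAVING_W = 1.6  # saved by dynamic on-chip termination (from the article)
DDR2_V = 1.8
DDR3_V = 1.5


def interface_power_w(use_dynamic_oct, use_ddr3):
    """Estimated power of one memory interface under the stated assumptions."""
    power = BASELINE_W
    if use_dynamic_oct:
        power -= DYNAMIC_OCT_SAVING_W
    if use_ddr3:
        # Assumption: remaining power scales with the square of the voltage.
        power *= (DDR3_V / DDR2_V) ** 2
    return power


ddr3_reduction = 1.0 - (DDR3_V / DDR2_V) ** 2   # about 0.31, i.e. roughly 30%
both = interface_power_w(True, True)            # about 1.6 W
saving_per_interface = BASELINE_W - both        # about 2.3 W
saving_four_interfaces = 4 * saving_per_interface  # about 9.2 W

print(f"DDR3 reduction: {ddr3_reduction:.0%}")
print(f"DDR3 + dynamic OCT: {both:.1f} W")
print(f"saving per interface: {saving_per_interface:.1f} W")
print(f"saving for four interfaces: {saving_four_interfaces:.1f} W")
```

Under these assumptions the sketch lands on the same rounded values as the text: 1.6 W per interface, 2.3 W saved, and 9.2 W saved across four interfaces.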
The move to very small process nodes (65 nm and below) delivers the
expected Moore's Law benefits of increased density and performance.
However, that boost brings a sharp increase in power consumption, which
risks exceeding acceptable power budgets.
If power-reduction strategies aren't used, static power consumption will
increase significantly. Also, without a specific power optimization
effort, dynamic power consumption rises due to the increased logic
capacity and higher switching frequencies. Overcoming these power
challenges with an enabling and innovative architecture, combined with
advances in process technology and circuit techniques, provides an
efficient and scalable solution for today's increasingly complex
FPGA-based designs.