Thanks to advances in process technology and architecture, suppliers of FPGAs and complex programmable logic devices (PLDs) have created devices with megagate complexities, on-board CPUs, and other integrated features that rival what full ASIC providers can deliver.
The availability of low-resistance multilevel copper metallization solves the interconnect limitations faced by previous generations of high-gate-count devices. The migration down to 0.18- and 0.13-µm design features enables many more gates and memory bits to be integrated on a chip. Smaller design features also let the circuits operate at higher clock rates and lower voltage levels. So, the programmable chips can deliver high performance at low power levels, comparable to what many full ASIC solutions can achieve.
A key reason for using programmable logic is the rapid time-to-market it offers. Today's FPGAs can truly supply designers with a system-on-a-chip (SoC) solution that can be delivered in weeks rather than six to 12 months. But that's not always the case. Programmable-logic designs with a million gates or more can take a considerable amount of time to develop.
To reduce this time, FPGA suppliers are creating and licensing blocks of intellectual property (IP) that can be merged into your circuit design. These blocks range from simple elements such as counters and decoders to complex functions like SRAM, PCI bus interfaces, and microprocessor and DSP cores.
Pre-integrating CPU or DSP cores, large blocks of RAM, and other complex functions lets the FPGA suppliers deliver higher clock frequencies than the same functions could reach if they were built from the programmable logic elements.
Many of these cores are available in "soft" form, as design files that can be integrated into your hardware-description-language (HDL) description of the circuit and then synthesized. A few that are speed-critical are available as "firm" cores, blocks in which the circuit interconnect has already been predefined.
In some cases, cores such as blocks of SRAM, CPUs, analog phase-locked loops (PLLs), and high-speed serializer-deserializer (SERDES) I/O ports are pre-integrated on the FPGAs so the blocks can operate at their maximum clock rates. With the advent of 0.13-µm features, 32-bit CPUs can clock at 300 MHz, SERDES ports can handle 3.125 Gbits/s, and memory blocks can boast access times of less than 3 ns.
To leverage the blocks of IP and the million-plus gates, FPGA suppliers have paid a lot more attention to design tools. Improved synthesis software, better timing-driven placement-and-routing tools, and more accurate verification software will help designers move concepts to silicon without the iterative loops previously encountered when trying to get timing or routing closure.
The Megagate Battle Intensifies: At the megagate and higher density levels, just three companies are duking it out in the market's high end (see the table). Altera and Xilinx are pulling no punches this year as they introduce next-generation SRAM-based families with multimegagate complexities. At the same time, Actel has pumped up the complexity of its flash-memory-based ProASIC series. Its new ProASIC Plus family of FPGAs will deliver a top complexity of 1 million gates and almost 200 kbits of embedded blocks of SRAM. Each supplier counts gates differently, though, making one-on-one comparisons extremely difficult.
The FPGA suppliers are battling it out in four areas to gain designers' favor: the number of available gates, the amount of embedded memory, the availability of embedded processor or other computer support blocks, and the number and type of I/O pins and ports. In the SRAM-based FPGA arena, Altera and Xilinx have been trying to outdo each other, and system designers are benefiting from this product development frenzy.
For example, Altera's latest family of devices, the Stratix series, will supply up to 114,000 logic elements (each the equivalent of about 26 gates) for a total of about 3 million logic gates. In addition to the gate-level logic, these top-of-the-line devices will include 10 Mbits of dedicated SRAM, 28 support blocks for DSP operations, up to 12 PLLs, and many single-ended and differential I/O options.
Even as Altera introduces the Stratix family, the company is still developing new members for its previous APEX II and Excalibur series. It recently released the high-end member of the Excalibur series, which has a complexity of 38,400 logic elements (about 1 million gates). But in addition to the logic, the Excalibur series includes an embedded ARM9 hard core that can run at clock speeds of up to 200 MHz.
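The gate totals quoted for the Stratix and Excalibur parts follow from the logic-element counts and the rough figure of about 26 gates per logic element. A minimal sketch of that arithmetic is shown here; the function and variable names are illustrative only, and the 26-gate equivalence is the article's approximation rather than a vendor specification.

```python
# Rough gate-count estimate from logic-element (LE) counts.
# GATES_PER_LE is the approximate equivalence quoted in the article,
# not an exact vendor figure.
GATES_PER_LE = 26


def estimated_gates(logic_elements: int, gates_per_le: int = GATES_PER_LE) -> int:
    """Return the approximate gate count for a given number of logic elements."""
    return logic_elements * gates_per_le


# LE counts as quoted in the article; the labels are illustrative.
devices = {
    "Stratix (largest member)": 114_000,
    "Excalibur (high-end member)": 38_400,
}

for name, les in devices.items():
    gates = estimated_gates(les)
    print(f"{name}: {les:,} LEs -> about {gates / 1e6:.2f} million gates")
```

Because each supplier counts gates differently, an estimate like this is only useful for comparing devices within one vendor's family.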
The Stratix arrays are based on a 1.5-V, 0.13-µm all-layer-copper process that delivers system performance levels approaching 250 MHz. The specialized DSP support blocks include complex multiplier-accumulators that deliver 2 gigamultiplication-accumulation operations (GMACs) per second (Fig. 1). Each DSP block can be configured to provide either eight 9- by 9-bit multipliers, four 18- by 18-bit multipliers, or one 36- by 36-bit multiplier. If all 28 DSP blocks are used, the largest Stratix family member's computational throughput exceeds 56 GMACs. Since the blocks don't consume other on-chip resources, the chip's logic portion can be used for control processors, other compute tasks, and many other functions.
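The aggregate DSP figure can be reproduced from the per-block numbers above. The sketch below assumes the quoted 2 GMACs per second per block is a rounded value and that throughput scales linearly with the number of blocks used; the names are hypothetical and chosen only for illustration.

```python
# Per-block figures as quoted in the article (presumably rounded).
GMACS_PER_BLOCK = 2   # GMACs per second delivered by one DSP block
DSP_BLOCKS = 28       # DSP blocks on the largest Stratix device

# Multipliers available per DSP block, keyed by operand width in bits.
MULTIPLIERS_PER_BLOCK = {9: 8, 18: 4, 36: 1}


def peak_gmacs(blocks: int = DSP_BLOCKS, per_block: float = GMACS_PER_BLOCK) -> float:
    """Aggregate throughput, assuming linear scaling across blocks."""
    return blocks * per_block


def total_multipliers(width: int, blocks: int = DSP_BLOCKS) -> int:
    """Multipliers of a given operand width if every block uses that mode."""
    return MULTIPLIERS_PER_BLOCK[width] * blocks


print(f"Peak throughput: about {peak_gmacs()} GMACs per second")
for width in sorted(MULTIPLIERS_PER_BLOCK):
    print(f"{width} x {width} mode: {total_multipliers(width)} multipliers")
```

With the rounded per-block figure the product comes to 56, so the "exceeds" in the text suggests the actual per-block rate is slightly above 2 GMACs per second.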
Designers at Altera also developed a novel triple-architecture approach to the on-chip SRAM. Dubbed TriMatrix memory, the method sets up a hierarchy of three memory types: a fine-grained group consisting of up to 1118 small memory blocks containing 512 bits each, a second group containing up to 520 blocks of 4 kbits each, and a group of up to 12 MegaRAM blocks that each contain 512 kbits.
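To see how the three tiers contribute to the total, the block counts and sizes can simply be multiplied out. This is a rough sketch that assumes 1 kbit = 1024 bits and counts only the sizes quoted here; the tier labels and function name are illustrative.

```python
KBIT = 1024  # assumption: binary kilobits

# (maximum number of blocks, bits per block) for each tier, as quoted above.
TIERS = {
    "small 512-bit blocks": (1118, 512),
    "4-kbit blocks": (520, 4 * KBIT),
    "MegaRAM blocks": (12, 512 * KBIT),
}


def total_bits(tiers: dict = TIERS) -> int:
    """Sum the capacity of all tiers in bits."""
    return sum(count * size for count, size in tiers.values())


for label, (count, size) in TIERS.items():
    print(f"{label}: {count} x {size} bits = {count * size:,} bits")

print(f"Total: {total_bits():,} bits (about {total_bits() / (KBIT * KBIT):.1f} Mbits)")
```

The total from these figures lands somewhat below the 10 Mbits quoted earlier. The gap may come from parity bits or rounding in the quoted block sizes, which the article does not spell out.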
What's more, Altera's designers pumped up the I/O capabilities, adding true low-voltage differential signaling (LVDS) I/O lines that can handle data transfers at 840 Mbits/s. The LVDS I/O cells have dedicated SERDES circuits as well as differential I/O buffers and data-realignment logic. Single-ended I/O lines include on-chip termination to reduce external component counts and simplify system design.