Electronic Design

  


[Technology Report]
Advanced VLIW Architectures Unleash Raw DSP Horsepower
A new wave of DSPs boasts a tenfold improvement in signal processing while slashing power to a new low.

Ashok Bindra  |   ED Online ID #3465  |   May 15, 2000


Emerging broadband wireless basestations and handheld phone services, as well as other consumer multimedia systems, are demanding more processing horsepower from programmable DSPs. Simultaneously, the power-consumption and operating-voltage requirements for these applications are dropping.

Serving the insatiable appetite of these forthcoming systems, where voice, video, audio, and data all are converging, requires a dramatic improvement in performance. The present computational levels of a few hundred million instructions per second (MIPS) or hundreds of millions of floating-point operations per second (MFLOPS) are no longer adequate. Future applications will need an order-of-magnitude improvement in performance. It's not surprising, then, that designers are calling for several billion instructions per second (BIPS) and billions of floating-point operations per second (GFLOPS) from a single DSP engine.

Toward that goal, major DSP suppliers have released a wave of DSPs that signals a new era in performance. They've accomplished this by substantially revamping their existing very-long-instruction-word (VLIW) cores and crafting variations of superscalar structures. Some companies have even combined the best of the VLIW and superscalar worlds to push the performance bar to the next level. While these advanced VLIW or highly parallel superscalar DSP architectures promise to deliver better than a tenfold improvement in processing, power consumption also has been reduced to a record low.

Interestingly, as these DSPs proliferate into a wide range of applications, the market pie is getting fatter and the competition is getting stiffer. Companies are looking at a market for programmable DSPs of all kinds that will hit $6 billion this year and then grow at an average annual rate of over 34%, according to market analyst Will Strauss of Forward Concepts in Tempe, Ariz. Also, more newcomers are throwing their hats into the ring. As time-to-market becomes a critical factor in this race, the traditional and new players alike will support their architectures with efficient high-level-language C compilers and integrated development environments.

Indeed, the architectures are tailored to be compiler friendly, as compilers are tweaked to tap every register on the chip. The new architectures are backward-compatible as well. Developers can reuse valuable software code and engineering, accelerate the development time, and cut overall system cost.

Leveraging its advanced VLIW architecture, Texas Instruments Inc. has revamped its VelociTI platform to create a new 16-bit fixed-point DSP core known as the C64x. Offering a tenfold improvement over the flagship C62x DSP core, the VelociTI.2-based C64x boasts a clock speed of up to 1.1 GHz and processing performance near 9 BIPS.

That kind of processing prowess is being aimed at third-generation (3G) wireless basestations and xDSL modems. For feature-rich portable and personal products, TI has released a superset of the popular 16-bit C54x integer core. The C55x dual-MAC-based core is tailored for ultra-low power consumption while doubling the number of instructions per clock cycle. With the ability to clock at 400 MHz, it can deliver performance up to 800 MIPS (Fig. 1). By comparison, the previous-generation C54x runs at 200 MHz.

The new C55x core is designed to cut power consumption to 0.05 mW/MIPS, one-sixth that of its predecessor. Also, it offers a scalable word length that reduces code size by 30% for optimal memory use. It emphasizes the power efficiency that will be needed in forthcoming feature-rich next-generation wireless handsets, which are intended to "roll voice, data, and streaming video into one single product," says Mark Mattson, marketing manager for TI's C5000 platform. "Since it is backward-code-compatible with the C54x, it will provide an easy upgrade path for builders of next-generation cellular phones and other consumer devices like digital audio players and digital cameras."
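
The quoted C55x figures can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation from the numbers in the text, not vendor data. It assumes that "doubling the number of instructions per clock cycle" means two instructions per cycle, and it derives the predecessor's mW/MIPS figure purely from the stated factor of six.

```python
# Back-of-the-envelope check of the C55x figures quoted in the article.
CLOCK_MHZ = 400        # quoted maximum clock rate
INSTR_PER_CYCLE = 2    # assumption: "doubling" read as two instructions per cycle
MW_PER_MIPS = 0.05     # quoted power efficiency
POWER_RATIO = 6        # quoted improvement factor over the predecessor


def peak_mips(clock_mhz, instr_per_cycle):
    """Peak instruction rate in MIPS for a given clock and issue width."""
    return clock_mhz * instr_per_cycle


def core_power_mw(mips, mw_per_mips):
    """Core power in milliwatts at a given instruction rate."""
    return mips * mw_per_mips


mips = peak_mips(CLOCK_MHZ, INSTR_PER_CYCLE)
power_mw = core_power_mw(mips, MW_PER_MIPS)
# Implied by the article's ratio only; not a datasheet value.
implied_predecessor_mw_per_mips = MW_PER_MIPS * POWER_RATIO

print(f"peak rate: {mips} MIPS")
print(f"core power at peak: about {power_mw:.0f} mW")
print(f"implied predecessor efficiency: about {implied_predecessor_mw_per_mips:.2f} mW/MIPS")
```

Under these assumptions the peak rate matches the quoted 800 MIPS, and the core would draw roughly 40 mW when running flat out.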

Advanced power-management techniques implemented on chip automatically power down inactive peripherals, memory, and core functional units to minimize consumption and maximize power efficiency. Designers can customize the power management to their specific application via user-configurable idle domains. This feature gives the designer up to 64 configurable combinations of power management for the CPU, cache, peripherals, DMA controller, clock generator, and the external memory interface. The C55x core also now features wider bus widths and more buses to obtain much higher data throughput on and off the chip.
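
The count of 64 follows from treating each of the six listed domains as independently idle or active, which gives 2^6 combinations. A minimal sketch of that idea appears below. The `IdleDomain` type and its bit assignments are hypothetical, chosen for illustration only, and do not represent the actual C55x register layout.

```python
from enum import IntFlag


class IdleDomain(IntFlag):
    """One bit per idle domain; bit positions are illustrative assumptions."""
    CPU = 1
    CACHE = 2
    PERIPHERALS = 4
    DMA = 8
    CLOCK_GENERATOR = 16
    EMIF = 32  # external memory interface


DOMAINS = (
    IdleDomain.CPU,
    IdleDomain.CACHE,
    IdleDomain.PERIPHERALS,
    IdleDomain.DMA,
    IdleDomain.CLOCK_GENERATOR,
    IdleDomain.EMIF,
)


def all_configurations():
    """Every combination of idle/active states across the six domains."""
    return [IdleDomain(mask) for mask in range(1 << len(DOMAINS))]


def idle_names(config):
    """Names of the domains that are idled in a given configuration."""
    return [d.name for d in DOMAINS if d in config]


configs = all_configurations()
print(len(configs))  # six independent on/off choices give 2**6 combinations

# Example: idle the DMA controller and peripherals, keep everything else running.
low_power = IdleDomain.DMA | IdleDomain.PERIPHERALS
print(idle_names(low_power))
```

A bitmask is a natural fit here because each domain is a single on/off choice, so a configuration fits in one small integer.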

To accomplish faster data reads and writes, the core incorporates three data read buses, two data write buses, a 32-bit program bus, and six 24-bit address buses. Unlike the C54x, which uses a 16-bit external memory interface bus, the C55x employs a 32-bit version to speed up the data flow. It also provides a number of memory options, such as synchronous burst RAM, synchronous DRAM, ROM, and flash.

Likewise, the C64x DSP packs more features than the previous-generation C62x. It includes twice as many on-chip registers, level-2 cache, ten special-purpose instructions to enhance parallelism, multiple data types to perform more operations per clock cycle, improved orthogonality, 25% code-size reduction, and clever logic techniques to boost speed without penalizing power.

While the extensions support quad 8-bit and dual 16-bit operations, the wider 64-bit load/store data paths produce much higher throughput. The C64x core offers two complete sets of compute resources. Each set comprises four units, labeled L, D, S, and M. The L, D, and S units conduct basic integer arithmetic operations. The M unit performs multiple 16- or 8-bit multiplications, Galois multiplications, and special operations like bit shuffling, shifting, and rotations. The ten special-purpose instructions accelerate key tasks within digital communications, imaging, and video applications. Some of these instructions simplify computations in error-correction codes. Others improve motion-estimation algorithms and data density.
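
To make the packed-data and Galois-field operations concrete, here is a small software model of what such instructions compute. It is an illustrative sketch, not TI's instruction set: the function names are hypothetical, the lane arithmetic is assumed to wrap around without saturation, and the field polynomial 0x11D is an assumed choice (one commonly used in Reed-Solomon codes), not something the article specifies.

```python
def add_dual16(a, b):
    """Add two 32-bit words as two independent 16-bit lanes (wraparound)."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo


def add_quad8(a, b):
    """Add two 32-bit words as four independent 8-bit lanes (wraparound)."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = (((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)) & 0xFF
        result |= lane_sum << shift
    return result


def gf256_mul(x, y, poly=0x11D):
    """Multiply two bytes in GF(2^8) by shift-and-XOR, reducing modulo poly."""
    product = 0
    while y:
        if y & 1:
            product ^= x      # addition in GF(2^8) is XOR
        x <<= 1
        if x & 0x100:
            x ^= poly         # reduce when the degree reaches 8
        y >>= 1
    return product


# A carry out of one lane does not spill into its neighbor.
print(hex(add_dual16(0x0001FFFF, 0x00010001)))  # 0x20000
print(hex(add_quad8(0x01FF01FF, 0x01010101)))   # 0x2000200
print(hex(gf256_mul(0x02, 0x80)))               # 0x1d
```

In software, each lane has to be masked and each Galois product takes a loop of shifts and XORs. A dedicated instruction does the same work in a single operation, which is the source of the speedup in error-correction and imaging kernels.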


